Replace with RegExp only outside tags in the string

I have strings that may contain HTML tags, like
this is a nice day for bowling <b>bbbb</b>
How can I use a RegExp to replace all b characters with, say, :blablabla:, but ONLY outside HTML tags?
So in that case the resulting string should become
this is a nice day for :blablabla:owling <b>bbbb</b>
EDIT: I would like to be more specific, based on the answers I have received. First of all, I have just a string, not a DOM element or anything else. The string may or may not contain tags (opening and closing). The main idea is to be able to replace anywhere in the text except inside tags. For example, if I have a string like
not feeling well today :/ check out this link http://example.com
the regexp should replace only the first :/ with a real smiley image, but should not replace the second and third, because they are inside (and part of) a tag. Here's an example snippet using the regexp from one of the answers.
var s = 'not feeling well today :/ check out this link http://example.com';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
Strangely, the DEMO shows that the regexp captures the correct group, but the same regexp does not seem to work in the replace function.

The regex itself to replace all bs with :blablabla: is not that hard:
.replace(/b/g, ":blablabla:")
It is a bit tricky to get the text nodes where we need to perform search and replace.
Here is a DOM-based example:
function replaceTextOutsideTags(input) {
var doc = document.createDocumentFragment();
var wrapper = document.createElement('myelt');
wrapper.innerHTML = input;
doc.appendChild( wrapper );
return textNodesUnder(doc);
}
function textNodesUnder(el){
var n, walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
while(n=walk.nextNode())
{
if (n.parentNode.nodeName.toLowerCase() === 'myelt')
n.nodeValue = n.nodeValue.replace(/:\/(?!\/)/g, "smiley_here");
}
return el.firstChild.innerHTML;
}
var s = 'not feeling well today :/ check out this link http://example.com';
console.log(replaceTextOutsideTags(s));
Here, we only modify the text nodes that are direct children of the custom-created element named myelt.
Result:
not feeling well today smiley_here check out this link http://example.com

var input = "this is a nice day for bowling <b>bbbb</b>";
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:')
+ c;
});
document.querySelector("pre").textContent = result;
<pre></pre>
You can do this:
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:') // you may do something else here
+ c;
});
Note that in most (not all, but most) real, complex use cases, it's much more convenient to manipulate a parsed DOM rather than just a string. If you're starting with an HTML page, you might use a library (some, like mine, accept regexes to do so).
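To see what the pattern above treats as replaceable, here is a small sketch. The helper name replaceBetweenTags and the sample input are made up for illustration. It suggests that tag names and attribute values are left alone, while text sitting between an opening and a closing tag is still rewritten, which may or may not be what you want.

```javascript
// Hypothetical wrapper around the regex from the answer above.
function replaceBetweenTags(input, transform) {
  // Group 2 is any run of text between ">" (or the start) and "<" (or the end).
  return input.replace(/(^|>)([^<]*)(<|$)/g, (_, open, text, close) =>
    open + transform(text) + close
  );
}

// Illustrative input: the tag name "b" and the attribute value "b" must survive.
const sample = 'a bob <b class="b">b</b>';
const output = replaceBetweenTags(sample, (text) => text.replace(/b/g, 'X'));
console.log(output); // a XoX <b class="b">X</b>
```

If the content of elements such as <b>…</b> must stay untouched as well, this pattern alone is not enough.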

I think you can use a regex like this (only for simple data, not nested tags):
/<[^\/]*?b.*?<\/.*?>|(b)/ig
[Regex Demo]
If you want to use a regex, I suggest applying the regex below repeatedly until all tags are removed:
/<[^\/][^<]*>[^<]*<\/.*?>/g
Then use a replace to find any b.
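A likely reason an alternation like this "captures the right group" in a demo but misbehaves in replace() is that a plain replacement string is applied to every match, including the tag branch. A callback can return tag matches unchanged and substitute only when the capturing group took part. The sketch below assumes simple, non-nested elements; the helper name replaceOutsideTags and the sample strings are illustrative, not from the question.

```javascript
// Hypothetical helper: replace matches of `needle` only outside tags
// and outside simple (non-nested) elements.
function replaceOutsideTags(input, needle, replacement) {
  const flags = 'g' + (needle.ignoreCase ? 'i' : '');
  // Branch 1: a whole non-nested element, e.g. <b>bbbb</b> (group 1 = tag name).
  // Branch 2: any other lone tag.
  // Branch 3: the text we actually want (group 2).
  const pattern = new RegExp(
    '<([a-zA-Z][\\w-]*)\\b[^>]*>[^<]*<\\/\\1>|<[^>]*>|(' + needle.source + ')',
    flags
  );
  return input.replace(pattern, (match, tagName, wanted) =>
    // Group 2 is undefined when one of the tag branches matched: keep it as is.
    wanted === undefined ? match : replacement
  );
}

console.log(replaceOutsideTags('this is a nice day for bowling <b>bbbb</b>', /b/, ':blablabla:'));
// this is a nice day for :blablabla:owling <b>bbbb</b>
```

Nested elements, comments and a literal ">" inside attribute values are not handled; for those a DOM-based approach is the safer route.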

Related

Using exec to find multiple matches fails when we are trying to replace with the match itself

I want to replace each occurrence of the sequence ab with a red ab. The example is taken from https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec. The problem is that when I use ${myArray[0]} in the replacement it fails, whereas it works when I use "XX" as the replacement, for example. Does anyone know why it does not work? Any other recommendation is welcome. Thank you all in advance.
Some extra help for the community: the answer given is connected with "Using $0 to refer to entire match in Javascript's String.replace". There, user166390 proposes $& instead of $0, and then it works fine.
<div id="Demonstration"></div>
<script>
// INPUT
// abbcdefabh
// WANTED OUTPUT
// XXcdefXXh
let myRe = /ab*/g;
let str = 'abbcdefabh';
let myArray;
while ((myArray = myRe.exec(str)) !== null) {
// IT WORKS !!!
str = str.replace(`${myArray[0]}`, "XX");
// IT DOES NOT WORK !!!
str = str.replace(`${myArray[0]}`, `<span style = "color:red;">${myArray[0]}</span>`);
}
document.getElementById('Demonstration').innerHTML = str;
</script>

The problem I face is that when I use as replacement the ${myArray[0]} it fails
That's because you are changing the string while searching it. You are keeping the ab* occurrences and since their positions changed, the search will continue to match them and the string grows forever. Try debugging with a breakpoint in the loop and watch the str and myRe.lastIndex/myArray.index expressions.
I would like to try it exclusively with the exec().
Don't. The proper tool for this job is just replace, with a replacement string ($& refers to the whole match; JavaScript has no $0):
….innerHTML = str.replace(myRe, '<span style = "color:red;">$&</span>');
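For completeness, here is a minimal sketch of both routes on the question's input. In a JavaScript replacement string the whole match is written as $& (there is no $0). If exec() really must be used, one way to avoid the endless loop is to read from the untouched input and append to a separate output string. The variable names are illustrative.

```javascript
const str = 'abbcdefabh';
const pattern = /ab*/g;

// 1) Replacement string: "$&" stands for the whole match.
const viaString = str.replace(pattern, '<span style="color:red;">$&</span>');

// 2) exec() only: never modify the string being searched.
let viaExec = '';
let lastEnd = 0;
let match;
pattern.lastIndex = 0; // start scanning from the beginning
while ((match = pattern.exec(str)) !== null) {
  // Copy the text before the match, then the wrapped match itself.
  viaExec += str.slice(lastEnd, match.index) +
    `<span style="color:red;">${match[0]}</span>`;
  lastEnd = pattern.lastIndex;
}
viaExec += str.slice(lastEnd); // remaining tail after the last match

console.log(viaString);
console.log(viaString === viaExec); // true
```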

Is it possible to move substrings to a specific location with RegEx?

Background: I used quill.js to get some rich text input. The result I want is quite similar to HTML so I went with the quill.container.firstChild.innerHTML approach instead of actually serializing the data. But when it comes to anchor, instead of
<a href="test.html">Anchor</a>
I actually want
Anchor{{link:test.html}}
With .replace() method I easily got {{link:test.html}}Anchor</a> but I need to put the link description after the Anchor text. Is there a way to swap {{link:test.html}} with the next </a> so I can get the desired result? There can be multiple anchors in the string, like:
str = 'This is a test. And another one here.'
I would like it to become:
str = 'This is a test{{link:test1.html}}. And another one{{link:test2.html}} here.'

You could also use DOM methods. The DOM is a better HTML parser than regex. This is a fairly simple replaceWith:
str = 'This is a test. And another one here.'
var div = document.createElement('div');
div.innerHTML = str;
div.querySelectorAll('a').forEach(a=>{
a.replaceWith(`${a.textContent}{{link:${a.getAttribute('href')}}}`)
})
console.log(div.innerHTML)

Yes, you can use capture groups and placeholders in the replacement string, provided it really is in exactly the format you've shown:
const str = 'This is a test. And another one here.';
const result = str.replace(/<a href="([^"]+)">([^<]+)<\/a>/g, "$2{{link:$1}}");
console.log(result);
This is very fragile, which is why famously you don't use regular expressions to parse HTML. For instance, it would fail with this input string:
const str = 'This is a test <span>blah</span>. And another one here.';
...because of the <span>blah</span>.
But if the format is as simple and consistent as you appear to be getting from quill.js, you can apply a regular expression to it.
That said, if you're doing this in a browser or otherwise have a DOM parser available to you, use the DOM as charlietfl demonstrates.
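A quick sketch of both the working case and the fragile case. The sample strings are assumptions made for illustration (anchors written exactly as <a href="…">text</a>); they are not the asker's actual data.

```javascript
// Same pattern as in the answer: group 1 = href value, group 2 = anchor text.
const anchorPattern = /<a href="([^"]+)">([^<]+)<\/a>/g;

// Illustrative input in the simple, consistent format the regex expects.
const sampleHtml =
  'This is <a href="test1.html">a test</a>. And <a href="test2.html">another one</a> here.';
const moved = sampleHtml.replace(anchorPattern, '$2{{link:$1}}');
console.log(moved);
// This is a test{{link:test1.html}}. And another one{{link:test2.html}} here.

// Markup nested inside the anchor defeats [^<]+, so nothing is replaced.
const nested = 'See <a href="x.html">a <span>b</span></a>.';
console.log(nested.replace(anchorPattern, '$2{{link:$1}}') === nested); // true
```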

Trimming whitespace without affecting strings

So, I recently found this example on trimming whitespace, but I've found that it also affects strings in code. For instance, say I'm doing a lesson on string comparison, and to demonstrate that "Hello World!" and "Hello World!" are different, I need the code compression to not have any effect on those two strings.
I'm using the whitespace compression so that people with different formatting styles won't be punished for using something that I don't use. For instance, I like to format my functions like this:
function foo(){
return 0;
};
While others may format it like this:
function foo()
{
return 0;
};
So I use whitespace compression around punctuation to make sure it always comes out the same, but I don't want it to affect anything within a string. Is there a way to add exceptions in JavaScript's replace() function?

UPDATE:
check this jsfiddle
var str='dfgdfg fdgfd fd gfd g print("Hello World!"); sadfds dsfgsgdf'
var regex=/(?:(".*"))|(\s+)/g;
var newStr=str.replace(regex, '$1 ');
console.log(newStr);
console.log(str);
In this code, whitespace runs are collapsed while the quoted string is left untouched.
To play with the code more comfortably and see how the regex works, try it here:
https://regex101.com/r/tG5qH2/1
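One caveat with ".*" is that it is greedy, so with two string literals on one line everything between the first and the last quote is treated as a single string. A hedged variant is sketched below: it matches each literal separately (including escaped quotes) and uses a callback so that only real whitespace runs are rewritten. The function name is made up, and template literals, comments and regex literals are deliberately not handled.

```javascript
// Hypothetical helper: collapse whitespace runs to one space, except inside
// single- or double-quoted string literals.
function collapseWhitespaceOutsideStrings(code) {
  // Branches 1 and 2 consume whole string literals; group 1 captures whitespace.
  const pattern = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|(\s+)/g;
  return code.replace(pattern, (match, whitespace) =>
    // Group 1 is undefined when a string literal matched: return it unchanged.
    whitespace === undefined ? match : ' '
  );
}

console.log(collapseWhitespaceOutsideStrings('print(  "Hello   World!"  );   x = \'a  b\';'));
// print( "Hello   World!" ); x = 'a  b';
```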

I made a jsfiddle here: https://jsfiddle.net/cuywha8t/2/
var stringSplitRegExp = /(".+?"|'.+?')/g;
var whitespaceRegExp = /\s+\{/g;
var whitespaceReplacement = "{"
var exampleCode = `var str = "test test test" + 'asdasd "sd"';\n`+
`var test2 = function()\n{\nconsole.log("This is a string with 'single quotes'")\n}\n`+
`console.log('this is a string with "double quotes"')`;
console.log(exampleCode)
var separatedStrings =(exampleCode.split(stringSplitRegExp))
for(var i = 0; i < separatedStrings.length; i++){
if (i%2 === 1){
continue;
}
var oldString = separatedStrings[i];
separatedStrings[i] = oldString.replace(whitespaceRegExp, whitespaceReplacement)
}
console.log(separatedStrings.join(""))
I believe this is what you are looking for. It handles cases where a string contains double quotes, etc., without modifying them. This example only does the curly-brace formatting you mentioned in your post.
Basically, the behavior of split allows the inclusion of the splitter in the array. And since you know the split is always between two non-string elements you can leverage this by looping over and modifying only every even-indexed array element.
If you want to do general whitespace replacement you can of course modify the regexp or do multiple passes, etc.
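The split trick can be wrapped in a small reusable function. This is only a sketch: the name transformOutsideStrings is made up, and the string-literal pattern is a slightly stricter variant (it tolerates escaped quotes) rather than the one used above.

```javascript
// Hypothetical helper: apply `transform` to everything except quoted strings.
function transformOutsideStrings(code, transform) {
  // Because the whole pattern is one capturing group, split() keeps the
  // matched string literals in the result, always at odd indices.
  const parts = code.split(/("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')/);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : transform(part)))
    .join('');
}

// Illustrative input: the brace inside the string literal must not be touched.
const source = 'function foo()\n{\n  return " {";\n}';
const normalized = transformOutsideStrings(source, (chunk) =>
  chunk.replace(/\s+\{/g, '{')
);
console.log(normalized === 'function foo(){\n  return " {";\n}'); // true
```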

How do I prevent my regex from parsing certain parts of a string?

As per this answer, I am making use of a replaceAll() function to swap arbitrary strings in my javascript (Node) application:
this.replaceAll = function(str1, str2, ignoreCase)
{
return this.replace(new RegExp(str1.replace(/([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g,"\\$&"),(ignoreCase?"gi":"g")),(typeof(str2)=="string")?str2.replace(/\$/g,"$$$$"):str2);
}
I would like to expand this regex, such that it does not try to match anything inside a set of <span>...</span> tags. (I need to add HTML Span tags around parts of some strings, and I don't want to wrap anything in a span twice, if the patterns would duplicate [I.e., 'Foo' and 'Foobar' in the string "Foobarbaz"])
I am running multiple regex search/replaces on a single string, and I want to make sure that nothing gets processed multiple times.
My understanding is that I'll need a [<SPAN>] ... [</SPAN>] somehow, but I'm not quite sure on the specifics. Any advice?

I don't understand what your regexp does, but in general the technique is like this:
html = "replace this and this but not <span> this one </span> or <b>this</b> but this is fine"
// remove parts you don't want to be touched and place them in a buffer
tags = []
html = html.replace(/<(\w+).+?<\/\1>/g, function($0) {
tags.push($0)
return '##' + (tags.length - 1)
})
// do the actual replacement
html = html.replace(/this/g, "that")
// put preserved parts back
html = html.replace(/##(\d+)/g, function($0, $1) {
return tags[$1]
})
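The same mask, replace, restore idea can be sketched as a function and tried on the Foo/Foobar case from the question. The function name and the NUL-delimited placeholder are assumptions made here: the input must not contain NUL characters, and the search pattern must not be able to match the digits inside a placeholder. Like the code above, it only copes with non-nested elements.

```javascript
// Hypothetical helper: run a replacement everywhere except inside elements.
function replaceOutsideElements(html, pattern, replacement) {
  const stash = [];
  const token = '\u0000'; // assumption: never occurs in the input
  // 1) Move every non-nested element into the stash, leaving a placeholder.
  const masked = html.replace(/<(\w+)\b[^>]*>.*?<\/\1>/gs, (element) => {
    stash.push(element);
    return token + (stash.length - 1) + token;
  });
  // 2) Do the actual replacement on the masked text.
  const replaced = masked.replace(pattern, replacement);
  // 3) Put the preserved elements back.
  return replaced.replace(/\u0000(\d+)\u0000/g, (_, index) => stash[Number(index)]);
}

// Wrap the longer term first, then the shorter one; "Foobar" is not wrapped twice.
const step1 = replaceOutsideElements('Foobarbaz and Foo', /Foobar/g, '<span>$&</span>');
const step2 = replaceOutsideElements(step1, /Foo/g, '<span>$&</span>');
console.log(step2); // <span>Foobar</span>baz and <span>Foo</span>
```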

JavaScript RegEx to match punctuation NOT part of any HTML tags

Okay, I know there's much controversy about matching and parsing HTML with a RegEx, but I was wondering if I could have some help. Case in point.
I need to match any punctuation characters e.g . , " ' but I don't want to ruin any HTML, so ideally it should occur between a > and a < - essentially my query isn't so much about parsing HTML, as avoiding it.
I'm going to attempt to wrap each instance in a <span></span> - but having absolutely no experience with RegEx, I'm not sure I'm able to do it.
I've figured character sets [\.\,\'\"\?\!] but I'm not sure how to match character sets that only occur between certain characters. Can anybody help?

To start off, here's a X-browser dom-parser function:
var parseXML = (function(w,undefined)
{
'use strict';
var parser,ie = false;
switch (true)
{
case w.DOMParser !== undefined:
parser = new w.DOMParser();
break;
case w.ActiveXObject !== undefined:
parser = new w.ActiveXObject("Microsoft.XMLDOM");
parser.async = false;
ie = true;
break;
default :
throw new Error('No parser found');
}
return function(xmlString)
{
if (ie === true)
{//return DOM
parser.loadXML(xmlString);
return parser;
}
return parser.parseFromString(xmlString,'text/xml');
};
})(this);
//usage:
var newDom = parseXML(yourString);
var allTags = newDom.getElementsByTagName('*');
for(var i=0;i<allTags.length;i++)
{
if (allTags[i].tagName.toLowerCase() === 'span')
{//if all you want to work with are the spans:
if (allTags[i].hasChildNodes())
{
//this span has nodes inside, don't apply regex:
continue;
}
allTags[i].innerHTML = allTags[i].innerHTML.replace(/[.,?!'"]+/g,'');
}
}
This should help you on your way. You still have access to the DOM, so whenever you find a string that needs filtering/replacing, you can reference the node using allTags[i] and replace the contents. Note that looping through all elements isn't recommended, but I didn't really feel like doing all of the work for you ;-). You'll have to check what kind of node you're handling:
if (allTags[i].tagName.toLowerCase() === 'span')
{//do certain things
}
if (allTags[i].tagName.toLowerCase() === 'html')
{//skip
continue;
}
And that sort of stuff... Note that this code is not tested, but it's a simplified version of my answer to a previous question. The parser bit should work just fine; in fact, here's a fiddle I've set up for that other question, which also shows how you might alter this code to better suit your needs.

Edit: As Elias pointed out, JavaScript did not support lookbehinds at the time (lookaheads are fine; lookbehind was only added in ES2018). I'll leave this up in case someone else looks for something similar, just be aware.
Here is the regex I got to work. It requires both a lookahead and a lookbehind, and I'm not familiar enough with JavaScript to know whether those are supported. Either way, here is the regex:
(?<=>.*?)[,."'](?=.*<)
Breakdown:
1. (?<=>.*?) --> The match must be preceded by ">" followed by any characters
2. [,."'] --> Matches one of the characters: , . " '
3. (?=.*<) --> The match must be followed by any characters and then "<"
This essentially means it will match any of the characters you want in between a set of > <.
That being said, I would suggest as Point mentioned in the comments to parse the HTML with a tool designed for that, and search through the results with the regex [,."'].
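If lookbehind is not available, a lookahead-only alternative is sometimes used. Note that this is a different technique from the pattern above: a punctuation character is accepted only if the next angle bracket after it is not a closing ">". The sketch assumes the text itself contains no stray "<" or ">", and the sample string is made up.

```javascript
// Punctuation that is NOT followed by "...>" without another "<" or ">" in
// between, i.e. punctuation that is not sitting inside a tag.
const punctuationOutsideTags = /[.,"'](?![^<>]*>)/g;

// Illustrative input: the quotes and the dot inside the title attribute stay as is.
const html = 'Hi, <a title="a.b">you.</a>';
const wrapped = html.replace(punctuationOutsideTags, '<span>$&</span>');
console.log(wrapped);
// Hi<span>,</span> <a title="a.b">you<span>.</span></a>
```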

Dan, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
The Dom parser solution was great. With all the disclaimers about using regex to parse html, I'd like to add a simple way to do what you wanted with regex in Javascript.
The regex is very simple:
<[^>]*>|([.,"'])
The left side of the alternation matches complete tags. We will ignore these matches. The right side matches and captures punctuation to Group 1, and we know they are the right punctuation because they were not matched by the expression on the left.
On this demo, looking at the lower right pane, you can see that only the right punctuation is captured to Group 1.
You said you wanted to embed the punctuation in a <span>. This Javascript code will do it.
I've replaced the <tags> with {tags} to make sure the example displays in the browser.
<script>
var subject = 'true ,she said. {tag \" . ,}';
var regex = /{[^}]*}|([.,"'])/g;
var replaced = subject.replace(regex, function(m, group1) {
if (!group1) return m; // a tag matched, so Group 1 is undefined: keep the tag as is
else return "<span>" + group1 + "</span>";
});
document.write(replaced);
</script>
Here's a live demo
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
