Is it possible to move substrings to a specific location with RegEx?


Background: I used quill.js to get some rich text input. The result I want is quite similar to HTML, so I went with the quill.container.firstChild.innerHTML approach instead of actually serializing the data. But when it comes to anchors, instead of
<a href="test.html">Anchor</a>
I actually want
Anchor{{link:test.html}}
With the .replace() method I easily got {{link:test.html}}Anchor</a>, but I need to put the link description after the anchor text. Is there a way to swap {{link:test.html}} with the next </a> so I can get the desired result? There can be multiple anchors in the string, like:
str = 'This is <a href="test1.html">a test</a>. And <a href="test2.html">another one</a> here.'
I would like it to become:
str = 'This is a test{{link:test1.html}}. And another one{{link:test2.html}} here.'

You could also use DOM methods. The DOM is a better HTML parser than regex. This is a fairly simple replaceWith():
str = 'This is <a href="test1.html">a test</a>. And <a href="test2.html">another one</a> here.'
var div = document.createElement('div');
div.innerHTML = str;
// Swap each anchor for a text node: its text followed by {{link:href}}
div.querySelectorAll('a').forEach(a => {
  a.replaceWith(`${a.textContent}{{link:${a.getAttribute('href')}}}`)
})
console.log(div.innerHTML)

Yes, you can use capture groups and placeholders in the replacement string, provided it really is in exactly the format you've shown:
const str = 'This is <a href="test1.html">a test</a>. And <a href="test2.html">another one</a> here.';
const result = str.replace(/<a href="([^"]+)">([^<]+)<\/a>/g, "$2{{link:$1}}");
console.log(result);
This is very fragile, which is why, famously, you don't use regular expressions to parse HTML. For instance, it would fail with this input string:
const str = 'This is <a href="test1.html">a test <span>blah</span></a>. And <a href="test2.html">another one</a> here.';
...because of the <span>blah</span>.
But if the format is as simple and consistent as you appear to be getting from quill.js, you can apply a regular expression to it.
That said, if you're doing this on a browser or otherwise have a DOM parser available to you, use the DOM as charlietfl demonstrates.
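For what it's worth, the same capture-group idea can be written with named groups, which makes the "move this piece after that piece" intent easier to read. This is only a sketch: the function name moveLinksAfterText is made up for illustration, and it assumes the anchors look exactly like <a href="...">text</a> with no other attributes or nested tags.

```javascript
// Hypothetical helper: rewrites <a href="X">text</a> as text{{link:X}}.
// Assumes the simple, consistent anchor format discussed above.
function moveLinksAfterText(html) {
  const anchor = /<a href="(?<href>[^"]+)">(?<text>[^<]+)<\/a>/g;
  // $<name> in the replacement string refers to a named capture group.
  return html.replace(anchor, '$<text>{{link:$<href>}}');
}

const sample =
  'This is <a href="test1.html">a test</a>. And <a href="test2.html">another one</a> here.';
console.log(moveLinksAfterText(sample));
```

An anchor that contains nested markup simply does not match and is left untouched, which is the fragility mentioned above.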

Related

Extracting text that sits between two pairs of special characters

I am trying to extract a string from a sentence that is embedded within the HTML tags <b></b> that are also embedded within parenthesis ( ).
I can do this with the following code
const regExp = /\(([^)]+)\)/
// fetches the string within parentheses (capture group 1 of the match)
let string = regExp.exec('This is some (<b>super cool</b>) text I have here')[1]
// output = '<b>super cool</b>'
// removes the html tags
let string2 = string.replace(/<[^>]*>?/gm, '')
// output = 'super cool'
The problem is I sometimes have sentences with multiple sets of parentheses. The code above will only extract the first instance of parentheses, and they may or may not be within the <b></b> tags
i.e., the string
This is (some) (<b>super cool</b>) text I have (here)
will return some using the same code above, but what I want is to return super cool
How can I traverse the entire string to extract only the text that sits within (<b> and </b>)?
EDIT
I forgot to mention (apologies), there may be text that comes in between the closing tag </b> and the closing parenthesis ). For example
This is some (<b>super cool</b> groovy) text I have here
Which adds a bit of complexity (otherwise I could use split() and pop()

You could use this regExp instead: /(?<=\(<b>)(.*?)(?=<\/b>\))/ which will capture everything between the first (<b> and </b>) encountered.
If you want to capture all instances, just add the global flag /g : /(?<=\(<b>)(.*?)(?=<\/b>\))/g
Also with this method you won't need to do a string.replace() afterwards, saving you another operation.
const regExp = /(?<=\(<b>)(.*?)(?=<\/b>\))/
const str = 'This is some (<b>super cool</b>) text I have here'
console.log(str.match(regExp)[0])
// --> super cool
EDIT: Following OP's edit, if some text can come between the closing tag </b> and the closing ), just change your regExp to: /(?<=\(<b>)(.*?)(?=\))/, which will capture everything between the first (<b> and ) encountered.
But then you will also need to string.replace('</b>', '') to remove the closing </b> tag.
const regExp = /(?<=\(<b>)(.*?)(?=\))/
const str = 'This is some (<b>super cool</b> groovy) text I have here'
console.log(str.match(regExp)[0].replace('</b>', ''))
// --> super cool groovy
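If lookbehind is not available, or you want every instance at once, a plain capture-group version with matchAll might look like the sketch below. The function name extractBoldInParens is hypothetical, and the pattern assumes the parenthesised text never contains a nested ) and that <b> comes right after the opening parenthesis.

```javascript
// Hypothetical helper: collects the text of every "(<b>...</b> ...)" group.
// Group 1 is the bold text, group 2 is whatever follows </b> before ")".
function extractBoldInParens(str) {
  const pattern = /\(<b>(.*?)<\/b>([^)]*)\)/g;
  const results = [];
  for (const match of str.matchAll(pattern)) {
    results.push(match[1] + match[2]);
  }
  return results;
}

console.log(extractBoldInParens('This is (some) (<b>super cool</b> groovy) text I have (here)'));
```

Because the closing </b> is consumed by the pattern itself, no follow-up replace() is needed.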

This works for me. Instead of a regex, try split():
const string = 'This is (some) (<b>super cool</b>) text I have (here)';
const str = string.split('<b>').pop().split('</b>')[0];
console.log(str);

Replace with RegExp only outside tags in the string

I have a string in which some HTML tags could be present, like
this is a nice day for bowling <b>bbbb</b>
How can I use a RegExp to replace all b characters with, for example, :blablabla:, but ONLY outside HTML tags?
So in that case the resulting string should become
this is a nice day for :blablabla:owling <b>bbbb</b>
EDIT: I would like to be more specific, based on the answers I have received. So first of all I have just a string, not DOM element, or anything else. The string may or may not contain tags (opening and closing). The main idea is to be able to replace anywhere in the text except inside tags. For example if I have a string like
not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>
the regexp should replace only the first :/ with a real smiley image, but should not replace the second and third, because they are inside (and part of) the tag. Here's an example snippet using the regexp from one of the answers.
var s = 'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
It is strange: the DEMO shows that it captures the correct group, but the same regexp in the replace function does not seem to work.

The regex itself to replace all bs with :blablabla: is not that hard:
.replace(/b/g, ":blablabla:")
It is a bit tricky to get the text nodes where we need to perform the search and replace.
Here is a DOM-based example:
function replaceTextOutsideTags(input) {
  var doc = document.createDocumentFragment();
  var wrapper = document.createElement('myelt');
  wrapper.innerHTML = input;
  doc.appendChild(wrapper);
  return textNodesUnder(doc);
}
function textNodesUnder(el) {
  var n, walk = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, null, false);
  while (n = walk.nextNode()) {
    // Only touch text nodes that sit directly under the wrapper element
    if (n.parentNode.nodeName.toLowerCase() === 'myelt')
      n.nodeValue = n.nodeValue.replace(/:\/(?!\/)/g, "smiley_here");
  }
  return el.firstChild.innerHTML;
}
var s = 'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';
console.log(replaceTextOutsideTags(s));
Here, we only modify the text nodes that are direct children of the custom-created element named myelt.
Result:
not feeling well today smiley_here check out this link <a href="http://example.com">http://example.com</a>

var input = "this is a nice day for bowling <b>bbbb</b>";
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:')
+ c;
});
document.querySelector("pre").textContent = result;
<pre></pre>
You can do this:
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:') // you may do something else here
+ c;
});
Note that in most (not all, but most) real, complex use cases, it's much more convenient to manipulate a parsed DOM rather than just a string. If you're starting with an HTML page, you might use a library (some, like mine, accept regexes to do so).
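If you only have a string and no DOM, one possible variation is to split the string on tags and keep a nesting counter, so that text inside an element such as <b>bbbb</b> is skipped too. This is a rough sketch, not a parser: replaceOutsideElements is a made-up name, and it assumes well-formed markup with no void elements (like <br>), comments, or > characters inside attribute values.

```javascript
// Hypothetical helper: applies a replacement only to text that is not
// inside any element. Splitting on a capturing group keeps the tags, so
// odd indexes are tags and even indexes are the text between them.
function replaceOutsideElements(html, pattern, replacement) {
  const tokens = html.split(/(<[^>]*>)/);
  let depth = 0;
  return tokens.map((token, i) => {
    if (i % 2 === 1) {
      if (token.startsWith('</')) depth = Math.max(0, depth - 1); // closing tag
      else if (!token.endsWith('/>')) depth += 1;                 // opening tag
      return token;
    }
    // Text token: only replace when we are outside every element.
    return depth === 0 ? token.replace(pattern, replacement) : token;
  }).join('');
}

console.log(replaceOutsideElements('this is a nice day for bowling <b>bbbb</b>', /b/g, ':blablabla:'));
```

Unlike the (^|>)([^<]*)(<|$) pattern above, which also rewrites the text between an opening and closing tag, this leaves element content alone.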

I think you can use a regex like this (only for simple data, not nested tags):
/<[^\/]*?b.*?<\/.*?>|(b)/ig
If you want to use a regex, I suggest the one below to remove tags repeatedly until all of them are gone:
/<[^\/][^<]*>[^<]*<\/.*?>/g
then use a replace for finding any b.

Regular expression to match a string which is NOT matched by a given regexp

I've been hovering around some answers here, and I can't find a solution to my problem:
I have this regexp which matches everything inside an HTML span tag, including contents:
<span\b[^>]*>(.*?)</span>
and I want to find a way to search all the text, except for what is matched by that regexp.
For example, if my text is:
var text = '...for there is a class of <span class="highlight">guinea</span> pigs which...'
... then the regexp would match:
<span class="highlight">guinea</span>
and I want to be able to make a regexp such that if I search for "class", regexp will match "...for there is a class of..."
and will not match inside the tag, like in
"... class="highlight"..."
The word to be matched ("class") might be anywhere within the text. I've tried
(?!<span\b[^>]*>(.*?)</span>)class
but it keeps searching inside tags as well.
I want to find a solution using only regexp, not dealing with DOM nor JQuery. Thanks in advance :).

Although I wouldn't recommend this, I would do something like below
(class)(?:(?=.*<span\b[^>]*>))|(?:(?<=<\/span>).*)(class)
You can see this in action on Rubular.
You can capture your matches from the groups and work with them as needed. If you can, use an HTML parser and then find matches from the text element.

It's not pretty, but if I get you right, this should do what you want. It's done with a single RegEx, but JS can't (to my knowledge) extract the result without joining the matches in a loop.
The RegEx: /(?:<span\b[^>]*>.*?<\/span>)|(.)/g
Example js code:
var str = '...for there is a class of <span class="highlight">guinea</span> pigs which...',
    pattern = /(?:<span\b[^>]*>.*?<\/span>)|(.)/g,
    match,
    res = '';
match = pattern.exec(str);
while (match != null)
{
    // match[1] is undefined when a whole <span>...</span> was matched, so skip it
    if (match[1] !== undefined) res += match[1];
    match = pattern.exec(str);
}
document.writeln('Result:' + res);
In English: Do a non capturing test against your tag-expression or capture any character. Do this globally to get the entire string. The result is a capture group for each character in your string, except the tag. As pointed out, this is ugly - can result in a serious number of capture groups - but gets the job done.
If you need to send it in and retrieve the result in one call, I'd have to agree with previous contributors - It can't be done!
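A related trick, sketched here under assumptions, is to put the word itself in the capture group instead of a single character: the span alternative "eats" each span, and only matches where the group is set are kept. The names findOutsideSpans and escapeRegExp are hypothetical, and the sketch assumes spans are not nested and do not span multiple lines.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal search word.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the start index of every occurrence of `word` that is not
// inside a <span ...>...</span> block.
function findOutsideSpans(text, word) {
  const pattern = new RegExp(
    '<span\\b[^>]*>.*?<\\/span>|(' + escapeRegExp(word) + ')',
    'g'
  );
  const indexes = [];
  for (const match of text.matchAll(pattern)) {
    // Group 1 is only set when the word alternative matched.
    if (match[1] !== undefined) indexes.push(match.index);
  }
  return indexes;
}

const text = '...for there is a class of <span class="highlight">guinea</span> pigs which...';
console.log(findOutsideSpans(text, 'class'));
```

It still needs a loop on the JavaScript side, and it only skips span elements; a word inside the attributes of any other tag would still be reported.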

Wrap single pattern occurring more than one time in a string using Regex and JavaScript

I have this string:
whatever [that's the pattern] by [pattern again] before [whatever].
I would love to wrap the brackets [] and everything inside them in a span, like:
whatever <span>[that's the pattern]</span> by <span>[pattern again]</span>
before <span>[whatever]</span>.
I have tried
re = new RegExp(/\[(.*)\]/);
var newString = oldString.replace(re, "<span>$1</span>");
But this returns:
whatever <span>[that's the pattern] by [pattern again] before [whatever]</span>.
Before anyone shoots me about why I shouldn't use regex to parse HTML: the reason I am doing this is that this string is echoed on the web page, and the brackets and the text within them have a different color. I need to wrap them in a span to be able to style them. If there's a better solution, I am open to it.
Many thanks!

Aside from setting the global flag, you also need to make your regex lazy:
var newString = oldString.replace(/\[(.*?)\]/g, "<span>[$1]</span>")
//                                      ^    ^  lazy quantifier (?) and global flag (g)

You need to set the global flag, which tells it to match more than once:
oldString.replace(/\[(.*)\]/g, ...)
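As a small variation on the answers above, a negated character class avoids the greedy/lazy question entirely, and $& re-inserts the whole match so the brackets do not have to be typed again in the replacement. A minimal sketch, with wrapBracketed as a made-up name and assuming the brackets are never nested:

```javascript
// Hypothetical helper: wraps every [...] group in a <span>.
// [^\]]* can never run past the first closing bracket, so no lazy
// quantifier is needed; the g flag handles multiple occurrences.
function wrapBracketed(text) {
  return text.replace(/\[[^\]]*\]/g, '<span>$&</span>');
}

console.log(wrapBracketed("whatever [that's the pattern] by [pattern again] before [whatever]."));
```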

RegExp: how to exclude matched groups from $N?

I've made a working regexp, but I think it's not the best approach:
el = '<div style="color:red">123</div>';
el.replace(/(<div.*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');
// expecting result: <div style="color:red"><b>123</b></div>
After googling, I found that (?: ... ) in a regexp makes a non-capturing group, thus:
el.replace(/(?:<div.*>)(\d+)(?:<\/div>)/g, '<b>$1</b>');
// returns <b>123</b>
but I need the expected result from the 1st example.
Is there a way to exclude them, so I can just write replace(/.../, '<b>$1</b>')?
This is just a small case for understanding how to exclude groups in a regexp. And I know that we can't parse HTML with regexp :)

So you want to get the same result while only using the replacement <b>$1</b>?
In your case just replace(/\d+/, '<b>$&</b>') would suffice.
But if you want to make sure there are div tags around the number, you could use lookarounds and \K, as in the following expression. JavaScript, however, has no \K, and lookbehind is only available in engines that implement ES2018, so in older environments you have to use a capturing group for that.
<div[^>]*>\K\d+(?=</div>)

There is nothing wrong with a replacement value of '$1<b>$2</b>$3'. I would just change your regex to this:
el = '<div style="color:red">123</div>';
el.replace(/(<div[^>]*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');
Changing how it matches the first div keeps the full match on the div tags, but makes sure it matches the minimum possible before the closing > of the first div tag rather than the maximum possible.
With your regex, you would not get what you wanted with this input string:
el = '<div style="color:red">123</div><div style="color:red">456</div>';
The problem with using something like:
el.replace(/\d+/, '<b>$&</b>')
is that it doesn't work properly with things like this:
el = '<div style="margin-left: 10px">123</div>'
because it picks up the numbers inside the div tag.
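In newer JavaScript engines (ES2018 and later) lookbehind is available, so the surrounding tags can be checked without being part of the match, and the replacement only has to mention the number. This is a sketch under that assumption; boldDivNumbers is a made-up name, and the pattern only handles a div whose entire content is digits.

```javascript
// Hypothetical helper: wraps the digits inside <div ...>digits</div> in <b>.
// The lookbehind and lookahead assert the tags without consuming them,
// so $& (the whole match) is just the number.
function boldDivNumbers(html) {
  return html.replace(/(?<=<div[^>]*>)\d+(?=<\/div>)/g, '<b>$&</b>');
}

console.log(boldDivNumbers('<div style="margin-left: 10px">123</div>'));
```

The 10 in the style attribute is skipped because it is not directly preceded by a complete opening div tag.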
