RegExp: how to exclude matched groups from $N?

I've written a working regexp, but I don't think it's the best approach:
el = '<div style="color:red">123</div>';
el.replace(/(<div.*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');
// expecting result: <div style="color:red"><b>123</b></div>
After googling I found that (?: ... ) in a regexp makes the group non-capturing, thus:
el.replace(/(?:<div.*>)(\d+)(?:<\/div>)/g, '<b>$1</b>');
// returns <b>123</b>
but I need the expected result from the first example.
Is there a way to exclude them, so I can just write replace(/.../, '<b>$1</b>')?
This is just a small case for understanding how to exclude groups in a regexp. And I know that we can't parse HTML with regexps :)

So you want to get the same result while only using the replacement <b>$1</b>?
In your case just replace(/\d+/, '<b>$&</b>') would suffice.
But if you want to make sure there are div tags around the number, you could use lookarounds and \K, as in the following expression. JS does not support \K, and lookbehind is only available in engines that implement ES2018 or later, so in older JS environments you have to use a capturing group for that.
<div[^>]*>\K\d+(?=</div>)
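For engines that do implement ES2018 lookbehind (current browsers and Node.js), the lookaround idea can be sketched without any capturing group. This is only a minimal illustration on the question's sample string, not a general way to process HTML; the variable names are made up for the example.

```javascript
// Assumes an ES2018+ engine with lookbehind support; \K is not available in JS.
const el = '<div style="color:red">123</div>';

// (?<=...) requires an opening div tag right before the digits and
// (?=...) requires the closing tag right after them; neither is part of the match,
// so $& (the whole match) is just the number.
const wrapped = el.replace(/(?<=<div[^>]*>)\d+(?=<\/div>)/g, '<b>$&</b>');

console.log(wrapped); // <div style="color:red"><b>123</b></div>
```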

There's nothing wrong with a replacement value of '$1<b>$2</b>$3'. I would just change your regex to this:
el = '<div style="color:red">123</div>';
el.replace(/(<div[^>]*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');
Changing how it matches the first div still keeps the div tags in the full match, but [^>]* cannot run past the closing > of the first div tag, whereas .* is greedy and matches as much as possible.
With your regex, you would not get what you wanted with this input string:
el = '<div style="color:red">123</div><div style="color:red">456</div>';
The problem with using something like:
el.replace(/\d+/, '<b>$&</b>')
is that it doesn't work properly with things like this:
el = '<div style="margin-left: 10px">123</div>'
because it picks up the numbers inside the div tag.
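The two pitfalls described above can be reproduced with a short sketch. The input strings are the ones from this answer; the variable names are only for illustration.

```javascript
const two = '<div style="color:red">123</div><div style="color:red">456</div>';

// Greedy .* swallows the first div entirely, so only the last number is wrapped.
const greedy = two.replace(/(<div.*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');

// [^>]* stops at the first >, so each div is matched on its own.
const bounded = two.replace(/(<div[^>]*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');

console.log(greedy);
// <div style="color:red">123</div><div style="color:red"><b>456</b></div>
console.log(bounded);
// <div style="color:red"><b>123</b></div><div style="color:red"><b>456</b></div>

// A bare \d+ matches the first digits it finds, here the ones in the style attribute.
const styled = '<div style="margin-left: 10px">123</div>';
const bare = styled.replace(/\d+/, '<b>$&</b>');

console.log(bare); // <div style="margin-left: <b>10</b>px">123</div>
```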

Related

Replace multiple identical characters with a string

Using Javascript, I want to replace:
This is a test, please complete ____.
with:
This is a test, please complete %word%.
The number of underlines isn't consistent, so I cannot just use something like str.replace('_____', '%word%').
I've tried str.replace(/(_)*/g, '%word%') but it didn't work. Any suggestions?

Remove the capturing group, and make sure _ repeats with + (at least one occurrence, matches as many _s as possible):
const str = 'This is a test, please complete ____.';
console.log(
str.replace(/_+/g, '%word%')
);
The regular expression
/(_)*/
means, in plain language: match zero or more underscores, which of course isn't what you're looking for. That will match every position in the string (except positions in the string between underscores).
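A small sketch of the difference, using a shortened, made-up input so the empty matches are easy to see:

```javascript
const sample = 'a__b';

// (_)* also succeeds with zero underscores, so it produces an empty match
// at every position that is not inside the underscore run.
const zeroOrMore = sample.replace(/(_)*/g, '%');

// _+ needs at least one underscore, so only the run itself is replaced.
const oneOrMore = sample.replace(/_+/g, '%');

console.log(zeroOrMore); // %a%%b%
console.log(oneOrMore);  // a%b
```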

I'm going to suggest a slightly different approach. Instead of keeping the sentence as you currently have it, maintain something like this:
This is the {$1} test, please complete {$2}.
When you want to render this sentence, use a regex replacement to replace the placeholders with underscores:
var sentence = "This is the {$1} test, please complete {$2}.";
var show = sentence.replace(/\{\$\d+\}/g, "____");
console.log(show);
When you want to replace a given placeholder, you may also use a targeted regex replacement. For example, to target the first placeholder you could use:
var sentence = "This is the {$1} test, please complete {$2}.";
var show = sentence.replace(/\{\$1\}/g, "first");
console.log(show);
This is a fairly robust and scalable solution, and is more accurate than just doing a single blanket replacement of all underscores.
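Both steps can also be combined into one pass with a replacer function. This is a sketch under assumptions: the answers object is a hypothetical lookup table that is not part of the original answer, and placeholders without an entry are rendered as underscores.

```javascript
const sentence = 'This is the {$1} test, please complete {$2}.';

// Hypothetical lookup table: placeholder number -> text filled in so far.
const answers = { 1: 'first' };

// The captured digits (n) select the entry; unknown placeholders stay blank.
const filled = sentence.replace(/\{\$(\d+)\}/g, (match, n) =>
  Object.prototype.hasOwnProperty.call(answers, n) ? answers[n] : '____'
);

console.log(filled); // This is the first test, please complete ____.
```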

Regular expression to match a string which is NOT matched by a given regexp

I've been browsing some answers here, and I can't find a solution to my problem:
I have this regexp, which matches everything inside an HTML span tag, including its contents:
<span\b[^>]*>(.*?)</span>
and I want to find a way to make a search in all the text, except for what is matched with that regexp.
For example, if my text is:
var text = '...for there is a class of <span class="highlight">guinea</span> pigs which...';
... then the regexp would match:
<span class="highlight">guinea</span>
and I want to be able to make a regexp such that if I search for "class", regexp will match "...for there is a class of..."
and will not match inside the tag, like in
"... class="highlight"..."
The word to be matched ("class") might be anywhere within the text. I've tried
(?!<span\b[^>]*>(.*?)</span>)class
but it keeps searching inside tags as well.
I want to find a solution using only regexps, without dealing with the DOM or jQuery. Thanks in advance :).

Although I wouldn't recommend this, I would do something like below
(class)(?:(?=.*<span\b[^>]*>))|(?:(?<=<\/span>).*)(class)
You can see this in action on Rubular.
You can capture your matches from the groups and work with them as needed. If you can, use a HTML parser and then find matches from the text element.

It's not pretty, but if I understand you correctly, this should do what you want. It's done with a single RegEx, but JS can't (to my knowledge) extract the result without joining the results in a loop.
The RegEx: /(?:<span\b[^>]*>.*?<\/span>)|(.)/g
Example js code:
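var str = '...for there is a class of <span class="highlight">guinea</span> pigs which...',
    pattern = /(?:<span\b[^>]*>.*?<\/span>)|(.)/g,
    match,
    res = '';
// Each match is either a whole span element (group 1 undefined) or one character.
while ((match = pattern.exec(str)) !== null) {
    // Skip span matches; appending undefined would add the text "undefined".
    if (match[1] !== undefined) {
        res += match[1];
    }
}
document.writeln('Result:' + res);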
In English: try a non-capturing match against your tag expression, or else capture any single character. Do this globally to cover the entire string. The result is one match per character in your string, except inside the tag. As pointed out, this is ugly and can produce a very large number of matches, but it gets the job done.
If you need to send it in and retrieve the result in one call, I'd have to agree with the previous contributors: it can't be done!
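A related sketch uses the same alternation idea, but captures the search word instead of single characters: the span branch consumes whole span elements, and only matches with a defined capture group count as hits. The bracket marking is just an illustration, and the word is assumed to contain no regex metacharacters.

```javascript
const text = '...for there is a class of <span class="highlight">guinea</span> pigs which...';
const word = 'class'; // assumption: plain word, no characters that need escaping

// Branch 1 swallows an entire span element; branch 2 captures the word elsewhere.
const pattern = new RegExp('<span\\b[^>]*>.*?<\\/span>|(' + word + ')', 'g');

// Leave span matches untouched and mark only the hits outside of them.
const marked = text.replace(pattern, (match, hit) =>
  hit === undefined ? match : '[' + hit + ']'
);

console.log(marked);
// ...for there is a [class] of <span class="highlight">guinea</span> pigs which...
```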

Javascript Regex only replacing first match occurrence

I am using regular expressions to do some basic converting of wiki markup code into copy-pastable plain text, and I'm using javascript to do the work.
However, JavaScript's regex engine behaves quite differently from the ones I've used previously, including the regex in Notepad++ that I use on a daily basis.
For example, given a test string:
==Section Header==
===Subsection 1===
# Content begins here.
## Content continues here.
I want to end up with:
Section Header
Subsection 1
# Content begins here.
## Content continues here.
Simply remove all equals signs.
I began with the regex setup of:
var reg_titles = /(^)(=+)(.+)(=+)/
This regex searches for lines that begin with one or more equals signs and contain another run of one or more equals signs later on. Rubular shows that it matches my lines accurately and does not catch equals signs in the middle of content. http://www.rubular.com/r/46PrkPx8OB
The code to replace the string based on regex
var lines = $('.tb_in').val().split('\n'); //use jquery to grab text in a textarea, and split into an array of lines based on the \n
for(var i = 0;i < lines.length;i++){
line_temp = lines[i].replace(reg_titles, "");
lines[i] = line_temp; //replace line with temp
}
$('.tb_out').val(lines.join("\n")); //rejoin and print result
My result is unfortunately:
Section Header==
Subsection 1===
# Content begins here.
## Content continues here.
I cannot figure out why the regex replace function, when it finds multiple matches, seems to only replace the first instance it finds, not all instances.
Even when my regex is updated to:
var reg_titles = /(={2,})/
"Find any two or more equals", the output is still identical. It makes a single replacement and ignores all other matches.
No other regex engine I have used behaves this way. Running the same replace multiple times has no effect.
Any advice on how to get my string replace function to replace ALL instances of the matched regex instead of just the first one?

^=+|=+$
You can use this. Do not forget to add the g and m flags. Replace with an empty string. See demo.
http://regex101.com/r/nA6hN9/28
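A short sketch of what the flags change, using the test string from the question. Without g, replace stops after the first match; with g and m, the pattern can be applied to the whole text at once, with ^ and $ anchoring at every line.

```javascript
const wiki = [
  '==Section Header==',
  '===Subsection 1===',
  '# Content begins here.',
  '## Content continues here.'
].join('\n');

// No g flag: only the first run of equals signs is removed.
const firstOnly = '==Section Header=='.replace(/={2,}/, '');

// g replaces every match; m makes ^ and $ match at each line boundary.
const cleaned = wiki.replace(/^=+|=+$/gm, '');

console.log(firstOnly); // Section Header==
console.log(cleaned);
// Section Header
// Subsection 1
// # Content begins here.
// ## Content continues here.
```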

Add the g modifier to do a global search:
var reg_titles = /^(=+)(.+?)(=+)/g

Your regex is needlessly complex, and yet doesn't actually accomplish what you set out to do. :) You might try something like this instead:
var reg_titles = /^=+(.+?)=+$/;
var lines = $('.tb_in').val().split('\n');
lines.forEach(function(v, i, a) {
a[i] = v.replace(reg_titles, '$1');
});
$('.tb_out').val(lines.join("\n"));

Skipping over tags and spaces in regex html

I'm using this regex to find a string that starts with !?, ends with ?!, and has another variable in between (in this example "a891d050"). This is what I use:
var pattern = new RegExp(/!\\?.*\s*(a891d050){1}.*\s*\\?!/);
It matches correctly against this one:
!?v8qbQ5LZDnFLsny7VmVe09HJFL1/WfGD2A:::a891d050?!
But fails when the string is broken up with html tags.
<span class="userContent"><span>!?v8qbQ5LZDnFLsny7VmVe09HJFL1/</span><wbr /><span class="word_break"></span>WfGD2A:::a891d050?!</span></div></div></div></div>
I tried adding \s and {space}*, but it still fails.
The question is, what (special?)characters do I need to account for if I want to ignore whitespace and html tags in my match.
edit: this is how I use the regex:
var pattern = /!\?[\s\S]*a891d050[\s\S]*\?!/;
document.body.innerHTML = document.body.innerHTML.replace(pattern,"new content");
It appears to me that when it encounters the 'plain' string, it replaces it correctly. But when faced with a string that has tags with classes around and inside it, it makes a mess of the classes or doesn't replace at all, depending on the context. So I decided to try the jquery-replacetext plugin (as it promises to leave tags as they were) like this:
$("body *").replaceText( pattern, "new content" );
But with no success, the results are the same as before.

Maybe this:
var pattern = /!\?[\s\S]*a891d050[\s\S]*\?!/;
[\s\S] should match any character. I have also removed {1}.
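To illustrate why [\s\S] can help: without the s flag, . does not match line terminators, so a pattern built on .* fails as soon as the markup contains a line break. The HTML below is a shortened, hypothetical variant of the string from the question with a newline added. The sketch uses lazy quantifiers (*?) to keep the match as short as possible, which is a change from the pattern above.

```javascript
// Hypothetical sample: shortened markup with a line break between the tags.
const html = '<span>!?v8qbQ5LZ/</span><wbr />\n<span class="word_break"></span>WfGD2A:::a891d050?!</span>';

const dotPattern = /!\?.*a891d050.*\?!/;               // . stops at the newline
const anyPattern = /!\?[\s\S]*?a891d050[\s\S]*?\?!/;   // [\s\S] crosses it

console.log(dotPattern.test(html)); // false
console.log(anyPattern.test(html)); // true

// The match includes every tag between !? and ?!, so replacing it
// also removes those tags from the markup.
const matched = html.match(anyPattern)[0];
console.log(matched.includes('<wbr />')); // true
```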

The problem was apparently solved by using this regex:
var pattern = /(!\?)(?:<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])*?>)?(.)*?(a891d050)(?:<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])*?>)?(.)*?(\?!)/;

REGEX to match starting and ending span tags without their inner text

I am using the following RegEx to do a replacement in a string:
<\/?(span)\b(?:\s+class="highlight")?>
But this regex has a flaw... Take this sample code for example:
<p>
Some text here
<span class="highlight">This is highlighted</span>
<span>This is not highlighted</span>
</p>
My regex will match both of the span tags, although I only want the one with class="highlight" set. How can I achieve this using RegEx?
PS: please do not tell me that I should not use RegEx for this, because I will downvote your answer as off-topic. This is a question for the RegEx guys.
EDIT: based on the accepted answer below, I am using the following regex to do a replace.
NOTE: code is in JavaScript (MooTools)
var regex = new RegExp("(<span[^>]+class\\s*=\\s*(\"|')highlight\\2[^>]*>)(.*?)(</span>)",'g');
var replaced = element.get('html').replace(regex, "$3");
element.set('html', replaced);
The above regex will replace <span class="highlight">some text here</span> with "some text here" (without the double quotes).
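The same replacement can be tried on a plain string without MooTools. This sketch uses the sample markup from the question, flattened to a single line (an assumption made so that . can cover the span contents).

```javascript
const html = '<p>Some text here <span class="highlight">This is highlighted</span> <span>This is not highlighted</span></p>';

// Group 1: opening tag, group 2: quote character, group 3: inner text, group 4: closing tag.
const regex = /(<span[^>]+class\s*=\s*("|')highlight\2[^>]*>)(.*?)(<\/span>)/g;

// Keeping only $3 drops the highlight span but leaves its text in place.
const replaced = html.replace(regex, '$3');

console.log(replaced);
// <p>Some text here This is highlighted <span>This is not highlighted</span></p>
```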

This should give the most flexibility.
(<span[^>]+class\s*=\s*("|')highlight\2[^>]*>)[^<]*(</span>)
UPDATE:
The captured groups you need for the opening and closing tags are \1 and \3.

Just to show you that an alternative solution is not only possible but also better than using regex:
$$('span.highlight').each(function (node) {
var txt = document.createTextNode(node.get('text'));
node.parentNode.replaceChild(txt, node);
});
See this fiddle: http://jsfiddle.net/Tomalak/umgZp/
(And this is just off the top of my head, I've had zero exposure to MooTools so far. There might be more elegant ways than this.)

You are explicitly making the class="highlight" part optional by placing a ? after the group that matches it.
This should do it for you:
var regex = /(?:<span\s+[^>]*?\s*class\s*=\s*('|")(?:\S+\s+)*highlight(?:\s+\S+)*\1[^>]*>|<\/span>)/;
This will also include SPAN tags with class attributes like a b c highlight e f g.
Also, if you want to capture a SPAN tag with its matching ending, you can use this, and access groups 1 and 3 respectively for the opening and ending tags (group 2 is the quote character, hence the \2 backreference):
var regex = /(<span\s+[^>]*?\s*class\s*=\s*('|")(?:\S+\s+)*highlight(?:\s+\S+)*\2[^>]*>).*?(<\/span>)/;
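A quick way to check a pattern of this kind against class lists is sketched below. The pattern here is a variant written for the example, not the exact one above: it uses [^'"\s]+ for class names so a name can never run across the closing quote, allows any number of names on either side of highlight, and refers back to the quote character as group 2. The sample strings are made up.

```javascript
// Group 1: opening tag, group 2: quote character, group 3: closing tag.
const spanPair = /(<span\s+[^>]*?class\s*=\s*('|")(?:[^'"\s]+\s+)*highlight(?:\s+[^'"\s]+)*\2[^>]*>).*?(<\/span>)/;

const hit = '<span id="x" class="a b highlight c">text</span>'.match(spanPair);
console.log(hit[1]); // <span id="x" class="a b highlight c">
console.log(hit[3]); // </span>

// "highlighted" is a different class name, and a bare span has no class at all.
console.log(spanPair.test('<span class="highlighted">x</span>')); // false
console.log(spanPair.test('<span>plain</span>'));                 // false
```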
