regex to match subexpression at end of string - javascript

I'm trying to test whether the ending pattern in a string is an html closing tag (assuming trailing spaces are trimmed).
var str1 = "<em>I</em> am <strong>dummy</strong> <em>text.</em>"; //ends with html close tag
var str2 = "<em>I</em> am <strong>dummy</strong> <strong>text.</strong>"; //ends with html close tag
var str3 = "<em>I</em> am <strong>dummy</strong> text"; //does not end with html close tag
Using str1 above, I would like to get the position of the ending tag, which is an </em>. Here are my attempts:
var rgx1 = /(<\/em>)$/g; // works. basic scenario. matches closing </em> tags at the end of the string.
var rgx2 = /<\s*\/\s*\w\s*.*?>/g; //matches html closing tags.
var rgx3 = /<\s*\/\s*\w\s*.*?>$/g; //doesn't work. supposed to match closing html tag at the end of the string
console.log(str1.search(rgx1));
While rgx1 correctly returns the position of the ending tag, and rgx2 correctly returns the position of a closing html tag in general, I'm trying to get a generalized regex that will return the position of any html closing tag that ends the string. Why doesn't rgx3 work?

You should just use a negated character class to match anything that's not a closing >:
var rgx = /<\/[^>]+>$/g;
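As to why rgx3 didn't work: your pattern isn't really good, but it does technically match. The catch is that .*? is allowed to run across other tags, so search() returns the index of the first closing tag from which the match can reach the end of the string (5, the first </em>, for str1) rather than the index of the last one. If it returns -1 instead, then the string you are matching probably isn't trimmed as you suppose (or something other than a closing html tag ends it).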
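A minimal sketch of the difference, reusing str1 and str3 from the question. The g flag is left off because String.prototype.search ignores it; the positions in the comments hold for these exact strings.

```javascript
const str1 = "<em>I</em> am <strong>dummy</strong> <em>text.</em>";
const str3 = "<em>I</em> am <strong>dummy</strong> text";

// The question's rgx3: .*? may run across other tags, so the first "</"
// from which the rest of the string can be consumed wins.
const rgx3 = /<\s*\/\s*\w\s*.*?>$/;

// Negated character class: [^>]+ cannot cross a ">", so only the
// final closing tag can satisfy the $ anchor.
const rgx = /<\/[^>]+>$/;

console.log(str1.search(rgx3)); // 5  (the first </em>, not the last one)
console.log(str1.search(rgx));  // 46 (the final </em>)
console.log(str3.search(rgx));  // -1 (no closing tag at the end)
```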

There seems to be an issue with rgx2 and rgx3: an extra .*? before the > and a missing * after \w. Here's how I would write the regexes. rgx2 only worked at all because the match-all .*? picked up the rest of the tag name.
var rgx2 = /<\s*\/\s*\w*\s*>/g;
var rgx3 = /<\s*\/\s*\w*\s*>$/g;
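As a quick check, this sketch runs the rgx3 above over all three strings from the question. It assumes closing tags carry nothing but the tag name, which holds for valid HTML; the g flag is dropped because search() ignores it anyway.

```javascript
const strings = [
  "<em>I</em> am <strong>dummy</strong> <em>text.</em>",         // str1
  "<em>I</em> am <strong>dummy</strong> <strong>text.</strong>", // str2
  "<em>I</em> am <strong>dummy</strong> text",                   // str3
];

// \w* instead of \w\s*.*? keeps the match inside a single closing tag
const rgx3 = /<\s*\/\s*\w*\s*>$/;

const positions = strings.map((s) => s.search(rgx3));
console.log(positions); // 46, 50, -1
```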

Related

Replace repeating set of characters from end of string using regex

I want to remove all <br> from the end of this string. Currently I am doing this (in JavaScript):
const value = "this is an event. <br><br><br><br>"
let description = String(value);
while (description.endsWith('<br>')) {
description = description.replace(/<br>$/, '');
}
But I want to do it without using while loop, by only using some regex with replace. Is there a way?
To anchor a match at the end of the string in a regex, use the special $ symbol.
To match a repeated character or block of text, use a quantifier: + means one or more, * means zero or more.
In your case, the final regex is: (<br>)*$
This removes zero or more occurrences of <br> from the end of the string.
Example:
const value = "this is an event. <br><br><br><br>";
let description = String(value);
// replace() returns a new string, so assign the result back
description = description.replace(/(<br>)*$/g, '');
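A small sanity check, assuming the goal is just to reproduce the question's while loop: the sketch runs both versions and compares them. (?:<br>)+$ would behave the same way here while avoiding an unused capture group.

```javascript
const value = "this is an event. <br><br><br><br>";

// The question's loop: strip one trailing <br> at a time
let looped = String(value);
while (looped.endsWith("<br>")) {
  looped = looped.replace(/<br>$/, "");
}

// Single replace: (<br>)* takes the whole run sitting right before the end
const replaced = value.replace(/(<br>)*$/, "");

console.log(JSON.stringify(looped)); // "this is an event. "
console.log(looped === replaced);    // true
```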
You may try:
var value = "this is an event. <br><br><br><br>";
var output = value.replace(/(<[^>]*>)\1*$/, "");
console.log(output);
Here is the regex logic being used:
(<[^>]*>) match AND capture a single HTML tag ([^>]* cannot run past the tag's closing >)
\1* then match that same tag zero or more additional times
$ require all of this to sit at the end of the string
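The capture group decides how much text counts as "one tag". This sketch compares a lazy (<.*?>) capture with a (<[^>]*>) capture on the question's string and on a made-up input that mixes an element with trailing <br> tags (the second string is only an illustration).

```javascript
// Lazy .*? inside the capture can stretch over several tags
const lazy = /(<.*?>)\1*$/;
// [^>]* stops at the first ">", so the capture is always one tag
const single = /(<[^>]*>)\1*$/;

const plain = "this is an event. <br><br><br><br>";
const mixed = "keep <b>this</b><br><br>";

console.log(JSON.stringify(plain.replace(lazy, "")));   // "this is an event. "
console.log(JSON.stringify(plain.replace(single, ""))); // "this is an event. "
console.log(JSON.stringify(mixed.replace(lazy, "")));   // "keep " (the <b> element is lost too)
console.log(JSON.stringify(mixed.replace(single, ""))); // "keep <b>this</b>"
```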

JS conditional RegEx that removes different parts of a string between two delimiters

I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br>, it’ll remove everything from the first « to the left up to and including »<br> (ex. «1»<br>). However, it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!
Maybe I'm missing a subtlety here; if so, I apologize. But it seems that you can just replace with the regex /«\d+»<br>/g. This will replace every occurrence of a number between « and » that is followed by <br>.
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits (\w also matches underscores), you can use \w instead of \d:
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)
This snippet assumes that the input within the brackets will always be a number, but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match a line break
/.../g The g flag matches globally, so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.
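A quick sketch that checks the digit-only pattern and the «([^»]+)» variant against the string from the question; the letter-labelled second string is made up for illustration. The capture groups are left out because the match is simply deleted.

```javascript
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";

// Digits only between the delimiters
const digitsOnly = str.replace(/«\d+»<br>/g, "");
// Any characters except the closing delimiter
const anyLabel = str.replace(/«[^»]+»<br>/g, "");

console.log(digitsOnly);
// «2»some text<br>«4»more text<br>«6»even more text<br>
console.log(digitsOnly === anyLabel); // true for this input

// Hypothetical input with non-numeric labels: only [^»]+ handles it
const labelled = "«a»<br>«b»some text<br>";
console.log(labelled.replace(/«[^»]+»<br>/g, "")); // «b»some text<br>
```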

Javascript regex not containing keyword with backslashes

I'm having a problem with a JavaScript regex that has to comment out all script tags nested inside a script tag, but it must not comment out the special first script tag with id "ignorescript".
Here is a sample string to regex:
<script id="ignorescript">
var test = '<script>test<\/script>;
var xxxx = 'x';
</script>
The script tag inside ignorescript has an extra backslash because it is JSON-encoded (from PHP).
And here is the final result I have to get:
<script id="ignorescript">
var test = '<!--ignore <script>test<\/script> ignore-->;
var xxxx = 'x';
</script>
The following example works:
content = content.replace(/(<script>.*<\\\/script>)/g,
"<!--ignore $1 ignore-->");
But I need to check that it does not contain the keyword "ignorescript". If that keyword comes up, I do not want to replace anything; otherwise, add ignore comments around the whole script tag. This is as far as I have gotten:
content = content.replace(/(<script.((?!ignorescript).)*<\/script>)/g,
"<!--ignore $1 ignore-->");
It kind of works, but not the way it's supposed to. I also have one more backslash in the ending tag, so I changed it to:
content = content.replace(/(<script.((?!ignorescript).)*<\\\/script>)/g,
"<!--ignore $1 ignore-->");
Now it does not find anything at all.
Finally got it working.
Here is the working regex:
/(<script(?!\sid="ignorescript").*?<\\\/script>)/g
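To see what the working regex does, here is a sketch that applies it to the sample from the question. The array-and-join construction is just a convenient way to write the multi-line string; "\\/" in JavaScript source yields the literal \/ that the JSON-encoded content contains.

```javascript
// The sample content from the question
const content = [
  '<script id="ignorescript">',
  "var test = '<script>test<\\/script>;",
  "var xxxx = 'x';",
  "</script>",
].join("\n");

// "<script" not followed by ' id="ignorescript"', then the shortest run
// (on one line, since . skips newlines) up to a literal <\/script>
const pattern = /(<script(?!\sid="ignorescript").*?<\\\/script>)/g;

const result = content.replace(pattern, "<!--ignore $1 ignore-->");
console.log(result);
// second line: var test = '<!--ignore <script>test<\/script> ignore-->;
```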

Skipping over tags and spaces in regex html

I'm using this regex to find a string that starts with !?, ends with ?!, and has another variable in between (in this example "a891d050"). This is what I use:
var pattern = new RegExp(/!\\?.*\s*(a891d050){1}.*\s*\\?!/);
It matches this one correctly:
!?v8qbQ5LZDnFLsny7VmVe09HJFL1/WfGD2A:::a891d050?!
But fails when the string is broken up with html tags.
<span class="userContent"><span>!?v8qbQ5LZDnFLsny7VmVe09HJFL1/</span><wbr /><span class="word_break"></span>WfGD2A:::a891d050?!</span></div></div></div></div>
I tried adding \s and {space}*, but it still fails.
The question is: what (special?) characters do I need to account for if I want to ignore whitespace and HTML tags in my match?
Edit: this is how I use the regex:
var pattern = /!\?[\s\S]*a891d050[\s\S]*\?!/;
document.body.innerHTML = document.body.innerHTML.replace(pattern,"new content");
It appears to me that when it encounters the 'plain' string, it replaces it correctly. But when faced with a string that has classes around it and inside it, it makes a mess of the classes or doesn't replace anything at all, depending on the context. So I decided to try jquery-replacetext-plugin (as it promises to leave tags as they were) like this:
$("body *").replaceText( pattern, "new content" );
But with no success, the results are the same as before.
Maybe this:
var pattern = /!\?[\s\S]*a891d050[\s\S]*\?!/;
[\s\S] should match any character. I have also removed {1}.
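A minimal sketch, using the HTML fragment from the question, of why [\s\S] helps: it matches any character, tags and newlines included. Two caveats: the greedy [\s\S]* can stretch from the first !? to the last ?! in a large document (a lazy [\s\S]*? narrows that), and because the match includes the tags between the markers, replace() removes those tags along with the token. If you only need to detect the token, one alternative (an assumption about your use case) is to strip the tags first and match the plain text.

```javascript
const html =
  '<span class="userContent"><span>!?v8qbQ5LZDnFLsny7VmVe09HJFL1/</span>' +
  '<wbr /><span class="word_break"></span>WfGD2A:::a891d050?!</span></div></div></div></div>';

// [\s\S] matches any character, so the pattern runs over the tags
const pattern = /!\?[\s\S]*a891d050[\s\S]*\?!/;
console.log(pattern.test(html)); // true

// Detection only: drop the tags, then match the plain token
const text = html.replace(/<[^>]*>/g, "");
console.log(text); // !?v8qbQ5LZDnFLsny7VmVe09HJFL1/WfGD2A:::a891d050?!
console.log(/^!\?[^?]*a891d050\?!$/.test(text)); // true
```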
The problem was apparently solved by using this regex:
var pattern = /(!\?)(?:<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])*?>)?(.)*?(a891d050)(?:<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])*?>)?(.)*?(\?!)/;

regular expression to remove comment javascript

I am using the regular expression below to remove comments from a string:
<\!{1}\-{2}(.*?)\-{2}\s*>
This works fine except for multi-line strings:
var search = '<\!{1}\-{2}(.*?)\-{2}\s*>';
var re = new RegExp(search, "gm");
var subject = <multi-line string>;
result = subject.replace(re, '');
What should I do to get it working with multi-line strings?
The . character does not match line breaks.
This one should work:
^(<\!\-{2})((.|\s)*?)\-{2}>$
Fix:
<!--[\S\s]*?-->
I removed the \s at the beginning and the end of the expression and added it in the middle so multi-line comments are allowed.
But you should have a look at BartK's comment ;)
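A short sketch comparing the options on a made-up subject string with one single-line and one multi-line comment. The m flag used in the question only changes how ^ and $ behave, so it does not help here; the s (dotAll) flag, added in ES2018, is the flag that lets . match line breaks.

```javascript
const subject = "keep<!-- one-line -->this<!--\n  spans\n  lines\n-->text";

// . without the s flag stops at "\n", so the multi-line comment survives
const dotOnly = /<!--.*?-->/g;
// [\s\S] matches newlines too; the lazy *? stops at the first "-->"
const charClass = /<!--[\s\S]*?-->/g;
// ES2018 alternative: the s (dotAll) flag lets . match newlines
const dotAll = /<!--.*?-->/gs;

console.log(JSON.stringify(subject.replace(dotOnly, "")));
console.log(subject.replace(charClass, "")); // keepthistext
console.log(subject.replace(dotAll, ""));    // keepthistext
```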
regards
