Regex Help for content between two strings (javascript)

Regex Help for content between two strings (javascript) - javascript

Hoping someone might help. I have a string formatted like the example below:
Lipsum text as part of a paragraph here, yada. |EMBED|{"content":"foo"}|/EMBED|. Yada and the text continues...
What I am looking for is a Javascript RegEx to capture the content between the |EMBED||/EMBED| 'tags', run a function on that content, and then to replace the entire |EMBED|...|/EMBED| string with the return of that function.
The catch is that I may have multiple |EMBED| blocks within a larger string. For example:
Yabba...|EMBED|{"content":"foo"}|/EMBED|. Dabba-do...|EMBED|{"content":"yo"}|/EMBED|.
I need the RegEx to capture and process each |EMBED| block separately, since the content contained within will be similar, but unique.
My initial thought is that I could just have a RegEx that captures the first iteration of the |EMBED| block, and the function which replaces this |EMBED| block is either part of a loop or recursion to continuously find the next block and replace it, until no more blocks are found in the string.
...but this seems expensive. Is there a more eloquent way?

You can use String.prototype.replace to replace a substring found via a regular expression with a modified version of the match using a mapping function, e.g.:
var input = 'Yabba...|EMBED|{"content":"foo"}|/EMBED|. Dabba-do...|EMBED|{"content":"yo"}|/EMBED|.'
var output = input.replace(/\|EMBED\|(.*?)\|\/EMBED\|/g, function(match, p1) {
return p1.toUpperCase()
})
console.log(output) // "Yabba...{"CONTENT":"FOO"}. Dabba-do...{"CONTENT":"YO"}."
Make sure that you use a non-greedy selector .*? to select the content between the delimiters to allow multiple replacements per string.

This is the cod which iterate through the matches of the regex:
var str = 'Lipsum text as part of a paragraph here, yada. |EMBED|{"content":"foo"}|/EMBED|. Yada and the text continues...';
var rx = /\|EMBED\|(.*)\|\/EMBED\|/gi;
var match;
while (true)
{
match = rx.exec(str);
if (!match)
break;
console.log(match[1]); //match[1] is the content between "the tags"
}

Related

JS conditional RegEx that removes different parts of a string between two delimiters

I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br> it’ll remove everything between and including the first « to the left, to »<br> (ex. «1»<br>). However it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!

Maybe I'm missing a subtlety here, if so I apologize. But it seems that you can just replace with the regex: /«\d+»<br>/g. This will replace all occurrences of a number between « & » followed by <br>
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits you can use \w instead of \d
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)

This snippet assumes that the input within the brackets will always be a number but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match a line break
//g Matches globally so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.

Regular expression to match a string which is NOT matched by a given regexp

I've been hoving around by some answers here, and I can't find a solution to my problem:
I have this regexp which matches everyting inside an HTML span tag, including contents:
<span\b[^>]*>(.*?)</span>
and I want to find a way to make a search in all the text, except for what is matched with that regexp.
For example, if my text is:
var text = "...for there is a class of <span class="highlight">guinea</span> pigs which..."
... then the regexp would match:
<span class="highlight">guinea</span>
and I want to be able to make a regexp such that if I search for "class", regexp will match "...for there is a class of..."
and will not match inside the tag, like in
"... class="highlight"..."
The word to be matched ("class") might be anywhere within the text. I've tried
(?!<span\b[^>]*>(.*?)</span>)class
but it keeps searching inside tags as well.
I want to find a solution using only regexp, not dealing with DOM nor JQuery. Thanks in advance :).

Although I wouldn't recommend this, I would do something like below
(class)(?:(?=.*<span\b[^>]*>))|(?:(?<=<\/span>).*)(class)
You can see this in action here
Rubular Link for this regex
You can capture your matches from the groups and work with them as needed. If you can, use a HTML parser and then find matches from the text element.

It's not pretty, but if I get you right, this should do what you wan't. It's done with a single RegEx but js can't (to my knowledge) extract the result without joining the results in a loop.
The RegEx: /(?:<span\b[^>]*>.*?<\/span>)|(.)/g
Example js code:
var str = '...for there is a class of <span class="highlight">guinea</span> pigs which...',
pattern = /(?:<span\b[^>]*>.*?<\/span>)|(.)/g,
match,
res = '';
match = pattern.exec(str)
while( match != null )
{
res += match[1];
match = pattern.exec(str)
}
document.writeln('Result:' + res);
In English: Do a non capturing test against your tag-expression or capture any character. Do this globally to get the entire string. The result is a capture group for each character in your string, except the tag. As pointed out, this is ugly - can result in a serious number of capture groups - but gets the job done.
If you need to send it in and retrieve the result in one call, I'd have to agree with previous contributors - It can't be done!

Javascript Regex only replacing first match occurence

I am using regular expressions to do some basic converting of wiki markup code into copy-pastable plain text, and I'm using javascript to do the work.
However, javascript's regex engine behaves much differently to the ones I've used previously as well as the regex in Notepad++ that I use on a daily basis.
For example- given a test string:
==Section Header==
===Subsection 1===
# Content begins here.
## Content continues here.
I want to end up with:
Section Header
Subsection 1
# Content begins here.
## Content continues here.
Simply remove all equals signs.
I began with the regex setup of:
var reg_titles = /(^)(=+)(.+)(=+)/
This regex searches for lines that begin with one or more equals with another set of one or more equals. Rubular shows that it matches my lines accurately and does not catch equals signs in the middle of contet. http://www.rubular.com/r/46PrkPx8OB
The code to replace the string based on regex
var lines = $('.tb_in').val().split('\n'); //use jquery to grab text in a textarea, and split into an array of lines based on the \n
for(var i = 0;i < lines.length;i++){
line_temp = lines[i].replace(reg_titles, "");
lines[i] = line_temp; //replace line with temp
}
$('.tb_out').val(lines.join("\n")); //rejoin and print result
My result is unfortunately:
Section Header==
Subsection 1===
# Content begins here.
## Content continues here.
I cannot figure out why the regex replace function, when it finds multiple matches, seems to only replace the first instance it finds, not all instances.
Even when my regex is updated to:
var reg_titles = /(={2,})/
"Find any two or more equals", the output is still identical. It makes a single replacement and ignores all other matches.
No one regex expression executor behaves this way for me. Running the same replace multiple times has no effect.
Any advice on how to get my string replace function to replace ALL instances of the matched regex instead of just the first one?

^=+|=+$
You can use this.Do not forget to add g and m flags.Replace by ``.See demo.
http://regex101.com/r/nA6hN9/28

Add the g modifier to do a global search:
var reg_titles = /^(=+)(.+?)(=+)/g

Your regex is needlessly complex, and yet doesn't actually accomplish what you set out to do. :) You might try something like this instead:
var reg_titles = /^=+(.+?)=+$/;
lines = $('.tb_in').val().split('\n');
lines.forEach(function(v, i, a) {
a[i] = v.replace(reg_titles, '$1');
})
$('.tb_out').val(lines.join("\n"));

How do I add prevent my regex from parsing certain parts of a string?

As per this answer, I am making use of a replaceAll() function to swap arbitrary strings in my javascript (Node) application:
this.replaceAll = function(str1, str2, ignoreCase)
{
return this.replace(new RegExp(str1.replace(/([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g,"\\$&"),(ignoreCase?"gi":"g")),(typeof(str2)=="string")?str2.replace(/\$/g,"$$$$"):str2);
}
I would like to expand this regex, such that it does not try to match anything inside a set of <span>...</span> tags. (I need to add HTML Span tags around parts of some strings, and I don't want to wrap anything in a span twice, if the patterns would duplicate [I.e., 'Foo' and 'Foobar' in the string "Foobarbaz"])
I am running multiple regex search/replaces on a single string, and I want to make sure that nothing gets processed multiple times.
My understanding is that I'll need a [<SPAN>] ... [</SPAN>] somehow, but I'm not quite sure on the specifics. Any advice?

I don't understand what your regexp does, but in general the technique is like this:
html = "replace this and this but not <span> this one </span> or <b>this</b> but this is fine"
// remove parts you don't want to be touched and place them in a buffer
tags = []
html = html.replace(/<(\w+).+?<\/\1>/g, function($0) {
tags.push($0)
return '##' + (tags.length - 1)
})
// do the actual replacement
html = html.replace(/this/g, "that")
// put preserved parts back
html = html.replace(/##(\d+)/g, function($0, $1) {
return tags[$1]
})

Javascript Regular expression to remove unwanted <br>,

I have a JS stirng like this
<div id="grouplogo_nav"><br> <ul><br> <li><a class="group_hlfppt" target="_blank" href="http://www.hlfppt.org/">&nbsp;</a></li><br> </ul><br> </div>
I need to remove all <br> and $nbsp; that are only between > and <. I tried to write a regular expression, but didn't got it right. Does anybody have a solution.
EDIT :
Please note i want to remove only the tags b/w > and <

Avoid using regex on html!
Try creating a temporary div from the string, and using the DOM to remove any br tags from it. This is much more robust than parsing html with regex, which can be harmful to your health:
var tempDiv = document.createElement('div');
tempDiv.innerHTML = mystringwithBRin;
var nodes = tempDiv.childNodes;
for(var nodeId=nodes.length-1; nodeId >= 0; --nodeId) {
if(nodes[nodeId].tagName === 'br') {
tempDiv.removeChild(nodes[nodeId]);
}
}
var newStr = tempDiv.innerHTML;
Note that we iterate in reverse over the child nodes so that the node IDs remain valid after removing a given child node.
http://jsfiddle.net/fxfrt/

myString = myString.replace(/^( |<br>)+/, '');
... where /.../ denotes a regular expression, ^ denotes start of string, ($nbsp;|<br>) denotes " or <br>", and + denotes "one or more occurrence of the previous expression". And then simply replace that full match with an empty string.

s.replace(/(>)(?: |<br>)+(\s?<)/g,'$1$2');
Don't use this in production. See the answer from Phil H.
Edit: I try to explain it a bit and hope my english is good enough.
Basically we have two different kinds of parentheses here. The first pair and third pair () are normal parentheses. They are used to remember the characters that are matched by the enclosed pattern and group the characters together. For the second pair, we don't need to remember the characters for later use, so we disable the "remember" functionality by using the form (?:) and only group the characters to make the + work as expected. The + quantifier means "one or more occurrences", so or <br> must be there one or more times. The last part (\s?<) matches a whitespace character (\s), which can be missing or occur one time (?), followed by the characters <. $1 and $2 are kind of variables that are replaces by the remembered characters of the first and third parentheses.
MDN provides a nice table, which explains all the special characters.

You need to replace globally. Also don't forget that you can have the being closed . Try this:
myString = myString.replace(/( |<br>|<br \/>)/g, '');

This worked for me, please note for the multi lines
myString = myString.replace(/( |<br>|<br \/>)/gm, '');

myString = myString.replace(/^( |<br>)+/, '');
hope this helps

Develop Reference

JavaScript is the programming language of the Web.

Regex Help for content between two strings (javascript) - javascript

Related

JS conditional RegEx that removes different parts of a string between two delimiters

Regular expression to match a string which is NOT matched by a given regexp

Javascript Regex only replacing first match occurence

How do I add prevent my regex from parsing certain parts of a string?

Javascript Regular expression to remove unwanted <br>,

Categories

Resources