Javascript/Node.JS find text between two strings, multiple times

I need to find the text between begin and end, multiple times. I have a regular expression set up, but it only finds the first instance. Is there a way to make it find every "text" so that I can access the results separately as an array, i.e., instances[1], instances[2], etc.? I am using Node.js, so I cannot use the DOM as some other answers have suggested.
begin
text
end
begin
text
end
begin
text
end
begin
text
end

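Yes, append the g flag to the end of the regular expression to match all occurrences, like so:
let myArr = "begin middle end".match(/regularExpression/g)
Note that with the g flag, match returns only the full matches and not the capture groups, so the snippet below uses exec in a loop instead: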
var input = "begin middle end";
var regex = /begin\s(.*)\send/g;
var matches;
while (matches = regex.exec(input)) {
console.log(matches);
console.log('Middle text is: ' + matches[1]);
}

The expression will be something like this:
/begin\n([\w ]+)\nend/g
Note the g at the end for global match.
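
As a minimal sketch of how the results could be collected into an array, the following uses String.prototype.matchAll (available in Node.js 12 and later). The sample input and the names input and instances are assumptions made for illustration; the pattern is a slightly more permissive variant of the ones above, not the exact expression from either answer.

```javascript
// Hypothetical sample input mirroring the begin/text/end layout in the question.
const input = "begin\nfirst\nend\nbegin\nsecond\nend\nbegin\nthird\nend";

// [\s\S]*? lazily matches any characters, including newlines,
// so each match stops at the nearest following "end".
const regex = /begin\s+([\s\S]*?)\s+end/g;

// matchAll requires the g flag and yields one match object per occurrence;
// index 1 of each match object is the first capture group.
const instances = Array.from(input.matchAll(regex), (m) => m[1]);

console.log(instances); // [ 'first', 'second', 'third' ]
```

Array indexes start at 0, so the first captured text is instances[0].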

Related

Replace repeating set of characters from end of string using regex

I want to remove all <br> from the end of this string. Currently I am doing this (in javascript) -
const value = "this is an event. <br><br><br><br>"
let description = String(value);
while (description.endsWith('<br>')) {
description = description.replace(/<br>$/, '');
}
But I want to do it without using while loop, by only using some regex with replace. Is there a way?

To identify the end of the string in a regex, you can use the special $ anchor.
To match a repeated character or group, you can use a quantifier: + matches one or more repetitions and * matches zero or more.
In your case, the final regex is: (<br>)*$
This will remove zero or more occurrences of <br> from the end of the string.
Example (replace returns a new string, so assign the result back):
const value = "this is an event. <br><br><br><br>"
let description = String(value);
description = description.replace(/(<br>)*$/g, '');

You may try:
var value = "this is an event. <br><br><br><br>";
var output = value.replace(/(<.*?>)\1*$/, "");
console.log(output);
Here is the regex logic being used:
(<.*?>) match AND capture any HTML tag
\1* then match that same tag zero or more additional times
$ all tags occurring at the end of the string
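
A short sketch of the single-call approach, for comparison. The names strict and tolerant are illustrative, and the second pattern (which also accepts whitespace between tags and the <br/> spelling) is an assumption about what the input might contain rather than something stated in the question.

```javascript
const value = "this is an event. <br><br><br><br>";

// One or more literal <br> tags anchored at the end of the string.
const strict = value.replace(/(?:<br>)+$/, "");

// Assumed variant: also tolerates whitespace around the tags and <br/>.
const tolerant = value.replace(/(?:\s*<br\s*\/?>)+\s*$/i, "");

console.log(JSON.stringify(strict));   // "this is an event. "
console.log(JSON.stringify(tolerant)); // "this is an event."
```

The strict version leaves the space that preceded the first tag; the tolerant version removes it along with the tags.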

JS conditional RegEx that removes different parts of a string between two delimiters

I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br> it’ll remove everything between and including the first « to the left, to »<br> (ex. «1»<br>). However it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!

Maybe I'm missing a subtlety here, if so I apologize. But it seems that you can just replace with the regex: /«\d+»<br>/g. This will replace all occurrences of a number between « & » followed by <br>
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits (and underscores) you can use \w instead of \d
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)

This snippet assumes that the input within the brackets will always be a number but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match a line break
/g Matches globally so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.
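
To check the result against the rendering the question asks for, here is a small sketch that uses the «([^»]+)» style of pattern mentioned above (any characters between the delimiters) and then splits the cleaned string on <br>. The names cleaned and lines are made up for illustration.

```javascript
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";

// Remove a «...» marker only when a <br> follows it immediately.
const cleaned = str.replace(/«[^»]+»<br>/g, "");

// Split into rendered lines; filter(Boolean) drops the empty string
// left after the final <br>.
const lines = cleaned.split("<br>").filter(Boolean);

console.log(lines);
// [ '«2»some text', '«4»more text', '«6»even more text' ]
```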

Regex with multiple start and end characters that must be the same

I would like to be able to search for strings inside a special tag in a string in JavaScript. Strings in JavaScript can start with either a " or a ' character.
Here is an example to illustrate what I want to do. My custom tag is called <my-tag>. My regex is /('|")*?<my-tag>((.|\n)[^"']*?)<\/my-tag>*?('|")/g. I use this regex pattern on the following strings:
var a = '<my-tag>Hello World</my-tag>'; //is found as expected
var b = "<my-tag>Hello World" + '</my-tag>'; //is NOT found, this is good!
var c = "<my-tag>Hello World</my-tag>"; //is found as expected
var d = '<my-tag>something "special"</my-tag>'; //here the " char causes a problem
var e = "<my-tag>something 'special'</my-tag>"; //here the ' char causes a problem
It works well with a and also c, where it finds the tag with the containing text. It also does not find the text in b, which is what I want. But in cases d and e the tag with its content is not found due to the occurrence of the " and ' characters. What I want is a regex where " is allowed inside the tag if the string starts with ', and vice versa.
Is it possible to achieve this with one regex, or is the only option to work with two separate regular expressions like
/(")*?<my-tag>((.|\n)[^']*?)<\/my-tag>*?(")/g and /(')*?<my-tag>((.|\n)[^"]*?)<\/my-tag>*?(')/g ?

It's not pretty, but I think this would work:
/("<my-tag>((.|\n)[^"]*?)<\/my-tag>"|'<my-tag>((.|\n)[^']*?)<\/my-tag>')/g

You should be able to capture the opening quote with ('|") and reuse it at the end with the backreference \1. Something like the following:
/('|")<my-tag>.*?<\/my-tag>\1/g
This should make sure to match the same character at the beginning and the end.
But you really shouldn't use regex for parsing HTML.
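
A sketch of the backreference idea applied to the five strings from the question. It treats the source lines as plain text (the source variable is an assumption), and it adds a negative lookahead so the content may contain the other quote character but not the opening one. That stricter content rule is a variation on the pattern above, not the answer's exact expression.

```javascript
// The question's five declarations, held as plain source text.
const source = [
  `var a = '<my-tag>Hello World</my-tag>';`,
  `var b = "<my-tag>Hello World" + '</my-tag>';`,
  `var c = "<my-tag>Hello World</my-tag>";`,
  `var d = '<my-tag>something "special"</my-tag>';`,
  `var e = "<my-tag>something 'special'</my-tag>";`,
].join("\n");

// Group 1 captures the opening quote; (?!\1) forbids that same quote
// inside the content; the final \1 requires the same quote to close.
const regex = /(['"])<my-tag>((?:(?!\1)[\s\S])*?)<\/my-tag>\1/g;

const found = Array.from(source.matchAll(regex), (m) => m[2]);

console.log(found);
// [ 'Hello World', 'Hello World', 'something "special"', "something 'special'" ]
```

Line b is skipped because its content would have to run through the closing " before reaching </my-tag>.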

Regexp to capture comma separated values

I have a string that can be a comma separated list of \w, such as:
abc123
abc123,def456,ghi789
I am trying to find a JavaScript regexp that will return ['abc123'] (first case) or ['abc123', 'def456', 'ghi789'] (without the comma).
I tried:
^(\w+,?)+$ -- Nope, as only the last repeating pattern will be matched, 789
^(?:(\w+),?)+$ -- Same story. I am using non-capturing bracket. However, the capturing just doesn't seem to happen for the repeated word
Is what I am trying to do even possible with regexp? I tried pretty much every combination of grouping, using capturing and non-capturing brackets, and still not managed to get this happening...

If you want to discard the whole input when there is something wrong, the simplest way is to validate, then split:
if (/^\w+(,\w+)*$/.test(input)) {
var values = input.split(',');
// Process the values here
}
If you want to allow empty value, change \w+ to \w*.
Trying to match and validate at the same time with a single regex requires emulating the \G feature, which asserts the position of the last match. Why is \G required? Because it prevents the engine from retrying the match at the next position and bypassing your validation. Remember that ECMAScript regex did not have look-behind when this was written (it was added in ES2018), so without it you can't differentiate between the position of an invalid character and the character(s) after it:
something,=bad,orisit,cor&rupt
          ^^             ^^
When you can't differentiate between the 2 positions, you can't rely on the engine to do a match-all operation alone. While it is possible to use a while loop with RegExp.exec and assert the position of last match yourself, why would you do so when there is a cleaner option?
If you want to salvage whatever is available, torazaburo's answer is a viable option.

Try this regex:
/([^,]+)/g
Alternatively, strings in JavaScript have a split method that splits a string on a delimiter:
s.split(',')

Split on the comma first, then filter out results that do not match:
str.split(',').filter(function(s) { return /^\w+$/.test(s); })
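
The two strategies discussed in these answers (reject the whole input, or keep whatever is valid) can be put side by side. This is only a sketch; the function names parseStrict and parseLenient are invented for illustration.

```javascript
// Validate the whole input first, then split. Returns null if any part is invalid.
function parseStrict(input) {
  return /^\w+(?:,\w+)*$/.test(input) ? input.split(",") : null;
}

// Split first, then keep only the parts made up entirely of \w characters.
function parseLenient(input) {
  return input.split(",").filter((s) => /^\w+$/.test(s));
}

console.log(parseStrict("abc123"));                          // [ 'abc123' ]
console.log(parseStrict("abc123,def456,ghi789"));            // [ 'abc123', 'def456', 'ghi789' ]
console.log(parseStrict("something,=bad,orisit,cor&rupt"));  // null
console.log(parseLenient("something,=bad,orisit,cor&rupt")); // [ 'something', 'orisit' ]
```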

This regex pattern extracts the runs of digits from numeric values separated by special characters such as ".", "," or "#".
var val = "1234,1213.1212, 1.3, 1.4";
var re = /[0-9]+/g;
var list = val.match(re);

var str = "abc123,def456, asda12, 1a2ass, yy8,ghi789";
var re = /[a-z]{3}\d{3}/g;
var list = str.match(re);
document.write("<BR> list.length: " + list.length);
for(var i=0; i < list.length; i++) {
document.write("<BR>list(" + i + "): " + list[i]);
}
This will collect only the values in the "abc123" style (three lowercase letters followed by three digits) and nothing else.

Maybe you can use the split function:
var st = "abc123,def456,ghi789";
var res = st.split(',');

Find string using regular expression, and reuse said string with new surrounding text

I've searched high and low, but couldn't find an exact answer to what I'm trying to do...
I'd like to find any text with __ at the beginning and /__ at the end (e.g. "in the middle of the sentence __this/__ could be underlined, and __this(!) can also/__ be underlined"). It can be one word or several, with any characters in between, including spaces. There can be several such spans in the same paragraph, each starting with __ and ending with /__.
Once found, I'd like to remove the __ and /__ and replace them with HTML, for example a div tag.
so:
__sample string /__
should be:
<div>sample string</div>
I know I'm supposed to use capturing groups, but I can't find a way to do this.
JavaScript:
.match seems to match and put the results in an array, but how do I go back into the string and replace the found results?
jQuery:
.replace should work for this, but I'm not sure how to reference the found string and surround it...
Thanks for reading!

You don't need match but you need String#replace:
var s = 'in the middle of the sentence __this/__ could be underlined, and __this(!) can also/__ be underlined';
var repl = s.replace(/__(.*?)\/__/g, "<div>$1</div>");
//=> in the middle of the sentence <div>this</div> could be underlined, and <div>this(!) can also</div> be underlined
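
Applied to the question's own sample, "__sample string /__", the pattern above would keep the space before /__ inside the div. A small sketch of one way to drop it, assuming surrounding whitespace inside the markers is never wanted (the function name wrapInDiv is hypothetical):

```javascript
// \s* outside the capture group swallows whitespace just inside the markers;
// $1 in the replacement string refers to the first capture group.
function wrapInDiv(text) {
  return text.replace(/__\s*(.*?)\s*\/__/g, "<div>$1</div>");
}

console.log(wrapInDiv("__sample string /__"));
// <div>sample string</div>

console.log(wrapInDiv("and __this(!) can also/__ be underlined"));
// and <div>this(!) can also</div> be underlined
```

Because . does not match newlines, a span that continues onto the next line would be left untouched by this pattern.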

Try this C# helper. It's a slight variation of something we have working here. I modified the replace part, but didn't actually test it. If you need to find more than one occurrence, I suppose you could pass in a new starting index, which would be the index where you left off the first time.
public static string getBetween(string strSource, string strStart, string strEnd)
{
int Start, End;
if (strSource.Contains(strStart) && strSource.Contains(strEnd))
{
Start = strSource.IndexOf(strStart, 0) + strStart.Length;
End = strSource.IndexOf(strEnd, Start);
return strSource.Substring(Start, End - Start);
}
else
{
return "";
}
}
string betweenString = getBetween(sourceString, "__", "/__");
sourceString = sourceString.Replace("__"+betweenString+"/__", "<div>"+betweenString+"</div>");
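
Since the question is about JavaScript, here is a rough port of the same indexOf idea that also handles more than one occurrence by carrying the search position forward, as the answer suggests. The function name getAllBetween is invented for this sketch, which is not part of the original answer.

```javascript
// Collect every substring found between `start` and `end`, scanning left to right.
function getAllBetween(source, start, end) {
  const results = [];
  let from = 0;
  while (true) {
    const startIndex = source.indexOf(start, from);
    if (startIndex === -1) break;            // no further opening marker
    const contentStart = startIndex + start.length;
    const endIndex = source.indexOf(end, contentStart);
    if (endIndex === -1) break;              // opening marker without a closing one
    results.push(source.slice(contentStart, endIndex));
    from = endIndex + end.length;            // resume after the closing marker
  }
  return results;
}

console.log(getAllBetween("a __x/__ b __y z/__", "__", "/__")); // [ 'x', 'y z' ]
```

Unlike the version above, this stops cleanly when a closing marker is missing instead of computing a substring from a negative index.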
