Regex match with variables - javascript

I am using JavaScript and would like to match everything in a custom template language like this:
Begin10
Match THIS keyword and ANOTHER
End10
So I would like to find Begin10, use the 10 as a variable to find the matching End10, and match THIS and ANOTHER between them.
I've looked at capture groups. I assume this is the way to go, but I can't figure out how to compose the expression.
THIS and ANOTHER need to be targeted for syntax highlighting by my code.

You can use a regex with a capture group and a backreference (\1) to the number:
var str = `Begin10
Match THIS keyword and ANOTHER1
End10
Begin20
Match THIS keyword and ANOTHER2
End20`;
console.log(
str.match(/\bBegin(\d+)[\s\S]*?\bEnd\1\b/g)
);
To get only the text between them, add a second capture group and collect it with exec:
var str = `Begin10
Match THIS keyword and ANOTHER1
End10
Begin20
Match THIS keyword and ANOTHER2
End20`;
var res = [],
regex = /\bBegin(\d+)\s+([\s\S]*?)\s+\bEnd\1\b/g,
match;
while (match = regex.exec(str)) {
res.push(match[2]);
}
console.log(res);
UPDATE:
If only THIS or ANOTHER can appear between them, use:
var str = `Begin10
THIS
End10
Begin20
ANOTHER
End20`;
var res = [],
regex = /\bBegin(\d+)\s+(THIS|ANOTHER)\s+\bEnd\1\b/g,
match;
while (match = regex.exec(str)) {
res.push(match[2]);
}
console.log(res);
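Since the goal is syntax highlighting, one option is to combine the block pattern with a second, keyword-level replace. This is a minimal sketch, assuming the keywords are exactly THIS and ANOTHER and using a hypothetical CSS class name `kw`; only text inside a matching Begin/End pair is touched.

```javascript
// Hypothetical input: one Begin/End block plus a keyword outside any block.
const template = `Begin10
Match THIS keyword and ANOTHER
End10
THIS is outside any block`;

// A Begin<n> ... End<n> block; \1 forces End to carry the same number as Begin.
const blockRegex = /\bBegin(\d+)\b[\s\S]*?\bEnd\1\b/g;
// Assumed keyword list to highlight inside a block.
const keywordRegex = /\b(?:THIS|ANOTHER)\b/g;

// Rewrite each block, wrapping its keywords in a span; text outside blocks is left alone.
const highlighted = template.replace(blockRegex, block =>
  block.replace(keywordRegex, word => `<span class="kw">${word}</span>`)
);

console.log(highlighted);
```

Doing it in two passes keeps each regex simple: the outer one only finds blocks, and the inner one only needs to know the keywords.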

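This works for a single Begin/End pair. Note that [\S\s]* is greedy, so if the same number appears in more than one block, the match runs from the first Begin to the last End: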
\bBegin(\d+)\b([\S\s]*)\bEnd\1\b
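To see the effect of the greedy [\S\s]* in the pattern above, here is a small sketch on hypothetical input where the same number appears twice; switching to the lazy [\S\s]*? splits it into two matches.

```javascript
const str = `Begin10
first
End10
Begin10
second
End10`;

// Greedy: [\S\s]* runs to the end, then backtracks only as far as the LAST End10.
const greedy = str.match(/\bBegin(\d+)\b([\S\s]*)\bEnd\1\b/g);
// Lazy: [\S\s]*? stops at the FIRST End10, so each block is its own match.
const lazy = str.match(/\bBegin(\d+)\b([\S\s]*?)\bEnd\1\b/g);

console.log(greedy.length); // 1
console.log(lazy.length);   // 2
```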

\nBegin(\d+)\s*\nMatch \b(\w+)\b keyword and \b(\w+)\b\s*\nEnd\1\n
Explanation:
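\n matches a newline; the pattern assumes Begin and End each sit on their own line. As written, it also needs a newline before Begin and after End, so a block at the very start or end of the string will not match.
\1 is a backreference to the first group, making sure End carries the same number as Begin.
\s* just skips any stray whitespace.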
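The pattern above needs a literal newline before Begin and after End. One possible variant, sketched here as an assumption rather than the answer's exact pattern, swaps those for ^ and $ with the m flag so a block at the start or end of the string also matches, and then reads the two captured words:

```javascript
const str = `Begin10
Match THIS keyword and ANOTHER
End10`;

// With the m flag, ^ and $ match at line boundaries, replacing the leading and trailing \n.
const regex = /^Begin(\d+)\s*\nMatch \b(\w+)\b keyword and \b(\w+)\b\s*\nEnd\1$/gm;

const keywords = [];
let match;
while ((match = regex.exec(str)) !== null) {
  keywords.push(match[2], match[3]); // the two captured words
}
console.log(keywords); // [ 'THIS', 'ANOTHER' ]
```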

Related

JavaScript regex to get string between two characters except escaped without lookbehind

I am looking for a specific JavaScript regex, without the lookbehind feature added in ES2018, that selects text between two asterisks but ignores escaped characters.
In the following example, only the text "test" and the included escaped characters are supposed to be selected according to the rules above:
\*jdjdjdfdf*test*dfsdf\*adfasdasdasd*test**test\**sd* (Selected: "test", "test", "test\*")
During my research I found this solution, Regex, everything between two characters except escaped characters: /(?<!\\)(%.*?(?<!\\)%)/, but it uses negative lookbehinds, which are only supported since ES2018. I need to support IE11 as well, so this solution doesn't work for me.
Then I found another approach that almost gets there for me: Javascript: negative lookbehind equivalent?. I altered Kamil Szot's answer to fit my needs: ((?!([\\])).|^)(\*.*?((?!([\\])).|^)\*) Unfortunately, it doesn't work when two asterisks ** are in a row.
I have already invested many hours and can't seem to get it right; any help is appreciated!
An example with what I have so far is here: https://www.regexpal.com/?fam=117350
I need to use the regexp in a string.replace call (str.replace(regexp|substr, newSubStr|function)) so that I can wrap the found strings with a span element of a specific class.
You can use this regular expression:
(?:\\.|[^*])*\*((?:\\.|[^*])*)\*
Your code should then take only the single capture group of each match.
Like this:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /(?:\\.|[^*])*\*((?:\\.|[^*])*)\*/g
var match;
while (match = regex.exec(str)) {
console.log(match[1]);
}
If you need to replace the matches, for instance to wrap the matches in a span tag while also dropping the asterisks, then use two capture groups:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g
var result = str.replace(regex, "$1<span>$2</span>");
console.log(result);
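The question asks for a span with a specific class and IE11 support. A sketch under those assumptions uses the same two-group regex with a plain ES5 function callback instead of a "$1<span>$2</span>" string; the class name `highlight` is a placeholder.

```javascript
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g;

// ES5 function callback (no arrow functions or template literals), so it also runs in IE11.
// before = text up to the opening *, inner = text between the asterisks.
var wrapped = str.replace(regex, function (whole, before, inner) {
  return before + '<span class="highlight">' + inner + '</span>';
});
console.log(wrapped);
```

A callback is handy if you later want to post-process `inner`, for example to strip the escaping backslashes before wrapping it.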
One thing to be careful with: when you use string literals in JavaScript tests, escape the backslash (with another backslash). If you don't, the string will not actually contain a backslash! To get a real backslash into the in-memory string, you need to escape it.
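A quick way to convince yourself of this is to compare string lengths. This sketch also shows String.raw, which keeps backslashes from a template literal exactly as written:

```javascript
// "\*" is not a recognized escape sequence, so JavaScript silently drops the backslash.
const withoutBackslash = "\*test";
// Doubling the backslash keeps one real backslash in the in-memory string.
const withBackslash = "\\*test";
// String.raw keeps the backslash from the template literal as written.
const rawBackslash = String.raw`\*test`;

console.log(withoutBackslash, withoutBackslash.length); // *test 5
console.log(withBackslash, withBackslash.length);       // \*test 6
console.log(rawBackslash === withBackslash);            // true
```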
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr.match(/\*(\\.)*t(\\.)*e(\\.)*s(\\.)*t(\\.)*\*/g).map(s => s.slice(1, -1)); // drop the outer asterisks
console.log(m);
More generic code:
const prepareRegExp = (word, delimiter = '\\*') => {
const escaped = '(\\\\.)*';
return new RegExp([
delimiter,
escaped,
[...word].join(escaped),
escaped,
delimiter
].join(''), 'g');
};
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr
.match(prepareRegExp('test'))
.map(s => s.slice(1, -1)); // drop the outer asterisks
console.log(m);
https://instacode.dev/#Y29uc3QgcHJlcGFyZVJlZ0V4cCA9ICh3b3JkLCBkZWxpbWl0ZXIgPSAnXFwqJykgPT4gewogIGNvbnN0IGVzY2FwZWQgPSAnKFxcXFwuKSonOwogIHJldHVybiBuZXcgUmVnRXhwKFsKICAgIGRlbGltaXRlciwKICAgIGVzY2FwZWQsCiAgICBbLi4ud29yZF0uam9pbihlc2NhcGVkKSwKICAgIGVzY2FwZWQsCiAgICBkZWxpbWl0ZXIKICBdLmpvaW5gYCwgJ2cnKTsKfTsKCmNvbnN0IHRlc3RTdHIgPSBgXFwqamRqZGpkZmRmKnRlc3QqZGZzZGZcXCphZGZhc2Rhc2Rhc2QqdGVzdCoqdGVzdFxcKipzZCpgOwpjb25zdCBtID0gdGVzdFN0cgoJLm1hdGNoKHByZXBhcmVSZWdFeHAoJ3Rlc3QnKSkKCS5tYXAobSA9PiBtLnN1YnN0cigxLCBtLmxlbmd0aC0yKSk7Cgpjb25zb2xlLmxvZyhtKTs=
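One caveat with the generic prepareRegExp above: the characters of `word` go into the pattern unescaped, so a word like "a.b" would also match "axb". A sketch of one way around this, using a hypothetical helper `escapeRegExp` built on the common metacharacter-escaping idiom, escapes each character first:

```javascript
// Hypothetical helper: escape regex metacharacters so each character matches literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const prepareRegExp = (word, delimiter = '\\*') => {
  const escaped = '(\\\\.)*'; // zero or more escaped characters, e.g. \*
  return new RegExp(
    [delimiter, escaped, [...word].map(escapeRegExp).join(escaped), escaped, delimiter].join(''),
    'g'
  );
};

const re = prepareRegExp('a.b');
const literalHit = '*a.b*'.match(re);
const wildcardMiss = '*axb*'.match(re);
console.log(literalHit);   // [ '*a.b*' ]
console.log(wildcardMiss); // null
```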

remove last part of string following '&&&' with JavaScript Regex

I'm trying to use a regex in JS to remove the last part of a string. This substring starts with &&&, is followed by something not &&&, and ends with .pdf.
So, for example, the final regex should take a string like:
parent&&&child&&&grandchild.pdf
and match
parent&&&child
I'm not that great with regexes, so my best effort has been something like:
.*?(?:&&&.*\.pdf)
Which matches the whole string. Can anyone help me out?
You may use this greedy regex either in replace or in match:
var s = 'parent&&&child&&&grandchild.pdf';
// using replace
var r = s.replace(/(.*)&&&.*\.pdf$/, '$1');
console.log(r);
//=> parent&&&child
// using match
var m = s.match(/(.*)&&&.*\.pdf$/)
if (m) {
console.log(m[1]);
//=> parent&&&child
}
By using the greedy pattern .* before &&&, we make sure to match the last instance of &&& in the input.
Since you want to remove the last portion, replace it with an empty string:
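To see why the greedy .* matters here, compare it with a lazy .*? on the same input; the lazy version binds to the first &&& instead of the last:

```javascript
const s = 'parent&&&child&&&grandchild.pdf';

// Greedy (.*) grabs as much as possible and backtracks, so &&& binds to its LAST occurrence.
const greedy = s.replace(/(.*)&&&.*\.pdf$/, '$1');
// Lazy (.*?) grows one character at a time, so &&& binds to its FIRST occurrence.
const lazy = s.replace(/(.*?)&&&.*\.pdf$/, '$1');

console.log(greedy); // parent&&&child
console.log(lazy);   // parent
```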
var str = "parent&&&child&&&grandchild.pdf"
var result = str.replace(/&&&[^&]+\.pdf$/, '')
console.log(result)
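Note that [^&]+ also rules out a single & in the last segment. A small sketch with a hypothetical file name shows where this pattern and the greedy one from the previous answer behave differently:

```javascript
const negated = /&&&[^&]+\.pdf$/;
const greedyCapture = /(.*)&&&.*\.pdf$/;

const normal = 'parent&&&child&&&grandchild.pdf'.replace(negated, '');
// [^&]+ cannot cross the single &, so nothing is removed here.
const withAmp = 'parent&&&tom&jerry.pdf'.replace(negated, '');
// The greedy version still strips the last &&& segment.
const withAmpGreedy = 'parent&&&tom&jerry.pdf'.replace(greedyCapture, '$1');

console.log(normal);        // parent&&&child
console.log(withAmp);       // parent&&&tom&jerry.pdf
console.log(withAmpGreedy); // parent
```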

RegExp to filter characters after the last dot

For example, I have a string "esolri.gbn43sh.earbnf", and I want to remove the last dot and every character after it (i.e. get "esolri.gbn43sh"). How can I do so with a regular expression?
I could of course use a non-RegExp way to do it, for example:
"esolri.gbn43sh.earbnf".slice(0, "esolri.gbn43sh.earbnf".lastIndexOf("."));
But I want a regular expression.
I tried /\..*?/, but that only removes the first dot (the lazy .*? matches nothing).
I am using Javascript. Any help is much appreciated.
I would use standard JS rather than a regex for this one, as it will be easier for others to understand your code:
var str = 'esolri.gbn43sh.earbnf'
console.log(
str.slice(0, str.lastIndexOf('.')) // "esolri.gbn43sh"; assumes str contains a dot
)
Pattern Matching
Match a dot followed by non-dots until the end of string
let re = /\.[^.]*$/;
Use this with String.prototype.replace to achieve the desired output
'foo.bar.baz'.replace(re, ''); // 'foo.bar'
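A few edge cases are worth checking with this pattern. Unlike a bare lastIndexOf slice, replace leaves a string without a dot unchanged:

```javascript
const re = /\.[^.]*$/;

const original = 'esolri.gbn43sh.earbnf'.replace(re, ''); // last dot and tail removed
const noDot = 'no-dot-here'.replace(re, '');              // no match, string unchanged
const trailingDot = 'trailing.'.replace(re, '');          // [^.]* may match nothing

console.log(original);    // esolri.gbn43sh
console.log(noDot);       // no-dot-here
console.log(trailingDot); // trailing
```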
Other choices
You may find it is more efficient to do a simple substring search for the last . and then use a string slicing method on this index.
let str = 'foo.bar.baz',
i = str.lastIndexOf('.');
if (i !== -1) // i = -1 means no match
str = str.slice(0, i); // "foo.bar"

Replace all character matches that are not escaped with backslash

I am using a regex to replace ( in other regexes (or regexs?) with (?: to turn them into non-capturing groups. My expression assumes that no (?X structures are used and looks like this:
(
[^\\] - Not backslash character
|^ - Or string beginning
)
(?:
[\(] - a bracket
)
Unfortunately, this doesn't work when two matches are next to each other, as in this case: how((\s+can|\s+do)(\s+i)?)?
With lookbehinds, the solution is easy:
/(?<=[^\\]|^)[\(]/g
But JavaScript (at least before ES2018) doesn't support lookbehinds, so what can I do? My searches didn't turn up any easy, universal lookbehind alternative.
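For reference, in engines that implement ES2018 lookbehind (current Node, Chrome and Firefox, but not IE11), the lookbehind pattern from the question can be used directly. A minimal check, assuming such an engine:

```javascript
// Requires ES2018 lookbehind support.
const source = 'how((\\s+can|\\s+do)(\\s+i)?)?';
const converted = source.replace(/(?<=[^\\]|^)\(/g, '(?:');

// An escaped \( must be left alone.
const escapedCase = 'a\\(b(c)'.replace(/(?<=[^\\]|^)\(/g, '(?:');

console.log(converted);   // how(?:(?:\s+can|\s+do)(?:\s+i)?)?
console.log(escapedCase); // a\(b(?:c)
```

Because a lookbehind does not consume the preceding character, two adjacent parentheses are both converted, which is exactly what the capture-based version in the question could not do.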
Use lookbehind through reversal:
function revStr(str) {
return str.split('').reverse().join('');
}
var rx = /[(](?=[^\\]|$)/g;
var subst = ":?(";
var data = "how((\\s+can|\\s+do)(\\s+i)?)?";
var res = revStr(revStr(data).replace(rx, subst));
console.log(res); // how(?:(?:\s+can|\s+do)(?:\s+i)?)?
Note that the regex pattern is also reversed so that we can use a lookahead instead of a lookbehind, and the substitution string is reversed, too. This gets tricky with longer regexps, but in this case it is still readable.
One option is to do a two-pass replacement with a token (I like a Unicode character for this, as it's unlikely to appear elsewhere):
var s = 'how((\\s+can|\\s+do)(\\s+i)?)?';
var token = "\u1234";
// Look for the character preceding the ( you want
// to replace. We'll add the token after it.
var patt1 = /([^\\])(?=\()/g;
// The second pattern looks for the token and the (.
// We'll replace both with the desired string.
var patt2 = new RegExp(token + '\\(', 'g');
s = s.replace(patt1, "$1" + token).replace(patt2, "(?:");
console.log(s);
https://jsfiddle.net/48e75wqz/1/
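String example:
how((\s+can|\s+do)(\s+i)?)?
One-line solution:
const o = 'how((\\s+can|\\s+do)(\\s+i)?)?';
// Hide escaped \( behind the placeholder 9e9 ("9000000000"), turn every remaining ( into (?:, then restore \(. Assumes the input never contains "9000000000".
console.log(o.replace(/\\\(/g, 9e9).replace(/\(/g, '(?:').replace(/90{9}/g, '\\('));
Result:
how(?:(?:\s+can|\s+do)(?:\s+i)?)?
It also works with strings like how((\s+\(can\)|\s+do)(\s+i)?)?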

What javascript RegEx will match all instances of my mergefields?

I need a RegEx pattern to match any/all instances that look like {!(.*)}
I have tried the following:
var mergefield_array = value.match(patt);
where value = '{!lat},{!lng}'
and patt = /{!(.*)}/
It returns a single result: {!lat},{!lng}
but I want it to return two matches in this case ('{!lat}' and '{!lng}').
How do I do it?
Your regex is greedy because of .*, so it matches from the first { to the last }, grabbing everything on the way. Also, without the g flag, match returns only the first match.
To fix it, make it non-greedy and add the g flag:
patt = /{!(.*?)}/g
Or use a negated character class:
patt = /{!([^}]*)}/g
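With the g flag, String.prototype.match returns the full matches but drops the capture groups. If you also need the bare field names, String.prototype.matchAll (ES2020) exposes each capture; a short sketch using the negated-class pattern with the braces escaped:

```javascript
const value = '{!lat},{!lng}';
const patt = /\{!([^}]*)\}/g;

// Full matches only.
const mergefields = value.match(patt);
// matchAll yields one match object per occurrence; m[1] is the capture group.
const names = Array.from(value.matchAll(patt), m => m[1]);

console.log(mergefields); // [ '{!lat}', '{!lng}' ]
console.log(names);       // [ 'lat', 'lng' ]
```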
