RegExp using regex + variables - javascript

I know there are many questions about variables in regex. Some of them for instance:
concat variable in regexp pattern
Variables in regexp
How to properly escape characters in regexp
Matching string using variable in regular expression with $ and ^
Unfortunately none of them explains in detail how to escape my RegExp.
Let's say I want to find all files that have this string before them:
file:///storage/sdcard0/
I tried this with regex:
(?:file:\/\/\/storage\/sdcard0\/(.*))(?:\"|\')
which correctly got my image1.jpg and image2.jpg in certain json file. (tried with http://regex101.com/#javascript)
For the life of me I can't get this to work inside JS. I know you should use RegExp to solve this, but I'm having issues.
var findStr = "file:///storage/sdcard0/";
var regex = "(?:"+ findStr +"(.*))(?:\"|\')";
var re = new RegExp(regex,"g");
var result = <mySearchStringVariable>.match(re);
With this I get 1 result and it's wrong (a bunch of text). I reckon I should escape this, as is said all over the web. I tried to escape findStr with both functions below and the result was the same. So I thought: OK, I need to escape some chars inside the regex as well.
I tried to manually escape them and the result was no matches.
I tried to escape the whole regex variable before passing it to RegExp constructor and the result was the same: no matches.
function quote(regex) {
return regex.replace(/([()[{*+.$^\\|?])/g, '\\$1');
}
function escapeRegExp(str) {
return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
What the hell am I doing wrong, please?
Is there any good documentation on how to write RegExp with variables in it?

All I needed to do was use a lazy quantifier instead of a greedy one:
var regex = "(?:"+ findStr +"(.*?))(?:\"|\')"; // added ? in (.*?)
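A minimal sketch that puts the pieces together, assuming the paths sit inside quoted JSON values. The escapeRegExp helper is the one from the question; the sample string is made up for illustration.

```javascript
// Escape regex metacharacters so findStr is matched literally.
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}

var findStr = "file:///storage/sdcard0/";
// Lazy (.*?) stops at the first closing quote instead of the last one.
var re = new RegExp(escapeRegExp(findStr) + "(.*?)[\"']", "g");

// Hypothetical input, for illustration only.
var sample = '{"a":"file:///storage/sdcard0/image1.jpg","b":"file:///storage/sdcard0/image2.jpg"}';

var names = [];
var m;
while ((m = re.exec(sample)) !== null) {
  names.push(m[1]); // capture group 1 holds the file name
}
console.log(names);
```

Only findStr is escaped. The rest of the pattern is regex syntax and has to stay unescaped, which is why escaping the whole regex variable produced no matches.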

Related

JavaScript regex to get string between two characters except escaped without lookbehind

I am looking for a specific javascript regex without the new lookahead/lookbehind features of Javascript 2018 that allows me to select text between two asterisk signs but ignores escaped characters.
In the following example, only the text "test" and the included escaped characters are supposed to be selected according to the rules above:
\*jdjdjdfdf*test*dfsdf\*adfasdasdasd*test**test\**sd* (Selected: "test", "test", "test\*")
During my research I found this solution in "Regex, everything between two characters except escaped characters": /(?<!\\)(%.*?(?<!\\)%)/. It uses negative lookbehinds, which are supported in JavaScript 2018, but I need to support IE11 as well, so this solution doesn't work for me.
Then I found another approach that almost gets me there: "Javascript: negative lookbehind equivalent?". I altered the answer of Kamil Szot to fit my needs: ((?!([\\])).|^)(\*.*?((?!([\\])).|^)\*) Unfortunately it doesn't work when two asterisks ** are in a row.
I have already invested a lot of hours and can't seem to get it right; any help is appreciated!
An example of what I have so far is here: https://www.regexpal.com/?fam=117350
I need to use the regexp in a string.replace call (str.replace(regexp|substr, newSubStr|function);) so that I can wrap the found strings with a span element of a specific class.
You can use this regular expression:
(?:\\.|[^*])*\*((?:\\.|[^*])*)\*
Your code should then only take the (only) capture group of each match.
Like this:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /(?:\\.|[^*])*\*((?:\\.|[^*])*)\*/g
var match;
while (match = regex.exec(str)) {
console.log(match[1]);
}
If you need to replace the matches, for instance to wrap the matches in a span tag while also dropping the asterisks, then use two capture groups:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g
var result = str.replace(regex, "$1<span>$2</span>");
console.log(result);
One thing to be careful with: when you write test strings as JavaScript string literals, escape each backslash with another backslash. If you don't, the in-memory string will not contain a backslash at all.
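A quick way to see the difference is to compare string lengths. The variable names below are only for illustration:

```javascript
// In a string literal, "\*" is just "*": the backslash is dropped.
var unescaped = "\*test\*";
// Doubling the backslash keeps one real backslash in the string.
var escaped = "\\*test\\*";

console.log(unescaped.length); // 6
console.log(escaped.length);   // 8
console.log(escaped.charAt(0) === "\\"); // true
```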
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr.match(/\*(\\.)*t(\\.)*e(\\.)*s(\\.)*t(\\.)*\*/g).map(m => m.substr(1, m.length-2));
console.log(m);
More generic code:
const prepareRegExp = (word, delimiter = '\\*') => {
const escaped = '(\\\\.)*';
return new RegExp([
delimiter,
escaped,
[...word].join(escaped),
escaped,
delimiter
].join``, 'g');
};
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr
.match(prepareRegExp('test'))
.map(m => m.substr(1, m.length-2));
console.log(m);
https://instacode.dev/#Y29uc3QgcHJlcGFyZVJlZ0V4cCA9ICh3b3JkLCBkZWxpbWl0ZXIgPSAnXFwqJykgPT4gewogIGNvbnN0IGVzY2FwZWQgPSAnKFxcXFwuKSonOwogIHJldHVybiBuZXcgUmVnRXhwKFsKICAgIGRlbGltaXRlciwKICAgIGVzY2FwZWQsCiAgICBbLi4ud29yZF0uam9pbihlc2NhcGVkKSwKICAgIGVzY2FwZWQsCiAgICBkZWxpbWl0ZXIKICBdLmpvaW5gYCwgJ2cnKTsKfTsKCmNvbnN0IHRlc3RTdHIgPSBgXFwqamRqZGpkZmRmKnRlc3QqZGZzZGZcXCphZGZhc2Rhc2Rhc2QqdGVzdCoqdGVzdFxcKipzZCpgOwpjb25zdCBtID0gdGVzdFN0cgoJLm1hdGNoKHByZXBhcmVSZWdFeHAoJ3Rlc3QnKSkKCS5tYXAobSA9PiBtLnN1YnN0cigxLCBtLmxlbmd0aC0yKSk7Cgpjb25zb2xlLmxvZyhtKTs=

Rewriting javascript regex in php when regex have escapes

I'm trying to write my regex as a string. It's part of my S-Expression tokenizer, which first splits on strings, regular expressions and Lisp comments and then tokenizes the stuff in between. It works in https://regex101.com/r/nH4kN6/1/ but I have a problem writing it as a string for PHP.
My JavaScript regex looks like this:
var pre_parse_re = /("(?:\\[\S\s]|[^"])*"|\/(?! )[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|\(|\)|$)|;.*)/g;
I've tried to write this regex in PHP (the one from Regex101 was inside single quotes):
$pre_parse_re = "%(\"(?:\\[\\S\\s]|[^\"])*\"|/(?! )[^/\\]*(?:\\[\\S\\s][^/\\]*)*/[gimy]*(?=\\s|\\(|\\)|$)|;.*)%";
My input
'(";()" /;;;/g baz); (baz quux)'
when called:
$parts = preg_split($pre_parse_re, $str, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
it should create the same array as in Regex101 (3 matches and the stuff between them), but it keeps splitting on the first semicolon inside the regex /;;;/g
I think your escaping might be incorrect. Try this regex instead:
$pre_parse_re = "%(\"(?:\\\\[\\S\\s]|[^\"])*\"|\/(?! )[^\/\\\\]*(?:\\\\[\S\s][^\/\\\\]*)*\/[gimy]*(?=\s|\(|\)|$)|;.*)%";
Using preg_split might also return more than the capturing groups that you want, so you could also change to use this if you just want the 3 matches.
$parts = [];
preg_match_all($pre_parse_re, $str, $parts, PREG_SET_ORDER, 0);

Using search and replace with regex in javascript

I have a regular expression that I have been using in notepad++ for search&replace to manipulate some text, and I want to incorporate it into my javascript code. This is the regular expression:
Search
(?-s)(.{150,250}\.(\[\d+\])*)\h+ and replace with \1\r\n\x20\x20\x20
In essence, this creates a new paragraph after every 150-250 characters and indents it.
This is what I have tried in JavaScript, for a text area <textarea name="textarea1" id="textarea1"></textarea> in the HTML. I have the following JavaScript:
function rep1() {
var re1 = new RegExp('(?-s)(.{150,250}\.(\[\d+\])*)\h+');
var re2 = new RegExp('\1\r\n\x20\x20\x20');
var s = document.getElementById("textarea1").value;
s = string.replace(re1, re2);
document.getElementById("textarea1").value = s;
}
I have also tried placing the regular expressions directly as arguments for string.replace() but that doesn't work either. Any ideas what I'm doing wrong?
Several issues:
JavaScript does not support (?-s). You would need to add modifiers separately. However, this is the default setting in JavaScript, so you can just leave it out. If it was your intention to let . also match line breaks, then use [^] instead of . in JavaScript regexes.
JavaScript does not support \h -- the horizontal white space. Instead you could use [^\S\r\n].
When passing a string literal to new RegExp, be aware that backslashes are escape characters for the string literal notation, so they will not end up in the regex. So either double them, or else use JavaScript's regex literal notation.
In JavaScript replace will only replace the first occurrence unless you provided the g modifier to the regular expression.
The replacement (second argument to replace) should not be a regex. It should be a string, so don't apply new RegExp to it.
The backreferences in the replacement string should be of the $1 format. JavaScript does not support \1 there.
You reference string where you really want to reference s.
This should work:
function rep1() {
var re1 = /(.{150,250}\.(\[\d+\])*)[^\S\r\n]+/g;
var re2 = '$1\r\n\x20\x20\x20';
var s = document.getElementById("textarea1").value;
s = s.replace(re1, re2);
document.getElementById("textarea1").value = s;
}
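To try the pattern outside the browser, the same replacement can be wrapped in a plain function. This is only a sketch: reflow and its min/max parameters are hypothetical names, added so that the effect shows on a short sample, and are not part of the answer above. It also shows the doubled backslashes needed when the pattern is built from a string.

```javascript
// Break text after a sentence-ending period once min..max characters
// have been consumed, then indent the next paragraph by three spaces.
function reflow(text, min, max) {
  var re = new RegExp(
    "(.{" + min + "," + max + "}\\.(\\[\\d+\\])*)[^\\S\\r\\n]+",
    "g"
  );
  return text.replace(re, "$1\r\n\x20\x20\x20");
}

// Small limits so the effect is visible on a short sample.
var out = reflow("First sentence here.[1] Second sentence follows. Third one.", 10, 30);
console.log(out);
```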

javascript, declare associative array of regex expressions

How to declare associative array of regex?
This is not working
var Validators = {
url : /^http(s?)://((\w+\.)?\w+\.\w+|((2[0-5]{2}|1[0-9]{2}|[0-9]{1,2})\.){3}(2[0-5]{2}|1[0-9]{2}|[0-9]{1,2}))(/)?$/gm
};
EDITED: Now working!
This will be valid in JS (a template literal, similar to the @ verbatim-string prefix in C#)
url : `/^http(s?)://((\w+\.)?\w+\.\w+|((2[0-5]{2}|1[0-9]{2}|[0-9]{1,2})\.){3}(2[0-5]{2}|1[0-9]{2}|[0-9]{1,2}))(/)?$/gm`
However, it will still not work because of double escaping: once for the JS string and once for the regex. If the expression is small, perhaps one can manually escape it for both JS and regex by eye. My brain just can't :)
In order to use strings exactly as tested on regex101.com, for example, all required strings should be declared as 'raw' like this:
var exp = String.raw`^(http(s?):\/\/)?(((www\.)?[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$`;
var strings = [
String.raw`http://www.goo gle.com`,
String.raw`http://www.google.com`,
];
Wrap it with new RegExp() and escape slashes
var Validators = {
url : new RegExp( /^http(s?):\/\/((\w+\.)?\w+\.\w+|((2[0-5]{2}|1[0-9]{2}|[0-9]{1,2})\.){3}(2[0-5]{2}|1[0-9]{2}|[0-9]{1,2}))(\/)?$/gm )
};
Your regex has forward slashes in it. This symbol needs to be escaped because it is supposed to indicate the start and end of the expression. Try \/.
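If the pattern has to live in a string (for example because it was copied from a regex tester), String.raw avoids the double escaping before the string is handed to new RegExp, and forward slashes need no escaping there. A minimal sketch with a deliberately simplified URL pattern, which is an assumption for illustration and not a complete validator:

```javascript
// String.raw keeps backslashes as typed, so \w and \. reach the regex engine.
var urlSource = String.raw`^https?://(\w+\.)+\w+/?$`;

var Validators = {
  // No g flag: a global regex keeps lastIndex between test() calls.
  url: new RegExp(urlSource, "i")
};

console.log(Validators.url.test("http://www.google.com"));  // true
console.log(Validators.url.test("http://www.goo gle.com")); // false
```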

Find file sequence with RegExp in Javascript

I have a simple question:
How do I use RegExp in Javascript to find strings that matches this filter:
*[0-9].png in order to filter out file sequences.
For example:
bird001.png
bird002.png
bird003.png
or
abc_1.png
abc_2.png
Should ignore strings like abc_1b.png and abc_abc.png
I'm going to use it in a getFiles function.
var regExp = new RegExp(???);
var files = dir.getFiles(regExp);
Thanks in advance!
EDIT:
If I have a defined string, let's say
var beginningStr = "bird";
How can I check if a string matches the filter
beginningStr[0-9].png
? And ideally beginningString without case sensitivity. So that the filter would allow Bird01 and bird02.
Thanks again!
Anything followed by [0-9] and ending with .png:
/^.*[0-9]\.png$/i
Or simply without the beginning (the regex will find it itself):
/[0-9]\.png$/i
If I understood correctly, you need a regex that matches files with names which:
Begin with letters a-z, A-Z
Optionally followed with single _
Followed by one or more digits
Ending with .png
Regex for this is ^[a-zA-Z]+_?\d+\.png$
You could try online regex builders which offer immediate explanation of what you write.
If I understood correctly,
var re = /\s[a-zA-Z]*[0-9]+\.png/g;
var filesArr = str.match(re);
filesArr.sort();// you can use own sort function
Please specify what the dir variable is.
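For the follow-up about a fixed prefix, one possible approach is to build the pattern from the variable with new RegExp and the i flag. This is a sketch: sequenceFilter and the sample file names are made up here, and whether dir.getFiles accepts a RegExp depends on the environment, so a plain array is filtered instead.

```javascript
// Escape regex metacharacters in the prefix so it is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build /^<prefix>\d+\.png$/i from a variable prefix.
function sequenceFilter(beginningStr) {
  return new RegExp("^" + escapeRegExp(beginningStr) + "\\d+\\.png$", "i");
}

// Hypothetical file names, for illustration only.
var names = ["bird001.png", "Bird02.png", "bird_1b.png", "birdabc.png", "cat01.png"];
var re = sequenceFilter("bird");
var matches = names.filter(function (name) {
  return re.test(name);
});
console.log(matches);
```

The regex is created without the g flag on purpose, so repeated test() calls do not depend on lastIndex.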
