Rewriting a JavaScript regex in PHP when the regex has escapes

I'm trying to write my regex as a string. It's part of my S-expression tokenizer, which first splits on strings, regular expressions and Lisp comments and then tokenizes the stuff in between. It works in https://regex101.com/r/nH4kN6/1/ but I have trouble writing it as a string for PHP.
My JavaScript regex looks like this:
var pre_parse_re = /("(?:\\[\S\s]|[^"])*"|\/(?! )[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|\(|\)|$)|;.*)/g;
I've tried to write this regex in PHP (the one from Regex101 was inside single quotes):
$pre_parse_re = "%(\"(?:\\[\\S\\s]|[^\"])*\"|/(?! )[^/\\]*(?:\\[\\S\\s][^/\\]*)*/[gimy]*(?=\\s|\\(|\\)|$)|;.*)%";
My input:
'(";()" /;;;/g baz); (baz quux)'
when called like this:
$parts = preg_split($pre_parse_re, $str, -1,
                    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
It should create the same array as in Regex101 (3 matches and the stuff between them), but it keeps splitting on the first semicolon inside the regex /;;;/g.
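Because the pattern came from JavaScript, one way to pin down the array the PHP version should produce is to run the original regex through `String.prototype.split` in JavaScript. A capturing group in the separator makes `split` keep the delimiters (roughly what `PREG_SPLIT_DELIM_CAPTURE` does), and filtering out empty strings approximates `PREG_SPLIT_NO_EMPTY`. This is only a sketch for comparison; `preParse` is an illustrative name, not part of the original tokenizer.

```javascript
// The tokenizer regex from the question, unchanged.
// The g flag makes no difference to split().
const pre_parse_re = /("(?:\\[\S\s]|[^"])*"|\/(?! )[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|\(|\)|$)|;.*)/g;

// Hypothetical helper: split on strings, regex literals and ;-comments,
// keep the captured delimiters, and drop empty pieces.
function preParse(str) {
  return str.split(pre_parse_re).filter((part) => part !== "");
}

console.log(preParse('(";()" /;;;/g baz); (baz quux)'));
// [ '(', '";()"', ' ', '/;;;/g', ' baz)', '; (baz quux)' ]
```

The three captured pieces (`";()"`, `/;;;/g` and `; (baz quux)`) are the 3 matches regex101 reports; everything else is the text between them.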

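I think your escaping is incorrect. Inside a double-quoted PHP string, \\ becomes a single backslash, so your \\[ reaches PCRE as \[ (a literal bracket) and [^/\\] arrives as [^/\], where \] no longer closes the class. A regex that must match a literal backslash (\\) therefore needs \\\\ in the PHP string. Try this regex instead: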
$pre_parse_re = "%(\"(?:\\\\[\\S\\s]|[^\"])*\"|\/(?! )[^\/\\\\]*(?:\\\\[\S\s][^\/\\\\]*)*\/[gimy]*(?=\s|\(|\)|$)|;.*)%";
preg_split also returns the text between the matches, not just the captured groups. If you only want the 3 matches, you could use preg_match_all instead:
$parts = [];
preg_match_all($pre_parse_re, $str, $parts, PREG_SET_ORDER, 0);
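Escaping by hand is where this usually goes wrong. One way to avoid counting backslashes is to put the pattern in a PHP single-quoted string, where only `\\` and `\'` are special, and to double every backslash mechanically. The JavaScript sketch below generates such a literal from a `RegExp` object. `toPhpPattern` is a hypothetical helper; it assumes every `/` in the pattern is already escaped as `\/` (true for the tokenizer regex), carries over only the `i`, `m` and `s` flags (PHP's preg functions have no `g` modifier; `preg_match_all` plays that role), and does not translate syntax that differs between JavaScript and PCRE.

```javascript
// Hypothetical helper: turn a JavaScript RegExp into the text of a PHP
// single-quoted string literal usable with preg_split / preg_match_all.
function toPhpPattern(re) {
  // i, m and s mean the same in PCRE; g and y have no modifier equivalent.
  const flags = [...re.flags].filter((f) => "ims".includes(f)).join("");
  // Assumes every "/" in re.source is already escaped as "\/".
  const pattern = "/" + re.source + "/" + flags;
  // Inside '...' PHP turns \\ into \ and \' into ', so doubling every
  // backslash and escaping quotes reproduces the pattern exactly.
  const body = pattern.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  return "'" + body + "'";
}

const pre_parse_re = /("(?:\\[\S\s]|[^"])*"|\/(?! )[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|\(|\)|$)|;.*)/g;
console.log("$pre_parse_re = " + toPhpPattern(pre_parse_re) + ";");
```

Pasting the printed line into the PHP script gives the same pattern regex101 tested, without any manual escaping.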

Related

Find and replace complex term in a string using Regex

I need to wrap in double curly braces any expression in a string (it can be a multi-line string with many expressions, mixed with regular words) that starts with abc.(efg|xyz).bar.
I'm using a find-and-replace approach with the following regex:
const MY_REGEX = /"?(?<![a-zA-Z0-9-_])((?:abc\.(efg|xyz\.bar))(?:\.[a-zA-Z0-9-_]*)+)"?/gm
someInput.replace(MY_REGEX, '{{$1}}')
My strategy works fine for simple cases like this:
const input = 'abc.efg.bar.name.first, abc.xyz.role, non.captured.term'
// outputs: {{abc.efg.bar.name.first}}, {{abc.xyz.role}}, non.captured.term
But it fails miserably for complex inputs like this one:
const input = 'abc.xyz.bar.$func(foos[param[name="primary" and bool=true]].param[name="new"].multiValue)'
// outputs: {{abc.xyz.bar.}}$func(foos[param[name="primary" and bool=true]].param[name="new"].multiValue)
// Should be: {{abc.xyz.bar.$func(foos[param[name="primary" and bool=true]].param[name="new"].multiValue)}}
I'm looking for a more robust way or better regex to do it.
Any suggestions?
Based on the examples you gave, it looks like a comma (,) is the delimiter between expressions. So change (?:\.[a-zA-Z0-9-_]*)+ to [^,]* to match everything up to the next comma.
const MY_REGEX = /"?(?<![a-zA-Z0-9-_])((?:abc\.(efg|xyz\.bar))[^,]*)"?/gm;
console.log('abc.xyz.bar.$func(foos[param[name="primary" and bool=true]].param[name="new"].multiValue)'.replace(MY_REGEX, '{{$1}}'));
console.log('abc.efg.bar.name.first, abc.xyz.role, non.captured.term'.replace(MY_REGEX, '{{$1}}'));
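The question also mentions multi-line input. `[^,]*` matches newlines as well, so an expression at the end of one line would run on into the next. A variation, assuming a single expression never spans lines, is to stop at `\n` too. `ANSWER_RE` and `MULTILINE_RE` are just labels for the two versions:

```javascript
// The answer's regex: stops only at a comma.
const ANSWER_RE = /"?(?<![a-zA-Z0-9-_])((?:abc\.(efg|xyz\.bar))[^,]*)"?/gm;
// Variation: stops at a comma OR a line break.
const MULTILINE_RE = /"?(?<![a-zA-Z0-9-_])((?:abc\.(efg|xyz\.bar))[^,\n]*)"?/gm;

const input = 'abc.efg.bar.name.first\nabc.xyz.bar.$func(foos[param[name="new"]])';

// [^,]* swallows the newline and wraps both lines as one expression.
console.log(input.replace(ANSWER_RE, "{{$1}}"));
// [^,\n]* wraps each line separately.
console.log(input.replace(MULTILINE_RE, "{{$1}}"));
```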

javascript, declare associative array of regex expressions

How do I declare an associative array of regexes?
This is not working:
var Validators = {
  url : /^http(s?)://((\w+\.)?\w+\.\w+|((2[0-5]{2}|1[0-9]{2}|[0-9]{1,2})\.){3}(2[0-5]{2}|1[0-9]{2}|[0-9]{1,2}))(/)?$/gm
};
EDITED: Now working!
This will be valid in JS (like the @ verbatim-string prefix in C#):
url : `/^http(s?)://((\w+\.)?\w+\.\w+|((2[0-5]{2}|1[0-9]{2}|[0-9]{1,2})\.){3}(2[0-5]{2}|1[0-9]{2}|[0-9]{1,2}))(/)?$/gm`
However, it still won't work because of double escaping: once for the JS string and once for the regex. If the expression is small, perhaps you can escape it by eye for both JS and regex. My brain just can't :)
To use strings exactly as tested on regex101.com, declare every such string as raw, like this:
var exp = String.raw`^(http(s?):\/\/)?(((www\.)?[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$`;
var strings = [
  String.raw`http://www.goo gle.com`,
  String.raw`http://www.google.com`,
];
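The raw string still has to be compiled before use; `new RegExp(exp)` reads it exactly as typed on regex101, without a second round of escaping. A short usage sketch against the two sample strings (the `re` name is illustrative):

```javascript
// String.raw keeps every backslash, so the text matches what was tested on regex101.
const exp = String.raw`^(http(s?):\/\/)?(((www\.)?[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$`;
const strings = ["http://www.goo gle.com", "http://www.google.com"];

// No g flag, so test() keeps no lastIndex state between calls.
const re = new RegExp(exp);
for (const s of strings) {
  console.log(JSON.stringify(s), re.test(s));
}
```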
Wrap it with new RegExp() and escape slashes
var Validators = {
  url : new RegExp( /^http(s?):\/\/((\w+\.)?\w+\.\w+|((2[0-5]{2}|1[0-9]{2}|[0-9]{1,2})\.){3}(2[0-5]{2}|1[0-9]{2}|[0-9]{1,2}))(\/)?$/gm )
};
Your regex contains forward slashes. In a regex literal, / marks the start and end of the expression, so any / inside the pattern has to be escaped. Try \/.
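Putting this together, the object from the question works once the literal's slashes are escaped; wrapping the literal in `new RegExp(...)` is not required. One further pitfall for validators: with the `g` flag, `test()` remembers `lastIndex`, so calling it twice on the same string can alternate between `true` and `false`. The sketch below drops `g` (and `m`, assuming each value is a single line) and shows the stateful behaviour on a copy that keeps them; `withG` is an illustrative name.

```javascript
const Validators = {
  // Forward slashes escaped as \/ so the literal does not end early; no g flag.
  url: /^http(s?):\/\/((\w+\.)?\w+\.\w+|((2[0-5]{2}|1[0-9]{2}|[0-9]{1,2})\.){3}(2[0-5]{2}|1[0-9]{2}|[0-9]{1,2}))(\/)?$/,
};

console.log(Validators.url.test("http://www.google.com")); // true
console.log(Validators.url.test("http://www.google.com")); // true again

// With the original gm flags, test() resumes from lastIndex after a match.
const withG = new RegExp(Validators.url.source, "gm");
console.log(withG.test("http://www.google.com")); // true
console.log(withG.test("http://www.google.com")); // false: search resumed at lastIndex
```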

RegExp using regex + variables

I know there are many questions about variables in regex. Some of them for instance:
concat variable in regexp pattern
Variables in regexp
How to properly escape characters in regexp
Matching string using variable in regular expression with $ and ^
Unfortunately, none of them explains in detail how to escape my RegExp.
Let's say I want to find all files that have this string before them:
file:///storage/sdcard0/
I tried this with regex:
(?:file:\/\/\/storage\/sdcard0\/(.*))(?:\"|\')
which correctly found my image1.jpg and image2.jpg in a certain JSON file (tested with http://regex101.com/#javascript).
For the life of me I can't get this to work inside JS. I know you should use RegExp to solve this, but I'm having issues.
var findStr = "file:///storage/sdcard0/";
var regex = "(?:"+ findStr +"(.*))(?:\"|\')";
var re = new RegExp(regex,"g");
var result = <mySearchStringVariable>.match(re);
With this I get 1 result, and it's wrong (a bunch of text). I reckon I should escape this, as is said all over the web. I tried to escape findStr with both functions below and the result was the same, so I thought I also needed to escape some characters inside the regex.
I tried to manually escape them and the result was no matches.
I tried to escape the whole regex variable before passing it to RegExp constructor and the result was the same: no matches.
function quote(regex) {
  return regex.replace(/([()[{*+.$^\\|?])/g, '\\$1');
}
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
What the hell am I doing wrong, please?
Is there any good documentation on how to write RegExp with variables in it?
All I needed to do was use a lazy quantifier instead of a greedy one:
var regex = "(?:"+ findStr +"(.*?))(?:\"|\')"; // added ? in (.*?)
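Two details explain the behaviour. The greedy (.*) runs to the last quote in the text, so several paths in one JSON line collapse into a single match; the lazy (.*?) stops at the first quote. Also, `String.prototype.match` with the `g` flag returns whole matches, not the capture group, so the `file:///...` prefix and the closing quote are included. A sketch using `matchAll` (ES2020) to collect only the captured file names; `escapeRegExp` is the function from the question, and the JSON string is made-up sample data:

```javascript
// Escape function from the question: backslash-escapes regex metacharacters.
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}

const findStr = "file:///storage/sdcard0/";
// Lazy (.*?) stops at the first closing quote instead of the last one.
const re = new RegExp("(?:" + escapeRegExp(findStr) + "(.*?))(?:\"|')", "g");

// Hypothetical sample input standing in for the JSON file.
const json = '{"a":"file:///storage/sdcard0/image1.jpg","b":"file:///storage/sdcard0/image2.jpg"}';

// matchAll yields one match object per hit; m[1] is the captured file name.
const files = Array.from(json.matchAll(re), (m) => m[1]);
console.log(files); // [ 'image1.jpg', 'image2.jpg' ]
```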

Add regex to ignore /js /img and /css

I have this regular expression
// Look for /en/ or /en-US/ or /en_US/ on the URL
var matches = req.url.match( /^\/([a-zA-Z]{2,3}([-_][a-zA-Z]{2})?)(\/|$)/ );
The regular expression above causes problems with URLs such as:
http://mydomain.com/css/bootstrap.css
or
http://mydomain.com/js/jquery.js
because my regular expression strips off any first path segment of 2-3 letters (A-Z or a-z).
My question is how I would change this regular expression so that it does not strip off any of
js, img, css or ext
without affecting the original behaviour.
I'm not much of an expert on regular expressions :(
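Negative lookahead?
var matches = req.url.match(/^\/(?!(?:js|img|css|ext)(?:\/|$))([a-zA-Z]{2,3}([-_][a-zA-Z]{2})?)(\/|$)/);
The (?!...) means: a / that is not followed by js, img, css or ext as a whole segment. The lookahead group is non-capturing, so matches[1] is still the language code.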
First of all, you have not defined exactly what you are searching for.
Define an array of lowercased common language codes (Common language codes).
That way you'll know what to look for.
After that, convert your URL to lowercase, replace all '_' with '-', and search the resulting string for every member of the array using indexOf().
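A minimal sketch of that whitelist idea, assuming language codes only ever appear as whole path segments. `LANGUAGE_CODES` is a hypothetical, deliberately short list and `findLanguage` is an illustrative name. Instead of a bare `indexOf(code)`, which would also find `de` inside `/index.html`, it searches for `/code/`:

```javascript
// Hypothetical sample list; a real app would use the codes it supports.
const LANGUAGE_CODES = ["en", "en-us", "fr", "de", "pt-br"];

function findLanguage(url) {
  // Normalise as suggested: lowercase, and treat "_" like "-".
  // The trailing "/" lets a code in the last path segment match too.
  const path = url.toLowerCase().replace(/_/g, "-") + "/";
  // Check longer codes first so "en-us" wins over "en".
  const codes = [...LANGUAGE_CODES].sort((a, b) => b.length - a.length);
  for (const code of codes) {
    // Require a whole path segment, so "/index.html" does not match "de".
    if (path.indexOf("/" + code + "/") !== -1) {
      return code;
    }
  }
  return null; // no known language code: /js, /css, /img etc. are left alone
}
```

Because only listed codes are recognised, js, img, css and ext never need special-casing.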
Since you said you're using the regex to replace text, I changed it to a replace function. Also, you forced the regex to match the start of the string; I don't see how it would match anything with that. Anyway, here's my approach:
var result = req.url.replace(/\/([a-z]{2,3}([-_][a-z]{2})?)(?=\/|$)/i,
  function(s, t) {
    switch (t) {
      case "js": case "img": case "css": case "ext": return s;
    }
    return "";
  }
);
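To try the replace-based approach outside a request handler, the callback can be wrapped in a plain function; `stripLocale` is a hypothetical name and a string stands in for `req.url`. Because of the `i` flag, `t` can arrive as `JS` or `CSS`, so this sketch lowercases it before the `switch` (a small change from the answer).

```javascript
function stripLocale(url) {
  return url.replace(/\/([a-z]{2,3}([-_][a-z]{2})?)(?=\/|$)/i, function (s, t) {
    // s is the whole match (e.g. "/en-US"), t is the captured code.
    // Asset folders are returned unchanged; anything else is removed.
    switch (t.toLowerCase()) {
      case "js":
      case "img":
      case "css":
      case "ext":
        return s;
    }
    return "";
  });
}

console.log(stripLocale("/en-US/about"));       // "/about"
console.log(stripLocale("/css/bootstrap.css")); // unchanged
```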

Javascript string validation using the regex object

I am a complete novice at regex and JavaScript. I have the following problem: I need to check a text field for one (1) or many (n) consecutive * (asterisk) characters, e.g. * or ** or *** or any number (n) of *. Allowed strings are e.g. *tomato, tomato*, **tomato, tomato**, or any number (n) of asterisks before or after the word. So far I have tried the following:
var str = 'a string';
var value = encodeURIComponent(str);
var reg = /([^\s]\*)|(\*[^\s])/;
if (reg.test(value) == true) {
  alert('Watch out your asterisks!!!');
}
From your question it's hard to tell exactly what you're after, but let me try:
Only allow asterisks at beginning or at end
If you only allow an arbitrary number (at least one) of asterisks either at the beginning or at the end (but not on both sides) like:
*****tomato
tomato******
but not **tomato*****
Then use this regular expression:
reg = /^(?:\*+[^*]+|[^*]+\*+)$/;
Match front and back number of asterisks
If you require the number of asterisks at the beginning to match the number of asterisks at the end, like
*****tomato*****
*tomato*
but not **tomato*****
then use this regular expression:
reg = /^(\*+)[^*]+\1$/;
Results?
It's unclear from your question what should happen when each of these regular expressions matches. Whether strings that pass the above regular expressions are acceptable or not depends on your requirements. As long as the regular expressions are correct, you can build the functionality you need on top of them.
I've also written these regular expressions so that the part between the asterisks simply excludes asterisks. If you also need to reject spaces or anything else, simply adjust the [^...] parts of the above expressions.
Note: both regular expressions are untested but should get you started to build the one you actually need and require in your code.
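Since both expressions are marked untested, here is a quick check of each against the examples above (plain `test()` calls, no flags; `oneSide` and `balanced` are just labels):

```javascript
// Asterisks on exactly one side, plus at least one other character.
const oneSide = /^(?:\*+[^*]+|[^*]+\*+)$/;
// Same number of asterisks on both sides (\1 repeats the first group).
const balanced = /^(\*+)[^*]+\1$/;

for (const s of ["*****tomato", "tomato******", "**tomato*****", "*****tomato*****", "*tomato*"]) {
  console.log(s, oneSide.test(s), balanced.test(s));
}
```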
If I understand correctly you're looking for a pattern like this:
var pattern = /\**[^\s*]+\**/;
This won't match strings like ***** or ** ***, but it will match ***d*** *d and all of the examples you say are valid (***tomatos etc.). If I misunderstood, let me know and I'll see what I can do to help. PS: we all started out as newbies at some point, nothing to be ashamed of, let alone apologize for :)
After the edit to your question, I gather that an asterisk is required either at the beginning or at the end of the input, but the string must also contain at least one other character, so I propose the following solution:
var pattern = /^\*+[^\s*]+|[^\s*]+\*+$/;
pattern.test('****'); // false
pattern.test(' ***tomato**'); // true
If, however, *tomato* is not allowed, you'll have to change the regex to:
var pattern = /^\*+[^\s*]+$|^[^\s*]+\*+$/;
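The three patterns in this answer differ mainly in anchoring. A side-by-side check; the names `loose`, `eitherEnd` and `exactlyOneEnd` are just labels for the patterns above:

```javascript
const loose = /\**[^\s*]+\**/;                     // unanchored: one non-space, non-* char is enough
const eitherEnd = /^\*+[^\s*]+|[^\s*]+\*+$/;       // leading stars OR trailing stars
const exactlyOneEnd = /^\*+[^\s*]+$|^[^\s*]+\*+$/; // whole string, stars on one side only

for (const s of ["****", "tomato", "*tomato", "tomato*", "*tomato*", " ***tomato**"]) {
  console.log(JSON.stringify(s), loose.test(s), eitherEnd.test(s), exactlyOneEnd.test(s));
}
```

Note that the first pattern also accepts a plain "tomato" with no asterisks, and only the last one rejects "*tomato*".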
Here's a handy site to help you find your way in the magical world of regular expressions.
