What does the regular expression, /(?!^)/g mean? - javascript

What does the regular expression, /(?!^)/g mean?
I can see that from here x(?!y): Matches x only if x is not followed by y.
That explains, if ?! is used before any set of characters, it checks for the not followed by condition.
But we have, ?!^. So, if we try to say it in words, it would probably mean not followed by negate. That really does not make me guess a probable statement for it. I mean negate of what?
I'm still guessing and could not reach a fruitful conclusion.
Help is much appreciated.
Thanks!

Circumflex ^ only implies negation in a character class [^...] (when comes as first character in class). Outside of it it means start of input string or line. A negative lookahead that contains ^ only, means match shouldn't be found at start of input string or line.
See it in action

It returns
all matches... (/g, global flag)
...which are not followed by... (x?!y, negative lookahead where x is the empty string)
...the position at the start of the string (^)
That is, any position between two characters except for the position at the start of the string, as you can see here.
This regex is useful, thus, for detecting empty strings (after all, applying the regex to an empty string will return no matches). Empty strings may be the result of splitting strings, and you probably don't want to do anything with them.

Related

Ambiguity in regex in javascript

var a = 'a\na'
console.log(a.match(/.*/g)) // ['a', '', 'a', '']
Why are there two empty strings in the result?
Let's say if there are empty strings, why isn't there one at beginning and at the end of each line as well, hence 4 empty strings?
I am not looking for how to select 'a's but just want to understand the presence of the empty strings.
The best explanation I can offer for the following:
'ab\na'.match(/.*/g)
["ab", "", "a", ""]
Is that JavaScript's match function uses dot not in DOT ALL mode, meaning that dot does not match across newlines. When the .* pattern is applied to ab\na, it first matches ab, then stops at the newline. The newline generates an empty match. Then, a is matched, and then for some reason the end of the string matches another empty match.
If you just want to extract the non whitespace content from each line, then you may try the following:
print('ab\na'.match(/.+/g))
ab,a
Let's say if there are empty strings, why isn't there one at beginning
and at the end...
.* applies greediness. It swallows a complete line asap. By a line I mean everything before a line break. When it encounters end of a line, it matches again due to star quantifier.
If you want 4 you may add ? to star quantifier and make it lazy .*? but yet this regex has different result in different flavors because of the way they handle zero-length matches.
You can try .*? with both PCRE and JS engines in regex101 and see the differences.
Question:
You may ask why does engine try to find a match at the end of line while whole thing is already matched?
Answer:
It's for the reason that we have a definition for end of lines and end of strings. So not whole thing is matched. There is a left position that has a chance to be matched and we have it with star quantifier.
This left position is end of line here which is a true match for $ when m flag is on. A . doesn't match this position but a .* or .*? match because they would be a pattern for zero-length positions too as any X-STAR patterns like \d*, \D*, a* or b?
Star operator * means there can be any number of ocurrences (even 0 ocurrences). With the expression used, an empty string can be a match. Not sure what are you looking for, but maybe a + operator (1 or more ocurrences) will be better?
Want to add some more info, regex use a greedy algorithm by default (in some languages you can override this behaviour), so it will pick as much of the text as it can. In this case, it will pick the a, because it can be processed with the regex, so the "\na" is still there. "\n" does not match the ".", so the only available option is the empty string. Then, we will process the next line, and again, we can match a "a". After this, only the empty string matches the regex.
* Matches the preceding expression 0 or more times.
. matches any single character except the newline character.
That is what official doc says about . and *. So i guess the array you received is something like this:
[ the first "any character" of the first line, following "nothing", the first "any character" of the second line, following "nothing"]
And the new-line character is just ignored

Regex finding the last string that doesnt contain a number

Usually in my system i have the following string:
http://localhost/api/module
to find out the last part of the string (which is my route) ive been using the following:
/[^\/]+$/g
However there may be cases where my string looks abit different such as:
http://localhost/api/module/123
Using the above regex it would then return 123. When my String looks like this i know that the last part will always be a number. So my question is how do i make sure that i can always find the last string that does not contain a number?
This is what i came up with which really stricty matches only module for the following lines:
http://localhost/api/module
http://localhost/api/module/123
http://localhost/api/module/123a
http://localhost/api/module/a123
http://localhost/api/module/a123a
http://localhost/api/module/1a3
(?!\w*\d\w*)[^\/][a-zA-Z]+(?=\/\w*\d+\w*|$)
Explanation
I basically just extended your expression with negative lookahead and lookbehind which basically matches your expression given both of the following conditions is true:
(?!\w*\d\w*) May contain letters, but no digits
[a-zA-Z]+ Really, truly only consists of one or more letters (was needed)
(?=\/\d+|$)The match is either followed by a slash, followed by digits or the end of the line
See this in action in my sample at Regex101.
partYouWant = urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
Here it is in action:
urlString="http://localhost/api/module/123"
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
-->"module"
urlString="http://localhost/api/module"
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
-->"module"
It just uses a capture expression to find the last non-numeric part.
It's going to do this too, not sure if this is what you want:
urlString="http://localhost/api/module/123/456"
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
-->"module"
/([0-9])\w+/g
That would select the numbers. You could use it remove that part from the url. What language are you using it for ?

How to make this javascript regex greedy?

I am having trouble making zero or one '?' give preference to one occurrence over zero, in javascript.
Ex. This is my regex: (=1)?
This is my string, str: abcd=1
when i do regex.exec(str) I get the following returned: ["",undefined]. It looks like it's choosing the zero length match in the beginning of my string. Is there a way to get it to choose '=1'? Possibly this may work differently in other languages but I'm currently using javascript, and this seems to be the case.
The expression (=1)? will give precedence to one occurrence over zero, but the regular expression engine will always attempt to match as early in the string as possible. So starting at the first character in the string, first it will try to match =1 and fail, and then because of the ? it will match the empty string.
I think the following expression is most similar to what your intention is:
(?:.*(=1))?
This will put =1 into the first capture group if it is anywhere in the string, but every string will still be matched because of the ? making the non-capturing group optional.
By defaut ? is greedy in javascript. Your problem is somewhere else.
aside note: to have ? lazy you must write ?? (like other quantifiers)
(=1)* will match =1=1=1=1=1=1=1=1
If you're looking to match all digits after the = then use
abcde=(\d+)
This will place all the digits into capture group 1.

JavaScript and regular expressions: get the number of parenthesized subpattern

I have to get the number of parenthesized substring matches in a regular expression:
var reg=/([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g,
nbr=0;
//Some code
alert(nbr); //2
In the above example, the total is 2: only the first and the last couple of parentheses will create grouping matches.
How to know this number for any regular expressions?
My first idea was to check the value of RegExp.$1 to RegExp.$9, but even if there are no corresponding parenthseses, these values are not null, but empty string...
I've also seen the RegExp.lastMatch property, but this one represents only the value of the last matched characters, not the corresponding number.
So, I've tried to build another regular expression to scan any RegExp and count this number, but it's quite difficult...
Do you have a better solution to do that?
Thanks in advance!
Javascripts RegExp.match() method returns an Array of matches. You might just want to check the length of that result array.
var mystr = "Hello 42 world. This 11 is a string 105 with some 2 numbers 55";
var res = mystr.match(/\d+/g);
console.log( res.length );
Well, judging from the code snippet we can assume that the input pattern is always a valid regular expression, because otherwise it would fail before the some code partm right? That makes the task much easier!
Because We just need to count how many starting capturing parentheses there are!
var reg = /([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g;
var nbr = (' '+reg.source).match(/[^\\](\\\\)*(?=\([^?])/g);
nbr = nbr ? nbr.length : 0;
alert(nbr); // 2
And here is a breakdown:
[^\\] Make sure we don't start the match with an escaping slash.
(\\\\)* And we can have any number of escaped slash before the starting parenthes.
(?= Look ahead. More on this later.
\( The starting parenthes we are looking for.
[^?] Make sure it is not followed by a question mark - which means it is capturing.
) End of look ahead
Why match with look ahead? To check that the parenthes is not an escaped entity, we need to capture what goes before it. No big deal here. We know JS doens't have look behind.
Problem is, if there are two starting parentheses sticking together, then once we capture the first parenthes the second parenthes would have nothing to back it up - its back has already been captured!
So to make sure a parenthes can be the starting base of the next one, we need to exclude it from the match.
And the space added to the source? It is there to be the back of the first character, in case it is a starting parenthes.

Removing Numbers from a String using Javascript

How do I remove numbers from a string using Javascript?
I am not very good with regex at all but I think I can use with replace to achieve the above?
It would actually be great if there was something JQuery offered already to do this?
//Something Like this??
var string = 'All23';
string.replace('REGEX', '');
I appreciate any help on this.
\d matches any number, so you want to replace them with an empty string:
string.replace(/\d+/g, '')
I've used the + modifier here so that it will match all adjacent numbers in one go, and hence require less replacing. The g at the end is a flag which means "global" and it means that it will replace ALL matches it finds, not just the first one.
Just paste this into your address bar to try it out:
javascript:alert('abc123def456ghi'.replace(/\d+/g,''))
\d indicates a character in the range 0-9, and the + indicates one or more; so \d+ matches one or more digits. The g is necessary to indicate global matching, as opposed to quitting after the first match (the default behavior).

Categories

Resources