What is the difference between these two syntaxes in my code? (JavaScript)

What is the difference between the following syntaxes in a regular expression?
Please give an example.
(?=.*\d)
and
.*(?=\d)

The first one is just an assertion, a positive lookahead saying "there must be zero or more characters followed by a digit." If you match it against a string containing at least one digit, it will tell you that the assertion is true, but the matched text will just be an empty string, because a lookahead consumes no characters.
The second one searches for a match, with an assertion (a positive lookahead) after the match saying "there must be a digit." The matched text will be the characters before the last digit in the string, including any earlier digits: .* is greedy, so it consumes as much as it can and then backtracks only until a digit follows, which leaves it just before the last digit.
Note the difference in the match object results:
var str = "foo42";
test("rex1", /(?=.*\d)/, str);
test("rex2", /.*(?=\d)/, str);
// Function declarations are hoisted, so calling test() above is fine.
function test(label, rex, str) {
    console.log(label, "test result:", rex.test(str));
    console.log(label, "match object:", rex.exec(str));
}
Output (match arrays shown without their index and input properties):
rex1 test result: true
rex1 match object: [
""
]
rex2 test result: true
rex2 match object: [
"foo4"
]
Notice how the match result in the second case was foo4 (from the string foo42), but an empty string in the first case.
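Because (?=.*\d) consumes nothing, it is usually placed in front of a pattern that does the actual matching, where it acts as a precondition. Here is a minimal sketch that assumes a hypothetical rule, made up for illustration: the whole string must consist of letters and digits and must contain at least one digit.

```javascript
// Hypothetical rule, for illustration only: letters and digits only,
// with at least one digit somewhere.
// (?=.*\d) checks "contains a digit" from position 0 without consuming anything;
// [a-z\d]+$ then matches the whole string.
const alnumWithDigit = /^(?=.*\d)[a-z\d]+$/i;

console.log(alnumWithDigit.test("foo42"));    // true
console.log(alnumWithDigit.test("foo"));      // false: no digit anywhere
console.log(alnumWithDigit.exec("foo42")[0]); // "foo42": the lookahead adds nothing to the match
```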

(?=...) is a positive lookahead. Both of these expressions will match "any text followed by a digit". The difference, though, is that (?=...) doesn't "eat" (consume) any characters as it matches. For practical purposes, if this is the only thing your regex contains, both succeed on the same strings (any string containing a digit), although the matched text differs. However, .*(?=\d) would be a more correct expression, unless there's more to it than what you put in the question.
Where it really matters is when you're using capturing groups or where you're using the content of the matched text after running the regular expression:
If you want to capture all text before the digit, but not the digit itself, and use it afterwards, you could do this:
(.*?(?=\d))
The ? makes the .* non-greedy, so it will only match up to the first digit. All text leading up to that digit will be in the match result as the first group.
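This sketch compares the non-greedy capturing version with the greedy .*(?=\d) on the string foo42 from the first answer. The digit-free input foo is made up for illustration.

```javascript
// Non-greedy .*? stops at the first position where a digit follows,
// so group 1 holds everything before the first digit.
const beforeFirstDigit = /(.*?(?=\d))/;

const m = beforeFirstDigit.exec("foo42");
console.log(m[0]); // "foo" (whole match)
console.log(m[1]); // "foo" (first capturing group)

// For comparison, the greedy .*(?=\d) runs up to the last digit instead.
console.log(/.*(?=\d)/.exec("foo42")[0]); // "foo4"

// No digit at all: no match.
console.log(beforeFirstDigit.exec("foo")); // null
```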

Please find the difference below.
In detail:
.* matches any character (except newline), zero or more times
(?=\d) is a positive lookahead: it asserts that \d can be matched at the current position, without consuming any characters
\d matches a digit [0-9]
(?=.*\d)
  Lookahead, non-capturing (match only if followed by):
    Sequence, matched in order:
      Any character except \n, zero or more times
      Digit
.*(?=\d)
  Sequence, matched in order:
    Any character except \n, zero or more times
    Lookahead, non-capturing (match only if followed by):
      Digit

Related

Regex match valid Phone Number

I'm quite new to regex, and not sure what I'm doing wrong exactly.
I'm looking for a regex that matches the following number format:
Matching requirements:
Must start with either 0 or 3
Must be between 7 and 11 digits
Must not allow ascending digits. e.g. 0123456789, 01234567
Must not allow repeated digits. e.g. 011111111, 3333333333, 0000000000
This is what I came up with:
^(?=(^[0,3]{1}))(?!.*(\d)\1{3,})(?!^(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$).{7,11}$
The above regex fails the fourth condition (repeated digits). Not sure why, though.
Any help would be appreciated.
Thanks
A few notes about the pattern that you tried:
You can omit the {1}, and the comma in [0,3] (inside a character class the comma is matched literally)
In the lookahead (?!.*(\d)\1{3,}), the (\d) is the second capturing group, because (?=(^[0,3]{1})) contains the first capturing group, so it should be \2 instead of \1
In the lookahead, you can omit the comma in {3,}, since three repetitions are already enough for the lookahead to reject the string
In the match itself you use .{7,11}, where the dot would match any character except a newline. You could use \d instead to match only digits
Your pattern might look like:
^(?=(^[03]))(?!.*(\d)\2{3})(?!^(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$)\d{7,11}$
Or leave out the first lookahead and move [03] into the match itself, changing the quantifier to \d{6,10} and using backreference \1 instead of \2 (there is now only one capturing group):
^(?!.*(\d)\1{3})(?!(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$)[03]\d{6,10}$
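To check the second pattern quickly in Node.js or a browser console, a small loop like the one below can be used. The sample strings are the examples from the question plus three made-up cases (0412345, 3456789 and 1234567).

```javascript
// Second pattern from this answer: starts with 0 or 3, 7-11 digits in total,
// no digit repeated 4+ times in a row, and not one fully ascending run.
const phone = /^(?!.*(\d)\1{3})(?!(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$)[03]\d{6,10}$/;

// Examples from the question, plus three made-up cases at the end.
const samples = ["0123456789", "01234567", "011111111", "3333333333", "0000000000",
                 "0412345", "3456789", "1234567"];

for (const s of samples) {
  console.log(s, phone.test(s) ? "valid" : "no match"); // only 0412345 is valid
}
```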
Edit
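Based on the comments, a version that rejects any run of 4 ascending digits:
^(?!.*(\d)\1{3})(?!\d*(?:0123|1234|2345|3456|4567|5678|6789))[03]\d{6,10}$
The ascending-run lookahead sits right after ^, so a run that begins with the leading 0 or 3 (such as 0123) is rejected as well.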
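Here is a quick check of the 4-ascending-digits variant. This sketch places the ascending-run lookahead directly after ^, before [03], so that \d* can start at position 0 and a run beginning with the leading digit, such as 0123, is also caught. Apart from 01234567, the test strings are made up for illustration.

```javascript
// 0 or 3 first, 7-11 digits, no digit repeated 4+ times in a row,
// and no run of 4 ascending digits anywhere (including at position 0).
const phone4 = /^(?!.*(\d)\1{3})(?!\d*(?:0123|1234|2345|3456|4567|5678|6789))[03]\d{6,10}$/;

// "01234567" is from the question; the other strings are made up.
for (const s of ["01234567", "0123888", "3456000", "0111234", "0412375", "0987654"]) {
  console.log(s, phone4.test(s)); // only 0412375 and 0987654 print true
}
```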
A solution in JavaScript regex syntax (the named group (?<d>...) and the backreference \k<d> need an ES2018+ engine) would be:
/^[03](?!123456(7(8(9|$)|$)|$))(?!(?<d>.)\k<d>+$)[0-9]{6,10}$/
Explanations
^[03] starts at the beginning of the string, then reads either 0 or 3
(?!123456(7(8(9|$)|$)|$)) makes sure that the digits after this first char are not an ascending run 123456, 1234567 or 12345678 reaching the end of the string, nor 123456789 (if such a sequence can be read, the negative lookahead fails)
(?!(?<d>.)\k<d>+$) is another negative lookahead: it ensures that the character right after the leading 0 or 3 (captured as d) is not repeated all the way to the end of the string
[0-9]{6,10}$ finally reads 6 to 10 digits (the first digit was already read), so 7 to 11 digits in total
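The test list can be reproduced with a short loop in Node.js or a browser console. Two results are worth checking against the pattern itself. 01234568 is accepted, because after the leading 0 the digits 1234568 are not one of the runs that the first lookahead rejects. 042157891023 is rejected, because it has 12 digits while ^[03][0-9]{6,10}$ allows at most 11.

```javascript
// Named group (?<d>.) and backreference \k<d> need an ES2018+ engine (e.g. Node.js 10+).
const re = /^[03](?!123456(7(8(9|$)|$)|$))(?!(?<d>.)\k<d>+$)[0-9]{6,10}$/;

const samples = ["0123456789", "01234567", "01234568", "011111111", "33333333",
                 "333333233", "042157891023", "019856", "0123451245"];

for (const s of samples) {
  console.log(re.test(s) ? `${s} is valid` : `${s}: No match`);
}
```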
A few tests:
"0123456789: No match"
"01234567: No match"
"01234568 is valid"
"011111111: No match"
"33333333: No match"
"333333233 is valid"
"042157891023: No match"
"019856: No match"
"0123451245 is valid"

JavaScript RegEx assertion

I have this example:
/(?=\d)(?=[a-z])/.test("3a") which returns false
but this
/(?=\d)(?=.*[a-z])/.test("3a") works.
Can you explain this?
Let me break down what you are doing:
Test string: "3a"
Example 1: /(?=\d)(?=[a-z])/
(?=\d) is a positive lookahead that the next character is a digit
(?=[a-z]) is a positive lookahead that the next character is in range a-z
This is impossible and will always return false, as it asserts that the next character is both in a-z and a digit, which no character can be.
Example 2: /(?=\d)(?=.*[a-z])/
(?=\d) is a positive lookahead that the next character is a digit
(?=.*[a-z]) is a positive lookahead asserting that somewhere from the current position onward there is a character in the range a-z
This succeeds on "3a": at the start of the string the next character, 3, is a digit, and "3a" satisfies the .*[a-z] assertion.
It may or may not be important to point out that, because these are lookaheads, you are not actually matching (consuming) anything: the match is an empty string. I don't know what it is you are really trying to do.
If you want to test that there is a-z after a digit you can put it into one assertion:
/(?=\d[a-z])/
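This sketch runs the patterns from this answer against "3a". It also tests "a3", a made-up input that shows the letter must appear at or after the digit's position.

```javascript
const s = "3a";

// Both lookaheads are checked at the same position, which cannot be a digit and a letter at once.
console.log(/(?=\d)(?=[a-z])/.test(s));    // false
// The second lookahead may look past the digit, so it finds "a".
console.log(/(?=\d)(?=.*[a-z])/.test(s));  // true
// Single lookahead: a digit immediately followed by a letter.
console.log(/(?=\d[a-z])/.test(s));        // true
// Lookaheads consume nothing, so the matched text is empty.
console.log(/(?=\d)(?=.*[a-z])/.exec(s)[0] === ""); // true
// The letter search starts at the digit's position, so order matters.
console.log(/(?=\d)(?=.*[a-z])/.test("a3"));       // false
```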
Your first pattern
/(?=\d)(?=[a-z])/.test("3a")
is asserting that both a digit and letter occur in the same place. Obviously, this will never be true. On the other hand, your second pattern:
/(?=\d)(?=.*[a-z])/.test("3a")
asserts that a digit occurs at the current position, and also that a letter occurs somewhere from that same position onward. This matches for an input of 3a.

Regular expression match 0 or exact number of characters

I want to match an input string in JavaScript with 0 or 2 consecutive dashes, but not 1 (i.e., not a range).
If the string is:
-g:"apple" AND --projectName:"grape": it should match --projectName:"grape".
-g:"apple" AND projectName:"grape": it should match projectName:"grape".
-g:"apple" AND -projectName:"grape": it should not match, i.e. return null.
--projectName:"grape": it should match --projectName:"grape".
projectName:"grape": it should match projectName:"grape".
-projectName:"grape": it should not match, i.e. return null.
To simplify this question considering this example, the RE should match the preceding 0 or 2 dashes and whatever comes next. I will figure out the rest. The question still comes down to matching 0 or 2 dashes.
Using -{0,2} matches 0, 1, 2 dashes.
Using -{2,} matches 2 or more dashes.
Using -{2} matches only 2 dashes.
How to match 0 or 2 occurrences?
Answer
If your "word-like" tokens are separated by spaces, you can use this regex, and your wanted value will be in the first capturing group:
(?:^|\s)((?:--)?[^\s-]+)
\s is any whitespace character (space, tab, newline...)
[^\s-] is anything except a whitespace character or a -
Once again, the key is anchoring the regex so that the relevant part isn't completely optional: here the anchor ^ or a mandatory whitespace \s plays this role.
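Here is a minimal sketch applying this regex with String.prototype.matchAll, which needs the g flag and a Node.js 12+ or similarly recent engine. The helper name tokens is hypothetical, used only for illustration. Plain tokens such as AND are returned too, so narrowing the results to projectName is still up to you.

```javascript
// The g flag is required by matchAll; group 1 holds the token without the leading space.
const token = /(?:^|\s)((?:--)?[^\s-]+)/g;

// Hypothetical helper: list every captured token in a string.
function tokens(str) {
  return [...str.matchAll(token)].map(m => m[1]);
}

console.log(tokens('-g:"apple" AND --projectName:"grape"')); // [ 'AND', '--projectName:"grape"' ]
console.log(tokens('-g:"apple" AND -projectName:"grape"'));  // [ 'AND' ]
```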
What we want to do
Basically you want to check whether your expression (two dashes) is there or not, so you can use the ? quantifier:
(?:--)?
"Either two or none"; (?:...) is a non-capturing group.
Avoiding confusion
You want to match "zero or two dashes", so if this is your entire regex it will always find a match: in an empty string, in --, in -, in foobar... In all of these except --, what is matched will be an empty string, but the regex still returns a match.
This is a common source of misunderstanding, so bear in mind the rule that if everything in your regex is optional, it will always find a match.
If you want to only return a match if your entire string is made of zero or two dashes, you need to anchor the regex:
^(?:--)?$
^ and $ match the beginning and the end of the string, respectively.
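You can see the difference between the unanchored and anchored forms by testing the strings mentioned above.

```javascript
const anywhere = /(?:--)?/;  // everything optional: always finds a (possibly empty) match
const whole = /^(?:--)?$/;   // anchored: the whole string must be "" or "--"

for (const s of ["", "--", "-", "foobar"]) {
  console.log(JSON.stringify(s), anywhere.test(s), whole.test(s));
}
// ""       true true
// "--"     true true
// "-"      true false
// "foobar" true false
```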
a(-{2})?(?!-)
This uses "a" as an example. It matches a followed by either two dashes or none, and the negative lookahead (?!-) makes sure no further dash follows.
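This sketch shows how a(-{2})?(?!-) behaves with zero to three dashes. The inputs are made up for illustration.

```javascript
const aDashes = /a(-{2})?(?!-)/;

console.log(aDashes.exec("a")[0]);   // "a": no dashes
console.log(aDashes.exec("a--")[0]); // "a--": exactly two dashes
console.log(aDashes.exec("a-"));     // null: a single dash is rejected
console.log(aDashes.exec("a---"));   // null: three dashes are rejected too
```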
Edit:
Based on your examples, this should work:
(?<!-)(-{2})?projectName:"[a-zA-Z]*"
Edit 2:
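JavaScript only gained lookbehind support with ES2018, so older engines reject (?<!-).
For those, try this:
(?:^|[^-])(-{2})?projectName:"[a-zA-Z]*"
The ^ alternative covers a string that starts with the token; otherwise the preceding non-dash character becomes part of the match.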
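Lookbehind such as (?<!-) is part of ES2018, so Node.js 10+ and current browsers accept it. This sketch runs the six example strings from the question through the lookbehind version.

```javascript
const re = /(?<!-)(-{2})?projectName:"[a-zA-Z]*"/;

// The six example strings from the question.
const inputs = [
  '-g:"apple" AND --projectName:"grape"',
  '-g:"apple" AND projectName:"grape"',
  '-g:"apple" AND -projectName:"grape"',
  '--projectName:"grape"',
  'projectName:"grape"',
  '-projectName:"grape"',
];

for (const s of inputs) {
  const m = s.match(re);
  console.log(s, "=>", m ? m[0] : null); // the single-dash cases print null
}
```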

How should I write this RegEx to match the given test case? (Don't match the trailing period)

regex:
/#([\S]*?(?=\s)(?!\. ))/g
given string:
'this string has #var.thing.me two strings to be #var. replaced'.replace(/#([\S]*?(?=\s)(?!\. ))/g,function(){return '7';})
expected result:
'this string has 7 two strings to be 7. replaced'
In case you want to make it "better": I'm trying to match Razor HTML-encoded expressions, but mind the case of not matching a trailing period followed by a space. The test case above shows this with the second (shorter) #var, whereas the first should be matched as #var.thing.me.
Try the following regex:
var input = 'this string has #var.thing.me two strings to be #var. replaced';
var output = input.replace(/(#[a-z][a-z.]+[a-z])/gi, function () {
    return '7';
});
console.log(output); // 'this string has 7 two strings to be 7. replaced'
This regex, (#[a-z][a-z.]+[a-z]), matches #, then a letter (so a dot cannot directly follow #), then letters or dots, and a letter again at the end, so the match never ends on a dot.
The i flag makes the regex case-insensitive.
Your pattern is not restrictive enough, i.e., it matches too much. The last #var. (including the dot) in your example string is matched because it is followed by a space (as required by the positive lookahead), and at that same position there is no dot followed by a space (the negative lookahead is checked at the space, not after it). You can try this pattern:
/#([\S]*?)(?=[.]?\s)/g
It will match the #something substring (which can contain dots) both when it is followed by a space (as in the first match in your string) and when it is followed by a dot and a space (as in the second match). Testing it in the Chromium browser console, it seems to work fine:
> 'this string has #var.thing.me two strings to be #var. replaced'.replace(/#([\S]*?)(?=[.]?\s)/g,function(){return '7';})
"this string has 7 two strings to be 7. replaced"
Try this
#((?!\. )\S)+
This matches a # followed by non-whitespace characters (\S), but it only matches the next non-whitespace character if that character does not start a dot followed by a space. This is ensured by the negative lookahead assertion (?!\. ) before the \S.
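Here is a quick check of this pattern against the question's test string, runnable in Node.js or a browser console. With the g flag, match() lists what each # token matched.

```javascript
const input = 'this string has #var.thing.me two strings to be #var. replaced';
const re = /#((?!\. )\S)+/g;

// Each \S is consumed only if it does not start ". " (dot + space),
// so the match stops just before a trailing period.
console.log(input.match(re)); // [ '#var.thing.me', '#var' ]
console.log(input.replace(re, () => '7'));
// 'this string has 7 two strings to be 7. replaced'
```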

Why does this regular expression match?

I'm trying to expand my regexp knowledge, but I have no clue why the following returns true:
/[A-Z]{2}/.test("ABC")
// returns true
I explicitly put {2} in the expression, which should mean that only exactly two capital letters match.
According to http://www.regular-expressions.info/repeat.html:
Omitting both the comma and max tells the engine to repeat the token exactly min times.
What am I misunderstanding here?
You must anchor the regex using ^ and $ to indicate the start and end of the string.
/^[A-Z]{2}$/.test("ABC")
// returns false
Your current regex matches the "AB" part of the string.
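This sketch shows what the unanchored pattern actually matches inside "ABC" and how the anchors change the result. The extra input "AB" is made up for comparison.

```javascript
const unanchored = /[A-Z]{2}/;
const anchored = /^[A-Z]{2}$/;

console.log(unanchored.test("ABC"));    // true: "AB" is found inside "ABC"
console.log(unanchored.exec("ABC")[0]); // "AB": the first two letters
console.log(anchored.test("ABC"));      // false: the whole string must be two letters
console.log(anchored.test("AB"));       // true
```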
It's matching AB, the first two letters of ABC.
To do an entire match, use the ^ and $ anchors:
/^[A-Z]{2}$/.test("ABC")
This matches an entire string of exactly 2 capital letters.
You should use ^[A-Z]{2}$ to match only the whole string rather than parts of it. In your sample, the regex matches AB - which are indeed two capital letters in a row.
You are missing the ^ and $ anchors in your regexp, which mark the beginning and the end of the string. Because they are missing, your regular expression says "2 characters", but not "only two characters", so it matches "AB" in your string ("BC" would qualify too, but the engine stops at the first match)...
The docs don't lie :)
Omitting both the comma and max tells the engine to repeat the token exactly min times.
It says the token is repeated exactly min times within the match; it says nothing about the rest of the string.
