Why does this regular expression match? - javascript

I'm trying to enlarge my regexp knowledge but I have no clue why the following returns true:
/[A-Z]{2}/.test("ABC")
// returns true
I explicitly put {2} in the expression, which should mean that only exactly two capital letters match.
According to http://www.regular-expressions.info/repeat.html:
Omitting both the comma and max tells the engine to repeat the token exactly min times.
What am I misunderstanding here?

You must anchor the regex using ^ and $ to indicate the start and end of the string.
/^[A-Z]{2}$/.test("ABC")
// returns false
Your current regex matches the "AB" part of the string.
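To see which part of the string the unanchored pattern picks up, String.prototype.match can be used instead of test. A minimal sketch (the variable names are only illustrative):

```javascript
// Without anchors the pattern may match anywhere inside the string.
const unanchored = "ABC".match(/[A-Z]{2}/);
console.log(unanchored[0], unanchored.index); // AB 0

// With anchors the whole string must consist of exactly two capitals.
const anchoredThree = /^[A-Z]{2}$/.test("ABC");
const anchoredTwo = /^[A-Z]{2}$/.test("AB");
console.log(anchoredThree, anchoredTwo); // false true
```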

It's matching AB, the first two letters of ABC.
To do an entire match, use the ^ and $ anchors:
/^[A-Z]{2}$/.test("ABC")
This matches an entire string of exactly 2 capital letters.

You should use ^[A-Z]{2}$ to match only the whole string rather than parts of it. In your sample, the regex matches AB - which are indeed two capital letters in a row.

You are missing the ^ and $ characters in your regexp, which mark the beginning and the end of the string. Because they are missing, your regular expression says "2 characters", but not "only two characters", so it matches "AB" in your string ("BC" would also satisfy the pattern, but the engine stops at the first match).

The docs don't lie :)
Omitting both the comma and max tells the engine to repeat the token exactly min times.
It says min times, not max times.
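The quantifier forms described in the quoted documentation can be compared side by side once the pattern is anchored. This is only a sketch; the constant names are made up for the example:

```javascript
const exactlyTwo = /^[A-Z]{2}$/;   // {2}: exactly two
const twoOrMore = /^[A-Z]{2,}$/;   // {2,}: two or more
const twoToThree = /^[A-Z]{2,3}$/; // {2,3}: two or three

for (const s of ["A", "AB", "ABC", "ABCD"]) {
  console.log(s, exactlyTwo.test(s), twoOrMore.test(s), twoToThree.test(s));
}
// A false false false
// AB true true true
// ABC false true true
// ABCD false true false
```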

Related

Ambiguity in regex in javascript

var a = 'a\na'
console.log(a.match(/.*/g)) // ['a', '', 'a', '']
Why are there two empty strings in the result?
Let's say if there are empty strings, why isn't there one at beginning and at the end of each line as well, hence 4 empty strings?
I am not looking for how to select 'a's but just want to understand the presence of the empty strings.
The best explanation I can offer for the following:
'ab\na'.match(/.*/g)
["ab", "", "a", ""]
Is that JavaScript's match function does not use the dot in DOT ALL mode, meaning that the dot does not match across newlines. When the .* pattern is applied to ab\na, it first matches ab, then stops at the newline. The next attempt starts at the newline, where .* can only match the empty string. Then a is matched, and finally the engine tries once more at the very end of the string, where .* again matches the empty string.
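Printing the index of each match makes it easier to see where the empty strings come from. A small sketch using String.prototype.matchAll (available in ES2020 and later):

```javascript
// Collect [index, matchedText] pairs for every match of .* in the string.
const matches = [...'ab\na'.matchAll(/.*/g)].map(m => [m.index, m[0]]);
console.log(matches);
// [ [ 0, 'ab' ], [ 2, '' ], [ 3, 'a' ], [ 4, '' ] ]
```

Index 2 is the position of the newline and index 4 is the end of the string; at both positions .* can only match zero characters.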
If you just want to extract the non-empty content of each line, then you may try the following:
console.log('ab\na'.match(/.+/g))
// ["ab", "a"]
Let's say if there are empty strings, why isn't there one at beginning
and at the end...
.* is greedy: it swallows a complete line at once. By a line I mean everything before a line break. When it reaches the end of the line, it matches again (an empty string) because of the star quantifier.
If you want four empty strings, you can add ? to the star quantifier to make it lazy (.*?), but this regex gives different results in different flavors because of how they handle zero-length matches.
You can try .*? with both PCRE and JS engines in regex101 and see the differences.
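For the JavaScript side only, the lazy variant can be checked directly. This sketch says nothing about PCRE, which, as noted above, may behave differently:

```javascript
const greedy = 'a\na'.match(/.*/g);
const lazy = 'a\na'.match(/.*?/g);

console.log(greedy); // [ 'a', '', 'a', '' ]
console.log(lazy);   // [ '', '', '', '' ]
```

With the lazy quantifier, JavaScript produces one empty match at each of the four positions (0 through 3) and never consumes a character.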
Question:
You may ask: why does the engine try to find a match at the end of the line when the whole thing has already been matched?
Answer:
Because the end of a line and the end of the string are positions in their own right, so not everything has been matched yet. One position is left that can still be matched, and the star quantifier matches it.
That remaining position is the end of the line, which is a valid match for $ when the m flag is on. A lone . doesn't match at this position, but .* or .*? does, because any pattern that can match zero characters, such as \d*, \D*, a* or b?, also matches at a zero-length position.
The star operator * means there can be any number of occurrences (even 0 occurrences). With the expression used, an empty string can be a match. Not sure what you are looking for, but maybe the + operator (1 or more occurrences) would be better?
To add some more info: regexes are greedy by default (you can override this behaviour with lazy quantifiers), so the pattern will pick up as much of the text as it can. In this case, it picks up the a, because it matches the regex, so "\na" is still left. "\n" does not match ".", so the only available option is the empty string. Then the next line is processed, and again an "a" is matched. After this, only the empty string matches the regex.
* Matches the preceding expression 0 or more times.
. matches any single character except the newline character.
That is what the official docs say about . and *. So I guess the array you received is something like this:
[ the first "any character" of the first line, following "nothing", the first "any character" of the second line, following "nothing"]
And the new-line character is just ignored
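Put differently, the newline is never consumed by . unless the s (dotAll) flag is set. A short sketch comparing the two modes, assuming an engine that supports the s flag (ES2018 and later):

```javascript
const text = 'a\na';

// Default: . stops at the newline, so each line is matched separately.
const defaultDot = text.match(/.*/g);
console.log(defaultDot); // [ 'a', '', 'a', '' ]

// With the s flag, . also matches the newline, so the first match spans both lines.
const dotAll = text.match(/.*/gs);
console.log(dotAll); // [ 'a\na', '' ]
```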

what is difference between these two syntax in my code

What is the difference between the following two syntaxes in a regular expression?
Please give an example.
(?=.*\d)
and
.*(?=\d)
The first one is just an assertion, a positive look-ahead saying "there must be zero or more characters followed by a digit." If you match it against a string containing at least one digit, it will tell you whether the assertion is true, but the matched text will just be an empty string.
The second one searches for a match, with an assertion (a positive-lookahead) after the match saying "there must be a digit." The matched text will be the characters before the last digit in the string (including any previous digits, because .* is greedy, so it'll consume digits up until the last one, because the last one is required by the assertion).
Note the difference in the match object results:
var str = "foo42";
test("rex1", /(?=.*\d)/, str);
test("rex2", /.*(?=\d)/, str);
function test(label, rex, str) {
    console.log(label, "test result:", rex.test(str));
    console.log(label, "match object:", rex.exec(str));
}
Output (for those who can't run snippets):
rex1 test result: true
rex1 match object: [
""
]
rex2 test result: true
rex2 match object: [
"foo4"
]
Notice how the match result in the second case was foo4 (from the string foo42), but blank in the first case.
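Because the first form consumes nothing, it is usually combined with another pattern that does the actual matching. The rule below is a made-up example (at least six word characters, one of which is a digit), not something from the question:

```javascript
// Hypothetical rule: six or more word characters, at least one of them a digit.
const sixCharsWithDigit = /^(?=.*\d)\w{6,}$/;

console.log(sixCharsWithDigit.test("abc123")); // true
console.log(sixCharsWithDigit.test("abcdef")); // false (no digit)
console.log(sixCharsWithDigit.test("a1"));     // false (too short)
```

The lookahead checks the condition from the start of the string and then hands the same position to \w{6,}.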
(?=...) is a positive lookahead. Both of these expressions will match "any text followed by a number". The difference, though, is that (?=...) doesn't "eat" (consume) any characters as it matches. For practical purposes, if this is the only thing your regex contains, they'll succeed on the same strings. However, .*(?=\d) would be a more correct expression, unless there's more to it than what you put in the question.
Where it really matters is when you're using capturing groups or where you're using the content of the matched text after running the regular expression:
If you want to capture all text before the number, but not the number itself, and use it after, you could do this:
(.*?(?=\d))
The ? makes the match non-greedy, so it will only match up to the first number. All text leading up to the number will be in the match result as the first group.
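A quick comparison of the lazy and greedy versions on the string foo42 (the same sample string used in the answer above) might look like this:

```javascript
const lazyGroup = /(.*?(?=\d))/.exec("foo42");
const greedyGroup = /(.*(?=\d))/.exec("foo42");

console.log(lazyGroup[1]);   // foo  (stops before the first digit)
console.log(greedyGroup[1]); // foo4 (stops before the last digit)
```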
Please find the difference below
In detail
.* matches any character (except newline), zero or more times
(?=\d) is a positive lookahead: it asserts that the regex inside it can be matched at this position
\d matches a digit [0-9]
(?=.*\d)
CapturingGroup
  MatchOnlyIfFollowedBy
    Sequence: match all of the followings in order
      Repeat
        AnyCharacterExcept\n
        zero or more times
      Digit
.*(?=\d)
Sequence: match all of the followings in order
  Repeat
    AnyCharacterExcept\n
    zero or more times
  CapturingGroup
    MatchOnlyIfFollowedBy
      Digit

Confused by repeating parts in Regular Expressions

I am really confused with Regular Expressions repeating parts with curly braces. Consider the following example:
var dateTime = /\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}/;
console.log(dateTime.test("30/1/2003 8:45"));
// true
Now if I change 30 to 300000 and 45 to 455555, I'll get true again! Other parts between outer numbers are ok and the result is as expected.
Can somebody help me find the problem?
Thanks.
You're not matching the beginning and end of the string (^ and $), so the regex just looks for a match anywhere in the string. It still finds one, and test returns true.
300000/1/2003 8:455555
    dd/m/yyyy h:mm
You probably want
/^\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}$/;
Or to be even more exact;
/^(?:0?[1-9]|[12]\d|3[01])\/(?:0?[1-9]|1[0-2])\/\d{4} (?:0?\d|1\d|2[0-3]):[0-5]\d$/;
(?:pattern) non capture group
pattern? the n in pattern is optional
[1-9] character class; a number ranging from 1 to 9
pattern1|pattern2 either pattern1 or pattern2
[12] character class; either 1 or 2
\d same as [0-9]
pattern{4} the n in pattern happens 4 times
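The two anchored patterns above can be tried against a few inputs. The sample strings and the names loose and strict are chosen only for illustration:

```javascript
const loose = /^\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}$/;
const strict = /^(?:0?[1-9]|[12]\d|3[01])\/(?:0?[1-9]|1[0-2])\/\d{4} (?:0?\d|1\d|2[0-3]):[0-5]\d$/;

const samples = [
  "30/1/2003 8:45",         // valid for both
  "300000/1/2003 8:455555", // rejected by both once anchored
  "32/1/2003 8:45",         // loose accepts day 32, strict does not
  "30/1/2003 24:00",        // loose accepts hour 24, strict does not
];

for (const s of samples) {
  console.log(s, loose.test(s), strict.test(s));
}
```

Neither pattern checks calendar validity (for example 31/2/2003 still passes), so a real date check would need more than a regex.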
You're not matching the whole string, just part of it. With the test function, that's enough to return true.
Try this instead:
/^\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}$/
The ^ anchor matches the beginning of the string, the $ one matches the end of the string.
You can find more useful information about a match by using string.match(regex) rather than regex.test(string).
In this case, you'd see that it's matching 00/1/2003 8:45 because you did not use ^ and $ to mark the start and end of the subject string, respectively.
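As a sketch of that suggestion, the original unanchored pattern applied with match shows the matched substring and where it starts:

```javascript
const dateTime = /\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}/;
const found = "300000/1/2003 8:455555".match(dateTime);

console.log(found[0]);    // 00/1/2003 8:45
console.log(found.index); // 4
```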
What is happening is that when you change 30 to 300000, the last two zeroes (00) of 300000 are matched, and with 455555 the match stops after the first two digits (45); the rest of the string is not matched.
To stop that from happening, you have to indicate that the string must begin and end with regex specified.
This can be done using anchors. Like this -
var dateTime = /^\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{2}$/;
                ^                                     ^

Regexp: numbers and few special characters

I am buried in a RegExp hell and can't find way out, please help me.
I need a RegExp that matches only numbers (at least 1 number) and one of these characters: <, >, = (exactly one of them, one time).
My reg. expression looks like this:
^[0-9]+$|^[=<>]{1}$
And I thought it should match when my string contains one or more digits and exactly 1 special character defined by me. But it doesn't act correctly. I think there might be a problem with my start/end of string definition, but I'm not sure about that.
Examples that should pass include:
<1
=2
22>
>1
=00123456789
Examples that should not pass this reg. exp.:
<<2
==222
<>=2
I thought it should match when my string contains one or more digits and exactly 1 special character
No, the original pattern matches a string that contains one or more digits, or exactly 1 special character. For example, it will match 123 and = but not 123=.
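The behaviour of the original pattern can be confirmed with a few calls (a minimal check; the constant name is just for the example):

```javascript
// The alternation has two independent, fully anchored branches:
// "only digits" OR "a single special character".
const original = /^[0-9]+$|^[=<>]{1}$/;

console.log(original.test("123"));  // true
console.log(original.test("="));    // true
console.log(original.test("123=")); // false
console.log(original.test("<1"));   // false
```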
Try this pattern:
^\d+[=<>]$
This will match a string that consists of one or more digits, followed by exactly one special character. For example, this will match 123= but not 123 or =.
If you want your special character to appear before the number, use a pattern like this instead:
^[=<>]\d+$
This will match =123 but not 123 or =.
Update
Given the examples you provided, it looks like you want to match any string which contains one or more digits and exactly one special character either at the beginning or the end. In that case use this pattern:
^([=<>]\d+|\d+[=<>])$
This will match <1, =2, 22>, and >1, but not 123 or =.
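Running the combined pattern over the examples from the question is a quick way to check it. The array names below are illustrative only:

```javascript
const pattern = /^([=<>]\d+|\d+[=<>])$/;

const shouldPass = ["<1", "=2", "22>", ">1", "=00123456789"];
const shouldFail = ["<<2", "==222", "<>=2", "123", "="];

const allPass = shouldPass.every(s => pattern.test(s));
const anyFail = shouldFail.some(s => pattern.test(s));

console.log(allPass); // true
console.log(anyFail); // false
```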
Just use [0-9]+[=<>]
Here are visualizers of your regexp and this one:
http://www.regexper.com/#%5E%5B0-9%5D%2B%24%7C%5E%5B%3D%3C%3E%5D%7B1%7D%24
http://www.regexper.com/#%5B0-9%5D%2B%5B%3D%3C%3E%5D
Your regex says:
1 or more numbers OR 1 symbol
Also, the ^ and $ mean the whole string must match, not just contain the pattern. If you want a "contains" match, drop them. I don't know if you have a space between the number and symbol, so add an optional space:
[0-9]+\s?[=<>]{1}
This should work.
^[0-9]+[=<>]$
1 or more digits followed by exactly one of =, < or >.
Try this regex:
^\d+[=<>]$
This one:
/^\d+[<>=]$|^[<>=]\d+$/

Use Regex in Javascript to return an array of words

Try though I may, can't figure out how to use regex in javascript to get an array of words (assuming all words are capitalized). For example:
Given this: NowIsAGoodTime
How can you use regex to get: ['Now','Is','A','Good','Time']
Thanks Very Much!!
'NowIsAGoodTime'.match(/[A-Z][a-z]*/g)
[A-Z], [a-z] and [0-9] are character sets defined as ranges. They could be [b-xï], for instance (from "b" to "x" plus "ï").
String.prototype.match returns an array, or null if there is no match at all.
Finally, the g regexp flag stands for "global match". It means, it will try to match the same pattern on subsequent string parts. By default (with no g flag), it will be satisfied with the first match.
With a globally matching regexp, match returns an array of matching substrings.
With single-match regexps, it returns the substring matched by the whole pattern, followed by the matches of the pattern's groups. E.g.:
'Hello'.match(/el(lo)/) // [ 'ello', 'lo' ]
See more on Regexp.
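The points above about the g flag and the null return value can be put together in one short sketch (variable names are illustrative):

```javascript
const input = 'NowIsAGoodTime';

// With g: every match, as a plain array of substrings.
const words = input.match(/[A-Z][a-z]*/g);
console.log(words); // [ 'Now', 'Is', 'A', 'Good', 'Time' ]

// Without g: only the first match, with extra properties such as index.
const first = input.match(/[A-Z][a-z]*/);
console.log(first[0], first.index); // Now 0

// No match returns null, so fall back to an empty array before using the result.
const none = 'lowercase'.match(/[A-Z][a-z]*/g) || [];
console.log(none.length); // 0
```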
This regex will break it up in desired parts:
([A-Z][a-z]*)
