In my application I am using the regex below for pattern matching.
Original Pattern :
/(\w+\.){2,}/ig
The pattern above is stored in an array. Because it has a comma ( , ) after the 2, it causes a problem in some environments.
As we know, in regex:
{n} - matches n times
{n, m} - matches at least n times, but not more than m times
So I removed the comma after the 2, because in the pattern above no value follows the comma.
Pattern after removing comma :
/(\w+\.){2}/ig
With this change I have resolved the environment problem I was facing earlier.
What I want to know is whether removing the comma after the 2 causes any problem in matching, for the case given above.
{2} means match if it appears exactly 2 times, and {2,} means 2 times or above. Depending on the usage, this may or may not matter.
For example, if you want to validate whether the string contains 2 or more \w+\., then the comma doesn't matter. However, if you want to replace those 2 or more \w+\. with something else, the comma will affect the result.
'foo.bar.baz.'.replace(/(\w+\.){2}/ig, '~') == '~baz.'
'foo.bar.baz.'.replace(/(\w+\.){2,}/ig, '~') == '~'
{2,} means two or more. There is no max limit.
With this, {0,} is the same as *, and {1,} is the same as +
To summarize:
{n} match n times
{n,m} match at least n times, but not more than m times
{n,} match at least n times
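The three forms in the summary can be compared side by side. This is a minimal sketch, assuming JavaScript and reusing the (\w+\.) group from the question; the anchors ^ and $ and the sample inputs are additions for illustration, so that the quantifier has to account for the whole string.

```javascript
// Each pattern is anchored so the quantifier must cover the entire input.
const exactlyTwo = /^(\w+\.){2}$/;   // {n}: exactly 2 repetitions
const twoToThree = /^(\w+\.){2,3}$/; // {n,m}: at least 2, at most 3 repetitions
const twoOrMore = /^(\w+\.){2,}$/;   // {n,}: at least 2 repetitions, no upper limit

// Hypothetical sample inputs with 1, 2, 3 and 4 "word." segments.
const inputs = ['a.', 'a.b.', 'a.b.c.', 'a.b.c.d.'];

const results = inputs.map((s) => ({
  input: s,
  exactlyTwo: exactlyTwo.test(s),
  twoToThree: twoToThree.test(s),
  twoOrMore: twoOrMore.test(s),
}));

console.table(results);
```

Only the two-segment input satisfies all three patterns; the four-segment input satisfies only the {2,} form.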
Refer this for details
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 1 year ago.
What is the difference between:
(.+?)
and
(.*?)
when I use it in my php preg_match regex?
They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
By default a quantifier is greedy, meaning it matches as many characters as possible.
A ? after a quantifier makes it "ungreedy" (lazy), meaning it matches as few characters as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
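The same greedy/lazy behaviour can be reproduced outside PHP. The sketch below assumes JavaScript, where String.prototype.match with the g flag stands in for preg_match_all by returning every match; the variable names are illustrative only.

```javascript
const text = 'abab';

// Greedy: .* grabs as much as it can, so one match spans the whole string.
const greedy = text.match(/a.*b/g);

// Lazy: .*? stops at the first b it can, so the string yields two short matches.
const lazy = text.match(/a.*?b/g);

console.log(greedy); // [ 'abab' ]
console.log(lazy);   // [ 'ab', 'ab' ]
```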
You can test your regexes online e.g. on Regexr, see the greedy example here
The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).
In RegEx, {i,f} means "between i and f matches". Let's take a look at the following examples:
{3,7} means between 3 and 7 matches
{,10} means up to 10 matches with no lower limit (i.e. the lower limit is 0). Not every engine accepts an omitted lower bound; JavaScript, for one, treats {,10} as literal text, so {0,10} is the portable form
{3,} means at least 3 matches with no upper limit (i.e. the upper limit is infinity)
{,} means no upper or lower limit for the number of matches (i.e. the lower limit is 0 and the upper limit is infinity), in engines that accept an omitted lower bound; {0,} is the portable form
{5} means exactly 5
Most good languages contain abbreviations, so does RegEx:
+ is the shorthand for {1,}
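* is the shorthand for {0,} (written {,} in engines that allow omitting the lower bound)
? is the shorthand for {0,1} (written {,1} in engines that allow omitting the lower bound)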
This means + requires at least 1 match while * accepts any number of matches or no matches at all and ? accepts no more than 1 match or zero matches.
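The shorthand equivalences can be checked mechanically. This is a small sketch, assuming JavaScript; because JavaScript does not accept an omitted lower bound, it writes the explicit forms {0,} and {0,1}, and the last lines show that {,1} is read there as literal text. The sample strings are arbitrary.

```javascript
const samples = ['', 'a', 'aa', 'aaa'];

// Each pair holds a shorthand quantifier and its explicit brace form.
const pairs = [
  [/^a+$/, /^a{1,}$/],
  [/^a*$/, /^a{0,}$/],
  [/^a?$/, /^a{0,1}$/],
];

// True when both patterns of every pair agree on every sample string.
const allAgree = pairs.every(([shorthand, explicit]) =>
  samples.every((s) => shorthand.test(s) === explicit.test(s)));
console.log(allAgree); // true

// Without the u flag, JavaScript treats {,1} as the literal characters "{,1}".
const literalBraces = /^a{,1}$/.test('a{,1}');
console.log(literalBraces); // true
```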
Credit: Codecademy.com
+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.
A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.
Consider the following string to match:
ab
The pattern (ab.*) matches, and its capture group contains ab.
The pattern (ab.+) does not match at all, because .+ needs at least one more character after ab.
If you change the string to the following, (ab.+) matches and captures aba:
aba
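The ab / aba case can be run directly. A minimal sketch, assuming JavaScript's String.prototype.match, which returns null when there is no match and otherwise an array whose index 1 holds the first capture group:

```javascript
const star = /(ab.*)/;
const plus = /(ab.+)/;

const starOnAb = 'ab'.match(star);   // matches; .* is satisfied by zero characters
const plusOnAb = 'ab'.match(plus);   // null; .+ needs one more character
const plusOnAba = 'aba'.match(plus); // matches; the trailing a satisfies .+

console.log(starOnAb[1]);  // 'ab'
console.log(plusOnAb);     // null
console.log(plusOnAba[1]); // 'aba'
```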
+ requires at least one occurrence; * allows zero as well.
A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.
I think the previous answers fail to highlight a simple example:
for example we have an array:
numbers = [5, 15]
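The regex ^[0-9]+ matches both 5 and 15, because each contains at least one digit.
^[0-9]* also matches both, and would additionally match an empty string. The difference is that the + operator requires at least one occurrence of the preceding expression (not a duplicate of it), while * accepts zero occurrences.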
I have a regular expression in some code (written by someone else), and I am trying to understand what that expression means.
var decimal = /^\d[0,1]+(\.\d[1,4])?$/;
Can anyone explain to me what it does...
In order:
^ - Match the beginning of the input
\d - A digit (0-9)
[0,1]+ - One or more occurrences of the characters 0, "," (comma), or 1 (but see the note below; this is probably not what the author meant to do)
( - The beginning of a capture group
\. - A literal . (without the backslash, it would mean something special)
\d - A digit
[1,4] - Exactly one of the characters 1, "," (comma), or 4 (but see the note below; this is probably not what the author meant to do)
) - The end of the capture group
? - Indicates that the entire capture group is optional (zero times or once)
$ - Match the end of the input
Re the [0,1]+ and [1,4], the expression was probably supposed to have {0,1} and {1,4} instead, which mean:
{0,1} - match what came before either zero times or once (note that you have to remove the + that was after the [0,1])
{1,4} - match what came before 1, 2, 3, or 4 times
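The difference between the character class and the quantifier is easy to see on a few inputs. The sketch below, assuming JavaScript, isolates the first half of the expression; the shortened patterns and the sample strings are chosen for illustration and are not from the original code.

```javascript
// As written: a digit, then one or more characters from the set 0 , 1
const asWritten = /^\d[0,1]+$/;

// As probably intended: zero or one digit
const asIntended = /^\d{0,1}$/;

const checks = {
  writtenAcceptsCommas: asWritten.test('4,,1'),    // true: commas are in the class
  writtenAcceptsLoneDigit: asWritten.test('7'),    // false: the class needs one char
  intendedAcceptsLoneDigit: asIntended.test('7'),  // true
  intendedAcceptsCommas: asIntended.test('4,,1'),  // false
  intendedAcceptsEmpty: asIntended.test(''),       // true: zero digits is allowed
};

console.log(checks);
```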
Here's an explanation on regex101.com
If we speculate that they probably meant this
/^\d{0,1}(\.\d{1,4})?$/
...then in prose it means: Match any number that may or may not have one leading digit, and then may or may not have a decimal point followed by one to four digits. But it's still got issues, not least that the string "" matches it, and (depending on what you're doing) you probably want to support values equal to or greater than 10, which that expression doesn't.
Basically: If it's meant to validate a decimal, throw it away, and search for something that does a better job, such as this if you really want at most four digits of precision and you want to capture the fractional portion (as your original does):
/^(?:0|[1-9]\d*)(\.\d{1,4})?$/
If you want to allow any level of precision:
/^(?:0|[1-9]\d*)(\.\d+)?$/
If you don't need the capture group:
/^(?:0|[1-9]\d*)(?:\.\d{1,4})?$/ // Only allow 1-4 digits of precision
/^(?:0|[1-9]\d*)(?:\.\d+)?$/ // Allow any number of digits of precision
That last is probably what I'd go with. Note that it doesn't allow leading zeros you wouldn't normally write (e.g., it disallows 02.345). If you want to allow them, then just /^\d*(?:\.\d+)?$/.
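A quick way to see what the stricter pattern accepts is to run it over a handful of strings. This sketch assumes JavaScript and uses the "any number of digits of precision" variant without a capture group; the sample inputs are made up for illustration.

```javascript
const decimal = /^(?:0|[1-9]\d*)(?:\.\d+)?$/;

// Hypothetical inputs, grouped by the outcome the pattern should give.
const accepted = ['0', '10', '2.345', '0.5'];
const rejected = ['', '02.345', '1.', '.5', '1,5'];

const acceptedOk = accepted.every((s) => decimal.test(s));
const rejectedOk = rejected.every((s) => !decimal.test(s));

console.log(acceptedOk, rejectedOk); // true true
```

The empty string and the leading-zero form 02.345 are both rejected, which are the two behaviours called out above.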
The crucial parts:
^: Beginning of input
\d: A digit
[0,1]+: One or more occurrences of 0 or 1 or ,
(\.\d[1,4])?: An optional capture group, containing: a . literal, a digit, and a 1 or 4 or ,
$: End of input
The full story can be found here.
So some allowed input is:
80.94
41111111.44
4,,,1.44
30
I have a requirement to handle a regular expression for no more than two of the same letters/digits in an XSL file.
no space
does not support special chars
support (a-z,A-Z,0-9)
require one of a-z
require one of 0-9
no more than 2 same letter/digits (i.e., BBB will fail, BB is accepted)
What I have so far
(?:[^a-zA-Z0-9]{1,2})
This regex will do it:
^(?!.*([A-Za-z0-9])\1{2})(?=.*[a-z])(?=.*\d)[A-Za-z0-9]+$
Here's the breakdown:
(?!.*([A-Za-z0-9])\1{2}) makes sure that none of the chars repeat more than twice in a row.
(?=.*[a-z]) requires at least one lowercase letter
(?=.*\d) requires at least one digit
[A-Za-z0-9]+ allows only letters and digits
EDIT :
removed an extraneous .* from the negative lookahead
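To check the breakdown, the pattern can be run against a few candidate strings. This is a sketch assuming JavaScript rather than the XSL environment from the question, so engine differences may apply; the sample strings are invented.

```javascript
const rule = /^(?!.*([A-Za-z0-9])\1{2})(?=.*[a-z])(?=.*\d)[A-Za-z0-9]+$/;

const outcomes = {
  aab1: rule.test('aab1'),     // true: letter, digit, no triple
  aBB1: rule.test('aBB1'),     // true: BB is allowed
  aaab1: rule.test('aaab1'),   // false: aaa repeats three times in a row
  AAB1: rule.test('AAB1'),     // false: no lowercase letter
  abc: rule.test('abc'),       // false: no digit
  'ab 1': rule.test('ab 1'),   // false: contains a space
};

console.log(outcomes);
```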
(Partial solution) For matching the same character repeated 3 or more times consecutively, try:
([a-zA-Z0-9])\1{2,}
Sample matches (tested both here and here): AABBAA (no matches), AABBBAAA (matches BBB and AAA), ABABABABABABABA (no matches), ABCCCCCCCCCC (matches CCCCCCCCCC).
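The sample matches can be reproduced with a global search. A minimal sketch, assuming JavaScript; findRuns is a hypothetical helper name, and the last sample is built with repeat so the number of C characters (ten here) is explicit.

```javascript
const triple = /([a-zA-Z0-9])\1{2,}/g;

// Returns every run of three or more identical characters, or [] if none.
function findRuns(s) {
  return s.match(triple) || [];
}

console.log(findRuns('AABBAA'));                // []
console.log(findRuns('AABBBAAA'));              // [ 'BBB', 'AAA' ]
console.log(findRuns('ABABABABABABABA'));       // []
console.log(findRuns('AB' + 'C'.repeat(10)));   // one run of ten C characters
```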
Does this one work for you?
/(\b(?:([A-Za-z0-9])(?!\2{2}))+\b)/
Try it out:
const regex = /(\b(?:([A-Za-z0-9])(?!\2{2}))+\b)/
const tests = ['A1D3E', 'AAAA', 'AABAA', 'abccddeeff', 'abbbc', '1234']
for (const test of tests) {
  // regex.test returns true when the string contains a match
  console.log(test + ' - ' + regex.test(test))
}
Will output:
A1D3E - true
AAAA - false
AABAA - true
abccddeeff - true
abbbc - false
1234 - true
You may do this in 2 regexes:
/^(?=.*[a-z])(?=.*[0-9])[a-z0-9]+$/i This will assure that there is at least 1 digit and 1 letter while accepting only letters and digits (no space or special characters)
/([a-z0-9])\1{2,}/i If this one matches, a character is repeated three or more times in a row, which means validation should fail.
Explanation:
First regex:
^ : match begin of line
(?=.*[a-z]) : check if there is at least one letter
(?=.*[0-9]) : check if there is at least one digit
[a-z0-9]+ : if the checks were true, then match only digits/letters one or more times
$ : match end of line
i : modifier, match case insensitive
Second regex:
([a-z0-9]) : match and group a digit or a letter
\1{2,} : match group 1 two or more times
i : modifier, match case insensitive
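Put together, the two expressions might be combined as follows. This is a sketch assuming JavaScript; isValid is a hypothetical name, and note that because of the i modifier the letter check here accepts uppercase letters as well.

```javascript
const allowedAndMixed = /^(?=.*[a-z])(?=.*[0-9])[a-z0-9]+$/i;
const tripleRepeat = /([a-z0-9])\1{2,}/i;

// Valid when the first regex matches and the second one does not.
function isValid(s) {
  return allowedAndMixed.test(s) && !tripleRepeat.test(s);
}

console.log(isValid('ab12'));  // true
console.log(isValid('aab1'));  // true
console.log(isValid('aaab1')); // false: aaa
console.log(isValid('abc'));   // false: no digit
console.log(isValid('a b1'));  // false: space
```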
In response to a clarification, it seems that a single regular expression isn't strictly required. In that case I suggest you use several regular expressions or functions. My guess is, performance isn't a requirement, since usually these sorts of checks are done in response to user input. User input validation can take 100ms and still appear to be instant, and you can run a lot of code in 100ms.
For example, I personally would do a check for each of your conditions in a separate test. First, check for spaces. Second, check for at least one letter. Next, check for at least one number. Finally, look for any spans of three or more repeated characters.
Your code will be much easier to understand, and it will be much easier to modify the rules later (which, experience has shown, is almost certainly going to happen).
For example:
function do_validation(string) {
  // Every rule lives in its own helper, so each can be changed independently
  return (has_no_space(string) &&
    has_no_special_char(string) &&
    has_alpha(string) &&
    has_digit(string) &&
    !has_repeating(string));
}
I personally consider the above to be orders of magnitude easier to read than one complex regular expression. Plus, adding or removing a rule doesn't make you have to reimplement a complex regular expression (and thus, be required to re-test all possible combinations).
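One possible way to fill in the helpers is sketched below, assuming JavaScript. The helper names come from the outline above, but their bodies are assumptions based on the stated rules (a-z, A-Z, 0-9 only; at least one a-z; at least one digit; no character three times in a row), not the original author's code.

```javascript
// Each helper checks exactly one rule, so rules can be added or dropped freely.
const has_no_space = (s) => !/\s/.test(s);
const has_no_special_char = (s) => !/[^A-Za-z0-9\s]/.test(s);
const has_alpha = (s) => /[a-z]/.test(s);          // assumes lowercase is required
const has_digit = (s) => /[0-9]/.test(s);
const has_repeating = (s) => /([A-Za-z0-9])\1{2}/.test(s);

function do_validation(string) {
  return (has_no_space(string) &&
    has_no_special_char(string) &&
    has_alpha(string) &&
    has_digit(string) &&
    !has_repeating(string));
}

console.log(do_validation('aab1'));  // true
console.log(do_validation('aaab1')); // false: aaa
console.log(do_validation('a-b1'));  // false: special character
```

Because each failure maps to one helper, a caller could also report which rule was violated instead of a single pass/fail result.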
I have a small but very annoying problem with a regex. I would like a regex for a comma-separated list of nine-digit phone numbers, for example:
Pass : 123456789,123456789
Not Pass : 123456789,123456789,
So far, I have something like this: /^\d{9}+(,\d{9}\+)\*$/ It works in, for example, this tool http://regex.larsolavtorvik.com, but in JavaScript it does not work and I get this error, which I suppose is well known to JavaScript people:
Invalid regular expression: /^\d{9}+(,\d{9}\+)\*$/: Nothing to repeat
So I added a backslash, and it now looks like this: /^\d{9}\+(,\d{9}\+)\*$/. Of course this one does not work either.
You are escaping * and + with \, and that is the problem: an escaped \* or \+ matches a literal character instead of acting as a quantifier.
* means match the preceding token 0 or more times
+ means match the preceding token 1 or more times
{9} means match the preceding token exactly 9 times, so there is no need for a + after it
The regex should be
/^\d{9}(,\d{9})*$/
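The corrected expression can be checked against the two examples from the question. A minimal sketch, assuming JavaScript; the extra single-number and eight-digit inputs are added for illustration.

```javascript
const phoneList = /^\d{9}(,\d{9})*$/;

const passes = phoneList.test('123456789,123456789');        // true
const trailingComma = phoneList.test('123456789,123456789,'); // false
const singleNumber = phoneList.test('123456789');            // true: the group is optional
const tooShort = phoneList.test('12345678');                 // false: only 8 digits

console.log(passes, trailingComma, singleNumber, tooShort);
```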