Regex: not providing length of certain string places [duplicate] - javascript

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 1 year ago.
What is the difference between:
(.+?)
and
(.*?)
when I use it in my php preg_match regex?

They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
By default a quantifier is greedy, meaning it matches as many characters as possible.
A ? after a quantifier makes it "ungreedy" (lazy), meaning it matches as few characters as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
You can test your regexes online e.g. on Regexr, see the greedy example here
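The greedy/ungreedy contrast above can be reproduced in a few lines. This is only a sketch in JavaScript, where matchAll stands in for PHP's preg_match_all; the helper name allMatches is made up for illustration.

```javascript
// Greedy vs. lazy matching on the string "abab" from the example above.
const text = "abab";

// Hypothetical helper: collect every match of a global regex as plain strings.
const allMatches = (re, s) => [...s.matchAll(re)].map((m) => m[0]);

const greedy = allMatches(/a.*b/g, text); // .* takes as much as possible
const lazy = allMatches(/a.*?b/g, text);  // .*? stops at the first possible b

console.log(greedy); // [ 'abab' ]
console.log(lazy);   // [ 'ab', 'ab' ]
```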

The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).

In RegEx, {i,f} means "between i and f matches". Let's take a look at the following examples:
{3,7} means between 3 and 7 matches
{,10} means up to 10 matches with no lower limit (i.e. the lower limit is 0); some flavors, JavaScript among them, do not accept an omitted lower bound and need {0,10} instead
{3,} means at least 3 matches with no upper limit (i.e. the upper limit is infinity)
{,} means no upper or lower limit for the number of matches (i.e. the lower limit is 0 and the upper limit is infinity); this form is flavor-dependent too, and {0,} is the portable spelling
{5} means exactly 5
Most good languages contain abbreviations, and so does RegEx:
+ is the shorthand for {1,}
* is the shorthand for {0,}
? is the shorthand for {0,1}
This means + requires at least 1 match, * accepts any number of matches or no matches at all, and ? accepts either zero matches or exactly 1 match.
Credit: Codecademy.com
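A quick way to check the brace quantifiers is to anchor a pattern and test strings of different lengths. The sketch below uses JavaScript (the question itself is about PHP, so treat the flavor as an assumption); the variable names are illustrative. The last two lines show that JavaScript, without the u flag, reads {,10} as literal text rather than as a quantifier.

```javascript
// Brace quantifiers checked against runs of the letter "a".
const threeToSeven = /^a{3,7}$/;
const atLeastThree = /^a{3,}$/;
const exactlyFive = /^a{5}$/;

const rangeChecks = ["aa", "aaaaa", "aaaaaaaa"].map((s) => threeToSeven.test(s));
console.log(rangeChecks);                       // [ false, true, false ]
console.log(atLeastThree.test("a".repeat(50))); // true
console.log(exactlyFive.test("aaaa"));          // false
console.log(exactlyFive.test("aaaaa"));         // true

// Long forms of the shorthands +, * and ?
const plusLong = /^a{1,}$/;
const starLong = /^a{0,}$/;
const optionalLong = /^a{0,1}$/;
console.log(plusLong.test(""));       // false: + needs at least one
console.log(starLong.test(""));       // true: * accepts zero
console.log(optionalLong.test("aa")); // false: ? accepts at most one

// JavaScript (no u flag) does not treat {,10} as a quantifier.
const noLowerBound = /^a{,10}$/;
console.log(noLowerBound.test("aaa"));    // false
console.log(noLowerBound.test("a{,10}")); // true: matched literally
```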

+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.

A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.

Consider the following string to match:
ab
The pattern (ab.*) matches, and the capture group contains ab.
The pattern (ab.+) does not match at all, because .+ needs at least one more character after ab.
But if you change the string to the following, (ab.+) matches and returns aba:
aba
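The two patterns can be compared directly. This is a small JavaScript sketch of the example above, using String.prototype.match; the variable names are illustrative.

```javascript
// (ab.*) vs (ab.+) on the strings "ab" and "aba".
const starPattern = /(ab.*)/;
const plusPattern = /(ab.+)/;

const starOnAb = "ab".match(starPattern);  // matches; group 1 is "ab"
const plusOnAb = "ab".match(plusPattern);  // null: nothing left for .+
const plusOnAba = "aba".match(plusPattern); // matches; group 1 is "aba"

console.log(starOnAb[1]);  // ab
console.log(plusOnAb);     // null
console.log(plusOnAba[1]); // aba
```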

+ requires at least one occurrence; * allows zero as well.

A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.

I think the previous answers fail to highlight a simple example.
Say we have an array:
numbers = [5, 15]
The regex ^[0-9]+ matches both 5 and 15, because each contains at least one digit.
^[0-9]* matches both as well, but it also matches an empty string, or a string that starts with a non-digit, as a zero-length match. The difference is that the + operator requires at least one occurrence of the preceding expression.
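A quick check of how the two patterns behave, sketched in JavaScript. The inputs are written as strings, and the empty string and "abc" are added here as assumptions to show where + and * actually differ.

```javascript
// ^[0-9]+ vs ^[0-9]* on a few inputs.
const inputs = ["5", "15", "", "abc"];

const plusResults = inputs.map((s) => /^[0-9]+/.test(s));
const starResults = inputs.map((s) => /^[0-9]*/.test(s));

console.log(plusResults); // [ true, true, false, false ]
console.log(starResults); // [ true, true, true, true ]

// With *, the match on "abc" is the empty string at position 0.
const emptyMatch = "abc".match(/^[0-9]*/)[0];
console.log(JSON.stringify(emptyMatch)); // ""
```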

Related

Regular expression, plus vs asterisk [duplicate]

This question already has answers here:
Why does String.match( / \d*/ ) return an empty string?
(4 answers)
Regex plus vs star difference? [duplicate]
(9 answers)
Closed 4 years ago.
I have a String with a number in it:
dfdf00023546546
I want to get only the number:
(0*)(\d+) works
(0*)(\d*) doesn't work
(0*)(\d*$) works
If plus means 1 or more and asterisk means 0 or more, isn't * supposed to catch more than +? Why does adding the $ sign make it work?
Thanks
Your problem is the g (global) mode, which is probably not set. If you set it, you will see that the expected substring is matched.
(0*)(\d*) matches, but in g mode it returns more than one match, because both patterns are *-quantified and therefore allow zero-length matches.
The + quantifier requires at least one occurrence of the preceding token, so something must actually be there. As a result, (0*)(\d+) does not return zero-length matches.
Your third try, (0*)(\d*$), behaves like the + version because a zero-length match cannot occur before the run of digits that ends the input string. With this regex, however, there is still a zero-length match at the very end when g mode is on.
This might be hard to understand, but your regexes behave as follows:
(0*)(\d+) returns a single match, 00023546546.
(0*)(\d*$) returns 2 matches: 00023546546 and an {empty} match at the end of the string. The second match occurs because zero or more occurrences of 0 can be {empty}, zero or more digits can again be {empty}, and the end-of-string check succeeds there.
(0*)(\d*), on the other hand, matches at 6 different positions, because according to your regex a match can be {empty}: four empty matches before each of the letters, one non-empty match that returns your number, and one empty match at the end of the string.
Please remember that a regex not only matches characters but can also produce 0-length matches.
(0*)(\d*) in fact works, it's just that it matches the stuff you want plus some empty matches:
[ '', '', '', '', '00023546546', '' ]
See those 0-length matches?
Now I'll explain why those 0-length matches are there. Your regex says that there should be 0 or more 0s, followed by 0 or more digits. This means that it can match 0 0s and 0 digits, doesn't it? So the space between every character is matched because that "substring" has exactly 0 0s and 0 digits!
By the way (0*)(\d*$) will only work if the match is at the end of the string.
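The three patterns can be run side by side to see the zero-length matches. This is a sketch in JavaScript using String.prototype.match with the g flag; the variable names are illustrative.

```javascript
// Zero-length matches on the string from the question.
const s = "dfdf00023546546";

const starStar = s.match(/(0*)(\d*)/g);  // empty matches before each letter and at the end
const starPlus = s.match(/(0*)(\d+)/g);  // + rules out empty matches
const anchored = s.match(/(0*)(\d*$)/g); // $ rules out empty matches except at the end

console.log(starStar); // [ '', '', '', '', '00023546546', '' ]
console.log(starPlus); // [ '00023546546' ]
console.log(anchored); // [ '00023546546', '' ]

// Without g, only the first match is returned, which is the empty one at index 0.
const firstOnly = s.match(/(0*)(\d*)/)[0];
console.log(JSON.stringify(firstOnly)); // ""

// Dropping the empty strings leaves just the number.
const nonEmpty = starStar.filter((m) => m !== "");
console.log(nonEmpty); // [ '00023546546' ]
```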

Regex- match third character [duplicate]

This question already has answers here:
How to match all characters after nth character with regex in JavaScript?
(2 answers)
Closed 5 years ago.
For example, this is my string "RL5XYZ" and I want to check the third character is it 5 or some other number.
I would like to do this with the Regex, without substring.
If you're trying to check whether the third character of a string is a number, you can use the following regex:
/^..[0-9]/
^ Means the match must occur at the start of the string
. Means match any character (we do this twice)
[0-9] Means match a number character in the range 0-9. You can actually adjust this to be a different range.
You can also condense the . using the following notation
/^.{2}[0-9]/
The number in braces basically means repeat the previous operator twice.
You can also rewrite the character set [0-9] as \d.
/^.{2}\d/
To match in JS, simply call exec against the pattern you've created:
/^.{2}\d/.exec('aa3') // => ["aa3", index: 0, input: "aa3"]
/^.{2}\d/.exec('aaa') // => null
If it's always going to be checking for two characters followed by a 5, which is then followed by something else, you could simply check
/^..5/
If you want to get the third character (assuming it's always a digit), you could use
/^..(\d)/
You'll get results back from the regex like this:
Match 1
Full match 0-3 `RL5`
Group 1. 2-3 `5`
If you want to check if the third character is a digit you can use
.{2}\d.*
But . matches everything so maybe you prefer:
\w{2}\d\w*
\w{2} means any of this [a-zA-Z0-9_] two times.
\d means any digit
\w* means any of [a-zA-Z0-9_] zero or multiple times
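The difference between . and \w, and the effect of anchoring, can be checked with a few test strings. This is a sketch in JavaScript; the ^ and $ anchors and the extra sample strings are assumptions added here, since the patterns above are written without them.

```javascript
// Third-character checks: any two characters vs. two word characters.
const anyTwoThenDigit = /^.{2}\d/;
const wordTwoThenDigit = /^\w{2}\d\w*$/;

console.log(anyTwoThenDigit.test("RL5XYZ"));  // true
console.log(anyTwoThenDigit.test("RLXYZ5"));  // false: third character is X
console.log(anyTwoThenDigit.test("--5XYZ"));  // true: . accepts the dashes
console.log(wordTwoThenDigit.test("--5XYZ")); // false: - is not a word character
console.log(wordTwoThenDigit.test("RL5XYZ")); // true

// Without ^ the pattern is free to match later in the string,
// so it no longer says anything about the third character.
const unanchored = /.{2}\d.*/;
console.log(unanchored.test("RLXYZ5")); // true: matches "YZ5"
```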
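var input = 'RL5XYZ';
var position = 3; // 1-based position of the character to check
var match = '5';  // the character expected at that position
position--;       // number of characters to skip before it
// [\s\S] matches any character; the backslashes are doubled because this is a string literal
var r = new RegExp('^[\\s\\S]{' + position + '}' + match);
console.log(input.match(r)); // the match array, or null if the character differs
position is the position to check, and match is what to find there.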

JavaScript RegExp to match a (partial) hour

I want to allow people to enter times into a textbox in various formats. One of the formats would be either:
2h for 2 hours, or
2.5h for 2 and a half hours
I want to use a regex to recognise the pattern but it's not picking it up for some reason:
I have:
var hourRegex = /^\d{1,2}[\.\d+]?[h|H]$/;
which works for 2h, but not for 2.5h.
I thought that this regex would mean - Start at the beginning of the string, have one or two digits, then have none or one decimal points which if present must be followed by one or more digits then have a h or a H and then it must be the end of the string.
I have tried the regex tool here but no luck.
/^\d{1,2}(?:\.\d+)?h$/i; Use parentheses instead of square braces.
Start at the beginning
One or two digits
Optional: a dot followed by at least one digit
End with a h
Case insensitive
RegExp tutorial
[...] - square braces mean: anything which is within the provided range.
[^...] means: Match a character which is not within the provided range
(...) - parentheses mean: Group me. Optionally, the first characters of a group can start with:
?: - Don't reference me (me, I = group)
?= - Don't include me in the match, though I have to be here
?! - I may not show up at this point
{a,b}, {a,} means: at least a, at most b repetitions. Omitting b = Infinity
+ means: at least one time, match as much as possible, equivalent to {1,}
* means: match as much as possible, equivalent to {0,}
+? and *? have the same effect as previously described, with one difference: match as little as possible
Examples
[a-z] One character, any character between a, b, c, ..., z
(a-z) Match "a-z", and group it
[^0-9] Match any non-number character
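The group prefixes and the class-versus-group distinction from the tutorial can be tried out quickly. This is a sketch in JavaScript; the sample strings and variable names are made up for illustration.

```javascript
// (?:...) groups without capturing: only one capture group is reported.
const nonCapturing = "2.5h".match(/^(\d+)(?:\.\d+)?h$/);
console.log(nonCapturing.length); // 2 (full match plus one group)
console.log(nonCapturing[1]);     // 2

// (?=h) requires an h to follow but leaves it out of the match.
const lookahead = "12h".match(/\d+(?=h)/)[0];
console.log(lookahead); // 12

// (?!h) forbids an h at that point, so the match backs off to "1".
const negativeLookahead = "12h".match(/\d+(?!h)/)[0];
console.log(negativeLookahead); // 1

// [a-z] is a range of characters; (a-z) is the literal text "a-z".
const classVsGroup = [/^[a-z]$/.test("q"), /^(a-z)$/.test("q"), /^(a-z)$/.test("a-z")];
console.log(classVsGroup); // [ true, false, true ]

// Greedy + vs lazy +?
const greedyVsLazy = ["<a><b>".match(/<.+>/)[0], "<a><b>".match(/<.+?>/)[0]];
console.log(greedyVsLazy); // [ '<a><b>', '<a>' ]
```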
See also
MDN: Regular Expressions - A more detailed guide
The trouble is here:
[\.\d+]
A character class matches a single character, and inside brackets + is a literal plus sign rather than a quantifier.
Use this instead:
(\.[0-9]+)?
You've confused your square brackets with your parentheses. Square brackets match a single character from the enclosed set, whereas parentheses group the entire enclosed pattern.
Your issue lies in [\.\d+]?, which looks for a single ., a digit 0-9, or a +.
Instead you should try:
/^\d{1,2}(\.\d+)?(h|H)$/
Although that will still allow users to enter invalid numbers, such as 99.3 which is probably not the expected behavior.
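To see what the character-class version actually accepts, both patterns can be run over a handful of inputs. This is a sketch in JavaScript; the sample strings and the parseHours helper are hypothetical, not part of the question.

```javascript
// The pattern from the question vs. the grouped version from the answers.
const original = /^\d{1,2}[\.\d+]?[h|H]$/;
const fixed = /^\d{1,2}(?:\.\d+)?h$/i;

const samples = ["2h", "2.5h", "2H", "2+h", "2|", "2.h"];
const originalResults = samples.map((s) => original.test(s));
const fixedResults = samples.map((s) => fixed.test(s));

console.log(originalResults); // [ true, false, true, true, true, true ]
console.log(fixedResults);    // [ true, true, true, false, false, false ]

// Hypothetical helper: return the number of hours, or null for invalid input.
function parseHours(input) {
  return fixed.test(input) ? parseFloat(input) : null;
}

console.log(parseHours("2.5h")); // 2.5
console.log(parseHours("2+h"));  // null
```

The original accepts "2+h" and "2|" because + and | are ordinary characters inside square brackets.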

Difference between two regex pattern in javascript

In my application I am using the regex below for pattern matching.
Original pattern:
/(\w+\.){2,}/ig
The pattern above is stored in an array. Since it has a comma ( , ) after the 2, it creates a problem in some environments.
As we know, in regex:
{n} - matches n times
{n, m} - matches at least n times, but not more than m times
So I removed the comma after the 2, because in the pattern above no value follows the comma.
Pattern after removing the comma:
/(\w+\.){2}/ig
With this change I resolved the environment problem I was facing earlier.
So I just wanted to know whether removing the comma after the 2 creates any problem while matching in this case.
{2} means match if it appears exactly 2 times, and {2,} means 2 times or above. Depending on the usage, this may or may not matter.
For example, if you want to validate whether the string contains 2 or more \w+\., then the comma doesn't matter. However, if you want to replace those 2 or more \w+\. with something else, the comma will affect the result.
'foo.bar.baz.'.replace(/(\w+\.){2}/ig, '~') == '~baz.'
'foo.bar.baz.'.replace(/(\w+\.){2,}/ig, '~') == '~'
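The same point can be made with test and match. This is a sketch in JavaScript on the sample string above; the g flag is left off here because test keeps state in lastIndex when g is set, and the anchored patterns at the end are an assumption added to show a validation case where the comma does matter.

```javascript
// {2} vs {2,} on a string with three "word." segments.
const sample = "foo.bar.baz.";
const exactlyTwo = /(\w+\.){2}/i;
const twoOrMore = /(\w+\.){2,}/i;

// For an unanchored "does it contain at least two?" check, both agree.
console.log(exactlyTwo.test(sample)); // true
console.log(twoOrMore.test(sample));  // true

// The matched text differs, which is why replace behaves differently.
const matchedByTwo = sample.match(exactlyTwo)[0];
const matchedByTwoOrMore = sample.match(twoOrMore)[0];
console.log(matchedByTwo);       // foo.bar.
console.log(matchedByTwoOrMore); // foo.bar.baz.

// Once the pattern is anchored to the whole string, the results diverge.
const anchoredTwo = /^(\w+\.){2}$/i.test(sample);
const anchoredTwoOrMore = /^(\w+\.){2,}$/i.test(sample);
console.log(anchoredTwo);       // false
console.log(anchoredTwoOrMore); // true
```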
{2,} means two or more. There is no max limit.
With this, {0,} is the same as *, and {1,} is the same as +
To summarize:
{n} match n times
{n,m} match at least n times, but not more than m times
{n,} match at least n times
Refer this for details
