Regular expression, plus vs asterisk [duplicate] - javascript

This question already has answers here:
Why does String.match( / \d*/ ) return an empty string?
(4 answers)
Regex plus vs star difference? [duplicate]
(9 answers)
Closed 4 years ago.
I have a String with a number in it:
dfdf00023546546
I want to get only the number:
(0*)(\d+) works
(0*)(\d*) doesn't work
(0*)(\d*$) works
If plus means 1 or more and asterisk means 0 or more, isn't * supposed to catch more than +? Why does adding the $ sign make it work?
Thanks

Your problem is with the g (global) flag, which is probably not set. Without it only the first match is returned, and for (0*)(\d*) that is a zero-length match at the very start of the string. If you set the global flag you will see that the expected substring is matched.
(0*)(\d*) does match, but in g mode it returns more than one match, because both groups are *-quantified and so the pattern also produces zero-length matches.
The + quantifier denotes at least one occurrence of the preceding token, so it looks for something that must exist. Consequently, (0*)(\d+) never returns a zero-length match.
Your third try, (0*)(\d*$), behaves like the + version because a zero-length match cannot occur before the run of digits that reaches the end of the input string. With this regex, however, there is still a zero-length match at the very end when g mode is on.
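To make the effect of a missing g flag concrete, here is a small sketch of what String.prototype.match returns for the three patterns without the flag. The variable names are only illustrative.

```javascript
const input = "dfdf00023546546";

// Without the g flag, match() returns only the first match plus its groups.
const starFirst = input.match(/(0*)(\d*)/);
console.log(starFirst[0] === "", starFirst.index); // zero-length match at index 0

const plusFirst = input.match(/(0*)(\d+)/);
console.log(plusFirst[0], plusFirst[1], plusFirst[2], plusFirst.index);

const anchoredFirst = input.match(/(0*)(\d*$)/);
console.log(anchoredFirst[0], anchoredFirst.index);
```

The * version stops at the first position where "nothing" is an acceptable answer, which is index 0. The other two cannot succeed until they reach the digits.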

This might be hard to understand, but your regexes behave roughly as follows (with the g flag):
(0*)(\d+) will return a single match, 00023546546.
(0*)(\d*$) will return 2 matches: 00023546546 and an {empty} match at the end of the string. The second match is there because the regex asks for zero or more occurrences of 0, which can be {empty}, then zero or more digits 0-9, which again can be {empty}, and then the end of the string.
(0*)(\d*), on the other hand, returns 6 matches: four {empty} matches, one before each of the letters (technically an {empty} match satisfies your regex), one non-empty match containing your number, and one end-of-string match, which is again {empty}.
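The match counts described above can be checked with the g flag set. This is only a sketch, and the variable names are made up for the example.

```javascript
const subject = "dfdf00023546546";

const plusMatches = subject.match(/(0*)(\d+)/g);
const dollarMatches = subject.match(/(0*)(\d*$)/g);
const starMatches = subject.match(/(0*)(\d*)/g);
console.log(plusMatches.length, dollarMatches.length, starMatches.length);

// Dropping the zero-length results leaves just the number.
const nonEmpty = starMatches.filter((m) => m.length > 0);
console.log(nonEmpty);
```

Filtering out the empty strings is one way to keep the * version, although using + avoids producing them in the first place.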

Please remember that regex will not only match characters, but also produce 0-length matches.
(0*)(\d*) in fact works, it's just that it matches the stuff you want plus some empty matches:
[ '', '', '', '', '00023546546', '' ]
See those 0-length matches?
Now I'll explain why those 0-length matches are there. Your regex says that there should be 0 or more 0s, followed by 0 or more digits. This means that it can match 0 0s and 0 digits, doesn't it? So the position before each letter, and the position at the end of the string, is matched because that "substring" has exactly 0 0s and 0 digits!
By the way (0*)(\d*$) will only work if the match is at the end of the string.

Related

Regex: not providing length of certain string places [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 1 year ago.
What is the difference between:
(.+?)
and
(.*?)
when I use it in my php preg_match regex?
They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
By default a quantifier is greedy, meaning it matches as many characters as possible.
The ? after a quantifier makes that quantifier "ungreedy" (lazy), meaning it will match as few characters as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
You can test your regexes online, e.g. on Regexr.
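The same greedy/ungreedy behaviour can be reproduced in JavaScript, where match with the g flag plays the role of preg_match_all. The second half is a sketch of the (.+?) versus (.*?) difference from the question, using a made-up "xy" input.

```javascript
const greedy = "abab".match(/a.*b/g);
const lazy = "abab".match(/a.*?b/g);
console.log(greedy, lazy);

// (.*?) may capture nothing at all, while (.+?) needs at least one character.
const starLazy = "xy".match(/x(.*?)y/);
const plusLazy = "xy".match(/x(.+?)y/);
console.log(starLazy[1] === "", plusLazy);
```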
The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).
In regex, {i,f} means "between i and f repetitions". Let's take a look at the following examples:
{3,7} means between 3 and 7 repetitions
{,10} means up to 10 repetitions with no lower limit (i.e. the lower limit is 0). Not every flavor accepts an omitted lower bound: JavaScript and older PCRE versions treat {,10} as literal text, so {0,10} is the portable spelling
{3,} means at least 3 repetitions with no upper limit (i.e. the upper limit is infinity)
{,} means no upper or lower limit on the number of repetitions (i.e. the lower limit is 0 and the upper limit is infinity); the portable spelling is {0,}
{5} means exactly 5
Most good languages contain abbreviations, and so does regex:
+ is the shorthand for {1,}
* is the shorthand for {0,}
? is the shorthand for {0,1}
This means + requires at least 1 repetition, * accepts any number of repetitions or none at all, and ? accepts at most 1 repetition.
Credit: Codecademy.com
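The brace quantifiers and their shorthands can be compared directly. The sketch below uses JavaScript and a made-up results object; the last entry shows that in JavaScript (without the u flag) a brace expression with the lower bound omitted is read as literal text.

```javascript
const results = {
  threeToSeven: [2, 3, 7, 8].map((n) => /^a{3,7}$/.test("a".repeat(n))),
  atLeastThree: [2, 3, 50].map((n) => /^a{3,}$/.test("a".repeat(n))),
  exactlyFive: [4, 5, 6].map((n) => /^a{5}$/.test("a".repeat(n))),
  // Each shorthand is compared with its brace form on the same inputs.
  plusVsBraces: ["", "a", "aa"].map((s) => /^a+$/.test(s) === /^a{1,}$/.test(s)),
  starVsBraces: ["", "a", "aa"].map((s) => /^a*$/.test(s) === /^a{0,}$/.test(s)),
  optionalVsBraces: ["", "a", "aa"].map((s) => /^a?$/.test(s) === /^a{0,1}$/.test(s)),
  // With the lower bound omitted, the braces are matched as literal characters.
  literalBraces: /^a{,10}$/.test("a{,10}"),
};
console.log(results);
```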
+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.
A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.
Consider the following string to match:
ab
The pattern (ab.*) will match, and the capture group will contain ab.
The pattern (ab.+) will not match and returns nothing, because there is no character after ab.
But if you change the string to the following, (ab.+) will return aba:
aba
+ requires at least one; * can be zero as well.
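The ab / aba example above translates directly to JavaScript. A minimal sketch:

```javascript
const starOnAb = "ab".match(/(ab.*)/);
const plusOnAb = "ab".match(/(ab.+)/);
const plusOnAba = "aba".match(/(ab.+)/);

// A failed match is reported as null rather than an empty result.
console.log(starOnAb[1], plusOnAb, plusOnAba[1]);
```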
A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.
I think the previous answers fail to highlight a simple example.
For example, we have an array:
numbers = [5, 15]
Both ^[0-9]+ and ^[0-9]* match 5 and 15.
The difference only shows on input with no leading digit, such as an empty string or abc: ^[0-9]* still matches there (with an empty match), while ^[0-9]+ does not, because the + operator requires at least one occurrence of the preceding expression.
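A quick check of the two patterns is sketched below. The empty string and "abc" are inputs added here for illustration; note that + requires at least one occurrence of the preceding expression, so a single digit such as 5 satisfies it.

```javascript
const numbers = ["5", "15", "", "abc"];

const withPlus = numbers.map((s) => /^[0-9]+/.test(s));
const withStar = numbers.map((s) => /^[0-9]*/.test(s));
console.log(withPlus, withStar);
```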

Regex- match third character [duplicate]

This question already has answers here:
How to match all characters after nth character with regex in JavaScript?
(2 answers)
Closed 5 years ago.
For example, this is my string "RL5XYZ", and I want to check whether the third character is 5 or some other number.
I would like to do this with a regex, without substring.
If you're trying to check whether the third character of a string is a number, you can use the following regex:
/^..[0-9]/
^ Means the match must occur at the start of the string
. Means match any character (we do this twice)
[0-9] Means match a number character in the range 0-9. You can actually adjust this to be a different range.
You can also condense the . using the following notation
/^.{2}[0-9]/
The number in braces means "repeat the previous token exactly that many times", here twice.
You can also rewrite the character set [0-9] as \d.
/^.{2}\d/
To match in JS, simply call exec against the pattern you've created:
/^.{2}\d/.exec('aa3') // => ["aa3", index: 0, input: "aa3"]
/^.{2}\d/.exec('aaa') // => null
If it is always going to be two characters followed by a 5, which is then followed by something else, you could simply check
/^..5/
If you want to get the third character (assuming it is always a digit), you could use
/^..(\d)/
Note the ^ anchor and the absence of *. An unanchored /..(\d)*/ with the g flag makes the digit optional and matches anywhere, so on RL5XYZ it returns:
Match 1
Full match 0-3 `RL5`
Group 1. 2-3 `5`
Match 2
Full match 3-5 `XY`
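The second match in that output (XY, with no digit in it) shows that an unanchored pattern with an optional digit does not really test the third character. A sketch comparing it with an anchored pattern that requires the digit; thirdIsFive is a hypothetical helper name:

```javascript
const sample = "RL5XYZ";

// Unanchored, digit optional: matches anywhere, with or without a digit.
const loose = [...sample.matchAll(/..(\d)*/g)].map((m) => m[0]);

// Anchored, digit required: only the first three characters are examined.
const strict = /^..(\d)/.exec(sample);
const thirdIsFive = (s) => /^..5/.test(s);

console.log(loose, strict[1], thirdIsFive("RL5XYZ"), thirdIsFive("RL6XYZ"));
```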
If you want to check whether the third character is a digit, you can use
^.{2}\d.*
But . matches everything, so maybe you prefer:
^\w{2}\d\w*
^ anchors the match at the start of the string; without it the pattern would match two characters and a digit anywhere
\w{2} means any of [a-zA-Z0-9_] two times
\d means any digit
\w* means any of [a-zA-Z0-9_] zero or more times
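var input = 'RL5XYZ';
var position = 3; // 1-based position of the character to check
var match = '5';  // what to find at that position
position--;       // number of characters to skip before it
// [\s\S] matches any character, including newlines.
// The backslashes are doubled because the pattern is built from a string literal.
var r = new RegExp('^[\\s\\S]{' + position + '}' + match);
console.log(input.match(r));
position is where to look, and match is what to find there.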

Regular expression match 0 or exact number of characters

I want to match an input string in JavaScript with 0 or 2 consecutive dashes, not 1, i.e. not range.
If the string is:
-g:"apple" AND --projectName:"grape": it should match --projectName:"grape".
-g:"apple" AND projectName:"grape": it should match projectName:"grape".
-g:"apple" AND -projectName:"grape": it should not match, i.e. return null.
--projectName:"grape": it should match --projectName:"grape".
projectName:"grape": it should match projectName:"grape".
-projectName:"grape": it should not match, i.e. return null.
To simplify this question considering this example, the RE should match the preceding 0 or 2 dashes and whatever comes next. I will figure out the rest. The question still comes down to matching 0 or 2 dashes.
Using -{0,2} matches 0, 1, 2 dashes.
Using -{2,} matches 2 or more dashes.
Using -{2} matches only 2 dashes.
How to match 0 or 2 occurrences?
Answer
If you split your "word-like" patterns on spaces, you can use this regex and your wanted value will be in the first capturing group:
(?:^|\s)((?:--)?[^\s-]+)
\s is any whitespace character (tab, whitespace, newline...)
[^\s-] is anything except a whitespace-like character or a -
Once again, the problem is anchoring the regex so that the relevant part isn't completely optional: here the anchor ^ or a mandatory whitespace \s plays this role.
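A sketch of how this pattern behaves on the question's inputs, assuming the g flag and matchAll are used to collect the first capturing group of every match. The helper name zeroOrTwoDashTokens is hypothetical.

```javascript
// Hypothetical helper: returns group 1 of every match in the input.
function zeroOrTwoDashTokens(input) {
  const re = /(?:^|\s)((?:--)?[^\s-]+)/g;
  return [...input.matchAll(re)].map((m) => m[1]);
}

console.log(zeroOrTwoDashTokens('-g:"apple" AND --projectName:"grape"'));
console.log(zeroOrTwoDashTokens('-g:"apple" AND -projectName:"grape"'));
console.log(zeroOrTwoDashTokens('-projectName:"grape"'));
```

Tokens that start with a single dash are skipped, but plain words such as AND are returned too, so the caller still has to pick out the token it is interested in.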
What we want to do
Basically you want to check if your expression (two dashes) is there or not, so you can use the ? operator:
(?:--)?
"Either two or none", (?:...) is a non capturing group.
Avoiding confusion
You want to match "zero or two dashes", so if this is your entire regex it will always find a match: in an empty string, in --, in -, in foobar... What is matched in these strings may be an empty string, but the regex will still return a match.
This is a common source of misunderstanding, so bear in mind the rule that if everything in your regex is optional, it will always find a match.
If you want to only return a match if your entire string is made of zero or two dashes, you need to anchor the regex:
^(?:--)?$
^ and $ match the beginning and the end of the string, respectively.
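The "everything optional always matches" rule is easy to verify. A minimal sketch comparing the bare optional group with the anchored version on a few sample strings chosen for illustration:

```javascript
const optionalOnly = /(?:--)?/;
const anchored = /^(?:--)?$/;
const samples = ["", "-", "--", "---", "foobar"];

const unanchoredResults = samples.map((s) => optionalOnly.test(s));
const anchoredResults = samples.map((s) => anchored.test(s));
console.log(unanchoredResults, anchoredResults);
```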
a(-{2})?(?!-)
This is using "a" as an example. This will match a followed by an optional 2 dashes.
Edit:
According to your example, this should work
(?<!-)(-{2})?projectName:"[a-zA-Z]*"
Edit 2:
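JavaScript engines older than ES2018 do not support lookbehinds.
In that case, try this, which requires either the start of the string or a non-dash character before the optional dashes (that preceding character becomes part of the match):
(?:^|[^-])(-{2})?projectName:"[a-zA-Z]*"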
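In engines that do support lookbehind (ES2018 and later), the lookbehind version can be checked against the question's examples. This is a sketch; found is a hypothetical helper that returns the matched text or null.

```javascript
const re = /(?<!-)(-{2})?projectName:"[a-zA-Z]*"/;

// Hypothetical helper: matched text, or null when there is no match.
const found = (s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(found('-g:"apple" AND --projectName:"grape"'));
console.log(found('projectName:"grape"'));
console.log(found('-projectName:"grape"'));
```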

Possible to make regular expression with sub-query

I am trying to write a regular expression for a bit of javascript code that takes a user's input of a mobile number and in one regular expression, performs the following checks:
Starts with 07
Contains only numbers, whitespace or dashes
Contains exactly 11 numbers
Is this possible to do in just one regular expression and if so, how please?
I don't think it is possible with one regex, but it is possible by testing for two conditions:
if(/^07[\d\- ]+$/.test(str) && str.replace(/[^\d]/g, "").length === 11) {
//string matches conditions
}
Explanation of the regex:
^: Anchor that means "match start of string".
07: Match the string 07. Together with the above, it means that the string must start with 07.
[: Beginning of a character class i.e., a set of characters that we want to allow
\d: Match a digit (equivalent to 0-9).
\-: Match a literal dash (the backslash keeps it from being read as a range).
" ": Match a literal space (markdown doesn't let me show a single space as code)
]: End of character class.
+: One or more of the previous.
$: Anchor that means "match end of string". Together with the ^, this basically means that this regex must apply to the entire string.
So here we check to see that the string matches the general format (starts with 07 and contains only digits, dashes or spaces) and we also make sure that we have 11 numbers in total inside the string. We do this by getting rid of anything that is not a digit and then checking that the length of the remaining string is equal to 11.
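Wrapped in a function, the two-condition check could look like the sketch below. The name isValidMobile and the sample numbers are made up for illustration.

```javascript
// Hypothetical wrapper around the two checks described above.
function isValidMobile(str) {
  // Format: starts with 07, then only digits, dashes or spaces.
  const formatOk = /^07[\d\- ]+$/.test(str);
  // Count: exactly 11 digits once everything else is stripped.
  const digitCount = str.replace(/[^\d]/g, "").length;
  return formatOk && digitCount === 11;
}

console.log(isValidMobile("07 123-456 789"));
console.log(isValidMobile("0712345678"));
```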
Since @Vivin throws out the challenge:
/^07([-\s]*\d){9}[-\s]*$/
^07 : begin with digits 07
( : start group
[-\s]* : any number of - or whitespace
\d : exactly one digit
){9} : exactly 9 copies of this group (11 digits including 07)
[-\s]* : optional trailing spaces or -
$ : end of string
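The single-regex version can be exercised on a few inputs. This is a sketch, and the candidate numbers are invented for the example.

```javascript
const singleRegex = /^07([-\s]*\d){9}[-\s]*$/;

const candidates = [
  "07123456789",    // 11 digits, no separators
  "07 123-456 789", // separators between digits
  "07-1234 56789 ", // trailing separator
  "0712345678",     // only 10 digits
  "071234567890",   // 12 digits
  "07123x456789",   // contains a letter
];
const verdicts = candidates.map((s) => singleRegex.test(s));
console.log(verdicts);
```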
Of course a more useful way might be as follows
if ((telNo = telNo.replace (/[-\s]+/g, '')).match (/^07\d{9}$/)) {
....
}
which has the advantage (?) of leaving just the digits in telNo
Thank you all for trying, but after a good while trying different ideas, I finally found a working "single" regular expression:
^07((?:\s|-)*\d(?:\s|-)*){9}$
This makes sure that it starts with 07, contains only digits, whitespace or dashes, and that exactly 11 of them (9 plus the first 2) are digits. The ^ and $ anchors are needed for the "only" and "exactly" parts; without them the pattern also matches inside a longer string.
Sorry to have wasted your time.
Explanation:
() - include in capture
(?:) - do not include in capture
\s - whitespace
| - or
- - dash
* - zero or more
\d - digits only
{9} - exactly nine repetitions of the preceding group
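One thing worth checking with this pattern is anchoring. The sketch below compares it with and without ^ and $ on two made-up inputs that should be rejected.

```javascript
const unanchored = /07((?:\s|-)*\d(?:\s|-)*){9}/;
const anchoredVersion = /^07((?:\s|-)*\d(?:\s|-)*){9}$/;

const tooLong = "0712345678901234";
const withLetters = "call 07123 456 789 now";

// Without anchors, a valid-looking run inside a longer string is accepted.
console.log(unanchored.test(tooLong), unanchored.test(withLetters));
// With anchors, the whole string has to fit the pattern.
console.log(anchoredVersion.test(tooLong), anchoredVersion.test(withLetters));
console.log(anchoredVersion.test("07123 456 789"));
```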
