How to use regex lookahead and match the previous string / character class


I'm trying to use a negative lookahead to match a number only if it isn't followed by a % sign.
\d+(?!%)
8989%
// matches 898, but not the 9%
I'd like it to match 8989 as a whole.
Also, is it possible to use a negative lookahead after a whole character class or a more complex regex?
[\d.+](?!%)
\d+(\.\d{1,2})?(?!%)
The latter would match decimals not followed by a %.

The \d+(?!%) pattern matches one or more digits, and at first grabs 8989 in 8989%. The (?!%) negative lookahead then fails that match, and the engine, seeing the + quantifier, starts backtracking. It discards the last 9 from the match buffer and retries the (?!%) lookahead, which succeeds because 898 is not followed by a % symbol.
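The backtracking can be reproduced with a minimal sketch like the one below (Node.js or a browser console; the helper name firstMatch is only for illustration):

```javascript
// Illustrative helper: returns the first match of `re` in `s`, or null.
function firstMatch(re, s) {
  const m = s.match(re);
  return m ? m[0] : null;
}

// \d+ first grabs "8989", but (?!%) fails because "%" follows.
// The engine backtracks one digit; "898" is followed by "9", so (?!%) succeeds.
console.log(firstMatch(/\d+(?!%)/, "8989%")); // 898
```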
You may use
/\d+(?![\d%])/g
The (?![\d%]) negative lookahead fails the match if the digits are followed by another digit or a % char, so the engine can no longer settle for a partial match of a number that is followed by a % symbol.
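A small sketch of the suggested pattern used with the g flag (the function name digitRunsWithoutPercent and the sample strings are hypothetical):

```javascript
// Returns every run of digits that is followed neither by another digit
// (so no partial matches) nor by a "%" sign.
function digitRunsWithoutPercent(s) {
  return s.match(/\d+(?![\d%])/g) || [];
}

console.log(JSON.stringify(digitRunsWithoutPercent("8989% and 1234"))); // ["1234"]
console.log(JSON.stringify(digitRunsWithoutPercent("50% 75%")));        // []
```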

Related

Limit input to 10 characters (numbers) with only 1 dot

I'm having a regex problem with input validation.
The requirement: limit the input to 10 characters (digits), including the dot, and allow only 1 dot.
My current pattern allows up to 10 characters both before and after the dot:
^[0-9]{1,10}\.?[0-9]{0,10}$
Thanks for your support.

You could assert that the string is exactly 10 chars long, each char being either a . or a digit.
Then you can match optional digits, and optionally match a dot and again optional digits:
^(?=[.\d]{10}$)\d*(?:\.\d*)?$
The pattern matches:
^ Start of string
(?=[.\d]{10}$) Positive lookahead, assert exactly 10 chars, each a . or a digit, until the end of the string
\d* Match optional digits
(?:\.\d*)? Optionally match a . and optional digits
$ End of string
If the string should not end with a dot:
^(?=[.\d]{10}$)\d*(?:\.\d+)?$
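Both patterns can be checked side by side with a short sketch; the sample strings are invented for illustration:

```javascript
// Exactly 10 chars, each a digit or a dot, with at most one dot.
const tenChars = /^(?=[.\d]{10}$)\d*(?:\.\d*)?$/;
// Same, but the string must not end with a dot.
const tenCharsNoTrailingDot = /^(?=[.\d]{10}$)\d*(?:\.\d+)?$/;

for (const s of ["1234567890", "12345.6789", "123456789.", "1234.56.78", "12345"]) {
  // Prints the input and whether each pattern accepts it.
  console.log(s, tenChars.test(s), tenCharsNoTrailingDot.test(s));
}
```

Note that the lookahead makes the length exactly 10; the question's "limit" could also be read as "at most 10", which would need {1,10} instead.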

The decimal point throws a wrench into most single pattern approaches. I would probably use an alternation here:
^(?:\d{1,10}|(?=\d*\.)(?!\d*\.\d*\.)[0-9.]{2,11})$
This pattern says to match:
^ from the start of the number
(?:
\d{1,10} a pure 1 to 10 digit integer
| OR
(?=\d*\.) assert that one dot is present
(?!\d*\.\d*\.) assert that ONLY one dot is present
[0-9.]{2,11} match a float of 1 to 10 digits plus the dot (2 to 11 chars)
)
$ end of the number
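A quick sketch that exercises the alternation (sample strings invented). Unlike the lookahead answer above, this one counts only digits toward the limit, so a float may be 11 chars long:

```javascript
// Either a pure integer of 1-10 digits, or a number with exactly one dot
// that is 2-11 chars long (1-10 digits plus the dot).
const upToTenDigits = /^(?:\d{1,10}|(?=\d*\.)(?!\d*\.\d*\.)[0-9.]{2,11})$/;

for (const s of ["1234567890", "12345678901", "123456789.1", "12.", "1.2.3", "."]) {
  console.log(s, upToTenDigits.test(s));
}
```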

You can use a lookahead to achieve your goals.
First, looking at your regex, you've used [0-9] to represent all digit characters. We can shorten this to \d, which means the same thing.
Then, we can focus on the requirement that there be only one dot. We can test for this with the following pattern:
^\d*\.?\d*$
\d* means any number of digit characters
\.? matches one literal dot, optionally
\d* matches any number of digit characters after the dot
$ anchors this to the end of the string, so the match can't just end before the second dot, it actually has to fail if there's a second dot
Now, we don't actually want to consume all the characters involved in this match, because then we wouldn't be able to ensure that there are <=10 characters. Here's where the lookahead comes in: We can use the lookahead to ensure that our pattern above matches, but not actually perform the match. This way we verify that there is only one dot, but we haven't actually consumed any of the input characters yet. A lookahead would look like this:
^(?=\d*\.?\d*$)
Next, we can ensure that there aren't more than 10 characters total. Since we already made sure there are only dots and digits with the above pattern, we can just match up to 10 of any characters for simplicity, like so:
^.{1,10}$
Putting these two patterns together, we get this:
^(?=\d*\.?\d*$).{1,10}$
This will only match number inputs which have 10 or fewer characters and have no more than one dot.
If you would like to ensure that, when there is a dot, there is also a digit accompanying it, we can achieve this by adding another lookahead. The only case that violates this condition is when the input string is just a dot (.), so we can explicitly rule this case out with a negative lookahead like so:
(?!\.$)
Adding this back in to our main expression, we get:
^(?=\d*\.?\d*$)(?!\.$).{1,10}$
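A minimal sketch of the final expression against a few made-up inputs, with each part commented:

```javascript
// (?=\d*\.?\d*$) : only digits and at most one dot, all the way to the end
// (?!\.$)        : the input is not a lone dot
// .{1,10}$       : 1 to 10 characters in total
const re = /^(?=\d*\.?\d*$)(?!\.$).{1,10}$/;

for (const s of [".", "1.", ".5", "1234567890", "123456789.0", "1.2.3", "12a", ""]) {
  console.log(JSON.stringify(s), re.test(s));
}
```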

Using lookahead, how to ensure at least 4 alphanumeric chars are included + underscores

I'm trying to make sure that at least 4 alphanumeric characters are included in the input, and that underscores are also allowed.
The regular-expressions tutorial is a bit over my head because it talks about assertions and success/failure if there is a match.
^\w*(?=[a-zA-Z0-9]{4})$
My understanding:
\w --> alphanumeric + underscore
* --> matches the previous token between zero and unlimited times (so this means it can be any character that is alphanumeric/underscore, correct?)
(?=[a-zA-Z0-9]{4}) --> looks ahead of the previous characters, and if they include at least 4 alphanumeric characters, then I'm good.
Obviously I'm wrong on this, because regex101 is showing me no matches.

You want 4 or more alphanumeric characters, surrounded by any number of underscores (use ^ and $ to ensure it matches the whole input):
^(_*[a-zA-Z0-9]_*){4,}$

Your pattern ^\w*(?=[a-zA-Z0-9]{4})$ does not match because:
^\w* Matches optional word characters from the start of the string, and if there are only word chars it will match until the end of the string
(?=[a-zA-Z0-9]{4}) The positive lookahead is true if it can assert 4 consecutive alphanumeric chars to the right of the current position. The \w* allows backtracking, and can backtrack 4 positions so that the assertion is true.
But the $ asserts the end of the string, which cannot match there, as the position moved 4 steps to the left to fulfill the previous positive lookahead assertion.
Using the lookahead, what you can do is assert 4 alphanumeric chars preceded by optional underscores.
If the assertion is true, match 1 or more word characters.
^(?=(?:_*[a-zA-Z0-9]){4})\w+$
The pattern matches:
^ Start of string
(?= Positive lookahead, assert that what is to the right is
(?:_*[a-zA-Z0-9]){4} Repeat 4 times matching optional _ followed by an alphanumeric char
) Close the lookahead
\w+ Match 1+ word characters (which includes the _)
$ End of string
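The difference between the question's pattern and the lookahead-based fix can be sketched as follows (sample strings invented). The question's pattern can never match, because $ would have to succeed at the same position that the lookahead requires 4 more characters after:

```javascript
// The pattern from the question: never matches any input.
const fromQuestion = /^\w*(?=[a-zA-Z0-9]{4})$/;
// At least 4 alphanumerics (underscores allowed anywhere), only word chars.
const atLeastFourAlnum = /^(?=(?:_*[a-zA-Z0-9]){4})\w+$/;

for (const s of ["abcd", "_a_b_c_d_", "a_b_c", "____", "abc-d"]) {
  console.log(s, fromQuestion.test(s), atLeastFourAlnum.test(s));
}
```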

I suggest using atomic groups (?>...); see a regex tutorial for details:
^(?>_*[a-zA-Z0-9]_*){4,}$
This ensures 4 or more fragments, each of them containing a letter or digit.
Edit: If your regex flavor doesn't support atomic groups (JavaScript's RegExp, for example, does not), just use non-capturing groups:
^(?:_*[A-Za-z0-9]_*){4,}$
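Since JavaScript has no atomic groups, the non-capturing version is the one that runs there; a short sketch with invented inputs:

```javascript
// Four or more fragments, each one alphanumeric char with optional
// underscores on either side; the whole input must consist of such fragments.
const fragments = /^(?:_*[A-Za-z0-9]_*){4,}$/;

for (const s of ["ab_cd", "__a__b__c__d__", "abc", "a_b_c", "____"]) {
  console.log(s, fragments.test(s));
}
```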

positive lookahead

Use lookaheads to match a string that is greater than 5 characters long and has two consecutive digits.
I know the solution should be
/(?=\w{6,})(?=\D*\d{2})/
But why the second element is
(?=\D*\d{2})
Instead of
(?=\d{2})
Please help me to understand this.

Actually, /(?=\w{6,})(?=\D*\d{2})/ does not ensure there will be a match in a string with 2 consecutive digits.
Check this demo:
var reg = /(?=\w{6,})(?=\D*\d{2})/;
console.log(reg.test("Matches are found 12."))
console.log(reg.test("Matches are not found 1 here 12."))
This happens because \D* only matches non-digit chars: at a position where \w{6,} matches, (?=\D*\d{2}) requires the first digits after 0+ non-digit chars to be two consecutive digits, which is not the case in the second string, where the first digit, 1, stands alone.
So, (?=\w{6,})(?=\D*\d{2}) matches a location in the string that is immediately followed with 6 or more word chars and any 0+ non-digit chars followed with 2 digits.
The correct regex to validate if a string contains 6 or more word chars and two consecutive digits anywhere in the string is
var reg = /^(?=.*\w{6,})(?=.*\d{2})/;
Or, to support multiline strings:
var reg = /^(?=[^]*\w{6,})(?=[^]*\d{2})/;
where [^] matches any char. Also, [^] can be replaced with [\s\S] / [\d\D] or [\w\W].
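The two corrected checks can be compared on the demo strings plus a hypothetical third string containing a line break, which shows why [^] matters (. does not match a line break):

```javascript
const singleLine = /^(?=.*\w{6,})(?=.*\d{2})/;
const multiLine = /^(?=[^]*\w{6,})(?=[^]*\d{2})/;

const inputs = [
  "Matches are found 12.",
  "Matches are not found 1 here 12.",
  "Matches are\nnot found 1 here 12.", // invented: digits on the second line
];
for (const s of inputs) {
  console.log(JSON.stringify(s), singleLine.test(s), multiLine.test(s));
}
```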
And to match a string that is greater than 5 characters long (that is, 6 or more) and has two consecutive digits, you may use
var reg = /^(?=.*\d{2}).{6,}$/
or, if the string may contain line breaks:
var reg = /^(?=[\s\S]*\d{2})[\s\S]{6,}$/
where
^ - start of string
(?=[\s\S]*\d{2}) - there must be two consecutive digits after 0+ chars to the right of the current location
[\s\S]{6,} - six or more chars
$ - end of string.
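A sketch of the length-plus-digits check on invented inputs; since "greater than 5 characters" means 6 or more, it uses {6,}:

```javascript
// 6 or more characters (line breaks allowed) and two consecutive digits anywhere.
const sixPlusWithDigitPair = /^(?=[\s\S]*\d{2})[\s\S]{6,}$/;

for (const s of ["abc12d", "ab12c", "a1b2c3d", "line\n12xx"]) {
  console.log(JSON.stringify(s), sixPlusWithDigitPair.test(s));
}
```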

The lookahead has to allow the 2 digits anywhere in the input. If you used just (?=\d{2}), the 2 digits would have to appear right at the position being tested (the start of the input, if the pattern is anchored).
You could also use (?=.*\d{2}). The point is that \d{2} has to be preceded by something that can match the rest of the input before the digits.
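A three-line illustration with an anchored pattern (the sample strings are invented):

```javascript
// Anchored at the start, (?=\d{2}) only looks at the first two characters,
// while (?=.*\d{2}) lets .* skip over whatever precedes the digits.
console.log(/^(?=\d{2})/.test("ab12"));   // false
console.log(/^(?=.*\d{2})/.test("ab12")); // true
console.log(/^(?=\d{2})/.test("12ab"));   // true
```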

JavaScript Regular Expression [duplicate]

I'm trying to write a RegExp to match only 8 digits, with one optional comma maybe hidden in-between the digits.
All of these should match:
12345678
12,45678
123456,8
Right now I have:
^[0-9,]{8}
but of course that erroneously matches 012,,,67
Example:
https://regex101.com/r/dX9aS9/1
I know optionals exist but don't understand how to keep the 8 digit length applying to the comma while also keeping the comma limited to 1.
Any tips would be appreciated, thanks!

To match an 8-char string that can only contain digits and an optional comma in between, you may use
^(?=.{8}$)\d+,?\d+$
The lookahead will require the string to contain 8 chars. ,? will make matching a comma optional, and the + after \d will require at least 1 digit before and after an optional comma.
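A sketch that runs this 8-char pattern against the question's sample strings plus a few invented failing ones:

```javascript
// 8 chars in total: digits with at most one comma, not at either end.
const eightChars = /^(?=.{8}$)\d+,?\d+$/;

for (const s of ["12345678", "12,45678", "123456,8", "012,,,67", "1234567,", ",1234567", "1234567"]) {
  console.log(s, eightChars.test(s));
}
```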
If you need to match a string that has 8 digits and an optional comma, you can use
^(?:(?=.{9}$)\d+,\d+|\d{8})$
The string will have 9 characters if it contains a comma, or just 8 if it contains only digits.
Explanation:
^ - start of string
(?:(?=.{9}$)\d+,\d+|\d{8}) - 2 alternatives:
(?=.{9}$)\d+,\d+ - 1+ digits followed with 1 comma followed with 1+ digits, and the whole string matched should be 9 char long (8 digits and 1 comma)
| - or
\d{8} - 8 digits
$ - end of string
Java code example (note that with String#matches(), the ^ and $ anchors at the start and end of the pattern are redundant and can be omitted, since the pattern is anchored by default when used with this method):
import java.util.Arrays;
import java.util.List;
public class Main { // placeholder class name
    public static void main(String[] args) {
        List<String> strs = Arrays.asList("0123,,678", "0123456", // bad
            "01234,567", "01234567"); // good
        for (String str : strs) // prints false, false, true, true
            System.out.println(str.matches("(?:(?=.{9}$)\\d+,\\d+|\\d{8})"));
    }
}
NOTE FOR LEADING/TRAILING COMMAS:
You just need to replace the + (match 1 or more occurrences) quantifiers with * (match 0 or more occurrences) in the first alternative branch to allow leading/trailing commas:
^(?:(?=.{9}$)\d*,\d*|\d{8})$
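The 8-digit pattern and its leading/trailing-comma variant can be compared in one sketch; the strings mirror the Java example, plus two invented edge cases:

```javascript
// 8 digits plus an optional comma; with a comma the string is 9 chars long.
const eightDigits = /^(?:(?=.{9}$)\d+,\d+|\d{8})$/;
// Variant that also accepts a leading or trailing comma.
const eightDigitsEdgeComma = /^(?:(?=.{9}$)\d*,\d*|\d{8})$/;

for (const s of ["01234567", "01234,567", "0123,,678", "0123456", ",01234567", "01234567,"]) {
  console.log(s, eightDigits.test(s), eightDigitsEdgeComma.test(s));
}
```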

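You can use the following regex if you want to allow a trailing comma (note that, because ,? sits inside the repeated group, it also allows a comma after every digit):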
^((\d,?){8})$
Otherwise, use the following one:
^((\d,?){8})(?<!,)$
(?<!,) is a negative lookbehind.
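A sketch of both patterns (lookbehind needs an ES2018+ engine, such as a current Node.js). The inputs are invented; note that these patterns count 8 digits, so the question's 7-digit sample 12,45678 is rejected, and that a comma may follow any digit:

```javascript
// Eight repetitions of "a digit, optionally followed by a comma".
const allowTrailing = /^((\d,?){8})$/;
// Same, but the last character must not be a comma.
const noTrailing = /^((\d,?){8})(?<!,)$/;

for (const s of ["12345678", "1234,5678", "12345678,", "1,2,3,4,5,6,7,8", "12,45678"]) {
  console.log(s, allowTrailing.test(s), noTrailing.test(s));
}
```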

/^(?!\d{0,6},\d{0,6},\d{0,6})(?=\d[\d,]{6}\d).{8}$/
I think this combination of positive and negative lookaheads does just what's asked. If you remove the ^ and $ anchors and set the g flag, it will also try to match the pattern within digit strings longer than 8 characters.
Please try http://regexr.com/3d63m
Explanation: The negative lookahead (?!\d{0,6},\d{0,6},\d{0,6}) rejects the string if two commas appear with 6 or fewer digits before and between them, while the positive lookahead (?=\d[\d,]{6}\d) requires 6 digit or comma characters between two digits. Finally, .{8} matches the 8 characters.
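The combined lookaheads can be tried on the question's samples and a few invented failing strings:

```javascript
// Negative lookahead: no two commas near each other.
// Positive lookahead: starts and ends with a digit, digits/commas in between.
const re = /^(?!\d{0,6},\d{0,6},\d{0,6})(?=\d[\d,]{6}\d).{8}$/;

for (const s of ["12345678", "12,45678", "123456,8", "012,,,67", ",1234567", "1234567,", "12,4,678"]) {
  console.log(s, re.test(s));
}
```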

Explain this regex js

I'm using this regex to match some strings:
^([^\s](-)?(\d+)?(\.)?(\d+)?)$/
I'm confused about why it's possible to enter two dots, like ..
What I understand is that it only allows 1 dash or none: (-)?
Any number of digits, or none: (\d+)?
One dot or none: (\.)?
So why are .. or even .4.6 allowed?
Testing done in http://www.regextester.com/

[^\s] means anything that is not a whitespace. This includes dots. Trying to match .. will get you:
[^\s] matches .
(-)? doesn't match
(\d+)? doesn't match
(\.)? matches .
(\d+)? doesn't match
I'll assume you wanted to match numbers (possibly negative/floating):
^-?\d+(\.\d+)?$
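A sketch comparing the question's pattern with the suggested one. It assumes the trailing / in the question is the closing delimiter of a JavaScript regex literal; the sample strings are invented:

```javascript
const fromQuestion = /^([^\s](-)?(\d+)?(\.)?(\d+)?)$/;
const signedNumber = /^-?\d+(\.\d+)?$/;

for (const s of ["..", ".4.6", "-3.14", "42", "4."]) {
  console.log(s, fromQuestion.test(s), signedNumber.test(s));
}
```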

^([^\s](-)?(\d+)?(\.)?(\d+)?)$/
Assert position at the beginning of the string ^
Match the regex below and capture its match into backreference number 1 ([^\s](-)?(\d+)?(\.)?(\d+)?)
Match any single character that is NOT present in the list below [^\s]
A whitespace character (space, tab, line break, etc.) \s
Match the regex below and capture its match into backreference number 2 (-)?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match the character “-” literally -
Match the regex below and capture its match into backreference number 3 (\d+)?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match a single character that is a digit (0-9) \d+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the regex below and capture its match into backreference number 4 (\.)?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match the character “.” literally \.
Match the regex below and capture its match into backreference number 5 (\d+)?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match a single character that is a digit (0-9) \d+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Assert position at the very end of the string $
Match the character “/” literally /
Created with RegexBuddy

As I mentioned in my comment, [^\s] is a negated character class that also matches a dot, and as there is another (\.)? pattern, the regex can match 2 consecutive dots (since all of the parts except for [^\s] are optional).
In order not to match strings like .4.5 or .., you just need to add the . to the [^\s] negated character class:
^([^\s.](-)?(\d+)?(\.)?(\d+)?)$
This will not allow a . as the first character.
Alternatively, you can use a lookahead to disallow only a dot as the first character:
^(?!\.)([^\s](-)?(\d+)?(\.)?(\d+)?)$
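Both fixes behave the same on these invented inputs; a quick sketch:

```javascript
// Fix 1: exclude the dot from the first character class.
const noLeadingDotClass = /^([^\s.](-)?(\d+)?(\.)?(\d+)?)$/;
// Fix 2: keep [^\s] but reject a leading dot with a negative lookahead.
const noLeadingDotLookahead = /^(?!\.)([^\s](-)?(\d+)?(\.)?(\d+)?)$/;

for (const s of ["..", ".4.6", "4..", "4.6", "-4.6"]) {
  console.log(s, noLeadingDotClass.test(s), noLeadingDotLookahead.test(s));
}
```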
In order to match the numbers in the format you expect, use:
^(?:[-]?\d+\.?\d*|-)$
Human-readable explanation:
^ - start of string and then there are 2 alternatives...
[-]? - optional hyphen
\d+ - 1 or more digits
\.? - optional dot
\d* - 0 or more digits
| -OR-
- - a hyphen
$ - end of string
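A sketch of the final pattern on a few invented inputs:

```javascript
// Optional hyphen, 1+ digits, optional dot, 0+ digits; or a lone hyphen.
const numberOrHyphen = /^(?:[-]?\d+\.?\d*|-)$/;

for (const s of ["-", "-12.5", "12.", "42", "..", ".4.6", "4.5.6", "--1"]) {
  console.log(s, numberOrHyphen.test(s));
}
```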
