Negative lookahead RegEx limited to an exact number of characters - javascript

How can I limit a negative lookahead RegEx to an exact number of characters?
For example, this sentence should be denied...
This car is not that fast!
while this one should be allowed...
The car you are about to see may not be that fast, but it's very beautiful!
The RegEx should match any sentence that contains the word 'car', except the ones that include the word 'not' within the following 10 characters. This is the case for the first sentence, where there are only 4 characters between the words 'car' and 'not'. So this sentence should be denied.
The second sentence, however, has more than 10 characters in between the 'car' and 'not' words, so it should pass the RegEx negative assertion.
Basically, what I am looking for is a negative lookahead RegEx that is limited to a certain number of characters.

Indeed, you can use negative look-ahead:
.*?car(?!.{0,10}not).*
If "car" and "not" in this rule are supposed to be separate words and not just substrings of any sequence, then add the appropriate \b:
.*?\bcar\b(?!.{0,10}\bnot\b).*
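As a quick check, the sketch below runs the word-boundary pattern against the two sentences from the question with RegExp.prototype.test (the variable names are only illustrative):

```javascript
// "car" must not be followed by the whole word "not" within the next 0-10 characters.
const carWithoutNearbyNot = /^.*?\bcar\b(?!.{0,10}\bnot\b).*$/;

const denied = "This car is not that fast!";
const allowed =
  "The car you are about to see may not be that fast, but it's very beautiful!";

console.log(carWithoutNearbyNot.test(denied));  // false: only " is " (4 chars) separates the words
console.log(carWithoutNearbyNot.test(allowed)); // true: 26 characters separate "car" and "not"
```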

Negative lookahead assertion (https://regex101.com/r/mD9JeR/11):
.*?car(?![\w\s]{0,10}not).*
checks whether 0 to 10 word or whitespace characters ([\w\s]) come before not. If so, it won't match.
Positive lookahead assertion, just as an FYI: https://regex101.com/r/mD9JeR/12
.*?car(?=[\w\s]{10,}not).*
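The two negative patterns differ when punctuation sits between the words: .{0,10} lets any character through, while [\w\s]{0,10} can only cross letters, digits, underscores and whitespace. The comma sentence below is a made-up example, not from the question, and the positive variant is included for comparison. A minimal sketch:

```javascript
const anyGap = /car(?!.{0,10}not)/;                  // any character may sit between the words
const wordOrSpaceGap = /car(?![\w\s]{0,10}not)/;     // only [A-Za-z0-9_] and whitespace may
const notAtLeastTenLater = /car(?=[\w\s]{10,}not)/;  // positive variant: "not" required, 10+ chars later

const withComma = "The car, not that one."; // hypothetical test sentence

console.log(anyGap.test(withComma));         // false: ", " is allowed by .{0,10}
console.log(wordOrSpaceGap.test(withComma)); // true: the comma stops [\w\s]{0,10}

console.log(notAtLeastTenLater.test("This car is fast!")); // false: there is no "not" at all
console.log(
  notAtLeastTenLater.test("The car you are about to see may not be that fast")
); // true
```

Which behaviour is right depends on whether punctuation between "car" and "not" should count toward the 10 characters.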

Related

Limit to 10 characters (numbers) and only 1 dot

I'm having a regex problem when validating input.
The requirement is: limit to 10 characters (numbers), including dots, and only 1 dot is allowed.
My current pattern allows up to 10 characters before and up to 10 characters after the dot:
^[0-9]{1,10}\.?[0-9]{0,10}$
Thanks for your support.
You could assert that the string is exactly 10 characters long, each character being either . or a digit.
Then you can match optional digits, and optionally match a dot and again optional digits:
^(?=[.\d]{10}$)\d*(?:\.\d*)?$
The pattern matches:
^ Start of string
(?=[.\d]{10}$) Positive lookahead, assert 10 chars . or digit till the end of string
\d* Match optional digits
(?:\.\d*)? Optionally match a . and optional digits
$ End of string
If the pattern should not end on a dot:
^(?=[.\d]{10}$)\d*(?:\.\d+)?$
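A small check of both patterns with a few sample inputs (chosen for illustration). Note that the lookahead requires exactly 10 characters, so a shorter input such as 12345 is rejected:

```javascript
// Exactly 10 characters: digits with at most one dot (a trailing dot is allowed).
const tenWithOptionalDot = /^(?=[.\d]{10}$)\d*(?:\.\d*)?$/;
// Same, but a dot must be followed by at least one digit.
const tenNoTrailingDot = /^(?=[.\d]{10}$)\d*(?:\.\d+)?$/;

const samples = ["1234567890", "12345.6789", "123456789.", "1234.56.78", "12345"];
for (const s of samples) {
  console.log(s, tenWithOptionalDot.test(s), tenNoTrailingDot.test(s));
}
// 1234567890 true true
// 12345.6789 true true
// 123456789. true false
// 1234.56.78 false false
// 12345 false false
```

If "up to 10" is what you need, changing {10} to {1,10} inside the lookahead would allow shorter inputs.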
The decimal point throws a wrench into most single pattern approaches. I would probably use an alternation here:
^(?:\d{1,10}|(?=\d*\.)(?!\d*\.\d*\.)[0-9.]{2,11})$
This pattern says to match:
^ from the start of the number
(?:
\d{1,10} a pure 1 to 10 digit integer
| OR
(?=\d*\.) assert that one dot is present
(?!\d*\.\d*\.) assert that ONLY one dot is present
[0-9.]{2,11} match a 1 to 10 digit float
)
$ end of the number
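This alternation reads the limit as 10 digits with the dot not counted, so a float may be up to 11 characters long. A quick sketch with illustrative inputs:

```javascript
// Integer of 1-10 digits, or a string with exactly one dot and 2-11 characters.
const upToTenDigits = /^(?:\d{1,10}|(?=\d*\.)(?!\d*\.\d*\.)[0-9.]{2,11})$/;

const cases = [
  ["1234567890", true],   // plain 10-digit integer
  ["12345678901", false], // 11 digits and no dot
  ["12345.67890", true],  // 10 digits plus one dot (11 characters)
  ["1.2.3", false],       // more than one dot
  [".", false],           // a lone dot is shorter than {2,11}
];

for (const [input, expected] of cases) {
  const actual = upToTenDigits.test(input);
  console.log(input, actual, actual === expected ? "as expected" : "UNEXPECTED");
}
```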
You can use a lookahead to achieve your goals.
First, looking at your regex, you've used [0-9] to represent all digit characters. We can shorten this to \d, which means the same thing.
Then, we can focus on the requirement that there be only one dot. We can test for this with the following pattern:
^\d*\.?\d*$
\d* means any number of digit characters
\.? matches one literal dot, optionally
\d* matches any number of digit characters after the dot
$ anchors this to the end of the string, so the match can't just stop before a second dot; it has to fail if there is one
Now, we don't actually want to consume all the characters involved in this match, because then we wouldn't be able to ensure that there are <=10 characters. Here's where the lookahead comes in: We can use the lookahead to ensure that our pattern above matches, but not actually perform the match. This way we verify that there is only one dot, but we haven't actually consumed any of the input characters yet. A lookahead would look like this:
^(?=\d*\.?\d*$)
Next, we can ensure that there aren't more than 10 characters in total. Since the pattern above already made sure there are only dots and digits, we can simply match 1 to 10 of any characters, like so:
^.{1,10}$
Putting these two patterns together, we get this:
^(?=\d*\.?\d*$).{1,10}$
This will only match number inputs which have 10 or fewer characters and have no more than one dot.
If you would like to ensure that, when there is a dot, there is also a digit accompanying it, we can add another lookahead. The only input that breaks this condition is a lone dot (.), so we can rule that case out explicitly with a negative lookahead like so:
(?!\.$)
Adding this back in to our main expression, we get:
^(?=\d*\.?\d*$)(?!\.$).{1,10}$
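The final pattern can be exercised against a handful of inputs; the sample strings here are just illustrations chosen to hit each rule:

```javascript
// At most 10 characters, digits only, at most one dot, and not a lone dot.
const atMostTenOneDot = /^(?=\d*\.?\d*$)(?!\.$).{1,10}$/;

const checks = [
  ["1234567890", true],
  ["12345.678", true],
  ["1.", true],
  [".5", true],
  [".", false],           // ruled out by (?!\.$)
  ["12345678901", false], // 11 characters
  ["1.2.3", false],       // second dot
  ["12a", false],         // not a digit
  ["", false],            // .{1,10} needs at least one character
];

for (const [input, expected] of checks) {
  const actual = atMostTenOneDot.test(input);
  console.log(JSON.stringify(input), actual, actual === expected ? "as expected" : "UNEXPECTED");
}
```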

Do consecutive lookaheads match based on the first matched character?

I came across the answers below while trying to understand how consecutive lookaheads work. My understanding seems contradictory, and I was hoping someone could help clarify.
The answer by Sam Whan to "Why consecutive lookaheads do not always work" suggests that all the lookaheads specified must hold for the first matched character.
If I apply that to the solution in this answer:
How to print a number with commas as thousands separators in JavaScript:
function numberWithCommas(x) {
return x.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}
it means that it's looking for a non-word-boundary position that is followed by a sequence of characters whose length is a multiple of 3 and, at the same time, followed by characters that are not digits.
e.g. 12345
I know a comma should go after the 2, but this seems contradictory: the 2 has 3 digits following it, which satisfies the first lookahead, yet the second lookahead says it must not be followed by any digits.
I'm sure I'm misunderstanding something. Any help is appreciated. Thanks!
This regex:
/\B(?=(\d{3})+(?!\d))/g
has only one positive lookahead condition; the negative lookahead is nested inside that first lookahead.
Here are details:
\B: Match position where \b doesn't match (e.g. between word characters)
(?=: Start lookahead
(\d{3})+: Match one or more sets of 3 digits
(?!\d): Inner negative lookahead asserting that no digit follows the matched sets of 3 digits
): End lookahead
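Because (?!\d) is nested inside the positive lookahead, it is only tested after (\d{3})+ has consumed its digits, i.e. at the end of the digit run, not at the candidate position itself. Listing the positions of the empty matches makes this visible. A small sketch using String.prototype.matchAll (commaPositions is a hypothetical helper name):

```javascript
const separatorPos = /\B(?=(\d{3})+(?!\d))/g;

// Hypothetical helper: each match is zero-length, and m.index is where a comma would go.
function commaPositions(digits) {
  return [...digits.matchAll(separatorPos)].map((m) => m.index);
}

console.log(JSON.stringify(commaPositions("12345")));   // [2]
console.log(JSON.stringify(commaPositions("1234567"))); // [1,4]
console.log("1234567".replace(separatorPos, ","));      // 1,234,567
```

For 12345, position 1 is skipped: (\d{3})+ can only take 234 there, and the inner (?!\d) then sees the 5 and fails.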
However, note that it is usually better to use the following code to format a number as a string with thousands separators:
console.log( parseFloat('1234567.89').toLocaleString('en') )

Regex match valid Phone Number

I'm quite new to regex, and not sure what I'm doing wrong exactly.
I'm looking for a regex that matches the following number format:
Matching requirements:
1. Must start with either 0 or 3
2. Must be between 7 and 11 digits
3. Must not allow ascending digits, e.g. 0123456789, 01234567
4. Must not allow repeated digits, e.g. 011111111, 3333333333, 0000000000
This is what I came up with:
^(?=(^[0,3]{1}))(?!.*(\d)\1{3,})(?!^(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$).{7,11}$
The above regex fails condition (4), the repeated-digits rule. I'm not sure why.
Any help would be appreciated.
Thanks
A few notes about the pattern that you tried:
You can omit the {1}, and also the comma in [0,3]; inside a character class a comma is a literal, so [0,3] also matches a comma
In the lookahead (?!.*(\d)\1{3,}), the (\d) is the second capturing group, because (?=(^[0,3]{1})) contains the first one, so it should be \2 instead of \1
In the lookahead, you can omit the comma in {3,}
In the match itself you use .{7,11}, where the dot matches any character except a newline. You could use \d instead to match only digits
Your pattern might look like:
^(?=(^[03]))(?!.*(\d)\2{3})(?!^(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$)\d{7,11}$
Or leave out the first lookahead and move [03] into the match itself, changing the quantifier to \d{6,10} and using backreference \1 instead of \2:
^(?!.*(\d)\1{3})(?!(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$)[03]\d{6,10}$
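A quick check of the second pattern with illustrative inputs. Note that 0123455 passes: the ascending lookahead only rejects a string that ascends all the way to the end, which is what the Edit below tightens:

```javascript
const phone =
  /^(?!.*(\d)\1{3})(?!(?:0(?=1|$))?(?:1(?=2|$))?(?:2(?=3|$))?(?:3(?=4|$))?(?:4(?=5|$))?(?:5(?=6|$))?(?:6(?=7|$))?(?:7(?=8|$))?(?:8(?=9|$))?9?$)[03]\d{6,10}$/;

const phoneSamples = [
  ["0123456789", false], // fully ascending
  ["01234567", false],   // ascending up to the end
  ["011111111", false],  // four or more repeated digits
  ["3333333333", false], // repeated digits
  ["0421578", true],     // 7 digits, starts with 0
  ["0123455", true],     // ascending run broken before the end
  ["12345678", false],   // does not start with 0 or 3
  ["042157", false],     // only 6 digits
];

for (const [input, expected] of phoneSamples) {
  const actual = phone.test(input);
  console.log(input, actual, actual === expected ? "as expected" : "UNEXPECTED");
}
```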
Edit
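Based on the comments, to disallow 4 ascending digits anywhere in the string (with the ascending-run lookahead placed before [03], so that a run starting at the first digit, such as 0123, is also caught):
^(?!.*(\d)\1{3})(?!\d*(?:0123|1234|2345|3456|4567|5678|6789))[03]\d{6,10}$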
A solution using JavaScript regex syntax (the named group needs an ES2018+ engine) would be:
/^[03](?!123456(7(8(9|$)|$)|$))(?!(?<d>.)\k<d>+$)[0-9]{6,10}$/
Explanations
^[03] starts at the beginning of the string, then reads either 0 or 3
(?!123456(7(8(9|$)|$)|$)) makes sure that, after this first char, the digits are not the ascending sequence 123456, 1234567 or 12345678 running to the end of the string, nor 123456789 (if such a sequence can be read, the negative lookahead fails)
(?!(?<d>.)\k<d>+$) is another negative lookahead: it ensures that the character right after the leading 0 or 3 (captured as d) is not simply repeated until the end of the string
[0-9]{6,10}$/ finally reads 6 to 10 digits (the first digit was already read by [03])
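Besides the answer's own tests, a few more inputs (illustrative, not from the answer) show what each lookahead actually checks: the repeat check only rejects a character repeated up to the end of the string, and the sequence check only looks at the digits right after the leading 0 or 3. A minimal sketch:

```javascript
// Named groups (?<d>...) and \k<d> need an ES2018+ engine.
const phoneNamed = /^[03](?!123456(7(8(9|$)|$)|$))(?!(?<d>.)\k<d>+$)[0-9]{6,10}$/;

const extra = [
  ["3123456", false],   // 123456 runs to the end after the leading 3
  ["31234568", true],   // 123456 followed by 8 is not rejected
  ["30000000", false],  // the digit after the 3 repeats to the end
  ["0111111112", true], // repeated 1s, but not all the way to the end
  ["0765432", true],    // descending digits are not checked
];

for (const [input, expected] of extra) {
  const actual = phoneNamed.test(input);
  console.log(input, actual, actual === expected ? "as expected" : "UNEXPECTED");
}
```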
A few tests:
"0123456789: No match"
"01234567: No match"
"01234568 is valid"
"011111111: No match"
"33333333: No match"
"333333233 is valid"
"042157891023: No match"
"019856: No match"
"0123451245 is valid"

What is the difference between these two syntaxes in my code?

What is the difference between the following syntaxes in a regular expression?
Please give an example.
(?=.*\d)
and
.*(?=\d)
The first one is just an assertion, a positive look-ahead saying "there must be zero or more characters followed by a digit." If you match it against a string containing at least one digit, it will tell you whether the assertion is true, but the matched text will just be an empty string.
The second one searches for a match, with an assertion (a positive-lookahead) after the match saying "there must be a digit." The matched text will be the characters before the last digit in the string (including any previous digits, because .* is greedy, so it'll consume digits up until the last one, because the last one is required by the assertion).
Note the difference in the match object results:
var str = "foo42";
test("rex1", /(?=.*\d)/, str);
test("rex2", /.*(?=\d)/, str);
function test(label, rex, str) {
console.log(label, "test result:", rex.test(str));
console.log(label, "match object:", rex.exec(str));
}
Output (for those who can't run snippets):
rex1 test result: true
rex1 match object: [
""
]
rex2 test result: true
rex2 match object: [
"foo4"
]
Notice how the match result in the second case was foo4 (from the string foo42), but blank in the first case.
(?=...) is a positive lookahead. Both of these expressions will match "any text followed by a number". The difference, though, is that (?=...) doesn't "eat" (consume) any characters as it matches. For practical purposes, if this is the only thing your regex contains, both will accept the same strings. However, .*(?=\d) would be a more correct expression, unless there's more to it than what you put in the question.
Where it really matters is when you're using capturing groups or where you're using the content of the matched text after running the regular expression:
If you want to capture all text before the number, but not the number itself, and use it after, you could do this:
(.*?(?=\d))
The ? makes the match non-greedy, so it will only match up to the first number. All text leading up to the number will be in the match result as the first group.
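The lazy capture can be compared with its greedy counterpart and with the bare lookahead on a sample string (made up for illustration):

```javascript
const str = "foo42bar7";

// Lazy: stops just before the first digit.
const lazy = str.match(/(.*?(?=\d))/);
// Greedy: runs on to just before the last digit.
const greedy = str.match(/(.*(?=\d))/);
// Bare lookahead: succeeds at index 0 with an empty match.
const bare = /(?=.*\d)/.exec(str);

console.log(lazy[1]);                             // foo
console.log(greedy[1]);                           // foo42bar
console.log(bare.index, JSON.stringify(bare[0])); // 0 ""
console.log("abc".match(/(.*?(?=\d))/));          // null: no digit to stop before
```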
Please find the difference below
In detail
.* matches any character (except newline), zero or more times
(?=\d) is a positive lookahead: it asserts that the pattern inside it can be matched at this position, without consuming any characters
\d matches a digit [0-9]
(?=.*\d)
  MatchOnlyIfFollowedBy
    Sequence: match all of the following in order
      Repeat AnyCharacterExcept\n zero or more times
      Digit
.*(?=\d)
  Sequence: match all of the following in order
    Repeat AnyCharacterExcept\n zero or more times
    MatchOnlyIfFollowedBy
      Digit

Match backwards from a given word with javascript

Using JavaScript, I need to find an occurrence of a phrase in some text, then match everything from it back to the last 5-digit number before it (or at least that's the best way I know how to describe what I need).
Consider the following text:
24854
Random words
Ending Words
34975
Random words
Ending Words
47593
Random words
Ending Words
Target Word
32302
Random words
Ending Words
Given the above, I'd like my regex to match everything from 47593 to Target Word.
Each match should include both 47593 and Target Word.
It needs to be global, since there will be multiple matches in my actual text and I need them all returned in an array.
This is what I've tried: .match(/[0-9]{5}[\s\S]+?Target Word/g)
My problem (as always with these) is the newlines. To match across multiple lines I'm using [\s\S], but that makes the regex match everything from the first 5-digit number to the first occurrence of Target Word.
How can I change this to achieve the desired result? I'm thinking I need to use lookbehind but most examples I've found have been very confusing for me.
You could use a negative lookahead:
[0-9]{5}(?:(?![0-9]{5})[\S\s])*?Target\s*Word
The group (?:(?![0-9]{5})[\S\s])*? matches any character (space or non-space) zero or more times, but only at positions where no 5-digit number starts. The match therefore cannot run across another 5-digit number, so it begins at the last 5-digit number before Target Word.
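The question's attempt fails because the engine reports the leftmost possible match: the lazy +? only makes the match end as early as possible, it does not move the start. Running both patterns on the sample text from the question shows the difference (a minimal sketch):

```javascript
// Sample text from the question, one entry per line.
const text = [
  "24854", "Random words", "Ending Words",
  "34975", "Random words", "Ending Words",
  "47593", "Random words", "Ending Words", "Target Word",
  "32302", "Random words", "Ending Words",
].join("\n");

// The question's attempt: starts at the first 5-digit number.
const naive = text.match(/[0-9]{5}[\s\S]+?Target Word/g);

// Negative-lookahead version: no 5-digit run may start inside the span.
const tempered = text.match(/[0-9]{5}(?:(?![0-9]{5})[\S\s])*?Target\s*Word/g);

console.log(naive[0].slice(0, 5));     // 24854
console.log(JSON.stringify(tempered)); // ["47593\nRandom words\nEnding Words\nTarget Word"]
```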
If the random words never contain any digits, you could perhaps use the following, where \D+? lazily matches a run of non-digit characters, newlines included:
/(\d{5}\D+?Target Word)/g
