Negative lookahead ends match before the last character I need - javascript

I am looking to identify parts of a string that are hex.
So if you consider the string CHICKENORBEEFPIE, the match would be BEEF.
To do this I came up with this expression: /[A-F0-9]{2,}(?![^A-F0-9])/g
This works perfectly, except that it only matches BEE, not BEEF, unless BEEF happens to be at the end of the string.

The negative lookahead (?![^A-F0-9]) means: do not match if the next character is anything other than A-F or 0-9. In other words, the match must be followed by a hex character (or by the end of the string). Your regex matches 'BEE' because it is followed by F, which satisfies the condition.
If you want to identify sequences of two or more hex characters, just remove the negative lookahead altogether.
/[A-F0-9]{2,}/g translates to: find every run of two or more characters, each of which is A-F or 0-9.

It is because of the last part of your regex: (?![^A-F0-9])
Because of that, you only match strings that aren't followed by a non-hex character, which ultimately means the next character must be a hex character (or the end of the string).
You could either remove the ^ or remove that whole piece altogether, as it isn't necessary. The following will retrieve what you are looking for: /[A-F0-9]{2,}/g

[A-F0-9]{2,}(?![A-F0-9]) will match what is expected; however, the negative lookahead is superfluous because quantifiers are greedy by default.
[A-F0-9]{2,}(?![^A-F0-9]) doesn't work because the assertion is that the following character must not be any character except A-F0-9 (double negation).
The reason the last character F in BEEF is not matched is that, after matching BEEF, the negative lookahead fails because P is in [^A-F0-9]. The engine then backtracks to BEE, which succeeds because F is not in [^A-F0-9].
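The behaviour described in these answers can be checked directly. This is only a minimal sketch; the variable names are made up for illustration.

```javascript
const hexInput = 'CHICKENORBEEFPIE';

// Original pattern: after BEEF the next character is P, so the negative
// lookahead fails and the engine backtracks to BEE (which is followed by F).
const doubleNegation = hexInput.match(/[A-F0-9]{2,}(?![^A-F0-9])/g);

// Without the ^: the run must not be followed by another hex character.
const singleNegation = hexInput.match(/[A-F0-9]{2,}(?![A-F0-9])/g);

// No lookahead at all: the greedy quantifier already consumes the whole run.
const noLookahead = hexInput.match(/[A-F0-9]{2,}/g);

console.log(doubleNegation); // [ 'BEE' ]
console.log(singleNegation); // [ 'BEEF' ]
console.log(noLookahead);    // [ 'BEEF' ]
```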

If you need the result to consist of whole pairs (an even number of hex digits), you can use /([A-F0-9]{2})+/g; if it doesn't matter whether the length is odd or even, you can use /[A-F0-9]{2,}/g instead.
Hope it helps.

Use
/[A-F0-9]{2,}(?![^A-F0-9])*/g

Related

RegExp: How to find only one match (or not match pattern)

I can't figure out how to write a regexp that matches only heo. So, for example, if some l char is found during parsing, that match should be cancelled.
'heo heo helo'.match(/he.*(?!l)o/gi) // should be only [heo, heo]
UPD:
I need to match as many times as possible within the string, not just the first one. Thanks
Example (wrong one):
console.log('heo heo helo'.match(/he.*(?!l)o/gi))
There are two issues:
.* - matches any zero or more chars other than line break chars as many as possible, and thus will match till the last occurrence of the subsequent patterns in the regex. You might use a non-greedy .*? here to fix the issue.
(?!l)o - always matches o: the negative lookahead (?!l) checks the character to its right, which is the o itself and never l, so it always succeeds. You wanted a negative lookbehind, (?<!l), here.
To match strings starting with he and then matching any chars (other than line break chars) as few as possible and then o not preceded with l, you can use
/he.*?(?<!l)o/gi
See this regex demo. The .*?(?<!l)o pattern matches any 0+ chars other than line break chars, as few as possible, up to the leftmost o that is not immediately preceded with l.
Now, if you just want to match words that start with he and end with o not preceded with l, you can use
/\bhe[a-z]*(?<!l)o\b/gi
/\bhe(?![a-z]*lo\b)[a-z]*o\b/gi
See this regex demo and this regex demo.
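As a quick check of the patterns above, here is a small sketch. It assumes an engine with lookbehind support (ES2018 or later), and the variable names are illustrative only.

```javascript
const heoInput = 'heo heo helo';

// Greedy .* runs to the last o, and (?!l) only inspects that o, so it never fails.
const greedyLookahead = heoInput.match(/he.*(?!l)o/gi);

// Lazy .*? plus a lookbehind: stop at the first o that is not preceded by l.
const lazyLookbehind = heoInput.match(/he.*?(?<!l)o/gi);

// Whole-word variants.
const wordLookbehind = heoInput.match(/\bhe[a-z]*(?<!l)o\b/gi);
const wordLookahead = heoInput.match(/\bhe(?![a-z]*lo\b)[a-z]*o\b/gi);

console.log(greedyLookahead); // [ 'heo heo helo' ]
console.log(lazyLookbehind);  // [ 'heo', 'heo' ]
console.log(wordLookbehind);  // [ 'heo', 'heo' ]
console.log(wordLookahead);   // [ 'heo', 'heo' ]
```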
console.log('heo heo helo'.match(/he.*(?!l)o/gi))
Your regex matches any characters with .* before checking the condition (?!l). It should check the condition before matching characters.
Besides, you want to match only hexxxo (where x is neither l nor whitespace), so you should use \b in your regex. I suggest the following regex.
console.log('heo heo helo aheob'.match(/\bhe[^l\s]*o\b/gi));

How to match one 'x' but not one or both of xs in 'xx' globally in string [duplicate]

Not quite sure how to go about this, but basically what I want to do is match a character, say a for example. In this case all of the following would not contain matches (i.e. I don't want to match them):
aa
aaa
fooaaxyz
Whereas the following would:
a (obviously)
fooaxyz (this would only match the letter a part)
My knowledge of RegEx is not great, so I am not even sure if this is possible. Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
^[^\sa]*\Ka(?=[^\sa]*$)
\K (a PCRE feature) discards the previously matched characters, and the lookahead asserts whether a match is possible or not. So the above matches only the letter a that satisfies the conditions.
OR
a{2,}(*SKIP)(*F)|a
You may use a combination of a lookbehind and a lookahead:
(?<!a)a(?!a)
See the regex demo.
Details
(?<!a) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is an a char
a - an a char
(?!a) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an a char.
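A short sketch of the lookbehind/lookahead combination in JavaScript, using the sample strings from the question. It assumes lookbehind support (ES2018 or later); the helper name isolatedMatches is hypothetical.

```javascript
// Matches an "a" with no "a" immediately before or after it.
const isolatedA = /(?<!a)a(?!a)/g;

// Hypothetical helper: returns the array of matches, or null if there are none.
const isolatedMatches = (text) => text.match(isolatedA);

console.log(isolatedMatches('a'));        // [ 'a' ]
console.log(isolatedMatches('fooaxyz'));  // [ 'a' ]
console.log(isolatedMatches('aa'));       // null
console.log(isolatedMatches('aaa'));      // null
console.log(isolatedMatches('fooaaxyz')); // null
```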
You need two things:
a negated character class: [^a] (all except "a")
anchors (^ and $) to ensure that the limits of the string are reached (in other words, that the pattern matches the whole string and not only a substring):
Result:
^[^a]*a[^a]*$
Once you know there is only one "a", you can extract, replace, or remove it in whatever way suits the language you use.
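For illustration, the anchored pattern can be wrapped in a small predicate. Note that it tests whether the whole string contains exactly one "a", which is stricter than finding isolated "a" characters; the function name is an assumption.

```javascript
// True only when the entire string contains exactly one "a".
const hasExactlyOneA = (text) => /^[^a]*a[^a]*$/.test(text);

console.log(hasExactlyOneA('fooaxyz'));  // true
console.log(hasExactlyOneA('a'));        // true
console.log(hasExactlyOneA('fooaaxyz')); // false
console.log(hasExactlyOneA('foo'));      // false
```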

Regex: how to exclude empty match from something like (RegexA)?(RegexB)?(RegexA)? [duplicate]

I have regex which works fine in my application, but it matches an empty string too, i.e. no error occurs when the input is empty. How do I modify this regex so that it will not match an empty string ? Note that I DON'T want to change any other functionality of this regex.
This is the regex which I'm using: ^([0-9\(\)\/\+ \-]*)$
I don't know a lot about regex formulation myself, which is why I'm asking. I have searched for an answer, but couldn't find a direct one. The closest I got was this: regular expression for anything but an empty string in c#, but that doesn't really work for me.
Replace "*" with "+", as "*" means "0 or more occurrences", while "+" means "at least one occurrence".
There are a lot of pattern types that can match empty strings. The OP regex belongs to an ^.*$ type, and it is easy to modify it to prevent empty string matching by replacing * (= {0,}) quantifier (meaning zero or more) with the + (= {1,}) quantifier (meaning one or more), as has already been mentioned in the posts here.
There are other pattern types matching empty strings, and it is not always obvious how to prevent them from matching empty strings.
Here are a few of those patterns with solutions:
[^"\\]*(?:\\.[^"\\]*)* ⇒ (?:[^"\\]|\\.)+
abc||def ⇒ abc|def (remove the extra | alternation operator)
^a*$ ⇒ ^a+$ (+ matches 1 or more chars)
^(a)?(b)?(c)?$ ⇒ ^(?!$)(a)?(b)?(c)?$ (the (?!$) negative lookahead fails the match if the end of string is at the start of the string)
or ⇒ ^(?=.)(a)?(b)?(c)?$ (the (?=.) positive lookahead requires at least a single char; . may or may not match line break chars depending on modifiers/regex flavor)
^$|^abc$ ⇒ ^abc$ (remove the ^$ alternative that enables a regex to match an empty string)
^(?:abc|def)?$ ⇒ ^(?:abc|def)$ (remove the ? quantifier that made the (?:abc|def) group optional)
To keep \b(?:north|south)?(?:east|west)?\b (that matches north, south, east, west, northeast, northwest, southeast, southwest) from matching empty strings, the word boundaries must be made more precise: make the initial word boundary match only at the start of words by adding (?<!\w) after it, and let the trailing word boundary match only at the end of words by adding (?!\w) after it.
\b(?:north|south)?(?:east|west)?\b ⇒ \b(?<!\w)(?:north|south)?(?:east|west)?\b(?!\w)
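Two of these rewrites can be tried in JavaScript as a rough sketch. The sample sentence and variable names are assumptions, and the lookbehind needs ES2018 or later.

```javascript
// All-optional pattern versus the version guarded by (?!$).
const optionalAbc = /^(a)?(b)?(c)?$/;
const nonEmptyAbc = /^(?!$)(a)?(b)?(c)?$/;

console.log(optionalAbc.test(''));   // true
console.log(nonEmptyAbc.test(''));   // false
console.log(nonEmptyAbc.test('ac')); // true

// Compass words: plain \b also yields empty matches at other word boundaries.
const compassText = 'go northeast then west';
const looseCompass = compassText.match(/\b(?:north|south)?(?:east|west)?\b/g);
const strictCompass = compassText.match(/\b(?<!\w)(?:north|south)?(?:east|west)?\b(?!\w)/g);

console.log(looseCompass.includes('')); // true
console.log(strictCompass);             // [ 'northeast', 'west' ]
```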
You can either use + or the {min, max} Syntax:
^[0-9\(\)\/\+ \-]{1,}$
or
^[0-9\(\)\/\+ \-]+$
By the way: this is a great source for learning regular expressions (and it's fun): http://regexone.com/
Obviously you need to replace * with +, as + matches 1 or more characters. However, inside a character class you don't need to do all that escaping you're doing. Your regex can be simplified to:
^([0-9()\/+ -]+)$
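A minimal comparison of the * and + versions of the simplified pattern; the sample inputs and variable names are made up for illustration.

```javascript
const allowsEmpty = /^([0-9()\/+ -]*)$/; // * : zero or more characters
const requiresOne = /^([0-9()\/+ -]+)$/; // + : at least one character

console.log(allowsEmpty.test(''));                  // true
console.log(requiresOne.test(''));                  // false
console.log(requiresOne.test('+1 (555) 123-4567')); // true
console.log(requiresOne.test('abc'));               // false
```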

Regular expression match 0 or exact number of characters

I want to match an input string in JavaScript with 0 or 2 consecutive dashes, not 1, i.e. not range.
If the string is:
-g:"apple" AND --projectName:"grape": it should match --projectName:"grape".
-g:"apple" AND projectName:"grape": it should match projectName:"grape".
-g:"apple" AND -projectName:"grape": it should not match, i.e. return null.
--projectName:"grape": it should match --projectName:"grape".
projectName:"grape": it should match projectName:"grape".
-projectName:"grape": it should not match, i.e. return null.
To simplify this question considering this example, the RE should match the preceding 0 or 2 dashes and whatever comes next. I will figure out the rest. The question still comes down to matching 0 or 2 dashes.
Using -{0,2} matches 0, 1, 2 dashes.
Using -{2,} matches 2 or more dashes.
Using -{2} matches only 2 dashes.
How to match 0 or 2 occurrences?
Answer
If you split your "word-like" patterns on spaces, you can use this regex and your wanted value will be in the first capturing group:
(?:^|\s)((?:--)?[^\s-]+)
\s is any whitespace character (tab, space, newline...)
[^\s-] is anything except a whitespace-like character or a -
Once again the problem is anchoring the regex so that the relevant part isn't completely optional: here the anchor ^ or a mandatory whitespace \s plays this role.
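Applied to the example strings from the question, the capturing-group approach might look like the sketch below. The function name zeroOrTwoDashTokens is hypothetical, and note that every space-separated token with zero or two leading dashes is returned, including AND.

```javascript
// Collects the first capturing group of every match: tokens that start
// with zero or two dashes and are preceded by start-of-string or whitespace.
function zeroOrTwoDashTokens(text) {
  const tokenPattern = /(?:^|\s)((?:--)?[^\s-]+)/g;
  return Array.from(text.matchAll(tokenPattern), (m) => m[1]);
}

console.log(zeroOrTwoDashTokens('-g:"apple" AND --projectName:"grape"'));
// [ 'AND', '--projectName:"grape"' ]
console.log(zeroOrTwoDashTokens('-g:"apple" AND -projectName:"grape"'));
// [ 'AND' ]
console.log(zeroOrTwoDashTokens('-projectName:"grape"'));
// []
```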
What we want to do
Basically you want to check if your expression (two dashes) is there or not, so you can use the ? operator:
(?:--)?
"Either two or none", (?:...) is a non capturing group.
Avoiding confusion
You want to match "zero or two dashes", so if this is your entire regex it will always find a match: in an empty string, in --, in -, in foobar... In most of these strings what is matched will be an empty string, but the regex will still return a match.
This is a common source of misunderstanding, so bear in mind the rule that if everything in your regex is optional, it will always find a match.
If you want to only return a match if your entire string is made of zero or two dashes, you need to anchor the regex:
^(?:--)?$
^ and $ match the beginning and the end of the string, respectively.
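The "everything optional always matches" rule is easy to see in a small sketch (variable names are illustrative only):

```javascript
const unanchoredDashes = /(?:--)?/;
const anchoredDashes = /^(?:--)?$/;

// Unanchored: always succeeds, often with an empty match.
console.log(JSON.stringify(unanchoredDashes.exec('foobar')[0])); // ""
console.log(JSON.stringify(unanchoredDashes.exec('-')[0]));      // ""
console.log(JSON.stringify(unanchoredDashes.exec('--')[0]));     // "--"

// Anchored: the whole string must be zero or two dashes.
console.log(anchoredDashes.test(''));   // true
console.log(anchoredDashes.test('--')); // true
console.log(anchoredDashes.test('-'));  // false
```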
a(-{2})?(?!-)
This is using "a" as an example. This will match a followed by an optional pair of dashes, provided no further dash comes next.
Edit:
According to your example, this should work
(?<!-)(-{2})?projectName:"[a-zA-Z]*"
Edit 2:
Older JavaScript engines do not support lookbehinds (they were added in ES2018).
If that is a problem, try this:
(?:^|[^-])(-{2})?projectName:"[a-zA-Z]*"

RegEx in JS to find No 3 Identical consecutive characters

How do I find a sequence of 3 characters in JS using a regex, where 'abb' is valid while 'abbb' is not (the characters could be letters, digits, or non-alphanumerics)?
This question is a variation of the question that I asked here: How to combine these regex for javascript.
This is wrong: /(^([0-9a-zA-Z]|[^0-9a-zA-Z]))\1\1/, so what is the right way to do it?
This depends on what you actually mean. If you only want to match three non-identical characters (that is, if abb is valid for you), you can use this negative lookahead:
(?!(.)\1\1).{3}
It first asserts that the current position is not followed by the same character three times. Then it matches those three characters.
If you really want to match 3 different characters (only stuff like abc), it gets a bit more complicated. Use these two negative lookaheads instead:
(.)(?!\1)(.)(?!\1|\2).
First we match one character. Then we assert that this is not followed by the same character. If so, we match another character. Then we assert that these are followed neither by the first nor the second character. Then we match a third character.
Note that those negative lookaheads ((?!...)) do not consume any characters. That is why they are called lookaheads. They just check what is coming next (or in this case what is not coming next) and then the regex continues from where it left off. Here is a good tutorial.
Note also that this matches anything but line breaks, or really anything if you use the DOTALL or SINGLELINE option. Since you are using JavaScript you can just activate the option by appending s after the regex's closing delimiter. If (for some reason) you don't want to use this option, replace the .s by [\s\S] (this always matches any character).
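Both patterns can be tried out in JavaScript; the following is a rough sketch with made-up variable names.

```javascript
// Three characters that are not all identical (abb is fine, aaa is not).
const notAllSame = /(?!(.)\1\1).{3}/;

// Three pairwise different characters (only things like abc).
const allDifferent = /(.)(?!\1)(.)(?!\1|\2)./;

console.log(notAllSame.exec('abb')[0]); // 'abb'
console.log(notAllSame.test('aaa'));    // false
console.log(allDifferent.test('abc'));  // true
console.log(allDifferent.test('abb'));  // false
console.log(allDifferent.test('aba'));  // false
```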
Update:
After clarification in the comments, I realised that you do not want to find three non-identical characters, but instead you want to assert that your string does not contain three identical (and consecutive) characters.
This is a bit easier, and closer to your former question, since it only requires one negative lookahead. What we do is this: we search the string from the beginning for three consecutive identical characters. But since we want to assert that these do not exist we wrap this in a negative lookahead:
^(?!.*(.)\1\1)
The lookahead is anchored to the beginning of the string, so this is the only place where we will look. The pattern in the lookahead then tries to find three identical characters from any position in the string (because of the .*; the identical characters are matched in the same way as in your previous question). If the pattern finds these, the negative lookahead will fail, and so the string will be invalid. If no three identical characters can be found, the inner pattern will never match, so the negative lookahead will succeed.
To find three characters that are not all identical, use the regex pattern
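Used as a validator, the anchored lookahead might be wrapped like this. It is a sketch only; the function name hasNoTripleRepeat is an assumption.

```javascript
// True when the string contains no run of three identical consecutive
// characters (line breaks are not crossed, since . does not match them).
const hasNoTripleRepeat = (text) => /^(?!.*(.)\1\1)/.test(text);

console.log(hasNoTripleRepeat('abb'));    // true
console.log(hasNoTripleRepeat('aabbcc')); // true
console.log(hasNoTripleRepeat('abbb'));   // false
console.log(hasNoTripleRepeat('xx111y')); // false
```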
([\s\S])(?!\1\1)[\s\S]{2}
