Regex lookaround for a group doesn't work - javascript

Happy Saturday,
I'm wondering if Stackoverflow's users could give me a clue about one specific Regex..
(^visite\d+)(?!\D)
The above regex works well..
It says that :
visite12345 --> is a good answer (the string does match)
visite1a --> is not a good answer (the string doesn't match)
However for:
visite12345a --> It doesn't work.
Indeed, the output is visite1234, whereas I'd like to get the same result as for visite1a (the string doesn't match)...
I use http://regexr.com/ to test my regexp.
Do you have any idea how to do so?
Thank you very much.

The regex (^visite\d+)(?!\D) matches visite at the start of the string, followed with one or more digits that should not be followed with a non-digit.
The "issue" is that the engine can backtrack within the \d+ pattern: for visite12345a it gives up the last digit and matches only 1234, because the 4 is then followed by a digit (5) rather than a non-digit, so the lookahead succeeds.
The best way to solve it is to check the actual requirements and adjust the pattern.
If the digits are the last characters in the string, just replace the lookahead with the $ anchor.
A generic solution is to make the subpattern atomic with a capturing group inside a positive lookahead plus a backreference, and to change the negative lookahead to something like (?![a-zA-Z]) (fail if a letter follows):
/^visite(?=(\d+))\1(?![a-z])/i
See the regex demo
Or, if a word boundary should follow the digits (i.e. the digits should not be followed by a letter, digit or underscore), use \b instead of the lookahead:
/^visite\d+\b/
See another demo
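A minimal sketch comparing the three patterns from this answer on the inputs from the question. The helper name firstMatch is only for illustration:

```javascript
// Original pattern: \d+ can give back digits until the lookahead passes.
const original = /(^visite\d+)(?!\D)/;
// Emulated atomic group: the lookahead captures all digits and \1 consumes
// them, so the engine cannot retry with fewer digits.
const atomic = /^visite(?=(\d+))\1(?![a-z])/i;
// Word-boundary variant.
const boundary = /^visite\d+\b/;

// Hypothetical helper: return the matched text, or null when there is no match.
function firstMatch(regex, input) {
  const m = regex.exec(input);
  return m ? m[0] : null;
}

for (const input of ["visite12345", "visite1a", "visite12345a"]) {
  console.log(
    input,
    firstMatch(original, input),
    firstMatch(atomic, input),
    firstMatch(boundary, input)
  );
}
// visite12345   visite12345  visite12345  visite12345
// visite1a      null         null         null
// visite12345a  visite1234   null         null
```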

Related

Regular expressions: prohibit the use of characters [duplicate]

I have a regex
/^([a-zA-Z0-9]+)$/
This allows only alphanumerics, but it also accepts input that consists only of number(s) or only of character(s). I want the field to accept only alphanumeric values, and the value must contain at least 1 character and at least 1 number.
Why not first apply the whole test, and then add individual tests for characters and numbers? Anyway, if you want to do it all in one regexp, use positive lookahead:
/^(?=.*[0-9])(?=.*[a-zA-Z])([a-zA-Z0-9]+)$/
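A quick check of the lookahead version against a few sample inputs (the sample strings are made up for illustration):

```javascript
// Both lookaheads are evaluated at the start of the string, before any
// character is consumed; the main group then enforces "alphanumerics only".
const letterAndDigit = /^(?=.*[0-9])(?=.*[a-zA-Z])([a-zA-Z0-9]+)$/;

console.log(letterAndDigit.test("abc123"));  // true
console.log(letterAndDigit.test("abc"));     // false (no digit)
console.log(letterAndDigit.test("123"));     // false (no letter)
console.log(letterAndDigit.test("abc-123")); // false (hyphen is not alphanumeric)
```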
This RE will do:
/^(?:[0-9]+[a-z]|[a-z]+[0-9])[a-z0-9]*$/i
Explanation of RE:
Match either of the following:
At least one number, then one letter or
At least one letter, then one number plus
Any remaining numbers and letters
(?:...) creates an unreferenced group
/i is the ignore-case flag, so that a-z == a-zA-Z.
I can see that other responders have given you a complete solution. Problem with regexes is that they can be difficult to maintain/understand.
An easier solution would be to retain your existing regex, then create two new regexes to test for your "at least one alphabetic" and "at least one numeric".
So, test for this :-
/^([a-zA-Z0-9]+)$/
Then this :-
/\d/
Then this :-
/[A-Z]/i
If your string passes all three regexes, you have the answer you need.
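As a rough sketch, the three checks could be wrapped in one function. The function name is hypothetical:

```javascript
// Hypothetical helper combining the three simple tests from this answer.
function isAlphanumericWithBoth(value) {
  const onlyAlphanumeric = /^([a-zA-Z0-9]+)$/.test(value);
  const hasDigit = /\d/.test(value);
  const hasLetter = /[A-Z]/i.test(value);
  return onlyAlphanumeric && hasDigit && hasLetter;
}

console.log(isAlphanumericWithBoth("abc123")); // true
console.log(isAlphanumericWithBoth("abc"));    // false
console.log(isAlphanumericWithBoth("123"));    // false
console.log(isAlphanumericWithBoth("a1!"));    // false
```

Each regex stays trivial to read, at the cost of scanning the string up to three times.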
The accepted answer did not work for me because it does not allow special characters.
This one worked perfectly for me:
^(?=.*[0-9])(?=.*[a-zA-Z])(?=\S+$).{6,20}$
at least one digit is required
at least one letter is required (lower or upper case)
everything else is optional (no whitespace, 6 to 20 characters in total)
Thank you.
While the accepted answer is correct, I find this regex a lot easier to read:
REGEX = "([A-Za-z]+[0-9]|[0-9]+[A-Za-z])[A-Za-z0-9]*"
This solution accepts at least 1 number and at least 1 character:
[^\w\d]*(([0-9]+.*[A-Za-z]+.*)|[A-Za-z]+.*([0-9]+.*))
And an idea with a negative check.
/^(?!\d*$|[a-z]*$)[a-z\d]+$/i
^(?! at start look ahead if string does not
\d*$ contain only digits | or
[a-z]*$ contain only letters
[a-z\d]+$ matches one or more letters or digits until $ end.
Have a look at this regex101 demo
(the i flag turns on caseless matching: a-z matches a-zA-Z)
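A short sketch of the negative check in use, with made-up sample inputs:

```javascript
// Reject strings that are only digits or only letters, then require that
// the whole string consists of letters and digits.
const mixedOnly = /^(?!\d*$|[a-z]*$)[a-z\d]+$/i;

console.log(mixedOnly.test("a1"));  // true
console.log(mixedOnly.test("1A"));  // true
console.log(mixedOnly.test("111")); // false (only digits)
console.log(mixedOnly.test("abc")); // false (only letters)
console.log(mixedOnly.test("a1!")); // false (contains a non-alphanumeric)
```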
Maybe a bit late, but this is my RE:
/^(\w*(\d+[a-zA-Z]|[a-zA-Z]+\d)\w*)+$/
Explanation:
\w* -> 0 or more word characters (letters, digits or underscore), at the beginning
\d+[a-zA-Z]|[a-zA-Z]+\d -> one or more digits + a letter OR one or more letters + a digit
\w* -> 0 or more word characters, again
I hope it was understandable
What about simply:
/[0-9][a-zA-Z]|[a-zA-Z][0-9]/
Worked like a charm for me...
Edit following comments:
Well, some shortsightedness of my own late at night: apologies for the inconvenience...
The (incomplete) underlying idea was that only one "transition" from a digit to an alpha or from an alpha to a digit was needed somewhere to answer the question.
But the next regex should do the job for a string comprised only of alphanumeric characters:
/^[0-9a-zA-Z]*([0-9][a-zA-Z]|[a-zA-Z][0-9])[0-9a-zA-Z]*$/
which in JavaScript can be further simplified as:
/^[0-9a-z]*([0-9][a-z]|[a-z][0-9])[0-9a-z]*$/i
IMHO it's more straightforward to read and understand than some other answers (no backtracking and the like).
Hope this helps.
If you need the digit to be at the end of any word, this worked for me:
/\b([a-zA-Z]+[0-9]+)\b/g
\b word boundary
[a-zA-Z] any letter
[0-9] any digit
"+" one or more of the preceding item (the g flag is what returns all results)

Negative Lookahead & Lookbehind with Capture Groups and Word Boundaries

We are auto-formatting hyperlinks in a message composer but would like to avoid matching links that are already formatted.
Attempt: Build a regex that uses a negative lookbehind and negative lookahead to exclude matches where the link is surrounded by href=" and ".
Problem: Negative lookbehind/lookahead are not working with our regex:
Regex:
/(?<!href=")(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_+.~#?&\/\/=;]*)(?!")/g
Usage:
html.match(/(?<!")(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=;]*)(?!")/g);
When testing, we notice that exchanging the negative lookahead/lookbehind with a positive version causes it to work. Thus, only negative lookbehind/lookaheads are not working.
Does anyone know why these negative lookbehind/lookaheads are not functioning with this regex?
Thank you!
With @Barmar's help in the question comments, it is clear that the problem lies in the optional beginning and end of the regex.
"Basically, anything that allows something to be optional next to a negative lookaround may negate the effect of the lookaround, if it can find a shorter match that isn't next to it. "
If you are using a modern JS engine that supports variable-length lookbehind assertions, you can put a non-greedy variable-length part inside the lookbehind.
That lets the regex keep optional beginnings like the ones you have.
/(?<!href="[^"]*?)(?:https?:\/\/.)?(?:www\.)?[a-zA-Z0-9#%+\-.:=@_~]{2,256}\.[a-z]{2,6}\b[a-zA-Z0-9#%&+\--\/:;=?@_~]*(?!")/
https://regex101.com/r/OdJyZf/1
(?<! href=" [^"]*? )
(?: https?:// . )?
(?: www \. )?
[a-zA-Z0-9#%+\-.:=@_~]{2,256} \. [a-z]{2,6} \b [a-zA-Z0-9#%&+\--/:;=?@_~]*
(?! " )
I must make a correction. In my comments I said that
the word boundary \b here [a-z]{2,6}\b[a-zA-Z0-9#%&+\--/:;=?@_~] effectively removes the word class \w in the following class.
This is true but only for the first following letter. All the following chars seem to include word chars so it's needed.
It's a clear example of overthinking something that does not need to be.
The whole regex should be able to be rewritten using \w in the classes unless ASCII is required.
Note that this will only work for the new JS engine and C# (of course).
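To illustrate the principle on something smaller than the full URL pattern, here is a sketch that assumes a fixed host (example.com) and a single optional scheme. These simplified patterns are not the ones from the question; they only reproduce the effect of an optional group sitting next to a negative lookbehind:

```javascript
// Simplified, hypothetical patterns (fixed host "example.com" is an assumption).
// naive: fixed-width lookbehind next to an optional scheme.
const naive = /(?<!href=")(?:https?:\/\/)?example\.com/;
// guarded: variable-length lookbehind that also covers any skipped prefix.
const guarded = /(?<!href="[^"]*)(?:https?:\/\/)?example\.com/;

const linked = '<a href="https://example.com">site</a>';
const plain = "visit https://example.com today";

// The engine skips the optional scheme and matches further right,
// where the lookbehind no longer sees href=".
console.log(naive.exec(linked)[0]);  // "example.com"
// With [^"]* inside the lookbehind, the shorter match is rejected too.
console.log(guarded.exec(linked));   // null
// An unformatted link is still matched in full.
console.log(guarded.exec(plain)[0]); // "https://example.com"
```

This needs an engine with variable-length lookbehind support, as noted above.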

Regex for a valid hashtag

I need a regular expression for validating a hashtag. Each hashtag should start with a hash sign ("#").
Valid inputs:
1. #hashtag_abc
2. #simpleHashtag
3. #hashtag123
Invalid inputs:
1. #hashtag#
2. #hashtag#hashtag
I have been trying with this regex /#[a-zA-z0-9]/ but it is accepting invalid inputs also.
Any suggestions for how to do it?
The current accepted answer fails in a few places:
It accepts hashtags that have no letters in them (i.e. "#11111", "#___" both pass).
It will exclude hashtags that are separated by spaces ("hey there #friend" fails to match "#friend").
It doesn't allow you to place a min/max length on the hashtag.
It doesn't offer a lot of flexibility if you decide to add other symbols/characters to your valid input list.
Try the following regex:
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g
It'll close up the above edge cases, and furthermore:
You can change {1,30} to your desired min/max
You can add other symbols to the [0-9_] and [a-zA-Z0-9_] blocks if you wish to later
Here's a link to the demo.
To answer the current question...
There are 2 issues:
[A-z] allows more than just letter chars ([, \, ], ^, _, ` )
There is no quantifier after the character class, so it only matches 1 char
Since you are validating the whole string, you also need anchors (^ and $) to ensure a full string match:
/^#\w+$/
See the regex demo.
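A quick sketch running the anchored pattern over the valid and invalid inputs listed in the question:

```javascript
// Whole-string validation: "#" followed by one or more word characters.
const validHashtag = /^#\w+$/;

const samples = ["#hashtag_abc", "#simpleHashtag", "#hashtag123", "#hashtag#", "#hashtag#hashtag"];
const verdicts = samples.map((s) => validHashtag.test(s));

console.log(verdicts); // [true, true, true, false, false]
```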
If you want to extract specific valid hashtags from longer texts...
This is a bonus section, as a lot of people seek to extract (not validate) hashtags, so here are a couple of solutions for you. Just mind that \w in JavaScript (and in a lot of other regex libraries) is equal to [a-zA-Z0-9_]:
#\w{1,30}\b - a # char followed with one to thirty word chars followed with a word boundary
\B#\w{1,30}\b - a # char that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars followed with a word boundary
\B#(?![\d_]+\b)(\w{1,30})\b - # that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars (that cannot be just digits/underscores) followed with a word boundary
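A small sketch of the last extraction pattern used with matchAll. The sample sentence is made up for illustration:

```javascript
// Extract hashtags that are not made of digits/underscores only and that
// do not start in the middle of a word.
const hashtagPattern = /\B#(?![\d_]+\b)(\w{1,30})\b/g;

const sentence = "hey there #friend, see #123 and #tag_1 or a#b";
// Group 1 holds the tag text without the leading "#".
const tags = Array.from(sentence.matchAll(hashtagPattern), (m) => m[1]);

console.log(tags); // ["friend", "tag_1"]
```

Here #123 is rejected by the negative lookahead, and a#b is rejected by \B because the # directly follows a word character.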
And last but not least, here is a Twitter hashtag regex from https://github.com/twitter/twitter-text/tree/master/js... Sorry, too long to paste in the SO post, here it is: https://gist.github.com/stribizhev/715ee1ee2dc1439ffd464d81d22f80d1.
You could try this: /#[a-zA-Z0-9_]+/
This will only include letters, numbers & underscores.
A regex code that matches any hashtag.
In this approach any character is accepted in hashtags except main signs !@#$%^&*()
(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))
Usage Notes
Turn on "g" and "m" flags when using!
It is tested for Java and JavaScript languages via https://regex101.com and VSCode tools.
It is available on this repo.
Unicode general categories can help with that task:
/^#[\p{L}\p{Nd}_]+$/gu
I use \p{L} and \p{Nd} unicode categories to match any letter or decimal digit number. You can add any necessary category for your regex. The complete list of categories can be found here: https://unicode.org/reports/tr18/#General_Category_Property
Regex live demo:
https://regexr.com/5tvmo
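A brief sketch of the Unicode version used for validation. One thing to watch, shown below: when the g flag is kept and the same regex object is reused with test(), the lastIndex state carries over between calls, so for one-off validation it may be simpler to drop g. The sample strings are made up:

```javascript
// Same pattern without the g flag, for stateless validation.
const unicodeHashtag = /^#[\p{L}\p{Nd}_]+$/u;

console.log(unicodeHashtag.test("#привет"));    // true
console.log(unicodeHashtag.test("#日本語123")); // true
console.log(unicodeHashtag.test("#tag!"));      // false

// With g, a successful test() moves lastIndex to the end of the match,
// so an immediate second call on the same object starts there and fails.
const withGlobal = /^#[\p{L}\p{Nd}_]+$/gu;
const firstCall = withGlobal.test("#abc");  // true
const secondCall = withGlobal.test("#abc"); // false
console.log(firstCall, secondCall);
```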
useful and tested regex for detecting hashtags in the text
/(^|\s)(#[a-zA-Z\d_]+)/ig
examples of valid matching hashtag:
#abc
#ab_c
#ABC
#aBC
/\B(?:#|#)((?![\p{N}_]+(?:$|\b|\s))(?:[\p{L}\p{M}\p{N}_]{1,60}))/ug
allow any language characters or characters with numbers or _.
numbers alone or numbers with _ are not allowed.
It's unicode regex, so if you are using Python, you may need to install regex.
to test it https://regex101.com/r/NLHUQh/1

Regex which returns false when string contains 2 non consecutive forward slashes (negative lookahead)

I'm trying to build a regex which should match when only one forward slash is found and fail when 2 or more forward slashes are found. The capturing group is not used, only whether it matches, and the regex is executed by JavaScript.
/this-should-match
/this-should/not-match
I've tried a couple of regexps, including using a negative lookahead, but I can't seem to find the solution. Some patterns I've tried:
/\/(.*)(?!\/)/i
/\/(.*)[?!\/]/i
/\/(.*[?!\/])/i
Any regex genius over here knows the solution? I'm aware regex is meant to find an occurrence of a pattern, but there should be some solution for this?
Use a negated character class instead of lookarounds.
^\/[^/]+$
^ Anchors the regex at the start of the string.
[^/] negated character class. Matches anything other than /
$ Anchors the regex at the end of the string. Ensures that nothing follows the string that is matched by the pattern.
Regex Demo
Example
"/this-should-match".match(/^\/[^/]+$/)
=> ["/this-should-match"]
"/this-should-match/not-match".match(/^\/[^/]+$/)
=> null

Regular Expression for a REST endpoint

Can someone please help me in defining a regular expression for an endpoint.
person/^((?!-).)*$/
This regex needs to match a number of things but mainly:
person/:id
it should NOT match
person/1234-5678-9123 (it's currently not matching this which is good)
the problem I have is that it should NOT match this but it is:
person/123456789123 (it's currently matching this but shouldn't)
To be clear, If you go to: http://regex101.com and paste in:
^((?!-).)*$
You can see that is matches 123456789123 WHICH IS WRONG
How can I change the RegEx so it doesn't match 123456789123
Cheers.
Your regex ^((?!-).)*$ is the same as ^[^-]*$, that is, match any character other than - zero or more times.
The reason your regex does not match person/1234-5678-9123 is that it contains a - symbol. The string person/123456789123 does not contain a - symbol, so it is matched.
To match a string that has a - between numbers, you could try the regex below.
^.*?\d+-\d+.*$
OR
^(?=.*?-).+$
(?=.*?-) Positive lookahead asserts that the string must contain an - symbol.
DEMO
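A small sketch checking the patterns from this answer against the two ids from the question (the id part only, without the person/ prefix):

```javascript
const noHyphen = /^((?!-).)*$/;            // pattern from the question
const noHyphenClass = /^[^-]*$/;           // equivalent for single-line input
const requiresHyphen = /^(?=.*?-).+$/;     // must contain a hyphen
const digitsHyphenDigits = /^.*?\d+-\d+.*$/; // must contain digits-hyphen-digits

const ids = ["123456789123", "1234-5678-9123"];
for (const id of ids) {
  console.log(
    id,
    noHyphen.test(id),
    noHyphenClass.test(id),
    requiresHyphen.test(id),
    digitsHyphenDigits.test(id)
  );
}
// 123456789123    true   true   false  false
// 1234-5678-9123  false  false  true   true
```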
