Pattern suggestions to work around alternation operator limitations - javascript

I'm trying to build a regex pattern matcher.
The complete string pattern is as follows:
AB123456C12
Letter A
Letter B
six digits
one letter
two digits.
I'm trying to match as much as possible, but partial inputs are allowed as long as the initial AB is present.
The RegEx engine is Javascript. Hoping to be fully cross-browser compatible.
I do have a pattern that works:
^AB([0-9]{6}[A-Z][0-9]{0,2}|[0-9]{0,6})$
But it only works when the alternatives of the alternation operator are in this order. In other words,
^AB([0-9]{0,6}|[0-9]{6}[A-Z][0-9]{0,2})$
doesn't work, which makes me suspect that the solution may fail in some obscure browser.
So, any other way to define that pattern?
Thanks.
Edited for clarity: the following inputs must be matched by the regex:
AB
AB123
AB123456Z
The following inputs must be rejected:
B
B123456Z12
ABC
123456

This may help
^AB[0-9]{6}[A-Z][0-9]{2}$

I think you are looking for this.
# ^AB(?:(?:[0-9]{6}(?:[A-Z][0-9]{0,2})?)|[0-9]{1,5})?$
^ # BOS
AB # AB
(?: # Optional cl-1
(?: # Required cl-2
[0-9]{6} # Required 6 digits
(?: # Optional cl-3
[A-Z] # Required A-Z letter
[0-9]{0,2} # 0 to 2 digits (so the digits are optional)
)? # End cl-3
) # End cl-2
| # or
[0-9]{1,5} # Required 1-5 digits
)? # End cl-1
$ # EOS
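As a quick check, the pattern above can be run against the inputs listed in the question. This is only a minimal sketch: the names partialPattern, shouldMatch and shouldReject are illustrative and do not come from the original post.

```javascript
// Pattern from the answer above, written as a single JavaScript literal.
const partialPattern = /^AB(?:(?:[0-9]{6}(?:[A-Z][0-9]{0,2})?)|[0-9]{1,5})?$/;

// Inputs taken from the question (plus the complete form AB123456C12).
const shouldMatch = ["AB", "AB123", "AB123456Z", "AB123456C12"];
const shouldReject = ["B", "B123456Z12", "ABC", "123456"];

for (const s of shouldMatch) console.log(s, partialPattern.test(s));  // true for each
for (const s of shouldReject) console.log(s, partialPattern.test(s)); // false for each
```

Because the two branches ([0-9]{6}... and [0-9]{1,5}) cannot both match the same input, the result here does not depend on the order of the alternatives.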

Related

JS regex to match a username with specific special characters and no consecutive spaces

I am pretty new to the regex world and am stuck on a small regex task.
Before posting this question I went through some existing answers, which I was able to understand, but I couldn't work out a solution to my problem.
Appreciate your help on this.
My scenario is:
Validating the username based on the criteria below:
1 - The first character has to be one of a-zA-Z0-9_# (alphanumeric, or either of the two special characters _#).
2 - The rest can be any letters, any numbers and -#_ (alphanumeric, or any of the three special characters).
3 - BUT no consecutive spaces between words.
4 - Max size should be 30 characters.
My username might contain multiple words separated by a single space. For the first word only _# and alphanumerics are allowed; from the second word onwards it can contain _-# and alphanumerics.
Trailing spaces at the end of the username need to be ignored.
Examples are: #test, _test, #test123, 123#, test_-#, test -test1, #test -_#test etc.
Appreciate your help on this.
Thanks
Arjun
Here you go:
^(?!.*[ ]{2,})[\w#][-#\w]{0,29}$
See it working on regex101.com.
Condition 3 is ambiguous though, as you're not allowing spaces anyway. \w is a shortcut for [a-zA-Z0-9_], and (?!...) is called a negative lookahead.
Broken down this says:
^ # start of string
(?!.*[ ]{2,}) # neg. lookahead, no consecutive spaces
[\w#] # condition 1
[-#\w]{0,29} # condition 2 and 4
$ # end of string
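A small sketch of how this pattern behaves in JavaScript, including the caveat about spaces. The constant name noSpaceUsername is made up for this example.

```javascript
// Pattern from the answer above: one leading [\w#], then up to 29 of [-#\w].
const noSpaceUsername = /^(?!.*[ ]{2,})[\w#][-#\w]{0,29}$/;

console.log(noSpaceUsername.test("#test"));         // true
console.log(noSpaceUsername.test("test_-#"));       // true
console.log(noSpaceUsername.test("-test"));         // false: "-" is not allowed first
console.log(noSpaceUsername.test("a".repeat(31)));  // false: longer than 30 characters
console.log(noSpaceUsername.test("test -test1"));   // false: no space is allowed at all
```

The last line shows why the lookahead never gets a chance to matter in this version: a single space already fails the character classes.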
This might work:
^(?=.{1,30}$)(?!.*[ ]{2})[a-zA-Z0-9_#]+(?:[ ][a-zA-Z0-9_#-]+)*$
Note - the check for no consecutive spaces (?! .* [ ]{2} ) is not really
necessary since the regex body only allows a single space between words.
It is left in for posterity, take it out if you want.
Explained
^ # BOS
(?= .{1,30} $ ) # Min 1 character, max 30
(?! .* [ ]{2} ) # No consecutive spaces (not really necessary here)
[a-zA-Z0-9_#]+ # First word only
(?: # Optional other words
[ ]
[a-zA-Z0-9_#-]+
)*
$ # EOS
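Here is a minimal sketch of using this pattern from JavaScript. The helper isValidUsername is hypothetical, and stripping trailing whitespace before testing is an assumption based on the question's request to ignore trailing spaces.

```javascript
// Pattern from the answer above, collapsed to one line.
const multiWordUsername =
  /^(?=.{1,30}$)(?!.*[ ]{2})[a-zA-Z0-9_#]+(?:[ ][a-zA-Z0-9_#-]+)*$/;

// Hypothetical helper: drop trailing whitespace, then validate what is left.
function isValidUsername(input) {
  return multiWordUsername.test(input.replace(/\s+$/, ""));
}

console.log(isValidUsername("#test -_#test")); // true
console.log(isValidUsername("test -test1"));   // true
console.log(isValidUsername("#test   "));      // true (trailing spaces stripped)
console.log(isValidUsername("test  -test1"));  // false (two consecutive spaces)
console.log(isValidUsername("test_-#"));       // false ("-" in the first word)
```

Note that test_-#, one of the examples in the question, is rejected because this pattern only allows - from the second word onwards; whether that is the intended rule depends on how criteria 1 and 2 are read.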

Livecycle RegExp - trouble with decimal

Within LiveCycle, I am validating that the number entered is between 0 and 10, allowing quarter hours. With the help of this post, I've written the following.
if (!xfa.event.newText.match(/^(([10]))$|^((([0-9]))$|^((([0-9]))\.?((25)|(50)|(5)|(75)|(0)|(00))))$/))
{
    xfa.event.change = "";
}
The problem is that periods are not being accepted. I have tried wrapping the \. in parentheses, but that did not work either. The field is a text field with no special formatting, and the code is in the change event.
Yikes, that's a convoluted regex. This can be simplified a lot:
/^(?:10|[0-9](?:\.(?:[27]?5)?0*)?)$/
Explanation:
^ # Start of string
(?: # Start of group:
10 # Either match 10
| # or
[0-9] # Match 0-9
(?: # optionally followed by this group:
\. # a dot
(?:[27]?5)? # either 25, 75 or 5 (also optional)
0* # followed by optional zeroes
)? # As said before, make the group optional
) # End of outer group
$ # End of string
Test it live on regex101.com.
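The simplified pattern can be sanity-checked outside LiveCycle with plain JavaScript. This is a sketch only; quarterHours, accepted and rejected are names invented for the example.

```javascript
// Simplified pattern from the answer above.
const quarterHours = /^(?:10|[0-9](?:\.(?:[27]?5)?0*)?)$/;

const accepted = ["0", "7", "7.", "7.0", "7.25", "7.5", "7.50", "7.75", "10"];
const rejected = ["7.3", "7.05", "10.5", "11", "7.2"];

for (const s of accepted) console.log(s, quarterHours.test(s)); // true for each
for (const s of rejected) console.log(s, quarterHours.test(s)); // false for each
```

One thing to keep in mind if the check runs on every keystroke: the intermediate text 7.2 does not match, so a change-event handler that rejects non-matching text may block typing 7.25 one character at a time.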

Javascript Regular expression for currency amount with spaces

I have this regular expression
/^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/
It matches $55.5 just fine, but some of my test data contains values like $ 55.5 (that is, with a space after the $ sign).
The answers on this link are not working for me.
Currency / Percent Regular Expression
So, how can I change it to accept the spaces as well?
Try the following regex:
/^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\s*\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/
Let me know if it worked!
Demo Here
TLDR:
/^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\s*\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/
The science bit
Ok, I'm guessing that you didn't construct the original regular expression, so here are the pieces of it, with the addition marked:
^ # match from the beginning of the string
[',",\+,<,>,\(,\*,\-,%]? # optionally one of these symbols
( # start a group
[£,$,€]? # optionally one of these symbols
\s* # <--- NEW ADDITION: zero or more whitespace characters
\d+ # then one or more decimal digits
( # start group
[\,,\.] # comma or a dot
\d+ # then one or more decimal digits
)? # group optional (comma/dot and digits or neither)
[£,$,€]? # optionally one of these symbols
\s* # optionally whitespace
[\-,\/,\,,\.,\+]? # optionally one of these symbols
[\/]? # optionally a /
\s* # optionally whitespace
)+ # this whole group one or more times
[',",\+, <,>,\),\*,\-,%]? # optionally one of these symbols
$ # match to the end of the string
Much of this pattern deals with the characters around the currency amount rather than the amount itself, so you could probably trim it down.
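To see the effect of the added \s*, here is a small sketch comparing the original pattern with the modified one on the two inputs from the question. The names originalCurrency and spacedCurrency are illustrative only.

```javascript
// Original pattern from the question (no whitespace allowed after the symbol).
const originalCurrency = /^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/;

// Modified pattern from the answers: \s* inserted after the currency symbol.
const spacedCurrency = /^[',",\+,<,>,\(,\*,\-,%]?([£,$,€]?\s*\d+([\,,\.]\d+)?[£,$,€]?\s*[\-,\/,\,,\.,\+]?[\/]?\s*)+[',",\+, <,>,\),\*,\-,%]?$/;

console.log(originalCurrency.test("$55.5"));  // true
console.log(originalCurrency.test("$ 55.5")); // false
console.log(spacedCurrency.test("$55.5"));    // true
console.log(spacedCurrency.test("$ 55.5"));   // true
```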

How can I make this regular expression not result in "catastrophic backtracking"?

I'm trying to use a URL matching regular expression that I got from http://daringfireball.net/2010/07/improved_regex_for_matching_urls
(?xi)
\b
( # Capture 1: entire matched URL
(?:
https?:// # http or https protocol
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)
Based on the answers to another question, it appears that there are cases that cause this regex to backtrack catastrophically. For example:
var re = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;
re.test("http://google.com/?q=(AAAAAAAAAAAAAAAAAAAAAAAAAAAAA)")
... can take a really long time to execute (e.g. in Chrome)
It seems to me that the problem lies in this part of the code:
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
... which seems to be roughly equivalent to (.+|\((.+|(\(.+\)))*\))+, which looks like it contains (.+)+
Is there a change I can make that will avoid that?
Changing it to the following should prevent the catastrophic backtracking:
(?xi)
\b
( # Capture 1: entire matched URL
(?:
https?:// # http or https protocol
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)
The only change that was made was to remove the + after the first [^\s()<>] in each of the "balanced parens" portions of the regex.
Here is the one-line version for testing with JS:
var re = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;
re.test("http://google.com/?q=(AAAAAAAAAAAAAAAAAAAAAAAAAAAAA")
The problem portion of the original regex is the balanced parentheses section. To simplify the explanation of why the backtracking occurs, I am going to completely remove the nested parentheses portion of it, because it isn't relevant here:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # original
\(([^\s()<>]+)*\) # expanded below
\( # literal '('
( # start group, repeat zero or more times
[^\s()<>]+ # one or more non-special characters
)* # end group
\) # literal ')'
Consider what happens here with the string '(AAAAA'. The literal ( matches, AAAAA is consumed by the group, and then the ) fails to match. At this point the group gives up one A, leaving AAAA captured, and the engine attempts to continue the match from there. Because the group is followed by *, it can match multiple times, so now ([^\s()<>]+)* matches AAAA on the first pass and A on the second. When this fails too, the first pass gives up another A, which is then consumed by the second pass.
This would go on for a long while resulting in the following attempts to match, where each comma-separated group indicates a different time that the group is matched, and how many characters that instance matched:
AAAAA
AAAA, A
AAA, AA
AAA, A, A
AA, AAA
AA, AA, A
AA, A, AA
AA, A, A, A
....
I may have counted wrong, but I'm pretty sure it adds up to 16 steps before it is determined that the regex cannot match. As you continue to add additional characters to the string the number of steps to figure this out grows exponentially.
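The count can be checked with a short sketch that enumerates every way a run of n characters can be split into ordered, non-empty pieces. This is a model of the counting argument, not a trace of any particular regex engine, and the function name splits is hypothetical.

```javascript
// Enumerate every ordered split of a run of n characters into non-empty
// chunks. Each split corresponds to one way ([^\s()<>]+)* can divide the
// run between repetitions of the group before the final ")" fails.
function splits(n) {
  if (n === 0) return [[]];
  const out = [];
  for (let first = n; first >= 1; first--) {
    for (const rest of splits(n - first)) out.push([first, ...rest]);
  }
  return out;
}

const five = splits(5);
console.log(five.length);                           // 16
console.log(five.slice(0, 4).map(s => s.join(", "))); // [ '5', '4, 1', '3, 2', '3, 1, 1' ]
console.log(splits(10).length);                     // 512
```

The count is 2^(n-1), so each extra character doubles the number of splits, which matches the exponential growth described above.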
By removing the + and changing this to \(([^\s()<>])*\), you would avoid this backtracking scenario.
Adding the alternation back in to check for the nested parentheses doesn't cause any problems.
Note that you may want to add some sort of anchor to the end of the string, because currently "http://google.com/?q=(AAAAAAAAAAAAAAAAAAAAAAAAAAAAA" will match up to just before the (, so re.test(...) would return true because http://google.com/?q= matches.
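The behaviour described in this note can be observed with exec instead of test, which shows which part of the string was actually matched. A minimal sketch, assuming a run of 29 As; the names fixedUrlRe, unclosed and closed are made up for the example.

```javascript
// One-line version of the modified regex from this answer.
const fixedUrlRe = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;

const unclosed = "http://google.com/?q=(" + "A".repeat(29); // no closing ")"
const closed = unclosed + ")";

// Without the closing parenthesis only the part before "(" is matched.
console.log(fixedUrlRe.exec(unclosed)[0]);          // http://google.com/?q=

// With balanced parentheses the whole URL is matched.
console.log(fixedUrlRe.exec(closed)[0] === closed); // true
```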

Regex validation for comma separated string

I need to validate an input string on the client side.
Here is an example of the string:
1:30-1:34, 1:20-1:22, 1:30-1:37,
It's basically time codes for a video.
Can this be done with regex?
Banging my head against the wall...
^(?:\b\d+:\d+-\d+:\d+\b(?:, )?)+$
would probably work; at least it matches your example. But you might need to add a few edge cases to make the rules for matching/not matching clearer.
^ # Start of string
(?: # Try to match...
\b # start of a "word" (in this case, number)
\d+ # one or more digits
: # a :
\d+ # one or more digits
- # a dash
\d+ # one or more digits
: # a :
\d+ # one or more digits
\b # end of a "word"
(?:, )? # optional comma and space
)+ # repeat one or more times
$ # until the end of the string
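A short sketch of how this pattern treats a few variations of the example, which may help when deciding on the edge cases. The name timecodeList is illustrative.

```javascript
// Pattern from the answer above.
const timecodeList = /^(?:\b\d+:\d+-\d+:\d+\b(?:, )?)+$/;

console.log(timecodeList.test("1:30-1:34, 1:20-1:22, 1:30-1:37"));   // true
console.log(timecodeList.test("1:30-1:34, 1:20-1:22, 1:30-1:37, ")); // true
console.log(timecodeList.test("1:30-1:34, 1:20-1:22, 1:30-1:37,"));  // false
console.log(timecodeList.test("1:30-1:34; 1:20-1:22"));              // false
```

The third case shows that a trailing comma is accepted only when it is followed by a space, so input copied without its final space would need trimming or a slightly looser separator.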
The following is a simple representation. I have assumed that the string has the exact same form as you have shown. This may be a good starting point for you. I'll improve the regex if you provide more specific requirements.
([0-9]+:[0-9]{1,2}-[0-9]+:[0-9]{1,2},\s*)+
Explanation (inspired by Tim's answer above)
[0-9]+   # One or more digits
:      # A colon
[0-9]{1,2}  # A single digit or a pair of digits
-       # A dash
[0-9]+:[0-9]{1,2}  # The same digits:digits form again
,       # A comma
\s*      # Optional whitespace
