I'm validating a password for complexity in an ASP.NET MVC3 app. My current requirements are that it must contain at least one upper case letter, one lower case letter, one digit and no more than three repeated characters. I'd like to generalise those numbers though, and also add a condition for non-alphanumeric characters.
At present, I'm validating server-side only, so I'm able to call Regex.IsMatch multiple times using one regex for each condition. I want to be able to validate client-side too, though. Because unobtrusive jQuery validation allows only one regex, I need to combine all five conditions into a single pattern.
I don't know much when it comes to regular expressions but I've been doing a bit of reading recently. I may be missing something simple but I can't find a way to AND multiple patterns together the way a | will OR them.
You can do this (in .NET) with several lookahead assertions in a single regex:
^(?=.*\p{Lu})(?=.*\p{Ll})(?=.*\d)(?=.*\W)(?!.*(.).*\1.*\1)
will match if all conditions are true.
^ # Match the start of the string
(?=.*\p{Lu}) # True if there is at least one uppercase letter ahead
(?=.*\p{Ll}) # True if there is at least one lowercase letter ahead
(?=.*\d) # True if there is at least one digit ahead
(?=.*\W) # True if there is at least one non-alnum character ahead
(?!.*(.).*\1.*\1) # True if no character occurs three or more times ahead
Note that the match is not going to consume any characters of the string - if you want the match operation to return the string you're matching against, add .* at the end of the regex.
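In JavaScript, you can't use Unicode character properties (engines that implement ES2018 accept \p{Lu} and \p{Ll} when the u flag is set, but older ones do not). So instead you could use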
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[\W_])(?!.*(.).*\1.*\1)
which will of course only use ASCII letters for validation. If that's OK for you, fine. You could go and augment the character classes like [A-ZÄÖÜÀÈÌÒÙÁÉÍÓÚ] etc. etc. but you would probably never be complete with this. On the server side, if you want the validation to yield the same result, you'd have to specify RegexOptions.ECMAScript so the .NET regex engine behaves like the JavaScript engine (thanks Alan Moore for noticing!).
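As a rough illustration, the ASCII-only pattern above can be wrapped in a small JavaScript helper. The name isValidPassword and the sample passwords are made up for this sketch; they are not part of jQuery validation or any other library.

```javascript
// ASCII-only variant of the combined lookahead pattern from the answer above.
// isValidPassword is a hypothetical helper name used only for illustration.
const PASSWORD_RULES = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[\W_])(?!.*(.).*\1.*\1)/;

function isValidPassword(candidate) {
  // Every lookahead starts at position 0, so all conditions must hold at once.
  return PASSWORD_RULES.test(candidate);
}

console.log(isValidPassword("Abc1!xyz")); // all five conditions hold
console.log(isValidPassword("abc1!xyz")); // no upper-case letter
console.log(isValidPassword("Aab1!aXa")); // 'a' occurs three times
```

Because each lookahead is zero-width, the order of the conditions does not affect the result, only (slightly) how quickly a failing input is rejected.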
I have a Rails backend that powers a Node frontend, and part of the Rails backend is validation on fields, which we then expose and pass to the Node frontend to consume.
Most of our validation rules are pretty simple things, blacklisting certain characters, etc.
But one rule is that a certain field can contain no more than 10 new lines.
Now, in Ruby this would be simple to achieve with the following:
/\A(^.*$\r?\n?){0,10}\z/
However, this is incompatible with JavaScript, where ^ and $ serve as both the end-of-string and end-of-line anchors.
One way I tried which was compatible with both was the following:
/\A([^\n]*\n[^\n]*){10,}\z/
But whilst this worked fine on the Node side, it appears to be a case of catastrophic backtracking: as the test string gets more complicated, the regular expression takes exponentially longer to complete.
I know this would be a lot simpler without using a regular expression, but due to the current setup of our stack, it is not an option to use anything that is not supported by the Active Record Validations.
Any help would be greatly appreciated in this, as I'm banging my head into a brick wall trying to figure it out!
You cannot do this with a single regex that will work in both JavaScript and Ruby. Validating the entire contents of a string with a single regex match requires start-of-string and end-of-string anchors. JavaScript and Ruby use incompatible syntax for these.
In JavaScript, ^ and $ match the start and end of the string as long as you do not specify the /m flag. JavaScript does not support \A and \z.
In Ruby you can use \A and \z to match the start and end of the string. Ruby does support ^ and $ but does not have an option to make these match at the start and end of the string only. They always match at embedded newlines.
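The JavaScript half of this can be checked with a short sketch. The pattern and the sample string are arbitrary choices for the illustration; the Ruby behaviour is not shown here.

```javascript
// Same pattern body, with and without the m (multiline) flag.
const wholeString = /^[a-z]+$/;
const perLine = /^[a-z]+$/m;

const twoLines = "abc\ndef";

console.log(wholeString.test(twoLines)); // false: "$" only matches at the very end
console.log(perLine.test(twoLines));     // true: "$" also matches before "\n"
console.log(wholeString.test("abc"));    // true
```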
The reason your regex devolves into catastrophic backtracking is that you have [^\n]* at both the beginning and end of your repeated group. This means that a non-LF character could be matched either by the second [^\n]* or by the first [^\n]* during the next iteration. The regex engine will try all those permutations, which takes forever.
The solution is to have only one [^\n]* inside the group. For JavaScript:
^[^\n]*(?:\n[^\n]*){0,10}$
For Ruby:
\A[^\n]*(?>\n[^\n]*){0,10}\z
Putting \n at the start of the group ensures that it fails immediately when either [^\n]* is backtracked. In Ruby we can eliminate a lot of backtracking by using an atomic group.
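A minimal sketch of the JavaScript variant in use follows. The helper name hasAtMostTenNewlines and the test strings are assumptions made for this example.

```javascript
// One [^\n]* up front, then at most ten "newline + rest of line" groups.
// hasAtMostTenNewlines is a hypothetical helper name.
const AT_MOST_TEN_NEWLINES = /^[^\n]*(?:\n[^\n]*){0,10}$/;

function hasAtMostTenNewlines(text) {
  return AT_MOST_TEN_NEWLINES.test(text);
}

const tenBreaks = "line\n".repeat(10);    // exactly 10 newlines
const elevenBreaks = "line\n".repeat(11); // one too many

console.log(hasAtMostTenNewlines(tenBreaks));
console.log(hasAtMostTenNewlines(elevenBreaks));
// A long failing input is still rejected quickly, because each character
// can be matched in only one way.
console.log(hasAtMostTenNewlines("x".repeat(50000) + "\n".repeat(11)));
```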
I have a long Regex (JavaScript), and it contains the following construct:
((\\\\)|(\\[abc])|([^abc]))*
The regex says:
Match any String, that doesn't contain the letters a,b and c.
Except if they're escaped by a backslash.
If the backslash is escaped (eg. \\a), also don't match these letters.
Here's a simple match-example:
eeeaeaee\aee\\\\ae\\\\\aee
I wonder if it's possible to optimise this regular expression. This is only a small example; the actual regex I'm using is bigger, and it contains a lot of duplicated code.
I think a more logical (and likely faster) regexp would be something like:
(?:[^abc\\]|\\.)*
In other words, a backslash will escape anything, including another backslash.
Note a few things: first, if you don't need to capture parts of the match, use non-capturing groups. That buys you a little performance. Second, when there are multiple alternatives, put the most common one first.
You might get even better performance this way (try it):
[^abc\\]*(?:\\.[^abc\\]*)*
Rather than going through the alternation for each and every character, that will "eat" runs of non-special characters with a single step. Nested * can be bad news, leading to quadratic (or worse) runtime in cases where the regex doesn't match, but in this case that won't happen.
When writing this answer, I discovered that JS's regex engine has no possessive matchers. That sucks -- you could get better worst-case performance if they were available. (An important tip for working towards regex mastery: when performance testing a regex, always test cases where it does match AND where it doesn't match. The worst-case performance generally occurs when it doesn't.)
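To compare the two forms suggested in this answer, here is a small sketch that runs both on inputs that match and inputs that don't. The ^ and $ anchors and the sample strings are additions for this illustration, so that test() has to accept the whole string.

```javascript
// Alternation form and "unrolled" form of the same idea, anchored for testing.
const alternation = /^(?:[^abc\\]|\\.)*$/;
const unrolled = /^[^abc\\]*(?:\\.[^abc\\]*)*$/;

const samples = [
  "eee",         // no special characters
  "ee\\aee",     // escaped a
  "ee\\\\\\aee", // escaped backslash followed by an escaped a
  "eeaee",       // unescaped a
  "ee\\\\aee",   // escaped backslash followed by a bare a
];

for (const s of samples) {
  console.log(JSON.stringify(s), alternation.test(s), unrolled.test(s));
}
```

Timing the two against each other would need much longer inputs and a real benchmark harness; this sketch only checks that they agree.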
You can match any character after a backslash or any character that is not in [abc]:
(\\.|[^abc])*
That will match the exact same language.
I think it's actually clearer what your intention is if you flip it around like:
([^abc]|\\.)*
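The "same language" claim can be spot-checked by brute force. The sketch below compares the question's construct with the simplified one on every string up to length 6 over a three-character alphabet. The alphabet, the length limit, the added anchors and the helper name allStrings are all assumptions of this example, and a check like this is evidence rather than a proof.

```javascript
// Anchored copies of the original construct and the simplified one.
const original = /^((\\\\)|(\\[abc])|([^abc]))*$/;
const simplified = /^(\\.|[^abc])*$/;

// Hypothetical helper: every string over `alphabet` with length 0..maxLen.
function allStrings(alphabet, maxLen) {
  let current = [""];
  const result = [""];
  for (let len = 1; len <= maxLen; len++) {
    const next = [];
    for (const prefix of current) {
      for (const ch of alphabet) next.push(prefix + ch);
    }
    result.push(...next);
    current = next;
  }
  return result;
}

const mismatches = allStrings(["a", "e", "\\"], 6)
  .filter((s) => original.test(s) !== simplified.test(s));

console.log(mismatches.length);
```

Note that both patterns accept an a that directly follows two backslashes, because [^abc] can itself match a backslash; a class such as [^abc\\] would be needed to rule that out.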
I can't seem to wrap my head around this one and thought I'd ask for some help here!
Basically I am validating a password field and the requirements are as follows:
- Must contain 3 consecutive letters
- Must contain at least 2 digits
- Can be in any order (e.g. 1abc342, abc24g3, 11abcsjf)
Here is what I have so far but I believe it needs some tweaking:
/[a-z]{3}[0-9][0-9]/i
The regex you are describing can be written like so:
/(?=.*?[a-z]{3})(?=.*?\d.*?\d)/
The first lookahead searches for three letters in a row, in any position. The second lookahead looks for a digit in any position, followed by a digit further ahead.
You should probably do this in two separate regular expressions: one to test for three consecutive letters and one to test for at least two digits:
/[a-z]{3}/i
/\d.*\d/
Make sure both conditions are met. You could use lookahead to combine this into one regex, but I think two regexes is clearer code and a better solution.
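A sketch of the two-regex approach, with the single lookahead regex alongside for comparison. The function name meetsPasswordRules is hypothetical, and the i flag on the combined pattern is an addition here so that it treats upper-case letters the same way as /[a-z]{3}/i.

```javascript
// Two separate checks: three consecutive letters, and at least two digits.
// meetsPasswordRules is a hypothetical helper name.
function meetsPasswordRules(password) {
  const hasThreeLetters = /[a-z]{3}/i.test(password);
  const hasTwoDigits = /\d.*\d/.test(password);
  return hasThreeLetters && hasTwoDigits;
}

// Single-regex equivalent built from two lookaheads.
const combined = /(?=.*?[a-z]{3})(?=.*?\d.*?\d)/i;

for (const pw of ["1abc342", "abc24g3", "11abcsjf", "abc1", "ab12cd"]) {
  console.log(pw, meetsPasswordRules(pw), combined.test(pw));
}
```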
But if I may inject some opinion on the matter: Unless you have no control over this (client specified this), I'd highly recommend not imposing password restrictions like this. They actually make your password system far less secure, not more secure. Some reading on why:
http://jimpravetz.com/blog/2011/06/cheap-gpus-are-rendering-strong-passwords-use/
http://jimpravetz.com/blog/2012/02/stupid-password-rules/
I have been researching a regular expression for the better part of about six hours today. For the life of me, I can not figure it out. I have tried what feels like about a hundred different approaches to no avail. Any help is greatly appreciated!
The basic rules:
1 - Exclude these characters in the address portion (before the @ symbol): "()<>@,;:\[]*&^%$#!{}/"
2 - The address can contain a ".", but not two in a row.
I have an elegant solution to rule number one; however, rule number two is killing me! Here is what I have so far. (I'm only including the portion up to the @ sign to keep it simple.) Also, it is important to note that this regular expression is being used in JavaScript, so no conditional IF is allowed.
/^[^()<>@,;:\\[\]*&^%$#!{}//]+$/
First of all, I would suggest you always choose which characters you want to allow instead of the opposite, because you never know what dangerous characters you might miss.
Secondly, this is the regular expression I always use for validating emails and it works perfectly. Hope it helps you out.
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i
Rule number 2
/^(?:\.?[^.])+\.?$/
which means any number of sequences of (an optional dot followed by a mandatory non dot) with an optional dot at the end.
Consider four two-character sequences:
xx matches as two non dot characters.
.x matches as an optional dot followed by a non-dot.
x. matches as a non-dot followed by an optional dot at the end.
.. does not match because there is no non-dot after the first dot.
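The four cases above can be run directly. The variable name noDoubleDot and the two extra longer inputs are additions for this sketch.

```javascript
// Any number of (optional dot + mandatory non-dot), then an optional final dot.
const noDoubleDot = /^(?:\.?[^.])+\.?$/;

for (const s of ["xx", ".x", "x.", "..", "a.b.c", "a..b"]) {
  console.log(JSON.stringify(s), noDoubleDot.test(s));
}
```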
One thing to remember about email addresses is that dots can appear in tricky places
"..@"@.example.com
is a valid email address.
The "..@" is a perfectly valid quoted local-part production, and .example.com is just a way of saying example.com but resolved against the root DNS instead of using a host search path. example.com might resolve to example.com.myintranet.com if myintranet.com is on the host search path but .example.com always resolves to the absolute host example.com.
First of all, to your specifications:
^(?!.*\.\.)[^()<>@,;:\\[\]*&^%$#!{}/]+@.*$
It's just your regex with (?!.*\.\.) tacked onto the front. That's a negative lookahead, which makes the match fail if there are two consecutive periods anywhere in the string.
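A sketch of the same idea as a JavaScript regex literal, assuming the local part is separated from the domain by @ and that the character class is meant to repeat (hence the + in this version). The sample addresses are invented for the example.

```javascript
// Negative lookahead for ".." followed by the blacklist-style local part.
// The slash is escaped because this is written as a regex literal.
const addressPattern = /^(?!.*\.\.)[^()<>@,;:\\[\]*&^%$#!{}\/]+@.*$/;

const addresses = [
  "john.smith@example.com",  // single dots only
  "john..smith@example.com", // two dots in a row
  "jo(hn@example.com",       // excluded character in the local part
];

for (const a of addresses) {
  console.log(a, addressPattern.test(a));
}
```

The lookahead scans the whole string, so a double dot after the @ is rejected as well.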
Properly matching email addresses is quite a bit harder, however.
In my web application, I created a framework that binds model data to controls on the page. Each model property has rules such as string length, not-null and a regular expression. Before the page is submitted, the framework validates every bound control against its defined rules.
So, I want to detect which characters are allowed by each regular expression rule, as in the following examples.
"^[0-9]+$" allows only digit characters like 1, 2, 3.
"^[a-zA-Z_][a-zA-Z_\-0-9]+$" allows only letters, digits, - and _ characters.
However, this function should not care about grouping or the position of the allowed characters. It should only report which characters are possible.
Do you have any idea how to create this function?
PS. I know it is easy to write a specific function, such as a numeric-only one that allows only digit characters. But I need to share the same piece of code between the data tier (which contains all the model validators) and the UI tier without modifying anything.
Thanks
You can't solve this for the general case. Regexps don't generally ‘fail’ at a particular character, they just get to a point where they can't match any more, and have to backtrack to try another method of matching.
One could make a regex implementation that remembered which was the farthest it managed to match before backtracking, but most implementations don't do that, including JavaScript's.
A possible way forward would be to match first against ^pattern$, and if that failed match against ^pattern without the end-anchor. This would be more likely to give you some sort of match of the left hand part of the string, so you could count how many characters were in the match, and say the following character was ‘invalid’. For more complicated regexps this would be misleading, but it would certainly work for the simple cases like [a-zA-Z0-9_]+.
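The two-step approach described above might look roughly like this. The helper name firstInvalidIndex and its convention of returning -1 for a fully valid string are assumptions of this sketch, and as noted it is only meaningful for simple patterns.

```javascript
// `source` is the pattern body without ^ and $ anchors, e.g. "[a-zA-Z0-9_]+".
// firstInvalidIndex is a hypothetical helper name.
function firstInvalidIndex(source, input) {
  // Step 1: does the whole string match?
  if (new RegExp("^(?:" + source + ")$").test(input)) {
    return -1; // nothing invalid
  }
  // Step 2: match without the end anchor and see how far the match got.
  const partial = new RegExp("^(?:" + source + ")").exec(input);
  // The length of the matched prefix is the index of the first character
  // that did not fit; with no prefix match at all, blame index 0.
  return partial ? partial[0].length : 0;
}

console.log(firstInvalidIndex("[a-zA-Z0-9_]+", "abc_123"));
console.log(firstInvalidIndex("[a-zA-Z0-9_]+", "abc$def"));
console.log(firstInvalidIndex("[0-9]+", "x12"));
```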
I must admit that I'm struggling to parse your question.
If you are looking for a regular expression that will match only if a string consists entirely of a certain collection of characters, regardless of their order, then your examples of character classes were quite close already.
For instance, ^[A-Za-z0-9]+$ will only allow strings that consist of letters A through Z (upper and lower case) and numbers, in any order, and of any length.