Performance issue while evaluating email address with a regular expression

I am using the regular expression below to validate email addresses.
/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/
JavaScript code:
var email = 'myname@company.com';
var pattern = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;
if(pattern.test(email)){
return true;
}
The regex evaluates quickly when I provide the invalid email below:
aseflj@$kajsdfklasjdfklasjdfklasdfjklasdjfaklsdfjaklsdjfaklsfaksdjfkasdasdklfjaskldfjjdkfaklsdfjlak@company.com
(I added @$ in the middle of the name)
However, when I try to evaluate this email, it takes far too long and the browser hangs.
asefljkajsdfklasjdfklasjdfklasdfjklasdjfaklsdfjaklsdjfaklsfaksdjfkasdasdklfjaskldfjjdkfaklsdfjlak@company.com1
(I added com1 at the end)
I'm sure that the regex is correct, but I'm not sure why it takes so much time to evaluate the second example. If I provide a shorter email, it evaluates quickly. See the example below:
dfjjdkfaklsdfjlak@company.com1
Please help me fix the performance issue.

Your regex runs into catastrophic backtracking. Because [\.-]? in ([\.-]?\w+)* is optional, the group degenerates to (\w+)*, which is a classic case of catastrophic backtracking: a long run of word characters can be split between the inner + and the outer * in exponentially many ways, and the engine tries all of them before it reports a failure.
Removing the ? resolves the issue.
I also removed the redundant escape of . inside the character class and changed the regex a bit.
^\w+([.-]\w+)*@\w+([.-]\w+)*\.\w{2,3}$
Do note that many new generic TLDs have more than 3 characters. Even some gTLDs from before the expansion have more than 3 characters, such as .info.
As it stands, the regex also doesn't support internationalized domain names.
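As a rough check of the fix, the sketch below runs the rewritten pattern (with @ separating the local part from the domain) against the inputs from the question. The variable names are illustrative only, and no timings are asserted because they depend on the engine.

```javascript
// Rewritten pattern from the answer: the separator is mandatory inside each
// repeated group, so a run of word characters can only be consumed one way.
const emailPattern = /^\w+([.-]\w+)*@\w+([.-]\w+)*\.\w{2,3}$/;

// The long local part from the question.
const longLocal =
  'asefljkajsdfklasjdfklasjdfklasdfjklasdjfaklsdfjaklsdjfaklsfaksdjfkasdasdklfjaskldfjjdkfaklsdfjlak';

const results = {
  short: emailPattern.test('myname@company.com'),
  longValid: emailPattern.test(longLocal + '@company.com'),
  // This is the input that hung the browser with the original pattern.
  longInvalid: emailPattern.test(longLocal + '@company.com1'),
};

console.log(results);
```

With the optional separator removed, the failing input is rejected after a bounded amount of backtracking instead of an exponential search.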

Related

How do i allow only one (dash or dot or underscore) in a user form input using regular expression in javascript?

I'm trying to implement username form validation in JavaScript, where the username
can't start with a number
can't contain whitespace
can't contain any symbols except one dot, one underscore, or one dash
Example of a valid username: the_user-one.123
Example of an invalid username: 1----- user
I've been trying to implement this for a while, but I couldn't figure out how to allow only one of each permitted symbol:
const usernameValidation = /(?=^[\w.-]+$)^\D/g
console.log(usernameValidation.test('1username')) //false
console.log(usernameValidation.test('username-One')) //true

How about using a negative lookahead at the start:
^(?!\d|.*?([_.-]).*\1)[\w.-]+$
This will check if the string
neither starts with digit
nor contains two [_.-] by use of capture and backreference
See this demo at regex101 (more explanation on the right side)
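A quick way to try the pattern locally is sketched below. The sample list is made up for illustration, apart from the usernames quoted in the question.

```javascript
// Pattern from the answer. The negative lookahead rejects a leading digit and
// any string in which one of _ . - occurs a second time: the first occurrence
// is captured, and \1 looks for the same character again later on.
const usernamePattern = /^(?!\d|.*?([_.-]).*\1)[\w.-]+$/;

const samples = [
  'the_user-one.123',
  '1----- user',
  '1username',
  'username-One',
  'user--name',
  'a.b.c',
];

// No g flag is used, so test() keeps no state between calls.
const verdicts = samples.map((name) => [name, usernamePattern.test(name)]);
console.log(verdicts);
```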

Preface: Due to my severe carelessness, I assumed the context was usage of the HTML pattern attribute instead of JavaScript input validation. I leave this answer here for posterity in case anyone really wants to do this with regex.
Although regex can represent a pattern occurring consecutively a bounded number of times (via {<lower-bound>,<upper-bound>}), I'm not aware of any "elegant" regex feature that enforces a set of patterns each occurring a bounded number of times, in any order, with other patterns possibly in between.
Some workarounds I can think of:
Make a regex that allows for one of each permutation of ordering of special characters (note: newlines added for readability):
^(?:
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*-?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*_?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*\.?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*_?[A-Za-z0-9]*\.?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*\.?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*-?[A-Za-z0-9]*\.?)
)[A-Za-z0-9]*$
Note that the above regex can be simplified if you don't want usernames to start with special characters either.
Friendly reminder to also make sure you use the HTML attributes to enforce a minimum and maximum input character length where appropriate.
If you feel that regex isn't well suited to your use-case, know that you can do custom validation logic using javascript, which gives you much more control and can be much more readable compared to regex, but may require more lines of code to implement. Seeing the regex above, I would personally seriously consider the custom javascript route.
Note: I find https://regex101.com/ very helpful in learning, writing, and testing regex. Make sure to set the "flavour" to "JavaScript" in your case.
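The custom-JavaScript route mentioned above could look roughly like the following sketch. The function name isValidUsername and the exact rule set (letters, digits, and at most one each of dot, dash, and underscore; no leading digit) are assumptions read off the question, not something this answer specifies.

```javascript
// Hypothetical helper: validates a username step by step instead of with one
// large regex.
function isValidUsername(name) {
  // Only letters, digits, dot, dash and underscore; this also rules out
  // whitespace and the empty string.
  if (!/^[A-Za-z0-9._-]+$/.test(name)) return false;

  // Must not start with a digit.
  if (/^[0-9]/.test(name)) return false;

  // Each special character may appear at most once.
  for (const symbol of ['.', '-', '_']) {
    const occurrences = name.split(symbol).length - 1;
    if (occurrences > 1) return false;
  }
  return true;
}

console.log(isValidUsername('the_user-one.123'));
console.log(isValidUsername('1----- user'));
```

Each rule sits on its own line, so changing a limit (say, allowing two dots) means editing one comparison rather than regenerating every permutation.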

I have to admit that Bobble bubble's solution is the better fit. Here is a comparison of the different cases:
console.log("Comparison between mine and Bobble Bubble's solution:\n\nusername mine,BobbleBubble");
["valid-usrId1","1nvalidUsrId","An0therVal1d-One","inva-lid.userId","anot-her.one","test.-case"].forEach(u=>console.log(u.padEnd(20," "),chck(u)));
function chck(s){
return [!!s.match(/^[a-zA-Z][a-zA-Z0-9._-]*$/) && ( s.match(/[._-]/g) || []).length<2, // mine
!!s.match(/^(?!\d|.*?([_.-]).*\1)[\w.-]+$/)].join(","); // Bobble bubble
}
The differences can be seen in the last three test cases.

Regular expression fails to match the plus sign ('+') in Angular 2, but it works fine in testers

Here is the code:
export const PASSWORD_PATTERN: RegExp = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9`~!@#$%^&*()\-_+={[}\]|\\:;"'<,>.?/]{8,}$/;
This variable is being used like this elsewhere:
Validators.pattern(PASSWORD_PATTERN);
The intention is for this code to validate passwords by making sure that they contain one lowercase letter, one uppercase letter, and one number. Passwords may contain any number of special characters, and those characters are the ones that can be found on a standard keyboard (e.g. ~ * ( } ; + ). As of now, the regular expression will match passwords containing every single special character except for the plus sign ('+'). I've tried replacing '+' with '\+' and '\\+' in the regex, but that hasn't changed the result. At one point, I got rid of every special character in the regex except for the plus sign, to test it by itself, and once again using '+', '\+', and '\\+' in the regex wouldn't produce any matches for passwords containing a plus sign.
Using the regexp I pasted earlier, this password is considered a match:
Password1`~!@#$%^&*()-_=[{]}\|;:'",<.>/?
While this password isn't considered a match:
Password1`~!@#$%^&*()-_=[{]}\|;:'",<.>/?+
The only difference between those two passwords is the single plus sign at the end, and the second password isn't a match whether the regex contains +, \+, or \\+.
The regular expression is working completely on the backend, though it has been modified for the language being used primarily on the backend.

Not the prettiest thing, but this seems to work (or at least go in the right direction):
let characters = /((.*[A-Z])((.*[a-z].*\d)|(.*\d.*[a-z])).*)|((.*[a-z])((.*[A-Z].*\d)|(.*\d.*[A-Z])).*)|((.*\d)((.*[a-z].*[A-Z])|(.*[A-Z].*[a-z])).*)/;
let length = /.{8,}/;
let good1 = 'abc+Def8';
let good2 = '8b2cDde+';
let bad1 = 'abc+def8';
let bad2 = 'Ab+cde';
let bad3 = 'ab+cD2';
// Each entry combines the character check with the length check.
console.log(
  characters.test(good1) && length.test(good1),
  characters.test(good2) && length.test(good2),
  characters.test(bad1) && length.test(bad1),
  characters.test(bad2) && length.test(bad2),
  characters.test(bad3) && length.test(bad3)
);

Testing it on this site and with the JS code at the bottom, it looks like it is working. How are you testing to make sure the passwords are valid?
In the new regex I replaced all the symbols with [\S], which matches any character that is not whitespace. Do you want to limit passwords to the special characters normally found on a keyboard, or allow all possible ones? If you want to allow all of them, use [\S]. In fact, there is no real reason not to allow all characters (except line terminators) in a password, in which case you should replace [\S] with a plain dot (.).
How are you handling the password value? Are you passing it to the back end in plain text? A nice way to handle it might be to check that the password matches the regex on the front end, hash it with SHA-256 (and salt it with a unique user string if you want to go nice and overkill), then pass that to the server. The server would then salt and hash the password again before storing it in a table, to be compared against when the user next logs in.
This adds an extra layer of security for the user. Assuming you are using an SSL connection between the user and the server, this is not strictly needed (ALWAYS USE SSL), but it is nice to be a little more on the safe side when it comes to users' passwords. That said, this will not prevent someone from logging in as the user after a successful man-in-the-middle attack, because a password hashed on the client simply becomes the credential that is sent to the server and validated. However, it means you never have knowledge of the user's actual password, so if someone does intercept the hash (or, say, the server leaks some of its pre-hashing data; looking at you, Heartbleed!), they can't easily go to a bunch of other sites and try the user's username/password combo.
const regEx = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[\S]{8,}$/;
const good1 = 'A12345678a+'
const good2 = 'Aaascfsfas1##$#+#$%'
const example1 = 'Password1`~!##$%^&*()-_=[{]}\|;:",<.>/?'
const example2 = 'Password1`~!##$%^&*()-_=[{]}\|;:",<.>/?+'
const bad1 = 'a'
const bad2 = 'aasdfasfdsfads#+#$%$##'
console.log(
regEx.test(good1), regEx.test(good2),
regEx.test(example1), regEx.test(example2),
regEx.test(bad1), regEx.test(bad2)
)

It turns out that when appending the password to the request parameters, I needed to wrap it in encodeURIComponent(). Not sure why, but all of the regular expression patterns I create are now working as expected.
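One likely explanation (an assumption, since the question does not show how the request is built or parsed): in a form-style query string, an unencoded + is decoded as a space, so the password that reaches the validator no longer contains a plus sign. The sketch below uses the standard URLSearchParams parser to show the effect; the parameter name password is made up for the example.

```javascript
const password = 'Password1+';

// Unencoded: form-style query parsing treats "+" as an encoded space.
const raw = new URLSearchParams('password=' + password).get('password');

// Encoded: "+" travels as %2B and survives the round trip.
const encoded = encodeURIComponent(password);
const roundTripped = new URLSearchParams('password=' + encoded).get('password');

console.log(JSON.stringify(raw));
console.log(encoded);
console.log(roundTripped === password);
```

If this is what happened, the regex was never at fault: a password ending in a space fails a character class that does not include whitespace.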

Email Validation RegEx username/local name length check not running

I've debugged for a few hours now and have hit a wall; regex has never been my strong suit. I have been able to alter the following regex to restrict the domain to 255 characters, but in trying to restrict the local/username portion of an email address I'm running into issues implementing a 64-character limit. I've gone through regex101 replacing +s and *s and attempting to understand what each pass is doing. However, even when I add a check against all non-whitespace characters with a limit of 64, it seems like the other checks pass and take precedence, although I'm not sure. Below is my current regex, without any of the 64-character checks that I've broken it with:
var emailCheck = new RegExp(/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.{0,1}([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))#((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]){1,255}([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]){1,255}([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.*$/i);
What I have so far can be seen at http://jsfiddle.net/mtqx0tz1/ , I've made other slight alterations (e.g. not allowing consecutive dots) but for the most part this regex comes from another stack post without the character limits.
Lastly, I'm aware this isn't the 'standard', so to speak, and emails are checked server-side; however, I would like to be more safe than sorry, as well as work on some of my regex. Sorry if this question isn't worthy of an actual post; I'm simply not seeing where in my passes {1,64} is failing. At this point I'm thinking about just taking the substring up to the @ sign and checking its length that way, but it would be nice to include it in this statement, since all the checks are done here to begin with.
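The substring fallback described above might be sketched as follows. The helper name hasValidLengths is hypothetical, and the limits (64 for the local part, 255 for the domain) are the ones stated in the question; the check is meant to run alongside the pattern test, not replace it.

```javascript
// Hypothetical helper: enforces the length limits separately from the regex.
function hasValidLengths(email) {
  // Split at the last "@", since a quoted local part may itself contain "@".
  const at = email.lastIndexOf('@');
  if (at < 1) return false; // no "@" at all, or nothing in front of it

  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  return local.length <= 64 && domain.length >= 1 && domain.length <= 255;
}

console.log(hasValidLengths('a'.repeat(64) + '@example.com'));
console.log(hasValidLengths('a'.repeat(65) + '@example.com'));
```

Keeping the length rule outside the pattern avoids having to thread {1,64} through every alternative of the local-part expression.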

I have used this regex validation (.NET) and it works well.
The e-mail address is in the variable strIn.
try
{
return Regex.IsMatch(strIn,
@"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
@"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));
}
catch (RegexMatchTimeoutException)
{
return false;
}

Regular expression to match at least two special characters in any order

I have to do jQuery form validation for password.
The password should contain at least two special characters in any order. I have tried the approach from "Regular Expression for password validation", but it does not account for the two special characters appearing at arbitrary positions.
How do I do it using a JavaScript regular expression?

You do not need look-arounds when a simpler pattern will do.
If you only need to make sure the string has at least 2 characters from a specific set, use this kind of regex (with a negated class to make it more robust):
/(?:[^`!@#$%^&*\-_=+'\/.,]*[`!@#$%^&*\-_=+'\/.,]){2}/
See demo
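For a quick local check, the pattern can be exercised as in the sketch below. The sample passwords are invented for illustration, and the character set is the one used in the answer.

```javascript
// Two repetitions of "any run of non-special characters, then one special
// character". The negated class keeps each repetition from overlapping the
// next, so there is little to backtrack over.
const twoSpecials = /(?:[^`!@#$%^&*\-_=+'\/.,]*[`!@#$%^&*\-_=+'\/.,]){2}/;

const checks = ['simp!le#', 'simple!', 'plain', '!!'].map((candidate) => [
  candidate,
  twoSpecials.test(candidate),
]);

console.log(checks);
```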

In JavaScript, this worked for me:
/(?=(.*[`!@#$%\^&*\-_=\+'/\.,]){2})/

var goodtogo = false;
var pass = 'simp!le#'; //example
// match() returns null when nothing matches, so fall back to an empty array
var times = (pass.match(/[\\\[\]\/\(\)\+\*\?`!@#$%\^&_=-]/g) || []).length;
if(times >= 2)
goodtogo = true;
I advise you to try several passwords; if you find a bug, don't hesitate to yell back.
If you have more special characters, just add them to the character class passed to match.
Hope it helps.

Phone number validation - excluding non repeating separators

I have the following regex for phone number validation
function validatePhonenumber(phoneNum) {
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
return regex.test(phoneNum);
}
However, I would like to make sure it doesn't pass for mixed separators, such as in
111-222.3333
Any ideas how to make sure the separators are always the same?

Just make sure beforehand that there is at most one kind of separator, then pass the string through the regex as you were doing.
function validatePhonenumber(phoneNum) {
  var separators = extractSeparators(phoneNum);
  if(separators.length > 1) return false;
  var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
  return regex.test(phoneNum);
}
function extractSeparators(str){
  // Return an array with all the distinct chars
  // that are present in the passed string
  // and are not numeric (0-9)
  return Array.from(new Set(str.replace(/[0-9]/g, '')));
}

You can use the following regex instead:
\d{3}([-\s\.])?\d{3}\1?\d{4}
Here is a working example:
http://regex101.com/r/nN9nT7/1
As a result, it matches the first three inputs below and rejects the rest:
111-222-3333 --> ok
111.222.3333 --> ok
111 222 3333 --> ok
111-222.3333
111.222-3333
111-222 3333
111 222-3333
EDIT: after Alan Moore's suggestion:
Also matches 111-2223333. That's because you made the \1 optional,
which isn't necessary. One of JavaScript's stranger quirks is that a
backreference to a group that did not participate in the match,
succeeds anyway. So if there's no first separator, ([-\s.])? succeeds
because the ? made it optional, and \1 succeeds because it's
JavaScript. But I would have used ([-\s.]?) to capture the first
separator (which might be nothing), and \1 to match the same thing
again. This works in any flavor, including JavaScript.
We can improve the regex to:
^\d{3}([-\s\.]?)\d{3}\1\d{4}$
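The improved pattern can be checked against the cases discussed above with a short sketch like this one. The input list combines the examples from this answer with the no-separator and single-separator cases from Alan Moore's comment.

```javascript
// The group always participates (it may capture an empty string), so \1 must
// repeat exactly what the first separator position matched.
const phonePattern = /^\d{3}([-\s\.]?)\d{3}\1\d{4}$/;

const inputs = [
  '111-222-3333',
  '111.222.3333',
  '111 222 3333',
  '1112223333',
  '111-222.3333',
  '111-2223333',
];

const outcomes = inputs.map((value) => [value, phonePattern.test(value)]);
console.log(outcomes);
```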

You'll need at least two passes to keep this maintainable and extensible.
JS' RegEx doesn't allow for creating variables for use later in the RegEx, if you want to support older browsers.
If you are only supporting modern browsers, Fede's answer is just fine...
As such, with ghetto-support, you aren't going to be able to reliably check that one separator is the same value every time, without writing a really, really, really, stupidly-long RegEx, using | to basically write out the RegEx 3 times.
A better way might be to grab all of the separators, and use a reduction or a filter to check that they all have the same value.
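// Assumed to be the format check from the question; on its own it accepts mixed separators
var numRegEx = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
var userEnteredNumber = "999.231 3055";
var validNumber = numRegEx.test(userEnteredNumber);
// Drop the digits; whatever is left is the list of separators
var separators = userEnteredNumber.replace(/\d+/g, "").split("");
var firstSeparator = separators[0];
var uniformSeparators = separators.every(function (separator) { return separator === firstSeparator; });
if (!uniformSeparators) { /* also not valid */ }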
You could make that a little neater, using closures and some applied functions, but that's the idea.
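Alternatively, here's the big, ugly RegEx that would allow you to test exactly what the user entered. The alternatives sit inside one outer group so that ^ and $ apply to all of them, not just the first and the last.
var separatorTest = /^(?:([0-9]{3}\.[0-9]{3}\.[0-9]{3,4})|([0-9]{3}-[0-9]{3}-[0-9]{3,4})|([0-9]{3} [0-9]{3} [0-9]{3,4})|([0-9]{9,10}))$/;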
Notice I had to include the exact same number-test three times, wrap each one in parens (to be treated as a single group), and then separate each group with an | to check each group, like an if, else if, else... ...and then plug in a separate special case for having no separator at all...
...not pretty.
I'm also using [0-9] rather than \d, just to keep the accepted characters explicit when trying to maintain one of these abominations; in JavaScript, \d matches only the digits 0-9, so the two are interchangeable here.
Now, a word or two of warning:
1. People are liable to enter all kinds of crap; if this is for a commercial site, it's likely better to just strip separators entirely and validate that the number is the right size and conforms to some specifics (eg: doesn't start with /^555555/).
2. If not given any instruction about number format, people will happily use either no separator or a formal number, like (555) 555-5555 (or +1 (555) 555-5555 for the really pedantic), which is obviously going to fail hard in this system (see point #1).
3. Be prepared to trim what you get before validating.
4. Depending on your country/region/etc laws about data-security and consumer-vs-transaction record-keeping (again, may or may not be more important in a commercial setting), it's likely better to store both a "user-given" ugly number and a system-usable number, which you either clean on the back-end or submit along with the user-entered text.
5. From a user-interaction perspective, either forcing the number to conform explicitly (placeholders showing them xxx-xxx-xxxx right above the input, in bold), or accepting any text and prepping it yourself, is going to be 1000x better than accepting certain forms, but not bothering to tell the user up-front, and instead telling them what they did was wrong after they try. It's not cool for relationships; it's equally not cool here.
6. You've got 9-digit and 10-digit numbers, so if you're trying for an international solution, be prepared to deal with all international separators (, \.\-\(\)\+) etc... again, this is why stripping is more useful, because THAT RegEx would be insane.
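The strip-then-validate approach recommended above could be sketched like this. The helper name normalizePhone is hypothetical, and the size rule (10 digits, optionally preceded by the country code 1) is an assumption for illustration; adjust it to whatever numbering plan you actually need.

```javascript
// Hypothetical helper: keeps only the digits, then checks the size.
// Returns the cleaned number, or null when the size is wrong.
function normalizePhone(input) {
  // Trim, then drop every character that is not a digit.
  const digits = input.trim().replace(/[^0-9]/g, '');

  // Assumption: 10-digit numbers, optionally preceded by country code 1.
  const national =
    digits.length === 11 && digits[0] === '1' ? digits.slice(1) : digits;

  return national.length === 10 ? national : null;
}

console.log(normalizePhone('(555) 555-5555'));
console.log(normalizePhone('+1 (555) 555-5555'));
console.log(normalizePhone('12345'));
```

Storing the original input next to the cleaned value, as suggested in the list above, keeps the user's formatting available without making it part of validation.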
