JavaScript regex: email limited to 3 domains

I need to validate certain email addresses on the client side before the server side. They are limited to 3 domains, for example:
email#me.com
email#you.com
email#us.com
The part before the # can contain the standard combination of letters, numbers, underscore, hyphen, period etc., but the key requirement is that the domain is "me.com", "you.com" or "us.com".
I'm shocking at regexes and have been at http://gskinner.com/RegExr/ for about 30 minutes but can't get anywhere close...
Any help is greatly appreciated.
Thanks

Just find any email regex, keep the part up to and including the #, and replace everything after it with:
(me|you|us)\.com$
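A minimal sketch of how the pieces might fit together. It assumes the # in the examples above stands in for @, and the local-part character class (letters, digits, period, underscore, hyphen) is only a guess at "the standard combo"; `isAllowedEmail` is a made-up name.

```javascript
// Assumption: "#" in the question is a placeholder for "@".
// Local part: letters, digits, period, underscore, hyphen (an assumed set).
// Domain: exactly one of me.com, you.com, us.com, anchored at both ends.
const allowedDomainEmail = /^[A-Za-z0-9._-]+@(me|you|us)\.com$/;

function isAllowedEmail(address) {
  return allowedDomainEmail.test(address);
}

console.log(isAllowedEmail("email@me.com"));   // true
console.log(isAllowedEmail("email@them.com")); // false
```

Anchoring with ^ and $ matters here: without them, something like "email@me.com.example" would slip through.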

Related

Alter Regex to validate email Domain on RFC 5322 from Java to Oracle sql

I need to alter the regex below to validate email domains in an Oracle database according to RFC 5322,
while ensuring that the criteria below are respected as well.
Domain rules that must be respected:
must start and end with a letter or digit and be between 1 and 63 characters long.
may contain uppercase and lowercase Latin letters (A to Z and a to z).
may contain digits 0 to 9, provided that top-level domain names are not all-numeric.
may contain a hyphen -, provided that it is not the first or last character and that hyphens are not consecutive.
must have at least 2 characters (abc#t.com is not valid, but abc#tt.com is valid).
I found the regex below on the internet. It works very well in JavaScript and enforces the rules posted above.
My problem is that Oracle does not support lookahead or lookbehind.
#(?:(?=[A-Z0-9-]{1,63}\.)[A-Z0-9]+(?:-[A-Z0-9]+)*\.){1,8}[A-Z]{2,63}$
So, can anyone please help me make the necessary modifications so that it works in Oracle SQL?
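One way to drop the lookahead, sketched in JavaScript only for illustration (this is not Oracle syntax): the lookahead in the pattern above only enforces the 1 to 63 length of each label, so that check can be done separately per label. `isValidDomain` is a made-up helper, it expects just the part after the #, and it mirrors the posted pattern as written (which, as far as I can tell, does accept one-character labels).

```javascript
// Label: alphanumeric runs joined by single hyphens, so no leading,
// trailing or doubled hyphen. Length is checked separately below.
const LABEL = /^[A-Z0-9]+(?:-[A-Z0-9]+)*$/i;
// Top-level domain: letters only, so it can never be all-numeric.
const TLD = /^[A-Z]{2,63}$/i;

function isValidDomain(domain) {
  const labels = domain.split(".");
  const tld = labels.pop();
  // The posted pattern allows 1 to 8 labels before the top-level domain.
  if (labels.length < 1 || labels.length > 8) return false;
  if (!TLD.test(tld)) return false;
  // This length test replaces the (?=[A-Z0-9-]{1,63}\.) lookahead.
  return labels.every((label) => label.length <= 63 && LABEL.test(label));
}

console.log(isValidDomain("tt.com"));   // true
console.log(isValidDomain("a--b.com")); // false
```

The same split between "shape of a label" and "length of a label" should carry over to any engine without lookahead, but the exact Oracle functions to use are left open here.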

Trying to make URL-Matching RegEx faster for an IRC bot

Hello fellow programmers, long time lurker here xD
So I was writing this IRC bot on Node.js, and one of its main functions is to automatically time out users that post links without having permission.
After much testing and researching I came up with this regex that matches almost any URL, considering that users will often try to circumvent the bot to post links without permission.
/((?!\w+\.+\s\w+\b)\w+\W*(\.|dot|d0t)\W*(aero|asia|biz|cat|com|coop|info|int|jobs|mobi|museum|name|net|org|post|pro|tel|travel|xxx|edu|gov|mil|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b)/i
It takes into consideration users adding spaces around dots, replacing dots with "dot", or adding special characters around the dots, while ignoring matches when users type something like "word. It was good" (since "it" is a valid URL extension).
This regex takes care of almost every case of users trying to circumvent the URL protection while producing almost no false positives. However, my concern is that it might be a bit slow.
Does anyone know of a faster regex that does the same job, or how to improve this one so that it runs faster?
Regex explanation:
Full regex:
/((?!\w+\.+\s\w+\b)\w+\W*(\.|dot|d0t)\W*(aero|asia|biz|cat|com|coop|info|int|jobs|mobi|museum|name|net|org|post|pro|tel|travel|xxx|edu|gov|mil|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b)/i
Groups:
(?!\w+\.+\s\w+\b) - Negative lookahead - Checks if the user typed a word (\w+) followed by one or more dots (\.+) and a space (\s), then checks whether the next characters are a word (\w+). If this group matches, the user is most likely ending a sentence with a full stop or ellipsis, followed by another sentence, so the regex shouldn't match even if the second sentence starts with a possible URL extension such as "is" or "so".
\w+ - A word - this is the first part of the url considering a url such as google.com (this ignores the url protocol, if present, and the first part of the url, usually www, since our goal is just to detect urls, and not actually extract them for some other purpose).
\W*(\.|dot|d0t)\W* - Any number of non-alphanumerical characters followed by a dot (or ways to circumvent dot) followed by any number of non-alphanumerical characters - This prevents users from circumventing the filter by typing urls such as google(dot)com as well as spacing between the url words and the dots such as google . com.
(aero|asia|biz|cat|com|coop|info|int|jobs|mobi|museum|name|net|org|post|pro|tel|travel|xxx|edu|gov|mil|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw) - Matches any possible url domain extension - Not much to say here, this prevents false positives from users that do weird punctuations such as "phrase . Next phrase"
\b - A boundary match (boundary character or end-of-string)
Edit: I've made the obvious improvement from (ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au...) to (a[cdefgilmnoqrstuwxz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnorsuvxyz]|d[dejkmoz]|e[ceghrstu]|f[ijkmor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdeghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eosuw]|s[abcdeghijklmnorstuvxyz]|t[cdfghjklmnoprtvwz]|u[agksyz]|v[aceginu]|w[fs]|y[etu]|z[amw]). Does anyone know of any more improvements, or a better way to do this?
Thanks in advance,
Gabriel.
Possibly faster:
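var domainExtTable = { aero: true, asia: true, biz: true /* , ... */ }; // init just once
// \w{2,6} rather than \w{2,4}, so that "museum" and "travel" can be captured
var results = text.match(/((?!\w+\.+\s\w+\b)\w+\W*(\.|dot|d0t)\W*(\w{2,6})\b)/i);
// match() returns null when nothing matches; the extension is capture group 3
var domainExt = results && results[3].toLowerCase();
if (domainExt && domainExt in domainExtTable) { /* ... */ } // this is a match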
Whether this is actually faster is hard to say; it depends on how good the regexp compiler is.
Removing the lookahead is likely to speed this up much more. Just to be sure, you want to NOT match "google. com"?
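A slightly fuller sketch of the table-lookup idea, in case it helps with benchmarking. The table here is a deliberately short, hypothetical subset, and `findLinkExtensions` is a made-up name. One thing to watch with the generic `\w{2,6}` capture: a rejected candidate (extension not in the table) can hide a real link that overlaps it, so the loop below restarts one character after a rejected match.

```javascript
// Hypothetical, deliberately short table; the real one would hold every extension.
const domainExtTable = new Set(["aero", "com", "net", "org", "museum", "travel", "us"]);

// Same shape as the pattern above, with the extension captured generically.
const linkPattern = /(?!\w+\.+\s\w+\b)\w+\W*(?:\.|dot|d0t)\W*(\w{2,6})\b/gi;

function findLinkExtensions(text) {
  const found = [];
  linkPattern.lastIndex = 0; // the regex is shared, so reset its position
  let match;
  while ((match = linkPattern.exec(text)) !== null) {
    const ext = match[1].toLowerCase();
    if (domainExtTable.has(ext)) {
      found.push(ext);
    } else {
      // Rejected candidate: resume one character later so overlapping
      // candidates such as the "bc.com" in "a.bc.com" are still tried.
      linkPattern.lastIndex = match.index + 1;
    }
  }
  return found;
}

console.log(findLinkExtensions("check google.com and foo . net")); // [ 'com', 'net' ]
console.log(findLinkExtensions("word. It was good"));              // []
```

For a bot that only needs a yes/no answer, returning as soon as the first table hit is found would avoid scanning the rest of the message.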

Requiring Letters and Numbers in form field with JavaScript

I have a form and I need to require letters and numbers. All the solutions I have seen, simply allow only letters and numbers but do not require both.
I have this regex: /^[0-9a-zA-Z]+$/ which allows one or the other. How can I make both a requirement, meaning the text must contain at least one letter and at least one number?
Thanks my friends.
Guy
To break this down, we're requiring at least 2 characters, a letter and a number. In the code we start with the possibility of an alpha-numeric character. I'm not using \w because it also allows _ characters. In the group we have an or that looks for either a letter before a number, or a number before a letter. Then after the group we're requiring if anything exists that it also be alpha-numeric.
/^[A-Za-z0-9]*([A-Za-z][0-9]|[0-9][A-Za-z])[A-Za-z0-9]*$/i
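A quick way to try the pattern above against a few inputs (the /i flag is dropped here because the character classes already cover both cases; `hasLettersAndDigits` is just an illustrative name):

```javascript
// Any all-alphanumeric string with both a letter and a digit must contain
// a letter next to a digit somewhere, which is what the middle group finds.
const lettersAndDigits = /^[A-Za-z0-9]*([A-Za-z][0-9]|[0-9][A-Za-z])[A-Za-z0-9]*$/;

function hasLettersAndDigits(value) {
  return lettersAndDigits.test(value);
}

console.log(hasLettersAndDigits("abc123")); // true
console.log(hasLettersAndDigits("abc"));    // false
console.log(hasLettersAndDigits("123"));    // false
console.log(hasLettersAndDigits("a_1"));    // false (underscore is not allowed)
```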
As a recommendation, it's always best to use a server-side language as your front-line defense when validating a form instead of a Javascript-only approach. The reasons:
Someone can disable Javascript
The server needs to be protected from malicious attacks (SQL injection or XSS)
Someone can bypass your form altogether by directly linking to the handler (if you're not requiring a valid referrer)
Some browsers like Lynx do not use Javascript, so it's not user friendly for people who need to use screen reading devices

Email Regular Expression - Excluded Specified Set

I have been researching a regular expression for the better part of six hours today. For the life of me, I cannot figure it out. I have tried what feels like a hundred different approaches to no avail. Any help is greatly appreciated!
The basic rules:
1 - Exclude these characters in the address portion (before the # symbol): "()<>#,;:\[]*&^%$#!{}/"
2 - The address can contain a ".", but not two in a row.
I have an elegant solution to the rule number one, however, rule number two is killing me! Here is what I have so far. (I'm only including the portion up to the # sign to keep it simple). Also, it is important to note that this regular expression is being used in JavaScript, so no conditional IF is allowed.
/^[^()<>#,;:\\[\]*&^%$#!{}//]+$/
First of all, I would suggest you always choose which characters you want to allow instead of the opposite. You never know what dangerous characters you might miss.
Secondly, this is the regular expression I always use for validating emails, and it has worked well for me. Hope it helps you out.
/^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,6}$/i
Rule number 2
/^(?:\.?[^.])+\.?$/
which means one or more sequences of (an optional dot followed by a mandatory non-dot), with an optional dot at the end.
Consider four two-character sequences:
xx matches as two non dot characters.
.x matches as an optional dot followed by a non-dot.
x. matches as a non-dot followed by an optional dot at the end.
.. does not match because there is no non-dot after the first dot.
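The four cases above can be checked directly; this is just the same pattern run over those samples, plus one longer string:

```javascript
// One or more (optional dot + non-dot) groups, then an optional trailing dot.
const noDoubleDot = /^(?:\.?[^.])+\.?$/;

for (const sample of ["xx", ".x", "x.", "..", "a..b"]) {
  console.log(JSON.stringify(sample), noDoubleDot.test(sample));
}
// "xx" true, ".x" true, "x." true, ".." false, "a..b" false
```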
One thing to remember about email addresses is that dots can appear in tricky places
"..#"#.example.com
is a valid email address.
The "..#" is a perfectly valid quoted local-part production, and .example.com is just a way of saying example.com but resolved against the root DNS instead of using a host search path. example.com might resolve to example.com.myintranet.com if myintranet.com is on the host search path but .example.com always resolves to the absolute host example.com.
First of all, to your specifications:
^(?![\s\S]*\.\.)[^()<>#,;:\\[\]*&^%$#!{}/]+#.*$
It's just your regex with (?![\s\S]*\.\.) tacked onto the front and #.* added to cover the rest of the address. The first part is a negative lookahead, which makes the match fail if there are two consecutive periods anywhere in the string.
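For just the portion before the # sign, both rules can be combined like this. It is a sketch that assumes the first # in the excluded list stands for @ (the second being a literal #); `isValidLocalPart` is a made-up name.

```javascript
// Rule 2: the lookahead rejects two dots in a row anywhere in the string.
// Rule 1: the negated class rejects every excluded character.
// Assumption: the excluded set is ()<>@,;:\[]*&^%$#!{}/
const localPart = /^(?!.*\.\.)[^()<>@,;:\\[\]*&^%$#!{}\/]+$/;

function isValidLocalPart(value) {
  return localPart.test(value);
}

console.log(isValidLocalPart("john.doe"));  // true
console.log(isValidLocalPart("john..doe")); // false
console.log(isValidLocalPart("john(doe"));  // false
```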
Properly matching email addresses is quite a bit harder, however.

Regex Comma Separated Emails

I am trying to get this Regex statement to work
^([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})+(\s?[,]\s?|$))+$
for a string of comma separated emails in a textbox using jQuery('#textbox').val(); which passes the values into the Regex statement to find errors for a string like:
"test#test.com, test1#test.com,test2#test.com"
But for some reason it is returning an error. I tried running it through http://regexpal.com/ but I'm still unsure what is wrong.
NB: This is just a basic client-side test. I validate emails via the MailClass on the server-side using .NET4.0 - so don't jump down my throat re-this. The aim here is to eliminate simple errors.
Escaped Version:
^([_a-z0-9-]+(\\.[_a-z0-9-]+)*#[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,3})+(\\s?[,]\\s?|$))+$
You can greatly simplify things by first splitting on commas, as Pablo said, then repeatedly applying the regex to validate each individual email. You can also then point out the one that's bad -- but there's a big caveat to that.
Take a look at the regex in the article Comparing E-mail Address Validating Regular Expressions. There's another even better regex that I couldn't find just now, but the point is a correct regex for checking email is incredibly complicated, because the rules for a valid email address as specified in the RFC are incredibly complicated.
In yours, this part (\.[a-z]{2,3})+ jumped out at me. I often see the two-or-three-letters quantifier {2,3} used as an attempt to validate the top-level domain, but (1) your regex allows one or more of these groups and (2) you will exclude valid email addresses from domains such as .info or .museum. (Many sites reject my .us address because they assume only 3-letter domains are legal.)
My advice to reject seriously invalid addresses, while leaving the final validation to the server, is to allow basically (anything)#(anything).(anything) -- check only for an "at" and a "dot", and of course allow multiple dots.
EDIT: Example for "simple" regex
[^#]+#[^.]+(\.[^.]+)+
This matches
test#test.com
test1#test.com
test2#test.com
foo#bar.baz.co.uk
myname#modern.museum
And doesn't match foo#this....that
Note: Even this will reject some valid email addresses, because anything is allowed on the left of the # - even another # - if it's all escaped properly. But I've never seen that in 25 years of using email in Real Life.
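Putting the split-then-validate advice together, roughly. This assumes the # in the examples is really @, anchors the simple pattern so the whole entry has to match, and uses a made-up `findInvalidEmails` helper that returns the entries that fail:

```javascript
// The "simple" pattern from above, anchored to the whole entry.
const simpleEmail = /^[^@]+@[^.]+(\.[^.]+)+$/;

function findInvalidEmails(list) {
  return list
    .split(",")
    .map((entry) => entry.trim())          // tolerate spaces around commas
    .filter((entry) => !simpleEmail.test(entry));
}

console.log(findInvalidEmails("test@test.com, test1@test.com,test2@test.com"));
// []
console.log(findInvalidEmails("test@test.com, foo@this....that"));
// [ 'foo@this....that' ]
```

Returning the failing entries rather than a single true/false makes it possible to point out which address is bad. Note that a trailing comma produces an empty entry, which is reported as invalid.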
