Regular expression to avoid control characters - javascript

I am working in Flex and using a RegExp to check the value entered in the UI. I want to ensure that the entered value does not contain any control characters, and show a warning if it does. Since we support many languages, I can't write a regex that lists every allowed character, so I need a regular expression that blacklists the control characters instead. I tried ^[^\x00-\x1F\x7F\u2028\u2029]*$, but it still matches successfully as long as the value contains regular characters besides the control characters. I want it to return no match if even a single control character is present. What should I change in this regular expression?
Any help would be appreciated.

You can use the following trick: put your negated set in a lookahead followed by a ., and repeat that group so it covers the whole string:
^((?=[^\x00-\x1F\x7F\u2028\u2029]).)*$
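A quick way to try the pattern is to run it in a JavaScript engine. In the sketch below, the helper name hasNoControlChars and the sample strings are hypothetical. Without the m flag, both the lookahead pattern and the question's original negated class reject any string that contains one of the listed control characters. With m, $ also matches before a line break, so a value like "abc\ndef" passes either pattern. That flag, or a pattern used without its anchors, is one plausible cause of the behavior described in the question, although the question's Flex environment may behave differently.

```javascript
// Lookahead form from the answer: every character must pass the negated-class check.
const lookaheadPattern = /^((?=[^\x00-\x1F\x7F\u2028\u2029]).)*$/;
// Original negated-class form from the question, kept for comparison.
const negatedClassPattern = /^[^\x00-\x1F\x7F\u2028\u2029]*$/;

// Hypothetical helper: true when the value contains none of the listed control characters.
function hasNoControlChars(value) {
  return lookaheadPattern.test(value);
}

const samples = ["hello", "héllo wörld", "日本語", "tab\there", "bell\x07", "line\u2028sep"];
for (const s of samples) {
  // JSON.stringify makes the control characters visible in the log.
  console.log(JSON.stringify(s), hasNoControlChars(s), negatedClassPattern.test(s));
}

// With the m flag, $ also matches before "\n", so this prints true.
console.log(/^[^\x00-\x1F\x7F\u2028\u2029]*$/m.test("abc\ndef"));
```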


Regex - Validate that the local part of the email is not ending with a dot while only allowing certain characters without using a lookbehind

I was using a lookbehind to check for a dot before the @, but I just realized that not all browsers support lookbehinds. It works perfectly in Chrome but fails in Firefox and IE.
This is what I came up with, but it is certainly messy:
^([a-zA-Z0-9&^*%#~{}=+?`_-]\.?)*[a-zA-Z0-9&^*%#~{}=+?`_-]@([a-zA-Z0-9]+\.)+[a-zA-Z]$
Is there a simpler and/or more elegant way to do this? I don't think I can negate the dot (^.) because I'm only allowing certain characters to be present in the local part.
This ([a-zA-Z0-9&^*%#~{}=+?`_-]\.?)*[a-zA-Z0-9&^*%#~{}=+?`_-] part is not messy, but it is inefficient, because the * quantifies a group containing an obligatory part, [...], and an optional \.?. Instead of (ab?)*a, you may use a+(?:ba+)*, which makes matching linear and swift; in your case, [a-zA-Z0-9&^*%#~{}=+?`_-]+(?:\.[a-zA-Z0-9&^*%#~{}=+?`_-]+)*.
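To check that the rewrite keeps the same meaning, the sketch below compares the question's local-part form, (ab?)*a, with the answer's a+(?:ba+)* on a few sample local parts. The sample strings and variable names are made up for illustration; the two patterns are expected to agree on every sample, accepting runs of allowed characters separated by single dots and rejecting leading, trailing or doubled dots.

```javascript
// Characters allowed in the local part, copied from the question (a character-class source string).
const C = "[a-zA-Z0-9&^*%#~{}=+?`_-]";

// Question's form, (ab?)*a: each char optionally followed by a dot, then one final char.
const oldLocal = new RegExp(`^(${C}\\.?)*${C}$`);
// Answer's form, a+(?:ba+)*: runs of chars separated by single dots.
const newLocal = new RegExp(`^${C}+(?:\\.${C}+)*$`);

const samples = ["john", "john.doe", "j.o.h.n", "john.", ".john", "john..doe", "a", "", "x{y}=z"];
for (const s of samples) {
  console.log(JSON.stringify(s), oldLocal.test(s), newLocal.test(s));
}
```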
Moreover, [a-zA-Z0-9_] equals \w in JS regex, so you may use \w to shorten the pattern.
Besides, the final [a-zA-Z]$ only matches a single letter; you most probably need [a-zA-Z]{2,}$ there, as TLDs consist of two or more letters.
So, you may use
^[\w&^*%#~{}=+?`-]+(?:\.[\w&^*%#~{}=+?`-]+)*@(?:[a-zA-Z0-9]+\.)+[a-zA-Z]{2,}$
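A minimal sketch of the final pattern in JavaScript, written with @ between the local part and the domain. The helper name isValidEmail and the sample addresses are hypothetical. The pattern uses no lookbehind, so it does not depend on lookbehind support in the browser.

```javascript
// Local part: runs of allowed characters separated by single dots,
// then @, then one or more dot-terminated labels, then a TLD of at least two letters.
const emailPattern = /^[\w&^*%#~{}=+?`-]+(?:\.[\w&^*%#~{}=+?`-]+)*@(?:[a-zA-Z0-9]+\.)+[a-zA-Z]{2,}$/;

// Hypothetical helper name, used only for this illustration.
function isValidEmail(address) {
  return emailPattern.test(address);
}

console.log(isValidEmail("john.doe@example.com"));  // true
console.log(isValidEmail("john.@example.com"));     // false: local part ends with a dot
console.log(isValidEmail(".john@example.com"));     // false: local part starts with a dot
console.log(isValidEmail("john..doe@example.com")); // false: consecutive dots
console.log(isValidEmail("john@example.c"));        // false: one-letter TLD
```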

JavaScript Regex - custom characters and numbers

I am building a RegEx that is almost complete, but I cannot get it to reject digits (0-9).
For example, Jones-Parry is valid but Jones-Parry1 is not. The regex at present looks like this:
^([\\w\\s,'\\-ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸçÇŒœßØøÅåÆæÞþÐð]){0,80}$
I have tried using \d and [0-9] but to no avail. All else is working with the regex aside from the numbers. It validates special characters etc.
Any pointers greatly appreciated!
The problem is that \w expands to A-Za-z0-9_, which includes the digits 0-9. This explains why strings containing digits pass your test.
You may want to specify A-Za-z_ directly instead of \w in your regex. That will fix the problem.
As georg pointed out in a comment, your regex is very weak: aside from the length requirement, it only checks that the string contains no character outside your allowed character set. A string of only spaces, or of only punctuation, would pass the test.
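A minimal sketch of the fix, with the question's pattern written as a JavaScript regex literal (so the doubled backslashes from the string form become single ones), \w replaced by A-Za-z_, and the redundant group around the character class dropped. The sample names are made up; the last check shows the weakness described above, since a string of spaces still passes.

```javascript
// Same character set as the question, but A-Za-z_ instead of \w, so digits are excluded.
const namePattern = /^[A-Za-z_\s,'\-ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸçÇŒœßØøÅåÆæÞþÐð]{0,80}$/;

console.log(namePattern.test("Jones-Parry"));  // true
console.log(namePattern.test("Jones-Parry1")); // false: digits are no longer allowed
console.log(namePattern.test("Núñez"));        // true: accented letters are in the set
console.log(namePattern.test("   "));          // true: only spaces still passes
```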
Anyway, I doubt that validating names is a good idea in general. Many assumptions programmers make about names are wrong. Depending on your requirements, you can give users a field for a display name, where they can type anything, and another field for a username, where you only allow a strict set of characters.

Regex to match multiple patterns in any order

I'm validating a password for complexity in an ASP.NET MVC3 app. My current requirements are that it must contain at least one upper case letter, one lower case letter, one digit and no more than three repeated characters. I'd like to generalise those numbers though, and also add a condition for non-alphanumeric characters.
At present, I'm validating server-side only, so I'm able to call Regex.IsMatch multiple times, using one regex for each condition. I want to be able to validate client-side too, though. Because unobtrusive jQuery validation only allows one regex, I need to combine all five conditions into a single pattern.
I don't know much when it comes to regular expressions but I've been doing a bit of reading recently. I may be missing something simple but I can't find a way to AND multiple patterns together the way a | will OR them.
You can do this (in .NET) with several lookahead assertions in a single regex:
^(?=.*\p{Lu})(?=.*\p{Ll})(?=.*\d)(?=.*\W)(?!.*(.).*\1.*\1)
will match if all conditions are true.
^ # Match the start of the string
(?=.*\p{Lu}) # True if there is at least one uppercase letter ahead
(?=.*\p{Ll}) # True if there is at least one lowercase letter ahead
(?=.*\d) # True if there is at least one digit ahead
(?=.*\W) # True if there is at least one non-alnum character ahead
(?!.*(.).*\1.*\1) # True if no character occurs three or more times ahead
Note that the match is not going to consume any characters of the string - if you want the match operation to return the string you're matching against, add .* at the end of the regex.
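In JavaScript engines that predate ES2018, or whenever the pattern is compiled without the u flag (for example by a library that builds it with new RegExp(pattern)), you can't use Unicode property escapes such as \p{Lu}. So instead you could use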
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[\W_])(?!.*(.).*\1.*\1)
which will of course only use ASCII letters for validation. If that's OK for you, fine. You could augment the character classes, e.g. [A-ZÄÖÜÀÈÌÒÙÁÉÍÓÚ] and so on, but you would probably never cover every letter this way. On the server side, if you want the validation to yield the same result, you'd have to specify RegexOptions.ECMAScript so the .NET regex engine behaves like the JavaScript engine (thanks Alan Moore for noticing!).
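A sketch of the JavaScript version in use; the sample passwords are made up. Following the note above about the match not consuming characters, .* is appended so a successful match covers the whole password. Engines that implement ES2018 property escapes do accept \p{Lu} and \p{Ll} when the regex has the u flag, so a Unicode-aware variant is shown as well. That variant swaps [\W_] for [^\p{L}\p{N}] (this substitution is not from the answer), because \W without the i flag still treats letters such as Ä as non-word characters.

```javascript
// ASCII-only version from the answer, with .* appended so the match spans the whole string.
const asciiPassword = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[\W_])(?!.*(.).*\1.*\1).*$/;

// Unicode-aware variant (needs ES2018+ and the u flag).
// [^\p{L}\p{N}] stands in for "non-alphanumeric"; this choice is an assumption.
const unicodePassword = /^(?=.*\p{Lu})(?=.*\p{Ll})(?=.*\d)(?=.*[^\p{L}\p{N}])(?!.*(.).*\1.*\1).*$/u;

for (const pw of ["Passw0rd!", "password1!", "Paaass1!", "Password1", "Äpfel9ß!"]) {
  // password1!: no uppercase; Paaass1!: "a" occurs three times; Password1: no symbol.
  console.log(pw, asciiPassword.test(pw), unicodePassword.test(pw));
}
```

The last sample shows the trade-off: the ASCII pattern rejects Äpfel9ß! because it finds no A-Z letter, while the Unicode variant accepts it.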

Regular expression Modification

I have an ASP.NET application with a text box for a URL, and I am using a regular expression to validate it. My regular expression is: ^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
Now there is an enhancement: the text box always contains the default text http://, and the validation should ignore that default text. How is this possible? Please help me resolve this issue.
Your expression matches the http:// part, so it "keeps" that part of the match. If the text box you're validating doesn't contain that part at all, simply drop (ht|f)tp(s?)\:\/\/ from your regex.
If it is part of the text box, but you want to ignore it after having matched it, then you can put capturing parentheses around your intended match. Your original regex would then look like this:
^(ht|f)tp(s?)\:\/\/([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?)$
Now the part without http://, ftp:// etc. will be in capturing group number 3.
That said, your regex as it stands is pretty bad and also incorrect: lots of unnecessary escapes, unnecessary parentheses, wrongly constructed character classes ((:(0-9)*)* matches the literal text 0-9 rather than digits, so for a real port it only matches the colon, and the port digits get through only because the trailing character class happens to accept digits), and I'm pretty sure that you don't want & in there...
It is not easy to validate URLs with regexes. What are your intentions? What should be valid, what shouldn't be?
You can try this:
Use ((ht|f)tp(s?)\:\/\/)? at the start of your regular expression; this makes http:// or ftp:// optional.
Your complete regex would be:
^((ht|f)tp(s?)\:\/\/)?[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
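A sketch comparing the two suggestions in JavaScript, using the question's pattern unchanged apart from the added group or the optional scheme. The sample URLs are made up. Two results may be surprising: the optional-scheme version also accepts the bare default text http:// (it reads http as a host name, followed by the colon and slashes), and a URL with a port passes even though (0-9) is a literal.

```javascript
// Question's pattern with an extra group around everything after the scheme (group 3).
const captured = /^(ht|f)tp(s?)\:\/\/([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?)$/;
// Question's pattern with the scheme made optional.
const optionalScheme = /^((ht|f)tp(s?)\:\/\/)?[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$/;

const m = captured.exec("http://example.com/page");
console.log(m && m[3]); // example.com/page

console.log(optionalScheme.test("example.com/page"));        // true
console.log(optionalScheme.test("ftp://files.example.com")); // true
console.log(optionalScheme.test("http://"));                 // true: "http" is read as a host name
console.log(captured.test("http://example.com:8080"));       // true: digits matched by the final class
```

Whether the bare http:// should count as valid depends on the form; if not, it can be rejected with a separate check before running the pattern.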

How to detect what allowed character in current Regular Expression by using JavaScript?

In my web application, I created a framework that binds model data to controls on the page. Each model property has rules such as string length, not-null and a regular expression. Before the page is submitted, the framework validates every bound control against its rules.
So I want to detect which characters each regular-expression rule allows, as in the following examples.
"^[0-9]+$" allows only digit characters such as 1, 2, 3.
"^[a-zA-Z_][a-zA-Z_\-0-9]+$" allows only letters (a-z, A-Z), digits, - and _ characters.
However, this function should not care about grouping or the position of allowed characters. It only needs to tell which characters are possible.
Do you have any idea how to write this function?
P.S. I know it is easy to write a specific function, such as a numeric-only check that allows only digits. But I need to share the same piece of code between the data tier (which contains all model validators) and the UI tier without modifying anything.
Thanks
You can't solve this for the general case. Regexps don't generally ‘fail’ at a particular character, they just get to a point where they can't match any more, and have to backtrack to try another method of matching.
One could make a regex implementation that remembered the farthest position it reached before backtracking, but most implementations don't do that, including JavaScript's.
A possible way forward would be to match first against ^pattern$ and, if that fails, against ^pattern without the end anchor. This is more likely to give you a match of the left-hand part of the string, so you could count how many characters were in the match and report the following character as ‘invalid’. For more complicated regexps this would be misleading, but it would certainly work for simple cases like [a-zA-Z0-9_]+.
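A minimal sketch of that two-step approach, assuming each rule is stored as a pattern body without its ^ and $ anchors (the helper name firstInvalidIndex is hypothetical). It returns -1 when the whole string matches, otherwise the index just past the prefix match the engine finds, or 0 when no prefix matches at all. As noted above, the index can be misleading for more complicated patterns.

```javascript
// Hypothetical helper: report where the input stops matching a pattern body (no ^ or $).
function firstInvalidIndex(body, input) {
  // Wrap in (?:...) so alternations in the body stay inside the anchors.
  const full = new RegExp(`^(?:${body})$`);
  if (full.test(input)) return -1; // the whole string is valid

  const prefix = new RegExp(`^(?:${body})`);
  const m = prefix.exec(input);
  // No prefix match: blame the first character; otherwise the character after the match.
  return m ? m[0].length : 0;
}

console.log(firstInvalidIndex("[a-zA-Z0-9_]+", "user_name1")); // -1
console.log(firstInvalidIndex("[a-zA-Z0-9_]+", "user name"));  // 4 (the space)
console.log(firstInvalidIndex("[0-9]+", "12a3"));              // 2 (the "a")
console.log(firstInvalidIndex("[0-9]+", "abc"));               // 0
```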
I must admit that I'm struggling to parse your question.
If you are looking for a regular expression that will match only if a string consists entirely of a certain collection of characters, regardless of their order, then your examples of character classes were quite close already.
For instance, ^[A-Za-z0-9]+$ will only allow strings that consist of letters A through Z (upper and lower case) and numbers, in any order, and of any length.
