Validating any string with RegEx - javascript

I want to validate any string that may contain the characters çÇöÖİşŞüÜğĞ and is at least 5 characters long. The string to validate can contain spaces. The RegEx must accept a string like "asd Çğ ğT i", for example.
Any reply will be helpful.
Thanks.

You can use escape sequences of the form
\uXXXX
where each "X" can be any hex digit. Thus:
\u0020
is the same as a plain space character, and
\u0041
is upper-case "A". Thus you can encode the Unicode values for the characters you're interested in and then include them in a regex character class. To make sure the string is at least five characters long, you can use a quantifier in the regex.
You'll end up with something like:
var regex = /^[A-Za-z\u00nn\u00nn\u00nn]{5,}$/;
where those "00nn" things would be the appropriate values. You should be able to find the exact values in a Unicode code chart reference. For example I think that "Ö" is \u00D6. (Some of your characters are in the Unicode Latin-1 Supplement, while others are in Latin Extended-A, where the escapes take the form \u01nn rather than \u00nn.)
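As a minimal sketch of the approach above, here is the character class filled in for the characters listed in the question, with a space added because the question says spaces are allowed. The names `turkishText` and `isValid` are illustrative and not part of the original answer.

```javascript
// Latin-1 Supplement: ç \u00E7, Ç \u00C7, ö \u00F6, Ö \u00D6, ü \u00FC, Ü \u00DC
// Latin Extended-A:   İ \u0130, ş \u015F, Ş \u015E, ğ \u011F, Ğ \u011E
var turkishText = /^[A-Za-z \u00E7\u00C7\u00F6\u00D6\u00FC\u00DC\u0130\u015F\u015E\u011F\u011E]{5,}$/;

// Returns true when the whole string is at least five characters long
// and consists only of the allowed letters and spaces.
function isValid(s) {
  return turkishText.test(s);
}

console.log(isValid("asd \u00C7\u011F \u011FT i")); // the example string from the question
console.log(isValid("asd"));                        // too short
```

The dotless "ı" (\u0131) is not in the question's list, so it is left out here; add it to the class if your input needs it.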

Related

Javascript regex character set restriction related query

I am using a JavaScript RegEx which is mentioned below:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*([-_.])).+$
This accepts only text that has at least one of each: an uppercase letter, a lowercase letter, a digit, and a special symbol from .-_.
Now assume I supply User-123 as the user id, which conforms to the above RegEx, and I use the onscreen keyboard to type in a character from the Finnish language, which results in User-123Ã.
The RegEx being fulfilled, the text is accepted by my JavaScript code, but I want it to only accept Alphanumeric input in English, and nothing else.
How should I enhance this RegEx to do so?
The string "User-123Ã" contains the Unicode character "Ã", which is not an English letter, so your JS code has to identify it:
[Code] [Glyph] [Decimal] [HTML] Description [#]
U+00C3 Ã &#195; &Atilde; Latin capital letter A with tilde 0131
Try this link also,
How to find whether a particular string has unicode characters
I am not sure this will solve the issue, but in most cases when you want to restrict the input itself to some characters, your consuming pattern should only match those characters you allow. The lookahead restrictions just require or forbid certain characters to appear certain number of times at certain positions, but what you match in the consuming part is crucial.
.+$ allows any character, not just letters. Replace it with [\w.-]+$ (\w = [a-zA-Z0-9_]) to restrict the match to the characters you require in the lookaheads.
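Putting the lookaheads from the question together with the restricted consuming part gives something like the following sketch. The names `userId` and `isValidUserId` are made up for the illustration.

```javascript
// The lookaheads require at least one lowercase letter, one uppercase letter,
// one digit and one of - _ . while the consuming part [\w.-]+ allows
// only ASCII letters, digits, underscore, dot and hyphen.
var userId = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[-_.])[\w.-]+$/;

function isValidUserId(s) {
  return userId.test(s);
}

console.log(isValidUserId("User-123"));       // true
console.log(isValidUserId("User-123\u00C3")); // false: Ã is outside [\w.-]
```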

Regex: any character that is NOT a letter (but not only English letters)

I want to delete from string all characters that are not letters.
I know that there is something like \W in regex, but it considers non-English characters as not letters. For example my script deletes all Polish letters (like "ą", "ć", "ó"), but I need them.
How to tell regex to do this?
Code:
var text = text.replace(/\W/g, ' ');
You can either use Steve Levithan's XRegExp library (with Unicode plugins), or you have to define the Unicode character range manually, since JavaScript before ES2018 doesn't support Unicode properties.
[^\u0041-\u005A\u0061-\u007A\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0345\u0370-\u0374\u0376\u0377\u037A-\u037D\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u0527\u0531-\u0556\u0559\u0561-\u0587\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7\u05D0-\u05EA\u05F0-\u05F2\u0610-\u061A\u0620-\u0657\u0659-\u065F\u066E-\u06D3\u06D5-\u06DC\u06E1-\u06E8\u06ED-\u06EF\u06FA-\u06FC\u06FF\u0710-\u073F\u074D-\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0817\u081A-\u082C\u0840-\u0858\u08A0\u08A2-\u08AC\u08E4-\u08E9\u08F0-\u08FE\u0900-\u093B\u093D-\u094C\u094E-\u0950\u0955-\u0963\u0971-\u0977\u0979-\u097F\u0981-\u0983\u0985-\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD-\u09C4\u09C7\u09C8\u09CB\u09CC\u09CE\u09D7\u09DC\u09DD\u09DF-\u09E3\u09F0\u09F1\u0A01-\u0A03\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A3E-\u0A42\u0A47\u0A48\u0A4B\u0A4C\u0A51\u0A59-\u0A5C\u0A5E\u0A70-\u0A75\u0A81-\u0A83\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD-\u0AC5\u0AC7-\u0AC9\u0ACB\u0ACC\u0AD0\u0AE0-\u0AE3\u0B01-\u0B03\u0B05-\u0B0C\u0B0F\u0B10\u0B13-\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D-\u0B44\u0B47\u0B48\u0B4B\u0B4C\u0B56\u0B57\u0B5C\u0B5D\u0B5F-\u0B63\u0B71\u0B82\u0B83\u0B85-\u0B8A\u0B8E-\u0B90\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BBE-\u0BC2\u0BC6-\u0BC8\u0BCA-\u0BCC\u0BD0\u0BD7\u0C01-\u0C03\u0C05-\u0C0C\u0C0E-\u0C10\u0C12-\u0C28\u0C2A-\u0C33\u0C35-\u0C39\u0C3D-\u0C44\u0C46-\u0C48\u0C4A-\u0C4C\u0C55\u0C56\u0C58\u0C59\u0C60-\u0C63\u0C82\u0C83\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9\u0CBD-\u0CC4\u0CC6-\u0CC8\u0CCA-\u0CCC\u0CD5\u0CD6\u0CDE\u0CE0-\u0CE3\u0CF1\u0CF2\u0D02\u0D03\u0D05-\u0D0C\u0D0E-\u0D10\u0D12-\u0D3A\u0D3D-\u0D44\u0D46-\u0D48\u0D4A-\u0D4C\u0D4E\u0D57\u0D60-\u0D63\u0D7A-\u0D7F\u0D82\u0D83\u0D85-\u0D96\u0D9A-\u0DB1\u0DB3-\u0D
BB\u0DBD\u0DC0-\u0DC6\u0DCF-\u0DD4\u0DD6\u0DD8-\u0DDF\u0DF2\u0DF3\u0E01-\u0E3A\u0E40-\u0E46\u0E4D\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94-\u0E97\u0E99-\u0E9F\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD-\u0EB9\u0EBB-\u0EBD\u0EC0-\u0EC4\u0EC6\u0ECD\u0EDC-\u0EDF\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F71-\u0F81\u0F88-\u0F97\u0F99-\u0FBC\u1000-\u1036\u1038\u103B-\u103F\u1050-\u1062\u1065-\u1068\u106E-\u1086\u108E\u109C\u109D\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D\u1290-\u12B0\u12B2-\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\u1310\u1312-\u1315\u1318-\u135A\u135F\u1380-\u138F\u13A0-\u13F4\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u16EE-\u16F0\u1700-\u170C\u170E-\u1713\u1720-\u1733\u1740-\u1753\u1760-\u176C\u176E-\u1770\u1772\u1773\u1780-\u17B3\u17B6-\u17C8\u17D7\u17DC\u1820-\u1877\u1880-\u18AA\u18B0-\u18F5\u1900-\u191C\u1920-\u192B\u1930-\u1938\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19B0-\u19C9\u1A00-\u1A1B\u1A20-\u1A5E\u1A61-\u1A74\u1AA7\u1B00-\u1B33\u1B35-\u1B43\u1B45-\u1B4B\u1B80-\u1BA9\u1BAC-\u1BAF\u1BBA-\u1BE5\u1BE7-\u1BF1\u1C00-\u1C35\u1C4D-\u1C4F\u1C5A-\u1C7D\u1CE9-\u1CEC\u1CEE-\u1CF3\u1CF5\u1CF6\u1D00-\u1DBF\u1E00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBE\u1FC2-\u1FC4\u1FC6-\u1FCC\u1FD0-\u1FD3\u1FD6-\u1FDB\u1FE0-\u1FEC\u1FF2-\u1FF4\u1FF6-\u1FFC\u2071\u207F\u2090-\u209C\u2102\u2107\u210A-\u2113\u2115\u2119-\u211D\u2124\u2126\u2128\u212A-\u212D\u212F-\u2139\u213C-\u213F\u2145-\u2149\u214E\u2160-\u2188\u24B6-\u24E9\u2C00-\u2C2E\u2C30-\u2C5E\u2C60-\u2CE4\u2CEB-\u2CEE\u2CF2\u2CF3\u2D00-\u2D25\u2D27\u2D2D\u2D30-\u2D67\u2D6F\u2D80-\u2D96\u2DA0-\u2DA6\u2DA8-\u2DAE\u2DB0-\u2DB6\u2DB8-\u2DBE\u2DC0-\u2DC6\u2DC8-\u2DCE\u2DD0-\u2DD6\u2DD8-\u2DDE\u2DE0-\u2DFF\u2E2F\u3005-\u3007\u3021-\u3029\u3031-\u3035\u3038-\u303C\u3041-\u3096\u309D-\u309F\u30A1-\u30FA\u30FC-\u30FF\u3105-\u312D\u3131-\u3
18E\u31A0-\u31BA\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FCC\uA000-\uA48C\uA4D0-\uA4FD\uA500-\uA60C\uA610-\uA61F\uA62A\uA62B\uA640-\uA66E\uA674-\uA67B\uA67F-\uA697\uA69F-\uA6EF\uA717-\uA71F\uA722-\uA788\uA78B-\uA78E\uA790-\uA793\uA7A0-\uA7AA\uA7F8-\uA801\uA803-\uA805\uA807-\uA80A\uA80C-\uA827\uA840-\uA873\uA880-\uA8C3\uA8F2-\uA8F7\uA8FB\uA90A-\uA92A\uA930-\uA952\uA960-\uA97C\uA980-\uA9B2\uA9B4-\uA9BF\uA9CF\uAA00-\uAA36\uAA40-\uAA4D\uAA60-\uAA76\uAA7A\uAA80-\uAABE\uAAC0\uAAC2\uAADB-\uAADD\uAAE0-\uAAEF\uAAF2-\uAAF5\uAB01-\uAB06\uAB09-\uAB0E\uAB11-\uAB16\uAB20-\uAB26\uAB28-\uAB2E\uABC0-\uABEA\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uF900-\uFA6D\uFA70-\uFAD9\uFB00-\uFB06\uFB13-\uFB17\uFB1D-\uFB28\uFB2A-\uFB36\uFB38-\uFB3C\uFB3E\uFB40\uFB41\uFB43\uFB44\uFB46-\uFBB1\uFBD3-\uFD3D\uFD50-\uFD8F\uFD92-\uFDC7\uFDF0-\uFDFB\uFE70-\uFE74\uFE76-\uFEFC\uFF21-\uFF3A\uFF41-\uFF5A\uFF66-\uFFBE\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-\uFFDC]
matches a character that isn't a Unicode letter.
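In engines that implement ES2018 Unicode property escapes (the `u` flag together with `\p{...}`), the long manual range can be avoided. The following is a sketch under that assumption; the function name `replaceNonLetters` is hypothetical.

```javascript
// Requires an engine with ES2018 Unicode property escapes.
// [^\p{L}] matches any code point that is not a Unicode letter.
function replaceNonLetters(text, replacement) {
  return text.replace(/[^\p{L}]/gu, replacement);
}

// Polish letters are kept, digits and punctuation are replaced.
console.log(replaceNonLetters("\u0105 1 \u0107, \u00F3!", " "));
```

Unlike \W, this keeps letters such as "ą", "ć" and "ó" because the test is based on the Unicode Letter category rather than on ASCII.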
It depends on what engine you are working with. It also depends on how your Unicode characters are encoded — are they encoded as a single character, or as a character+mark combination?
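You can try the following: \p{L} to target letters encoded as a single code point, and \P{M}\p{M}*+ for character+mark combinations (a base character followed by any number of combining marks). Note that the possessive quantifier *+ is not available in every engine; JavaScript, for one, does not support it, so use \P{M}\p{M}* there.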
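To illustrate the difference between the two encodings, here is a small sketch that assumes an engine with ES2018 Unicode property escapes. It matches a letter together with any combining marks that follow it; the name `splitLetters` is made up for the example.

```javascript
// \p{L} matches one letter code point, \p{M}* any combining marks after it.
var letterWithMarks = /\p{L}\p{M}*/gu;

// Returns the letters of a string, each with its combining marks attached.
function splitLetters(s) {
  return s.match(letterWithMarks) || [];
}

// "é" as one precomposed code point and as "e" + combining acute accent:
console.log(splitLetters("\u00E9").length);  // 1
console.log(splitLetters("e\u0301").length); // 1
```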
So, finally I decided to write my own regex condition, because it seems like there isn't any fast and simple way to do that in JavaScript.
I added here all the unneeded characters that came to my mind, which could appear on a typical website and aren't needed to understand a single word (I left the ' character because in English it is quite important ;) ). If you want, you can edit my answer and add your own ones.
[:;.,\?!\-()~\/"|®@#$%^&*+-]
JS:
text = text.replace(/[:;\.,\?!\-\(\)~\\\/"|®@#$%^&*+-]/g, "");

Regular Expressions - Match all alphanumeric characters except individual numbers

I would like to create a RegEx to match only English alphanumeric characters but ignore (or discard) isolated numbers in Ruby (and if possible in JS too).
Examples:
1) I would like the following to be matched:
4chan
9gag
test91323432
asf5asdfaf35edfdfad
afafaffe
But not:
92342424
343424
34432
and so on..
The above is exactly what I would want.
Edit: I deleted the second sub-question. Just focus on the first one, thank you very much for your answers!!
Sorry, my regex skills aren't that great (hence this question!)
Thank you.
You can try the following expression (it works in Ruby; JavaScript does not support POSIX bracket expressions such as [[:alnum:]]):
^(?!^\d+$)[[:alnum:]]+$
This first ensures the string is not just digits by using a negative lookahead (?!^\d+$), then it matches one or more alphanumeric characters. Unicode characters are supported, which means this works with French letters too.
EDIT: If you only want the English alphabet (note that \w also accepts the underscore; this form works in JavaScript as well):
^(?!^\d+$)\w+$
Rubular Demo
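For JavaScript, a sketch of the same idea with an explicit ASCII class (so that the underscore is not accepted) could look like this. The names `notOnlyDigits` and `matches` are illustrative.

```javascript
// (?!\d+$) rejects strings made up solely of digits;
// [A-Za-z0-9]+ then requires English letters and digits only.
var notOnlyDigits = /^(?!\d+$)[A-Za-z0-9]+$/;

function matches(s) {
  return notOnlyDigits.test(s);
}

["4chan", "9gag", "test91323432", "afafaffe", "92342424", "34432"].forEach(function (s) {
  console.log(s, matches(s));
});
```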
For any Latin letters:
/(?=.*\p{Alpha})\p{Alnum}+/
I'm pretty sure that you can't do what you want to do with one regex. A single alpha character, anywhere in a group of numbers, will make it a valid match, and there is no way to represent that in regex, because what you are really saying is something along the lines of "a letter is required at the front of this word, but only if there isn't a letter in the middle or at the end", and regex won't do that.
Your best bet is to do two passes:
one that matches your alphanumeric, plus special "French" characters (pattern: TBD, based on what special characters you want to accept), and
one that matches numbers only (pattern: would include [0-9]+ . . . need more information about the specific situation to give you a final, complete regex)
The values that you want in the end would need to pass the first regex and fail the second one.
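The two-pass idea can be sketched as follows. The first pattern is an assumption (plain ASCII letters and digits), since the exact set of accepted special characters is left open above; `allowed`, `digitsOnly` and `accept` are hypothetical names.

```javascript
// Pass 1 (assumed character set): only ASCII letters and digits.
var allowed = /^[A-Za-z0-9]+$/;
// Pass 2: strings consisting of digits only.
var digitsOnly = /^[0-9]+$/;

// A value is accepted when it passes the first regex and fails the second.
function accept(s) {
  return allowed.test(s) && !digitsOnly.test(s);
}

console.log(accept("4chan"));  // true
console.log(accept("343424")); // false
```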
Also . . .
To give you a better answer, we'll need to know a couple of things:
Are you testing that an entire string matches the pattern?
Are you trying to capture a single instance of the pattern in a bigger string?
Are you trying to capture all of the instances of the pattern in a bigger string?
The answers to those questions have a big impact on the final regex pattern that you will need.
And, finally . . .
A note on the "French" characters . . . you need to be very specific about which special characters are acceptable and which aren't. There are three main approaches to special character matching in regex: groups, additive, and subtractive
groups - these are characters that represent a preset group of characters in the version of regex that you are using. For example, \s matches all whitespaces
additive - this is the process of listing out each acceptable character (or range of characters) in your regex. This is better when you have a small group of acceptable characters
subtractive - this is the process of listing out each UNacceptable character (or range of characters) in your regex. This is better when you have a large group of acceptable characters
If you can clear up some of these questions, we should be able to give you a better answer.
Maybe this ^(?![0-9]+$)[a-zA-Z0-9\x80-\xa5]+$
Edit - fixed cut&paste error and added Extended character range \x80-\xa5
which includes the accent chars (depending on locale set, the figures may be different)

regex for alphaspecialnumeric

I would like to check that a few of my text boxes satisfy the following conditions:
By alphabets I mean a-z (uppercase and lowercase), numbers are 0-9, and special characters are ~`!@#$%^&*()-_+={}[];:'",.<>/?
It can contain only alphabets
It cannot contain only numbers
It cannot contain only special characters
It cannot contain only numbers and special characters
It can contain alphabets,numbers and special characters
It can contain alphabets and numbers
It can contain alphabets and special characters
I found a solution, but it does not seem to work for me:
/^[a-z0-9/. -!@#$%^&*(){}:;"',/?]+$/i
I am checking it as:
var alpha = /^[a-z0-9/. -!@#$%^&*(){}:;"',/?]+$/i;
if (!alpha.test(username.value))
{
    alert('Invalid username');
    document.theForm.username.focus();
    return false;
}
The problem can be restated as that of matching a string containing ONLY the characters
A-Za-z0-9~`!@#$%^&*()-_+={}[];:'",.<>/?
such that at least one of them is a letter.
Fortunately, you've covered all the printable characters in the range U+0021 to U+007E (that is, from ! to ~), so that the desired regex is simply
[!-~]*[A-Za-z][!-~]*
EDIT: On closer reading, I noticed you did not allow the backslash! If you want to allow the backslash, the regex above is okay; if not, you should modify it like so:
[!-\[\]-~]*[A-Za-z][!-\[\]-~]*
It's a bit uglier, because to exclude the backslash we have to say
All characters in the range ! to [ union characters in the range ] to ~, and the explicit mention of [ and ] requires escaping with, you guessed it, the \.
Hopefully you meant to allow the \ so you can use the simpler regex above.
EDIT 2
To make the regex more efficient, you should use a reluctant quantifier (as kcsoft did):
[!-~]*?[A-Za-z][!-~]*
Also for JavaScript, but not for Java if you are using matches, you should anchor the regex to match the whole string, giving this in JavaScript:
/^[!-~]*?[A-Za-z][!-~]*$/
And, as you did in your question, you can shorten it a bit more by using the i modifier:
/^[!-~]*?[A-Z][!-~]*$/i
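A quick way to check the final pattern against the conditions in the question is a sketch like the following; the names `hasLetter` and `isValidField` are made up for the example.

```javascript
// Only printable ASCII from ! to ~ is allowed, and at least one letter is required.
var hasLetter = /^[!-~]*?[A-Z][!-~]*$/i;

function isValidField(s) {
  return hasLetter.test(s);
}

console.log(isValidField("abc"));       // true: letters only
console.log(isValidField("abc123!@#")); // true: letters, numbers and specials
console.log(isValidField("12345"));     // false: numbers only
console.log(isValidField("123!@#"));    // false: numbers and specials only
```

Note that the space character is outside the range ! to ~, so a value containing spaces is rejected.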
Can you give some input examples? Can you try this?
/.*?[a-zA-Z]+.*/
Or if you need to specify the list of special characters:
/[list of chars]*?[a-zA-Z]+[list of chars]*/

Regular expression to allow all alphabet characters plus unicode characters

I need a regular expression to allow all alphabet characters plus Greek/German alphabet in a string but replace those symbols ?,&,^,". with *
I skipped the list of characters to escape to make the question simple.
I really want to see how to construct this and afterwards include alphabet sets using ASCII codes.
If you have a finite and short set of symbols to replace, you could just use a character class, e.g.
string.replace(/[?\^&]/g, '*');
and add as many symbols as you want to reject. You could also add ranges of Unicode symbols you want to replace (e.g. \u017F-\u036F\u0400-\uFFFF).
Otherwise, use a negated class to specify which symbols don't need to be replaced, like a-z, accented/diacritic letters and Greek symbols:
string.replace(/[^a-z\u00C0-\u017E\u0370-\u03FF]/gi, '*');
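Here is a small sketch of the negated-class variant with every escape written in the full \uXXXX form (a bare \00C0 is not a Unicode escape in JavaScript). The function name `maskSymbols` is hypothetical.

```javascript
// Keeps a-z (case-insensitive), Latin letters with diacritics (\u00C0-\u017E)
// and the Greek block (\u0370-\u03FF); everything else becomes "*".
function maskSymbols(s) {
  return s.replace(/[^a-z\u00C0-\u017E\u0370-\u03FF]/gi, '*');
}

// "Grüße?&αβγ" written with escapes to avoid encoding problems:
console.log(maskSymbols("Gr\u00FC\u00DFe?&\u03B1\u03B2\u03B3"));
```

Because the class is negated, anything not listed is replaced, including spaces and digits. The range \u00C0-\u017E also contains the non-letters × (\u00D7) and ÷ (\u00F7), so exclude them if that matters.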
You have to use the XRegExp library, along with its Unicode add-on.
Once you have that, you can use modern regexes like /[\p{L}\p{Nl}]/, which necessarily also includes those \p{Greek} code points which are letters or letter-numbers. But you could also match /[\p{Latin}\p{Greek}]/ if you wanted.
JavaScript's own regexes are terrible. Use XRegExp.
So something like: /^[^?&\^"]*$/ (that means the string is composed only of characters outside the five you listed)...
But if you want to have the greek characters and the unicode characters (what are unicode characters? àèéìòù? Japanese?) perhaps you'll have to use http://xregexp.com/ It is a regex library for javascript that includes character classes for the various unicode character classes (I know I'm repeating myself) plus other "commands" for unicode handling.
