How to check Bosnian-specific characters in RegEx? - javascript

I have this Regular Expression pattern, which is quite simple and it validates if the provided string is "alpha" (both uppercase and lowercase):
var pattern = /^[a-zA-Z]+$/gi;
When I trigger pattern.test('Zlatan Omerovic') it returns true, however if I:
pattern.test('Zlatan Omerović');
It returns false and it fails my validation.
In the Bosnian language we have these specific characters:
š đ č ć ž
And uppercased:
Š Đ Č Ć Ž
Is it possible to validate these characters (both cases) with JavaScript regular expression?

Sure, you can just add those characters to the list of characters you're matching. Also, since you're doing a case-insensitive match (the i flag), you don't need the uppercase characters.
var pattern = /^[a-zšđčćž ]+$/gi;
Fiddle here: http://jsfiddle.net/ryanbrill/KB74b/
Here's an alternate pattern that uses Unicode escape sequences, which might be more robust (embedding the characters literally won't work if the file isn't saved with the proper encoding, for instance):
var pattern = /^[a-z\u0161\u0111\u010D\u0107\u017E ]+$/gi;
http://jsfiddle.net/ryanbrill/KB74b/2/
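One caveat worth checking when reusing either pattern: with the g flag, test() remembers lastIndex between calls, so calling it repeatedly on the same regex object can alternate between true and false. A minimal sketch of the effect, next to a variant without g (the names withG, withoutG, and sample are only illustrative):

```javascript
// Same character class as above, written with Unicode escapes.
const withG = /^[a-z\u0161\u0111\u010D\u0107\u017E ]+$/gi;
const withoutG = /^[a-z\u0161\u0111\u010D\u0107\u017E ]+$/i;

const sample = 'Zlatan Omerovi\u0107'; // "Zlatan Omerović"

// With g, a successful test() leaves lastIndex at the end of the string,
// so the next call starts searching there and fails.
const firstWithG = withG.test(sample);
const secondWithG = withG.test(sample);

// Without g, lastIndex is ignored and every call starts from index 0.
const firstWithoutG = withoutG.test(sample);
const secondWithoutG = withoutG.test(sample);

console.log(firstWithG, secondWithG);       // true false
console.log(firstWithoutG, secondWithoutG); // true true
```

For a one-off validation of a whole string, the g flag adds nothing, so dropping it is probably the simplest way to avoid the surprise.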

a-zA-Z means exactly that, and in an English-centric way: abcdefghijklmnopqrstuvwxyz. Sadly, with JavaScript's regular expressions, if you want to test other alphabetic characters, you have to list them explicitly. JavaScript doesn't have a locale-sensitive "alpha" definition. You can include the extra characters either literally (for instance, by putting š in the regular expression) or with Unicode escape sequences (such as \u0161). If the additional Bosnian alphabetic characters in question form a contiguous range, you can use the - notation with them as well, but it has to be separate from the a-z range, which is defined in English terms.

To make the test accept the first of your five letters (the S-based one, Š/š), I did:
var pattern = /^[a-zA-Z\u0160-\u0161]+$/g;
Add the rest of the letters you need the same way ;)
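Extending that approach to all five letters could look like the sketch below. It relies on each upper/lower-case pair occupying two adjacent code points (for example Š is U+0160 and š is U+0161). The name bosnianAlpha is made up for this example, and the space is left out, as in the pattern above.

```javascript
// Each range covers one upper/lower-case pair:
// Ć/ć U+0106-0107, Č/č U+010C-010D, Đ/đ U+0110-0111,
// Š/š U+0160-0161, Ž/ž U+017D-017E.
const bosnianAlpha = /^[a-zA-Z\u0106-\u0107\u010C-\u010D\u0110-\u0111\u0160-\u0161\u017D-\u017E]+$/;

console.log(bosnianAlpha.test('Omerovi\u0107'));                  // true
console.log(bosnianAlpha.test('\u0160\u0111\u010D\u0107\u017E')); // true
console.log(bosnianAlpha.test('Omerovic1'));                      // false (digits are not allowed)
```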

Related

Can you help me rewrite my javascript regex using php preg_replace?

I have created a JavaScript regular expression to validate comments entered by users in my app. The regex allows letters, numbers, some special symbols, and a range of emojis.
I received help here to correctly format my JavaScript regular expression, and the final expression I am using is as follows:
Javascript Regex:
commentRegex = /^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/;
I was advised to perform the same validation on the server side (with PHP), so I am trying to perform a similar process using preg_replace().
I would like to replace all characters that are not allowed by the regex with the empty string. Here is my attempt; however, it is not working. Thanks for any help.
PHP
$commentText = preg_replace('#^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$#', '', $commentText);
Edit:
After taking your advice in the comments I now have the following regex.
$postText = preg_replace('/^(?:[A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?\#\+%:;\<\[\]\r\n]|(?:\x{d83c}[\x{df00}-\x{dfff}])|(?:\x{d83d}[\x{dc00}-\x{de4f}\x{de80}-\x{deff}]))*$/', '', $postText);
However, I am getting a warning:
Warning: preg_replace(): Compilation failed: character value in \x{} or \o{} is too large at offset 30 in submit_post.php on line 37
In short: use
$re = '/[^A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?#+%:;<[\]\r\n\x{1F300}-\x{1F3FF}\x{1F400}-\x{1F64F}\x{1F680}-\x{1F6FF}]+/u';
$text = 'test>><<<®¥§';
echo preg_replace($re, '', $text);
See the PHP demo.
A bit of an explanation:
Escape only the special regex metacharacters inside the pattern and the regex delimiter (if you choose # as the delimiter, escape the # in the pattern, and then there is no need to escape /).
\uXXXX is not supported by PCRE and must be replaced with the \x{XXXX} notation.
Since the text to be processed is Unicode and the pattern contains characters outside the ASCII range, you have to use the /u (UNICODE) modifier.
As most emojis lie outside the BMP, and the string is now treated as a sequence of Unicode code points, these symbols must be written as single code points with the extended \x{XXXXX} notation, not as the surrogate pairs (two \uXXXX code units) used in JavaScript.
Your 3 alternatives can be merged into one big character class, which you then negate by adding ^ at its start.
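For comparison, JavaScript can express the same code-point ranges when the regex carries the u flag, using \u{...} escapes instead of surrogate pairs. The sketch below mirrors the PHP pattern above on the client side. It assumes an engine with ES2015 u-flag support, and the names disallowed and stripDisallowed are invented for the example.

```javascript
// Negated class: anything NOT in the allowed set is matched and removed.
// With the u flag, \u{1F300} denotes one code point rather than two code units.
const disallowed = /[^A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!'&*()="?#+%:;<[\]\r\n\u{1F300}-\u{1F3FF}\u{1F400}-\u{1F64F}\u{1F680}-\u{1F6FF}]+/gu;

function stripDisallowed(text) {
  return text.replace(disallowed, '');
}

console.log(stripDisallowed('test>><<<\u00AE\u00A5\u00A7'));       // "test<<<"
console.log(stripDisallowed('ok \u{1F600}!') === 'ok \u{1F600}!'); // true
```

Keeping the two patterns structurally identical makes it easier to spot when the client and server rules drift apart.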
A regex in PHP is surrounded by a delimiter character. In your case you are using the hash (#), but the delimiter must not occur unescaped in the regex itself, which it does...
You have to escape this character inside the pattern, or use another delimiter. Why did you not use the same "/" as in the JS version? The benefit is that it is already escaped.
I have not checked whether the rest would work, but I think so.
$commentText = preg_replace('/^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/', '', $commentText);
should work.
Convert the \u.... sequences to \x{....}, and the result appears to be a valid PHP regular expression.
pattern: \\u(\w{4})
replace: \\x{$1}
regex101 demo
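The search-and-replace above can also be scripted. A small sketch in JavaScript (the function name toPcreEscapes is hypothetical) that rewrites \uXXXX escapes in a pattern's source text into the \x{XXXX} form. It is slightly stricter than the pattern shown, matching only hex digits instead of \w, and it does not merge surrogate pairs such as \ud83c\udf00 into single code points, so emoji ranges still need to be rewritten by hand.

```javascript
// Rewrites every \uXXXX escape (four hex digits) as \x{XXXX}.
// Uses [0-9a-fA-F] rather than \w so only real hex digits are matched.
function toPcreEscapes(patternSource) {
  return patternSource.replace(/\\u([0-9a-fA-F]{4})/g, '\\x{$1}');
}

console.log(toPcreEscapes('[A-Za-z\\u00C0-\\u017F\\u20AC]'));
// [A-Za-z\x{00C0}-\x{017F}\x{20AC}]
```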

regex for alphaspecialnumeric

I would like to check a few of my text boxes, which must satisfy the following conditions:
By alphabets I mean a-z (uppercase and lowercase); numbers are 0-9; and special characters are ~`!##$%^&*()-_+={}[];:'",.<>/?
It can contain only alphabets
It cannot contain only numbers
It cannot contain only special characters
It cannot contain only numbers and special characters
It can contain alphabets,numbers and special characters
It can contain alphabets and numbers
It can contain alphabets and special characters
I found a solution, but it does not seem to work for me:
/^[a-z0-9/. -!##$%^&*(){}:;"',/?]+$/i
I am checking it as:
var alpha=/^[a-z0-9/. -!##$%^&*(){}:;"',/?]+$/i;
if (!alpha.test(username.value))
{
alert('Invalid username');
document.theForm.username.focus();
return false;
}
The problem can be restated as that of matching a string containing ONLY the characters
A-Za-z0-9~`!##$%^&*()-_+={}[];:'",.<>/?
such that at least one of them is a letter.
Fortunately, you've covered all the printable characters in the range U+0021 to U+007E, so that the desired regex is simply
[!-~]*[A-Za-z][!-~]*
EDIT: On closer reading, I noticed you did not allow the backslash! If you want to allow the backslash, the regex above is okay; if not, you should modify it like so:
[!-\[\]-~]*[A-Za-z][!-\[\]-~]*
It's a bit uglier, because to exclude the backslash we have to say "all characters in the range ! to [, plus all characters in the range ] to ~", and the explicit mention of [ and ] requires escaping with, you guessed it, the \.
Hopefully you meant to allow the \ so you can use the simpler regex above.
EDIT 2
To make the regex more efficient, you should use a reluctant quantifier (as kcsoft did):
[!-~]*?[A-Za-z][!-~]*
Also for JavaScript, but not for Java if you are using matches, you should anchor the regex to match the whole string, giving this in JavaScript:
/^[!-~]*?[A-Za-z][!-~]*$/
And, as you did in your question, you can shorten it a bit more by using the i modifier:
/^[!-~]*?[A-Z][!-~]*$/i
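As a quick check, the final pattern can be run against one input per rule from the question. This is only a sketch: the sample strings and the names hasLetter, cases, and allAsExpected are made up for illustration.

```javascript
const hasLetter = /^[!-~]*?[A-Z][!-~]*$/i;

// [input, expected result] pairs, one per rule in the question.
const cases = [
  ['abc', true],       // only letters
  ['12345', false],    // only numbers
  ['!#$%', false],     // only special characters
  ['123!#', false],    // numbers and special characters, no letter
  ['abc123!#', true],  // letters, numbers and special characters
  ['abc123', true],    // letters and numbers
  ['abc!#', true],     // letters and special characters
  ['abc def', false],  // space (U+0020) lies outside the !-~ range
];

const allAsExpected = cases.every(
  ([input, expected]) => hasLetter.test(input) === expected
);
console.log(allAsExpected); // true
```

The last case is worth noting: the range starts at !, so a space is rejected unless it is added to the class explicitly.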
Can you give some input examples? Can you try this:
/.*?[a-zA-Z]+.*/
Or if you need to specify the list of special characters:
/[list of chars]*?[a-zA-Z]+[list of chars]*/

Can it be done with regex?

Having the following regex: ([a-zA-Z0-9//._-]{3,12}[^//._-]) used like pattern="([a-zA-Z0-9/._-]{3,12}[^/._-])" to validate an HTML text input for a username, I wonder if there is any way of telling it to check that the string contains only one of the following: ., -, _
By that I mean that I need a regex that would accomplish the following (if possible):
alex-how => Valid
alex-how. => Not valid, because finishing in .
alex.how => Valid
alex.how-ha => Not valid, contains already a .
alex-how_da => Not valid, contains already a -
The problem with my current regex is that, for some reason, it accepts any character at the end of the string that is not ., _, or -, and I can't figure out why.
The other problem is that it doesn't check that the string contains only one of the allowed special characters.
Any ideas?
Try this one out:
^(?!(.*[.|_|-].*){2})(?!.*[.|_|-]$)[a-zA-Z0-9//._-]{3,12}$
Regexpal link. The regex above allows at most one of ., _ or -.
What you want is one or more alphanumeric characters (uppercase, lowercase, or digits),
followed by at most one of the characters "-", ".", or "_", followed by at least one more alphanumeric character. Note that | is not needed as a separator inside a character class, where it would match a literal pipe:
^[a-zA-Z0-9]+[-_.]?[a-zA-Z0-9]+$
Hope this will work for you:
It says: start with word characters, followed by any mix of -, ., _ and word characters, and end with a word character.
^[\w\d]*[-_\.\w\d]*[\w\d]$
Seems to me you want:
^[A-Za-z0-9]+(?:[\._-][A-Za-z0-9]+)?$
Breaking it down:
^: beginning of line
[A-Za-z0-9]+: one or more alphanumeric characters
(?:[\._-][A-Za-z0-9]+)?: (optional, non-captured) one of your allowed special characters followed by one or more alphanumeric characters
$: end of line
It's unclear from your question if you wanted one of your special characters (., -, and _) to be optional or required (e.g., zero-or-one versus exactly-one). If you actually wanted to require one such special character, you would just get rid of the ? at the very end.
Here's a demonstration of this regular expression on your example inputs:
http://rubular.com/r/SQ4aKTIEF6
As for the length requirement (between 3 and 12 characters): This might be a cop-out, but personally I would argue that it would make more sense to validate this by just checking the length property directly in JavaScript, rather than over-complicating the regular expression.
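A rough sketch of that split, with the shape checked by the regular expression and the length checked separately. The function name isValidUsername is hypothetical, and the limits 3 and 12 are taken from the question.

```javascript
// Shape only: alphanumerics, optionally one separator followed by more alphanumerics.
const shape = /^[A-Za-z0-9]+(?:[._-][A-Za-z0-9]+)?$/;

function isValidUsername(name) {
  // Length is validated with the length property instead of inside the regex.
  return name.length >= 3 && name.length <= 12 && shape.test(name);
}

console.log(isValidUsername('alex-how'));    // true
console.log(isValidUsername('alex-how.'));   // false (ends with a separator)
console.log(isValidUsername('alex.how-ha')); // false (two separators)
console.log(isValidUsername('ab'));          // false (too short)
```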
^(?=[a-zA-Z0-9/._-]{3,12}$)[a-zA-Z0-9]+(?:[/._-][a-zA-Z0-9]+)?$
or, as a JavaScript regex literal:
/^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/
The lookahead, (?=[a-zA-Z0-9/._-]{3,12}$), does the overall-length validation.
Then [a-zA-Z0-9]+ ensures that the name starts with at least one non-separator character.
If there is a separator, (?:[/._-][a-zA-Z0-9]+)? ensures that there's at least one non-separator following it.
Note that / has no special meaning in a regex. You only have to escape it if you're using a regex literal (because / is the regex delimiter), and you escape it by prefixing with a backslash, not another forward-slash. And inside a character class, you don't need to escape the dot (.) to make it match a literal dot.
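The $ inside the lookahead matters: without it, the lookahead only confirms that the string starts with 3 to 12 allowed characters, so the upper limit is never enforced. A short sketch comparing the two forms (the variable names are illustrative only):

```javascript
const withDollar = /^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/;
const withoutDollar = /^(?=[a-zA-Z0-9\/._-]{3,12})[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/;

const tooLong = 'abcdefghijklmnop'; // 16 characters

console.log(withDollar.test(tooLong));    // false: the lookahead must reach the end within 12
console.log(withoutDollar.test(tooLong)); // true: only the first 3 to 12 characters are checked

// The sample inputs from the question, using the anchored lookahead.
console.log(withDollar.test('alex-how'));    // true
console.log(withDollar.test('alex-how.'));   // false
console.log(withDollar.test('alex-how_da')); // false
```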
The dot in regex has a special meaning: "any character here".
If you mean a literal dot, you should escape it to tell the regex parser so.
Escape dot in a regex range

Regular expression to allow all alphabet characters plus unicode characters

I need a regular expression to allow all alphabet characters plus Greek/German alphabet in a string but replace those symbols ?,&,^,". with *
I skipped the list of characters to escape to keep the question simple.
I really want to see how to construct this and afterwards include alphabet sets using ASCII codes.
If you have a finite and short set of elements to replace, you could just use a character class, e.g.
string.replace(/[?\^&]/g, '*');
and add as many symbols as you want to reject. You could also add ranges of Unicode symbols you want to replace (e.g. \u017F-\u036F\u0400-\uFFFF).
Otherwise, use a negated class to specify which symbols don't need to be replaced, like a-z, accented/diacritic letters, and Greek symbols:
string.replace(/[^a-z\u00C0-\u017E\u0370-\u03FF]/gi, '*');
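A small sketch of the negated-class variant with the ranges written as \uXXXX escapes. The function name maskOthers and the sample strings are assumptions made for the example; note that everything outside the listed ranges is replaced, including digits, spaces, and punctuation.

```javascript
// Keep a-z (either case), U+00C0-U+017E (mostly accented Latin letters)
// and U+0370-U+03FF (the Greek block); everything else becomes '*'.
function maskOthers(text) {
  return text.replace(/[^a-z\u00C0-\u017E\u0370-\u03FF]/gi, '*');
}

console.log(maskOthers('Stra\u00DFe?'));              // "Straße*"
console.log(maskOthers('\u03BA\u03B1\u03B9&\u03C9')); // "και*ω"
```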
You have to use the XRegexp plugin, along with the Unicode add-on.
Once you have that, you can use modern regexes like /[\p{L}\p{Nl}]/, which necessarily also includes those \p{Greek} code points which are letters or letter-numbers. But you could also match /[\p{Latin}\p{Greek}]/ if you wanted.
Javascript’s own regexes are terrible. Use XRegexp.
So something like: /^[^?&\^"]*$/ (that means the string is composed only of characters outside the five you listed)...
But if you want to have the Greek characters and the Unicode characters (what are Unicode characters? àèéìòù? Japanese?) perhaps you'll have to use http://xregexp.com/ It is a regex library for JavaScript that includes character classes for the various Unicode character classes (I know I'm repeating myself) plus other "commands" for Unicode handling.
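As a side note to the XRegExp suggestions: engines that implement ES2018 accept Unicode property escapes natively when the regex has the u flag, so a library may not be needed there. A minimal sketch under that assumption (the names lettersOnly and latinOrGreek are invented here; natively, scripts are written as \p{Script=Greek} rather than \p{Greek}):

```javascript
// \p{L} = any letter, \p{Nl} = letter-like numbers (such as Roman numeral signs).
const lettersOnly = /^[\p{L}\p{Nl}]+$/u;

// Restrict to two scripts instead of all letters.
const latinOrGreek = /^[\p{Script=Latin}\p{Script=Greek}]+$/u;

console.log(lettersOnly.test('Omerovi\u0107'));                        // true
console.log(lettersOnly.test('\u03BA\u03CC\u03C3\u03BC\u03BF\u03C2')); // true ("κόσμος")
console.log(lettersOnly.test('abc?'));                                 // false
console.log(latinOrGreek.test('abc\u03C9'));                           // true
console.log(latinOrGreek.test('\u0436'));                              // false (Cyrillic)
```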

Validating any string with RegEx

I want to validate any string that may contain the characters çÇöÖİşŞüÜğĞ and is at least 5 characters long. The string to validate can contain spaces. The regex must accept "asd Çğ ğT i", for example.
Any reply will be helpful.
Thanks.
You can use escape sequences of the form
\uXXXX
where each "X" can be any hex digit. Thus:
\u0020
is the same as a plain space character, and
\u0041
is upper-case "A". Thus you can encode the Unicode values for the characters you're interested in and then include them in a regex character class. To make sure the string is at least five characters long, you can use a quantifier in the regex.
You'll end up with something like:
var regex = /^[A-Za-z\u00nn\u00nn\u00nn]{5,}$/;
where those "00nn" things would be the appropriate values. As to exactly what those values are, you should be able to find them on a reference site like this one or maybe this one. For example I think that "Ö" is \u00D6. (Some of your characters are in the Unicode Latin-1 Supplement, while others are in Latin Extended A.)
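Filled in for the characters listed in the question, the pattern might look like the sketch below. The code points are looked up per character (ç U+00E7, Ç U+00C7, ö U+00F6, Ö U+00D6, İ U+0130, ş U+015F, Ş U+015E, ü U+00FC, Ü U+00DC, ğ U+011F, Ğ U+011E), a space is added because the question allows spaces, and the name turkishText is made up.

```javascript
// Basic Latin letters, space, and the eleven characters from the question,
// repeated at least five times.
const turkishText = /^[A-Za-z \u00E7\u00C7\u00F6\u00D6\u0130\u015F\u015E\u00FC\u00DC\u011F\u011E]{5,}$/;

console.log(turkishText.test('asd \u00C7\u011F \u011FT i')); // true ("asd Çğ ğT i")
console.log(turkishText.test('asd'));                        // false (shorter than five)
console.log(turkishText.test('asd12'));                      // false (digits are not in the class)
```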
