Detecting characters having a similar connotation as in ASCII set - javascript

To detect whether a string is composed only of ASCII characters, I am using the following regex:
"string".match(/^[\x00-\x7F]*$/gm)
This works fine for detecting ASCII characters, but it rejects characters that are similar in meaning to ASCII characters. For example, a typographic double quote falls outside the ASCII set and exists only in the Unicode set:
"see the difference in double quotes“
With the above regex, this string fails the detection test because of “. How could I extend the regex to include characters like these, which are very similar in meaning to ones in the ASCII set, for example , [comma] and " [double quote]?

A regex doesn't understand the meaning of anything; it only follows its rules to match a sequence of characters.
If you want to match a comma, you need to put a comma in your character set. If you are looking for "similar" characters, you need to identify each and every one of them and put them inside the character set.
[,"]
will match "comma" and "double quote".
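One way to act on this is to list the lookalikes explicitly and map each to its ASCII counterpart before running the ASCII test. The sketch below is only an illustration: the table `ASCII_LOOKALIKES` and the helpers `toAsciiLookalikes` and `isAsciiLike` are hypothetical names, and the five characters listed are an assumed starting set, not a complete inventory.

```javascript
// Hypothetical lookup table: extend it with whichever lookalikes matter to you.
const ASCII_LOOKALIKES = {
  "\u2018": "'", // left single quotation mark
  "\u2019": "'", // right single quotation mark
  "\u201C": '"', // left double quotation mark
  "\u201D": '"', // right double quotation mark
  "\uFF0C": ",", // fullwidth comma
};

// Replace every listed lookalike with its ASCII counterpart.
function toAsciiLookalikes(text) {
  return text.replace(
    /[\u2018\u2019\u201C\u201D\uFF0C]/g,
    (ch) => ASCII_LOOKALIKES[ch]
  );
}

// True when the text is pure ASCII once the known lookalikes are mapped.
function isAsciiLike(text) {
  return /^[\x00-\x7F]*$/.test(toAsciiLookalikes(text));
}
```

With this table, the example string from the question passes the check, while a string containing an unlisted non-ASCII character such as é still fails.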

Related

Cannot get a regex to work in JavaScript that allows whitespace and backslash

I have the regular expression below. It should allow letters, digits, round brackets, square brackets, backslash and the following punctuation marks: period, comma, semicolon, colon, exclamation mark, percent sign and dash.
^[(a-z)(A-Z) .,;:!'%\-(0-9)(\\)\(\)[\]\s]+$
Question: I have tried this regular expression with some text in this online tester: https://regex101.com/r/kO5tW2/2, but it always comes up with no matches. What is causing the expression to fail in this case? To me, the string being tested should come back as valid, but it doesn't.

Your spec does not mention a question mark. However, the test text you give does include a question mark. You could have tested this easily enough by removing one character at a time from the test text until you got a match, which would have happened when you removed the question mark.
Either add the question mark to the regexp, or remove it from your test text.
Also, you do not need to (and should not) enclose ranges in parentheses.
In the version below, I've also removed escaping for characters which do not need to be escaped. The added question mark is the last character inside the brackets:
^[a-zA-Z .,;:!'%\-0-9\\()[\]\s?]+$
https://regex101.com/r/kO5tW2/4

Try adding the m (multiline) modifier to the regex:
If you have a string consisting of multiple lines, like first line\nsecond line (where \n indicates a line break), it is often desirable to work with lines, rather than the entire string. Therefore, all the regex engines discussed in this tutorial have the option to expand the meaning of both anchors. ^ can then match at the start of the string (before the f in the above string), as well as after each line break (between \n and s). Likewise, $ still matches at the end of the string (after the last e), and also before every line break (between e and \n). Source
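The effect of the m flag described in the quoted passage can be seen in a small sketch. The pattern and variable names here are made up for illustration and are not taken from the question.

```javascript
const text = "first line\nsecond line";

// Without m, ^ and $ only anchor at the two ends of the whole string,
// so a single-line pattern cannot match a two-line string.
const wholeString = text.match(/^\w+ line$/g);

// With m, ^ also matches after each line break and $ before each one,
// so every line is tested on its own.
const perLine = text.match(/^\w+ line$/gm);

console.log(wholeString); // null
console.log(perLine);     // [ 'first line', 'second line' ]
```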

regular expression incorrectly matching % and $

I have a regular expression in JavaScript that should allow digits and the characters , . + ( ) - and space in a phone field.
My regex is [0-9-,.+() ]
It works for digits as well as the characters above, but it also allows characters like % and $, which are not in the list.

Even though you don't have to, I always make it a point to escape metacharacters (easier to read and less pain):
[0-9\-,\.+\(\) ]
But this won't work like you expect it to because it will only match one valid character while allowing other invalid ones in the string. I imagine you want to match the entire string with at least one valid character:
^[0-9\-,\.\+\(\) ]+$
Your original regex is not actually matching %. What it is doing is matching valid characters, but the problem is that it only matches one of them. So if you had the string 435%, it matches the 4, and so the regex reports that it has a match.
If you try to match it against just one invalid character, it won't match. So your original regex doesn't match the string %:
> /[0-9\-,\.\+\(\) ]/.test("%")
false
> /[0-9\-,\.\+\(\) ]/.test("44%5")
true
> "444%6".match(/[0-9\-,\.+\(\) ]/)
["4"] //notice that the 4 was matched.
Going back to the point about escaping, I find that it is easier to escape it rather than worrying about the different rules where specific metacharacters are valid in a character class. For example, - is only valid in the following cases:
When used in an actual range in the proper order, such as [a-z] (but not [z-a]).
When used as the first or last character, or by itself, so [-a], [a-], or [-].
When used after a range like [0-9-,] or [a-d-j] (but keep in mind that [9-,] is invalid and [a-d-j] does not match the letters e through i).
For these reasons, I escape metacharacters to make it clear that I want to match the actual character itself and to remove ambiguities.
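The hyphen rules listed above can be checked directly. This is a small sketch with made-up variable names; the reversed range is built with `new RegExp` so the resulting syntax error can be caught at runtime instead of breaking the script at parse time.

```javascript
// Three equivalent ways to include a literal hyphen in a character class.
const escaped = /^[0-9\-,]+$/;    // escaped hyphen
const trailing = /^[0-9,-]+$/;    // hyphen as the last character
const afterRange = /^[0-9-,]+$/;  // hyphen right after the 0-9 range

const sample = "1-2,3";
const allAccept = [escaped, trailing, afterRange].every((re) => re.test(sample));

// [a-d-j] is the range a-d, plus a literal "-", plus the letter "j".
const gapLetter = /[a-d-j]/.test("e"); // false: e is not covered
const hyphen = /[a-d-j]/.test("-");    // true: the hyphen is literal

// A reversed range such as [9-,] is rejected as a syntax error.
let reversedRangeThrows = false;
try {
  new RegExp("[9-,]");
} catch (err) {
  reversedRangeThrows = err instanceof SyntaxError;
}
```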

You just need to anchor your regex:
^[0-9-,.+() ]+$
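Inside a character class, special characters don't need to be escaped, except ], - and the backslash itself.
Even ] and - don't need to be escaped when:
] is alone in the character class, as in []] (this holds in PCRE-style engines; in JavaScript [] is an empty class, so ] always has to be escaped there)
- is at the beginning [-abc] or at the end [abc-] of the character class, or directly after the end of a range [a-c-x]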

Escape characters with special meaning in your RegExp. If you're not sure and it isn't an alphanumeric character, it usually doesn't hurt to escape it, too.
If the whole string must match, include the start ^ and end $ of the string in your RegExp, too.
/^[\d\-,\.\+\(\) ]*$/
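Putting the anchoring advice from these answers together, a minimal validation sketch might look like the following. The function name `isValidPhone` is hypothetical, and the pattern uses + rather than * on the assumption that an empty field should be rejected.

```javascript
// Accept only strings made up entirely of digits and , . + ( ) - space.
// The ^ and $ anchors force the whole value to match, not just one character.
function isValidPhone(value) {
  return /^[0-9\-,.+() ]+$/.test(value);
}

console.log(isValidPhone("+1 (555) 123-4567")); // true
console.log(isValidPhone("435%"));              // false
console.log(isValidPhone("$"));                 // false
```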

Can it be done with regex?

I have the following regex: ([a-zA-Z0-9//._-]{3,12}[^//._-]), used as pattern="([a-zA-Z0-9/._-]{3,12}[^/._-])" to validate an HTML text input for a username. I wonder if there is any way of telling it to check that the string contains at most one of the following: ., -, _
By that I mean that I need a regex that would accomplish the following (if possible):
alex-how => Valid
alex-how. => Not valid, because finishing in .
alex.how => Valid
alex.how-ha => Not valid, contains already a .
alex-how_da => Not valid, contains already a -
The problem with my current regex is that, for some reason, it accepts any character at the end of the string that is not one of ._-, and I can't figure out why.
The other problem is that it doesn't check that the string contains only one of the allowed special characters.
Any ideas?

Try this one out:
^(?!(.*[.|_|-].*){2})(?!.*[.|_|-]$)[a-zA-Z0-9//._-]{3,12}$
The regex above allows at most one of ., _ or -.

What you want is one or more upper-case, lower-case or digit characters,
followed by at most one of the characters "-", "." or "_", followed by at least one more alphanumeric character. Note that | has no special meaning inside a character class, so it should not be used as a separator there:
^[a-zA-Z0-9]+[-_.]?[a-zA-Z0-9]+$

Hope this works for you.
It says: start with word characters, optionally followed by any of -, . or _, and end with a word character.
^[\w\d]*[-_\.\w\d]*[\w\d]$

Seems to me you want:
^[A-Za-z0-9]+(?:[\._-][A-Za-z0-9]+)?$
Breaking it down:
^: beginning of line
[A-Za-z0-9]+: one or more alphanumeric characters
(?:[\._-][A-Za-z0-9]+)?: (optional, non-captured) one of your allowed special characters followed by one or more alphanumeric characters
$: end of line
It's unclear from your question if you wanted one of your special characters (., -, and _) to be optional or required (e.g., zero-or-one versus exactly-one). If you actually wanted to require one such special character, you would just get rid of the ? at the very end.
Here's a demonstration of this regular expression on your example inputs:
http://rubular.com/r/SQ4aKTIEF6
As for the length requirement (between 3 and 12 characters): This might be a cop-out, but personally I would argue that it would make more sense to validate this by just checking the length property directly in JavaScript, rather than over-complicating the regular expression.
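The combination suggested above, a regex for the shape plus a plain length check, could be sketched like this. The names `USERNAME_SHAPE` and `isValidUsername` are made up for the example.

```javascript
// Shape only: alphanumerics, optionally one separator followed by more alphanumerics.
const USERNAME_SHAPE = /^[A-Za-z0-9]+(?:[._-][A-Za-z0-9]+)?$/;

// Length is checked separately instead of being encoded in the regex.
function isValidUsername(name) {
  return name.length >= 3 && name.length <= 12 && USERNAME_SHAPE.test(name);
}

console.log(isValidUsername("alex-how"));    // true
console.log(isValidUsername("alex-how."));   // false
console.log(isValidUsername("alex.how-ha")); // false
```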

^(?=[a-zA-Z0-9/._-]{3,12}$)[a-zA-Z0-9]+(?:[/._-][a-zA-Z0-9]+)?$
or, as a JavaScript regex literal:
/^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/
The lookahead, (?=[a-zA-Z0-9/._-]{3,12}$), does the overall-length validation.
Then [a-zA-Z0-9]+ ensures that the name starts with at least one non-separator character.
If there is a separator, (?:[/._-][a-zA-Z0-9]+)? ensures that there's at least one non-separator following it.
Note that / has no special meaning in a regex. You only have to escape it if you're using a regex literal (because / is the regex delimiter), and you escape it by prefixing with a backslash, not another forward-slash. And inside a character class, you don't need to escape the dot (.) to make it match a literal dot.
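As a quick check of the lookahead approach, the regex literal can be run against the inputs from the question. The names `USERNAME` and `results` are only for this sketch; note that the lookahead needs the trailing $ so that it limits the total length to 12 rather than just requiring at least 3 characters.

```javascript
// Lookahead enforces the overall length (3 to 12); the rest enforces the shape.
const USERNAME = /^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/;

const results = {
  "alex-how": USERNAME.test("alex-how"),       // true
  "alex-how.": USERNAME.test("alex-how."),     // false: ends in a separator
  "alex.how": USERNAME.test("alex.how"),       // true
  "alex.how-ha": USERNAME.test("alex.how-ha"), // false: second separator
  "alex-how_da": USERNAME.test("alex-how_da"), // false: second separator
};

console.log(results);
```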

The dot in regex has a special meaning: "any character here".
If you mean a literal dot, you should escape it to tell the regex parser so.
Escape dot in a regex range

Regular expression to allow all alphabet characters plus unicode characters

I need a regular expression that allows all alphabet characters plus the Greek/German alphabets in a string, but replaces the symbols ?, &, ^, " and . with *
I left out the full list of characters to escape to keep the question simple.
I really want to see how to construct this and afterwards include alphabet sets using character codes.

If you have a finite and short set of symbols to replace, you could just use a character class, e.g.
string.replace(/[?\^&]/g, '*');
and add as many symbols as you want to reject. You could also add ranges of Unicode symbols you want to replace (e.g. \u017F-\u036F\u0400-\uFFFF).
Otherwise, use a negated class to specify which symbols don't need to be replaced, like a-z, accented/diacritic letters and Greek symbols:
string.replace(/[^a-z\u00C0-\u017E\u0370-\u03FF]/gi, '*');
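A small sketch of the negated-class approach follows. The function name `maskDisallowed` is made up, and the ranges are the ones from the answer; they are approximate, since \u00C0-\u017E also contains a few non-letters such as × and ÷. Spaces and digits are outside the class, so they are replaced as well.

```javascript
// Replace every character that is not a basic Latin letter, a Latin letter with
// diacritics (roughly \u00C0-\u017E), or in the Greek block (\u0370-\u03FF).
function maskDisallowed(text) {
  return text.replace(/[^a-z\u00C0-\u017E\u0370-\u03FF]/gi, "*");
}

console.log(maskDisallowed("Gr\u00FC\u00DFe?"));            // "Grüße*"
console.log(maskDisallowed("\u03B1\u03B2\u03B3&\u03B4"));   // "αβγ*δ"
```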

You have to use the XRegExp plugin, along with the Unicode add-on.
Once you have that, you can use modern regexes like /[\p{L}\p{Nl}]/, which necessarily also includes those \p{Greek} code points which are letters or letter-numbers. But you could also match /[\p{Latin}\p{Greek}]/ if you wanted.
JavaScript's own regexes are terrible. Use XRegExp.
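For comparison, JavaScript engines that implement ES2018 accept Unicode property escapes natively when the u flag is set, so the same classes can be sketched without a library. This is a later language feature than the answer assumes, and the variable names are illustrative only; the script names use the native `Script=` syntax rather than XRegExp's short form.

```javascript
// Native Unicode property escapes require the u flag (ES2018 and later).
const lettersOnly = /^[\p{L}\p{Nl}]+$/u;
const latinOrGreek = /^[\p{Script=Latin}\p{Script=Greek}]+$/u;

console.log(lettersOnly.test("Stra\u00DFe"));          // true  (Straße)
console.log(lettersOnly.test("abc?"));                 // false (? is not a letter)
console.log(latinOrGreek.test("abc\u03B1\u03B2\u03B3")); // true  (abcαβγ)
console.log(latinOrGreek.test("\u65E5\u672C"));        // false (Han script)
```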

So something like: /^[^?&\^"]*$/ (that means the string is composed only of characters outside the five you listed)...
But if you want to have the greek characters and the unicode characters (what are unicode characters? àèéìòù? Japanese?) perhaps you'll have to use http://xregexp.com/ It is a regex library for javascript that includes character classes for the various unicode character classes (I know I'm repeating myself) plus other "commands" for unicode handling.

Validating any string with RegEx

I want to validate any string that may contain the characters çÇöÖİşŞüÜğĞ and is at least 5 characters long. The string to validate can contain spaces. For example, the regex must accept "asd Çğ ğT i".
Any reply will be helpful.
Thanks.

You can use escape sequences of the form
\uXXXX
where each "X" can be any hex digit. Thus:
\u0020
is the same as a plain space character, and
\u0041
is upper-case "A". Thus you can encode the Unicode values for the characters you're interested in and then include them in a regex character class. To make sure the string is at least five characters long, you can use a quantifier in the regex.
You'll end up with something like:
var regex = /^[A-Za-z\u00nn\u00nn\u00nn]{5,}$/;
where those "00nn" things would be the appropriate values. As to exactly what those values are, you should be able to find them on a Unicode code chart or reference site. For example, "Ö" is \u00D6. (Some of your characters are in the Unicode Latin-1 Supplement, while others are in Latin Extended-A.)
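Filling in the placeholders for the characters listed in the question gives something like the sketch below. The constant name `TURKISH_TEXT` is made up, and a literal space is included in the class on the assumption that spaces count toward the five-character minimum.

```javascript
// Code points used below:
//   ç \u00E7  Ç \u00C7  ö \u00F6  Ö \u00D6  ü \u00FC  Ü \u00DC   (Latin-1 Supplement)
//   İ \u0130  ş \u015F  Ş \u015E  ğ \u011F  Ğ \u011E             (Latin Extended-A)
const TURKISH_TEXT =
  /^[A-Za-z \u00E7\u00C7\u00F6\u00D6\u0130\u015F\u015E\u00FC\u00DC\u011F\u011E]{5,}$/;

console.log(TURKISH_TEXT.test("asd Çğ ğT i")); // true
console.log(TURKISH_TEXT.test("abcd"));        // false: shorter than five characters
console.log(TURKISH_TEXT.test("hello1"));      // false: digits are not in the class
```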
