Regex what exactly does [ ] do? [duplicate] - javascript

This question already has answers here:
Why does character range class [A-z] match underscore?
(1 answer)
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
Regular expressions that have [] have always confused me a bit. Below are some common patterns for the use of []
/[0-9]/ matches any of the 10 digits
/[A-Z]/ matches any of the 26 uppercase letters
/[a-z]/ matches any of the 26 lowercase letters
But what about
/[A-Za-z0-9]/ matches all digits, uppercase letters, and lowercase letters
Which could also be written as
/[0-z]/ which also matches all digits, uppercase letters, and lowercase letters. But it also matches ^ and _ as well, among other characters.
Why is this?

It's because of the ASCII table.
/[0-z]/ matches every ASCII value from 48 to 122.
/[A-Za-z0-9]/ does not: it only covers 48-57 (digits), 65-90 (uppercase), and 97-122 (lowercase).

The [] in a regular expression denotes a character set. It tells the pattern matcher to match any character that appears inside the brackets. So, for instance,
/[abc]/
will match any one of 'a', 'b', or 'c'.
Inside the brackets, however, the hyphen ('-') has a special meaning: it denotes the entire range of characters between the character just before and just after the hyphen (inclusive). That is, the above regex could have been written:
/[a-c]/
If you want to include a literal hyphen in the list of characters in the set, you need to escape it. That is:
/[a\-c]/
will match any one of 'a', '-', or 'c' (and not 'b'). You can also suppress the special meaning of the hyphen by making it the first or last character in the set, so:
/[-ac]/
will also match any one of 'a', '-', or 'c'.
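To see the difference in practice, here is a small sketch you can paste into a Node REPL. The variable names are just for illustration; the patterns are the ones discussed above, plus a trailing-hyphen variant.

```javascript
// Three ways to write a class that matches 'a', '-', or 'c' (but not 'b').
const escaped = /[a\-c]/;  // hyphen escaped
const leading = /[-ac]/;   // hyphen first
const trailing = /[ac-]/;  // hyphen last
// An unescaped hyphen between two characters is a range: 'a', 'b', or 'c'.
const range = /[a-c]/;

for (const ch of ['a', 'b', 'c', '-']) {
  console.log(
    JSON.stringify(ch),
    escaped.test(ch),
    leading.test(ch),
    trailing.test(ch),
    range.test(ch)
  );
}
```

Only the last column reports true for 'b' and false for '-'.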
This explains why /[A-Za-z0-9]/ is not the same thing as /[0-z]/: the range of characters between '0' and 'z' simply includes additional characters, as you noted in your question. That's all there is to it.
As a technical detail, JavaScript uses the Unicode standard to define what characters fall within a range. If you're sticking with the 7-bit ASCII character set, you'll get the same results using an ASCII chart. But don't use an ASCII chart for character codes above 0x7F. You need to consult the Unicode charts instead.
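If you want to check which extra characters a range drags in, a quick sketch like the following walks the 7-bit ASCII table and compares two classes. The helper name extraChars is made up for this example.

```javascript
// Collect every ASCII character that `wide` matches but `narrow` does not.
function extraChars(wide, narrow) {
  const extras = [];
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (wide.test(ch) && !narrow.test(ch)) extras.push(ch);
  }
  return extras.join('');
}

const extras = extraChars(/[0-z]/, /[A-Za-z0-9]/);
console.log(extras); // :;<=>?@[\]^_`
```

Those 13 characters are the ones sitting between '9' and 'A' (codes 58-64) and between 'Z' and 'a' (codes 91-96).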

Related

Regex - how to ignore order of the matched groups? [duplicate]

This question already has answers here:
Password REGEX with min 6 chars, at least one letter and one number and may contain special characters
(10 answers)
Closed 2 years ago.
I'm trying to create a regex validation for a password which is meant to be:
6+ characters long
Has at least one a-z
Has at least one A-Z
Has at least one 0-9
So, in other words, the match will have :
at least one a-z, A-Z, 0-9
at least 3 any other characters
I've come up with:
((.*){3,}[a-z]{1,}[A-Z]{1,}[0-9]{1,})
it seems pretty simple and logical to me, but 2 things go wrong:
The quantifier {3,} for (.*) somehow doesn't work and breaks the whole regex. At first I had {6,} at the end, but then it would override the quantifiers of the inner groups, so it would require [A-Z]{6,} instead of [A-Z]{1,}.
When I remove {3,} the regex works, but it matches only if the groups appear in order, so it matches aaBB11 but not BBaa11.
This is a use case where I wouldn't use a single regular expression, but multiple simpler ones.
Still, to answer your question: If you only want to validate that the password matches those criteria, you could use lookaheads:
^(?=.{6})(?=.*?[a-z])(?=.*?[A-Z])(?=.*?[0-9])
You're basically looking for a position from which you look at
6 characters (and maybe more to follow, doesn't matter): (?=.{6})
maybe something, then a lowercase letter: (?=.*?[a-z])
maybe something, then an uppercase letter: (?=.*?[A-Z])
maybe something, then a digit: (?=.*?[0-9])
The order of appearance is arbitrary due to the maybe something parts.
(Note that I've interpreted 6 characters long as at least 6 characters long.)
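As a rough sketch of how this lookahead-only pattern behaves in JavaScript (the function name isValidPassword is hypothetical), you would use it with test(), since the pattern only asserts and consumes nothing:

```javascript
// Lookahead-only pattern from the answer above. Every check is anchored at
// position 0 and looks forward, so the order of the character kinds is free.
const passwordRules = /^(?=.{6})(?=.*?[a-z])(?=.*?[A-Z])(?=.*?[0-9])/;

// Hypothetical helper name, used only for this example.
function isValidPassword(candidate) {
  return passwordRules.test(candidate);
}

console.log(isValidPassword('aaBB11'));  // true
console.log(isValidPassword('BBaa11'));  // true: order does not matter
console.log(isValidPassword('aB1'));     // false: shorter than 6 characters
console.log(isValidPassword('aaaaaa1')); // false: no uppercase letter
```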
I believe this is what you want:
^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[!-~]{6,}$
If we follow your spec to the letter, your validation password looks like this:
^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}$
However, we need to improve on this, because apart from the number, lower-case and upper-case letter, are you really willing to accept any character? For instance, can the user use a character in the Thai language? A space character? A tab? Didn't think so. :)
If you want to allow all the printable ASCII characters apart from space, instead of a dot, we can use this character range: [!-~]
How does it work?
The ^ anchor makes sure we start the match at the start of the string
The (?=.*[a-z]) lookahead ensures we have a lower-case character
The (?=.*[A-Z]) lookahead ensures we have an upper-case character
The (?=.*[0-9]) lookahead ensures we have a digit
The [!-~]{6,} matches six or more printable ASCII characters that are not space.
The $ ensures we have reached the end of the string (otherwise, the password could contain more characters that are not allowed).
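To compare the two variants side by side, here is a small sketch. The names strictPassword and loosePassword are mine, and the sample inputs are made up.

```javascript
// [!-~] variant: only printable ASCII without space is accepted.
const strictPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[!-~]{6,}$/;
// Dot variant: any character except a line break is accepted.
const loosePassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}$/;

for (const sample of ['aB3$ef', 'aB3 def', 'aB3d\u00e9f', 'abcdef']) {
  console.log(
    JSON.stringify(sample),
    strictPassword.test(sample),
    loosePassword.test(sample)
  );
}
```

The samples containing a space or an accented letter pass the dot variant but fail the [!-~] variant; 'abcdef' fails both because it has no uppercase letter or digit.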
You could use this pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}

Phone/Fax regular expression in JavaScript [duplicate]

This question already has answers here:
RegEx for Phone number in JavaScript
(3 answers)
Closed 8 years ago.
I am a beginner in regex.
I need a regular expression which satisfies following criteria. I tried lot of things but couldn't make it.
total no. of digits can be 10, 11 or 12
expression can accept characters like -, (, ), space, /, \
expression can start with any digit/character mentioned above
max length of the expression is 16.
All digits and characters can appear in random order in expression
Can anyone please help me?
this pattern seems to work as requested ^(?=(?:\D*\d){10,12}\D*$)[0-9 \-()\\\/]{1,16}$
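Here is a quick sketch of that pattern as a JavaScript regex literal, run against a few made-up inputs. The lookahead counts 10 to 12 digits across the whole string, and the character class limits the allowed characters and the total length.

```javascript
// Pattern from the answer above, written as a regex literal.
const phonePattern = /^(?=(?:\D*\d){10,12}\D*$)[0-9 \-()\\\/]{1,16}$/;

const samples = [
  '(555) 123-4567',     // 10 digits, 14 characters
  '555-1234',           // only 7 digits
  '1234567890123',      // 13 digits
  '+1 555 123 4567',    // '+' is not in the allowed set
  '(555) - 123 - 4567', // 10 digits but 18 characters
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), phonePattern.test(sample));
}
```

Only the first sample is accepted.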
The expression [ 0-9()\-/\\]{10,16} covers the allowed characters and the overall length. Note that on its own it does not check that there are 10 to 12 digits.
[...] is a positive character class definition. A matching character can be one of the characters in the square brackets.
The first character in the square brackets is a space character.
0-9 defines all digits (from character 0 to character 9). \d could also be used for any digit.
( and ) are also valid characters.
The character - has a special meaning in square brackets as you can see on 0-9 and therefore must be escaped in square brackets by the backslash character when it should be interpreted as literal character.
The slash is the next character. Please note that also the slash must be escaped with a backslash when using this regular expression in a JavaScript RegExp object.
The last character within the square brackets is the backslash. Because the backslash is the escape character, it must always be escaped with one more backslash when it should be interpreted as a literal character.
{10,16} means that the preceding character class must match at least 10 but not more than 16 consecutive characters.
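A character class by itself cannot count how many of the matched characters are digits. One possible way to cover the digit-count requirement is to check it separately, as in this sketch. The ^ and $ anchors and the helper looksLikePhone are my additions, not part of the expression above.

```javascript
// Character class from the answer, with ^ and $ added here (an assumption)
// so that the whole input has to consist of allowed characters.
const allowedChars = /^[ 0-9()\-\/\\]{10,16}$/;

// Hypothetical helper: count the digits separately from the class check.
function looksLikePhone(input) {
  const digitCount = input.replace(/\D/g, '').length;
  return allowedChars.test(input) && digitCount >= 10 && digitCount <= 12;
}

console.log(allowedChars.test('('.repeat(11)));   // true: allowed characters only
console.log(looksLikePhone('('.repeat(11)));      // false: no digits at all
console.log(looksLikePhone('(555) 123-4567'));    // true
```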
But you should really search the web for existing expressions that match phone numbers, as this is a very commonly needed expression. There is no need to reinvent the wheel.
I recommend using http://regexlib.com for your RegEx needs. Very good site with tons of RegEx's that you can browse on. I also recommend http://regex101.com for testing regular expressions. Has a very great tool to help you build/modify/test your expressions.

Javascript regex "replace(/[ -_]/g)" deletes numbers?

I was doing some tests with the JavaScript replace function.
Consider the following examples executed on a node REPL.
It's a replace that deletes spaces, hyphens and underscores from a string.
> "call this 9344 5 66 22".replace(/[ _-]/g, '');
'callthis934456622'
That is what I expected: only the spaces are deleted.
However take a look at this:
> "call this 9344 5 66 22".replace(/[ -_]/g, '');
'callthis'
Why, when I write the class in exactly this order, [ -_] (space, hyphen, underscore), does it delete the numbers in the string?
More tests I did:
[ -] (space, hyphen) does not delete numbers
[ _] (space, underscore) does not delete numbers
[ _-] (space, underscore, hyphen) does not delete numbers
[-_ ] (hyphen, underscore, space) does not delete numbers
[_- ] (underscore, hyphen, space) REPL blocks??
[ -_] (space, hyphen, underscore) does delete numbers
[ -_] means characters from space (ASCII 32) to _ (ASCII 95) which includes, among other things, numbers and capital letters.
What you are looking for is [ \-_]. Escaping the - will make it act like the character instead of the meta-character for ranges.
A hyphen that is not at the start or end of a character class needs to be escaped; otherwise it represents a range.
So this regex:
[ -_]
will match anything from space to underscore i.e. ASCII 32-95
The - character has special meaning in character classes. When it appears between two characters, it represents a character range — e.g. [a-z] matches any character with a character code between a and z, inclusive.
However, as you've observed, when it's placed at the beginning or end of the character class, it just represents a literal - character. This can also be accomplished by escaping the - within the character class — i.e. [ \-_].
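A short sketch to confirm this in a Node REPL, using the string from the question. The charCodeAt line shows why the digits fall inside the space-to-underscore range while the lowercase letters do not.

```javascript
const input = 'call this 9344 5 66 22';

// Range from space (32) to underscore (95): digits (48-57) are inside it.
console.log(input.replace(/[ -_]/g, ''));  // 'callthis'

// Escaped hyphen, or hyphen at the end: three literal characters.
console.log(input.replace(/[ \-_]/g, '')); // 'callthis934456622'
console.log(input.replace(/[ _-]/g, ''));  // 'callthis934456622'

// Character codes of ' ', '_', '0', '9', and 'a'.
console.log([' ', '_', '0', '9', 'a'].map((ch) => ch.charCodeAt(0)));
// [ 32, 95, 48, 57, 97 ]
```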
"call this 9344 5 66 22".replace(/(\s|-|_)/g, '');
In a class, the dash - character has special meaning as a range operator ONLY when
it doesn't separate clauses, parsed left to right.
Otherwise it is considered no different than any other literal.
Regular expression parsers have no time to worry about good form.
So you can put the dash anywhere you want as a literal, as long as it separates clauses (i.e. it's not ambiguous).
Most people put it at the end or beginning, or escape it, so no conceptual errors occur.
Example of clauses (the ranges and escapes) separated by literal dashes:
[-a-z-\p{L}-0-9-\x00-\x09-\x20-]
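A smaller version of the same idea, as a sketch for a plain JavaScript regex literal with no flags: the dash between the two ranges below cannot start a new range, so it is taken as a literal. The variable names are only for illustration.

```javascript
// Two ranges (a-z and 0-9) with a literal dash sitting between them.
const mixed = /^[a-z-0-9]+$/;

console.log(mixed.test('abc-123')); // true: the middle dash is a literal
console.log(mixed.test('ABC'));     // false: uppercase is not in the class
console.log(mixed.test('a_b'));     // false: underscore is not in the class

// Same set of characters, but easier to read with the dash at the end.
const clearer = /^[a-z0-9-]+$/;
console.log(clearer.test('abc-123')); // true
```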

Regex which accepts alphanumerics only, except for one hyphen in the middle

I am trying to construct a regular expression which accepts alphanumerics only ([a-zA-Z0-9]), except for a single hyphen (-) in the middle of the string, with a minimum of 9 characters and a maximum of 20 characters.
I have verified the following expression, which accepts a hyphen in the middle.
/^[a-zA-Z0-9]+\-?[a-zA-Z0-9]+$/
How can I set the minimum 9 and maximum 20 characters for the above regex? I have already used quantifiers + and ? in the above expression.
How would I apply {9,20} to the above expression? Are there any other suggestions for the expression?
/^[a-zA-Z0-9]+\-?[a-zA-Z0-9]+$/
can be simplified to
/^[a-z0-9]+(?:-[a-z0-9]+)?$/i
since if there is no dash then you don't need to look for more letters after it, and you can use the i flag to match case-insensitively and avoid having to reiterate both lower-case and upper-case letters.
Then split your problem into two cases:
9-20 alpha numerics
10-21 characters, all of which are alpha numerics except one dash
You can check the second using a positive lookahead like
/^(?=.{10,21}$)/i
to check the number of characters without consuming them.
Combining these together gives you
/^(?:[a-z0-9]{9,20}|(?=.{10,21}$)[a-z0-9]+-[a-z0-9]+)$/i
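A quick sketch of how the combined expression behaves on a few made-up inputs (the variable name codePattern is mine). Note that, as written, it allows 9-20 characters without a dash and 10-21 with one.

```javascript
// Combined expression from the answer above.
const codePattern = /^(?:[a-z0-9]{9,20}|(?=.{10,21}$)[a-z0-9]+-[a-z0-9]+)$/i;

const samples = [
  'abcdefghi',    // 9 alphanumerics, no dash
  'abcd-efgh1',   // 10 characters, one dash in the middle
  'ABC-DEF',      // too short
  '-abcdefghi',   // dash at the start
  'ab-cd-efghij', // two dashes
];

for (const sample of samples) {
  console.log(sample, codePattern.test(sample));
}
```

The first two samples match; the other three do not.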
You can use this, provided the - does not have to be exactly in the middle:
/^(?=[^-]+-?[^-]+$)[a-zA-Z\d-]{9,20}$/
[^-] matches any character that is not -

Regex to match card code input

How can I write a regex to match strings following these rules?
1 letter followed by 4 letters or numbers, then
5 letters or numbers, then
3 letters or numbers followed by a number and one of the following signs: ! & # ?
I need to allow input as a 15-character string or as 3 groups of 5 chars separated by one space.
I'm implementing this in JavaScript.
I'm not going to write out the whole regex for you since this is homework, but here are some hints which should help you out:
Use character classes. [A-Z] matches all uppercase. [a-z] matches all lowercase. [0-9] matches numbers. You can combine them like so [A-Za-z0-9].
Use quantifiers like {n} so [A-Z]{3} gives you 3 uppercase letters.
You can put other characters in character classes. Let's say you wanted to match % or #, you could do [%#] which would match either of those characters.
Some meta-characters (characters which have special meaning in the context of regular expressions) will need to be escaped like so: \$ (since $ matches the end of a line)
^ and $ match the beginning and end of the line respectively.
\s matches white-space, but if you sanitize your input, you shouldn't need to use this.
Flags after the regex do special things. For example in /[a-z]/i, the i ignores case.
This should be it:
/^[a-z][a-z0-9]{4} ?[a-z0-9]{5} ?[a-z0-9]{3}[0-9][!&#?]$/i
Feel free to change 0-9 and [0-9] with \d if you see fit.
The regex is simple and readable enough. ^ and $ make sure this is a whole match, so there aren't extra characters before or after the code, and the /i flag allows upper or lower case letters.
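For a quick check in the browser console or Node, a sketch like this runs the regex against a few invented codes (the name cardCode is just for the example):

```javascript
// Regex from the answer above.
const cardCode = /^[a-z][a-z0-9]{4} ?[a-z0-9]{5} ?[a-z0-9]{3}[0-9][!&#?]$/i;

const samples = [
  'a1234b5678cde9!',   // 15 characters, no spaces
  'A1234 B5678 CDE9?', // 3 groups of 5 separated by one space
  '11234b5678cde9!',   // starts with a digit
  'a1234b5678cdeX!',   // no digit before the sign
  'a1234b5678cde9%',   // sign not in the allowed set
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), cardCode.test(sample));
}
```

The first two samples match. Because each space is optional on its own, an input with only one of the two spaces, such as 'a1234 b5678cde9!', is also accepted.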
I would start with a tutorial.
Pay attention to the quantifiers (like {N}) and character classes (like [a-zA-Z])
^[a-zA-Z][a-zA-Z0-9]{4} ?[a-zA-Z0-9]{5} ?[a-zA-Z0-9]{3}[0-9][!&#?]$
