Why is 'B' matched by [a-z]? - javascript

A very simple and naive question: why is this true?
new RegExp('^[a-z]+$', 'i').test('B')
Surely 'B' is outside [a-z]?

Yes, but you have the i parameter which tells the regex to ignore case.
From the MDN documentation for RegExp:
Parameters
pattern
The text of the regular expression.
flags
If specified, flags can have any combination of the following values:
...
i
ignore case

It's defining a class, which is to say [a-z] is symbolic of "any character, from a to z."
Regex is, by nature, case SensAtiVe as well, so [a-z] differs from [A-Z] (unless you use the i (case-insensitive) flag, as you've demonstrated).
e.g.
/[a-z]/ -- Any single lowercase letter, a through z
/[A-Z]/ -- Any single uppercase letter, A through Z
/[a-zA-Z]/ -- Any single upper or lowercase letter, a through z
/[a-z]/i or /[A-Z]/i -- (note the i) Any upper or lowercase letter, a through z

Summary
The [a-z] means a character set containing characters a-z.
The ^ is a start anchor: the match must begin at the first character of the input.
The + means the character set must match one or more times.
The $ is an end anchor: the match must end at the last character of the input.
The i means to ignore case on your input letters.

It means any character between a and z.
As you specified the i flag (case insensitive), it also matches B.
The whole regexp checks that the string contains at least one character and that all characters are in a-z or A-Z.
You can check that new RegExp('^[a-z]+$', 'i').test('B') returns true.
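A quick way to see the effect of the i flag is to run the same pattern with and without it. This is only a minimal sketch; the variable names are illustrative and not from the original question.

```javascript
// Same pattern, with and without the case-insensitive flag.
const caseSensitive = new RegExp('^[a-z]+$');
const caseInsensitive = new RegExp('^[a-z]+$', 'i');

console.log(caseSensitive.test('B'));    // false: 'B' is outside a-z
console.log(caseInsensitive.test('B'));  // true: the i flag ignores case
console.log(caseInsensitive.test('B1')); // false: '1' is not a letter
console.log(caseInsensitive.test(''));   // false: + needs at least one character
```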

Related

Finding all words ending in "ion" with regex in JavaScript [duplicate]

I need help putting together a regex that will match words ending with "Id", with a case-sensitive match.
Try this regular expression:
\w*Id\b
\w* allows word characters in front of Id and the \b ensures that Id is at the end of the word (\b is word boundary assertion).
Gumbo gets my vote, however, the OP doesn't specify whether just "Id" is an allowable word, which means I'd make a minor modification:
\w+Id\b
1 or more word characters followed by "Id" and a word boundary. The [a-zA-Z] variants don't take into account non-English alphabetic characters. I might also use \s instead of \b to require whitespace rather than a word boundary. It would depend on whether you need to wrap over multiple lines.
This may do the trick:
\b\p{L}*Id\b
Where \p{L} matches any (Unicode) letter and \b matches a word boundary.
How about \A[a-z]*Id\z? [This makes characters before Id optional. Use \A[a-z]+Id\z if there needs to be one or more characters preceding Id.]
I would use
\b[A-Za-z]*Id\b
The \b matches a word boundary: the position between a word character (letter, digit or underscore) and a non-word character such as a space, tab or newline, or the beginning or end of the string.
The [A-Za-z] will match any letter, and the * means that 0+ get matched. Finally there is the Id.
Note that this will match words that have capital letters in the middle such as 'teStId'.
I use http://www.regular-expressions.info/ for regex reference
Regex ids = new Regex(@"\w*Id\b", RegexOptions.None);
\b means "word break" and \w means any word character. So \w*Id\b means "{stuff}Id". By not including RegexOptions.IgnoreCase, it will be case sensitive.
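Since the page is about JavaScript, here is a rough sketch of how the \w*Id\b and \w+Id\b variants differ there. The sample text is made up for illustration.

```javascript
// Hypothetical sample text, used only for illustration.
const text = 'userId Id orderid teStId Idle';

// \w* allows zero word characters before "Id", so a bare "Id" matches too.
const zeroOrMore = text.match(/\w*Id\b/g);
console.log(zeroOrMore); // [ 'userId', 'Id', 'teStId' ]

// \w+ requires at least one word character before "Id".
const oneOrMore = text.match(/\w+Id\b/g);
console.log(oneOrMore); // [ 'userId', 'teStId' ]
```

Neither pattern matches "orderid" (lowercase i, and there is no i flag) or "Idle" (no word boundary right after "Id").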

Removing Special Character in JavaScript

string = string.replace(/[^a-zA-Z0-9]/g, '');
Can anyone explain this snippet (/[^a-zA-Z0-9]/g, '')?
What does it mean?
I know what it does, but how does it work?
The snippet
string = string.replace(/[^a-zA-Z0-9]/g, '');
uses a RegEx (regular expression) to find a pattern and then replace it.
This is how it works:
The .replace function replaces a specific character or pattern in a string (first argument) with another string (second argument).
The /[^a-zA-Z0-9]/g is the regex. a-z means lower case characters from a to z, A-Z means upper case characters from A to Z, and 0-9 means digits from 0 to 9. The ^ at the start of the brackets means "not", so the set matches anything that is not a lower case letter, an upper case letter or a digit. The g at the end stands for global, meaning it will find not just the first match for the pattern but all the matches.
The empty string is what to replace the pattern with. So any character that is not an upper/lower case letter or a digit will be replaced with nothing.
Read about RegEx here
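To make the role of the g flag concrete, here is a small sketch that runs the same replacement with and without it. The input string is made up for illustration.

```javascript
// Hypothetical input string, used only for illustration.
const input = 'Hello, World! #2024';

// With g: every character outside a-z, A-Z and 0-9 is removed.
const cleaned = input.replace(/[^a-zA-Z0-9]/g, '');
console.log(cleaned); // "HelloWorld2024"

// Without g: only the first such character (the comma) is removed.
const firstOnly = input.replace(/[^a-zA-Z0-9]/, '');
console.log(firstOnly); // "Hello World! #2024"
```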

JavaScript Regex start of string clarification + str.replace()

got a question about the start of string regex anchor tag ^.
I was trying to sanitize a string to check if it's a palindrome and found a solution to use regex but couldn't wrap my head around the explanations I found for the start of string anchor tag:
To my understanding:
^ denotes that whatever expression that follows must match, starting from the beginning of the string.
Question:
Why then is there a difference between the two output below:
1)
let x = 'A man, a plan, a canal: Panama';
const re = new RegExp(/[^a-z]/, 'gi');
console.log(x.replace(re, '*'));
Output: A*man**a*plan**a*canal**Panama
VS.
2)
let x = 'A man, a plan, a canal: Panama';
const re = new RegExp(/[a-z]/, 'gi');
console.log(x.replace(re, '*'));
Output: * ***, * ****, * *****: ******
VS.
3)
let x = 'A man, a plan, a canal: Panama';
const re = new RegExp(/^[a-z]/, 'gi');
console.log(x.replace(re, '*'));
Output: * man, a plan, a canal: Panama
Please let me know if my explanation for each of the case above is off:
1) Confused about this one. If it matches a character class of [a-z] case insensitive + global find, with start of string anchor ^ denoting that it must match at the start of each string, should it not return all the words in the sentence? Since each word is a match of [a-z] insensitive characters that occurs at the start of each string per global find iteration?
(i.e.
finds "A" at the start
then on the next iteration, it should start search on the remaining string " man"
finds a space...and moves on to search "man"?
and so on and so forth...
Q: Why, then, does replace only target the non-alpha characters? Should I in this case be treating ^ as inverting [a-z]?
2) This seems pretty straightforward: it finds all occurrences of [a-z] and replaces them with the star. Inverse case of 1)?
3) Also confused about this one. I'm not sure how this is different from 1).
/^[a-z]/gi to me means: "starting at the start of the string being looked at, match all alpha characters, case insensitive. Repeat for global find".
Compared to:
1) /[^a-z]/gi to me means: "match all character class that starts each line with alpha character. case insensitive, repeat search for global find."
To me they mean exactly the same #_#. Please let me know how my understanding is off for the above cases.
Your first expression [^a-z] matches anything other than an alphabetic, lower case letter, which is why, when you replace with *, all the special characters such as whitespace, commas and colons are replaced.
Your second expression [a-z] matches any alphabetic, lower case letter, therefore the letters are replaced by * and the special characters mentioned are left alone.
Your third expression ^[a-z] matches an alphabetic, lower case letter at the start of the string, therefore only the first letter is replaced by *.
For the first two expressions, the global flag g ensures that all characters that match the specified pattern, regardless of their position in the string, are replaced. For the third pattern however, since ^ anchors the pattern at the beginning of the string, only the first letter is replaced.
As you mentioned, the i flag ensures case insensitivity, so that all three patterns operate on both lower and upper case alphabetic letters, from a to z and A to Z.
The character ^ therefore has two meanings:
It negates characters in a character set.
It asserts position at the start of string.
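The two roles can be put side by side in a short sketch. The shortened sample string and the last line, which combines both uses of ^ in one pattern, are additions for illustration only.

```javascript
const x = 'A man, a plan';

// Inside brackets, ^ negates the set: every non-letter is replaced.
const negated = x.replace(/[^a-z]/gi, '*');
console.log(negated); // "A*man**a*plan"

// Outside brackets, ^ anchors to the start: only the first letter can match.
const anchored = x.replace(/^[a-z]/gi, '*');
console.log(anchored); // "* man, a plan"

// Both at once: remove a non-letter, but only at the very start of the string.
const both = '!abc!'.replace(/^[^a-z]/i, '');
console.log(both); // "abc!"
```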
^ denotes that whatever expression that follows must match, starting from the beginning of the string.
That's only when it's the first thing in the regex; it has other purposes when used elsewhere:
/[^a-z]/gi
In the above regex, the ^ does not indicate anchoring the match to the beginning of a string; it inverts the rest of the contents of the [] -- so the above regex will match any single character except a-z. Since you're using the g flag it will repeat that match for all characters in the string.
/[a-z]/gi
The above is not inverted, so will match a single instance of any character from a-z (and again because of the g flag will repeat to match all of those instances.)
/^[a-z]/gi
In this last example, the caret anchors the match to the beginning of the string; the bracketed portion will match any single a-z character. The g flag is still in use, so the regex would try to continue matching more characters later in the string -- but none of them except the first one will meet the anchored-to-start requirement, so this will end up matching only the first character (if it's within a-z), exactly as if the g flag was not in use.
(Inside a [] group, a ^ anywhere other than the first position is treated as a literal ^. Outside of brackets, an unescaped ^ is always a start anchor, wherever it appears in the regex.)
If you're trying to detect palindromes, you'll want to remove everything except letter characters (and will probably want to convert everything to the same letter case, instead of having to detect that "P" == "p":)
const isPalindrome = function(input) {
  let str = input.toLowerCase().replace(/[^a-z]/g, '');
  return str === str.split('').reverse().join('');
}
console.log(isPalindrome("Able was I, ere I saw Elba!"))
console.log(isPalindrome("No, it never propagates if I set a ”gap“ or prevention."))
console.log(isPalindrome("Are we not pure? “No, sir!” Panama’s moody Noriega brags. “It is garbage!” Irony dooms a man –– a prisoner up to new era."))
console.log(isPalindrome("Taco dog is not a palindrome."))

regular expression incorrectly matching % and $

I have a regular expression in JavaScript to allow numeric and (,.+() -) character in phone field
my regex is [0-9-,.+() ]
It works for numeric as well as above six characters but it also allows characters like % and $ which are not in above list.
Even though you don't have to, I always make it a point to escape metacharacters (easier to read and less pain):
[0-9\-,\.+\(\) ]
But this won't work like you expect it to because it will only match one valid character while allowing other invalid ones in the string. I imagine you want to match the entire string with at least one valid character:
^[0-9\-,\.\+\(\) ]+$
Your original regex is not actually matching %. What it is doing is matching valid characters, but the problem is that it only matches one of them. So if you had the string 435%, it matches the 4, and so the regex reports that it has a match.
If you try to match it against just one invalid character, it won't match. So your original regex doesn't match the string %:
> /[0-9\-,\.\+\(\) ]/.test("%")
false
> /[0-9\-,\.\+\(\) ]/.test("44%5")
true
> "444%6".match(/[0-9\-,\.+\(\) ]/)
["4"] //notice that the 4 was matched.
Going back to the point about escaping, I find that it is easier to escape it rather than worrying about the different rules where specific metacharacters are valid in a character class. For example, - is only valid in the following cases:
When used in an actual character class with proper-order such as [a-z] (but not [z-a])
When used as the first or last character, or by itself, so [-a], [a-], or [-].
When used after a range like [0-9-,] or [a-d-j] (but keep in mind that [9-,] is invalid and [a-d-j] does not match the letters e through i).
For these reasons, I escape metacharacters to make it clear that I want to match the actual character itself and to remove ambiguities.
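The hyphen rules listed above can be checked directly in JavaScript. This is only a sketch with made-up test strings; new RegExp is used for the out-of-order range so that the error can be caught at run time.

```javascript
// A hyphen first or last in the class is a literal hyphen.
console.log(/^[-a]+$/.test('a-a')); // true
console.log(/^[a-]+$/.test('a-a')); // true

// After a complete range the hyphen is literal as well:
// [a-d-j] is the range a-d, a literal "-", and the letter j.
console.log(/^[a-d-j]+$/.test('b-j')); // true
console.log(/[a-d-j]/.test('f'));      // false: e through i are not included

// An out-of-order range is a syntax error.
let rangeError = null;
try {
  new RegExp('[z-a]');
} catch (e) {
  rangeError = e;
}
console.log(rangeError instanceof SyntaxError); // true

// Escaping the hyphen avoids the position rules altogether.
console.log(/^[a\-z]+$/.test('a-z')); // true: only "a", "-" and "z" are allowed
console.log(/[a\-z]/.test('m'));      // false: no range is formed
```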
You just need to anchor your regex:
^[0-9-,.+() ]+$
In a character class, special characters don't need to be escaped, except ] and -.
But even these characters can be left unescaped when:
] is alone in the char class []] (in PCRE-style engines; JavaScript reads [] as an empty class, so write [\]] there)
- is at the beginning [-abc] or at the end [abc-] of the char class, or right after the end of a range [a-c-x]
Escape characters with special meaning in your RegExp. If you're not sure and it isn't an alphabet character, it usually doesn't hurt to escape it, too.
If the whole string must match, include the start ^ and end $ of the string in your RegExp, too.
/^[\d\-,\.\+\(\) ]*$/
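As a rough illustration of why the anchors matter, the sketch below compares the unanchored class with an anchored check. isValidPhone is a hypothetical helper name, and the sample inputs are made up.

```javascript
// isValidPhone is a hypothetical helper, not part of the original question.
function isValidPhone(value) {
  // + (rather than *) rejects the empty string; switch to * to allow it.
  return /^[0-9\-,.+() ]+$/.test(value);
}

// Unanchored: one valid character anywhere is enough for a match.
console.log(/[0-9\-,.+() ]/.test('435%'));      // true: the "4" alone matches

// Anchored: every character must come from the class.
console.log(isValidPhone('435%'));              // false
console.log(isValidPhone('$100'));              // false
console.log(isValidPhone('+1 (555) 010-9999')); // true
console.log(isValidPhone(''));                  // false
```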

Regex to match card code input

How can I write a regex to match strings following these rules?
1 letter followed by 4 letters or numbers, then
5 letters or numbers, then
3 letters or numbers followed by a number and one of the following signs: ! & # ?
I need to allow input as a 15-character string or as 3 groups of 5 chars separated by one space.
I'm implementing this in JavaScript.
I'm not going to write out the whole regex for you since this is homework, but here are some hints which should help you out:
Use character classes. [A-Z] matches all uppercase. [a-z] matches all lowercase. [0-9] matches numbers. You can combine them like so [A-Za-z0-9].
Use quantifiers like {n} so [A-Z]{3} gives you 3 uppercase letters.
You can put other characters in character classes. Let's say you wanted to match % or # or @, you could do [%#@] which would match any of those characters.
Some meta-characters (characters which have special meaning in the context of regular expressions) will need to be escaped like so: \$ (since $ matches the end of a line)
^ and $ match the beginning and end of the line respectively.
\s matches white-space, but if you sanitize your input, you shouldn't need to use this.
Flags after the regex do special things. For example in /[a-z]/i, the i ignores case.
This should be it:
/^[a-z][a-z0-9]{4} ?[a-z0-9]{5} ?[a-z0-9]{3}[0-9][!&#?]$/i
Feel free to change 0-9 and [0-9] with \d if you see fit.
The regex is simple and readable enough. ^ and $ make sure this is a whole match, so there aren't extra characters before or after the code, and the /i flag allows upper or lower case letters.
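A few sample inputs show how the pattern above behaves. The codes are invented for illustration, and cardCode is just a name chosen for this sketch.

```javascript
// The pattern from the answer above; cardCode is an illustrative name.
const cardCode = /^[a-z][a-z0-9]{4} ?[a-z0-9]{5} ?[a-z0-9]{3}[0-9][!&#?]$/i;

console.log(cardCode.test('A1b2Cd3E4fG5h6!'));   // true: 15 characters, no spaces
console.log(cardCode.test('A1b2C d3E4f G5h6!')); // true: three groups of five
console.log(cardCode.test('11b2Cd3E4fG5h6!'));   // false: must start with a letter
console.log(cardCode.test('A1b2Cd3E4fG5hX!'));   // false: 14th character must be a digit
console.log(cardCode.test('A1b2Cd3E4fG5h6%'));   // false: % is not an allowed sign

// Each space is optional on its own, so mixed spacing is accepted too.
console.log(cardCode.test('A1b2C d3E4fG5h6!'));  // true
```

If mixed spacing should be rejected, one option is to alternate between a fully spaced and a fully unspaced version of the pattern.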
I would start with a tutorial.
Pay attention to the quantifiers (like {N}) and character classes (like [a-zA-Z])
^[a-zA-Z][a-zA-Z0-9]{4} ?[a-zA-Z0-9]{5} ?[a-zA-Z0-9]{3}[0-9][\!\&\#\?]$
