JavaScript replace regex :(

I want to replace invalid characters when entering a phone number.
The rules I want are below:
The first character can be "+"
The remaining characters have to be digits 0-9
This is what I have already:
phoneNumber.getValue().replace(/[^0-9,+]+/g, "");
This works, kind of, but not fully:
I can still have a "+" anywhere in the string.
I want to remove it if it is not the first character. Does anyone know how to do this?
Thanks
AJ

Assuming that you don't want to include commas (i.e. the result should be only decimal digits preceded by an optional +), this will do it:
phoneNumber.getValue().replace(/(^\+)|\D+/g, '$1');
Notes:
Makes use of the \D non-decimal digit character class shorthand. (i.e. \D is the same as: [^0-9].)
Makes use of the fact that when a capturing group does not participate in the match, it can still be referenced in the replacement string (it is replaced with the empty string).
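To see the replacement in isolation, here is a minimal sketch. It assumes phoneNumber.getValue() simply returns a string; sanitizePhone is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: keeps a leading "+" and strips every other non-digit.
function sanitizePhone(value) {
  // (^\+) captures a "+" only at position 0; \D+ matches runs of non-digits.
  // When \D+ is the alternative that matched, group 1 did not participate,
  // so "$1" is replaced with the empty string.
  return value.replace(/(^\+)|\D+/g, '$1');
}

console.log(sanitizePhone('+1234abc+5678')); // +12345678
console.log(sanitizePhone('12+34'));         // 1234
```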

Make a slightly smarter regex.
phoneNumber.getValue().replace(/.*(?=\+)|[^0-9]/g,'')
This regex sort of abuses the fact that the alternatives are tried from left to right. Because the + has sort of been "covered" by the lookahead in the first alternative, it is effectively immune to being deleted by the [^0-9] part.
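Here is a quick way to see what this lookahead variant actually does; keepFromLastPlus is a hypothetical name. Because .*(?=\+) is greedy, it deletes everything up to the last "+" in the input, so any digits that appear before a later "+" are lost.

```javascript
// Lookahead-based variant from the answer above.
// .*(?=\+) deletes everything before the last "+" (without consuming it);
// [^0-9] deletes every other non-digit.
const keepFromLastPlus = (value) => value.replace(/.*(?=\+)|[^0-9]/g, '');

console.log(keepFromLastPlus('+12a3'));         // +123
console.log(keepFromLastPlus('+1234abc+5678')); // +5678
console.log(keepFromLastPlus('12+34'));         // +34
```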

Another one would be
phoneNumber.getValue().replace(/[^\+0-9]/g, '')
When using "+1234abc+5678" as the value, this results in "+1234+5678",
whereas the regex of Niet the Dark Absol results in "+5678".
You would probably have to combine this with substring to remove the second plus.
That would be pretty easy.
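One way to do the substring combination, as a sketch; sanitizeWithSubstring is a hypothetical name. It keeps the first character of the cleaned string as-is and removes every "+" from the rest.

```javascript
// Sketch: keep only "+" and digits, then use substring to drop any "+"
// that is not the first character of the cleaned string.
function sanitizeWithSubstring(value) {
  const kept = value.replace(/[^+0-9]/g, ''); // e.g. "+1234+5678"
  return kept.charAt(0) + kept.substring(1).replace(/\+/g, '');
}

console.log(sanitizeWithSubstring('+1234abc+5678')); // +12345678
console.log(sanitizeWithSubstring('12+34'));         // 1234
```

One difference from the single-regex answer: here a "+" that only becomes the first character after cleaning is kept, so 'a+12' gives '+12', whereas /(^\+)|\D+/g would give '12'.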

Related

How to use regex ?: operator and get the right group in my case? [duplicate]

This is an example string:
123456#p654321
Currently, I am using this match to capture 123456 and 654321 in to two different groups:
([0-9].*)#p([0-9].*)
But on occasions, the #p654321 part of the string will not be there, so I will only want to capture the first group. I tried to make the second group "optional" by appending ? to it, which works, but only as long as there is a #p at the end of the remaining string.
What would be the best way to solve this problem?
You have the #p outside of the capturing group, which makes it a required part of the match. You are also using the dot character (.) improperly. Dot (in most regex variants) will match any character (except, by default, a line break). Change it to:
([0-9]*)(?:#p([0-9]*))?
The (?:) syntax is how you get a non-capturing group. We then capture just the digits that you're interested in. Finally, we make the whole thing optional.
Also, most reg-ex variants have a \d character class for digits. So you could simplify even further:
(\d*)(?:#p(\d*))?
As another person has pointed out, the * operator could potentially match zero digits. To prevent this, use the + operator instead:
(\d+)(?:#p(\d+))?
A pattern like ([0-9]*)(?:#p([0-9]*))? can actually match no digits at all, because it uses * instead of +.
This is what (I think) you want:
(\d+)(?:#p(\d+))?
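Checked in JavaScript, the optional non-capturing group behaves as described. The ^ and $ anchors are an addition here, assuming the whole input is the string to validate; when the #p part is missing, the second group is undefined.

```javascript
// (?:#p(\d+))? is a non-capturing group made optional as a whole,
// so the match succeeds with or without the "#p..." suffix.
const pattern = /^(\d+)(?:#p(\d+))?$/;

const full = pattern.exec('123456#p654321');
console.log(full[1], full[2]); // 123456 654321

const partial = pattern.exec('123456');
console.log(partial[1], partial[2]); // 123456 undefined
```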

Regex for digits and hyphen only

I am trying to understand regex, for digits of length 10 I can simply do
/^[0-9]{10}$/
for hyphen only I can do
/^[-]$/
combining the two using group expression will result in
/^([0-9]{10})|([-])$/
This expression does not work as intended: instead of failing to match an invalid string, it somehow matches part of it.
How do I make the regex expression that accepts only "-" or 10 digits?
It would have worked fine to combine your two regexps exactly as you had them. In other words, just use the alternation/pipe operator to combine
/^[0-9]{10}$/
and
/^[-]$/
as is, directly into
/^[0-9]{10}$|^[-]$/
which is just your original regexps, combined as is with |,
and that would have worked fine. As others have pointed out, you don't need to specify the hyphen in a character class, so
/^[0-9]{10}$|^-$/
That simplifies [-] to just -.
Now, we notice that each of the two alternatives has a ^ at the beginning and a $ at the end. That is a bit duplicative, and it also makes it a little harder to see immediately that the regexp is always matching things from beginning to end. Therefore, as explained in other answers, we can rewrite this by taking the ^ and $ out of both sub-regexps and combining their contents using the grouping operator ():
/^([0-9]{10}|-)$/
This groups the regexp contents with parentheses and keeps the anchors outside.
That would also work fine, but you could use \d instead of [0-9], so the final, simplest version is:
/^(\d{10}|-)$/
(This uses \d for digits.)
If for some reason you don't want to "capture" the group, use (?:, as in
/^(?:\d{10}|-)$/
This way the group is not captured.
By the way, in your original attempt to combine the two regexps, I noticed that you parenthesized them as in
/^([0-9]{10})|([-])$/
But parenthesizing the sub-regexps is not necessary, because the pipe (alternation, or "or") operator already has low precedence (actually the lowest precedence of any regexp operator); "low precedence" means it applies only after the things on both sides have been processed, so what you wrote here is identical to
/^[0-9]{10}|[-]$/
which, however, still won't work, for the reasons mentioned in other answers: the ^ applies only to the left alternative and the $ only to the right one.
How do I make the regex expression that accepts only "-" or 10 digits?
You can use:
/^([0-9]{10}|-)$/
Because of the misplaced parentheses, your regex only checks for 10 digits at the start or a hyphen at the end, not for the whole string.
Here is the effective breakdown of OP's regex:
^([0-9]{10}) # matches 10 digits at start
| # OR
([-])$ # matches hyphen at end
This causes OP's regex to match any input that starts with 10 digits or ends with a hyphen, so these invalid inputs are also matched:
1234567890111
1234----
------------------
1234567890--------
To get a regex that accepts only "-" or 10 digits, change your regexp as shown below:
^(\d{10}|-)$
The problem with your regex is that it's looking for strings either
starting with 10 digits, i.e. ^([0-9]{10}), or
ending with "-", i.e. ([-])$
You need an additional wrapping ^( .. )$ to get this to work, i.e.
/^(([0-9]{10})|([-]))$/
Better yet, /^([0-9]{10}|-)$/, since [-] and - mean the same thing.
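Here is a small comparison in JavaScript of the original attempt and the fixed pattern. It uses the invalid inputs listed in one of the answers above plus two valid ones.

```javascript
// Original attempt: ^ binds only to the left alternative, $ only to the right.
const original = /^([0-9]{10})|([-])$/;
// Fixed: both alternatives sit inside one group between ^ and $.
const fixed = /^(\d{10}|-)$/;

const inputs = ['1234567890', '-', '1234567890111', '1234----', '1234567890--------'];
for (const s of inputs) {
  console.log(s, original.test(s), fixed.test(s));
}
```

The original pattern accepts all five inputs; the fixed one accepts only the first two.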

Javascript regex: how to not capture an optional string on the right side

For example, /(www\.)?(.+)(\.com)?/.exec("www.something.com") will result in 'something.com' in the second capturing group (index 2 of the resulting array; index 1 holds 'www.'). But what if we want to capture only 'something' in a capturing group?
Clarifications:
The above string is just an example; we don't want to assume anything about the suffix string (.com above). It could just as well be orange.
This part alone could be solved in C# by matching from right to left (I don't know of a way of doing that in JS, though), but then www. would end up included!
Sure, this problem as such is easily solvable mixing regex with other string methods like replace / substring. But is there a solution with only regex?
(?:www\.)?(.+?)(?:\.com|$)
This captures only "something"; just make the other groups non-capturing. See the demo:
https://regex101.com/r/rO0yD8/4
Just removing the last character (?) from the regex does the trick:
https://regex101.com/r/uR0iD2/1
The final ? lets the match succeed without (\.com) matching anything, so the greedy (.+) can take all the characters after www.; once the ? is removed, (\.com) has to match, so (.+) must give back ".com" (at the cost of making the .com suffix required).
Another option is to replace the greedy quantifier +, which always tries to match as many characters as possible, with +?, which tries to match as few characters as possible:
(www\.)?(.+?)(\.com)?$
https://regex101.com/r/oY7fE0/2
Note that it is necessary to force a match with the entire string through the end-of-input anchor ($); without it, the lazy (.+?) would stop after a single character.
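If you only want to capture "something", use non-capturing groups for the other sections. Because a greedy (.+) followed by an optional group would still capture "something.com", combine this with the lazy quantifier and the $ anchor:
/(?:www\.)?(.+?)(?:\.com)?$/.exec("www.something.com")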
The ?: denotes the groups as non-capturing.
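The difference between the greedy and the lazy versions is easy to check with exec in JavaScript. This sketch only uses the patterns discussed above, applied to the question's example string.

```javascript
const input = 'www.something.com';

// The question's pattern: index 1 is "www.", index 2 is "something.com".
const op = /(www\.)?(.+)(\.com)?/.exec(input);
console.log(op[1], op[2]); // www. something.com

// Greedy (.+) with an optional suffix: the suffix group simply matches nothing.
const greedy = /(?:www\.)?(.+)(?:\.com)?/.exec(input);
console.log(greedy[1]); // something.com

// Lazy (.+?) plus the $ anchor: the group grows only until the rest can match.
const lazy = /(?:www\.)?(.+?)(?:\.com)?$/.exec(input);
console.log(lazy[1]); // something
```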

RegEx in JS to find No 3 Identical consecutive characters

How do I detect 3 identical consecutive characters in JS using a regex, so that 'abb' is valid while 'abbb' is not? The characters could be letters, digits, or non-alphanumerics.
This question is a variation of the question I asked here: How to combine these regex for javascript.
This is wrong: /(^([0-9a-zA-Z]|[^0-9a-zA-Z]))\1\1/, so what is the right way to do it?
This depends on what you actually mean. If you only want to match three non-identical characters (that is, if abb is valid for you), you can use this negative lookahead:
(?!(.)\1\1).{3}
It first asserts that the current position is not followed by three times the same character. Then it matches those three characters.
If you really want to match 3 different characters (only stuff like abc), it gets a bit more complicated. Use these two negative lookaheads instead:
(.)(?!\1)(.)(?!\1|\2).
First, match one character. Then assert that it is not followed by the same character. If so, match another character. Then assert that these are followed by neither the first nor the second character. Then match a third character.
Note that those negative lookaheads ((?!...)) do not consume any characters. That is why they are called lookaheads. They just check what is coming next (or in this case what is not coming next), and then the regex continues from where it left off.
Note also that . matches anything but line breaks, or really anything if you use the DOTALL or SINGLELINE option. In JavaScript you can activate this option (called dotAll there) by appending s after the regex's closing delimiter. If (for some reason) you don't want to use this option, replace each . with [\s\S] (which always matches any character).
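Both lookahead patterns can be tried directly in JavaScript. Here is a short sketch; notAllSame and allDifferent are just illustrative variable names.

```javascript
// Three characters, not all identical: the lookahead rejects any position
// where the next three characters are the same.
const notAllSame = /(?!(.)\1\1).{3}/;
console.log(notAllSame.exec('aaab')[0]); // aab
console.log(notAllSame.test('aaa'));     // false

// Three pairwise different characters.
const allDifferent = /(.)(?!\1)(.)(?!\1|\2)./;
console.log(allDifferent.test('abc')); // true
console.log(allDifferent.test('abb')); // false
```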
Update:
After clarification in the comments, I realised that you do not want to find three non-identical characters, but instead you want to assert that your string does not contain three identical (and consecutive) characters.
This is a bit easier, and closer to your former question, since it only requires one negative lookahead. What we do is this: we search the string from the beginning for three consecutive identical characters. But since we want to assert that these do not exist we wrap this in a negative lookahead:
^(?!.*(.)\1\1)
The lookahead is anchored to the beginning of the string, so this is the only place where we will look. The pattern in the lookahead then tries to find three identical characters from any position in the string (because of the .*; the identical characters are matched in the same way as in your previous question). If the pattern finds them, the negative lookahead fails, and so the string is invalid. If no three identical characters can be found, the inner pattern never matches, so the negative lookahead succeeds.
To find three characters that are not all identical, use the regex pattern
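Here is the validation pattern used with test() in JavaScript, as a sketch. Without the s flag the . does not cross line breaks, so a triple on a later line goes unnoticed; adding s after the closing delimiter (supported since ES2018) fixes that for multi-line input.

```javascript
// Valid only if no character appears three times in a row.
const noTriple = /^(?!.*(.)\1\1)/;

console.log(noTriple.test('abb'));  // true
console.log(noTriple.test('abbb')); // false
console.log(noTriple.test('x!!!')); // false

// With the s (dotAll) flag, .* can also scan past line breaks.
const noTripleMultiline = /^(?!.*(.)\1\1)/s;
console.log(noTripleMultiline.test('a\nbbb')); // false
```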
([\s\S])(?!\1\1)[\s\S]{2}

Please explain some Javascript Regular Expressions

I'm learning JavaScript via an online tutorial, but nowhere on that website, or on any other I found by googling, is the jumble of symbols that makes up a regular expression explained.
Check if all numbers: /^[0-9]+$/
Check if all letters: /^[a-zA-Z]+$/
And the hardest one:
Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
What do all the slashes and dollar signs and brackets mean? Please explain.
(By the way, what languages are required to create a flexible website? I know a bit of JavaScript and want to learn jQuery and PHP. Is anything else needed?)
Thanks.
There are already a number of good sites that explain regular expressions so I'll just dive a bit into how each of the specific examples you gave translate.
Check if all numbers: ^ anchors the start of the expression (i.e. the match must start at the beginning of the text). Without it a match could be found anywhere. [0-9] matches one character from that character class (i.e. one of the digits 0-9). The + after the character class means "one or more". The ending $ anchors the end of the text (i.e. the match must run to the end of the input). Put together, this regular expression accepts only strings made up of one or more digits. Note that the anchors are important: without them it would also match something like "foo123bar".
Check if all letters: Pretty much the same as above, but the character class is different. In this example the character class [a-zA-Z] represents all lowercase and uppercase ASCII letters.
The last one actually isn't any more difficult than the other two; it's just longer. This answer is getting quite long, so I'll just explain the new symbols. A \w in a character class will match word characters (which are defined per regex implementation but are generally at least 0-9a-zA-Z_). The backslash before the @ escapes the @ so that it isn't seen as a token in the regex. Outside a character class, a period matches any character except a line break, so .+ would match one or more of any character (e.g. a, 1, Z, 1a, etc.); inside a character class such as [a-zA-Z0-9.-] it is just a literal period. The last part of the regex ({2,4}) is an interval expression: it matches a minimum of 2 and a maximum of 4 of the thing that precedes it.
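The effect of the anchors is easy to confirm in JavaScript with test(). This sketch simply compares the anchored pattern with an unanchored one.

```javascript
const anchored = /^[0-9]+$/;   // the whole string must be digits
const unanchored = /[0-9]+/;   // any run of digits anywhere is enough

console.log(anchored.test('123'));         // true
console.log(anchored.test('foo123bar'));   // false
console.log(unanchored.test('foo123bar')); // true
```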
Hope you got something out of the above.
There is an awesome explanation of regular expressions at http://www.regular-expressions.info/ including notes on language and implementation specifics.
Let me explain:
Check if all numbers: /^[0-9]+$/
So, the first thing we see is the "/" at the beginning and the end. This is a delimiter, and it only serves to show the beginning and end of the regular expression.
Next, we have a "^", this means the beginning of the string. [0-9] means a number from 0-9. + is a modifier, which modifies the term in front of it, in this case, it means you can have one or more of something, so you can have one or more numbers from 0-9.
Finally, we end with "$", which is the opposite of "^" and means the end of the string. Put all together, it makes sure that between the start and end of the string there are one or more digits from 0-9 and nothing else.
Check if all letters: /^[a-zA-Z]+$/
We notice this is very similar, but instead of checking for numbers 0-9, it checks for letters a-z (lowercase) and A-Z (uppercase).
And the hardest one:
Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
"\w" means a word character (a letter, a digit, or an underscore). Inside the brackets it is combined with -, . and +, and the + after the brackets allows one or more of any of these. Note that inside a character class the period is just a literal period.
The new thing here is escape characters. Many symbols cannot be used without escaping them by placing a backslash in front, which is what "\@" does: it means the regex is looking literally for the symbol "@". (Strictly speaking, @ has no special meaning in a JavaScript regex, so here the escape is not required.)
Next it looks for letters, digits, periods and hyphens, then a period (this one seems incorrect: it should escape the period as \., because an unescaped period matches any character, so the pattern still "works" but is too permissive). Numbers inside {} give the minimum and maximum number of repetitions of the previous term, so [a-zA-Z0-9] must occur 2 to 4 times (this part is the top-level domain, such as com, ca, or info). Note there's another error here: [a-zA-z0-9] should be [a-zA-Z0-9] (capital Z).
Oh, and check out that site listed above, it is a great set of tutorials too.
Regular expressions are a complex beast and, as already pointed out, there are quite a few guides on Google you can go read.
To answer the OP's questions:
Check if all numbers: /^[0-9]+$/
The regexps here are all delimited with //, much like strings are quoted with '' or "".
^ means start of string or line (depending on what options you have about multiline matching)
[...] are called character classes. Anything in [] is a list of single matching characters at that position in this case 0-9. The minus sign has a special meaning of "sequence of characters between". So [0-9] means "one of 0123456789".
+ means "1 or more" of the preceding match (in this case [0-9]), so one or more digits.
$ means end of string/line match.
So, in summary, this finds any string that contains only digits; e.g. '0123a' will not match, because [0-9]+ cannot match the a before $.
Check if all letters: /^[a-zA-Z]+$/
Hopefully [A-Za-z] makes sense now (A-Z = ABCDEF...XYZ and a-z abcdef...xyz)
Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
Not all regexp parsers know the \w sequence. JavaScript, Java and Perl, I know, do support it.
I have already covered /^ at the beginning; in this [] match we are looking for
\w, -, . and +. I think that regexp is incorrect: either the minus sign should be escaped with \ or it should be at the end of the [] (i.e. [\w+.-]). But that is an aside; they are basically attempting to allow any of a-z, A-Z, 0-9, _, -, . and +,
so fred.smith-foo+wee@mymail.com will match, but fred.smith%foo+wee@mymail.com won't (the % is not matched by [\w.+-]).
\@ is the literal at sign (it is escaped because Perl would expand @ as an array variable reference).
[a-zA-Z0-9.-]+ is nearly the same as [\w.-]+ (except that \w also matches _). It is very much like the user part of the match, but does not match +. So this matches foo.com. and google.co. but not my+foo.com or my***domain.co.
. means match any one character. This again is incorrect, as fred@foo%com will match, because . matches %*^%$£! etc. This should have been written as \.
The last character class [a-zA-z0-9]{2,4} looks for between 2 and 4 of the characters specified in the character class (much like + means "1 or more", {2,4} means at least 2 and at most 4 of the preceding match). So 'foo' matches and '11' matches, but '11111' and 'information' do not.
The "tweaked" regexp should be:
/^[\w.+-]+\@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]{2,4}$/
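Here is a sketch in JavaScript that isolates the unescaped-dot problem. Both patterns use the corrected character classes ([\w.+-] and A-Z) and a literal @; the only difference is . versus \. before the last part. The last lines show why A-z is a mistake: that range also covers [, \, ], ^, _ and the backtick.

```javascript
// Same character classes; only the dot before the last part differs.
const unescapedDot = /^[\w.+-]+@[a-zA-Z0-9.-]+.[a-zA-Z0-9]{2,4}$/;
const escapedDot = /^[\w.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]{2,4}$/;

const samples = ['fred.smith-foo+wee@mymail.com', 'fred@foo%com', 'fred.smith%foo+wee@mymail.com'];
for (const s of samples) {
  console.log(s, unescapedDot.test(s), escapedDot.test(s));
}

// A-z spans every code point from "A" (65) to "z" (122), including "_" (95).
const sloppyRange = /^[A-z]$/;
console.log(sloppyRange.test('_')); // true
```

Only 'fred@foo%com' separates the two: the unescaped . lets % stand in for the dot, while \. rejects it. The third sample fails both, because % is not allowed before the @.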
I'm not doing a tutorial on regexes (that's been done really well already), but here is what your expressions mean.
/^<something>$/ String begins, has something in the middle, and then immediately ends.
/^foo$/.test('foo'); // true
/^foo$/.test('fool'); // false
/^foo$/.test('afoo'); // false
+ One or more of something:
/a+/.test('cot');//false
/a+/.test('cat');//true
/a+/.test('caaaaaaaaaaaat');//true
[<something>] Matches any one of the characters found between the brackets (including ranges like 0-9, a-z, and A-Z, as well as special codes like \w for 0-9a-zA-Z_).
/^[0-9]+/.test('f00')//false
/^[0-9]+/.test('000')//true
{x,y} between X and Y occurrences
/^[0-9]{1,2}$/.test('12');// true
/^[0-9]{1,2}$/.test('1');// true
/^[0-9]{1,2}$/.test('d');// false
/^[0-9]{1,2}$/.test('124');// false
So, that should cover everything, but for good measure:
/^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
Begins with at least one character from \w, -, +, or ., followed by an @, followed by at least one character from the set a-zA-Z0-9.-, followed by one character of anything (. means anything; they meant \.), followed by 2-4 characters from a-zA-z0-9.
As a side note, this regular expression to check emails is not only dated, but it is very, very, very incorrect.
