How to allow the ' sign with a regular expression? - javascript

At the moment I'm using this regular expression to validate an address:
(!/^[A-Za-z]/i.test(street))
When I use an address like this - Esterwergen - it works.
But when I add the ' sign before the name - 'Esterwergen - it shows my error.
How can I modify my RegEx to allow this sign before the name?

To allow an optional leading ', you'd change your regexp from
/^[A-Za-z]/
to
/^'?[A-Za-z]/
where the ? means "zero or one times".
If you want to allow the ' wherever a letter is allowed, adding it to the character class,
/^['A-Za-z]/
would do the trick.
In addition, be sure that you realize that you're only checking the first character of the string as it is.
Right now you will allow Ester9ui4y6ewigkdlLNDSKJ#€=# :::.
To constrain that, you'll need the + quantifier and the $ (end-of-string) anchor.
/^[A-Za-z]+$/
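A minimal sketch of how the pieces above combine, assuming the whole value should consist of letters with an optional leading '. The function name isValidStreet is made up for illustration and is not part of the original code.

```javascript
// Hypothetical helper: optional leading ', then letters only, up to the end.
function isValidStreet(street) {
  return /^'?[A-Za-z]+$/.test(street);
}

console.log(isValidStreet("Esterwergen"));   // true
console.log(isValidStreet("'Esterwergen"));  // true
console.log(isValidStreet("Ester9ui4y6"));   // false: digits are rejected
```

A real street name will usually also need spaces, digits and hyphens, so the character class would have to be widened accordingly.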

Let's see what your RegExp is targeting:
/^[A-Za-z]/i
^: Asserts position at start of the string.
[]: Match a single character depending on what's inside.
A-Z: Match uppercase letters from A to Z.
a-z: Match lowercase letters from a to z.
i: Case-insensitive.
Consider this:
Using [A-Za-z] along with the i flag is redundant. Use /^[a-z]/i instead. (Avoid [A-z]: that range also covers the characters [, \, ], ^, _ and `, which sit between Z and a in ASCII.)
Using [a-zA-Z\u00C0-\u00FF], for example, extends matching to accented Latin letters using Unicode escapes. See the full Unicode reference here.
Use /^['a-z]+$/i to allow ' anywhere in the string (the trailing $ makes the check cover the whole string).
Use /^'?[a-z]+$/i to allow ' only at the beginning of the string. ? means 'zero or one time'.
To play around with RegExp you can use tools like this.
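As a rough illustration of the Latin-range point above, the sketch below combines the optional leading ' with the \u00C0-\u00FF range. The helper name isValidName and the choice to anchor with $ are assumptions made for this example. Note that this range also contains the multiplication and division signs (U+00D7 and U+00F7), so it is slightly broader than "letters only".

```javascript
// Hypothetical helper: optional leading ', then ASCII or Latin-1 accented letters.
function isValidName(name) {
  return /^'?[a-z\u00C0-\u00FF]+$/i.test(name);
}

console.log(isValidName("Esterwergen"));   // true
console.log(isValidName("'\u00C9vry"));    // true: \u00C9 is an accented capital E
console.log(isValidName("M\u00FCller"));   // true: \u00FC is a u with diaeresis
console.log(isValidName("Ester9"));        // false: digits are not in the class
```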

Related

How to implement character restriction using regex in java?

Basically I have String restrict = "Hello+Hi"; and I want to reject all characters other than (/^[a-zA-Z0-9~!@#\(\.)]$/) using regex.
In JavaScript this is how it's being done:
field.value.match(/^[a-zA-Z0-9~!@#$%&^*()-_=.<>?)\(\)]+$/)
which restricts the input to the characters between the parentheses.
I want my string to only contain (/^[a-zA-Z0-9~!@#\(\.)]$/)
I would really appreciate it if somebody could tell me how to do this in Java.
You can use String.matches, which will match the whole String with a pattern.
In this case: "Hello+Hi".matches("\\p{Alpha}+") will return false because + is not an alphabetic character.
To avoid confusion: the + in the pattern is a greedy quantifier meaning one or more repetitions.
\p{Alpha} represents alphabetic characters; its backslash has to be doubled inside a Java string literal.
See docs here.
Edit
Since you edited your requirement, just use the custom class as follows, plus the quantifier:
"Hello+World".matches("[a-zA-Z0-9~!@#\\().]+") // returns false because of the +

Is this regex the most efficient way of parsing my string?

First off, here are the parameters to follow in the string I allow the user to input:
If there is a slash, it has to appear at the start of the string, nowhere else, is limited to 1, is optional and must be succeeded by [a-zA-Z].
If there is a tilde, it has to appear after a space " ", nothing else, is optional and must be succeeded by [a-zA-Z]. Also, this expression is limited to 2. (ie: ~exa ~mple is passed but ~exa ~mp ~le is not passed)
The slash followed by a word is an instruction, like /get or /post.
The tilde followed by a word is a parameter like ~now or ~later.
String format:
[instruction] (optional) [query] [extra parameters] (optional)
[instruction] - Must contain / succeeded with [a-zA-Z] only
[query] - Can contain [\w\s()'-] (alphanumeric, whitespace, parentheses, apostrophe, dash)
[extra parameters] - ~ preceded by whitespace, succeeded with only [a-zA-Z]
String examples that should work:
/get D0cUm3nt ex4Mpl3' ~now
D0cUm3nt ex4Mpl3'
/post T(h)(i5 s(h)ou__ld w0rk t0-0'
String examples that shouldn't work:
//get document~now
~later
example ~now~later
Before I pass the string through the regex I trim any whitespace at the start and end of the string (before any text is seen) but I don't trim double whitespaces within the string as some queries require them.
Here is the regex I used:
^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$
To break it down slightly:
[instruction check] - (/{0,1}[a-zA-Z])?
[query check] - [\w\s()'-]*
[parameter check] - ((\s~[a-zA-Z]*){0,2})?
This is the first time I've actually done any serious regex away from a tutorial so I'm wondering is there anything I can change within my regex to make it more compact/efficient?
All fresh perspectives are appreciated!
Thanks.
From your regex: ^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$,
you can change {0,1} to ?, which is shorthand for "0 or 1 times":
^(/?[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$
The last part is already allowed 0, 1 or 2 times, so the ? after it is superfluous:
^(/?[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$
The first part may be simplified too; the ? just after the / is superfluous, because a leading letter without a slash is already matched by the query character class:
^(/[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$
If you don't use the captured groups, you can change them to non-capturing groups, (?: ), which avoid the overhead of storing the captures:
^(?:/[a-zA-Z])?[\w\s()'-]*(?:\s~[a-zA-Z]*){0,2}$
You can also use the case-insensitive inline modifier (?i) in flavors that support it (JavaScript does not; use the /i flag there):
^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]*){0,2}$
Finally, as said in the OP, ~ must be followed by [a-zA-Z], so change the last * to +:
^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$
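The final pattern can be tried against the examples from the question. This is only a sketch in JavaScript, so the (?i) modifier is written as the /i flag and the / is escaped inside the regex literal. The names commandPattern, shouldPass and shouldFail are made up for illustration, and the check covers only the question's own examples.

```javascript
// Final pattern from the steps above, with (?i) expressed as the /i flag.
const commandPattern = /^(?:\/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$/i;

// Examples copied from the question.
const shouldPass = [
  "/get D0cUm3nt ex4Mpl3' ~now",
  "D0cUm3nt ex4Mpl3'",
  "/post T(h)(i5 s(h)ou__ld w0rk t0-0'",
];
const shouldFail = ["//get document~now", "~later", "example ~now~later"];

// Prints true for the first list and false for the second.
for (const input of shouldPass) console.log(commandPattern.test(input), input);
for (const input of shouldFail) console.log(commandPattern.test(input), input);
```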
This looks slightly better:
^(?:/?[a-zA-Z]*\s)?[\w\s()'-]*(?:\s~[a-zA-Z]*)*$
https://codereview.stackexchange.com/ is more the place for this kind of thing
Assuming that capture groups are useful to you:
^((?:\/|\s~)[a-z]+)?([\w\s()'-]+)(~[a-z]+)?$
Regex101 Demo
Maybe this is what you're looking for:
var regex = /^((\/)?[a-zA-Z]+)?[\w\s()'-]*((\s~)?[a-zA-Z]+){0,2}$/;

Can it be done with regex?

Having the following regex: ([a-zA-Z0-9//._-]{3,12}[^//._-]) used like pattern="([a-zA-Z0-9/._-]{3,12}[^/._-])" to validate an HTML text input for a username, I wonder if there is any way of telling it to check that the string has only one of the following: ., -, _
By that I mean, that I'm in need of regex that would accomplish the following (if possible)
alex-how => Valid
alex-how. => Not valid, because finishing in .
alex.how => Valid
alex.how-ha => Not valid, contains already a .
alex-how_da => Not valid, contains already a -
The problem with my current regex is that, for some reason, it accepts any character at the end of the string that is not ._-, and I can't figure out why.
The other problem is that it doesn't check that the string contains only one of the allowed special characters.
Any ideas?
Try this one out:
^(?!(.*[.|_|-].*){2})(?!.*[.|_|-]$)[a-zA-Z0-9//._-]{3,12}$
Regexpal link. The regex above allows at most one of ., _ or -.
What you want is one or more letters or digits,
followed by at most one of the characters "-", "." or "_", followed by at least one more letter or digit:
^[a-zA-Z0-9]+[-_.]?[a-zA-Z0-9]+$
Hope this will work for you.
It says: start with word characters, optionally followed by any mix of -, ., _ and word characters, and end with a word character:
^[\w\d]*[-_\.\w\d]*[\w\d]$
Seems to me you want:
^[A-Za-z0-9]+(?:[\._-][A-Za-z0-9]+)?$
Breaking it down:
^: beginning of line
[A-Za-z0-9]+: one or more alphanumeric characters
(?:[\._-][A-Za-z0-9]+)?: (optional, non-captured) one of your allowed special characters followed by one or more alphanumeric characters
$: end of line
It's unclear from your question if you wanted one of your special characters (., -, and _) to be optional or required (e.g., zero-or-one versus exactly-one). If you actually wanted to require one such special character, you would just get rid of the ? at the very end.
Here's a demonstration of this regular expression on your example inputs:
http://rubular.com/r/SQ4aKTIEF6
As for the length requirement (between 3 and 12 characters): This might be a cop-out, but personally I would argue that it would make more sense to validate this by just checking the length property directly in JavaScript, rather than over-complicating the regular expression.
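A small sketch of that split, assuming the intended length is 3 to 12 characters inclusive. The function name isValidUsername is hypothetical. The regular expression checks only the structure and the length is checked separately.

```javascript
// Hypothetical helper: structure via regex, length (assumed 3 to 12) via .length.
function isValidUsername(name) {
  const structureOk = /^[A-Za-z0-9]+(?:[._-][A-Za-z0-9]+)?$/.test(name);
  return structureOk && name.length >= 3 && name.length <= 12;
}

console.log(isValidUsername("alex-how"));     // true
console.log(isValidUsername("alex-how."));    // false: ends with a separator
console.log(isValidUsername("alex.how"));     // true
console.log(isValidUsername("alex.how-ha"));  // false: two separators
console.log(isValidUsername("alex-how_da"));  // false: two separators
```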
^(?=[a-zA-Z0-9/._-]{3,12}$)[a-zA-Z0-9]+(?:[/._-][a-zA-Z0-9]+)?$
or, as a JavaScript regex literal:
/^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/
The lookahead, (?=[a-zA-Z0-9/._-]{3,12}$), does the overall-length validation.
Then [a-zA-Z0-9]+ ensures that the name starts with at least one non-separator character.
If there is a separator, (?:[/._-][a-zA-Z0-9]+)? ensures that there's at least one non-separator following it.
Note that / has no special meaning in a regex. You only have to escape it if you're using a regex literal (because / is the regex delimiter), and you escape it by prefixing with a backslash, not another forward-slash. And inside a character class, you don't need to escape the dot (.) to make it match a literal dot.
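The $ inside the lookahead is what enforces the upper length bound. A quick sketch comparing the pattern with and without it (the names bounded, unbounded and longName are made up for this example):

```javascript
// With $ in the lookahead, the whole string must be 3 to 12 allowed characters.
const bounded = /^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/;
// Without it, the lookahead only requires at least 3 allowed characters at the start.
const unbounded = /^(?=[a-zA-Z0-9\/._-]{3,12})[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/;

const longName = "abcdefghijklmnop"; // 16 characters

console.log(bounded.test(longName));    // false
console.log(unbounded.test(longName));  // true: the maximum is not enforced
console.log(bounded.test("alex-how"));  // true
console.log(bounded.test("ab"));        // false: shorter than 3
```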
The dot in regex has a special meaning: "any character here".
If you mean a literal dot, you should escape it to tell the regex parser so.
Escape dot in a regex range

Please explain some Javascript Regular Expressions

I'm learning Javascript via an online tutorial, but nowhere on that website or any other I googled for was the jumble of symbols explained that makes up a regular expression.
Check if all numbers: /^[0-9]+$/
Check if all letters: /^[a-zA-Z]+$/
And the hardest one:
Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
What do all the slashes and dollar signs and brackets mean? Please explain.
(By the way, what languages are required to create a flexible website? I know a bit of Javascript and wanna learn jQuery and PHP. Anything else needed?)
Thanks.
There are already a number of good sites that explain regular expressions so I'll just dive a bit into how each of the specific examples you gave translate.
Check if all numbers: ^ anchors the start of the expression (i.e. start at the beginning of the text). Without it a match could be found anywhere. [0-9] finds the characters in that character class (i.e. the digits 0-9). The + after the character class just means "one or more". The ending $ anchors the end of the text (i.e. the match should run to the end of the input). So if you put that together, that regular expression would allow for only 1 or more digits in a string. Note that the anchors are important, as without them it might match something like "foo123bar".
Check if all letters: Pretty much the same as above but the character classes are different. In this example the character class [a-zA-Z] represents all lowercase and uppercase characters.
The last one actually isn't any more difficult than the other two; it's just longer. This answer is getting quite long so I'll just explain the new symbols. A \w will match word characters (which are defined per regex implementation but are generally 0-9a-zA-Z_ at least). The backslash before the @ escapes the @ so that it isn't seen as a token in the regex. Outside a character class a period will match any character, so .+ would match one or more of any character (e.g. a, 1, Z, 1a, etc); inside square brackets, as in [\w-.+], the period and the plus are literal characters. The last part of the regex ({2,4}) defines an interval expression. This means that it can match a minimum of 2 of the thing that precedes it, and a maximum of 4.
Hope you got something out of the above.
There is an awesome explanation of regular expressions at http://www.regular-expressions.info/ including notes on language and implementation specifics.
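To make the point about anchors concrete, here is a short sketch comparing the same character class with and without ^ and $ (the variable names are chosen just for this example):

```javascript
const digitsAnywhere = /[0-9]+/;  // no anchors: a match anywhere is enough
const digitsOnly = /^[0-9]+$/;    // anchored: the whole string must be digits

console.log(digitsAnywhere.test("foo123bar"));  // true
console.log(digitsOnly.test("foo123bar"));      // false
console.log(digitsOnly.test("123"));            // true
console.log(digitsOnly.test(""));               // false: + needs at least one digit
```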
Let me explain:
Check if all numbers: /^[0-9]+$/
So, the first thing we see is the "/" at the beginning and the end. This is a delimiter, and only serves to show the beginning and end of the regular expression.
Next, we have a "^", which means the beginning of the string. [0-9] means a digit from 0-9. + is a quantifier, which modifies the term in front of it; in this case, it means you can have one or more of something, so you can have one or more digits from 0-9.
Finally, we end with "$", which is the opposite of "^", and means the end of the string. So put that all together and it basically makes sure that between the start and end of the string there are one or more digits from 0-9 and nothing else.
Check if all letters: /^[a-zA-Z]+$/
We notice this is very similar, but instead of checking for numbers 0-9, it checks for letters a-z (lowercase) and A-Z (uppercase).
And the hardest one:
Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
"\w" means a word character: a letter, a digit or an underscore. Inside the square brackets, the -, . and + are literal characters, so this part allows one or more letters, digits, underscores, hyphens, periods and plus signs.
The new thing here is escape characters. Many symbols cannot be used literally without escaping them by placing a backslash in front, as is the case with "\@". This means it is looking directly for the symbol "@".
Now it looks for letters, digits, periods and hyphens, then a period (this one seems incorrect, it should be escaping the period too, though it will still work, since an unescaped period will match any symbol). Numbers inside {} mean that the previous term is repeated between that many times, so of the [a-zA-Z0-9], there should be 2-4 characters (this part here is the top-level domain, such as com, ca, or info). Note there's another error in this one here, the [a-zA-z0-9] should be [a-zA-Z0-9] (capital Z).
Oh, and check out that site listed above, it is a great set of tutorials too.
Regular expressions are a complex beast and, as already pointed out, there are quite a few guides on Google you can go read.
To answer the OP questions:
Check if all numbers: /^[0-9]+$/
regexps here are all delimited with //, much like strings are quoted with '' or "".
^ means start of string or line (depending on what options you have about multiline matching)
[...] are called character classes. Anything in [] is a list of single matching characters at that position, in this case 0-9. The minus sign has a special meaning of "sequence of characters between". So [0-9] means "one of 0123456789".
+ means "1 or more" of the preceding match (in this case [0-9]), so one or more digits
$ means end of string/line match.
So in summary: find any string that contains only digits, i.e. '0123a' will not match, as [0-9]+ fails to match the a before $.
Check if all letters: /^[a-zA-Z]+$/
Hopefully [A-Za-z] makes sense now (A-Z = ABCDEF...XYZ and a-z abcdef...xyz)
Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
Not all regexp parsers know the \w sequence. JavaScript, Java and Perl, I know, do support it.
I have already covered /^ at the beginning. For this [] match we are looking for
\w, -, . and +. I think that regexp is incorrect. Either the minus sign should be escaped with \ or it should be at the end of the [] (i.e. [\w+.-]). But that is an aside; they are basically attempting to allow anything of a-z, A-Z, 0-9, _ and -.+
so fred.smith-foo+wee@mymail.com will match but fred.smith%foo+wee@mymail.com won't (the % is not matched by [\w.+-]).
\@ is the literal at sign (it is escaped because Perl expands @ as an array variable reference).
[a-zA-Z0-9.-]+ is almost the same as [\w.-]+ (\w also matches _). Very much like the user part of the match, but does not match +. So this matches foo.com. and google.co. but not my+foo.com or my***domain.co.
. means match any one character. This again is incorrect as fred@foo%com will match, since . matches %*^%$£! etc. This should have been written as \.
The last character class [a-zA-z0-9]{2,4} looks for 2, 3 or 4 of the a-zA-Z0-9 characters specified in the character class (much like + looks for "1 or more", {2,4} means at least 2 with a maximum of 4 of the preceding match). So 'foo' matches, '11' matches, but '11111' and 'information' do not.
The "tweaked" regexp should be:
/^[\w.+-]+\@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]{2,4}$/
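The effect of the unescaped dot and of the A-z range can be checked directly. This is only a sketch: it assumes the separator is the usual @ sign, the names looseDot and escapedDot are invented for the comparison, and neither pattern is a complete email validator.

```javascript
// Same pattern twice; the only difference is whether the dot before the TLD is escaped.
const looseDot = /^[\w.+-]+@[a-zA-Z0-9.-]+.[a-zA-Z0-9]{2,4}$/;
const escapedDot = /^[\w.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]{2,4}$/;

console.log(looseDot.test("fred@foo%com"));    // true: the bare . matches %
console.log(escapedDot.test("fred@foo%com"));  // false
console.log(escapedDot.test("fred.smith-foo+wee@mymail.com"));  // true

// The A-z range also covers the punctuation between Z and a in ASCII.
console.log(/[A-z]/.test("_"));  // true
console.log(/[A-Z]/.test("_"));  // false
```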
I'm not doing a tutorial on RegEx's, that's been done really well already, but here are what your expressions mean.
/^<something>$/ String begins, has something in the middle, and then immediately ends.
/^foo$/.test('foo'); // true
/^foo$/.test('fool'); // false
/^foo$/.test('afoo'); // false
+ One or more of something:
/a+/.test('cot');//false
/a+/.test('cat');//true
/a+/.test('caaaaaaaaaaaat');//true
[<something>] Include any characters found between the brackets (includes ranges like 0-9, a-z, and A-Z, as well as special codes like \w for 0-9a-zA-Z_).
/^[0-9]+/.test('f00')//false
/^[0-9]+/.test('000')//true
{x,y} between X and Y occurrences
/^[0-9]{1,2}$/.test('12');// true
/^[0-9]{1,2}$/.test('1');// true
/^[0-9]{1,2}$/.test('d');// false
/^[0-9]{1,2}$/.test('124');// false
So, that should cover everything, but for good measure:
/^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
Begins with at least one character from \w, -, + or ., followed by an @, followed by at least one character in the set a-zA-Z0-9.-, followed by one character of anything (. means anything, they meant \.), followed by 2-4 characters of a-zA-z0-9.
As a side note, this regular expression to check emails is not only dated, but it is very, very, very incorrect.

Removing Numbers from a String using Javascript

How do I remove numbers from a string using Javascript?
I am not very good with regex at all but I think I can use it with replace to achieve the above?
It would actually be great if jQuery already offered something to do this.
//Something Like this??
var string = 'All23';
string.replace('REGEX', '');
I appreciate any help on this.
\d matches any digit, so you want to replace the digits with an empty string:
string.replace(/\d+/g, '')
I've used the + quantifier here so that it will match all adjacent digits in one go, and hence require fewer replacements. The g at the end is a flag which means "global" and it means that it will replace ALL matches it finds, not just the first one.
Just paste this into your address bar to try it out:
javascript:alert('abc123def456ghi'.replace(/\d+/g,''))
\d indicates a character in the range 0-9, and the + indicates one or more; so \d+ matches one or more digits. The g is necessary to indicate global matching, as opposed to quitting after the first match (the default behavior).
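One detail worth spelling out: replace does not change the original string, it returns a new one, so the result has to be assigned. A short sketch (the variable names original and stripped are chosen here for illustration):

```javascript
const original = "All23";
const stripped = original.replace(/\d+/g, "");

console.log(stripped);  // All
console.log(original);  // All23 (strings are immutable, the original is untouched)

// Without the g flag only the first run of digits is removed.
console.log("abc123def456ghi".replace(/\d+/, ""));   // abcdef456ghi
console.log("abc123def456ghi".replace(/\d+/g, ""));  // abcdefghi
```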
