How to match "two or more words"


In a given string, I'm trying to verify that there are at least two words, where a word is defined as any run of non-numeric characters. For example:
// Should pass
Phil D'Sousa
Billy - the - Kid
// Should Fail
Joe
454545 354434
I thought this should work:
(\b\D*?\b){2,}
But it does not.

You forgot to allow for a space between your "words":
\b\D*?\b(?:\s+\b\D*?\b)+
^^^
There are a number of other problems I can see:
I'm also rather suspicious of your definition of "word". "Any non-numeric character" also includes punctuation and whitespace, which is probably not what you really mean. You might want to try defining a word like this instead: [^\d\s]+. This still allows words to contain punctuation, but it disallows both numerals and whitespace.
There is a problem with your usage of word boundaries: if a word can consist of punctuation, then words beginning or ending with punctuation won't have a word boundary there, so your regular expression will miss them.
Are you searching for a string that contains at least two "words", and possibly also some numbers? Or must the string consist only of "words" and no numbers at all anywhere in the string? Currently your regular expression is looking for two consecutive "words" but in general they might not be consecutive.
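A minimal sketch of the suggestions above, assuming the string must consist only of "words" (defined as [^\d\s]+) separated by whitespace, which is just one possible reading of the question. Dropping \b sidesteps the punctuation problem, so the "-" tokens in "Billy - the - Kid" count as words:

```javascript
// Assumption: a "word" is [^\d\s]+ and the whole string is two or more
// such words separated by whitespace; no \b, so punctuation-only words count.
const twoOrMoreWords = /^[^\d\s]+(?:\s+[^\d\s]+)+$/;

const samples = ["Phil D'Sousa", "Billy - the - Kid", "Joe", "454545 354434"];
for (const s of samples) {
  console.log(`${s} -> ${twoOrMoreWords.test(s)}`);
}
// Phil D'Sousa -> true
// Billy - the - Kid -> true
// Joe -> false
// 454545 354434 -> false
```

Because this form is anchored, a string such as "foo 123 bar" fails: the digit-only token is not a word. If numbers may appear between words, counting matches is a better fit.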

You can globally search for a "word" and check the length of the .match() result if a match is found.
If two or more words are found, you're good:
var matches = string.match(/\b[^\d\s]+\b/g);
if ( matches && matches.length >= 2 )
{ /* Two or more words ... */ };
You can define a word as \b[^\d\s]+\b, which is a word boundary \b, one or more characters that are neither digits nor whitespace [^\d\s]+, and another word boundary \b. You have to make sure to use the global flag g so the regex finds all the possible matches.
You can tweak the definition of a word in your regex. The trick is to use the length property of the array returned by .match(). However, .match() returns null when there are no matches, and reading .length of null throws a TypeError, so you must check if (matches && matches.length ...).
Additionally it's quite simple to modify the above code for X words where X is either a number or a variable.
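A sketch of that generalization; hasAtLeastNWords is a hypothetical helper name, and the word definition is the \b[^\d\s]+\b pattern from this answer:

```javascript
// Hypothetical helper: true if `str` contains at least `n` words, where a
// word is \b[^\d\s]+\b (one or more non-digit, non-space characters).
function hasAtLeastNWords(str, n) {
  // The g flag returns every match; match() returns null when there are none.
  const matches = str.match(/\b[^\d\s]+\b/g);
  return matches !== null && matches.length >= n;
}

console.log(hasAtLeastNWords("Phil D'Sousa", 2));      // true
console.log(hasAtLeastNWords("Billy - the - Kid", 2)); // true (Billy, the, Kid)
console.log(hasAtLeastNWords("Joe", 2));               // false
console.log(hasAtLeastNWords("454545 354434", 2));     // false
```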

This seems to work, for your definition of "word".
/((\W|^)\D+?(\W|$).*){2}/
Here are your four examples, plus some more added after editing and fixing this answer:
>>> r = /((\W|^)\D+?(\W|$).*){2}/
/((\W|^)\D+?(\W|$).*){2}/
>>> !!"Phil D'Sousa".match(r)
true
>>> !!"Billy - the - Kid".match(r)
true
>>> !!"Joe".match(r)
false
>>> !!"54545 354434".match(r)
false
>>> !!"foo bar baz".match(r)
true
>>> !!"123 foo 456".match(r)
false
>>> !!"123 foo 456 bar".match(r)
true

Looks good, bcherry, EXCEPT for the fact that it will not match "foo bar":
>>> !!"foo bar".match(r)
false
However, "2 or more words" (>= 2) should also include "foo bar".
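A quick way to see the difference this comment describes, reusing bcherry's pattern r and, for comparison, the match-counting check from the previous answer (countWords is a hypothetical helper name):

```javascript
// bcherry's pattern from the answer above.
const r = /((\W|^)\D+?(\W|$).*){2}/;

// Hypothetical helper: count \b[^\d\s]+\b words, treating "no match" as 0.
const countWords = (s) => (s.match(/\b[^\d\s]+\b/g) || []).length;

for (const s of ["foo bar", "foo bar baz"]) {
  console.log(`${s}: r=${r.test(s)}, countWords>=2: ${countWords(s) >= 2}`);
}
// foo bar: r=false, countWords>=2: true
// foo bar baz: r=true, countWords>=2: true
```

Pattern r needs a non-word separator before its second "word" in addition to the one ending the first, so a single space between two words is not enough.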

Related

Regex: how to exclude empty match from something like (RegexA)?(RegexB)?(RegexA)? [duplicate]

I have regex which works fine in my application, but it matches an empty string too, i.e. no error occurs when the input is empty. How do I modify this regex so that it will not match an empty string ? Note that I DON'T want to change any other functionality of this regex.
This is the regex which I'm using: ^([0-9\(\)\/\+ \-]*)$
I don't know a lot about regex formulation myself, which is why I'm asking. I have searched for an answer but couldn't find a direct one. The closest I got was this: regular expression for anything but an empty string in c#, but that doesn't really work for me.

Replace "*" with "+": "*" means "0 or more occurrences", while "+" means "at least one occurrence".
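A small check of that suggestion using the question's own pattern; the phone-number-style sample string is an illustrative assumption:

```javascript
// The question's pattern: * allows zero characters, so "" matches.
const withStar = /^([0-9\(\)\/\+ \-]*)$/;
// Same class with +: at least one character is required.
const withPlus = /^([0-9\(\)\/\+ \-]+)$/;

console.log(withStar.test(""));                  // true
console.log(withPlus.test(""));                  // false
console.log(withPlus.test("+1 (555) 123-4567")); // true
```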

There are a lot of pattern types that can match empty strings. The OP regex belongs to an ^.*$ type, and it is easy to modify it to prevent empty string matching by replacing * (= {0,}) quantifier (meaning zero or more) with the + (= {1,}) quantifier (meaning one or more), as has already been mentioned in the posts here.
There are other pattern types matching empty strings, and it is not always obvious how to prevent them from matching empty strings.
Here are a few of those patterns with solutions:
[^"\\]*(?:\\.[^"\\]*)* ⇒ (?:[^"\\]|\\.)+
abc||def ⇒ abc|def (remove the extra | alternation operator)
^a*$ ⇒ ^a+$ (+ matches 1 or more chars)
^(a)?(b)?(c)?$ ⇒ ^(?!$)(a)?(b)?(c)?$ (the (?!$) negative lookahead fails the match if the end of the string is at its start, i.e. if the string is empty)
or ⇒ ^(?=.)(a)?(b)?(c)?$ (the (?=.) positive lookahead requires at least one char; . may or may not match line break chars depending on modifiers/regex flavor)
^$|^abc$ ⇒ ^abc$ (remove the ^$ alternative that enables a regex to match an empty string)
^(?:abc|def)?$ ⇒ ^(?:abc|def)$ (remove the ? quantifier that made the (?:abc|def) group optional)
To make \b(?:north|south)?(?:east|west)?\b (which matches north, south, east, west, northeast, northwest, southeast, southwest) stop matching empty strings, the word boundaries must be made more precise: make the initial word boundary match only at the start of words by adding (?<!\w) after it, and make the trailing word boundary match only at the end of words by adding (?!\w) after it.
\b(?:north|south)?(?:east|west)?\b ⇒ \b(?<!\w)(?:north|south)?(?:east|west)?\b(?!\w)
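Two of these fixes, checked in JavaScript. This is only a sketch: the sample strings are made up, and the (?<!\w) lookbehind needs an engine with ES2018 lookbehind support, which most current engines have:

```javascript
// ^(a)?(b)?(c)?$ : every group is optional, so the empty string matches.
const loose = /^(a)?(b)?(c)?$/;
// (?!$) right after ^ fails only when the string is empty.
const guarded = /^(?!$)(a)?(b)?(c)?$/;
console.log(loose.test(""), guarded.test(""));      // true false
console.log(guarded.test("ac"), guarded.test("b")); // true true

// Compass words: the plain \b...\b version also yields empty matches.
const compassLoose = /\b(?:north|south)?(?:east|west)?\b/g;
const compassStrict = /\b(?<!\w)(?:north|south)?(?:east|west)?\b(?!\w)/g;
const text = "go northeast now";
console.log(text.match(compassLoose).includes("")); // true
console.log(text.match(compassStrict));             // [ 'northeast' ]
```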

You can either use + or the {min,max} syntax:
^[0-9\(\)\/\+ \-]{1,}$
or
^[0-9\(\)\/\+ \-]+$
By the way: this is a great source for learning regular expressions (and it's fun): http://regexone.com/

Obviously you need to replace * with +, as + matches one or more characters. However, inside a character class you don't need to do all the escaping you're doing. Your regex can be simplified to:
^([0-9()\/+ -]+)$
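To confirm the simplification does not change behavior, both classes can be run over a few inputs (chosen here purely for illustration):

```javascript
// Escaped version from the question (with + instead of *).
const escaped = /^([0-9\(\)\/\+ \-]+)$/;
// Simplified class: ( ) + need no escaping inside [], and a trailing - is literal.
const simplified = /^([0-9()\/+ -]+)$/;

const inputs = ["", "123", "(12) 34-56", "+1 555", "abc", "12a"];
for (const s of inputs) {
  console.log(JSON.stringify(s), escaped.test(s), simplified.test(s));
}
// Both columns agree: false, true, true, true, false, false
```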

Regexp: numbers and few special characters

I am buried in RegExp hell and can't find a way out; please help me.
I need a RegExp that matches only digits (at least one) and exactly one of these characters: <, >, = (exactly one of them, one time).
My reg. expression looks like this:
^[0-9]+$|^[=<>]{1}$
And I thought it should match when my string contains one or more digits and exactly 1 special character defined by me. But it doesn't behave correctly. I think there might be a problem with my start/end of string definition, but I'm not sure about that.
Examples that should pass include:
<1
=2
22>
>1
=00123456789
Examples that should not pass this reg. exp.:
<<2
==222
<>=2

I thought it should match when my string containts one or more digits and exactly 1 special character
No, the original pattern matches a string that consists entirely of one or more digits, or of exactly 1 special character. For example, it will match 123 and = but not 123=.
Try this pattern:
^\d+[=<>]$
This will match a string that consists of one or more digits followed by exactly one special character. For example, this will match 123= but not 123 or =.
If you want your special character to appear before the number, use a pattern like this instead:
^[=<>]\d+$
This will match =123 but not 123 or =.
Update
Given the examples you provided, it looks like you want to match any string which contains one or more digits and exactly one special character either at the beginning or the end. In that case use this pattern:
^([=<>]\d+|\d+[=<>])$
This will match <1, =2, 22>, and >1, but not 123 or =.
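A quick check of the updated pattern against every example from the question, plus the two extra cases mentioned in this answer:

```javascript
const pattern = /^([=<>]\d+|\d+[=<>])$/;

const shouldPass = ["<1", "=2", "22>", ">1", "=00123456789"];
const shouldFail = ["<<2", "==222", "<>=2", "123", "="];

for (const s of shouldPass) console.log(s, pattern.test(s)); // all true
for (const s of shouldFail) console.log(s, pattern.test(s)); // all false
```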

Just use [0-9]+[=<>]
Here are visualizers of your regexp and this one:
http://www.regexper.com/#%5E%5B0-9%5D%2B%24%7C%5E%5B%3D%3C%3E%5D%7B1%7D%24
http://www.regexper.com/#%5B0-9%5D%2B%5B%3D%3C%3E%5D

Your regex says:
1 or more numbers OR 1 symbol
Also, ^ and $ anchor the pattern to the whole string; they don't test whether the string merely contains it. If you want "contains", drop them. I don't know if you have a space between the number and the symbol, so I've added an optional space:
[0-9]+\s?[=<>]{1}

This should work.
^[0-9]+[=<>]$
One or more digits followed by one of =, <, or >.

Try this regex:
^\d+[=<>]$

This one:
/^\d+[<>=]$|^[<>=]\d+$/

How to find all words with x (and one or more) occurrences of a letter?

I have an answer to my second question right here:
To find words with one or more occurrences of the letter 'a' in it
var re = /(\w+a)/;
With regards to the above, how does it work? For example,
var re = /(\w+a)/g;
var str = "gamma";
console.log(re.exec(str));
Output:
[ 'gamma', 'gamma', index: 0, input: 'gamma' ]
However, these are not the results I expected (although it IS what I want). That is to say, I expected re to find any number of occurrences of \w, then the first occurrence of the letter 'a', and then stop.
That is, I expected ga,
then mma.
Next, how do I look for words with a pre-defined number of occurrences (call it x) of the letter 'a'. Such that f(x)=gamma iff x=2.

Repetition in regex is greedy; that is, it takes as much as possible. You happen to get the full word because it ends in an a. To make it ungreedy (stop at the first one), you'd use:
\w+?a
But to actually get the full word, I'd rather use
\w*a\w*
Note the *, otherwise you'll get problems with words that have an a only as the first or last letter.
To get words with exactly 2 a's, you need to exclude a from the repeated letters. This is best done with a negated character class that disallows non-word characters and a's. In addition, you need to make sure that you get full words. This is easily done with the word boundary \b:
\b[^\Wa]*a[^\Wa]*a[^\Wa]*\b
For more flexibility in terms of the number of repetitions, this can be rewritten as
\b[^\Wa]*(?:a[^\Wa]*){2}\b
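When the count is a variable, the pattern can be built with the RegExp constructor. A sketch, assuming wordsWithExactly is a hypothetical helper and the letter is a single word character (a character such as ] would break the class):

```javascript
// Hypothetical helper: words containing exactly `n` occurrences of `letter`.
// Builds \b[^\Wx]*(?:x[^\Wx]*){n}\b, where [^\Wx] is "a word char other than x".
function wordsWithExactly(text, letter, n) {
  const other = `[^\\W${letter}]`;
  const re = new RegExp(`\\b${other}*(?:${letter}${other}*){${n}}\\b`, "g");
  return text.match(re) || [];
}

console.log(wordsWithExactly("gamma banana alpha a", "a", 2)); // [ 'gamma', 'alpha' ]
console.log(wordsWithExactly("gamma banana alpha a", "a", 3)); // [ 'banana' ]
```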

Regular expressions are greedy by default. That means that if they can grab more characters they will. You need to consider greed when using quantifiers, like + and *.
To make a quantifier not greedy (lazy) suffix it with a ?.
/(\w+?a)/
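Applied to the question's example, the lazy version produces exactly the ga-then-mma split the asker expected, while the greedy one takes the whole word:

```javascript
const str = "gamma";
console.log(str.match(/\w+a/g));  // [ 'gamma' ]       greedy: \w+ grabs as much as it can
console.log(str.match(/\w+?a/g)); // [ 'ga', 'mma' ]   lazy: stops at the first possible a
```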

You can use regexes such as:
/\b\w*a\w*\b/ - find a word with at least one a (can match the word 'a')
/\b\w*(?:a\w*){2}\b/ - find a word with at least two a's
It gets trickier when the count must be exact, because \w must be changed to match every word character except a. A negated class does this:
/\b[^\Wa]*(?:a[^\Wa]*){2}\b/ - matches a word with exactly two a's
To find the chunks of a word up to each letter a, you can use:
/\b(?:[^\Wa]*a)/ - matches ga, both as a word on its own and at the start of gamma
/\b(?:[^\Wa]*a){1,4}/ - matches the start of a word through as many as four a's, ending in a
The easiest way to achieve something like this, however, is to match all words with /\w+/ and filter them in JavaScript.
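A sketch of that last suggestion: match every word with /\w+/g, then count the target letter in JavaScript. wordsWithCount is a hypothetical helper name:

```javascript
// Hypothetical helper: split into words, then keep those whose count of
// `letter` equals `n`. Counting in code avoids the negated-class trick.
function wordsWithCount(text, letter, n) {
  const words = text.match(/\w+/g) || [];
  // "gamma".split("a") gives ["g", "mm", ""], so length - 1 is the count.
  return words.filter((w) => w.split(letter).length - 1 === n);
}

console.log(wordsWithCount("gamma banana alpha a", "a", 2)); // [ 'gamma', 'alpha' ]
```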

Regular expression for no more than two repeated letters/digits

I have a requirement to handle a regular expression for no more than two of the same letters/digits in an XSL file.
no space
does not support special chars
support (a-z,A-Z,0-9)
require one of a-z
require one of 0-9
no more than 2 same letter/digits (i.e., BBB will fail, BB is accepted)
What I have so far
(?:[^a-zA-Z0-9]{1,2})

This regex will do it:
^(?!.*([A-Za-z0-9])\1{2})(?=.*[a-z])(?=.*\d)[A-Za-z0-9]+$
Here's the breakdown:
(?!.*([A-Za-z0-9])\1{2}) makes sure that none of the chars repeat more than twice in a row.
(?=.*[a-z]) requires at least one lowercase letter
(?=.*\d) requires at least one digit
[A-Za-z0-9]+ allows only letters and digits
EDIT :
removed an extraneous .* from the negative lookahead
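A quick run of this single regex over a few sample inputs (made up for illustration; each comment notes which rule decides the result):

```javascript
const re = /^(?!.*([A-Za-z0-9])\1{2})(?=.*[a-z])(?=.*\d)[A-Za-z0-9]+$/;

// Sample inputs and the result each should produce.
const cases = [
  ["ab12", true],   // letters and digits, no triple
  ["aab12", true],  // a pair like "aa" is allowed
  ["aaab1", false], // "aaa" fails the negative lookahead
  ["a111", false],  // "111" fails the negative lookahead
  ["AB12", false],  // no lowercase letter
  ["abc", false],   // no digit
  ["ab 12", false], // space is not in [A-Za-z0-9]
];
for (const [input, expected] of cases) {
  console.log(JSON.stringify(input), re.test(input) === expected ? "ok" : "unexpected");
}
```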

(Partial solution) For matching the same character repeated 3 or more times consecutively, try:
([a-zA-Z0-9])\1{2,}
Sample matches: AABBAA (no matches), AABBBAAA (matches BBB and AAA), ABABABABABABABA (no matches), ABCCCCCCCCCC (matches CCCCCCCCCC).

Does this one work for you?
/(\b(?:([A-Za-z0-9])(?!\2{2}))+\b)/
Try it out:
var regex = /(\b(?:([A-Za-z0-9])(?!\2{2}))+\b)/;
var tests = ['A1D3E', 'AAAA', 'AABAA', 'abccddeeff', 'abbbc', '1234'];
for (var test of tests) {
  console.log(test + ' - ' + regex.test(test));
}
Will output:
A1D3E - true
AAAA - false
AABAA - true
abccddeeff - true
abbbc - false
1234 - true

You may do this in 2 regexes:
/^(?=.*[a-z])(?=.*[0-9])[a-z0-9]+$/i This will assure that there is at least 1 digit and 1 letter while accepting only letters and digits (no space or special characters)
/([a-z0-9])\1{2,}/i If this one matches, then some character is repeated three or more times in a row, which means validation should fail.
Explanation:
First regex:
^ : match start of line
(?=.*[a-z]) : check if there is at least one letter
(?=.*[0-9]) : check if there is at least one digit
[a-z0-9]+ : if the checks were true, then match only digits/letters one or more times
$ : match end of line
i : modifier, match case insensitive
Second regex:
([a-z0-9]) : match and group a digit or a letter
\1{2,} : match group 1 two or more times
i : modifier, match case insensitive
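Combined into one check, the two regexes might be used like this; isValid is a hypothetical name. Note that with the i flag, the letter lookahead also accepts an uppercase-only string, which may or may not fit the original "require one of a-z" rule:

```javascript
// First regex: only letters/digits, at least one letter and one digit.
const allowed = /^(?=.*[a-z])(?=.*[0-9])[a-z0-9]+$/i;
// Second regex: some character repeated three or more times in a row.
const tripleRepeat = /([a-z0-9])\1{2,}/i;

// Hypothetical helper: valid only if the first matches and the second does not.
function isValid(s) {
  return allowed.test(s) && !tripleRepeat.test(s);
}

for (const s of ["ab12", "AB12", "aaab12", "abc", "ab 12"]) {
  console.log(s, isValid(s)); // true, true, false, false, false
}
```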

In response to a clarification, it seems that a single regular expression isn't strictly required. In that case, I suggest you use several regular expressions or functions. My guess is that performance isn't a requirement, since these sorts of checks are usually done in response to user input. User input validation can take 100ms and still appear to be instant, and you can run a lot of code in 100ms.
For example, I personally would do a check for each of your conditions in a separate test. First, check for spaces. Second, check for at least one letter. Next, check for at least one number. Finally, look for any spans of three or more repeated characters.
Your code will be much easier to understand, and it will be much easier to modify the rules later (which, experience has shown, is almost certainly going to happen).
For example:
function do_validation(string) {
  return (has_no_space(string) &&
    has_no_special_char(string) &&
    has_alpha(string) &&
    has_digit(string) &&
    !has_repeating(string));
}
I personally consider the above to be orders of magnitude easier to read than one complex regular expression. Plus, adding or removing a rule doesn't make you have to reimplement a complex regular expression (and thus, be required to re-test all possible combinations).
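The helper functions in do_validation are only named in the answer. One possible set of implementations, written here as assumptions based on the requirements in the question (a lowercase letter, a digit, letters/digits only, no character three times in a row):

```javascript
// Assumed implementations; the answer only names these functions.
function has_no_space(s) { return !/\s/.test(s); }
function has_no_special_char(s) { return /^[A-Za-z0-9]*$/.test(s); }
function has_alpha(s) { return /[a-z]/.test(s); }  // "require one of a-z"
function has_digit(s) { return /[0-9]/.test(s); }
function has_repeating(s) { return /([A-Za-z0-9])\1\1/.test(s); } // 3+ in a row

function do_validation(string) {
  return (has_no_space(string) &&
    has_no_special_char(string) &&
    has_alpha(string) &&
    has_digit(string) &&
    !has_repeating(string));
}

console.log(do_validation("ab12"));  // true
console.log(do_validation("abbb1")); // false (bbb)
console.log(do_validation("AB12"));  // false (no lowercase letter)
```

Each rule stays a one-line function, so adding or dropping a requirement means editing a single line rather than re-deriving a combined pattern.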

JavaScript RegExp to match a (partial) hour

I want to allow people to enter times into a textbox in various formats. One of the formats would be either:
2h for 2 hours, or
2.5h for 2 and a half hours
I want to use a regex to recognise the pattern but it's not picking it up for some reason:
I have:
var hourRegex = /^\d{1,2}[\.\d+]?[h|H]$/;
which works for 2h, but not for 2.5h.
I thought that this regex would mean: start at the beginning of the string, have one or two digits, then have zero or one decimal point, which if present must be followed by one or more digits, then have an h or an H, and then it must be the end of the string.
I have tried a regex testing tool, but had no luck.

/^\d{1,2}(?:\.\d+)?h$/i; Use parentheses instead of square brackets.
Start at the beginning
One or two digits
Optional: a dot followed by at least one digit
End with a h
Case insensitive
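Checking the corrected pattern against the two formats from the question, plus a few inputs it should reject (the rejected samples are illustrative):

```javascript
const hourRegex = /^\d{1,2}(?:\.\d+)?h$/i;

for (const s of ["2h", "2.5h", "2.5H", "12h"]) console.log(s, hourRegex.test(s)); // all true
for (const s of ["2.h", "123h", "2.5", "h"]) console.log(s, hourRegex.test(s));   // all false
```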
RegExp tutorial
[...] - square brackets mean: match one character from the provided set or range.
[^...] means: Match a character which is not within the provided range
(...) - parentheses mean: Group me. Optionally, the first characters of a group can start with:
?: - Don't reference me (me, I = group)
?= - Don't include me in the match, though I have to be here
?! - I may not show up at this point
{a,b}, {a,} mean: at least a, at most b repetitions. Omitting b means no upper limit
+ means: at least once, as many times as possible; equivalent to {1,}
* means: zero or more times, as many times as possible; equivalent to {0,}
+? and *? have the same effect as described above, with one difference: they match as few times as possible
Examples
[a-z] One character, any character between a, b, c, ..., z
(a-z) Match "a-z", and group it
[^0-9] Match any non-number character
See also
MDN: Regular Expressions - A more detailed guide

The trouble is here :
[\.\d+]
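inside square brackets, \d+ does not mean "one or more digits": a character class matches exactly one character, so [\.\d+] means "one ., digit, or + character", and the + is taken literally.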
Use this instead:
(\.[0-9]+)?

You've confused your square brackets with your parentheses. Square brackets look for a single match of any contained character, whereas parentheses look for a match of the entire enclosed pattern.
Your issue lies in [\.\d+]?: it's looking for a single ., digit (0-9), or +.
Instead you should try:
/^\d{1,2}(\.\d+)?(h|H)$/
That will still allow users to enter invalid numbers, such as 99.3, which is probably not the expected behavior.
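To see what the original character class actually accepts, compare the question's regex with this answer's on a few inputs (chosen for illustration). Note that [h|H] also admits a literal |:

```javascript
const original = /^\d{1,2}[\.\d+]?[h|H]$/; // from the question
const fixed = /^\d{1,2}(\.\d+)?(h|H)$/;    // from this answer

for (const s of ["2h", "2.5h", "2.h", "2+h", "255h", "2|"]) {
  console.log(s, original.test(s), fixed.test(s));
}
// 2h true true
// 2.5h false true
// 2.h true false
// 2+h true false
// 255h true false
// 2| true false
```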
