I have a standard expression that is not working correctly.
This expression is supposed to catch if a string has invalid characters anywhere in the string. It works perfect on RegExr.com but not in my tests.
The exp is: /[a-zA-Z0-9'.\-]/g
It is failing on : ####
but passing with : aa####
It should fail both times, what am I doing wrong?
Also, /^[a-zA-Z0-9'.\-]$/g matches nothing...
//All Boxs
$('input[type="text"]').each(function () {
var text = $(this).prop("value")
var textTest = /[a-zA-Z0-9'.\-]/g.test(text)
if (!textTest && text != "") {
allFieldsValid = false
$(this).css("background-color", "rgba(224, 0, 0, 0.29)")
alert("Invalid characters found in " + text + " \n\n Valid characters are:\n A-Z a-z 0-9 ' . -")
}
else {
$(this).css("background-color", "#FFFFFF")
$(this).prop("value", text)
}
});
edit:added code
UPDATE AFTER QUESTION RE-TAGGING
You need to use
var textTest = /^[a-zA-Z0-9'.-]+$/.test(text)
^^
Note the absence of /g modifier and the + quantifier. There are known issues when you use /g global modifier within a regex used in RegExp#test() function.
You may shorten it a bit with the help of the /i case insensitive modifier:
var textTest = /^[A-Z0-9'.-]+$/i.test(text)
Also, as I mention below, you do not have to escape the - at the end of the character class [...], but it is advisable to keep escaped if the pattern will be modified later by less regex-savvy developers.
ORIGINAL C#-RELATED DETAILS
Ok, say, you are using Regex.IsMatch(str, #"[a-zA-Z0-9'.-]"). The Regex.IsMatch searches for partial matches inside a string. So, if the input string contains an ASCII letter, digit, ', . or -, this will pass. Thus, it is logical that aa#### passes this test, and #### does not.
If you use the second one as Regex.IsMatch(str, #"^[a-zA-Z0-9'.-]$"), only 1 character strings (with an optional newline at the end) would get matched as ^ matches at the start of the string, [a-zA-Z0-9'.-] matches 1 character from the specified ranges/sets, and $ matches the end of the string (or right before the final newline).
So, you need a quantifier (+ to match 1 or more, or * to match zero or more occurrences) and the anchors \A and \z:
Regex.IsMatch(str, #"\A[a-zA-Z0-9'.-]+\z")
^^ ^^^
\A matches the start of string (always) and \z matches the very end of the string in .NET. The [a-zA-Z0-9'.-]+ will match 1+ characters that are either ASCII letters, digits, ', . or -.
Note that - at the end of the character class does not have to be escaped (but you may keep the \- if some other developers will have to modify the pattern later).
And please be careful where you test your regexps. Regexr only supports JavaScript regex syntax. To test .NET regexps, use RegexStorm.net or RegexHero.
/^[a-zA-Z0-9'.-]+$/g
In the second case your (/[a-zA-Z0-9'.-]/g) was working because it matched on the first letter, so to make it correct you need to match the whole string (use ^ and $) and also allow more letters by adding a + or * (if you allow empty string).
Try this regex it matches any char which isn't part of the allowed charset
/[^a-zA-Z0-9'.\-]+/g
Test
>>regex = /[^a-zA-Z0-9'.\-]+/g
/[^a-zA-Z0-9'.\-]+/g
>>regex.test( "####dsfdfjsakldfj")
true
>>regex.test( "dsfdfjsakldfj")
false
Related
I have a Javascript regex like this:
/^[a-zA-Z0-9 !##$%^&*()-_-~.+,/\" ]+$/
which allows following conditions:
only alphabets allowed
only numeric allowed
combination of alphabets and numeric allowed
combination of alphabets, numeric and special characters are allowed
I want to modify above regex to cover two more cases as below:
only special characters are not allowed
string should not start with special characters
so basicaly my requirement is:
string = 'abc' -> Correct
string = '123' -> Correct
string = 'abc123' ->Correct
string = 'abc123!##' ->Correct
string = 'abc!##123' -> Correct
string = '123!##abc' -> Correct
string = '!##' -> Wrong
string = '!##abc' -> Wrong
string = '!##123' -> Wrong
string = '!##abc123' -> Wrong
can someone please help me with this?
You can require at least one alphanumeric:
/^(?=[^a-zA-Z0-9]*[a-zA-Z0-9])[a-zA-Z0-9 !##$%^&*()_~.+,/\" -]+$/
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Also, I think you wanted to match a literal -, so need to repeat it, just escape, change -_- to \-_, or - better - move to the end of the character class.
The (?=[^a-zA-Z0-9]*[a-zA-Z0-9]) pattern is a positive character class that requires an ASCII letter of digit after any zero or more chars other than ASCII letters or digits, immediately to the right of the current location, here, from the start of string.
Just add [a-zA-Z0-9] to the beginning of the regex:
/^[a-zA-Z0-9][a-zA-Z0-9 \^\\\-!##$%&*()_~.+,/'"]+$/gm
Note, if within a class (ie [...]) that there are four special characters that must be escaped by prefixing a backward slash (\) to it so that it is interpreted as it's literal meaning:
// If within a class (ie [...])
^ \ - ]
// If not within a class
\ ^ $ . * + ? ( ) [ ] { } |
RegEx101
If all the special characters are allowed and you consider that underscore _ is also a special character then you can always simplify your RegEx like this :
/^[^\W_].+$/
Check here for your given examples on Regex101
I am building search and I am going to use javascript autocomplete with it. I am from Finland (finnish language) so I have to deal with some special characters like ä, ö and å
When user types text in to the search input field I try to match the text to data.
Here is simple example that is not working correctly if user types for example "ää". Same thing with "äl"
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not work
var searchterm = "äl";
// does not work
//var searchterm = "ää";
// Works
//var searchterm = "wi";
if ( new RegExp("\\b"+searchterm, "gi").test(title) ) {
$("#result").html("Match: ("+searchterm+"): "+title);
} else {
$("#result").html("nothing found with term: "+searchterm);
}
http://jsfiddle.net/7TsxB/
So how can I get those ä,ö and å characters to work with javascript regex?
I think I should use unicode codes but how should I do that? Codes for those characters are:
[\u00C4,\u00E4,\u00C5,\u00E5,\u00D6,\u00F6]
=> äÄåÅöÖ
There appears to be a problem with Regex and the word boundary \b matching the beginning of a string with a starting character out of the normal 256 byte range.
Instead of using \b, try using (?:^|\\s)
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not work
var searchterm = "äl";
// does not work
//var searchterm = "ää";
// Works
//var searchterm = "wi";
if ( new RegExp("(?:^|\\s)"+searchterm, "gi").test(title) ) {
$("#result").html("Match: ("+searchterm+"): "+title);
} else {
$("#result").html("nothing found with term: "+searchterm);
}
Breakdown:
(?: parenthesis () form a capture group in Regex. Parenthesis started with a question mark and colon ?: form a non-capturing group. They just group the terms together
^ the caret symbol matches the beginning of a string
| the bar is the "or" operator.
\s matches whitespace (appears as \\s in the string because we have to escape the backslash)
) closes the group
So instead of using \b, which matches word boundaries and doesn't work for unicode characters, we use a non-capturing group which matches the beginning of a string OR whitespace.
The \b character class in JavaScript RegEx is really only useful with simple ASCII encoding. \b is a shortcut code for the boundary between \w and \W sets or \w and the beginning or end of the string. These character sets only take into account ASCII "word" characters, where \w is equal to [a-zA-Z0-9_] and \W is the negation of that class.
This makes the RegEx character classes largely useless for dealing with any real language.
\s should work for what you want to do, provided that search terms are only delimited by whitespace.
this question is old, but I think I found a better solution for boundary in regular expressions with unicode letters.
Using XRegExp library you can implement a valid \b boundary expanding this
XRegExp('(?=^|$|[^\\p{L}])')
the result is a 4000+ char long, but it seems to work quite performing.
Some explanation: (?= ) is a zero-length lookahead that looks for a begin or end boundary or a non-letter unicode character. The most important think is the lookahead, because the \b doesn't capture anything: it is simply true or false.
\b is a shortcut for the transition between a letter and a non-letter character, or vice-versa.
Updating and improving on max_masseti's answer:
With the introduction of the /u modifier for RegExs in ES2018, you can now use \p{L} to represent any unicode letter, and \P{L} (notice the uppercase P) to represent anything but.
EDIT: Previous version was incomplete.
As such:
const text = 'A Fé, o Império, e as terras viciosas';
text.split(/(?<=\p{L})(?=\P{L})|(?<=\P{L})(?=\p{L})/);
// ['A', ' Fé', ',', ' o', ' Império', ',', ' e', ' as', ' terras', ' viciosas']
We're using a lookbehind (?<=...) to find a letter and a lookahead (?=...) to find a non-letter, or vice versa.
I would recommend you to use XRegExp when you have to work with a specific set of characters from Unicode, the author of this library mapped all kind of regional sets of characters making the work with different languages easier.
Despite the fact the issue seems to be 8 years old, I run into a similar problem (I had to match Cyrillic letters) not so far ago. I spend a whole day on this and could not find any appropriate answer here on StackOverflow. So, to avoid others making lots of effort, I'd like to share my solution.
Yes, \b word boundary works only with Latin letters (Word boundary: \b):
Word boundary \b doesn’t work for non-Latin alphabets
The word boundary test \b checks that there should be \w on the one side from the position and "not \w" – on the other side.
But \w means a Latin letter a-z (or a digit or an underscore), so the test doesn’t work for other characters, e.g. Cyrillic letters or hieroglyphs.
Yes, JavaScript RegExp implementation hardly supports UTF-8 encoding.
So, I tried implementing own word boundary feature with the support of non-Latin characters. To make word boundary work just with Cyrillic characters I created such regular expression:
new RegExp(`(?<![\u0400-\u04ff])${cyrillicSearchValue}(?![\u0400-\u04ff])`,'gi')
Where \u0400-\u04ff is a range of Cyrillic characters provided in the table of codes. It is not an ideal solution, however, it works properly in most cases.
To make it work in your case, you just have to pick up an appropriate range of codes from the list of Unicode characters.
To try out my example run the code snippet below.
function getMatchExpression(cyrillicSearchValue) {
return new RegExp(
`(?<![\u0400-\u04ff])${cyrillicSearchValue}(?![\u0400-\u04ff])`,
'gi',
);
}
const sentence = 'Будь-який текст кирилицею, де необхідно знайти слово з контексту';
console.log(sentence.match(getMatchExpression('текст')));
// expected output: ["текст"]
console.log(sentence.match(getMatchExpression('но')));
// expected output: null
I noticed something really weird with \b when using Unicode:
/\bo/.test("pop"); // false (obviously)
/\bä/.test("päp"); // true (what..?)
/\Bo/.test("pop"); // true
/\Bä/.test("päp"); // false (what..?)
It appears that meaning of \b and \B are reversed, but only when used with non-ASCII Unicode? There might be something deeper going on here, but I'm not sure what it is.
In any case, it seems that the word boundary is the issue, not the Unicode characters themselves. Perhaps you should just replace \b with (^|[\s\\/-_&]), as that seems to work correctly. (Make your list of symbols more comprehensive than mine, though.)
My idea is to search with codes representing the Finnish letters
new RegExp("\\b"+asciiOnly(searchterm), "gi").test(asciiOnly(title))
My original idea was to use plain encodeURI but the % sign seemed to interfere with the regexp.
http://jsfiddle.net/7TsxB/5/
I wrote a crude function using encodeURI to encode every character with code over 128 but removing its % and adding 'QQ' in the beginning. It is not the best marker but I couldn't get non alphanumeric to work.
What you are looking for is the Unicode word boundaries standard:
http://unicode.org/reports/tr29/tr29-9.html#Word_Boundaries
There is a JavaScript implementation here (unciodejs.wordbreak.js)
https://github.com/wikimedia/unicodejs
I had a similar problem, where I was trying to replace all of a particular unicode word with a different unicode word, and I cannot use lookbehind because it's not supported in the JS engine this code will be used in. I ultimately resolved it like this:
const needle = "КАРТОПЛЯ";
const replace = "БАРАБОЛЯ";
const regex = new RegExp(
String.raw`(^|[^\n\p{L}])`
+ needle
+ String.raw`(?=$|\P{L})`,
"gimu",
);
const result = (
'КАРТОПЛЯ сдффКАРТОПЛЯдадф КАРТОПЛЯ КАРТОПЛЯ КАРТОПЛЯ??? !!!КАРТОПЛЯ ;!;!КАРТОПЛЯ/#?#?'
+ '\n\nКАРТОПЛЯ КАРТОПЛЯ - - -КАРТОПЛЯ--'
)
.replace(regex, function (match, ...args) {
return args[0] + replace;
});
console.log(result)
output:
БАРАБОЛЯ сдффКАРТОПЛЯдадф БАРАБОЛЯ БАРАБОЛЯ БАРАБОЛЯ??? !!!БАРАБОЛЯ ;!;!БАРАБОЛЯ/#?#?
БАРАБОЛЯ БАРАБОЛЯ - - -БАРАБОЛЯ--
Breaking it apart
The first regex: (^|[^\n\p{L}])
^| = Start of the line or
[^\n\p{L}] = Any character which is not a letter or a newline
The second regex: (?=$|\P{L})
?= = Lookahead
$| = End of the line or
\P{L} = Any character which is not a letter
The first regex captures the group and is then used via args[0] to put it back into the string during replacement, thereby avoiding a lookbehind. The second regex utilized lookahead.
Note that the second one MUST be a lookahead because if we capture it then overlapping regex matches will not trigger (e.g. КАРТОПЛЯ КАРТОПЛЯ КАРТОПЛЯ would only match on the 1st and 3rd ones).
Trying to find text "myTest":
/(?<![\p{L}\p{N}_])myTest(?![\p{L}\p{N}_])/gu
Similar to NetBeans or Notepad++ form. Trying to find the expression without any letter or number or underscore (like \w characters of word boundary \b) in any unicode characters of letter and number before or after the expression.
I have had a similar problem, but I had to replace an array of terms. All solutions, which I have found did not worked, if two terms were in the text next to each other (because their boundaries overlaped). So I had to use a little modified approach:
var text = "Ještě. že; \"už\" à. Fürs, 'anlässlich' že že že.";
var terms = ["à","anlässlich","Fürs","už","Ještě", "že"];
var replaced = [];
var order = 0;
for (i = 0; i < terms.length; i++) {
terms[i] = "(^\|[ \n\r\t.,;'\"\+!?-])(" + terms[i] + ")([ \n\r\t.,;'\"\+!?-]+\|$)";
}
var re = new RegExp(terms.join("|"), "");
while (true) {
var replacedString = "";
text = text.replace(re, function replacer(match){
var beginning = match.match("^[ \n\r\t.,;'\"\+!?-]+");
if (beginning == null) beginning = "";
var ending = match.match("[ \n\r\t.,;'\"\+!?-]+$");
if (ending == null) ending = "";
replacedString = match.replace(beginning,"");
replacedString = replacedString.replace(ending,"");
replaced.push(replacedString);
return beginning+"{{"+order+"}}"+ending;
});
if (replacedString == "") break;
order += 1;
}
See the code in a fiddle: http://jsfiddle.net/antoninslejska/bvbLpdos/1/
The regular expression is inspired by: http://breakthebit.org/post/3446894238/word-boundaries-in-javascripts-regular
I can't say, that I find the solution elegant...
The correct answer to the question is given by andrefs.
I will only rewrite it more clearly, after putting all required things together.
For ASCII text, you can use \b for matching a word boundary both at the start and the end of a pattern. When using Unicode text, you need to use 2 different patterns for doing the same:
Use (?<=^|\P{L}) for matching the start or a word boundary before the main pattern.
Use (?=\P{L}|$) for matching the end or a word boundary after the main pattern.
Additionally, use (?i) in the beginning of everything, to make all those matchings case-insensitive.
So the resulting answer is: (?i)(?<=^|\P{L})xxx(?=\P{L}|$), where xxx is your main pattern. This would be the equivalent of (?i)\bxxx\b for ASCII text.
For your code to work, you now need to do the following:
Assign to your variable "searchterm", the pattern or words you want to find.
Escape the variable's contents. For example, replace '\' with '\\' and also do the same for any reserved special character of regex, like '\^', '\$', '\/', etc. Check here for a question on how to do this.
Insert the variable's contents to the pattern above, in the place of "xxx", by simply using the string.replace() method.
bad but working:
var text = " аб аб АБ абвг ";
var ttt = "(аб)"
var p = "(^|$|[^A-Za-zА-Я-а-я0-9()])"; // add other word boundary symbols here
var exp = new RegExp(p+ttt+p,"gi");
text = text.replace(exp, "$1($2)$3").replace(exp, "$1($2)$3");
const t1 = performance.now();
console.log(text);
result (without qutes):
" (аб) (аб) (АБ) абвг "
I struggled hard on this. Working with French accented characters, and I managed to find this solution :
const myString = "MyString";
const regex = new RegExp(
"(?:[^À-ú]|^)\\b(" + myString + ")\\b(?:[^À-ú]|$)",
"ig"
);
What id does :
It keeps checking word-boundaries with \b before and after "MyString".
In addition to that, (?:[^À-ú]|^) and (?:[^À-ú]|$) will check if MyString is not surrounded by any accented characters
It will not work with cyrillic but it may be possible to find the range of cirillic charactes and edit [^À-ú] in consequence.
Warning, it captures only the group (MyString) but the total match contains previous and next characters
See example : https://regex101.com/r/5P0ZIe/1
Match examples :
MyString
match : "MyString"
group 1 : "MyString"
Lorem ipsum. MyString dolor sit amet
match : " MyString "
group 1 : "MyString"
(MyString)
match : "(MyString)"
group 1 : "MyString"
BetweenCharactersMyStringIsNotFound
match : Nothing
group 1 : Nothing
éMyStringé
match : Nothing
group 1 : Nothing
ùMyString
match : Nothing
group 1 : Nothing
MyStringÖ
match : Nothing
group 1 : Nothing
I want to replace all line breaks but only if they're not preceded by these two characters {] (both, not one of them) using JavaScript. The following expression seems to do the job but it breaks other regex results so something must be wrong:
/[^\{\]]\n/g
What am I doing wrong?
Do you need to be able to strip out \n, \r\n, or both?
This should do the job:
/(^|^.|[^{].|.[^\]])\r?\n/gm
And would require that you place $1 at the beginning of your replacement string.
To answer your question about why /[^\{\]]\n/ is wrong, this regex equates to: "match any character that is neither { nor ]", followed by \n, so this incorrectly fail to match the following:
here's a square]\n
see the following{\n
You're also missing the g flag at the end, but you may have noticed that.
When you're using [^\{\]] you're using a character range: this stands for "any character which is not \{ or \]. Meaning the match will fail on {\n or }\n.
If you want to negate a pattern longer than one character you need a negative look-ahead:
/^(?!.*{]\n)([^\n]*)\n/mg
^(?! # from the beginning of the line (thanks to the m flag)
.*{]\n # negative lookahead condition: the line doesn't end with {]\n
)
([^\n]*) # select, in capturing group 1, everything up to the line break
\n
And replace it with
$1 + replacement_for_\n
What we do is check line by line that our line doesn't hold the unwanted pattern.
If it doesn't, we select everything up to the ending \n in capturing group 1, and we replace the whole line with that everything, followed by what you want to replace \n with.
Demo: http://regex101.com/r/nM2xE1
Look behind is not supported, you could emulate it this way
stringWhereToReplaceNewlines.replace(/(.{0,2})\n/g, function(_, behind) {
return (behind || "") + ((behind === '{]') ? "\n" : "NEWLINE_REPLACE")
})
The callback is called for every "\n" with the 2 preceding characters as the second parameter. The callback must return the string replacing the "\n" and the 2 characters before. If the 2 preceding characters are "{]" then the new line should not be replaced so we return exactly the same string matched, otherwise we return the 2 preceding characters (possibly empty) and the thing that should replace the newline
My solution would be this:
([^{].|.[^\]])\n
Your replacement string should be $1<replacement>
Since JavaScript doesn't support lookbehind, we have to make do with lookahead. This is how the regex works:
Anything but { and then anything - [^{].
Or anything and then anything but ] - .[^\]]
Put simply, [^{].|.[^\]] matches everything that .. matches except for {]
Finally a \n
The two chars before the \n are captured, so you can reinsert them into the replacement string using $1.
I just finished all the "Easy" CoderByte challenges and am now going back to see if there are more efficient ways to answer the questions. I am trying to come up with a regular expression for "SimpleSymbol".
(Have the function SimpleSymbols(str) take the str parameter being passed and determine if it is an acceptable sequence by either
returning the string true or false. The str parameter will be composed
of + and = symbols with several letters between them (ie.
++d+===+c++==a) and for the string to be true each letter must be surrounded by a + symbol. So the string to the left would be false.
The string will not be empty and will have at least one letter.)
I originally answered the question by traversing through the whole string, when a letter is found, testing on either side to see if a "+" exists. I thought it would be easier if I could just test the string with a regular expression like,
str.match(/\+[a-zA-Z]\+/g)
This doesn't quite work. I am trying to see if the match will ONLY return true if the condition is met on ALL of the characters in the string. For instance the method will return true on the string, "++d+===+c++==a", due to '+d+' and '+c+'. However, based on the original question it should return false because of the 'a' and no surrounding '+'s.
Any ideas?
EDIT: #Marc brought up a very good point. The easiest way to do this is to search for violations using
[^+][a-zA-Z]|[a-zA-Z][^+]
or something like it. This will find all violations of the rule -- times when a letter appears next to something other than a +. If this matches, then you can return false, knowing that there exist violations. Else, return true.
Original answer:
Here's a regex -- I explain it below. Remember that you have to escape +, because it is a special character!
^([^a-zA-Z+]|(\+[a-zA-Z]\+)|[+])*$
^ // start of the string
[^a-zA-Z+] // any character except a letter or a + (1)
| // or
(\+[a-zA-Z]\+) // + (letter) + (2)
| //or
[+] // plus (3)
)*$ // repeat that pattern 0 or more times, end
The logic behind this is: skip all characters that aren't relevant in your string. (1)
If we have a + (letter) +, that's fine. capture that. (2)
if we have a + all by itself, that's fine too. (3)
A letter without surrounding + will fail.
The problem is that + is a special character in regular expressions. It's a quantifier that means 'one or more of the previous item'. You can represent a literal + character by escaping it, like this:
str.match(/\+[a-zA-Z]\+/g)
However, this will return true if any set of characters is found in the string matching that pattern. If you want to ensure that there are no other characters in the string which do not match that pattern you could do something like this:
str.match(/^([=+]*\+[a-zA-Z](?=\+))+[=+]*$/)
This will match, any number of = or + characters, followed by a literal + followed by a Latin letter, followed by a literal +, all of which may be repeated one or more times, followed by any number of = or + characters. The ^ at the beginning and the $ at the end matches the start and end of the input string, respectively. This ensures that no other characters are allowed. The (?=\+) is a look-ahead assertion, meaning that the next character must be a literal +, but is not considered part of the group, this means it can be rematched as the leading + in the next match (e.g. +a+b+).
Interesting problem!
String requirements:
String must be composed of only +, = and [A-Za-z] alpha chars.
Each and every alpha char must be preceded by a + and followed by a +.
There must be at least one alpha char.
Valid strings:
"+A+"
"++A++"
"=+A+="
"+A+A+"
"+A++A+"
Invalid strings:
+=+= # Must have at least one alpha.
+A+&+A+ # Non-valid char.
"+A" # Alpha not followed by +.
"A+" # Alpha not preceded by +.
Solution:
^[+=]*(?:\+[A-Z](?=\+)[+=]*)+$ (with i ignorecase option set)
Here's how I'd do it: (First in a tested python script with fully commented regex)
import re
def isValidSpecial(text):
if re.search(r"""
# Validate special exercise problem string format.
^ # Anchor to start of string.
[+=]* # Zero or more non-alphas {normal*).
(?: # Begin {(special normal*)+} construct.
\+[A-Z] # Alpha but only if preceded by +
(?=\+) # and followed by + {special} .
[+=]* # More non-alphas {normal*).
)+ # At least one alpha required.
$ # Anchor to end of string.
""", text, re.IGNORECASE | re.VERBOSE):
return "true"
else:
return "false"
print(isValidSpecial("+A+A+"))
Now here is the same solution in JavaScript syntax:
function isValidSpecial(text) {
var re = /^[+=]*(?:\+[A-Z](?=\+)[+=]*)+$/i;
return (re.test(text)) ? "true" : "false";
}
/^[=+]*\+[a-z](?=\+)(?:\+[a-z](?=\+)|[+=]+)*$/i.test(str)
pattern details:
^ # anchor, start of the string
[=+]* # 0 or more = and +
\+ [a-z] (?=\+) # imposed surrounded letter (at least one letter condition)
(?: # non capturing group with:
\+[a-z](?=\+) # surrounded letter
| # OR
[=+]+ # + or = characters (one or more)
)* # repeat the group 0 or more times
$ # anchor, end of the string
To allow consecutive letters like +a+a+a+a+a+, I use a lookahead assertion to check there is a + symbol after the letter without match it. (Thanks to ridgrunner for his comment)
Example:
var str= Array('+==+u+==+a', '++=++a+', '+=+=', '+a+-', '+a+a+');
for (var i=0; i<5; i++) {
console.log(/^[=+]*\+[a-z](?=\+)(?:\+[a-z](?=\+)|[+=]+)*$/i.test(str[i]));
}
I need to validate a string that can have any number of characters, a comma, and then 2 characters. I'm having some issues. Here's what I have:
var str="ab,cdf";
var patt1=new RegExp("[A-z]{2,}[,][A-z]{2}");
if(patt1.test(str)) {
alert("true");
}
else {
alert("false");
}
I would expect this to return false, as I have the {2} limit on characters after the comma and this string has three characters. When I run the fiddle, though, it returns true. My (admittedly limited) understanding of RegExp indicates that {2,} is at least 2, and {2} is exactly two, so I'm not sure why three characters after the comma are still returning true.
I also need to be able to ignore a possible whitespace between the comma and the remaining two characters. (In other words, I want it to return true if they have 2+ characters before the comma and two after it - the two after it not including any whitespace that the user may have entered.)
So all of these should return true:
var str = "ab, cd";
var str = "abc, cd";
var str = "ab,cd";
var str = "abc,dc";
I've tried adding the \S indicator after the comma like this:
var patt1=new RegExp("[A-z]{2,}[,]\S[A-z]{2}");
But then the string returns false all the time, even when I have it set to ab, cd, which should return true.
What am I missing?
{2,} is at least 2, and {2} is exactly two, so I'm not sure why three characters after the comma are still returning true.
That's correct. What you forgot is to anchor your expression to string start and end - otherwise it returns true when it occurs somewhere in the string.
not including any whitespace: I've tried adding the \S indicator after the comma
That's the exact opposite. \s matches whitespace characters, \S matches all non-whitespace characters. Also, you probably want some optional repetition of the whitespace, instead of requiring exact one.
[A-z]
Notice that this character range also includes the characters between Z and a, namely []^_`. You will probably want [A-Za-z] instead, or use [a-z] and make your regex case-insensitive.
Combined, this is what your regex should look like (using a regular expression literal instead of the RegExp constructor with a string literal):
var patt1 = /^[a-z]{2,},\s*[a-z]{2}$/i;
You are missing ^,$.Also the range should be [a-zA-Z] not [A-z]
Your regex should be
^[a-zA-Z]{2,}[,]\s*[A-Za-z]{2}$
^ would match string from the beginning...
$ would match string till end.
Without $,^ it would match anywhere in between the string
\s* would match 0 to many space..