RegExp issues - character limit and whitespace ignoring - javascript

I need to validate a string that can have any number of characters, a comma, and then 2 characters. I'm having some issues. Here's what I have:
var str="ab,cdf";
var patt1=new RegExp("[A-z]{2,}[,][A-z]{2}");
if(patt1.test(str)) {
alert("true");
}
else {
alert("false");
}
I would expect this to return false, as I have the {2} limit on characters after the comma and this string has three characters. When I run the fiddle, though, it returns true. My (admittedly limited) understanding of RegExp indicates that {2,} is at least 2, and {2} is exactly two, so I'm not sure why three characters after the comma are still returning true.
I also need to be able to ignore a possible whitespace between the comma and the remaining two characters. (In other words, I want it to return true if they have 2+ characters before the comma and two after it - the two after it not including any whitespace that the user may have entered.)
So all of these should return true:
var str = "ab, cd";
var str = "abc, cd";
var str = "ab,cd";
var str = "abc,dc";
I've tried adding the \S indicator after the comma like this:
var patt1=new RegExp("[A-z]{2,}[,]\S[A-z]{2}");
But then the string returns false all the time, even when I have it set to ab, cd, which should return true.
What am I missing?

{2,} is at least 2, and {2} is exactly two, so I'm not sure why three characters after the comma are still returning true.
That's correct. What you forgot is to anchor your expression to string start and end - otherwise it returns true when it occurs somewhere in the string.
not including any whitespace: I've tried adding the \S indicator after the comma
That's the exact opposite. \s matches whitespace characters, \S matches all non-whitespace characters. Also, you probably want some optional repetition of the whitespace, instead of requiring exactly one.
[A-z]
Notice that this character range also includes the characters between Z and a, namely [\]^_`. You will probably want [A-Za-z] instead, or use [a-z] and make your regex case-insensitive.
Combined, this is what your regex should look like (using a regular expression literal instead of the RegExp constructor with a string literal):
var patt1 = /^[a-z]{2,},\s*[a-z]{2}$/i;
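As a quick check, the combined pattern can be run against the strings from the question. This is only a minimal sketch; the samples table and its expected values are assembled here from the question's description and are not part of the original answer.

```javascript
// Anchored, case-insensitive pattern from the answer above.
const patt1 = /^[a-z]{2,},\s*[a-z]{2}$/i;

// Inputs from the question, paired with the result the asker wants
// (this table is an illustrative assumption).
const samples = [
  ["ab, cd", true],
  ["abc, cd", true],
  ["ab,cd", true],
  ["abc,dc", true],
  ["ab,cdf", false], // three characters after the comma
];

for (const [str, expected] of samples) {
  console.log(JSON.stringify(str), patt1.test(str), "expected:", expected);
}
```

Because the pattern has no g flag, calling test() repeatedly on the same regex object is safe here.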

You are missing the anchors ^ and $. Also, the range should be [a-zA-Z], not [A-z].
Your regex should be
^[a-zA-Z]{2,}[,]\s*[A-Za-z]{2}$
^ anchors the match at the beginning of the string.
$ anchors the match at the end of the string.
Without ^ and $, the pattern can match anywhere inside the string.
\s* matches zero or more whitespace characters.
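The effect of the anchors can be seen side by side in a small sketch. It also shows one extra point that is easy to trip over when the pattern is passed to the RegExp constructor as a string, as in the question: backslashes have to be doubled there. The variable names are illustrative only.

```javascript
// Unanchored: succeeds if the pattern occurs anywhere inside the string.
const unanchored = /[a-zA-Z]{2,},\s*[a-zA-Z]{2}/;
// Anchored: the whole string has to fit the pattern.
const anchored = /^[a-zA-Z]{2,},\s*[a-zA-Z]{2}$/;

console.log(unanchored.test("ab,cdf")); // true ("ab,cd" is found inside)
console.log(anchored.test("ab,cdf"));   // false

// With the RegExp constructor the pattern is a string literal, so each
// backslash must be doubled; "\s" in a plain string is just the letter "s".
const fromString = new RegExp("^[a-zA-Z]{2,},\\s*[a-zA-Z]{2}$");
console.log(fromString.source === anchored.source); // true
console.log("\s" === "s"); // true
```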

Related

How to convert a camelcased string to sentence cased without excluding any special characters?

Suggest a regex for converting camelcased string with special characters and numbers to sentence case?:
const string = `includes:SummaryFromDetailHistory1990-AsAbstract`
Expected outcome:
Includes : Summary From Detail History 1990 - As Abstract
Currently I'm using lodash startCase to convert camelCased to sentenceCase. But the issue with this approach is that it is removing special characters like brackets, numbers, parenthesis, hyphens, colons, etc... (most of the special characters)
So the idea is to convert camelcased strings to sentence cased while preserve the string identity
For example:
const anotherString = `thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*`
const expectedReturn = `This Is A 100 Characters : Long Sample String Containing - Special Char(s) 10 & 20 *`
Is that possible with regex?
You'll have to deal with all the cases yourself:
[a-z](?=[A-Z]): lowercase followed by uppercase
[a-zA-Z](?=[0-9]): letter followed by digit
[0-9](?=[a-zA-Z]): digit followed by letter
[a-zA-Z0-9](?=[^a-zA-Z0-9]): letter or digit followed by neither letter nor digit (\w and \W could be used, but they cover _ too, so up to you)
[^a-zA-Z0-9](?=[a-zA-Z0-9]): neither letter nor digit, followed by either letter or digit
etc.
Then, you can or them together:
([a-z](?=[A-Z])|[a-zA-Z](?=[0-9])|[0-9](?=[a-zA-Z])|[a-zA-Z0-9](?=[^a-zA-Z0-9])|[^a-zA-Z0-9](?=[a-zA-Z0-9]))
And replace by:
$1 followed by a single space
(that is, the replacement string is "$1 ", with a space after $1).
See https://regex101.com/r/4AVbAs/1 for instance.
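In JavaScript, the alternation above could be applied roughly as follows. This is a sketch, not a definitive implementation: the helper name toSentenceCase and the final capitalisation of the first character are assumptions added here so that the result lines up with the expected outcome in the question.

```javascript
// Alternation from the answer above; the g flag replaces every match.
const boundaryChar = /([a-z](?=[A-Z])|[a-zA-Z](?=[0-9])|[0-9](?=[a-zA-Z])|[a-zA-Z0-9](?=[^a-zA-Z0-9])|[^a-zA-Z0-9](?=[a-zA-Z0-9]))/g;

// Hypothetical helper: insert the spaces, then capitalise the first character.
function toSentenceCase(input) {
  const spaced = input.replace(boundaryChar, "$1 ");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

console.log(toSentenceCase("includes:SummaryFromDetailHistory1990-AsAbstract"));
// Includes : Summary From Detail History 1990 - As Abstract
```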
You will hit edge cases though, e.g. Char(s), so you'll need special rules for the parens for instance (see the following section about lookbehinds that can help for that). A bit of a tough job, quite error prone too and hardly maintainable I'm afraid.
If lookbehinds were allowed, you would not need to capture the first char in each group, but wrap the left patterns in (?<=...) and replace by a simple space directly:
(?<=[a-z])(?=[A-Z]): preceded by lowercase, followed by uppercase
(?<=[a-zA-Z])(?=[0-9]): preceded by letter, followed by digit
(?<=[0-9])(?=[a-zA-Z]): preceded by digit, followed by letter
(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\)): preceded by letter or digit, followed by not letter nor digit, as well as not followed by (s) nor )
(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9]): preceded by not letter nor digit, as well as not preceded by (, followed by letter or digit
or-ed together:
(?<=[a-z])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])|(?<=[0-9])(?=[a-zA-Z])|(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\))|(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9])
Replace with a single space, see https://regex101.com/r/DB91DE/1.
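Lookbehind assertions are available in current JavaScript engines (they were added in ES2018), so the lookbehind variant can be sketched directly. The helper name and the capitalisation step are assumptions made for illustration; the pattern itself is the or-ed expression from the answer.

```javascript
// Zero-width boundaries from the answer above: nothing is captured,
// a space is simply inserted at every matching position.
const boundaryPos = /(?<=[a-z])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])|(?<=[0-9])(?=[a-zA-Z])|(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\))|(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9])/g;

// Hypothetical helper: insert the spaces, then capitalise the first character.
function toSentenceCaseLookbehind(input) {
  const spaced = input.replace(boundaryPos, " ");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

console.log(
  toSentenceCaseLookbehind("thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*")
);
// This Is A 100 Characters Long : Sample String Containing - Special Char(s) 10 & 20 *
```

The special handling only covers ( and (s); other bracket styles would need their own exceptions.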
The wanted result doesn't seem to be regular: some special characters are supposed to be preceded by a space and some are not. Treating the parentheses the way you want is a bit tricky. You can use a replacer function to handle the parentheses, like this:
let parenth = 0;
const str = `thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*`,
spaced = str.replace(/[A-Z]|\d+|\W/g, (m) => {
if (m === '(') {
parenth = 1;
return m;
}
if (parenth || m === ')') {
parenth = 0;
return m;
}
return ` ${m}`;
});
console.log(spaced);
If the data can contain other brackets, instead of just checking parentheses, use a RegExp to test for any opening bracket: if (/[({[]/.test(m)) ..., and test for closing brackets: if (/[)}\]]/.test(m)) ....
You can test the snippet with different data at jsFiddle.
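Putting the bracket suggestion together, the replacer could look roughly like this. It is a sketch under the assumption that brackets are never nested; the function name spaceOut and the second sample string are made up for illustration.

```javascript
// Hypothetical helper: same idea as the snippet above, generalised to (), {} and [].
function spaceOut(str) {
  let insideBracket = false;
  return str.replace(/[A-Z]|\d+|\W/g, (m) => {
    if (/[({[]/.test(m)) {
      // Opening bracket: keep it attached to the preceding text.
      insideBracket = true;
      return m;
    }
    if (insideBracket || /[)}\]]/.test(m)) {
      // First token inside a bracket, or a closing bracket: no space.
      insideBracket = false;
      return m;
    }
    return ` ${m}`;
  });
}

console.log(spaceOut("thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*"));
console.log(spaceOut("someValue[x]10"));
```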
This is impossible. You cannot do this in regex. You will have to consider exceptions...

regular expression, not reading entire string

I have a regular expression that is not working correctly.
This expression is supposed to catch invalid characters anywhere in a string. It works perfectly on RegExr.com but not in my tests.
The exp is: /[a-zA-Z0-9'.\-]/g
It is failing on : ####
but passing with : aa####
It should fail both times, what am I doing wrong?
Also, /^[a-zA-Z0-9'.\-]$/g matches nothing...
//All Boxs
$('input[type="text"]').each(function () {
var text = $(this).prop("value")
var textTest = /[a-zA-Z0-9'.\-]/g.test(text)
if (!textTest && text != "") {
allFieldsValid = false
$(this).css("background-color", "rgba(224, 0, 0, 0.29)")
alert("Invalid characters found in " + text + " \n\n Valid characters are:\n A-Z a-z 0-9 ' . -")
}
else {
$(this).css("background-color", "#FFFFFF")
$(this).prop("value", text)
}
});
edit:added code
UPDATE AFTER QUESTION RE-TAGGING
You need to use
var textTest = /^[a-zA-Z0-9'.-]+$/.test(text)
                               ^^
Note the absence of the /g modifier and the added + quantifier. A regex with the /g flag keeps state in its lastIndex property between calls, so repeated RegExp#test() calls on the same regex object can return inconsistent results.
You may shorten it a bit with the help of the /i case insensitive modifier:
var textTest = /^[A-Z0-9'.-]+$/i.test(text)
Also, as I mention below, you do not have to escape the - at the end of the character class [...], but it is advisable to keep escaped if the pattern will be modified later by less regex-savvy developers.
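The /g pitfall and the anchored fix can both be seen in a short sketch. The function name isValidText and the sample inputs are assumptions made for illustration.

```javascript
// A regex with the g flag remembers where the last match ended (lastIndex),
// so repeated test() calls on the same object can disagree.
const globalRe = /[a-z]/g;
const firstCall = globalRe.test("a");  // true; lastIndex is now 1
const secondCall = globalRe.test("a"); // false; the search started at index 1
console.log(firstCall, secondCall);

// Hypothetical validator using the anchored pattern without the g flag.
function isValidText(text) {
  return /^[a-zA-Z0-9'.-]+$/.test(text);
}

console.log(isValidText("####"));          // false
console.log(isValidText("aa####"));        // false
console.log(isValidText("O'Brien-Smith")); // true
```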
ORIGINAL C#-RELATED DETAILS
Ok, say, you are using Regex.IsMatch(str, @"[a-zA-Z0-9'.-]"). The Regex.IsMatch method searches for partial matches inside a string. So, if the input string contains an ASCII letter, digit, ', . or -, this will pass. Thus, it is logical that aa#### passes this test, and #### does not.
If you use the second one as Regex.IsMatch(str, @"^[a-zA-Z0-9'.-]$"), only 1 character strings (with an optional newline at the end) would get matched as ^ matches at the start of the string, [a-zA-Z0-9'.-] matches 1 character from the specified ranges/sets, and $ matches the end of the string (or right before the final newline).
So, you need a quantifier (+ to match 1 or more, or * to match zero or more occurrences) and the anchors \A and \z:
Regex.IsMatch(str, @"\A[a-zA-Z0-9'.-]+\z")
                     ^^              ^^^
\A matches the start of string (always) and \z matches the very end of the string in .NET. The [a-zA-Z0-9'.-]+ will match 1+ characters that are either ASCII letters, digits, ', . or -.
Note that - at the end of the character class does not have to be escaped (but you may keep the \- if some other developers will have to modify the pattern later).
And please be careful where you test your regexps. Regexr only supports JavaScript regex syntax. To test .NET regexps, use RegexStorm.net or RegexHero.
/^[a-zA-Z0-9'.-]+$/g
In the second case (aa####), your regex /[a-zA-Z0-9'.-]/g passed because it matched the first letter. To make it correct, match the whole string (use ^ and $) and allow more than one character by adding + (or * if an empty string is allowed).
Try this regex; it matches any character that isn't part of the allowed character set:
/[^a-zA-Z0-9'.\-]+/g
Test
>>regex = /[^a-zA-Z0-9'.\-]+/g
/[^a-zA-Z0-9'.\-]+/g
>>regex.test( "####dsfdfjsakldfj")
true
>>regex.test( "dsfdfjsakldfj")
false
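The negated class can also be used to report which characters are invalid, rather than just whether any exist. A minimal sketch, where the helper name findInvalidChars is an assumption:

```javascript
// Hypothetical helper: returns the distinct characters outside the allowed set.
function findInvalidChars(text) {
  // match() with the g flag returns every match, or null when there is none.
  const matches = text.match(/[^a-zA-Z0-9'.\-]/g) || [];
  return [...new Set(matches)];
}

console.log(findInvalidChars("aa##$#"));  // [ '#', '$' ]
console.log(findInvalidChars("O'Brien")); // []
```

An empty result means the input contains only allowed characters. An empty input also gives an empty result, so that case needs a separate check if it should be rejected.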

How to extract the last word in a string with a JavaScript regex?

What I need is the last match. In the case below, that is the word test, without the $ signs or any other special characters:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
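The most tolerant of these patterns can be wrapped in a small function. A sketch, assuming a null return for input that contains no word at all; the name lastWord is made up here.

```javascript
// Hypothetical helper: returns the last word of the text, or null if there is none.
function lastWord(text) {
  // \b(\w+) captures a word, \W*$ allows only non-word characters after it.
  const match = /\b(\w+)\W*$/.exec(text);
  return match ? match[1] : null;
}

console.log(lastWord("$this$ $is$ $a$ $test$")); // test
console.log(lastWord("one two three  "));        // three
console.log(lastWord("!!!"));                    // null
```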
In some cases a word may be followed by non-word characters. For example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer, patterns that expect the word to be the last thing in the string will not suffice, because an exclamation mark follows the word. The following expression captures the word and also tolerates trailing whitespace, newlines, and any other non-word characters.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
The expression breaks down as follows:
[a-zA-Z]+? matches letters only, either uppercase or lowercase, expanding only as far as necessary.
(?=...) is a positive lookahead, so what follows the word is checked but not included in the match.
\s*? allows optional whitespace after the word.
[^\w]*? allows any run of non-word characters after that.
$ asserts the end of the input (or of the line, if the m flag is set).
The benefits here are that we do not need any flags or word boundaries, and that trailing non-word characters are taken into account.
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match(/\b\w+\b/g) (a regex literal with the g flag, not a string), an array of all the matches is returned. You can then get the last one with result.pop() or with result[result.length - 1].
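A sketch of the collect-everything approach, assuming the pattern is written as a regex literal with the g flag so that match() returns every word. The fallback to an empty array is an addition for inputs with no match.

```javascript
const input = "$this$ $is$ $a$ $test$";

// With the g flag, match() returns all matches (or null when there are none).
const words = input.match(/\b\w+\b/g) || [];

// Arrays are zero-based, so the last element sits at length - 1.
const last = words.length > 0 ? words[words.length - 1] : undefined;

console.log(words); // [ 'this', 'is', 'a', 'test' ]
console.log(last);  // test
```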
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive word characters (letters, digits, or underscores) as it can, but it must match at least one.
A \b matches the position between a word character and a non-word character, without consuming anything. In your case, these are the positions between each '$' and the adjacent letter.
What you need is to anchor your regex to the end of the input which is denoted in a regex by the $ character.
To support an input that may have more than just a '$' character at the end of the line, spaces or a period for instance, you can use \W+ which matches as many non-alphanumeric characters as it can:
\$(\w+)\W+$
Avoid regex - use .split and .pop the result. Use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');

regular expression incorrectly matching % and $

I have a regular expression in JavaScript to allow numeric and (,.+() -) character in phone field
my regex is [0-9-,.+() ]
It works for numeric as well as above six characters but it also allows characters like % and $ which are not in above list.
Even though you don't have to, I always make it a point to escape metacharacters (easier to read and less pain):
[0-9\-,\.+\(\) ]
But this won't work like you expect it to because it will only match one valid character while allowing other invalid ones in the string. I imagine you want to match the entire string with at least one valid character:
^[0-9\-,\.\+\(\) ]+$
Your original regex is not actually matching %. What it is doing is matching valid characters, but the problem is that it only matches one of them. So if you had the string 435%, it matches the 4, and so the regex reports that it has a match.
If you try to match it against just one invalid character, it won't match. So your original regex doesn't match the string %:
> /[0-9\-,\.\+\(\) ]/.test("%")
false
> /[0-9\-,\.\+\(\) ]/.test("44%5")
true
> "444%6".match(/[0-9\-,\.+\(\) ]/)
["4"] //notice that the 4 was matched.
Going back to the point about escaping, I find that it is easier to escape it rather than worrying about the different rules where specific metacharacters are valid in a character class. For example, - is only valid in the following cases:
When used in a range with the endpoints in the proper order, such as [a-z] (but not [z-a])
When used as the first or last character, or by itself, so [-a], [a-], or [-].
When used after a range like [0-9-,] or [a-d-j] (but keep in mind that [9-,] is invalid and [a-d-j] does not match the letters e through i).
For these reasons, I escape metacharacters to make it clear that I want to match the actual character itself and to remove ambiguities.
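The hyphen rules listed above can be checked in JavaScript with a few lines. This is only a sketch; the helper name compiles is made up, and it relies on the RegExp constructor throwing a SyntaxError for an out-of-order range.

```javascript
// Hypothetical helper: reports whether a pattern string compiles.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}

// A hyphen directly after a range is taken literally.
console.log(/[0-9-,]/.test("-")); // true
console.log(/[a-d-j]/.test("j")); // true
console.log(/[a-d-j]/.test("e")); // false: e through i are not in the class

// Ranges must run from the lower to the higher character code.
console.log(compiles("[a-z]")); // true
console.log(compiles("[z-a]")); // false
console.log(compiles("[9-,]")); // false
```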
You just need to anchor your regex:
^[0-9-,.+() ]+$
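In a character class, special characters don't need to be escaped, except \, ], ^ (when it comes first) and -.
Even these can stay unescaped in some positions:
] directly after the opening bracket, as in []] (this holds for PCRE and POSIX flavors; in JavaScript [] is an empty class, so ] must always be escaped there)
- at the beginning [-abc] or at the end [abc-] of the character class, or directly after a range, as in [a-c-x]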
Escape characters with special meaning in your RegExp. If you're not sure and it isn't an alphabet character, it usually doesn't hurt to escape it, too.
If the whole string must match, include the start ^ and end $ of the string in your RegExp, too.
/^[\d\-,\.\+\(\) ]*$/
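For the phone field, the anchored check could be wrapped as follows. A sketch only: the function name isPhoneLike and the sample values are assumptions, and + is used instead of * so that an empty string is rejected.

```javascript
// Hypothetical helper: true only if every character is a digit or one of - , . + ( ) space.
function isPhoneLike(value) {
  return /^[0-9\-,.+() ]+$/.test(value);
}

console.log(isPhoneLike("+1 (555) 123-4567")); // true
console.log(isPhoneLike("435%"));              // false
console.log(isPhoneLike("$100"));              // false
console.log(isPhoneLike(""));                  // false
```

With * in place of +, the last call would return true, which may or may not be wanted for an optional field.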

Javascript RegExp Tokenizing

Given a string, I want to use a regular expression to tokenize it. The pattern is as follows: any character (including new line, etc.), until "<", followed by a space zero or more times, followed by "%".
I tried
var patt = /(.)*<(\s)*%/;
but it does not yield the desired result. I would appreciate an explanation along with the pattern.
Use this:
"some string".split(/.*<\s*%/);
/^[\s\S]*?< *%/
should do what you want.
^ causes it to match at the beginning of the string.
[\s\S] matches any character. Literally, it means any space or non-space character, and works around the fact that . does not match newlines.
*? matches zero or more but the fewest necessary for the rest of the pattern to match.
< matches a literal '<'
' *' (a space followed by *) matches zero or more spaces. This is more readable if written as [ ]*.
% finally matches that character.
If you want to match the entire string (i.e. the % should be the last character in the string), then you can put a $ before the last /.
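The difference between . and [\s\S] shows up as soon as the input spans more than one line. A minimal sketch with a made-up input string:

```javascript
const input = "line one\nline two <  % rest";

// [\s\S] matches any character including newlines; *? keeps the match short.
const prefix = /^[\s\S]*?< *%/.exec(input);
console.log(JSON.stringify(prefix[0])); // "line one\nline two <  %"

// With . instead, the match cannot cross the line break.
const dotMatch = /.*<\s*%/.exec(input);
console.log(JSON.stringify(dotMatch[0])); // "line two <  %"

// Splitting on the lazy pattern leaves whatever follows the first delimiter.
const pieces = input.split(/^[\s\S]*?< *%/);
console.log(JSON.stringify(pieces)); // ["", " rest"]
```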
