Regex recursive special characters - javascript

I've been doing field validation that should allow a-z characters, spaces, hyphens and periods. The regex is:
/^[a-zA-Z-. ]+$/
For the most part, the following works; however, it fails if either - or . are repeated:
String = true,
Str- in.g = true,
String-- = false,
String... = false
I know that in some cases, the - and . should be escaped but I don't believe they need to be in this case as they are within the [ ].

It returns true for all of these strings. What have you tried that returns false?
let reg = /^[a-zA-Z-. ]+$/
let tests = ["String", "Str- in.g", "String--", "String...", "String...---str.ing---"]
tests.forEach((item) => {
console.log(`${item} : ${reg.test(item)}`)
})

From docs at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes:
A character class. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character.
That is, a hyphen that is neither at the beginning or end of the class nor acting as a range delimiter is ambiguous, and regex implementations differ in how they treat it. So, unless it is part of a range, always put the hyphen at the beginning or end of the class, or escape it as \-.
Fixed regex with tests:
[ 'String', 'Str- in.g', 'String--', 'String...', 'Not good!' ]
.forEach(str => {
let match = /^[a-zA-Z. -]+$/.test(str);
console.log(str, '==>', match);
});
Output:
String ==> true
Str- in.g ==> true
String-- ==> true
String... ==> true
Not good! ==> false
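To illustrate why the position of the hyphen matters, here is a minimal sketch. The two character classes are made-up examples, not taken from the question. Between two characters the hyphen forms a range, so [+-.] covers every code point from + (0x2B) to . (0x2E), which includes the comma:

```javascript
// Hypothetical classes, for illustration only.
const asRange = /^[+-.]$/;   // hyphen between + and . forms the range + , - .
const asLiteral = /^[+.-]$/; // hyphen last: literal +, . and - only

const probes = ['+', ',', '-', '.'];
const rangeResults = probes.map((ch) => asRange.test(ch));
const literalResults = probes.map((ch) => asLiteral.test(ch));

console.log(rangeResults);   // [ true, true, true, true ]
console.log(literalResults); // [ true, false, true, true ]
```

The comma is accepted only by the first class, which is the kind of surprise that placing the hyphen last (or escaping it) avoids.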

Related

Pass the regex test if there is only one space in the string

I am trying to get lines where there is a single space. I am currently doing it a different way because I still can't find a regex for it:
const line= "a b c";
/ {1}/.test(line)
Expected: false
Gets: true
I think this isn't syntactically good but am open to suggestions:
line.match(/ /g).length == 1
What should I look into?
Use the following regex test:
/^\S* (?=\S*$)/.test(line)
^\S* - the string starts with zero or more non-whitespace characters
(?=\S*$) - positive lookahead; ensures that the space is followed only by non-whitespace characters (if any) up to the end of the string
The regular expression /^[^\s]*\s[^\s]*$/ matches a string that contains exactly one whitespace character.
Here is a code example:
const regex = /^[^\s]*\s[^\s]*$/;
console.log(regex.test("ab c")); // true
console.log(regex.test("a b c")); // false
Matching a string without newlines containing a single space
^\S* \S*$
Explanation
^ Start of string
\S* Match optional non whitespace chars
(space) Match a single space
\S* Match optional non whitespace chars
$ End of string
See a regex101 demo.
const regex = /^\S* \S*$/;
[
"a b c",
"",
" ",
"  ",
"a ",
"a b"
].forEach(s =>
console.log(`'${s}' --> ${regex.test(s)}`)
);
The question is not clear to me.
The {1} in the first regex has no effect here, but it makes me think that you do not want to accept multiple consecutive spaces. The second expression is fine if you are interested only in lines that contain exactly one space, although it throws when the line contains no space at all, because match() then returns null.
What exactly do you need?
Do you want the line to contain exactly one space?
Or are multiple spaces allowed, as long as they are not consecutive?
The following code snippet shows solutions for both questions:
function test(input) {
console.log({
input,
exactlyOne: (input.match(/ /g) ?? []).length === 1,
noConsecutive1: / {2}/.test(input) === false,
noConsecutive2: input.includes('  ') === false,
});
}
// no consecutive spaces
test('a b c');
test('a b');
test('a');
test('a ');
test(' ');
test('');
// consecutive spaces; they all should report "exactlyOne: false, noConsecutive: false"
test('a  b c');
test('a  b');
test('aa  ');
test('  ');
The second search can be done without regexps. I cannot tell if it runs faster; for large inputs I think that the regexp is faster but I didn't check.
if (input.includes('  ')) {
console.log('two consecutive spaces found in the input');
}
I added it to the code snippet above.
How about:
^[^\s]*\s[^\s]*$
Explanation:
Start: ^
Any number of non-whitespace characters: [^\s]*
A single whitespace character: \s
Any number of non-whitespace characters (again): [^\s]*
End: $
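The two patterns suggested in the answers above are not quite equivalent: \s accepts any whitespace character (tab, newline, and so on), while a literal space accepts only a space. A small sketch of the difference, using sample inputs that are my own and not from the question:

```javascript
const anyWhitespace = /^[^\s]*\s[^\s]*$/; // exactly one whitespace character of any kind
const spaceOnly = /^\S* \S*$/;            // exactly one space and no other whitespace

// Sample inputs chosen for illustration.
const samples = ['a b', 'a\tb', 'a b c', 'ab'];
const report = samples.map((s) => ({
  input: JSON.stringify(s),
  anyWhitespace: anyWhitespace.test(s),
  spaceOnly: spaceOnly.test(s),
}));

console.table(report);
```

Only the tab-separated input is treated differently, so the choice depends on whether a tab should count as "a space" for the validation at hand.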

Match until an unescaped version of a character

I am processing a string in a format like [enclosed str]outer str[enclosed str]
and am trying to match every [enclosed str].
The problem is that I want any character except an unescaped ] (that is, a ] not preceded by a \) to be allowed within the square brackets.
For instance:
str = 'string[[enclosed1\\]]string[enclosed2]';
// match all [ followed by anything other ] then a ]
str.match(/\[[^\]]+]/g)
// returns ["[[enclosed1\]", "[enclosed2]"]
// ignores the `]` after `\\]`
// match word and non-word char enclosed by []
str.match(/\[[\w\W]+]/g)
// returns ["[[enclosed1\]]string[enclosed2]"]
// matches to the last ]
// making it less greedy with /\[[\w\W]+?]/g
// returns same result as /\[[^\]]+]/g
Is it possible with a JavaScript RegExp to achieve my desired result, which is
["[[enclosed1\]]", "[enclosed2]"]
Without a negative lookbehind (which JavaScript regular expressions did not support when this was written; it was added in ES2018), this is the best I could come up with:
/(?:^|[^\\])(\[.*?[^\\]\])/g
Group 1 will contain the string you want.
https://regex101.com/r/PmDcGH/3
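Note that str.match() with the g flag returns whole matches, which here include the character consumed by (?:^|[^\\]). Reading group 1 needs matchAll() or exec(). The sketch below shows that, and also a lookbehind variant that is my own addition and assumes an engine with ES2018 lookbehind support:

```javascript
const str = 'string[[enclosed1\\]]string[enclosed2]';

// Pattern from the answer above; group 1 holds the bracketed part.
const withGroup = /(?:^|[^\\])(\[.*?[^\\]\])/g;
const viaGroup = [...str.matchAll(withGroup)].map((m) => m[1]);
console.log(viaGroup); // [ '[[enclosed1\\]]', '[enclosed2]' ]

// Assumption: the engine supports lookbehind (ES2018 or later).
// A bracket counts only when it is not preceded by a backslash.
const withLookbehind = /(?<!\\)\[.*?(?<!\\)\]/g;
const viaLookbehind = str.match(withLookbehind);
console.log(viaLookbehind); // [ '[[enclosed1\\]]', '[enclosed2]' ]
```

Neither pattern treats a doubled backslash (an escaped backslash followed by a real closing bracket) specially; handling that would need a more elaborate pattern.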

Why does String.match(/\d*/) return an empty string?

Can someone help me understand why using \d* returns an array containing an empty string, whereas using \d+ returns ["100"] (as expected)? I get why \d+ works, but I don't see why exactly \d* doesn't. Does using * cause it to return a zero-length match, and how exactly does this work?
var str = 'one to 100';
var regex = /\d*/;
console.log(str.match(regex));
// [""]
Remember that match is looking for the first substring it can find that matches the given regex.
* means that there may be zero or more of something, so \d* means you're looking for a string that contains zero or more digits.
If your input string started with a number, that entire number would be matched.
"5 to 100".match(/\d*/); // ["5"]
"5 to 100".match(/\d+/); // ["5"]
But since the first character is a non-digit, match() figures that the beginning of the string (with no characters) matches the regex.
Since your string doesn't begin with any digits, an empty string is the first substring of your input which matches that regex.
/\d*/
means "match 0 or more digits", and the search begins at the start of the string.
At the beginning of your string, it immediately hits a non-digit and can't go any further. Yet this is considered a successful match, because zero digits satisfies "0 or more".
You can try either "1 or more" via
/\d+/
or you can tell it to match "0 or more" from the end of the string:
/\d*$/
Find all in Python
In Python, there is the findall() method which returns all parts of the string your regular expression matched against.
re.findall(r'\d*', 'one to 100')
# => ['', '', '', '', '', '', '', '100', '']
.match() in JavaScript (without the g flag) returns only the first match, which corresponds to the first element in the above array.
* means 0 or more, so it's matching 0 times. You need to use + for 1 or more. By default it's greedy, so will match 100:
var str = 'one to 100';
var regex = /\d+/;
console.log(str.match(regex));
// ["100"]
As @StriplingWarrior said below, the empty string is the first match, hence it is being returned. I would like to add that you can tell what the regex is matching by looking at the 'index' field of the result that match returns. For example, this is what I get when I run your code in Chrome:
["", index: 0, input: "one to 100"]
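The index field makes the difference between the two quantifiers easy to see. A short sketch, using the string from the question (the variable names are mine):

```javascript
const text = 'one to 100';

const star = text.match(/\d*/); // first match is the empty string at index 0
const plus = text.match(/\d+/); // first match needs at least one digit
console.log(JSON.stringify(star[0]), star.index); // "" 0
console.log(JSON.stringify(plus[0]), plus.index); // "100" 7

// With the g flag, match() returns every match, including the empty ones.
const allStar = text.match(/\d*/g);
console.log(allStar.filter((m) => m !== '')); // [ '100' ]
```

The global variant lines up with the Python findall() output quoted above: empty matches at each non-digit position, then 100, then one more empty match at the end of the string.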

RegExp issues - character limit and whitespace ignoring

I need to validate a string that can have any number of characters, a comma, and then 2 characters. I'm having some issues. Here's what I have:
var str="ab,cdf";
var patt1=new RegExp("[A-z]{2,}[,][A-z]{2}");
if(patt1.test(str)) {
alert("true");
}
else {
alert("false");
}
I would expect this to return false, as I have the {2} limit on characters after the comma and this string has three characters. When I run the fiddle, though, it returns true. My (admittedly limited) understanding of RegExp indicates that {2,} is at least 2, and {2} is exactly two, so I'm not sure why three characters after the comma are still returning true.
I also need to be able to ignore a possible whitespace between the comma and the remaining two characters. (In other words, I want it to return true if they have 2+ characters before the comma and two after it - the two after it not including any whitespace that the user may have entered.)
So all of these should return true:
var str = "ab, cd";
var str = "abc, cd";
var str = "ab,cd";
var str = "abc,dc";
I've tried adding the \S indicator after the comma like this:
var patt1=new RegExp("[A-z]{2,}[,]\S[A-z]{2}");
But then the string returns false all the time, even when I have it set to ab, cd, which should return true.
What am I missing?
"{2,} is at least 2, and {2} is exactly two, so I'm not sure why three characters after the comma are still returning true."
That's correct. What you forgot is to anchor your expression to the start and end of the string; otherwise test() returns true when the pattern occurs anywhere in the string.
"not including any whitespace: I've tried adding the \S indicator after the comma"
That's the exact opposite. \s matches whitespace characters; \S matches all non-whitespace characters. Also, you probably want an optional repetition of the whitespace instead of requiring exactly one.
"[A-z]"
Notice that this character range also includes the characters between Z and a, namely [, \, ], ^, _ and `. You will probably want [A-Za-z] instead, or use [a-z] and make your regex case-insensitive.
Combined, this is what your regex should look like (using a regular expression literal instead of the RegExp constructor with a string literal):
var patt1 = /^[a-z]{2,},\s*[a-z]{2}$/i;
You are missing ^ and $. Also, the range should be [a-zA-Z], not [A-z].
Your regex should be
^[a-zA-Z]{2,}[,]\s*[A-Za-z]{2}$
^ anchors the match at the beginning of the string.
$ anchors the match at the end of the string.
Without ^ and $, the pattern would match anywhere within the string.
\s* matches zero or more whitespace characters.
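As a quick check of the points made in these answers, the sketch below runs the anchored pattern against the strings from the question. It also shows one thing the answers only hint at: inside a string literal passed to the RegExp constructor, "\S" is just "S", so the backslash has to be doubled. The variable names are mine:

```javascript
// The anchored, case-insensitive pattern from the first answer, as a literal.
const pattern = /^[a-z]{2,},\s*[a-z]{2}$/i;

const inputs = ['ab, cd', 'abc, cd', 'ab,cd', 'abc,dc', 'ab,cdf'];
const verdicts = inputs.map((s) => pattern.test(s));
console.log(verdicts); // [ true, true, true, true, false ]

// In a string literal, "\S" collapses to "S"; doubling the backslash keeps it.
const lostEscape = new RegExp("[,]\S").source;   // [,]S
const keptEscape = new RegExp("[,]\\s*").source; // [,]\s*
console.log(lostEscape, keptEscape);

// [A-z] also accepts the punctuation that sits between Z and a.
const underscoreChecks = [/^[A-z]$/.test('_'), /^[A-Za-z]$/.test('_')];
console.log(underscoreChecks); // [ true, false ]
```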

Match a letter unless escaped

I'm trying to match a letter (let's say a) that is not escaped with a backslash, but I want to do it without negative lookaheads or negative lookbehinds. This is what I have tried so far, but it doesn't work:
/([^\\][^a])*/.test('should be true a.'); // true
/([^\\][^a])*/.test('should be not true \\a.'); // true
But they both return true. What am I doing wrong?
To test for an 'a' which is not preceded by a '\' you could use
/(^|[^\\])a/.test( 'should be true a.' ); // true
/(^|[^\\])a/.test( 'should be not true \\a.' ); // false
The (^|[^\\]) matches either the start of the string ^ or a character that is not '\'.
In your regex, [^a] matches any character that is not 'a', and ()* means match what is enclosed within the parentheses zero or more times, so any string tests true, because any string can match the pattern zero times.
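The zero-repetition effect can be made visible with exec(), which reports what was actually matched. A minimal sketch, with test strings of my own choosing that contain no other 'a':

```javascript
const original = /([^\\][^a])*/;

// The group cannot start on a backslash, so it repeats zero times
// and the regex still succeeds with an empty match at index 0.
const emptyMatch = original.exec('\\a.');
console.log(JSON.stringify(emptyMatch[0]), emptyMatch.index); // "" 0

// Requiring the 'a' itself, preceded by the start or a non-backslash:
const unescapedA = /(^|[^\\])a/;
const plainResult = unescapedA.test('one a here');     // true
const escapedResult = unescapedA.test('only \\a here'); // false
console.log(plainResult, escapedResult);
```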
