Regex stays greedy even when using a lazy quantifier - JavaScript

Here is my regex: ^((\{.+:.+\})|([^{:}]))+?$
Here is what I want:
Valid case: {test:test1} {test2:test3} test4 test5.
Invalid cases: {test:}, {test1: test2, test1: test3}.
That is, whenever my string contains one of these three characters, '{', ':' or '}', it must also contain the other two.
My regex works well when my string does not end with a } character. I guess it's because of the greedy quantifier, but even though I already put a ? after the + quantifier, it's still not working. What am I doing wrong?

You may use
^(?:\{[^{}:]+:[^:{}]+}|[^{:}])+$
See the regex demo.
Details
^ - start of string
(?:\{[^{}:]+:[^:{}]+}|[^{:}])+ - a non-capturing group matching 1 or more occurrences of
\{[^{}:]+:[^:{}]+} - a {, then 1 or more chars other than {, } and :, then :, then 1 or more chars other than {, } and :, and a } (use * instead of + inside the braces if an empty key or value such as {test:} should be allowed)
| - or
[^{:}] - any char but {, } and :
$ - end of string.
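A minimal sketch (plain JavaScript, no libraries) of why the lazy +? does not help: the ^ and $ anchors force the whole string to match anyway, so laziness only changes the order in which the engine tries alternatives, while .+ is still free to run across } and { and let a single \{.+:.+\} cover several brace groups. The negated classes in the answer keep each {key:value} inside one pair of braces. The sketch assumes + rather than * inside the braces so that an empty value such as {test:} is rejected; the samples come from the question, plus {test:test1} {test2}, which is added here only for illustration.

```javascript
// Pattern from the question: .+ may run across "}" and "{", so one
// \{.+:.+\} match can swallow several brace groups at once.
const originalPattern = /^((\{.+:.+\})|([^{:}]))+?$/;
// Answer's pattern, with + instead of * inside the braces (assumption:
// empty keys/values such as {test:} must be rejected).
const bracePattern = /^(?:\{[^{}:]+:[^:{}]+}|[^{:}])+$/;

const braceSamples = [
  "{test:test1} {test2:test3} test4 test5", // valid
  "{test:}",                                // invalid: empty value
  "{test1: test2, test1: test3}",           // invalid: two colons in one pair
  "{test:test1} {test2}",                   // invalid: braces without a colon
];

for (const s of braceSamples) {
  console.log(JSON.stringify(s), originalPattern.test(s), bracePattern.test(s));
}
// Logs true/true, false/false, true/false, true/false: the original pattern
// wrongly accepts the last two strings.
```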

Related

Regex for email (shouldn't match string after ; )

I have the following regex: /(\s?[^\s,]+#[^\s,]+.[^\s,]+\s?;)*(\s?[^\s,]+#[^\s,]+.[^\s,]+)/g
Could you tell me why it matches this string: "hNw6B#90.com;tesr"
and doesn't match this one: "hNw6B#90.com; test"?
It shouldn't match the first string. However, if there is a valid email after the ;, like test#abv.bg, it should be matched.
I will be very grateful if you could help me out.
You can use
^[^#\s;]+#[^\s;]+\.[^\s;]+(?:\s*;\s*[^#\s;]+#[^\s;]+\.[^\s;]+)*;?$
See the regex demo. Details:
^ - start of string
[^#\s;]+ - one or more chars other than #, whitespace and ;
# - a # char
[^\s;]+ - one or more chars other than whitespace and ;
\. - a dot
[^\s;]+ - one or more chars other than whitespace and ;
(?:\s*;\s*[^#\s;]+#[^\s;]+\.[^\s;]+)* - zero or more repetitions of a ; enclosed with zero or more whitespaces, followed by the same pattern as above
;? - an optional ;
$ - end of string.
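The question's regex has no ^ and $ anchors, so test() or match() succeeds as soon as some substring looks like an address, and its unescaped . matches any char, not just a dot. A minimal sketch checking the answer's anchored pattern against the strings from the question (plus a trailing-; case added for illustration), keeping the question's "#" in place of "@":

```javascript
// Answer's pattern, anchored with ^ and $ so the whole string must be a list of
// addresses. As in the question, "#" stands where "@" would normally be.
const emailListPattern = /^[^#\s;]+#[^\s;]+\.[^\s;]+(?:\s*;\s*[^#\s;]+#[^\s;]+\.[^\s;]+)*;?$/;

const emailSamples = [
  "hNw6B#90.com;tesr",         // rejected: "tesr" is not an address
  "hNw6B#90.com; test",        // rejected: "test" is not an address
  "hNw6B#90.com; test#abv.bg", // accepted: a valid address follows the ;
  "hNw6B#90.com;",             // accepted: the trailing ; is optional
];

for (const s of emailSamples) {
  console.log(JSON.stringify(s), emailListPattern.test(s));
}
```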

Regular expression to match a name with only one space

I have a string condition in JS where I have to check whether the name entered in a text box contains only one space.
pattern: name.status === 'full_name' ? /^[a-zA-Z.+-.']+\s+[a-zA-Z.+-. ']+$/ : /^[a-zA-Z.+-. ']+$/
But the above regex also matches names ending with 2 spaces.
I need the entered name to accept only one space, so the name will have a single space either in between or at the end.
Two observations: 1) \s+ in your pattern matches 1 or more whitespaces, and 2) [+-.] is a range that matches 4 chars: +, the comma, - and ., so it is best to put the hyphen at the end of the character class.
You may use
/^[a-zA-Z.+'-]+(?:\s[a-zA-Z.+'-]+)*\s?$/
See the regex demo
Details
^ - start of string
[a-zA-Z.+'-]+ - 1 or more letters, ., +, ' or -
(?:\s[a-zA-Z.+'-]+)* - zero or more sequences of:
\s - a single whitespace
[a-zA-Z.+'-]+ - 1 or more letters, ., +, ' or - chars
\s? - an optional whitespace
$ - end of string.
Note: if the "names" cannot contain . and +, just remove these symbols from your character classes.
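A minimal sketch comparing the question's pattern with the answer's on a few hypothetical inputs: \s+ and the space inside the second character class let the question's pattern accept doubled spaces, while the answer's pattern allows exactly one \s between name parts and at most one at the end.

```javascript
// Question's pattern: \s+ allows several spaces, and the second class even includes a space.
const questionNamePattern = /^[a-zA-Z.+-.']+\s+[a-zA-Z.+-. ']+$/;
// Answer's pattern: a single \s between name parts, at most one trailing \s.
const answerNamePattern = /^[a-zA-Z.+'-]+(?:\s[a-zA-Z.+'-]+)*\s?$/;

for (const name of ["John Smith", "John Smith ", "John  Smith", "John Smith  "]) {
  console.log(JSON.stringify(name), questionNamePattern.test(name), answerNamePattern.test(name));
}
// The question's pattern accepts all four; the answer's rejects the two with doubled spaces.
```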
Try this:
/^\S+\s\S+$/
Some explanations:
^ - start of string
\s - single whitespace
\S - everything except whitespace
+ - quantifier, "one or more"
$ - end of string
You could also use word boundaries:
function isFullName(s) {
return /^\b\w+\b \b\w+\b$/.test(s);
}
['Giuseppe', 'Mandato', 'Giuseppe Mandato']
.forEach(item => console.log(`isFullName ${item} ? ${isFullName(item)}`))

Regex - I want my string to end with 2 special characters

I've been trying to write a regex that matches strings ending with 2 special characters, but I couldn't find a solution. Here is what I tried, but it doesn't seem to work:
/.[!##$%^&*]{2}+$/;
Thanks in advance.
Try this regex:
^.*[!##$%^&*]{2}$
Demo
const regex = /^.*[!##$%^&*]{2}$/;
const str = `abc##\$`; // the template literal evaluates to "abc##$"
if (regex.test(str)) {
  console.log("matched");
} else {
  console.log("not matched");
}
The /.[!##$%^&*]{2}+$/ regex consists of:
. - any character but a line break char
[!##$%^&*]{2}+ - in PCRE/Boost/Java/Oniguruma and other regex engines supporting possessive quantifiers, it matches exactly 2 chars from the defined set, but in JS, it causes a "Nothing to repeat" error
$ - end of string.
To match any string ending with 2 occurrences of the chars from your defined set, you need to remove the . and the + and use:
console.log(/[!##$%^&*]{2}$/.test("##"))
Or, if these 2 chars cannot be preceded by a 3rd one:
console.log(/(?:^|[^!##$%^&*])[!##$%^&*]{2}$/.test("##"))
// ^^^^^^^^^^^^^^^^^
The (?:^|[^!##$%^&*]) non-capturing group matches the start of the string (^) or (|) any char other than !, #, $, %, ^, & and * ([^!##$%^&*]).
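A minimal sketch of the points above, assuming a Node.js or browser console: building the possessive pattern throws a SyntaxError in JavaScript, the first fixed pattern accepts any string whose last 2 chars come from the set, and the second one additionally rejects a third set char right before them. The test strings are made up for illustration.

```javascript
// The possessive form {2}+ is not valid JavaScript regex syntax, so building it throws.
let possessiveError = null;
try {
  new RegExp(".[!##$%^&*]{2}+$");
} catch (e) {
  possessiveError = e;
}
console.log(possessiveError instanceof SyntaxError); // true

// Ends with (at least) two chars from the set.
const endsWithTwo = /[!##$%^&*]{2}$/;
// Ends with exactly two chars from the set (not preceded by a third one).
const endsWithExactlyTwo = /(?:^|[^!##$%^&*])[!##$%^&*]{2}$/;

for (const s of ["abc##", "##", "abc###", "abc#"]) {
  console.log(s, endsWithTwo.test(s), endsWithExactlyTwo.test(s));
}
```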

Why does string.replace(/\W*/g,'_') prepend all characters?

I've been learning regexps in JS and encountered a situation that I didn't understand.
I ran a test of the replace function with the following regexp:
/\W*/g
I expected it to prepend _ to the beginning of the string and then replace all non-word characters, so that
The Number is (123)(234)
would become:
_The_Number_is__123___234_
My reasoning was that the pattern matches zero instances at the start of the string, so _ would be prepended there, and then all spaces and other non-word characters would be replaced.
Instead, it inserted _ before every character and replaced all non-word characters:
_T_h_e__N_u_m_b_e_r__i_s__1_2_3__2_3_4__
Why did it do this?
The problem is the meaning of \W*. It means "0 or more non-word characters", so the empty string "" matches too, since it consists of 0 non-word characters.
So the regex also matches an empty string before every word character and at the end of the string, which is why all those extra underscores are inserted.
You want either /\W/g (replacing each individual non-word character) or /\W+/g (replacing each set of consecutive non-word characters).
"The Number is (123)(234)".replace(/\W/g, '_') // "The_Number_is__123__234_"
"The Number is (123)(234)".replace(/\W+/g, '_') // "The_Number_is_123_234_"
TL;DR
Never use a pattern that can match an empty string in a regex replace method if your aim is to replace and not insert text.
To replace each separate occurrence of a non-word char in a string, use .replace(/\W/g, '_') (that is, remove the * quantifier, which matches zero or more occurrences of the quantified subpattern).
To replace each chunk of consecutive non-word chars in a string with a single _, use .replace(/\W+/g, '_') (that is, replace the * quantifier with +, which matches one or more occurrences of the quantified subpattern).
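A minimal sketch running the three patterns on the question's string; the commented outputs follow from the empty-match behavior described above:

```javascript
const numberText = "The Number is (123)(234)";

// \W* can match the empty string, so an extra "_" is inserted before every
// word char and at the end of the string, on top of the real replacements.
console.log(numberText.replace(/\W*/g, "_")); // "_T_h_e__N_u_m_b_e_r__i_s__1_2_3__2_3_4__"

// One "_" per non-word char.
console.log(numberText.replace(/\W/g, "_"));  // "The_Number_is__123__234_"

// One "_" per run of consecutive non-word chars.
console.log(numberText.replace(/\W+/g, "_")); // "The_Number_is_123_234_"
```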
Note: the solution below is tailored to the OP's much more specific requirements.
A string is parsed by the JS regex engine as a sequence of chars and locations in between them. See the following diagram where I marked locations with hyphens:
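-T-h-e- -N-u-m-b-e-r- -i-s- -(-1-2-3-)-(-2-3-4-)-
Here the first hyphen is the start-of-string location, T is the 1st symbol, the hyphen between T and h is the location between those two chars (and so on), and the last hyphen is the end-of-string location.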
All these positions can be analyzed and matched with a regex.
Since /\W*/g is a regex matching all non-overlapping occurrences (due to the g modifier) of 0 or more (due to the * quantifier) non-word chars, all the positions before word chars are matched. Between T and h, there is a location tested with the regex, and as there is no non-word char there (h is a word char), an empty match is returned (as \W* can match an empty string).
So, you need to replace the start of the string and each non-word char with a _. A naive approach is to use .replace(/\W|^/g, '_'). However, there is a caveat: if a string starts with a non-word character, no extra _ will be prepended to the string:
console.log("Hi there.".replace(/\W|^/g, '_')); // _Hi_there_
console.log(" Hi there.".replace(/\W|^/g, '_')); // _Hi_there_
Note that here, \W comes first in the alternation and "wins" when matching at the beginning of the string: the space is matched and then no start position is found at the next match iteration.
You may now think you can match with /^|\W/g. Look here:
console.log("Hi there.".replace(/^|\W/g, '_')); // _Hi_there_
console.log(" Hi there.".replace(/^|\W/g, '_')); // _ Hi_there_
The _ Hi_there_ second result shows how the JS regex engine handles zero-width matches during a replace operation: once a zero-width match (here, the position at the start of the string) is found, the replacement occurs and the RegExp.lastIndex property is incremented, so the search proceeds from the position after the first character. That is why the first space is preserved and no longer matched with \W.
A solution is to use a consuming pattern that will not allow zero-width matches:
console.log("Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; }));
console.log(" Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; }));
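The same callback can be wrapped in a small helper; underscoreNonWords is a hypothetical name used only for this sketch. Group 1 is undefined when the \W alternative matched, an empty string when ^ matched before a word char, and the leading non-word char when ^ matched before one, which is what decides between "_" and "__":

```javascript
// Hypothetical helper wrapping the answer's pattern and callback.
function underscoreNonWords(s) {
  // leading: "" (start before a word char), a non-word char (start before
  // a non-word char), or undefined (the plain \W alternative matched).
  return s.replace(/^(\W?)|\W/g, (match, leading) => (leading ? "__" : "_"));
}

console.log(underscoreNonWords("Hi there."));                // "_Hi_there_"
console.log(underscoreNonWords(" Hi there."));               // "__Hi_there_"
console.log(underscoreNonWords("The Number is (123)(234)")); // "_The_Number_is__123__234_"
```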
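You can use RegExp /(^\W*){1}|\W(?!=\w)/g to match \W* at the beginning of the string or a single \W elsewhere. Note that (?!=\w) only fails when the \W is followed by = and a word char, so in practice the second alternative matches every \W, and {1} is redundant: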
var str = "The Number is (123)(234)";
var res = str.replace(/(^\W*){1}|\W(?!=\w)/g, "_");
console.log(res);
You should have used /\W+/g instead.
On its own, * means "zero or more" of the preceding token, not "all characters", so \W* also matches an empty string.
It's because you're using the * quantifier. It matches zero or more characters, so the empty position between every pair of characters matches. If you replace the expression with /\W+/g, it works as you expected.
This should work for you
Find: (?=.)(?:^\W|\W$|\W|^|(.)$)
Replace: $1_
Cases explained:
(?= . ) # Must be at least 1 char
(?: # Ordered Cases:
^ \W # BOS + non-word (consumes bos)
| \W $ # Non-word + EOS (consumes eos)
| \W # Non-word
| ^ # BOS
| ( . ) # (1), Any char + EOS
$
)
Note that this could have been done without the lookahead via
(?:^\W|\W$|\W|^$)
but this would insert a single _ into an empty string,
so it ends up being more elaborate.
All in all, though, it's a simple replacement.
Unlike Stribnez's solution, no callback logic is required
on the replace side.
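A minimal sketch applying the Find/Replace pair above with String.prototype.replace; lookaheadReplace is a hypothetical helper name. In a JavaScript replacement string, $1 expands to an empty string when group 1 did not participate in the match:

```javascript
// The answer's Find pattern; the g flag replaces every match.
const findPattern = /(?=.)(?:^\W|\W$|\W|^|(.)$)/g;

// Hypothetical helper applying the answer's "$1_" replacement.
function lookaheadReplace(s) {
  return s.replace(findPattern, "$1_");
}

console.log(lookaheadReplace("The Number is (123)(234)")); // "_The_Number_is__123__234_"
console.log(lookaheadReplace("abc"));                      // "_abc_"
console.log(JSON.stringify(lookaheadReplace("")));         // "" (nothing inserted)
```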

How to know if there are non-whitespace char before the target string?

How do I express in a regex whether there are non-whitespace chars before '#include'?
var kword_search = "#include<iostream.>something";
/^?+\s*#include$/.test(kword_search)//must return false
var kword_search = "asffs#include<iostream.>something";
/^?+\s*#include$/.test(kword_search)//must return true
I'm not really good at regex.
You are likely looking for something like /^[\S ]+#include/
Explanation:
^ beginning of the string
[\S ]+ one or more of: a non-whitespace char (all but \n, \r, \t, \f and " ") or a space ' '
#include the literal text '#include'
Regex quick reference
[abc] A single character: a, b or c
[^abc] Any single character but a, b, or c
[a-z] Any single character in the range a-z
[a-zA-Z] Any single character in the range a-z or A-Z
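^ Start of line (in JavaScript, start of string unless the m flag is set)
$ End of line (in JavaScript, end of string unless the m flag is set)
\A Start of string (not supported in JavaScript)
\z End of string (not supported in JavaScript)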
. Any single character
\s Any whitespace character
\S Any non-whitespace character
\d Any digit
\D Any non-digit
\w Any word character (letter, number, underscore)
\W Any non-word character
\b A word boundary (a zero-width position, not a character)
(...) Capture everything enclosed
(a|b) a or b
? Zero or one
* Zero or more
+ One or more
Use a negated character class with an appropriate quantifier, and remove the $ anchor from the end, since your string doesn't end with include:
/^[^\s]+#include/.test(kword_search)
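Alternatively, /^(?:\s*|)#include/ tests the opposite condition (the string starts with optional whitespace followed by #include), so negate its result:
!/^(?:\s*|)#include/.test(kword_search)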
Simply:
\S#include
See a live demo passing your tests on jsfiddle
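A minimal sketch checking the two working patterns from the answers above against the question's inputs, plus one hypothetical whitespace-only prefix. The first requires a run of non-whitespace chars from the start of the string up to #include; the second only needs a single non-whitespace char immediately before it:

```javascript
const startsWithNonSpaceRun = /^[^\s]+#include/; // non-whitespace from the start up to #include
const nonSpaceJustBefore = /\S#include/;          // a non-whitespace char right before #include

for (const s of [
  "#include<iostream.>something",      // nothing before #include
  "asffs#include<iostream.>something", // non-whitespace before #include
  "  #include<iostream.>something",    // only whitespace before #include
]) {
  console.log(JSON.stringify(s), startsWithNonSpaceRun.test(s), nonSpaceJustBefore.test(s));
}
```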
