Regular expression match 0 or exact number of characters - javascript

I want to match an input string in JavaScript that has 0 or 2 consecutive dashes, but not 1, i.e. not a range.
If the string is:
-g:"apple" AND --projectName:"grape": it should match --projectName:"grape".
-g:"apple" AND projectName:"grape": it should match projectName:"grape".
-g:"apple" AND -projectName:"grape": it should not match, i.e. return null.
--projectName:"grape": it should match --projectName:"grape".
projectName:"grape": it should match projectName:"grape".
-projectName:"grape": it should not match, i.e. return null.
To simplify this question considering this example, the RE should match the preceding 0 or 2 dashes and whatever comes next. I will figure out the rest. The question still comes down to matching 0 or 2 dashes.
Using -{0,2} matches 0, 1, 2 dashes.
Using -{2,} matches 2 or more dashes.
Using -{2} matches only 2 dashes.
How to match 0 or 2 occurrences?

Answer
If you split your "word-like" patterns on spaces, you can use this regex and your wanted value will be in the first capturing group:
(?:^|\s)((?:--)?[^\s-]+)
\s is any whitespace character (tab, space, newline...)
[^\s-] is anything except a whitespace-like character or a -
Once again, the problem is anchoring the regex so that the relevant part isn't completely optional: here the anchor ^ or a mandatory whitespace \s plays this role.
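A minimal sketch of how this pattern could be applied in JavaScript, assuming an engine that provides String.prototype.matchAll. The helper name wordsWithZeroOrTwoDashes is made up for illustration. Note that it returns every whitespace-separated token that starts with zero or two dashes, so a plain word such as AND is returned as well.

```javascript
// Pattern from the answer, with the g flag so matchAll can find every token.
const tokenPattern = /(?:^|\s)((?:--)?[^\s-]+)/g;

// Hypothetical helper: collect capture group 1 of every match.
function wordsWithZeroOrTwoDashes(input) {
  return Array.from(input.matchAll(tokenPattern), (m) => m[1]);
}

console.log(wordsWithZeroOrTwoDashes('-g:"apple" AND --projectName:"grape"'));
// [ 'AND', '--projectName:"grape"' ]
console.log(wordsWithZeroOrTwoDashes('-projectName:"grape"'));
// []
```

Because [^\s-] excludes dashes, a value that itself contains a dash would be cut short, so this sketch only suits inputs like the ones in the question.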
What we want to do
Basically you want to check if your expression (two dashes) is there or not, so you can use the ? operator:
(?:--)?
"Either two or none", (?:...) is a non capturing group.
Avoiding confusion
You want to match "zero or two dashes", so if this is your entire regex it will always find a match: in an empty string, in --, in -, in foobar... In every case except -- the matched text is the empty string, but the regex still reports a match.
This is a common source of misunderstanding, so bear in mind the rule that if everything in your regex is optional, it will always find a match.
If you want to only return a match if your entire string is made of zero or two dashes, you need to anchor the regex:
^(?:--)?$
^ and $ match the beginning and the end of the string, respectively.
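The difference between the unanchored and the anchored pattern can be checked directly. This is only an illustrative sketch; the variable names are arbitrary.

```javascript
const unanchored = /(?:--)?/;
const anchored = /^(?:--)?$/;

// Everything in the unanchored pattern is optional, so it always matches,
// usually with an empty string.
console.log(unanchored.test("-"));          // true
console.log(unanchored.exec("-")[0]);       // ""
console.log(unanchored.exec("foobar")[0]);  // ""
console.log(unanchored.exec("--")[0]);      // "--"

// The anchored pattern only accepts a whole string of zero or two dashes.
console.log(anchored.test(""));        // true
console.log(anchored.test("--"));      // true
console.log(anchored.test("-"));       // false
console.log(anchored.test("foobar"));  // false
```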

a(-{2})?(?!-)
This uses "a" as an example. It matches a followed by an optional pair of dashes, and the negative lookahead (?!-) makes sure no further dash follows, so a- and a--- do not match.
Edit:
According to your example, this should work
(?<!-)(-{2})?projectName:"[a-zA-Z]*"
Edit 2:
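JavaScript engines that predate ES2018 do not support lookbehind.
If yours is one of them, try this instead (the ^ alternative is needed so that a match at the very start of the string is still possible, which a bare [^-] would not allow):
(?:^|[^-])(-{2})?projectName:"[a-zA-Z]*"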
Debuggex Demo
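On engines that do support lookbehind (ES2018 and later), the lookbehind pattern from this answer can be checked against the six inputs from the question. A minimal sketch: the helper name matchProjectName is hypothetical, and the value part "[a-zA-Z]*" is kept from the answer, so only purely alphabetic values are covered.

```javascript
// Assumes lookbehind support (ES2018+).
const projectPattern = /(?<!-)(-{2})?projectName:"[a-zA-Z]*"/;

// Hypothetical helper: return the matched text, or null when nothing matches.
function matchProjectName(input) {
  const m = input.match(projectPattern);
  return m ? m[0] : null;
}

const samples = [
  '-g:"apple" AND --projectName:"grape"', // --projectName:"grape"
  '-g:"apple" AND projectName:"grape"',   // projectName:"grape"
  '-g:"apple" AND -projectName:"grape"',  // null
  '--projectName:"grape"',                // --projectName:"grape"
  'projectName:"grape"',                  // projectName:"grape"
  '-projectName:"grape"',                 // null
];
for (const s of samples) {
  console.log(s, "=>", matchProjectName(s));
}
```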

Related

Regex finding second string

I'm attempting to get the last word in the following strings.
After about 45 minutes I can't seem to find the right combination of slashes, dashes and brackets.
The closest I've got is
/(?![survey])[a-z]+/gi
It matches the following strings, except that for "required" it returns the match "quired". I'm assuming that's because the letters r and e are in the word survey.
survey[1][title]
survey[1][required]
survey[2][anotherString]
You're using a character set, which will exclude any of the characters from being the first character in the match, which isn't what you want. Using plain negative lookahead would be a start:
(?!survey)[a-z]+
But you also want to match the final word, which can be done by matching word characters that are followed with \]$ - that is, by a ] and the end of the string:
[a-z]+(?=\]$)
https://regex101.com/r/rLvsY5/1
If you want to be more efficient, match the whole string, but capture what comes between the square brackets in a capturing group - the last repeated captured group will be in the result:
survey(?:\[(\w+)\])+
https://regex101.com/r/rLvsY5/2
One way to solve this is to match the full line and only capture the part you need.
survey\[\d+\]\[([a-z]+)\]
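The lookahead pattern and the repeated-group pattern above can be compared on the strings from the question. This is an illustrative sketch; the helper names are made up, and the i flag is added to the lookahead version so that anotherString is matched in full.

```javascript
const inputs = ["survey[1][title]", "survey[1][required]", "survey[2][anotherString]"];

// Lookahead version: letters that are directly followed by "]" at the end.
function lastWordByLookahead(s) {
  const m = s.match(/[a-z]+(?=\]$)/i);
  return m ? m[0] : null;
}

// Capturing version: a repeated group keeps only its last iteration.
function lastWordByCapture(s) {
  const m = s.match(/survey(?:\[(\w+)\])+/);
  return m ? m[1] : null;
}

for (const s of inputs) {
  console.log(s, "=>", lastWordByLookahead(s), "/", lastWordByCapture(s));
}
// survey[1][title] => title / title
// survey[1][required] => required / required
// survey[2][anotherString] => anotherString / anotherString
```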

Forcing a Strict Character Order in a Regex Expression

I'm trying to create a regex in Javascript that has a limited order the characters can be placed in, but I'm having trouble getting the validation to be fully correct.
The criteria for the expression is a little complicated. The user must input strings with the following criteria:
The string contains two parts, an initial group, and an end group.
The groups are separated by a colon (:).
Strings are separated by a semi-colon (;).
The initial group can start with one optional forward-slash and end with one optional forward-slash, but these forward-slashes may not appear anywhere else in the group.
Inside forward-slashes, one optional underscore may appear on either end, but they may not appear anywhere else in the group.
Inside these optional elements, the user may enter any number of numbers or letters, uppercase or lowercase, but exactly one of these characters must be surrounded with angular brackets (<>).
If the letter inside the brackets is an uppercase C, it may be followed by one of a lowercase u or v.
The end group may contain one or more of a number or letter, uppercase or lowercase (If it is an uppercase C, it can be followed by a lowercase u or v.) or one asterisk (*), but not both.
A string must be able to validate with multiple groupings.
This probably sounds a little confusing.
For example, the following examples are valid:
<C>:Cu;
<Cu>:Cv;
/_V<C>V:C;
/_VV<Cv>VV_/:Cu;
_<V>:V1;
_<V>_:V1;
_<V>/:V1;
_<V>:*;
_<m>:n;
The following are invalid:
Cu:Cv;
Cu:Cv
CuCv;
<Cu/>:Cv;
<Cu_>:Cv;
<Cu>:Cv/;
_/<Cu>:Cv;
<Cu>/_:Cv;
They should validate when grouped together like so.
<Cu>:Cv;/_V<C>V:C;_<V>:V1;_<V>/:V1;_<V>:*;_<m>:n;
Hopefully, these examples help you understand what I'm trying to match.
I created the following regexp and tested it on Regex101.com, but this is the closest I could come:
\\/{0,1}_{0,1}[A-Za-z0-9]{0,}<{1}[A-Za-z0-9]{1,2}>{1}[A-Za-z0-9]{0,}_{0,1}\\/{0,1}):([A-Za-z0-9]{1,2}|\\*;$
It's mostly correct, but it allows strings that should be invalid such as:
_/<C>:C;
If an underscore comes before the first forward-slash, it should be rejected. Otherwise, my regexp seems to be correct for all other cases.
If anyone has any suggestions on how to fix this, or knows of a way to match all criteria much more efficiently, any help is appreciated.
The following seems to fulfill all the criteria:
(?:^|;)(\/?_?[a-zA-Z0-9]*<(?:[a-zA-Z]|C[uv]?)>[a-zA-Z0-9]*_?\/?):([a-zA-Z0-9]+|\*)(?=;|$)
Regex101 demo.
It puts each of the "groups" in a capturing group so you can access them individually.
Details:
(?:^|;) A non-capturing group to make sure the match either starts at the beginning of the string or directly follows a semicolon.
( Start of group 1.
\/?_? An optional forward-slash followed by an optional underscore.
[a-zA-Z0-9]* Any letter or number - Matches zero or more.
<(?:[a-zA-Z]|C[uv]?)> Mandatory <> pair containing one letter, or the capital letter C optionally followed by a lowercase u or v.
[a-zA-Z0-9]* Any letter or number - Matches zero or more.
_?\/? An optional underscore followed by an optional forward-slash.
) End of group1.
: Matches a colon character literally.
([a-zA-Z0-9]+|\*) Group 2 - containing one or more numbers or letters or a single * character.
(?=;|$) A positive Lookahead to make sure the string is either followed by a semicolon or is at the end.
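A minimal sketch of how the pattern above might be used. extractPairs is a hypothetical helper that relies on String.prototype.matchAll. The second pattern, wholeInput, is an assumption that is not part of the answer: it reuses the same building blocks to check that the entire input consists of units that each end with a semicolon.

```javascript
// Pattern from the answer, with the g flag added to iterate over all units.
const unitPattern =
  /(?:^|;)(\/?_?[a-zA-Z0-9]*<(?:[a-zA-Z]|C[uv]?)>[a-zA-Z0-9]*_?\/?):([a-zA-Z0-9]+|\*)(?=;|$)/g;

// Hypothetical helper: list [initialGroup, endGroup] pairs found in the input.
function extractPairs(input) {
  return Array.from(input.matchAll(unitPattern), (m) => [m[1], m[2]]);
}

// Assumption, not from the answer: validate the WHOLE input at once.
const unitSource = String.raw`\/?_?[a-zA-Z0-9]*<(?:[a-zA-Z]|C[uv]?)>[a-zA-Z0-9]*_?\/?:(?:[a-zA-Z0-9]+|\*)`;
const wholeInput = new RegExp(`^(?:${unitSource};)+$`);

console.log(extractPairs("<Cu>:Cv;/_V<C>V:C;"));
// [ [ '<Cu>', 'Cv' ], [ '/_V<C>V', 'C' ] ]
console.log(extractPairs("_/<C>:C;"));
// []
console.log(wholeInput.test("<Cu>:Cv;/_V<C>V:C;_<V>:V1;_<V>/:V1;_<V>:*;_<m>:n;")); // true
console.log(wholeInput.test("<Cu>/_:Cv;")); // false
```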
Did you mean this?
/^(?:(^|\s*;\s*)(?:\/_|_)?[a-z]*<[a-z]+>[a-z]*_?\/?:(?:[a-z0-9]+|\*)(?=;))+;$/i
We start with a case-insensitive expression /.../i to keep it more readable. You have to rewrite it to a case-sensitive expression if you only want to allow uppercase at the beginning of a word.
^ means the beginning of the string. $ means the end of the string.
The whole string ends with ';' after multiple repetitions of the inner expression (?:...)+ where + means 1 or more occurrences. ;$ at the end includes the last semicolon in the result. It is not necessary for a test only, since the look-ahead already does the job.
(^|\s*;\s*) Every part is at the beginning of the string or after a semicolon surrounded by arbitrary whitespace, including linefeeds. Use \n if you do not want to allow spaces and tabs.
(?:...|...) is a non-capturing alternative. ? after a character or group is the quantifier 0/1 - none or once.
So (?:\/_|_)? means '/_', '_' or nothing. Use \/?_? if you do want to allow strings starting with a single slash as well.
[a-z]*<[a-z]+>[a-z]* 0 or more letters followed by <...> with at least one letter inside and again followed by 0 or more letters.
_?\/?: optional '_', optional '/', mandatory : in this sequence.
(?:[a-z0-9]+|\*) The part after the colon contains letters and numbers or the asterisk.
(?=;) Look-ahead: Every group must be followed by a semicolon. Look-ahead conditions do not move the search position.
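A quick check of this expression against a few inputs from the question, as an illustrative sketch (the variable name listPattern is arbitrary). As written, it accepts only letters around the angle brackets and does not accept a leading slash without an underscore.

```javascript
const listPattern =
  /^(?:(^|\s*;\s*)(?:\/_|_)?[a-z]*<[a-z]+>[a-z]*_?\/?:(?:[a-z0-9]+|\*)(?=;))+;$/i;

// The full list of valid units from the question.
console.log(listPattern.test("<Cu>:Cv;/_V<C>V:C;_<V>:V1;_<V>/:V1;_<V>:*;_<m>:n;")); // true

// Underscore before the slash: rejected.
console.log(listPattern.test("_/<C>:C;")); // false

// Missing trailing semicolon: rejected.
console.log(listPattern.test("<Cu>:Cv")); // false

// A single leading slash without an underscore is not accepted by (?:\/_|_)?.
console.log(listPattern.test("/<C>:C;")); // false
```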

Regex exact match on number, not digit

I have a scenario where I need to find and replace a number in a large string using javascript. Let's say I have the number 2 and I want to replace it with 3 - it sounds pretty straight forward until I get occurrences like 22, 32, etc.
The string may look like this:
"note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2."
I want to turn it into this:
"note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3."
Obviously this means .replace('2','3') is out of the picture so I went to regex. I find it easy to get an exact match when I am dealing with string start to end ie: /^2$/g. But that is not what I have. I tried grouping, digit only, wildcards, etc and I can't get this to match correctly.
Any help on how to exactly match a number (where 0 <= number <= 500 is possible, but no constraints needed in regex for range) would be greatly appreciated.
The task is to find (and replace) "single" digit 2, not embedded in
a number composed of multiple digits.
In regex terms, this can be expressed as:
Match digit 2.
Previous char (if any) can not be a digit.
Next char (if any) can not be a digit.
The regex for the first condition is straightforward - just 2.
In other flavours of regex, e.g. PCRE, to forbid the previous
char you could use negative lookbehind, but JavaScript regex did not
support it before ES2018, so older engines reject it.
So, to circumvent this, we must:
Put a capturing group matching either start of text or something
other than a digit: (^|\D).
Then put regex matching just 2: 2.
The last condition, fortunately, can be expressed as negative lookahead,
because even older JavaScript regex engines support it: (?!\d).
So the whole regex is:
(^|\D)2(?!\d)
Having found such a match, you have to replace it with the content
of the first capturing group and 3 (the replacement digit).
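The steps above can be sketched as a small function. The name replaceExactNumber and its parameters are made up for illustration, and the sketch assumes that from is a non-negative integer, so it needs no regex escaping. A replacer function is used instead of a replacement string to avoid any ambiguity between the group reference and the digit that follows it.

```javascript
// Hypothetical helper: replace the number `from` with `to`, but only where
// `from` is not part of a longer run of digits.
function replaceExactNumber(text, from, to) {
  // (^|\D)  start of text or one non-digit, captured so it can be put back
  // (?!\d)  the next character, if any, must not be a digit
  const pattern = new RegExp(`(^|\\D)${from}(?!\\d)`, "g");
  return text.replace(pattern, (_match, before) => before + to);
}

const text = "note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.";
console.log(replaceExactNumber(text, 2, 3));
// note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3.
```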
You can use negative look-ahead:
(\D|^)2(?!\d)
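Replace with: $13 in JavaScript (with fewer than 13 groups this is read as group 1 followed by a literal 3), or ${1}3 in flavours that support the brace syntax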
If look behind is supported:
(?<!\d)2(?!\d)
Replace with: 3
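A minimal sketch of the lookbehind variant, assuming an engine with lookbehind support (ES2018 and later). Since nothing before the digit is consumed, the replacement is just the new digit.

```javascript
const input = "note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.";

// (?<!\d) no digit before, (?!\d) no digit after
const output = input.replace(/(?<!\d)2(?!\d)/g, "3");

console.log(output);
// note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3.
```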
See regex in use here
(\D|\b)2(?!\d)
(\D|\b) Capture either a non-digit character or a position that matches a word boundary
(?!\d) Negative lookahead ensuring what follows is not a digit
Alternatives:
(^|\D)2(?!\d) # Thanks to @Wiktor in the comments below
(?<!\d)2(?!\d) # At the time of writing works in Chrome 62+
const regex = /(\D|\b)2(?!\d)/g
const str = `note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.`
const subst = "$13"
console.log(str.replace(regex, subst))

what is difference between these two syntax in my code

What is the difference between the following two syntaxes in a regular expression?
Please give an example.
(?=.*\d)
and
.*(?=\d)
The first one is just an assertion, a positive look-ahead saying "there must be zero or more characters followed by a digit." If you match it against a string containing at least one digit, it will tell you whether the assertion is true, but the matched text will just be an empty string.
The second one searches for a match, with an assertion (a positive-lookahead) after the match saying "there must be a digit." The matched text will be the characters before the last digit in the string (including any previous digits, because .* is greedy, so it'll consume digits up until the last one, because the last one is required by the assertion).
Note the difference in the match object results:
var str = "foo42";
test("rex1", /(?=.*\d)/, str);
test("rex2", /.*(?=\d)/, str);
function test(label, rex, str) {
console.log(label, "test result:", rex.test(str));
console.log(label, "match object:", rex.exec(str));
}
Output (for those who can't run snippets):
rex1 test result: true
rex1 match object: [
""
]
rex2 test result: true
rex2 match object: [
"foo4"
]
Notice how the match result in the second case was foo4 (from the string foo42), but blank in the first case.
(?=...) is a positive lookahead. Both of these expressions succeed on "any text followed by a digit". The difference, though, is that (?=...) doesn't "eat" (consume) any characters as it matches. For practical purposes, if this is the only thing your regex contains, they'll succeed on the same inputs, although the matched text differs. However, .*(?=\d) would be a more correct expression, unless there's more to it than what you put in the question.
Where it really matters is when you're using capturing groups or where you're using the content of the matched text after running the regular expression:
If you want to capture all text before the number, but not the number itself, and use it after, you could do this:
(.*?(?=\d))
The ? makes the match non-greedy, so it will only match up to the first number. All text leading up to the number will be in the match result as the first group.
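The effect of the lazy quantifier can be seen by comparing it with the greedy form on the same input. This is an illustrative sketch; the sample string is arbitrary.

```javascript
const sample = "foo42";

// Lazy: stop as soon as the lookahead sees the first digit.
const lazy = sample.match(/(.*?(?=\d))/);
// Greedy: take as much as possible while a digit still follows.
const greedy = sample.match(/(.*(?=\d))/);

console.log(lazy[1]);   // "foo"
console.log(greedy[1]); // "foo4"
```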
Please find the difference below
In detail
.* means matches any character (except newline)
(?=\d) means Positive Lookahead - Assert that the regex below can be matched
\d match a digit [0-9]
(?=.*\d)
  CapturingGroup
    MatchOnlyIfFollowedBy
      Sequence: match all of the followings in order
        Repeat
          AnyCharacterExcept\n
          zero or more times
        Digit
.*(?=\d)
  Sequence: match all of the followings in order
    Repeat
      AnyCharacterExcept\n
      zero or more times
    CapturingGroup
      MatchOnlyIfFollowedBy
        Digit

regular expression incorrectly matching % and $

I have a regular expression in JavaScript that should allow only digits and the characters , . + ( ) - and space in a phone field.
My regex is [0-9-,.+() ]
It works for digits as well as the characters listed above, but it also allows characters like % and $ which are not in the list.
Even though you don't have to, I always make it a point to escape metacharacters (easier to read and less pain):
[0-9\-,\.+\(\) ]
But this won't work like you expect it to because it will only match one valid character while allowing other invalid ones in the string. I imagine you want to match the entire string with at least one valid character:
^[0-9\-,\.\+\(\) ]+$
Your original regex is not actually matching %. What it is doing is matching valid characters, but the problem is that it only matches one of them. So if you had the string 435%, it matches the 4, and so the regex reports that it has a match.
If you try to match it against just one invalid character, it won't match. So your original regex doesn't match the string %:
> /[0-9\-,\.\+\(\) ]/.test("%")
false
> /[0-9\-,\.\+\(\) ]/.test("44%5")
true
> "444%6".match(/[0-9\-,\.+\(\) ]/)
["4"] //notice that the 4 was matched.
Going back to the point about escaping, I find that it is easier to escape it rather than worrying about the different rules where specific metacharacters are valid in a character class. For example, an unescaped - is only accepted in the following cases:
When used in an actual range in the proper order, such as [a-z] (but not [z-a]).
When used as the first or last character, or by itself, as in [-a], [a-], or [-].
When used after a range, as in [0-9-,] or [a-d-j] (but keep in mind that [9-,] is invalid and [a-d-j] does not match the letters e through i).
For these reasons, I escape metacharacters to make it clear that I want to match the actual character itself and to remove ambiguities.
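These hyphen rules can be verified in a JavaScript console. A small sketch: the helper name compiles is hypothetical and only reports whether a pattern source is accepted by the RegExp constructor.

```javascript
// After the range a-d, the second hyphen is a literal, so the class is
// a, b, c, d, "-" and j.
const hyphenClass = /^[a-d-j]$/;
console.log(hyphenClass.test("c")); // true
console.log(hyphenClass.test("-")); // true
console.log(hyphenClass.test("j")); // true
console.log(hyphenClass.test("e")); // false

// Hypothetical helper: does the pattern source compile at all?
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false; // e.g. SyntaxError for an out-of-order range
  }
}

console.log(compiles("[z-a]"));   // false
console.log(compiles("[9-,]"));   // false
console.log(compiles("[0-9-,]")); // true
```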
You just need to anchor your regex:
^[0-9-,.+() ]+$
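Inside a character class, special characters don't need to be escaped, except ], \, ^ (when it comes first) and -.
Even these can be left unescaped in some positions:
] when it is the first character of the class, as in []] (this works in PCRE and POSIX, but not in JavaScript, where [] is an empty class, so ] must be escaped there)
- at the beginning [-abc] or at the end [abc-] of the class, or right after the end of a range [a-c-x]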
Escape characters with special meaning in your RegExp. If you're not sure and it isn't an alphabet character, it usually doesn't hurt to escape it, too.
If the whole string must match, include the start ^ and end $ of the string in your RegExp, too.
/^[\d\-,\.\+\(\) ]*$/
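A minimal sketch of the anchored check, with a hypothetical helper name isValidPhone. It uses + rather than * so that an empty string is rejected, which is an assumption about what the field should accept.

```javascript
// Anchored: every character of the input must be in the allowed set.
const phonePattern = /^[0-9\-,.+() ]+$/;

// Hypothetical helper wrapping the anchored pattern.
function isValidPhone(value) {
  return phonePattern.test(value);
}

console.log(isValidPhone("+1 (555) 123-4567")); // true
console.log(isValidPhone("435%"));              // false
console.log(isValidPhone("$100"));              // false
console.log(isValidPhone(""));                  // false

// Unanchored, the same class reports a match as soon as one allowed
// character is found anywhere in the input.
console.log(/[0-9\-,.+() ]/.test("435%"));      // true
```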
