Regexp match optional group unless it's got something inside it - javascript

I'm playing around with this regexp: http://regex101.com/r/dL3qX1
!\[(.*?)\](?:\(\)|\[\])?
All of the strings below should match. However, if the optional second set of brackets contains anything, the regexp should match nothing.
// Match
![]
![caption]
![]()
![caption]()
![][]
![caption][]
// No match
![][No match]
![caption][No match]
![](No match)
![caption](No match)
I should still be able to match examples that have text at the end of the line.
![] hello
![caption][] hi there
In other words, I only want a match if there is no optional group, or if there is, I only want a match if the optional group is empty (nothing between the brackets).
Is what I'm after possible?

I personally prefer using negated class when it comes to brackets:
^!\[([^\[\]]*)\](?:\(\)|\[\])?$
I replaced (.*?) with [^\[\]]* and added ^ and $ at the beginning and end respectively.
If I understood correctly what you're looking for, only the first set of strings matches.

You can use this regex:
^!\[[^\]]*\](?:\(\)|\[\])?$
Working Demo: http://regex101.com/r/eX0sR8
Note the use of [^\]]* instead of .*? inside the first square brackets, which makes sure the match stops at the very first ]. It is also better to use the line start/end anchors ^ and $.
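Note that the anchored patterns above reject the question's trailing-text examples such as "![] hello". A minimal sketch of one possible alternative, assuming a negative lookahead is acceptable: drop the anchors and require that the match is not directly followed by an opening bracket or parenthesis. The variable names are illustrative only.

```javascript
// Sketch: allow trailing text, but reject a non-empty second bracket pair.
// (?![\[(]) fails when the next character opens another bracket group.
const pattern = /!\[([^\[\]]*)\](?:\(\)|\[\])?(?![\[(])/;

const shouldMatch = [
  '![]', '![caption]', '![]()', '![caption]()', '![][]', '![caption][]',
  '![] hello', '![caption][] hi there',
];
const shouldNotMatch = [
  '![][No match]', '![caption][No match]', '![](No match)', '![caption](No match)',
];

// Collect the strings that behave as the question requires.
const matched = shouldMatch.filter((s) => pattern.test(s));
const rejected = shouldNotMatch.filter((s) => !pattern.test(s));

console.log(matched.length, rejected.length); // 8 4
```

The lookahead does not consume anything, so the text after the match is left untouched.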

Related

RegExp capturing non-match

I have a regex for a game that should match strings in the form of go [anything] or [cardinal direction], and capture either the [anything] or the [cardinal direction]. For example, the following would match:
go north
go foo
north
And the following would not match:
foo
go
I was able to do this using two separate regexes: /^(?:go (.+))$/ to match the first case, and /^(north|east|south|west)$/ to match the second case. I tried to combine the regexes to be /^(?:go (.+))|(north|east|south|west)$/. The regex matches all of my test cases correctly, but it doesn't correctly capture for the second case. I tried plugging the regex into RegExr and noticed that even though the first case wasn't being matched against, it was still being captured.
How can I correct this?
Try using the positive lookbehind feature to find the word "go".
(north|east|south|west|(?<=go ).+)$
Note that this solution prevents you from including ^ at the start of the regex, because "go " is only asserted by the lookbehind and is not consumed as part of the match.
You have to move the closing parenthesis to the end of the pattern to have both patterns between anchors, or else you would allow a match before one of the cardinal directions and it would still capture the cardinal direction at the end of the string.
Then in the JavaScript you can check for the group 1 or group 2 value.
^(?:go (.+)|(north|east|south|west))$
                                   ^
Using a lookbehind assertion (if supported), you can also get the value as the full match instead of in a capture group.
In that case, either match the rest of the line while asserting "go " to the left at the start of the string, or match exactly one of the cardinal directions:
(?<=^go ).+|^(?:north|east|south|west)$
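The group check described above can be sketched as follows. The function name parseDirection is hypothetical and only serves the illustration.

```javascript
// Sketch: one anchored pattern with two capture groups; only one is set per match.
const command = /^(?:go (.+)|(north|east|south|west))$/;

function parseDirection(input) {
  const m = command.exec(input);
  if (m === null) return null;
  // Group 1 is set for "go <anything>", group 2 for a bare cardinal direction.
  return m[1] !== undefined ? m[1] : m[2];
}

console.log(parseDirection('go north')); // north
console.log(parseDirection('go foo'));   // foo
console.log(parseDirection('north'));    // north
console.log(parseDirection('foo'));      // null
console.log(parseDirection('go'));       // null
```

Because both alternatives sit inside the non-capturing group, ^ and $ apply to each of them, which is what the original combined pattern was missing.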

JS regular expression, basic lookahead

I cannot figure out, for the life of me, why this regular expression
^\.(?=a)$
does not match
".a"
Does anyone know why?
I am going off the information provided here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
The reason it doesn't work is because the lookahead doesn't actually consume any characters, so your matching position doesn't advance.
^\.(?=a)$
Matches the beginning of line (^ -- this matches) followed by a literal . (\. -- this also matches), and then (without consuming any characters), checks to see if the next character is a literal a ((?=a)). It is, so the lookahead matches. It then asserts that your position is at the end of the string ($). This is not the case, because we're still right after the ., so the match fails.
Another possible matching expression would be
^\.(?=a$)
Which works just as above, but the assertion about the end of the line is contained in the lookahead, so this time, it matches.
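A short sketch of the difference, using illustrative variable names. It only demonstrates that a lookahead checks the following text without consuming it.

```javascript
// Sketch: a lookahead tests what follows but does not advance the position.
const failing = /^\.(?=a)$/;  // $ is tested right after ".", while "a" still remains
const working = /^\.(?=a$)/;  // the end-of-string check sits inside the lookahead
const consuming = /^\.a$/;    // no lookahead: the "a" becomes part of the match

console.log(failing.test('.a'));        // false
console.log(working.test('.a'));        // true
console.log('.a'.match(working)[0]);    // .
console.log('.a'.match(consuming)[0]);  // .a
```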
Your regex is only going to match a period that's followed by an 'a', without including 'a' in the match.
Another issue is that you're using $ after a character that's basically being ignored.
Remove the $ and it will work as described.
Bonus: I've enjoyed using this lately http://www.regexpal.com/

Need help to find the right regex pattern to match

My regex is not working the way I think it should.
[^a-zA-Z](\d+-)?OSM\d*(?![a-zA-Z])
I will use this regex in JavaScript to check whether a string matches it.
Should match:
12345612-OSM34
12-OSM34
OSM56
7-OSM
OSM
Should not match:
-OSM
a-OSM
rOSMann
rOSMa
asdrOSMa
rOSM89
01-OSMann
OSMond
23OSM
45OSM678
Each line represents one string in my JavaScript.
https://www.regex101.com/r/xQ0zG1/3
The rules for matching:
match OSM if it stands alone
optional match if the line starts with digit/s AND these are followed by a -
optional match if the line ends with digit/s
match all 3 above combined
no match if the line starts with a character/word other than OSM
no match if the line ends with a character/word other than OSM
I hope someone can help.
You can use the following simplified pattern using anchors:
^(?:\d+-)?OSM\d*$
The flags needed (if matching multi-line paragraph) would be: g for global match and m for multi-line match, so that ^ and $ match the begin/end of each line.
EDIT
Changed the (\d+-) match to (?:\d+-) so that it doesn't capture.
[^a-zA-Z](\d+-)?OSM\d*(?![a-zA-Z])
[^a-zA-Z] In regex, it is usually easier to specify what you want than what you don't want. This piece says there must be one character that isn't a letter, so it requires an extra character before the rest of the pattern. I believe what you wanted was to match the start of a line. You don't need to specify that there's no letter, because you're about to specify what will be on the line anyway. The start of the input is matched with ^ (outside of brackets). Use the m flag if ^ should match at the start of every line of a multi-line string.
(\d+-)? means one or more digits followed by a - character. The ? means this whole block isn't required. In JavaScript, \d matches only the ASCII digits 0-9, so it is equivalent to [0-9]. You got this part right. However, if you don't need the capture, you can write (?:) instead of ().
\d*(?![a-zA-Z]) uses a lookahead, but you rarely need that here. Again, specifying what you don't want is a bad idea, because I could write OSMé and it would match, since you didn't specify that é is forbidden. It's much simpler to specify what is allowed. In your case you want to match up to the end of the line, so you can write \d*$ instead, which means zero or more digits followed by the end of the line.
/^(?:\d+-)?OSM\d*$/gm is the final result.
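A quick check of the anchored pattern against the sample strings from the question. This is a sketch with illustrative variable names; the g and m flags are left out because each string is tested on its own.

```javascript
// Sketch: test every sample string from the question individually.
const osm = /^(?:\d+-)?OSM\d*$/;

const good = ['12345612-OSM34', '12-OSM34', 'OSM56', '7-OSM', 'OSM'];
const bad = ['-OSM', 'a-OSM', 'rOSMann', 'rOSMa', 'asdrOSMa', 'rOSM89',
  '01-OSMann', 'OSMond', '23OSM', '45OSM678'];

const allGoodMatch = good.every((s) => osm.test(s));
const noBadMatch = bad.every((s) => !osm.test(s));

console.log(allGoodMatch, noBadMatch); // true true
```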

How to match with an exact string using regular expression

I have a small requirement: I want to search for a string with an exact match.
Suppose I want to search for None_1. I am searching for 'None_1' using /None_1/, but it matches even "xxxNone", whereas my requirement is that it should match only None_[any digit].
Here is my code
/^None_+[0-9]{?}/
So it should match only None_1 , None_2
You should also anchor the expression at the end of the line. But that alone will not make it work. Your expression is wrong. I think it should be:
/^None_[0-9]+$/
^ matches the beginning of a line
None_ matches None_
[0-9]+ matches one or more digits
$ matches the end of a line
If you only want to match one digit, remove the +.
Your original expression /^None_+[0-9]{?}/ worked like this:
^ matches the beginning of a line
None matches None
_+ matches one or more underscores
[0-9] matches one digit
{? matches an optional opening brace {
} matches }
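The breakdown above can be checked with a small sketch. It assumes a JavaScript engine that accepts a lone { as a literal in non-Unicode mode, which is the legacy behavior the explanation relies on; the variable names are illustrative.

```javascript
// Sketch: compare the original pattern with the corrected one.
const original = /^None_+[0-9]{?}/;  // requires a literal "}" after the digit
const corrected = /^None_[0-9]+$/;   // None_ followed only by digits

console.log(original.test('None_1'));      // false (no closing brace present)
console.log(original.test('None_1}'));     // true
console.log(corrected.test('None_1'));     // true
console.log(corrected.test('None_12'));    // true
console.log(corrected.test('xxxNone_1'));  // false
console.log(corrected.test('None_1x'));    // false
```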
Try this:
/^None_[0-9]$/

How can I shorten this regex for JavaScript?

Basically I just want it to match anything inside (). I tried the . and * but they don't seem to work. Right now my regex looks like:
\(([\\\[\]\-\d\w\s/*\.])+\)
The strings it's going to match are URL routes like:
#!/foo/bar/([a-z])/([\d\w])/(*)
In this example, my regex above matches:
([a-z])
([\d\w])
(*)
BONUS:
How can I make it so that it only matches when it starts with a ( and ends with a )? I thought I could use ^ at the front, before the \(, and $ at the end, after the \), but no luck.
Disregard this bonus. I didn't realize it didn't matter...
Are you worried about nested parentheses? If not, you could set it up to match all characters that aren't a closing paren:
\(([^)]*)\)
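A minimal sketch of this pattern applied to the example route from the question, assuming no nested parentheses. The g flag and String.prototype.matchAll are used to collect every occurrence; the variable names are illustrative.

```javascript
// Sketch: capture whatever sits between each "(" and the next ")".
const routeText = '#!/foo/bar/([a-z])/([\\d\\w])/(*)';
const innerPattern = /\(([^)]*)\)/g;

// m[1] is the text inside the parentheses for each match.
const inner = Array.from(routeText.matchAll(innerPattern), (m) => m[1]);

console.log(inner); // [ '[a-z]', '[\\d\\w]', '*' ]
```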
Basically I just want it to match anything inside ().
BONUS: How can I make it so that it only matches when it starts with a ( and ends with a )?
Easy peasy.
var re1 = /^\(.*\)$/
// or
var re2 = new RegExp('^\\(.*\\)$');
Edit
Re: @Mike Samuel's comments
Does not match newlines between the parentheses which were explicitly matched by \s in the original.
...
Maybe you should use [\s\S] instead of .
...
If you're going to exclude newlines you should do so intentionally or explicitly.
Note that . matches any single character except the newline character. If you also want to match newlines as part of the "anything" between parentheses, use the [\s\S] character class:
var re3 = /^\([\s\S]*\)$/
// or
var re4 = new RegExp('^\\([\\s\\S]*\\)$');
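The difference between . and [\s\S] shows up as soon as the text between the parentheses contains a newline. A small sketch with illustrative names:

```javascript
// Sketch: "." stops at newlines, [\s\S] does not.
const dotVersion = /^\(.*\)$/;
const anyCharVersion = /^\([\s\S]*\)$/;
const multiLine = '(a\nb)';

console.log(dotVersion.test('(abc)'));          // true
console.log(dotVersion.test(multiLine));        // false
console.log(anyCharVersion.test(multiLine));    // true
```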
To negate a character class, you use the [^...] construct. Thus, to match anything within parentheses, you would use:
\([^)]+\)
which says "match any string that starts with an open parenthesis, contains one or more characters that are not closing parentheses, and ends with a closing parenthesis."
To match entire lines that match the above construct, just wrap it with ^ and $:
^\([^)]+\)$
I'm not completely sure I understand what you're doing, but try this:
var re = /\/(\([^()]+\))(?=\/|$)/;
Matching the leading slash in addition to the opening paren ensures that the paren is indeed at the beginning. You can't do the same thing at the end because you don't know there will be a trailing slash. And if there is one, you don't want to consume it because it's also the leading slash for the next match attempt.
Instead, you use the lookahead - (?=\/|$) - to match the trailing slash without consuming it. If there is no slash, I assume no other character should be present either--hence the anchor: $.
@patorjk brought up a good point, though: can there be more parentheses between the outermost pair? If there are, the problem is much more complicated. I won't bother trying to expand my regex to deal with nested parens; some regex flavors can handle such things, but not JavaScript. Instead I'll recommend this sloppier regex:
\/(\([\s\S]+?\))(?=\/|$)
I say "sloppy" because it relies on the assumption that the sequences /( and )/ will never appear inside a valid match. As with my first regex, the text that you're interested in (i.e., everything but the leading and trailing slashes) will be captured in group #1.
Notice the non-greedy quantifier, too. With a regular greedy quantifier it will match everything from the first ( to the last ) in one shot. In other words, it'll match ([a-z])/([\d\w])/(*) instead of ([a-z]), ([\d\w]) and (*) as you wanted.
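A sketch of the "sloppy" regex run against the example route from the question. The g flag is added here so that matchAll can walk through every segment; that flag and the variable names are assumptions made for the illustration.

```javascript
// Sketch: pull each parenthesized segment out of the route via group 1.
const route = '#!/foo/bar/([a-z])/([\\d\\w])/(*)';
const segmentPattern = /\/(\([\s\S]+?\))(?=\/|$)/g;

// The lookahead leaves the trailing slash in place for the next match attempt.
const segments = Array.from(route.matchAll(segmentPattern), (m) => m[1]);

console.log(segments); // [ '([a-z])', '([\\d\\w])', '(*)' ]
```

With a greedy quantifier ([\s\S]+ instead of [\s\S]+?), the same call would return a single segment spanning from the first ( to the last ).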
