How can I capture an optional asterisk with my regex? - javascript

I am trying to extract sections from an API response that is in Markdown, so I'm using this:
(?<=\*\*(Test)\*\*)(.*?)(?=\*\*end\*\*)
https://regex101.com/r/r8aiVk/1
Result here should be Test and then this is a test, which it is. Awesome.
This works fine, however, some titles have an asterisk at the end, which is where I'm running into an issue. I loop through titles with the one regex, but I want to capture that one optional asterisk.
So in the following example, I want to be able to capture the asterisk along with the rest:
https://regex101.com/r/r8aiVk/2
The result here should be Test* this is a test.
I've tried various different ways, such as (\*?) and a few other variants, but I am unable to get this working.

The lookbehind implementation in JavaScript tricked you: a lookbehind is checked at each position in the string (yours is the first atom in the regex), and its pattern is matched backwards from that position. It is tried at the start of the string, then after *, after **, after **T, and so on. As soon as the text before the current position is **Test**, the lookbehind succeeds and the engine moves on. The optional asterisk comes after that position, so it is consumed by .*? rather than by anything inside the lookbehind.
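A minimal sketch of that behaviour, assuming a sample input of the form **Test*** this is a test**end** (the actual regex101 input is not reproduced here, so the string is an assumption). It needs an engine with lookbehind support:

```javascript
// Assumed sample input: a title with a trailing asterisk, wrapped in ** markers.
const input = "**Test*** this is a test**end**";

// The original pattern from the question.
const lookbehindPattern = /(?<=\*\*(Test)\*\*)(.*?)(?=\*\*end\*\*)/;

const lookbehindMatch = lookbehindPattern.exec(input);

// Group 1 is matched inside the lookbehind and never sees the extra asterisk.
console.log(lookbehindMatch[1]); // "Test"

// The asterisk ends up at the start of group 2 instead.
console.log(lookbehindMatch[2]); // "* this is a test"
```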
You can get what you need using a mere consuming pattern:
/\*\*(Test\*?)\*\*(.*?)\*\*end\*\*/g
See the regex demo.
This pattern will be processed normally, from left to right, matching
\*\* - a ** substring
(Test\*?) - capturing Test or Test* into Group 1
\*\* - a ** substring
(.*?) - Capturing group 2: any 0+ chars other than line break chars, as few as possible
\*\*end\*\* - **end** substring.
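As a sketch of how the consuming pattern could be used over several sections at once (the sample string and the helper name extractSections are assumptions made for illustration, not part of the question):

```javascript
// Consuming pattern from the answer: group 1 is the title (with an optional
// trailing asterisk), group 2 is the section body.
const sectionPattern = /\*\*(Test\*?)\*\*(.*?)\*\*end\*\*/g;

// Hypothetical helper: collect every title/body pair found in the text.
function extractSections(markdown) {
  return Array.from(markdown.matchAll(sectionPattern), (m) => ({
    title: m[1],
    body: m[2].trim(),
  }));
}

// Assumed sample input containing both title variants.
const sample = "**Test**this is a test**end** **Test*** this is a test**end**";
const sections = extractSections(sample);

console.log(sections[0]); // { title: 'Test', body: 'this is a test' }
console.log(sections[1]); // { title: 'Test*', body: 'this is a test' }
```

Because the title is now part of the consumed match, the optional asterisk lands in group 1 instead of leaking into the body.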

Related

RegExp capturing non-match

I have a regex for a game that should match strings in the form of go [anything] or [cardinal direction], and capture either the [anything] or the [cardinal direction]. For example, the following would match:
go north
go foo
north
And the following would not match:
foo
go
I was able to do this using two separate regexes: /^(?:go (.+))$/ to match the first case, and /^(north|east|south|west)$/ to match the second case. I tried to combine the regexes to be /^(?:go (.+))|(north|east|south|west)$/. The regex matches all of my test cases correctly, but it doesn't correctly capture for the second case. I tried plugging the regex into RegExr and noticed that even though the first case wasn't being matched against, it was still being captured.
How can I correct this?
Try using the positive lookbehind feature to find the word "go".
(north|east|south|west|(?<=go ).+)$
Note that this solution prevents you from including ^ at the start of the regex, because the text "go" is not actually included in the group.
You have to move the closing parenthesis to the end of the pattern to have both patterns between anchors, or else you would allow a match before one of the cardinal directions and it would still capture the cardinal direction at the end of the string.
Then in the JavaScript you can check for the group 1 or group 2 value.
^(?:go (.+)|(north|east|south|west))$
                                   ^
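A small sketch of checking group 1 or group 2 in JavaScript with the anchored pattern above. The function name parseDirection is hypothetical; the test strings are the ones from the question:

```javascript
// Both alternatives sit between ^ and $, so only whole-string matches count.
const commandPattern = /^(?:go (.+)|(north|east|south|west))$/;

// Hypothetical helper: returns the captured target, or null when there is no match.
function parseDirection(input) {
  const m = commandPattern.exec(input);
  if (m === null) return null;
  // Only one of the two groups participates in a successful match;
  // the other one is undefined.
  return m[1] ?? m[2];
}

console.log(parseDirection("go north")); // "north"
console.log(parseDirection("go foo"));   // "foo"
console.log(parseDirection("north"));    // "north"
console.log(parseDirection("foo"));      // null
console.log(parseDirection("go"));       // null
```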
Using a lookbehind assertion (if supported), you might also get a match only instead of capture groups.
In that case, you can match the rest of the line, asserting go to the left at the start of the string, or match only 1 of the cardinal directions:
(?<=^go ).+|^(?:north|east|south|west)$
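A sketch of the match-only variant, assuming an engine with lookbehind support. The helper name matchTarget is made up for this example:

```javascript
// No capture groups: the whole match is the value of interest.
const targetPattern = /(?<=^go ).+|^(?:north|east|south|west)$/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function matchTarget(input) {
  const m = input.match(targetPattern);
  return m ? m[0] : null;
}

console.log(matchTarget("go north")); // "north"
console.log(matchTarget("go foo"));   // "foo"
console.log(matchTarget("north"));    // "north"
console.log(matchTarget("foo"));      // null
console.log(matchTarget("go"));       // null
```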

How to extract separate parts of a string with a regex

I'm trying to build a regex that can process the following:
abc
abc-def
where the -def part is optional.
I want capture groups for the "abc" part and the optional "def" part.
I've tried this (in Javascript) but can't seem to figure out the optional part:
/^(.*)+(-(.*))?$/
It matches both examples but the optional part is contained in the first capture group. This should be simple, but I can't seem to get it right.
You're close, try a ? to make the expression lazy.
/^(.*?)(-(.*))?$/
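You can try /^([^-]+)(-(.*))?$/. There are two issues with the original pattern. First, the + sits outside the first capture group, so it repeats the whole group, and a repeated capture group only keeps the text of its last iteration. Second, .* is greedy and also matches -, so it gobbles everything up to the end of the line and leaves nothing for the optional part. A negated class such as [^-] stops at the first -.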
Runnable example:
console.log("abc-def".match(/^([^-]*)(-(.*))?$/));
console.log("abc".match(/^([^-]*)(-(.*))?$/));
You may not need to capture the substring starting with -, in which case /^([^-]*)(?:-(.*))?$/ could work.
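A short sketch of the non-capturing variant with destructuring. The helper name splitParts and the property names head and tail are assumptions for illustration:

```javascript
// The optional "-..." part is wrapped in a non-capturing group, so only the
// text before and after the first dash is captured.
const partsPattern = /^([^-]*)(?:-(.*))?$/;

// Hypothetical helper: returns both parts; tail is undefined when there is no dash.
function splitParts(text) {
  const m = partsPattern.exec(text);
  if (m === null) return null;
  const [, head, tail] = m;
  return { head, tail };
}

console.log(splitParts("abc-def")); // { head: 'abc', tail: 'def' }
console.log(splitParts("abc"));     // { head: 'abc', tail: undefined }
```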

Javascript RegEx match 1-1-1 and 1-1-1-1-1 but not -1-1-1-1 or 1-1-1-1-

I haven't found anything using Google or Stack Overflow.
I need to match 1-1-1 but not -1-1-1 or 1-1-1- with a JavaScript RegEx.
So it has to start with a number, end with a number, and be separated with "-".
I can't figure out how to do it.
Is it even possible?
When this answer was written, JavaScript regex had no look-behind (see javascript regex - look behind alternative?); lookbehind was added in ES2018 and is available in current engines. Without it, to exclude a preceding -, the regex has to match on the preceding character too (as long as it's not a -).
Since there might not be a preceding character (input starts with 1), you have to also match on beginning of input (^).
So, this regex will do it: (?:[^-]|^)(1(?:-1)+)(?!-)
See regex101.com.
Whether it should match a standalone 1, or only on 1-1 (and longer), is up to you. The regex above will not match standalone 1. Change + to * to change that.
I also added capturing of the actual text you wanted to match, i.e. without the leading character. You can remove the extra () around 1(?:-1)+ if that's not needed.
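A sketch that runs the pattern above against the inputs from the question. The second pattern is an assumed alternative using a negative lookbehind, which only works in engines that support it (ES2018 and later); both helper names are hypothetical:

```javascript
// Pattern from the answer: consume one non-dash character (or the start of
// the input) before the run, and capture the run itself in group 1.
const runPattern = /(?:[^-]|^)(1(?:-1)+)(?!-)/;

// Assumed alternative for engines with lookbehind support.
const runPatternLookbehind = /(?<!-)1(?:-1)+(?!-)/;

// Hypothetical helpers: return the matched run, or null when there is none.
function findRun(text) {
  const m = runPattern.exec(text);
  return m ? m[1] : null;
}

function findRunLookbehind(text) {
  const m = runPatternLookbehind.exec(text);
  return m ? m[0] : null;
}

for (const sample of ["1-1-1", "1-1-1-1-1", "-1-1-1-1", "1-1-1-"]) {
  console.log(JSON.stringify(sample), findRun(sample), findRunLookbehind(sample));
}
```

The first two inputs yield the full run from both helpers, and the last two yield null.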

is it possible to extract multiple segments from a string in javascript

example: I'm trying to return "abcdijklqrstyz" from the string "abcdefghijklmnopqrstuvwxyz"
I've tried str.splice but that only allows me to extract sequential characters.
You may be looking for Regular Expressions (regex). Here is an example of how to get what you requested in your example.
var match = 'abcdefghijklmnopqrstuvwxyz'.match(/^(.{4})(?:.{4})(.{4})(?:.{4})(.{4})(?:.{4})(.+)$/);
match.shift(); // This removes the passed in string from the results, leaving the matches
console.log(match.join(''));
The expression can be broken down as follows: ^ starts the match at the beginning of the string [or line]. (.{4}) does a couple of things. The parentheses make the content of the match end up in its own "capture group". . means "match any character except newlines (pretty much)". {4} means match the preceding token exactly 4 times, no more and no less. The rest of the expression is just a repetition of that. The only other difference you'll find is (?: which marks a non-capture group whose contents will not be returned. The grouping could have been omitted, but for some people it provides more clarity when reading. Finally, $ at the very end means "end the match at the very end of the string [or line]".
See the example in action and play around with it here: https://regex101.com/r/X54DKC/1. There is also great documentation on building regular expressions. Here is a great tutorial site: https://regexone.com/.

Is this regex the most efficient way of parsing my string?

First off, here are the parameters to follow in the string I allow the user to input:
If there is a slash, it has to appear at the start of the string, nowhere else, is limited to 1, is optional and must be succeeded by [a-zA-Z].
If there is a tilde, it has to appear after a space " ", nothing else, is optional and must be succeeded by [a-zA-Z]. Also, this expression is limited to 2. (ie: ~exa ~mple is passed but ~exa ~mp ~le is not passed)
The slash followed by a word is an instruction, like /get or /post.
The tilde followed by a word is a parameter like ~now or ~later.
String format:
[instruction] (optional) [query] [extra parameters] (optional)
[instruction] - Must contain / succeeded with [a-zA-Z] only
[query] - Can contain [\w\s()'-] (alphanumeric, whitespace, parentheses, apostrophe, dash)
[extra parameters] - ~ preceded by whitespace, succeeded with only [a-zA-Z]
String examples that should work:
/get D0cUm3nt ex4Mpl3' ~now
D0cUm3nt ex4Mpl3'
/post T(h)(i5 s(h)ou__ld w0rk t0-0'
String examples that shouldn't work:
//get document~now
~later
example ~now~later
Before I pass the string through the regex I trim any whitespace at the start and end of the string (before any text is seen) but I don't trim double whitespaces within the string as some queries require them.
Here is the regex I used:
^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$
To break it down slightly:
[instruction check] - (/{0,1}[a-zA-Z])?
[query check] - [\w\s()'-]*
[parameter check] - ((\s~[a-zA-Z]*){0,2})?
This is the first time I've actually done any serious regex away from a tutorial so I'm wondering is there anything I can change within my regex to make it more compact/efficient?
All fresh perspectives are appreciated!
Thanks.
From your regex: ^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$,
you can change {0,1} to ?, which is shorthand for "0 or 1 times":
^(/?[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$
The last part is already allowed 0, 1 or 2 times, so the ? after it is superfluous:
^(/?[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$
The first part may be simplified too, the ? just after the / is superfluous:
^(/[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$
If you don't use the captured groups, you can change them to non-capture groups, (?: ), which are slightly more efficient:
^(?:/[a-zA-Z])?[\w\s()'-]*(?:\s~[a-zA-Z]*){0,2}$
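You can also use the case-insensitive modifier (?i) in flavors that support it (JavaScript does not accept a leading (?i); use the i flag there instead):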
^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]*){0,2}$
Finally, as stated in the question, ~ must be followed by [a-zA-Z], so change the last * to +:
^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$
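A sketch of the final pattern in JavaScript, checked against the example strings from the question. It assumes the i flag in place of the inline (?i) and escapes the slash for the regex literal; the helper name isValidInput is hypothetical:

```javascript
// Final pattern from the answer, written as a JavaScript literal.
const inputPattern = /^(?:\/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$/i;

// Hypothetical helper: trims the input first, as described in the question.
function isValidInput(text) {
  return inputPattern.test(text.trim());
}

// Example strings taken from the question.
const shouldPass = [
  "/get D0cUm3nt ex4Mpl3' ~now",
  "D0cUm3nt ex4Mpl3'",
  "/post T(h)(i5 s(h)ou__ld w0rk t0-0'",
];
const shouldFail = ["//get document~now", "~later", "example ~now~later"];

for (const text of shouldPass) console.log(text, isValidInput(text)); // true
for (const text of shouldFail) console.log(text, isValidInput(text)); // false
```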
This looks slightly better:
^(?:/?[a-zA-Z]*\s)?[\w\s()'-]*(?:\s~[a-zA-Z]*)*$
https://codereview.stackexchange.com/ is more the place for this kind of thing
Assuming that capture groups are useful to you:
^((?:\/|\s~)[a-z]+)?([\w\s()'-]+)(~[a-z]+)?$
Maybe this is what you're looking for:
var regex = /^((\/)?[a-zA-Z]+)?[\w\s()'-]*((\s~)?[a-zA-Z]+){0,2}$/;
