How to exclude such pattern from regex matching?

I would like to compose a regular expression to highlight keywords.
The regex is kind of like
\btap\b.
For the sentence below, I expect it to match only the "tap" that is not wrapped in double quotation marks. In practice, it also matches the second "tap", the one inside the quotation marks.
tap click "tap"
How can I exclude the second tap word from being matched?

This seems to work fine:
var reg = new RegExp('\\b(tap(?!"))', 'ig');
('tap click "tap" tap.').match(reg); // ['tap', 'tap']
Rules:
The match starts at a word boundary.
The match is not followed by a double quote.
The match is case insensitive.

A word boundary \b matches at any position between a word character and a non-word character, so it also matches between the " and the t.
You can simulate your own word boundaries where to include only what you think is appropriate.
For example:
\s|^|\.|!|\?|$ - space or start of string, or dot, or exclamation mark, or question mark, or end of string
I would also suggest using negative lookbehinds and lookaheads, but JavaScript engines older than ES2018 don't support lookbehinds.
So you could use capturing groups instead and then read the group you need.
Sample regex: (?:\s|^|\.|!|\?)(tap)(\s|$|\.|!|\?)
Then, in JavaScript, use the first capturing group: match[1].
See this SO answer for details how to use capturing groups in JavaScript.
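As a rough sketch of the capturing-group approach above, the sample regex can be run with matchAll and only group 1 kept. The helper name findKeyword and the test string are assumptions made for illustration.

```javascript
// Sketch: custom "boundaries" around the keyword, with the keyword in group 1.
// findKeyword and the sample string are illustrative assumptions.
const keywordRe = /(?:\s|^|\.|!|\?)(tap)(\s|$|\.|!|\?)/gi;

function findKeyword(text) {
  // matchAll yields one match object per hit; m[1] is the keyword itself,
  // without the delimiter characters consumed on either side.
  return Array.from(text.matchAll(keywordRe), (m) => m[1]);
}

console.log(findKeyword('tap click "tap" tap.')); // [ 'tap', 'tap' ]
```

Note that the trailing group consumes its delimiter, so in a string like 'tap tap' the second word would not be found. Turning the trailing group into a lookahead is one possible way around that.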

Related

Javascript RegEx Templating Edge Case

I have a RegEx implemented in JavaScript that is close to doing what I want. However, I can't figure out the last piece, which is causing trouble with an edge case. Here is the RegEx that I have so far:
/\$\{(.+?(}\(.+?\)|}))/g
The idea is that this RegEx would use a templating system to replace/inject variables in a string based on templated variables. Here is an example of the edge case issue:
"Here is a template string ${G:SomeVar:G${G:SomeVar:G} that value gets injected in."
The problem is the RegEx is matching this:
"${G:SomeVar:G${G:SomeVar:G}"
What I want it to match is this:
"${G:SomeVar:G}"
How would I get the RegEx to match the expected variable in this edge case?

You have an alternation in your pattern to either stop at } or also match a following (...) after it.
As the dot can match any character, you can use a negated character class to exclude matching { } ( )
If you want to match ${G:SomeVar:G} but also ${G:SomeVar:G}(test) you can add an optional non capture group after it.
For a match only, you can omit the capture groups.
\$\{[^{}]*}(?:\([^()]*\))?
See a regex101 demo.
If the format of the string with the : and the same character before and after it should be matched, you can use a capture group with a backreference:
\$\{([A-Z]):[^{}]*?:\1}(?:\([^()]*\))?
See a regex101 demo.
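Here is a minimal sketch that runs both patterns from this answer against the string from the question. The variable names and the second test string are assumptions chosen for the example.

```javascript
// Sketch: negated character classes keep a match from running across "{" or "}".
const input = 'Here is a template string ${G:SomeVar:G${G:SomeVar:G} that value gets injected in.';

// Variant 1: anything except braces, plus an optional "(...)" suffix.
const simple = /\$\{[^{}]*}(?:\([^()]*\))?/g;

// Variant 2: the same letter must open and close the variable (backreference \1).
const strict = /\$\{([A-Z]):[^{}]*?:\1}(?:\([^()]*\))?/g;

console.log(input.match(simple)); // [ '${G:SomeVar:G}' ]
console.log(input.match(strict)); // [ '${G:SomeVar:G}' ]
console.log('x ${G:SomeVar:G}(test) y'.match(strict)); // [ '${G:SomeVar:G}(test)' ]
```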

Instead of matching anything with (.+?), change it so that it cannot match another opening brace or dollar sign: [^{$].
\$\{([^{$]+?(}\(.+?\)|}))

Can I make my regex split the punctuation marks from my special words?

I have the following string:
"By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}."
And I am using the following regex to split the words while considering {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}} as whole words.
\s+(?![^\[]*\])
My problem is that my current regex does not split off the full stop at the end of {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}. Ideally I would like my regex to split off full stops, exclamation marks and question marks. That being said, I'm not sure how I would differentiate between a full stop at the end of a word and a full stop that is part of the URL.

You can try a variation of the following regular expression:
\s+(?![^\[]*\])|(?=[\.?!](?![a-zA-Z0-9_%-]))
The new part is the alternative (?=[\.?!](?![a-zA-Z0-9_%-])) at the end. It is a positive lookahead for a period, question mark or exclamation mark, with a nested negative lookahead to make sure the punctuation is not followed by a URL-ish looking character. You may need to adjust that character class in brackets to contain the characters you want to consider part of the URL.
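A small sketch of how that split might behave on the string from the question. The variable names are assumptions, and the result depends on the character class you choose for URL characters.

```javascript
// Sketch: split on whitespace outside [...] and just before sentence punctuation.
const sentence = 'By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.';

// First alternative: whitespace that is not inside a [...] label.
// Second alternative: the empty position before . ? or ! when no URL-like character follows.
const splitter = /\s+(?![^\[]*\])|(?=[\.?!](?![a-zA-Z0-9_%-]))/;

const tokens = sentence.split(splitter);
console.log(tokens);
// The last two tokens are the Privacy Policy link and a separate "."
```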

Instead of .split you will be better off using .match here using this regex:
/\{\{#a}}.*?\{\{\/a}}/g
This matches {{#a}} followed by 0 or more of any character, followed by {{/a}}.
Alternatively, you may use this stricter regex:
\{\{#a}}\[[^\]]*]\([^)]*\)\{\{\/a}}
Here:
\[[^\]]*]: Matches [...] substring
\([^)]*\): Matches (...) substring
var string = "By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.";
console.log( string.match(/\{\{#a}}.*?\{\{\/a}}/g) );
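The stricter pattern can be tried the same way. This is only a sketch; the variable names are assumptions.

```javascript
// Sketch: the stricter pattern requires the [label](url) shape between the tags.
const text = "By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.";
const strictLink = /\{\{#a}}\[[^\]]*]\([^)]*\)\{\{\/a}}/g;

// match returns null when nothing matches, so fall back to an empty array.
const links = text.match(strictLink) || [];
console.log(links.length); // 2
console.log(links[1]); // {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}
```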

RegExp capturing non-match

I have a regex for a game that should match strings in the form of go [anything] or [cardinal direction], and capture either the [anything] or the [cardinal direction]. For example, the following would match:
go north
go foo
north
And the following would not match:
foo
go
I was able to do this using two separate regexes: /^(?:go (.+))$/ to match the first case, and /^(north|east|south|west)$/ to match the second case. I tried to combine the regexes to be /^(?:go (.+))|(north|east|south|west)$/. The regex matches all of my test cases correctly, but it doesn't correctly capture for the second case. I tried plugging the regex into RegExr and noticed that even though the first case wasn't being matched against, it was still being captured.
How can I correct this?

Try using the positive lookbehind feature to find the word "go".
(north|east|south|west|(?<=go ).+)$
Note that this solution prevents you from including ^ at the start of the regex, because the text "go" is not actually included in the group.

You have to move the closing parenthesis to the end of the pattern so that both alternatives sit between the anchors. Otherwise the second alternative is anchored only at the end, so any text may come before one of the cardinal directions and the direction at the end of the string would still be captured.
Then in the JavaScript you can check for the group 1 or group 2 value.
^(?:go (.+)|(north|east|south|west))$
Using a lookbehind assertion (if supported), you might also get a match only instead of capture groups.
In that case, you can match the rest of the line, asserting go to the left at the start of the string, or match only 1 of the cardinal directions:
(?<=^go ).+|^(?:north|east|south|west)$
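A short sketch of both variants from this answer, checked against the examples in the question. The function names are assumptions, and the second variant needs an engine with lookbehind support (ES2018 or later).

```javascript
// Variant 1: both alternatives between the anchors; read group 1 or group 2.
const anchored = /^(?:go (.+)|(north|east|south|west))$/;

function parseCommand(input) {
  const m = anchored.exec(input);
  if (m === null) return null;
  // Only one of the two groups participates in any given match.
  return m[1] !== undefined ? m[1] : m[2];
}

// Variant 2: lookbehind, so the whole match is already the wanted value.
const lookbehind = /(?<=^go ).+|^(?:north|east|south|west)$/;

function parseCommandLookbehind(input) {
  const m = lookbehind.exec(input);
  return m === null ? null : m[0];
}

console.log(parseCommand('go north')); // north
console.log(parseCommand('north'));    // north
console.log(parseCommand('foo'));      // null
```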

JS Regexp: \b for hyphenated words

In JavaScript regexp, what can I use in place of \b to get the same effect but on words that may be hyphenated?
(This question is directed at readers familiar with \b and with hyphenation, and so does not provide examples.)
UPDATE
Addison's (?<!-)\b(?!-) here is a partial solution for PCRE. It falls short on -500, by losing the boundary that \b delivers. It doesn't work on lookbehind-less JavaScript.

You can't create your own version of \b in regex flavors that don't support lookbehind, which includes JavaScript before ES2018. \b matches at a position. It needs to check the character (or lack thereof) before and after that position in order to determine whether the position should be matched. This requires both lookahead and lookbehind.
You can match hyphenated words (ASCII only) with this regex:
\b[a-zA-Z\-]+\b
This regex will allow hyphens before and after the word but does not include those in the match.
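A quick sketch of that behaviour on a made-up sample string (the string is an assumption chosen to include leading and trailing hyphens):

```javascript
// Sketch: hyphens inside a word are kept, hyphens at the edges are left out.
const hyphenated = /\b[a-zA-Z\-]+\b/g;
const words = 'a well-known -foo- trick'.match(hyphenated);
console.log(words); // [ 'a', 'well-known', 'foo', 'trick' ]
```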

I would consider using the \b expression, but modify it to be a little more fussy. Add a negative lookahead and lookbehind to it, so that it doesn't match beside a hyphen:
(?<!-)\b(?!-)
Note that this might cause problems with words such as -500, depending on what behaviour you want. You might want there to be a boundary before or after the hyphen (or not at all).
UPDATE
The regex gets much more complex, since there is not ordinarily a boundary before a hyphen, meaning one must be added.
(?<!-)\b(?!-)|\B(?=-\w)
The second condition adds a boundary wherever there is a non-word boundary followed by a hyphen and a word character. It's very explicit, but this is the only case where it happens.
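One way to see where this pattern matches is to insert a marker at every matched position. This is only a sketch: the sample string and the "|" marker are assumptions, and the lookbehind needs an engine with ES2018 support.

```javascript
// Sketch: mark every position the hyphen-aware boundary matches with "|".
const hyphenBoundary = /(?<!-)\b(?!-)|\B(?=-\w)/g;
const marked = 'well-known -500'.replace(hyphenBoundary, '|');
console.log(marked); // |well-known| |-500|
```

No marker appears around the hyphen in well-known, while -500 gets a boundary in front of its leading hyphen.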

Specific regex positive look(around|ahead|behind) in Javascript

I'm looking to match /(?=\W)(gimme)(?=\W)/gi or alike. The \W are supposed to be zero-width characters to surround my actual match.
Maybe some background. I want to replace certain words (always \w+) with some literal padding added, but only if the word is not surrounded by a \w. (That does sound like a negative lookaround, but I hear JS doesn't do those!?)
(Btw: the above "gimme" is the word literal I want to replace. If that wasn't obvious.)
It has to be (?) a lookaround, because the \W have to be zero-width, because the intention is a .replace(...) and I cannot replace/copy the surrounding characters.
So this won't work:
text.replace(/(?=\W)(gimme)(?=\W)/gi, function(l, match, r) {
  return l + doMagic(match) + r;
})
The zero-width chars have to be ignored, so the function can return (and replace) only doMagic(match).
I have only very limited lookaround experience and none of it in JS. Grazie.
PS. Or maybe I need a lookbehind and those aren't supported in JS..? I'm confused?
PS. A little bit of context: http://jsfiddle.net/rudiedirkx/kMs2N/show/ (ooh a link!)

You can use the word boundary shortcut \b to assert that you are matching the whole word.
The easiest way to achieve what you want to do is probably to match:
/(\s+gimme)(?=\W)/gi
and replace with [yourReplacement] - i.e. capture the whitespaces before 'gimme' and then include one in the replacement.
Another way to approach this would be capturing more characters before and after the gimme literal and then using the groups with backreference:
(\W+?)gimme(\W+?) - your match - note that this time the before and after characters are in capturing groups 1 and 2
And you'd want to use $1[yourReplacement]$2 as the replacement string. The idea is to tell the engine that $1 means whatever was matched by the first capturing parenthesis. JavaScript writes these references as $1 and $2 in replacement strings; some other languages use \1 and \2.

What you currently have will not work, for the following reason, (?=\W) means "the next character is not a word character", and the next thing you try to match is a literal g, so you have a contradiction ("next character is a g, but isn't a letter").
You do in fact need a lookbehind, but JavaScript only supports them from ES2018 onward.
Check out this article on Mimicking Lookbehind in JavaScript for a possible approach.
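As a rough sketch of the two options discussed here: capture the character in front of the word and put it back in the replacement, or, where lookbehind is available (ES2018 or later), use negative lookarounds. doMagic is the hypothetical helper from the question; its body, the function names and the sample string are assumptions.

```javascript
// Hypothetical stand-in for the asker's doMagic: wraps the word in brackets.
function doMagic(word) {
  return '[' + word + ']';
}

// Option 1: no lookbehind. Capture the preceding non-word character (or the
// start of the string) and re-emit it; look ahead for the trailing side.
function replaceWithCapture(text) {
  return text.replace(/(^|\W)(gimme)(?=\W|$)/gi, (whole, before, word) => before + doMagic(word));
}

// Option 2: negative lookbehind and lookahead, so only the word is matched.
function replaceWithLookbehind(text) {
  return text.replace(/(?<!\w)gimme(?!\w)/gi, (word) => doMagic(word));
}

const sample = 'gimme that, Gimme! notgimme gimmex';
console.log(replaceWithCapture(sample));    // [gimme] that, [Gimme]! notgimme gimmex
console.log(replaceWithLookbehind(sample)); // [gimme] that, [Gimme]! notgimme gimmex
```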

Have you considered using a lexer/parser combo?
This one is javascript based, and comes with a spiffy demonstration.
