Regex pattern to detect anything like $$_$$ - javascript

I am trying to write a regex that works as follows. If I have something like
Hi $$var1$$ $$var2$$ how are you?
it should return two matches:
$$var1$$ -1st match
$$var2$$ -2nd match
Currently I have the regex /\$\$\w+\d*\W*\$\$/gi. This works fine if the two patterns are not next to each other, e.g.:
Hi $$var1$$ and $$var2$$ how are you?
It detects two matches there. But if the two patterns are next to each other, separated only by a space as in the first example, it detects $$var1$$ $$. Can someone help me fix this regex?

I don't know all your restrictions, but it might be easier and faster with a negated class:
\$\$[^$]+\$\$
If single dollar signs are allowed between the double ones, then you might use something like this to keep up the same speed:
\$\$(?:[^$]+|\$(?!\$))+\$\$
(?:[^$]+|\$(?!\$))+ matches either a non $ or one $ that is not followed by a second $ (though at that point, \$\$.+?\$\$ would actually be simpler should that be the case).
The reason your regex did not behave as expected is that \W* can also match dollar signs. \W* is greedy, so when only non-word characters (such as a space) separate the closing $$ from the next opening $$, it runs through all of them and the match ends at the next double dollar sign instead.
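To see the difference on the question's sample text, here is a minimal sketch comparing the original pattern with the two suggestions above. The variable names and the second sample string (with a lone $ inside) are made up for illustration.

```javascript
// Sample from the question, plus a made-up string with a single $ inside.
const adjacent = "Hi $$var1$$ $$var2$$ how are you?";
const innerDollar = "cost $$a$b$$ ok";

const original = /\$\$\w+\d*\W*\$\$/gi;          // greedy \W* from the question
const negated = /\$\$[^$]+\$\$/g;                 // no $ allowed between the pairs
const singlesOk = /\$\$(?:[^$]+|\$(?!\$))+\$\$/g; // a lone $ is allowed inside

console.log(adjacent.match(original));     // [ '$$var1$$ $$' ]
console.log(adjacent.match(negated));      // [ '$$var1$$', '$$var2$$' ]
console.log(innerDollar.match(negated));   // null
console.log(innerDollar.match(singlesOk)); // [ '$$a$b$$' ]
```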

\$\$\w+\d*?\W*?\$\$
Make \W* lazy (\W*?). The greedy version eats up the "$$ $$" between two adjacent variables, but in "$$ and $$" it stops at "and", which it cannot consume. See the demo:
https://regex101.com/r/tP7qE7/5
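A quick check of the lazy version against both strings from the question might look like this. The third string is an invented edge case: it shows that \W*? still accepts non-word characters before the closing $$.

```javascript
const lazy = /\$\$\w+\d*?\W*?\$\$/g;

// Adjacent variables, separated only by a space.
console.log("Hi $$var1$$ $$var2$$ how are you?".match(lazy));
// [ '$$var1$$', '$$var2$$' ]

// Variables separated by a word.
console.log("Hi $$var1$$ and $$var2$$ how are you?".match(lazy));
// [ '$$var1$$', '$$var2$$' ]

// Invented edge case: a trailing space inside the delimiters still matches.
console.log("$$var1 $$".match(lazy));
// [ '$$var1 $$' ]
```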

Related

RegExp Match all text parts except given words

I have a text and I need to match all text parts except given words with regexp
For example if text is ' Something went wrong and I could not do anything ' and given words are 'and' and 'not' then the result must be ['Something went wrong', 'I could', 'do anything']
Please don't advise me to use string.split(), string.replace() and so on. I know several ways to do this with built-in methods. I wonder whether there is a regex that can do this when I execute text.match(/regexp/g).
Please note that the regular expression must work at least in Chrome, Firefox and Safari, in versions no more than three behind the current one! At the time of asking, the current versions are 100.0, 98.0.2 and 15.3 respectively. For example, you cannot use the lookbehind feature in Safari.
Please, before answering my question, go to https://regexr.com/ and check your answer! Your regular expression should highlight all parts of the sentence except the given words, including the spaces between words inside each part but not the spaces around each part.
Before asking this question I did my own search, but these links didn't help me. I also tried the non-accepted answers:
Match everything except for specified strings
Regex: match everything but a specific pattern
Regex to match all words except a given list
Regex to match all words except a given list (2)
Need to find a regular expression for any word except word1 or word2
Matching all words except one
Javascript match eveything except given words
It's possible using only match and lookaheads in JavaScript.
/\b(?=\w)(?!(?:and|not)\b).*?(?=\s+(?:and|not)\b|\s*$)/gi
Test on RegExr here
Basically match the start of a word that's not a restricted word
\b(?=\w)(?!(?:and|not)\b)
Then a lazy match up to the whitespace before the next restricted word, or up to the end of the line, excluding trailing whitespace.
.*?(?=\s+(?:and|not)\b|\s*$)
Test Snippet :
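const re = /\b(?=\w)(?!(?:and|not)\b).*?(?=\s+(?:and|not)\b|\s*$)/gi;
const str = ` Something went wrong and I could not do anything `;
// match() with the g flag returns every matched substring
const arr = str.match(re);
console.log(arr); // ['Something went wrong', 'I could', 'do anything']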
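If the list of words is not fixed, the same pattern can be assembled at run time. The sketch below is one way to do that; the helper name matchExceptWords is hypothetical, and it assumes the words should be matched literally, so regex metacharacters in them are escaped.

```javascript
// Hypothetical helper: builds the pattern above from a list of words.
function matchExceptWords(text, words) {
  // Escape regex metacharacters so each word is matched literally.
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const alt = escaped.join("|");
  const re = new RegExp(
    `\\b(?=\\w)(?!(?:${alt})\\b).*?(?=\\s+(?:${alt})\\b|\\s*$)`,
    "gi"
  );
  // match() returns null when nothing matches; normalise that to [].
  return text.match(re) || [];
}

console.log(
  matchExceptWords(" Something went wrong and I could not do anything ", ["and", "not"])
);
// [ 'Something went wrong', 'I could', 'do anything' ]
```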
See Edit further down.
You can use this regex, which only use look ahead:
/(?!and|not)\b.*?(?=and|not|$)/g
Explanation:
(?!and|not) - negative look ahead for and or not
\b - match word boundary, to prevent matching nd and ot
.*? - match any char zero or more times, as few as possible
(?=and|not|$) - look ahead for and or not or end of text
If your text has multiple lines you can add the m flag (multiline). Alternatively you can replace dot (.) with [\s\S].
Edit:
I have changed it a little so spaces around the forbidden words are removed:
/(?!and|not)\b\w.*?(?= and| not|$)/g
I have added a \w character match to push the start of the match after the space and added spaces in the look ahead.
Edit2: (to handle multiple spaces around words):
You were very close! All you need is a \s* before the dollar sign and specified words:
/(?!and|not|\s)\b.*?(?=\s*(and|not|$))/g
Updated link: regexr.com
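One thing to watch with this variant: the word list has no \b after it, so "and" or "not" is also recognised at the start of a longer word. The sketch below compares it with the pattern from the earlier answer on an invented sentence containing "nothing".

```javascript
// Invented sample: "nothing" starts with the restricted word "not".
const text = "I did nothing and left.";

// Pattern from this answer: "not" is also found at the start of "nothing".
const loose = /(?!and|not|\s)\b.*?(?=\s*(and|not|$))/g;
// Pattern from the earlier answer, which puts \b after the word list.
const bounded = /\b(?=\w)(?!(?:and|not)\b).*?(?=\s+(?:and|not)\b|\s*$)/gi;

console.log(text.match(loose));   // [ 'I did', 'left.' ]
console.log(text.match(bounded)); // [ 'I did nothing', 'left.' ]
```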

how to negate a capture group?

Using a javascript regexp, I would like to find strings like "/foo" or "/foo d/" but not "/foo /"; ie, "annotation character", then either word with no terminating annotation, or multiple words, where the termination comes at the end of the phrase (with no space). Complicating the situation, there are three possible annotation symbols: /, \ and |.
I've tried something like:
/(?:^|\s)([\\\/|])((?:[\w_-]+(?![^\1]+[\w_-]\1))|(?:[\w\s]+[\w](?=\1)))/g
That is, start with space, then annotation, then
word not followed by (anything but annotation) then letter and annotation... or
possibly multiple words, immediately followed by annotation character.
The problem is the [^\1]: inside the square brackets, this doesn't read as "anything but the annotation character".
I could repeat the whole phrase three times, one for each annotation character. Any better ideas?
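As you've mentioned, [^\1] doesn't work. Inside a character class, \1 is not a backreference; it is read as an escape for the character with code 1, so the class matches anything except that character. In JavaScript, you can negate \1 by using a lookahead: (?:(?!\1).)* . This is not as efficient, but it works.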
Your pattern can be written as:
([\\\/|])([\w\-]+(?:(?:(?!\1).)*[\w\-]\1)?)
Working example at Regex101
\w already contains underscore.
Instead of alternation (a|ab) I'm using an optional group (a(?:b)?) - we always match the first word, with optional further words and tags.
You may still want to include (?:^|\s) at the beginning.
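As a rough illustration of the (?:(?!\1).)* trick, the sketch below runs the pattern on the strings from the question plus one invented string that mixes two annotation characters. Note that for "/foo /" the pattern still matches the leading "/foo" part.

```javascript
const annotated = /([\\\/|])([\w\-]+(?:(?:(?!\1).)*[\w\-]\1)?)/g;

// Collect the full text of every match in a sample string.
const allMatches = (s) => [...s.matchAll(annotated)].map((m) => m[0]);

console.log(allMatches("/foo"));         // [ '/foo' ]
console.log(allMatches("/foo d/"));      // [ '/foo d/' ]
// Invented sample: a "/" inside a "|" annotation does not end it,
// because (?!\1) only rejects the character captured in group 1.
console.log(allMatches("|foo d/ bar|")); // [ '|foo d/ bar|' ]
// The closing "/" follows a space, so only the first word is matched.
console.log(allMatches("/foo /"));       // [ '/foo' ]
```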

Regular expression - val.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, '');

I am learning regular expressions, and they seem very confusing to me for now.
val.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, '');
In the above expression
1) Which part says not to include white space? I am trying to exclude all non-alphanumeric characters.
2) Since I don't want to use even '$' and '_' (underscore), can I specify '$' and '_' (underscore) in the expression, something like below?
val.replace(/^[^a-zA-Z0-9$_]*|[^a-zA-Z0-9$_]*/g, '');
3) 'x|y' means "find any of the alternatives specified". Then why have we used something like [^a-zA-Z0-9]|[^a-zA-Z0-9], which is the same on both sides?
Please help me understand this; I find it a bit confusing and difficult.
This regular expression strips all leading and trailing non-alphanumeric characters from the string.
It doesn't specifically mention whitespace. It simply matches everything other than alphanumeric characters. Whatever is inside square brackets is a character set: [Whatever]. A leading caret (^) INSIDE the character set negates it. So [^a-zA-Z0-9]* means zero or more characters other than a-z, A-Z or 0-9.
The $ at the end anchors the match to the end of the string; it has nothing to do with the literal $ and _ symbols. Those are already covered by the character set, since it matches every non-alphanumeric character.
See smathy's answer.
Also, just FYI: AFAIU regular expressions can't be learned by scrolling through a tutorial. You need to go through the basics and try out the examples.
Some basic info.
When you read regular expressions, you read them from left to right. That's how the engine does it.
This is important in the case of alternations, as the alternatives on the left are always tried first.
But in the case of a $ (EOL or EOS) anchor, it might be easier to read from right to left.
Built-in assertions like the anchors ^ and $ and the word boundary \b, along with the lookahead assertions (?=) (?!) and the lookbehind assertions (?<=) (?<!), do not consume characters.
They are like single path in-line conditionals that pass or fail, where only if it passes will the expression to the right of it be examined. So they do actually Match something, they match a condition.
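A small sketch of what "do not consume characters" means in practice; the sample string is made up.

```javascript
// The lookahead checks for "bar" but leaves it out of the match.
const m = "foobar".match(/foo(?=bar)/);
console.log(m[0]);                  // 'foo'
console.log(m.index + m[0].length); // 3, so matching would resume at 'b'

// \b matches a position, not a character: nothing is replaced,
// the marker is only inserted at each word boundary.
console.log("foobar".replace(/\b/g, "|")); // '|foobar|'
```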
Format your regex so you can see what it's doing. (Use an app to help you, such as RegexFormat 5.)
^                  # BOS
[^a-zA-Z0-9]*      # zero or more non-alphanumeric chars
|                  # or
[^a-zA-Z0-9]*      # zero or more non-alphanumeric chars
$                  # EOS
Used with the global flag, your regex matches at the beginning and at the end of any string, even when there is nothing to strip there, because of the anchors and because nothing else is required to match.
So basically you should avoid combining the built-in anchors ^ $ \b with things that are all optional. That means your regex is better written as ^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$ since you don't care when there is nothing there (which is what the *, zero-or-more quantifier allows).
Good Luck, keep studying.
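To make the difference between the * and + versions concrete, here is a small sketch. The sample strings and variable names are invented; the last pattern is the + form with $ and _ added to the character class, as asked about in question 2.

```javascript
const trimStar = /^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g;
const trimPlus = /^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$/g;
// $ and _ inside the negated class are no longer stripped.
const keepDollarUnderscore = /^[^a-zA-Z0-9$_]+|[^a-zA-Z0-9$_]+$/g;

const val = "  $_hello world_$  ";

// Both quantifiers produce the same replacement result...
console.log(val.replace(trimStar, "")); // 'hello world'
console.log(val.replace(trimPlus, "")); // 'hello world'
console.log(val.replace(keepDollarUnderscore, "")); // '$_hello world_$'

// ...but * also "matches" when there is nothing to strip.
console.log("abc".match(trimStar)); // [ '', '' ]
console.log("abc".match(trimPlus)); // null
```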
To answer your third question: each alternative runs all the way to the enclosing slashes, so the two sides are not the same. In the original regex the left alternative is "all non-alphanumerics at the start of the string" and the right alternative is "all non-alphanumerics at the end of the string".

Specific regex positive look(around|ahead|behind) in Javascript

I'm looking to match /(?=\W)(gimme)(?=\W)/gi or alike. The \W are supposed to be zero-width characters to surround my actual match.
Maybe some background. I want to replace certain words (always \w+) with some literal padding added, but only if it's not surrounded by a \w. (That does sound like a negative lookaround, but I hear JS doesn't do those!?)
(Btw: the above "gimme" is the word literal I want to replace. If that wasn't obvious.)
It has to be (?) a lookaround, because the \W have to be zero-width, because the intention is a .replace(...) and I cannot replace/copy the surrounding characters.
So this won't work:
text.replace(/(?=\W)(gimme)(?=\W)/gi, function(l, match, r) {
return l + doMagic(match) + r;
})
The zero-width chars have to be ignored, so the function can return (and replace) only doMagic(match).
I have only very limited lookaround experience and none of it in JS. Grazie.
PS. Or maybe I need a lookbehind and those aren't supported in JS..? I'm confused?
PS. A little bit of context: http://jsfiddle.net/rudiedirkx/kMs2N/show/ (ooh a link!)
You can use the word boundary shortcut \b to assert that it's the whole word that you are matching.
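A minimal sketch of the \b approach with a replace callback. doMagic is the function named in the question; its body here is a placeholder, and the sample text is invented.

```javascript
// Placeholder for the question's doMagic(); here it just wraps the word.
function doMagic(word) {
  return "[" + word + "]";
}

const text = "gimme that, Gimme! gimmes";

// \b is zero-width, so only the word itself is passed in and replaced.
const result = text.replace(/\bgimme\b/gi, (word) => doMagic(word));

console.log(result); // '[gimme] that, [Gimme]! gimmes'
```

Because \b consumes nothing, the surrounding characters never need to be copied back into the replacement.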
The easiest way to achieve what you want to do is probably to match:
/(\s+gimme)(?=\W)/gi
and replace with [yourReplacement] - i.e. capture the whitespaces before 'gimme' and then include one in the replacement.
Another way to approach this would be capturing more characters before and after the gimme literal and then using the groups with backreference:
(\W+?)gimme(\W+?) - your match. Note that this time the characters before and after are in capturing groups 1 and 2.
You'd then use \1[yourReplacement]\2 as the replacement string. The idea is to tell the engine that \1 means whatever was matched by the first capturing parenthesis. Some languages write these as $1 instead; JavaScript replacement strings are among them, so there it would be $1[yourReplacement]$2.
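In JavaScript, group references in a replacement string are written $1 and $2. A short sketch with invented sample strings and "[X]" standing in for the replacement:

```javascript
const re = /(\W+?)gimme(\W+?)/g;

// The captured neighbours are written back around the replacement.
console.log("say gimme now".replace(re, "$1[X]$2")); // 'say [X] now'

// Limitation: at the start of the string there is no \W to capture,
// so the word is left untouched.
console.log("gimme now".replace(re, "$1[X]$2")); // 'gimme now'
```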
What you currently have will not work for the following reason: (?=\W) means "the next character is not a word character", and the next thing you try to match is a literal g, so you have a contradiction ("next character is a g, but isn't a letter").
You do in fact need a lookbehind, but JavaScript did not support them when this was written (they were added in ES2018, so older engines still lack them).
Check out this article on Mimicking Lookbehind in JavaScript for a possible approach.
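Engines that implement ES2018 or later accept lookbehind directly. The sketch below assumes such an engine for the first pattern, and shows one common way to emulate it for older engines by capturing the preceding character and writing it back. The sample text and the upper-casing stand in for the real replacement.

```javascript
const text = "gimme, gimmes and Gimme";

// Assumes ES2018 lookbehind support: not preceded or followed by \w.
const withLookbehind = text.replace(/(?<!\w)gimme(?!\w)/gi, (w) => w.toUpperCase());

// Emulation without lookbehind: capture what precedes and put it back.
const emulated = text.replace(
  /(^|\W)(gimme)(?!\w)/gi,
  (_, before, word) => before + word.toUpperCase()
);

console.log(withLookbehind); // 'GIMME, gimmes and GIMME'
console.log(emulated);       // 'GIMME, gimmes and GIMME'
```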
Have you considered using a lexer/parser combo?
This one is javascript based, and comes with a spiffy demonstration.

Can I write a regex expression where one symbol matches twice?

I'm matching words with regex in javascript. The following expression uses whitespace to separate the potential matches:
/(\W)(foo)(\W)/g
This works most of the time, but it fails when there are two matches separated by a single space. (e.g. "foo foo") I think this is because the space that separates them is the last \W of the first match and the first of the second.
Is there any way to modify this expression to work in this edge case?
You can use \b instead of \W. It matches a zero-width word boundary (a boundary between a \w and a \W, or the start/end of the string), while \W matches a character, which may not exist at the start or end of a string.
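A quick comparison of the two patterns on the edge case from the question, as a sketch:

```javascript
const withW = /(\W)(foo)(\W)/g;
const withB = /\bfoo\b/g;

// No characters around the words: \W has nothing to match at either end.
console.log("foo foo".match(withW)); // null
console.log("foo foo".match(withB)); // [ 'foo', 'foo' ]

// With padding, the shared space is consumed by the first match.
console.log(" foo foo ".match(withW)); // [ ' foo ' ]
console.log(" foo foo ".match(withB)); // [ 'foo', 'foo' ]
```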
Javascript regexes have lookahead, so you can probably do something like this:
/(\W)(foo)(?=\W)/g
Lookbehinds were not available in JavaScript before ES2018, but there are other techniques that have the same effect.
Of course, this is functionally different, in that the lookahead doesn't capture, so it depends on the nature of your problem. The main point here is not that it doesn't capture, but that it doesn't consume the character, thereby avoiding your problem.
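A short sketch of the lookahead version on invented sample strings. The trailing separator is left for the next match, but a leading \W is still required, so a word at the very start of the string is missed.

```javascript
const re = /(\W)(foo)(?=\W)/g;

// The space after the first "foo" is not consumed, so the second matches too.
console.log(" foo foo ".match(re));           // [ ' foo', ' foo' ]
console.log(" foo foo ".replace(re, "$1bar")); // ' bar bar '

// Without padding: the first "foo" has no \W before it, and the second
// has nothing after it for the lookahead to see.
console.log("foo foo".match(re)); // null
```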
Give this a try, I think it will work for you:
/(\W)(foo| )(\W)/g
This will tell the regex to match foo or whitespace between the two \Ws.
