I'm matching words with regex in javascript. The following expression uses whitespace to separate the potential matches:
/(\W)(foo)(\W)/g
This works most of the time, but it fails when two matches are separated by a single space (e.g. "foo foo"). I think this is because the space that separates them is both the last \W of the first match and the first \W of the second.
Is there any way to modify this expression to work in this edge case?
You can use \b instead of \W. It matches a zero-width word boundary (the position between a \w and a \W, or between a \w and the start/end of the string), while \W has to match an actual character, which may not exist at the start or end of a string.
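A quick sketch of the difference, using the "foo foo" input from the question (the variable names are only illustrative):

```javascript
// Original pattern: every match consumes one non-word character on each side,
// so the single space between the two words can only serve one match.
const withConsumedSeparators = " foo foo ".match(/(\W)(foo)(\W)/g);

// \b is zero-width, so nothing between the words is consumed.
const withBoundaries = "foo foo".match(/\bfoo\b/g);

console.log(withConsumedSeparators); // [" foo "]
console.log(withBoundaries);         // ["foo", "foo"]
```

Note that \b also works at the very start and end of the string, where \W would have no character to match.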
Javascript regexes have lookahead, so you can probably do something like this:
/(\W)(foo)(?=\W)/g
Lookbehinds were not available in JavaScript before ES2018, but there are other techniques that have the same effect.
Of course, this is functionally different, in that the lookahead doesn't capture, so it depends on the nature of your problem. The main point here is not that it doesn't capture, but that it doesn't consume the character, thereby avoiding your problem.
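As a rough illustration of the lookahead version (the padded input string is an assumption, chosen so that every word has a non-word character on both sides):

```javascript
const padded = " foo foo ";
const pattern = /(\W)(foo)(?=\W)/g;

// The trailing separator is only checked, not consumed, so it is still
// available as the leading \W of the next match.
const fullMatches = padded.match(pattern);

// Group 2 holds the word itself.
const words = Array.from(padded.matchAll(pattern), (m) => m[2]);

console.log(fullMatches); // [" foo", " foo"]
console.log(words);       // ["foo", "foo"]
```

The leading (\W) still has to match a real character, so a word at the very start of the string is not found by this pattern.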
Give this a try, I think it will work for you:
/(\W)(foo| )(\W)/g
This will tell the regex to match foo or whitespace between the two \Ws.
Related
In JavaScript regexp, what can I use in place of \b to get the same effect but on words that may be hyphenated?
(This question is directed at readers familiar with \b and with hyphenation, and so does not provide examples.)
UPDATE
Addison's (?<!-)\b(?!-) here is a partial solution for PCRE. It falls short on -500, by losing the boundary that \b delivers. It doesn't work on lookbehind-less JavaScript.
You can't create your own version of \b in regex flavors that don't support lookbehind, such as JavaScript before ES2018. \b matches at a position. It needs to check the character (or lack thereof) before and after that position in order to determine whether the position should be matched. This requires both lookahead and lookbehind.
You can match hyphenated words (ASCII only) with this regex:
\b[a-zA-Z\-]+\b
This regex will allow hyphens before and after the word but does not include those in the match.
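A small sketch of what that regex returns (the sample sentence is made up for illustration):

```javascript
const sample = "well-known -foo- bar";

// Inner hyphens are kept; leading and trailing hyphens are left out because
// \b has to sit next to a letter.
const hyphenatedWords = sample.match(/\b[a-zA-Z\-]+\b/g);

console.log(hyphenatedWords); // ["well-known", "foo", "bar"]
```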
I would consider using the \b expression, but modify it to be a little more fussy. Add a negative lookahead and lookbehind to it, so that it doesn't appear beside a hyphen:
(?<!-)\b(?!-)
Try it on Regex 101
Note that this might cause problems with words such as -500, depending on what behaviour you want. You might want there to be a boundary before or after the hyphen (or not at all).
UPDATE
The regex gets much more complex, since there is not ordinarily a boundary before a hyphen, meaning one must be added.
(?<!-)\b(?!-)|\B(?=-\w)
The second condition adds a boundary wherever there is a non-word boundary followed by a hyphen and a word character. It's very explicit, but this is the only case where a boundary has to be added.
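Here is a sketch of how the two boundary expressions behave. It assumes an engine with lookbehind support (ES2018 or later, e.g. a current Node.js), and the variable names are illustrative only:

```javascript
// Boundary that refuses to sit next to a hyphen.
const strictBoundary = "(?<!-)\\b(?!-)";
// Same, plus an extra boundary in front of a hyphen that starts a word.
const extendedBoundary = "(?:(?<!-)\\b(?!-)|\\B(?=-\\w))";

const fooStrict = new RegExp(strictBoundary + "foo" + strictBoundary);
console.log(fooStrict.test("a foo b")); // true
console.log(fooStrict.test("foo-bar")); // false, "foo" is followed by a hyphen

// The -500 case: the strict version loses the boundary between "-" and "5"...
console.log(new RegExp(strictBoundary + "500").test("-500"));    // false
// ...while the extended version puts a boundary before the hyphen instead.
console.log(new RegExp(extendedBoundary + "-500").test("-500")); // true
```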
I cannot figure out, for the life of me, why this regular expression
^\.(?=a)$
does not match
".a"
anyone know why?
I am going off the information provided here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
The reason it doesn't work is because the lookahead doesn't actually consume any characters, so your matching position doesn't advance.
^\.(?=a)$
Matches the beginning of line (^ -- this matches) followed by a literal . (\. -- this also matches), and then (without consuming any characters), checks to see if the next character is a literal a ((?=a)). It is, so the lookahead matches. It then asserts that your position is at the end of the string ($). This is not the case, because we're still right after the ., so the match fails.
Another possible matching expression would be
^\.(?=a$)
Which works just as above, but the assertion about the end of the line is contained in the lookahead, so this time, it matches.
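To see the difference in practice, here is a short sketch with the ".a" input from the question:

```javascript
const input = ".a";

// $ is tested right after the ".", where the string has not ended yet.
console.log(/^\.(?=a)$/.test(input)); // false

// With $ inside the lookahead, "a" and the end of the string are both checked
// without moving the match position.
console.log(/^\.(?=a$)/.test(input)); // true

// The lookahead is not part of the match: only the period is returned.
const matched = input.match(/^\.(?=a$)/)[0];
console.log(matched); // "."
```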
Your regex is only going to match a period that's followed by an 'a', without including 'a' in the match.
Another issue is that you're using $ after a character that's basically being ignored.
Remove the $ and it will work as described.
Bonus: I've enjoyed using this lately http://www.regexpal.com/
I am trying to write a regex that works like below if I have something like
Hi $$var1$$ $$var2$$ how are you?
it should return me two matches
$$var1$$ -1st match
$$var2$$ -2nd match
Currently I have the regex /\$\$\w+\d*\W*\$\$/gi, which works fine if the two patterns are not next to each other, e.g.:
Hi $$var1$$ and $$var2$$ how are you?
It detects two matches, but if the two matches are next to each other, separated only by a space as in the first example, it detects $$var1$$ $$. Can someone help me fix this regex?
I don't know all your restrictions, but it might be easier and faster with a negated class:
\$\$[^$]+\$\$
If single dollar signs are allowed between the double ones, then you might use something like this to keep up the same speed:
\$\$(?:[^$]+|\$(?!\$))+\$\$
(?:[^$]+|\$(?!\$))+ matches either a non $ or one $ that is not followed by a second $ (though at that point, \$\$.+?\$\$ would actually be simpler should that be the case).
The reason why your regex was not behaving as you expected was that \W* can potentially match any dollar signs. So if there was anything that does not match \w in between the last dollar signs of pattern you want to match, your regex would continue matching until the next double dollar signs.
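A minimal sketch of both patterns against the input from the question. The second input, with a single dollar sign inside a variable, is an invented example:

```javascript
const text = "Hi $$var1$$ $$var2$$ how are you?";

// Negated class: the body of a variable may not contain any "$".
const simple = text.match(/\$\$[^$]+\$\$/g);
console.log(simple); // ["$$var1$$", "$$var2$$"]

// Variant that allows a single "$" inside the body.
const withSingleDollar = "$$a$b$$ $$c$$".match(/\$\$(?:[^$]+|\$(?!\$))+\$\$/g);
console.log(withSingleDollar); // ["$$a$b$$", "$$c$$"]
```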
\$\$\w+\d*?\W*?\$\$
^^
Make your \W* non-greedy. The greedy version eats up "$$ $$" when two variables are adjacent, but not in "$$ and $$", because \W cannot consume "and". See the demo:
https://regex101.com/r/tP7qE7/5
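For comparison, a sketch that runs the greedy pattern from the question and the lazy one on the same input:

```javascript
const text = "Hi $$var1$$ $$var2$$ how are you?";

// Greedy \W* runs past the closing "$$" and into the next opening "$$".
const greedy = text.match(/\$\$\w+\d*\W*\$\$/gi);
console.log(greedy); // ["$$var1$$ $$"]

// Lazy quantifiers stop at the first closing "$$".
const lazy = text.match(/\$\$\w+\d*?\W*?\$\$/g);
console.log(lazy); // ["$$var1$$", "$$var2$$"]
```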
Can you please help me. How can I add this regex (?<=^|\s):d(?=$|\s) in javascript RegExp?
e.g
regex = new RegExp("?????" , 'g');
I want to replace the emoticon :d, but only if it is surrounded by spaces (or at an end of the string).
Firstly, as Some1.Kill.The.DJ mentioned, I recommend you use the literal syntax to create the regular expression:
var pattern = /yourPatternHere/g;
It's shorter, easier to read and you avoid complications with escape sequences.
The reason why the pattern does not work is that JavaScript did not support lookbehinds ((?<=...)) before ES2018. So you have to find a workaround for that. You won't get around including that character in your pattern:
var pattern = /(?:^|\s):d(?!\S)/g;
Since there is no use in capturing anything in your pattern anyway (because :d is fixed) you are probably only interested in the position of the match. That means, when you find a match, you will have to check whether the first character is a space character (or is not :). If that is the case you have to increment the position by 1. If you know that your input string can never start with a space, you can simply increment any found position if it is not 0.
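The position adjustment described above could look roughly like this. The function name emoticonPositions is made up for this sketch:

```javascript
// Returns the index of every ":d" that stands alone between whitespace
// or at the start/end of the string.
function emoticonPositions(text) {
  const pattern = /(?:^|\s):d(?!\S)/g;
  const positions = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    // If the match starts with the whitespace character, the emoticon
    // itself begins one position later.
    positions.push(m[0].startsWith(":") ? m.index : m.index + 1);
  }
  return positions;
}

console.log(emoticonPositions(":d and :d")); // [0, 7]
```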
Note that I simplified your lookahead a bit. That is the beauty of lookarounds: you do not have to distinguish between end-of-string and a certain character type. Just use the negative lookahead to ensure that there is no non-space character ahead.
Just for future reference that means you could have simplified your initial pattern to:
(?<!\S):d(?!\S)
(If you were using a regex engine that supports lookbehinds.)
EDIT:
After your comment on the other answer, it's actually a lot easier to use the workaround. Just write back the captured space-character:
string = string.replace(/(^|\s):d(?!\S)/g, "$1emoticonCode");
Where $1 refers to what was matched with (^|\s). I.e. if the match was at the beginning of the string, $1 will be empty, and if there was a space before :d, then $1 will contain that space character.
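A small sketch of that replacement in action. The function name and the "[smile]" text are placeholders, not anything from the question:

```javascript
// "[smile]" stands in for whatever emoticon code you actually insert.
function replaceEmoticon(text) {
  return text.replace(/(^|\s):d(?!\S)/g, "$1[smile]");
}

console.log(replaceEmoticon("hi :d there :d")); // "hi [smile] there [smile]"
console.log(replaceEmoticon(":d :d"));          // "[smile] [smile]"
console.log(replaceEmoticon("abc:d"));          // "abc:d" (unchanged)
```

Because the trailing check is a lookahead, the space after one :d is not consumed and can still serve as the leading space of the next one.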
JavaScript did not support lookbehind, i.e. (?<=...), before ES2018.
It does support lookahead.
Better use
/(?:^|\s)(:d)(?=$|\s)/g
Group 1 captures the required match.
I'm looking to match /(?=\W)(gimme)(?=\W)/gi or alike. The \W are supposed to be zero-width characters to surround my actual match.
Maybe some background. I want to replace certain words (always \w+) with some literal padding added, but only if the word is not surrounded by a \w. (That does sound like a negative lookaround, but I hear JS doesn't do those!?)
(Btw: the above "gimme" is the word literal I want to replace. If that wasn't obvious.)
It has to be (?) a lookaround, because the \W have to be zero-width, because the intention is a .replace(...) and I cannot replace/copy the surrounding characters.
So this won't work:
text.replace(/(?=\W)(gimme)(?=\W)/gi, function(l, match, r) {
    return l + doMagic(match) + r;
});
The zero-width chars have to be ignored, so the function can return (and replace) only doMagic(match).
I have only very limited lookaround experience and none of it in JS. Grazie.
PS. Or maybe I need a lookbehind and those aren't supported in JS..? I'm confused?
PS. A little bit of context: http://jsfiddle.net/rudiedirkx/kMs2N/show/ (ooh a link!)
You can use the word boundary shortcut \b to assert that you are matching the whole word.
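A minimal sketch of the \b approach. The doMagic below is only a stand-in for the asker's function, and the sample text is invented:

```javascript
// Hypothetical stand-in for the question's doMagic().
function doMagic(word) {
  return "[" + word + "]";
}

// \b is zero-width, so the callback receives only the word itself.
function padWord(text) {
  return text.replace(/\bgimme\b/gi, (word) => doMagic(word));
}

console.log(padWord("gimme gimmegimme Gimme!")); // "[gimme] gimmegimme [Gimme]!"
```

Keep in mind that \b treats digits and underscores as word characters, which matches the "not surrounded by a \w" requirement.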
The easiest way to achieve what you want to do is probably to match:
/(\s+gimme)(?=\W)/gi
and replace with [yourReplacement] - i.e. capture the whitespaces before 'gimme' and then include one in the replacement.
Another way to approach this would be capturing more characters before and after the gimme literal and then using the groups with backreference:
(\W+?)gimme(\W+?) - your match - note that this time the before and after characters are in the capturing groups 1 and 2
And you'd want to use \1[yourReplacement]\2 as the replacement string. The idea is to tell the engine that \1 means whatever was matched by the first capturing parenthesis. In JavaScript replacement strings these are written as $1 and $2.
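In JavaScript, that idea might be sketched as follows. The sample strings are made up, and "[yourReplacement]" is kept as the literal placeholder from the answer:

```javascript
const pattern = /(\W+?)gimme(\W+?)/gi;

// The surrounding characters are captured and written back via $1 and $2.
const single = "a gimme b".replace(pattern, "$1[yourReplacement]$2");
console.log(single); // "a [yourReplacement] b"

// Limitation: the separator after the first word is consumed, so an
// immediately following second word is skipped.
const adjacent = "x gimme gimme y".replace(pattern, "$1[yourReplacement]$2");
console.log(adjacent); // "x [yourReplacement] gimme y"
```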
What you currently have will not work, for the following reason: (?=\W) means "the next character is not a word character", and the next thing you try to match is a literal g, so you have a contradiction ("next character is a g, but isn't a letter").
You do in fact need a lookbehind, but JavaScript did not support them before ES2018.
Check out this article on Mimicking Lookbehind in JavaScript for a possible approach.
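One possible way to mimic the lookbehind, sketched under the assumption that capturing the preceding character and writing it back is acceptable. The doMagic stand-in and the function name are hypothetical, and this is not necessarily the technique from the linked article:

```javascript
// Hypothetical stand-in for the question's doMagic().
function doMagic(word) {
  return "<" + word + ">";
}

// The pattern from the question can never match: the same position would
// have to hold a non-word character and the letter "g".
console.log(/(?=\W)(gimme)(?=\W)/i.test(" gimme ")); // false

// Capture the character before the word (or the start of the string) and
// put it back; the character after the word is only checked by a lookahead.
function replaceWholeWord(text) {
  return text.replace(/(^|\W)(gimme)(?=\W|$)/gi,
    (whole, before, word) => before + doMagic(word));
}

console.log(replaceWholeWord("gimme gimme, ungimme")); // "<gimme> <gimme>, ungimme"
```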
Have you considered using a lexer/parser combo?
This one is javascript based, and comes with a spiffy demonstration.