Specific regex positive look(around|ahead|behind) in Javascript - javascript

I'm looking to match /(?=\W)(gimme)(?=\W)/gi or alike. The \W are supposed to be zero-width characters to surround my actual match.
Maybe some background. I want te replace certain words (always \w+) with some literal padding added, but only if it's not surrounded by a \w. (That does sound like a negative lookaround, but I hear JS doesn't do those!?)
(Btw: the above "gimme" is the word literal I want to replace. If that wasn't obvious.)
It has to be (?) a lookaround, because the \W have to be zero-width, because the intention is a .replace(...) and I cannot replace/copy the surrounding characters.
So this won't work:
text.replace(/(?=\W)(gimme)(?=\W)/gi, function(l, match, r) {
return l + doMagic(match) + r;
})
The zero-width chars have to be ignored, so the function can return (and replace) only doMagic(match).
I have only very limited lookaround experience and non of it in JS. Grazie.
PS. Or maybe I need a lookbehind and those aren't supported in JS..? I'm confused?
PS. A little bit of context: http://jsfiddle.net/rudiedirkx/kMs2N/show/ (ooh a link!)

you can use word boundary shortcut \b to assert that it's the whole word that you are matching.
The easiest way to achieve what you want to do is probably to match:
/(\s+gimme)(?=\W)/gi
and replace with [yourReplacement] - i.e. capture the whitespaces before 'gimme' and then include one in the replacement.
Another way to approach this would be capturing more characters before and after the gimme literal and then using the groups with backreference:
(\W+?)gimme(\W+?) - your match - note that this time the before and after characters are in the capturing groups 1 and 2
And you'd want to use \1[yourReplacement]\2 as replacement string - not sure how you use backreference in JS, but the idea is to tell the engine that with \1 you mean whatever was matched by the first captuing parenthesis. In some languages these are accessed with $1.

What you currently have will not work, for the following reason, (?=\W) means "the next character is not a word character", and the next thing you try to match is a literal g, so you have a contradiction ("next character is a g, but isn't a letter").
You do in fact need a lookbehind, but they are not supported by JavaScript.
Check out this article on Mimicking Lookbehind in JavaScript for a possible approach.

Have you considered using a lexer/parser combo?
This one is javascript based, and comes with a spiffy demonstration.

Related

Angular material input pattern add white space [duplicate]

How to rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match hyphen along with the existing characters ?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accomodate all that dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to apattern of [\p{L}\p{Nd}\p{Pd}!$*] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
[-a-z0-9]+,[a-z0-9-]+,[a-z-0-9]+ and also [a-z-0-9]+ all are same.The hyphen between two ranges considered as a symbol.And also [a-z0-9-+()]+ this regex allow hyphen.
use "\p{Pd}" without quotes to match any type of hyphen. The '-' character is just one type of hyphen which also happens to be a special character in Regex.
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");

JS Regexp: \b for hyphenated words

In JavaScript regexp, what can I use in place of \b to get the same effect but on words that may be hyphenated?
(This question is directed at readers familiar with \b and with hyphenation, and so does not provide examples.)
UPDATE
Addison's (?<!-)\b(?!-) here is a partial solution for PCRE. It falls short on -500, by losing the boundary that \b delivers. It doesn't work on lookbehind-less JavaScript.
You can't create your own version of \b in regex flavors like JavaScript that don't support lookbehind. \b matches at a position. It needs to check the character (or lack thereof) before and after that position in order to determine whether the position should be matched. This requires both lookahead and lookbehind.
You can match hyphenated words (ASCII only) with this regex:
\b[a-zA-Z\-]+\b
This regex will allow hyphens before and after the word but does not include those in the match.
I would consider using the \b expression, but modify it to be a little more fussy. Add a negative lookahead and lookbehind to it, so that it doesn't appear beside a hypen:
(?<!-)\b(?!-)
Try it on Regex 101
Note that this might cause problems with words such as -500, depending on what behaviour you want. You might want there to be a boundary before or after the hyphen (or not at all).
UPDATE
The regex gets much more complex, since there is not ordinarily a boundary before a hyphen, meaning one must be added.
(?<!-)\b(?!-)|\B(?=-\w)
The second condition adds a boundary wherever there is a non-word boundary followed by a hyphen and a word character. It's very explicit, but this is the only case it happens.

Regex format from PHP to Javascript

Can you please help me. How can I add this regex (?<=^|\s):d(?=$|\s) in javascript RegExp?
e.g
regex = new RegExp("?????" , 'g');
I want to replace the emoticon :d, but only if it is surrounded by spaces (or at an end of the string).
Firstly, as Some1.Kill.The.DJ mentioned, I recommend you use the literal syntax to create the regular expression:
var pattern = /yourPatternHere/g;
It's shorter, easier to read and you avoid complications with escape sequences.
The reason why the pattern does not work is that JavaScript does not support lookbehinds ((?<=...). So you have to find a workaround for that. You won't get around including that character in your pattern:
var pattern = /(?:^|\s):d(?!\S)/g;
Since there is no use in capturing anything in your pattern anyway (because :d is fixed) you are probably only interested in the position of the match. That means, when you find a match, you will have to check whether the first character is a space character (or is not :). If that is the case you have to increment the position by 1. If you know that your input string can never start with a space, you can simply increment any found position if it is not 0.
Note that I simplified your lookahead a bit. That is actually the beauty of lookarounds that you do not have to distinguish between end-of-string and a certain character type. Just use the negative lookahead, and assure that there is no non-space character ahead.
Just for future reference that means you could have simplified your initial pattern to:
(?<!\S):d(?!\S)
(If you were using a regex engine that supports lookbehinds.)
EDIT:
After your comment on the other answer, it's actually a lot easier to use the workaround. Just write back the captured space-character:
string = string.replace(/(^|\s):d(?!\S)/g, "$1emoticonCode");
Where $1 refers to what was matched with (^|\s). I.e. if the match was at the beginning of the string $1 will be empty, and if there was a space before :d, then $1 will contian that space character.
Javascript doesnt support lookbehind i.e(?<=)..
It supports lookahead
Better use
/(?:^|\s)(:d)(?=$|\s)/g
Group1 captures required match

Can I write a regex expression where one symbol matches twice?

I'm matching words with regex in javascript. The following expression uses whitespace to separate the potential matches:
/(\W)(foo)(\W)/g
This works most of the time, but it fails when there are two matches separated by a single space. (e.g. "foo foo") I think this is because the space that separates them is the last \W of the first match and the first of the second.
Is there any way to modify this expression to work in this edge case?
You can use \b instead of \W. It matches a zero-width word boundary (a boundary between a \w and a \W or the start/end of the string, while \W matches a character which may not exist at the start or end of a string.
Javascript regexes have lookahead, so you can probably do something like this:
/(\W)(foo)(?=\W)/g
I don't think lookbehinds are available, but there are other techniques that have the same effect.
Of course, this is functionally different, in that the lookahead doesn't capture, so it depends on the nature of your problem. The main point here it not that it doesn't capture, but that it doesn't match; thereby avoiding your problem.
Give this a try, I think it will work for you:
/(\W)(foo| )(\W)/g
This will tell the regex to match foo or whitespace between the two \Ws.

Replace Pipe and Comma with Regex in Javascript

I'm sitting here with "The Good Parts" in hand but I'm still none the wiser.
Can anyone knock up a regex for me that will allow me to replace any instances of "|" and "," from a string.
Also, could anyone point me in the direction of a really good resource for learning regular expressions, especially in javascript (are they a particular flavour??) It really is a weak point in my knowledge.
Cheers.
str.replace(/(\||,)/g, "replaceWith") don't forget the g at the end so it seaches the string globally, if you don't put it the regex will only replace the first instance of the characters.
What is saying is replace | (you need to escape this character) OR(|) ,
Nice Cheatsheet here
The best resource I have found if you really want to understand regular expressions (and the special caveats or quirks of any of a majority of the implementations/flavors) is Regular-Expressions.info.
If you really get into regular expressions, I would recommend the product called RegexBuddy for testing and debugging regular expressions in all sorts of languages (though there are a few things it does not quite support, it is rather good overall)
Edit:
The best way (I think, especially if you consider readability) is using a character class rather than alternation (i.e.: [] instead of |)
use:
var newString = str.replace(/[|,]/g, ";");
This will replace either a | or a , with a semicolon
The character class essentially means "match anything inside these square brackets" - with only a few exceptions.
First, you can specify ranges of characters ([a-zA-Z] means any letter from a to z or from A to Z).
Second, putting a caret (^) at the beginning of the character class negates it - it means anything not in this character class ([^0-9] means any character that is not from 0 to 9).
put the dash at the beginning and the caret at the end of the character class to match those characters literally, or escape them anywhere else in the class with a \ if you prefer

Categories

Resources