Regex format from PHP to Javascript - javascript

Can you please help me. How can I add this regex (?<=^|\s):d(?=$|\s) in javascript RegExp?
e.g
regex = new RegExp("?????" , 'g');
I want to replace the emoticon :d, but only if it is surrounded by spaces (or at an end of the string).

Firstly, as Some1.Kill.The.DJ mentioned, I recommend you use the literal syntax to create the regular expression:
var pattern = /yourPatternHere/g;
It's shorter, easier to read and you avoid complications with escape sequences.
The reason why the pattern does not work is that JavaScript does not support lookbehinds ((?<=...). So you have to find a workaround for that. You won't get around including that character in your pattern:
var pattern = /(?:^|\s):d(?!\S)/g;
Since there is no use in capturing anything in your pattern anyway (because :d is fixed) you are probably only interested in the position of the match. That means, when you find a match, you will have to check whether the first character is a space character (or is not :). If that is the case you have to increment the position by 1. If you know that your input string can never start with a space, you can simply increment any found position if it is not 0.
Note that I simplified your lookahead a bit. That is actually the beauty of lookarounds that you do not have to distinguish between end-of-string and a certain character type. Just use the negative lookahead, and assure that there is no non-space character ahead.
Just for future reference that means you could have simplified your initial pattern to:
(?<!\S):d(?!\S)
(If you were using a regex engine that supports lookbehinds.)
EDIT:
After your comment on the other answer, it's actually a lot easier to use the workaround. Just write back the captured space-character:
string = string.replace(/(^|\s):d(?!\S)/g, "$1emoticonCode");
Where $1 refers to what was matched with (^|\s). I.e. if the match was at the beginning of the string $1 will be empty, and if there was a space before :d, then $1 will contian that space character.

Javascript doesnt support lookbehind i.e(?<=)..
It supports lookahead
Better use
/(?:^|\s)(:d)(?=$|\s)/g
Group1 captures required match

Related

How to replace my current regular expression without using negative lookbehind

I have the following regular expression which matches on all double quotes besides those that are escaped:
i.e:
The regular expression is as follows:
((?<![\\])")
How could I alter this to no longer use the negative lookbehind as it is not supported on some browsers?
Any help is greatly appreciated, thanks!
I wasn't able to get anything currently working
You can match
/\\"|(")/
and keep only captured matches. Being so simple, it should work with most every regex engine.
Demo
This matches what you don't want (\\")--to be discarded--and captures what you do want (")--to be kept.
This technique has been referred to by one regex expert as The Greatest Regex Trick Ever. To get to the punch line at the link search for "(at last!)".
Neither of these may be a completely satisfactory solution.
This regex won't just match unescaped ", there's additional logic required to check if the 1st character of captured groups is " and adjust the match position.:
(?:^|[^\\])(")
This may be a better choice, but it depends on positive lookahead - which may have the same issue as negative lookbehind.
Version 1a (again requires additional logic)
(?:^|\b)(?=[^\\])(")
Version 2a (depends on positive lookahead)
(?:^|\b|\\\\)(?=[^\\])(")
Assuming you need to also handle escaped slashes followed by escaped quotes (not in the question, but ok):
Version 1a (requires the additional logic):
(?:^|[^\\]|\\\\)(")
Building on this answer, I'd like to add that you may also want to ignore escaped backslashes, and match the closing quote in this string:
"ab\\"
In that case, /\\[\\"]|(")/g is what you're after.

Trying to exclude match when surrounded on both sides by a certain string

What I'm looking to do is to modify a regex (JS flavor) to not match if the pattern is both preceded and followed by the same string.
By way of a simple analogy, say I want to match all instances of n that are not both preceded and followed by e. So, for example, the regex should not match the n in alkene, but it should still match the n in pen or nest, which only have the e directly adjacent to n on one side, not both.
Most older threads I've seen trying to find an answer basically say "just use negative lookarounds", but the problem is that (?<!e)n(?!e) doesn't match any of those inputs - because the lookbehind and lookahead are processed by the regex engine separately, so it considers either condition to be sufficient to exclude the match.
(The real regex is (?<!¸ª)()(ɣʷ|h₂|r₂|r₃|w|j)(?:e|o|ø|ɑ|i|ɚ|y|u|a)(?!¸ª) and it's failing to match the ɣʷ in t͡ʃe:h₁dɣʷo¸ªh₂¸ª, but that makes the problem look a lot harder to explain than it needs to be)
How do you modify a regex to only exclude patterns when they're nested?
The (?<!b)a(?!b) pattern here must be replaced with (?<!b(?=ab))a or a(?!(?<=ba)b). The point is to call a reverse lookahead or lookbehind from lookbehind or lookahead.
See your pattern fix (without any optimizations) where I took the lookahead, pasted it inside lookbehind after ª, reversed the lookahead (i.e. made it positive) and added the whole pattern before ¸ª in the lookahead to be able to get to the right-hand ¸ª:
(?<!¸ª(?!(ɣʷ|h₂|r₂|r₃|w|j)(?:e|o|ø|ɑ|i|ɚ|y|u|a)¸ª))()(ɣʷ|h₂|r₂|r₃|w|j)(?:e|o|ø|ɑ|i|ɚ|y|u|a)
Or, if you put the lookbehind into lookahead:
()(ɣʷ|h₂|r₂|r₃|w|j)(?:e|o|ø|ɑ|i|ɚ|y|u|a)(?!(?<=¸ª(ɣʷ|h₂|r₂|r₃|w|j)(?:e|o|ø|ɑ|i|ɚ|y|u|a))¸ª)
See the regex demo (and regex demo #2).
Whenever your pattern is simple, it is best not to repeat the pattern in the lookarounds, you may usually just use . or .{x} where x stands for the number of chars your consuming pattern part can match. Here, it is not clear how many chars the pattern can actually match, you may probably use (?<!¸ª(?!.{1,2}¸ª))()(ɣʷ|h₂|r₂|r₃|w|j)(?:e|o|ø|ɑ|i|ɚ|y|u|a), but I do not have any edge cases to test against.
Enhancing this further may yield (?<!¸ª(?!.{1,2}¸ª))()(ɣʷ|[hr]₂|r₃|w|j)([eoøɑiɚyua]) (demo).

Javascript Regex: how to simulate "match without capture" behavior of positive lookbehind?

I have a relatively simple regex problem - I need to match specific words in a string, if they are entire words or a prefix. With word boundaries, it would look something like this:
\b(word1|word2|prefix1|prefix2)
However, I can't use the word boundary condition because some words may start with odd characters, e.g. .999
My solution was to look for whitespace or starting token for these odd cases.
(\b|^|\s)(word1|word2|prefix1|prefix2)
Now words like .999 will still get matched correctly, BUT it also captures the whitespace preceding the matched words/prefixes. For my purposes, I can't have it capture the whitespace.
Positive lookbehinds seem to solve this, but javascript doesn't support them. Is there some other way I can get the same behavior to solve this problem?
You can use a non-capturing group using (?:):
/(?:\b|^|\s)(word1|word2|prefix1|prefix2)/
UPDATE:
Based on what you want to replace it with (and #AlanMoore's good point about the \b), you probably want to go with this:
var regex = /(^|\s)(word1|word2|prefix1|prefix2)/g;
myString.replace(regex,"$1<span>$2</span>");
Note that I changed the first group back to a capturing one since it'll be part of the match but you want to keep it in the replacement string (right?). Also added the g modifier so that this happens for all occurrences in the string (assuming thats what you wanted).
Let's get the terminology straight first. A regex normally consumes everything it matches. When you do a replace(), everything that was consumed is overwritten. You can also capture parts of the matched text separately and plug them back in using $1, $2, etc.
When you were using the word boundary you didn't have to worry about this, because \b doesn't consume anything. But now you're consuming the leading whitespace character if there is one, so you have to plug it back in. I don't know what you're replacing the match with, so I'll just replace them with nothing for this demonstration.
result = subject.replace(/(^|\s)(word1|word2|prefix1|prefix2)/g, "$1");
Note that the \b isn't needed any more. In fact, you must remove it, or it will match things like .999 in xyz.999, because \b matches between z and .. I'm pretty sure you don't want that.

Solving regular expression recursive strings

The Problem
I could match this string
(xx)
using this regex
\([^()]*\)
But it wouldn't match
(x(xx)x)
So, this regex would
\([^()]*\([^()]*\)[^()]*\)
However, this would fail to match
(x(x(xx)x)x)
But again, this new regex would
[^()]*\([^()]*\([^()]*\)[^()]*\)[^()]*
This is where you can notice the replication, the entire regex pattern of the second regex after the first \( and before the last \) is copied and replaces the center most [^()]*. Of course, this last regex wouldn't match
(x(x(x(xx)x)x)x)
But, you could always copy replace the center most [^()]* with [^()]*\([^()]*\)[^()]* like we did for the last regex and it'll capture more (xx) groups. The more you add to the regex the more it can handle, but it will always be limited to how much you add.
So, how do you get around this limitation and capture a group of parenthesis (or any two characters for that matter) that can contain extra groups within it?
Falsely Assumed Solutions
I know you might think to just use
\(.*\)
But this will match all of
(xx)xx)
when it should only match the sub-string (xx).
Even this
\([^)]*\)
will not match pairs of parentheses that have pairs nested like
(xx(xx)xx)
From this, it'll only match up to (xx(xx).
Is it possible?
So is it possible to write a regex that can match groups of parentheses? Or is this something that must be handled by a routine?
Edit
The solution must work in the JavaScript implementation of Regular Expressions
If you want to match only if the round brackets are balanced you cannot do it by regex itself..
a better way would be to
1>match the string using \(.*\)
2>count the number of (,) and check if they are equal..if they are then you have the match
3>if they are not equal use \([^()]*\) to match the required string
Formally speaking, this isn't possible using regular expressions! Regular expressions define regular languages, and regular languages can't have balanced parenthesis.
However, it turns out that this is the sort of thing people need to do all the time, so lots of Regex engines have been extended to include more than formal regular expressions. Therefore, you can do balanced brackets with regular expressions in javascript. This article might help get you started: http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx . It's for .net, but the same applies for the standard javascript regex engine.
Personally though, I think it's best to solve a complex problem like this with your own function rather than leveraging the extended features of a Regex engine.

Specific regex positive look(around|ahead|behind) in Javascript

I'm looking to match /(?=\W)(gimme)(?=\W)/gi or alike. The \W are supposed to be zero-width characters to surround my actual match.
Maybe some background. I want te replace certain words (always \w+) with some literal padding added, but only if it's not surrounded by a \w. (That does sound like a negative lookaround, but I hear JS doesn't do those!?)
(Btw: the above "gimme" is the word literal I want to replace. If that wasn't obvious.)
It has to be (?) a lookaround, because the \W have to be zero-width, because the intention is a .replace(...) and I cannot replace/copy the surrounding characters.
So this won't work:
text.replace(/(?=\W)(gimme)(?=\W)/gi, function(l, match, r) {
return l + doMagic(match) + r;
})
The zero-width chars have to be ignored, so the function can return (and replace) only doMagic(match).
I have only very limited lookaround experience and non of it in JS. Grazie.
PS. Or maybe I need a lookbehind and those aren't supported in JS..? I'm confused?
PS. A little bit of context: http://jsfiddle.net/rudiedirkx/kMs2N/show/ (ooh a link!)
you can use word boundary shortcut \b to assert that it's the whole word that you are matching.
The easiest way to achieve what you want to do is probably to match:
/(\s+gimme)(?=\W)/gi
and replace with [yourReplacement] - i.e. capture the whitespaces before 'gimme' and then include one in the replacement.
Another way to approach this would be capturing more characters before and after the gimme literal and then using the groups with backreference:
(\W+?)gimme(\W+?) - your match - note that this time the before and after characters are in the capturing groups 1 and 2
And you'd want to use \1[yourReplacement]\2 as replacement string - not sure how you use backreference in JS, but the idea is to tell the engine that with \1 you mean whatever was matched by the first captuing parenthesis. In some languages these are accessed with $1.
What you currently have will not work, for the following reason, (?=\W) means "the next character is not a word character", and the next thing you try to match is a literal g, so you have a contradiction ("next character is a g, but isn't a letter").
You do in fact need a lookbehind, but they are not supported by JavaScript.
Check out this article on Mimicking Lookbehind in JavaScript for a possible approach.
Have you considered using a lexer/parser combo?
This one is javascript based, and comes with a spiffy demonstration.

Categories

Resources