Alternatives to (?<=exp) in Javascript? - javascript

I read some tutorials about regex and I saw a sentence:
(?<=exp): Match any position following a prefix exp
For example, I have some strings:
Share
Care
If I want to find every string that includes "are", but only where "are" follows "Sh", I can use /(?<=Sh)are/i. Now only "Share" is matched, and the match index is 2 (the match is "are", not the whole of "Share").
But JavaScript doesn't have this construct. How can I do something like that in JavaScript?
Thanks!

You can't do it directly. JavaScript's implementation of regular expressions had no lookbehind assertions until ECMAScript 2018 added them, so older engines still reject (?<=exp).
Alternatives
In some situations you can instead use a grouping to capture what you actually wanted to match: /Sh(are)/i
If you really need lookbehinds you could use a third-party regular expression library.
Related
JavaScript: Is there a regular expression library that fully supports lookarounds?
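The capture-group alternative mentioned above can be sketched roughly as follows. This is only an illustration: the function name findAfterPrefix is hypothetical, and it assumes you want both the text after the prefix and the index where that text starts, as a lookbehind would report them.

```javascript
// Emulate /(?<=Sh)are/i without lookbehind: match the prefix as well,
// then report only the second group and shift the index past the prefix.
// findAfterPrefix is a hypothetical helper name, not part of any library.
function findAfterPrefix(text) {
  const m = /(Sh)(are)/i.exec(text);
  if (m === null) return null;
  return { match: m[2], index: m.index + m[1].length };
}

console.log(findAfterPrefix("Share")); // { match: 'are', index: 2 }
console.log(findAfterPrefix("Care"));  // null
```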

The only way (and of course this only works if you don't also have a lookahead assertion in your regex) is to reverse the string and use a lookahead instead of lookbehind:
/era(?=hS)/i
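A minimal sketch of the reversal trick, assuming you also want the match position mapped back to the original string. The helper names reverse and findWithReversedLookahead are made up for this example.

```javascript
// Reverse a string by code points (hypothetical helper).
const reverse = (s) => [...s].reverse().join("");

// Run the lookahead on the reversed text, then translate the result back.
function findWithReversedLookahead(text) {
  const reversed = reverse(text);
  const m = /era(?=hS)/i.exec(reversed);
  if (m === null) return null;
  // A match ending at position e in the reversed text starts at
  // text.length - e in the original text.
  const index = text.length - (m.index + m[0].length);
  return { match: reverse(m[0]), index };
}

console.log(findWithReversedLookahead("Share")); // { match: 'are', index: 2 }
console.log(findWithReversedLookahead("Care"));  // null
```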

If I understood correctly, I would use this regexp:
/(Sh|\b)(are)/gi
where "are" is matched either at the start of a word or as a substring preceded by Sh.

You can use non-capturing groups:
/(?:sh)(are)/
This tells the regex to find "are" without capturing the sh group. However, for a pattern this simple the non-capturing group is unnecessary; as the other answers show, you can write
/sh(are)/
and then read only the first group.

Related

javascript regexp for word boundary detection with parenthesis

I have a string "I am a robot, I have been named 456/m(4). Forget the name (it does not mean anything)"
Now I would like to extract all words from this string.
For this I use the regular expression:
/\b[\w\S]+\b/g
It returns all the words in the string, except that one word comes back as "456/m(4" instead of "456/m(4)". I understand that this is because there is a word boundary between "4" and ")", but is there a way I could say that it is not a legal word boundary since there was no "legal" starting parenthesis?
I made it even better now. It does exactly what you want.
\b(?>\([\w\/]+\)|[\w\/])+
Regex101
If you want a version that's javascript friendly:
((?:(?=(\([\w\/]+\)|[\w\/]))\2)+)
Just use capture group #1 here.
Regex101
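As a rough check of the JavaScript-friendly pattern above, the sketch below runs it over the string from the question and collects capture group 1 from every match. The variable names are illustrative, and String.prototype.matchAll needs a reasonably recent engine.

```javascript
const text =
  "I am a robot, I have been named 456/m(4). Forget the name (it does not mean anything)";

// The lookahead captures one "unit" (a parenthesised chunk or a single
// word character or slash) into group 2, and \2 then consumes exactly
// that text, which imitates an atomic group.
const wordPattern = /((?:(?=(\([\w\/]+\)|[\w\/]))\2)+)/g;

// Keep only capture group 1 of each match.
const words = Array.from(text.matchAll(wordPattern), (m) => m[1]);

console.log(words);
```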

How to replace a substring with open parentheses (

I am a regex newbie, and I am trying to use a regex in JavaScript to replace a matching pattern in a string only when it is followed by an open parenthesis, (. For example, if I have the string
IN(INTERM_LEVEL_IN + (int)X_ID)
I would only like to highlight the first IN( in the string, not the two INs in INTERM_LEVEL_IN and not the int.
What is the Regex to accomplish this?
To match the opening bracket you just need to escape it: IN\(.
For instance, running this in Firebug console:
"IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
Will result in:
>>> "IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
"testINTERM_LEVEL_IN + (int)X_ID)"
Parentheses in regular expressions have a special meaning (sub-capture groups), so when you want them to be interpreted literally you have to escape them by putting a \ before them. The regular expression IN\( would match the string IN(.
The following should only match IN( at the beginning of a line:
/^IN\(/
The following would match IN( that is preceded by a character other than an alphanumeric character or underscore (note that the preceding character becomes part of the match):
/[^a-zA-Z0-9_]IN\(/
And finally, the following would match any instance of IN( no matter what precedes it:
/IN\(/
So, take your pick. If you're interested in learning more about regex, here's a good tutorial: http://www.regular-expressions.info/tutorial.html
You can use plain JavaScript regex support for this. A simple IN\( would work for the example you gave, but I suspect your situation is more complicated than that. In which case, you need to define exactly what you are trying to match and what you don't want to match.
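One possible way to highlight only a standalone IN( is sketched below. It uses the word-boundary assertion \b, which is a swapped-in variation and not one of the patterns given in the answers above; the function name highlightIn and the <b> markup are also assumptions made for the example.

```javascript
// Wrap every standalone "IN" that is directly followed by "(" in <b> tags.
// \b prevents matches inside longer identifiers such as MIN( or LEVEL_IN.
// highlightIn and the <b> markup are hypothetical choices for this sketch.
function highlightIn(source) {
  return source.replace(/\bIN\(/g, "<b>IN</b>(");
}

console.log(highlightIn("IN(INTERM_LEVEL_IN + (int)X_ID)"));
// <b>IN</b>(INTERM_LEVEL_IN + (int)X_ID)
```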

Solving regular expression recursive strings

The Problem
I could match this string
(xx)
using this regex
\([^()]*\)
But it wouldn't match
(x(xx)x)
So, this regex would
\([^()]*\([^()]*\)[^()]*\)
However, this would fail to match
(x(x(xx)x)x)
But again, this new regex would
\([^()]*\([^()]*\([^()]*\)[^()]*\)[^()]*\)
This is where you can notice the replication: the part of the second regex after the first \( and before the last \) is copied and replaces the centermost [^()]*. Of course, this last regex wouldn't match
(x(x(x(xx)x)x)x)
But you could always replace the centermost [^()]* with [^()]*\([^()]*\)[^()]* again, as we did for the last regex, and it would capture one more level of (xx) groups. The more you add to the regex the more it can handle, but it will always be limited by how much you add.
So, how do you get around this limitation and capture a group of parentheses (or any two delimiter characters, for that matter) that can contain further groups within it?
Falsely Assumed Solutions
I know you might think to just use
\(.*\)
But this will match all of
(xx)xx)
when it should only match the sub-string (xx).
Even this
\([^)]*\)
will not match pairs of parentheses that have pairs nested like
(xx(xx)xx)
From this, it'll only match up to (xx(xx).
Is it possible?
So is it possible to write a regex that can match groups of parentheses? Or is this something that must be handled by a routine?
Edit
The solution must work in the JavaScript implementation of Regular Expressions
If you want to match only when the round brackets are balanced, you cannot do it with a regex alone.
A better way would be to:
1. Match the string using \(.*\)
2. Count the number of ( and ) characters in the match and check whether they are equal. If they are, you have the match.
3. If they are not equal, use \([^()]*\) to match the required string.
Formally speaking, this isn't possible using regular expressions! Regular expressions define regular languages, and regular languages can't have balanced parenthesis.
However, it turns out that this is the sort of thing people need to do all the time, so lots of regex engines have been extended to include more than formal regular expressions, and some of them can match balanced brackets. This article might help get you started: http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx . It's for .NET, which supports balancing groups; the standard JavaScript regex engine has neither balancing groups nor recursion, so the technique does not carry over directly.
Personally though, I think it's best to solve a complex problem like this with your own function rather than leveraging the extended features of a Regex engine.
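A hand-written routine of the kind suggested here could look roughly like the sketch below. It is only one possible approach: the name findBalancedGroups and its parameters are hypothetical, and it reports top-level groups only.

```javascript
// Return every top-level balanced group delimited by `open` and `close`.
// A depth counter does the work a regular expression cannot do.
// Note: if an opening delimiter is never closed, nothing after it is reported.
function findBalancedGroups(text, open = "(", close = ")") {
  const groups = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === open) {
      if (depth === 0) start = i; // remember where the outermost group begins
      depth++;
    } else if (ch === close && depth > 0) {
      depth--;
      if (depth === 0) groups.push(text.slice(start, i + 1));
    }
    // A stray closing delimiter at depth 0 is simply ignored.
  }
  return groups;
}

console.log(findBalancedGroups("(xx)xx)"));            // [ '(xx)' ]
console.log(findBalancedGroups("(x(x(x(xx)x)x)x)"));   // [ '(x(x(x(xx)x)x)x)' ]
console.log(findBalancedGroups("a (xx(xx)xx) b (y)")); // [ '(xx(xx)xx)', '(y)' ]
```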

Regular expression: Get the matched value of each match in a Kleene-star expression?

In particular, is this possible with Javascript?
>> "Version 1.2.3.4".match(/\S+ (\d+)(\.\d+)*/)
["Version 1.2.3.4", "1", ".4"]
It's obvious $2 gets set to the last Kleene-"match". Is there no built-in method to retrieve the rest (".2", ".3")?
If this cannot be done easily in JS, could Perl do it?
UPDATE: Many of the answers so far have been "workarounds" which work because of the simplicity of my example. If the part that repeated that I wanted to match was more than just a number, they wouldn't work.
However, a very valid solution does exist: use /expr/g global regex matching: just filter out the parts that repeat and use that. I find this to be somewhat less flexible than the more generally applicable * operator but it will obviously get the job done in most cases.
Regex in JavaScript, like most other regex flavors, only captures the last value of the capturing group if it is matched repeatedly. The only well known regex lib (that I know of) where you get access to all of the previous matched captures is the one in .NET.
So no, you can't do this in JS.
In Perl there are a couple of ways you can accomplish such things. One of the more elegant is probably to use \G (which works in PCRE too).
For example:
"Version 1.2.3.4" =~ /(?:\S+ |\G(?!^)\.)(\d+)/g
Returns (in list context):
(1, 2, 3, 4)
Why not match the whole version string, then split by .?
>> "Version 1.2.3.4".match(/\S+ (\d+(?:\.\d+)*)/)[1].split('.')
Just capture the whole version number string and then split on the period character.
Regex for matching the whole number: /((?:\d+)(?:\.\d+)*)/
Then simply call split on the resulting capture.
The regex \.?\d+ returns what you need, but you have to run it globally so that it returns all matches, not just the first one:
var n=str.match(/\.?\d+/g);
If you want to match just numbers without leading dot, then go with regex \d+.
var n=str.match(/\d+/g);
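The two JavaScript workarounds above can be combined into a small sketch: capture the whole dotted number once, then either split it or run a global match over it. Variable names are illustrative, and the approach assumes the repeated part can be matched on its own, which is the limitation the question's update points out.

```javascript
const input = "Version 1.2.3.4";

// Step 1: capture the complete version number in a single group.
const whole = input.match(/\S+ (\d+(?:\.\d+)*)/);
const version = whole ? whole[1] : "";

// Step 2a: split on the separator to get each component.
const parts = version.split(".");

// Step 2b: or collect every repetition of the starred sub-pattern.
const repeats = version.match(/\.\d+/g) || [];

console.log(parts);   // [ '1', '2', '3', '4' ]
console.log(repeats); // [ '.2', '.3', '.4' ]
```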

Match altered version of first match with only one expression?

I'm writing a brush for Alex Gorbatchev's Syntax Highlighter to get highlighting for Smalltalk code. Now, consider the following Smalltalk code:
aCollection do: [ :each | each shout ]
I want to find the block argument ":each" and then match "each" every time it occurs afterwards (for simplicity, let's say every occurrence and not just those inside the brackets).
Note that the argument can have any name, e.g. ":myArg".
My attempt to match ":each":
\:([\d\w]+)
This seems to work. The problem is for me to match the occurrences of "each". I thought something like this could work:
\:([\d\w]+)|\1
But the right hand side of the alternation seems to be treated as an independent expression, so backreferencing doesn't work.
Is it even possible to accomplish what I want in a single expression? Or would I have to use the backreference within a second expression (via another function call)?
You could do it in languages that support variable-length lookbehind (AFAIK only the .NET framework languages do, Perl 6 might). There you could highlight a word if it matches (?<=:(\w+)\b.*)\1. But JavaScript engines that predate ECMAScript 2018 don't support lookbehind at all.
But anyway this regex would be very inefficient (I just checked a simple example in RegexBuddy, and the regex engine needs over 60 steps for nearly every character in the document to decide between match and non-match), so this is not a good idea if you want to use it for code highlighting.
I'd recommend you use the two-step approach you mentioned: First match :(\w+)\b (word boundary inserted for safety, \d is implied in \w), then do a literal search for match result \1.
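The two-step approach might be sketched like this in JavaScript. The function name findArgumentUses is hypothetical, and the sketch handles only the first block argument it finds, following the simplification in the question.

```javascript
// Step 1: find the block argument declaration (":name").
// Step 2: search the rest of the code for that name as a whole word.
// findArgumentUses is a hypothetical helper, not part of SyntaxHighlighter.
function findArgumentUses(code) {
  const decl = /:(\w+)\b/.exec(code);
  if (decl === null) return { name: null, indices: [] };

  const name = decl[1];
  // \w+ only yields letters, digits and underscores, so the name needs
  // no escaping before it is embedded in a new pattern.
  const usePattern = new RegExp("\\b" + name + "\\b", "g");
  usePattern.lastIndex = decl.index + decl[0].length; // start after the declaration

  const indices = [];
  let m;
  while ((m = usePattern.exec(code)) !== null) {
    indices.push(m.index);
  }
  return { name, indices };
}

const smalltalk = "aCollection do: [ :each | each shout ]";
console.log(findArgumentUses(smalltalk));
```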
I believe the only thing stored by the Regex engine between matches is the position of the last match. Therefore, when looking for the next match, you cannot use a backreference to the match before.
So, no, I do not think that this is possible.
