JavaScript regex - positive lookahead -- giving me syntax errors - javascript

This piece of regex (?<=href\=")[^]+?(?=#_) is supposed to match everything in an href value except the hash value and what follows it within the href URL.
It appears to work fine under regex debuggers/testers such as http://gskinner.com/RegExr/
but in JavaScript it produces a syntax error. If I remove the < from the (?<=) it works; however, that's not the positive lookahead I am looking for.
I am pulling my hair off, as usual, thanks to Regex lol
Please help

(?<=...) and (?<!...) are lookbehinds, not lookaheads. Lookbehinds are not supported by Javascript's regular expression engine.
Resources:
http://www.regular-expressions.info/lookaround.html#lookbehind
http://www.regular-expressions.info/javascript.html

Lookbehinds are not supported, as already mentioned.
I don't know which function you use for your regex, but if you use match(), just add a capture group:
href="(.+?)(?=#)
Which gives you, e.g.:
var str = '<a href="foo#_bar">'; // sample input; the original markup was lost from the post
var matches = str.match(/href="(.+?)(?=#)/);
// matches[0] = href="foo
// matches[1] = foo // <-- this is what you want
Additional information:
[^] means: match any character that is not in this character class. But the class is empty, so it matches any character at all, which is almost what the dot . does (the dot does not match line breaks).
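A minimal sketch of the capture-group approach from the answer above. The helper name hrefBeforeHash and the sample markup in the usage note are made up for illustration; the pattern itself is the one from the answer.

```javascript
// Hypothetical helper: returns the part of the first href value that
// precedes a "#", or null when nothing matches.
function hrefBeforeHash(html) {
  // Group 1 holds the text after href=" up to (not including) the first "#".
  const match = html.match(/href="(.+?)(?=#)/);
  return match === null ? null : match[1];
}
```

For example, hrefBeforeHash('<a href="page.html#_top">') gives "page.html". Note that .+? will run past the closing quote if the href itself has no "#" but one appears later in the input, so real markup may need a stricter class such as [^"#]+ instead.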

The reason your regex works in RegExr is that RegExr is a Flash application written in ActionScript, not JavaScript. Although AS advertises itself as being compliant with the ECMAScript standard, the same as JS, it's actually much more powerful. Its regex engine is powered by the PCRE library, so it has the same capabilities as PHP.
To get a more accurate picture of what JavaScript regexes can do, use a tester that's actually powered by JavaScript, like this one.

Related

How to replace my current regular expression without using negative lookbehind

I have the following regular expression which matches on all double quotes besides those that are escaped:
i.e:
The regular expression is as follows:
((?<![\\])")
How could I alter this to no longer use the negative lookbehind as it is not supported on some browsers?
Any help is greatly appreciated, thanks!
I wasn't able to get anything currently working
You can match
/\\"|(")/
and keep only captured matches. Being so simple, it should work with most every regex engine.
This matches what you don't want (\\")--to be discarded--and captures what you do want (")--to be kept.
This technique has been referred to by one regex expert as The Greatest Regex Trick Ever. To get to the punch line at the link search for "(at last!)".
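Here is a rough sketch of how "keep only captured matches" could look in practice. The function name findUnescapedQuotes is an assumption for illustration, not something from the answer.

```javascript
// Hypothetical helper: returns the indexes of all double quotes that are
// not preceded by a backslash.
function findUnescapedQuotes(text) {
  const pattern = /\\"|(")/g;
  const positions = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    // Alternative 1 (\") matched: group 1 is undefined, so discard it.
    // Alternative 2 (") matched: group 1 is set, so keep its position.
    if (match[1] !== undefined) {
      positions.push(match.index);
    }
  }
  return positions;
}
```

With the input say \"hi\" to "bob" (written in JavaScript as 'say \\"hi\\" to "bob"'), this returns [14, 18], the positions of the two quotes around bob.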
Neither of these may be a completely satisfactory solution.
This regex won't match just the unescaped "; additional logic is required to check whether the first character of the match is " and, if it is not, to adjust the match position:
(?:^|[^\\])(")
This may be a better choice, but it depends on positive lookahead - which may have the same issue as negative lookbehind.
Version 1a (again requires additional logic)
(?:^|\b)(?=[^\\])(")
Version 2a (depends on positive lookahead)
(?:^|\b|\\\\)(?=[^\\])(")
Assuming you need to also handle escaped slashes followed by escaped quotes (not in the question, but ok):
Version 1a (requires the additional logic):
(?:^|[^\\]|\\\\)(")
Building on this answer, I'd like to add that you may also want to ignore escaped backslashes, and match the closing quote in this string:
"ab\\"
In that case, /\\[\\"]|(")/g is what you're after.
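To see the difference on the "ab\\" example, one could compare the two patterns side by side. This is only a sketch; capturedIndexes is a made-up helper name.

```javascript
// Hypothetical helper: runs a global pattern over the text and returns the
// indexes of the matches where capture group 1 participated.
function capturedIndexes(pattern, text) {
  return [...text.matchAll(pattern)]
    .filter((match) => match[1] !== undefined)
    .map((match) => match.index);
}

// The six characters  " a b \ \ "  written as a JavaScript string literal.
const sample = '"ab\\\\"';

// The simple pattern treats the second backslash as escaping the closing quote.
const simple = capturedIndexes(/\\"|(")/g, sample);

// The extended pattern consumes the escaped backslash first, so the closing
// quote is captured as well.
const extended = capturedIndexes(/\\[\\"]|(")/g, sample);
```

Here simple is [0] and extended is [0, 5].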

Looking for alternative to javascript lookbehind for phone number regex pattern

I have a regex pattern to check for input phone number. Regex pattern is:
(#"((?:\(?[2-9](?(?=1)1[02-9]|(?(?=0)0[1-9]|\d{2}))\)?\D{0,3})(?:\(?[2-9](?(?=1)1[02-9]|\d{2})\)?\D{0,3})\d{4})"
This works fine for Server side validation and fails for client-side. I get the Invalid group error.
I am fairly new to regex and by digging around I found out that it is because JS doesn't support lookbehind.
I tried to apply the - inversing the string technique but the pattern is too complicated.
Could someone please help.
Thanks in advance.
All your conditional constructs need to be replaced with a non-capturing group that contains a negative lookahead at the start. In general, it looks like
(?(?=0)01|\d{2}) = (?:(?=0)01|(?!0)\d{2})
That is, you convert a conditional group into a non-capturing group, and add restrictions to each alternative in the group. (?:(?=0)01|(?!0)\d{2}) matches 01 if the next char is 0, else, if the next char is not 0, match any two digits (but not if they start with 0 of course).
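A quick way to check the converted form of that small example in JavaScript. The anchors ^ and $ and the name matchesConverted are added here only so that whole strings can be tested.

```javascript
// The conditional (?(?=0)01|\d{2}) rewritten without conditionals,
// anchored so that it must match the whole input.
const converted = /^(?:(?=0)01|(?!0)\d{2})$/;

// Hypothetical helper for trying out inputs.
function matchesConverted(input) {
  return converted.test(input);
}
```

matchesConverted("01") and matchesConverted("12") are true, while matchesConverted("02") is false: the input starts with 0, so only the 01 alternative is allowed to match.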
So, in your concrete case, change
(?(?=1)1[02-9]|(?(?=0)0[1-9]|\d{2})) -> (?:(?=1)1[02-9]|(?!1)(?:(?=0)0[1-9]|(?!0)\d{2}))
(?(?=1)1[02-9]|\d{2}) -> (?:(?=1)1[02-9]|(?!1)\d{2})
Note the (?!1) in front of the nested group: without it, an input such as 11 would fall through to \d{2} after 1[02-9] fails, which the original conditional never allows.
The JavaScript equivalent for the PCRE pattern is
((?:\(?[2-9](?:(?=1)1[02-9]|(?!1)(?:(?=0)0[1-9]|(?!0)\d{2}))\)?\D{0,3})(?:\(?[2-9](?:(?=1)1[02-9]|(?!1)\d{2})\)?\D{0,3})\d{4})
See the regex demo.
However, some of the groupings are redundant, so you may shorten it to
\(?[2-9](?:(?=1)1[02-9]|(?!1)(?:(?=0)0[1-9]|(?!0)\d{2}))\)?\D{0,3}\(?[2-9](?:(?=1)1[02-9]|(?!1)\d{2})\)?\D{0,3}\d{4}

Regex format from PHP to Javascript

Can you please help me. How can I add this regex (?<=^|\s):d(?=$|\s) in javascript RegExp?
e.g
regex = new RegExp("?????" , 'g');
I want to replace the emoticon :d, but only if it is surrounded by spaces (or at an end of the string).
Firstly, as Some1.Kill.The.DJ mentioned, I recommend you use the literal syntax to create the regular expression:
var pattern = /yourPatternHere/g;
It's shorter, easier to read and you avoid complications with escape sequences.
The reason why the pattern does not work is that JavaScript does not support lookbehinds such as (?<=...). So you have to find a workaround for that. You won't get around including the preceding character in your pattern:
var pattern = /(?:^|\s):d(?!\S)/g;
Since there is no use in capturing anything in your pattern anyway (because :d is fixed) you are probably only interested in the position of the match. That means, when you find a match, you will have to check whether the first character is a space character (or is not :). If that is the case you have to increment the position by 1. If you know that your input string can never start with a space, you can simply increment any found position if it is not 0.
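The position adjustment described above might look roughly like this. The name emoticonPositions is made up for the sketch.

```javascript
// Hypothetical helper: returns the index of every ":d" that is surrounded
// by whitespace or by the start/end of the string.
function emoticonPositions(text) {
  const pattern = /(?:^|\s):d(?!\S)/g;
  const positions = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    // If the match starts with ":" it began at the start of the string.
    // Otherwise its first character is the whitespace, so skip over it.
    const offset = match[0][0] === ":" ? 0 : 1;
    positions.push(match.index + offset);
  }
  return positions;
}
```

For ":d hi :d" this returns [0, 6].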
Note that I simplified your lookahead a bit. That is actually the beauty of lookarounds that you do not have to distinguish between end-of-string and a certain character type. Just use the negative lookahead, and assure that there is no non-space character ahead.
Just for future reference that means you could have simplified your initial pattern to:
(?<!\S):d(?!\S)
(If you were using a regex engine that supports lookbehinds.)
EDIT:
After your comment on the other answer, it's actually a lot easier to use the workaround. Just write back the captured space-character:
string = string.replace(/(^|\s):d(?!\S)/g, "$1emoticonCode");
Where $1 refers to what was matched with (^|\s). I.e. if the match was at the beginning of the string, $1 will be empty, and if there was a space before :d, then $1 will contain that space character.
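A small sketch of that replacement wrapped in a function. The name replaceEmoticon and the "[grin]" text in the usage note are placeholders. A callback is used here instead of the "$1..." string so that a replacement containing $ is inserted literally; that is a design choice of this sketch, not part of the answer.

```javascript
// Hypothetical helper: replaces every stand-alone ":d" and writes back the
// whitespace (or empty string) that was captured in front of it.
function replaceEmoticon(text, replacement) {
  return text.replace(/(^|\s):d(?!\S)/g, (match, prefix) => prefix + replacement);
}
```

replaceEmoticon(":d hi :d", "[grin]") gives "[grin] hi [grin]", while "a:d b" and ":d:d" are left unchanged.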
JavaScript doesn't support lookbehind, i.e. (?<=...).
It does support lookahead.
Better to use
/(?:^|\s)(:d)(?=$|\s)/g
Group 1 captures the required match.

Alternatives to (?<=exp) in Javascript?

I read some tutorials about regex and I saw a sentence:
(?<=exp): Match any position following a prefix exp
For example, I have some strings:
Share
Care
If I want to find all strings that include "are", but only where "are" follows "Sh": /(?<=Sh)are/i. Now only "Share" is matched, and the matched index is 2 (it matches "are", not "Share", in "Share").
But JavaScript doesn't have this syntax. How can I do the same in JavaScript?
Thanks!
You can't do it. There are no lookbehind assertions in Javascript's implementation of regular expressions.
Alternatives
In some situations you can instead use a grouping to capture what you actually wanted to match: /Sh(are)/i
If you really need lookbehinds you could use a third-party regular expression library.
Related
JavaScript: Is there a regular expression library that fully supports lookarounds?
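For the capture-group alternative /Sh(are)/i, the index that a lookbehind would have reported can be recovered from the match. A minimal sketch, with indexAfterPrefix as a made-up name:

```javascript
// Hypothetical helper: returns the index where "are" starts when it is
// preceded by "Sh", or -1 when there is no such match.
function indexAfterPrefix(text) {
  const match = /Sh(are)/i.exec(text);
  if (match === null) {
    return -1;
  }
  // The group sits at the end of the match, so its start is the match start
  // plus the length of the prefix that a lookbehind would have skipped.
  return match.index + (match[0].length - match[1].length);
}
```

indexAfterPrefix("Share") is 2 and indexAfterPrefix("Care") is -1.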
The only way (and of course this only works if you don't also have a lookahead assertion in your regex) is to reverse the string and use a lookahead instead of lookbehind:
/era(?=hS)/i
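Sketched out, the reversal trick could look like the following. The function name is made up, and the simple split("") reversal assumes plain text without surrogate pairs or combining characters.

```javascript
// Hypothetical helper: finds "are" preceded by "Sh" by searching the
// reversed string with a lookahead, then maps the index back.
function indexViaReversal(text) {
  const reversed = text.split("").reverse().join("");
  const match = /era(?=hS)/i.exec(reversed);
  if (match === null) {
    return -1;
  }
  // The end of the match in the reversed string corresponds to the start
  // of "are" in the original string.
  return text.length - (match.index + match[0].length);
}
```

indexViaReversal("Share") is 2 and indexViaReversal("Care") is -1. With several candidates in the text, this finds the last one in the original string first, since the search runs over the reversed text.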
If I understood correctly, I would use this regexp
/(Sh|\b)(are)/gi
where "are" matches only at the start of a word or as a substring preceded by Sh.
You can use non capturing groups
/(?:sh)(are)/
this tells the regex to find are without capturing the sh group. However in this context, as you have a simple pattern to match, this is not necessary and you can find the answer in other solutions and do something like
/sh(are)/
matching then only the first group

Regex in javascript working with Cyrillic (Russian) set

Is it possible to work with Russian characters, in javascript's regex?
Maybe the use of \p{Cyrillic}?
If yes, please provide a basic example of usage.
The example:
var str1 = "абв прв фву";
var regexp = new RegExp("[вф]\\b", "g");
alert(str1.replace(regexp, "X"));
I expect to get: абX прX
Here is a good article on JavaScript regular expressions and Unicode. Strings in JavaScript are 16 bit, so strings and RegExp objects can contain Unicode characters, but most of the special characters like '\b', '\d', '\w' only support ASCII. So your regular expression does not work as expected due to the use of '\b'. It seems you'll have to find a different way to detect word boundaries.
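One possible way to detect the end of a word without '\b' is a negative lookahead with an explicit letter class. This is only a sketch: it assumes that "word character" means a Russian letter or an ASCII \w character, and replaceAtWordEnd is a made-up name.

```javascript
// Hypothetical helper: replaces "в" or "ф" with "X" when it is the last
// letter of a word, i.e. when no word character follows it.
function replaceAtWordEnd(text) {
  // The class covers ASCII word characters plus Russian letters including ё/Ё.
  return text.replace(/[вф](?![\wа-яёА-ЯЁ])/g, "X");
}
```

replaceAtWordEnd("абв прв фву") gives "абX прX фву".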
It should work if you just save the JavaScript file in UTF8. Then you should be able to enter any character in a string.
edit:
Just made a quick example with some Cyrillic characters from Wikipedia:
var cyrillic = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюяабвгдеёжзийклмнопрстуфхцчшщъыьэюя';
cyrillic.match( 'л.+а' )[0];
// returns as expected: "лмнопрстуфхцчшщъыьэюяа"
According to this:
"JavaScript, which does not offer any Unicode support through its RegExp class, does support \uFFFF for matching a single Unicode code point as part of its string syntax."
so you can at least use code points, but seemingly nothing more (no classes).
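As an illustration of working with code points only, a character class can be built from \u escapes. The sketch below uses the Unicode Cyrillic block U+0400 to U+04FF; the name containsCyrillic is an assumption.

```javascript
// Matches any single character from the Cyrillic block (U+0400 to U+04FF).
const cyrillicChar = /[\u0400-\u04FF]/;

// Hypothetical helper: reports whether the text has at least one such character.
function containsCyrillic(text) {
  return cyrillicChar.test(text);
}
```

containsCyrillic("абв") is true and containsCyrillic("abc") is false.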
Also check out this duplicate of your question.
