JavaScript: equivalent regex for negative lookbehind?

I want to write a regular expression that captures all double quotes " in a string, except for those that are escaped.
For example, the following string should yield only the first quote:
"HELLO\"\"\"
but the following one will return 3 matches:
"HELLO\"\""\""
I have used the following expression, but since in JavaScript there is no negative lookbehind I am stuck:
(?<!\\)"
I have looked at similar questions but most provide a programmatic interface. I don't want to use a programmatic interface because I am using Ace editor and the simplest way to go around my problem is to define this regex.
I suppose there is no generic alternative: I have tried the alternatives proposed in the similar questions, but none of them exactly matched my case.
Thanks for your answers!

You can use this workaround:
(^|[^\\])"
This matches a " only if it is preceded by any character other than a \, or if it sits at the beginning of the string (^).
But be careful, this matches two characters: the " AND the preceding character (except in the start-of-string case). In other words, if you want to replace all these " with ' for example, you'll need:
theString.replace(/(^|[^\\])"/g, "$1'")
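As a rough illustration of the workaround, here is a small sketch. The helper name swapUnescapedQuotes is made up for this example. Because the pattern consumes the character in front of the quote, the second of two adjacent quotes is not matched, which the last call shows.

```javascript
// Hypothetical helper: replace every " that is not preceded by a backslash with '.
function swapUnescapedQuotes(text) {
  // Group 1 holds the character before the quote (or "" at the start of the
  // string), so it has to be put back with $1.
  return text.replace(/(^|[^\\])"/g, "$1'");
}

// The characters of this string are: "HELLO\"
console.log(swapUnescapedQuotes('"HELLO\\"')); // 'HELLO\"

// Caveat: the preceding character is consumed by the match, so the second of
// two adjacent quotes is skipped.
console.log(swapUnescapedQuotes('a""b')); // a'"b
```

If adjacent quotes matter in your input, this workaround alone is not enough and the matches have to be filtered in code instead.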

The code I assume you are trying to run:
var quote = /(?<!\\)"/g; // create the regex once so lastIndex advances between exec() calls
while ( matcher = quote.exec(theString) ) {
    // do stuff. matcher[0] is double quotes (that don't follow a backslash)
}
In JavaScript, using this guide to JS lookbehinds:
var quote = /(\\)?"/g; // create the regex once so lastIndex advances between exec() calls
while ( matcher = quote.exec(theString) ) {
    if (!matcher[1]) {
        // do stuff. matcher[0] is double quotes (that don't follow a backslash)
    }
}
This looks for double quotes (") that optionally follow a backslash (\), and then skips any match where a backslash was actually captured.
If you were merely trying to count the number of unescaped double-quotes, the "do stuff" line could be count++.
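A minimal sketch of that counting idea, assuming a helper name (countUnescapedQuotes) that is not part of the original answer. The regex is created once, outside the loop, so that lastIndex advances between exec() calls. Note that it does not treat an escaped backslash followed by a quote as a special case.

```javascript
// Hypothetical helper: count the double quotes that are not preceded by a backslash.
function countUnescapedQuotes(text) {
  const quotePattern = /(\\)?"/g;
  let count = 0;
  let match;
  while ((match = quotePattern.exec(text)) !== null) {
    if (!match[1]) {
      count++; // no backslash was captured in front of this quote
    }
  }
  return count;
}

// The two strings from the question (backslashes doubled for the JS literal).
console.log(countUnescapedQuotes('"HELLO\\"\\"\\"'));   // 1
console.log(countUnescapedQuotes('"HELLO\\"\\""\\""')); // 3
```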

Related

Unable to find a string matching a regex pattern

While trying to submit a form, a JavaScript regex validation always fails for a string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?
This regex won't match against anything because of that $? in the middle of the string.
Apparently, using the optional quantifier ? on the end-of-string anchor $ is not valid (if you paste it into https://regex101.com/ it will indeed give you an error). Even if the JavaScript parser ignored the error and kept the regex as it is, you would still be trying to match the end of the string in the middle of a string that is supposed to continue.
Unescaped it was supposed to match a \$ (dollar symbol) but as it is written it won't work.
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool and edit the string inside the JavaScript code (assuming there is no server-side check as well, and that it is not wrong too). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped, even \\\\w\\\\ww%jpg is a match).
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.(?:jpe?g|JPE?G),?)
That should match the sample strings you listed. The trailing comma is optional; remove it if you don't need it.
If you remove one level of escaping, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After the ^ anchor comes the first group, (([a-zA-Z]:)|(\\{2}\w+)\$?), an alternation that matches either a letter followed by a colon, or two backslashes followed by one or more word characters and an optional literal $. Some of the inner parentheses are unnecessary.
The second part, (\\(\w[\w].*))+, matches a backslash followed by two word characters. The \w[\w] looks odd because it is equivalent to \w\w (the second \w doesn't need a character class). After that comes any number of any characters, and the whole group may repeat one or more times.
In the last part, (.jpeg|.JPEG|.jpg|.JPG), the author probably forgot to escape the dot to match it literally; \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
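To check the simplified pattern against the strings mentioned in this thread, a quick sketch like the following can help. The variable names are made up for the example, and each backslash of the sample paths is doubled because they are written as JavaScript string literals.

```javascript
// The simplified pattern from the answer, written as a regex literal.
const imagePathPattern = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

const imagePathSamples = [
  "A:\\12anything.JPEG",  // A:\12anything.JPEG
  "\\\\1$\\anything.jpg", // \\1$\anything.jpg
  "c:\\zz.jpg",           // c:\zz.jpg
  "abc.jpg",              // no drive letter or \\ prefix
  "a:asdas.jpg",          // no backslash after the colon
];

for (const candidate of imagePathSamples) {
  console.log(candidate, imagePathPattern.test(candidate));
}
```

The first three samples should print true and the last two false, which is why none of the strings tried in the question were accepted.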

How to replace a substring with open parentheses (

I am a regex newbie trying to replace a matching pattern in a string, using JavaScript, only when it is followed by an open parenthesis (. For example, if I have a string
IN(INTERM_LEVEL_IN + (int)X_ID)
I would only like to highlight the first IN( in the string, not the two INs in INTERM_LEVEL_IN and not the int.
What is the Regex to accomplish this?
To match the opening bracket you just need to escape it: IN\(.
For instance, running this in Firebug console:
"IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
Will result in:
>>> "IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
"testINTERM_LEVEL_IN + (int)X_ID)"
Parentheses in regular expressions have a special meaning (capture groups), so when you want them to be interpreted literally you have to escape them with a \ before them. The regular expression IN\( would match the string IN(.
The following should only match IN( at the beginning of a line:
/^IN\(/
The following would match IN( that is not preceded by any alphanumeric character or underscore (the word boundary \b expresses exactly that):
/\bIN\(/
And finally, the following would match any instance of IN( no matter what precedes it:
/IN\(/
So, take your pick. If you're interested in learning more about regex, here's a good tutorial: http://www.regular-expressions.info/tutorial.html
You can use plain JavaScript for this: a simple IN\( would work for the example you gave, but I suspect your situation is more complicated than that. In that case, you need to define exactly what you are trying to match and what you don't want to match.
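A small sketch of the "not preceded by a word character" variant, using the word-boundary assertion \b. The helper name markInCalls and the square-bracket marker are assumptions made for the example, not something from the question.

```javascript
// Hypothetical helper: wrap each standalone "IN(" in square brackets so the
// match is visible.
function markInCalls(text) {
  // \b requires that "IN" is not preceded by a letter, digit or underscore.
  return text.replace(/\bIN\(/g, "[IN(]");
}

console.log(markInCalls("IN(INTERM_LEVEL_IN + (int)X_ID)"));
// [IN(]INTERM_LEVEL_IN + (int)X_ID)

console.log(markInCalls("LEVEL_IN(x)"));
// LEVEL_IN(x)  (unchanged, because IN is preceded by an underscore)
```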

Regex not working as expected in JavaScript

I wrote the following regex:
(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?
Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m
I wrote the following JavaScript code:
var urlexp = new RegExp(
'^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))
And it returns true even though the regex was supposed to not allow single words as valid.
What am I doing wrong?
Your problem is that JavaScript is viewing all your escape sequences as escapes for the string. So your regex goes to memory looking like this:
^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$
You may notice this causes a problem in the middle, where what you thought was a literal period turns into a regular-expression wildcard. You can solve this in a couple of ways. One is to use the forward-slash regular expression syntax JavaScript provides:
var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi
Or by escaping your backslashes (and not your forward slashes, as you had been doing - that's exclusively for when you're using /regex/mod notation, just like you don't have to escape your single quotes in a double quoted string and vice versa):
var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')
Please note the double backslash before the w - also necessary for matching word characters.
A couple notes on your regular expression itself:
[da-z.-]
d is already contained in the a-z range. Unless you meant \d? In that case, the backslash is important.
(/(\w|-)*)*/?
My own misgivings about the nested Kleene stars aside, you can whittle that alternation down into a character class, and drop the terminating /? entirely, as a trailing slash will be matched by the group as you've given it. I'd rewrite it as:
(/[\w-]*)*
Though, maybe you'd just like to catch non space characters?
(/[^/\s]*)*
Anyway, modified this way your regular expression winds up looking more like:
^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod notation (which I highly recommend), escape your forward slashes.
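The difference between the notations can be seen in a short sketch. The variable names and the example URL are made up for illustration, and the g flag is left out here because test() on a global regex keeps state in lastIndex between calls.

```javascript
// 1. Pattern passed as a string with single backslashes: the string literal
//    silently drops them, so \. becomes . and \w becomes w.
const brokenUrl = new RegExp('^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'i');

// 2. Regex literal with the rewritten pattern (forward slashes escaped).
const literalUrl = /^(https?:\/\/)?([\da-z.-]+)\.([a-z]{2,6})(\/[\w-]*)*$/i;

// 3. Same pattern in string notation: every backslash doubled, forward
//    slashes left alone.
const doubledUrl = new RegExp('^(https?://)?([\\da-z.-]+)\\.([a-z]{2,6})(/[\\w-]*)*$', 'i');

console.log(brokenUrl.test("blaaa"));  // true (the "." became a wildcard)
console.log(literalUrl.test("blaaa")); // false
console.log(doubledUrl.test("blaaa")); // false
console.log(literalUrl.test("http://example.com/some-page")); // true
console.log(doubledUrl.test("http://example.com/some-page")); // true
```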

RegExp in JavaScript, when a quantifier is part of the pattern

I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, the following from am not i am worked
/\^<(.*?)>\^/
But it turned out that I was getting HTML entities, since I was reading my string from the .innerHTML property. In other words,
> ... &gt;
< ... &lt;
To solve this, my regexp actually looks like this:
\^&lt;(.*?)((.|\n)*)&gt;\^
This includes the fact that the string in between should be any character or new line. Thanks!
You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. the language has no regex literal like JavaScript or Perl do, you may have to use a double backslash, which the programming language will unescape to a single one, which will then be processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represents a word boundary. Though I agree yours is what I would have used myself.
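Putting the escaped-caret pattern to work, here is a minimal sketch that collects every delimited piece of text. The helper name extractDelimited is an assumption, and [\s\S] is swapped in for . so that the text may span several lines, which is one way (not the asker's exact way) of allowing newlines.

```javascript
// Hypothetical helper: return every piece of text wrapped in ^< ... >^.
function extractDelimited(text) {
  // \^ is a literal caret; [\s\S]*? lazily matches any characters, newlines included.
  const delimitedPattern = /\^<([\s\S]*?)>\^/g;
  return Array.from(text.matchAll(delimitedPattern), (match) => match[1]);
}

console.log(extractDelimited("^<blah example>^"));
// [ 'blah example' ]

console.log(extractDelimited("a ^<one>^ b ^<two\nlines>^"));
// [ 'one', 'two\nlines' ]

// The same single-line pattern built from a string needs the backslashes doubled.
const caretFromString = new RegExp("\\^<(.*?)>\\^");
console.log(caretFromString.test("^<blah example>^")); // true
```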

Figuring out Regex pattern

I am still not all that good when it comes to writing Regex patterns and am having issues with trying to figure out a search pattern for the following string:
{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}
I basically need to search a page for anything in between the curly braces {}.
I have tried this:
string = $(".content").text();
string.match("^{[\w-]}");
But it's not working... any ideas on what I could be doing wrong?
Thanks for the help everybody! This is what I did to make it work:
$("div", "html, body").each(function(){
    var text = $(this).text();
    if(text.match("^\{.*\}$")) {
        console.log("FOUND");
    }
})
This should find the innermost content of curly braces (even nested ones).
string.match(/\{([^\{\}]*)\}/)[1]; // the [1] gets what is within the parentheses.
edit:
Thanks to the comments below here is a cleaner version:
string.match(/\{(.*?)\}/)[1];
One problem is the lack of a quantifier. As it stands, your regex is looking for a single \w or - character, denoted by your character class. You're probably looking for either of the following quantifiers:
[\w-]* - match 0 or more \w or - characters
[\w-]+ - match 1 or more \w or - characters
Another problem is the restrictions in the character class. [\w-] won't match (, ), ", spaces or other non-word characters that may appear. If you want to match all characters, use .. If you want to match all characters except }, use [^}] instead.
For example:
string = $(".content").text();
string.match("^{[^}]+}");
Using * would allow the content within the braces to be empty.
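The quantifier and character-class points can be tried out with a short sketch. The helper name betweenBraces is made up for the example, and a regex literal is used so that no string escaping is involved.

```javascript
// Hypothetical helper: return the text between a leading "{" and the next "}",
// or null when the string does not start with a non-empty {...} block.
function betweenBraces(text) {
  // [^}]+ requires at least one character between the braces;
  // [^}]* would also accept "{}".
  const braceMatch = /^\{([^}]+)\}/.exec(text);
  return braceMatch ? braceMatch[1] : null;
}

const embedSource = "{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}";
console.log(betweenBraces(embedSource));
// embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')

console.log(betweenBraces("{}"));      // null
console.log(betweenBraces("text {a}")); // null, because of the ^ anchor
```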
Side note: It looks to me like you're gearing up to eval() the code contained within the { and }. eval() is generally best avoided (if possible) for both security and performance reasons. In your case, you may be able to use this instead:
var string = $(".content").text(), fn, args;
if (string.charAt(0) == "{" && string.charAt(string.length - 1) == "}") {
    fn = string.slice(1, string.indexOf("("));
    // Start after the "(" so it does not end up in the first argument.
    args = string.slice(string.indexOf("(") + 1, string.lastIndexOf(")")).split(",");
    // Note: each argument still carries its surrounding quotes.
    window[fn].apply(null, args);
}
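The same idea can be sketched without the DOM or window, under a few assumptions: the names parseEmbedCall, allowedCalls and runEmbedCall are hypothetical, the embed_video handler is a stand-in that just builds a string, and the naive split assumes no argument contains a comma. Looking the function up in an explicit table rather than on window limits what the page content can call.

```javascript
// Hypothetical parser: split "{name('a','b')}" into a function name and arguments.
function parseEmbedCall(text) {
  const callMatch = /^\{(\w+)\((.*)\)\}$/.exec(text.trim());
  if (!callMatch) {
    return null;
  }
  const rawArgs = callMatch[2].trim();
  // Naive split: assumes no argument contains a comma. Surrounding quotes are stripped.
  const args = rawArgs === ""
    ? []
    : rawArgs.split(",").map((arg) => arg.trim().replace(/^(['"])(.*)\1$/, "$2"));
  return { name: callMatch[1], args };
}

// Assumed lookup table of functions that may be called (instead of window[fn]).
const allowedCalls = {
  embed_video: (id, width, height, target) => `${target}: ${id} at ${width}x${height}`,
};

function runEmbedCall(text) {
  const call = parseEmbedCall(text);
  if (!call || !Object.prototype.hasOwnProperty.call(allowedCalls, call.name)) {
    return null; // not a recognised call
  }
  return allowedCalls[call.name](...call.args);
}

console.log(runEmbedCall("{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}"));
// video_player: RpcF9EYXZpZFBhY2tfRklOQUwuZj at 575x352
```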
If you are using Eclipse by any chance, there is a regular expression plugin with which you can play around and see how your regular expression searches your text.
I would try this
string.match("^\{.*\}$");
Search for the following regular expression:
var sRe = /\{([^\}]*)\}/g;
sText.match(sRe);
It means that you are searching for the character "{", optionally followed by any characters other than "}", and then ending with "}".
Try "\{.*?\}". But it won't handle the situation with nested curly braces. Here you can test your regexps online.
string.match("^\{(.*?)\}$")[1];
I think you need to escape the {} characters...they have special meaning in regex...
