Figuring out Regex pattern - javascript

I am still not all that good when it comes to writing Regex patterns and am having issues with trying to figure out a search pattern for the following string:
{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}
I basically need to search a page for anything between the curly braces {}.
I have tried this:
string = $(".content").text();
string.match("^{[\w-]}");
But it's not working. Any ideas on what I could be doing wrong?
Thanks for the help everybody! This is what I did to make it work:
$("div", "html, body").each(function(){
var text = $(this).text();
if(text.match("^\{.*\}$")) {
console.log("FOUND");
}
})

This should find the innermost content of curly braces (even nested ones).
string.match(/\{([^\{\}]*)\}/)[1]; // the [1] gets what is within the parentheses.
edit:
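Thanks to the comments below, here is a shorter version (note that with nested braces it captures from the first { to the first }, not only the innermost content):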
string.match(/\{(.*?)\}/)[1];
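As a rough illustration of how the two patterns above differ on nested braces (the sample string is made up for this sketch):

```javascript
// Made-up sample with one brace pair nested inside another.
const nested = "a {outer {inner} tail}";

// [^{}]* cannot cross a brace, so the match is the innermost pair.
const innermost = nested.match(/\{([^{}]*)\}/)[1];

// .*? is lazy: it runs from the first "{" to the first "}".
const lazy = nested.match(/\{(.*?)\}/)[1];

console.log(innermost); // inner
console.log(lazy);      // outer {inner
```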

One problem is the lack of a quantifier. As it stands, your regex is looking for a single \w or - character, denoted by your character class. You're probably looking for either of the following quantifiers:
[\w-]* - match 0 or more \w or - characters
[\w-]+ - match 1 or more \w or - characters
Another problem is the restrictions in the character class. [\w-] won't match (, ), ", spaces or other non-word characters that may appear. If you want to match all characters, use .. If you want to match all characters except }, use [^}] instead.
For example:
string = $(".content").text();
string.match("^{[^}]+}");
Using * would allow the content within the braces to be empty.
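A small sketch of the points above, assuming the braces may appear anywhere in the text (so the ^ anchor is dropped) and using made-up surrounding text. It also shows a second pitfall of passing the pattern as a string: "\w" inside a string literal collapses to a plain "w".

```javascript
// The example string from the question, surrounded by made-up text.
const page = "intro {embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')} outro";

// In a string literal "\w" collapses to "w", so this pattern is really ^{[w-]}.
const fromString = new RegExp("^{[\w-]}");

// A regex literal keeps its backslashes; [^}]+ takes everything up to the closing brace.
const found = page.match(/\{([^}]+)\}/);

console.log(fromString.source); // ^{[w-]}
console.log(found[1]);          // embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')
```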
Side note: It looks to me like you're gearing up to eval() the code contained within the { and }. eval() is generally best avoided (if possible) for both security and performance reasons. In your case, you may be able to use this instead:
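var string = $(".content").text(), fn, args;
if (string.charAt(0) == "{" && string.charAt(string.length - 1) == "}") {
    // function name: everything between "{" and the first "("
    fn = string.slice(1, string.indexOf("("));
    // raw arguments: everything after "(" up to the last ")"; note they keep their quotes
    args = string.slice(string.indexOf("(") + 1, string.lastIndexOf(")")).split(",");
    window[fn].apply(null, args);
}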
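One possible variation on the same idea, sketched under a few assumptions: the `handlers` table and `runEmbedCall` are hypothetical names, arguments are assumed to be simple single-quoted strings without embedded commas, and only functions listed in the table can be called (instead of anything reachable through window).

```javascript
// Hypothetical allow-list of callable functions; the names are illustrative only.
const handlers = {
  embed_video: (id, width, height, target) => ({ id, width, height, target }),
};

function runEmbedCall(text) {
  // {name(args)} with nothing else around it
  const m = /^\{(\w+)\((.*)\)\}$/.exec(text.trim());
  if (!m || !Object.prototype.hasOwnProperty.call(handlers, m[1])) return null;
  // Naive split: assumes no commas inside the quoted arguments.
  const args = m[2].split(",").map((a) => a.trim().replace(/^'(.*)'$/, "$1"));
  return handlers[m[1]](...args);
}

console.log(runEmbedCall("{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}"));
```

Unknown function names simply return null here, which is one way to avoid running arbitrary page content.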

If you are using eclipse by any chance, there is a regular expression plugin with which you can play around and see how your regular expression searches your text.
I would try this
string.match("^\{.*\}$");

Search for the following regular expression:
var sRe = /\{([^\}]*)\}/g;
sText.match(sRe);
It means that you are searching for a "{" character, followed by zero or more characters that are not "}", and then a closing "}".
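A short sketch of what this pattern returns, using a made-up sample text. One detail worth knowing: with the g flag, match() returns the whole matches and drops the capture groups, so matchAll() (ES2020) is used here to get the inner text.

```javascript
const sText = "x {first} y {second} z"; // made-up sample text
const sRe = /\{([^\}]*)\}/g;

// With the g flag, match() returns the full matches, braces included.
const whole = sText.match(sRe);

// matchAll() exposes the capture group of every match.
const inner = Array.from(sText.matchAll(sRe), (m) => m[1]);

console.log(whole); // [ '{first}', '{second}' ]
console.log(inner); // [ 'first', 'second' ]
```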

Try "\{.*?\}". But it won't handle the situation with nested curly braces. Here you can test your regexps online.
string.match("^\{(.*?)\}$")[1];

I think you need to escape the {} characters...they have special meaning in regex...

Related

Javascript: equivalent regex for negative lookbehind?

I want to write a regular expression that captures all double quotes " in a string, except for those that are escaped.
For example, the following string should return the first quote only:
"HELLO\"\"\"
but the following one will return 3 matches:
"HELLO\"\""\""
I have used the following expression, but since in JavaScript there is no negative lookbehind I am stuck:
(?<!\\)"
I have looked at similar questions but most provide a programmatic interface. I don't want to use a programmatic interface because I am using Ace editor and the simplest way to go around my problem is to define this regex.
I suppose there is no generic alternative, since I have tried the alternatives proposed in the similar questions, but none of them exactly matched my case.
Thanks for your answers!
You can use this workaround:
(^|[^\\])"
" only if preceded by any char but a \ or the beginning of the string (^).
But be careful, this matches two chars: the " AND the preceding character (except in the start-of-the-string case). In other words, if you want to replace all these " by ' for example, you'll need:
theString.replace(/(^|[^\\])"/g, "$1'")
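As a sketch, here is the workaround next to a native lookbehind. Lookbehind was added to the language in ES2018, so the second form assumes an engine that supports it. The last two lines show a limitation of the workaround: because it consumes the preceding character, the second of two adjacent unescaped quotes is skipped.

```javascript
// The question's second example, written as a JS string literal.
const theString = '"HELLO\\"\\""\\""';

// Workaround: capture the preceding character and put it back with $1.
const viaGroup = theString.replace(/(^|[^\\])"/g, "$1'");

// ES2018 lookbehind, for engines that support it.
const viaLookbehind = theString.replace(/(?<!\\)"/g, "'");

console.log(viaGroup === viaLookbehind); // true

// Adjacent unescaped quotes: the two approaches differ.
const adjacentGroup = 'a""b'.replace(/(^|[^\\])"/g, "$1'");
const adjacentLookbehind = 'a""b'.replace(/(?<!\\)"/g, "'");
console.log(adjacentGroup);      // a'"b
console.log(adjacentLookbehind); // a''b
```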
The code I assume you are trying to run:
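var re = /(?<!\\)"/g, matcher; // create the regex once so exec() resumes where it left off
while ( (matcher = re.exec(theString)) ) {
    // do stuff. matcher[0] is double quotes (that don't follow a backslash)
}
In JavaScript, using this guide to JS lookbehinds:
var re = /(\\)?"/g, matcher; // a literal inside the loop condition would restart at index 0 on every pass
while ( (matcher = re.exec(theString)) ) {
    if (!matcher[1]) {
        // do stuff. matcher[0] is double quotes (that don't follow a backslash)
    }
}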
This looks for double quotes (") that optionally follow a backslash (\) but then doesn't act when it actually does follow a backslash.
If you were merely trying to count the number of unescaped double-quotes, the "do stuff" line could be count++.
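A minimal sketch of that counting variant, run against the two example strings from the question. The function name `countUnescapedQuotes` is made up for this example, and, like the approach it illustrates, it does not try to handle escaped backslashes.

```javascript
function countUnescapedQuotes(text) {
  const re = /(\\)?"/g; // created once so lastIndex advances between exec() calls
  let count = 0;
  let matcher;
  while ((matcher = re.exec(text)) !== null) {
    if (!matcher[1]) count++; // no backslash captured: the quote is unescaped
  }
  return count;
}

console.log(countUnescapedQuotes('"HELLO\\"\\"\\"'));   // 1
console.log(countUnescapedQuotes('"HELLO\\"\\""\\""')); // 3
```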

C# regex doesn't work in Javascript

I am using the following regex to validate an email:
^[a-zA-Z0-9!$'*+/\-_#%?^`&=~}{|]+(\.[a-zA-Z0-9!$'*+/\-_#%?^`&=~}{|]+)*#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-['&_\]]]+)(\.[\w-['&_\]]]+)*))(\]?)$
This works fine in C#, but in JavaScript it's not working. And yes, I replaced every backslash with a double backslash, as follows:
^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([\\w-['&_\\]]]+)(\\.[\\w-['&_\\]]]+)*))(\\]?)$
I am using XRegExp. Am I missing something here? Is there such thing as a converter to convert normal regex to JavaScript perhaps :) ?
Here is my function:
function CheckEmailAddress(email) {
var reg = new XRegExp("^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([\\w-['&_\\]]]+)(\\.[\\w-['&_\\]]]+)*))(\\]?)$")
if (reg.test(email) == false) {
return false;
}
return true;
}
It is returning false for a simple "abc#123.com" email address.
Thanks in advance!
Kevin
The problem is that your regular expression contains character class subtractions. JavaScript's RegExp does not support them, nor does XRegExp. (I initially misremembered and commented that it does, but it does not.)
However, character class subtractions can be replaced with negative lookaheads so this:
[\w-['&_\\]]]
can become this:
(?:(?!['&_\\]])\\w)
Both mean "any word character but not one in the set '&_]". The expression \w does not match ', & or ] so we can simplify to:
(?:(?!_)\\w)
Or since \w is [A-Za-z0-9_], we can just remove the underscore from the list and further simplify to:
[A-Za-z0-9]
So the final RegExp is this:
new RegExp("^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)*))(\\]?)$")
I've done modest testing with this RegExp, but you should do due diligence on checking it.
It is not strictly necessary to go through the negative lookahead step to simplify the regular expression but knowing that character class subtractions can be replaced with negative lookaheads is useful in more general cases where manual simplification would be difficult or brittle.
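A quick way to convince yourself that the lookahead form and the simplified class agree is to compare them character by character. This sketch only checks the ASCII range, which is all that \w covers in JavaScript without the u and i flags combined.

```javascript
const viaLookahead = /^(?:(?!_)\w)$/; // "a word character that is not an underscore"
const viaClass = /^[A-Za-z0-9]$/;     // the hand-simplified equivalent

// Collect every ASCII character on which the two patterns disagree.
const disagreements = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (viaLookahead.test(ch) !== viaClass.test(ch)) disagreements.push(ch);
}

console.log(disagreements.length); // 0
```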

Replace word and characters in string

I'm in the midst of figuring out the RegExp function capabilities and trying to create this string:
api/search/redirect/complete/127879/2013-11-27/4-12-2013/7/2/0/0/LGW/null
from this:
/api/search/redirect/complete/127879/2013-11-27/4-12-2013/7/2/0/0/undefined/undefined/undefined/undefined/undefined/undefined/undefined/undefined/undefined/^^undefined^undefined^undefined^undefined^undefined/undefined^undefined^undefined^undefined^undefined^undefined^undefined/undefined^undefined^undefined^undefined^undefined^undefined^undefined/undefined^undefined^undefined^undefined^undefined^undefined^undefined/LGW/null
I know \bundefined\b\^ removes 'undefined^' and undefined\b\/ removes 'undefined/', but how do I combine these together?
In this case, since the ^ or / follow the undefined in the same place, you can use a character class:
str = str.replace(/\bundefined[/^]+/g, '');
Note: ^ is special both inside and outside of character classes, but in different ways. Inside a character class ([...]), it's only special if it's the first character, hence my not making it the first char above.
Also note the + at the end, saying "one or more" of the ^ or /. Without that, since there are a couple of double ^^, you end up with ^ in the result.
If you want to be a bit paranoid (and I admit I probably would be), you could escape the / within the character class. For me it works if you don't, with Chrome's implementation, but trying to follow the pattern definition in the spec is...tiresome...so I honestly don't know if there's any implementation out there that would try to end the expression as of the / in the character class. So with the escape:
str = str.replace(/\bundefined[\/^]+/g, '');
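To see the effect of the + concretely, here is a sketch on a shortened, made-up version of the URL from the question:

```javascript
// Shortened, made-up version of the URL from the question.
const url = "/api/search/0/undefined/undefined/^^undefined^undefined/undefined^undefined/LGW/null";

// With +, a run such as "/^^" after "undefined" is removed in one go.
const withPlus = url.replace(/\bundefined[\/^]+/g, "");

// Without +, only one separator is removed, so the doubled carets survive.
const withoutPlus = url.replace(/\bundefined[\/^]/g, "");

console.log(withPlus);    // /api/search/0/LGW/null
console.log(withoutPlus); // /api/search/0/^^LGW/null
```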

How to replace a substring with open parentheses (

I am a Regex newbie and trying to use a regex to replace a matching pattern in a string only when it is followed by a ( (open parenthesis), using JavaScript. For example, if I have the string
IN(INTERM_LEVEL_IN + (int)X_ID)
I would only like to highlight the first IN( in the string. Not the INTERM_LEVEL_IN (2 ins here) and the int.
What is the Regex to accomplish this?
To match the opening bracket you just need to escape it: IN\(.
For instance, running this in Firebug console:
"IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
Will result in:
>>> "IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
"testINTERM_LEVEL_IN + (int)X_ID)"
Parentheses in regular expressions have a special meaning (capture groups), so when you want them to be interpreted literally you have to escape them with a \ before them. The regular expression IN\( would match the string IN(.
The following should only match IN( at the beginning of a line:
/^IN\(/
The following would match IN( that is not preceded by any alphanumeric character or underscore (note that the match also includes the preceding character, if there is one):
/(^|[^a-zA-Z0-9_])IN\(/
And finally, the following would match any instance of IN( no matter what precedes it:
/IN\(/
So, take your pick. If you're interested in learning more about regex, here's a good tutorial: http://www.regular-expressions.info/tutorial.html
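A brief sketch of these options in use. The \b form is an alternative to the explicit character class that does not consume the preceding character. The replacement markers and the second sample string are made up for illustration.

```javascript
const expr = "IN(INTERM_LEVEL_IN + (int)X_ID)";

// Escaped parenthesis: matches the literal text "IN(" (first occurrence only, no g flag).
const highlighted = expr.replace(/IN\(/, "<b>IN(</b>");

// \b requires that no word character precedes "IN", so "WITHIN(" is left alone.
const sample = "WITHIN(a) OR IN(b)"; // made-up sample
const bounded = sample.replace(/\bIN\(/g, "[IN(]");

console.log(highlighted); // <b>IN(</b>INTERM_LEVEL_IN + (int)X_ID)
console.log(bounded);     // WITHIN(a) OR [IN(]b)
```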
You can use just regular old Javascript for regex, a simple IN\( would work for the example you gave (see here), but I suspect your situation is more complicated than that. In which case, you need to define exactly what you are trying to match and what you don't want to match.

RegExp in JavaScript, when a quantifier is part of the pattern

I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, the following, from am not i am, worked:
/\^<(.*?)>\^/
But, it turned out that I was getting HTML entities since I was getting my string by using the .innerHTML property. In other words,
&gt; ... >
&lt; ... <
To solve this, my regexp actually looks like this:
\^&lt;(.*?)((.|\n)*)&gt;\^
This includes the fact that the string in between should be any character or new line. Thanks!
You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
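The three behaviours of the caret can be checked in a few lines (the test strings are made up):

```javascript
// Unescaped ^ is an anchor: start of the string only.
const anchorOnly = /^b/.test("a\nb");

// With the m flag, ^ also matches right after a line break.
const multiline = /^b/m.test("a\nb");

// Escaped, \^ matches a literal caret character.
const literalCaret = /\^b/.test("a^b");

console.log(anchorOnly, multiline, literalCaret); // false true true
```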
This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. there's no regex literal like in JavaScript or Perl, you may have to use a double backslash, which the programming language will reduce to a single one, which will then be processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represents a word boundary. Though I agree yours is what I would have used myself.
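One caveat, sketched below: a JavaScript regex literal needs no doubling, but the same pattern passed to the RegExp constructor as a string does, because the string literal processes the backslashes first.

```javascript
const subject = "^<blah example>^";

const literal = /\^<(.*?)>\^/;
// "\\" in the string becomes one "\" in the pattern.
const fromString = new RegExp("\\^<(.*?)>\\^");
// "\^" collapses to "^" in the string, which the regex reads as an anchor.
const unescaped = new RegExp("\^<(.*?)>\^");

console.log(literal.source === fromString.source); // true
console.log(subject.match(fromString)[1]);         // blah example
console.log(unescaped.test(subject));              // false
```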
