Invalid regular expression in JavaScript

I'm trying to find out if a string contains css code with this expression:
var pattern = new RegExp('\s(?[a-zA-Z-]+)\s[:]{1}\s*(?[a-zA-Z0-9\s.#]+)[;]{1}');
But I get "invalid regular expression" error on the line above...
What's wrong with it?
I found the regex here: http://www.catswhocode.com/blog/10-regular-expressions-for-efficient-web-development
It's for PHP but it should work in JavaScript too, right?

What are the ? at the start of the two [a-zA-Z-] blocks for? They look wrong to me.
The ? is unfortunately somewhat overloaded in regexp syntax: it can have three different meanings that I know of, and none of them match what I see in your example.
Also, your \s sequences need their backslashes escaped because this is a string: they should look like \\s. To avoid the extra escaping, just use the /.../ syntax instead of new RegExp("...").
That said, even that is insufficient: the regexp still produces an Invalid Group error in Chrome. The {1} quantifiers are redundant but valid, so the error most likely comes from the (?[ sequences, since ( followed by ? must introduce a special group such as (?:...), (?=...) or (?!...).
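A minimal sketch of the two points above, runnable in Node.js or a browser console. The variable names are illustrative only, and the last check assumes the engine rejects ( followed by ?[, as current engines appear to do.

```javascript
// In a string literal "\s" is not a recognized escape, so the backslash is
// dropped before the RegExp constructor ever sees the pattern.
const fromString = new RegExp("\s+");    // pattern actually compiled: s+
const fromEscaped = new RegExp("\\s+");  // pattern compiled: \s+
const fromLiteral = /\s+/;               // same pattern, no double escaping

console.log(fromString.source);                          // s+
console.log(fromEscaped.source === fromLiteral.source);  // true
console.log(fromString.test("a b"));                     // false: "a b" has no letter s
console.log(fromLiteral.test("a b"));                    // true: matches the space

// The "(?[" sequences are a separate problem: "(?" must start a special group.
let groupError = null;
try {
  new RegExp("(?[a-zA-Z-]+)");
} catch (err) {
  groupError = err;
}
console.log(groupError instanceof SyntaxError);
```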

The ?'s are messing it up. I'm not sure what they are for.
/\s[a-zA-Z\-]+\s*:\s*[a-zA-Z0-9\s.#]+;/
worked for me, at least as far as compiling goes. I didn't test whether it properly detects a CSS string.

Replace the quotes with / (slashes):
var pattern = /\s([a-zA-Z-]+)\s[:]{1}\s*([a-zA-Z0-9\s.#]+)[;]{1}/;
You don't need the new RegExp() part either, which is why it's been removed. Where a string is delimited by single or double quotes, a regular expression literal in JavaScript is delimited by slashes. It isn't a normal string, so the backslashes need no extra escaping.

That regular expression is very bad and I would avoid its source in the future. That said, I cleaned it up a bit and got the following result:
var pattern = /\s(?:[a-zA-Z-]+)\s*:\s*(?:[^;\n\r]+);/;
This matches something that looks like CSS, for example:
background-color: red;
Here's the fiddle to prove it, though I'd recommend finding a different solution to your problem. This is a very simple regex and it's not safe to say that it is reliable.
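To illustrate why the cleaned-up pattern should not be relied on, here is a small sketch with made-up sample strings (they are assumptions, not taken from the fiddle). Note that the leading \s means a declaration at the very start of the input is missed, while ordinary prose containing a colon and a semicolon is accepted.

```javascript
const pattern = /\s(?:[a-zA-Z-]+)\s*:\s*(?:[^;\n\r]+);/;

// Hypothetical inputs chosen to show one hit, one miss and one false positive.
const insideRule = "a { background-color: red; }";
const bareDeclaration = "background-color: red;";
const plainProse = "Note to self: buy milk;";

console.log(pattern.test(insideRule));      // true: " background-color: red;"
console.log(pattern.test(bareDeclaration)); // false: no whitespace before the property
console.log(pattern.test(plainProse));      // true: " self: buy milk;" looks like CSS to it
```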

Related

Regex returns nothing to repeat [duplicate]

I'm new to regex and I'm trying to work it into one of my new projects to see if I can learn it and add it to my repertoire of skills. However, I'm hitting a roadblock here.
I'm trying to see if the user's input has illegal characters in it by using the .search function like so:
if (name.search("[\[\]\?\*\+\|\{\}\\\(\)\#\.\n\r]") != -1) {
...
}
However, when I try to execute the function that contains this line, it throws the following error for that specific line:
Uncaught SyntaxError: Invalid regular expression: /[[]?*+|{}\()#.
]/: Nothing to repeat
I can't for the life of me see what's wrong with my code. Can anyone point me in the right direction?
You need to double the backslashes used to escape the regular expression special characters, because the backslash is being interpreted by the code that reads the string rather than passed to the regular expression parser. However, as @Bohemian points out, most of those backslashes aren't needed. Unfortunately, his answer suffers from the same problem as yours. What you actually want is:
"[\\[\\]?*+|{}\\\\()#.\n\r]"
Note the quadrupled backslash. That is definitely needed. The string passed to the regular expression compiler is then identical to @Bohemian's string, and works correctly.
Building off of @Bohemian, I think the easiest approach would be to just use a regex literal, e.g.:
if (name.search(/[\[\]?*+|{}\\()#.\n\r]/) != -1) {
// ... stuff ...
}
Regex literals are nice because you don't have to escape the escape character, and some IDEs will highlight invalid regex (very helpful for me as I constantly screw them up).
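As a rough check of the advice above, the following sketch compiles the question's under-escaped string, the doubled-backslash string, and the regex literal, then runs the two working forms against a few made-up names (the sample names are assumptions for illustration).

```javascript
// The question's string: the string parser eats the single backslashes,
// leaving [[]?*+|{}\()#. ... ] which has a quantifier with nothing to repeat.
let underEscapedError = null;
try {
  new RegExp("[\[\]\?\*\+\|\{\}\\\(\)\#\.\n\r]");
} catch (err) {
  underEscapedError = err;
}
console.log(underEscapedError instanceof SyntaxError); // true

// Doubled (and, for the literal backslash, quadrupled) backslashes survive.
const fromString = new RegExp("[\\[\\]?*+|{}\\\\()#.\n\r]");
const fromLiteral = /[\[\]?*+|{}\\()#.\n\r]/;

const names = ["bob", "a[b", "back\\slash", "two\nlines"];
for (const name of names) {
  // Both forms should report the same index of the first illegal character.
  console.log(JSON.stringify(name), name.search(fromString), name.search(fromLiteral));
}
```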
For Google travelers: this stupidly unhelpful error message is also presented when you make a typo and double up the + regex operator:
Okay:
\w+
Not okay:
\w++
Firstly, in a character class [...] most characters don't need escaping - they are just literals.
So, your regex should be:
"[\[\]?*+|{}\\()#.\n\r]"
This compiles for me.
Well, in my case I had to validate a phone number with a regex, and I was getting the same error:
Invalid regular expression: /+923[0-9]{2}-(?!1234567)(?!1111111)(?!7654321)[0-9]{7}/: Nothing to repeat
The error in my case was the + operator right after the opening / at the start of the regex. Enclosing the + in square brackets, [+], and sending the request again worked like a charm.
The following will work:
/[+]923[0-9]{2}-(?!1234567)(?!1111111)(?!7654321)[0-9]{7}/
This answer may be helpful for those who get the same type of error from the same cause as mine. Cheers :)
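A small sketch of the quantifier cases described in these answers. The compiles helper and the two sample phone numbers are hypothetical, added only for illustration; the exact error text varies by engine, so only the error type is checked.

```javascript
// Hypothetical helper: true when the pattern source compiles,
// false when the engine rejects it with a SyntaxError.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(compiles("+923[0-9]{2}"));    // false: leading + has nothing to repeat
console.log(compiles("[+]923[0-9]{2}"));  // true: + inside a class is a literal
console.log(compiles("\\+923[0-9]{2}"));  // true: \+ is a literal plus
console.log(compiles("\\w++"));           // false: doubled + quantifier

const phone = /[+]923[0-9]{2}-(?!1234567)(?!1111111)(?!7654321)[0-9]{7}/;
console.log(phone.test("+92300-7654320")); // true
console.log(phone.test("+92300-1234567")); // false: rejected by the first lookahead
```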
For example, I ran into this in Express (Node.js) when trying to create a route for paths not starting with /internal:
app.get(`\/(?!internal).*`, (req, res)=>{
After a long time trying, it finally worked when I passed it as a RegExp object using new RegExp():
app.get(new RegExp("\/(?!internal).*"), (req, res)=>{
This may help if you are getting this common issue in routing.
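One caveat worth checking with plain JavaScript, independent of Express: the pattern above is not anchored, so on its own it can still find a match later in a path that starts with /internal. The sketch below only exercises RegExp.prototype.test; how a router applies the pattern is outside its scope, and the anchored variant is a suggestion rather than part of the original answer.

```javascript
// "\/" in a string is just "/", so this compiles to /\/(?!internal).*/
const unanchored = new RegExp("\/(?!internal).*");
// Suggested variant (an assumption, not from the answer): anchor at the start.
const anchored = /^\/(?!internal).*/;

console.log(unanchored.test("/internal/users")); // true: it matches the later "/users"
console.log(anchored.test("/internal/users"));   // false
console.log(anchored.test("/public/users"));     // true
```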
This can also happen if you begin a regex with ?.
? may function as a quantifier -- so ? may expect something else to come before it, thus the "nothing to repeat" error. Nothing preceded it in the regex string so it didn't get to quantify anything; there was nothing to repeat / nothing to quantify.
? also has another role -- if the ? is preceded by ( it may indicate the beginning of a lookaround assertion or some other special construct. See example below.
If one forgets to write the () parentheses around the following lookbehind assertion ?<=x, this will cause the OP's error:
Incorrect:
const xThenFive = /?<=x5/;
Correct:
const xThenFive = /(?<=x)5/;
This /(?<=x)5/ is a positive lookbehind: we're looking for a 5 that is preceded by an x, e.g. it would match the 5 in x563 but not the 5 in x652.
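The lookbehind example can be checked with a few lines, assuming an engine that supports lookbehind (recent Node.js and browsers do). The broken form is built with the RegExp constructor here so that the error can be caught at run time instead of stopping the whole script at parse time.

```javascript
const xThenFive = /(?<=x)5/;

const hit = "x563".match(xThenFive);
console.log(hit[0], hit.index);          // 5 1
console.log("x652".match(xThenFive));    // null: that 5 is preceded by 6, not x

// Without the parentheses the pattern starts with a bare ? quantifier.
let missingParens = null;
try {
  new RegExp("?<=x5");
} catch (err) {
  missingParens = err;
}
console.log(missingParens instanceof SyntaxError); // true
```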

What is the function of .source in context of this new RegExp

I ran into the below monster of a regex in the wild today. The regex is meant to validate a url.
function superUrlValidation(url) {
return new RegExp(/^/.source + "((.+):\/\/)?" + /(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/.source, "i")
.test(url);
}
I've never seen .source used in a regex like this so I looked it up.
The MDN docs for RegExp.prototype.source state:
The source property returns a String containing the source text of the regexp object, and it doesn't contain the two forward slashes on both sides and any flags.
... and gives this example:
var regex = /fooBar/ig;
console.log(regex.source); // "fooBar", doesn't contain /.../ and "ig".
I understand the MDN example (you're getting the source text of the regex object after it is created, makes sense), but I don't understand how this is being used in the superUrlValidation regex above.
How is the source being used before the regex object is completed and what does this accomplish? I can't find any documentation showing .source being used in this way.
Note that .source is used twice in the regex, at the beginning and the end.
Use of .source everywhere in your regex seems totally unnecessary; maybe it is just a trick to avoid double escaping. In fact even the use of new RegExp is not needed and you can get away with just a regex literal like this:
var re = /^((.+):\/\/)?(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i;
/^/ is a regex literal, meaning it's a valid regex object in its own right. This means that /^/.source === "^".
This seems like an arbitrary example of using the source property, as the author could have just placed a "^" in its place, or even just put a ^ at the beginning of the next string, and it would have the same effect.
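Here is a small sketch of what the .source concatenation does, using two short hypothetical patterns (scheme and host are made up for illustration and are not pieces of the URL validator above). Each literal is a complete RegExp object on its own; .source just hands back its pattern text so the pieces can be joined as strings without double escaping.

```javascript
console.log(/^/.source === "^"); // true

// Hypothetical building blocks, each written as an ordinary regex literal.
const scheme = /^(https?):\/\//;
const host = /[a-z0-9.-]+/;

// Join the pattern text of both literals plus a plain string, then compile once.
const combined = new RegExp(scheme.source + host.source + "$", "i");

console.log(combined.test("HTTP://Example.com")); // true (the "i" flag applies to the whole)
console.log(combined.test("ftp://example.com"));  // false
```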
The .source property returns the content of the regex between the forward slashes, as you say, so the result of the above is equivalent to this regex literal:
/^((.+):\/\/)?(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i
In JavaScript you can write regexes like this: /matchsomething/ or using the RegExp function/constructor above. It looks like the code you found is the result of someone not knowing what they were doing. They seem to have taken a few regexes using the literal syntax (i.e. /match_here/), plugged them into the constructor version and stuck them all together.
I can't see any benefit in using the source property this way. I would just use the string version or the constructor version. Or better, find out what the original author intended and write it again or find a respected regex library with the criteria you need.
And, yeah, wow. It's massive.

RegEx Syntax Error - nothing to repeat

Could someone please tell me why this RegEx fails?
http://jsfiddle.net/SrKPG/
^(\+[0-9]+ )[1-9]{2,} [0-9]{2,}(\-[0-9]+|)$
The funny thing is - when I test it at http://jsregex.com/ it works.
But in my code it fails.
The reason you're failing to match is because your second sequence of numbers does not accept zeroes:
^([+][0-9]+ )[1-9]{2,} [0-9]{2,}(\-[0-9]+|)$
+43 660 1234556
It fails because you write it as a string, without escaping the \.
You could write
var regex = "^(\\+[0-9]+ )[1-9]{2,} [0-9]{2,}(\\-[0-9]+|)$";
But, instead of using a string and the RegExp constructor, you should directly use a regex literal:
text.match(/^(\+[0-9]+ )[1-9]{2,} [0-9]{2,}(\-[0-9]+|)$/g);
You were also refusing 0 in the middle, which doesn't comply with your test string. It seems that what you want is
text.match(/^(\+[0-9]+ )[0-9]{2,} [0-9]{2,}(\-[0-9]+|)$/g);
Yours
"^(\+[0-9]+ )[1-9]{2,} [0-9]{2,}(\-[0-9]+|)$"
Correct
"^(\\+[0-9]+ )[1-9]{2,} [0-9]{2,}(-[0-9]+|)$"
The double escaping is a requirement of JavaScript string literals. It has nothing to do with regex.
Upon parsing your program your string literal becomes "^(+[0-9]+ )[1-9]{2,} [0-9]{2,}(-[0-9]+|)$" in memory, because \+ (as opposed to, let's say, \n) has no meaning in JS strings.
At this time the regex engine complains about the lone + that follows nothing.
Note that the something-or-nothing (something|) is better written as (something)?.
Apart from that: Thou shalt not use regex to validate phone numbers.
EDIT: The proof is in the comments. ;)
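The points made in these answers can be reproduced with a short sketch. The tryCompile helper is hypothetical, and the last pattern applies both suggestions from above ([0-9] in the middle group and (…)? instead of (…|)); it is an illustration, not a recommendation to validate phone numbers this way.

```javascript
// Hypothetical helper: returns the compiled RegExp, or null if it is invalid.
function tryCompile(source) {
  try {
    return new RegExp(source);
  } catch (err) {
    return null;
  }
}

// Single backslashes vanish in the string, leaving "(+" with nothing to repeat.
const unescaped = tryCompile("^(\+[0-9]+ )[1-9]{2,} [0-9]{2,}(\-[0-9]+|)$");
console.log(unescaped); // null

// The literal compiles, but [1-9]{2,} cannot match the 0 in "660".
const original = /^(\+[0-9]+ )[1-9]{2,} [0-9]{2,}(\-[0-9]+|)$/;
console.log(original.test("+43 660 1234556")); // false

// Doubled backslash, zeroes allowed, optional group written with ?.
const fixed = tryCompile("^(\\+[0-9]+ )[0-9]{2,} [0-9]{2,}(-[0-9]+)?$");
console.log(fixed.test("+43 660 1234556"));    // true
```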

Preparing a regular expression for javascript

I have made this regular expression which does exactly what I want when I test it in e.g. RegExr:
^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?
However when I test it in javascript it says that the expression is invalid. After hours of debugging I found out that this expression works in javascript:
^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?![a-z0-9]+\.)?(localhost|yahoo\.com)(.*)?
However this doesn't do what I want (again testing in RegExr).
Why can't I use the first expression in JavaScript? And how do I fix it?
UPDATE JULY 25
Sorry for the lack of info. The way I am using the Regexp is through a jQuery extension which lets me select using regexp. The script can be seen here: http://james.padolsey.com/javascript/regex-selector-for-jquery/
The specific code I am trying to get to work is:
$('a:regex(href, ^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?)').live('click', function(e) {
After including the linked jQuery plugin. The text strings I am testing are:
http://yahoo.com
http://google.dk
http://subdomain.yahoo.com
http://test.yahoo.com
http://localhost.dk
http://sub.yahoo.com/lalala
Where it is supposed to match "http://google.dk", "http://test.yahoo.com" and "http://sub.yahoo.com/lalala" - which it does when using RegExr but failing (invalid expression) using the jQuery plugin.
The first regular expression is not invalid:
var regexp = /^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?/;
works fine.
If you want to instantiate the expression from a string, you have to double all the backslashes:
var regexp = new RegExp("^https?:\\/\\/(www\\.)?(test\\.yahoo\\.com|sub\\.yahoo\\.com)?(?!([a-z0-9]+\\.)?(localhost|yahoo\\.com))(.*)?");
When you start from a string, you have to account for the fact that the string constant itself uses backslashes as a quoting mechanism, so there will be two evaluations made: one as a string, and one as a regular expression.
edit — OK I think I see the problem. That plugin you're trying to use is simply attempting to do something that's just not going to work, given the way that Sizzle parses selectors. In other words, the problem is not with your regular expression, it's with the overall selector. It is not even getting far enough to parse the regular expression.
Specifically it seems to be nested parentheses inside the regular expression. Something as simple as
$('a:regex(href, ((abc)))')
causes an error. You can instead do something like this:
$('a').filter(function() {
return /^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?/.test(this.href);
}).whatever( ... );
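The same filtering idea can be tried without jQuery. This sketch runs the literal and the doubled-backslash string form against the test strings from the question; Array.prototype.filter stands in for the jQuery .filter call, and the variable names are illustrative.

```javascript
const fromLiteral = /^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?/;
const fromString = new RegExp("^https?:\\/\\/(www\\.)?(test\\.yahoo\\.com|sub\\.yahoo\\.com)?(?!([a-z0-9]+\\.)?(localhost|yahoo\\.com))(.*)?");

const hrefs = [
  "http://yahoo.com",
  "http://google.dk",
  "http://subdomain.yahoo.com",
  "http://test.yahoo.com",
  "http://localhost.dk",
  "http://sub.yahoo.com/lalala",
];

// Plain-array stand-in for $('a').filter(...): keep the hrefs the regex accepts.
const viaLiteral = hrefs.filter((href) => fromLiteral.test(href));
const viaString = hrefs.filter((href) => fromString.test(href));

console.log(viaLiteral);
// [ 'http://google.dk', 'http://test.yahoo.com', 'http://sub.yahoo.com/lalala' ]
console.log(JSON.stringify(viaLiteral) === JSON.stringify(viaString)); // true
```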

URL regex does not work in javascript

I am trying to use John Gruber's URL regex in Javascript but NetBeans keeps telling me there is a syntax error and illegal errors:
var patt = "/(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])
|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]
{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|
(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|
(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:
'".,<>?«»“”‘’]))/";
Anyone know how to solve this?
As others have said, it's the double quote. But alternatively, you can just write the regexp as a literal in javascript (but then you need to escape the forward slashes in lines 1 and 3 instead).
var regexp = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;
I also moved the case-insensitive modifier to the end. Just because. (edit: Well, not just "because" - see Alan Moore's comment below)
Note: Whether you use a literal or a string, it has to be on 1 line.
Put the whole expression on one line, and remove the quotes at the start and end so it looks like this: var patt = /the-long-pattern/;. NetBeans will still complain, but the browsers won't, and that's what matters.
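You should write it like this in NetBeans, as concatenated strings with every backslash doubled. The (?i) prefix is dropped here because JavaScript does not accept it; pass "i" as the second argument to new RegExp instead:
"\\b((?:[a-z][\\w-]+:(?:\\/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]"
+ "+[.][a-z]{2,4}\\/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))"
+ "+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))";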
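If keeping a long pattern on a single line is awkward, one option is to assemble the source from string pieces and compile it once, passing "i" as a flag since, as far as I know, JavaScript engines do not accept a bare (?i) prefix. The pattern below is a deliberately simplified stand-in, not Gruber's regex, and the sample sentence is made up.

```javascript
// Simplified stand-in pattern (NOT the full URL regex), split for readability.
// Every backslash is doubled because these are string literals.
const source = [
  "\\b(?:https?:\\/\\/|www\\d{0,3}[.])", // scheme or www prefix
  "[^\\s()<>]+",                          // run of non-space, non-bracket characters
].join("");

// The case-insensitive modifier goes in the flags argument, not in the pattern.
const urlPattern = new RegExp(source, "i");

const found = "Visit WWW.example.com today".match(urlPattern);
console.log(found[0]); // WWW.example.com
```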
