Regex Pattern Causes Catastrophic Backtracking In Edge Cases

I have these two simple regex patterns to match URLs from these stores, but they lead to catastrophic backtracking and a frozen browser when run on certain edge-case URL strings. This logic runs on thousands of random requests, so the chance of hitting catastrophic backtracking is high. Does anyone have an idea of what could be wrong with the way I wrote these regexes?
> ".*://.*.newegg.com/Product/Product.*"
> ".*://.*.gamestop.com*.*Product-Variation*.*productDetailsRedesign"

You have too many greedy dot patterns in the expressions. Try to be a bit more explicit:
\w+://[^/]*\.newegg\.com/Product/Product\S*
The second pattern:
\w+://[^\s/]*\.gamestop\.com\S*?Product-Variation\S*?productDetailsRedesign
See proof #1 | proof #2.
Use \S*? to match any characters other than whitespace (as few as possible).
Escape the period characters as they are regex metacharacters.
Use [^...] negated character classes if you know there can be no such characters between two substrings in a match.
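A minimal sketch of the two tightened patterns as JavaScript regex literals (the `/` characters outside the character classes are escaped because they would otherwise end the literal). The helper name `isStoreProductUrl` and the sample URLs are hypothetical, chosen only to show one match per store and one non-match; the sketch makes no timing claims of its own.

```javascript
// The answer's two patterns, written as regex literals.
const NEWEGG_RE = /\w+:\/\/[^/]*\.newegg\.com\/Product\/Product\S*/;
const GAMESTOP_RE =
  /\w+:\/\/[^\s/]*\.gamestop\.com\S*?Product-Variation\S*?productDetailsRedesign/;

// Hypothetical helper: true if either store pattern matches somewhere in the URL.
function isStoreProductUrl(url) {
  return NEWEGG_RE.test(url) || GAMESTOP_RE.test(url);
}

// Made-up sample URLs, not real product pages.
console.log(isStoreProductUrl("https://www.newegg.com/Product/Product.aspx?Item=123")); // true
console.log(isStoreProductUrl("https://www.gamestop.com/x/Product-Variation?v=productDetailsRedesign")); // true
console.log(isStoreProductUrl("https://www.example.com/Product/Product.aspx")); // false
```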

Related

Regex fails only on Safari

I have the following simple email validation regex: /(.+){2,}@(.+){2,}\.(.+){2,}/
This works fine on Firefox, Chrome etc., but fails on Safari.
Why would this perfectly valid regex fail on Safari? I could not find elements in the regex that are not supported by Safari.
/(.+){2,}@(.+){2,}\.(.+){2,}/.test('123@abc.nl');
The above fails on Safari, but not on any other browser.

Different regex engines have different tolerance to catastrophic backtracking prone patterns.
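Yours is a catastrophic-backtracking-prone pattern: you quantify (.+) with the {2,} quantifier, which makes (.+) match two or more times (that is, match one or more chars, twice or more). The same run of characters can be split among those repetitions in a huge number of ways, and the engine tries them all before giving up, so the pattern fails very slowly on non-matching strings.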
If you meant to match any two or more chars, quantify the . pattern and not a .+ one:
/.{2,}#.{2,}\..{2,}/
Or use existing email validation patterns.
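To see that the rewrite does not change which strings are accepted, the sketch below runs both patterns, written with `@` as the address separator, over a few short hand-picked inputs. The inputs are deliberately short so the nested version cannot take long; the sketch does not measure speed or reproduce the Safari failure.

```javascript
// Question's pattern: a quantified group that is itself quantified.
const nested = /(.+){2,}@(.+){2,}\.(.+){2,}/;
// Answer's rewrite: quantify the single "." instead.
const flat = /.{2,}@.{2,}\..{2,}/;

// Short inputs only, so the nested pattern stays fast here.
const inputs = ["123@abc.nl", "ab@cd.ef", "a@b.c", "no-at-sign", "ab@cd"];
for (const s of inputs) {
  console.log(s, nested.test(s), flat.test(s));
}
```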

Regex - Validate that the local part of the email is not ending with a dot while only allowing certain characters without using a lookbehind

I was using a lookbehind to check for a dot before the @, but just realized that not all browsers support lookbehinds. It works perfectly in Chrome but fails in Firefox and IE.
This is what I came up with, but it is certainly messy:
^([a-zA-Z0-9&^*%#~{}=+?`_-]\.?)*[a-zA-Z0-9&^*%#~{}=+?`_-]@([a-zA-Z0-9]+\.)+[a-zA-Z]$
Is there a simpler and/or more elegant way to do this? I don't think I can negate the dot (^.) because I'm only allowing certain characters to be present in the local part.

This ([a-zA-Z0-9&^*%#~{}=+?`_-]\.?)*[a-zA-Z0-9&^*%#~{}=+?`_-] part is not messy, but it is inefficient, because the * quantifies a group containing an obligatory part, [...], and an optional \.?. Instead of (ab?)*a, you may use a+(?:ba+)*, which makes matching linear and swift; in your case, [a-zA-Z0-9&^*%#~{}=+?`_-]+(?:\.[a-zA-Z0-9&^*%#~{}=+?`_-]+)*.
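The (ab?)*a to a+(?:ba+)* rewrite can be checked mechanically. This sketch anchors both patterns and uses a hypothetical `allStrings` helper to compare them on every string over {a, b} up to length 10; it only confirms that they accept the same strings, not how fast either one is.

```javascript
// Anchored so that the whole string has to match.
const slowForm = /^(?:ab?)*a$/;
const fastForm = /^a+(?:ba+)*$/;

// Hypothetical helper: every string of length 0..maxLen over the alphabet.
function allStrings(alphabet, maxLen) {
  const result = [""];
  let layer = [""];
  for (let len = 1; len <= maxLen; len++) {
    layer = layer.flatMap((prefix) => [...alphabet].map((ch) => prefix + ch));
    result.push(...layer);
  }
  return result;
}

// Strings where the two forms disagree.
const mismatches = allStrings("ab", 10).filter(
  (s) => slowForm.test(s) !== fastForm.test(s)
);
console.log(mismatches.length); // 0
```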
Moreover, [a-zA-Z0-9_] is equivalent to \w in JS regex, so you can use it to shorten the pattern.
Besides, the final [a-zA-Z]$ only matches a single letter; you most probably need [a-zA-Z]{2,}$ there, as TLDs consist of two or more letters.
So, you may use
^[\w&^*%#~{}=+?`-]+(?:\.[\w&^*%#~{}=+?`-]+)*@(?:[a-zA-Z0-9]+\.)+[a-zA-Z]{2,}$
See the regex demo.
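A sketch of the final pattern in use, assuming `@` as the separator between the local part and the domain; `isValidEmail` and the sample addresses are hypothetical. The dot cases show the property the question asks for without any lookbehind: the local part cannot start or end with a dot, and dots cannot be doubled.

```javascript
// The answer's final pattern as a regex literal.
const EMAIL_RE =
  /^[\w&^*%#~{}=+?`-]+(?:\.[\w&^*%#~{}=+?`-]+)*@(?:[a-zA-Z0-9]+\.)+[a-zA-Z]{2,}$/;

// Hypothetical helper name.
function isValidEmail(address) {
  return EMAIL_RE.test(address);
}

console.log(isValidEmail("john.doe@example.com"));  // true
console.log(isValidEmail("john.@example.com"));     // false: local part ends with a dot
console.log(isValidEmail(".john@example.com"));     // false: local part starts with a dot
console.log(isValidEmail("john..doe@example.com")); // false: two dots in a row
console.log(isValidEmail("john@example.c"));        // false: one-letter TLD
```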

A-Za-z vs. a-z/i case insensitive

What is considered best practice, and what is the fastest to compute?
[A-Za-z]
vs.
[a-z]/i
Assuming that you don't care about the case handling of the rest of the regex, I want to know which of exactly these two regexes is faster. Or do they yield the same outcome under the hood?

A quick test on jsperf.com shows that, when searching a sample string for a single letter from a-z, [a-zA-Z] is slightly faster.
Here are the tests that I performed.
Including /g test
Without /g test
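The tests above are about speed. Whether the two forms accept the same characters can be checked directly: the sketch below tests every UTF-16 code unit against both, without the `u` flag. With `u`, case-insensitive matching uses Unicode case folding instead, so this sketch makes no claim about that mode.

```javascript
// Anchors make each test look at exactly one character.
const explicitClass = /^[A-Za-z]$/;
const caselessClass = /^[a-z]$/i; // no "u" flag

// Count code units on which the two patterns give different results.
let differences = 0;
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (explicitClass.test(ch) !== caselessClass.test(ch)) differences++;
}
console.log(differences);
```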

Optimisation: (\\[abc] | [^abc])*

I have a long Regex (JavaScript), and it contains the following construct:
((\\\\)|(\\[abc])|([^abc]))*
The regex says:
Match any string that doesn't contain the letters a, b and c,
except where they're escaped by a backslash.
If the backslash itself is escaped (e.g. \\a), don't match these letters either.
Here's a simple match-example:
eeeaeaee\aee\\\\ae\\\\\aee
I wonder if it's possible to optimise this regular expression. This is only a small example; the actual regex I'm using is bigger, and a lot of it is duplicated code.

I think a more logical (and likely faster) regexp would be something like:
(?:[^abc\\]|\\.)*
In other words, a backslash will escape anything, including another backslash.
Note a few things: first, if you don't need to capture parts of the match, use non-capturing groups. That buys you a little performance. Second, when there are multiple alternatives, put the most common one first.
You might get even better performance this way (try it):
[^abc\\]*(?:\\.[^abc\\]*)*
Rather than going through the alternation for each and every character, that will "eat" runs of non-special characters with a single step. Nested * can be bad news, leading to quadratic (or worse) runtime in cases where the regex doesn't match, but in this case that won't happen.
When writing this answer, I discovered that JS's regex engine has no possessive quantifiers. That sucks, because you could get better worst-case performance if they were available. (An important tip for working towards regex mastery: when performance testing a regex, always test cases where it does match AND where it doesn't match. The worst-case performance generally occurs when it doesn't.)
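A quick way to check that the unrolled form accepts the same strings as the alternation is an exhaustive comparison on short inputs. This sketch anchors both patterns (the versions above are unanchored fragments) and tries every string up to length 8 built from the characters e, a and a backslash; `allStrings` is a hypothetical helper, and the check says nothing about speed.

```javascript
// Anchored versions of the two patterns from the answer.
const alternation = /^(?:[^abc\\]|\\.)*$/;
const unrolled = /^[^abc\\]*(?:\\.[^abc\\]*)*$/;

// Hypothetical helper: every string of length 0..maxLen over the alphabet.
function allStrings(alphabet, maxLen) {
  const result = [""];
  let layer = [""];
  for (let len = 1; len <= maxLen; len++) {
    layer = layer.flatMap((prefix) => [...alphabet].map((ch) => prefix + ch));
    result.push(...layer);
  }
  return result;
}

// Alphabet: a plain letter, a special letter, and a backslash.
const disagreements = allStrings("ea\\", 8).filter(
  (s) => alternation.test(s) !== unrolled.test(s)
);
console.log(disagreements.length); // 0
```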

You can match any character after a backslash or any character that is not in [abc]:
(\\.|[^abc])*
That will match the exact same language.
I think your intention is actually clearer if you flip it around, like:
([^abc]|\\.)*
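The "same language" claim can be spot-checked the same way. The sketch anchors the question's construct and the answer's (\\.|[^abc])*, then compares them on every string up to length 8 built from e, a and a backslash. It is only a check over that small alphabet, not a proof; `allStrings` is a hypothetical helper.

```javascript
// The question's construct and the answer's version, both anchored.
const questionForm = /^((\\\\)|(\\[abc])|([^abc]))*$/;
const answerForm = /^(\\.|[^abc])*$/;

// Hypothetical helper: every string of length 0..maxLen over the alphabet.
function allStrings(alphabet, maxLen) {
  const result = [""];
  let layer = [""];
  for (let len = 1; len <= maxLen; len++) {
    layer = layer.flatMap((prefix) => [...alphabet].map((ch) => prefix + ch));
    result.push(...layer);
  }
  return result;
}

const differing = allStrings("ea\\", 8).filter(
  (s) => questionForm.test(s) !== answerForm.test(s)
);
console.log(differing.length); // 0
```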

Is there a way to make JSLint happy with this regex?

When running my JavaScript through JSLint, I get the following two errors from the same line of code.
Problem at line 398 character 29: Insecure '.'.
if (password.match(/.[!,@,#,$,%,^,&,*,?,_,~,-,(,)]/))
Problem at line 398 character 41: Unescaped '^'.
if (password.match(/.[!,@,#,$,%,^,&,*,?,_,~,-,(,)]/))
I understand that JSLint may be being "over-cautious". I read the comments on a similar question, Purpose of JSLint "disallow insecure in regex" option.
Nonetheless, I would like to have the best of all worlds, and have a working regular expression that also doesn't cause JSLint to complain.
But I fail at regex.
Is it possible to make regular expression that looks for the presence of at least one special character, yet doesn't cause JSLint to complain?

That's a character class; you don't need a separator (e.g. the commas). You can clean up the regex by placing the caret (^) and the dash (-) in strategic positions so they don't need to be escaped.
/[!@#$%^&*?_~()-]/
This should work. You can also use the non-word character class:
/\W/
That matches anything that's not a letter (a-zA-Z), number (0-9) or underscore (_).
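A small sketch comparing the two suggestions; the helper names are hypothetical. It also shows where they differ: \W matches spaces and any other non-word character, while the explicit class also matches the underscore, which \W does not, since _ is a word character.

```javascript
// The answer's character class (caret not first, dash last) and \W.
const SPECIAL_RE = /[!@#$%^&*?_~()-]/;
const NON_WORD_RE = /\W/;

// Hypothetical helper names.
const hasSpecialChar = (pw) => SPECIAL_RE.test(pw);
const hasNonWordChar = (pw) => NON_WORD_RE.test(pw);

console.log(hasSpecialChar("password1"), hasNonWordChar("password1"));   // false false
console.log(hasSpecialChar("pass@word1"), hasNonWordChar("pass@word1")); // true true
// \W is broader: a space is a non-word character but is not in the class.
console.log(hasSpecialChar("pass word"), hasNonWordChar("pass word"));   // false true
// "_" is in the class, but it is a word character, so \W does not match it.
console.log(hasSpecialChar("pass_word"), hasNonWordChar("pass_word"));   // true false
```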
