What maximum line length should I allow to avoid catastrophic backtracking?

The line is approximately 7,915,621 characters long and is the view state value of an ASPX website.
I fetch the original HTML of the site and pass it line by line to the extract function. As soon as it reaches the view_state line containing that long string, the regex gets stuck.
Here is the regex pattern that gets stuck:
/[\w\.]+\#[\w]+(?:\.[\w]{3}|\.[\w]{2}\.[\w]{2})\b/gi
I thought about setting a maximum line length so that this line, and any others like it, gets skipped, but I can't think of an optimal size because I care about false positives.

[\w\.]+ can start matching at almost every position in your document. Each attempt scans ahead and then backtracks when no # follows, so processing a very long line becomes a problem.
Reducing the number of positions where a match attempt can start is a possible solution, e.g. by using a word boundary.
(?:\.\w{3}|\.\w{2}\.\w{2}) can be streamlined as \.\w{2}(?:\w|\.\w{2}).
Use
/\b[\w.]+#\w+\.\w{2}(?:\w|\.\w{2})\b/gi
Or, drop the character class:
/\b\w+(?:\.\w+)*#\w+\.\w{2}(?:\w|\.\w{2})\b/gi
EXPLANATION
\b          the boundary between a word char (\w) and something that is not a word char
\w+         word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
(?:         group, but do not capture (0 or more times (matching the most amount possible)):
  \.        '.'
  \w+       word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
)*          end of grouping
#           '#'
\w+         word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
\.          '.'
\w{2}       word characters (a-z, A-Z, 0-9, _) (2 times)
(?:         group, but do not capture:
  \w        word characters (a-z, A-Z, 0-9, _)
 |          OR
  \.        '.'
  \w{2}     word characters (a-z, A-Z, 0-9, _) (2 times)
)           end of grouping
\b          the boundary between a word char (\w) and something that is not a word char
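
As a rough check of the word-boundary version, the sketch below runs it against a long run of word characters followed by two made-up references. The 100,000-character filler and the sample strings docs.example#section.com and a#b.co.uk are assumptions for illustration, not data from the question.

```javascript
// Boundary-anchored pattern from the answer above.
const pattern = /\b\w+(?:\.\w+)*#\w+\.\w{2}(?:\w|\.\w{2})\b/gi;

// Hypothetical stand-in for the view state line: a long run with no '#'.
const filler = "A".repeat(100000);
const line = `${filler} docs.example#section.com and a#b.co.uk`;

// Only the first position of the filler run passes \b, so the run is
// scanned once instead of once per character.
const matches = line.match(pattern) ?? [];
console.log(matches); // [ 'docs.example#section.com', 'a#b.co.uk' ]
```

If a length cap is still wanted as a safety net, it can be applied before the regex runs, but with the boundary in place the long line should no longer be the bottleneck.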

Related

Matching an entire sentence containing words even if the sentence spans multiple lines

I'm attempting to match every sentence of a document that contains certain words, even if the sentence spans multiple lines.
My current attempt only captures the sentence if it does not continue onto the next line.
^.*\b(dog|cat|bird)\b.*\.
Using ECMAScript.

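When no abbreviations are expected in the input, use the pattern below. The negated class [^?!.] also matches line breaks, so a sentence can span several lines.
/\b[^?!.]*?\b(dog|cat|bird)\b[^?!.]*[.?!]/gi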
EXPLANATION
\b          the boundary between a word char (\w) and something that is not a word char
[^?!.]*?    any character except: '?', '!', '.' (0 or more times (matching the least amount possible))
\b          the boundary between a word char (\w) and something that is not a word char
(           group and capture to \1:
  dog       'dog'
 |          OR
  cat       'cat'
 |          OR
  bird      'bird'
)           end of \1
\b          the boundary between a word char (\w) and something that is not a word char
[^?!.]*     any character except: '?', '!', '.' (0 or more times (matching the most amount possible))
[.?!]       any character of: '.', '?', '!'
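
A small sketch of the pattern applied to text where the matching sentence wraps onto a second line. The sample text is made up for illustration.

```javascript
// Sentence pattern from the answer above.
const sentencePattern = /\b[^?!.]*?\b(dog|cat|bird)\b[^?!.]*[.?!]/gi;

// Hypothetical input: the second sentence contains a line break.
const text = "It rained all day. The quick brown\ndog jumped over the fence. Nothing else happened!";

// [^?!.] matches "\n" too, so the wrapped sentence is returned whole.
const sentences = text.match(sentencePattern) ?? [];
console.log(sentences); // [ 'The quick brown\ndog jumped over the fence.' ]
```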

JavaScript regex - check if string contains two or more of the same letter

I'm working on some regex with JavaScript. I have created a regex that checks whether a string has two or more of the same letter directly following each other. I want a regex that checks whether a word / string contains two or more of one particular letter, whether they are adjacent or just somewhere in the same word / string.
It would need to match drama and anaconda, but not lame, kiwi or tree.
This is my current regex in JS:
const str = "anaconda";
str.match(/[a]{2,}/);

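Use
\w*(\w)\w*\1\w*
Note that this matches when any word character occurs twice, so kiwi and tree match as well. To test for one particular letter such as a, /a\w*a/ is enough.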
EXPLANATION
NODE        EXPLANATION
\w*         word characters (a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
(           group and capture to \1:
  \w        word characters (a-z, A-Z, 0-9, _)
)           end of \1
\w*         word characters (a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
\1          what was matched by capture \1
\w*         word characters (a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
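
To make the difference concrete, the sketch below runs the question's five words through the backreference pattern and through a pattern fixed to the letter a. The name twoAs and the choice of /a\w*a/ are illustrative assumptions, not part of the original answer.

```javascript
// Backreference pattern from the answer: any word character occurring twice.
const anyRepeat = /\w*(\w)\w*\1\w*/;

// Assumed variant for one particular letter: two a's with word chars between.
const twoAs = /a\w*a/;

const words = ["drama", "anaconda", "lame", "kiwi", "tree"];
const table = words.map((w) => ({
  word: w,
  anyRepeat: anyRepeat.test(w),
  twoAs: twoAs.test(w),
}));
console.log(table);
// anyRepeat is true for drama, anaconda, kiwi and tree;
// twoAs is true only for drama and anaconda.
```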

My thought process was something like this:
A word can start with any letter
A word can end with any letter
The repeated letters can have zero or more letters between them
If any of its letters repeat and it fits the criteria above, accept the word
regex = /[a-z]*([a-z])[a-z]*\1+[a-z]*/

I want to limit the number of subdomain levels in a regular expression

I want to limit subdomains to 3 levels only. The regex below fails:
([\.]?[a-z]*){3}
My target: abc.def.ghi
but the regex above also accepts abc.def.ghi. (notice the trailing .)

Use
^(?:[a-z]+(?:\.[a-z]+){0,2})?$
Explanation
^             the beginning of the string
(?:           group, but do not capture (optional (matching the most amount possible)):
  [a-z]+      any character of: 'a' to 'z' (1 or more times (matching the most amount possible))
  (?:         group, but do not capture (between 0 and 2 times (matching the most amount possible)):
    \.        '.'
    [a-z]+    any character of: 'a' to 'z' (1 or more times (matching the most amount possible))
  ){0,2}      end of grouping
)?            end of grouping
$             the end of the string (in JavaScript, without the m flag)
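
A quick sketch of how the anchored pattern behaves on a few inputs. The test strings other than abc.def.ghi are made up. Note that the empty string is accepted because the outer group is optional.

```javascript
// Anchored pattern from the answer: at most three dot-separated labels.
const maxThreeLabels = /^(?:[a-z]+(?:\.[a-z]+){0,2})?$/;

const inputs = ["abc", "abc.def.ghi", "abc.def.ghi.", "a.b.c.d", ""];
const verdicts = inputs.map((s) => maxThreeLabels.test(s));
console.log(verdicts); // [ true, true, false, false, true ]
```

Dropping the trailing ? on the outer group would reject the empty string, if that is preferred.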

Want to remove / ( -) . from phone number strings

I want to remove symbols from phone numbers. Sometimes they are in the format 151-454-6545, sometimes (545)-(564)-(5465), and sometimes 548.445.8454. I am using
val.replace(/(\d{3})(\d{3})(\d{4})/, '($1) -$2-$3')
for the replacement, but it doesn't remove the dots. How do I remove the dots as well? The expected output looks like 545-455-4545.

I suggest using a non-digit expression to strip the symbols at both ends and replace the remaining runs with a '-' string:
val.replace(/^\D+/, '')
   .replace(/\D+$/, '')
   .replace(/\D+/g, '-')
Let me know if it does what you need.
EDIT: also trims leading and trailing whitespace
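
A minimal sketch of the chained replacements wrapped in a function. The name normalizePhone is hypothetical, and the last input with surrounding spaces is an added test case.

```javascript
// Hypothetical wrapper around the three chained replacements above.
function normalizePhone(val) {
  return val
    .replace(/^\D+/, "")    // drop leading non-digits, e.g. "(" or spaces
    .replace(/\D+$/, "")    // drop trailing non-digits, e.g. ")"
    .replace(/\D+/g, "-");  // collapse each inner run of symbols to "-"
}

const samples = ["151-454-6545", "(545)-(564)-(5465)", "548.445.8454", " 548.445.8454 "];
const normalized = samples.map(normalizePhone);
console.log(normalized);
// [ '151-454-6545', '545-564-5465', '548-445-8454', '548-445-8454' ]
```

This approach relies on the separators already being present; a number written as ten bare digits comes back unchanged.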

Here is a version with only one regex:
https://regex101.com/r/Wavw45/1
Regex:
[^\d\n]*(\d{3})[^\d\n]+(\d{3,4})[^\d\n]+(\d{4})[^\d\n]*
Replacement (or whatever pattern you want):
($1) -$2-$3

Use
.replace(/^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$/, '$1-$2-$3')
Explanation
^           the beginning of the string
\D*         non-digits (all but 0-9) (0 or more times (matching the most amount possible))
(           group and capture to \1:
  \d{3}     digits (0-9) (3 times)
)           end of \1
\D*         non-digits (all but 0-9) (0 or more times (matching the most amount possible))
(           group and capture to \2:
  \d{3}     digits (0-9) (3 times)
)           end of \2
\D*         non-digits (all but 0-9) (0 or more times (matching the most amount possible))
(           group and capture to \3:
  \d{4}     digits (0-9) (4 times)
)           end of \3
\D*         non-digits (all but 0-9) (0 or more times (matching the most amount possible))
$           the end of the string (in JavaScript, without the m flag)
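
A short sketch of the single anchored replacement on the formats from the question, plus a bare ten-digit number and a too-short input as added test cases. The function name toDashed is an assumption.

```javascript
// Hypothetical wrapper around the single anchored replace from the answer.
const toDashed = (s) =>
  s.replace(/^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$/, "$1-$2-$3");

const phones = ["151-454-6545", "(545)-(564)-(5465)", "548.445.8454", "1514546545", "12345"];
const dashed = phones.map(toDashed);
console.log(dashed);
// [ '151-454-6545', '545-564-5465', '548-445-8454', '151-454-6545', '12345' ]
```

Because the pattern is anchored at both ends, input that does not contain exactly a 3-3-4 digit layout is returned unchanged, which makes malformed numbers easy to spot afterwards.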

JavaScript Regex Input Validation to Prevent Duplicate Characters

I am attempting to validate text input with the following requirements:
allowed characters & length /^\w{8,15}$/
must contain /[a-z]+/
must contain /[A-Z]+/
must contain /[0-9]+/
must not contain consecutively repeated characters (i.e. aba passes and aab fails)
Each test would return true when used with .test().
With modest familiarity, I am able to write the first 4 tests, albeit individually. The 5th test is not working out; a negative lookahead (which I believe is what I need) is challenging.
Here are a few value/result examples:
re.test("Fail1"); // returns false, too short
re.test("StringFailsRule1"); // returns false, too long
re.test("Fail!"); // returns false, invalid !
re.test("FAILRULE2"); // returns false, missing [a-z]+
re.test("failrule3"); // returns false, missing [A-Z]+
re.test("failRuleFour"); // returns false, missing [0-9]+
re.test("failRule55"); // returns false, repeat of "5"
re.test("TestValue1"); // returns true
Finally, the ideal would be a single combined test used to enforce all requirements.

This uses negative and positive lookaheads (zero-length assertions) for your tests, and the \w{8,15} part validates the allowed characters and the length.
^(?!.*(.)\1)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])\w{8,15}$
For your fifth rule I used a negative lookahead to make sure that no character is immediately followed by itself.
NODE        EXPLANATION
^           the beginning of the string
(?!         look ahead to see if there is not:
  .*        any character except line terminators (0 or more times (matching the most amount possible))
  (         group and capture to \1:
    .       any character except line terminators
  )         end of \1
  \1        what was matched by capture \1
)           end of look-ahead
(?=         look ahead to see if there is:
  .*        any character except line terminators (0 or more times (matching the most amount possible))
  [a-z]     any character of: 'a' to 'z'
)           end of look-ahead
(?=         look ahead to see if there is:
  .*        any character except line terminators (0 or more times (matching the most amount possible))
  [A-Z]     any character of: 'A' to 'Z'
)           end of look-ahead
(?=         look ahead to see if there is:
  .*        any character except line terminators (0 or more times (matching the most amount possible))
  [0-9]     any character of: '0' to '9'
)           end of look-ahead
\w{8,15}    word characters (a-z, A-Z, 0-9, _) (between 8 and 15 times (matching the most amount possible))
$           the end of the string (in JavaScript, without the m flag)
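
The combined pattern can be checked against the value/result examples from the question. This is a sketch only; the variable names cases and outcomes are assumptions.

```javascript
// Combined pattern from the answer above.
const re = /^(?!.*(.)\1)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])\w{8,15}$/;

// Inputs taken from the question, in the same order.
const cases = [
  "Fail1",            // too short
  "StringFailsRule1", // too long (16 characters)
  "Fail!",            // invalid character, also too short
  "FAILRULE2",        // no lowercase letter
  "failrule3",        // no uppercase letter
  "failRuleFour",     // no digit
  "failRule55",       // "55" repeats
  "TestValue1",       // satisfies every rule
];
const outcomes = cases.map((s) => re.test(s));
console.log(outcomes);
// [ false, false, false, false, false, false, false, true ]
```

The regex has no g flag, so repeated .test() calls do not carry lastIndex state between inputs.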
