how to understand this regex pattern - javascript

JavaScript regex pattern I found in Less:
/^([#.](?:[\w-]|\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9]))+)\s*\(/
especially this section:
\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9])

([#.](?:[\w-]|\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9]))+)\s*\(

Let's work it from the inside out, using MDN as reference when necessary:
(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9])
(?:) is a non-capturing group. It groups and matches, but doesn't save the result. Inside that group is either 1-6 hex digits followed by an optional space, or any single character other than a hex digit.
(?:[\w-]|\\ above)+
Again, a non-capturing parenthesis, this time of \w, which is any alphanumeric character + _, and since there's [\w-], that's "any alphanum + -_". Then there's an or, a \ character, and the above. Together, that makes this parenthesis group read as: "Any single alphanumeric character, underscore or hyphen, or a backslash followed by either anything not a hexdigit or a hexstring of 1 to 6 characters." The + means "at least 1 instance of the group."
^([#.]above)\s*\(
Now we have ^[#.], which means "the line must start with # or .", followed by the above, then any amount of whitespace (\s*), then a literal left parenthesis (\().
TL;DR:
When you add that all up, you get:
"A line that starts with either # or . followed by one or more of:
alphanumeric characters, _ or - OR
a backslash followed by a one to six digit hexstring followed by a single optional space OR
a backslash followed by a single nonhexdigit character
followed by any number of whitespace and then a (".
If a match is found, the entire part before the whitespace and ( is stored in the result of the search.
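To see the breakdown above in action, here is a minimal sketch. Only the regex comes from the question; the helper name mixinName and the sample inputs are made up for illustration.

```javascript
// The regex from the question; capture group 1 holds everything before the whitespace and "(".
const mixinCall = /^([#.](?:[\w-]|\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9]))+)\s*\(/;

// Hypothetical helper: returns the captured name, or null when there is no match.
function mixinName(input) {
  const match = mixinCall.exec(input);
  return match ? match[1] : null;
}

console.log(mixinName(".my-mixin (10px)")); // .my-mixin
console.log(mixinName("#ns\\3A name("));    // #ns\3A name  (hex escape plus its optional space)
console.log(mixinName(".a\\.b("));          // .a\.b        (backslash plus a non-hex character)
console.log(mixinName("div("));             // null         (does not start with # or .)
```

Note that the sample strings double the backslash only because they are JavaScript string literals; the regex sees a single backslash.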

Related

Using lookahead, how to ensure at least 4 alphanumeric chars are included + underscores

I'm trying to make sure that at least 4 alphanumeric characters are included in the input, and that underscores are also allowed.
The regular-expressions tutorial is a bit over my head because it talks about assertions and success/failure if there is a match.
^\w*(?=[a-zA-Z0-9]{4})$
my understanding:
\w --> alphanumeric + underscore
* --> matches the previous token between zero and unlimited times ( so, this means it can be any character that is alphanumeric/underscore, correct?)
(?=[a-zA-Z0-9]{4}) --> looks ahead of the previous characters, and if they include at least 4 alphanumeric characters, then I'm good.
Obviously I'm wrong on this, because regex101 is showing me no matches.
You want 4 or more alphanumeric characters, surrounded by any number of underscores (use ^ and $ to ensure it matches the whole input):
^(_*[a-zA-Z0-9]_*){4,}$
Your pattern ^\w*(?=[a-zA-Z0-9]{4})$ does not match because:
^\w* Matches optional word characters from the start of the string, and if there are only word chars it will match until the end of the string
(?=[a-zA-Z0-9]{4}) The positive lookahead is true if it can assert 4 consecutive alphanumeric chars to the right of the current position. The \w* allows backtracking, and can backtrack 4 positions so that the assertion is true.
But the $ asserts the end of the string, which it can not match as the position moved 4 steps to the left to fulfill the previous positive lookahead assertion.
Using the lookahead, what you can do is assert 4 alphanumeric chars preceded by optional underscores.
If the assertion is true, match 1 or more word characters.
^(?=(?:_*[a-zA-Z0-9]){4})\w+$
The pattern matches:
^ Start of string
(?= Positive lookahead, assert what is to the right is
(?:_*[a-zA-Z0-9]){4} Repeat 4 times matching optional _ followed by an alphanumeric char
) Close the lookahead
\w+ Match 1+ word characters (which includes the _)
$ End of string
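A quick way to check both patterns is to run them side by side in JavaScript. This is only a sketch; the sample inputs are made up for illustration.

```javascript
const original = /^\w*(?=[a-zA-Z0-9]{4})$/;    // pattern from the question
const fixed = /^(?=(?:_*[a-zA-Z0-9]){4})\w+$/; // lookahead moved to the start of the string

const samples = ["abcd", "a_b_c_d", "__ab__", "abc_", "ab-cd"];
for (const s of samples) {
  // Prints the input, then the result of the original and the fixed pattern.
  console.log(s, original.test(s), fixed.test(s));
}
// abcd    false true
// a_b_c_d false true
// __ab__  false false  (only 2 alphanumeric chars)
// abc_    false false  (only 3 alphanumeric chars)
// ab-cd   false false  (- is not a word character)
```

The original pattern returns false for every input, because the lookahead and the $ anchor can never both succeed at the same position.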
I suggest using atomic groups (?>...); see a regex tutorial for details.
^(?>_*[a-zA-Z0-9]_*){4,}$
to ensure there are 4 or more fragments, each containing a letter or digit.
Edit: If your regex flavor doesn't support atomic groups (JavaScript, for example, does not), use plain non-capturing groups instead:
^(?:_*[A-Za-z0-9]_*){4,}$

regex for allowing alphanumeric, special characters and not ending with # or _ or

I am new to regex. I created the regex below, which allows alphanumerics and the 3 special characters #._ but the string should not end with # or . or _
^[a-zA-Z0-9._#]*[^_][^.][^#]$
it validates abc# but fails for abc.
Your pattern requires at least 3 characters, because the last 3 positions are negated character classes, each matching any single char other than the one listed.
The pattern ^[a-zA-Z0-9._#]*[^_][^.][^#]$ will therefore also match 3 newlines, and merging all the chars into a single negated character class, ^[a-zA-Z0-9._#]*[^#._]$, will still match a single newline.
If you want to allow all 3 "special" characters and match at least 3 characters in total you can repeat the character class 2 or more times using {2,} and match a single char at the end without the special characters.
^[a-zA-Z0-9._#]{2,}[a-zA-Z0-9]$
Matching at least a single char (and not ending with . _ #):
^[a-zA-Z0-9._#]*[a-zA-Z0-9]$
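The difference between a negated class and an explicit allowed class at the end can be checked with a short sketch. The variable names and sample inputs are made up for illustration; the three patterns are the ones discussed in this thread.

```javascript
const fromQuestion = /^[a-zA-Z0-9._#]*[^_][^.][^#]$/; // pattern from the question
const negatedEnd = /^[a-zA-Z0-9._#]*[^#._]$/;         // single negated class at the end
const allowedEnd = /^[a-zA-Z0-9._#]*[a-zA-Z0-9]$/;    // explicit allowed class at the end

console.log(fromQuestion.test("abc."));   // true  (the trailing . is only checked against [^#])
console.log(fromQuestion.test("\n\n\n")); // true  (negated classes accept newlines)
console.log(negatedEnd.test("abc!"));     // true  (! is simply "not # . _")
console.log(negatedEnd.test("\n"));       // true
console.log(allowedEnd.test("a.b#c"));    // true
console.log(allowedEnd.test("abc."));     // false
console.log(allowedEnd.test("abc!"));     // false
console.log(allowedEnd.test("\n"));       // false
```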
If you include all the characters in one character set, that'll work.
^[a-zA-Z0-9._#]*[^#._]$
Try it out with different text examples on http://regexr.com.
The leading ^ is the start of the string
The trailing $ is the end of the string
. matches any character, and {2,} means 2 or more of them
[^#_] means one character that is not # or _
^.{2,}[^#_]$

Regex to match string from the back

Let's say we have a string "text\t1\nText that has to be extracted". What regex can be used to check the string from the back, that is, from the last " back to the \n, because the start of the string can change? In this case, I need to get only Text that has to be extracted. What generic regex can we use here?
I used (?<=\\n1\\n)(.*)(?=") but this will not work if the pattern before the last \n changes to \n2 or \ntext.
Any help is appreciated.
You may use this regex:
/(?<=\\n)[^"\\]+(?="$)/
RegEx Details:
(?<=\\n): Lookbehind to make sure we have a \n before the current position
[^"\\]+: Match 1+ of any character that is not " and not \
(?="$): Lookahead to make sure a " follows, immediately before the end of the line
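As a rough sketch of how this behaves in JavaScript (lookbehind needs a reasonably recent engine), assuming the input contains the literal two characters \ and n rather than a real newline. The helper name lastSegment and the sample strings are made up for illustration.

```javascript
// Text after a literal backslash-n, up to the closing quote at the end of the input.
const tail = /(?<=\\n)[^"\\]+(?="$)/;

// Hypothetical helper: returns the last segment, or null when nothing matches.
function lastSegment(raw) {
  const match = raw.match(tail);
  return match ? match[0] : null;
}

// "\\" in these string literals is a single literal backslash in the data.
console.log(lastSegment('"text\\t1\\nText that has to be extracted"')); // Text that has to be extracted
console.log(lastSegment('"text\\n2\\nOther text"'));                    // Other text
console.log(lastSegment('"no separator here"'));                        // null
```

Because [^"\\]+ cannot cross another backslash, only the segment after the last \n can reach the closing quote, whatever comes earlier in the string.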
Can't you just split and take the last element?
var item = "text\n1\nText that has to be extracted";
var last = item.split(/\n/g).reverse()[0];
console.log(last) // "Text that has to be extracted"
/^(\d+)\n([^\n"]+)"$/ may have some edge cases, but will find the number (one or more digits), followed by a newline, followed by one or more characters that are neither a newline nor a double quote, followed by a literal double quote.
This would require that the double quote occurs immediately before the end-of-line (EOL), but if that's not required (for example, if you have a semi-colon after the closing quote), remove $ from the end.
Edit
Just noticed that it's the literal text \n and not a newline character.
/(?<=\\n)(\d+)\\n((?:[^\\"]+|\\.)*)"/
Breakdown:
(?<=\\n) looks for a \ followed by the letter n.
(\d+) captures the 1-or-more digits.
\\n matches a literal \ followed by the letter n.
(...*) matches some text that repeats 0 or more times.
(?:...|...) matches either a run of characters that are neither a literal \ nor a double quote, OR a literal \ followed by any character, so you can have \n or \" etc. The entire group is matched repeatedly.
" at the end ensures that you're inside (well, we hope) a double-quoted string on the same line.

Regular expression to match line separated size strings

I am writing a regular expression to validate an input string, which is a line-separated list of sizes ([width]x[height]).
Valid input example:
300x200
50x80
100x100
The regular expression I initially came up with is (https://regex101.com/r/H9JDjA/1):
^(\d+x\d+[\r\n|\r|\n]*)+$
This regular expression matches my input but also matches this invalid input (size can't be 100x100x200):
300x200
50x80
100x100x200
Adding a word boundary at the end seems to have fixed this issue:
^(\d+x\d+[\r\n|\r|\n]*\b)+$
My questions:
Why does the initial regular expression without the word boundary fail? It looks like I am matching one or more instances of a \d+(number), followed by character 'x', followed by a \d+(number), followed by one or more new lines from various operating systems.
How do I validate input that has multiple trailing newline characters? The following doesn't work for inputs like this:
500x500\n100x100\n\n\n384384
^(\d+x\d+[\r\n|\r|\n]\b)+|[\r\n|\r|\n]$
Isolate the problem with this target 100x100x200
For now, forget about the anchors in the regex.
The minimum regex is \d+x\d+ since it only has to be satisfied once
for a match to take place.
The maximum is something like this \d+x\d+ (?: (?:\r?\n | \r)* \d+x\d+ )*
Since \r?\n|\r is optional, it can be reduced to this \d+x\d+ (?: \d+x\d+ )*
The result, when applied to the target string, is:
100x100x200 matches.
But, since you've anchored the regex ^$, it is forced to break up
the middle 100 to make it match.
100x10 from \d+x\d+
0x200 from (?: \d+x\d+ )*
So, that is why the first regex seemingly matches 100x100x200.
To avoid all of that, just require a line break between them, and
make the trailing linebreaks optional (if you need to validate the whole
string, otherwise leave it and the end anchor off).
^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$
A better view of it
^
\d+ x \d+
(?:
(?: \r? \n | \r )+
\d+ x \d+
)*
(?: \r? \n | \r )*
$
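A small sketch comparing the initial pattern with the stricter one in JavaScript. The helper name isValidSizeList and the sample inputs are made up for illustration.

```javascript
const initial = /^(\d+x\d+[\r\n|\r|\n]*)+$/;                      // pattern from the question
const strict = /^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$/; // requires a line break between sizes

// Hypothetical helper: true when every line is a size and only line breaks separate them.
function isValidSizeList(text) {
  return strict.test(text);
}

console.log(initial.test("300x200\n50x80\n100x100x200"));        // true  (100x10 + 0x200 after backtracking)
console.log(isValidSizeList("300x200\n50x80\n100x100"));         // true
console.log(isValidSizeList("300x200\n50x80\n100x100x200"));     // false
console.log(isValidSizeList("500x500\r\n100x100\n\n\n"));        // true  (trailing line breaks allowed)
console.log(isValidSizeList("500x500\n100x100\n\n\n384384"));    // false
```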
Your initial regular expression "fails" because of the +:
^(\d+x\d+[\r\n|\r|\n]*)+$
-----------------------^ here
Your parenthesized pattern (\d+x\d+[\r\n|\r|\n]*) says: match one or more digits, followed by an "x", followed by one or more digits, followed by zero or more newlines. The + after that says to match one or more repetitions of the entire parenthesized pattern, so for an input like 100x200x300 the engine can backtrack and split the middle number, matching 100x20 and then 0x300, which makes the whole line match.
If you're simply trying to extract dimensions from a newline-separated string, I would use the following regular expression with a multiline flag:
^(\d+x\d+)$
https://regex101.com/r/H9JDjA/2
Side note: In your expression, [\r\n|\r|\n] is actually saying match any one instance of \r, \n, |, \r, |, or \n (i.e. it's quite redundant, and you probably aren't meaning to match |). If you want to match a sequential set of any combination of \r or \n, you can simply use [\r\n]+.
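Both points can be checked with a short sketch. The names sloppyBreak and extractSizes and the sample input are made up for illustration; String.prototype.matchAll needs a reasonably recent JavaScript engine.

```javascript
// Inside [...] the | is a literal character, not alternation.
const sloppyBreak = /^[\r\n|\r|\n]$/;
console.log(sloppyBreak.test("|")); // true

// With the m flag, ^ and $ match at the start and end of every line.
const sizeLine = /^(\d+x\d+)$/gm;

// Hypothetical helper: collects only the lines that consist of a single size.
function extractSizes(text) {
  return Array.from(text.matchAll(sizeLine), (match) => match[1]);
}

console.log(extractSizes("300x200\n50x80\n100x100x200")); // [ '300x200', '50x80' ]
```

This extracts the well-formed lines and skips the rest; it does not by itself tell you that the input contained an invalid line.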
You can use multiline modifier, which should make life easier:
var input = "\n\
300x200x400\n\
50x80\n\
\n\
\n\
300x200\n\
50x80\n\
100x100x200x100\n";
var allSizes = input.match(/^\d+x\d+/gm) || []; // the m flag lets ^ match at the start of every line; match() returns null when nothing matches
for (var size of allSizes) // for...of iterates over the matched strings directly
  console.log(size);
Prints:
300x200
50x80
300x200
50x80
100x100
Try this regex out
^[0-9]{1,4}x[0-9]{1,4}|[(\r\n|\r|\n)]+$
It'll match these inputs.
1x1
10x10
100x100
2000x2938
\n
\r
\r\n
but not the whole of 100x100x200: only the leading 100x100 is matched, because the first alternative has no end anchor.

Issues in password regular expression

Hi all, I am writing a password regular expression for use with the JavaScript test() method. It should enforce the rules listed below.
My current solution:
/^(?=.*\d)^(?=.*[!#$%'*+\-/=?^_{}|~])(?=.*[A-Z])(?=.*[a-z])\S{8,15}$/gm
May contain any character except a space
At least 8 characters long but not more than 15 characters
At least one uppercase and one lowercase letter
At least one numeric and one special character
But I am not able to implement the rule below for the dot (period, full stop):
A dot (period, full stop) is allowed, provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
Can anyone help me sort out this problem? Thanks in advance.
You may move the \S{8,15} part with the $ anchor to the positive lookahead and place it as the first condition (to fail the whole string if it has spaces, or the length is less than 8 or more than 15) and replace that pattern with [^.]+(?:\.[^.]+)* consuming subpattern.
/^(?=\S{8,15}$)(?=.*\d)(?=.*[!#$%'*+\/=?^_{}|~-])(?=.*[A-Z])(?=.*[a-z])[^.]+(?:\.[^.]+)*$/
Details:
^ - start of string
(?=\S{8,15}$) - the first condition that requires the string to have no whitespaces and be of 8 to 15 chars in length
(?=.*\d) - there must be a digit after any 0+ chars
(?=.*[!#$%'*+\/=?^_{}|~-]) - there must be one symbol from the defined set after any 0+ chars
(?=.*[A-Z]) - an uppercase ASCII letter is required
(?=.*[a-z]) - a lowercase ASCII letter is required
[^.]+(?:\.[^.]+)* - 1+ chars other than ., followed with 0 or more sequences of a . followed with 1 or more chars other than a dot (note that we do not have to add \s into these 2 negated character classes as the first lookahead already prevalidated the whole string, together with its length)
$ - end of string.
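A minimal sketch of using this pattern with test(). The helper name isValidPassword and the sample passwords are made up for illustration. The regex is written without the g flag here, since test() on a global regex keeps state in lastIndex between calls.

```javascript
const passwordPattern =
  /^(?=\S{8,15}$)(?=.*\d)(?=.*[!#$%'*+\/=?^_{}|~-])(?=.*[A-Z])(?=.*[a-z])[^.]+(?:\.[^.]+)*$/;

// Hypothetical helper wrapping test().
function isValidPassword(candidate) {
  return passwordPattern.test(candidate);
}

console.log(isValidPassword("Passw0rd!"));   // true
console.log(isValidPassword("Pass.w0rd!"));  // true  (single dot in the middle)
console.log(isValidPassword(".Passw0rd!"));  // false (leading dot)
console.log(isValidPassword("Passw0rd!."));  // false (trailing dot)
console.log(isValidPassword("Pa..ssw0rd!")); // false (consecutive dots)
console.log(isValidPassword("Pw0rd!"));      // false (shorter than 8)
console.log(isValidPassword("Pass w0rd!"));  // false (contains a space)
console.log(isValidPassword("passw0rd!"));   // false (no uppercase letter)
```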
