Regular expression to match line separated size strings - javascript

I am writing a regular expression to validate an input string, which is a line-separated list of sizes ([width]x[height]).
Valid input example:
300x200
50x80
100x100
The regular expression I initially came up with is (https://regex101.com/r/H9JDjA/1):
^(\d+x\d+[\r\n|\r|\n]*)+$
This regular expression matches my input but also matches this invalid input (size can't be 100x100x200):
300x200
50x80
100x100x200
Adding a word boundary at the end seems to have fixed this issue:
^(\d+x\d+[\r\n|\r|\n]*\b)+$
My questions:
Why does the initial regular expression without the word boundary fail? It looks like I am matching one or more instances of \d+ (a number), followed by the character 'x', followed by \d+ (a number), followed by zero or more newlines from various operating systems.
How can I validate input that has multiple trailing newline characters? The following doesn't work for some inputs, such as this one:
500x500\n100x100\n\n\n384384
^(\d+x\d+[\r\n|\r|\n]\b)+|[\r\n|\r|\n]$

Isolate the problem with this target 100x100x200
For now, forget about the anchors in the regex.
The minimum regex is \d+x\d+ since it only has to be satisfied once
for a match to take place.
The maximum is something like this \d+x\d+ (?: (?:\r?\n | \r)* \d+x\d+ )*
Since \r?\n|\r is optional, it can be reduced to this \d+x\d+ (?: \d+x\d+ )*
The result, when applied to the target string, is:
100x100x200 matches.
But, since you've anchored the regex ^$, it is forced to break up
the middle 100 to make it match.
100x10 from \d+x\d+
0x200 from (?: \d+x\d+ )*
So, that is why the first regex seemingly matches 100x100x200.
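The split described above can be made visible with a small sketch. The two-group pattern below is only an illustration and not the regex from the question: it forces exactly two repetitions so that each part can be captured separately.

```javascript
// Illustration only: two explicit groups stand in for two repetitions of (\d+x\d+)+.
const split = /^(\d+x\d+)(\d+x\d+)$/.exec("100x100x200");
console.log(split[1]); // "100x10"
console.log(split[2]); // "0x200"

// The question's first regex therefore accepts the invalid size...
const initial = /^(\d+x\d+[\r\n|\r|\n]*)+$/;
console.log(initial.test("100x100x200")); // true

// ...while the word-boundary variant rejects it, because there is no \b inside "100"
// or between "0" and "x" (both are word characters).
const withBoundary = /^(\d+x\d+[\r\n|\r|\n]*\b)+$/;
console.log(withBoundary.test("100x100x200")); // false
```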
To avoid all of that, just require a line break between them, and
make the trailing linebreaks optional (if you need to validate the whole
string, otherwise leave it and the end anchor off).
^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$
A better view of it
^
\d+ x \d+
(?:
(?: \r? \n | \r )+
\d+ x \d+
)*
(?: \r? \n | \r )*
$
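As a quick check, the pattern above can be run against the inputs from the question. This is a minimal sketch; isValidSizeList is a hypothetical helper name introduced only for the example.

```javascript
// Sizes separated by one or more line breaks, optionally followed by trailing line breaks.
const sizeList = /^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$/;

// Hypothetical helper: returns true only when the whole string is a valid size list.
function isValidSizeList(text) {
  return sizeList.test(text);
}

console.log(isValidSizeList("300x200\n50x80\n100x100"));         // true
console.log(isValidSizeList("300x200\r\n50x80\r\n100x100\n\n")); // true
console.log(isValidSizeList("300x200\n50x80\n100x100x200"));     // false
console.log(isValidSizeList("500x500\n100x100\n\n\n384384"));    // false
```

Because a line break is now required between two sizes, the engine can no longer split a number in the middle to make 100x100x200 fit.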

Your initial regular expression "fails" because of the +:
^(\d+x\d+[\r\n|\r|\n]*)+$
-----------------------^ here
Your parenthesized pattern (\d+x\d+[\r\n|\r|\n]*) says: match one or more digits, followed by an "x", followed by one or more digits, followed by zero or more newlines. The + after it says: match one or more repetitions of the entire parenthesized pattern. Nothing requires a newline between two repetitions, so for an input like 100x200x300 the engine can split the middle number: one repetition matches 100x20 and the next matches 0x300, and together they cover the entire line.
If you're simply trying to extract dimensions from a newline-separated string, I would use the following regular expression with a multiline flag:
^(\d+x\d+)$
https://regex101.com/r/H9JDjA/2
Side note: In your expression, [\r\n|\r|\n] is actually saying match any one instance of \r, \n, |, \r, |, or \n (i.e. it's quite redundant, and you probably aren't meaning to match |). If you want to match a sequential set of any combination of \r or \n, you can simply use [\r\n]+.
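Both points can be checked in a few lines. This is a sketch with made-up sample input; the variable names are only for illustration.

```javascript
// Inside [...] the | is a literal character, not alternation.
console.log(/^[\r\n|\r|\n]$/.test("|")); // true
console.log(/^[\r\n]+$/.test("\r\n\n")); // true
console.log(/^[\r\n]+$/.test("|"));      // false

// Extraction with the multiline flag: ^ and $ anchor at every line,
// so a line such as 100x100x200 is skipped entirely.
const text = "300x200\n50x80\n100x100x200";
const sizes = text.match(/^(\d+x\d+)$/gm) || [];
console.log(sizes.join(", ")); // 300x200, 50x80
```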

You can use the multiline modifier, which should make life easier:
var input = "\n\
300x200x400\n\
50x80\n\
\n\
\n\
300x200\n\
50x80\n\
100x100x200x100\n";
// With the m flag, ^ matches at the start of every line.
// match() returns null when nothing matches, hence the fallback to an empty array.
var allSizes = input.match(/^\d+x\d+/gm) || [];
allSizes.forEach(function (size) {
  console.log(size);
});
Prints:
300x200
50x80
300x200
50x80
100x100

Try this regex out
^[0-9]{1,4}x[0-9]{1,4}|[(\r\n|\r|\n)]+$
It'll match these inputs.
1x1
10x10
100x100
2000x2938
\n
\r
\r\n
but not all of 100x100x200: the first alternative has no end anchor, so it still matches the leading 100x100 of that line.

Related

RegExp avoid double space and space before characters

I'm trying to write a regular expression that does not allow double spaces anywhere in a string and also requires a single space before MO or GO, with no space allowed at the beginning or at the end of the string.
Example 1 : It is 40 GO right
Example 2 : It is 40GO wrong
Example 3 : It is 40  GO wrong
Here's what I've done so far ^[^ ][a-zA-Z0-9 ,()]*[^;'][^ ]$, which prevents spaces at the beginning and at the end, and also the ";" character. This one works like a charm.
My issue is not allowing double spaces anywhere in the string, and also forcing spaces right before MO or GO characters.
After a few hours of research, I've tried these (starting from the previous RegExp I wrote):
To prevent the double spaces: ^[^ ][a-zA-Z0-9 ,()]*((?!.* {2}).+)[^;'][^ ]$
To force a single space before MO: ^[^ ][a-zA-Z0-9 ,()]*(?=\sMO)*[^;'][^ ]$
But neither of the last two actually work. I'd be thankful to anyone that helps me figure this out
The lookahead (?!.* {2}) can be omitted. Instead, start the match with a non-whitespace character, end it with a non-whitespace character, and use a single space in an optionally repeated group.
If the string cannot contain a ' or ;, then using [^;'][^ ]$ means that the second-to-last character should not be any of those characters.
But you can omit that part, as the character class [a-zA-Z0-9,()] does not match ; and '
Note that character classes like [^ ] and [^;'] each require a single character, so the pattern you tried has a minimum length.
Instead, you can rule out the presence of GO or MO preceded by a non whitespace character.
^(?!.*\S[MG]O\b)[a-zA-Z0-9,()]+(?: [a-zA-Z0-9,()]+)*$
The pattern matches:
^ Start of string
(?!.*\S[MG]O\b) Negative lookahead, assert that there is no non-whitespace character directly followed by either MO or GO to the right. The word boundary \b prevents a partial word match
[a-zA-Z0-9,()]+ Start the match with 1+ occurrences of any of the listed characters (Note that there is no space in it)
(?: [a-zA-Z0-9,()]+)* Optionally repeat the same character class with a leading space
$ End of string
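A short sketch that runs the pattern against the examples from the question plus two edge cases. The sample strings with a leading and a trailing space are added here for illustration.

```javascript
const noDoubleSpace = /^(?!.*\S[MG]O\b)[a-zA-Z0-9,()]+(?: [a-zA-Z0-9,()]+)*$/;

const samples = [
  "It is 40 GO",  // single space before GO
  "It is 40GO",   // no space before GO
  "It is 40  GO", // double space
  " It is 40 GO", // leading space
  "It is 40 MO ", // trailing space
];

// Prints each sample (quoted, so the spaces are visible) with its result.
for (const s of samples) {
  console.log(JSON.stringify(s), noDoubleSpace.test(s));
}
// Only the first sample yields true.
```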

Regular Expression to match text between # and only if # is not preceded by '

Hello, I'm trying to find a regular expression that finds all matches in a string where the text is enclosed in # characters, but only if those # are not preceded by an apostrophe (').
Basically I need to bold the text just as here when we use double * to bold text like this, but the apostrophe should work as an escape character.
For example
#Hello my name is Noé# should look like Hello my name is Noé
#Hello this has an escape apostrophe '# so I'll match until here# should look like Hello this has an escape apostrophe '# so I'll match until here
Inside a long text there might or might not be several matches:
"Hello I'm a text #I'm bold#, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
So I can print it like
"Hello I'm a text I'm bold, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
If that's not possible with a RegExp I could program a finite state machine, but I was hoping it was possible. Thank you in advance, God bless you!
Note: I will handle the escape characters later; for now I just need to know how to match this
/(?<!')#.*(?<!')#/gim
This was the only thing I could come up with, but honestly, I have no idea how negative look behind works :(, with this regexp it would match wrong. For example, if I type:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"
matches from the first # occurrence until the last one, like so:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"
I think this should work:
(?<!')#(.*?)(?<!')#
Here you can see the regexp working with your examples: https://regex101.com/r/wnguiA/1
(?<!') is Negative Lookbehind, it tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a b that is not preceded by an a.
Simpler is the (.*?), which matches any character (except for line terminators); adding ? makes the quantifier non-greedy, so it stops at the first occurrence of the following token.
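A minimal sketch of how this pattern could be used for the bolding itself. The toBold helper and the <b> output are assumptions made for the example, and lookbehind needs an engine with ES2018 support.

```javascript
// Requires lookbehind support (ES2018 and later).
const boldPattern = /(?<!')#(.*?)(?<!')#/g;

// Hypothetical helper; the <b> tag is just one possible output format.
function toBold(text) {
  return text.replace(boldPattern, "<b>$1</b>");
}

console.log(toBold("#Hello my name is Noé#"));
// <b>Hello my name is Noé</b>

console.log(toBold("#Hello this has an escape apostrophe '# so I'll match until here#"));
// <b>Hello this has an escape apostrophe '# so I'll match until here</b>

console.log(toBold("I'm a text #and I should be a match# and this should not #But this should as well# and extra text"));
// I'm a text <b>and I should be a match</b> and this should not <b>But this should as well</b> and extra text
```

The escaping apostrophe stays in the output, which fits the plan in the question to handle escape characters in a later step.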
To prevent triggering the negative lookbehind at every position that does not have a ' to the left, you can also match # first and do the assertion after it.
#(?<!'#)(.*?)#(?<!'#)
Another option instead of using the non greedy .*? is to use a negated character class matching any char except #
Then when you encounter # only match it if there is ' before it using a positive lookbehind.
#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)
#(?<!'#) Match # not directly preceded by '
( Capture group 1
[^#\n]* Optionally match any char except # or a newline
(?: Non capture group
#(?<='#) Match # only when directly preceded by '
[^#\n]* Match optional repetitions of any char except # or a newline
)* Close non capture group and optionally repeat it to match all occurrences
) Close group 1
#(?<!'#) Match # not directly preceded by '
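To see what group 1 captures, the pattern can be run with matchAll on the long example from the question. This is a sketch; the variable names are only illustrative, and lookbehind again needs ES2018 support.

```javascript
const unrolled = /#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)/g;

const input =
  "Hello I'm a text #I'm bold#, and I need to know how to match my text " +
  "that's inside two '#, and #I will not match either 'cause I got no end";

// Collect the contents of capture group 1 for every match.
const captured = Array.from(input.matchAll(unrolled), (m) => m[1]);
console.log(captured.length); // 1
console.log(captured[0]);     // I'm bold

// An escaped '# inside the delimiters is kept as part of the capture.
const escaped = "#Hello this has an escape apostrophe '# so I'll match until here#";
const inner = Array.from(escaped.matchAll(unrolled), (m) => m[1]);
console.log(inner[0]); // Hello this has an escape apostrophe '# so I'll match until here
```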

Create a regex to extract a string that contains normal characters and escaped characters without DoS

I have a string like this:
///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy
It starts with /// and ends with ///imgy (i and/or m and/or g and/or y), and the characters between the beginning and the end are either normal characters like a or escaped characters like \a.
Here is my regex:
/^\/{3}((?:\\?[\s\S])+?)\/{3}([imgy]{0,4})(?!\w)/
But the problem is that it is reported as "vulnerable to denial-of-service attacks". The main part that has the problem is
(?:\\?[\s\S])+
How can I write a correct one that handles both a and \a? Thank you!
Update:
I just found that I can use the following regex:
(?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?)
to replace the old problematic part (?:\\?[\s\S])+?. This way it no longer needs exponential time to match certain inputs, which avoids the denial-of-service vulnerability.
The details:
(?:\\[\s\S]+?) match any \a
(?:(?<!\\)[\s\S]+?) match any a, but not one that follows \.
(?:(?<=\\\\)[\s\S]+?) match any a, but it must follow \\. This makes sure that an f following \\ is matched.
So the whole regex will look like this:
^\/{3}((?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?))\/{3}([imgy]{0,4})(?!\w)
You might list the allowed characters in a character class, and optionally repeat an escaped character (a backslash followed by a-z or by another backslash)
^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$
The pattern matches:
^ Start of string
\/{3,}[A-Za-z?]+ Match 3 or more / and 1 or more times any of the listed allowed chars
(?: Non capture group
\\[a-z\\] Match a backslash followed by a char a-z or by another backslash
[A-Za-z?]* Optionally match any of the listed
)* Close and optionally repeat the group
\/\/\/[imgy]{0,4} Match /// and 0-4 times any of i m g or y. If there should be at least a single char, you can use {1,4}
$ End of string
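A small sketch that tries this pattern on the string from the question. The three shorter inputs are made up for illustration.

```javascript
const literalPattern = /^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$/;

// String.raw keeps the backslashes exactly as typed.
const sample = String.raw`///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy`;

console.log(literalPattern.test(sample));         // true
console.log(literalPattern.test("///abc///"));    // true  (the flags are optional with {0,4})
console.log(literalPattern.test("///abc///x"));   // false (x is not one of i, m, g, y)
console.log(literalPattern.test("///a\\1b///i")); // false (\1 is not in the allowed escapes)
```

Each character of the input can be consumed in only one way here (an escape always starts with a backslash, and the plain-character class never matches one), so the engine has no ambiguous alternatives to backtrack through.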

Regex to match first character and no more than 2 identical consecutives

I'm using jQuery to add a pattern attribute to a text field based on a letter I select from an array. I'm trying to restrict the values that text field can accept with a regex, but it doesn't work properly.
What I want is for the first character of the value to be the letter I choose from the array, and then to not accept more than 2 identical consecutive characters.
My regex is this:
^["+letter+"](?!(.)\1).{2}.*
And it seems to work when I'm testing it in regexr.com, but when I test it in my page, only the part that matches the 1st char works, and the rest doesn't. When I type something like "Aaaaron", the "invalid entry" message doesn't show.
Thanks in advance.
Description
^(.)(?!\1{2})
This regex will do the following:
capture the first character
validate the first character is then not repeated 2 more times. If two more of the same character are present after the first occurrence, then you have 3 of the same characters in a row.
Note: to make this expression treat upper- and lowercase versions of a letter as the same character, you'll need to use the case-insensitive flag.
Live Example
https://regex101.com/r/xG9mE9/2
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\1{2} what was matched by capture \1 (2 times)
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
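A quick sketch of the expression with the case-insensitive flag applied. The sample names are chosen for illustration.

```javascript
// The i flag makes \1 compare case-insensitively, so "A" and "a" count as the same character.
const firstNotTripled = /^(.)(?!\1{2})/i;

console.log(firstNotTripled.test("Aaron"));   // true  ("Aa" is only two in a row)
console.log(firstNotTripled.test("Aaaaron")); // false ("Aaa" starts the string)
console.log(firstNotTripled.test("Baaad"));   // true  (only the first character is checked)
```

As the last line shows, this expression only looks at repetitions of the first character; a triple further along the string is not detected by it.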
Try this regex (with letter being 'a'):
$("form").find("input[type=text]").attr("pattern", "(?=[Aa])(?!.*(.)\\1\\1).*");
It validates the starting letter, and that no character appears more than 2 times consecutively.
notes:
You can't do case-insensitive matching with the HTML5 pattern attribute, so 'a' and 'A' are not the same thing ('Aaaron' isn't 3 a's in a row).
If you add the pattern via a string (not a regex literal) in jQuery/JavaScript, remember that the string literal's escape sequences are processed first and the regex is parsed second. The backslash means something to a string as well as to a regex, so you may need to double-escape it (\\1 for a backreference in this case).
You don't need ^ or $ to make the input value match the entire pattern only; the regex is wrapped in ^(?:regex)$ for you. This means that if your pattern does not consume the entire string it will not work: (?=[Aa])(?!.*(.)\\1\\1), which is just a couple of lookarounds and would normally validate the input just fine, is a zero-width pattern, and without the .* at the end, does not work.
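The notes above can be tried outside the browser with a small sketch. Both helpers are hypothetical names, and wrapping the pattern in ^(?: ... )$ is an assumption that imitates what the pattern attribute does; the letter is assumed to be a single ASCII letter, so it is not escaped.

```javascript
// Hypothetical helper: returns the string you would put in the pattern attribute.
function buildPattern(letter) {
  const both = letter.toUpperCase() + letter.toLowerCase();
  // Written as a string, so the backreference needs a doubled backslash.
  return "(?=[" + both + "])(?!.*(.)\\1\\1).*";
}

// Assumption: imitate the browser by anchoring the whole pattern.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const pattern = buildPattern("a");
console.log(pattern);                            // (?=[Aa])(?!.*(.)\1\1).*
console.log(matchesPattern(pattern, "Aaron"));   // true
console.log(matchesPattern(pattern, "Aaaron"));  // true  ("A" and "a" differ, so only two identical in a row)
console.log(matchesPattern(pattern, "Aaaaron")); // false (three lowercase a's in a row)
console.log(matchesPattern(pattern, "Baron"));   // false (wrong first letter)
```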

regular expression to replace with ','

I have one RegExp, could anyone explain exactly what it does?
Regexp
b=b.replace(/(\d{1,3}(?=(?:\d\d\d)+(?!\d)))/g,"$1 ")
I think it is replacing with a space (' ').
If I'm right, I want to replace it with a comma (,) instead of a space (' ').
To explain the regex, let's break it down:
( # Match and capture in group number 1:
\d{1,3} # one to three digits (as many as possible),
(?= # but only if it's possible to match the following afterwards:
(?: # A (non-capturing) group containing
\d\d\d # exactly three digits
)+ # once or more (so, three/six/nine/twelve/... digits)
(?!\d) # but only if there are no further digits ahead.
) # End of (?=...) lookahead assertion
) # End of capturing group
Actually, the outer parentheses are unnecessary if you use $& instead of $1 for the replacement string ($& contains the entire match).
The regex (\d{1,3}(?=(?:\d\d\d)+(?!\d))) matches any 1-3 digits (\d{1,3}) that are followed by a multiple of 3 digits ((?:\d\d\d)+) that isn't followed by another digit ((?!\d)). It replaces the match with "$1 ". $1 is replaced by the first capture group. The space behind it is... a space.
See the regular expressions guide on MDN for more information about the different syntaxes.
If you want to separate the numbers with a comma instead of a space, you'll need to replace it with "$1," instead.
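A minimal sketch of the comma variant, written with $& and without the outer capturing group. The helper name addThousandsCommas is hypothetical.

```javascript
// $& inserts the whole match, so no capturing group is needed.
function addThousandsCommas(digits) {
  return digits.replace(/\d{1,3}(?=(?:\d\d\d)+(?!\d))/g, "$&,");
}

console.log(addThousandsCommas("1234567"));   // 1,234,567
console.log(addThousandsCommas("1000"));      // 1,000
console.log(addThousandsCommas("999"));       // 999
console.log(addThousandsCommas("a 12345 b")); // a 12,345 b
```

Note that the pattern treats every run of digits the same way, so digits after a decimal point would get commas too.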
Don't try to solve everything by using regular expressions.
Regular expressions are meant for matching, not to fix non-text-encoded-as-text formatting.
If you want to format numbers differently, extract them and use format strings to reformat them on a character processing level. Doing the formatting inside the regex replacement is just an ugly hack.
It is okay to use regular expressions to find the numbers in the text, e.g. \d{4,} but trying to do the actual formatting with regexp is a crazy abuse.
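One way to follow that advice in JavaScript is sketched below: the regex only finds the numbers, and the built-in Number.prototype.toLocaleString does the formatting. The helper name formatNumbersIn and the choice of the "en-US" locale (comma as group separator) are assumptions for this example.

```javascript
// Find runs of four or more digits, then let the number formatter add the separators.
// Assumption: the "en-US" locale, which groups digits with commas.
function formatNumbersIn(text) {
  return text.replace(/\d{4,}/g, (digits) => Number(digits).toLocaleString("en-US"));
}

console.log(formatNumbersIn("Total: 1234567 items, 999 left"));
// Total: 1,234,567 items, 999 left
```

Digit runs too long to be represented exactly as a Number would lose precision with this approach, so it suits ordinary counts rather than arbitrary digit strings.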
