RegExp avoid double space and space before characters - javascript

I'm trying to write a regular expression in order to not allow double spaces anywhere in a string, and also force a single space before a MO or GO mandatory, with no space allowed at the beginning and at the end of the string.
Example 1 : It is 40 GO right
Example 2 : It is 40GO wrong
Example 3 : It is 40 GO wrong
Here's what I've done so far ^[^ ][a-zA-Z0-9 ,()]*[^;'][^ ]$, which prevents spaces at the beginning and at the end, and also the ";" character. This one works like a charm.
My issue is not allowing double spaces anywhere in the string, and also forcing spaces right before MO or GO characters.
After a few hours of research, I've tried these (starting from the previous RegExp I wrote):
To prevent the double spaces: ^[^ ][a-zA-Z0-9 ,()]*((?!.* {2}).+)[^;'][^ ]$
To force a single space before MO: ^[^ ][a-zA-Z0-9 ,()]*(?=\sMO)*[^;'][^ ]$
But neither of the last two actually work. I'd be thankful to anyone that helps me figure this out

The lookahead (?!.* {2} can be omitted, and instead start the match with a non whitespace character and end the match with a non whitespace character and use a single space in an optionally repeated group.
If the string can not contain a ' or ; then using [^;'][^ ]$ means that the second last character should not be any of those characters.
But you can omit that part, as the character class [a-zA-Z0-9,()] does not match ; and '
Note that using a character class like [^ ] and [^;'] actually expect a single character, making the pattern that you tried having a minimum length.
Instead, you can rule out the presence of GO or MO preceded by a non whitespace character.
^(?!.*\S[MG]O\b)[a-zA-Z0-9,()]+(?: [a-zA-Z0-9,()]+)*$
The pattern matches:
^ Start of string
(?!.*\S[MG]O\b) Negative lookahead, assert not a non whitspace character followed by either MO or GO to the right. The word boundary \b prevents a partial word match
[a-zA-Z0-9,()]+ Start the match with 1+ occurrences of any of the listed characters (Note that there is no space in it)
(?: [a-zA-Z0-9,()]+)* Optionally repeat the same character class with a leading space
$ End of string
Regex demo

Related

Create a regex to extract a string that contain a noral character and escaped string without DOS

I have a string like this:
///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy
it started with /// and ended with ///imgy (i and/or m and/or g and/or y), and between the beginning and end are the character are normal character like a or escaped character like \a.
Here is my regex:
/^\/{3}((?:\\?[\s\S])+?)\/{3}([imgy]{0,4})(?!\w)/
But the problem is that it is reported as "vulnerable to denial-of-service attacks". The main part that has the problem is
(?:\\?[\s\S])+
How can I create a right one that can figure out both a and \a? Thank you!
Regex Demo
Update:
I just found to use the following regex:
(?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?)
to replace the old problematic part (?:\\?[\s\S])+?, and in this way, it can avoid requires exponential time to match certain inputs, and avoid vulnerable to denial-of-service attacks.
The details:
(?:\\[\s\S]+?) match any \a
(?:(?<!\\)[\s\S]+?) match any a, but not following \.
(?:(?<=\\\\)[\s\S]+?) match any a, but much following \\. This to make sure f is matched that following \\.
So the whole regex will look like this:
^\/{3}((?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?))\/{3}([imgy]{0,4})(?!\w)
You might list the characters that are allowed to a character class, and optionally repeat an escaped character [a-z]
^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$
The pattern matches:
^ Start of string
\/{3,}[A-Za-z?]+ Match 3 or more / and 1 or more times any of the listed allowed chars
(?: Non capture group
\\[a-z\\] Match an escaped char a-z or \\
[A-Za-z?]* Optionally match any of the listed
)* Close an optionally repeat the group
\/\/\/[imgy]{0,4} Match /// and 0-4 times any of i m g or y If there should be at least a single char, you can use {1,4}
$ End of string
Regex demo

Regex not to allow to consecutive dot characters and more

I am trying to make a JavaScript Regex which satisfies the following conditions
a-z are possible
0-9 are possible
dash, underscore, apostrophe, period are possible
ampersand, bracket, comma, plus are not possible
consecutive periods are not possible
period cannot be located in the start and the end
max 64 characters
Till now, I have come to following regex
^[^.][a-zA-Z0-9-_\.']+[^.]$
However, this allows consecutive dot characters in the middle and does not check for length.
Could anyone guide me how to add these 2 conditions?
You can use this regex
^(?!^[.])(?!.*[.]$)(?!.*[.]{2})[\w.'-]{1,64}$
Regex Breakdown
^ #Start of string
(?!^[.]) #Dot should not be in start
(?!.*[.]$) #Dot should not be in start
(?!.*[.]{2}) #No consecutive two dots
[\w.'-]{1,64} #Match with the character set at least one times and at most 64 times.
$ #End of string
Correction in your regex
- shouldn't be in between of character class. It denotes range. Avoid using it in between
[a-zA-Z0-9_] is equivalent to \w
Here is a pattern which seems to work:
^(?!.*\.\.)[a-zA-Z0-9_'-](?:[a-zA-Z0-9_'.-]{0,62}[a-zA-Z0-9_'-])?$
Demo
Here is an explanation of the regex pattern:
^ from the start of the string
(?!.*\.\.) assert that two consecutive dots do not appear anywhere
[a-zA-Z0-9_'-] match an initial character (not dot)
(?: do not capture
[a-zA-Z0-9_'.-]{0,62} match to 62 characters, including dot
[a-zA-Z0-9_'-] ending with a character, excluding dot
)? zero or one time
$ end of the string
Here comes my idea. Used \w (short for word character).
^(?!.{65})[\w'-]+(?:\.[\w'-]+)*$
^ at start (?!.{65}) look ahead for not more than 64 characters
followed by [\w'-]+ one or more of [a-zA-Z0-9_'-]
followed by (?:\.?[\w'-]+)* any amount of non capturing group containing a period . followed by one or more [a-zA-Z0-9_'-] until $ end
And the demo at regex101 for trying

Regular expression to match line separated size strings

I am writing a reular expression to validate input string, which is a line separated list of sizes ([width]x[height]).
Valid input example:
300x200
50x80
100x100
The regular expression I initially came up with is (https://regex101.com/r/H9JDjA/1):
^(\d+x\d+[\r\n|\r|\n]*)+$
This regular expression matches my input but also matches this invalid input (size can't be 100x100x200):
300x200
50x80
100x100x200
Adding a word boundary at the end seems to have fixed this issue:
^(\d+x\d+[\r\n|\r|\n]*\b)+$
My questions:
Why does the initial regular expression without the word boundary fail? It looks like I am matching one or more instances of a \d+(number), followed by character 'x', followed by a \d+(number), followed by one or more new lines from various operating systems.
How to validate input having multiple training new line characters in this input? The following doesn't work for some kind of inputs like this:
500x500\n100x100\n\n\n384384
^(\d+x\d+[\r\n|\r|\n]\b)+|[\r\n|\r|\n]$
Isolate the problem with this target 100x100x200
For now, forget about the anchors in the regex.
The minimum regex is \d+x\d+ since it only has to be satisfied once
for a match to take place.
The maximum is something like this \d+x\d+ (?: (?:\r?\n | \r)* \d+x\d+ )*
Since \r?\n|\r is optional, it can be reduced to this \d+x\d+ (?: \d+x\d+ )*
The result, when you applied to the target string is:
100x100x200 matches.
But, since you've anchored the regex ^$, it is forced to break up
the middle 100 to make it match.
100x10 from \d+x\d+
0x200 from (?: \d+x\d+ )*
So, that is why the first regex seemingly matches 100x100x200.
To avoid all of that, just require a line break between them, and
make the trailing linebreaks optional (if you need to validate the whole
string, otherwise leave it and the end anchor off).
^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$
A better view of it
^
\d+ x \d+
(?:
(?: \r? \n | \r )+
\d+ x \d+
)*
(?: \r? \n | \r )*
$
Your initial regular expression "fails" because of the +:
^(\d+x\d+[\r\n|\r|\n]*)+$
-----------------------^ here
Your parenthesis pattern (\d+x\d+[\r\n|\r|\n]*) says match one or more number followed by an "x" followed by one or more number followed by zero or more newlines. The + after that says match one or more of the entire parenthesis pattern, which means that for an input like 100x200x300 your pattern matches 100x200 and then 200x300, so it looks like it matches the entire line.
If you're simply trying to extract dimensions from a newline-separated string, I would use the following regular expression with a multiline flag:
^(\d+x\d+)$
https://regex101.com/r/H9JDjA/2
Side note: In your expression, [\r\n|\r|\n] is actually saying match any one instance of \r, \n, |, \r, |, or \n (i.e. it's quite redundant, and you probably aren't meaning to match |). If you want to match a sequential set of any combination of \r or \n, you can simply use [\r\n]+.
You can use multiline modifier, which should make life easier:
var input = "\n\
300x200x400\n\
50x80\n\
\n\
\n\
300x200\n\
50x80\n\
100x100x200x100\n";
var allSizes = input.match(/^\d+x\d+/gm); // multiline modifier assumes each line has start and end
for (var size in allSizes)
console.log(allSizes[size]);
Prints:
300x200
50x80
300x200
50x80
100x100
Try this regex out
^[0-9]{1,4}x[0-9]{1,4}|[(\r\n|\r|\n)]+$
It'll match these inputs.
1x1
10x10
100x100
2000x2938
\n
\r
\r\n
but not this 100x100x200

Regex allow multiple work in a sentence

I'm trying to parse following sentences with regex (javascript) :
I wish a TV
I want some chocolate
I need fire
Currently I'm trying : I(\b[a-zA-Z]*\b){0,5}(TV|chocolate|fire) but it doesn't work. I also made some test with \w but no luck.
I want to allow any word (max 5 words) between "I" and the last word witch is predefined.
To account for non-word chars in-between words, you may use
/I(?:\W+\w+){0,5}\‌​W+(?:TV|chocolate|fir‌​e)/
See the regex demo
The point is that you added word boundaries, but did not account for spaces, punctuation, etc. (all the other non-word chars) between "words".
Pattern details:
I - matches the left delimiter
(?:\W+\w+){0,5}\‌​W+ - matches 0 to 5 sequences (due to the limiting quantifier {n,m}) of 1+ non-word chars (\W+) and 1+ word chars after them (\w+), and a \W+ at the end matches 1 or more non-word chars that must be present to separate the last matched word chars from the...
(?:TV|chocolate|fir‌​e) - matches the trailing delimiter
You need to add the whitespace after the I. Otherwise it wouldn´t capture the whole sentence.
I(\b[a-zA-Z ]*\b){0,5}(TV|chocolate|fire)
I greate site to test regex expressions is regexr
If you don't care about the spaces, use:
/I(\s[a-zA-Z]*\s?){0,5}(TV|chocolate|fire)/
Try
/I\s+(?:\w+\s+){0,5}(TV|chocolate|fire)/
(Test here)
Based on Stefan Kert version, but rely on right side spaces of each extra word instead of word boundaries.
It also accepts any valid "word" (\w) character words of any length and any valid spacing character (not caring for repetitions).

Can somebody explain this RegEx to me?

I see this line of code and the regular expression just panics me...
quickExpr = /^(?:[^#<]*(<[\w\W]+>)[^>]*$|#([\w\-]*)$)/
Can someone please explain little by little what it does?
Thanks,G
Here's what I can extract:
^ beginning of string.
(?: non-matching group.
[^#<]* any number of consecutive characters that aren't # or <.
(<[\w\W]+>) a group that matches strings like <anything_goes_here>.
[^>]* any number of characters in sequence that aren't a >.
The part after the | denotes a second regex to try if the first one fails. That one is #([\w\-]*):
# matches the # character. Not that complex.
([\w\-]*) is a group that matches any number of word characters or dashes. Basically Things-of-this-form
$ marks the end of the regex.
I'm no regex pro, so please correct me if I am wrong.
^(?:[^#<]*(<[\w\W]+>)[^>]*$|#([\w\-]*)$)
Assert position at the start of the string «^»
Match the regular expression below «(?:[^#<]*(<[\w\W]+>)[^>]*$|#([\w\-]*)$)»
Match either the regular expression below (attempting the next alternative only if this one fails) «[^#<]*(<[\w\W]+>)[^>]*$»
Match a single character NOT present in the list "#<" «[^#<]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «(<[\w\W]+>)»
Match the character "<" literally «<»
Match a single character present in the list below «[\w\W]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is a "word character" (letters, digits, etc.) «\w»
Match a single character that is a "non-word character" «\W»
Match the character ">" literally «>»
Match any character that is not a ">" «[^>]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «#([\w\-]*)$»
Match the character "#" literally «#»
Match the regular expression below and capture its match into backreference number 2 «([\w\-]*)»
Match a single character present in the list below «[\w\-]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a "word character" (letters, digits, etc.) «\w»
A - character «\-»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Created with RegexBuddy

Categories

Resources