RegExp - How can I match the shortest amount possible? [duplicate] - javascript

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 3 years ago.
I have my Regular Expression /'(.*)(?:(?:'\s*,\s*)|(?:'\)))/
and my test code ('He said, "You're cool."' , 'Rawr')
(My test code simulates parameters being passed into a function.)
I will explain my Regular Expression as I understand it and hopefully a few of you can shed some light on my problem.
1) /' means the matched string must begin with '
2) (.*) means capture any character except \n, 0 or more times
3) (?:(?:4)|(?:5)) means don't capture, but try step 4 and, if that doesn't work, try step 5
4) (?:'\s*,\s*) means don't capture, but there must be a ' followed by 0 or more whitespace characters, then a , followed by 0 or more whitespace characters
5) (?:'\)) means don't capture, but there must be ')
So it seems that it should return this (and this is what I want):
'+He said, "You're cool."+' ,
But it returns:
'+He said, "You're cool."' , 'Rawr+')
If I change my test code to ('He said, "You're cool."' , 'Rawr' (no end parenthesis) it returns what I want, but as soon as I add that last parenthesis, then it seems that my OR operator does whatever it wants to. I want it to test first if there is a comma, and break there if there is one, and if there is not one check for a parenthesis.
I've tried switching the spots of step 4 and step 5, but still the OR operator seems to always default to the (?:'\)) side.
How can I match the shortest amount possible?

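I don't think your problem is the OR operator, but the greediness of .*. Being greedy, .* first consumes the rest of the string and then backtracks one character at a time until the rest of the pattern can match. The first position where that succeeds is the last ' in the string, right before ), so the match is '+He said, "You're cool."' , 'Rawr+'). Try the lazy .*? instead: it starts with as few characters as possible and only extends when the rest of the pattern fails to match.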
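To make the difference concrete, here is a minimal sketch that runs the question's expression and the lazy variant against the question's test string. The variable names are illustrative only; the patterns and the input are taken from the question.

```javascript
// Test input from the question: parameters being passed into a function.
const input = `('He said, "You're cool."' , 'Rawr')`;

// Greedy: .* grabs as much as possible, then backtracks from the end.
const greedy = /'(.*)(?:(?:'\s*,\s*)|(?:'\)))/;
// Lazy: .*? grabs as little as possible, extending one character at a time.
const lazy = /'(.*?)(?:(?:'\s*,\s*)|(?:'\)))/;

const greedyCapture = input.match(greedy)[1];
const lazyCapture = input.match(lazy)[1];

console.log(greedyCapture); // He said, "You're cool."' , 'Rawr
console.log(lazyCapture);   // He said, "You're cool."
```

Only the quantifier changes between the two patterns; the alternation is identical in both, which suggests the OR operator was not the cause.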

Related

How to check whether a string is alphanumeric or not in peggyjs [duplicate]

This question already has answers here:
RegEx for Javascript to allow only alphanumeric
(22 answers)
Closed 1 year ago.
I want to check whether a given string is alphanumeric, i.e. contains both letters and digits. The expected output is as follows:
123 should return false
abc should return false
a123 should return true
1a23 should return true
I tried the regex ^[a-zA-Z0-9]*$, but it is not working as expected. Can anyone suggest a working peggyjs regex? Thanks.
You can assert that the string does not consist only of digits, then require at least one digit while restricting the match to a-z and digits.
Using a case insensitive match:
^(?!\d+$)[a-z]*\d[a-z\d]*$
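As a quick check, the pattern above can be run against the four inputs from the question. This is a sketch in plain JavaScript rather than peggyjs grammar syntax, and the variable names are made up for illustration.

```javascript
// Pattern from the answer; the i flag gives the case-insensitive match.
const mixedAlnum = /^(?!\d+$)[a-z]*\d[a-z\d]*$/i;

// Inputs and expected results are the ones listed in the question.
const samples = ["123", "abc", "a123", "1a23"];
const results = samples.map((s) => mixedAlnum.test(s));

samples.forEach((s, i) => console.log(s, results[i]));
// 123 false
// abc false
// a123 true
// 1a23 true
```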
If you know the order (letters then numbers for example) you can do .*[a-zA-Z].*[0-9]
But I assume you can't make such assumptions so I would use the slightly more complex ^(?=.*[a-zA-Z])(?=.*[0-9]).* which means "a letter somewhere later, and also a number somewhere later".
PS : you can replace all [0-9] by \d if you like.
Edit : that's only assuming you don't get other kinds of characters, use Alireza's regex instead if you need to.
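The caveat in the edit can be seen in a short sketch. The `loose` pattern is the one from this answer; the `strict` pattern is a hypothetical variant, not taken from any answer here, that keeps both lookaheads but allows only ASCII letters and digits in the body.

```javascript
// Lookahead pattern from the answer: a letter somewhere and a digit somewhere.
const loose = /^(?=.*[a-zA-Z])(?=.*[0-9]).*/;
// Hypothetical stricter variant (an assumption, not from the answer):
// same lookaheads, but the whole string must be ASCII letters and digits.
const strict = /^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$/;

const inputs = ["123", "abc", "a123", "1a23", "a1!"];
const looseResults = inputs.map((s) => loose.test(s));
const strictResults = inputs.map((s) => strict.test(s));

inputs.forEach((s, i) => console.log(s, looseResults[i], strictResults[i]));
// 123 false false
// abc false false
// a123 true true
// 1a23 true true
// a1! true false
```

The last input shows why the loose pattern is only safe when no other kinds of characters can appear.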

JS RegEx for finding number of lines in a page, separated by form feed \f

I have a use case that requires the lines of a plain-text file to be at most 38 characters long, and 'pages' to consist of at most 28 lines. To enforce this, I'm using regular expressions. I was able to enforce the line length without any problems, but the page length is proving to be much trickier.
After several iterations, I arrived at the following regular expression, which I feel should work, but it doesn't.
let expression = /(([^\f]*)(\r\n)){29,}\f/;
It simply results in no matches.
If anyone could provide some feedback, I'd greatly appreciate it! - Jacob
Edit 1 - removed code block around second expression, it was probably making my question confusing.
Edit 2 - removed following text, it's not pertinent:
As a comparison, the following expression results in a single match, the entire document. I'm assuming it's matching all lines up until the final
let expression = /(.*(\r\n)){29,}
Edit 3 - So after some thinking, I realized that my issue is due to the initial section of the regex that matches any characters before a newline is including newlines. Therefore, I believe I need to match any characters before a newline EXCEPT (\f\r\n). However, I'm now having trouble implementing this. I tried the following:
let expression = /([^\f^\r^\n]*(\r\n)){29,}\f/;
But it's also not matching. I'm assuming that my negations are wrong...
Edit 4 - I have the following regex that matches each line: let expression = /([^\f\r\n]{0,}(\r\n))/;
This is pretty close to what I want. All I need now is to match any instances of 29 or more lines followed by \f
Thanks to those who commented for all the help. A friend ended up helping me get the final regex:
let expression = /([^\f\r\n]*?\r??\n){29,}?\f/;
Edit:
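Since you have clarified your problem and provided your updated regex:
/([^\f^\r^\n]*(\r\n)){29,}\f/;
Your negations are not right here; use [^\f\r\n] instead of [^\f^\r^\n]. Only a ^ placed first inside the brackets negates the class; the later ^ characters are literal carets, so the original class also excludes ^ itself. [^\f\r\n] negates exactly \f, \r, and \n.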
So, your regex becomes:
/([^\f\r\n]*(\r\n)){29,}\f/;
This will match 29 or more lines of characters (that can be anything but \f, \r or \n), the whole thing followed by a single \f.
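The corrected expression can be exercised with synthetic pages, as in the sketch below. The `makePage` helper and the sample line text are hypothetical, and the sketch assumes every line ends in \r\n and every page ends in a single \f, as described in the question.

```javascript
// Corrected expression from the answer: 29 or more CRLF-terminated lines,
// none containing \f, \r or \n, followed by a form feed.
const tooLongPage = /([^\f\r\n]*(\r\n)){29,}\f/;

// Hypothetical helper: builds one page of `lineCount` lines ending in \f.
const makePage = (lineCount) => "some text\r\n".repeat(lineCount) + "\f";

const okPage = tooLongPage.test(makePage(28));
const longPage = tooLongPage.test(makePage(29));
const twoShortPages = tooLongPage.test(makePage(20) + makePage(20));

console.log(okPage);        // false: 28 lines is within the limit
console.log(longPage);      // true: 29 lines is one too many
console.log(twoShortPages); // false: the \f between pages resets the count
```

Because the character class excludes \f, a run of lines cannot continue across a page break, so two short pages are never counted as one long page.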
Original answer:
Your current regular expression:
let expression = /(([^\f]*)(\r\n)){29,}\f/;
Matches strings that consist of 29 or more lines (separated by \r\n), the whole thing followed by one single \f.
As far as I understood, you want each of your lines to end with \f. Did you mean to include the \f inside?
let expression = /(([^\f]*)(\r\n\f)){29,}/;

Simple Regex skips always first character on a line [duplicate]

This question already has answers here:
Regex that can match empty string is breaking the javascript regex engine
(2 answers)
Closed 4 years ago.
I have a very strange effect when using a particular regex in JavaScript. If I use /^|.+/gm, it always skips the first character on a line.
According to regex101.com, it doesn't happen with pcre (php), but does happen in JavaScript, Python, and GoLang. Any ideas on why this could be happening?
In JavaScript, an empty match still advances the index at which the next search starts by one. If a global search finds an empty match at position X in the string, the next match must start at position X + 1 or later, so after ^ matches the empty string at the start of a line, .+ can only begin at the second character. (PCRE does not exhibit this behavior: after an empty match that consumes no characters, it still permits a non-empty match starting at the same position.)
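The behavior described above can be reproduced with a two-line string. The sample text is made up, and swapping the alternatives is shown only as one possible way to keep the first character, not as the canonical fix.

```javascript
const text = "abc\ndef";

// ^ is tried first and matches the empty string at each line start;
// the next search then begins one character later.
const emptyFirst = text.match(/^|.+/gm);
console.log(emptyFirst); // [ '', 'bc', '', 'ef' ]

// With .+ tried first, a non-empty line is consumed whole before ^
// gets a chance to produce an empty match.
const nonEmptyFirst = text.match(/.+|^/gm);
console.log(nonEmptyFirst); // [ 'abc', 'def' ]
```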

Why character class and capturing group show different results in javascript regexp for a whitespace character followed by a dot?

I was solving the chapter exercises from this book - http://eloquentjavascript.net/09_regexp.html
There is a question where I need to write a regular expression for a whitespace character followed by a dot, comma, colon, or semicolon.
I wrote this one
var re1 = /\s(.|,|:|;)/;
The book had this as answer
var re2 = /\s[.,;:]/;
I understand that the second one is correct, and it is more efficient. But leaving behind efficiency, the first one should also give correct results.
The first one doesn't give correct output for the following piece of code -
console.log(re1.test("escape the dot")); // prints true
It should have given "false" but it outputs the opposite. I couldn't understand this. I tried https://www.debuggex.com/ too, but the figure also seems to be okay!
It seems that I am missing some understanding from my end.
Just as I finished this question to post, I realised my mistake that was giving me the wrong output. So, I thought I would rather share both the question and answer here so as to help anyone who might face some similar problem in future.
The thing is the period (dot) itself, when used between square brackets, loses its special meaning. The same goes for other special characters, such as +.
But they retain their special meaning when used in a capturing group.
So, the code
var re1 = /\s(.|,|:|;)/;
console.log(re1.test("escape the dot")); // prints true
is actually looking for this pattern: a whitespace character followed by either any character other than a newline (because of the unescaped period), or a comma, colon, or semicolon.
To get the correct output, the correct re, if used with capturing group, would be,
var re1 = /\s(\.|,|:|;)/;
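The three variants can be compared side by side. In this sketch the variable names and the second test string are illustrative; the patterns and the first test string come from the question and answer.

```javascript
const unescapedDot = /\s(.|,|:|;)/; // . matches any character except line terminators
const escapedDot = /\s(\.|,|:|;)/;  // \. matches a literal period only
const charClass = /\s[.,;:]/;       // inside [], . is already literal

const plain = "escape the dot";        // whitespace is followed by letters only
const punctuated = "escape the dot ."; // contains a space followed by a period

console.log(unescapedDot.test(plain));    // true
console.log(escapedDot.test(plain));      // false
console.log(charClass.test(plain));       // false
console.log(escapedDot.test(punctuated)); // true
console.log(charClass.test(punctuated));  // true
```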

Regex: allow everything but some selected characters [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 5 years ago.
I would like to validate a textarea and I just don't get regex (It took me the day and a bunch of tutorials to figure it out).
Basically I would like to be able to allow everything (line breaks and carriage returns included) except the characters that could be malicious (those which would lead to a security breach).
As there are very few characters that are not allowed, I assume that it would make more sense to create a black list than a white one.
My question is: what is the standard "everything but" in Regex?
I'm using javascript and jquery.
I tried this but it doesn't work (it's awful, I know..):
var messageReg = /^[a-zA-Z0-9éèêëùüàâöïç\"\/\%\(\).'?!,#$#§-_ \n\r]+$/;
Thank you.
If you want to exclude a set of characters (some punctuation characters, for example) you would use the ^ operator at the beginning of a character set, in a regex like
/[^.?!]/
This matches any character that is not ., ?, or !.
You can use the ^ as the first character inside brackets [] to negate what's in it:
/^[^abc]*$/
This means: "from start to finish, no a, b, or c."
As Esailija mentioned, this won't do anything for real security.
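A minimal sketch of both negated sets from the answers above, using made-up sample strings. It also illustrates that a negated class matches line breaks, which is what the question asks for; it is not a security measure.

```javascript
// "From start to finish, no a, b, or c." Line breaks are allowed,
// because a negated class matches \r and \n like any other character.
const noAbc = /^[^abc]*$/;
const allowed = noAbc.test("xyz\r\n123");
const rejected = noAbc.test("xbz");

// Any single character that is not ., ? or !
const notPunct = /[^.?!]/;
const onlyPunct = notPunct.test("?!");
const hasOther = notPunct.test("a?");

console.log(allowed);   // true
console.log(rejected);  // false
console.log(onlyPunct); // false
console.log(hasOther);  // true
```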
The code you mentioned is almost a negated set. As murgatroid99 mentioned, the ^ goes inside the brackets, so that the regular expression matches anything that is not in that list. But it looks like you really want to strip out those characters, so your regexp doesn't need to be negated.
Your code should look like:
str.replace(/[a-zA-Z0-9éèêëùüàâöïç\"\/\%\(\).'?!,#$#\-_ \n\r]/g, "");
That says: remove all the characters listed in the regular expression. Note the escaped hyphen (\-): an unescaped - between two characters, as in #-_, defines a range rather than a literal hyphen.
However, that also removes a-zA-Z0-9. Are you sure you want to strip those out?
Also, Chrome doesn't like § in regular expressions; you have to use \x along with the hex code for the character (\xA7).
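One further detail in the character classes above is the unescaped hyphen. The sketch below is an observation about how JavaScript parses ranges, not something stated in the answers: §-_ is an out-of-order range (U+00A7 down to U+005F) and is rejected, while #-_ is a valid but probably unintended range.

```javascript
// Building the pattern from a string lets the SyntaxError be caught at run time.
let rangeError = null;
try {
  new RegExp("[\u00A7-_]"); // § (U+00A7) to _ (U+005F): range out of order
} catch (err) {
  rangeError = err;
}
console.log(rangeError instanceof SyntaxError); // true

// #-_ is a valid range (U+0023 to U+005F), so it also matches "A".
const unescapedHyphen = /[#-_]/.test("A");
// With the hyphen escaped, only #, - and _ match.
const escapedHyphenLetter = /[#\-_]/.test("A");
const escapedHyphenLiteral = /[#\-_]/.test("-");

console.log(unescapedHyphen);      // true
console.log(escapedHyphenLetter);  // false
console.log(escapedHyphenLiteral); // true
```

Placing the hyphen first or last in the class is a common alternative to escaping it.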
