Perl/Javascript Regular Expression with an 'and' operator - javascript

I need to match incorrect backslashes in a text. The following text is an example:
\.br\ Random Words \.br\\1 Testing\.br\2\ Check
So the \.br\ are correct, however the backslashes in \1 and 2\ are not.
So I attempted a regular expression to match any \ which is not followed by a .br but that failed because it would match the closing \ in \.br\
I then looked up a few similar questions on stackoverflow and most of them stated that a series of lookaheads can be used as an 'and' operator and so I tried this:
/(?!\\\.br)\\(?!\.br\\)/
What I attempted to do was match any backslash that was neither preceded by a \.br nor followed by a .br\, but it didn't seem to work.
Any help would be appreciated. I hope I haven't missed out any details in the question.
Thanks,
Sid

Close. (?!PAT) means "not followed by PAT". You want "not preceded by PAT".
(?<!\\\.br)\\(?!\.br\\)
The following will be a bit faster:
\\(?<!\\\.br\\)(?!\.br\\)
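As a rough check in JavaScript, the first pattern above can be run against the sample text from the question. This is only a sketch: it assumes an engine with lookbehind support (ES2018 or later), and the names `sample`, `strayBackslash` and `strayBackslashIndexes` are made up for illustration.

```javascript
// Sample text from the question; each "\\" here is one literal backslash.
const sample = "\\.br\\ Random Words \\.br\\\\1 Testing\\.br\\2\\ Check";

// A backslash that neither closes a \.br\ marker (lookbehind)
// nor opens one (lookahead). Assumes lookbehind support (ES2018+).
const strayBackslash = /(?<!\\\.br)\\(?!\.br\\)/g;

// Collect the position of every stray backslash.
const strayBackslashIndexes = [...sample.matchAll(strayBackslash)].map((m) => m.index);
console.log(strayBackslashIndexes); // [ 24, 40 ]
```

The two positions reported are the backslash before the 1 and the backslash after the 2; the four backslashes belonging to \.br\ markers are skipped.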

I would use perl, and with a \G anchor and a \K meta character (and some atomic/possessive parts to improve efficiency):
\G(?>\\\.br\\|[^\\]++)*+\K\\
It should be faster than using lookarounds, since there's no duplication of matches (going over the same substring more than once, which is what lookarounds do).
regex101 demo.
Matches completed with 24 and 21 steps respectively (as opposed to using lookarounds using 36 and 22 steps, plus 4 failing steps).

(?:\\(?!\.br)\\)+(\S+)
The regex above will capture those characters inside backslashes that are not .br.
*Please note that the number 2 in \.br\2\ will not be captured as .br\ is correctly typed.

Related

How to replace my current regular expression without using negative lookbehind

I have the following regular expression which matches on all double quotes besides those that are escaped:
i.e:
The regular expression is as follows:
((?<![\\])")
How could I alter this to no longer use the negative lookbehind as it is not supported on some browsers?
Any help is greatly appreciated, thanks!
I wasn't able to get anything currently working
You can match
/\\"|(")/
and keep only captured matches. Being so simple, it should work with almost every regex engine.
Demo
This matches what you don't want (\\")--to be discarded--and captures what you do want (")--to be kept.
This technique has been referred to by one regex expert as The Greatest Regex Trick Ever. To get to the punch line at the link search for "(at last!)".
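A minimal JavaScript sketch of "match what you don't want, capture what you do" might look like the following. The function name `unescapedQuoteIndexes` and the sample string are illustrative assumptions, and `String.prototype.matchAll` requires an ES2020-capable engine.

```javascript
// Return the positions of double quotes that are not preceded by a backslash.
// The first alternative consumes \" pairs without capturing; only bare quotes
// end up in group 1.
function unescapedQuoteIndexes(text) {
  const indexes = [];
  for (const m of text.matchAll(/\\"|(")/g)) {
    if (m[1] !== undefined) {
      indexes.push(m.index); // keep only matches where the group participated
    }
  }
  return indexes;
}

// The string is: say "hi \" there"
const quoteSample = 'say "hi \\" there"';
console.log(unescapedQuoteIndexes(quoteSample)); // [ 4, 16 ]
```

The escaped quote at position 9 is matched by the first alternative and then discarded, so no lookbehind is needed.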
Neither of these may be a completely satisfactory solution.
This regex won't match just the unescaped "; additional logic is required to check whether the first character of the match is " and to adjust the match position:
(?:^|[^\\])(")
This may be a better choice, but it depends on positive lookahead - which may have the same issue as negative lookbehind.
Version 1a (again requires additional logic)
(?:^|\b)(?=[^\\])(")
Version 2a (depends on positive lookahead)
(?:^|\b|\\\\)(?=[^\\])(")
Assuming you need to also handle escaped slashes followed by escaped quotes (not in the question, but ok):
Version 1a (requires the additional logic):
(?:^|[^\\]|\\\\)(")
Building on this answer, I'd like to add that you may also want to ignore escaped backslashes, and match the closing quote in this string:
"ab\\"
In that case, /\\[\\"]|(")/g is what you're after.
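To see the difference on the string "ab\\", here is a small comparison in JavaScript. The helper name `capturedQuoteIndexes` is hypothetical; it simply keeps the matches in which group 1 participated.

```javascript
// Collect the positions of quotes captured by group 1 of the given pattern.
function capturedQuoteIndexes(pattern, text) {
  return [...text.matchAll(pattern)]
    .filter((m) => m[1] !== undefined)
    .map((m) => m.index);
}

// Six characters: quote, a, b, backslash, backslash, quote
const escapedBackslashText = '"ab\\\\"';

// Treats the second backslash plus the closing quote as an escaped quote.
console.log(capturedQuoteIndexes(/\\"|(")/g, escapedBackslashText)); // [ 0 ]

// Consumes the escaped backslash as a pair first, so the closing quote is kept.
console.log(capturedQuoteIndexes(/\\[\\"]|(")/g, escapedBackslashText)); // [ 0, 5 ]
```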

JS RegEx for finding number of lines in a page, separated by form feed \f

I have a use case that requires a plain-text file whose lines consist of at most 38 characters and whose 'pages' consist of at most 28 lines. To enforce this, I'm using regular expressions. I was able to enforce the line length without any problems, but the page length is proving to be much trickier.
After several iterations, I arrived at the following regular expression, which I feel should work but doesn't.
let expression = /(([^\f]*)(\r\n)){29,}\f/;
It simply results in no matches.
If anyone could provide some feedback, I'd greatly appreciate it! - Jacob
Edit 1 - removed code block around second expression, it was probably making my question confusing.
Edit 2 - removed following text, it's not pertinent:
As a comparison, the following expression results in a single match, the entire document. I'm assuming it's matching all lines up until the final
let expression = /(.*(\r\n)){29,}
Edit 3 - So after some thinking, I realized that my issue is due to the initial section of the regex that matches any characters before a newline is including newlines. Therefore, I believe I need to match any characters before a newline EXCEPT (\f\r\n). However, I'm now having trouble implementing this. I tried the following:
let expression = /([^\f^\r^\n]*(\r\n)){29,}\f/;
But it's also not matching. I'm assuming that my negations are wrong...
Edit 4 - I have the following regex that matches each line: let expression = /([^\f\r\n]{0,}(\r\n))/;
This is pretty close to what I want. All I need now is to match any instances of 29 or more lines followed by \f
Thanks for all the help to those who commented, a friend ended up helping me get the final regex
let expression = /([^\f\r\n]*?\r??\n){29,}?\f/;
Edit:
Since you have clarified your problem and provided your updated regex:
/([^\f^\r^\n]*(\r\n)){29,}\f/;
Your negations are not right here, use [^\f\r\n] instead of [^\f^\r^\n]. This will negate all of \f, \r, and \n.
So, your regex becomes:
/([^\f\r\n]*(\r\n)){29,}\f/;
This will match 29 or more lines of characters (that can be anything but \f, \r or \n), the whole thing followed by a single \f.
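A quick way to try the corrected pattern is to generate pages of a known length. The sketch below assumes \r\n line endings and a \f at the end of every page; `makePage` and `hasOverlongPage` are made-up helper names, and the inner capture group around \r\n is dropped since it is not needed for a yes/no test.

```javascript
// 29 or more complete lines with no form feed in between, then a form feed.
const overlongPage = /([^\f\r\n]*\r\n){29,}\f/;

// Build one page of `lineCount` short lines, terminated by a form feed.
function makePage(lineCount) {
  return "line\r\n".repeat(lineCount) + "\f";
}

// True if any page in the document has more than 28 lines.
function hasOverlongPage(documentText) {
  return overlongPage.test(documentText);
}

console.log(hasOverlongPage(makePage(28) + makePage(28))); // false
console.log(hasOverlongPage(makePage(28) + makePage(29))); // true
```

Because [^\f\r\n] cannot consume a form feed, the 29 repetitions can never span two pages, which is why two 28-line pages in a row do not trigger a match.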
Original answer:
Your current regular expression:
let expression = /(([^\f]*)(\r\n)){29,}\f/;
Matches strings that consist of 29 or more lines (separated by \r\n), the whole thing followed by one single \f.
As far as I understood, you want each of your lines to end with \f. Did you mean to include the \f inside?
let expression = /(([^\f]*)(\r\n\f)){29,}/;

How to check for odd numbers of backslashes in a regex using Javascript?

I have recently asked a question regarding an error I have been getting using a RegExp constructor in Javascript with lookbehind assertion.
What I want to do is check for a number input bigger than 5 that is preceded by an odd number of backslashes, in other words, one that is not preceded by an escaped backslash.
Here is an example.
\5 // match !
\\5 // no match !
\\\5 // match!
The Regex I found online is
(?<!\\)(?:\\{2})*\\(?!\\)([5-9]|[1-9]\d)
But the problem here is that (?<!\\) causes a problem with javascript throwing an error invalid regex group.
Is there a workaround for this ?
Finally, I know that my current regex also may have an error regarding the detection of a number larger than 5, for example \55 will not match. I would appreciate your help.
thank you
JS doesn't support lookbehinds (at least not all major browsers do), hence the error. You could try:
(?:^|[^\\\n])\\(?:\\{2})*(?![0-4]\b)\d+
Or if you care about decimal numbers:
(?:^|[^\\\n])\\(?:\\{2})*(?![0-4](?:\.\d*)?\b)\d+(?:\.\d*)?
Live demo
Note: You don't need \n if you don't have multi line text.
Regex breakdown:
(?: Beginning of non-capturing group
^ Start of line
| Or
[^\\\n] Match any character except a backslash or a newline
) End of non-capturing group
\\(?:\\{2})* Match a backslash followed by an even number of backslashes (an odd number in total)
(?![0-4](?:\.\d*)?\b) Following number shouldn't be less than 5 (care about decimal numbers)
\d+(?:\.\d*)? Match a number
JS code:
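// Inside the template literal, each "\\" is one literal backslash in the test string.
var str = `\\5
\\\\5
\\\\\\5
\\\\\\4
\\4.
\\\\\\6
`;
// g: collect every match; m: let ^ match at the start of each line.
console.log(
str.match(/(?:^|[^\\\n])\\(?:\\{2})*(?![0-4](?:\.\d*)?\b)\d+(?:\.\d*)?/gm)
)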

Unable to find a string matching a regex pattern

When I try to submit a form, a JavaScript regex validation always fails for the string I enter.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?
This regex won't match against anything because of that $? in the middle of the string.
Apparently using the optional modifier ? on the end string symbol $ is not correct (if you paste it on https://regex101.com/ it will give you an error indeed). If the javascript parser ignores the error and keeps the regex as it is this still means you are going to match an end string in the middle of a string which is supposed to continue.
Unescaped it was supposed to match a \$ (dollar symbol) but as it is written it won't work.
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool and edit the string inside the JavaScript code (this assumes there is no server-side check, and that such a check is not wrong as well). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped, even \\\\w\\\\ww%jpg is a match).
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
That should match exactly what you are looking for. Remove the optional comma at the end if you feel like it, of course.
If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After the ^ start anchor comes the first alternation, (([a-zA-Z]:)|(\\{2}\w+)\$?), which matches either a letter followed by a colon, or two backslashes followed by one or more word characters and an optional literal $. Some of the parentheses inside are unnecessary.
The second part (\\(\w[\w].*))+ matches a backslash, followed by two word characters \w[\w] which looks weird because it's equivalent to \w\w (don't need a character class for second \w). Followed by any amount of any character. This whole thing one or more times.
In the last part (.jpeg|.JPEG|.jpg|.JPG) one probably forgot to escape the dot for matching a literal. \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
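To check the more readable pattern against the strings mentioned in this thread, a short JavaScript sketch could look like this. The names `imagePathPattern`, `shouldMatch` and `shouldNotMatch` are invented for the example; note that backslashes are doubled inside the string literals.

```javascript
// The simplified pattern from the answer above, written as a regex literal.
const imagePathPattern = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

// Strings the answers say should match (each "\\" is one backslash).
const shouldMatch = ["A:\\12anything.JPEG", "\\\\1$\\anything.jpg", "c:\\zz.jpg"];

// Strings the question tried, none of which contain a backslash segment.
const shouldNotMatch = ["abc.jpg", "abc:.jpg", "a:.jpg", "a:asdas.jpg"];

for (const s of shouldMatch) {
  console.log(s, imagePathPattern.test(s)); // each prints true
}
for (const s of shouldNotMatch) {
  console.log(s, imagePathPattern.test(s)); // each prints false
}
```

All four strings from the question fail for the same reason: after the drive letter (or \\name prefix) the pattern requires at least one backslash followed by two word characters.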

Regular expression - JavaScript

I just wanted to ask for an example of a string that would match this regular expression for JS:
/\/[a-z]{2,6}\/(\([0-9]+\)?$/
But this bit confuses me: /(\([0-9]+\)?$/
If I could get an example of a string using this regular expression and a brief explanation, that would be enough to clear it up for me.
Thanks!
EDITED: Sorry for the trouble, I missed a parentheses, however I just want to clear up, a string that would match would be such as:
/ab/(12345) or /abcdef/(1) etc right?
Here is a visual representation of @Paul Roub's explanation:
EDITED with last pattern
\/[a-z]{2,6}\/\([0-9]+\)$
Debuggex Demo
This example is OK : /abcdef/(12345)
\/
A forward slash (escaped), followed by
[a-z]{2,6}
between two and six lowercase English letters, followed by
\/
another slash
([0-9]+)?
the inside part - one or more digits. ? means "zero or one", so we're looking for a string of digits, or nothing. The parentheses would let this number be captured as a group for later processing
$
and the end of the string.
Things that would match:
/ab/0
/ab/
/acdefg/12345
things that would not match
/a/0
/abcdefgh/12345
/ab/0x
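The two lists above can be verified with a few lines of JavaScript. This sketch uses the pattern as broken down in this answer (with the optional, unescaped digit group); the variable names are illustrative only.

```javascript
// Slash, 2 to 6 lowercase letters, slash, optional digits, end of string.
const slashPattern = /\/[a-z]{2,6}\/([0-9]+)?$/;

const expectedMatches = ["/ab/0", "/ab/", "/acdefg/12345"];
const expectedNonMatches = ["/a/0", "/abcdefgh/12345", "/ab/0x"];

for (const s of expectedMatches) {
  const m = slashPattern.exec(s);
  // m[1] holds the captured digits, or undefined when there are none.
  console.log(s, m !== null, m ? m[1] : undefined);
}
for (const s of expectedNonMatches) {
  console.log(s, slashPattern.test(s)); // each prints false
}
```

For "/ab/" the match succeeds but the capture group is undefined, which is the "or nothing" case described above.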
That regexp is malformed.
Opening bracket is escaped, but closing one not
Opening paren is not escaped but closing one is
I think the intended regexp was:
/\/[a-z]{2,6}\/([0-9]+)?$/
This would match:
/ab/1
/ab/
/abc/123
Wouldn't match:
/a/1
ab/1
/abcdefg/123
Cheers.
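For anyone who wants to confirm that the pattern as posted in the question does not even compile, a small JavaScript check along these lines should do. `compiles` is a hypothetical helper; it relies only on the fact that the RegExp constructor throws a SyntaxError for an invalid pattern.

```javascript
// Return true if the pattern source compiles, false if it is a syntax error.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// String.raw keeps the backslashes exactly as typed.
const asPosted = String.raw`\/[a-z]{2,6}\/(\([0-9]+\)?$`;
const intended = String.raw`\/[a-z]{2,6}\/([0-9]+)?$`;

console.log(compiles(asPosted)); // false: the unescaped ( is never closed
console.log(compiles(intended)); // true
```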
