Unable to find a string matching a regex pattern - javascript

When I try to submit a form, a JavaScript regex validation always fails for the string I enter.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried the following strings against it:
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?

This regex won't match anything because of the $? in the middle of the pattern.
Applying the optional quantifier ? to the end-of-string anchor $ is not valid (if you paste the pattern into https://regex101.com/ it does report an error). Even if the JavaScript parser ignores the error and keeps the regex as it is, you would still be asserting the end of the string in the middle of a string that is supposed to continue.
With one level of escaping removed it was presumably meant to be \$ (a literal dollar sign), but as written it won't work.
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool and edit the string inside the JavaScript code (assuming there is no server-side check as well, and that it is not equally wrong). If you ignore the $?, then a matching string would be \\\\w\\\\ww.jpg (and since the . is unescaped, even \\\\w\\\\ww%jpg is a match).
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
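As a rough check of that last point, here is a minimal sketch. It assumes the doubled backslashes in the question are string-literal escaping for the RegExp constructor, and it uses the corrected pattern with the dots escaped; the names source, pathPattern, candidates and results are made up for illustration.

```javascript
// Assumption: the doubled backslashes are string-literal escaping, so the
// RegExp constructor receives the singly escaped pattern shown above.
const source =
  "^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(\\.jpeg|\\.JPEG|\\.jpg|\\.JPG)$";
const pathPattern = new RegExp(source);

// "c:\\zz.jpg" in source code is the 9-character string c:\zz.jpg.
const candidates = ["abc.jpg", "a:asdas.jpg", "c:\\zz.jpg"];
const results = candidates.map((s) => pathPattern.test(s));

console.log(results); // [false, false, true]
```

The two strings taken from the question fail because neither contains a backslash after the drive letter or share name.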

If you are just looking for a regular expression that matches the strings you listed, go ahead and test this one:
(\w+:?\w*\.(?:jpe?g|JPE?G),?)
That should match exactly what you are looking for. The comma at the end is optional; remove it from the pattern if you feel like it, of course.

If you remove one level of escaping, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After the ^ start anchor comes the first group, (([a-zA-Z]:)|(\\{2}\w+)\$?), which matches either a letter followed by a colon, or two backslashes followed by one or more word characters and an optional literal $. Some of the parentheses inside it are unnecessary.
The second part, (\\(\w[\w].*))+, matches a backslash followed by two word characters. The \w[\w] looks odd because it is equivalent to \w\w (the second \w does not need a character class). That is followed by any number of any characters, and the whole group repeats one or more times.
In the last part, (.jpeg|.JPEG|.jpg|.JPG), the author probably forgot to escape the dot so that it matches a literal dot; \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
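A small sketch comparing the unescaped original with the more readable version on the sample paths above. The variable names and the two extra inputs (abc.jpg and the Xjpg path) are assumptions added for illustration.

```javascript
// The pattern after removing one escape level, and the simplified rewrite.
const original = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$/;
const simplified = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

const samples = [
  "A:\\12anything.JPEG",   // A:\12anything.JPEG
  "\\\\1$\\anything.jpg",  // \\1$\anything.jpg
  "abc.jpg",               // no drive letter or share prefix
  "A:\\12anythingXjpg",    // only the unescaped dot accepts this
];

const comparison = samples.map((s) => [original.test(s), simplified.test(s)]);
console.log(comparison);
// [[true, true], [true, true], [false, false], [true, false]]
```

The last row is the only one where the two patterns disagree, which is the effect of escaping the dot.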

Related

regex validating if string ends with specific set of words [duplicate]

I'm creating a javascript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the dot (.) matches any character except newlines. To match a literal dot, you have to escape it with a backslash (like \.).
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?
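To make the difference between a character set and a group concrete, here is a minimal sketch. The sample URLs and variable names are hypothetical, and the grouped pattern also escapes the dot as suggested above.

```javascript
// [wd|word|qw] is a set of single characters (w, d, |, o, r, q),
// so it accepts any one of them before the "=".
const charClass = /.*baidu.com.*[\/?].*[wd|word|qw]{1}=/;

// (?:wd|word|qw) is an alternation of whole parameter names.
const grouped = /.*baidu\.com.*[\/?].*(?:wd|word|qw)=/;

const urls = [
  "http://www.baidu.com/s?wd=regex",
  "http://www.baidu.com/s?word=regex",
  "http://www.baidu.com/s?qw=regex",
  "http://www.baidu.com/s?id=5", // "d=" satisfies the character set
];

const charClassHits = urls.map((u) => charClass.test(u));
const groupedHits = urls.map((u) => grouped.test(u));

console.log(charClassHits); // [true, true, true, true]
console.log(groupedHits);   // [true, true, true, false]
```

The character-set version appears to work on the intended URLs, which is why the false positive on id= is easy to miss.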

How to prevent regex characters from being changed after page is rendered?

I'm stuck after searching and trying several tests, and I just can't figure out how to fix the following issue.
I use the characters \x3c, \x3e and \x22 in a regex and save it in a variable in *.component.ts, but when I use the variable in the markup/HTML, they are turned into <, > and ". The result is that my pattern doesn't work as expected.
Here is one of my tests on regex101.com; as you can see, it works as it should:
^(?=.*[a-zA-Z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
How can I prevent this and keep the characters as they are in the original when the page is rendered? Is it a behavior of TypeScript or JavaScript browser engine or what? Any hint would be great.
First of all, you need to use double backslashes to introduce literal backslashes into the regex patterns. I.e. if you write "\x22" as a string literal, it is in fact a mere ". So, to define \x22 in a string literal, write "\\x22".
Then, you have
^(?=.*[a-zA-Z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
The lookahead here is redundant because it requires the same set of chars as is required by the consuming part. The lookahead can be removed, or better replaced with the one you need, (?=[^A-Z]*[A-Z]), requiring at least 1 uppercase ASCII letter:
^(?=[^A-Z]*[A-Z])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
As a string literal:
"^(?=[^A-Z]*[A-Z])[A-Za-z\\d!\\x22#$%&'()*+,.:;\\x3c=\\x3e?#[\\]^_`{|}~/\\\\-]{8,50}$"
See the regex demo.
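A minimal sketch of the escaping point, in plain JavaScript rather than inside an Angular component. The variable names and sample passwords are made up for illustration.

```javascript
// A single backslash is consumed by the string literal itself.
const decoded = "\x22";     // one character: "
const preserved = "\\x22";  // four characters: \x22

// The string literal from above, handed to the RegExp constructor.
const passwordPattern = new RegExp(
  "^(?=[^A-Z]*[A-Z])[A-Za-z\\d!\\x22#$%&'()*+,.:;\\x3c=\\x3e?#[\\]^_`{|}~/\\\\-]{8,50}$"
);

const checks = [
  passwordPattern.test("Passw0rd<>"), // uppercase present, allowed chars
  passwordPattern.test("passw0rd<>"), // no uppercase letter
  passwordPattern.test('Quote"d1'),   // the double quote is allowed
  passwordPattern.test("Ab1!"),       // shorter than 8 characters
];

console.log(decoded.length, preserved.length); // 1 4
console.log(checks); // [true, false, true, false]
```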

Regex finding the last string that doesnt contain a number

Usually in my system I have the following string:
http://localhost/api/module
To find the last part of the string (which is my route) I've been using the following:
/[^\/]+$/g
However, there may be cases where my string looks a bit different, such as:
http://localhost/api/module/123
Using the above regex, it would then return 123. When my string looks like this, I know that the last part will always be a number. So my question is: how do I make sure that I always find the last part that does not contain a number?
This is what I came up with, which strictly matches only module for the following lines:
http://localhost/api/module
http://localhost/api/module/123
http://localhost/api/module/123a
http://localhost/api/module/a123
http://localhost/api/module/a123a
http://localhost/api/module/1a3
(?!\w*\d\w*)[^\/][a-zA-Z]+(?=\/\w*\d+\w*|$)
Explanation
I basically just extended your expression with a negative lookahead and a positive lookahead, so it matches only when all of the following conditions are true:
(?!\w*\d\w*) The run of word characters starting here may contain letters, but no digits
[a-zA-Z]+ Really, truly only consists of one or more letters (was needed)
(?=\/\w*\d+\w*|$) The match is followed either by a slash and a segment containing at least one digit, or by the end of the line
See this in action in my sample at Regex101.
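As a quick sanity check, the expression can be run against the six sample lines. This is only a sketch; the names segmentPattern, urls and segments are hypothetical.

```javascript
// The expression from above, used without the g flag so that match()
// returns the first (leftmost) match only.
const segmentPattern = /(?!\w*\d\w*)[^\/][a-zA-Z]+(?=\/\w*\d+\w*|$)/;

const urls = [
  "http://localhost/api/module",
  "http://localhost/api/module/123",
  "http://localhost/api/module/123a",
  "http://localhost/api/module/a123",
  "http://localhost/api/module/a123a",
  "http://localhost/api/module/1a3",
];

const segments = urls.map((url) => {
  const match = url.match(segmentPattern);
  return match ? match[0] : null;
});

console.log(segments); // "module" for each of the six lines
```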
partYouWant = urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
Here it is in action:
urlString="http://localhost/api/module/123"
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
-->"module"
urlString="http://localhost/api/module"
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
-->"module"
It just uses a capture expression to find the last non-numeric part.
It's going to do this too, not sure if this is what you want:
urlString="http://localhost/api/module/123/456"
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/,'$1')
-->"module"
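The same capture can be wrapped in a small helper. This is a sketch under the assumption that returning null is preferable when nothing matches; the function name lastAlphabeticSegment is made up.

```javascript
// Returns the last purely alphabetic path segment that is followed only
// by slashes and digits, or null when the URL does not have that shape.
function lastAlphabeticSegment(url) {
  const match = /^.*\/([a-zA-Z]+)[\/0-9]*$/.exec(url);
  return match ? match[1] : null;
}

const routes = [
  "http://localhost/api/module",
  "http://localhost/api/module/123",
  "http://localhost/api/module/123/456",
  "http://localhost/api/module/123a",
].map(lastAlphabeticSegment);

console.log(routes); // ["module", "module", "module", null]
```

With replace, the non-matching case would return the input string unchanged, so an explicit null makes the failure easier to detect.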
/([0-9])\w+/g
That would select the numbers. You could use it to remove that part from the URL. What language are you using it for?

Javascript Regex Conditional

I have strings like this:
#WTK-56491650H #=> want to capture '56491650H'
#M123456 #=> want to capture 'M123456'
I want to match everything after the # unless there is a dash then I want everything after the dash. I have a feeling I'm close but maybe not. I've found a lot of stuff about javascript regex conditionals and I can never get it to do the if then else part. It only matches after the # and that's it.
This is what I have so far:
/((?=-{1})-(.+)|(?!-{0)#(.+))/
And the demo: https://regex101.com/r/bY0yC6/1
You can use this regex with an optional match to consume everything between # and -:
/#(?:[^-]*-)?([^#-]+)$/mg
Updated RegEx Demo
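A minimal sketch of how this regex could be applied to multi-line input, assuming an environment that supports String.prototype.matchAll. The variable names are hypothetical.

```javascript
// Two tags on separate lines; the m flag lets $ match at each line end.
const text = "#WTK-56491650H\n#M123456";
const tagPattern = /#(?:[^-]*-)?([^#-]+)$/mg;

// Capture group 1 holds the part after the dash, or after the # when
// there is no dash.
const captured = Array.from(text.matchAll(tagPattern), (m) => m[1]);

console.log(captured); // ["56491650H", "M123456"]
```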
Here's a solution which uses non-capturing groups (?:stuff) which I prefer so I don't have to dig through the result groups to find the string I'm interested in.
(?:#)(?:[\w\d]+-)?([\w\d]+)
First it throws out the # character, then throws out the stuff up to and including the - character, if it is there, then groups the rest as your match.
With a single regular expression, your full match will always contain the hash and/or dash because you are using it to define an acceptable string, but the groupings of a match can provide you the information that you're looking for.
You want the string to start with a hash, so your regex should contain the #.
Next, you don't want anything before and including a dash, (.*-)?, and we add a question mark because this part is optional (i.e. there may be no dash).
Finally, we can grab everything that is left into a final group, (.*), which will be your answer.
The full expression is then #(.*-)?(.*), as pointed out by Lux.
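The steps above can be sketched as a small helper that reads the second group. The function name afterHashOrDash is an assumption for illustration.

```javascript
// Group 1 optionally swallows everything up to and including the last
// dash; group 2 is whatever remains.
function afterHashOrDash(tag) {
  const match = /#(.*-)?(.*)/.exec(tag);
  return match ? match[2] : null;
}

const values = ["#WTK-56491650H", "#M123456", "no hash here"].map(afterHashOrDash);

console.log(values); // ["56491650H", "M123456", null]
```

Because .* is greedy, a tag with several dashes yields only the part after the last one.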

JS regular expression, basic lookahead

I cannot figure out, for the life of me, why this regular expression
^\.(?=a)$
does not match
".a"
Does anyone know why?
I am going off the information provided here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
The reason it doesn't work is because the lookahead doesn't actually consume any characters, so your matching position doesn't advance.
^\.(?=a)$
Matches the beginning of line (^ -- this matches) followed by a literal . (\. -- this also matches), and then (without consuming any characters), checks to see if the next character is a literal a ((?=a)). It is, so the lookahead matches. It then asserts that your position is at the end of the string ($). This is not the case, because we're still right after the ., so the match fails.
Another possible matching expression would be
^\.(?=a$)
Which works just as above, but the assertion about the end of the line is contained in the lookahead, so this time, it matches.
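A short sketch of the three variants side by side; the variable names are made up for illustration.

```javascript
const input = ".a";

// The lookahead checks for "a" without consuming it, so $ is tested
// right after the "." and fails.
const original = /^\.(?=a)$/.test(input);

// Moving $ inside the lookahead asserts that "a" is the last character,
// while the match itself still contains only the ".".
const withEndInside = /^\.(?=a$)/.exec(input);

// Consuming the "a" instead of looking ahead matches the whole string.
const consuming = /^\.a$/.exec(input);

console.log(original);         // false
console.log(withEndInside[0]); // "."
console.log(consuming[0]);     // ".a"
```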
Your regex is only going to match a period that's followed by an 'a', without including 'a' in the match.
Another issue is that you're using $ after a character that's basically being ignored.
Remove the $ and it will work as described.
Bonus: I've enjoyed using this lately http://www.regexpal.com/
