JS regular expression, basic lookahead - javascript

I cannot figure out, for the life of me, why this regular expression
^\.(?=a)$
does not match
".a"
anyone know why?
I am going off the information provided here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

The reason it doesn't work is because the lookahead doesn't actually consume any characters, so your matching position doesn't advance.
^\.(?=a)$
Matches the beginning of line (^ -- this matches) followed by a literal . (\. -- this also matches), and then (without consuming any characters), checks to see if the next character is a literal a ((?=a)). It is, so the lookahead matches. It then asserts that your position is at the end of the string ($). This is not the case, because we're still right after the ., so the match fails.
Another possible matching expression would be
^\.(?=a$)
Which works just as above, but the assertion about the end of the line is contained in the lookahead, so this time, it matches.

Your regex is only going to match a period that's followed by an 'a', without including 'a' in the match.
Another issue is that you're using $ after a character that's basically being ignored.
Remove the $ and it will work as described.
Bonus: I've enjoyed using this lately http://www.regexpal.com/

Related

Unable to find a string matching a regex pattern

While trying to submit a form a javascript regex validation always proves to be false for a string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possible match this regex ?
This regex won't match against anything because of that $? in the middle of the string.
Apparently using the optional modifier ? on the end string symbol $ is not correct (if you paste it on https://regex101.com/ it will give you an error indeed). If the javascript parser ignores the error and keeps the regex as it is this still means you are going to match an end string in the middle of a string which is supposed to continue.
Unescaped it was supposed to match a \$ (dollar symbol) but as it is written it won't work.
If you want your string to be accepted at any cost you can probably use Firebug or a similar developer tool and edit the string inside the javascript code (this, assuming there's no server side check too and assuming it's not wrong aswell). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped even \\\\w\\\\ww%jpg is a match)
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
That should match exactly what you are looking for. Remove the optional comma at the end if you feel like it, of course.
If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After ^start the first pipe (([a-zA-Z]:)|(\\{2}\w+)\$?) which matches an alpha followed by a colon or two backslashes followed by one or more word characters, followed by an optional literal $. There is some needless parenthesis used inside.
The second part (\\(\w[\w].*))+ matches a backslash, followed by two word characters \w[\w] which looks weird because it's equivalent to \w\w (don't need a character class for second \w). Followed by any amount of any character. This whole thing one or more times.
In the last part (.jpeg|.JPEG|.jpg|.JPG) one probably forgot to escape the dot for matching a literal. \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A better readable could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!

Need help to find the right regex pattern to match

my RegEx is not working the way i think, it should.
[^a-zA-Z](\d+-)?OSM\d*(?![a-zA-Z])
I will use this regex in a javascript, to check if a string match with it.
Should match:
12345612-OSM34
12-OSM34
OSM56
7-OSM
OSM
Should not match:
-OSM
a-OSM
rOSMann
rOSMa
asdrOSMa
rOSM89
01-OSMann
OSMond
23OSM
45OSM678
One line, represents a string in my javascript.
https://www.regex101.com/r/xQ0zG1/3
The rules for matching:
match OSM if it stands alone
optional match if line starts with digit/s AND is followed by a -
optional match if line ends with digit/s
match all 3 above combined
no match if line starts with a character/word except OSM
no match if line end with chracter/word except OSM
I Hope someone can help.
You can use the following simplified pattern using anchors:
^(?:\d+-)?OSM\d*$
The flags needed (if matching multi-line paragraph) would be: g for global match and m for multi-line match, so that ^ and $ match the begin/end of each line.
EDIT
Changed the (\d+-) match to (?:\d+-) so that it doesn't group.
[^a-zA-Z](\d+-)?OSM\d*(?![a-zA-Z])
[^a-zA-Z] In regex, you specify what you want, not what you don't want. This piece of code says there must be one character that isn't a letter. I believe what you wanted to say is to match the start of a line. You don't need to specify that there's no letter, you're about to specify what there will be on the line anyway. The start of a regex is represented with ^ (outside of brackets). You'll have to use the m flag to make the regex multi-line.
(\d+-)? means one or more digits followed by a - character. The ? means this whole block isn't required. If you don't want foreign digits, you might want to use [0-9] instead, but it's not as important. This part of the code, you got right. However, if you don't need capture blocks, you could write (?:) instead of ().
\d*(?![a-zA-Z]) uses lookahead, but you almost never need to do that. Again, specifying what you don't want is a bad idea because then I could write OSMé and it would match because you didn't specify that é is forbidden. It's much simpler to specify what is allowed. In your case since you want to match line ends. So instead, you can write \d*$ which means zero or more digits followed by the end of the line.
/^(?:\d+-)?OSM\d*$/gm is the final result.

Regular expression - val.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g,"'');

I was learning regular expression, It seems very much confusing to me for now.
val.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, '');
In the above expression
1) which part denotes not to include white space? as i am trying to exclude all non alphanumeric characters.
2) Since i don't want to use even '$' and ''(underscore) can i specify '$' & ''(underscore) in expression something like below?
val.replace(/^[^a-zA-Z0-9$_]*|[^a-zA-Z0-9$_]*/g, '');?
3) As 'x|y' specify that - "Find any of the alternatives specified". Then Why we have used something like this [^a-zA-Z0-9]|[^a-zA-Z0-9] which is same on both sides?
Please help me understand this, Finding it bit confused and difficult.
This regular expression replaces all starting and trailing non alphanumeric characters from the string.
It doesn't specifically specifies whitespace. It just negates every thing other than alphanumeric characters. Whatever inside square bracket is a character set - [Whatever]. A starting cap(^) INSIDE the character set says its a negation. So [^a-zA-Z0-9]* says zero or more characters which are other than a-z, A-z or 0-9.
The $ sign at the end says, to the end of string and nothing to do with $ and _ symbols. That will be already included in the character set as it all non alpha numeric characters.
Refer answer of #smathy.
Also just FYI, AFAIU regular expression can't be learned by scrolling a tutorial. You just need to go through the basics and try out the examples.
Some basic info.
When you read regular expressions, you read them from left to right. That's how the engine does it.
This is important in the case of alternations as the one on the left side(s) are always tried first.
But in the case of a $ (EOL or EOS) anchor, it might be easier to read from right to left.
Built-in assertions like line break anchors ^$ and word boundry \b along with normal assertions look ahead (?=)(?!) and look behind (?<=)(?<!), do not consume characters.
They are like single path in-line conditionals that pass or fail, where only if it passes will the expression to the right of it be examined. So they do actually Match something, they match a condition.
Format your regex so you can see what its doing. (Use a app to help you RegexFormat 5)
^ # BOS
[^a-zA-Z0-9]* # Optional not any alphanum chars
| # or,
[^a-zA-Z0-9]* # Optional not any alphanum chars
$ # EOS
Your regex in global context will always match twice, once at the beginning of the string, once at the end because of the line break anchors and because you don't actually require anything else to match.
So basically you should avoid trying to match (mix) all optional things with the built-in anchors ^$\b. That means your regex is better represented by ^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$ since you don't care if its NOT there (in the case of *, zero or more quantifier).
Good Luck, keep studying.
To answer your third question, the alternatives run all the way to the //s, so both sides are not the same. In the original regex the left alternative is "all non alphanumerics at the start of the string" and the right alternative is "all non alphanumerics at the end of the string".

Can I write a regex expression where one symbol matches twice?

I'm matching words with regex in javascript. The following expression uses whitespace to separate the potential matches:
/(\W)(foo)(\W)/g
This works most of the time, but it fails when there are two matches separated by a single space. (e.g. "foo foo") I think this is because the space that separates them is the last \W of the first match and the first of the second.
Is there any way to modify this expression to work in this edge case?
You can use \b instead of \W. It matches a zero-width word boundary (a boundary between a \w and a \W or the start/end of the string, while \W matches a character which may not exist at the start or end of a string.
Javascript regexes have lookahead, so you can probably do something like this:
/(\W)(foo)(?=\W)/g
I don't think lookbehinds are available, but there are other techniques that have the same effect.
Of course, this is functionally different, in that the lookahead doesn't capture, so it depends on the nature of your problem. The main point here it not that it doesn't capture, but that it doesn't match; thereby avoiding your problem.
Give this a try, I think it will work for you:
/(\W)(foo| )(\W)/g
This will tell the regex to match foo or whitespace between the two \Ws.

Javascript lookahead regular expression

I'm trying to write a regular expression to parse the following string out into three distinct parts. This is for a highlighting engine I'm writing:
"\nOn and available after solution."
I have a regular expression that's dynamically created for any word a user might input. In the above example, the word is "on".
The regular expression expects a word with any amount of white space ([\s]*) followed by the search word (with no -\w following it, eg: on-time, on-wards should not be a valid result. To complicate this, there can be a -,$,< or > symbol following the example, so on-, on> or on$ are valid. This is why there is a negative lookahead after the search word in my regular expression below.
There's a complicated reason for this, but it's not relevant to the question. The last part should be the rest of the sentence. In this example, " and available after solution."
So,
p1 = "\n"
p2 = "On"
p3 = " and available after solution"
I currently have the following regular expression.
test = new RegExp('([\\s]*)(on(?!\\-\\w))([$\\-><]*?\\s(?=[.]*))',"gi")
The first part of this regular expression ([\\s]*)(on(?!\\-\\w))[$\\-><]*? works as expected. The last part does not.
In the last part, what I'm trying to do is force the regular expression engine to match whitespace before matching additional characters. If it can not match a space, then the regular expression should end. However, when I run this regular expression, I get the following results
str1 = "\nOn ly available after solution."
test.exec(str1)
["\n On ", "\n ", "On"]
So it would appear to me that the last positive look ahead is not working. Thanks for any suggestions, and if anyone needs some clarification, let me know.
EDIT:
It would appear that my regular expression was not matching because I didn't realize the following caveat:
You can use any regular expression inside the lookahead. (Note that this is not the case with lookbehind. I will explain why below.) Any valid regular expression can be used inside the lookahead. If it contains capturing parentheses, the backreferences will be saved. Note that the lookahead itself does not create a backreference. So it is not included in the count towards numbering the backreferences. If you want to store the match of the regex inside a backreference, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)). The other way around will not work, because the lookahead will already have discarded the regex match by the time the backreference is to be saved.
The dot in the character class [.] means a literal dot. Change it to just . if you wish to match any character.
The lookahead (?=.*) will always match and is completely pointless. Change it to (.*) if you just want to capture that part of the string.
I think the problem is your positive lookahead on(?!\-\w) is trying to match any on that is not followed by - then \w. I think what you want instead is on(?!\-|\w), which matches on that is not followed by - OR \w

Categories

Resources