RegEx false negative with .test() - javascript

I'm making a Chrome extension that searches a page for a dollar amount (a number with no more than two decimal places immediately preceded by a "$") then tacks on a bit with how much that value would be in another currency. I found a commonly used regex that matches exactly those parameters.
/^\$?\-?([1-9]{1}[0-9]{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\-?\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\(\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))\)$/g
so I'm thinking I have a nice headstart. I've only been coding a couple of months and of all the concepts I've encountered, regex's give me the most headache. I test out my shiny new expression with:
var regex = /^\$?\-?([1-9]{1}[0-9]{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\-?\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\(\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))\)$/g;
var str = "The total it $2.25 Would you like paper or plastic?";
var r = regex.test(str);
console.log(r);
and of course that sucker returns false! I tried a few more strings with "2.25" or "$2" or "$2.256" just to be sure and they all returned false.
I am thoroughly stumped. The expression came recommended, I'm using .test() correctly. All I can think of is it's probably some small newbish detail that has nothing to do with regex's.
Thanks for your time.

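Your overly complex regular expression only matches when the entire string is a dollar amount. Remove the ^ and $, which anchor the match to the beginning and end of the string, respectively. Then remove the /g flag: it is meant for finding multiple matches, and with .test() it makes the regex remember its position (lastIndex) between calls, which you don't need here.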
What's wrong with checking for /\$\d+\.\d\d/?
I find http://regex101.com/ to be a helpful resource.
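To illustrate the points above, here is a minimal sketch run against the sample sentence from the question. The simplified pattern and all variable names are assumptions chosen for illustration, not the asker's original regex.

```javascript
// Sample sentence from the question.
const sentence = "The total it $2.25 Would you like paper or plastic?";

// Anchored: only matches when the WHOLE string is a dollar amount.
const anchored = /^\$\d+\.\d\d$/;
// Unanchored: matches a dollar amount anywhere in the string.
const unanchored = /\$\d+\.\d\d/;

console.log(anchored.test(sentence));   // false
console.log(unanchored.test(sentence)); // true
console.log(anchored.test("$2.25"));    // true

// With the g flag, test() resumes from lastIndex, so repeated calls
// on the same string do not always give the same answer.
const stateful = /\$\d+\.\d\d/g;
const first = stateful.test(sentence);  // true, lastIndex now sits after "$2.25"
const second = stateful.test(sentence); // false, nothing left to match; lastIndex resets to 0
console.log(first, second);
```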

Need a regular expression in javascript [duplicate]

I know that I can negate group of chars as in [^bar] but I need a regular expression where negation applies to the specific word - so in my example how do I negate an actual bar, and not "any chars in bar"?
A great way to do this is to use negative lookahead:
^(?!.*bar).*$
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead [is any regex pattern].
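A quick way to see the lookahead at work in JavaScript (the variable name is just for illustration):

```javascript
// Matches only strings that do not contain "bar" anywhere.
const noBar = /^(?!.*bar).*$/;

console.log(noBar.test("foo"));       // true
console.log(noBar.test("foobarbaz")); // false
console.log(noBar.test("ba r"));      // true, the letters are not contiguous
```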
Unless performance is of utmost concern, it's often easier just to run your results through a second pass, skipping those that match the words you want to negate.
Regular expressions usually mean you're doing scripting or some sort of low-performance task anyway, so find a solution that is easy to read, easy to understand and easy to maintain.
Solution:
^(?!.*STRING1|.*STRING2|.*STRING3).*$
xxxxxx OK
xxxSTRING1xxx KO (rejected, as desired)
xxxSTRING2xxx KO (rejected, as desired)
xxxSTRING3xxx KO (rejected, as desired)
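If the list of strings to reject is only known at run time, the same pattern can be assembled from an array. This is a sketch: escapeRegExp and buildExclusionRegex are hypothetical helper names, not part of any library.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds ^(?!.*(?:A|B|C)).*$ from a list of words.
function buildExclusionRegex(words) {
  const alternatives = words.map(escapeRegExp).join("|");
  return new RegExp("^(?!.*(?:" + alternatives + ")).*$");
}

const rejectAny = buildExclusionRegex(["STRING1", "STRING2", "STRING3"]);
console.log(rejectAny.test("xxxxxx"));        // true  (OK)
console.log(rejectAny.test("xxxSTRING2xxx")); // false (KO)
```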
You could either use a negative look-ahead or look-behind:
^(?!.*?bar).*
^(.(?<!bar))*?$
Or use just basics:
^(?:[^b]+|b(?:$|[^a]|a(?:$|[^r])))*$
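The first two match anything that does not contain bar. The basic version is only an approximation: it consumes the character that follows a b, so it wrongly accepts inputs such as bbar.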
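The three patterns can be compared side by side in JavaScript (lookbehind needs a reasonably recent engine). The inputs are made up for illustration; bbar is included because the hand-written pattern accepts it even though it contains bar.

```javascript
const lookahead = /^(?!.*?bar).*/;
const lookbehind = /^(.(?<!bar))*?$/;
const basic = /^(?:[^b]+|b(?:$|[^a]|a(?:$|[^r])))*$/;

// Collect [lookahead, lookbehind, basic] results per input.
const results = {};
for (const input of ["foo", "bab", "foobar", "bbar"]) {
  results[input] = [lookahead.test(input), lookbehind.test(input), basic.test(input)];
  console.log(input, results[input]);
}
// foo    [ true, true, true ]
// bab    [ true, true, true ]
// foobar [ false, false, false ]
// bbar   [ false, false, true ]   <- the basic pattern lets this one through
```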
The following regex will do what you want (as long as negative lookbehinds and lookaheads are supported), matching things properly; the only problem is that it matches individual characters (i.e. each match is a single character rather than all characters between two consecutive "bar"s), possibly resulting in a potential for high overhead if you're working with very long strings.
b(?!ar)|(?<!b)a|a(?!r)|(?<!ba)r|[^bar]
I came across this forum thread while trying to identify a regex for the following English statement:
Given an input string, match everything unless this input string is exactly 'bar'; for example I want to match 'barrier' and 'disbar' as well as 'foo'.
Here's the regex I came up with
^(bar.+|(?!bar).*)$
My English translation of the regex is "match the string if it starts with 'bar' and it has at least one other character, or if the string does not start with 'bar'".
The accepted answer is nice but is really a work-around for the lack of a simple sub-expression negation operator in regexes. This is why grep --invert-match exists. So in *nixes, you can accomplish the desired result using pipes and a second regex.
grep 'something I want' | grep --invert-match 'but not these ones'
Still a workaround, but maybe easier to remember.
If it's truly a word, bar, that you don't want to match, then:
^(?!.*\bbar\b).*$
The above will match any string that does not contain bar that is on a word boundary, that is to say, separated from non-word characters. However, the period/dot (.) used in the above pattern will not match newline characters unless the correct regex flag is used:
^(?s)(?!.*\bbar\b).*$
Alternatively:
^(?![\s\S]*\bbar\b)[\s\S]*$
Instead of using any special flag, we are looking for any character that is either white space or non-white space. That should cover every character.
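In JavaScript the flag in question is s (dotAll), written after the closing slash rather than inline. A small sketch with made-up inputs follows; note that in the class-based variant the lookahead also uses [\s\S], otherwise it could not see a bar on a later line.

```javascript
// Variant 1: the s flag lets "." match newlines too.
const dotAll = /^(?!.*\bbar\b).*$/s;
// Variant 2: no flag, [\s\S] stands in for "any character".
const classBased = /^(?![\s\S]*\bbar\b)[\s\S]*$/;

console.log(dotAll.test("foo\nbar"));         // false, "bar" is on the second line
console.log(classBased.test("foo\nbar"));     // false
console.log(dotAll.test("foo\nbarrier"));     // true, "barrier" is not the word "bar"
console.log(classBased.test("foo\nbarrier")); // true
```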
But what if we would like to match words that might contain bar, but just not the specific word bar?
(?!\bbar\b)\b[A-Za-z-]*bar[a-z-]*\b
(?!\bbar\b) Assert that the next input is not bar on a word boundary.
\b[A-Za-z-]*bar[a-z-]*\b Matches any word on a word boundary that contains bar.
See Regex Demo
Extracted from this comment by bkDJ:
^(?!bar$).*
The nice property of this solution is that it's possible to clearly negate (exclude) multiple words:
^(?!bar$|foo$|banana$).*
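A short JavaScript check of that pattern (the test inputs are my own):

```javascript
// Rejects only strings that are exactly "bar", "foo" or "banana".
const notExactly = /^(?!bar$|foo$|banana$).*/;

console.log(notExactly.test("bar"));     // false
console.log(notExactly.test("banana"));  // false
console.log(notExactly.test("barrier")); // true
console.log(notExactly.test("disbar"));  // true
console.log(notExactly.test("food"));    // true
```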
I wish to complement the accepted answer and contribute to the discussion with my late answer.
@ChrisVanOpstal shared this regex tutorial which is a great resource for learning regex.
However, it was really time consuming to read through.
I made a cheatsheet for mnemonic convenience.
This reference is based on the braces [], (), and {} leading each class, and I find it easy to recall.
Regex = {
'single_character': ['[]', '.', {'negate':'^'}],
'capturing_group' : ['()', '|', '\\', 'backreferences and named group'],
'repetition' : ['{}', '*', '+', '?', 'greedy vs. lazy'],
'anchor' : ['^', '\b', '$'],
'non_printable' : ['\n', '\t', '\r', '\f', '\v'],
'shorthand' : ['\d', '\w', '\s'],
}
Just thought of something else that could be done. It's very different from my first answer, as it doesn't use regular expressions, so I decided to make a second answer post.
Use your language of choice's split() method equivalent on the string with the word to negate as the argument for what to split on. An example using Python:
>>> text = 'barbarasdbarbar 1234egb ar bar32 sdfbaraadf'
>>> text.split('bar')
['', '', 'asd', '', ' 1234egb ar ', '32 sdf', 'aadf']
The nice thing about doing it this way, in Python at least (I don't remember if the functionality would be the same in, say, Visual Basic or Java), is that it lets you know indirectly when "bar" was repeated in the string due to the fact that the empty strings between "bar"s are included in the list of results (though the empty string at the beginning is due to there being a "bar" at the beginning of the string). If you don't want that, you can simply remove the empty strings from the list.
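For what it's worth, JavaScript's String.prototype.split behaves the same way on this input, empty strings included. A sketch using the same sample text:

```javascript
const text = "barbarasdbarbar 1234egb ar bar32 sdfbaraadf";

// Empty strings mark places where "bar" was adjacent to another "bar"
// or sat at the very start of the string.
const parts = text.split("bar");
console.log(parts);
// [ '', '', 'asd', '', ' 1234egb ar ', '32 sdf', 'aadf' ]

// Drop the empty strings if they are not wanted.
const nonEmpty = parts.filter((part) => part !== "");
console.log(nonEmpty);
// [ 'asd', ' 1234egb ar ', '32 sdf', 'aadf' ]
```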
I had a list of file names, and I wanted to exclude certain ones, with this sort of behavior (Ruby):
files = [
'mydir/states.rb', # don't match these
'countries.rb',
'mydir/states_bkp.rb', # match these
'mydir/city_states.rb'
]
excluded = ['states', 'countries']
# set my_rgx here
result = WankyAPI.filter(files, my_rgx) # I didn't write WankyAPI...
assert result == ['mydir/city_states.rb', 'mydir/states_bkp.rb']
Here's my solution:
excluded_rgx = excluded.map{|e| e+'\.'}.join('|')
my_rgx = /(^|\/)((?!#{excluded_rgx})[^\.\/]*)\.rb$/
My assumptions for this application:
The string to be excluded is at the beginning of the input, or immediately following a slash.
The permitted strings end with .rb.
Permitted filenames don't have a . character before the .rb.

Having trouble with regular expressions

I actually have 2 questions. There appears to be a knowledge gap in my understanding of regular expressions, so I was wondering if somebody could help me out.
1)
function LongestWord(sen) {
var word = /[^\W\d]([a-z]+)[$\W\d]/gi;
var answer = word.exec(sen);
return answer;
}
console.log(LongestWord("9hi3"));
Why does this return [hi3, i] as opposed to [9hi3, hi] as intended? I am clearly stating that before a letter, either the beginning, a number, or a non-word character MUST be in my match. I also have the + symbol, which, being greedy, should take the entire group hi.
2)
function LongestWord(sen) {
var word = /[\b\d]([a-z]+)[\b\d]/gi;
var answer = word.exec(sen);
return answer;
}
console.log(LongestWord("hi"));
More importantly, why does this return null? #1 was my attempted fix to this. But you get the idea of what I'm trying to do here.
PLEASE TELL ME WHAT IS WRONG WITH MY THINKING IN BOTH PROBLEMS RATHER THAN GIVING ME A SOLUTION. IF I DON'T LEARN WHAT I DID WRONG I WILL GO ON TO REPEAT THE SAME MISTAKES. Thank you!
Let's walk through your regular expressions, using your example string: 9hi3
1) [^\W\d]([a-z]+)[$\W\d]
First, we have [^\W\d]. Normally, ^ matches the start of the string, but when it is inside [], it actually negates that block. So, [^\W\d] actually means any one character that IS a word character, and not a digit. This obviously skips the 9, since that is a digit, and matches on the h.
The next part, ([a-z]+), matches what you are expecting, except the h was already matched, so it only matches the i.
Then, [$\W\d] is matching a $ symbol, a non-word character, or a digit. Notice that just like ^, the $ does NOT match the end of the string when inside the [].
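The walk-through above can be confirmed in a console. The second pattern below is only one possible way to express the original intent (anchors moved outside the brackets into an alternation); it is my assumption, not something the asker wrote.

```javascript
// The asker's first pattern (g flag dropped so exec() is not stateful).
const original = /[^\W\d]([a-z]+)[$\W\d]/i;
const originalResult = original.exec("9hi3");
console.log(originalResult[0], originalResult[1]); // "hi3" "i"

// Assumed alternative: ^ and $ only act as anchors outside [].
const intended = /(?:^|[\W\d])([a-z]+)(?:$|[\W\d])/i;
const intendedResult = intended.exec("9hi3");
console.log(intendedResult[0], intendedResult[1]); // "9hi3" "hi"

// It also matches a bare word, since ^ and $ can stand in for the delimiters.
const bareResult = intended.exec("hi");
console.log(bareResult[0], bareResult[1]); // "hi" "hi"
```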
2) [\b\d]([a-z]+)[\b\d]
For this one, you should start by looking at the documentation for exec to see why it can return null. Specifically:
If the match fails, the exec() method returns null.
So, you know that the match is failing. Why?
Again, your confusion is coming from not understanding how special characters change meaning when inside []. In this case, \b changes from matching a word-boundary, to matching a backspace character.
It is worth noting that your second regex will match the string you tested your first one with, 9hi3, because it begins and ends with digits. However, you tested it with hi.
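A small sketch showing what [\b] really matches (variable names are illustrative; the g flag is dropped to keep exec() stateless):

```javascript
const pattern = /[\b\d]([a-z]+)[\b\d]/i;

// No digit or backspace character around "hi", so the match fails.
const plain = pattern.exec("hi");
console.log(plain); // null

// Digits on both sides satisfy the bracket expressions.
const digits = pattern.exec("9hi3");
console.log(digits[0], digits[1]); // "9hi3" "hi"

// Inside [], \b is the backspace character U+0008, not a word boundary.
const backspaces = pattern.exec("\u0008hi\u0008");
console.log(backspaces[1]);          // "hi"
console.log(/[\b]/.test("\u0008"));  // true
console.log(/[\b]/.test("a b"));     // false, no backspace in the string
```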
I hope these explanations have helped you.
For future reference, you should take a look at the RegExp guide on MDN.
Also, a great tool for testing regular expressions is regexpal. I highly recommend using it to help you figure out exactly what your regular expressions are doing.

How to invert an existing regular expression in javascript?

I have created a regex to validate time as follows : ([01]?\d|2[0-3]):[0-5]\d.
Matches TRUE : 08:00, 09:00, 9:00, 13:00, 23:59.
Matches FALSE : 10.00, 24:00, 25:30, 23:62, afdasdasd, ten.
QUESTION
How to invert a javascript regular expression to validate if NOT time?
NOTE - I have seen several ways to do this on stack but cannot seem to make them work for my expression because I do not understand how the invert expression should work.
http://regexr.com?38ai1
ANSWER
Simplest solution was to invert the javascript statement and NOT the regex itself.
if (!(/^(([01]?\d|2[0-3]):[0-5]\d)/.test(obj.value)))
Simply adding ! to create an if NOT statement.
A regular expression is usually used for capturing some specific condition(s) - the more specific, the better the regex. What you're looking for is an extremely broad condition to match because just about everything wouldn't be considered "time" (a whitespace, a special character, an alphabet character, etc etc etc).
As suggested in the comments, for what you're trying to achieve, it makes much more sense to look for a time and then check (and negate) the result of that regular expression.
As I mentioned in the comment, the better way is to negate the test rather than create a new regexp that matches any non-time.
However, if you really need the regexp, you could use negative lookahead to match the start of something that is not a time:
/^(?!([01]?\d|2[0-3]):[0-5]\d$)/
DEMO: http://regex101.com/r/bD3aG4
Note that I anchored the regexp (^ and $), which might not work with what you need it for.
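Both approaches can be checked against the sample values from the question. In this sketch isTime is the asker's pattern with ^ and $ added (an assumption on my part, so that trailing garbage is rejected), and notTime is the lookahead version from above.

```javascript
const isTime = /^([01]?\d|2[0-3]):[0-5]\d$/;
const notTime = /^(?!([01]?\d|2[0-3]):[0-5]\d$)/;

const samples = ["08:00", "9:00", "13:00", "23:59",
                 "10.00", "24:00", "25:30", "23:62", "afdasdasd", "ten"];

// For every sample, negating the time test and using the inverted
// regex should agree.
const agree = samples.every((s) => !isTime.test(s) === notTime.test(s));
console.log(agree); // true

console.log(samples.filter((s) => notTime.test(s)));
// [ '10.00', '24:00', '25:30', '23:62', 'afdasdasd', 'ten' ]
```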

What does this Javascript statement (regular expression) mean?

What does the RegEx test for here?
function chksql(){
if (/^\s*(?:delete|drop|truncate|alter)/.test(v)) return false;
}
I just know it's mixed with regular expression, but can't figure out what it means.
It means it's checking whether v is a string that starts with zero or more whitespace characters followed by delete, drop, truncate, or alter.
So if v were " alter" this would return false.
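A few sample inputs make the behaviour (and its gaps) easier to see. The inputs and the variable name are made up for illustration.

```javascript
// Same pattern as in the question.
const startsWithDestructive = /^\s*(?:delete|drop|truncate|alter)/;

console.log(startsWithDestructive.test("   alter table t"));       // true
console.log(startsWithDestructive.test("select * from t"));        // false
// No i flag, so upper-case keywords slip through.
console.log(startsWithDestructive.test("DROP TABLE t"));           // false
// Only the start of the string is checked.
console.log(startsWithDestructive.test("select 1; drop table t")); // false
```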
see docs: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
I should add that checking against this happening on the client side is a really bad idea. It will be circumvented.
There are a number of good online tools for testing and exploring regular expressions these days.
One I like is debuggex.com. Here's what it displays for your regular expression:
^\s*(?:delete|drop|truncate|alter)
Debuggex Demo
To interpret that, you still need to do a bit of homework like finding out what ^ and \s mean, but the "railroad diagram" helps show what the regular expression is testing for. Just follow the lines to see what it will match. You can also try typing in test strings at the link above to see how it matches (or doesn't match) them.
Another good site is regex101.com. Here's your regular expression there. They give you an English description of what the regular expression is looking for.
Also, heed mkoryak's advice about trying to sanitize SQL on the client!

Regular expression: Get the matched value of each match in a Kleene-star expression?

In particular, is this possible with Javascript?
>> "Version 1.2.3.4".match(/\S+ (\d+)(\.\d+)*/)
["Version 1.2.3.4", "1", ".4"]
It's obvious $2 gets set to the last Kleene-"match". Is there no built-in method to retrieve the rest (".2", ".3")?
If this cannot be done easily in JS, could Perl do it?
UPDATE: Many of the answers so far have been "workarounds" which work because of the simplicity of my example. If the part that repeated that I wanted to match was more than just a number, they wouldn't work.
However, a very valid solution does exist: use /expr/g global regex matching: just filter out the parts that repeat and use that. I find this to be somewhat less flexible than the more generally applicable * operator but it will obviously get the job done in most cases.
Regex in JavaScript, like most other regex flavors, only captures the last value of the capturing group if it is matched repeatedly. The only well known regex lib (that I know of) where you get access to all of the previous matched captures is the one in .NET.
So no, you can't do this in JS.
In Perl there are a couple of ways you can accomplish such things. One of the more elegant is probably to use \G (which works in PCRE too).
For example:
"Version 1.2.3.4" =~ /(?:\S+ |\G(?!^)\.)(\d+)/g
Returns (in list context):
(1, 2, 3, 4)
Why not match the whole version string, then split by .?
>> "Version 1.2.3.4".match(/\S+ (\d+(?:\.\d+)*)/)[1].split('.')
Just capture the whole version number string and then split on the period character.
Regex for matching the whole number: /((?:\d+)(?:\.\d+)*)/
Then simply call split on the resulting capture.
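Put together, the capture-then-split approach might look like this (a sketch; variable names are illustrative):

```javascript
// Capture the whole dotted number in one group...
const versionMatch = "Version 1.2.3.4".match(/\S+ (\d+(?:\.\d+)*)/);
console.log(versionMatch[1]); // "1.2.3.4"

// ...then split it into its components.
const components = versionMatch[1].split(".");
console.log(components); // [ '1', '2', '3', '4' ]
```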
Regex \.?\d+ will return you what you need, but you have to run this regex for all matches, not just one...
var n=str.match(/\.?\d+/g);
If you want to match just numbers without leading dot, then go with regex \d+.
var n=str.match(/\d+/g);
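Here is what those global matches return for the string from the question. The last line uses matchAll, which I am adding as an assumption about what a reader might want when the repeated part has its own capture group; it needs a reasonably modern engine.

```javascript
const str = "Version 1.2.3.4";

// Every run of digits, with its leading dot when there is one.
const withDots = str.match(/\.?\d+/g);
console.log(withDots); // [ '1', '.2', '.3', '.4' ]

// Digits only.
const digitsOnly = str.match(/\d+/g);
console.log(digitsOnly); // [ '1', '2', '3', '4' ]

// matchAll exposes the capture group of each repeated match.
const afterDots = Array.from(str.matchAll(/\.(\d+)/g), (m) => m[1]);
console.log(afterDots); // [ '2', '3', '4' ]
```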
