Why does .NET match my regex but Javascript does not? - javascript

I needed a regular expression to match fractions and mixed numbers, but not allow zero for any of the distinct values (whole number, numerator, denominator).
I found a solution that was close to what I needed and modified it a little.
I then tested it on RegexHero which uses the .NET regex engine.
The regular expression matched "1 1/2" as I would expect, but when I tried the same regular expression in Javascript with the .test() function, it did not match it.
My suspicion is that it has something to do with how each engine handles the whitespace, but I'm not sure. Any idea why it matched on one but not the other?
The regular expression was:
^([1-9][0-9]*/[1-9][0-9]*|[1-9][0-9]*(\s[1-9][0-9]*/[1-9][0-9]*)?)$
EDIT:
I tried Jasen's suggestion, but my test is still failing.
var ingredientRegex = /^([1-9][0-9]*\/[1-9][0-9]*|[1-9][0-9]*(\\s[1-9][0-9]*\/[1-9][0-9]*)?)$/;
function isValidFraction(value) {
return ingredientRegex.test(value);
}
It is being tested with Jasmine.
it("should match a mixed number", function() {
expect(isValidFraction("2 1/2")).toBe(true);
});
SOLUTION:
It is working now. I needed to escape the "/" characters, but I did not need to escape the "\s" as Jasen suggested.

You need to mind your escapes. The \s backslash in the character class needs escaping.
var regex = new RegExp("^([1-9][0-9]*/[1-9][0-9]*|[1-9][0-9]*(\\s[1-9][0-9]*/[1-9][0-9]*)?)$");
var str = "1 1/2";
console.log(regex.test(str)); // true
This method requires different escapes for the / character now.
var regex2 = /^([1-9][0-9]*\/[1-9][0-9]*|[1-9][0-9]*(\s[1-9][0-9]*\/[1-9][0-9]*)?)$/;
console.log(regex2.test(str)); // true
MDN RegExp

Related

Javascript string split with regex

I am trying to split a string using a regular expression for links (urls).
The regex in question is
var regex = new RegExp('(?:^(?:(?:[a-z]+:)?//)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[/?#]\S*)?$)')
If i do
regex.test("https://google.com"); // returns true
but doing -
"Go to https://google.com".split(regex);
// return ["Go to https://google.com"]
Whereas i expect it to return
["Go to ", "https://google.com"]
Any idea what's going on here?
First of all, you're using a string literal to build your regex, which means that you have to escape your backslashes (since a backslash has a special meaning in strings, used for the line feed char \n for example):
var regex = new RegExp('(?:^(?:(?:[a-z]+:)?//)(?:\\S+(?::\\S*)?#)?(?:localhost|(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:[/?#]\\S*)?$)');
Another solution would be to use the regex literal, as JavaScript proposes one, but you would then have to escape the slashes:
var regex = /(?:^(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?$)/;
Then, your regex will try to match against the entire input due to the ^ and $ anchors. So if you remove them (or better, replace them with word boundaries \b), you'll be able to find URLs in a string for example:
var regex = /(?:\b(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?\b)/;
But, the main point is that you're misunderstanding the split concept. Given the string "hello world", if you split by space, you'll end up with ["hello", "world"]: no more space anymore since it was the char that was used to split.
That is, if you split by the URL regex, the output array won't contain the URLs anymore. It seems to me that a lookahead could suit your needs:
var regex = /(?=(?:\b(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?\b))/;
"Go to https://google.com".split(regex) // ["Go to ", "https://google.com"]
The regex explained:
(?=(?:\b(?:(?:[a-z]+:)?//)(?:\S+(?::\S*)?#)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[/?#]\S*)?\b))
Debuggex Demo
By splitting a string with a positive lookahead (?=content_of_lookahead), you'll split by each interchar that is followed by the content of the lookahead.
Take a look at 8 Regular Expressions You Should Know.
To match an url you can use following regex :
var regex = "(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w# \.-]*)*\/?$";
"Go to https://google.com".split(regex);
// return ["https://google.com"]
Live example.
Hope this helps.

C# regex doesn't work in Javascript

I am using the following regex to validate an email:
^[a-zA-Z0-9!$'*+/\-_#%?^`&=~}{|]+(\.[a-zA-Z0-9!$'*+/\-_#%?^`&=~}{|]+)*#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-['&_\]]]+)(\.[\w-['&_\]]]+)*))(\]?)$
This works fine in C# but in JavaScript, its not working.... and yes I replaced every backslash with a double backslash as the following:
^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([\\w-['&_\\]]]+)(\\.[\\w-['&_\\]]]+)*))(\\]?)$
I am using XRegExp. Am I missing something here? Is there such thing as a converter to convert normal regex to JavaScript perhaps :) ?
Here is my function:
function CheckEmailAddress(email) {
var reg = new XRegExp("^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([\\w-['&_\\]]]+)(\\.[\\w-['&_\\]]]+)*))(\\]?)$")
if (reg.test(email) == false) {
return false;
}
return true;
}
It is returning false for a simple "abc#123.com" email address.
Thanks in advance!
Kevin
The problem is that your regular expression contains character class subtractions. JavaScript's RegExp does not support them, nor does XRegExp. (I initially misremembered and commented that it does, but it does not.)
However, character class subtractions can be replaced with negative lookaheads so this:
[\w-['&_\\]]]
can become this:
(?:(?!['&_\\]])\\w)
Both mean "any word character but not one in the set '&_]". The expression \w does not match ', & or ] so we can simplify to:
(?:(?!_)\\w)
Or since \w is [A-Za-z0-9_], we can just remove the underscore from the list and further simplify to:
[A-Za-z0-9]
So the final RegExp is this:
new RegExp("^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)*))(\\]?)$")
I've done modest testing with this RegExp, but you should do due diligence on checking it.
It is not strictly necessary to go through the negative lookahead step to simplify the regular expression but knowing that character class subtractions can be replaced with negative lookaheads is useful in more general cases where manual simplification would be difficult or brittle.

Regex not working as expected in JavaScript

I wrote the following regex:
(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?
Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m
I wrote the following JavaScript code:
var urlexp = new RegExp(
'^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))
And it returns true even though the regex was supposed to not allow single words as valid.
What am I doing wrong?
Your problem is that JavaScript is viewing all your escape sequences as escapes for the string. So your regex goes to memory looking like this:
^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$
Which you may notice causes a problem in the middle when what you thought was a literal period turns into a regular expressions wildcard. You can solve this in a couple ways. Using the forward slash regular expression syntax JavaScript provides:
var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi
Or by escaping your backslashes (and not your forward slashes, as you had been doing - that's exclusively for when you're using /regex/mod notation, just like you don't have to escape your single quotes in a double quoted string and vice versa):
var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')
Please note the double backslash before the w - also necessary for matching word characters.
A couple notes on your regular expression itself:
[da-z.-]
d is contained in the a-z range. Unless you meant \d? In that case, the slash is important.
(/(\w|-)*)*/?
My own misgivings about the nested Kleene stars aside, you can whittle that alternation down into a character class, and drop the terminating /? entirely, as a trailing slash will be match by the group as you've given it. I'd rewrite as:
(/[\w-]*)*
Though, maybe you'd just like to catch non space characters?
(/[^/\s]*)*
Anyway, modified this way your regular expression winds up looking more like:
^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod notation (which I highly recommend), escape your forward slashes.

Issue with custom javascript regex

I have a custom regular expression which I use to detect whole numbers, fractions and floats.
var regEx = new RegExp("^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(/[0-9])?)?)$");
var quantity = 'd';
var matched = quantity.match(regEx);
alert(matched);
​
(The code is also found here: http://jsfiddle.net/aNb3L/ .)
The problem is that for a single letter it matches, and I can't figure out why. But for more letters it fails(which is good).
Disclaimer: I am new to regular expressions, although in http://gskinner.com/RegExr/ it doesn't match a single letter
It's easier to use straight regular expression syntax:
var regEx = /^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(\/[0-9])?)?)$/;
When you use the RegExp constructor, you have to double-up on the backslashes. As it is, your code only has single backslashes, so the \. subexpressions are being treated as . — and that's how single non-digit characters are slipping through.
Thus yours would also work this way:
var regEx = new RegExp("^((^[1-9]|(0\\.)|(\\.))([0-9]+)?((\\s|\\.)[0-9]+(/[0-9])?)?)$");
This happens because the string syntax also uses backslash as a quoting mechanism. When your regular expression is first parsed as a string constant, those backslashes are stripped out if you don't double them. When the string is then passed to the regular expression parser, they're gone.
The only time you really need to use the RegExp constructor is when you're building up the regular expression dynamically or when it's delivered to your code via JSON or something.
Well, for a whole number this would be your regex:
/^(0|[1-9]\d*)$/
Then you have to account for the possibility of a float:
/^(0|[1-9]\d*)(.\d+)?$/
Then you have to account for the possibility of a fraction:
/^(0|[1-9]\d*)((.\d+)|(\/[1-9]\d*)?$/
To me this regex is much easier to read than your original, but it's up to you of course.

Building regexp from JS variables not working

I am trying to build a regexp from static text plus a variable in javascript. Obviously I am missing something very basic, see comments in code below. Help is very much appreciated:
var test_string = "goodweather";
// One regexp we just set:
var regexp1 = /goodweather/;
// The other regexp we built from a variable + static text:
var regexp_part = "good";
var regexp2 = "\/" + regexp_part + "weather\/";
// These alerts now show the 2 regexp are completely identical:
alert (regexp1);
alert (regexp2);
// But one works, the other doesn't ??
if (test_string.match(regexp1))
alert ("This is displayed.");
if (test_string.match(regexp2))
alert ("This is not displayed.");
First, the answer to the question:
The other answers are nearly correct, but fail to consider what happens when the text to be matched contains a literal backslash, (i.e. when: regexp_part contains a literal backslash). For example, what happens when regexp_part equals: "C:\Windows"? In this case the suggested methods do not work as expected (The resulting regex becomes: /C:\Windows/ where the \W is erroneously interpreted as a non-word character class). The correct solution is to first escape any backslashes in regexp_part (the needed regex is actually: /C:\\Windows/).
To illustrate the correct way of handling this, here is a function which takes a passed phrase and creates a regex with the phrase wrapped in \b word boundaries:
// Given a phrase, create a RegExp object with word boundaries.
function makeRegExp(phrase) {
// First escape any backslashes in the phrase string.
// i.e. replace each backslash with two backslashes.
phrase = phrase.replace(/\\/g, "\\\\");
// Wrap the escaped phrase with \b word boundaries.
var re_str = "\\b"+ phrase +"\\b";
// Create a new regex object with "g" and "i" flags set.
var re = new RegExp(re_str, "gi");
return re;
}
// Here is a condensed version of same function.
function makeRegExpShort(phrase) {
return new RegExp("\\b"+ phrase.replace(/\\/g, "\\\\") +"\\b", "gi");
}
To understand this in more depth, follows is a discussion...
In-depth discussion, or "What's up with all these backslashes!?"
JavaScript has two ways to create a RegExp object:
/pattern/flags - You can specify a RegExp Literal expression directly, where the pattern is delimited using a pair of forward slashes followed by any combination of the three pattern modifier flags: i.e. 'g' global, 'i' ignore-case, or 'm' multi-line. This type of regex cannot be created dynamically.
new RegExp("pattern", "flags") - You can create a RegExp object by calling the RegExp() constructor function and pass the pattern as a string (without forward slash delimiters) as the first parameter and the optional pattern modifier flags (also as a string) as the second (optional) parameter. This type of regex can be created dynamically.
The following example demonstrates creating a simple RegExp object using both of these two methods. Lets say we wish to match the word "apple". The regex pattern we need is simply: apple. Additionally, we wish to set all three modifier flags.
Example 1: Simple pattern having no special characters: apple
// A RegExp literal to match "apple" with all three flags set:
var re1 = /apple/gim;
// Create the same object using RegExp() constructor:
var re2 = new RegExp("apple", "gim");
Simple enough. However, there are significant differences between these two methods with regard to the handling of escaped characters. The regex literal syntax is quite handy because you only need to escape forward slashes - all other characters are passed directly to the regex engine unaltered. However, when using the RegExp constructor method, you pass the pattern as a string, and there are two levels of escaping to be considered; first is the interpretation of the string and the second is the interpretation of the regex engine. Several examples will illustrate these differences.
First lets consider a pattern which contains a single literal forward slash. Let's say we wish to match the text sequence: "and/or" in a case-insensitive manner. The needed pattern is: and/or.
Example 2: Pattern having one forward slash: and/or
// A RegExp literal to match "and/or":
var re3 = /and\/or/i;
// Create the same object using RegExp() :
var re4 = new RegExp("and/or", "i");
Note that with the regex literal syntax, the forward slash must be escaped (preceded with a single backslash) because with a regex literal, the forward slash has special meaning (it is a special metacharacter which is used to delimit the pattern). On the other hand, with the RegExp constructor syntax (which uses a string to store the pattern), the forward slash does NOT have any special meaning and does NOT need to be escaped.
Next lets consider a pattern which includes a special: \b word boundary regex metasequence. Say we wish to create a regex to match the word "apple" as a whole word only (so that it won't match "pineapple"). The pattern (as seen by the regex engine) needs to be: \bapple\b:
Example 3: Pattern having \b word boundaries: \bapple\b
// A RegExp literal to match the whole word "apple":
var re5 = /\bapple\b/;
// Create the same object using RegExp() constructor:
var re6 = new RegExp("\\bapple\\b");
In this case the backslash must be escaped when using the RegExp constructor method, because the pattern is stored in a string, and to get a literal backslash into a string, it must be escaped with another backslash. However, with a regex literal, there is no need to escape the backslash. (Remember that with a regex literal, the only special metacharacter is the forward slash.)
Backslash SOUP!
Things get even more interesting when we need to match a literal backslash. Let's say we want to match the text sequence: "C:\Program Files\JGsoft\RegexBuddy3\RegexBuddy.exe". The pattern to be processed by the regex engine needs to be: C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe. (Note that the regex pattern to match a single backslash is \\ i.e. each must be escaped.) Here is how you create the needed RegExp object using the two JavaScript syntaxes
Example 4: Pattern to match literal back slashes:
// A RegExp literal to match the ultimate Windows regex debugger app:
var re7 = /C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe/;
// Create the same object using RegExp() constructor:
var re8 = new RegExp(
"C:\\\\Program Files\\\\JGsoft\\\\RegexBuddy3\\\\RegexBuddy\\.exe");
This is why the /regex literal/ syntax is generally preferred over the new RegExp("pattern", "flags") method - it completely avoids the backslash soup that can frequently arise. However, when you need to dynamically create a regex, as the OP needs to here, you are forced to use the new RegExp() syntax and deal with the backslash soup. (Its really not that bad once you get your head wrapped 'round it.)
RegexBuddy to the rescue!
RegexBuddy is a Windows app that can help with this backslash soup problem - it understands the regex syntaxes and escaping requirements of many languages and will automatically add and remove backslashes as required when pasting to and from the application. Inside the application you compose and debug the regex in native regex format. Once the regex works correctly, you export it using one of the many "copy as..." options to get the needed syntax. Very handy!
You should use the RegExp constructor to accomplish this:
var regexp2 = new RegExp(regexp_part + "weather");
Here's a related question that might help.
The forward slashes are just Javascript syntax to enclose regular expresions in. If you use normal string as regex, you shouldn't include them as they will be matched against. Therefore you should just build the regex like that:
var regexp2 = regexp_part + "weather";
I would use :
var regexp2 = new RegExp(regexp_part+"weather");
Like you have done that does :
var regexp2 = "/goodweather/";
And after there is :
test_string.match("/goodweather/")
Wich use match with a string and not with the regex like you wanted :
test_string.match(/goodweather/)
While this solution may be overkill for this specific question, if you want to build RegExps programmatically, compose-regexp can come in handy.
This specific problem would be solved by using
import {sequence} from 'compose-regexp'
const weatherify = x => sequence(x, /weather/)
Strings are escaped, so
weatherify('.')
returns
/\.weather/
But it can also accept RegExps
weatherify(/./u)
returns
/.weather/u
compose-regexp supports the whole range of RegExps features, and let one build RegExps from sub-parts, which helps with code reuse and testability.

Categories

Resources