C# regex doesn't work in Javascript

I am using the following regex to validate an email:
^[a-zA-Z0-9!$'*+/\-_#%?^`&=~}{|]+(\.[a-zA-Z0-9!$'*+/\-_#%?^`&=~}{|]+)*#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-['&_\]]]+)(\.[\w-['&_\]]]+)*))(\]?)$
This works fine in C#, but in JavaScript it's not working. And yes, I replaced every backslash with a double backslash, as follows:
^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([\\w-['&_\\]]]+)(\\.[\\w-['&_\\]]]+)*))(\\]?)$
I am using XRegExp. Am I missing something here? Is there such thing as a converter to convert normal regex to JavaScript perhaps :) ?
Here is my function:
function CheckEmailAddress(email) {
var reg = new XRegExp("^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([\\w-['&_\\]]]+)(\\.[\\w-['&_\\]]]+)*))(\\]?)$")
if (reg.test(email) == false) {
return false;
}
return true;
}
It is returning false for a simple "abc#123.com" email address.
Thanks in advance!
Kevin

The problem is that your regular expression contains character class subtractions. JavaScript's RegExp does not support them, nor does XRegExp. (I initially misremembered and commented that it does, but it does not.)
However, character class subtractions can be replaced with negative lookaheads so this:
[\w-['&_\\]]]
can become this:
(?:(?!['&_\\]])\\w)
Both mean "any word character but not one in the set '&_]". The expression \w does not match ', & or ] so we can simplify to:
(?:(?!_)\\w)
Or, since \w in JavaScript is [A-Za-z0-9_] (in .NET it also matches non-ASCII letters and digits by default), we can just remove the underscore from the list and further simplify to:
[A-Za-z0-9]
So the final RegExp is this:
new RegExp("^[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+(\\.[a-zA-Z0-9!$'*+/\\-_#%?^`&=~}{|]+)*#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)*))(\\]?)$")
I've done modest testing with this RegExp, but you should do due diligence on checking it.
It is not strictly necessary to go through the negative lookahead step to simplify the regular expression but knowing that character class subtractions can be replaced with negative lookaheads is useful in more general cases where manual simplification would be difficult or brittle.
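As a rough check of that equivalence, the sketch below compares the lookahead form with the hand-simplified class on a handful of inputs. The names subtractionViaLookahead, simplified and samples are made up for this illustration, and the sample list is far from exhaustive.

```javascript
// .NET's [\w-[_]] (word characters minus underscore) has no direct JavaScript form.
// All names here are hypothetical and exist only for this illustration.
const subtractionViaLookahead = /^(?:(?!_)\w)+$/; // "a word character, but not _"
const simplified = /^[A-Za-z0-9]+$/;              // the hand-simplified equivalent

const samples = ["abc", "abc123", "a_b", "_", "a-b", "a&b", "a]b", ""];

// Both patterns should accept and reject exactly the same inputs.
const results = samples.map(function (s) {
  return {
    input: s,
    lookahead: subtractionViaLookahead.test(s),
    simplified: simplified.test(s)
  };
});

results.forEach(function (r) {
  console.log(JSON.stringify(r.input), r.lookahead, r.simplified);
});
```

The lookahead form pays off when the subtracted set is too large or awkward to remove from the base class by hand.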

Related

Need to write a regex in typescript for a string starting with "abcd_" and allowing only alphanumeric chars and underscore

"abcd_" shouldn't be immediately followed by another underscore. Upon searching I found the regex [a-zA-Z0-9_] for allowing only alphanumeric chars and underscore.
I am having difficulty combining two or more conditions. Checking the start of the string was simple:
static myValidator(control) {
if(control.value) {
if(control.value.match(/^abcd_/)) {
return null;
} else {
return {'invalidName':true};
}
}
}

^abcd_([a-zA-Z0-9][a-zA-Z0-9_]*)?$ if abcd_ is already valid by itself and nothing needs to follow.
Otherwise ^abcd_[a-zA-Z0-9][a-zA-Z0-9_]*$ requires at least one character after abcd_.
Or if there need to be at least 6 characters after abcd_: ^abcd_[a-zA-Z0-9][a-zA-Z0-9_]{5,}$
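To see how the three variants differ, here is a small sketch that runs each one against a few inputs. The variable names and the sample inputs are hypothetical and were chosen only to exercise each rule.

```javascript
// The three patterns from the answer above, written as JavaScript regex literals.
const nothingRequired = /^abcd_([a-zA-Z0-9][a-zA-Z0-9_]*)?$/; // "abcd_" alone is valid
const oneRequired = /^abcd_[a-zA-Z0-9][a-zA-Z0-9_]*$/;        // at least 1 character after "abcd_"
const sixRequired = /^abcd_[a-zA-Z0-9][a-zA-Z0-9_]{5,}$/;     // at least 6 characters after "abcd_"

// Hypothetical sample inputs.
const inputs = ["abcd_", "abcd_x", "abcd_x_1", "abcd_abcdef", "abcd__x", "abcd_x-y", "xabcd_x"];

inputs.forEach(function (s) {
  console.log(s, nothingRequired.test(s), oneRequired.test(s), sixRequired.test(s));
});
```

In all three variants the first character after abcd_ comes from [a-zA-Z0-9], which is what rules out a double underscore.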

A regex typically reads left to right. To combine rules, put them in the order the string must satisfy them. For instance, /^abcd_/ looks for the literal substring abcd_ at the start of the string. To make sure the next character is alphanumeric but not an underscore, we can use /^abcd_[^_\W]/: the class [^_\W] matches any character that is neither an underscore nor a non-word character, and since \W is equivalent to [^A-Za-z0-9_], that leaves exactly [A-Za-z0-9]. Lastly, we allow zero or more further word characters with \w*$. Note that this w is lowercase, so \w is equivalent to [A-Za-z0-9_]; the * means zero or more of the preceding subexpression, and the $ anchors the match to the end of the string so that no other characters can follow.
So we end up with a final regex of:
/^abcd_[^_\W]\w*$/i
Depending on exactly what you want to be able to match (hard to tell without any expected output) then it may need to be modified.
Check this link for example matches and a more in-depth explanation of what the regex does.
https://regex101.com/r/fzKhIx/3
I would also recommend reading this guide on regular expressions in javascript:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
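As a sketch of how that final pattern might slot into the validator from the question: the control argument is assumed to be any object with a string value property, and returning null for an empty value is an assumption made here (the original snippet returns nothing in that case).

```javascript
const namePattern = /^abcd_[^_\W]\w*$/i;

// Mirrors the shape of the validator in the question: null means "valid".
// `control` is assumed to be any object with a string `value` property.
function myValidator(control) {
  if (!control.value) {
    return null; // assumption: an empty value is left for other validators to handle
  }
  return namePattern.test(control.value) ? null : { invalidName: true };
}

console.log(myValidator({ value: "abcd_x1" }));  // valid
console.log(myValidator({ value: "abcd__x" }));  // underscore directly after the prefix
console.log(myValidator({ value: "abcd_" }));    // nothing after the prefix
```

Note that the i flag also lets the prefix match in any letter case, such as ABCD_; drop the flag if the prefix must be lowercase.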

Regex test in JavaScript if a string contains only unique characters

A string contains only [A-Za-z0-9] characters. I need to know if the tested string contains at least one repeating character.
The following should return false:
abc1
abc
The following should return true:
abc11
1abc1
aabc1
abca

Use regex with a positive look ahead and capturing group.
/(?=^[A-Za-z0-9]+$)(.)+.*\1.*/

Try using this regex for checking whether the characters are unique in the string
var str = 'aabbcc';
var isRepeat = (/([a-zA-Z0-9]).*?\1/).test(str); //Checks for repeated character in string
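A quick way to check this pattern against the inputs listed in the question is sketched below. The function name hasRepeat and the two arrays are hypothetical helpers introduced for this example.

```javascript
// Returns true when some character from [a-zA-Z0-9] occurs at least twice.
// The capture group grabs one character; \1 requires the same character again later.
function hasRepeat(str) {
  return /([a-zA-Z0-9]).*?\1/.test(str);
}

// Inputs taken from the question.
const expectedFalse = ["abc1", "abc"];
const expectedTrue = ["abc11", "1abc1", "aabc1", "abca"];

expectedFalse.concat(expectedTrue).forEach(function (s) {
  console.log(s + ": " + hasRepeat(s));
});
```

The comparison is case-sensitive, so "aA" does not count as a repeat unless the i flag is added.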

Can be done with:
^.*?(\w)\1.*?$
See a demo on regex101.com (actually, following matches as well).
Explanation:
If you don't mind that the character class [A-Za-z0-9] can contain _ as well, \w is a shortcut for [A-Za-z0-9_]. Afterwards, the whole expression is bound to the start (^) and end ($) of a line/string. Last but not least, the lazy .*? matches anything before and after, the (\w)\1 at least one repeating character.
If you do mind about the _, leave it as [A-Za-z0-9]:
^.*?([A-Za-z0-9])\1.*?$
Hint:
Thinking about it, I have misread your question. This approach will match words like aabc or abbc but not 1abc1 as required in your question. Use a positive lookahead for this as proposed by @Pranav. Although this does not answer the question, someone might be interested in this very solution, so I will leave the answer up.

Like @Jan, I didn't read the question closely. @Pranav's answer works and is accepted, but it has multiple unnecessary greedy dots. I'd reduce it to:
/(?=^[a-z0-9]+$)(.)+?.*\1/im

If you find the character set restriction too limiting, use this:
function isNotUnique(inputString) {
return !(inputString === [...new Set(inputString)].join(''));
}
It doesn't use regular expressions, but, handles most/all? characters as it relies on Set to enforce uniqueness. Example with Unicode:
let inputs = ['abc1', 'abc', 'abc11', '1abc1', 'aabc1', '☃☃', '☃', '☃abc', '(ฺ◣д◢)ฺ', '(ฺ◣д◢)'];
inputs.forEach(v => console.log(v + ': ' + isNotUnique(v)));
Outputs:
abc1: false
abc: false
abc11: true
1abc1: true
aabc1: true
☃☃: true
☃: false
☃abc: false
(ฺ◣д◢)ฺ: true
(ฺ◣д◢): false

Or just .*?(.).*?\1.* if you already know the string contains only [A-Za-z0-9].
To only check whether there is a match, (.).*?\1 is enough.

Why does .NET match my regex but Javascript does not?

I needed a regular expression to match fractions and mixed numbers, but not allow zero for any of the distinct values (whole number, numerator, denominator).
I found a solution that was close to what I needed and modified it a little.
I then tested it on RegexHero which uses the .NET regex engine.
The regular expression matched "1 1/2" as I would expect, but when I tried the same regular expression in Javascript with the .test() function, it did not match it.
My suspicion is that it has something to do with how each engine handles the whitespace, but I'm not sure. Any idea why it matched on one but not the other?
The regular expression was:
^([1-9][0-9]*/[1-9][0-9]*|[1-9][0-9]*(\s[1-9][0-9]*/[1-9][0-9]*)?)$
EDIT:
I tried Jasen's suggestion, but my test is still failing.
var ingredientRegex = /^([1-9][0-9]*\/[1-9][0-9]*|[1-9][0-9]*(\\s[1-9][0-9]*\/[1-9][0-9]*)?)$/;
function isValidFraction(value) {
return ingredientRegex.test(value);
}
It is being tested with Jasmine.
it("should match a mixed number", function() {
expect(isValidFraction("2 1/2")).toBe(true);
});
SOLUTION:
It is working now. I needed to escape the "/" characters, but I did not need to escape the "\s" as Jasen suggested.

You need to mind your escapes. When the pattern is passed to the RegExp constructor as a string, the backslash in \s needs escaping.
var regex = new RegExp("^([1-9][0-9]*/[1-9][0-9]*|[1-9][0-9]*(\\s[1-9][0-9]*/[1-9][0-9]*)?)$");
var str = "1 1/2";
console.log(regex.test(str)); // true
A regex literal needs different escapes: \s keeps its single backslash, but the / characters must now be escaped.
var regex2 = /^([1-9][0-9]*\/[1-9][0-9]*|[1-9][0-9]*(\s[1-9][0-9]*\/[1-9][0-9]*)?)$/;
console.log(regex2.test(str)); // true
MDN RegExp
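To make the difference concrete, the sketch below inspects what the regex engine actually receives in each case. The short pattern ^1\s1$ and the variable names are stand-ins chosen for this illustration, not the full fraction pattern.

```javascript
// In a string literal, "\s" is not a recognised escape, so it collapses to a plain "s".
const lost = new RegExp("^1\s1$");   // the engine sees ^1s1$
const kept = new RegExp("^1\\s1$");  // the engine sees ^1\s1$
const literal = /^1\s1$/;            // no string layer, so a single backslash is enough

console.log(lost.source);
console.log(kept.source);
console.log(lost.test("1 1"), kept.test("1 1"), literal.test("1 1"));
```

Printing the source property is a handy way to spot a backslash that was swallowed by the string literal before the regex engine ever saw it.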

RegExp in JavaScript, when a quantifier is part of the pattern

I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, the following, from am not i am, worked:
/\^<(.*?)>\^/
But, it turned out that I was getting html entities since I was getting my string by using the .innerHTML property. In other words,
> ... &gt;
< ... &lt;
To solve this, my regexp actually looks like this:
\^<(.*?)((.|\n)*)>\^
This includes the fact that the string in between should be any character or new line. Thanks!

You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
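The effect of the escape can be seen in a short sketch like the following. The sample subject string and the variable names are made up for this example.

```javascript
// Hypothetical input containing two ^<...>^ markers.
const subject = "before ^<first>^ middle ^<second one>^ after";

const unescaped = /^<(.*?)>^/;   // here both ^ are start-of-string anchors
const escaped = /\^<(.*?)>\^/g;  // here both ^ are literal caret characters

// Collect the text captured between every ^< and >^ pair.
const found = Array.from(subject.matchAll(escaped), function (m) {
  return m[1];
});

console.log(unescaped.test(subject)); // the anchored version cannot match
console.log(found);
```

The unescaped version can never match any input, because its second ^ demands the start of the string after characters have already been consumed.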

This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. there's no regex literal like in JavaScript or Perl, you may have to use a double backslash, which the programming language will reduce to a single one before it is processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represent a word boundary. Though I agree yours is what I would have used myself.

Figuring out Regex pattern

I am still not all that good when it comes to writing Regex patterns and am having issues with trying to figure out a search pattern for the following string:
{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}
I basically need to search a page for anything in between the curly braces {}.
I have tried this:
string = $(".content").text();
string.match("^{[\w-]}");
But it's not working... any ideas on what I could be doing wrong?
Thanks for the help everybody! This is what I did to make it work:
$("div", "html, body").each(function(){
var text = $(this).text();
if(text.match("^\{.*\}$")) {
console.log("FOUND");
}
})

This should find the innermost content of curly braces (even nested ones).
string.match(/\{([^\{\}]*)\}/)[1]; // the [1] gets what is within the parentheses.
edit:
Thanks to the comments below here is a cleaner version:
string.match(/\{(.*?)\}/)[1];
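The two versions agree on the string from the question but differ once braces are nested, as this sketch suggests. The helper names innermost and lazy and the nested sample are assumptions introduced for the comparison; the null guard is added because match() returns null when nothing matches, which would make a bare [1] throw.

```javascript
const sample = "{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}";
const nested = "{outer {inner} text}"; // hypothetical nested input

// First version: the content may not contain braces, so only an innermost pair can match.
function innermost(s) {
  const m = s.match(/\{([^\{\}]*)\}/);
  return m ? m[1] : null;
}

// Second version: lazy dot, which stops at the first closing brace it meets.
function lazy(s) {
  const m = s.match(/\{(.*?)\}/);
  return m ? m[1] : null;
}

console.log(innermost(sample) === lazy(sample)); // same result without nesting
console.log(innermost(nested)); // inner
console.log(lazy(nested));      // outer {inner
```

If nested braces can occur in the page text, the negated character class is therefore the safer of the two.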

One problem is the lack of a quantifier. As it stands, your regex is looking for a single \w or - character, denoted by your character class. You're probably looking for either of the following quantifiers:
[\w-]* - match 0 or more \w or - characters
[\w-]+ - match 1 or more \w or - characters
Another problem is the restrictions in the character class. [\w-] won't match (, ), ", spaces or other non-word characters that may appear. If you want to match all characters, use .. If you want to match all characters except }, use [^}] instead.
For example:
string = $(".content").text();
string.match("^{[^}]+}");
Using * would allow the content within the braces to be empty.
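Both problems can be seen side by side in a sketch like this one. The variable names are hypothetical, and the braces are escaped here only for readability.

```javascript
const text = "{embed_video('RpcF9EYXZpZFBhY2tfRklOQUwuZj','575','352','video_player')}";

const single = /^\{[\w-]\}/;      // exactly one word character or hyphen between the braces
const wordsOnly = /^\{[\w-]+\}/;  // quantified, but still fails at "(" and the quotes
const anyButClose = /^\{[^}]+\}/; // one or more of anything except "}"
const allowEmpty = /^\{[^}]*\}/;  // same, but "{}" is accepted too

console.log(single.test(text), wordsOnly.test(text), anyButClose.test(text));
console.log(anyButClose.test("{}"), allowEmpty.test("{}"));
```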
Side note: It looks to me like you're gearing up to eval() the code contained within the { and }. eval() is generally best avoided (if possible) for both security and performance reasons. In your case, you may be able to use this instead:
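var string = $(".content").text(), fn, args;
if (string.charAt(0) == "{" && string.charAt(string.length - 1) == "}") {
    // Function name: everything between the opening "{" and the first "("
    fn = string.slice(1, string.indexOf("("));
    // Arguments: the text between "(" and the last ")", split on commas, with surrounding quotes removed
    args = string.slice(string.indexOf("(") + 1, string.lastIndexOf(")")).split(",").map(function (arg) {
        return arg.trim().replace(/^['"]|['"]$/g, "");
    });
    window[fn].apply(null, args);
}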

If you are using eclipse by any chance, there is a regular expression plugin with which you can play around and see how your regular expression searches your text.
I would try this
string.match("^\{.*\}$");

Search for the following regular expression:
var sRe = /\{([^\}]*)\}/g;
sText.match(sRe);
It means that you are searching for a "{" character, followed by zero or more characters other than "}", and then a closing "}".

Try "\{.*?\}". But it won't handle the situation with nested curly braces. Here you can test your regexps online.
string.match("^\{(.*?)\}$")[1];

I think you need to escape the {} characters...they have special meaning in regex...
