RegExp constructor escapes slash character but not dot - javascript

Say I have this:
console.log(new RegExp('.git'));
console.log(new RegExp('scripts/npm'));
the results are:
/.git/
/scripts\/npm/
My question is: why does it escape the slash in scripts/npm, but not the . in .git? What is the rhyme or reason behind that?
Note that in this case the regex strings are passed in from the command line, so I need to convert them to regular expressions using RegExp.

An unescaped / denotes the beginning and end of a regular expression literal. When you pass a string containing / to the constructor, that / is of course part of the regular expression, not a symbol denoting the beginning or end. When the RegExp is printed, it is shown in literal notation, so the / has to be escaped to keep the output unambiguous.
The . is something else entirely and has nothing to do with regex delimiters, so it's left as-is.
Note that if you want the regular expression to match a literal dot (rather than any character), you need to double-escape it when using the constructor:
console.log(new RegExp('\\.git'));
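A minimal sketch of the difference between the two dots, using only the built-in RegExp. The sample inputs 'xgit' and '.git' are made up for illustration:

```javascript
// Unescaped dot: matches any character except a line terminator.
const anyChar = new RegExp('.git');
// Escaped dot: the string literal '\\.git' holds the pattern \.git
const literalDot = new RegExp('\\.git');

console.log(anyChar.test('xgit'));    // true
console.log(literalDot.test('xgit')); // false
console.log(literalDot.test('.git')); // true
```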

When you write a regex in JS, you can delimit the pattern with two / characters. This is called a regular expression literal.
For instance
let re = /(\w+)\s(\w+)/;
Now, as for why a \ is added before the /: this happens when the RegExp is converted back to text. The pattern is printed in literal notation, so any / inside it must be escaped to keep the output a valid regular expression literal. The string you passed in is not changed, and matching is not affected.
Furthermore, if you examine the object returned by RegExp, its source property is scripts\/npm. A console that prints it as a quoted string shows scripts\\/npm, where the first \ escapes the second one at the string level. From the regular expression's perspective, the single \ simply escapes the following / so that the text is valid in literal notation.
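This can be checked directly. The sketch below assumes a current engine such as Node, where source is escaped as the specification requires; the variable name npmRe is illustrative:

```javascript
const npmRe = new RegExp('scripts/npm');

// source holds the pattern text with the slash escaped.
console.log(npmRe.source);  // scripts\/npm
// String(re) wraps source in slashes, giving a valid regex literal.
console.log(String(npmRe)); // /scripts\/npm/

// The escaping only affects printing; matching is unchanged.
console.log(npmRe.test('scripts/npm'));               // true
// A literal written with the escaped slash has the same source.
console.log(/scripts\/npm/.source === npmRe.source);  // true
```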

Related

Why does new RegExp() string not require escaping a forward slash?

Reading this document https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
"When using the constructor function, the normal string escape rules (preceding special characters with \ when included in a string) are necessary."
So why is there no difference between these two uses of RegExp()?
"a/b/c".match(new RegExp("\/", "g")) // (2) [ "/", "/" ]
"a/b/c".match(new RegExp("/", "g")) // (2) [ "/", "/" ]
Is the document incorrect or am I missing something?
The possible duplicate question indicates that forward slashes must be escaped as literals. Using a string, the example above shows this is not the case given the use case of a string with the constructor function. This question is specifically about using a string with the constructor function.
So based on the answer below, the difference appears to be with literals, both the " and the / need to be escaped, but when using a string only the " needs to be escaped.
Forward slashes are only special characters in that they designate the syntactical beginning and end of a literal regular expression in Javascript. But when you use the constructor, that's not needed, because the first parameter provided to new RegExp is the (entire) string to construct the regular expression from.
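A short sketch of why the two calls behave the same: in a string literal the backslash in "\/" is dropped before the constructor ever sees it. The variable names are illustrative only:

```javascript
// "\/" is not a recognised string escape, so it is just "/".
console.log("\/" === "/"); // true
console.log("\/".length);  // 1

const viaEscaped = "a/b/c".match(new RegExp("\/", "g"));
const viaPlain = "a/b/c".match(new RegExp("/", "g"));
console.log(viaEscaped, viaPlain); // [ '/', '/' ] [ '/', '/' ]

// Doubling the backslash passes the pattern \/ instead,
// which is an escaped slash and still matches "/".
const viaDoubled = "a/b/c".match(new RegExp("\\/", "g"));
console.log(viaDoubled.length); // 2
```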

How to replace all occurrences of "\\" string in JavaScript

This seems a very simple question but I haven't been able to get this to work.
How do I convert the following string:
var origin_str = "abc/!/!"; // Original string
var modified_str = "abc!!"; // replaced string
I tried this:
console.log(origin_str.replace(/\\/,''));
This only removes the first occurrence of backslash. I want to replaceAll. I followed this instruction in SO: How to replace all occurrences of a string in JavaScript?
origin_str.replace(new RegExp('\\', 'g'), '');
This code throws SyntaxError: Invalid regular expression: /\/: \ at end of pattern. What's the regex for removing backslashes in JavaScript?
A quick basic overview of regular expressions in JavaScript
When using regular expressions you can define the expression in two ways.
Either directly in the function or variable by using /regular expression/
Or by using the RegExp constructor: new RegExp('regular expression').
Please note the difference between the two ways of defining. In the first, the search pattern is enclosed in forward slashes, while in the second the search pattern is passed as a string.
Remember that regular expressions are in fact a search language with their own syntax. Some characters are used to define actions: \, ^, $, . (dot), |, ?, *, +, (, ), [, {. These characters are called metacharacters and need to be escaped if you want them to be part of the search pattern. The forward slash / is not a metacharacter, but it must be escaped inside a /.../ literal because it delimits the pattern; likewise ' and " only need escaping inside a string literal that uses the same quote. If not escaped, these characters will be treated as operators or generate script errors. Escaping is done with the backslash. E.g. in \\ the first backslash escapes the second, and the pattern searches for a literal backslash.
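As a sketch of how this applies to backslashes specifically: new RegExp('\\', 'g') fails because the string literal '\\' is a single backslash, which is an incomplete escape as a pattern. The sample input below is an assumption, chosen to contain two real backslashes:

```javascript
// Two real backslashes in the data (each written as \\ in source code).
const withBackslashes = 'abc\\!\\!';

// Regex literal: \\ is the pattern for one literal backslash.
const viaLiteral = withBackslashes.replace(/\\/g, '');

// Constructor: the string literal '\\\\' holds the two characters \\,
// which the regex engine reads as one escaped backslash.
const viaConstructor = withBackslashes.replace(new RegExp('\\\\', 'g'), '');

console.log(viaLiteral, viaConstructor); // abc!! abc!!
```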
There are a multitude of options you can add to your search pattern:
Examples
Adding \d makes the pattern search for a single digit, i.e. a character in [0-9]. (\w is the class that also includes letters and the underscore.) Simple regular expressions are parsed from left to right.
/javascript/
Searches for the word javascript in a string.
/[a-z]/
When a pattern is put between square brackets, it matches a single character equal to any one of the values inside the brackets. This will find d in 229302d34330
You can build a regular expression with multiple blocks.
/(java|ecma)script/
Finds javascript or ecmascript in a string. The | is the or operator; the parentheses limit it to java and ecma, so script is required after either one.
/a/ vs. /a+/
The first matches the first a in aaabbb, the second matches a repetition of a until another character is found. So the second matches: aaa.
The plus sign + means find a one or more times. You can also use * which means zero or more times.
/^\d+$/
We've seen the \d earlier and also the plus sign. Together they mean: find one or more numeric characters. The ^ (caret) and $ (dollar sign) are new. The ^ says the match must start at the beginning of the string, while the $ says it must extend to the end of the string. This expression will match: 574545485 but not d43849343, 549854fff or 4348d8788.
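The examples above can be tried out directly. This is a small sketch using the same sample strings from the text; the variable names are illustrative:

```javascript
// + is greedy: /a/ stops after one a, /a+/ takes the whole run.
const single = 'aaabbb'.match(/a/)[0];
const run = 'aaabbb'.match(/a+/)[0];
console.log(single, run); // a aaa

// ^ and $ force the digits to span the entire string.
const digitsOnly = /^\d+$/;
const samples = ['574545485', 'd43849343', '549854fff', '4348d8788'];
const results = samples.map((s) => digitsOnly.test(s));
console.log(results); // [ true, false, false, false ]
```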
Flags
Flags are modifiers declared after the regular expression: /regular expression/flags
The three most commonly used flags in JavaScript are:
g (global) Searches for all occurrences of the pattern instead of stopping at the first.
i (ignore case) Ignores case in pattern.
m (multiline) treat beginning and end characters (^ and $) as working over multiple lines (i.e., match the beginning or end of each line (delimited by \n or \r), not only the very beginning or end of the whole input string)
So a regular expression like this:
/d[0-9]+/ig
matches D094938 and D344783 in 98498D094938A37834D344783.
The i makes the search case-insensitive. Matching a D because of the d in the pattern. If D is followed by one or more numbers then the pattern is matched. The g flag commands the expression to look for the pattern globally or simply said: multiple times.
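A quick sketch of how each flag changes the result for the sample string above, using String.prototype.match:

```javascript
const input = '98498D094938A37834D344783';

// Both flags: every match, ignoring case.
const allMatches = input.match(/d[0-9]+/ig);
console.log(allMatches); // [ 'D094938', 'D344783' ]

// Only i: match() returns details of the first match only.
const firstOnly = input.match(/d[0-9]+/i);
console.log(firstOnly[0]); // D094938

// Only g: the pattern is case-sensitive, and the input has no lowercase d.
const caseSensitive = input.match(/d[0-9]+/g);
console.log(caseSensitive); // null
```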
In your case @Qwerty provided the correct regex:
origin_str.replace(/\//g, "")
Here the search pattern is a single forward slash /, escaped by the backslash so that it does not end the regex literal. The g flag tells replace to find all occurrences of the forward slash in the string and replace them with the empty string "".
For a comprehensive tutorial and reference : http://www.regular-expressions.info/tutorial.html
Looking for this?
origin_str.replace(/\//g, "")
The syntax for replace is
.replace(/pattern/flags, replacement)
So in my case the pattern is \/ - an escaped slash
and g is global flag.
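Applied to the string from the question, this looks as follows. The split/join variant is an alternative added here for comparison, not something from the answers above:

```javascript
const originStr = "abc/!/!";

// Escaped slash as the pattern, g to replace every occurrence.
const viaRegex = originStr.replace(/\//g, "");
console.log(viaRegex); // abc!!

// Without g, only the first slash is removed.
console.log(originStr.replace(/\//, "")); // abc!/!

// Regex-free alternative: split on the slash and join the pieces.
const viaSplit = originStr.split("/").join("");
console.log(viaSplit); // abc!!
```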

Regex not working as expected in JavaScript

I wrote the following regex:
(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?
Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m
I wrote the following JavaScript code:
var urlexp = new RegExp(
'^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))
And it returns true even though the regex was supposed to not allow single words as valid.
What am I doing wrong?
Your problem is that JavaScript is viewing all your escape sequences as escapes for the string. So your regex goes to memory looking like this:
^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$
You may notice this causes a problem in the middle, where what you thought was a literal period turns into a regular expression wildcard. You can solve this in a couple of ways. Using the forward slash regular expression syntax JavaScript provides:
var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi
Or by escaping your backslashes (and not your forward slashes, as you had been doing - that's exclusively for when you're using /regex/mod notation, just like you don't have to escape your single quotes in a double quoted string and vice versa):
var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')
Please note the double backslash before the w - also necessary for matching word characters.
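The effect can be reproduced with a short sketch. It uses only the i flag rather than gi, because a global regex keeps state in lastIndex between test() calls, which would muddy the comparison; the variable names are illustrative:

```javascript
// The string as written in the question: every backslash is consumed
// by the string literal before RegExp sees it.
const asWritten = '^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$';
console.log(asWritten); // ^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$

const brokenUrl = new RegExp(asWritten, 'i');
const fixedUrl = new RegExp(
  '^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'i'
);

console.log(brokenUrl.test('blaaa'));      // true: the bare . matches any character
console.log(fixedUrl.test('blaaa'));       // false: a literal dot is required
console.log(fixedUrl.test('example.com')); // true
```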
A couple notes on your regular expression itself:
[da-z.-]
d is contained in the a-z range. Unless you meant \d? In that case, the backslash is important.
(/(\w|-)*)*/?
My own misgivings about the nested Kleene stars aside, you can whittle that alternation down into a character class, and drop the terminating /? entirely, as a trailing slash will be matched by the group as you've given it. I'd rewrite as:
(/[\w-]*)*
Though, maybe you'd just like to catch non space characters?
(/[^/\s]*)*
Anyway, modified this way your regular expression winds up looking more like:
^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod notation (which I highly recommend), escape your forward slashes.
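A sketch of the rewritten expression in literal notation, with the forward slashes escaped as recommended. The test inputs are made-up examples, and the i flag is used alone so that test() stays stateless:

```javascript
const urlPattern = /^(https?:\/\/)?([\da-z.-]+)\.([a-z]{2,6})(\/[\w-]*)*$/i;

const checks = {
  'blaaa': urlPattern.test('blaaa'),
  'example.com': urlPattern.test('example.com'),
  'https://example.com/path/to-page/':
    urlPattern.test('https://example.com/path/to-page/'),
  'example.com/a b': urlPattern.test('example.com/a b'),
};
console.log(checks);
// blaaa and 'example.com/a b' are rejected; the other two match.
```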

Difference when converting a regular expression string to RegExp in JavaScript

Why doesn't the test function on a RegExp object built from my string behave the same way as the regex literal? Am I missing something, or am I using the wrong regex escaping?
<html>
<body>
<script type="text/javascript">
var str = "info#test.com";
//This isn't working
var regStr = "^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$"; //This string can be any regex get from XSLT
//Escape function get from: http://stackoverflow.com/a/6969486/193850
regStr = regStr.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
console.log(regStr); // \^\(\[w\-\]\+\(\?:\.\[w\-\]\+\)\*\)#\(\(\?:\[w\-\]\+\.\)\*w\[w\-\]\{0,66\}\)\.\(\[a\-z\]\{2,6\}\(\?:\.\[a\-z\]\{2\}\)\?\)\$
var re = new RegExp(regStr , "i");
console.log(re.test(str)); //false
var filter=/^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$/i
console.log(filter.test(str)); //true
</script>
</body>
</html>
You have to double your backslashes when you write a regular expression as a string.
Why? The string literal syntax also observes its own backslash-quoting convention, for things like quote characters, newlines, etc. Therefore, when JavaScript parses your string constant that contains the regular expression, the backslashes will disappear. Thus, you need to quote them with another backslash so that when you pass the string to the RegExp constructor it sees the regular expression you actually intended.
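A sketch of two ways to keep the backslashes when the pattern is written in source code: doubling them, or using the standard String.raw template tag, which leaves backslashes untouched. The variable names are illustrative:

```javascript
// Every backslash doubled inside an ordinary string literal.
const doubled = "^([\\w-]+(?:\\.[\\w-]+)*)#((?:[\\w-]+\\.)*\\w[\\w-]{0,66})\\.([a-z]{2,6}(?:\\.[a-z]{2})?)$";

// String.raw keeps backslashes as written, so no doubling is needed.
const raw = String.raw`^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$`;

console.log(doubled === raw); // true

const emailRe = new RegExp(doubled, "i");
console.log(emailRe.test("info#test.com")); // true
```

Note that the loss of backslashes only happens when a string literal is parsed from source code. A pattern that arrives at runtime, for example read from a file or another system, already contains its backslashes and can be passed to RegExp unchanged.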
See, there's a confusion. The function you've used is a nice way of preprocessing your string-to-be regexes so you don't have to worry about escaping regex metacharacters - i.e., symbols that will control the regex behaviour, and not just taken literally.
But the point is that your string had already been parsed before it reached this escaping function: the \w and \. sequences became just w and . respectively, and the preceding backslash was lost.
For characters not listed in Table 2.1, a preceding backslash is
ignored, but this usage is deprecated and should be avoided.
The escaping function did, in fact, restore the backslash before the ., but w is not special to it in any way. Therefore the string that went into the RegExp constructor had a plain w where \w was intended.
It's actually quite easy to check: just console.log(regStr) after the replacement operation.
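To see what the escaping function is actually for, here is a sketch that wraps the same replace call from the question in a helper. The name escapeRegExp and the sample inputs are assumptions for illustration:

```javascript
// Same character class as in the question: escapes regex metacharacters
// so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}

// The string literal has already lost its backslashes at this point.
console.log("[\w-]+\." === "[w-]+."); // true

const escaped = escapeRegExp("[w-]+.");
console.log(escaped); // \[w\-\]\+\.

// The result matches the original text character for character,
// not the pattern that text was meant to describe.
const literalOnly = new RegExp(escaped);
console.log(literalOnly.test("[w-]+.")); // true
console.log(literalOnly.test("abc."));   // false
```

So the helper is useful for turning arbitrary text into a literal search, but it should not be applied to a string that is already meant to be a regular expression.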

Brackets in Regular Expression

I'd like to compare 2 strings with each other, but I got a little problem with the Brackets.
The String I want to seek looks like this:
CAPPL:LOCAL.L_hk[1].vorlauftemp_soll
Escaping those two brackets seems to have no effect.
I tried it with this code
var regex = new RegExp("CAPPL:LOCAL.L_hk\[1\].vorlauftemp_soll","gi");
var value = "CAPPL:LOCAL.L_hk[1].vorlauftemp_soll";
regex.test(value);
Can somebody help me?
It has no effect because you're using a string literal. You need to escape the backslashes as well:
var regex = new RegExp("CAPPL:LOCAL.L_hk\\[1\\].vorlauftemp_soll","gi");
Or use a regex literal:
var regex = /CAPPL:LOCAL.L_hk\[1\].vorlauftemp_soll/gi
Unknown escape characters are ignored in JavaScript, so "\[" results in the same string as "[".
In value, you have (1) instead of [1]. So if you expect the regular expression to match and it doesn't, it is because of that.
Another problem is that you're writing the expression as a "..." string. A regular expression literal in JavaScript is written as /.../g instead, which avoids the extra layer of string escaping.
You may also want to escape the dot in your expression. . means "any character that is not a line break". You, on the other hand, want the dot to be matched literally: \..
You are generating a regular expression (in which [ is a special character that can be escaped with \) using a string (in which \ is a special character).
var regex = /CAPPL:LOCAL.L_hk\[1\].vorlauftemp_soll/gi;
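A sketch of what happens in each case, with the dots escaped as well. Once the string literal drops the backslashes, [1] becomes a character class that matches the single character 1. The variable names are illustrative, and the g flag is left out so test() stays stateless:

```javascript
const value = "CAPPL:LOCAL.L_hk[1].vorlauftemp_soll";

// "\[" and "\]" collapse to "[" and "]" in the string literal.
const fromBadString = new RegExp("CAPPL:LOCAL.L_hk\[1\].vorlauftemp_soll", "i");
console.log(fromBadString.source);      // CAPPL:LOCAL.L_hk[1].vorlauftemp_soll
console.log(fromBadString.test(value)); // false
console.log(fromBadString.test("CAPPL:LOCAL.L_hk1.vorlauftemp_soll")); // true

// Doubled backslashes in a string, or a literal: both give the same pattern.
const fromGoodString = new RegExp("CAPPL:LOCAL\\.L_hk\\[1\\]\\.vorlauftemp_soll", "i");
const fromLiteral = /CAPPL:LOCAL\.L_hk\[1\]\.vorlauftemp_soll/i;
console.log(fromGoodString.source === fromLiteral.source); // true
console.log(fromGoodString.test(value));                   // true
```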
