Escape single backslash in between non-backslash characters only - javascript

I have some input coming into a web page, which I will redisplay and submit elsewhere. The current issue is that, before submitting the input elsewhere, I want to double every single backslash that is sandwiched in between non-backslash characters.
For the test string "domain\name\\nonSingle\\\WontBe\\\\Returned", I only want to get the first single backslash, the one between domain and name.
This string should match nothing: "\\get\\\nothing\\\\"
The pattern that gets me closest is [\w][\\](?!\\); however, it will get the "\n" from the first test string I listed. I would like to use lookbehind, but JavaScript does not support it in the version I am using. Here is the site I have been testing my regexes on: http://www.regexpal.com/
Currently I am inefficiently using [\w][\\](?!\\) to extract every single backslash sandwiched between non-backslash characters, together with the character before it (which I don't want), and then replacing each match with the same string plus a backslash at the end.
For example, given domain\name\\bl\\\ah, my current regex [\w][\\](?!\\) will return "n\". This means my code has to do some additional processing rather than just using replace.
I don't care about any double, triple or quadruple backslashes present, they can be left alone.

You can do it with just replace, since the matched substring can be inserted into the replacement with $&:
console.log(String.raw`domain\name\\bl\\\ah`.replace(/\w\\(?!\\)/g, "$&\\"))
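The \w in that pattern only accepts letters, digits and underscore before the backslash. If any non-backslash character may come before it (a dot, a space, and so on), the same $& idea can be widened with [^\\], and a lookahead can require a non-backslash character after it, still without lookbehind. A minimal sketch; the function name doubleLoneBackslashes is hypothetical:

```javascript
// Hypothetical helper: doubles every backslash that has a non-backslash
// character directly before it and directly after it. No lookbehind needed:
// the preceding character is consumed and put back via $&, and the following
// character is only checked with a lookahead, so it can start the next match.
function doubleLoneBackslashes(input) {
  return input.replace(/[^\\]\\(?=[^\\])/g, "$&\\");
}

console.log(doubleLoneBackslashes(String.raw`domain\name\\nonSingle\\\WontBe\\\\Returned`));
// domain\\name\\nonSingle\\\WontBe\\\\Returned
console.log(doubleLoneBackslashes(String.raw`\\get\\\nothing\\\\`));
// \\get\\\nothing\\\\  (unchanged)
console.log(doubleLoneBackslashes(String.raw`a.\b\c`));
// a.\\b\\c
```

Because the lookahead does not consume the following character, adjacent single backslashes such as the ones in a.\b\c are both found.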

The easiest way to match escapes is to match all escaped characters:
\\(.)
And then in the replacement, decide what to do with it based on what was captured.
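var s = "domain\\name\\\\backslashesInDoubleBackslashesWontBeReturned";
console.log('input:', s);
// capture1 is the character that follows each backslash
var r = s.replace(/\\(.)/g, function (match, capture1) {
    // keep escaped backslashes (\\) as they are; mark every other
    // escape with '$' so you can see which backslashes were single
    return capture1 === '\\' ? match : '$' + capture1;
});
console.log('result:', r);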
The closest you can get to actually matching the unescaped backslashes is
((?:^|[^\\])(?:\\\\)*)\\(?!\\)
It will match an odd number of backslashes, and capture everything before the last one (the preceding non-backslash character, if any, plus the escaped pairs) into capture group 1.
var re = /((?:^|[^\\])(?:\\\\)*)\\(?!\\)/g;
var s = "domain\\name\\\\escapedBackslashes\\\\\\test";
// split() drops each matched unescaped backslash but keeps capture group 1
var parts = s.split(re);
console.dir(parts);
// glue each captured piece back onto the text that precedes it
var cleaned = [];
for (var i = 1; i < parts.length; i += 2)
{
    cleaned.push(parts[i-1] + parts[i]);
}
cleaned.push(parts[parts.length - 1]);
console.dir(cleaned);
The even-numbered (counting from zero) items will be unmatched text. The odd-numbered items will be the captured text.
Each captured text should be considered part of the preceding text.
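Because capture group 1 holds everything before the unescaped backslash, the same regex can also be used directly with replace to double that backslash. Note that this follows escape semantics: in a run of three backslashes, the first two are treated as an escaped pair and the third one is doubled, which differs from the question's rule of leaving triples alone. A minimal sketch:

```javascript
// Same pattern as above: group 1 = start of string or a non-backslash
// character, plus any number of escaped pairs (\\); then one unescaped
// backslash that is not followed by another backslash.
var re = /((?:^|[^\\])(?:\\\\)*)\\(?!\\)/g;

// String.raw keeps the backslashes exactly as written.
var s = String.raw`domain\name\\escapedBackslashes\\\test`;

// Put group 1 back and replace the unescaped backslash with two.
var doubled = s.replace(re, "$1\\\\");
console.log(doubled);
// domain\\name\\escapedBackslashes\\\\test
```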

Related

How to use regex with an array of keywords to replace?

I am trying to create a loop that will replace certain words with their uppercase versions. However, I cannot get it to work with capture groups, and I need to uppercase only words surrounded by whitespace or a start-of-line marker. If I understand correctly, \b is the word-boundary matcher? The list below is shortened for convenience.
raw_text = 'crEate Alter Something banana'
var lower_text = raw_text.toLowerCase();
var sql_keywords = ['ALTER', 'ANY', 'CREATE']
for (i = 0; i < sql_keywords.length; i++){
search_key = '(\b)' + sql_keywords[i].toLowerCase() + '(\b)';
replace_key = sql_keywords[i].toUpperCase();
lower_text = lower_text.replace(search_key, '$1' + replace_key + '$2');
}
It loops fine, but the replace fails. I assume I have formatted it incorrectly, but I cannot work out how to format it correctly. To be clear, it should search for a word surrounded by either a line start or a space, then replace the word with its uppercase version while preserving the boundaries.
Several issues:
A backslash inside a string literal is an escape character, so if you intend to have a literal backslash (for the purpose of generating regex syntax), you need to double it.
You did not create a regular expression. A dynamic regular expression is created with a call to RegExp.
You would want to provide regex option flags, including g for global, and you might as well make things easier by adding the i (case-insensitive) flag.
There is no reason to make a capture group of a \b, as it represents no character from the input. So even if your code worked, $1 and $2 would just resolve to empty strings; they serve no purpose.
You are casting the input to all lower case, so you will lose the capitalisation of words that are not matched.
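Applying those fixes to the loop from the question gives roughly the following sketch. It keeps one RegExp per keyword; the variable name text is just an illustrative working copy:

```javascript
var raw_text = 'crEate Alter Something banana';
var sql_keywords = ['ALTER', 'ANY', 'CREATE'];

// Work on a copy of the original text instead of a lower-cased version,
// so words that are not keywords keep their capitalisation.
var text = raw_text;
for (var i = 0; i < sql_keywords.length; i++) {
  // '\\b' in the string literal becomes \b in the regex;
  // g = every occurrence, i = case-insensitive, no capture groups needed.
  var search_key = new RegExp('\\b' + sql_keywords[i] + '\\b', 'gi');
  text = text.replace(search_key, sql_keywords[i].toUpperCase());
}
console.log(text); // CREATE ALTER Something banana
```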
It will be easier when you create one regular expression for all at the same time, and use the callback argument of replace:
var raw_text = 'crEate Alter Something banana';
var sql_keywords = ['ALTER','ANY','CREATE'];
var regex = RegExp("\\b(" + sql_keywords.join("|") + ")\\b", "gi");
var result = raw_text.replace(regex, word => word.toUpperCase());
console.log(result);
BTW, you probably also want to match reserved words when they are followed by punctuation, such as a comma. \b matches at any transition from a word character to a non-word character and vice versa, so that case is already covered.
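You can also use the RegExp constructor.
Make a function that wraps each word in a group and joins them into one alternation:
const listRegexp = list => new RegExp(list.map(word => `(${word})`).join("|"), "gi");
Then use it:
const re = listRegexp(sql_keywords);
Then replace:
const output = raw_text.replace(re, x => x.toUpperCase());
Note that this pattern has no \b, so it also matches keywords inside longer words (for example "create" inside "recreate").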

Get all characters not matching the Reg expression Pattern in Javascript

I have the following requirement: entered text must consist only of the allowed characters listed below, and I need to get all characters that do not match the regex pattern.
0-9
A-Z,a-z
And special characters like:
space,.#,-_&()'/*=:;
carriage return
end of line
The regular expression I constructed is:
/[^a-zA-Z0-9\ \.#\,\r\n*=:;\-_\&()\'\/]/g
For example, given input='123.#&-_()/*=:/\';#$%^"~!?[]av', the invalid characters are '$%^"~!?[]'.
Below is the approach I followed to get the non-matching characters.
1) Construct the negation of the allowed-character pattern:
/^([a-zA-Z0-9\ \.#\,\r\n*=:;\-_\&()\'\/])/g (please correct me if this regex is wrong)
2) Use replace on each character to remove the allowed ones:
var nomatch = '';
for (var index = 0; index < input.length; index++) {
nomatch += input[index].replace(/^([a-zA-Z0-9\ \.#\,\r\n*=:;\-_\&()\'\/])/g, '');
}
so nomatch='$%^"~!?[]' // finally
But here the replace function only ever handles a single character, so I use a loop to get them all. If the input has 100 characters, it loops 100 times, which is unnecessary.
Is there a better approach to getting all characters that do not match the pattern, along these lines:
A better regular expression for the disallowed characters (than the negation I used above)?
Avoid unnecessary looping?
A single-line approach?
Many thanks for any help on this.
You can simplify it by reversing the approach: replace all allowed characters with an empty string, so that only the disallowed characters are left in the output:
var re = /[\w .#,\r\n*=:;&()'\/-]+/g
var input = '123.#&-_()/*=:/\';#$%^"~!?[]av'
input = input.replace(re, '')
console.log(input);
//=> $%^"~!?[]
Also note that many special characters don't need to be escaped inside a character class.
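If you want the disallowed characters as a list rather than a string, match can do it in one line with a negated character class. The ^ has to be the first character inside the brackets; outside them, as in /^(...)/ from the question, it is a start-of-string anchor. A sketch using the same allowed set:

```javascript
var input = '123.#&-_()/*=:/\';#$%^"~!?[]av';

// [^...] matches any single character that is NOT in the allowed set.
var notAllowed = /[^\w .#,\r\n*=:;&()'\/-]/g;

// match returns null when nothing is found, hence the `|| []`.
var invalid = input.match(notAllowed) || [];
console.log(invalid.join('')); // $%^"~!?[]
```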

JS Regex: Remove anything (ONLY) after a word

I want to remove all of the symbols (the symbol depends on what I select at the time) that come after each word, without knowing what the word could be, but leave the ones before each word in place.
A couple of examples:
!!hello! my! !!name!!! is !!bob!! should return, for the symbol !:
!!hello my !!name is !!bob
and
$remove$ the$ targeted$# $$symbol$$# only $after$ a $word$ should return, for the symbol $:
$remove the targeted# $$symbol# only $after a $word
You need to use capture groups and replace:
"!!hello! my! !!name!!! is !!bob!!".replace(/([a-zA-Z]+)(!+)/g, '$1');
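That works for your test string. To make it work for any character or group of characters:
var stripTrailing = trail => {
    // (?:...) makes "+" repeat the whole trail, not only its last character
    let regex = new RegExp(`([a-zA-Z0-9]+)((?:${trail})+)`, 'g');
    return str => str.replace(regex, '$1');
};
// usage: stripTrailing('!')('!!hello! my!') returns '!!hello my'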
Note that this fails on any characters that have meaning in a regular expression: []{}+*^$. etc. Escaping those programmatically is left as an exercise for the reader.
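One common way to do that escaping is to put a backslash in front of every regex metacharacter before building the pattern, similar to the escapeRegExp function shown in MDN's regular expressions guide. A sketch; escapeRegExp and stripTrailingEscaped are illustrative names:

```javascript
// Hypothetical helper: backslash-escape every character that has a
// special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same idea as stripTrailing above, with the trail escaped first;
// (?:...) makes "+" repeat the whole trail, not just its last character.
var stripTrailingEscaped = trail => {
  let regex = new RegExp(`([a-zA-Z0-9]+)((?:${escapeRegExp(trail)})+)`, 'g');
  return str => str.replace(regex, '$1');
};

var stripDollar = stripTrailingEscaped('$');
console.log(stripDollar('$remove$ the$ targeted$# $$symbol$$# only $after$ a $word$'));
// $remove the targeted# $$symbol# only $after a $word
```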
UPDATE
Per your comment I thought an explanation might help you, so:
First, there's no way in this case to replace only part of a match; you have to replace the entire match. So we need to find a pattern that matches, split it into the part we want to keep and the part we don't, and replace the whole match with the part we want to keep. So let's break up my regex above into multiple lines to see what's going on:
First we want to match any number of sequential alphanumeric characters, that would be the 'word' to strip the trailing symbol from:
( // denotes capturing group for the 'word'
[ // [] means 'match any character listed inside brackets'
a-z // list of alpha character a-z
A-Z // same as above but capitalized
0-9 // list of digits 0 to 9
]+ // plus means one or more times
)
The capturing group means we want to have access to just that part of the match.
Then we have another group
(
! // I used ES6's string interpolation to insert the arg here
+ // match that exclamation (or whatever) one or more times
)
Then we add the g flag so that the replacement happens for every match in the target string; without the flag, only the first match is replaced. JavaScript provides a convenient shorthand for referring to the capturing groups in the replacement string: the '$1' above means 'insert the contents of the first capture group here'.
So, in the above, if you replaced '$1' with '$1$2' you'd see the same string you started with, if you did 'foo$2' you'd see foo in place of every word trailed by one or more !, etc.
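A quick way to see the capture groups and the g flag at work is to run the variations described above side by side. This sketch uses the ! version of the regex:

```javascript
var text = '!!hello! my! !!name!!! is !!bob!!';
var withG = /([a-zA-Z0-9]+)(!+)/g;

// keep only the word (group 1)
console.log(text.replace(withG, '$1'));
// !!hello my !!name is !!bob

// word + trail (groups 1 and 2): the original string comes back
console.log(text.replace(withG, '$1$2'));
// !!hello! my! !!name!!! is !!bob!!

// replace each word with "foo" but keep its trail
console.log(text.replace(withG, 'foo$2'));
// !!foo! foo! !!foo!!! is !!foo!!

// without the g flag only the first match is replaced
console.log(text.replace(/([a-zA-Z0-9]+)(!+)/, '$1'));
// !!hello my! !!name!!! is !!bob!!
```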

Regex to not match when not in quotes

I'm looking to create a JS regex that matches double spaces:
([-!$%^&*()_+|~=`{}\[\]:";'<>?,.\w\/]\s\s[^\s])
The regex should match double spaces, but not at the start or end of a line, and not when they are wrapped in quotes.
Any help on this would be greatly appreciated.
For example:
var x = 1,
Y = 2;
would be fine, whereas
var x =  1;
would not (more than one space after the = sign).
Also if it was
console.log("I am some console output");
would be fine as it is within double quotes
This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
We can solve it with a beautifully-simple regex:
(["']) \1|([ ]{2})
The left side of the alternation | matches complete ' ' and " ". We will ignore these matches. The right side matches and captures double spaces to Group 2, and we know they are the right ones because they were not matched by the expression on the left.
This program shows how to use the regex in JavaScript, where we will retrieve the Group 2 captures:
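var the_captures = [];
var string = 'your_test_string';
var myregex = /(["']) \1|([ ]{2})/g;
var thematch = myregex.exec(string);
while (thematch != null) {
    // Group 2 is only set when the right side (the double space) matched;
    // matches of the left side (the quoted text) are skipped.
    if (thematch[2] !== undefined) {
        // add it to array of captures
        the_captures.push(thematch[2]);
        console.log('double space at index', thematch.index);
    }
    // match the next one
    thematch = myregex.exec(string);
}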
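The question also mentions quoted text like the console.log example, where the quotes contain more than just the spaces, and indentation at the start of a line. One way to apply the same "match what you want to skip, capture what you want" idea is to let the left side consume whole quoted strings plus leading or trailing spaces on a line. A sketch, assuming quotes are never escaped inside strings; findDoubleSpaces is a hypothetical name:

```javascript
// Alternatives, tried left to right at each position:
//   "..." or '...'  -> a whole quoted string (matched, then ignored)
//   ^ + or  +$      -> spaces at the start or end of a line (ignored; m flag)
//   ( {2})          -> two spaces anywhere else, captured in group 1
var pattern = /"[^"]*"|'[^']*'|^ +| +$|( {2})/gm;

function findDoubleSpaces(text) {
  var positions = [];
  var match;
  pattern.lastIndex = 0; // the regex is global and shared, so reset it
  while ((match = pattern.exec(text)) !== null) {
    if (match[1] !== undefined) {
      positions.push(match.index); // only the double spaces we care about
    }
  }
  return positions;
}

console.log(JSON.stringify(findDoubleSpaces('var x = 1,\n    y = 2;')));               // []
console.log(JSON.stringify(findDoubleSpaces('var x =  1;')));                          // [7]
console.log(JSON.stringify(findDoubleSpaces('console.log("I am  some console output");'))); // []
```

No alternative can match an empty string, so the exec loop always advances.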
A Neat Variation for Perl and PCRE
In the original answer, I hadn't noticed that this was a JavaScript question (the tag was added later), so I had given this solution:
(["']) \1(*SKIP)(*FAIL)|[ ]{2}
Here, thanks to (*SKIP)(*FAIL) magic, we can directly match the spaces, without capture groups.
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...
Simple solution:
/\s{2,}/
This matches a run of two or more consecutive whitespace characters (add the g flag to find every occurrence). If you need to match the entire line, but only if it contains two or more consecutive whitespace characters:
/^.*\s{2,}.*$/
If the whitespace characters don't need to be consecutive:
/^(.*\s.*){2,}$/

Regex produces different result in javascript

Why does this regex return an entirely different result in JavaScript compared to the online regex tester at http://www.gskinner.com/RegExr/?
var patt = new RegExp(/\D([0-9]*)/g);
"/144444455".match(patt);
The return in the console is:
["/144444455"]
whereas the RegExr tester does return the captured group.
All I'm trying to do is extract the first amount in a piece of text, regardless of whether that text starts with a "/" or contains a bunch of other useless information.
The regex does exactly what you tell it to:
\D matches a non-digit (in this case /)
[0-9]* matches a string of digits (144444455)
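Because the g flag is set, match returns only the full matches and drops the capture groups, so you need to access the content of the first capturing group yourself:
var subject = "/144444455";
var result = null;
var match = patt.exec(subject);
if (match != null) {
    result = match[1]; // "144444455"
}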
Or simply drop the \D entirely - I'm not sure why you think you need it in the first place...
Then, you should probably remove the /g modifier if you only want to match the first number, not all numbers in your text. Since match returns an array (or null), take its first element:
var m = subject.match(/\d+/);
result = m ? m[0] : null;
That should work just as well.
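To see the difference described above, here is a small sketch comparing match with and without the g flag on the string from the question, plus the simpler \d+ version:

```javascript
var subject = "/144444455";

// With g, match returns only the full matches, no capture groups.
var all = subject.match(/\D([0-9]*)/g);
console.log(all.length, all[0]); // 1 /144444455

// Without g, match returns the first match followed by its groups.
var first = subject.match(/\D([0-9]*)/);
console.log(first[0]); // /144444455
console.log(first[1]); // 144444455 (capture group 1)

// Simplest: no \D, no g, just the first run of digits.
var digits = "some text /42 and 7".match(/\d+/);
console.log(digits ? digits[0] : null); // 42
```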
