Get all characters not matching the Reg expression Pattern in Javascript - javascript

I have a requirement where entered text may only contain characters from the allowed list below, and I need to get all characters that do not match the regular expression pattern.
0-9
A-Z,a-z
And special characters like:
space,.#,-_&()'/*=:;
carriage return
end of line
The regular expression which I could construct is as below
/[^a-zA-Z0-9\ \.#\,\r\n*=:;\-_\&()\'\/]/g
For a given example, say input='123.#&-_()/*=:/\';#$%^"~!?[]av'. The invalid characters are '$%^"~!?[]' (# is in the allowed list).
Below is the approach I followed to get the not matched characters.
1) Construct the negation of allowed reg expn pattern like below.
/^([a-zA-Z0-9\ \.#\,\r\n*=:;\-_\&()\'\/])/g (please correct if this reg exp is right?)
2) Use replace function to get all characters
var nomatch = '';
for (var index = 0; index < input.length; index++) {
  nomatch += input[index].replace(/^([a-zA-Z0-9\ \.#\,\r\n*=:;\-_\&()\'\/])/g, '');
}
so nomatch='$%^"~!?[]' // finally
But here the replace call only ever sees a single character, so I use a loop to collect them all. If the input has 100 characters, it loops 100 times, which seems unnecessary.
Is there a better approach to get all characters not matching the regular expression pattern, along these lines?
A better regular expression to get the disallowed characters (than the negation I have used above)?
Avoiding the unnecessary loop?
A single-line approach?
Many thanks for any help on this.

You can simplify it by matching the allowed characters and replacing them all with an empty string, so that only the disallowed characters are left in the output:
var re = /[\w .#,\r\n*=:;&()'\/-]+/g;
var input = '123.#&-_()/*=:/\';#$%^"~!?[]av';
var invalid = input.replace(re, '');
console.log(invalid);
//=> $%^"~!?[]
Also note that many special characters don't need to be escaped inside a character class.
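If the goal is to collect the disallowed characters directly rather than strip the allowed ones, a negated character class with match is another option. This is a minimal sketch, assuming the same allowed set as above; the names notAllowed and invalidChars are illustrative only:

```javascript
var input = '123.#&-_()/*=:/\';#$%^"~!?[]av';

// Negated class: matches any single character that is NOT in the allowed set.
// \w covers a-z, A-Z, 0-9 and the underscore.
var notAllowed = /[^\w .#,\r\n*=:;&()'\/-]/g;

// match() returns null when nothing matches, so fall back to an empty array.
var invalidChars = (input.match(notAllowed) || []).join('');

console.log(invalidChars);
```

Both approaches scan the string once; the choice is mostly a matter of which reads more clearly in context.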

Related

regex match not outputting the adjacent matches javascript

I was experimenting with regex in JavaScript and came across an issue. Consider the string str = "+d+a+". I was trying to output the characters in the string that are surrounded by +, so I used str.match(/\+[a-z]\+/ig). I expected ["+d+","+a+"], but I got just ["+d+"]; "+a+" is missing from the output. Why?
.match(/.../g) returns all non-overlapping matches. Your regex requires a + sign on each side. Given your target string:
+d+a+
^^^
  ^^^
Your matches would have to overlap in the middle in order to return "+a+".
You can use look-ahead and a manual loop to find overlapping matches:
var str = "+d+a+";
var re = /(?=(\+[a-z]\+))/g;
var matches = [], m;
while (m = re.exec(str)) {
  matches.push(m[1]);
  re.lastIndex++;
}
console.log(matches);
With regex, once a character has been consumed by a match, it no longer counts for the next match.
For example, a regex like /aba/g wouldn't find 2 aba's in a string like "ababa",
because the second "a" was already consumed by the first match.
However, that can be overcome by using a positive lookahead (?=...),
because lookaheads just check what's ahead without actually consuming it.
So a regex like /(ab)(?=(a))/g would return 2 capture groups with 'ab' and 'a' for each 'aba'.
But in this case the match just needs to be followed by 1 fixed character, '+'.
So it can be simplified, because you don't really need capture groups for this one.
Example snippet:
var str = "+a+b+c+";
var matches = str.match(/\+[a-z]+(?=\+)/g).map(function(m){return m + '+'});
console.log(matches);
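The consumption behaviour described above can be checked on the "ababa" example. This is a small sketch that combines the plain /aba/g pattern with the lookahead-and-loop technique from the earlier answer; the variable names are illustrative:

```javascript
var text = "ababa";

// Plain global match: the middle "a" is consumed by the first match,
// so only one "aba" is reported.
var plain = text.match(/aba/g);

// Lookahead version: the match itself is zero-length, and the text of
// interest is captured inside the lookahead, so matches may overlap.
var overlapping = [];
var re = /(?=(aba))/g;
var m;
while ((m = re.exec(text)) !== null) {
  overlapping.push(m[1]);
  re.lastIndex++; // step past the zero-length match to avoid an infinite loop
}

console.log(plain);       // ["aba"]
console.log(overlapping); // ["aba", "aba"]
```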

Escape single backslash inbetween non-backslash characters only

I have some input coming in on a web page, which I will redisplay and submit elsewhere. The current issue is that I want to double up all single backslashes that are sandwiched in between non-backslash characters before submitting the input elsewhere.
Test string "domain\name\\nonSingle\\\WontBe\\\\Returned", I want to only get the first single backslash, between domain and name.
This string should get nothing "\\get\\\nothing\\\\"
The closest pattern I have so far is [\w][\\](?!\\); however, this will get the "\n" from the 1st test string I have listed. I would like to use lookbehind for the regex, but JavaScript does not have such a thing in the version I am using. Here is the site I have been testing my regexes on: http://www.regexpal.com/
Currently I am inefficiently using this regex [\w][\\](?!\\) to extract all single backslashes sandwiched between non-backslash characters, together with the character before them (which I don't want), and then replacing each match with the same string plus a backslash at the end of it.
For example, given domain\name\\bl\\\ah my current regex [\w][\\](?!\\) will return "n\". This results in my code having to do some additional processing rather than just using replace.
I don't care about any double, triple or quadruple backslashes present, they can be left alone.
For example, given domain\name\\bl\\\ah my current regex [\w][\\](?!\\) will return "n\". This results in my code having to do some additional processing rather than just using replace.
A plain replace is enough here, since the replacement string can reinsert the matched substring with $&, see:
console.log(String.raw`domain\name\\bl\\\ah`.replace(/\w\\(?!\\)/g, "$&\\"))
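To check that one-liner against both test strings from the question, here is a small sketch. The helper name doubleSingleBackslashes is hypothetical; the regex is the one used above. Note the assumption built into \w: a single backslash is only doubled when a word character precedes it, and a lone backslash at the very end of the input would be doubled too.

```javascript
// Hypothetical helper wrapping the replace call shown above.
function doubleSingleBackslashes(text) {
  // Match a word character followed by a backslash that is not itself
  // followed by another backslash, then reinsert the match ($&) plus one
  // extra backslash.
  return text.replace(/\w\\(?!\\)/g, "$&\\");
}

// In these string literals every "\\" is one actual backslash.
var first = doubleSingleBackslashes("domain\\name\\\\nonSingle\\\\\\WontBe\\\\\\\\Returned");
var second = doubleSingleBackslashes("\\\\get\\\\\\nothing\\\\\\\\");

console.log(first);  // only the backslash between "domain" and "name" is doubled
console.log(second); // unchanged
```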
The easiest way to handle escapes is to match every escaped character:
\\(.)
And then in the replacement, decide what to do with it based on what was captured.
var s = "domain\\name\\\\backslashesInDoubleBackslashesWontBeReturned";
console.log('input:', s);
var r = s.replace(/\\(.)/g, function (match, capture1) {
  // Leave an escaped backslash untouched; mark any other escape with '$' for illustration.
  return capture1 === '\\' ? match : '$' + capture1;
});
console.log('result:', r);
The closest you can get to actually matching the unescaped backslashes is
((?:^|[^\\])(?:\\\\)*)\\(?!\\)
It will match a run with an odd number of backslashes (together with the character before the run, if there is one), and capture everything but the last backslash into capture group 1.
var re = /((?:^|[^\\])(?:\\\\)*)\\(?!\\)/g;
var s = "domain\\name\\\\escapedBackslashes\\\\\\test";
var parts = s.split(re);
console.dir(parts);
var cleaned = [];
for (var i = 1; i < parts.length; i += 2) {
  cleaned.push(parts[i-1] + parts[i]);
}
cleaned.push(parts[parts.length - 1]);
console.dir(cleaned);
The even-numbered (counting from zero) items will be unmatched text. The odd-numbered items will be the captured text.
Each captured text should be considered part of the preceding text.

match a string not after another string

This
var re = /[^<a]b/;
var str = "<a>b";
console.log(str.match(re)[0]);
matches >b.
However, I don't understand why this pattern /[^<a>]b/ doesn't match anything. I want to capture only the "b".
The reason /[^<a>]b/ doesn't match anything is that the class excludes <, a, and > as individual characters (rewriting it as /[^><a]b/ would do the same thing), and in "<a>b" the only b is preceded by >. I doubt this is what you want, though. Try the following:
var re = /<a>(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);
This regex looks for a string that looks like <a>b first, but it captures the b with the parentheses. To access the b, simply use [1] when you call .match instead of [0], which would return the entire string (<a>b).
Your pattern matches a b preceded by any one character that is not listed in the character class. In a class such as [^a-z+-], the a-z is a range (the lowercase Latin letters) while + and - are literal characters, and the leading ^ negates the whole class. So what /[^<a]b/ matches is any b preceded by a character that is NOT < or a. Since > is neither of those, the pattern matches >b.
A character class works like a list of single characters separated by OR pipes: [abcd] matches the same as (a|b|c|d). Classes additionally let you write ranges such as [a-d], with a dash between the endpoints. Putting a ^ at the start turns the class into a negated one, so it matches any character EXCEPT those listed.
What you are looking for is a negative lookahead. Those can exclude something from matching longer strings. Those work in this format: (?!do not match) where do not match uses the normal regex syntax. In this case, you want to test if the preceding string does not match <a>, so just use:
(?!<a>)(.{3}|^.{0,2})b
That will match the b when it is either preceded by three characters that are not <a>, or by fewer characters that are at the start of the line.
PS: what you are probably looking for is the "negative lookbehind", written (?<!<a>)b. It was not available in JavaScript regular expressions when this was written; it was added to the language in ES2018, so engines that implement that edition accept it. On older engines without lookbehind support, you'll have to use the alternative regex above.
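Both forms can be compared on a few inputs. This is a sketch only: the sample strings and variable names are made up for illustration, and the lookbehind pattern is built with the RegExp constructor on the assumption that the engine supports ES2018 lookbehind (an engine without it would throw at that line).

```javascript
// Lookahead-based workaround from the answer above.
var withoutLookbehind = /(?!<a>)(.{3}|^.{0,2})b/;

// ES2018 negative lookbehind; throws a SyntaxError here on engines that lack it.
var withLookbehind = new RegExp("(?<!<a>)b");

var samples = ["<a>b", "xyzb", "b"];
var results = samples.map(function (s) {
  return {
    input: s,
    workaround: withoutLookbehind.test(s),
    lookbehind: withLookbehind.test(s)
  };
});

console.log(results);
// "<a>b" is rejected by both patterns; "xyzb" and "b" are accepted by both.
```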
You could write a pattern to match the anchor tags and then replace them with an empty string:
var str = "<a>b</a>";
str = str.replace(/((<a[\w\s=\[\]\'\"\-]*>)|<\/a>)/gi, '');
This will replace the following strings with 'b':
<a>b</a>
<a class='link-l3'>b</a>
To get more familiar with regex patterns, you may find the website regExPal very useful.
Your code :
var re = /[^<a>]b/;
var str = "<a>b";
console.log(str.match(re));
Why does [^<a>]b not match anything?
[^<a>]b means: any one character except <, a, or >, followed by b.
Here b is preceded by >, so it will not match.
If you want to match b, you can write it like this:
var re = /(?:[\<a\>])(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);

Regular expression with asterisk quantifier

This documentation states this about the asterisk quantifier:
Matches the preceding character 0 or more times.
It works in something like this:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
var str = "<html>";
console.log(str.match(regex));
The result of the above is : <html>
But when tried on the following code to get all the "r"s in the string below, it only returns the first "r". Why is this?
var regex = /r*/;
var str = "rodriguez";
console.log(str.match(regex));
Why, in the first example does it cause "the preceding" character/token to be repeated "0 or more times" but not in the second example?
var regex = /r*/;
var str = "rodriguez";
The regex engine will first try to match r in rodriguez from left to right and since there is a match, it consumes this match.
The regex engine then tries to match another r, but the next character is o, so it stops there.
Without the global flag g (used as so var regex = /r*/g;), the regex engine will stop looking for more matches once the regex is satisfied.
Try using:
var regex = /a*/;
var str = "cabbage";
The match will be an empty string, despite there being a's in the string! This is because the regex engine first tries to find a in cabbage from left to right, but the first character is c. Since this doesn't match, the regex tries to match 0 times, which succeeds at that position. The regex is thus satisfied and the matching ends here.
It might be worth pointing out that * alone is greedy, which means it will first try to match as many as possible (the 'or more' part from the description) before trying to match 0 times.
To get all r from rodriguez, you will need the global flag as mentioned earlier:
var regex = /r*/g;
var str = "rodriguez";
You'll get all the r, plus all the empty strings inside, since * also matches 'nothing'.
Use global switch to match 1 or more r anywhere in the string:
var regex = /r+/g;
In your other regex:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
There you're matching a literal < followed by a letter, followed by 0 or more letters or digits, followed by >, so it matches <html> perfectly.
But if the input is <foo>:<bar>:<abc>, it will match only <foo> and not the other segments. To match all segments, use /<[A-Za-z][A-Za-z0-9]*>/g with the global switch.
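The behaviour described in these answers can be verified in a few lines. This is a minimal sketch using the strings from the discussion; the variable names are illustrative:

```javascript
// Without the g flag, match() stops at the first (leftmost) match.
var firstR = "rodriguez".match(/r*/)[0];      // "r"

// At index 0 of "cabbage" the engine can match zero a's, so it succeeds with "".
var emptyMatch = "cabbage".match(/a*/)[0];    // ""

// With g, r* also reports an empty match at every position where no r starts.
var withEmpties = "rodriguez".match(/r*/g);
var nonEmpty = withEmpties.filter(function (s) { return s !== ""; });

// r+ requires at least one r, so the empty matches disappear.
var onlyRs = "rodriguez".match(/r+/g);        // ["r", "r"]

// The tag pattern needs g as well to return every segment.
var tags = "<foo>:<bar>:<abc>".match(/<[A-Za-z][A-Za-z0-9]*>/g);

console.log(firstR, JSON.stringify(emptyMatch), withEmpties, onlyRs, tags);
```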

Javascript regular expression matching prior and trailing characters

I have this string in a object:
<FLD>dsfgsdfgdsfg;NEW-7db5-32a8-c907-82cd82206788</FLD><FLD>dsfgsdfgsd;NEW-480e-e87c-75dc-d70cd731c664</FLD><FLD>dfsgsdfgdfsgfd;NEW-0aad-440a-629c-3e8f7eda4632</FLD>
this.model.get('value_long').match(/[<FLD>\w+;](NEW[-|\d|\w]+)[</FLD>]/g)
Returns:
[";NEW-7db5-32a8-c907-82cd82206788<", ";NEW-480e-e87c-75dc-d70cd731c664<", ";NEW-0aad-440a-629c-3e8f7eda4632<"]
What is wrong with my regular expression that it is picking up the preceding ; and the trailing <?
here is a link to the regex
http://regexr.com?30k3m
Updated:
this is what I would like returned:
["NEW-7db5-32a8-c907-82cd82206788", "NEW-480e-e87c-75dc-d70cd731c664", "NEW-0aad-440a-629c-3e8f7eda4632"]
here is a JSfiddle for it
http://jsfiddle.net/mwagner72/HHMLK/
Square brackets create a character class, which you do not want here, try changing your regex to the following:
<FLD>\w+;(NEW[-\d\w]+)</FLD>
Since it looks like you want to grab the capture group from each match, you can use the following code to construct an array with the capture group in it:
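// `string` is the input text, e.g. this.model.get('value_long')
var regex = /<FLD>\w+;(NEW[\-\d\w]+)<\/FLD>/g;
var match = regex.exec(string);
var matches = [];
while (match !== null) {
  matches.push(match[1]); // capture group 1: the NEW-... id only
  match = regex.exec(string);
}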
[<FLD>\w+;] would match one of the characters inside of the square brackets, when I think what you actually want to do is match all of those. Also for the other character class, [-|\d|\w], you can remove the | because it is already implied in a character class, | should only be used for alternation inside of a group.
Here is an updated link with the new regex: http://jsfiddle.net/RTkzx/1
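For reference, here is the same approach as a self-contained sketch run against the sample string from the question. The variable names are illustrative, and the class is shortened to [-\w] on the grounds that \w already includes digits:

```javascript
var value =
  "<FLD>dsfgsdfgdsfg;NEW-7db5-32a8-c907-82cd82206788</FLD>" +
  "<FLD>dsfgsdfgsd;NEW-480e-e87c-75dc-d70cd731c664</FLD>" +
  "<FLD>dfsgsdfgdfsgfd;NEW-0aad-440a-629c-3e8f7eda4632</FLD>";

// Literal <FLD>, a word, a semicolon, then the captured id, then literal </FLD>.
var fieldRegex = /<FLD>\w+;(NEW[-\w]+)<\/FLD>/g;

var ids = [];
var found;
while ((found = fieldRegex.exec(value)) !== null) {
  ids.push(found[1]); // keep only capture group 1
}

console.log(ids);
```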
