Regex replace total string - javascript

I have an XML file with items that contain this string:
<field name="itemid">xx</field>
Where xx = a number from 50 to 250.
I need to remove the entire string from the whole file.
How would I do this with a Regex replace?

You can use this:
str = str.replace(/<.*>/g,'');
Example:
var str = "<field name='itemid'>xx</field>";
str = str.replace(/<.*>/g, 'replaced');
console.log(str)
Explanation:
< matches the character < literally
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
> matches the character > literally
g modifier: global. All matches (don't return on first match)
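One caveat of the greedy .* is easier to see in a short sketch: when a line holds several tags, /<.*>/g removes everything from the first < to the last >, including the text in between. The sample string below is made up for illustration, and the lazy .*? variant is shown only for comparison (it strips the tags but keeps their contents).

```javascript
// Hypothetical one-line input with two fields and surrounding text.
const line = 'keep <field name="itemid">60</field> this <field name="title">x</field> too';

// Greedy: .* runs to the last ">" on the line, so the text between tags is lost.
const greedyResult = line.replace(/<.*>/g, '');

// Lazy: .*? stops at the first ">", so only the tags themselves are removed.
const lazyResult = line.replace(/<.*?>/g, '');

console.log(JSON.stringify(greedyResult)); // "keep  too"
console.log(JSON.stringify(lazyResult));   // "keep 60 this x too"
```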
If you want to be more restrictive you can do this:
str = str.replace(/<field name\=\"\w*\">\d*<\/field>/g, '');
Example:
var str = '<field name="test">200</field>';
str = str.replace(/<field name\=\"\w*\">\d*<\/field>/g, 'replaced');
console.log(str)
Explanation:
<field name matches the characters <field name literally (case sensitive)
\= matches the character = literally
\" matches the character " literally
\w* match any word character [a-zA-Z0-9_]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\" matches the character " literally
> matches the character > literally
\d* match a digit [0-9]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
< matches the character < literally
\/ matches the character / literally
field> matches the characters field> literally (case sensitive)
g modifier: global. All matches (don't return on first match)
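The question asks for the itemid field only, with a value from 50 to 250. A regex alone is awkward for numeric ranges, so the following is a minimal sketch that matches any all-digit itemid field and lets a replace callback check the range. The sample XML and the function name removeItemIds are assumptions made for this illustration.

```javascript
// Hypothetical sample input; the real file contents are not shown in the question.
const xml = [
  '<item><field name="itemid">49</field><field name="title">a</field></item>',
  '<item><field name="itemid">50</field><field name="title">b</field></item>',
  '<item><field name="itemid">250</field><field name="title">c</field></item>',
  '<item><field name="itemid">251</field><field name="title">d</field></item>',
].join('\n');

// Match only itemid fields whose content is all digits, then let the
// callback decide whether the number falls inside min..max.
function removeItemIds(text, min = 50, max = 250) {
  return text.replace(/<field name="itemid">(\d+)<\/field>/g, (match, digits) => {
    const n = Number(digits);
    return n >= min && n <= max ? '' : match; // keep fields outside the range
  });
}

const cleaned = removeItemIds(xml);
console.log(cleaned);
```

Fields with the values 49 and 251 stay in place, while 50 and 250 are removed; the title fields are never touched.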

When you replace a tag, especially an XML tag, you must be sure to capture everything from the opening tag to the closing tag. In this case the RegExp should use a backreference to the tag name.
var re = /<(\S+)[^>]*>.*<\/\1>/g;
var m ='some text <ab id="aa">aa <p>qq</p> aa</ab>test test <p>paragraph </p>'.replace(re,'');
console.log(m);//some text test test
\1 is a backreference to the text captured by (\S+), i.e. the tag name
\S+ one or more non-white-space characters
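A small sketch of how the greedy .* in this pattern behaves when two sibling elements share a tag name. The input string is hypothetical, and the lazy variant with \w+ for the tag name is an assumption of this sketch rather than part of the answer above.

```javascript
// Hypothetical input: two sibling <p> elements with text between them.
const html = '<p>a</p> keep <p>b</p>';

// Greedy .* spans from the first <p> to the last </p>, so " keep " is removed too.
const greedyTags = html.replace(/<(\S+)[^>]*>.*<\/\1>/g, '');

// Lazy .*? stops at the nearest closing tag with the same name.
const lazyTags = html.replace(/<(\w+)[^>]*>.*?<\/\1>/g, '');

console.log(JSON.stringify(greedyTags)); // ""
console.log(JSON.stringify(lazyTags));   // " keep "
```

Neither variant handles elements nested inside an element of the same name; for that, an XML parser is the safer tool.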

Related

regex to allow special characters but not leading/trailing white space

I would like to allow all special characters, and white space only in between words, for a password input field.
If whitespace is entered at the start or end of the string, the regex should fail.
Is there a useful JavaScript regex for this?
I tried \S, but this does not accept any white space. Would that be sufficient?
I tried \A\s|\s*\Z, but I was not able to negate it.
Using something like [^\s] would suffice.
The \A (start of string) and \Z (end of string) anchors are not supported by JS RegExp.
If you use /\S/ it will only match any non-whitespace char, anywhere inside a string.
If you use /^\s|\s*$/ it will match a whitespace at the start or any 0 or more whitespaces at the end.
You need
/^\S+(?:\s+\S+)*$/
It will match:
^ - start of string
\S+ - 1 or more non-whitespace chars
(?:\s+\S+)* - any 0 or more occurrences of
\s+ - 1+ whitespaces
\S+ - 1+ non-whitespace chars
$ - end of string.
JS demo:
var strs = ['Abc 123 !##', 'abc123#', ' abc34', ' a ', 'bvc '];
var rx = /^\S+(?:\s+\S+)*$/;
for (var s of strs) {
console.log("'"+s+"'", "=>", rx.test(s));
}
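As a cross-check, the same rule can be sketched without a regex: a string passes when it is non-empty and trim() leaves it unchanged. The helper name hasNoEdgeWhitespace and the extra empty-string sample are assumptions added for this comparison.

```javascript
const noEdgeWhitespace = /^\S+(?:\s+\S+)*$/;

// Non-regex equivalent: non-empty and unchanged by trim().
function hasNoEdgeWhitespace(s) {
  return s.length > 0 && s === s.trim();
}

const samples = ['Abc 123 !##', 'abc123#', ' abc34', ' a ', 'bvc ', ''];
const comparison = samples.map((s) => ({
  input: s,
  regex: noEdgeWhitespace.test(s),
  trim: hasNoEdgeWhitespace(s),
}));

// Both columns agree for every sample.
console.log(comparison);
```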
I don't know if it's totally fine but in your case, I think this could apply better
^((\w)*){1}$

Return true if two or more repeated chars

The regular expressions I have tried include following:
1> var temp = "(.)\\1{2,}";
2> var temp = "^(?!.*(.)\\1{2,})";
testExp = new RegExp(temp);
The output I get is:
testExp.test("sss") is true
testExp.test("ss") is false
testExp.test("sdsdsd") is false // which should be true.
That is, my regular expressions only take consecutive repeated characters into account, not repeats elsewhere in the string.
You may add .* before \1 (to match any 0+ chars other than line break chars) and use the following regex:
/(.)(?:.*\1){2,}/
Or, if there can be line breaks in the input string:
/([\s\S])(?:[\s\S]*\1){2,}/
[\s\S] (or [^] in JS regex) will match any char, while . matches any char but a line break char.
Details
(.) - capturing group #1 matching any 1 char
(?:.*\1){2,} - 2 or more consecutive occurrences of:
.* - any 0+ chars other than line break chars
\1 - backreference to Group 1 value (same char as captured in Group 1).
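A quick sketch of what the {2,} quantifier means in practice: the captured char must show up at least two more times, i.e. three occurrences in total. The second pattern, with a single repetition, is included as an assumption about what "two or more" in the title asks for.

```javascript
const threeOrMore = /(.)(?:.*\1){2,}/; // same char at least 3 times, anywhere
const twoOrMore = /(.).*\1/;           // same char at least 2 times, anywhere

const inputs = ['sss', 'ss', 'sdsdsd', 'abc'];
const repeatResults = inputs.map((s) => [s, threeOrMore.test(s), twoOrMore.test(s)]);

console.log(repeatResults);
// [ [ 'sss', true, true ],
//   [ 'ss', false, true ],
//   [ 'sdsdsd', true, true ],
//   [ 'abc', false, false ] ]
```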
Try something like:
var str="sdsdsd";
var hasDuplicates = (/([a-zA-Z]).*?\1/).test(str)
alert("repeating string "+hasDuplicates);
var text = 'abcdeaf';
if (text.match(/(.).*\1/)) {} else {}

Why does string.replace(/\W*/g,'_') prepend all characters?

I've been learning regexp in JS and encountered a situation that I didn't understand.
I ran a test of the replace function with the following regexp:
/\W*/g
I expected it to prepend a replacement at the beginning of the string and then replace all non-word characters, so that
The Number is (123)(234)
would become:
_The_Number_is__123___234_
This would be prepending the string because it has at least zero instances, and then replacing all non-breaking spaces and non-word characters.
Instead, it prepended every character and replaced all non-word characters.
_T_h_e__N_u_m_b_e_r__i_s__1_2_3__2_3_4__
Why did it do this?
The problem is the meaning of \W*. It means "0 or more non-word characters". This means that the empty string "" would match, given that it is indeed 0 non-word characters.
So the regex matches before every character in the string and at the end, hence why all the replacements are done.
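To see where those matches land, here is a minimal sketch that lists every match of /\W*/g with its start index. The short sample string is an assumption chosen to keep the output readable; matchAll requires an ES2020 runtime.

```javascript
const sample = 'Hi (1)';

// Collect every match of /\W*/g together with its start index.
const found = Array.from(sample.matchAll(/\W*/g), (m) => [m.index, m[0]]);
console.log(found);
// [ [ 0, '' ], [ 1, '' ], [ 2, ' (' ], [ 4, '' ], [ 5, ')' ], [ 6, '' ] ]

// Each of the six matches, empty or not, is replaced by one "_".
const replaced = sample.replace(/\W*/g, '_');
console.log(replaced); // _H_i__1__
```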
You want either /\W/g (replacing each individual non-word character) or /\W+/g (replacing each set of consecutive non-word characters).
"The Number is (123)(234)".replace(/\W/g, '_') // "The_Number_is__123__234_"
"The Number is (123)(234)".replace(/\W+/g, '_') // "The_Number_is_123_234_"
TL;DR
Never use a pattern that can match an empty string in a regex replace method if your aim is to replace and not insert text
To replace all separate occurrences of a non-word char in a string, use .replace(/\W/g, '_') (that is, remove * quantifier that matches zero or more occurrences of the quantified subpattern)
To replace all chunks of non-word chars in a string with a single pattern, use .replace(/\W+/g, '_') (that is, replace * quantifier with + that matches one or more occurrences of the quantified subpattern)
Note: the solution below is tailored to the OP's much more specific requirements.
A string is parsed by the JS regex engine as a sequence of chars and locations in between them. See the following diagram where I marked locations with hyphens:
-T-h-e- -N-u-m-b-e-r- -i-s- -(-1-2-3-)-(-2-3-4-)-
|||                                             |
||Location between T and h, etc. .............  |
|1st symbol                                     |
start                                      -> end
All these positions can be analyzed and matched with a regex.
Since /\W*/g is a regex matching all non-overlapping occurrences (due to g modifier) of 0 and more (due to * quantifier) non-word chars, all the positions before word chars are matched. Between T and h, there is a location tested with the regex, and as there is no non-word char (h is a word char), the empty match is returned (as \W* can match an empty string).
So, you need to replace the start of the string and each non-word char with a _. A naive approach is to use .replace(/\W|^/g, '_'). However, there is a caveat: if a string starts with a non-word character, that character is replaced and no extra _ is prepended at the start of the string:
console.log("Hi there.".replace(/\W|^/g, '_')); // _Hi_there_
console.log(" Hi there.".replace(/\W|^/g, '_')); // _Hi_there_
Note that here, \W comes first in the alternation and "wins" when matching at the beginning of the string: the space is matched and then no start position is found at the next match iteration.
You may now think you can match with /^|\W/g. Look here:
console.log("Hi there.".replace(/^|\W/g, '_')); // _Hi_there_
console.log(" Hi there.".replace(/^|\W/g, '_')); // _ Hi_there_
The _ Hi_there_ second result shows how JS regex engine handles zero-width matches during a replace operation: once a zero-width match (here, it is the position at the start of the string) is found, the replacement occurs, and the RegExp.lastIndex property is incremented, thus proceeding to the position after the first character! That is why the first space is preserved, and no longer matched with \W.
A solution is to use a consuming pattern that will not allow zero-width matches:
console.log("Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; }));
console.log(" Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; }));
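For comparison, the same result can be sketched without relying on a zero-width match at all: concatenate the leading _ and then replace each non-word char. The function names withCallback and withPrefix are made up for this sketch; withCallback simply wraps the pattern shown above.

```javascript
// The callback-based pattern from above, wrapped in a function.
function withCallback(s) {
  return s.replace(/^(\W?)|\W/g, function ($0, $1) { return $1 ? '__' : '_'; });
}

// Alternative sketch: add the leading "_" by concatenation,
// then replace each individual non-word char.
function withPrefix(s) {
  return '_' + s.replace(/\W/g, '_');
}

const cases = ['Hi there.', ' Hi there.', '', 'The Number is (123)(234)'];
const outputs = cases.map((s) => [withCallback(s), withPrefix(s)]);

console.log(outputs);
// [ [ '_Hi_there_', '_Hi_there_' ],
//   [ '__Hi_there_', '__Hi_there_' ],
//   [ '_', '_' ],
//   [ '_The_Number_is__123__234_', '_The_Number_is__123__234_' ] ]
```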
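You can use RegExp /(^\W*){1}|\W(?!=\w)/g to match any run of \W at the beginning of the string (including an empty one) or a single \W elsewhere. Note that (?!=\w) is a negative lookahead for a literal = followed by a word character, so in practice it lets almost every \W through; it is not the same as "not followed by \w", which would be (?!\w).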
var str = "The Number is (123)(234)";
var res = str.replace(/(^\W*){1}|\W(?!=\w)/g, "_");
console.log(res);
You should have used /\W+/g instead.
"*" means zero or more occurrences of the preceding token, so \W* on its own also matches the empty string.
It's because you're using the * operator. It matches zero or more characters, so the pattern also matches the empty position between every pair of characters. If you replace the expression with /\W+/g, it works as you expected.
This should work for you
Find: (?=.)(?:^\W|\W$|\W|^|(.)$)
Replace: $1_
Cases explained:
(?= . )          # Must be at least 1 char
(?:              # Ordered cases:
     ^ \W        # BOS + non-word (consumes BOS)
  |  \W $        # Non-word + EOS (consumes EOS)
  |  \W          # Non-word
  |  ^           # BOS
  |  ( . )       # (1), any char + EOS
     $
)
Note this could have been done without the lookahead via
(?:^\W|\W$|\W|^$)
But, this will insert a single _ on an empty string.
So, it ends up being more elaborate.
All in all though, it's a simple replacement.
Unlike Stribnez's solution, no callback logic is required on the replace side.

RegEx pattern test is failing in javascript

I have the RegEx below to validate a string:
var str = "Thebestthingsinlifearefree";
var patt = /[^0-9A-Za-z !\\#$%&()*+,\-.\/:;<=>?#\[\]^_`{|}~]*/g;
var res = patt.test(str);
The result is always true, but I expected false, because I am checking for any character that is not in the patt character class.
The given string is valid: it contains only upper- and lower-case letters. I am not sure what is wrong with the pattern.
Here's your code:
var str = "Thebestthingsinlifearefree";
var patt = /[^0-9A-Za-z !\\#$%&()*+,\-.\/:;<=>?#\[\]^_`{|}~]*/g;
console.log(patt.test(str));
The regex
/[^0-9A-Za-z !\\#$%&()*+,\-.\/:;<=>?#\[\]^_`{|}~]*/g
will match anything since it accepts match of length 0 due to the quantifier *.
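A minimal sketch of that zero-length match. The character class is shortened to [^0-9A-Za-z] for readability, which is an assumption of this example; the behaviour is the same with the full class from the question.

```javascript
const text = 'Thebestthingsinlifearefree';

// Shortened negated class; only the quantifier differs between the two.
const zeroOrMore = /[^0-9A-Za-z]*/;
const oneOrMore = /[^0-9A-Za-z]+/;

// With *, the engine succeeds immediately with an empty match at index 0.
const emptyMatch = zeroOrMore.exec(text);
console.log(emptyMatch[0].length, emptyMatch.index); // 0 0
console.log(zeroOrMore.test(text)); // true

// With +, at least one disallowed char is required, so there is no match.
console.log(oneOrMore.test(text)); // false
```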
Just add anchors:
var str = "Thebestthingsinlifearefree";
var patt = /^[^0-9A-Za-z !\\#$%&()*+,\-.\/:;<=>?#\[\]^_`{|}~]*$/;
console.log(patt.test(str));
Here's an explanation of your regex:
[^0-9A-Za-z !\\#$%&()*+,\-.\/:;<=>?#\[\]^_`{|}~]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
! a single character in the list ! literally
\\ matches the character \ literally
#$%&()*+, a single character in the list #$%&()*+, literally (case sensitive)
\- matches the character - literally
. the literal character .
\/ matches the character / literally
:;<=>?# a single character in the list :;<=>?# literally (case sensitive)
\[ matches the character [ literally
\] matches the character ] literally
^_`{|}~ a single character in the list ^_`{|}~ literally
Note that:
A search for a missing pattern is better expressed by a negative condition in code (!patt.test...).
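Inside a character class, characters like ., (, ), and ? are already literal, so prefixing them with a backslash (\) is optional; only \, ], ^ (at the start) and - (between two characters) need escaping there.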
var str = "Thebestthingsinlifearefree";
var patt = /[0-9A-Za-z !\\#$%&\(\)*+,\-\.\/:;<=>\?#\[\]^_`\{|\}~]/;
var res = !patt.test(str);
console.log(res);
This will print false, as expected.
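Putting the two ideas together, a validation helper could be sketched as follows. The character class is copied from the question; the names isValid and firstInvalidChar are hypothetical and only serve this illustration.

```javascript
// Allowed characters copied from the question's class, without the leading ^ negation.
const onlyAllowed = /^[0-9A-Za-z !\\#$%&()*+,\-.\/:;<=>?#\[\]^_`{|}~]*$/;
// Negated class without a quantifier: finds the first disallowed character.
const disallowed = /[^0-9A-Za-z !\\#$%&()*+,\-.\/:;<=>?#\[\]^_`{|}~]/;

// True when every character of s is in the allowed set (an empty string passes).
function isValid(s) {
  return onlyAllowed.test(s);
}

// Returns the first disallowed character, or null when there is none.
function firstInvalidChar(s) {
  const m = disallowed.exec(s);
  return m ? m[0] : null;
}

console.log(isValid('Thebestthingsinlifearefree')); // true
console.log(isValid("it's"), firstInvalidChar("it's")); // false '
```

Neither regex uses the g flag, so repeated test() or exec() calls do not carry lastIndex state between inputs.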

How do I make my limit match greedy?

The following code only matches MN. How do I get it to match KDMN?
var str = ' New York Stock Exchange (NYSE) under the symbol "KDMN."';
var patt = new RegExp("symbol.+([A-Z]{2,5})");
var res = patt.exec(str);
console.log(res[1]);
You may use a lazy +? quantifier:
/symbol.+?([A-Z]{2,5})/
         ^
If you keep the greedy .+, it matches as many characters as possible and then backtracks only far enough to leave the minimum 2 chars for the next subpattern, which is why only MN is captured.
Or, I'd rather make this a bit more verbose:
/symbol\s+"([A-Z]{2,5})/
Here, symbol matches the literal string symbol, \s+ matches 1 or more whitespace characters, " matches a double quote, and ([A-Z]{2,5}) captures 2 to 5 uppercase ASCII letters into Group 1.
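The three patterns can be compared side by side in a short sketch using the string from the question. The variable names are chosen for this illustration only.

```javascript
const sentence = ' New York Stock Exchange (NYSE) under the symbol "KDMN."';

// Greedy .+ grabs as much as it can, leaving only two letters for the group.
const greedy = /symbol.+([A-Z]{2,5})/.exec(sentence)[1];

// Lazy .+? grabs as little as it can, so the group gets the whole ticker.
const lazy = /symbol.+?([A-Z]{2,5})/.exec(sentence)[1];

// Explicit context: whitespace, then a double quote, then the ticker.
const explicit = /symbol\s+"([A-Z]{2,5})/.exec(sentence)[1];

console.log(greedy, lazy, explicit); // MN KDMN KDMN
```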
