Regular Expressions: Capture multiple groups using quantifier - javascript

Consider the following code:
<!DOCTYPE html>
<html>
<body>
<script type="text/javascript">
var str = '<12> rnbqkb-r Rnbq-b-r ';
var pat1 = new RegExp('^\\<12\\> ([rnbqkpRNBQKP-]{8}) ([rnbqkpRNBQKP-]{8})');
var pat2 = new RegExp('^\\<12\\> ([rnbqkp RNBQKP-]{8}){2}');
var pat3 = new RegExp('^\\<12\\> ([rnbqkp RNBQKP-]{8}){2}?');
document.write(str.match(pat1));
document.write('<br />');
document.write(str.match(pat2));
document.write('<br />');
document.write(str.match(pat3));
</script>
</body>
</html>
which produces
<12> rnbqkb-r Rnbq-b-r,rnbqkb-r,Rnbq-b-r
<12> rnbqkb-r Rnbq-b-, Rnbq-b-
<12> rnbqkb-r Rnbq-b-, Rnbq-b-
as output.
Why does neither pattern pat2 nor pat3 capture the first group rnbqkb-r? I would like to capture all groups without having to repeat them explicitly as in pattern pat1.

Why does neither pattern pat2 nor pat3 capture the first group rnbqkb-r?
Because you have white-space at the end of each 8-character sequence that your regexes pat2 and pat3 do not allow.
I would like to capture all groups without having to repeat them explicitly as in pattern pat1.
You can't.
It is not possible (in JavaScript) to capture two groups when your regex only contains one group.
Groups are defined through parentheses. Your match result will contain as many groups as there are pairs of parentheses in your regex (except modified parentheses like (?:...) which will not count towards match groups). Want two separate group matches in your match result? Define two separate groups in your regex.
If a group can match multiple times, the group's value will be whatever it matched last. All previous match occurrences for that group will be overridden by its last match.
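To see the overwriting behaviour described above, here is a minimal sketch using the string from the question. The variable names are made up for illustration, and the pattern is a variant of pat2 that consumes the separating space outside the capturing group.

```javascript
// Illustrative sketch: a quantified capturing group reports only its last iteration.
const input = '<12> rnbqkb-r Rnbq-b-r ';
const repeated = /^<12> (?:([rnbqkp-]{8}) ){2}/i;
const result = input.match(repeated);

console.log(result.length); // 2: the whole match plus one group
console.log(result[1]);     // "Rnbq-b-r" (the first rank was overwritten)
```

Both ranks are matched, but only one slot exists for the group, so the earlier value is gone by the time the match completes.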
Try
var pat1 = /^<12> ((?:[rnbqkp-]{8} ?)*)/i,
match = str.match(pat1);
if (match) {
match[1].split(/\s+/); // ["rnbqkb-r", "Rnbq-b-r", ""]
}
Notes:
Trim str beforehand if you don't want the last empty array value.
In general, prefer regex literal notation (/expression/). Use new RegExp() only for expressions you generate from dynamic values.
< and > are not special, you don't need to escape them.

Count again (8 vs 9). pat2 and pat3 are missing the space in between the two parts.
Update: Additionally, I don't think what you are trying to achieve is possible using match. See How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()? and use exec.
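A rough sketch of the exec approach, assuming it is acceptable to scan for every 8-character rank rather than to validate the <12> prefix in the same pattern. The names line, rankPattern and ranks are hypothetical.

```javascript
// Sketch: collect every 8-character rank with a global regex and exec().
const line = '<12> rnbqkb-r Rnbq-b-r ';
const rankPattern = /[rnbqkp-]{8}/gi; // the g flag makes exec() resume after the previous match
const ranks = [];
let m;
while ((m = rankPattern.exec(line)) !== null) {
  ranks.push(m[0]);
}

console.log(ranks); // ["rnbqkb-r", "Rnbq-b-r"]
```

If the prefix must be checked as well, one option is to test it separately (for example with /^<12> /) before running the loop.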

Related

Regular Expression Finding a pattern that contains a variable and matches same variable that may contain hyphen,comma,apostrophe, and white space

I have a variable that contains a compound word (ex. cocacola, pancakes). I'm having trouble finding a regex pattern which will use that particular variable in the pattern itself and will find a match in something like coca-cola, or pan cakes.
I was thinking [variableName,+-]+, but that will find matches with each letter/character in the compound word of the variable, since it is enclosed in character set.
function regex_from(word) {
return new RegExp(word.split("").map(x => x.replace(/\W/g, '\\$&')).join("[\\-,'\\s]?"));
}
const regex = regex_from('cocacola');
console.log(regex.test('coca-cola'));
console.log(regex.test('coc aco la'));
console.log(regex.test('cocoalco'));
Idea: Split input word into single characters (and escape them for use in a regex if necessary), join them back into a single string using a separator of [\-,'\s]?, and treat the whole thing as a regex.
For example, abc becomes /a[\-,'\s]?b[\-,'\s]?c/, which matches e.g. abc, a-b-c, a b'c, ab,c, etc.
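The same idea can be sketched with explicit escaping of regex metacharacters, in case the word contains something like a dot. The function name flexibleWordPattern and the sample inputs are assumptions made for this illustration only.

```javascript
// Sketch: build a pattern that tolerates one optional separator between letters.
// flexibleWordPattern is a hypothetical helper name, not part of any library.
function flexibleWordPattern(word) {
  const separator = "[-,'\\s]?"; // optional hyphen, comma, apostrophe or whitespace
  const escaped = Array.from(word, ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(escaped.join(separator));
}

const abcPattern = flexibleWordPattern('abc');
console.log(abcPattern.test("a-b c")); // true
console.log(abcPattern.test('ab,c'));  // true
console.log(abcPattern.test('axbc'));  // false

const dotted = flexibleWordPattern('a.c');
console.log(dotted.test('abc')); // false, the dot is matched literally
```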

or with regular expressions in javascript

let's say I have this code:
var pattern = /^\d{1}-\d{3}-\d{3}-\d{4}$/;
//var res = pattern.replace("-", ".");
How can I make the dash optional, so the user is not forced to always use a dash when entering a number? I want to let them enter either a dash or a dot...
You may add a ? quantifier after them to make the dashes optional:
var pattern = /^\d[-.]?\d{3}[-.]?\d{3}[-.]?\d{4}$/;
Here, [-.]? matches 1 or 0 occurrences of a - or . character.
See the regex demo.
If you want to require a consistent way of entering data, use grouping and backreferences:
/^\d([-.]?)\d{3}\1\d{3}\1\d{4}$/;
See the same demo fiddle with this pattern.
Here, ([-.]?) forms a capturing group with ID 1, and \1 is a backreference to the value stored in the Group 1 memory buffer. So, if the first group matched ., all \1 will be only matching ..
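A short sketch comparing the two patterns from this answer on a few sample numbers. The variable names and the sample inputs are made up for illustration.

```javascript
// Sketch: optional separators versus separators that must be consistent.
const loose = /^\d[-.]?\d{3}[-.]?\d{3}[-.]?\d{4}$/;
const consistent = /^\d([-.]?)\d{3}\1\d{3}\1\d{4}$/;

console.log(loose.test('1-800.555-1234'));      // true: each separator is checked on its own
console.log(consistent.test('1-800.555-1234')); // false: \1 must repeat the first separator
console.log(consistent.test('1.800.555.1234')); // true
console.log(consistent.test('18005551234'));    // true: group 1 is empty, so \1 matches nothing
```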

Replace multiple spaces, multiple occurrences of a comma

I am trying to clean an input field client side.
Current Value
string = 'word, another word,word,,,,,, another word, ,,';
Desired Value after cleaning
string = 'word,another word,word,another word';
Simplified version of what I have tried http://jsfiddle.net/zg2e7/362/
You can use
var str = 'word,word,word,,,,,new word, , another word';
document.body.innerHTML = str.replace(/(?:\s*,+)+\s*/g, ',');
You need to use g modifier to find and replace all instances
You need to also match optional whitespace between commas and on both sides of them.
Regex explanation:
(?:\s*,+)+ - 1 or more sequences of
\s* - 0 or more whitespace characters
,+ - 1 or more commas.
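Applied to the string from the question, this pattern leaves a single trailing comma behind, so a second replace is one way to strip it. The following is only a sketch; cleanCommaList is a hypothetical function name.

```javascript
// Sketch: collapse comma/whitespace runs, then drop a leading or trailing comma.
function cleanCommaList(value) {
  return value
    .replace(/(?:\s*,+)+\s*/g, ',') // any run of commas and surrounding whitespace becomes one comma
    .replace(/^,|,$/g, '');         // remove a comma left at either end
}

const cleanedByRegex = cleanCommaList('word, another word,word,,,,,, another word, ,,');
console.log(cleanedByRegex); // "word,another word,word,another word"
```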
string = 'word, another word,word,,,,,, another word, ,,';
console.log(string.replace(/(,)[,\s]+|(\s)\s+/g ,'$1').replace(/^,|,$/g,''));
Try using split and trim and map and join rather than regex being that regex can be a bit clunky.
$.map(str.split(','),function(item,i){
if(item.trim()){
return item.trim()
}
}).join(',')
So split the string by the , and then use the map function to combine them. If the item has a value after being trimmed, then keep the value. Then, after it has been mapped to an array of the valid values, join them with a comma.
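The same steps can be sketched without jQuery, using the built-in array methods. The variable names here are assumptions for the example.

```javascript
// Sketch: split on commas, trim each piece, drop empty pieces, join again.
const raw = 'word, another word,word,,,,,, another word, ,,';
const cleaned = raw
  .split(',')
  .map(item => item.trim())
  .filter(item => item !== '') // keep only pieces that still have content
  .join(',');

console.log(cleaned); // "word,another word,word,another word"
```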

match a string not after another string

This
var re = /[^<a]b/;
var str = "<a>b";
console.log(str.match(re)[0]);
matches >b.
However, I don't understand why this pattern /[^<a>]b/ doesn't match anything. I want to capture only the "b".
The reason why /[^<a>]b/ doesn't do anything is that you are ignoring <, a, and > as individual characters, so rewriting it as /[^><a]b/ would do the same thing. I doubt this is what you want, though. Try the following:
var re = /<a>(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);
This regex looks for a string that looks like <a>b first, but it captures the b with the parentheses. To access the b, simply use [1] when you call .match instead of [0], which would return the entire string (<a>b).
What you're using here matches a b preceded by any character that is not listed in the character class. In the syntax [^a-z+-], a-z+- is a set of characters (in this case, the range of the lowercase Latin letters, a plus sign and a minus sign). So, what your regex pattern matches is any b preceded by a character that is NOT < or a. Since > is not in that set, it matches it.
The range selector basically works the same as a list of characters that are separated by OR pipes: [abcd] matches the same as (a|b|c|d). Range selectors just have the extra functionality of also matching that same set via [a-d], using a dash in between the first and last character of a range. Putting a ^ at the start of a range automatically turns this positive range selector into a negative one, so it will match anything BUT the characters in that range.
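A small sketch of the behaviour described here, using the string from the question. The variable names are chosen for illustration.

```javascript
// Sketch: what a negated character class does and does not exclude.
const tagged = '<a>b';
const excludesTwo = /[^<a]b/;    // one character that is neither "<" nor "a", then "b"
const excludesThree = /[^<a>]b/; // additionally excludes ">"

console.log(tagged.match(excludesTwo)[0]); // ">b"
console.log(tagged.match(excludesThree));  // null, the only "b" is preceded by ">"

// A class behaves like an alternation of single characters.
console.log(/^[abcd]$/.test('c'), /^(a|b|c|d)$/.test('c'), /^[a-d]$/.test('c')); // true true true
```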
What you are looking for is a negative lookahead. Those can exclude something from matching longer strings. Those work in this format: (?!do not match) where do not match uses the normal regex syntax. In this case, you want to test if the preceding string does not match <a>, so just use:
(?!<a>)(.{3}|^.{0,2})b
That will match the b when it is either preceded by three characters that are not <a>, or by fewer characters that are at the start of the line.
PS: what you are probably looking for is the "negative lookbehind", which was not available in JavaScript regular expressions when this answer was written (it was added to the language in ES2018). The way that would work is (?<!<a>)b. In a JavaScript engine without lookbehind support, you'll have to use this alternative regex.
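For reference, a sketch of how lookbehind behaves, assuming an engine that implements the ES2018 lookbehind syntax (current browsers and Node.js do; older engines throw a syntax error). The variable names are made up.

```javascript
// Sketch: lookbehind assertions, assuming ES2018 support in the engine.
const notAfterTag = /(?<!<a>)b/; // "b" that is NOT directly preceded by "<a>"
const onlyAfterTag = /(?<=<a>)b/; // "b" that IS directly preceded by "<a>"

console.log(notAfterTag.exec('<a>b'));       // null
console.log(notAfterTag.exec('xyzb').index); // 3
console.log(onlyAfterTag.exec('<a>b')[0]);   // "b", the tag itself is not part of the match
```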
you could write a pattern to match anchor tag and then replace it with empty string
var str = "<a>b</a>";
str = str.replace(/((<a[\w\s=\[\]\'\"\-]*>)|<\/a>)/gi,'');
this will replace the following strings with 'b'
<a>b</a>
<a class='link-l3'>b</a>
to better get familiar with regEx patterns you may find this website very useful regExPal
Your code :
var re = /[^<a>]b/;
var str = "<a>b";
console.log(str.match(re));
Why is [^<a>]b not matching anything?
The meaning of [^<a>]b is: any character except <, a or >, followed by b.
Here b is preceded by >, so it will not match.
If you want to match b , then you need to give like this :
var re = /(?:[\<a\>])(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);

Regex non capturing groups in javascript

I'm a bit rusty on my regex and javascript. I have the following string var:
var subject = "javascript:loadNewsItemWithIndex(5, null);";
I want to extract 5 using a regex. This is my regex:
/(?:loadNewsItemWithIndex\()[0-9]+/
Applied like so:
subject.match(/(?:loadNewsItemWithIndex\()[0-9]+/)
The result is:
loadNewsItemWithIndex(5
What is cleanest, most readable way to extract 5 as a one-liner? Is it possible to do this by excluding loadNewsItemWithIndex( from the match rather than matching 5 as a sub group?
The return value from String.match is an array of matches, so you can put parentheses around the number part and just retrieve that particular match index (where the first match is the entire matched result, and subsequent entries are for each capture group):
var subject = "javascript:loadNewsItemWithIndex(5, null);";
var result = subject.match(/loadNewsItemWithIndex\(([0-9]+)/);
// parentheses added around [0-9]+
document.writeln(result[1]);
// retrieve the second entry (index 1 in 0-based indexing)
Sample code: http://jsfiddle.net/LT62w/
Edit: Thanks @Alan for the correction on how non-capturing matches work.
Actually, it's working perfectly. Text that's matched inside a
non-capturing group is still consumed, the same as text that's matched
outside of any group. A capturing group is like a non-capturing group
with extra functionality: in addition to grouping, it allows you to
extract whatever it matches independently of the overall match.
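The difference can be sketched as follows. The lookbehind variant at the end is an addition that assumes an engine with ES2018 lookbehind support; it is the closest thing to excluding the prefix from the match itself. Variable names are chosen for illustration.

```javascript
// Sketch: non-capturing groups still consume text; capturing groups expose it separately.
const subject = 'javascript:loadNewsItemWithIndex(5, null);';

const nonCapturing = subject.match(/(?:loadNewsItemWithIndex\()[0-9]+/);
console.log(nonCapturing[0]);     // "loadNewsItemWithIndex(5"
console.log(nonCapturing.length); // 1, there is no group to read

const capturing = subject.match(/loadNewsItemWithIndex\(([0-9]+)/);
console.log(capturing[1]); // "5"

// Assumes lookbehind support: the prefix is asserted but not included in the match.
const viaLookbehind = subject.match(/(?<=loadNewsItemWithIndex\()[0-9]+/);
console.log(viaLookbehind[0]); // "5"
```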
I believe the following regex should work for you:
loadNewsItemWithIndex\(([0-9]+).*$
var test = new RegExp(/loadNewsItemWithIndex\(([0-9]+).*$/);
test.exec('var subject = "javascript:loadNewsItemWithIndex(5, null);";');
The break down of this is
loadNewsItemWithIndex = exactly that
\( = open parentheses
([0-9]+) = Capture the number
.* = Anything after that number
$ = end of the line
This should suffice:
<script>
var subject = "javascript:loadNewsItemWithIndex(5, null);";
var number = subject.match(/[0-9]+/);
alert(number);
</script>
