Named Groups in Regexp transformed on new `RegExp` construction from string - javascript

I am trying to build a Regexp from a series of smaller Regexes in either string or primitive form.
I'm using Node v10.15.0.
Here are my 3 components individually
Month Matcher: /\b(?<month>\bjan(?:uary)?\b|\bfeb(?:ruary)?\b|\bmar(?:ch)?\b|\bapr(?:il)?\b|\bmay\b|\bjun(?:e)?\b|\bjul(?:y)?\b|\baug(?:ust)?\b|\bsep(?:tember)?\b|\boct(?:ober)?\b|\bnov(?:ember)?\b|\bdec(?:ember)?\b)/i
Day Matcher: /(?<day>\d{1,2})/i
Year Matcher: /(?<year>20\d\d)/i
I am trying to create a Regexp from each of these which would look something like this:
new RegExp(/\b(?<month>\bjan(?:uary)?\b|\bfeb(?:ruary)?\b|\bmar(?:ch)?\b|\bapr(?:il)?\b|\bmay\b|\bjun(?:e)?\b|\bjul(?:y)?\b|\baug(?:ust)?\b|\bsep(?:tember)?\b|\boct(?:ober)?\b|\bnov(?:ember)?\b|\bdec(?:ember)?\b) (?<day>\d{1,2}), (?<year>20\d\d)/i);
This would match 'Apr 14, 2018', 'Jun 25, 2019' etc etc.
I've made a number of attempts constructing with:
new RegExp(/my-pattern/i)
new RegExp('my-pattern' + 'my-other-pattern', 'i')
new RegExp(new RegExp('my-pattern', 'i') + new RegExp('other-pattern', 'i')) (this one feels most silly).
One strange effect I noticed was that when I tried to build the pattern string via addition, the constructor would clip the output - see how the 'month' named group is altered below:
var z = new RegExp('\b(?<month>\bjan(?:uary)?\b|\bfeb(?:ruary)?\b|\bmar(?:ch)?\b|\bapr(?:il)?\b|\bmay\b|\bjun(?:e)?\b|\bjul(?:y)?\b|\baug(?:ust)?\b|\bsep(?:tember)?\b|\boct(?:ober)?\b|\bnov(?:ember)?\b|\bdec(?:ember)?\b)' + '(?<day>\d{1,2})', 'i');
undefined
>>> (?<monthjan(?:uary)feb(?:ruary)mar(?:ch)apr(?:il)majun(?:e)jul(?:y)aug(?:ust)sep(?:tember)oct(?:ober)nov(?:ember)dec(?:ember))(?<day>d{1,2})/i
Can anyone advise on the best approach for this? Otherwise I'm likely to declare the months/days/years matchers over and over again in very verbose patterns.
Thanks

This expression might help you to match your desired date strings.
((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}
You can simplify it and reduce the boundaries if you wish.
JavaScript Test
const regex = /((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}/gm;
const str = `Apr 14, 2018`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Basic Performance Test
This JavaScript snippet reports the runtime of a for loop that runs the replacement 1 million times.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i >= 0; i--) {
const string = 'Apr 14, 2018';
const regex = /(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})/gm;
var match = string.replace(regex, "Group #1: $1");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
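On the clipped output shown in the question: inside an ordinary string literal, `\b` is the backspace character and `\d` is just `d`, so the backslashes never reach the `RegExp` constructor. A minimal sketch of one way around this, assuming the three matchers are kept as regex literals and joined through their `source` property. The names `monthRe`, `dayRe`, `yearRe`, and `dateRe` are made up for illustration, and the month alternation is simplified slightly.

```javascript
// Keep each component as a regex literal so no string escaping is needed.
const monthRe = /\b(?<month>jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)\b/;
const dayRe = /(?<day>\d{1,2})/;
const yearRe = /(?<year>20\d\d)/;

// .source returns the pattern text with its backslashes intact,
// so the named groups survive the concatenation.
const dateRe = new RegExp(`${monthRe.source} ${dayRe.source}, ${yearRe.source}`, 'i');

const m = dateRe.exec('Apr 14, 2018');
console.log(m.groups.month, m.groups.day, m.groups.year); // Apr 14 2018
```

If the components have to stay as strings, doubling every backslash (`'\\b'`, `'\\d'`) should have the same effect.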

Related

Regex optimization and best practice

I need to parse information out from a legacy interface. We do not have the ability to update the legacy message. I'm not very proficient at regular expressions, but I managed to write one that does what I want it to do. I just need peer-review and feedback to make sure it's clean.
The message from the legacy system returns values resembling the example below.
%name0=value
%name1=value
%name2=value
Expression: /\%(.*)\=(.*)/g;
var strBody = body_text.toString();
var myRegexp = /\%(.*)\=(.*)/g;
var match = myRegexp.exec(strBody);
var objPair = {};
while (match != null) {
if (match[1]) {
objPair[match[1].toLowerCase()] = match[2];
}
match = myRegexp.exec(strBody);
}
This code works, and I can add partial matches in the middle of the name/values without anything breaking. I have to assume that any combination of characters could appear in the "values" match, meaning it could have equal and percent signs within the message.
Is this clean enough?
Is there something that could break the expression?
First of all, don't escape characters that don't need escaping: %(.*)=(.*)
The problem with your expression: an equals sign in the value would break your parser. %name0=val=ue would be parsed as the name name0=val and the value ue, instead of the name name0 and the value val=ue.
One possible fix is to make the first repetition lazy by appending a question mark: %(.*?)=(.*)
But this is not optimal due to unneeded backtracking. You can do better by using a negated character class: %([^=]*)=(.*)
And finally, if empty names should not be allowed, replace the first asterisk with a plus: %([^=]+)=(.*)
This is a good resource: Regex Tutorial - Repetition with Star and Plus
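The difference between the greedy original and the negated character class can be checked with a small sketch. The sample line is the one from the answer above; the variable names are illustrative.

```javascript
const line = '%name0=val=ue';

// Greedy first group: the name runs up to the last "=" on the line.
const greedy = /%(.*)=(.*)/.exec(line);
console.log(greedy[1], '|', greedy[2]); // name0=val | ue

// Negated class: the name stops at the first "=".
const strict = /%([^=]+)=(.*)/.exec(line);
console.log(strict[1], '|', strict[2]); // name0 | val=ue
```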
Your expression is fine; wrapping the name and the value in two capturing groups is a simple way to get the variables and values you want.
You probably do not need to escape some of the characters, and it would still work.
You can use this tool and test/edit/modify/change your expressions if you wish:
%(.+)=(.+)
Since your data is pretty structured, you can also do so with string split and get the same desired outputs, if you want.
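A minimal sketch of the split-based alternative, assuming one pair per line and that everything after the first `=` belongs to the value. `parsePairs` is a hypothetical helper name, not something from the question.

```javascript
function parsePairs(text) {
  const pairs = {};
  for (const line of text.split(/\r?\n/)) {
    if (!line.startsWith('%')) continue; // skip lines that are not pairs
    const eq = line.indexOf('=');
    if (eq <= 1) continue;               // no "=" at all, or an empty name
    // Name is lower-cased as in the question; the value is kept verbatim.
    pairs[line.slice(1, eq).toLowerCase()] = line.slice(eq + 1);
  }
  return pairs;
}

console.log(parsePairs('%name0=value\n%Name1=a=b%c'));
// { name0: 'value', name1: 'a=b%c' }
```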
JavaScript Test
const regex = /%(.+)=(.+)/gm;
const str = `%name0=value
%name1=value
%name2=value`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Performance Test
This JavaScript snippet shows the performance of that expression using a simple 1-million times for loop.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i >= 0; i--) {
const string = '%name0=value';
const regex = /(%(.+)=(.+))/gm;
var match = string.replace(regex, "\nGroup #1: $1 \n Group #2: $2 \n Group #3: $3 \n");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

How to search strings with brackets using Regular expressions

I have a case wherein I want to search for all Hello (World) in an array. Hello (World) is coming from a variable and can change. I want to achieve this using RegExp and not indexOf or includes methods.
testArray = ['Hello (World', 'Hello (World)', 'hello (worlD)']
My match should return index 1 & 2 as answers.
Use the RegExp constructor after escaping the string (algorithm from this answer), and use some array methods:
const testArray = ['Hello (World', 'Hello (World)', 'hello (worlD)'];
const string = "Hello (World)".replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
const regex = new RegExp(string, "i");
const indexes = testArray.map((e, i) => e.match(regex) == null ? null : i).filter(e => e != null);
console.log(indexes);
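Since the search text comes from a variable, the same idea can be wrapped in a small function. This is only a sketch: `escapeRegExp` and `findIndexes` are hypothetical names, and the character class is the one used in the answer above.

```javascript
// Escape regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Return the indexes of all items that contain `needle`, ignoring case.
function findIndexes(items, needle) {
  const regex = new RegExp(escapeRegExp(needle), 'i'); // no g flag, so test() is stateless
  return items.reduce((found, item, i) => (regex.test(item) ? [...found, i] : found), []);
}

const sampleArray = ['Hello (World', 'Hello (World)', 'hello (worlD)'];
console.log(findIndexes(sampleArray, 'Hello (World)')); // [ 1, 2 ]
```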
This expression might help you to do so:
(\w+)\s\((\w+)
You may not need to bound it from left and right, since your input strings are well structured. You might just focus on your desired capturing groups; I have assumed that each one is a single word, which you can easily change.
With a simple string replace you can match and capture both of them.
Performance Test
This JavaScript snippet shows the performance of that expression using a simple 1-million times for loop.
repeat = 1000000;
start = Date.now();
for (var i = repeat; i >= 0; i--) {
var string = "Hello (World";
var regex = /(\w+)\s\((\w+)/g;
var match = string.replace(regex, "$1 & $2");
}
end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
const testArray = ['Hello (World', 'Hello (World)', 'hello (worlD)'];
const indexes = [];
testArray.forEach((word, i) => {
if (/\(.*\)/.test(word)) {
indexes.push(i);
}
});
console.log(indexes);

RegEx for matching the first word

I have the following prop {priority} that outputs ‘high priority’, is there a way I can render it simply as ‘high’? could I use standard js or something like the below?
var getPriority = {priority};
var priority = getPriority.replace( regex );
console.log( priority );
How do I solve this problem?
If you wish to do that with a regular expression, this expression would do so, even if there might be a misspelling in the word "priority":
(.+)(\s[priorty]+)
It can simply use capturing groups for capturing your desired word before "priority". If you wish to add any boundaries to it, it would be much easier to do so, especially if your input string would change.
const regex = /(.+)(\s[priorty]+)/gmi;
const str = `high priority
low priority
medium priority
under-processing pririty
under-processing priority
400-urget priority
400-urget Priority
400-urget PRIority`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Performance Test
This JavaScript snippet shows the performance of that expression using a simple 1-million times for loop.
repeat = 1000000;
start = Date.now();
for (var i = repeat; i >= 0; i--) {
var string = "high priority";
var regex = /(.+)(\s[priorty]+)/gmi;
var match = string.replace(regex, "$1");
}
end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
You can use substring to get the required string:
var str = 'high priority';
console.log(str.substring(0, 4));
// expected output: "high"
so in your code
var getPriority = {priority};
var priority = getPriority.priority.substring(0, 4);
console.log( priority );
You can get just the first word of the string using .split():
The code below prints the first word of the string:
var getPriority = {priority};
console.log( getPriority.priority.split(' ', 1)[0]);
Or, if the value always ends with the word priority, you can get rid of it by using it as the separator for .split():
var getPriority = {priority};
console.log( getPriority.priority.split(' priority')[0] );
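For a regex-based version that does not depend on the length of the first word, here is a small sketch. It assumes the value is a plain string such as 'high priority'; `firstWord` is a hypothetical helper name.

```javascript
// Match the first run of non-whitespace characters, skipping leading spaces.
function firstWord(text) {
  const match = /\S+/.exec(text);
  return match ? match[0] : '';
}

console.log(firstWord('high priority'));     // high
console.log(firstWord('  medium priority')); // medium
```

Returning an empty string when nothing matches avoids reading `[0]` from `null` on empty input.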

Get first letter of each word in a string, in JavaScript

How would you go around to collect the first letter of each word in a string, as in to receive an abbreviation?
Input: "Java Script Object Notation"
Output: "JSON"
I think what you're looking for is the acronym of a supplied string.
var str = "Java Script Object Notation";
var matches = str.match(/\b(\w)/g); // ['J','S','O','N']
var acronym = matches.join(''); // JSON
console.log(acronym)
Note: this will fail for hyphenated or apostrophe'd words: "Help-me I'm Dieing" will become HmImD. If that's not what you want, the approach of splitting on spaces and grabbing the first letter of each piece might suit you better.
Here's a quick example of that:
let str = "Java Script Object Notation";
let acronym = str.split(/\s/).reduce((response,word)=> response+=word.slice(0,1),'')
console.log(acronym);
I think you can do this with
'Aa Bb'.match(/\b\w/g).join('')
Explanation: Obtain all /g the alphanumeric characters \w that occur after a non-alphanumeric character (i.e: after a word boundary \b), put them on an array with .match() and join everything in a single string .join('')
Depending on what you want to do you can also consider simply selecting all the uppercase characters:
'JavaScript Object Notation'.match(/[A-Z]/g).join('')
Easiest way without regex
var abbr = "Java Script Object Notation".split(' ').map(function(item){return item[0]}).join('');
This is made very simple with ES6
string.split(' ').map(i => i.charAt(0)) //Inherit case of each letter
string.split(' ').map(i => i.charAt(0)).join('').toUpperCase() //Uppercase each letter
string.split(' ').map(i => i.charAt(0)).join('').toLowerCase() //lowercase each letter
This ONLY works with spaces or whatever is defined in the .split(' ') method
ie, .split(', ') .split('; '), etc.
string.split(' ') .map(i => i.charAt(0)) .toString() .toUpperCase().split(',')
To add to the great examples, you could do it like this in ES6
const x = "Java Script Object Notation".split(' ').map(x => x[0]).join('');
console.log(x); // JSON
and this works too but please ignore it, I went a bit nuts here :-)
const [j,s,o,n] = "Java Script Object Notation".split(' ').map(x => x[0]);
console.log(`${j}${s}${o}${n}`);
#BotNet flaw:
I think I solved it after three excruciating days of regular expression tutorials.
With the input "I'm a an animal", the word-boundary version also caught the m of I'm; the pattern below seems to work for me.
/(\s|^)([a-z])/gi
Try -
var text = '';
var arr = "Java Script Object Notation".split(' ');
for(i=0;i<arr.length;i++) {
text += arr[i].substr(0,1)
}
alert(text);
Demo - http://jsfiddle.net/r2maQ/
Using map (from functional programming)
'use strict';
function acronym(words)
{
if (!words) { return ''; }
var first_letter = function(x){ if (x) { return x[0]; } else { return ''; }};
return words.split(' ').map(first_letter).join('');
}
Alternative 1:
you can also use this regex to return an array of the first letter of every word
/(?<=(\s|^))[a-z]/gi
(?<=(\s|^)) is called positive lookbehind which make sure the element in our search pattern is preceded by (\s|^).
so, for your case:
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.match(/(?<=(\s|^))[a-z]/gi)
.join('')
.toUpperCase();
};
toAbbr("java script object notation"); //result JSON
(by the way, there are also negative lookbehind, positive lookahead, negative lookahead, if you want to learn more)
Alternative 2:
match all the words and use replace() method to replace them with the first letter of each word and ignore the space (the method will not mutate your original string)
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.replace(/(\S+)(\s*)/gi, (match, p1, p2) => p1[0].toUpperCase());
};
toAbbr("java script object notation"); //result JSON
// word = not space = \S+ = p1 (p1 is the first pattern)
// space = \s* = p2 (p2 is the second pattern)
It's important to trim the word before splitting it, otherwise, we'd lose some letters.
const getWordInitials = (word: string): string => {
const bits = word.trim().split(' ');
return bits
.map((bit) => bit.charAt(0))
.join('')
.toUpperCase();
};
$ getWordInitials("Java Script Object Notation")
$ "JSON"
How about this:
var str = "", abbr = "";
str = "Java Script Object Notation";
str = str.split(' ');
for (i = 0; i < str.length; i++) {
abbr += str[i].substr(0,1);
}
alert(abbr);
Working Example.
If you came here looking for how to do this that supports non-BMP characters that use surrogate pairs:
initials = str.split(' ')
.filter(s => s.length > 0) // skip empty pieces produced by repeated spaces
.map(s => String.fromCodePoint(s.codePointAt(0)).toUpperCase())
.join('');
Works in all modern browsers with no polyfills (not IE though)
Getting first letter of any Unicode word in JavaScript is now easy with the ECMAScript 2018 standard:
/(?<!\p{L}\p{M}*)\p{L}/gu
This regex finds any Unicode letter (see the last \p{L}) that is not preceded with any other letter that can optionally have diacritic symbols (see the (?<!\p{L}\p{M}*) negative lookbehind where \p{M}* matches 0 or more diacritic chars). Note that u flag is compulsory here for the Unicode property classes (like \p{L}) to work correctly.
To emulate a fully Unicode-aware \b, you'd need to add a digit matching pattern and connector punctuation:
/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu
It works in Chrome, Firefox (since June 30, 2020), Node.js, and the majority of other environments (see the compatibility matrix here), for any natural language including Arabic.
Quick test:
const regex = /(?<!\p{L}\p{M}*)\p{L}/gu;
const string = "Żerard Łyżwiński";
// Extracting
console.log(string.match(regex)); // => [ "Ż", "Ł" ]
// Extracting and concatenating into string
console.log(string.match(regex).join("")) // => ŻŁ
// Removing
console.log(string.replace(regex, "")) // => erard yżwiński
// Enclosing (wrapping) with a tag
console.log(string.replace(regex, "<span>$&</span>")) // => <span>Ż</span>erard <span>Ł</span>yżwiński
console.log("_Łukasz 1Żukowski".match(/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu)); // => null
In ES6:
function getFirstCharacters(str) {
let result = [];
str.split(' ').map(word => word.charAt(0) != '' ? result.push(word.charAt(0)) : '');
return result;
}
const str1 = "Hello4 World65 123 !!";
const str2 = "123and 456 and 78-1";
const str3 = " Hello World !!";
console.log(getFirstCharacters(str1));
console.log(getFirstCharacters(str2));
console.log(getFirstCharacters(str3));
Output:
[ 'H', 'W', '1', '!' ]
[ '1', '4', 'a', '7' ]
[ 'H', 'W', '!' ]
This should do it.
var s = "Java Script Object Notation",
a = s.split(' '),
l = a.length,
i = 0,
n = "";
for (; i < l; ++i)
{
n += a[i].charAt(0);
}
console.log(n);
Regular expressions in JavaScript versions older than ECMAScript 6 are not Unicode-aware, so those who want to support characters such as "å" will need to rely on the non-regex versions of these scripts.
Even on version 6, you need to opt into Unicode mode with the u flag.
More details: https://mathiasbynens.be/notes/es6-unicode-regex
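A short sketch of the difference, assuming an engine that supports the u flag together with ES2018 Unicode property escapes and lookbehind. The sample string is made up.

```javascript
const fullName = 'åse øst';

// \w is ASCII-only, so \b\w latches onto the "s" after each non-ASCII letter.
console.log(fullName.match(/\b\w/g)); // [ 's', 's' ]

// With the u flag, \p{L} matches any Unicode letter.
const initials = fullName.match(/(?<!\p{L})\p{L}/gu);
console.log(initials.join('').toUpperCase()); // ÅØ
```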
Yet another option using reduce function:
var value = "Java Script Object Notation";
var result = value.split(' ').reduce(function(previous, current){
return {v : previous.v + current[0]};
},{v:""});
$("#output").text(result.v);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<pre id="output"/>
This is similar to others, but (IMHO) a tad easier to read:
const getAcronym = title =>
title.split(' ')
.map(word => word[0])
.join('');
ES6 reduce way:
const initials = inputStr.split(' ').reduce((result, currentWord) =>
result + currentWord.charAt(0).toUpperCase(), '');
alert(initials);
Try This Function
const createUserName = function (name) {
const username = name
.toLowerCase()
.split(' ')
.map((elem) => elem[0])
.join('');
return username;
};
console.log(createUserName('Anisul Haque Bhuiyan'));

RegEx to extract all matches from string using RegExp.exec

I'm trying to parse the following kind of string:
[key:"val" key2:"val2"]
where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value.
For those curious I'm trying to parse the database format of task warrior.
Here is my test string:
[description:"aoeu" uuid:"123sth"]
which is meant to highlight that anything can be in a key or value aside from space, no spaces around the colons, and values are always in double quotes.
In node, this is my output:
[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
'uuid',
'123sth',
index: 0,
input: '[description:"aoeu" uuid:"123sth"]' ]
But description:"aoeu" also matches this pattern. How can I get all matches back?
Continue calling re.exec(s) in a loop to obtain all the matches:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;
do {
m = re.exec(s);
if (m) {
console.log(m[1], m[2]);
}
} while (m);
Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
str.match(pattern), if pattern has the global flag g, will return all the matches as an array.
For example:
const str = 'All of us except #Emran, #Raju and #Noman were there';
console.log(
str.match(/#\w*/g)
);
// Will log ["#Emran", "#Raju", "#Noman"]
To loop through all matches, you can use the replace function:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
This is a solution
var s = '[description:"aoeu" uuid:"123sth"]';
var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
console.log(m[1], m[2]);
}
This is based on lawnsea's answer, but shorter.
Notice that the `g` flag must be set to move the internal pointer forward across invocations.
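The "internal pointer" is the regex's `lastIndex` property. A small sketch that makes it visible, using the same pattern and input as above (the `seen` array is only there for illustration):

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';

const seen = [];
let m;
while ((m = re.exec(s)) !== null) {
  // With the g flag, each exec call resumes from re.lastIndex.
  seen.push([m[1], m[2], re.lastIndex]);
}
console.log(JSON.stringify(seen));
// [["description","aoeu",19],["uuid","123sth",33]]
console.log(re.lastIndex); // 0, reset after the final failed match
```

Without the g flag, `lastIndex` stays at 0, every `exec` call returns the first match again, and a loop like this never terminates on matching input.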
str.match(/regex/g)
returns all matches as an array.
If, for some mysterious reason, you need the additional information that comes with exec, as an alternative to the previous answers you could use a recursive function instead of a loop, as follows (which also looks cooler :).
function findMatches(regex, str, matches = []) {
const res = regex.exec(str)
res && matches.push(res) && findMatches(regex, str, matches)
return matches
}
// Usage
const matches = findMatches(/regex/g, str)
as stated in the comments before, it's important to have g at the end of regex definition to move the pointer forward in each execution.
We are finally beginning to see a built-in matchAll function, see here for the description and compatibility table. It looks like as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) are supported but not IE, Safari, and Opera. Seems like it was drafted in December 2018 so give it some time to reach all browsers, but I trust it will get there.
The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like
// get the letters before and after "o"
let matches = "stackoverflow".matchAll(/(\w)o(\w)/g);
for (const match of matches) {
console.log("letter before:" + match[1]);
console.log("letter after:" + match[2]);
}
// the iterator above is now exhausted, so call matchAll again to turn the matches into an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];
It also seems that every match object uses the same format as match(). So each object is an array of the match and capturing groups, along with three additional properties: index, input, and groups. So it looks like:
[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]
For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.
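Applied to the string from the question, a sketch with named groups might look like this. It assumes a runtime that has `matchAll`; the names `pairRe`, `parsed`, `key`, and `val` are chosen here for illustration.

```javascript
const input = '[description:"aoeu" uuid:"123sth"]';

// key: anything except whitespace, "[" and ":"; val: anything inside the double quotes.
const pairRe = /(?<key>[^\s[:]+):"(?<val>[^"]*)"/g;

// Collect every key/value pair into a plain object.
const parsed = {};
for (const { groups } of input.matchAll(pairRe)) {
  parsed[groups.key] = groups.val;
}
console.log(parsed); // { description: 'aoeu', uuid: '123sth' }
```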
If you have ES2020
(Meaning your system: Chrome, Node.js, Firefox, etc. supports ECMAScript 2020 or later)
Use the new yourString.matchAll( /your-regex/g ).
If you don't have ES2020
If you have an older system, here's a function for easy copy and pasting
function findAll(regexPattern, sourceString) {
let output = []
let match
// auto-add global flag while keeping others as-is
let regexPatternWithGlobal = RegExp(regexPattern,[...new Set("g"+regexPattern.flags)].join(""))
while (match = regexPatternWithGlobal.exec(sourceString)) {
// get rid of the string copy
delete match.input
// store the match data
output.push(match)
}
return output
}
example usage:
console.log( findAll(/blah/g,'blah1 blah2') )
outputs:
[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]
Based on Agus's function, but I prefer to return just the match values:
var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
var res = [];
var m;
if (regex.global) {
while (m = regex.exec(str)) {
res.push(m[1]);
}
} else {
if (m = regex.exec(str)) {
res.push(m[1]);
}
}
return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch); // yields: ["&gt;", "&lt;"]
Iterables are nicer:
const matches = (text, pattern) => ({
[Symbol.iterator]: function * () {
const clone = new RegExp(pattern.source, pattern.flags);
let match = null;
do {
match = clone.exec(text);
if (match) {
yield match;
}
} while (match);
}
});
Usage in a loop:
for (const match of matches('abcdefabcdef', /ab/g)) {
console.log(match);
}
Or if you want an array:
[ ...matches('abcdefabcdef', /ab/g) ]
Here is my function to get the matches :
function getAllMatches(regex, text) {
if (regex.constructor !== RegExp) {
throw new Error('not RegExp');
}
var res = [];
var match = null;
if (regex.global) {
while (match = regex.exec(text)) {
res.push(match);
}
}
else {
if (match = regex.exec(text)) {
res.push(match);
}
}
return res;
}
// Example:
var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');
res.forEach(function (item) {
console.log(item[0]);
});
If you're able to use matchAll here's a trick:
Array.from takes a mapping function as its second parameter, so instead of ending up with an array of awkward 'match' results you can project it to what you really need:
Array.from(str.matchAll(regexp), m => m[0]);
If you have named groups, e.g. (/(?<firstName>[a-z][A-Z]+)/g), you could do this:
Array.from(str.matchAll(regexp), m => m.groups.firstName);
Since ES2020, there's now a simpler, better way of getting all the matches, together with information about the capture groups, and their index:
const string = 'mice like to dice rice';
const regex = /.ice/gu;
for(const match of string.matchAll(regex)) {
console.log(match);
}
// ["mice", index: 0, input: "mice like to dice rice", groups: undefined]
// ["dice", index: 13, input: "mice like to dice rice", groups: undefined]
// ["rice", index: 18, input: "mice like to dice rice", groups: undefined]
It is currently supported in Chrome, Firefox, Opera. Depending on when you read this, check this link to see its current support.
Use this...
var all_matches = your_string.match(re);
console.log(all_matches)
It will return an array of all matches...That would work just fine....
But remember it won't take groups in account..It will just return the full matches...
I would definitely recommend using the String.match() function and creating a relevant RegEx for it. My example is with a list of strings, which is often necessary when scanning user inputs for keywords and phrases.
// 1) Define keywords
var keywords = ['apple', 'orange', 'banana'];
// 2) Create regex, pass "i" for case-insensitive and "g" for global search
regex = new RegExp("(" + keywords.join('|') + ")", "ig");
=> /(apple|orange|banana)/gi
// 3) Match it against any string to get all matches
"Test string for ORANGE's or apples were mentioned".match(regex);
=> ["ORANGE", "apple"]
Hope this helps!
This isn't really going to help with your more complex issue but I'm posting this anyway because it is a simple solution for people that aren't doing a global search like you are.
I've simplified the regex in the answer to be clearer (this is not a solution to your exact problem).
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
// We only want the group matches in the array
function purify_regex(reResult){
// Removes the Regex specific values and clones the array to prevent mutation
let purifiedArray = [...reResult];
// Removes the full match value at position 0
purifiedArray.shift();
// Returns a pure array without mutating the original regex result
return purifiedArray;
}
// purifiedResult= ["description", "aoeu"]
That looks more verbose than it is because of the comments, this is what it looks like without comments
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
function purify_regex(reResult){
let purifiedArray = [...reResult];
purifiedArray.shift();
return purifiedArray;
}
Note that any groups that do not match will be listed in the array as undefined values.
This solution uses the ES6 spread operator to purify the array of regex specific values. You will need to run your code through Babel if you want IE11 support.
Here's a one line solution without a while loop.
The order is preserved in the resulting list.
The potential downsides are
It clones the regex for every match.
The result is in a different form than expected solutions. You'll need to process them one more time.
let re = /\s*([^[:]+):\"([^"]+)"/g;
let str = '[description:"aoeu" uuid:"123sth"]';
(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e));
[ [ 'description:"aoeu"',
'description',
'aoeu',
index: 0,
input: 'description:"aoeu"',
groups: undefined ],
[ ' uuid:"123sth"',
'uuid',
'123sth',
index: 0,
input: ' uuid:"123sth"',
groups: undefined ] ]
My guess is that if there would be edge cases such as extra or missing spaces, this expression with less boundaries might also be an option:
^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Test
const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description : "aoeu" uuid: "123sth"]
[ description : "aoeu" uuid: "123sth" ]
[ description : "aoeu" uuid : "123sth" ]
[ description : "aoeu"uuid : "123sth" ] `;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
const re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
const matches = [...re.exec('[description:"aoeu" uuid:"123sth"]').entries()]
console.log(matches)
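Basically, this is an ES6 way to spread the iterator returned by entries() on the exec result into a regular array of [index, value] pairs. Note that exec is still called only once, so only the last repetition of each group is captured.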
Here is my answer:
var str = '[me nombre es] : My name is. [Yo puedo] is the right word';
var reg = /\[(.*?)\]/g;
var a = str.match(reg);
a = a.toString().replace(/[\[\]]/g, "").split(',');
