How to determine matched group's offset in JavaScript's replace? [duplicate] - javascript

I want to match a regex like /(a).(b)(c.)d/ with "aabccde", and get the following information back:
"a" at index = 0
"b" at index = 2
"cc" at index = 3
How can I do this? String.match returns list of matches and index of the start of the complete match, not index of every capture.
Edit: A test case which wouldn't work with plain indexOf
regex: /(a).(.)/
string: "aaa"
expected result: "a" at 0, "a" at 2
Note: The question is similar to Javascript Regex: How to find index of each subexpression?, but I cannot modify the regex to make every subexpression a capturing group.

This is now part of the language: the proposal below reached stage 4 and was included in ECMAScript 2022:
RegExp Match Indices for ECMAScript
ECMAScript RegExp Match Indices provide additional information about the start and end indices of captured substrings relative to the start of the input string.
...We propose the adoption of an additional indices property on the array result (the substrings array) of RegExp.prototype.exec(). This property would itself be an indices array containing a pair of start and end indices for each captured substring. Any unmatched capture groups would be undefined, similar to their corresponding element in the substrings array. In addition, the indices array would itself have a groups property containing the start and end indices for each named capture group.
Here's an example of how things would work. The following snippets run without errors in, at least, Chrome:
const re1 = /a+(?<Z>z)?/d;
// indices are relative to start of the input string:
const s1 = "xaaaz";
const m1 = re1.exec(s1);
console.log(m1.indices[0][0]); // 1
console.log(m1.indices[0][1]); // 5
console.log(s1.slice(...m1.indices[0])); // "aaaz"
console.log(m1.indices[1][0]); // 4
console.log(m1.indices[1][1]); // 5
console.log(s1.slice(...m1.indices[1])); // "z"
console.log(m1.indices.groups["Z"][0]); // 4
console.log(m1.indices.groups["Z"][1]); // 5
console.log(s1.slice(...m1.indices.groups["Z"])); // "z"
// capture groups that are not matched return `undefined`:
const m2 = re1.exec("xaaay");
console.log(m2.indices[1]); // undefined
console.log(m2.indices.groups.Z); // undefined
So, for the code in the question, we could do:
const re = /(a).(b)(c.)d/d;
const str = 'aabccde';
const result = re.exec(str);
// indices[0], like result[0], describes the indices of the full match
const matchStart = result.indices[0][0];
result.forEach((matchedStr, i) => {
const [startIndex, endIndex] = result.indices[i];
console.log(`${matchedStr} from index ${startIndex} to ${endIndex} in the original string`);
console.log(`From index ${startIndex - matchStart} to ${endIndex - matchStart} relative to the match start\n-----`);
});
Output:
aabccd from index 0 to 6 in the original string
From index 0 to 6 relative to the match start
-----
a from index 0 to 1 in the original string
From index 0 to 1 relative to the match start
-----
b from index 2 to 3 in the original string
From index 2 to 3 relative to the match start
-----
cc from index 3 to 5 in the original string
From index 3 to 5 relative to the match start
-----
Keep in mind that the indices array contains the indices of the matched groups relative to the start of the string, not relative to the start of the match.
A polyfill is available here.
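For reuse, the same idea can be wrapped in a small helper. This is only a sketch: groupOffsets is a hypothetical name, not part of any library, and it assumes an engine that supports the d flag.

```javascript
// Hypothetical helper (not a built-in): start/end offsets of every capture group.
function groupOffsets(re, str) {
  // Rebuild the regex with the `d` flag so exec() populates `indices`.
  const flags = re.flags.includes('d') ? re.flags : re.flags + 'd';
  const m = new RegExp(re.source, flags).exec(str);
  if (m === null) return [];
  // indices[0] is the full match; entries 1..n are the capture groups.
  return m.indices.slice(1).map((pair, i) =>
    pair === undefined
      ? undefined // group did not participate in the match
      : { text: m[i + 1], start: pair[0], end: pair[1] }
  );
}

console.log(groupOffsets(/(a).(b)(c.)d/, 'aabccde'));
console.log(groupOffsets(/(a).(.)/, 'aaa'));
```

The second call covers the test case from the question where plain indexOf fails: it reports "a" at 0 and "a" at 2.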

I wrote MultiRegExp for this a while ago. As long as you don't have nested capture groups, it should do the trick. It works by inserting capture groups between those in your RegExp and using all the intermediate groups to calculate the requested group positions.
var exp = new MultiRegExp(/(a).(b)(c.)d/);
exp.exec("aabccde");
should return
{0: {index:0, text:'a'}, 1: {index:2, text:'b'}, 2: {index:3, text:'cc'}}
Live Version
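To illustrate the idea (this is a hand-written sketch, not MultiRegExp's actual code), every piece of the question's regex is wrapped in its own group by hand, and the lengths of the preceding groups are summed to get each offset. The wrapped regex, the wanted list and the offsetsFromWrapped name are assumptions made for this example.

```javascript
// Original regex: /(a).(b)(c.)d/
// Wrapped by hand so that every piece of the pattern is a capture group.
const wrapped = /(a)(.)(b)(c.)(d)/;
// Groups of the wrapped regex that correspond to the original groups 1, 2, 3.
const wanted = [1, 3, 4];

function offsetsFromWrapped(re, wantedGroups, str) {
  const m = re.exec(str);
  if (m === null) return [];
  const result = [];
  let offset = m.index; // start of the full match
  for (let i = 1; i < m.length; i++) {
    const text = m[i] === undefined ? '' : m[i];
    if (wantedGroups.includes(i)) result.push({ index: offset, text });
    offset += text.length; // consecutive groups sit back to back
  }
  return result;
}

console.log(offsetsFromWrapped(wrapped, wanted, 'aabccde'));
```

The summing only works because the groups are consecutive and not nested, which is the same restriction mentioned above.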

I created a little regexp Parser which is also able to parse nested groups like a charm. It's small but huge. No really. Like Donalds hands. I would be really happy if someone could test it, so it will be battle tested. It can be found at: https://github.com/valorize/MultiRegExp2
Usage:
let regex = /a(?: )bc(def(ghi)xyz)/g;
let regex2 = new MultiRegExp2(regex);
let matches = regex2.execForAllGroups('ababa bcdefghixyzXXXX');
Will output:
[ { match: 'defghixyz', start: 8, end: 17 },
{ match: 'ghi', start: 11, end: 14 } ]

Updated Answer: 2022
See String.prototype.matchAll
The matchAll() method matches the string against a regular expression and returns an iterator of matching results.
Each match is an array, with the matched text as the first item, and then one item for each parenthetical capture group. It also includes the extra properties index and input.
let regexp = /t(e)(st(\d?))/g;
let str = 'test1test2';
for (let match of str.matchAll(regexp)) {
console.log(match)
}
// => ['test1', 'e', 'st1', '1', index: 0, input: 'test1test2', groups: undefined]
// => ['test2', 'e', 'st2', '2', index: 5, input: 'test1test2', groups: undefined]
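matchAll on its own only exposes the index of the whole match. For per-group offsets it can be combined with the d flag described in the answer above. A sketch, assuming an engine with ES2022 match indices (the rows array is just for illustration):

```javascript
const regexp = /t(e)(st(\d?))/dg;
const str = 'test1test2';

const rows = [];
for (const match of str.matchAll(regexp)) {
  // match.indices[0] is the full match, the rest are the capture groups
  match.indices.forEach((pair, group) => {
    if (pair === undefined) return; // group did not participate
    rows.push({ group, text: match[group], start: pair[0], end: pair[1] });
  });
}
console.log(rows);
```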

Based on the ECMAScript regular expression syntax, I've written a parser, in the form of an extension of the RegExp class, which solves this problem (a fully indexed exec method) as well as other limitations of the JavaScript RegExp implementation, for example group-based search and replace. You can test and download the implementation here (it is also available as an NPM module).
The implementation works as follows (small example):
//Retrieve content and position of: opening-, closing tags and body content for: non-nested html-tags.
var pattern = '(<([^ >]+)[^>]*>)([^<]*)(<\\/\\2>)';
var str = '<html><code class="html plain">first</code><div class="content">second</div></html>';
var regex = new Regex(pattern, 'g');
var result = regex.exec(str);
console.log(5 === result.length);
console.log('<code class="html plain">first</code>'=== result[0]);
console.log('<code class="html plain">'=== result[1]);
console.log('first'=== result[3]);
console.log('</code>'=== result[4]);
console.log(5=== result.index.length);
console.log(6=== result.index[0]);
console.log(6=== result.index[1]);
console.log(31=== result.index[3]);
console.log(36=== result.index[4]);
I also tried the implementation from @velop, but it seems buggy. For example, it does not handle backreferences correctly, e.g. "/a(?: )bc(def(\1ghi)xyz)/g": when a parenthesis is added in front, the backreference \1 needs to be incremented accordingly (which is not the case in his implementation).

So, you have a text and a regular expression:
txt = "aabccde";
re = /(a).(b)(c.)d/;
The first step is to get the list of all substrings that match the regular expression:
subs = re.exec(txt);
Then, you can do a simple search on the text for each substring. You will have to keep in a variable the position of the last substring. I've named this variable cursor.
var cursor = subs.index;
for (var i = 1; i < subs.length; i++){
sub = subs[i];
index = txt.indexOf(sub, cursor);
cursor = index + sub.length;
console.log(sub + ' at index ' + index);
}
EDIT: Thanks to @nhahtdh, I've improved the mechanism and made a complete function:
String.prototype.matchIndex = function(re){
var res = [];
var subs = this.match(re);
for (var cursor = subs.index, l = subs.length, i = 1; i < l; i++){
var index = cursor;
if (i+1 !== l && subs[i] !== subs[i+1]) {
var nextIndex = this.indexOf(subs[i+1], cursor);
while (true) {
var currentIndex = this.indexOf(subs[i], index);
if (currentIndex !== -1 && currentIndex <= nextIndex)
index = currentIndex + 1;
else
break;
}
index--;
} else {
index = this.indexOf(subs[i], cursor);
}
cursor = index + subs[i].length;
res.push([subs[i], index]);
}
return res;
}
console.log("aabccde".matchIndex(/(a).(b)(c.)d/));
// [ [ 'a', 1 ], [ 'b', 2 ], [ 'cc', 3 ] ]
console.log("aaa".matchIndex(/(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 1 ] ] <-- problem here
console.log("bababaaaaa".matchIndex(/(ba)+.(a*)/));
// [ [ 'ba', 4 ], [ 'aaa', 6 ] ]

I'm not sure exactly what your requirements are for your search, but here's how you could get the desired output in your first example using RegExp.prototype.exec() and a while-loop.
JavaScript
var myRe = /^a|b|c./g;
var str = "aabccde";
var myArray;
while ((myArray = myRe.exec(str)) !== null)
{
var msg = '"' + myArray[0] + '" ';
msg += "at index = " + (myRe.lastIndex - myArray[0].length);
console.log(msg);
}
Output
"a" at index = 0
"b" at index = 2
"cc" at index = 3
Using the lastIndex property, you can subtract the length of the currently matched string to obtain the starting index.
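The same start offset is also available directly as the index property of the array that exec() returns, so the subtraction can be skipped. A minimal variant of the loop above (the lines array is only there to collect the messages):

```javascript
const myRe = /^a|b|c./g;
const str = 'aabccde';
const lines = [];
let m;
while ((m = myRe.exec(str)) !== null) {
  // m.index equals myRe.lastIndex - m[0].length
  lines.push('"' + m[0] + '" at index = ' + m.index);
}
console.log(lines.join('\n'));
```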

Related

Shortest regular expression match if already part of another match

I would like to retrieve the shortest match from a long text where strings are repeated throughout the text. However, matches within text that has already been matched aren't being found.
Here's a simplified version of the issue I'm facing:
Code: "ababc".match(/a.+c/g)
Observed result: ["ababc"]
Expected result: ["ababc", "abc"]
Therefore I'm wondering whether there is an easier way to retrieve the substring "abc" than manually writing recursive code to search within matches.
As mentioned in my comment, you can't do what you want with regex alone.
You gave a simplified example, so I'm not sure how far this will take you, but here is my stab at doing what you are looking for. I have a sneaking suspicion your "a" and "c" characters are not the same, so you will need to modify this accordingly (e.g. pass them as arguments to the function).
function getShortestMatch(str) {
var str = str || '';
var match,
index,
regex,
length,
results = [];
// iterate along the string one character at a time
for (index = 0, length = str.length; index < length; index++) {
// if the current character is 'a' (the beginning part of our substring match)
if (str[index] === 'a') {
// create a new regex that first consumes everything up to
// the starting character. Then matches for everything from there to
// the ending substring char 'c'. It is a lazy match so it will stop
// at the first matched ending char 'c'
regex = new RegExp('^.{' + index + '}(a.+?c)');
match = str.match(regex);
// if there is a match, then push to the results array
if (match && match[1]) {
results.push(match[1]);
}
}
}
// sort the results array ascending (shortest first)
results.sort(function(a,b){
return a.length - b.length;
});
// log all results matched to the console for sake of example
console.log(results);
// return the first (shortest) element
return results[0];
}
Example
getShortestMatch('ababcabbc');
// output showing all results found (from console.log in the function)
["abc", "abbc", "ababc"]
// return value
"abc"
Note: This function does not attempt to find all possible matches to "everything between an 'a' and a 'c'", since your question was about finding the shortest one. If for some reason you want all possible matches to that, then a greedy .+ regex would be thrown into the mix.
Loop through substrings starting from each successive character (using slice), matching against a regexp which is anchored to the start of the string (^), and uses non-greedy matching (?):
const input = "ababc";
const regexp = /^a.+?c/;
const results = [];
for (var i = 0; i < input.length; i++) {
var match = input.slice(i).match(regexp);
if (match) results.push(match[0]);
}
console.log("all results are", results);
var shortest = results.sort((a, b) => a.length - b.length)[0];
console.log("shortest result is", shortest);
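A possible shortcut for the same loop, assuming matchAll is available: put the pattern inside a lookahead with a capture group. The lookahead consumes no characters, so the engine tries every start position and overlapping candidates are all reported. The variable names are my own.

```javascript
const input = 'ababc';
// The lookahead is zero-width, so matchAll advances one character at a time.
const candidates = Array.from(input.matchAll(/(?=(a.+?c))/g), m => m[1]);
console.log(candidates);

// Pick the shortest candidate, or undefined when nothing matched.
const shortest = candidates.reduce(
  (best, s) => (best === undefined || s.length < best.length ? s : best),
  undefined
);
console.log(shortest);
```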
This is the answer I went with due to its effectiveness, simplicity and efficiency:
let seq = "us warship";
let source = "The traditional US adversary has also positioned a spy ship off the coast of Delaware and carried out flights near a US Navy warship, concerning American officials.";
let re = new RegExp(`\\b${seq.replace(/\s/g, "\\b.+?\\b")}\\b`, "gi");
let snippet = null;
let matches;
while (matches = re.exec(source)) {
let match = matches[0];
if (!snippet || match.length < snippet.length) {
snippet = match;
}
re.lastIndex -= (match.length - 1);
}
console.log(snippet); // "US Navy warship"
Source: https://stackoverflow.com/a/8236152/1055499

Javascript: What does this `Array(i+1)` do?

Hi I found a solution for a problem on codewars and I'm not sure what a piece of the syntax does. The function takes a string of characters, and based on the length, returns it in a certain fashion.
input = "abcd"; output = "A-Bb-Ccc-Dddd"
input = "gFkLM"; output = "G-Ff-Kkk-Llll-Mmmmm"
This guy posted this solution
function accum(str) {
var letters = str.split('');
var result = [];
for (var i = 0; i < letters.length; i++) {
result.push(letters[i].toUpperCase() + Array(i + 1).join(letters[i].toLowerCase()));
}
return result.join('-');
}
Kinda confused about the solution overall, but one thing is particularly nagging me. See that Array(i + 1) ? What does that do? Sorry, not a very easy thing to google.
I believe that this allocates an array of length i + 1. But more importantly, what is the code doing? You have to know what the join() function does: it concatenates the elements of an array, delimited by the function argument. For example:
['one', 'two', 'three'].join(' ') === 'one two three'
In this case, the array is filled with undefined elements, so you get something like this:
[undefined].join('a') === ''
[undefined, undefined].join('b') === 'b'
[undefined, undefined, undefined].join('c') === 'cc'
[undefined, undefined, undefined, undefined].join('d') === 'ddd'
So in the beginning for statement, i starts out at 0. Now if you go inside the for statement where it says i+1, i would be 1. And then when the for loop updates and i equals 1, i+1 inside the for loop would equal 2. This process would continue for the length of the string. Hope this helps.
I have just checked
let x= Array(3);
console.log(x);
The output shows an array of length 3 (older consoles print [undefined, undefined, undefined]; newer ones print it as 3 empty items).
So it actually creates an array of size 3 whose elements all read as undefined.
When we call join with a character as the parameter, it creates a string with that character repeated 2 times, i.e. (3-1).
console.log(x.join('a')); // logs aa
Commented code walk-through:
function accum(str) {
/* converts string to character array.*/
var letters = str.split('');
/* variable to store result */
var result = [];
/* for each character concat (1.) + (2.) and push into results.
1. letters[i].toUpperCase() :
UPPER-CASE of the character.
2. Array(i + 1).join(letters[i].toLowerCase()) :
create an array of EMPTY SLOTS whose length is the current index + 1,
and join it into a string with the current character's LOWER-CASE as the separator.
Ex:
Index | ArrayLength, Array            | Separator | Joined String
0     | 1, [empty]                    | 'a'       | ''
1     | 2, [empty,empty]              | 'b'       | 'b'
2     | 3, [empty,empty,empty]        | 'c'       | 'cc'
3     | 4, [empty,empty,empty,empty]  | 'd'       | 'ddd'
NOTE:
Join on an array of EMPTY SLOTS inserts the separator in between the slots.
Meaning, if N is the length of the array, there will be N-1 separators in the joined string.
*/
for (var i = 0; i < letters.length; i++) {
result.push(letters[i].toUpperCase() + Array(i + 1).join(letters[i].toLowerCase()));
}
/* finally join all, separated by '-', and return ...*/
return result.join('-');
}
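For comparison, on engines with ES2015 the same result can be written with String.prototype.repeat, which states the intent more directly. accumRepeat is just an illustrative name, not part of the original solution:

```javascript
function accumRepeat(str) {
  return [...str]
    // index i means: one upper-case letter followed by i lower-case copies
    .map((ch, i) => ch.toUpperCase() + ch.toLowerCase().repeat(i))
    .join('-');
}

console.log(accumRepeat('abcd'));
console.log(accumRepeat('gFkLM'));
// Same length arithmetic as the original: N slots give N-1 separators.
console.log(Array(4).join('d') === 'd'.repeat(3));
```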

how to split a string based on value in javascript

Is there a way to separate a string into an array based on value? For example, if I have a string "1119994444455", how would I turn that into [111, 999, 44444, 55]? I've attempted doing this but my method doesn't seem to be working.
my code:
var nums = [];
for(i in input){
i = parseInt(i);
if(i - beforeI != 0 && beforeI >= 0){
insertionIndex++;
}
nums[insertionIndex] += i.toString();
console.log(nums[insertionIndex]);
var beforeI = i
}
You can simply use a Regular Expression, like this
console.log("1119994444455".match(/(\d)\1*/g));
// [ '111', '999', '44444', '55' ]
Here, (\d) captures a single digit and \1* matches zero or more further occurrences of that same digit. The g at the end makes sure we don't stop after finding the first such match.
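If numbers rather than strings are wanted, as in the question, the matches can be converted afterwards. A small sketch (splitRuns is a made-up name). Note that match returns null when nothing matches, hence the fallback, and that Number drops leading zeros, so a run like "000" becomes 0.

```javascript
function splitRuns(input) {
  // match() returns null for no matches, so fall back to an empty array
  const runs = input.match(/(\d)\1*/g) || [];
  return runs.map(Number);
}

console.log(splitRuns('1119994444455'));
console.log(splitRuns(''));
```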

JavaScript split string by regex

I will have a string never longer than 8 characters in length, e.g.:
// represented as array to demonstrate multiple examples
var strs = [
'11111111',
'1RBN4',
'12B5'
]
When run through a function, I would like all adjacent digit characters to be summed to return a final string:
var strsAfterFunction = [
'8',
'1RBN4',
'3B5'
]
Where you can see all of the 8 single 1 characters in the first string end up as a single 8 character string, the second string remains unchanged as at no point are there adjacent digit characters and the third string changes as the 1 and 2 characters become a 3 and the rest of the string is unchanged.
I believe the best way to do this, in pseudo-code, would be:
1. split the array by regex to find multiple digit characters that are adjacent
2. if an item in the split array contains digits, add them together
3. join the split array items
What would be the .split regex to split by multiple adjacent digit characters, e.g.:
var str = '12RB1N1'
=> ['12', 'R', 'B', '1', 'N', '1']
EDIT:
question:
What about the string "999" should the result be "27", or "9"
If it was clear, always SUM the digits, 999 => 27, 234 => 9
You can do this for the whole transformation :
var results = strs.map(function(s){
return s.replace(/\d+/g, function(n){
return n.split('').reduce(function(s,i){ return +i+s }, 0)
})
});
For your strs array, it returns ["8", "1RBN4", "3B5"].
var results = string.match(/(\d+|\D+)/g);
Testing:
"aoueoe34243euouoe34432euooue34243".match(/(\d+|\D+)/g)
Returns
["aoueoe", "34243", "euouoe", "34432", "euooue", "34243"]
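Putting this together with the three steps from the question (split, sum the digit runs, join) might look like the following sketch. sumDigitRuns is a hypothetical name. Note that \D+ keeps adjacent letters together ('RB' rather than 'R', 'B'), which makes no difference once the pieces are joined again.

```javascript
function sumDigitRuns(s) {
  // 1. split into runs of digits and runs of non-digits
  const tokens = s.match(/\d+|\D+/g) || [];
  return tokens
    // 2. replace each digit run by the sum of its digits
    .map(t => (/^\d+$/.test(t)
      ? String([...t].reduce((sum, d) => sum + Number(d), 0))
      : t))
    // 3. join the pieces back together
    .join('');
}

console.log(['11111111', '1RBN4', '12B5', '999'].map(sumDigitRuns));
```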
George... My answer was originally similar to dystroy's, but when I got home tonight and found your comment I couldn't pass up a challenge :)
Here it is without regexp. fwiw it might be faster, it would be an interesting benchmark since the iterations are native.
function p(s){
var str = "", num = 0;
s.split("").forEach(function(v){
if(!isNaN(v)){
(num = (num||0) + +v);
} else if(num!==undefined){
(str += num + v,num = undefined);
} else {
str += v;
}
});
return str+(num||"");
};
// TESTING
console.log(p("345abc567"));
// 12abc18
console.log(p("35abc2134mb1234mnbmn-135"));
// 8abc10mb10mnbmn-9
console.log(p("1 d0n't kn0w wh#t 3153 t0 thr0w #t th15 th1n6"));
// 1d0n't0kn0w0wh#t12t0thr0w0#t0th6th1n6
// EXTRA CREDIT
function fn(s){
var a = p(s);
return a === s ? a : fn(a);
}
console.log(fn("9599999gh999999999999999h999999999999345"));
// 5gh9h3
and here is the Fiddle & a new Fiddle without overly clever ternary

RegEx to extract all matches from string using RegExp.exec

I'm trying to parse the following kind of string:
[key:"val" key2:"val2"]
where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value.
For those curious I'm trying to parse the database format of task warrior.
Here is my test string:
[description:"aoeu" uuid:"123sth"]
which is meant to highlight that anything can be in a key or value aside from space, no spaces around the colons, and values are always in double quotes.
In node, this is my output:
[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
'uuid',
'123sth',
index: 0,
input: '[description:"aoeu" uuid:"123sth"]' ]
But description:"aoeu" also matches this pattern. How can I get all matches back?
Continue calling re.exec(s) in a loop to obtain all the matches:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;
do {
m = re.exec(s);
if (m) {
console.log(m[1], m[2]);
}
} while (m);
Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
str.match(pattern), if pattern has the global flag g, will return all the matches as an array.
For example:
const str = 'All of us except #Emran, #Raju and #Noman were there';
console.log(
str.match(/#\w*/g)
);
// Will log ["#Emran", "#Raju", "#Noman"]
To loop through all matches, you can use the replace function:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
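The replace callback also receives the offset of each match as the argument after the last capture group, which is useful when positions are needed. A sketch that collects them (the key, value, offset and found names are my own; returning match leaves the string unchanged):

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';
const found = [];
s.replace(re, (match, key, value, offset) => {
  // offset is where the whole match starts, including any leading whitespace
  found.push({ key, value, offset });
  return match;
});
console.log(found);
```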
This is a solution
var s = '[description:"aoeu" uuid:"123sth"]';
var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
console.log(m[1], m[2]);
}
This is based on lawnsea's answer, but shorter.
Notice that the `g` flag must be set to move the internal pointer forward across invocations.
str.match(/regex/g)
returns all matches as an array.
If, for some mysterious reason, you need the additional information that comes with exec, then as an alternative to the previous answers you could do it with a recursive function instead of a loop, as follows (which also looks cooler :).
function findMatches(regex, str, matches = []) {
const res = regex.exec(str)
res && matches.push(res) && findMatches(regex, str, matches)
return matches
}
// Usage
const matches = findMatches(/regex/g, str)
as stated in the comments before, it's important to have g at the end of regex definition to move the pointer forward in each execution.
We are finally beginning to see a built-in matchAll function, see here for the description and compatibility table. It looks like as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) are supported but not IE, Safari, and Opera. Seems like it was drafted in December 2018 so give it some time to reach all browsers, but I trust it will get there.
The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like
// get the letters before and after "o"
let matches = "stackoverflow".matchAll(/(\w)o(\w)/g);
for (const match of matches) {
console.log("letter before:" + match[1]);
console.log("letter after:" + match[2]);
}
// the iterator above is now exhausted, so call matchAll again to turn the matches into an array
let arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];
It also seems like every match object uses the same format as match(). So each object is an array of the match and capturing groups, along with three additional properties: index, input, and groups. So it looks like:
[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]
For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.
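Applied to the question's input, matchAll can feed Object.fromEntries (ES2019) to build an object in one step. This is a sketch; the regex here is a simplified variant of the ones in the other answers, not taken from them verbatim:

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';
// key: anything except whitespace, '[' and ':'; value: the text inside double quotes
const re = /([^\s[:]+):"([^"]*)"/g;
const pairs = Object.fromEntries(
  Array.from(s.matchAll(re), m => [m[1], m[2]])
);
console.log(pairs);
```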
If you have ES2020
(Meaning your runtime, e.g. Chrome, Node.js or Firefox, supports ECMAScript 2020 or later)
Use the new yourString.matchAll( /your-regex/g ).
If you don't have ES2020
If you have an older system, here's a function for easy copy and pasting
function findAll(regexPattern, sourceString) {
let output = []
let match
// auto-add global flag while keeping others as-is
let regexPatternWithGlobal = RegExp(regexPattern,[...new Set("g"+regexPattern.flags)].join(""))
while (match = regexPatternWithGlobal.exec(sourceString)) {
// get rid of the string copy
delete match.input
// store the match data
output.push(match)
}
return output
}
example usage:
console.log( findAll(/blah/g,'blah1 blah2') )
outputs:
[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]
Based on Agus's function, but I prefer to return just the match values:
var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
var res = [];
var m;
if (regex.global) {
while (m = regex.exec(str)) {
res.push(m[1]);
}
} else {
if (m = regex.exec(str)) {
res.push(m[1]);
}
}
return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch); // yields: ["&gt;", "&lt;"]
Iterables are nicer:
const matches = (text, pattern) => ({
[Symbol.iterator]: function * () {
const clone = new RegExp(pattern.source, pattern.flags);
let match = null;
do {
match = clone.exec(text);
if (match) {
yield match;
}
} while (match);
}
});
Usage in a loop:
for (const match of matches('abcdefabcdef', /ab/g)) {
console.log(match);
}
Or if you want an array:
[ ...matches('abcdefabcdef', /ab/g) ]
Here is my function to get the matches :
function getAllMatches(regex, text) {
if (regex.constructor !== RegExp) {
throw new Error('not RegExp');
}
var res = [];
var match = null;
if (regex.global) {
while (match = regex.exec(text)) {
res.push(match);
}
}
else {
if (match = regex.exec(text)) {
res.push(match);
}
}
return res;
}
// Example:
var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');
res.forEach(function (item) {
console.log(item[0]);
});
If you're able to use matchAll here's a trick:
Array.from takes a mapping function as its second parameter, so instead of ending up with an array of awkward 'match' results you can project it to what you really need:
Array.from(str.matchAll(regexp), m => m[0]);
If you have named groups, e.g. (/(?<firstName>[a-z][A-Z]+)/g), you could do this:
Array.from(str.matchAll(regexp), m => m.groups.firstName);
Since ES2020, there's now a simpler, better way of getting all the matches, together with information about the capture groups and their index:
const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for(const match of string.matchAll(regex)) {
console.log(match);
}
// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]
// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]
// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]
It is currently supported in Chrome, Firefox, Opera. Depending on when you read this, check this link to see its current support.
Use this...
var all_matches = your_string.match(re);
console.log(all_matches)
It will return an array of all matches, provided re has the g flag.
But remember that it won't take groups into account; it will just return the full matches.
I would definitely recommend using the String.match() function and creating a relevant RegEx for it. My example is with a list of strings, which is often necessary when scanning user inputs for keywords and phrases.
// 1) Define keywords
var keywords = ['apple', 'orange', 'banana'];
// 2) Create regex, pass "i" for case-insensitive and "g" for global search
var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
// => /(apple|orange|banana)/gi
// 3) Match it against any string to get all matches
"Test string for ORANGE's or apples were mentioned".match(regex);
// => ["ORANGE", "apple"]
Hope this helps!
This isn't really going to help with your more complex issue but I'm posting this anyway because it is a simple solution for people that aren't doing a global search like you are.
I've simplified the regex in the answer to be clearer (this is not a solution to your exact problem).
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
// We only want the group matches in the array
function purify_regex(reResult){
// Removes the Regex specific values and clones the array to prevent mutation
let purifiedArray = [...reResult];
// Removes the full match value at position 0
purifiedArray.shift();
// Returns a pure array without mutating the original regex result
return purifiedArray;
}
// purifiedResult= ["description", "aoeu"]
That looks more verbose than it is because of the comments. This is what it looks like without comments:
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
function purify_regex(reResult){
let purifiedArray = [...reResult];
purifiedArray.shift();
return purifiedArray;
}
Note that any groups that do not match will be listed in the array as undefined values.
This solution uses the ES6 spread operator to purify the array of regex specific values. You will need to run your code through Babel if you want IE11 support.
Here's a one line solution without a while loop.
The order is preserved in the resulting list.
The potential downsides are
It clones the regex for every match.
The result is in a different form than expected solutions. You'll need to process them one more time.
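let re = /\s*([^[:]+):\"([^"]+)"/g;
let str = '[description:"aoeu" uuid:"123sth"]';
// the semicolon above matters: without it, the parenthesis below would be parsed as a call on the string
(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e));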
[ [ 'description:"aoeu"',
'description',
'aoeu',
index: 0,
input: 'description:"aoeu"',
groups: undefined ],
[ ' uuid:"123sth"',
'uuid',
'123sth',
index: 0,
input: ' uuid:"123sth"',
groups: undefined ] ]
My guess is that if there would be edge cases such as extra or missing spaces, this expression with less boundaries might also be an option:
^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Test
const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description : "aoeu" uuid: "123sth"]
[ description : "aoeu" uuid: "123sth" ]
[ description : "aoeu" uuid : "123sth" ]
[ description : "aoeu"uuid : "123sth" ] `;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
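To see why the lastIndex guard in the loop above matters, consider a pattern that can match the empty string. Without the guard, exec would keep returning the same empty match forever. A sketch (execAll is a hypothetical helper, and /a*/g is chosen only because it can match nothing):

```javascript
function execAll(re, str) {
  if (!re.global) throw new TypeError('execAll expects a regex with the g flag');
  // Work on a copy so the caller's lastIndex is left alone.
  const clone = new RegExp(re.source, re.flags);
  const out = [];
  let m;
  while ((m = clone.exec(str)) !== null) {
    // A zero-width match leaves lastIndex where it was: step over it manually.
    if (m.index === clone.lastIndex) {
      clone.lastIndex++;
    }
    out.push([m[0], m.index]);
  }
  return out;
}

console.log(execAll(/a*/g, 'baab'));
```

String.prototype.matchAll and match with the g flag perform this advancing step internally, which is one reason to prefer them where available.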
RegEx Circuit
jex.im visualizes regular expressions:
const re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
const matches = [...re.exec('[description:"aoeu" uuid:"123sth"]').entries()]
console.log(matches)
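Note that exec returns an ordinary array (or null), not an iterator; entries() returns an iterator over its [index, value] pairs, and the spread syntax turns that into a regular array. Since exec is called only once here, only the last key and value captured by the repeated group are included.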
Here is my answer:
var str = '[me nombre es] : My name is. [Yo puedo] is the right word';
var reg = /\[(.*?)\]/g;
var a = str.match(reg);
a = a.toString().replace(/[\[\]]/g, "").split(',');
