JS / RegEx to remove characters grouped within square braces - javascript

I hope I can explain myself clearly here and that this is not too much of a specific issue.
I am working on some javascript that needs to take a string, find instances of chars between square brackets, store any returned results and then remove them from the original string.
My code so far is as follows:
parseLine : function(raw)
{
var arr = [];
var regex = /\[(.*?)]/g;
var arr;
while((arr = regex.exec(raw)) !== null)
{
console.log(" ", arr);
arr.push(arr[1]);
raw = raw.replace(/\[(.*?)]/, "");
console.log(" ", raw);
}
return {results:arr, text:raw};
}
This seems to work in most cases. If I pass in the string [id1]It [someChar]found [a#]an [id2]excellent [aa]match then it returns all the chars from within the square brackets and the original string with the bracketed groups removed.
The problem arises when I use the string [id1]It [someChar]found [a#]a [aa]match.
It seems to fail when only a single letter (and space?) follows a bracketed group and starts missing groups as you can see in the log if you try it out. It also freaks out if i use groups back to back like [a][b] which I will need to do.
I'm guessing this is my RegEx - begged and borrowed from various posts here as I know nothing about it really - but I've had no luck fixing it and could use some help if anyone has any to offer. A fix would be great but more than that an explanation of what is actually going on behind the scenes would be awesome.
Thanks in advance all.

You could use the replace method with a function to simplify the code and run the regexp only once:
function parseLine(raw) {
var results = [];
var parsed = raw.replace(/\[(.*?)\]/g, function(match,capture) {
results.push(capture);
return '';
});
return { results : results, text : parsed };
}

The problem is due to the lastIndex property of the regex /\[(.*?)]/g; not resetting, since the regex is declared as global. When the regex has global flag g on, lastIndex property of RegExp is used to mark the position to start the next attempt to search for a match, and it is expected that the same string is fed to the RegExp.exec() function (explicitly, or implicitly via RegExp.test() for example) until no more match can be found. Either that, or you reset the lastIndex to 0 before feeding in a new input.
Since your code is reassigning the variable raw on every loop, you are using the wrong lastIndex to attempt the next match.
The problem will be solved when you remove g flag from your regex. Or you could use the solution proposed by Tibos where you supply a function to String.replace() function to do replacement and extract the capturing group at the same time.

You need to escape the last bracket: \[(.*?)\].

Related

Javascript exec maintaing state

I am currently trying to build a little templating engine in Javascript by replacing tags in a html5 tag by find and replace with a regex.
I am using exec on my regular expression and I am looping over the results. I am wondering why the regular expressions breaks in its current form with the /g flag on the regular expression but is fine without it?
Check the broken example and remove the /g flag on the regular expression to view the correct output.
var TemplateEngine = function(tpl, data) {
var re = /(?:<|<)%(.*?)(?:%>|>)/g, match;
while(match = re.exec(tpl)) {
tpl = tpl.replace(match[0], data[match[1]])
}
return tpl;
}
https://jsfiddle.net/stephanv/u5d9en7n/
Can somebody explain to me a little bit more on depth why my example breaks exactly on:
<p><%more%></p>
The reason is explained in javascript string exec strange behavior.
The solution you need is actually a String.replace with a callback as a replacement:
var TemplateEngine = function(tpl, data) {
var re = /(?:<|<)%(.*?)(?:%>|>)/g, match;
return tpl.replace(re, function($0, $1) {
return data[$1] ? data[$1] : $0;
});
}
See the updated fiddle
Here, the regex finds all non-overlapping matches in the string, sequentially, and passes the match to the callback method. $0 is the full match and $1 is the Group 1 contents. If data[$1] exists, it is used to replace the whole match, else, the whole match is inserted back.
Check this link https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex. When using the g flag the object that you store the regex in (re) will keep track of the position of the last match in the lastIndex property and the next time you use that object the search will start from the position of lastIndex.
To solve this you could either manually reset the lastIndex property each time or not save the regex in an object and use it inline like so:
while(match = /(?:<|<)%(.*?)(?:%>|>)/g.exec(tpl)) {

How to split a string by a character not directly preceded by a character of the same type?

Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but Javascript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!
Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It uses the detail that if the split expression includes a capture group, the captured items are included in the returned array. These capture groups are actually the only thing we are interested in; the tokens are all empty strings, which we filter out.
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)
You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it'd work in one line, as it did on https://regex101.com/r/cO1dP3/1, but had to be expanded in the code above because the /g option by default prevents capturing groups from returning with .match (i.e. the correct data was in the capturing groups, but we couldn't immediately access them without doing the above).
See: JavaScript Regex Global Match Groups
An alternative solution with the original one liner (plus one line) is:
document.write(JSON.stringify(
"We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
.map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in the revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
// if `val[0]` does not begin with "." split "."
// else split "." if not followed by "."
return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));

Convert string to function with parameters

I am struggling with writing a regular expression to turn the string
"fetchSomething('param1','param2','param3')"
into the proper function call. I can do it with some splitting and substrings but would rather do it with a .match using capture groups for efficiency's sake (and my own education).
However when I use
'something("stuff","moreStuff","yetMoreStuff")'.match(/(?:\(|,)("?\w+"?)/g)
I get
["("stuff"", ","moreStuff"", ","yetMoreStuff""]
Which is the same result regardless of the ?:, this confuses me since I thought ?: would cause it to ignore the first capture group? Or am I completely miss understanding capture groups?
You get the whole string when you have the g flag active. If you're going only after the sub-matches, then you will need to use .exec and a loop:
var regex = /(?:\(|,)("?\w+"?)/g;
var s = 'something("stuff","moreStuff","yetMoreStuff")';
var match, matches=[];
while ( (match=regex.exec(s)) !== null ) {
matches.push(match[1]);
}
alert(matches);
jsfiddle

JavaScript regex back references returning an array of matches from single capture group (multiple groups)

I'm fairly sure after spending the night trying to find an answer that this isn't possible, and I've developed a work around - but, if someone knows of a better method, I would love to hear it...
I've gone through a lot of iterations on the code, and the following is just a line of thought really. At some point I was using the global flag, I believe, in order for match() to work, and I can't remember if it was necessary now or not.
var str = "#abc#def#ghi&jkl";
var regex = /^(?:#([a-z]+))?(?:&([a-z]+))?$/;
The idea here, in this simplified code, is the optional group 1, of which there is an unspecified amount, will match #abc, #def and #ghi. It will only capture the alpha characters of which there will be one or more. Group 2 is the same, except matches on & symbol. It should also be anchored to the start and end of the string.
I want to be able to back reference all matches of both groups, ie:
result = str.match(regex);
alert(result[1]); //abc,def,ghi
alert(result[1][0]); //abc
alert(result[1][1]); //def
alert(result[1][2]); //ghi
alert(result[2]); //jkl
My mate says this works fine for him in .net, unfortunately I simply can't get it to work - only the last matched of any group is returned in the back reference, as can be seen in the following:
(additionally, making either group optional makes a mess, as does setting global flag)
var str = "#abc#def#ghi&jkl";
var regex = /(?:#([a-z]+))(?:&([a-z]+))/;
var result = str.match(regex);
alert(result[1]); //ghi
alert(result[1][0]); //g
alert(result[2]); //jkl
The following is the solution I arrived at, capturing the whole portion in question, and creating the array myself:
var str = "#abc#def#ghi&jkl";
var regex = /^([#a-z]+)?(?:&([a-z]+))?$/;
var result = regex.exec(str);
alert(result[1]); //#abc#def#ghi
alert(result[2]); //jkl
var result1 = result[1].toString();
result[1] = result1.split('#')
alert(result[1][1]); //abc
alert(result[1][2]); //def
alert(result[1][3]); //ghi
alert(result[2]); //jkl
That's simply not how .match() works in JavaScript. The returned array is an array of simple strings. There's no "nesting" of capture groups; you just count the ( symbols from left to right.
The first string (at index [0]) is always the overall matched string. Then come the capture groups, one string (or null) per array element.
You can, as you've done, rearrange the result array to your heart's content. It's just an array.
edit — oh, and the reason your result[1][0] was "g" is that array indexing notation applied to a string gets you the individual characters of the string.

How to read all string inside parentheses using regex

I wanted to get all strings inside a parentheses pair. for example, after applying regex on
"fun('xyz'); fun('abcd'); fun('abcd.ef') { temp('no'); "
output should be
['xyz','abcd', 'abcd.ef'].
I tried many option but was not able to get desired result.
one option is
/fun\((.*?)\)/gi.exec("fun('xyz'); fun('abcd'); fun('abcd.ef')").
Store the regex in a variable, and run it in a loop...
var re = /fun\((.*?)\)/gi,
string = "fun('xyz'); fun('abcd'); fun('abcd.ef')",
matches = [],
match;
while(match = re.exec(string))
matches.push(match[1]);
Note that this only works for global regex. If you omit the g, you'll have an infinite loop.
Also note that it'll give an undesired result if there a ) between the quotation marks.
You can use this code will almost do the job:
"fun('xyz'); fun('abcd'); fun('abcd.ef')".match(/'.*?'/gi);
You'll get ["'xyz'", "'abcd'", "'abcd.ef'"] which contains extra ' around the string.
The easiest way to find what you need is to use this RegExp: /[\w.]+(?=')/g
var string = "fun('xyz'); fun('abcd'); fun('abcd.ef')";
string.match(/[\w.]+(?=')/g); // ['xyz','abcd', 'abcd.ef']
It will work with alphanumeric characters and point, you will need to change [\w.]+ to add more symbols.

Categories

Resources