Javascript exec maintaing state

Javascript exec maintaing state - javascript

I am currently trying to build a little templating engine in Javascript by replacing tags in a html5 tag by find and replace with a regex.
I am using exec on my regular expression and I am looping over the results. I am wondering why the regular expressions breaks in its current form with the /g flag on the regular expression but is fine without it?
Check the broken example and remove the /g flag on the regular expression to view the correct output.
var TemplateEngine = function(tpl, data) {
var re = /(?:<|<)%(.*?)(?:%>|>)/g, match;
while(match = re.exec(tpl)) {
tpl = tpl.replace(match[0], data[match[1]])
}
return tpl;
}
https://jsfiddle.net/stephanv/u5d9en7n/
Can somebody explain to me a little bit more on depth why my example breaks exactly on:
<p><%more%></p>

The reason is explained in javascript string exec strange behavior.
The solution you need is actually a String.replace with a callback as a replacement:
var TemplateEngine = function(tpl, data) {
var re = /(?:<|<)%(.*?)(?:%>|>)/g, match;
return tpl.replace(re, function($0, $1) {
return data[$1] ? data[$1] : $0;
});
}
See the updated fiddle
Here, the regex finds all non-overlapping matches in the string, sequentially, and passes the match to the callback method. $0 is the full match and $1 is the Group 1 contents. If data[$1] exists, it is used to replace the whole match, else, the whole match is inserted back.

Check this link https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex. When using the g flag the object that you store the regex in (re) will keep track of the position of the last match in the lastIndex property and the next time you use that object the search will start from the position of lastIndex.
To solve this you could either manually reset the lastIndex property each time or not save the regex in an object and use it inline like so:
while(match = /(?:<|<)%(.*?)(?:%>|>)/g.exec(tpl)) {

Related

Match only # and not ## without negative lookbehind

Using JavaScript, I need a regex that matches any instance of #{this-format} in any string. My original regex was the following:
#{[a-z-]*}
However, I also need a way to "escape" those instances. I want it so that if you add an extra #, the match gets escaped, like ##{this}.
I originally used a negative lookbehind:
(?<!#)#{[a-z-]*}
And that would work just fine, except... lookbehinds are an ECMAScript2018 feature, only supported by Chrome.
I read some people suggesting the usage of a negated character set. So my little regex became this:
(?:^|[^#])#{[a-z-]*}
...which would have worked just as well, except it doesn't work if you put two of these together: #{foo}#{bar}
So, anyone knows how can I achieve this? Remember that these conditions need to be met:
Find #{this} anywhere in a string
Be able to escape like ##{this}
Be able to put multiple adjacent, like #{these}#{two}
Lookbehinds must not be used

If you include ## in your regex pattern as an alternate match option, it will consume the ## instead of allowing a match on the subsequent bracketed entity. Like this:
##|(#{[a-z-]*})
You can then evaluate the inner match object in javascript. Here is a jsfiddle to demonstrate, using the following code.
var targetText = '#{foo} in a #{bar} for a ##{foo} and #{foo}#{bar} things.'
var reg = /##|(#{[a-z-]*})/g;
var result;
while((result = reg.exec(targetText)) !== null) {
if (result[1] !== undefined) {
alert(result[1]);
}
}

You could use (?:^|[^#])# to match the start of the pattern, and capture the following #{<sometext>} in a group. Since you don't want the initial (possible) [^#] to be in the result, you'll have to iterate over the matches manually and extract the group that contains the substring you want. For example:
function test(str) {
const re = /(?=(?:^|[^#])(#{[a-z-]*}))./g;
let match;
const matches = [];
while (match = re.exec(str)) {
matches.push(match[1]); // extract the captured group
}
return matches;
}
console.log(test('##{this}'))
console.log(test('#{these}#{two}'))

Regex do not match content but whole searched string

I'm using this regex to match an "href" attribute in a <a> tag:
var href_matches = postRep.match(/href="(.*?)"/g);
The regex matches correctly the href except it returns the whole "href=http:example.com" string.
How do I manage to get only the href value (eg. "example.com")?

You can either run exec() on the regex :
var url_match = /href="(.*?)"/g.exec(postRep);
or remove the global flag
var url_match = postRep.match(/href="(.*?)"/);
Using String's match() function won't return captured groups if the
global modifier is set.

Just another idea.
You can try something like this function:
function getHrefs(inputString) {
var out = [];
inputString.replace(/\bhref\b=['"]([^'"]+)['"]/gi, function(result, backreference) {
out.push(backreference);
return '';
});
return out;
}
Improved solution (much shortest):
function getHrefs(inputString) {
return (inputString.match(/\bhref\b=['"][^'"]+(?=['"])/gi) || []).map(s => s.replace(/^href=["']/,""));
}
Edit:
There is other option - exec. But with exec you will need loop to get all matches (if you need this).

You can use regex lookbehinds to check if the "href=" is there without actually including it in the match.
For example, the regex (?<=href=)example\.com applied to href=example.com should only match example.com.
EDIT:
This method only works in languages that support regex lookbehinds. Javascript doesn't support this feature. (thanks to Georgi Naumov for pointing this out)

Convert string to function with parameters

I am struggling with writing a regular expression to turn the string
"fetchSomething('param1','param2','param3')"
into the proper function call. I can do it with some splitting and substrings but would rather do it with a .match using capture groups for efficiency's sake (and my own education).
However when I use
'something("stuff","moreStuff","yetMoreStuff")'.match(/(?:\(|,)("?\w+"?)/g)
I get
["("stuff"", ","moreStuff"", ","yetMoreStuff""]
Which is the same result regardless of the ?:, this confuses me since I thought ?: would cause it to ignore the first capture group? Or am I completely miss understanding capture groups?

You get the whole string when you have the g flag active. If you're going only after the sub-matches, then you will need to use .exec and a loop:
var regex = /(?:\(|,)("?\w+"?)/g;
var s = 'something("stuff","moreStuff","yetMoreStuff")';
var match, matches=[];
while ( (match=regex.exec(s)) !== null ) {
matches.push(match[1]);
}
alert(matches);
jsfiddle

javascript regexp match tag names

I can't remember the name of it, but I believe you can reference already matched strings within a RegExp object. What I want to do is match all tags within a given string eg
<ul><li>something in the list</li></ul>
the RegExp should be able to match only the same tags, then I will use a recursive function to put all the individual matches in an array. The regex that should work if I can reference the first match would be.
var reg = /(?:<(.*)>(.*)<(?:FIRST_MATCH)\/>)/g;
The matched array should then contain
match[0] = "<ul><li>something in the list</li></ul>";
match[1] = "ul";
match[2] = ""; // no text to match
match[3] = "li";
match[4] = "something in the list";
thanks for any help

It seems like you mean backreference (\1, \2):
var s = '<ul><li>something in the list</li></ul>';
s.match(/<([^>]+)><([^>]+)>(.*?)<\/\2><\/\1>/)
// => ["<ul><li>something in the list</li></ul>",
// "ul",
// "li",
// "something in the list"]
The result is not exactly same with what you want. But point is that the backreference \1, \2 match the string that was matched by earlier group.

It is not possible to parse HTML using regular expressions (if you're interested in the specifics, it is because HTML parsing requires a stronger type of automaton than a finite state automaton which is what a regular expression can express - look up FSA vs FST for more info).
You might be able to get away with some hack for a specific problem, but if you want to reliably parse HTML using Javascript then there are other ways to do this. Search the web for: parse html javascript and you'll get plenty of pointers on how to do this.

I made a dirty workaround. Still needs work thought.
var str = '<div><ul id="list"><li class="something">this is the text</li></ul></div>';
function parseHTMLFromString(str){
var structure = [];
var matches = [];
var reg = /(<(.+)(?:\s([^>]+))*>)(.*)<\/\2>/;
str.replace(reg, function(){
//console.log(arguments);
matches.push(arguments[4]);
structure.push(arguments[1], arguments[4]);
});
while(matches.length){
matches.shift().replace(reg, function(){
console.log(arguments);
structure.pop();
structure.push(arguments[1], arguments[4]);
matches.push(arguments[4]);
});
}
return structure;
}
// parseHTMLFromString(str); // ["<div>", "<ul id="list">", "<li class="something">", "this is the text"]

JS / RegEx to remove characters grouped within square braces

I hope I can explain myself clearly here and that this is not too much of a specific issue.
I am working on some javascript that needs to take a string, find instances of chars between square brackets, store any returned results and then remove them from the original string.
My code so far is as follows:
parseLine : function(raw)
{
var arr = [];
var regex = /\[(.*?)]/g;
var arr;
while((arr = regex.exec(raw)) !== null)
{
console.log(" ", arr);
arr.push(arr[1]);
raw = raw.replace(/\[(.*?)]/, "");
console.log(" ", raw);
}
return {results:arr, text:raw};
}
This seems to work in most cases. If I pass in the string [id1]It [someChar]found [a#]an [id2]excellent [aa]match then it returns all the chars from within the square brackets and the original string with the bracketed groups removed.
The problem arises when I use the string [id1]It [someChar]found [a#]a [aa]match.
It seems to fail when only a single letter (and space?) follows a bracketed group and starts missing groups as you can see in the log if you try it out. It also freaks out if i use groups back to back like [a][b] which I will need to do.
I'm guessing this is my RegEx - begged and borrowed from various posts here as I know nothing about it really - but I've had no luck fixing it and could use some help if anyone has any to offer. A fix would be great but more than that an explanation of what is actually going on behind the scenes would be awesome.
Thanks in advance all.

You could use the replace method with a function to simplify the code and run the regexp only once:
function parseLine(raw) {
var results = [];
var parsed = raw.replace(/\[(.*?)\]/g, function(match,capture) {
results.push(capture);
return '';
});
return { results : results, text : parsed };
}

The problem is due to the lastIndex property of the regex /\[(.*?)]/g; not resetting, since the regex is declared as global. When the regex has global flag g on, lastIndex property of RegExp is used to mark the position to start the next attempt to search for a match, and it is expected that the same string is fed to the RegExp.exec() function (explicitly, or implicitly via RegExp.test() for example) until no more match can be found. Either that, or you reset the lastIndex to 0 before feeding in a new input.
Since your code is reassigning the variable raw on every loop, you are using the wrong lastIndex to attempt the next match.
The problem will be solved when you remove g flag from your regex. Or you could use the solution proposed by Tibos where you supply a function to String.replace() function to do replacement and extract the capturing group at the same time.

You need to escape the last bracket: \[(.*?)\].

Develop Reference

JavaScript is the programming language of the Web.

Javascript exec maintaing state - javascript

Related

Match only # and not ## without negative lookbehind

Regex do not match content but whole searched string

Convert string to function with parameters

javascript regexp match tag names

JS / RegEx to remove characters grouped within square braces

Categories

Resources