Javascript .replace() with RegEx causes browser to hang/crash - javascript

My goal is to loop through a set of given elements and replace their inner HTML using a linkifying regex, so that plain-text URLs of the form http://*.*/* become clickable links to http://*.*/*
So I'm running a bit of vanilla javascript:
for (var i = 0; i < document.getElementsByClassName('title').length; i++) {
    var title = document.getElementsByClassName('title')[i]
    title.innerHTML = title.innerHTML.replace(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig,"<a target='_blank' href='$1'>$1</a>")
}
Here's just the RegExp I'm using:
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig
So, why on earth would this loop cause the browser to hang? The loop is over text no longer than 256 characters and there are usually between 5 and 30 .title elements, definitely not the levels of data that would crash/hang a browser. I've only experienced it in Chrome/Safari, unsure if it happens in Firefox/Opera or not.

Try caching the element list and its length instead of querying the DOM again on every iteration.
var titles = document.querySelectorAll(".title");
// querySelectorAll is supported in slightly more browsers than getElementsByClassName
var l = titles.length, i, title;
for (i = 0; i < l; i++) {
    title = titles[i];
    title.innerHTML = title.innerHTML.replace(/..../,'....');
}
If the hanging continues, it's probably because the regex is matching text you've already replaced. Try adding a negative lookahead, (?!'), to ensure there is no single quote immediately after the URL you are matching.
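The lookahead idea could be sketched roughly as follows. This is an illustration on plain strings, not the answerer's exact code: `linkify` and `LINK_PATTERN` are hypothetical names. A bare (?!') is not quite enough on its own, because the engine can backtrack to a shorter URL whose next character is not a quote, so this sketch lets the lookahead skip any remaining URL characters before checking for a quote or a closing </a>.

```javascript
// Hypothetical sketch: the question's regex plus a negative lookahead that
// rejects URLs which already sit inside a quoted attribute or an anchor's text.
var LINK_PATTERN = /(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])(?![-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:['"]|<\/a>))/gi;

function linkify(html) {
    // $1 is the matched URL, reused for both the href and the link text.
    return html.replace(LINK_PATTERN, "<a target='_blank' href='$1'>$1</a>");
}

var once = linkify("see http://example.com/a?b=1 now");
var twice = linkify(once); // a second pass leaves the existing link untouched
```

One trade-off of this approach is that a bare URL directly followed by a quote character in ordinary text is skipped as well.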

Related

JavaScript + Regex:: Replace "foo" with "bar" across entire document, excluding URLs

I'm trying to replace all instances of "foo" on a page with "bar", but to exclude instances occurring within image or URL links.
The current code I have is a simple replace:
document.documentElement.innerHTML = document.documentElement.innerHTML.replace(/foo/g, "bar");
But it breaks images and links containing "foo" in the address.
I'm looking for a regular expression replacement that will take the following:
foo
barfoo
foo
<img src="foo.jpg">
And give me:
bar
barbar
bar
<img src="foo.jpg">
If this can't be accomplished with regex in JavaScript, would there be a more elegant way to only run the replacement against non-URL strings?
Yeah, you're not going to want to use regex to do this. What you want to do is replace the text of every text node in your DOM tree. Try something like this.
var allElements = document.getElementsByTagName("*"); // Get every element.
for (var i = 0; i < allElements.length; i++) {
    var children = allElements.item(i).childNodes;
    for (var j = 0; j < children.length; j++) {
        if (children[j].nodeType === 3 /* is this node a text node? */) {
            // Run your replacement regex here, for example:
            children[j].nodeValue = children[j].nodeValue.replace(/foo/g, "bar");
        }
    }
}
There are 2 problems to solve.
Firstly, you need to get all the text nodes. This is a problem in and of itself.
This thread on stackoverflow discusses some techniques.
getElementsByTagName() equivalent for textNodes
Once you have your text nodes, you can run your regex on each node, and be fairly certain that you got everything.
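The two steps described above (collect the text nodes, then run the regex on each) might be combined in a small recursive walk. This is only a sketch: `replaceInTextNodes` is a hypothetical name, and it relies solely on the standard `nodeType`, `nodeValue` and `childNodes` properties.

```javascript
// Hypothetical helper: visits a node and all of its descendants, applying the
// replacement to text nodes only (nodeType 3). Attributes such as src or href
// are never touched because they are not text nodes.
function replaceInTextNodes(node, pattern, replacement) {
    if (node.nodeType === 3) {
        node.nodeValue = node.nodeValue.replace(pattern, replacement);
        return;
    }
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        replaceInTextNodes(children[i], pattern, replacement);
    }
}

// In a browser this could be called as:
// replaceInTextNodes(document.body, /foo/g, "bar");
```

Note that the text inside script and style elements is also stored in text nodes, so a real page might want to skip those elements.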

JavaScript regex that matches the .innerHTML attribute of any element

I am currently building a Chrome extension which has to find specific pages in a website specifically the Log In / Sign In page, the Sign Up / Register page, the About page and the Contact Us page.
I am trying to achieve this by first getting the list of elements in the page (which I have already done). Now I need to check the innerHTML of each element such that it is a leaf node in the DOM and contains part of the keyword, and I am trying to do this with a regex. I managed to build a regex which successfully returns what's in between a start or end tag of an element (i.e. the tag name along with its attributes), but not the innerHTML. Below is what I have done so far (with the example for the About page):
var list = document.body.getElementsByTagName("*");
var aboutElement = /^[^<.+>].*About.*[^(<.+>]$/i;
for (var i = 0; i < list.length; i++) { // "<" rather than "<=": list[list.length] is undefined
    if ((aboutElement.test(list[i].innerHTML)) || (aboutElement.test(list[i].alt))) {
        list[i].click();
    }
}
Any idea what I should add to it such that it only matches leaf nodes (nodes which do not contain other nodes) and not what's in a start or end tag? I also think that with what I've done it's going to match everything in the innerHTML because of the .* part so I may need to change that as well. Any help would be greatly appreciated!
Thanks to two of the answers in the comments I managed to solve the problem. I used .textContent and changed the regex as shown below and it worked.
var list = document.body.getElementsByTagName("*");
var aboutElement = /^(.*?\s*(\bAbout\b)[^$]*)$/i;
for (var i = 0; i < list.length; i++) { // "<" rather than "<=": list[list.length] is undefined
    if ((aboutElement.test(list[i].textContent)) || (aboutElement.test(list[i].alt))) {
        list[i].click();
    }
}
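The code above does not yet restrict the match to leaf nodes. One possible way to add that check is sketched below; `findLeafMatches` is a hypothetical helper, and it treats an element as a leaf when its standard `children` collection is empty.

```javascript
// Hypothetical helper: returns the elements that contain no child elements
// and whose text (or alt attribute) contains the keyword as a whole word.
// Assumes the keyword contains no regex metacharacters.
function findLeafMatches(elements, keyword) {
    var pattern = new RegExp("\\b" + keyword + "\\b", "i");
    var matches = [];
    for (var i = 0; i < elements.length; i++) {
        var el = elements[i];
        var isLeaf = el.children.length === 0;
        if (isLeaf && (pattern.test(el.textContent || "") || pattern.test(el.alt || ""))) {
            matches.push(el);
        }
    }
    return matches;
}

// In a browser this could be called as:
// findLeafMatches(document.body.getElementsByTagName("*"), "About");
```

Filtering on leaf elements avoids clicking a container such as body, whose textContent includes the text of every descendant.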

Delete &nbsp; from all elements of the same class

I was looking for a way to search through all elements of the same class and remove any &nbsp; entities, as these were causing unwanted gaps in my page layout.
Initially I used this code:
var el = document.querySelector('.offer');
el.innerHTML = el.innerHTML.replace('&nbsp;', '');
But this only finds the first element with the class offer, so it isn't much use.
I'm answering my own question because I had to piece the solution together from a number of posts here as well as other sites, and I hope it helps others in my position.
Firstly, I needed to use .querySelectorAll instead of .querySelector to return all elements with the offer class.
But the next line won't work as-is, since .querySelectorAll returns a NodeList (a collection of nodes) rather than just the first element it comes across.
el.innerHTML = el.innerHTML.replace('&nbsp;', '');
The solution is to loop through each element and replace each instance of &nbsp;. A string pattern only replaces the first occurrence, so a regex with the g flag is used to catch every instance:
var el = document.querySelectorAll('.offer');
for (var i = 0; i < el.length; i++) {
    el[i].innerHTML = el[i].innerHTML.replace(/&nbsp;/g, '');
}
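When replace() is given a string pattern, it replaces only the first occurrence, so an element containing several entities needs a regular expression with the g flag. A minimal sketch on plain strings, assuming the unwanted characters are &nbsp; entities in the serialized HTML (`stripNbsp` is a hypothetical name):

```javascript
// Hypothetical helper: removes every &nbsp; entity from an HTML string.
function stripNbsp(html) {
    return html.replace(/&nbsp;/g, "");
}

var sample = "a&nbsp;b&nbsp;c";
var firstOnly = sample.replace("&nbsp;", ""); // string pattern: first match only
var all = stripNbsp(sample);                  // regex with g: every match
```

In a browser the helper could be applied inside the loop as el[i].innerHTML = stripNbsp(el[i].innerHTML).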

Why does this crash browser tab?

It's not like I couldn't do it another way, but I'm just curious: why does this code crash the browser tab?
var links = document.getElementsByTagName("a");
for (var i = 0; i < links.length; i++) {
var a = document.createElement("A");
a.innerHTML = "[?]";
a.href = links[i].href; //this is the evil line
a.onclick = function () {
return false;
};
links[i].parentNode.appendChild(a);
}
Because the NodeList (I think they call it an HTMLCollection now) you get back from getElementsByTagName is live. So when you add a new a to the document, the browser adds it to the list you're looping through. Since you add another one each time you loop, you'll never reach the end of the loop.
If you want a disconnected array or collection instead, you can do:
var collection = document.querySelectorAll("a");
or
var array = Array.prototype.slice.call(document.getElementsByTagName("a"));
querySelectorAll supports the full range of CSS selectors. It's supported by all modern browsers, and also IE8. But it may be slower than cloning the getElementsByTagName NodeList (not that that usually matters).
Element.getElementsByTagName() returns a live HTMLCollection, meaning each time you add a new link element to the page, the length of links increases, leading to an infinite loop.
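The effect of taking a snapshot can be sketched outside the browser. In the example below a plain array stands in for the live collection (an assumption made so the code runs without a DOM): like a live HTMLCollection, it grows whenever something is appended during the loop. `appendMarkers` is a hypothetical name.

```javascript
// Hypothetical sketch: iterate over a fixed-length copy while appending to
// the growing list, so the loop bound never moves.
function appendMarkers(liveList) {
    var snapshot = Array.prototype.slice.call(liveList); // disconnected copy
    for (var i = 0; i < snapshot.length; i++) {
        liveList.push("[?] " + snapshot[i]); // grows liveList, not snapshot
    }
    return liveList;
}

var result = appendMarkers(["first", "second"]);
```

Looping with i < liveList.length instead would never terminate here, for the same reason as the code in the question.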

how to collect form ids using jQuery

A jQuery selector $(".thumb_up") returns a collection of forms like this:
[<form id="like_post_78" ...</form> <form id="like_post_79"> ... </form>]
Ultimately I want to generate a string consisting of the numerical ending portion of the form ids.
"78,79"
What's the most efficient way of getting this?
The easiest way is probably:
var form_ids = $('form').map(function(){return this.id.replace(/[a-z_]/gi,'');}).get().join(', ');
console.log(form_ids); // or alert() or whatever...
I've just updated the regex portion of the above, from /[a-z_]/gi to /\D/g (which replaces every (g, global) non-digit character (\D) with nothing), to give the following code:
var form_ids = $('form').map(function(){return this.id.replace(/\D/g,'');}).get().join(', ');
console.log(form_ids);
Edited after thinking on @James Hill's accurate observation (below) that easiest != most efficient:
keep in mind that the OP asks for the most efficient method, not the easiest.
Therefore, using plain JavaScript (which should, to the best of my knowledge, be available cross-browser):
var form_ids = [];
var forms = document.getElementsByTagName('form');
for (var i=0; i<forms.length; i++){
form_ids.push(forms[i].id.replace(/\D/g,''));
}
console.log(form_ids.join(', '));
A comparison of the two over at JS Perf shows that plain JavaScript tends to be the faster (which implies it's more efficient, presumably because it runs native JavaScript/ECMAScript directly, rather than abstracted code that then calls native JavaScript/ECMAScript).
References:
map().
get().
join() at the MDN.
replace() at the MDN.
Option 1
Use jQuery's each() function in combination with the class selector:
var aryIDs = [];
$(".thumb_up").each(function(){
//Add ID to the array while stripping off all non-numeric data using RegEx
aryIDs.push(this.id.replace(/\D/g, ""));
});
//Get the ids
var csvIDs = aryIDs.toString();
Option 2
Grab the elements with jQuery and then use a plain old for loop:
var aryIDs = [];
var divs = $(".thumb_up");
for(var i= 0; i < divs.length; i++)
{
aryIDs.push(divs[i].id.replace(/\D/g, ""));
}
var csvIDs = aryIDs.toString();
Here's a working jsFiddle of the latter example.
Performance
As for performance, the for loop should be faster every time. Check out a simple jsPerf I created to compare the performance of .each(), .map(), and a standard for loop.
var str = "";
$(".thumb_up").each(function(){
var id = $(this).attr('id').split('like_post_');
str += id[1] + ',';
});
You'll end up with an extra "," at the end, but you can get to what you want with this basic example.
Just for the record, there is a document.forms collection that is every form in the document, so getting an array of all form ids is as simple as:
var ids = [];
var forms = document.forms;
for (var i = forms.length; i;) {
ids[--i] = forms[i].id;
}
If your definition of "efficiency" means fastest, the above should run rings around any of the jQuery answers (it does). If you only want forms with a particular class, it wouldn't be hard to filter them with a test of the form's className property in the loop.
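To get from the collected ids to the "78,79" string the question asks for, the plain loop could be combined with a regex that captures only the trailing digits. A minimal sketch, assuming ids of the form like_post_78; `collectNumericIds` is a hypothetical name, and it accepts any array-like of objects with an id property, such as document.forms:

```javascript
// Hypothetical helper: extracts the trailing number from each id and joins
// the results with commas. Items whose id does not end in digits are skipped.
function collectNumericIds(forms) {
    var ids = [];
    for (var i = 0; i < forms.length; i++) {
        var match = /\d+$/.exec(forms[i].id);
        if (match) {
            ids.push(match[0]);
        }
    }
    return ids.join(",");
}

var csv = collectNumericIds([
    { id: "like_post_78" },
    { id: "like_post_79" },
    { id: "search" }
]);
```

Matching /\d+$/ rather than stripping every non-digit keeps digits that appear earlier in an id from leaking into the result.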
