How to prevent resource loading of unattached elements in Chrome - javascript

I'm working on a Chrome extension and I have the following problem:
var myDiv = document.createElement('div');
myDiv.innerHTML = '<img src="a.png">';
What happens now is that Chrome tries to load the "a.png" resource, even if I don't attach the "div" element to the document. Is there a way to prevent it?
In the extension I need to get data from a site that doesn't provide any API, so I have to parse the whole HTML to get the necessary data. Writing my own simple HTML parser could be tricky, so I would rather use the native HTML parser. However, in Chrome, when I put the whole source code into a temporary non-attached element (so that it gets parsed and I can filter out the necessary data), all the images (and possibly other resources) start to load as well, causing higher traffic or (in the case of relative paths) lots of errors in the console.

To prevent the resources from being loaded, you'll need to create your Node in an entirely new #document. You can use document.implementation.createHTMLDocument for this.
var dom = document.implementation.createHTMLDocument(); // make new #document
// now use this to..
var myDiv = dom.createElement('div'); // ..create a <div>
myDiv.innerHTML = '<img src="a.png">'; // ..parse HTML
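As a rough sketch of how the separate document could then be used for scraping, the helpers below parse a string inside a document made with createHTMLDocument and read the raw src attributes from the result. The names parseInert and collectImageSources are made up for this illustration, and the document object is passed in as a parameter only so that the helpers are easy to exercise outside a page.

```javascript
// Hypothetical helper: parse markup inside a separate document so that,
// as described in the answer above, its resources are not fetched.
function parseInert(html, rootDocument) {
  const inertDoc = rootDocument.implementation.createHTMLDocument('');
  const container = inertDoc.createElement('div');
  container.innerHTML = html;
  return container;
}

// Hypothetical helper: read the src attribute of every <img> as written
// in the markup (getAttribute does not resolve relative paths).
function collectImageSources(container) {
  return Array.from(container.querySelectorAll('img'), function (img) {
    return img.getAttribute('src');
  });
}

// Usage in a page or extension script:
// const container = parseInert('<img src="a.png">', document);
// const sources = collectImageSources(container);
```

Reading the attribute with getAttribute rather than the src property keeps the value exactly as the site wrote it, which is usually what a scraper wants.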

You can delay parsing/loading the HTML by storing it in a non-standard attribute and assigning it to innerHTML only "when the time comes":
myDiv.setAttribute('deferredHtml', '<img src="http://upload.wikimedia.org/wikipedia/commons/4/4e/Single_apple.png">');
// expose the function on window so the inline onclick handler can find it
window.loadDeferredImage = function() {
if(myDiv.hasAttribute('deferredHtml')) {
myDiv.innerHTML = myDiv.getAttribute('deferredHtml');
myDiv.removeAttribute('deferredHtml');
}
};
... onclick="loadDeferredImage()"
I created a jsfiddle illustrating this idea:
http://jsfiddle.net/akhikhl/CbCst/3/

Related

Get elements inside several html documents

I have a feature in my system that transcribes voice to text using an external library.
This is what the library renders:
What I need is really simple: to get the text from the generated textareas.
The textareas are rendered without any name or id, so I can only access them by class in the Google Chrome console. Whenever I try to get them by class in my JavaScript code, I get an empty array.
I think the problem is that this library renders a new #document, and I'm not able to get its content in my $(document).ready function because that function is scoped to the 'parent' document.
How it renders.
Any thoughts on this? Thank you.
I hope the code below helps.
// Get your iframe by id or some other way
let iframe = document.getElementById("myFrame");
// After iframe has been loaded
iframe.onload= function() {
// Get the element inside your iframe
// There are a lot of ways to do it
// It is good practice to store DOM objects in variables that start with $
let $elementByTag = iframe.contentWindow.document.getElementsByTagName("p")[0];
let $elementById = iframe.contentWindow.document.getElementById("elementId");
let $elementByClass = iframe.contentWindow.document.getElementsByClassName("classHere");
let $elementBySelector = iframe.contentWindow.document.querySelector("#dad .classname");
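// After getting an element, extract its text/HTML
// (the original snippet used an undefined $element; $elementById is used here as an example)
let text = $elementById.innerText;
let html = $elementById.innerHTML;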
};
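Applied to the question, a minimal sketch might look like the following. It assumes the library's frame is same-origin and that the fields are plain textarea elements; the helper name readTextareas is invented for this example.

```javascript
// Hypothetical helper: collect the current text of every <textarea>
// found inside an iframe's own document.
function readTextareas(iframe) {
  // contentDocument is null when the frame is cross-origin,
  // in which case the parent page cannot read its content.
  const frameDoc = iframe.contentDocument;
  if (!frameDoc) {
    return [];
  }
  return Array.from(frameDoc.querySelectorAll('textarea'), function (area) {
    return area.value;
  });
}

// Usage in a page, once the frame has loaded ("myFrame" as in the answer above):
// const frame = document.getElementById('myFrame');
// frame.addEventListener('load', function () {
//   console.log(readTextareas(frame));
// });
```

Querying the frame's document rather than the parent document is the point: a selector run against the parent never sees nodes that live in the nested #document.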

Erase/reset DOM and global variables with JavaScript

I'm writing an Electron app, but this is a question about JavaScript/HTML5 in general. I want to load local content in a webview and then open iframes from a particular remote resource inside it. Unfortunately I can't, because of X-Frame-Options. So I came up with a workaround: load the remote content, erase the DOM, and inject my own local content, using a custom file protocol to embed local resources.
Basically I want to totally erase everything, no matter what is loaded into the webview. I got the DOM-erasing part working with document.write(). But how could I unset all variables that could have been set by that page? Or could I prevent the document from being written to in the first place? Or is there a better, less hacky way to do what I want to do? This is my current code, which erases the DOM:
It runs from a preload script, before anything else:
(function() {
var originalProperties = Object.getOwnPropertyNames(window); //global variables, before dom is loaded
var injectDOM = function() {
document.removeEventListener('DOMContentLoaded', injectDOM);
//trying to erase global variables set by remote resource, if any. Is there a better way?
var newProperties = Object.getOwnPropertyNames(window);
var difference = newProperties.filter(x => originalProperties.indexOf(x) == -1);
for (var i = 0; i < difference.length; i++) {
if (window.hasOwnProperty(difference[i])) {
window[difference[i]] = null;
delete window[difference[i]];
//some variables still stay, delete returns false, however they are nulled
//but is there a better way to do that and what about possible attached event listeners?
}
}
var html = '';
html += '<!-- automagically injected-->';
html += '<!DOCTYPE html>';
html += '<html>';
html += '<head>';
//html += '<script>alert("test");</script>';
html += '</head>';
html += '<body>';
html += 'hello world';
html += '</body>';
html += '</html>';
document.write(html);
};
document.addEventListener('DOMContentLoaded', injectDOM);
})();
I also tried comparing Object.getOwnPropertyNames(window) before and after the DOM was loaded, but something tells me it's not the best way to do it.
Update: I managed to solve the problem more elegantly with @wOxxOm's help. I posted my solution in the original GitHub issue https://github.com/electron/electron/issues/5036
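For readers who only want the before/after comparison from the question, here is a small self-contained sketch. The name createGlobalsSnapshot and its two methods are hypothetical, it is not the solution from the linked issue, and it only deals with properties: event listeners and timers registered by the remote page are not touched.

```javascript
// Hypothetical helper: remember an object's own property names now,
// then report or delete whatever has been added since.
function createGlobalsSnapshot(target) {
  const before = new Set(Object.getOwnPropertyNames(target));

  // Names that exist now but did not exist when the snapshot was taken.
  function added() {
    return Object.getOwnPropertyNames(target).filter(function (name) {
      return !before.has(name);
    });
  }

  // Try to delete every added property; return the names that could not
  // be deleted (non-configurable properties survive a delete).
  function removeAdded() {
    const survivors = [];
    added().forEach(function (name) {
      let deleted = false;
      try {
        deleted = delete target[name];
      } catch (e) {
        deleted = false; // strict mode throws instead of returning false
      }
      if (!deleted) {
        survivors.push(name);
      }
    });
    return survivors;
  }

  return { added: added, removeAdded: removeAdded };
}

// Usage in a preload script (assumption: it runs before the remote page's scripts):
// const snapshot = createGlobalsSnapshot(window);
// document.addEventListener('DOMContentLoaded', function () {
//   console.log('could not remove:', snapshot.removeAdded());
// });
```

Returning the survivors makes the limitation mentioned in the question visible: some properties cannot be deleted, so the caller at least knows which ones remain.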

How can I load a DOM from a string in PhantomJS?

Most of the examples I have found on the web involve loading a URL.
However, if I simply have a string that contains an svg or html and I want to load it into a dom for manipulation, I cannot figure out how to manipulate it.
var fs=require('fs')
var content = fs.read("EarlierSavedPage.svg")
// How do I load content into a DOM?
I realize that, in this example where a local file is being read, there is a workaround of opening the local file directly, but I am interested more generally in whether a page can be loaded from a string.
I have already looked at the documentation but did not see anything obvious.
The default page in PhantomJS is comparable to about:blank and is essentially
<html>
<body>
</body>
</html>
This means that you can add your SVG directly to the DOM and render it. It seems that you have to render it asynchronously to give the browser time to actually compute the SVG. Here is a complete script:
var page = require('webpage').create(),
fs = require('fs')
var content = fs.read("EarlierSavedPage.svg")
page.evaluate(function(content){
document.body.innerHTML = content;
}, content);
setTimeout(function(){
page.render("EarlierSavedPage.png"); // render or do whatever
phantom.exit();
}, 0); // phantomjs is single threaded so you need to do this asynchronously, but immediately
When you load an HTML file into content, then you can directly assign it to the current DOM (to page.content):
page.content = content;
This would likely also need some asynchronous decoupling like above.
The other way would be to actually load the HTML file with page.open:
page.open(filePathToHtmlFile, function(success){
// do something like render
phantom.exit();
});

Multiple HTML DOMs - Parse and Transfer Data

I am requesting full HTML5 documents via Ajax using jQuery. I want to be able to parse them and transfer elements to my main page DOM, ideally with all major browsers, including mobile. I don't want to create an iframe as I want the process to be as quick as possible. With Chrome & Firefox I can do the following:
var contents = $(document.createElement('html'));
contents[0].innerHTML = data; // data : HTML document string
This will create a proper document, somewhat surprisingly, just without a doctype. In IE9, however, one may not use the innerHTML to set the contents of the html element. I tried to do the following, without any luck:
Create a DOM, open it, write to it and close it. Issue: on doc.open, IE9 throws an exception called "Unspecified error."
var doc = document.implementation.createHTMLDocument('');
doc.open();
doc.write(data);
doc.close();
Create an ActiveX DOM. This time, the result is better but upon transferring / copying elements between documents IE9 crashes. Bad because no IE8 support (adoptNode / importNode support).
var doc = new ActiveXObject('htmlfile');
doc.open();
doc.write(data);
doc.close();
contents = $(doc.documentElement);
document.adoptNode(contents);
I was thinking about recursively recreating the elements, instead of transferring them between my documents, but that seems like an expensive task, given that I can have a lot of nodes to transfer. I like my last ActiveX example as that will most likely work in IE8 and earlier (for parsing, at least).
Any ideas on this? Again, not only do I need to be able to parse the head and body, but I also need to be able to append these new elements to my main DOM.
Thanks much!
Answering my own question... To solve my issue I used all solutions mentioned in my post, with try/catch blocks if a browser throws an error (oh, how we love thee IE!). The following works in IE8, IE9, Chrome 23, Firefox 17, iOS 4 and 5, Android 3 & 4. I have not tested Android 2.1-2.3 and IE7.
var contents = $('');
try {
contents = $(document.createElement('html'));
contents[0].innerHTML = data;
}
catch(e) {
try {
var doc = document.implementation.createHTMLDocument('');
doc.open();
doc.write(data);
doc.close();
contents = $(doc.documentElement);
}
catch(e) {
var doc = new ActiveXObject('htmlfile');
doc.open();
doc.write(data);
doc.close();
contents = $(doc.documentElement);
}
}
At this point we can find elements using jQuery. Transferring them to a different DOM creates a bit of a problem. There are a couple of methods that do this (importNode and adoptNode), but they are not widely supported yet and/or are buggy. Given that our selector string is called 'selector', below I re-create the found elements and append them to '.someDiv'.
var fnd = contents.find(selector);
if(fnd.length) {
var newSelection = $('');
fnd.each(function() {
var n = document.createElement(this.tagName);
var attr = $(this).prop('attributes');
n.innerHTML = this.innerHTML;
$.each(attr,function() { $(n).attr(this.name, this.value); });
newSelection.push(n);
});
$('.someDiv').append(newSelection);
};

How to use location object to parse URL without redirecting the page in javascript?

The browser has a very efficient URL parser that let you get location.href, hash, query, etc from the current URL. I'd like to use it instead of coding something using regexes.
If you set location.href or do location.replace(url), the page gets redirected in Chrome. I tried to get the prototype of location in this browser, but I can't find location.prototype. There is a location.__proto__ which is described as the Location class in the js console, but I can't find a way to instantiate it. Plus, I need a cross browser solution and __proto__ is not available in IE.
If it's not possible, don't give me a regex alternative, just tell me the hard truth, provided you can back it up with evidence.
Yes, it's very much possible! If you create a new <a> element, you can use the location fields without redirecting the browser.
For instance:
var a = document.createElement('a');
a.href = "http://openid.neosmart.net/mqudsi#fake"
You can now access .hash, .pathname, .host, and all the other location goodies!
> console.log(a.host);
openid.neosmart.net
I wrote a generalized version of Mahmoud's wonderful solution:
var parseUrl = (function(){
var div = document.createElement('div');
div.innerHTML = "<a></a>";
return function(url){
div.firstChild.href = url;
div.innerHTML = div.innerHTML;
return div.firstChild;
};
})();
It works like this:
var url = parseUrl('http://google.com');
console.log(url.protocol);
"http:"
console.log(url.host);
"google.com"
The parseUrl code is a bit complicated because IE requires the link's HTML code to be processed by its HTML parser before it will parse the URL. So we create a closure in which we store a <div> with an <a> as its child (to avoid recreating it at each call), and when we need URL parsing, we just take the HTML of the div and inject it back into itself, forcing IE to parse it.
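Where IE support is not a concern, the standard URL constructor is a different technique that exposes the same fields without touching the DOM at all. The sketch below is only an illustration of that alternative, not part of either answer above; the function name parseUrlModern is made up, and the URL constructor is not available in Internet Explorer, so it does not replace the anchor trick there.

```javascript
// Hypothetical helper: parse a URL with the standard URL constructor
// and return the same fields that the location object exposes.
function parseUrlModern(url, base) {
  // A base is only needed for relative URLs; without one, a relative
  // input makes the constructor throw a TypeError.
  const parsed = base === undefined ? new URL(url) : new URL(url, base);
  return {
    protocol: parsed.protocol,
    host: parsed.host,
    pathname: parsed.pathname,
    search: parsed.search,
    hash: parsed.hash
  };
}

// Usage:
// const parts = parseUrlModern('http://openid.neosmart.net/mqudsi#fake');
// console.log(parts.host, parts.hash);
```

Unlike the anchor element, the constructor rejects input it cannot parse instead of silently resolving it against the current page, so callers may want a try/catch around it.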
