Parse an HTML string with JS without triggering any page loads?

As this answer indicates, a good way to parse HTML in JavaScript is to simply re-use the browser's HTML-parsing capabilities like so:
var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
// process 'el' as desired
However, this triggers loading extra pages for certain HTML strings, for example:
var foo = document.createElement('div')
foo.innerHTML = '<img src="http://example.com/img.png">';
As soon as this example runs, the browser tries to fetch http://example.com/img.png, even though the element was never added to the page.
How might I process HTML from JavaScript without this behavior?

I don't know if there is a perfect solution for this, but since this is merely for processing, you can replace all src attributes with something like notSrc="xyz.com" before assigning innerHTML. This way nothing is loaded, and if you need the values later in processing you can account for the renamed attribute.
The browser will mainly load images, scripts, and CSS files. Renaming src takes care of the first two; CSS can be handled the same way by replacing the href attribute.
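A minimal sketch of the renaming idea, assuming a regular expression is good enough for your input (it is not an HTML parser and can be fooled by unusual markup or by text that merely looks like an attribute). The function name disableSrc and the replacement attribute data-src-disabled are made up for illustration:

```javascript
// Rename src="..." attributes so the browser has nothing to fetch.
// Best-effort only: this is pattern matching, not real HTML parsing.
function disableSrc(html) {
  // The leading \s makes sure we match a standalone attribute named "src"
  // (and not, say, "data-src"); the "i" flag also covers SRC=.
  return html.replace(/(\s)src(\s*=)/gi, '$1data-src-disabled$2');
}

var inert = disableSrc('<img src="http://example.com/img.png">');
// inert is '<img data-src-disabled="http://example.com/img.png">'
```

Note that this does not touch srcset or href, so those would need the same treatment if your input uses them.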

If you want to parse an HTML response without loading unnecessary resources such as images or scripts, use DOMImplementation's createHTMLDocument() to create a new document that is not connected to the one the browser is displaying but otherwise behaves like a normal document.
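Something along these lines should work as a sketch of the createHTMLDocument() approach. The function name parseDetached is my own, and the optional doc parameter exists only so the function can be exercised outside a browser. As far as I know, resources referenced from such a detached document are not fetched, but it is worth confirming in the network panel of the browsers you care about:

```javascript
// Parse an HTML string into a document that is not attached to the page.
// `doc` defaults to the current page's document.
function parseDetached(html, doc) {
  doc = doc || document;
  var detached = doc.implementation.createHTMLDocument('');
  // Assigning to documentElement.innerHTML lets the string carry
  // both <head> and <body> content.
  detached.documentElement.innerHTML = html;
  return detached;
}

// Usage in a browser:
// var parsed = parseDetached('<body><img src="http://example.com/img.png"></body>');
// var images = parsed.querySelectorAll('img');
```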

Related

Why doesn't it format the javascript code? [duplicate]

In tutorials I've learnt to use document.write. Now I understand that by many this is frowned upon. I've tried print(), but then it literally sends it to the printer.
So what are alternatives I should use, and why shouldn't I use document.write? Both w3schools and MDN use document.write.

The reason that your HTML is replaced is because of an evil JavaScript function: document.write().
It is most definitely "bad form." It only works as expected while the page is loading; if you use it afterwards, it will replace your entire document with the input. And if the page is served as strict XHTML, it does not work at all.
the problem:
document.write writes to the document stream. Calling document.write on a closed (or loaded) document automatically calls document.open which will clear the document.
-- quote from the MDN
document.write() has two henchmen, document.open(), and document.close(). When the HTML document is loading, the document is "open". When the document has finished loading, the document has "closed". Using document.write() at this point will erase your entire (closed) HTML document and replace it with a new (open) document. This means your webpage has erased itself and started writing a new page - from scratch.
I believe document.write() causes the browser to have a performance decrease as well (correct me if I am wrong).
an example:
This example writes output to the HTML document after the page has loaded. Watch document.write()'s evil powers clear the entire document when you press the "exterminate" button:
I am an ordinary HTML page. I am innocent, and purely for informational purposes. Please do not <input type="button" onclick="document.write('This HTML page has been successfully exterminated.')" value="exterminate"/>
me!
the alternatives:
.innerHTML This is a wonderful alternative, but this property has to be set on the element where you want to put the text.
Example: document.getElementById('output1').innerHTML = 'Some text!';
.createTextNode() is the alternative recommended by the W3C.
Example: var para = document.createElement('p');
para.appendChild(document.createTextNode('Hello, '));
document.body.appendChild(para); // attach the paragraph so it becomes visible
NOTE: This is known to have some performance decreases (slower than .innerHTML). I recommend using .innerHTML instead.
the example with the .innerHTML alternative:
I am an ordinary HTML page.
I am innocent, and purely for informational purposes.
Please do not
<input type="button" onclick="document.getElementById('output1').innerHTML = 'There was an error exterminating this page. Please replace <code>.innerHTML</code> with <code>document.write()</code> to complete extermination.';" value="exterminate"/>
me!
<p id="output1"></p>
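The open/closed distinction described above can be checked from script through document.readyState, which is 'loading' while the parser is still running. Here is a rough sketch, assuming it is called from an ordinary synchronous script; the name safeWrite and the choice to fall back to the end of the body are my own assumptions:

```javascript
// Use document.write only while the document is still being parsed;
// afterwards, append to <body> instead of wiping the page.
function safeWrite(html, doc) {
  doc = doc || document;
  if (doc.readyState === 'loading') {
    // The parser is still running, so write() adds to the document stream.
    doc.write(html);
  } else {
    // write() would now implicitly call open() and clear the document.
    doc.body.insertAdjacentHTML('beforeend', html);
  }
}

// Usage in a browser: safeWrite('<p>Some text!</p>');
```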

Here is code that should replace document.write in-place:
document.write = function (s) {
    var scripts = document.getElementsByTagName('script');
    var lastScript = scripts[scripts.length - 1];
    lastScript.insertAdjacentHTML("beforebegin", s);
};

You can combine the insertAdjacentHTML method and the document.currentScript property.
The insertAdjacentHTML() method of the Element interface parses the specified text as HTML or XML and inserts the resulting nodes into the DOM tree at a specified position:
'beforebegin': Before the element itself.
'afterbegin': Just inside the element, before its first child.
'beforeend': Just inside the element, after its last child.
'afterend': After the element itself.
The document.currentScript property returns the <script> element whose script is currently being processed. The best position is beforebegin: the new HTML is inserted before the <script> itself. To match document.write's native behavior, one would position the text afterend, but then the nodes from consecutive calls to the function aren't placed in the order in which you called them (as document.write does), but in reverse. The order in which your HTML appears is probably more important than where it is placed relative to the <script> tag, hence the use of beforebegin.
document.currentScript.insertAdjacentHTML(
'beforebegin',
'This is a document.write alternative'
)

As a recommended alternative to document.write you could use DOM manipulation to directly query and add node elements to the DOM.
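For instance, a small helper could look roughly like this. The name appendMessage is hypothetical, and the optional doc parameter is only there so the function can be tried outside a browser:

```javascript
// Create a <p>, give it text, and append it to the first element
// matching `selector`. Returns the new element, or null if nothing matched.
function appendMessage(selector, text, doc) {
  doc = doc || document;
  var target = doc.querySelector(selector);
  if (!target) {
    return null; // nothing matched; leave the page alone
  }
  var p = doc.createElement('p');
  p.textContent = text; // textContent does not parse the string as HTML
  target.appendChild(p);
  return p;
}

// Usage in a browser: appendMessage('#output1', 'Some text!');
```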

Just dropping a note here to say that, although using document.write is highly frowned upon due to performance concerns (synchronous DOM injection and evaluation), there is also no actual 1:1 alternative if you are using document.write to inject script tags on demand.
There are a lot of great ways to avoid having to do this (e.g. script loaders like RequireJS that manage your dependency chains) but they are more invasive and so are best used throughout the site/application.
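To illustrate why it is not a 1:1 replacement: the usual lightweight substitute is to create the script element yourself, roughly as sketched below (loadScript is a made-up name, not a library function). A script added this way is fetched asynchronously and does not block the parser, so code that relied on document.write running the script before the next line of HTML will behave differently:

```javascript
// Inject a <script> element and report when it has loaded.
function loadScript(src, doc) {
  doc = doc || document;
  return new Promise(function (resolve, reject) {
    var script = doc.createElement('script');
    script.src = src;
    script.onload = function () { resolve(script); };
    script.onerror = function () { reject(new Error('Failed to load ' + src)); };
    doc.head.appendChild(script);
  });
}

// Usage in a browser:
// loadScript('somewhere/name-of-your-js-code.js').then(function () { /* script is ready */ });
```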

I fail to see the problem with document.write. If you are using it before the onload event fires, as you presumably are, to build elements from structured data for instance, it is the appropriate tool to use. There is no performance advantage to using insertAdjacentHTML or explicitly adding nodes to the DOM after it has been built. I just tested it three different ways with an old script I once used to schedule incoming modem calls for a 24/7 service on a bank of 4 modems.
By the time it is finished this script creates over 3000 DOM nodes, mostly table cells. On a 7 year old PC running Firefox on Vista, this little exercise takes less than 2 seconds using document.write from a local 12kb source file and three 1px GIFs which are re-used about 2000 times. The page just pops into existence fully formed, ready to handle events.
Using insertAdjacentHTML is not a direct substitute as the browser closes tags which the script requires remain open, and takes twice as long to ultimately create a mangled page. Writing all the pieces to a string and then passing it to insertAdjacentHTML takes even longer, but at least you get the page as designed. Other options (like manually re-building the DOM one node at a time) are so ridiculous that I'm not even going there.
Sometimes document.write is the thing to use. The fact that it is one of the oldest methods in JavaScript is not a point against it, but a point in its favor - it is highly optimized code which does exactly what it was intended to do and has been doing since its inception.
It's nice to know that there are alternative post-load methods available, but it must be understood that these are intended for a different purpose entirely; namely modifying the DOM after it has been created and memory allocated to it. It is inherently more resource-intensive to use these methods if your script is intended to write the HTML from which the browser creates the DOM in the first place.
Just write it and let the browser and interpreter do the work. That's what they are there for.
PS: I just tested using an onload param in the body tag and even at this point the document is still open and document.write() functions as intended. Also, there is no perceivable performance difference between the various methods in the latest version of Firefox. Of course there is a ton of caching probably going on somewhere in the hardware/software stack, but that's the point really - let the machine do the work. It may make a difference on a cheap smartphone though. Cheers!

The question depends on what you are actually trying to do.
Usually, instead of document.write you can use someElement.innerHTML or, better, document.createElement together with someElement.appendChild.
You can also consider using a library like jQuery and using the modification functions in there: http://api.jquery.com/category/manipulation/

This is probably the most correct, direct replacement: insertAdjacentHTML.

Try to use getElementById() or getElementsByName() to access a specific element and then set its innerHTML property:
<html>
<body>
<div id="myDiv1"></div>
<div id="myDiv2"></div>
<script type="text/javascript">
var myDiv1 = document.getElementById("myDiv1");
var myDiv2 = document.getElementById("myDiv2");
myDiv1.innerHTML = "<b>Content of 1st DIV</b>";
myDiv2.innerHTML = "<i>Content of second DIV element</i>";
</script>
</body>
</html>

Use
var documentwrite = (value, method = "", display = "") => {
    // "inline" produces a <span>; "block" and anything else produce a <p>
    var x = document.createElement(display === "inline" ? "span" : "p");
    var t = document.createTextNode(value);
    x.appendChild(t);
    if (method == "") {
        document.body.appendChild(x);
    } else {
        document.querySelector(method).appendChild(x);
    }
};
and call the function based on your requirement, as below:
documentwrite("My sample text"); // print value inside body
documentwrite("My sample text inside id", "#demoid", "block"); // print value inside the element with that id, as a block
documentwrite("My sample text inside class", ".democlass", "inline"); // print value inside the first element with that class, inline

I'm not sure if this will work exactly, but I thought of
var docwrite = function(doc) {
document.write(doc);
};
This solved the problem with the error messages for me.

Add tags to <head> reliably in Javascript

I am writing a program that does the following:
Creates an iframe in the DOM
Makes an AJAX request to a page (a site's main page)
If the page has changed, I load it into the iframe with iframe.srcdoc = contents; where contents is what came back from AJAX
Note that this way any image etc. with a relative URL specified will not render correctly. To make it look right, I have to add a <base> tag to <head>.
I am very reluctant to use regexp like this:
contents = contents.replace('<head>','<head><base href="http://www.example.com/">');
Because it might stuff things up (but, am I being way too overcautious and over-paranoid?).
NOTE: I cannot do this by manipulating DOM: if I do iframe.srcdoc = contents; and then add the <base> tag, the page will still render incorrectly. The <base> tag needs to be there before I assign it to iframe.srcdoc...
How would you go about this?
Merc.

Use appendChild DOM operation to add the element.
document.getElementsByTagName('head')[0].appendChild('<base href="http://www.site.com" />');

In my holy opinion, using appendChild with string values is not the best idea, so here is my approach.
// create new "base"-node
var node = document.createElement('base');
// set href="http://www.site.com"
node.setAttribute('href', 'http://www.site.com');
// append new "base"-node to first "head"-node in html-document
document.getElementsByTagName('head')[0].appendChild(node);
See W3Schools' "The HTML DOM (Document Object Model)" for details on DOM manipulation and a JavaScript reference.
A solution for injecting the base tag by string manipulation (without a DOM, so to speak) is a regex that inserts the base tag before the closing head tag.
contents = contents.replace(/<\/head>/ig, '<base href="http://www.site.com" />$&');
Another solution is to use jQuery to construct an "offline DOM" from the contents of the iframe and then use DOM manipulation methods.
contents = jQuery(contents).find('head:first').append('<base ... />').html()
// no guarantee here that this will work ;-) it was just out of my mind, but should work.
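The same "offline DOM" idea can probably be done without jQuery by letting DOMParser build the document, adding the base element, and serializing the result. This is only a sketch: addBaseTag is a made-up name, the optional parser parameter exists purely so the function can be tried outside a browser, and the original doctype is not carried over into the returned string:

```javascript
// Parse `contents`, put a <base> first in <head>, and return the new markup.
function addBaseTag(contents, baseHref, parser) {
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(contents, 'text/html');
  var base = doc.createElement('base');
  base.setAttribute('href', baseHref);
  // The HTML parser always creates a <head>, even if the source had none.
  doc.head.insertBefore(base, doc.head.firstChild);
  return doc.documentElement.outerHTML;
}

// Usage in a browser:
// iframe.srcdoc = addBaseTag(contents, 'http://www.example.com/');
```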

Is it possible to reliably insert a HTML element at script's location?

I'm writing a Javascript file which will be a component in a webpage. I'd like it to be simple to use - just reference the script file in your page, and it is there. To that end however there is a complication - where should the HTML go that the Javascript generates? One approach would be to require a placeholder element in the page with a fixed ID or class or something. But that's an extra requirement. It would be better if the HTML was generated at the location that the script is placed (or, at the start of body, if the script is placed in head). Also, for extra customizability, if the fixed ID was found, the HTML would be placed inside that placeholder.
So I'm wondering - how do I detect my script's location in the page? And how do I place HTML there? document.write() comes to mind, but that is documented as being pretty unreliable. Also it doesn't help if the script is in the head. Not to mention what happens if my script is loaded dynamically via some AJAX call, but I suppose that can be left as an unsupported scenario.

I am doing that with this code...
// document.currentScript was Firefox-only when this was written.
var thisScriptElement = document.currentScript,
    // Generic `a` element for exploiting its ability to return `pathname`.
    a = document.createElement('a');
if (!thisScriptElement) {
    // Iterate backwards, to look for our script.
    var scriptElements = document.body.getElementsByTagName('script'),
        i = scriptElements.length;
    while (i--) {
        if (!scriptElements[i].src) {
            continue;
        }
        a.href = scriptElements[i].src;
        if (a.pathname.replace(/^.*\//, '') == 'name-of-your-js-code.js') {
            thisScriptElement = scriptElements[i];
            break;
        }
    }
}
Then, to add your element, it's as simple as...
thisScriptElement.parentNode.insertBefore(newElement, thisScriptElement);
I simply add a script element anywhere (and multiple times if necessary) in the body element to include it...
<script type="text/javascript" src="somewhere/name-of-your-js-code.js?"></script>
Ensure the code runs as is, not in DOM ready or window's load event.
Basically, we first check for document.currentScript. It was Firefox-only when this was written, but it has since been standardised and is available in all current browsers, so it should be the most reliable and fastest option.
Then I create a generic a element to exploit some of its functionality, such as extracting the path portion of the href.
I then iterate backwards over the script elements (because in parse order the last script element should be the currently executing script), comparing the filename to what we know ours is called. You may be able to skip this, but I am doing this to be safe.

document.write is very reliable if used as you indicate (a default SharePoint 2010 page uses it 6 times). If placed in the head, it will write content immediately after the opening body tag. The trick is to build a single string of HTML and write it in one go; don't write snippets of half-formed HTML.
An alternative is to use document.getElementsByTagName('script') while the document is loading and assume that the last one is the current script element. Then you can look at the parent and, if it's the head, use the load or DOM ready event to add your elements after the body. Otherwise, just add it before or after the script element as appropriate.
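A rough sketch of that alternative follows. The name insertAtScript is hypothetical, and I am assuming that "DOM ready" means the DOMContentLoaded event and that content for a script in the head should go to the start of the body, as the question asks. It has to be called synchronously while the script is being parsed, otherwise the "last script" guess is wrong:

```javascript
// Insert `newElement` where the currently running script sits, or at the
// start of <body> if the script lives in <head>.
function insertAtScript(newElement, doc) {
  doc = doc || document;
  var scripts = doc.getElementsByTagName('script');
  // Prefer currentScript; otherwise assume the last script parsed so far is ours.
  var current = doc.currentScript || scripts[scripts.length - 1];
  if (current.parentNode === doc.head) {
    // <body> does not exist yet, so wait until the DOM has been parsed.
    doc.addEventListener('DOMContentLoaded', function () {
      doc.body.insertBefore(newElement, doc.body.firstChild);
    });
  } else {
    current.parentNode.insertBefore(newElement, current);
  }
}
```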

Javascript execution order

I want to give a static javascript block of code to a html template designer, which can be:
either inline or external or both
used once or more in the html template
and each block can determine its position in the template relative to the other javascript code blocks.
An example could be image banners served using javascript. I give code to template designer who places it in two places, once for a horizontal banner in the header and once for a vertical banner. The same code runs in both blocks but knowing their positions can determine if to serve a horizontal or a vertical image banner.
Make sense?
Another example: Say you have the same 2 javascript tags in a web page calling an external script on a server. Can the server and/or scripts determine which javascript tag it belongs to?
NOTE: Can we say this is a challenge? I know that I can avoid this puzzle very easily but I come across this on a regular basis.

JavaScript code can locate all <script> elements on the page and it can probably examine the attributes and the content to work out which element it came from. But that's probably not what you want.
What you want is a piece of JavaScript which replaces tags on the page with ad banners. The usual solution is to add a special element, say an IMG, for this and give that IMG an id or a class or maybe even a custom attribute (like adtype="vertical") and then use JavaScript to locate these elements and replace the content by changing the src attribute.
For example, using jQuery, you can mark up your images like so:
<img src="empty.gif" width="..." height="..." class="ad" adtype="..." />
Then you can locate each image with
$('img.ad')
[EDIT] Well, the server obviously knows which script belongs into which script tag because it inserts the script. So this is a no-brainer.
If the script wants to find out where it is in the DOM, add something which it can use to identify itself, say:
<script>var id= '329573485745';
Then you can walk all script tags and check which one contains the value of the variable id.
If you call an external script, then you can do the same but you must add the ID to the script tag as you emit the HTML:
<script id="329573485745" src="..."></script>
Then the external script can examine the DOM and look up the element with this id. You will want to use a UUID for this, btw.
This way, a piece of JS can locate the script tag which added itself to the page.

The best thing would probably be to write the insert function once and then have the designer insert only the function call where needed.
Like this:
var timescalled = 0;
function buildad() {
    var toinsert = ""; // Code to generate the desired piece of HTML
    document.write(toinsert);
    timescalled += 1; // So you can tell how many times the function has been called
}
Now a script block calling the function can simply be inserted wherever a banner is needed
<script type="text/javascript">buildad()</script>

Thanks for the tips everyone but I'll be answering my own question.
I figured out several ways of accomplishing the task and I give you the one which works nicely and is easy to understand.
The following chunk of code relies on outputting dummy divs and jQuery.
<script>
// Unique identifier for all dummy divs
var rnd1="_0xDEFEC8ED_";
// Unique identifier for this dummy div
var rnd2=Math.floor(Math.random()*999999);
// The dummy div
var d="<div class='"+rnd1+" "+rnd2+"'></div>";
// Script which :
// Calculates index of THIS dummy div
// Total dummy divs
// Outputs to dummy div for debugging
var f1="<script>$(document).ready(function(){";
var f2="var i=$('."+rnd1+"').index($('."+rnd2+"'))+1;";
var f3="var t=$('."+rnd1+"').length;";
var f4="$('."+rnd2+"').html(i+' / '+t);";
var f5="});<\/script>";
document.write(d+f1+f2+f3+f4+f5);
</script>

Why not just place the function call on the page instead of the entire code block? That way you can pass in a parameter to tell it what type of advertisement is needed:
BuildAd('Tower');
BuildAd('Banner');
JavaScript itself has no clue of its position in a page. You have to target a control on the page to get its location.

I don't think it is possible for JavaScript code to know where it was loaded from. It certainly doesn't run at the point it is found, since execution isn't directly tied to the loading process (code usually runs after the whole DOM is loaded). In fact, in the case of externals, it doesn't even make sense, since only one copy of the code will be loaded no matter how many times it is encountered.

It shouldn't be the same code for each banner - there will be a parameter passed to whatever is serving the image banner which will specify the intended size.
Can you give a specific example of what you need this for?
To edit for your recent example: The simple answer is no. I could help you approach the problem from a different direction if you post details of your problem.

The term "static block of code" leaves a lot of room for interpretation.
Inline scripts (e.g., ones that rely on document.write and so must be parsed and executed during the HTML parsing phase) cannot tell where they are in the DOM at runtime. You have to tell them (as in one of the first answers you got).
I think you'll probably find that you need to change your approach.
A common way to keep code and markup separate (which is useful when providing tools to HTML designers who aren't coders) is to have them use a script tag like so:
<script defer async type='text/javascript' src='pagestuff.js'></script>
...which then triggers itself when the page is loaded (using window.onload if necessary, but there are several techniques for being triggered earlier than that, which you want because window.onload doesn't trigger until the images have all loaded).
That script then looks for markers in the markup and manipulates the page accordingly. For instance (this example uses Prototype, but you can do the same with raw JavaScript, jQuery, Closure, etc.):
document.observe("dom:loaded", initPage);
function initPage() {
    var verticals = $$('div.vertical');
    /* ...do something with the array of "vertical" divs in `verticals`,
       such as: */
    var index;
    for (index = 0; index < verticals.length; ++index) {
        verticals[index].update("I'm vertical #" + index);
    }
}
The designers can then have blocks on the page that are filled in by code which they flag up in a way that's normal for them (classes or attributes, etc.). The code figures out what it should do based on the classes/attributes of the blocks it finds when it runs.
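For comparison, here is roughly what the same marker-based approach might look like in raw JavaScript, assuming the designers mark their blocks with class="vertical" as in the Prototype example above. The optional doc parameter is my own addition so the function can be tried outside a browser:

```javascript
// Fill every div.vertical with a label that includes its position.
// Returns the number of marker divs that were found.
function initPage(doc) {
  doc = doc || document;
  var verticals = doc.querySelectorAll('div.vertical');
  for (var index = 0; index < verticals.length; ++index) {
    verticals[index].textContent = "I'm vertical #" + index;
  }
  return verticals.length;
}

// In a browser, run it once the DOM has been parsed:
// document.addEventListener('DOMContentLoaded', function () { initPage(); });
```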
