Dynamically inserting javascript into HTML that uses document.write

I am currently loading a lightbox-style popup that loads its HTML from an XHR call. This content is then displayed in a 'modal' popup using element.innerHTML = content. This works like a charm.
In another section of this website I use a Flickr 'badge' (http://www.elliotswan.com/2006/08/06/custom-flickr-badge-api-documentation/) to load Flickr images dynamically. This is done by including a script tag that loads a Flickr JavaScript file, which in turn executes some document.write statements.
Both of them work perfectly when included in the HTML. But when the Flickr badge code is loaded inside the lightbox, no content is rendered at all. It seems that using innerHTML to write document.write statements is taking it a step too far, but I cannot find any explanation of this behavior in the JavaScript implementations (FF2&3, IE6&7).
Can anyone clarify whether this should or shouldn't work? Thanks.

In general, script tags aren't executed when using innerHTML. In your case, this is good, because the document.write call would wipe out everything that's already in the page. However, that leaves you without whatever HTML document.write was supposed to add.
jQuery's HTML manipulation methods will execute scripts in HTML for you. The trick is then capturing the calls to document.write and getting the HTML into the proper place. If it's simple enough, then something like this will do:
var content = '';
document.write = function(s) {
content += s;
};
// execute the script
$('#foo').html(markupWithScriptInIt);
$('#foo .whereverTheDocumentWriteContentGoes').html(content);
It gets complicated though. If the script is on another domain, it will be loaded asynchronously, so you'll have to wait until it's done to get the content. Also, what if it just writes the HTML into the middle of the fragment without a wrapper element that you can easily select? writeCapture.js (full disclosure: I wrote it) handles all of these problems. I'd recommend just using it, but at the very least you can look at the code to see how it handles everything.
EDIT: Here is a page demonstrating what sounds like the effect you want.
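For the simple, synchronous case described above, the capture step can be wrapped so that document.write is always restored afterwards. This is only a rough sketch: captureWrites and fakeDocument are made-up names, and the stand-in object is there only so the snippet runs outside a browser. In a page you would pass the real document, and it does nothing for scripts that load asynchronously.

```javascript
// Hypothetical helper: temporarily redirect doc.write / doc.writeln into a
// string buffer, run a callback, then restore the original methods.
function captureWrites(doc, run) {
  const originalWrite = doc.write;
  const originalWriteln = doc.writeln;
  let captured = "";
  doc.write = (...parts) => { captured += parts.join(""); };
  doc.writeln = (...parts) => { captured += parts.join("") + "\n"; };
  try {
    run();
  } finally {
    // Restore even if the callback throws.
    doc.write = originalWrite;
    doc.writeln = originalWriteln;
  }
  return captured;
}

// Stand-in for `document` (assumption: used here only for illustration).
const fakeDocument = { write() {}, writeln() {} };

const html = captureWrites(fakeDocument, () => {
  fakeDocument.write("<p>first</p>");
  fakeDocument.write("<p>second</p>");
});
// html now holds "<p>first</p><p>second</p>" and can be placed wherever it belongs.
```

The try/finally is there so a failing script does not leave the page with a replaced document.write.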

I created a simple test page that illustrates the problem:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<title>Document Write Testcase</title>
</head>
<body>
<div id="container">
</div>
<div id="container2">
</div>
<script>
// This doesn't work!
var container = document.getElementById('container');
container.innerHTML = "<script type='text/javascript'>alert('foo');document.write('bar');<\/script>";
// This does!
var container2 = document.getElementById('container2');
var script = document.createElement("script");
script.type = 'text/javascript';
script.innerHTML = "alert('bar');document.write('foo');";
container2.appendChild(script);
</script>
</body>
</html>
This page alerts 'bar' and prints 'foo', while I expected it to also alert 'foo' and print 'bar'. Unfortunately, since the script tag is part of a larger HTML page, I cannot single out that tag and append it like the example above. Well, I can, but that would require scanning the innerHTML content for script tags, replacing them in the string with placeholders, and then inserting them using the DOM. That does not sound trivial.
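The scanning step described above could look roughly like this. It is a sketch under assumptions: extractInlineScripts is a made-up name, and a regular expression is only good enough for simple, well-formed markup (it is not an HTML parser, and scripts that use a src attribute come out with an empty body).

```javascript
// Hypothetical helper: pull inline <script> bodies out of an HTML string and
// leave a numbered comment placeholder where each script stood.
function extractInlineScripts(html) {
  const scripts = [];
  const markup = html.replace(
    /<script\b[^>]*>([\s\S]*?)<\/script>/gi,
    (match, body) => {
      scripts.push(body);
      return "<!--script-" + (scripts.length - 1) + "-->";
    }
  );
  return { markup, scripts };
}

const fragment =
  "<p>before</p><script type='text/javascript'>alert('foo');<\/script><p>after</p>";
const extracted = extractInlineScripts(fragment);
// extracted.markup  -> "<p>before</p><!--script-0--><p>after</p>"
// extracted.scripts -> ["alert('foo');"]
```

In a page, extracted.markup would go into innerHTML, and each entry of extracted.scripts would be put into an element made with document.createElement("script") and appended, as in the second half of the test page.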

Use document.writeln(content); instead of document.write(content).
However, the better method is using the concatenation of innerHTML, like this:
element.innerHTML += content;
Assigning with element.innerHTML = content; replaces the element's existing content with the new content.
Using the += operator, as in element.innerHTML += content, appends your text after the old content instead (similar to what document.write does).

document.write is about as deprecated as they come. Thanks to the wonders of JavaScript, though, you can just assign your own function to the write method of the document object which uses innerHTML on an element of your choosing to append the supplied content.
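A minimal sketch of that idea, assuming you already have a target element. The names redirectWrites, stubDocument and stubTarget are invented for illustration, and the stubs exist only so the snippet runs outside a browser; in a page you would pass document and a real element.

```javascript
// Hypothetical helper: make doc.write append to `target` instead of writing
// to the document stream. Returns a function that undoes the override.
function redirectWrites(doc, target) {
  const originalWrite = doc.write;
  doc.write = (...parts) => {
    target.innerHTML += parts.join("");
  };
  return () => { doc.write = originalWrite; };
}

// Stand-ins for `document` and an element (assumption: illustration only).
const stubDocument = { write() { throw new Error("original write called"); } };
const stubTarget = { innerHTML: "<p>existing</p>" };

const restore = redirectWrites(stubDocument, stubTarget);
stubDocument.write("<p>added</p>");
restore();
// stubTarget.innerHTML is now "<p>existing</p><p>added</p>"
```

Note that innerHTML += makes the browser re-parse everything already in the target, so for many small writes it may be worth collecting the pieces first.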

Can I get some clarification first to make sure I get the problem?
document.write calls add content to the markup at the point where the script that is currently executing sits. For example, if you define a function containing document.write calls in one script but call it from another script elsewhere in the page, the output appears at the point in the markup where the function is called, not where it is defined.
Therefore for this to work at all the Flickr document.write statements will need to be part of the content in element.innerHTML = content. Is this definitely the case?
You might quickly test if this should work at all by adding a single and simple document.write call in the content that is set as the innerHTML and see what this does:
<script>
var content = "<p>1st para</p><script>document.write('<p>2nd para</p>');<\/script>";
element.innerHTML = content;
</script>
If that works, the concept of document.write working in content set as the innerHTML of an element might just work.
My gut feeling is that it won't work, but it should be pretty straightforward to test the concept.

So you're using a DOM method to create a script element and append that to an existing element and this then causes the content of the appended script element to execute? That sounds good.
You say that the script tag is part of a larger HTML page and therefore cannot be singled out. Can you not give the script tag an ID and target it? I'm probably missing something obvious here.

In theory, yes, I can single out a script tag that way. The problem is that we potentially have dozens of situations where this occurs, so I am trying to find some cause or documentation of this behavior.
Also, the script tag does not seem to be a part of the DOM anymore after it gets loaded. In our environment, my container div remains empty, so I cannot fetch the script tag. It should work, though, because in my example above the script does not get executed, but is still part of the DOM.

Related

Why doesn't it format the javascript code? [duplicate]

In tutorials I've learnt to use document.write. Now I understand that by many this is frowned upon. I've tried print(), but then it literally sends it to the printer.
So what are alternatives I should use, and why shouldn't I use document.write? Both w3schools and MDN use document.write.

The reason that your HTML is replaced is because of an evil JavaScript function: document.write().
It is most definitely "bad form." It only behaves as expected while the page is still loading; if you call it after the page has loaded, it replaces your entire document with the input. And if you are serving the page as strict XHTML, it is not even valid code.
the problem:
document.write writes to the document stream. Calling document.write on a closed (or loaded) document automatically calls document.open which will clear the document.
-- quote from the MDN
document.write() has two henchmen, document.open(), and document.close(). When the HTML document is loading, the document is "open". When the document has finished loading, the document has "closed". Using document.write() at this point will erase your entire (closed) HTML document and replace it with a new (open) document. This means your webpage has erased itself and started writing a new page - from scratch.
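One rough way to guard against the "closed document" case is to look at document.readyState before writing. The sketch below is an assumption-laden illustration, not a drop-in fix: writeOrAppend and the stand-in objects are made-up names, and treating "loading" as the only safe state is a deliberately conservative choice.

```javascript
// Hypothetical helper: only use doc.write while the document is still being
// parsed; otherwise append to a fallback element instead of wiping the page.
function writeOrAppend(doc, fallbackTarget, html) {
  if (doc.readyState === "loading") {
    doc.write(html); // still parsing: goes into the document stream
    return "written";
  }
  fallbackTarget.insertAdjacentHTML("beforeend", html); // already parsed: append
  return "appended";
}

// Stand-ins so the sketch runs outside a browser (assumption: illustration only).
const log = [];
const loadingDoc = { readyState: "loading", write: (s) => log.push("write:" + s) };
const loadedDoc = { readyState: "complete", write: (s) => log.push("write:" + s) };
const box = { insertAdjacentHTML: (position, s) => log.push(position + ":" + s) };

writeOrAppend(loadingDoc, box, "<p>a</p>");
writeOrAppend(loadedDoc, box, "<p>b</p>");
// log -> ["write:<p>a</p>", "beforeend:<p>b</p>"]
```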
I believe document.write() causes the browser to have a performance decrease as well (correct me if I am wrong).
an example:
This example writes output to the HTML document after the page has loaded. Watch document.write()'s evil powers clear the entire document when you press the "exterminate" button:
I am an ordinary HTML page. I am innocent, and purely for informational purposes. Please do not <input type="button" onclick="document.write('This HTML page has been successfully exterminated.')" value="exterminate"/>
me!
the alternatives:
.innerHTML This is a wonderful alternative, but this attribute has to be attached to the element where you want to put the text.
Example: document.getElementById('output1').innerHTML = 'Some text!';
.createTextNode() is the alternative recommended by the W3C.
Example: var para = document.createElement('p');
para.appendChild(document.createTextNode('Hello, '));
NOTE: This is known to have some performance decreases (slower than .innerHTML). I recommend using .innerHTML instead.
the example with the .innerHTML alternative:
I am an ordinary HTML page.
I am innocent, and purely for informational purposes.
Please do not
<input type="button" onclick="document.getElementById('output1').innerHTML = 'There was an error exterminating this page. Please replace <code>.innerHTML</code> with <code>document.write()</code> to complete extermination.';" value="exterminate"/>
me!
<p id="output1"></p>

Here is code that should replace document.write in-place:
document.write=function(s){
var scripts = document.getElementsByTagName('script');
var lastScript = scripts[scripts.length-1];
lastScript.insertAdjacentHTML("beforebegin", s);
}

You can combine insertAdjacentHTML method and document.currentScript property.
The insertAdjacentHTML() method of the Element interface parses the specified text as HTML or XML and inserts the resulting nodes into the DOM tree at a specified position:
'beforebegin': Before the element itself.
'afterbegin': Just inside the element, before its first child.
'beforeend': Just inside the element, after its last child.
'afterend': After the element itself.
The document.currentScript property returns the <script> element whose script is currently being processed. The best position is beforebegin: the new HTML is inserted before the <script> itself. To match document.write's native behavior, one would position the text afterend, but then the nodes from consecutive calls to the function are not placed in the order in which you called them (as document.write does), but in reverse. The order in which your HTML appears is probably more important than where it is placed relative to the <script> tag, hence the use of beforebegin.
document.currentScript.insertAdjacentHTML(
'beforebegin',
'This is a document.write alternative'
)
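The ordering argument is easier to see with a toy model. The snippet below does not touch the DOM at all: it uses a plain array to stand for the script's siblings, and insertRelativeToScript is a made-up function that mimics only the positional rule of beforebegin and afterend.

```javascript
// Toy model (not the DOM): `siblings` stands for the children of the
// script's parent, with the string "script" marking the script element.
function insertRelativeToScript(siblings, position, label) {
  const index = siblings.indexOf("script");
  const at = position === "beforebegin" ? index : index + 1;
  siblings.splice(at, 0, label);
  return siblings;
}

const before = ["script"];
["A", "B", "C"].forEach((label) => insertRelativeToScript(before, "beforebegin", label));
// before -> ["A", "B", "C", "script"]  (call order preserved)

const after = ["script"];
["A", "B", "C"].forEach((label) => insertRelativeToScript(after, "afterend", label));
// after -> ["script", "C", "B", "A"]   (call order reversed)
```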

As a recommended alternative to document.write you could use DOM manipulation to directly query and add node elements to the DOM.

Just dropping a note here to say that, although using document.write is highly frowned upon due to performance concerns (synchronous DOM injection and evaluation), there is also no actual 1:1 alternative if you are using document.write to inject script tags on demand.
There are a lot of great ways to avoid having to do this (e.g. script loaders like RequireJS that manage your dependency chains) but they are more invasive and so are best used throughout the site/application.

I fail to see the problem with document.write. If you are using it before the onload event fires, as you presumably are, to build elements from structured data for instance, it is the appropriate tool to use. There is no performance advantage to using insertAdjacentHTML or explicitly adding nodes to the DOM after it has been built. I just tested it three different ways with an old script I once used to schedule incoming modem calls for a 24/7 service on a bank of 4 modems.
By the time it is finished this script creates over 3000 DOM nodes, mostly table cells. On a 7 year old PC running Firefox on Vista, this little exercise takes less than 2 seconds using document.write from a local 12kb source file and three 1px GIFs which are re-used about 2000 times. The page just pops into existence fully formed, ready to handle events.
Using insertAdjacentHTML is not a direct substitute as the browser closes tags which the script requires remain open, and takes twice as long to ultimately create a mangled page. Writing all the pieces to a string and then passing it to insertAdjacentHTML takes even longer, but at least you get the page as designed. Other options (like manually re-building the DOM one node at a time) are so ridiculous that I'm not even going there.
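For reference, the "write all the pieces to a string first" variant mentioned above might look something like this. It is a sketch, not the original scheduling script: buildTableHtml and the sample rows are invented, and the cell values are assumed to be trusted or already escaped.

```javascript
// Hypothetical helper: assemble a complete table as one string so that the
// browser never sees (and auto-closes) a half-open tag.
function buildTableHtml(rows) {
  const parts = ["<table>"];
  for (const row of rows) {
    parts.push("<tr>");
    for (const cell of row) {
      parts.push("<td>" + cell + "</td>"); // assumption: cell is safe to embed
    }
    parts.push("</tr>");
  }
  parts.push("</table>");
  return parts.join("");
}

const tableHtml = buildTableHtml([["Mon", "09:00"], ["Tue", "10:30"]]);
// In a page, a single call would then insert the finished markup, e.g.
// someElement.insertAdjacentHTML("beforeend", tableHtml);
```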
Sometimes document.write is the thing to use. The fact that it is one of the oldest methods in JavaScript is not a point against it, but a point in its favor - it is highly optimized code which does exactly what it was intended to do and has been doing since its inception.
It's nice to know that there are alternative post-load methods available, but it must be understood that these are intended for a different purpose entirely; namely modifying the DOM after it has been created and memory allocated to it. It is inherently more resource-intensive to use these methods if your script is intended to write the HTML from which the browser creates the DOM in the first place.
Just write it and let the browser and interpreter do the work. That's what they are there for.
PS: I just tested using an onload param in the body tag and even at this point the document is still open and document.write() functions as intended. Also, there is no perceivable performance difference between the various methods in the latest version of Firefox. Of course there is a ton of caching probably going on somewhere in the hardware/software stack, but that's the point really - let the machine do the work. It may make a difference on a cheap smartphone though. Cheers!

The question depends on what you are actually trying to do.
Usually, instead of using document.write, you can use someElement.innerHTML or, better, document.createElement together with someElement.appendChild.
You can also consider using a library like jQuery and using the modification functions in there: http://api.jquery.com/category/manipulation/

This is probably the most correct, direct replacement: insertAdjacentHTML.

Try using getElementById() or getElementsByName() to access a specific element, and then set its innerHTML property:
<html>
<body>
<div id="myDiv1"></div>
<div id="myDiv2"></div>
<script type="text/javascript">
var myDiv1 = document.getElementById("myDiv1");
var myDiv2 = document.getElementById("myDiv2");
myDiv1.innerHTML = "<b>Content of 1st DIV</b>";
myDiv2.innerHTML = "<i>Content of second DIV element</i>";
</script>
</body>
</html>

Use
var documentwrite =(value, method="", display="")=>{
switch(display) {
case "block":
var x = document.createElement("p");
break;
case "inline":
var x = document.createElement("span");
break;
default:
var x = document.createElement("p");
}
var t = document.createTextNode(value);
x.appendChild(t);
if(method==""){
document.body.appendChild(x);
}
else{
document.querySelector(method).appendChild(x);
}
}
and call the function based on your requirement as below
documentwrite("My sample text"); // print value inside body
documentwrite("My sample text inside id", "#demoid", "block"); // print value inside id and display block
documentwrite("My sample text inside class", ".democlass", "inline"); // print value inside class and display inline

I'm not sure if this will work exactly, but I thought of
var docwrite = function(doc) {
document.write(doc);
};
This solved the problem with the error messages for me.

Append html to jQuery element without running scripts inside the html

I have written some code that takes a string of html and cleans away any ugly HTML from it using jQuery (see an early prototype in this SO question). It works pretty well, but I stumbled on an issue:
When using .append() to wrap the html in a div, all script elements in the code are evaluated and run (see this SO answer for an explanation why this happens). I don't want this, I really just want them to be removed, but I can handle that later myself as long as they are not run.
I am using this code:
var wrapper = $('<div/>').append($(html));
I tried to do it this way instead:
var wrapper = $('<div>' + html + '</div>');
But that just brings forth the "Access denied" error in IE that the append() function fixes (see the answer I referenced above).
I think I might be able to rewrite my code to not require a wrapper around the html, but I am not sure, and I'd like to know if it is possible to append html without running scripts in it, anyway.
My questions:
How do I wrap a piece of unknown html without running scripts inside it, preferably removing them altogether?
Should I throw jQuery out the window and do this with plain JavaScript and DOM manipulation instead? Would that help?
What I am not trying to do:
I am not trying to put some kind of security layer on the client side. I am very much aware that it would be pointless.
Update: James' suggestion
James suggested that I should filter out the script elements, but look at these two examples (the original first, then James' suggestion):
jQuery("<p/>").append("<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there")
keeps the text nodes but writes gnu!
jQuery("<p/>").append(jQuery("<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there").not('script'))
Doesn't write gnu!, but also loses the text nodes.
Update 2:
James has updated his answer and I have accepted it. See my latest comment to his answer, though.

How about removing the scripts first?
var wrapper = $('<div/>').append($(html).not('script'));
Create the div container
Use plain JS to put html into div
Remove all script elements in the div
Assuming script elements in the html are not nested in other elements:
var wrapper = document.createElement('div');
wrapper.innerHTML = html;
$(wrapper).children().remove('script');
If the script elements may be nested inside other elements, use find() instead of children():
var wrapper = document.createElement('div');
wrapper.innerHTML = html;
$(wrapper).find('script').remove();
This works for the case where html is just text and where html has text outside any elements.

You should remove the script elements:
var wrapper = $('<div/>').append($(html).remove("script"));
Second attempt:
node-validator can be used in the browser:
https://github.com/chriso/node-validator
var str = sanitize(large_input_str).xss();
Alternatively, PHPJS has a strip_tags function (regex/evil based):
http://phpjs.org/functions/strip_tags:535

The scripts in the html kept executing for me with all the simple methods mentioned here. Then I remembered that jQuery has a tool for this (since 1.8): jQuery.parseHTML. There's still a catch: according to the documentation, event handlers inside attributes (e.g. <img onerror>) will still run.
This is what I'm using:
var $dom = $($.parseHTML(d));
$dom will be a jQuery object containing the elements that were found.

what happens when you use javascript to insert a javascript widget?

Can anyone explain what happens when you use JavaScript to insert a JavaScript-based widget?
Here's my JS code:
var para = document.getElementsByTagName("p");
var cg = document.createElement("div");
cg.setAttribute("class", "twt");
cg.innerHTML = '<a href="http://twitter.com/share" class="twitter-share-button" ' +
'data-count="vertical" data-via="xah_lee">Tweet</a>' +
'<script type="text/javascript" src="http://platform.twitter.com/widgets.js"><\/script>';
document.body.insertBefore(cg, para[1]);
It inserts the Twitter widget before the first paragraph. As you can see above, the Twitter widget loads a JavaScript file that shows how many times the page has been tweeted.
It doesn't work in Firefox or Chrome, but semi-works in IE8. What is the expected behavior in this case? Is the newly inserted JS code supposed to execute? If so, how does it differ from having the code on the page itself?

In order to execute the JS code you insert into a DIV via innerHTML, you need to do something like the following (courtesy of Yuriy Fuksenko at http://www.coderanch.com/t/117983/HTML-JavaScript/Execute-JavaScript-function-present-HTML )
function setAndExecute(divId, innerHTML) {
var div = document.getElementById(divId);
div.innerHTML = innerHTML;
var x = div.getElementsByTagName("script");
for (var i=0;i<x.length;i++) {
eval(x[i].text);
}
}
A slightly more advanced approach is here: http://zeta-puppis.com/2006/03/07/javascript-script-execution-in-innerhtml-the-revenge/ - look for <script> tags, take their content and create a new element into the <head>.

innerHTML does not work to insert script tags (because the linked script, in most browsers, will fail to execute). Really, you should insert the script tag once on the server side and insert only the link at the location of each post (that is, if you are adding this to a blog home page that shows multiple posts, each with their own URLs).
If, for some reason, you decide that you must use one snippet of JavaScript to do it all, at least import the tweet button script in a way that will work, for example, the Google Analytics way or the MediaWiki way (look for the importScriptURI function). (Note that I do not know the specifics of the tweet button, so it might not even work.)
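A common way to import a script "in a way that will work" is to create the script element through the DOM rather than through innerHTML. The sketch below is only that: loadScript and stubDoc are made-up names, the stub exists so the snippet runs outside a browser, and whether the tweet button then picks up links that were inserted later is not something this shows.

```javascript
// Hypothetical helper: load an external script by creating a <script>
// element through the DOM, which browsers do execute.
function loadScript(doc, src) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    script.src = src;
    script.async = true;
    script.onload = () => resolve(script);
    script.onerror = () => reject(new Error("Failed to load " + src));
    doc.head.appendChild(script);
  });
}

// Stand-in for `document` (assumption: illustration only). In a page you
// would call loadScript(document, url) instead.
const appended = [];
const stubDoc = {
  createElement: (tagName) => ({ tagName }),
  head: { appendChild: (node) => { appended.push(node); return node; } },
};

const pending = loadScript(stubDoc, "http://platform.twitter.com/widgets.js");
// With the stub, `pending` never settles because nothing fires onload;
// in a browser it resolves once the script has loaded.
```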

How do I get the original innerHTML source without the Javascript generated contents?

Is it possible to get in some way the original HTML source without the changes made by the processed Javascript? For example, if I do:
<div id="test">
<script type="text/javascript">document.write("hello");</script>
</div>
If I do:
alert(document.getElementById('test').innerHTML);
it shows:
<script type="text/javascript">document.write("hello");</script>hello
In simple terms, I would like the alert to show only:
<script type="text/javascript">document.write("hello");</script>
without the final hello (the result of the processed script).

I don't think there's a simple solution to just "grab original source" as it'll have to be something that's supplied by the browser. But, if you are only interested in doing this for a section of the page, then I have a workaround for you.
You can wrap the section of interest inside a "frozen" script:
<script id="frozen" type="text/x-frozen-html">
The type attribute I just made up, but it will force the browser to ignore everything inside it. You then add another script tag (proper javascript this time) immediately after this one - the "thawing" script. This thawing script will get the frozen script by ID, grab the text inside it, and do a document.write to add the actual contents to the page. Whenever you need the original source, it's still captured as text inside the frozen script.
And there you have it. The downside is that I wouldn't use this for the whole page... (SEO, syntax highlighting, performance...) but it's quite acceptable if you have a special requirement on part of a page.
Edit: Here is some sample code. Also, as @FlashXSFX correctly pointed out, any script tags within the frozen script will need to be escaped. So in this simple example, I'll make up an <x-script> tag for this purpose.
<script id="frozen" type="text/x-frozen-html">
<div id="test">
<x-script type="text/javascript">document.write("hello");</x-script>
</div>
</script>
<script type="text/javascript">
// Grab contents of frozen script and replace `x-script` with `script`
function getSource() {
return document.getElementById("frozen")
.innerHTML.replace(/x-script/gi, "script");
}
// Write it to the document so it actually executes
document.write(getSource());
</script>
Now whenever you need the source:
alert(getSource());
See the demo: http://jsbin.com/uyica3/edit

A simple way is to fetch it from the server again. It will most probably be in the cache. Here is my solution using jQuery.get(). It takes the original URI of the page and loads the data with an Ajax call:
$.get(document.location.href, function(data,status,jq) {console.log(data);})
This will print the original code, without any changes made by JavaScript. It does not do any error handling!
If you don't want to use jQuery to fetch the source, consult the answer to this question: How to make an ajax call without jquery?
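Without jQuery, the same re-fetch could be sketched with the standard fetch API. fetchOriginalSource is a made-up name, and the fetchImpl parameter is an assumption added only so the function can be tried without a network; unlike the one-liner above, this version does check the response status.

```javascript
// Hypothetical helper: request the page again and return the response body
// as text. `fetchImpl` defaults to the global fetch and is injectable only
// so the sketch can be exercised without a real request.
async function fetchOriginalSource(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.text();
}

// In a page:
// fetchOriginalSource(document.location.href).then((source) => console.log(source));
```

As with the jQuery version, the result is whatever the server returns for a fresh request, so it only matches the original source if the page is not generated differently each time.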

Could you send an Ajax request to the same page you're currently on and use the result as your original HTML? This is foolproof given the right conditions, since you are literally getting the original HTML document. However, this won't work if the page changes on every request (with dynamic content), or if, for whatever reason, you cannot make a request to that specific page.

Brute force approach
var orig = document.getElementById("test").innerHTML;
alert(orig.replace(/<\/script>[.\n\r]*.*/i,"</script>"));
EDIT:
This could be better
var orig = document.getElementById("test").innerHTML + "<<>>";
alert(orig.replace( /<\/script>[^(<<>>)]+<<>>/i, "<\/script>"));

If you override document.write to add some identifiers at the beginning and end of everything written to the document by the script, you will be able to remove those writes with a regular expression.
Here's what I came up with:
<script type="text/javascript" language="javascript">
var docWrite = document.write;
document.write = myDocWrite;
function myDocWrite(wrt) {
docWrite.apply(document, ['<!--docwrite-->' + wrt + '<!--/docwrite-->']);
}
</script>
Added your example somewhere in the page after the initial script:
<div id="test">
<script type="text/javascript"> document.write("hello");</script>
</div>
Then I used this to alert what was inside:
var regEx = /<!--docwrite-->(.*?)<!--\/docwrite-->/gm;
alert(document.getElementById('test').innerHTML.replace(regEx, ''));

If you want the pristine document, you'll need to fetch it again. There's no way around that. If it weren't for the document.write() (or similar code that would run during the load process) you could load the original document's innerHTML into memory on load/domready, before you modify it.

I can't think of a solution that would work the way you're asking. The only code that Javascript has access to is via the DOM, which only contains the result after the page has been processed.
The closest I can think of to achieve what you want is to use Ajax to download a fresh copy of the raw HTML for your page into a Javascript string, at which point since it's a string you can do whatever you like with it, including displaying it in an alert box.

A tricky way is to use a <style> tag as the template, so that you no longer need to rename script to x-script.
console.log(document.getElementById('test').innerHTML);
<style id="test" type="text/html+template">
<script type="text/javascript">document.write("hello");</script>
</style>
But I do not like this ugly solution.

I think you want to traverse the DOM nodes:
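// Wrapped in a function (the name is illustrative) so that return is valid
function getScriptSources(id) {
var childNodes = document.getElementById(id).childNodes, i, output = [];
for (i = 0; i < childNodes.length; i++) {
if (childNodes[i].nodeName == "SCRIPT") {
output.push(childNodes[i].innerHTML);
}
}
return output.join('');
}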
