How to understand and view parsing of HTML-embedded JavaScript? - javascript

I want to learn more about XSS, but I can't seem to find good resources on how HTML-embedded JavaScript, like the below code snippet, is parsed.
How can I see, in the browser, how this code is parsed? That is: how many rounds of parsing are performed, how each round transforms the input (e.g. decoding), etc.
<!DOCTYPE html>
<html>
<body>
<button type="button" onclick="setTimeout(() => alert(1), 1000)">Click this!</button>
</body>
</html>
After HTML parsing is performed, which decodes HTML-encoded entities, what does the program look like? Does HTML parsing also mess with the onclick attribute?

Your question is fairly broad but seems to generally relate to how browsers render web pages.
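Your browser requests the web document and begins interpreting it, parsing the markup from top to bottom and building the Document Object Model (DOM), an object-oriented tree representation of the page; the DOM is the interface through which JavaScript logic reads and modifies the HTML document to dynamically construct the application.
When the browser reaches your button, it creates the element with the attributes as written. While tokenizing the markup, the HTML parser decodes character references in attribute values exactly once (so &quot; in an onclick value becomes "), and that decoded text is what later gets compiled as JavaScript; no further HTML decoding happens after that. The onclick attribute becomes an event handler on the element: when a click event is dispatched, the handler runs, and here it calls setTimeout, which schedules alert(1) to run asynchronously roughly 1000 ms later.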
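To make the two stages concrete: you can see the first stage's result in the browser by running button.getAttribute("onclick") in the DevTools console, which returns the already-decoded text. The sketch below imitates both stages in plain JavaScript (runnable in Node.js): a single decoding pass over a handful of character references, then wrapping the decoded text as a function body. The names decodeAttributeValue and NAMED are hypothetical, the decoder is deliberately incomplete, and the wrapping only approximates what browsers do.

```javascript
// Tiny, incomplete table: real HTML defines over 2,000 named character references.
const NAMED = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"],
]);

// One left-to-right pass, like the HTML tokenizer: "&amp;quot;" becomes "&quot;", not '"'.
function decodeAttributeValue(raw) {
  return raw.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, ref) => {
    if (ref[0] !== "#") return NAMED.get(ref) ?? match; // unknown names are left alone
    const hex = ref[1] === "x" || ref[1] === "X";
    const code = parseInt(ref.slice(hex ? 2 : 1), hex ? 16 : 10);
    // HTML maps NUL and out-of-range code points to U+FFFD.
    return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : "\uFFFD";
  });
}

// Hypothetical attribute value exactly as it would appear in the markup.
const raw = "return &quot;a &lt; b&quot;;";
const source = decodeAttributeValue(raw);

// Approximation of handler compilation: the decoded text becomes the body of a
// function with an `event` parameter. Browsers additionally put the element, its
// form owner and the document on the handler's scope chain, which is omitted here.
const handler = new Function("event", source);

console.log(source);    // return "a < b";
console.log(handler()); // a < b
```

The JavaScript engine never sees &quot; or &lt;; for XSS analysis this means an attribute payload goes through HTML decoding once and is then parsed as JavaScript, with JavaScript's own string escapes as a separate layer.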
Please update your question if you require clarification on something specific or not addressed.

Related

How to detect JavaScript from html in NodeJs and stop JS rendering

Is there a way to detect whether an HTML file contains JavaScript? And can we stop the JavaScript in the HTML from running, in Node.js?
I know we can stop the HTML rendering altogether by setting the response content-type from text/html to text/plain. But I'm trying to figure out some way to stop only the JS.
Kindly let me know if it's even possible. Thanks.
I'm guessing you're sending the file to a browser from Node.js (you talked about changing the content type header).
To do this, you'll need to:
1. Parse the file with an HTML parser (there are a few available for Node.js). Be sure it's one that normalizes input, so that (for instance), xxx is normalized to .... (Thanks Quentin for emphasizing that!)
2. Using the resulting document model, remove:
   - any script elements
   - any onxyz attributes (onclick, onmouseover, etc.) on elements. For instance, <div onclick="..." should be changed to <div ....
   - any URL attributes (like href on a elements) that use the javascript: scheme. For instance, <a href="javascript:codeHere()" should be changed to <a href="#" or similar (if you remove href entirely, that works too, but the link will no longer automatically be a tab stop, etc.). (This is where normalization in the parser is important.)
3. Serialize the resulting document model to HTML and send it to the browser.
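The removal step above can be sketched as a recursive walk over the parsed document. This assumes the HTML has already been parsed into a simple, hypothetical tree of { tag, attrs, children } and { text } objects (a real parser's node API will differ), and URL_ATTRIBUTES is an illustrative subset, not a complete list. It shows the idea; it is not a substitute for a maintained sanitizer.

```javascript
// Illustrative, incomplete list of attributes whose values are URLs.
const URL_ATTRIBUTES = new Set(["href", "src", "action", "formaction", "xlink:href"]);

function isJavascriptUrl(value) {
  // Browsers ignore ASCII tab/newline inside URLs and strip leading control
  // characters and spaces, so " jav\tascript:" is still a javascript: URL.
  const normalized = value
    .replace(/[\t\n\r]/g, "")
    .replace(/^[\u0000-\u0020]+/, "")
    .toLowerCase();
  return normalized.startsWith("javascript:");
}

// Returns a cleaned copy of a hypothetical { tag, attrs, children } tree.
function stripScripts(node) {
  if (node.text !== undefined) return node; // text nodes are kept as-is
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs ?? {})) {
    const lower = name.toLowerCase();
    if (lower.startsWith("on")) continue; // drop every event-handler attribute
    if (URL_ATTRIBUTES.has(lower) && isJavascriptUrl(value)) {
      if (lower === "href") attrs[name] = "#"; // keep the link focusable
      continue;
    }
    attrs[name] = value;
  }
  const children = (node.children ?? [])
    .filter((child) => child.tag === undefined || child.tag.toLowerCase() !== "script")
    .map(stripScripts);
  return { tag: node.tag, attrs, children };
}

const cleaned = stripScripts({
  tag: "div",
  attrs: { onclick: "x()", class: "box" },
  children: [
    { tag: "script", attrs: {}, children: [{ text: "evil()" }] },
    { tag: "a", attrs: { href: " JaVa\tScript:alert(1)" }, children: [{ text: "link" }] },
    { text: "hi" },
  ],
});
console.log(JSON.stringify(cleaned));
```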

When does the rendering update in JavaScript?

I was reading the HTML spec's section on web APIs, and it basically says that
After the event loop has performed some task from a task queue, it needs to update the rendering (if this is a window event loop)
It also says that the user agent has some way of telling that updating the rendering isn't necessary (see point 10.3, "unnecessary rendering", in the link above)
So my question is the following: if, let's say, I have a simple index.html file and only one script file attached to it - index.js
index.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Test</title>
</head>
<body>
Hello StackOverflow
<script src="./index.js"></script>
</body>
</html>
index.js
function add(a,b) {
console.log(a+b);
}
function modifyBackground() {
document.body.style = 'background : red';
}
console.log('Hello world'); // Task 1
add(4,5); // Task 2
modifyBackground(); // Task 3 Will the render be updated only here?
As I see it, there are 3 tasks combined in my index.js. The first task is the console.log task; it is added to the event loop's task queue first. Then there's the add task, and finally the modifyBackground task.
In theory, if the user agent doesn't have a way of knowing if we actually need to update the render, it should update the render after every task.
Will the re-render happen only when I call the modifyBackground() method on the last line? And, if so, how does the user agent tell that the re-render isn't necessary?
P.S. To clarify: I want to know how, for example, Google Chrome's user agent tells that the re-render isn't necessary.
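One detail worth checking against the premise: the three calls in index.js are not three separate tasks. They run synchronously, one after another, while the script executes, and the browser cannot update the rendering in the middle of them; a rendering update can only happen after that work finishes. The Node.js sketch below shows the ordering. Node has no rendering step, so modifyBackground is a stub and an order array stands in for the console; work scheduled as a separate task with setTimeout runs only after all the synchronous code, including the line that follows it in the source.

```javascript
// Record the order in which things run.
const order = [];

function add(a, b) {
  order.push(`add: ${a + b}`);
}

// Stub: Node has no DOM, so we only record that the call happened.
function modifyBackground() {
  order.push("modifyBackground");
}

// These three calls run back to back, inside one and the same task.
order.push("Hello world");
add(4, 5);
modifyBackground();

// A separate task: it can only start after the current one has finished.
setTimeout(() => {
  order.push("next task");
  console.log(order);
}, 0);

order.push("end of script");
```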
Will the re-render happen only when I call the modifyBackground() method on the last line?
It depends on the browser implementation; browsers do not always follow the spec.
And, if so, how does the user agent tell that the re-render isn't necessary?
Again, it depends on the browser implementation. You need to understand how browser rendering works. Here's a good explanation worth reading: How the browser renders a Web Page. Below is a summary.
There are 3 basic concepts:
DOM
CSSOM
Render Tree.
Document Object Model (DOM)
When the browser reads HTML code, whenever it encounters an HTML element like html, body, div, etc., it creates a JavaScript object called a Node. Eventually, all HTML elements are converted to JavaScript objects. Since different HTML elements have different properties, the Node objects are created from different classes (constructor functions).
CSS Object Model (CSSOM)
After constructing the DOM, the browser reads CSS from all the sources (external, embedded, inline, user-agent, etc.) and constructs a CSSOM. CSSOM stands for CSS Object Model, which is a tree-like structure just like the DOM.
Each node in this tree contains CSS style information for that particular DOM element. CSSOM, however, does not contain DOM elements which can’t be printed on the screen like , , etc.
Render Tree
Here is the answer to your question. The render tree is a tree-like structure constructed by combining the DOM and CSSOM trees. The browser has to calculate the layout of each visible element and paint it on the screen, and for that it uses the render tree. Hence, until the render tree is constructed, nothing gets painted on the screen.
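To make the combining step concrete, here is a heavily simplified sketch. The object shapes and buildRenderTree are hypothetical, and the "CSSOM" is just a tag-to-style lookup, whereas real engines resolve styles through selectors and the cascade. What it illustrates: elements with display: none (including head and script, which the user-agent stylesheet hides) exist in the DOM but are left out of the render tree, together with their descendants.

```javascript
// Hypothetical, simplified shapes:
//   DOM node: { tag: "p", children: [...] }
//   "CSSOM":  a map from tag name to computed style
function buildRenderTree(domNode, cssom) {
  const style = cssom[domNode.tag] ?? {};
  // display: none removes the element and its whole subtree from the render tree.
  if (style.display === "none") return null;
  const children = [];
  for (const child of domNode.children ?? []) {
    const rendered = buildRenderTree(child, cssom);
    if (rendered !== null) children.push(rendered);
  }
  return { tag: domNode.tag, style, children };
}

const dom = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title" }, { tag: "script" }] },
    { tag: "body", children: [{ tag: "p" }, { tag: "aside" }] },
  ],
};

// User-agent defaults hide head and script; the page's own CSS hides <aside>.
const cssom = {
  head: { display: "none" },
  script: { display: "none" },
  aside: { display: "none" },
  p: { color: "red" },
};

const renderTree = buildRenderTree(dom, cssom);
console.log(JSON.stringify(renderTree));
```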

remove script after load in memory [duplicate]

As the title says, if I remove a script tag from the DOM using:
$('#scriptid').remove();
Does the javascript itself remain in memory or is it cleaned?
Or... am I completely misunderstanding the way in which browsers treat javascript? Which is quite possible.
For those interested in my reason for asking see below:
I am moving some common JavaScript interactions from static script files into dynamically generated ones in PHP, which are loaded on demand when a user requires them.
The reason for doing this is to move the logic server-side and run a small script, returned from the server, client-side, rather than have a large script containing a huge amount of logic client-side.
This is a similar approach to what Facebook does...
Facebook talks frontend javascript
Take a simple dialog, for instance. Rather than generating the HTML in JavaScript, appending it to the DOM, then using jQuery UI's dialog widget to load it, I am now doing the following.
Ajax request is made to dialog.php
Server generates html and javascript that is specific to this dialog then encodes them as JSON
JSON is returned to client.
HTML is appended to the <body> then once this is rendered, the javascript is also appended into the DOM.
The javascript is executed automatically upon insertion and the dynamic dialog opens up.
Doing this has reduced the amount of JavaScript on my page dramatically; however, I am concerned about cleaning up the inserted JavaScript.
Obviously once the dialog has been closed it is removed from the DOM using jQuery:
$('#dialog').remove();
The javascript is appended with an ID and I also remove this from the DOM via the same method.
However, as stated above, does using jQuery's .remove() actually clean the JavaScript out of memory, or does it simply remove the <script> element from the DOM?
If so, is there any way to clean this up?
No. Once a script is loaded, the objects and functions it defines are kept in memory. Removing a script element does not remove the objects it defines. This is in contrast to CSS files, where removing the element does remove the styles it defines. That's because the new styles can easily be reflowed. Can you imagine how hard it would be to work out what a script tag created and how to remove it?
EDIT: However, if you have a file that defines myFunction, then you add another script that redefines myFunction to something else, the new value will be kept. You can remove the old script tag if you want to keep the DOM clean, but that's all removing it does.
EDIT2: The only real way to "clean up" functions that I can think of is to have a JS file that basically calls delete window.myFunction for every possible object and function your other script files may define. For obvious reasons, this is a really bad idea.
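A related detail on EDIT2: in a classic script, a top-level function declaration creates a non-configurable property on the global object, so delete window.myFunction does not even succeed for it; overwriting the property is what works. The small Node.js check below uses vm.runInThisContext as a stand-in for a browser's classic script, and Reflect.deleteProperty in place of the delete operator so the check returns false instead of throwing in strict mode. The name myFunction is just the example from the answer.

```javascript
const vm = require("vm");

// Run as a top-level classic script: the declaration becomes a
// non-configurable property of the global object.
vm.runInThisContext("function myFunction() { return 'original'; }");

// Same effect as `delete window.myFunction`, minus the strict-mode TypeError.
const deleted = Reflect.deleteProperty(globalThis, "myFunction");
console.log(deleted);                    // false
console.log(typeof globalThis.myFunction); // function

// Overwriting does work; the old function can then be garbage-collected
// once nothing else references it.
globalThis.myFunction = undefined;
```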
If your scripts have already executed, removing the DOM elements is not going to get rid of them. Go to any page with JavaScript, open your preferred JavaScript console and type $("script").remove(). Everything keeps running.
And this demonstrates Kolink's answer:
http://jsfiddle.net/X2mk8/2/
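HTML:
<div id="output"></div>
<script id="yourDynamicGeneratedScript">
  function test(n) {
    var $output = $("#output");
    $output.append("test " + n + "<br/>");
  }
  test(1);
</script>
Javascript:
// Removing the <script> element does not remove what it defined.
$("script").remove();
// or $("#yourDynamicGeneratedScript").remove();
test(2);
test(3);
test(4);
// Assign instead of declaring: a function declaration here would be hoisted,
// so test(2) to test(4) above would already call the redefined version.
window.test = function (n) {
  var $output = $("#output");
  $output.append("REDEFINED! " + n + "<br/>");
};
test(5);
test(6);
test(7);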
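Node.js has no DOM, but its vm module gives a rough analogue of the fiddle: once source text has been executed, the functions it defined live on the global object, independent of the text they came from. In this sketch a string plays the role of the <script> element and an output array replaces #output; the names are illustrative.

```javascript
const vm = require("vm");

// The "page": a sandboxed global object with an output buffer instead of #output.
const output = [];
const context = vm.createContext({ output });

// Source text of the first script; this string stands in for the <script> element.
let scriptSource = "function test(n) { output.push('test ' + n); } test(1);";
vm.runInContext(scriptSource, context);

// "Remove" the script: drop our only reference to its source text.
scriptSource = null;

// The function it defined is still on the global object.
vm.runInContext("test(2);", context);

// Redefining the global replaces it, as in the fiddle.
vm.runInContext("test = function (n) { output.push('REDEFINED! ' + n); }; test(3);", context);

console.log(output);
```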

HTML attributes that can contain javascript

I'm looking for a simple list of all the html attributes that can contain javascript that will automatically run when an action is performed. I know this will differ between browsers and versions but I'd rather be safer than sorry. I currently know of the following javascript attributes: onload, onclick, onchange, onmouseover, onmouseout, onmousedown, and onmouseup
Backstory:
I'm getting a full HTML document from an untrusted source and I want to strip all JavaScript that could run from the original HTML document, so I'm removing all script tags as well as any attributes that could hold JavaScript before it's displayed in an iframe. For this implementation there is no server-side processing and no way of sandboxing the code, since I need to run JavaScript that is being added locally after all of the original JavaScript is removed.
There are two places where Javascript can be used in HTML attributes:
Any onEVENT attribute. I suggest just treating any attribute that begins with on as an event binding, and strip them all out.
Any attribute that can contain a URI will be executed as Javascript if the URI uses the javascript: scheme, such as href and src. A complete list is in
COMPLETE list of HTML tag attributes which have a URL value?
http://www.w3.org/TR/html401/interact/scripts.html#h-18.2.3
Scroll down to 18.2.3 Intrinsic events
I've had a similar requirement in a project. Don't forget to strip script elements, as well.
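The two rules from the first answer can be written as a small check over an element's attributes. This sketch assumes the attributes arrive as [name, value] pairs already entity-decoded by an HTML parser; scriptBearingAttributes and KNOWN are hypothetical names. It deliberately errs on the side of caution, in the asker's "safer than sorry" spirit, by flagging a javascript: URL in any attribute rather than only in known URL attributes. Comparing against the asker's original list shows why the "starts with on" rule matters.

```javascript
// The attributes the asker already knew about, kept only for comparison.
const KNOWN = new Set([
  "onload", "onclick", "onchange", "onmouseover", "onmouseout", "onmousedown", "onmouseup",
]);

// attrs: array of [name, value] pairs, assumed already entity-decoded.
function scriptBearingAttributes(attrs) {
  return attrs
    .filter(([name, value]) => {
      if (name.toLowerCase().startsWith("on")) return true; // any event-handler attribute
      // Conservative: flag a javascript: URL in any attribute, ignoring the
      // tabs/newlines and leading control characters/spaces that browsers skip.
      const normalized = value.replace(/[\t\n\r]/g, "");
      return /^[\u0000-\u0020]*javascript:/i.test(normalized);
    })
    .map(([name]) => name);
}

const found = scriptBearingAttributes([
  ["onerror", "steal()"],
  ["src", "cat.png"],
  ["ONFOCUS", "steal()"],
  ["href", "java\nscript:steal()"],
  ["title", "hello"],
]);
console.log(found); // [ 'onerror', 'ONFOCUS', 'href' ]

// None of the flagged attributes were on the original list.
console.log(found.filter((name) => !KNOWN.has(name.toLowerCase())));
```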

Possible to modify DOM during/before initial DOM parsing?

Is it possible to modify the DOM during or before the initial DOM parsing? Or do I have to wait until the DOM is parsed and built before interacting with it? More specifically, is it possible to hinder a script element in DOM from running using userscripts/content scripts or similar in chrome or firefox?
Tried using eventListeners on DOMNodeInserted before the DOM is parsed, but these are only fired after the DOM is built.
These are two separate questions:
1. Is it possible to modify the DOM during or before the initial DOM parsing?
Yes. As soon as the browser builds the root element, you can start querying and mutating the DOM. Note that when your script runs, some of the page may not have been parsed yet, and may even still be in transit on the network. Your script generally has access to any element declared in the source before the script tag containing/calling your script. This includes parent elements containing your script tag.
2. Is it possible to hinder a script element in DOM from running using userscripts/content scripts or similar in chrome or firefox?
No. All scripts are executed, and one script can't prevent another script's initial execution. However, you can perhaps go back and remove event handlers, and otherwise attempt to counteract the effects of a script. That said, this scenario seems a bit shady and/or against the grain of normal JavaScript usage.
I've actually looked into this a lot while developing an adblocker for Chrome. Basically, you can't use just Javascript to stop Javascript, especially since it would run simultaneously with the script element in question. So even if it would work it would cause a race condition.
You can modify elements during document parsing, but only after the element has been added to the DOM.
For example:
<div id="example"></div>
<script>
document.getElementById('example').innerHTML = 'hello';
</script>
<div>something else</div>
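The ordering in this example can be modeled with a toy parser. The sketch below is hypothetical and far simpler than a real HTML parser: it walks tokens in source order, appends each element to the document as soon as it sees it, and runs each inline script synchronously at the point where it appears. A script can therefore find #example, which precedes it, but not an element that comes after it.

```javascript
// Toy model: elements are added as they are parsed; scripts run immediately.
function parseAndRun(tokens) {
  const document = {
    elements: [],
    getElementById(id) {
      return this.elements.find((el) => el.id === id) ?? null;
    },
  };
  for (const token of tokens) {
    if (token.type === "element") {
      document.elements.push({ id: token.id, innerHTML: "" });
    } else if (token.type === "script") {
      token.run(document); // parsing pauses until the script returns
    }
  }
  return document;
}

const seenByScript = [];
const doc = parseAndRun([
  { type: "element", id: "example" },
  {
    type: "script",
    run(document) {
      document.getElementById("example").innerHTML = "hello";
      seenByScript.push(document.getElementById("later")); // not parsed yet: null
    },
  },
  { type: "element", id: "later" },
]);

console.log(doc.getElementById("example").innerHTML); // hello
console.log(seenByScript[0]);                          // null
```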
Is it possible to modify the DOM during or before the initial DOM parsing?
Yes it is possible.
Start by completely preventing the document from getting parsed altogether then on the side, fetch the same document, do any processing on this document and then inject the resulting document in the page.
Here is how I currently do just that https://stackoverflow.com/a/36097573/6085033
