I'm trying to run chrome debugger to gather deobfuscated JavaScripts but it returns a large number of chunks for a script. I want to know how does chrome divides one JavaScript file to multiple chunks? What exactly is one chunk of a script?
I am aware that for each script file, script tag, and eval() function separate chunks will be created. I just want to pinpoint all possible cases for which chunks are created. For example, does lazy parsing also creates chunks for some functions?
It would be great if somebody can point me to some documentation about how the process works.
Chrome debugging protocol API used
The Debugger.scriptParsed event is what generates the scriptId for each script found. Each script file parsed will fire the event and should receive a unique id. As well as individual script files, each instance of a <script> tag within a page source will also get its own id.
There are a number of parameters the event callback gets passed. You can inspect all the arguments by logging out the arguments object. For example, you'll get a url back for a script file and startLine gives you the offset for <script> tags that appear in an HTML resource.
on('Debugger.scriptParsed', function() {
console.log(JSON.stringify(arguments, null, 2)(
});
Investigation
In response to your updated question, my understanding is that in debugger mode it will attempt a full parse and not to try to optimise with a pre-parse. I wasn't able to really understand the v8 code base, but I did a little testing.
I created an HTML page with a button, and a separate script file. I executed one function immediately after page load. Another function is executed on click of button.
document.addEventListener("DOMContentLoaded", () => {
document.getElementById('clickMe').addEventListener('click', () => clicked);
});
I inspected the output from Debugger.scriptParsed. It parsed the script file straight away. Upon clicking the button, there were no additional output entries. If I changed the handler to invoke the clicked called dynamically using eval, it did output a new script chunk.
eval('clicked()');
This makes sense as there's no way for the parser to know about it until it's executed. The same applies to an inline handler, e.g. onclick="doSomething()"
Other chunks I noticed were Chrome Extension URIs and internals URIs from chrome.
Related
One of my web page elements is an <object> element with neither <iframe> nor any other container. I'd be happiest if I can keep using it without a container because it works fine for everything but this my current task of getting a piece of data over to a <textarea> in the main part of the page. In particular <object> elements morph into text, HTML, image, or video styles according to the return from their URI, so I am forced to make the server send the client on event Javascript coding and some functions located in a <script> tag just prior to the <table> tag.
My difficulty: I have a <textarea>.value outside of this <object> called StartTimeOfInterestEl.value, that needs to display (after trimming and adjusting) a short string of character data located in a <td> of the table.
EDIT: An easier way of saying this is that the Javascript that comes in the URI needs to directly(?) set the static element StartTimeOfInterestEl.value, but indications are that there is a scope boundary between server-sent Javascript code and the static code.
Javascript static side:
function takeTimeOfInterest(timestmp)
{
StartTimeOfInterestEl.value=timestmp
}
Server-side script commands that I've tried:
window.opener.takeTimeOfInterest(StartTimeOfInterest)
or
opener.document.takeTimeOfInterest(StartTimeOfInterest)
or just
takeTimeOfInterest(StartTimeOfInterest)
and none of these execute the function to receive the data, as evidenced when I place an alert box in that function - no alert happens. In fact, each of these causes code execution to end at that statement, proving to me that each trips an error.
The commands that execute on through without error seem to be commands that try to directly set StartTimeOfInterestEl.value, but though commands may not error to end execution, I haven't found one that actually does set that value successfully.
Is there some kind of scope boundary that exists between server-sent code and static code? I'm obviously unsure of child, parent, and opener relationships in this scenario and have tried window.opener to get the server-sent code to execute global page functions, but the execution of code just will not cross from child to opener scope, if such scope difference exists. And I'm unsure whether server-sent code is actually boundaries at a window level from the static page elements.
Could someone throw me a clue here that maybe is a specific method of using container-less <object>? Thank you!
(Note the server-sent code is NOT generated by PHP but by bash script, so I just care about that aspect of code as the client will render it. IOW, My PHP experience is way too little for me to translate it to bash that I need, so a PHP snippet won't help me here.)
Got it. Simple:
window.top.StartTimeOfInterestEl.value=StartTimeOfInterest in the server-side code.
I am trying to analyze some JavaScript code for which I make use of function rewriting so that calls to a JavaScript library go through my JavaScript code. My JavaScript code is part of a Chrome Extension. From a Chrome extension content script, I install/inject the code into the target page's DOM.
This works fine for functions that are induced after the load of page. The library calls go through my function. But, there's JavaScript code that runs while the page is actually loading (probably while the DOM is being rendered). This happens before my custom script is injected. This way, the function calls before the custom script is injected are lost to me, or those JavaScript calls do not go through my function.
I make use of Content Script to actually inject other JavaScript by appending to the DOM as mentioned in the following Stack Exchange question:
Insert code into the page context using a content script
I know I can cause the loading time of Content Script to be at the start/end of the DOM but this is another script file that I append to the DOM of the target page. I do not seem to understand how to control it.
The problem explained in Is it possible to run a script in context of a webpage, before any of the webpage's scripts run, using a chrome extension?
is exactly the same, but the solution does not seem to work. My intention is to make the injected script execute before any JavaScript code executes from the webpage. By specifying document_start in manifest.json, content script execution can be made to run before the webpage, but not the script that I inject through the content script (injecting script as explained in first link). This injected script is not running in any specific manner with respect to the webpage
Manifest.json:
Manifest file has the content script content.js added at document_start, so content.js is run before the target webpage (underlying page) runs.
"content_scripts":[
{
"matches":["<all_urls>"],
"js":["content.js"],
"run_at":"document_start",
"all_frames":false
}
],
content.js:
content.js has the below code with which I add the main.js to the DOM, so that I am actually able to interact with the JavaScript that is in the target page's environment. I do this from a different file and attach it to the DOM because I cannot interact with the target page's JavaScript through the Content Scripts, since they both do not interfere with each other.
To explain further, main.js has some JavaScript that intercepts JavaScript calls during the execution of JavaScript in target page. JavaScript in target page makes calls to a library and I intend just to write a wrapper on those library functions.
var u = document.createElement('script');
u.src = chrome.extension.getURL('main.js');
(document.head||document.documentElement).appendChild(u);
u.onload = function() {
u.parentNode.removeChild(u);
};
I expect that main.js is available in the target page's domain and any of the scripts in the target page, since I inject it through the content script that is run at document_start.
Assume I have a call to some JavaScript function like this in my target page HTML, someJSCall() is defined by the target page's domain.
<html onLoad="someJSCall( )">
In this scenario, main.js (code injected through my Chrome extension) is already available. So calls to the JavaScript library from someJSCall() function go through main.js wrapper functions.
This works fine.
The problem is when there are IIFE (immediately invoked function expressions) defined in the target page's JavaScript. If these IIFE calls make library calls, this does not go through my main.js interceptions. If I look at the files loaded in the browser through Chrome Dev Tools, I see that main.js is still not loaded while IIFE calls are executing.
I hope I have explained the problem in detail.
Based on the additional information you added to the question about 2.5 weeks after I answered, you are adding code to the page context by including a "main.js", which is a separate file in your extension, using a <script> that looks something like:
<script src="URL_to_file_in_extension/main.js"/>
However, when you do that you introduce an asynchronous delay between when the <script> is inserted into the page and when the "main.js" is fetched from the extension and executed in the page context. You will not be able to control how long this delay is and it may, or may not, result in your code running prior to any particular code in the page. It will probably run prior to code that has to be fetched from external URLs, but may not.
In order to guarantee that your code runs synchronously, you must insert it in a <script> tag as actual code, not using the src attribute to pull in another file. That means the code which you want to execute in the page must exist within the content script file you are loading into the page.
Needing to execute code in the page context is a fairly common requirement. I've needed to do so in browser extensions (e.g. Chrome, Firefox, Edge, etc.) and in userscripts. I've also wanted to be able to pass data to such code, so I wrote a function called executeInPage(), which will take a function defined in the current context, convert it to text, insert it into the page context and execute it while passing any arguments you have for it (of most types). If interested, you can find executeInPage() in my answer to Calling webpage JavaScript methods from browser extension and my answer to How to use cloneInto in a Firefox web extension?
The following is my original answer based on the original version of the question, which did not show when the content script was being executed, or explain that the code being added to the page was in a separate file, not in the actual content script.
You state in your question that you "can handle the loading time of Content Script to be at the start/end of the DOM", but you don't make clear why you are unable to resolve your issue by executing your content script at document_start.
You can have your script injected prior to the page you are injecting into being built by specifying document_start for the run_at property in your manifest.json content_scripts entry, or for the runAt option passed to chrome.tabs.executeScript(). If you do this, then your script will start running when document.head and document.body are both null. You can then control what gets added to the page.
For chrome.tabs.executeScript() exactly when your script runs depends on when you execute chrome.tabs.executeScript() in relation to the process of loading the page. Due to the asynchronous nature of the processing (your background script is usually running in a different process), it is difficult to get your script consistently injected when document.head and document.body are both null. The best I've accomplished is to have the script injected sometimes when that is the case, and sometimes after the page is populated, but prior to any other resources being fetched. This timing will work for most things, but if you really need to have your script run prior to the page existing, then you should use a manifest.json content_scripts entry.
With your content script running prior to the existence of the head and body, you can control what gets inserted first. Thus, you can insert your <script> prior to anything else on the page. This should make your script execute prior to any other script in the page context.
I have a Java Web Application, and I'm wondering if the javascript files are downloaded with the HTML-body, or if the html body is loaded first, then the browser request all the JavaScript files.
The reason for this question is that I want to know if importing files with jQuery.getScript() would result in poorer performance. I want to import all files using that JQuery function to avoid duplication of JavaScript-imports.
The body of the html document is retrieved first. After it's been downloaded, the browser checks what resources need to be retrieved and gets those.
You can actually see this happen if you open Chrome Dev Console, go to network tab (make sure caching is disabled and logs preserved) and just refresh a page.
That first green bar is the page loading and the second chunk are the scripts, a stylesheet, and some image resources
The HTML document is downloaded first, and only when the browser has finished downloading the HTML document can it find out which scripts to fetch
That said, heavy scripts that don't influence the appearance of the HTML body directly should be loaded at the end of the body and not in the head, so that they do not block the rendering unless necessary
I'm wondering if the javascript are downloaded with the html body during a request
If it's part of that body then yes. If it's in a separate resource then no.
For example, suppose your HTML file has this:
<script type="text/javascript">
$(function () {
// some code here
});
</script>
That code, being part of the HTML file, is included in the HTML resource. The web server doesn't differentiate between what kind of code is in the file, it just serves the response regardless of what's there.
On the other hand, if you have this:
<script type="text/javascript" src="someFile.js"></script>
In that case the code isn't in the same file. The HTML is just referencing a separate resource (someFile.js) which contains the code. In that case the browser would make a separate request for that resource. Resulting in two requests total.
The HTML document is downloaded first, or at least it starts to download first. While it is parsed, any script includes that the browser finds are downloaded. That means that some scripts may finish loading before the document is completely loaded.
While the document is being downloaded, the browser parses it and displays as much as it can. When the parsing comes to a script include, the parsing stops and the browser will suspend it until the script has been loaded and executed, then the parsing continues. That means that
If you put a call to getScript instead of a script include, the behaviour will change. The method makes an asynchronous request, so the browser will continue parsing the rest of the page while the script loads.
This has some important effects:
The parsing of the page will be completed earlier.
Scripts will no longer run in a specific order, they run in the order that the loading completes.
If one script is depending on another, you have to check yourself that the first script has actually loaded before using it in the other script.
You can use a combination of script includes and getScript calls to get the best effect. You can use regular scripts includes for scripts that other scripts depend on, and getScript for scripts that are not affected by the effects of that method.
I am creating a firefox extension that lets the operator perform various actions that modify the content of the HTML document. The operator does not edit HTML, they take other actions and my extension modifies the document by inserting elements, adding attributes, and so forth.
When the operator is finished, they need to be able to save the HTML document as a file (or have my extension send it to an internet destination, but this is not required since they can email the saved file).
I thought maybe the changes made by the javascript code in my extension would be reflected in the HTML document, but when I ask the firefox browser to "view source" after making modifications, it displays the original HTML text.
My questions are:
#1: What is the easiest way for the operator to save the HTML document with all the changes my extension has made?
#2: What is the easiest way for the javascript code in my extension to process the HTML document contents and write to an HTML file on the local disk?
#3: Is any valid HTML content incapable of accurate representation in the saved file?
#4: Is the TreeWalker part of the solution (see below)?
A couple observations from my research so far:
I've read about the TreeWalker object, which seems to provide a fairly painless way for an extension to walk through everything (?or almost everything?) in the HTML document. But does it expose everything so everything in the original (and my modifications) can be saved without losing anything of importance?
Does the TreeWalker walk through the HTML document in the "correct order" --- the order necessary for my extension to generate the original and/or modified HTML document?
Anything obscure or tricky about these problems?
Ok so I am assuming here you have access to page DOM. What you need to do it basically make changes to the dom and then get all the dom code and save it as a file. Here is how you can download the page's html code. This will create an a tag which the user needs to click for the file to download.
var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
a.setAttribute('href', 'data:text/html,' + code);
Now you can insert this a tag anywhere in the DOM and the file will download when the user clicks it.
Note: This is sort of a hack, this injects entire html of the file in the a tag, it should in theory work in any up to date browser (except, surprise, IE). There are more stable and less hacky ways of doing it like storing it in a file system API file and then downloading that file instead.
Edit: The document.querySelectorAll line accesses the page DOM. For it to work the document must be accessible. You say you are modifying DOM so that should already be there. Make sure you are adding the code on the page and not your extension code. This code will be at the same place as your DOM modification code, not your extension pages that can't access the DOM.
And as for the a tag, it will be inserted in the page. I skipped the steps since I assumed you already know how to manipulate DOM and also because I don't know where you would like to add the link. And you can skip the user action of clicking the link too, but it's a hack and only works in modern browsers. You can insert the a tag somewhere in the original page where user won't see it and then call the a.click() function to simulate a click event on the link. But this is not a legit way and I personally only use it on my practice projects to call click event listeners.
I can only test this on chrome not on FF but try this code, this will not require you to even add the a link to DOM. You need to add this next to the DOM manipulation code. This will work if luck is on your side :)
var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
a.setAttribute('href', 'data:text/html,' + code);
a.click();
There is no easy way to do this with the web API only, at least when you want a result that does not omit stuff like the doctype or comments. You could still write a serializer yourself that goes through document.childNodes and serialized according to the node type (Element.outerHTML, Comment.data and so on).
Luckily, you're writing a Firefox add-on, so you have access to a lot more (powerful) stuff.
While still not 100% perfect, the nsIDocumentEncoder implementations will produce pretty decent results, that should only differ in some whitespace and explicit charset declaration at most (everything else is a bug).
Here is an example on how one might use this component:
function serializeDocument(document) {
const {
classes: Cc,
interfaces: Ci,
utils: Cu
} = Components;
let encoder = Cc['#mozilla.org/layout/documentEncoder;1?type=text/html'].createInstance(Ci.nsIDocumentEncoder);
encoder.init(document, 'text/html', Ci.nsIDocumentEncoder.OutputLFLineBreak | Ci.nsIDocumentEncoder.OutputRaw);
encoder.setCharset("utf-8");
return encoder.encodeToString();
}
If you're writing an SDK add-on, stuff gets more complicated as the SDK abstracts some important stuff away. You'll need to go through the chrome module, and also figure out the active window and tab yourself. Something like Services.wm.getMostRecentWindow("navigator:browser").content.document (Services.jsm) should do the trick.
In XUL overlay add-ons, content.document should suffice to get the document of the currently active tab, and you have Components access already.
Still, you need to let the user choose a file destination, usually through nsIFilePicker and then actually write the file, by using something like a file stream or the fully async OS.File API.
Looks like I get to answer my own question, thanks to someone in mozilla #extdev IRC.
I got totally faked out by "view source". When I didn't see my modifications in the window displayed by "view source", I assumed the browser would not provide the information.
However, guess what? When I "file" ===>> "save page as...", then examine the page contents with a plain text editor... sure enough, that contained the modifications made by my firefox extension! Surprise!
A browser has no direct write access to the local filesystem. The only read access it has is when explicitly provide a file:// URL (see note 1 below)
In your case, we are explicitly talking about javascript - which can read and write cookies and local storage. It can also send stuff back to the server and retrieve it, e.g. using AJAX.
Stuff you put in local storage/cookies is effectively not accessible to other programs (such as email clients).
It is possible to create very long mailto: URLs (see note 2) but only handles inline content in the email and you're going to run into all sorts of encoding issues that you're not ready to deal with.
Hence I'd recommend pursuing storage serverside via AJAX - and look at local storage once you've got this sorted/working.
Note 1: this is not strictly true. a trusted, signed javascript has access to additional functions which may include direct file access.
Note 2: (the limit depends on the browser and the email client - Lotus Notes truncaets the content rather a lot)
I have a few scripts that are common among all html pages for my application. Call this file commonfunctions.js. Each html page will load it as you move around the application along with appending the last modification date for this js file (that's gotten from the server). Firebug is adding the file every time to the list of loaded scripts as well as an eval/seq/# (where # is the number of times this file has been loaded starting at 7 for some reason). For example, if I have 3 pages called one.html, two.html, and three.html each with this line of code:
<script type="text/javascript" src="commonfunctions.js?mod=11/33/2012"></script>
If I were to go from one.html->two.html->one.html->three.html, Firebug would list the scripts loaded as:
commonfunctions.js?mod=11/33/2012
commonfunctions.js?mod=11/33/2012/eval/seq/7
commonfunctions.js?mod=11/33/2012/eval/seq/8
commonfunctions.js?mod=11/33/2012/eval/seq/9
and so on as I visit the three pages more.
Why is this happening and is there a way to stop it? I read that it could be that firebug will make its own url if it doesn't know the url due to an eval() or event attribute; however, these scripts are being loaded via regular tags.
I'm concerned because I'm not sure if this means the browser has now compiled and is executing or storing multiple copies of the same script--very wasteful in both conditions.
The script may have been loaded via script tag, but somewhere within commonfunctions.js a call to eval()has been made. Or three, obviously.