Caching Javascript inlined in HTML - javascript

Instead of having an external .js file, we can inline Javascript directly in HTML, i.e.
Externalized version
<html>
<body>
<script type="text/javascript" src="/app.js"></script>
</body>
</html>
Inlined version
<html>
<body>
<script type="text/javascript">
// app.js inlined
</script>
</body>
</html>
However, it's not recommended:
https://developer.yahoo.com/performance/rules.html#external
Put javascript and css inline in a single minified html file to improve performance?
The main reason is caching and pre-compiling - in the externalized version, the browser can download, pre-compile and store the file once for multiple pages, while it cannot do the same for inlined version.
However, is it possible to do something along these lines:
Inlined keyed version
<html>
<body>
<script type="text/javascript" hash="abc">
// app.js inlined
</script>
</body>
</html>
That is, do this:
In the first invocation, send the whole script and somehow tell the browser that the script hash is abc
Later, when the browser loads that or other pages containing the same script, it will send this key as a cookie. The server will only render the contents of the script if the key has been received.
That is, if the browser already knows about the script, the server will render just this:
Inlined keyed version, subsequent fetches (of the same or other pages)
<html>
<body>
<script type="text/javascript" hash="abc">
</script>
</body>
</html>
where notably the script contents are empty.
This would allow for shorter script fetching with a natural fallback.
Is the above possible? If not, is some other alternative to the above possible?

I don't know of a way to do what you asked, so I'll provide an alternative that might still suit your needs.
If you're really after a low latency first page load, you could inline the script, and then after the page loads, load the script via url so that it's in the browser cache for future requests. Set a cookie once you've loaded the script by direct url, so that your server can determine whether to inline the script or provide the external script url.
first page load
<script>
// inlined my-script.js goes here.
</script>
<script>
$(function(){
// load it again, so it's in the browser cache.
// notice I'm not executing the script, just loading it.
$.ajax("my-script.js").then(function(){
// set a cookie marking this script as cached
});
});
</script>
second page load
<script src="my-script.js"></script>
Obviously, this has the drawback that it loads the script twice. It also adds additional complexity for you to take care of when you update your script with new code - you need to make sure you address the cookie being for a old version.
I wouldn't bother with all this unless you really feel the need to optimize the first page. It might be worth it in your case.

The Concept
Here's an interesting approach (after being bugged by notifications :P)
You could have the server render your script this way. Notice the weird type attribute. That's to prevent the script from executing. We'll get to that in a second.
<script type="text/cacheable" data-hash="9182n30912830192c83012983xm019283x">
//inline script
</script>
Then create a library that looks for these scripts with weird types, get the innerHTML of these scripts, and execute them in the global context as if they were normally executing (via eval or new Function). This makes them execute like normal scripts. Here's a demo:
<script type="text/cacheable" data-hash="9182n30912830192c83012983xm019283x">
alert(a);
</script>
<script type="text/cacheable" data-hash="9182n30912830192c83012983xm019283x">
alert(b);
</script>
<script>
// Let's say we have a global
var a = "foo";
var b = "bar"
// Getting the source
var scripts = Array.prototype.slice.call(
document.querySelectorAll('script[type="text/cacheable"]')
);
scripts.forEach(function(script){
// Grabbing
var source = script.innerHTML;
// Create a function (mind security on this one)
var fn = new Function(source);
// Execute in the global scope
fn.call(window);
});
</script>
However...
Since you have the script source (the innerHTML), you can cache them somewhere locally (like in localStorage) and use the hash as its identifier. Then you can store the same hash in the cookie, where future page-requests can tell the server "Hey, I have cached script with [hash]. Don't print the script on the page anymore". Then you'll get this in future requests:
<script type="text/cacheable" data-hash="9182n30912830192c83012983xm019283x"></script>
That covers up the first half. The second phase is when your library sees an empty script. The other thing your library should do is when it sees an empty script, it should look up for that script with that hash in your local storage, get the script's source and execute it like you just did in the first place.
The Catch
Now there's always a trade-off in everything, and I'll highlight what I can think of here:
Pros
You only need one request for everything. Initial pageload contains scripts, subsequent pages become lighter because of the missing code, which is already cached by then.
Instant cache busting. Assuming the hash and code are 1:1, then changing the content should change the hash.
Cons
This assumes that pages are dynamic and are never cached. That's because if you happen to create a new script, with new hash, but had the client cache the page, then it will still be using the old hashes thus old scripts.
Initial page load will be heavy due to inlined scripts. But this can be overcome by compressing the source using a minifier on the server. Overhead of minification can also be overcome by caching minified results on the server.
Security. You'll be using eval or new Function. This poses a big threat when unauthorized code manages to sneak in. In addition, the threat is persistent because of the caching.
Out of sync pages. What happens if you get an empty script, whose hash is not in the cache? Perhaps the user deleted local storage? You'll have to issue a request to the server for it. Since you want the source, you'll have to have AJAX.
Scripts are not "normal". Your script is best put at the end of the page so that all inline scripts will be parsed by then. This means your scripts execute late and never in the time they get parsed by the browser.
Storage limits. localStorage has a size limit of 5-10MB, depending on which browser we're talking about. Cookies are limited to 4KB generally.
Request size. Note that cookies are shipped up to the server on request and down to the browser on response. That additional load might be more of a hassle than it is for good.
Added server-side logic. Because you need to know what needs to be added, you need to program your server to do it. This makes the client-side implementation dependent on the server. Switching servers (say from PHP to Python) wouldn't be as easy, as you need to port over the implementation.

If your <script> is not introduced as type=text/javascript, it will simply not be executed.
So you could have many tags like theses:
<script type="text/hashedjavascript" hash="abc">...</script>
<script type="text/hashedjavascript" hash="efg">...</script>
Then when the DOM is loaded, pick one and evaluate it.
I made an example here: http://codepen.io/anon/pen/RNGQEM
But it smells, real bad. It's definitely better to fetch two different files.
Actually what you should do, is have a single file my-scripts.js that contains the code for each of your script, wrapped in a function
// file: my-scripts.js
function script_abc(){
// what script abc is supposed to do
}
function script_efg(){
// what script efg is supposed to do
}
Then execute whatever your cookie tells you to. This is how AMD builders concatenate multiples files in one.
Also look for an AMD library such as requirejs
Edit: I misunderstood your question, removed the irrelevant part.

Related

Are JavaScript files downloaded with the HTML-body

I have a Java Web Application, and I'm wondering if the javascript files are downloaded with the HTML-body, or if the html body is loaded first, then the browser request all the JavaScript files.
The reason for this question is that I want to know if importing files with jQuery.getScript() would result in poorer performance. I want to import all files using that JQuery function to avoid duplication of JavaScript-imports.
The body of the html document is retrieved first. After it's been downloaded, the browser checks what resources need to be retrieved and gets those.
You can actually see this happen if you open Chrome Dev Console, go to network tab (make sure caching is disabled and logs preserved) and just refresh a page.
That first green bar is the page loading and the second chunk are the scripts, a stylesheet, and some image resources
The HTML document is downloaded first, and only when the browser has finished downloading the HTML document can it find out which scripts to fetch
That said, heavy scripts that don't influence the appearance of the HTML body directly should be loaded at the end of the body and not in the head, so that they do not block the rendering unless necessary
I'm wondering if the javascript are downloaded with the html body during a request
If it's part of that body then yes. If it's in a separate resource then no.
For example, suppose your HTML file has this:
<script type="text/javascript">
$(function () {
// some code here
});
</script>
That code, being part of the HTML file, is included in the HTML resource. The web server doesn't differentiate between what kind of code is in the file, it just serves the response regardless of what's there.
On the other hand, if you have this:
<script type="text/javascript" src="someFile.js"></script>
In that case the code isn't in the same file. The HTML is just referencing a separate resource (someFile.js) which contains the code. In that case the browser would make a separate request for that resource. Resulting in two requests total.
The HTML document is downloaded first, or at least it starts to download first. While it is parsed, any script includes that the browser finds are downloaded. That means that some scripts may finish loading before the document is completely loaded.
While the document is being downloaded, the browser parses it and displays as much as it can. When the parsing comes to a script include, the parsing stops and the browser will suspend it until the script has been loaded and executed, then the parsing continues. That means that
If you put a call to getScript instead of a script include, the behaviour will change. The method makes an asynchronous request, so the browser will continue parsing the rest of the page while the script loads.
This has some important effects:
The parsing of the page will be completed earlier.
Scripts will no longer run in a specific order, they run in the order that the loading completes.
If one script is depending on another, you have to check yourself that the first script has actually loaded before using it in the other script.
You can use a combination of script includes and getScript calls to get the best effect. You can use regular scripts includes for scripts that other scripts depend on, and getScript for scripts that are not affected by the effects of that method.

javascript stream wrapper, intercept JS file contents

In PHP there's a function called stream_wrapper_register. With that i can get the file contents of every PHP file that is about to be included. So that basically gives me control over the 'code' that will get parsed.
I was wondering if there's something like this in javascript too? So suppose i include my file:
<script type="text/javascript" src="js/myfile.js"></script>
My code in that file then sets up the stream wrapper (suppose this is available in JS too). Now i want to be able to get the file contents of every other javascript file that will be included:
<script type="text/javascript" src="js/somefile.js"></script>
<script type="text/javascript" src="js/someotherfile.js"></script>
But this ofcourse must happen before before the browser actually executes those files.
So is there a way to intercept that somehow?
$.ajax("/path/to/javascript.js").done(function(source) {
eval(transmogrifySourceCode(source));
});
I used the jQuery syntax because AJAX-style gets are much easier that way, and you'll have to provide your own transmogrifySourceCode function to edit the source before you load it.
I do wonder why you'd want to do, that, though. You should be in full control over your input source, so why not just excise the code you don't want on the server?
No, you can't. Alone for security reasons you won't be allowed to get every script's content.
For Opera, there is a special BeforeScript event which can be listened to from local user scripts.
So there is no (good) way to detect (dynamically added) <script> elements in a page and prevent them from loading and executing a script. Yet you could load the script files by ajax, respecting the same-origin-policy (!), and evaling their modified contents as #DavidEllis suggested.
Elsewise, you need to proxy all script inclusions over your server and modify them there.

How do I protect JavaScript files?

I know it's impossible to hide source code but, for example, if I have to link a JavaScript file from my CDN to a web page and I don't want the people to know the location and/or content of this script, is this possible?
For example, to link a script from a website, we use:
<script type="text/javascript" src="http://somedomain.example/scriptxyz.js">
</script>
Now, is possible to hide from the user where the script comes from, or hide the script content and still use it on a web page?
For example, by saving it in my private CDN that needs password to access files, would that work? If not, what would work to get what I want?
Good question with a simple answer: you can't!
JavaScript is a client-side programming language, therefore it works on the client's machine, so you can't actually hide anything from the client.
Obfuscating your code is a good solution, but it's not enough, because, although it is hard, someone could decipher your code and "steal" your script.
There are a few ways of making your code hard to be stolen, but as I said nothing is bullet-proof.
Off the top of my head, one idea is to restrict access to your external js files from outside the page you embed your code in. In that case, if you have
<script type="text/javascript" src="myJs.js"></script>
and someone tries to access the myJs.js file in browser, he shouldn't be granted any access to the script source.
For example, if your page is written in PHP, you can include the script via the include function and let the script decide if it's safe" to return it's source.
In this example, you'll need the external "js" (written in PHP) file myJs.php:
<?php
$URL = $_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
if ($URL != "my-domain.example/my-page.php")
die("/\*sry, no acces rights\*/");
?>
// your obfuscated script goes here
that would be included in your main page my-page.php:
<script type="text/javascript">
<?php include "myJs.php"; ?>;
</script>
This way, only the browser could see the js file contents.
Another interesting idea is that at the end of your script, you delete the contents of your dom script element, so that after the browser evaluates your code, the code disappears:
<script id="erasable" type="text/javascript">
//your code goes here
document.getElementById('erasable').innerHTML = "";
</script>
These are all just simple hacks that cannot, and I can't stress this enough: cannot, fully protect your js code, but they can sure piss off someone who is trying to "steal" your code.
Update:
I recently came across a very interesting article written by Patrick Weid on how to hide your js code, and he reveals a different approach: you can encode your source code into an image! Sure, that's not bullet proof either, but it's another fence that you could build around your code.
The idea behind this approach is that most browsers can use the canvas element to do pixel manipulation on images. And since the canvas pixel is represented by 4 values (rgba), each pixel can have a value in the range of 0-255. That means that you can store a character (actual it's ascii code) in every pixel. The rest of the encoding/decoding is trivial.
The only thing you can do is obfuscate your code to make it more difficult to read. No matter what you do, if you want the javascript to execute in their browser they'll have to have the code.
Just off the top of my head, you could do something like this (if you can create server-side scripts, which it sounds like you can):
Instead of loading the script like normal, send an AJAX request to a PHP page (it could be anything; I just use it myself). Have the PHP locate the file (maybe on a non-public part of the server), open it with file_get_contents, and return (read: echo) the contents as a string.
When this string returns to the JavaScript, have it create a new script tag, populate its innerHTML with the code you just received, and attach the tag to the page. (You might have trouble with this; innerHTML may not be what you need, but you can experiment.)
If you do this a lot, you might even want to set up a PHP page that accepts a GET variable with the script's name, so that you can dynamically grab different scripts using the same PHP. (Maybe you could use POST instead, to make it just a little harder for other people to see what you're doing. I don't know.)
EDIT: I thought you were only trying to hide the location of the script. This obviously wouldn't help much if you're trying to hide the script itself.
Google Closure Compiler, YUI compressor, Minify, /Packer/... etc, are options for compressing/obfuscating your JS codes. But none of them can help you from hiding your code from the users.
Anyone with decent knowledge can easily decode/de-obfuscate your code using tools like JS Beautifier. You name it.
So the answer is, you can always make your code harder to read/decode, but for sure there is no way to hide.
Forget it, this is not doable.
No matter what you try it will not work. All a user needs to do to discover your code and it's location is to look in the net tab in firebug or use fiddler to see what requests are being made.
From my knowledge, this is not possible.
Your browser has to have access to JS files to be able to execute them. If the browser has access, then browser's user also has access.
If you password protect your JS files, then the browser won't be able to access them, defeating the purpose of having JS in the first place.
I think the only way is to put required data on the server and allow only logged-in user to access the data as required (you can also make some calculations server side). This wont protect your javascript code but make it unoperatable without the server side code
I agree with everyone else here: With JS on the client, the cat is out of the bag and there is nothing completely foolproof that can be done.
Having said that; in some cases I do this to put some hurdles in the way of those who want to take a look at the code. This is how the algorithm works (roughly)
The server creates 3 hashed and salted values. One for the current timestamp, and the other two for each of the next 2 seconds. These values are sent over to the client via Ajax to the client as a comma delimited string; from my PHP module. In some cases, I think you can hard-bake these values into a script section of HTML when the page is formed, and delete that script tag once the use of the hashes is over The server is CORS protected and does all the usual SERVER_NAME etc check (which is not much of a protection but at least provides some modicum of resistance to script kiddies).
Also it would be nice, if the the server checks if there was indeed an authenticated user's client doing this
The client then sends the same 3 hashed values back to the server thru an ajax call to fetch the actual JS that I need. The server checks the hashes against the current time stamp there... The three values ensure that the data is being sent within the 3 second window to account for latency between the browser and the server
The server needs to be convinced that one of the hashes is
matched correctly; and if so it would send over the crucial JS back
to the client. This is a simple, crude "One time use Password"
without the need for any database at the back end.
This means, that any hacker has only the 3 second window period since the generation of the first set of hashes to get to the actual JS code.
The entire client code can be inside an IIFE function so some of the variables inside the client are even more harder to read from the Inspector console
This is not any deep solution: A determined hacker can register, get an account and then ask the server to generate the first three hashes; by doing tricks to go around Ajax and CORS; and then make the client perform the second call to get to the actual code -- but it is a reasonable amount of work.
Moreover, if the Salt used by the server is based on the login credentials; the server may be able to detect who is that user who tried to retreive the sensitive JS (The server needs to do some more additional work regarding the behaviour of the user AFTER the sensitive JS was retreived, and block the person if the person, say for example, did not do some other activity which was expected)
An old, crude version of this was done for a hackathon here: http://planwithin.com/demo/tadr.html That wil not work in case the server detects too much latency, and it goes beyond the 3 second window period
As I said in the comment I left on gion_13 answer before (please read), you really can't. Not with javascript.
If you don't want the code to be available client-side (= stealable without great efforts),
my suggestion would be to make use of PHP (ASP,Python,Perl,Ruby,JSP + Java-Servlets) that is processed server-side and only the results of the computation/code execution are served to the user. Or, if you prefer, even Flash or a Java-Applet that let client-side computation/code execution but are compiled and thus harder to reverse-engine (not impossible thus).
Just my 2 cents.
You can also set up a mime type for application/JavaScript to run as PHP, .NET, Java, or whatever language you're using. I've done this for dynamic CSS files in the past.
I know that this is the wrong time to be answering this question but i just thought of something
i know it might be stressful but atleast it might still work
Now the trick is to create a lot of server side encoding scripts, they have to be decodable(for example a script that replaces all vowels with numbers and add the letter 'a' to every consonant so that the word 'bat' becomes ba1ta) then create a script that will randomize between the encoding scripts and create a cookie with the name of the encoding script being used (quick tip: try not to use the actual name of the encoding script for the cookie for example if our cookie is name 'encoding_script_being_used' and the randomizing script chooses an encoding script named MD10 try not to use MD10 as the value of the cookie but 'encoding_script4567656' just to prevent guessing) then after the cookie has been created another script will check for the cookie named 'encoding_script_being_used' and get the value, then it will determine what encoding script is being used.
Now the reason for randomizing between the encoding scripts was that the server side language will randomize which script to use to decode your javascript.js and then create a session or cookie to know which encoding scripts was used
then the server side language will also encode your javascript .js and put it as a cookie
so now let me summarize with an example
PHP randomizes between a list of encoding scripts and encrypts javascript.js then it create a cookie telling the client side language which encoding script was used then client side language decodes the javascript.js cookie(which is obviously encoded)
so people can't steal your code
but i would not advise this because
it is a long process
It is too stressful
use nwjs i think helpful it can compile to bin then you can use it to make win,mac and linux application
This method partially works if you do not want to expose the most sensible part of your algorithm.
Create WebAssembly modules (.wasm), import them, and expose only your JS, etc... workflow. In this way the algorithm is protected since it is extremely difficult to revert assembly code into a more human readable format.
After having produced the wasm module and imported correclty, you can use your code as you normallt do:
<body id="wasm-example">
<script type="module">
import init from "./pkg/glue_code.js";
init().then(() => {
console.log("WASM Loaded");
});
</script>
</body>

Script puzzle <script src="ajaxpage.php?emp_id=23" />?

Very simple Ajax request taking employee id and returning the user info as HTML dumb.
Request ajax("employee/info?emp_id=3543")
Response id = 3543name = some name
This is just another simple JS trick to populate the UI. However i do not understand how something like below is equally able to execute correctly and dump the HTML code.
<script type="text/javascript" src="employee/info?emp_id=3543" />
When page encounters following code it executes like the ajax request is executed and dumps code into page. Only difference is its no more asynchronous as in case of Ajax.
Questions :
Is this correct approach ? its +ves and -ves.
Which are the correct scenarious to user it?
Is this also means that any HTML tag taking "src" tag can be used like this?
I have used this kind of javascript loading for cross domain scripting. Where it is very useful. Here is an example to show what I mean.
[Keep in mind, that JS does not allow cross domain calls from javascript; due to inbuilt security restrictions]
On domain www.xyz.com there lies a service that give me a list of users which can be accessed from http://xyz.com/users/list?age=20
It returns a json, with a wrapping method like following
JSON:
{username:"user1", age:21}
If I request this json wrapped in a method like as follows:
callMyMethod({username:"user1", age:21})
Then this is a wrapped json which if loads on my page; will try to invoke a method called callMyMethod. This would be allowed in a <script src="source"> kind of declaration but would not be allowed otherwise.
So what I can do is as follows
<script language="javascript" src="http://xyz.com/users/list?age=20"></script>
<script language="javascript">
function callMyMethod(data)
{
//so something with the passed json as data variable.
}
</script>
This would allow me to stuff with JSON coming from other domain, which I wouldn't have been able to do otherwise. So; you see how I could achieve a cross domain scripting which would have been a tough nut to crack otherwise.
This is just one of the uses.
Other reasons why someone would do that is:
To version their JS files with
releases.
To uncache the js files so that they are loaded on client as soon as some changes happen to js and params being passed to URL will try to fetch the latest JS. This would enable new changes getting reflected on client immediatly.
When you want to generate conditional JS.
The usage you have specified in example wouldn't probably serve much purpose; would probably just delay the loading of page if processing by server takes time and instead a async ajax call would be much preferred.
Is this correct approach ? its +ves
and -ves.
Depends whether you want to use asynchronous (ajax) way or not. Nothing like +ve or -ve.
The later method takes more time though.
Which are the correct scenarious to
user it?
Ajax way is the correct method there in that sense.
Is this also means that any HTML tag
taking "src" tag can be used like
this?
src is used to specify the source path. That is what it is meant to do.

How to load JavaScript intermixed with html

So I need to pull some JavaScript out of a remote page that has (worthless) HTML combined with (useful) JavaScript. The page, call it, http://remote.com/data.html, looks something like this (crazy I know):
<html>
<body>
<img src="/images/a.gif" />
<div>blah blah blah</div><br/><br/>
var data = { date: "2009-03-15", data: "Some Data Here" };
</body>
</html>
so, I need to load this data variable in my local page and use it.
I'd prefer to do so with completely client-side code. I figured, if I could get the HTML of this page into a local JavaScript variable, I could parse out the JavaScript code, run eval on it and be good to use the data. So I thought load the remote page in an iframe, but I can't seem to find the iframe in the DOM. Why not?:
<script>
alert(window.parent.frames.length);
alert(document.getElementById('my_frame'));
</script>
<iframe name="my_frame" id='my_frame' style='height:1px; width:1px;' frameBorder=0 src='http://remote.com/data.html'></iframe>
The first alert shows 0, the second null, which makes no sense. How can I get around this problem?
Have you tried switching the order - i.e. iframe first, script next? The script runs before the iframe is inserted into the DOM.
Also, this worked for me in a similar situation: give the iframe an onload handler:
<iframe src="http://example.com/blah" onload="do_some_stuff_with_the_iframe()"></iframe>
Last but not least, pay attention to the cross-site scripting issues - the iframe may be loaded, but your JS may not be allowed to access it.
One option is to use XMLHttpRequest to retrieve the page, although it is apparently only currently being implemented for cross-site requests.
I understand that you might want to make a tool that used the client's internet connection to retrieve the html page (for security or legal reasons), so it is a legitimate hope.
If you do end up needing to do it server-side, then perhaps a simple php page that takes a url as a query and returns a json chunk containing the script in a string. That way if you do find you need to filter out certain websites, you need only do this in one place.
The inevitable problem is that some of the users will be hostile, and they then have a license to abuse what is effectively a javascript proxy. As a result, the safest option may be to do all the processing on the server, and not allow certain javascript function calls (eval, http requests, etc).

Categories

Resources