When is filename-based browser cache busting actually needed? - javascript

Currently, we're using a method to bust browser-cached resources like CSS and JS in a way similar to SE: https://meta.stackexchange.com/questions/112182/how-does-se-determine-the-css-and-js-version-parameter
Anyway, after doing some testing with the HTTP headers, I'm wondering when this is actually necessary. Is this just a relic left over from the 90s, or are there modern browsers that can't read the Last-Modified or ETag HTTP headers?

Caching Issues
URL-based cache busting matters when you are attempting to serve JS or CSS that is volatile and you don't want to, or can't (e.g. when using a CDN), rely on HTTP cache directive headers to make the browser request the new files. Some older browsers don't respond to HTTP cache directives, so if you are targeting them you have limited options. Beyond older browsers, some proxy servers strip, invalidate, or ignore caching information because they are buggy or are acting as aggressive caches; in those cases HTTP cache control headers will not work. Busting the cache through the URL is just how you ensure your end users don't get odd functionality until they hit F5.
Volatile JS/CSS resources can come from files/resources that are editable through an administration/configuration panel. Some reasons for this are theming, layout editing, or language definition files for internationalization.
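As a rough sketch of this approach (the version value, file name, and path here are hypothetical), the page can append a version GET parameter to its volatile assets so each release looks like a brand new resource to the browser:

// Hypothetical build/deploy version, bumped whenever the volatile assets change.
var ASSET_VERSION = '2012-06-14-3';

function versionedUrl(url) {
  // Append the version as a GET parameter so the browser treats each
  // release as a new resource and bypasses its cached copy.
  return url + (url.indexOf('?') === -1 ? '?' : '&') + 'v=' + encodeURIComponent(ASSET_VERSION);
}

// Example: load a theme stylesheet that administrators can edit at runtime.
var link = document.createElement('link');
link.rel = 'stylesheet';
link.href = versionedUrl('/themes/current/theme.css');
document.head.appendChild(link);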
HTTP 1.0
There are legacy systems out there that use it. Consider that Oracle's built-in HTTP server (the EPG gateway) in their RDBMS solution still uses it. Some proxies translate 1.1 requests to 1.0. Ancient browsers still only support 1.0, but that should be a relative non-issue these days.
Whatever the case, HTTP 1.0 uses a set of cache control mechanisms that are "primitive" compared to HTTP 1.1's offering, and getting caching to work reasonably well involved a lot of heuristics that weren't specified in the RFC. Either way, caching would often cause odd behavior due to stale content being delivered, or the same content being re-requested even though nothing had changed.
A note on Pragma: no-cache
It works only on REQUESTS, not RESPONSES; a common thing people don't know. It was meant to keep intermediate systems from caching sensitive information. It still has backwards-compatibility support in HTTP 1.1, but it shouldn't be used because it is deprecated.
...except where Microsoft says IE doesn't do that: http://support.microsoft.com/kb/234067
Input For Generated Content
Yet another reason is JS or CSS that is generated based on input parameters. Just because the URL includes somefile.js does not mean it needs to be a real file on the file system. It could just be JS that is output from a process. Should that process need to output different content based on its input, GET parameters are a good way to make that happen.
Consider page versioning. In large applications where pages may be kept for historical or business requirements, this allows the same-named resource to exist while a specific version can still be served when needed. You could save each version in a different file, or you could create a process that outputs the right content with the correct version changes.
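As a sketch of the idea (assuming a Node.js/Express server; the route and parameters are hypothetical), somefile.js can simply be a handler that emits different script content per request:

const express = require('express');
const app = express();

// "somefile.js" is not a file on disk; it is generated per request.
app.get('/somefile.js', (req, res) => {
  // Pick the requested page version and locale, with defaults.
  const version = req.query.version || 'latest';
  const locale = req.query.locale || 'en';
  res.type('application/javascript');
  res.send('window.APP_CONFIG = ' + JSON.stringify({ version: version, locale: locale }) + ';');
});

app.listen(3000);

A page would then reference it as, for example, /somefile.js?version=7&locale=fr, and the process decides what to emit.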
Old Browser Issues
In IE6, AJAX requests would be subject to the browser cache. If you were requesting a service you did not have control over with a URL that didn't change, adding a trivial random string to the URL would circumvent that issue.
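A small sketch of that workaround (the service URL is hypothetical); jQuery's cache: false option for $.ajax does essentially the same thing by appending an _={timestamp} parameter for you:

// Append a throwaway "cache buster" parameter so the URL is unique on
// every call and the browser cache can never satisfy the request.
function bustCache(url) {
  var buster = '_=' + new Date().getTime();
  return url + (url.indexOf('?') === -1 ? '?' : '&') + buster;
}

var xhr = new XMLHttpRequest();
xhr.open('GET', bustCache('/remote/service'), true);
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4 && xhr.status === 200) {
    console.log(xhr.responseText);
  }
};
xhr.send();

(IE6 itself would need new ActiveXObject('Microsoft.XMLHTTP') in place of the native XMLHttpRequest.)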
Browser Cache Options
If we consider the RFC on HTTP 1.1 for user agent cache settings we also see this:
Many user agents make it possible for users to override the basic caching mechanisms. For example, the user agent might allow the user to specify that cached entities (even explicitly stale ones) are never validated. Or the user agent might habitually add "Cache-Control: max-stale=3600" to every request. The user agent SHOULD NOT default to either non-transparent behavior, or behavior that results in abnormally ineffective caching, but MAY be explicitly configured to do so by an explicit action of the user.
Altering the URL to version resources could be considered a countermeasure to such an issue. Whether you believe it is worthwhile, I will leave up to the reader.
Conclusion
There are reasons to add GET parameters to a file request, but realistically the only reasons to do that now (writing as of 2012) are to supply input parameters for dynamically generated scripts and to work around cases where you can't control the cache headers.
Personally, I only use them for providing input parameters to processes that dynamically output initialization scripts, but like everything in development there is always some edge case that adds a reason.

Related

How can I make sure that my JavaScript files delivered over a CDN are not altered?

I am working on a scenario in which some JavaScript files are to be hosted on a CDN. I want to have some mechanism so that when these files are downloaded on the user's side, I can ensure that the files were not tampered with and are indeed coming from the specified CDN.
I understand that the task is very easy if I am using SSL, but still, I want to ensure that the right files are served even on HTTP without SSL.
As far as I could find, there is no existing cross-platform mechanism like a digital signature for JavaScript files. Perhaps it's not needed?
Is there some method built into browsers to verify the author of the JavaScript files? Is there anything I can do to do this in a secure way?
As a matter of fact, a feature like this is currently being drafted under the name of Subresource Integrity. Look into the integrity attribute of the <script> tag. While not yet fully adopted across the board, it fulfills just this purpose.
integrity
Contains inline metadata that a user agent can use to verify that a fetched resource has been delivered free of unexpected manipulation. See Subresource Integrity.
Source
Subresource Integrity (SRI) is a security feature that enables browsers to verify that files they fetch (for example, from a CDN) are delivered without unexpected manipulation. It works by allowing you to provide a cryptographic hash that a fetched file must match.
Source
Example:
<script src="https://example.com/example-framework.js"
integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
crossorigin="anonymous"></script>
Note however that this will not protect you against Man in the Middle attacks if you are transferring your resources via plain HTTP. In this case, the hash code can be spoofed by the attacker, rendering the defense against manipulated script files useless.
For this reason, you should always use secure HTTPS connections instead of plain HTTP in addition to the security measures described above.
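The same check also applies to scripts injected from JavaScript; a minimal sketch reusing the URL and hash from the example above:

// The browser still enforces the integrity check before executing the
// dynamically injected script, exactly as with the static <script> tag.
var script = document.createElement('script');
script.src = 'https://example.com/example-framework.js';
script.integrity = 'sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC';
script.crossOrigin = 'anonymous';
document.head.appendChild(script);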
You're looking for subresource integrity checks.
For example, here's the jQuery CDN snippet:
<script src="https://code.jquery.com/jquery-3.1.0.js"
integrity="sha256-slogkvB1K3VOkzAI8QITxV3VzpOnkeNVsKvtkYLMjfk="
crossorigin="anonymous"></script>
Disclaimer: As always, you should only consider these mechanisms to be of any use when using HTTPS, as they can easily be disabled via MitM with plain HTTP.
In addition to the mechanisms in the above answers, you can also use the Content-Security-Policy HTTP response header on the parent page.
http://www.html5rocks.com/en/tutorials/security/content-security-policy/
Content-Security-Policy: script-src 'sha256-qznLcsROx4GACP2dm0UCKCzCG-HiZ1guq6ZZDob_Tng='
There are a few things to note here. The sha*- prefix specifies the algorithm used to generate the hash. In the example above, sha256- is used; CSP also supports sha384- and sha512-. When generating the hash, do not include the <script> tags themselves, only the content between them. Also, capitalization and whitespace matter, including leading and trailing whitespace.
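To illustrate how such a value can be produced, here is a minimal Node.js sketch (the inline script body is hypothetical); the hash must be computed over exactly the text between the tags:

const crypto = require('crypto');

// The exact text between <script> and </script>, without the tags themselves.
// Capitalization and whitespace, including leading/trailing whitespace, must match byte for byte.
const inlineScript = "alert('Hello, world.');";

const hash = crypto.createHash('sha256').update(inlineScript, 'utf8').digest('base64');
console.log("Content-Security-Policy: script-src 'sha256-" + hash + "'");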
Using Chrome 40 or later you can open DevTools then reload your page. The Console tab will contain error messages with the correct sha256 hash for each of your inline scripts.
This mechanism has been around for quite some time, so the browser support is likely pretty good, just be sure to check.
Additionally, if you want to ensure that older non-compliant browsers are not insecure, you can include a synchronous redirect script at the top of the page that is not allowed by the policy.
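A sketch of that fallback (the redirect target is hypothetical): an inline script that is deliberately not whitelisted in the policy, so CSP-enforcing browsers block it while older non-compliant browsers execute it and are sent away:

// Placed inline at the top of the page, but NOT included in the CSP hash list.
// Browsers that enforce the policy refuse to run it; browsers that ignore
// CSP run it and are redirected to a safe page.
window.location.replace('https://example.com/browser-not-supported.html');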
There's an important point about what this kind of signing can and cannot do. It can protect the user from hypothetical attacks in which someone modifies your code. It cannot assure your site that your code is the code being executed. In other words, you still can't trust anything that comes to your site from the client.
If your adversary model permits an attacker to modify JavaScript files as they are delivered from a CDN, then your adversary model permits an attacker to modify the referring source as it is delivered to remove any attempt at verification, to alter the source address to other than the CDN, and/or to remove the reference to the JavaScript entirely.
And let's not open the can of worms of how your application can determine whether the user's resolver is or is not correctly resolving to the CDN via HTTP requests (or any other mechanism that doesn't have a verified chain of trust).
/etc/hosts:
# ...
1.2.3.4 vile-pirates.org trustworthy.cdn
# ...
You can ensure this with Subresource Integrity. Many public CDNs include SRI hashes in the embeddable code offered on their websites. For example, on PageCDN, when you click on a jQuery file on the jQuery CDN page, you get the option to either copy the URL or use a script tag that contains the SRI hash, as below:
<script src="https://pagecdn.io/lib/jquery/3.4.1/jquery.min.js" integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=" crossorigin="anonymous"></script>
On page load, the browser will issue a request for this resource and, on completion of the request, will compare the hash of the received file with the one given as the integrity value in the script tag. If the hashes do not match, the browser will discard the jQuery file.
At the moment, this feature is supported by 91% of browsers worldwide. More details on caniuse.

Prevent local PHP/HTML files preview from executing javascript on server

I have some HTML/PHP pages that include javascript calls.
Those calls point to JS/PHP methods included in a library (Piwik) stored on a remote server.
They are triggered using an http://www.domainname.com/ prefix to point to the correct files.
I cannot modify the source code of the library.
When my own HTML/PHP pages are previewed locally in a browser, I mean using a c:\xxxx kind of path, not a localhost://xxxx one, the remote scripts are still called and do their processing.
I don't want this to happen; those scripts should only execute if they are called from a www.domainname.com page.
Can you help me secure this?
Of course, one can directly bypass this protection by modifying the web pages on the fly with some browser add-on while browsing the real web site, but at least it's a little bit harder to achieve.
I've opened an issue on the Piwik issue tracker, but I would like to protect my web site and its statistics from this issue as soon as possible, while waiting for a further Piwik update.
EDIT
The process I'd like to put in place would be:
Someone opens a page from anywhere other than www.domainname.com
> this page calls a JS method on the remote server (or not, it may be copied locally),
> this script calls a PHP script on the remote server,
> the PHP script says "hey, where the hell are you calling me from? Go away!", or the PHP script just does not execute.
I've tried to play with .htaccess for that, but as any JS script must run on a client, it also blocks the legitimate calls from www.domainname.com.
Untested, but I think you can use php_sapi_name() or the PHP_SAPI constant to detect the interface PHP is using, and do logic accordingly.
Not wanting to sound cheeky, but your situation sounds rather scary and I would advise searching for some PHP configuration best practices regarding security ;)
Edit after the question has been amended twice:
Now the problem is more clear. But you will struggle to secure this if the JavaScript and PHP are not on the same server.
If they are not on the same server, you will be reliant on HTTP headers (like the Referer or Origin header) which are fakeable.
But Piwik already tracks the referrer ("Piwik uses first-party cookies to keep track of some information (number of visits, original referrer, and unique visitor ID)"), so you can discount hits from invalid referrers.
If that is not enough, the standard way of being sure that the request to a web service comes from a verified source is to use a standard Cross-Site Request Forgery prevention technique -- a CSRF "token", sometimes also called "crumb" or "nonce", and as this is analytics software I would be surprised if PIWIK does not do this already, if it is possible with their architecture. I would ask them.
Most web frameworks these days have CSRF token generators and APIs you should be able to make use of; it's not hard to make your own, but if you cannot amend the JS you will have problems passing the token around. Again, the Piwik JS API may have methods for passing session IDs and similar data around.
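For illustration only, a minimal Node.js sketch of issuing and checking such a token (the session handling is hypothetical and deliberately simplified):

const crypto = require('crypto');

// Issue a random token, store it server-side against the session,
// and embed it in the page or pass it along with the JS API calls.
function issueCsrfToken(session) {
  session.csrfToken = crypto.randomBytes(32).toString('hex');
  return session.csrfToken;
}

// Reject any incoming request whose token does not match the one issued.
function verifyCsrfToken(session, submittedToken) {
  if (!session.csrfToken || typeof submittedToken !== 'string') return false;
  const expected = Buffer.from(session.csrfToken);
  const received = Buffer.from(submittedToken);
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}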
Original answer
This can be accomplished with a Content Security Policy to restrict the domains that scripts can be called from:
CSP defines the Content-Security-Policy HTTP header that allows you to create a whitelist of sources of trusted content, and instructs the browser to only execute or render resources from those sources.
Therefore, you can set the script policy to 'self' to only allow scripts from your current domain (the file system) to be executed. Any remote ones will not be allowed.
Normally this would only be available from a source where you can set HTTP headers, but as you are running from the local file system this is not possible. However, you may be able to get around this with an http-equiv <meta> tag:
Authors who are unable to support signaling via HTTP headers can use <meta> tags with http-equiv="X-Content-Security-Policy" to define their policies. HTTP header-based policy will take precedence over tag-based policy if both are present.
Answer after question edit
Look into the Referer or Origin HTTP headers. Referer is available for most requests; however, it is not sent from HTTPS resources in the browser, and if the user has a proxy or privacy plugin installed it may block this header.
Origin is available only for cross-domain XHR requests, or even same-domain ones for some browsers.
You will be able to check that these headers contain your domain where you will want the scripts to be called from. See here for how to do this with htaccess.
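The htaccess rules are not reproduced here, but as a rough illustration of the same Referer/Origin check, here is a hypothetical Node.js/Express middleware (remember that both headers can be faked):

const ALLOWED_SOURCE = /^https?:\/\/(www\.)?domainname\.com(\/|$)/;

// Drop requests whose Origin/Referer does not point at the expected site.
// Both headers are spoofable, so treat this as a speed bump, not real security.
function checkReferer(req, res, next) {
  const source = req.get('Origin') || req.get('Referer') || '';
  if (ALLOWED_SOURCE.test(source)) {
    return next();
  }
  res.status(403).send('Forbidden');
}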
At the end of the day this doesn't make it secure, but, in your own words, it will make it "a little bit harder to achieve".

What files are cached by browsers?

I am developing a web app using the AngularJS framework. I'm currently trying to figure out how to prevent web caching, and I am doing so by setting a hash in front of my filenames.
What I have seen thus far is that most people only do this for image, JavaScript and CSS files, for instance here:
http://davidtucker.net/articles/automating-with-grunt/#workflowCache
My question is: are there other kinds of files that I should take into consideration?
Don't web browsers cache HTML files as well?
Follow Google's guidelines for Optimizing Caching.
Some key points:
Set caching headers aggressively for all static resources.
For all cacheable resources, we recommend the following settings:
Set Expires to a minimum of one month, and preferably up to one year, in the future. (We prefer Expires over Cache-Control: max-age because it is more widely supported.)
Do not set it to more than one year in the future, as that violates the RFC guidelines.
If you know exactly when a resource is going to change, setting a shorter expiration is okay. But if you think it "might change soon" but don't know when, you should set a long expiration and use URL fingerprinting (described below). Setting caching aggressively does not "pollute" browser caches: as far as we know, all browsers clear their caches according to a Least Recently Used algorithm; we are not aware of any browsers that wait until resources expire before purging them.
Set the Last-Modified date to the last time the resource was changed: If the Last-Modified date is sufficiently far enough in the past, chances are the browser won't refetch it.
Use fingerprinting to dynamically enable caching: For resources that change occasionally, you can have the browser cache the resource until it changes on the server, at which point the server tells the browser that a new version is available. You accomplish this by embedding a fingerprint of the resource in its URL (i.e. the file path). When the resource changes, so does its fingerprint, and in turn, so does its URL. As soon as the URL changes, the browser is forced to re-fetch the resource. Fingerprinting allows you to set expiry dates long into the future even for resources that change more frequently than that. Of course, this technique requires that all of the pages that reference the resource know about the fingerprinted URL, which may or may not be feasible, depending on how your pages are coded.
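A minimal Node.js sketch of content-based fingerprinting (the file names are hypothetical; build tools such as the Grunt workflow linked in the question automate the same idea):

const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Build a fingerprinted file name such as "app.3f7a9c1d.js" from the file's
// contents, so the URL changes whenever (and only when) the content changes.
function fingerprint(filePath) {
  const contents = fs.readFileSync(filePath);
  const hash = crypto.createHash('md5').update(contents).digest('hex').slice(0, 8);
  const ext = path.extname(filePath);
  return path.basename(filePath, ext) + '.' + hash + ext;
}

// e.g. fingerprint('public/js/app.js') might return 'app.3f7a9c1d.js'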
Read Google's full article for other points, especially regarding interoperability.

source map HTTP request does not send cookie header

Regarding source maps, I came across a strange behavior in Chromium (build 181620).
In my app I'm using minified jQuery, and after logging in I started seeing HTTP requests for "jquery.min.map" in the server log file. Those requests were lacking cookie headers (all other requests were fine).
Those requests are not even exposed in the Network tab in Developer Tools (which doesn't bug me that much).
The point is, JS files in this app are only supposed to be available to logged-in clients, so in this setup the source maps either won't work or I'd have to move them to a public directory.
My question is: is this a desired behavior (meaning - source map requests should not send cookies) or is it a bug in Chromium?
The String InspectorFrontendHost::loadResourceSynchronously(const String& url) implementation in InspectorFrontendHost.cpp, which is called for loading sourcemap resources, uses the DoNotAllowStoredCredentials flag, which I believe results in the behavior you are observing.
This method is potentially dangerous, so this flag is there for us (you) to be on the safe side and avoid leaking sensitive data.
As a side note, giving jquery.min.js out only to logged-in users (that is, not from a cookieless domain) is not a very good idea in a production environment. I'm not sure about your idea behind this, but if you definitely need to avoid giving the file to clients not visiting your site, you may resort to checking the Referer HTTP request header.
I encountered this problem and became curious as to why certain authentication cookies were not sent in requests for .js.map files to our application.
In my testing using Chrome 71.0.3578.98, if the SameSite cookie attribute is set to either strict or lax for a cookie, Chrome will not send that cookie when requesting the .js.map file. When there is no SameSite restriction, the cookie will be sent.
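For illustration (an Express-style sketch; cookie names and values are hypothetical), this is the difference that mattered in that test:

const express = require('express');
const app = express();

app.get('/login', (req, res) => {
  // Observed to be sent by Chrome with the .js.map request.
  res.cookie('session_id', 'abc123', { httpOnly: true });
  // Withheld from the .js.map request when SameSite is 'lax' or 'strict'.
  res.cookie('auth_token', 'def456', { httpOnly: true, sameSite: 'strict' });
  res.send('logged in');
});

app.listen(3000);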
I'm not aware of any specification of the intended behavior.

Cache the execution of a javascript file

As far as I know it's impossible to achieve the following, but only an expert can confirm this:
I've got page number 1, which requests some user and application data as soon as it loads; page number 2 uses the same script, and it would be wasteful to request the same info again.
I know that the browser caches the script; my question is whether it caches the result of its execution (the data) as well.
The pages don't share the same layout, so it is not possible to just load page number 2 via AJAX.
The browser doesn't automatically cache the result of the script (that would be seriously weird), but you can, by setting (and checking for) cookies, using the new local storage features in modern browsers, etc. Note, though, that cookies are sent to the server on every request, so they increase the size of requests; if you can use local storage, do.
You can "cache" your data, if you use some kind of client side storage like localStorage (see MDN docu for more details).
The Browser itself may also cache your request internally as the ajax request is no different from any other request made by the browser (html docs, images, etc.). So depending on your exact request (including all parameters) the Browser may actually use a cached version of your request to avoid unnecessary calls. Here, however, the usual restrictions and properties of caching apply, so you can not rely on that behaviour!
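A minimal sketch of the localStorage approach (the key name, endpoint, and expiry are hypothetical), caching the fetched data so both pages can reuse it:

var CACHE_KEY = 'userAppData';
var MAX_AGE_MS = 5 * 60 * 1000; // treat the cached data as stale after 5 minutes

function getCachedData(callback) {
  var cached = localStorage.getItem(CACHE_KEY);
  if (cached) {
    var entry = JSON.parse(cached);
    if (Date.now() - entry.savedAt < MAX_AGE_MS) {
      return callback(entry.data); // served from the client-side cache
    }
  }
  // Cache miss (or stale entry): fetch from the server and store the result.
  fetch('/api/user-app-data')
    .then(function (response) { return response.json(); })
    .then(function (data) {
      localStorage.setItem(CACHE_KEY, JSON.stringify({ savedAt: Date.now(), data: data }));
      callback(data);
    });
}

// Page 1 and page 2 can both call this; only the first (or a stale) call hits the server.
getCachedData(function (data) { console.log(data); });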
The browser will not cache your data automatically if your "page" is a new URL.
But it is certainly possible for you to implement it in several ways:
One is to use local storage in newer browsers that support HTML5.
Another is to write your app such that it is a single page with multiple views and transitions,
using AJAX to replace portions of your page (views).
This technique is becoming increasingly popular.
I highly recommend reading "JavaScript Web Applications" by Alex MacCaw to understand JavaScript MVC and how to use JavaScript to create a client-side (browser-based) controller and views and manage caching, state, etc. in the browser. Also look at frameworks like Backbone.js.
http://www.amazon.com/JavaScript-Web-Applications-Alex-MacCaw/dp/144930351X/ref=sr_1_1?s=books&ie=UTF8&qid=1332771002&sr=1-1
I would avoid caching the data unless there are serious performance problems (and even then, I'd rather eliminate the performance problems than cache it). It's premature optimization.
When the data is cached, all kinds of scenarios (stale data, deleted data) must be considered (unless the data is static, but then it's not relevant anyway).
