When you read about cache busting on deployment (forcing browsers to fetch a file that has changed), the suggested solutions are usually to add an incrementing query string (file.js?v=123) or to rename the file to an MD5 hash of its contents using a build script (for example https://www.npmjs.org/package/grunt-cachebuster).
Why not use Last-Modified or ETag to solve it instead? What are the disadvantages?
Sure, you can use Last-Modified and ETag instead. However, these validation mechanisms require a round-trip HTTP request for every resource, every single time, which can be pretty wasteful. Instead, you really want the browser to keep a local copy and not bother checking in with the server at all for as long as possible.
Last-Modified and ETag mostly just avoid the bandwidth cost of repeatedly transferring the file over the network.
Expires or Cache-Control headers reduce the effective traffic to zero, saving both time and bandwidth, at the cost of possibly serving outdated data. Uniquely identifying each version of a file solves that problem.
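As a rough sketch of the hash-renaming approach from the question (a hypothetical build step; the file names, the dist/ folder and the copy step are made up, and a real project would more likely use a plugin such as grunt-cachebuster):
// Sketch only: fingerprint a file with an MD5 hash of its contents and copy it
// to a hashed name, so the URL changes whenever the content changes.
const crypto = require('crypto');
const fs = require('fs');

const source = 'app.js';                              // hypothetical source file
const contents = fs.readFileSync(source);
const hash = crypto.createHash('md5').update(contents).digest('hex');

const hashedName = `app.${hash}.js`;                  // e.g. app.<md5>.js
fs.copyFileSync(source, `dist/${hashedName}`);        // assumes dist/ already exists
console.log(`Reference ${hashedName} in your HTML and serve it with a far-future Cache-Control.`);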
My website has 200k active users daily
I read an article not too long ago about forcing JavaScript and PHP to cache files. I have never needed to have my files cached before, but now that I am dealing with a massive amount of data being transferred to and from the server, I would like to store some of this data locally on the client side.
I don't know if there are any better ways of doing this, but essentially I am considering writing a library using
HTML5 local storage / manifest if it's available,
with a fallback to Java if it's available,
with a fallback to Silverlight if it's available.
I am very interested in pursuing this, preferably in JavaScript.
I would like to know how to cache files using JavaScript.
Before anyone thinks I am reinventing the wheel, here is an example:
I have several JavaScript files which, if updated, the browser will not reload because they are cached. With version control, I can manage when a user needs to reload cached data.
See caching in HTTP. Basically, for every request you should specify the Cache-Control header field in the response, indicating when fresh content will be available. The formal definition of the Cache-Control header field is as follows:
The Cache-Control general-header field is used to specify directives
that MUST be obeyed by all caching mechanisms along the
request/response chain. The directives specify behavior intended to
prevent caches from adversely interfering with the request or
response. These directives typically override the default caching
algorithms. Cache directives are unidirectional in that the presence
of a directive in a request does not imply that the same directive is
to be given in the response.
The field is usually specified along the lines of
cache-control: private|public, max-age=<seconds>[, no-cache].
public
Indicates that the response MAY be cached by any cache, even if
it would normally be non-cacheable or cacheable only within a non-
shared cache. (See also Authorization, section 14.8, for additional
details.)
private
Indicates that all or part of the response message
is intended for a single user and MUST NOT be cached by a shared
cache. This allows an origin server to state that the specified parts
of the response are intended for only one user and are not a valid
response for requests by other users. A private (non-shared) cache MAY
cache the response. Note: This usage of the word private only controls
where the response may be cached, and cannot ensure the privacy of the
message content.
no-cache
If the no-cache directive does not specify a field-name, then
a cache MUST NOT use the response to satisfy a subsequent request
without successful revalidation with the origin server. This allows an
origin server to prevent caching even by caches that have been
configured to return stale responses to client requests. If the
no-cache directive does specify one or more field-names, then a cache
MAY use the response to satisfy a subsequent request, subject to any
other restrictions on caching. However, the specified field-name(s)
MUST NOT be sent in the response to a subsequent request without
successful revalidation with the origin server. This allows an origin
server to prevent the re-use of certain header fields in a response,
while still allowing caching of the rest of the response.
For example, cache-control: private, max-age=86400, no-cache directs the client to cache a response and reuse it until 86400 seconds (24 hours) have elapsed. However, things may change before that time elapses. The no-cache directive causes a revalidation each time; it is like the browser asking each time, "may I really present your user with the cached content?" Together with the ETag header, you will be able to push important changes to your users before previously cached content expires.
During revalidation, the ETag the browser received earlier is sent back to the server (in an If-None-Match request header) and compared with the resource's current ETag. If they are the same, the resource has not changed and the cache is still valid, so the server replies with 304 Not Modified. If they differ, the resource content has changed, and the new content is returned to the user.
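As a rough server-side sketch (assuming a Node.js/Express server, which is only an illustration and not something this answer prescribes), the headers above plus ETag revalidation could look like this:
// Sketch, assuming Express: send Cache-Control plus an ETag, and answer
// revalidation requests with 304 Not Modified when the content is unchanged.
const express = require('express');
const crypto = require('crypto');
const app = express();

app.get('/data', (req, res) => {
  const body = JSON.stringify({ greeting: 'hello' });  // hypothetical payload
  const etag = crypto.createHash('md5').update(body).digest('hex');

  res.set('Cache-Control', 'private, max-age=86400, no-cache');
  res.set('ETag', etag);

  // Revalidation: the browser sends back the ETag it holds in If-None-Match.
  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end();                      // cached copy is still valid
  }
  res.send(body);                                      // new or changed content
});

app.listen(3000);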
Read more about HTTP caching:
https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en#validating-cached-responses-with-etags
http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
Meanwhile, note that the use of the Application Cache is mainly applicable if you wish to provide your users with offline content.
In my opinion you would be reinventing the wheel. Instead of trying to create a second cache on top of the browser's built-in cache, you should take advantage of a proxy like CloudFlare to handle caching of static assets for you.
As for the issue of cached files not updating, a common technique to force resources to be re-requested is to add a query string parameter containing the file's last modification time (e.g. /js/script.js?1441538979), which normally forces the browser to re-download the file.
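A small sketch of that technique (assuming a Node.js build or templating step; the path and the emitted script tag are just examples):
// Sketch: append the file's last-modified time as a query string so the URL
// changes whenever the file is redeployed.
const fs = require('fs');

const mtime = Math.floor(fs.statSync('public/js/script.js').mtimeMs / 1000);
const scriptUrl = `/js/script.js?${mtime}`;            // e.g. /js/script.js?1441538979
console.log(`<script src="${scriptUrl}"></script>`);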
Is there any way to enable browser caching for only some CSS and JavaScript files through the .htaccess file?
I have three CSS files:
http://www.example.com/css/main.css
http://www.example.com/css/star_rating.css
http://www.example.com/js/jquery.autocomplete.css
"main.css" may be chaged day by day. I want caching for star_rating.css and jquery.autocomplete.css only, not for main.css. How can I achieve this?
Also is there any way to caching google adsense javascript file.
https://www.gstatic.com/swiffy/v7.1/runtime.js
http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js
https://pagead2.googlesyndication.com/pagead/osd.js
Set a Cache-Control header in your HTTP response via .htaccess; this is already answered here: How can I add cache control code to htaccess?
You will need a subsequent rule to reduce the cache interval of main.css, to whatever you need. However, before you go ahead with that...
Personally, I wouldn't bother with such fine granularity; just set your cache time so the resources are only requested once during a typical browsing session (24 hours?). Although some browser caches can be rather large, there's no guarantee a busy user will still have your resources cached the next time they visit your site: if the cache fills up, the less frequently used and stale items are evicted.
For long-term caching strategies I would just check that ETag support is working on your servers. If a browser already has one of your items cached, it will send a conditional request with an If-None-Match header containing the ETag it holds for your resource.
If the resource has not been modified (if the ETag values match), your server will respond with a 304 (Not Modified) instead of a 200, a good saving for large resources.
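A quick way to confirm that behaviour is to request a resource, then repeat the request with If-None-Match and check for a 304. Here is a sketch using Node.js (recent versions ship fetch; the URL is a placeholder, and the code assumes an ES module or async context):
// Sketch: verify that ETag/304 revalidation works for a static resource.
const url = 'https://www.example.com/css/star_rating.css';   // placeholder URL

const first = await fetch(url);
const etag = first.headers.get('etag');                      // assumes the server sends an ETag
console.log('First request:', first.status, 'ETag:', etag);

const second = await fetch(url, { headers: { 'If-None-Match': etag } });
console.log('Conditional request:', second.status);          // 304 if ETag support works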
You cannot influence the response headers if you hot-link to the Google AdSense JavaScript files rather than hosting them yourself, but I would expect them to have sensible cache-control headers (set by Google) anyway.
I am developing a web app using the AngularJS framework. I'm currently trying to figure out how to prevent web caching, and I am doing so by putting a hash in front of my filenames.
What I have seen thus far is that most people only do this for image, JavaScript and CSS files, for instance here:
http://davidtucker.net/articles/automating-with-grunt/#workflowCache
My question is: are there other kinds of files that I should take into consideration?
Don't web browsers cache HTML files as well?
Follow Google's guidelines for Optimizing Caching.
Some key points:
Set caching headers aggressively for all static resources.
For all cacheable resources, we recommend the following settings:
Set Expires to a minimum of one month, and preferably up to one year, in the future. (We prefer Expires over Cache-Control: max-age because it is more widely supported.)
Do not set it to more than one year in the future, as that violates the RFC guidelines.
If you know exactly when a resource is going to change, setting a shorter expiration is okay. But if you think it "might change soon" but don't know when, you should set a long expiration and use URL fingerprinting (described below). Setting caching aggressively does not "pollute" browser caches: as far as we know, all browsers clear their caches according to a Least Recently Used algorithm; we are not aware of any browsers that wait until resources expire before purging them.
Set the Last-Modified date to the last time the resource was changed: If the Last-Modified date is sufficiently far enough in the past, chances are the browser won't refetch it.
Use fingerprinting to dynamically enable caching: For resources that change occasionally, you can have the browser cache the resource until it changes on the server, at which point the server tells the browser that a new version is available. You accomplish this by embedding a fingerprint of the resource in its URL (i.e. the file path). When the resource changes, so does its fingerprint, and in turn, so does its URL. As soon as the URL changes, the browser is forced to re-fetch the resource. Fingerprinting allows you to set expiry dates long into the future even for resources that change more frequently than that. Of course, this technique requires that all of the pages that reference the resource know about the fingerprinted URL, which may or may not be feasible, depending on how your pages are coded.
Read Google's full article for other points, especially regarding inter-operability.
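As one way to combine these points (a sketch assuming an Express server, which Google's article does not prescribe): serve fingerprinted assets with far-future caching and keep the HTML that references them revalidated.
// Sketch, assuming Express: long-lived caching for fingerprinted assets,
// short-lived caching for the HTML that references them. Paths are illustrative.
const express = require('express');
const app = express();

// Fingerprinted files under /assets (e.g. app.3f2a9c.js) never change in place,
// so they can be cached for a year and marked immutable.
app.use('/assets', express.static('dist/assets', {
  maxAge: '365d',
  immutable: true,
}));

// The HTML should be revalidated so users pick up new fingerprinted URLs quickly.
app.get('/', (req, res) => {
  res.set('Cache-Control', 'no-cache');
  res.sendFile('index.html', { root: 'dist' });
});

app.listen(3000);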
Currently, we're using a method to bust browser-cached resources like CSS and JS in a way similar to SE: https://meta.stackexchange.com/questions/112182/how-does-se-determine-the-css-and-js-version-parameter
Anyways, after doing some testing with the HTTP headers, I'm wondering when this is actually necessary. Is this just a relic left over from the 90s, or are there modern browsers that can't read the Last-Modified or ETag HTTP headers?
Caching Issues
Sometimes you are attempting to serve JS or CSS that is volatile and you don't want to, or can't (e.g. when using a CDN), rely on HTTP cache directive headers to make the browser request the new files. Some older browsers don't respond to HTTP cache directives, so if you are targeting them you have limited options. Barring older browsers, some proxy servers strip, invalidate or ignore caching information because they are buggy or are acting as aggressive caches; in those cases HTTP cache-control headers will not work either. Versioning the URL is how you ensure your end users don't get odd, stale functionality until they hit F5.
Volatile JS/CSS resources can come from files/resources that are editable through an administration/configuration panel. Some reasons for this are theming, layout editing, or language definition files for internationalization.
HTTP 1.0
There are legacy systems out there that use it. Consider that Oracle's built-in HTTP server (the EPG gateway) in their RDBMS solution still uses it. Some proxies translate 1.1 requests to 1.0. Ancient browsers still only support 1.0, but that should be largely a non-issue these days.
Whatever the case, HTTP 1.0 uses a different set of cache control mechanisms that were "primitive" compared to HTTP 1.1's offering, and caches relied on a lot of heuristics not specified in the RFC to get caching to work reasonably well. In either case, caching would often cause odd behavior due to stale content being delivered, or the same content being requested repeatedly even though it had not changed.
A note on Pragma: no-cache
Pragma: no-cache works only on REQUESTS, not RESPONSES; a common thing people don't know. It was meant to keep intermediate systems from caching sensitive information. It still has backwards-compatibility support in HTTP 1.1, but shouldn't be used because it is deprecated.
...except where Microsoft says IE doesn't do that: http://support.microsoft.com/kb/234067
Input For Generated Content
Yet another reason is JS or CSS that is generated based on input parameters. Just because the URL includes somefile.js does not mean it needs to be a real file on a file system; it could just be JS that is output from a process. Should that process need to output different content based on parameters, GET parameters are a good way to make that happen.
Consider page versioning. In large applications where pages may be kept for historical or business requirements, it allows the same-named resource to exist, but a specific version can be served when needed. You could save each version in a different file, or you could create a process that outputs the right content with the correct version changes.
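A sketch of that idea (assuming an Express server; the route, the version parameter and the emitted snippet are all made up):
// Sketch, assuming Express: "somefile.js" is generated by a process rather than
// read from disk, and a GET parameter selects which version is emitted.
const express = require('express');
const app = express();

app.get('/somefile.js', (req, res) => {
  const version = req.query.v === '2' ? '2' : '1';     // hypothetical versions
  res.type('application/javascript');
  res.send(`window.APP_VERSION = "${version}";`);      // generated, not a real file
});

app.listen(3000);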
Old Browser Issues
In IE6, AJAX requests would be subject to the browser cache. If you were requesting a service you did not have control over with a URL that didn't change, adding a trivial random string to the URL would circumvent that issue.
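A typical sketch of that workaround from the era (the endpoint is hypothetical):
// Sketch: defeat IE6's caching of XMLHttpRequest GETs by making every request
// URL unique with a throwaway query parameter.
var url = '/some/service?_=' + new Date().getTime();
var xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4) {
    console.log(xhr.status, xhr.responseText);
  }
};
xhr.send();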
Browser Cache Options
If we consider the RFC on HTTP 1.1 for user agent cache settings we also see this:
Many user agents make it possible for users to override the basic
caching mechanisms. For example, the user agent might allow the user
to specify that cached entities (even explicitly stale ones) are
never validated. Or the user agent might habitually add "Cache-
Control: max-stale=3600" to every request. The user agent SHOULD NOT
default to either non-transparent behavior, or behavior that results
in abnormally ineffective caching, but MAY be explicitly configured
to do so by an explicit action of the user.
Altering the URL to version resources could be considered a countermeasure to such an issue. Whether you believe it is worthwhile I will leave up to the reader.
Conclusion
There are reasons to add GET parameters to a file request, but realistically the only reasons to do that now (writing as of 2012) are to supply input parameters for dynamically generated scripts and to overcome issues where you can't control the cache headers.
Personally, I only use them to provide input parameters to processes that dynamically output initialization scripts, but like everything in development there is always some edge case that provides a reason.
How can I check for a JavaScript file in the user's cache? If the user refreshes the page or visits the site again after some time, I don't want that JS file to be downloaded again. Do the JS files get cleaned up after the site is closed?
Whether a JavaScript file is cached depends on how your web server is set up, how the user's browser is set up, and also how any HTTP proxy servers between your server and the user are set up. The only bit you can control is how your server is set up.
If you want the best chance of your JavaScript being cached, then your server needs to send the right HTTP headers with the JavaScript file. Exactly how you do that depends on which web server you are using.
Here are a couple of links that might help:
Apache - http://httpd.apache.org/docs/2.0/mod/mod_expires.html
IIS - http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/0fc16fe7-be45-4033-a5aa-d7fda3c993ff.mspx?mfr=true
Checking whether a JavaScript file is cached only works if the file is from the same domain, as far as I know.
You could then check it like this:
// Ask the HTTP cache only (no network request): run inside an async function
// or an ES module, and use a same-origin URL.
try {
  const response = await fetch(
    "https://YOUR-DOMAIN.org/static/build/js/YOUR-FILE.js",
    { method: "HEAD", cache: "only-if-cached", mode: "same-origin" }
  );
  console.log(response.status); // 200 when the file was served from the cache
} catch (error) {
  console.error(error);         // the file is not in the cache
}
You will get a 200 response back if it is cached; otherwise an error is thrown.
The browser will automatically look after what it has in its own cache. There are various mechanisms you can use to control it, however.
Look into the various HTTP caching headers, such as:
Last-Modified
Expires
ETag
The Expires header is the most critical of these when it comes to client-side caching. If you set a far-future Expires header (e.g. 10 years), the browser will (in theory) not look to the server again for that file. Of course, you then need a method of changing the file name when the contents of the file change. Most people manage this by adding a build number to the file path.
The browser will take care of caching for you.
When the cache is emptied depends partly on the browser settings and partly on the headers you send. If you set an Expires header, then the browser shouldn't re-request the file until it has expired. Reading about HTTP Headers might help you.