What files are cached by browsers? - javascript

I am developing a web app using the AngularJS framework. I'm currently trying to figure out how to prevent unwanted web caching, and I am doing so by setting a hash in front of my filenames.
What I have seen thus far is that most people only do this for image, JavaScript and CSS files, for instance here:
http://davidtucker.net/articles/automating-with-grunt/#workflowCache
My question is: are there other kinds of files that I should take into consideration?
Don't web browsers cache HTML files as well?

Follow Google's guidelines for Optimizing Caching.
Some key points:
Set caching headers aggressively for all static resources.
For all cacheable resources, we recommend the following settings:
Set Expires to a minimum of one month, and preferably up to one year, in the future. (We prefer Expires over Cache-Control: max-age because it is more widely supported.)
Do not set it to more than one year in the future, as that violates the RFC guidelines.
If you know exactly when a resource is going to change, setting a shorter expiration is okay. But if you think it "might change soon" but don't know when, you should set a long expiration and use URL fingerprinting (described below). Setting caching aggressively does not "pollute" browser caches: as far as we know, all browsers clear their caches according to a Least Recently Used algorithm; we are not aware of any browsers that wait until resources expire before purging them.
Set the Last-Modified date to the last time the resource was changed: if the Last-Modified date is far enough in the past, chances are the browser won't refetch it.
Use fingerprinting to dynamically enable caching: For resources that change occasionally, you can have the browser cache the resource until it changes on the server, at which point the server tells the browser that a new version is available. You accomplish this by embedding a fingerprint of the resource in its URL (i.e. the file path). When the resource changes, so does its fingerprint, and in turn, so does its URL. As soon as the URL changes, the browser is forced to re-fetch the resource. Fingerprinting allows you to set expiry dates long into the future even for resources that change more frequently than that. Of course, this technique requires that all of the pages that reference the resource know about the fingerprinted URL, which may or may not be feasible, depending on how your pages are coded.
Read Google's full article for other points, especially regarding inter-operability.
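To make the fingerprinting point concrete, here is a minimal sketch of URL fingerprinting as a Node.js build step, in the spirit of the Grunt workflow the question links to. The file names and the 8-character MD5 prefix are illustrative choices, not part of the answer.

```js
// Rename a built asset so its content hash is part of the filename.
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

function fingerprint(file) {
  const contents = fs.readFileSync(file);
  const hash = crypto.createHash('md5').update(contents).digest('hex').slice(0, 8);
  const { dir, name, ext } = path.parse(file);
  const renamed = path.join(dir, `${name}.${hash}${ext}`);
  fs.renameSync(file, renamed);
  return renamed; // references in your HTML must be rewritten to this new name
}

// e.g. fingerprint('dist/js/app.js') might return 'dist/js/app.3f2a9c1b.js'
```

Because the URL changes whenever the contents change, the fingerprinted file can be served with a far-future Expires header without ever going stale.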

Related

Why can't I download a file from a CDN when the browser has cached that file (web)?

CORS is enabled on the CDN node. We already allow CORS on the CDN, so the issue has nothing to do with the server.
The request is blocked by the browser cache before it ever reaches the remote server. When we open a page for the first time, the images are stored in the browser cache. Then we open an image in preview mode and click download; now the browser is too "smart": it detects that this image is already in the cache and gets it straight from the cache.
Can someone explain why the download is blocked even though the server has CORS enabled?
Caching is a complex topic; I suggest looking at the proper documentation.
Documentation
HTTP caching
The performance of web sites and applications can be significantly improved by reusing previously fetched resources. Web caches reduce latency and network traffic and thus lessen the time needed to display resource representations. HTTP caching makes Web sites more responsive.
Types of caches
Caching is a technique that stores a copy of a given resource and serves it back when requested. When a web cache has a requested resource in its store, it intercepts the request and returns a copy of the stored resource instead of redownloading the resource from the originating server. This achieves several goals: it eases the load of the server because it doesn’t need to serve all clients itself, and it improves performance by being closer to the client. In other words, it takes less time to transmit the resource back. For a web site, web caching is a major component in achieving high performance. However, the cache functionality must be configured properly, as not all resources stay identical forever: it's important to cache a resource only until it changes, not longer.
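As a hedged illustration of the scenario in the question (the image URL below is hypothetical): if the file was first loaded via an <img> tag, i.e. a no-CORS request, the cached copy may have been stored without an Access-Control-Allow-Origin header, so a later download attempt that is answered from that cached copy gets blocked even though the CDN itself sends CORS headers. One way to sidestep the cached entry is to ask the browser to revalidate against the network:

```js
const url = 'https://cdn.example.com/photos/123.jpg'; // hypothetical CDN asset

// cache: 'reload' tells the browser to skip the cached entry and go back to
// the network, so the response used here carries the CDN's current headers.
fetch(url, { mode: 'cors', cache: 'reload' })
  .then((res) => res.blob())
  .then((blob) => {
    const link = document.createElement('a');
    link.href = URL.createObjectURL(blob);
    link.download = 'photo.jpg';
    link.click();
  });
```

This is a sketch of one common workaround, not a definitive fix; whether it applies depends on how the image was originally requested and cached.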

How to force no-caching of specific external resources?

My website displays traffic cams amongst other resources. They change every minute. I have to add a &nonce= query parameter to override the caching in order to get an update every minute. However, ALL of those responses get cached, and the storage profile (specifically image caching) grows into gigabytes quickly.
As the traffic cam resources are out of my control (and they don't specify no-cache, but they DO prevent CORS), I see these options to prevent caching of the images (while keeping it for other resources):
Specify something (what?) in the request so that it is not cached (see the sketch after this list).
Use XHR to specify no-cache and createObjectURL, which would fail because of CORS. And I can't bypass CORS because it's a PWA, not meant to have a local proxy server.
Override the response headers with some middleware? (Which?)
Clear only images from the cache every minute. (How?)
A better option I'm missing?
(Using straight JS, no jQuery.)
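A minimal sketch of the first option, assuming the PWA registers a service worker; the traffic-cam hostname cams.example.org is hypothetical. Setting the request cache mode to 'no-store' keeps those images out of the HTTP cache entirely, so storage stops growing, while all other requests are left alone.

```js
// sw.js
self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (url.hostname === 'cams.example.org') {
    // 'no-store' bypasses the HTTP cache in both directions: the image is
    // neither read from nor written to it, so cached images stop piling up.
    event.respondWith(
      fetch(event.request.url, { mode: 'no-cors', cache: 'no-store' })
    );
  }
});
```

This is only a sketch under those assumptions; the same cache option can also be passed to a plain fetch() call outside a service worker.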

Leverage browser caching for some CSS and JavaScript files only

Is there any way to enable browser caching for some CSS and JavaScript files only through the .htaccess file?
I have three css files
http://www.example.com/css/main.css
http://www.example.com/css/star_rating.css
http://www.example.com/js/jquery.autocomplete.css
"main.css" may be chaged day by day. I want caching for star_rating.css and jquery.autocomplete.css only, not for main.css. How can I achieve this?
Also is there any way to caching google adsense javascript file.
https://www.gstatic.com/swiffy/v7.1/runtime.js
http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js
https://pagead2.googlesyndication.com/pagead/osd.js
Set a Cache-Control header in your HTTP response via .htaccess; this is already answered here: How can i add cache control code to htaccess?
You will need a subsequent rule to reduce the cache interval of main.css, to whatever you need. However, before you go ahead with that...
Personally, I wouldn't bother with such sophisticated granularity; just set your cache time so the resources are only requested once for a typical browsing session (24 hours?). Although some browser caches can be rather large, there's no guarantee a busy user will still have your resources cached the next time they visit your site: if they fill their cache, the less frequently used/stale items will be removed.
For long-term caching strategies I would just check that ETag support is working on your servers. If a browser already has one of your items cached, it will send a conditional request with an If-None-Match header carrying the ETag it holds for your resource.
If the resource has not been modified (the ETag values match), your server will respond with a 304 (Not Modified) instead of a 200, which is a good saving for large resources.
You cannot influence the response headers when hot-linking to the Google AdSense JavaScript files rather than hosting them yourself, but I would expect them to have sensible Cache-Control headers (set by Google) anyway.
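For what it's worth, here is the same per-file idea sketched in Express rather than .htaccess, purely as an illustration of "long cache for most files, shorter for main.css"; the directory layout and max-age values are assumptions, not from the linked answer.

```js
const express = require('express');
const path = require('path');
const app = express();

app.use(express.static('public', {
  etag: true, // let the server answer 304 Not Modified for unchanged files
  setHeaders(res, filePath) {
    if (path.basename(filePath) === 'main.css') {
      // main.css changes often, so keep its cache lifetime short (1 day here).
      res.setHeader('Cache-Control', 'public, max-age=86400');
    } else {
      // Everything else is cached for 30 days.
      res.setHeader('Cache-Control', 'public, max-age=2592000');
    }
  },
}));

app.listen(3000);
```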

When is filename-based browser cache-busting actually needed?

Currently, we're using a method to bust browser cached resources like css and js in a way similar to SE: https://meta.stackexchange.com/questions/112182/how-does-se-determine-the-css-and-js-version-parameter
Anyway, after doing some testing with the HTTP headers, I'm wondering when this is actually necessary. Is this just a relic left over from the 90s, or are there modern browsers that can't read the Last-Modified or ETag HTTP headers?
Caching Issues
When you are attempting to serve JS or CSS that is volatile and you don't want to, or can't (e.g. when using a CDN), rely on HTTP cache directive headers to make the browser request the new files. Some older browsers don't respond to HTTP cache directives, so if you are targeting them you have limited options. Barring older browsers, some proxy servers strip, invalidate, or ignore caching information because they are buggy or are acting as aggressive caches. As such, using HTTP cache-control headers will not work. In this case you are just ensuring your end users don't get odd functionality until they hit F5.
Volatile JS/CSS resources can come from files/resources that are editable through an administration/configuration panel. Some reasons for this are theming, layout editing, or language definition files for internationalization.
HTTP 1.0
There are legacy systems out there that use it. Consider that Oracle's built-in HTTP server (the embedded PL/SQL gateway, EPG) in their RDBMS solution still uses it. Some proxies translate 1.1 requests to 1.0. Ancient browsers still only support 1.0, but that should be a relative non-issue these days.
Whatever the case, HTTP 1.0 uses a set of control mechanisms that are "primitive" compared to HTTP 1.1's offering. They involved a lot of heuristic testing that wasn't specified in the RFC to get caching to work reasonably well. In either case, caching would often cause odd behavior due to stale content being delivered or the same content being re-requested even though it had not changed.
A note on pragma:no-cache
It works only on REQUESTS, not RESPONSES; a common thing people don't know. It was meant to keep intermediate systems from caching sensitive information. It is still supported for backwards compatibility in HTTP 1.1, but it shouldn't be used because it is deprecated.
...except where Microsoft says IE doesn't do that: http://support.microsoft.com/kb/234067
Input For Generated Content
Yet another reason is JS or CSS that is generated based on input parameters. Just because the URL includes somefile.js does not mean it needs to be a real file on a file system. It could just be JS that is output from a process. Should that process need to output different content based on its input, GET parameters are a good way to make that happen.
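A small sketch of what that can look like from the page's side; the parameter names (lang, v) and the script path are illustrative, not from the answer.

```js
// somefile.js does not have to exist on disk: the server can generate it,
// and GET parameters select what it outputs (language, version, theme, ...).
const script = document.createElement('script');
script.src = '/somefile.js?lang=' + encodeURIComponent(navigator.language) + '&v=3';
document.head.appendChild(script);
```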
Consider page versioning. In large applications where pages may be kept for historical or business requirements, it allows a resource with the same name to exist, but should a specific version be needed, it can be served on demand. You could save each version in a different file, or you could create a process that outputs the right content with the correct version changes.
Old Browser Issues
In IE6, AJAX requests would be subject to the browser cache. If you were requesting a service you did not have control over with a URL that didn't change, adding a trivial random string to the URL would circumvent that issue.
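A minimal sketch of that workaround in plain JS (written with modern syntax; the throwaway parameter name _ is an arbitrary choice):

```js
// Append a throwaway query parameter so every request has a unique URL and
// can never be answered from the browser cache.
function bustCache(url) {
  const separator = url.includes('?') ? '&' : '?';
  return url + separator + '_=' + Date.now();
}

// e.g. bustCache('/api/traffic') -> '/api/traffic?_=1712345678901'
```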
Browser Cache Options
If we consider the RFC on HTTP 1.1 for user agent cache settings we also see this:
Many user agents make it possible for users to override the basic caching mechanisms. For example, the user agent might allow the user to specify that cached entities (even explicitly stale ones) are never validated. Or the user agent might habitually add "Cache-Control: max-stale=3600" to every request. The user agent SHOULD NOT default to either non-transparent behavior, or behavior that results in abnormally ineffective caching, but MAY be explicitly configured to do so by an explicit action of the user.
Altering the URL for versioning of resources could be considered a countermeasure to such an issue. Whether you believe it is worthwhile I will leave up to the reader.
Conclusion
There are reasons to add GET parameters to a file request, but realistically the only reasons to do so now (writing as of 2012) are to supply input parameters for dynamically generated scripts and to overcome issues where you can't control the cache headers.
Personally, I only use them for providing input parameters to scripts that dynamically output initialization scripts, but like everything in development there is always some edge case that adds a reason.

*Really* deleting cookies with javascript

The way to delete cookies in javascript is to set the expiry date to be in the past. Now this doesn't actually delete the cookie, at least in Firefox. It just means the cookie will be deleted on browser close.
This is a problem for us: we have a product that involves archiving web pages from potentially many sites, with all this content stored on our server. To make sure that pages render properly we include all the JS as well. However, cookies are often set by JS, and given that the page is cached on our server, these cookies are set under our domain.
So over time cookies from dozens of archived sites build up under our domain. And eventually the Cookie header exceeds the max content length, resulting in an HTTP 400 error code.
And because our clients are mostly in corporate environments they never reboot their machines or close their browsers: they can be left on for months. So this "soft" delete doesn't work, at least not reliably.
Is there any way to physically remove cookies intra-session in JavaScript? Or alternatively, is there any way to stop them being set?
It's not possible. Period. I've been struggling with this for several weeks without finding a solution.
Whoever invented the cookie getter/setter should be %insert_painful_punishment_here%.
Internet Exploder in particular is a beast when it comes to deleting cookies. I can't remember the exact issue, but I think it involved https and cookie names containing ;.
All I can offer is a workaround: Send a response body with your 400 response, something like 'please restart your browser'.
In addition to setting the expiration in the past, set the value to an empty string. This will at least reduce the size of the cookie immediately.
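A minimal sketch of that approach; note that the path (and domain, if one was set) must match the attributes the cookie was originally created with, otherwise the browser treats it as a different cookie. The cookie name in the usage line is hypothetical.

```js
// Expire the cookie and blank its value in one go.
function deleteCookie(name, path) {
  document.cookie = name + '=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=' + (path || '/');
}

// Usage: deleteCookie('some_archived_site_cookie');
```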
I would think that cookies should be deleted immediately in all browsers. For example, when I log out of a website, Firefox does not require me to close my browser to delete the cookie that shows that I am logged into the site. If this isn't happening, I suggest you look into Firefox bugs and possibly open a new one with them.
In the meantime, I'd look at my web server and see if it is possible to set the max content length to something higher than it already is.
You could overwrite the cookie with a new one.
"It is because we are NOT using iframes that we have this issue. The cached page is being rendered by our server, so any cookies get set under our domain." --OP
If you have no control over the JavaScript that is setting the cookies (which seems extremely odd; why do you not have control?), you can constantly read and empty the cookies, dumping the data into another, larger store (preferably server-side, or perhaps HTML5 client storage).
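A rough sketch of that read-and-empty idea, using localStorage as the larger store; the interval and key prefix are arbitrary choices, not from the answer.

```js
// Every minute, copy each cookie into localStorage and expire the original,
// keeping the Cookie request header small.
setInterval(() => {
  document.cookie.split('; ').filter(Boolean).forEach((pair) => {
    const eq = pair.indexOf('=');
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? '' : pair.slice(eq + 1);
    localStorage.setItem('stashed-cookie:' + name, value);
    document.cookie = name + '=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/';
  });
}, 60000);
```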
