JavaScript force file caching

My website has 200k active users daily.
I read an article not too long ago about forcing JavaScript and PHP to cache files. I have never needed to have my files cached before, but now that I am dealing with a massive amount of data being transferred to and from the server, I would like to store some of this data locally on the client side.
I don't know if there are better ways of doing this, but essentially I am considering writing a library using:
HTML5 local storage / manifest if it's available
with a fallback to Java if it's available
with a fallback to Silverlight if it's available.
I am very interested in pursuing this, preferably in JavaScript.
I would like to know how to cache files using JavaScript.
Before anyone thinks I am reinventing the wheel:
(example)
I have several JavaScript files that the browser will not reload after they are updated, because they are cached. With version control, I can manage when a user needs to reload cached data.

See caching in HTTP. Basically, for every request you should specify the Cache-Control header field in the response, indicating how long the content stays fresh. The formal definition of the Cache-Control header field is as follows:
The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response. These directives typically override the default caching algorithms. Cache directives are unidirectional in that the presence of a directive in a request does not imply that the same directive is to be given in the response.
The field is usually specified along the lines of
Cache-Control: private|public, max-age=<seconds>[, no-cache].
public
Indicates that the response MAY be cached by any cache, even if it would normally be non-cacheable or cacheable only within a non-shared cache. (See also Authorization, section 14.8, for additional details.)
private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache. This allows an origin server to state that the specified parts of the response are intended for only one user and are not a valid response for requests by other users. A private (non-shared) cache MAY cache the response. Note: This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.
no-cache
If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests. If the no-cache directive does specify one or more field-names, then a cache MAY use the response to satisfy a subsequent request, subject to any other restrictions on caching. However, the specified field-name(s) MUST NOT be sent in the response to a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent the re-use of certain header fields in a response, while still allowing caching of the rest of the response.
For example, Cache-Control: private, max-age=86400, no-cache lets the client cache a response for up to 86400 seconds (24 hours). However, things may change before that time elapses, so the no-cache directive causes a revalidation each time. It is like the browser asking the server each time: "May I really present your user with the cached content?" Together with the ETag header, this lets you push important changes to your users before previously cached content expires.
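As a rough illustration of how a cache might evaluate that header, here is a minimal JavaScript sketch. parseCacheControl and canServeWithoutRevalidation are hypothetical names, and the logic is deliberately simplified: it ignores other directives such as s-maxage and must-revalidate, and treats any no-cache (even one that lists field-names) as requiring revalidation.

```javascript
// Parse a Cache-Control value into { directiveName: value }, using true for
// directives without a value (e.g. "private", "no-cache").
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(",")) {
    const [rawName, rawValue] = part.trim().split("=");
    if (!rawName) continue; // skip empty segments such as a trailing comma
    directives[rawName.toLowerCase()] =
      rawValue === undefined ? true : rawValue.replace(/^"|"$/g, "");
  }
  return directives;
}

// Decide whether a stored response may be reused without asking the server.
// ageSeconds is how long the response has been sitting in the cache.
function canServeWithoutRevalidation(header, ageSeconds) {
  const d = parseCacheControl(header);
  if (d["no-cache"] || d["no-store"]) return false; // must revalidate every time
  const maxAge = Number(d["max-age"]); // NaN when max-age is absent
  return Number.isFinite(maxAge) && ageSeconds < maxAge;
}
```

With the example header, canServeWithoutRevalidation("private, max-age=86400, no-cache", 10) is false even though the response is only 10 seconds old, which is exactly the "ask each time" behavior described above.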
During revalidation, the client sends the ETag it received earlier (in the If-None-Match request header), and the server compares it with the current ETag of the same resource. If they are the same, the resource has not changed, so the cached copy is still valid and the server can answer 304 Not Modified. If they differ, the resource content has changed, and the new content is sent to the user in the response.
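The server side of that exchange might look roughly like the following sketch, assuming Node.js. Deriving the ETag from a SHA-1 hash of the body is one common choice, not a requirement; respondWithETag is a hypothetical helper, and it compares against a single ETag rather than handling lists or "*" in If-None-Match.

```javascript
const crypto = require("crypto");

// Derive a strong ETag (quoted, as HTTP requires) from the response body.
function makeETag(body) {
  return '"' + crypto.createHash("sha1").update(body).digest("hex") + '"';
}

// Build the response for a GET, given the current body and the request's
// If-None-Match header (undefined when the client has no cached copy).
function respondWithETag(body, ifNoneMatch) {
  const etag = makeETag(body);
  const headers = {
    "Cache-Control": "private, max-age=86400, no-cache",
    ETag: etag,
  };
  if (ifNoneMatch === etag) {
    // Unchanged: no body is sent, the client reuses its cached copy.
    return { status: 304, headers, body: "" };
  }
  // First request, or the content changed: send the full body.
  return { status: 200, headers, body };
}
```

In a Node http server you could call it as respondWithETag(body, req.headers["if-none-match"]) and send the result with res.writeHead(status, headers) followed by res.end(body).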
Read more about HTTP caching:
https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en#validating-cached-responses-with-etags
http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
Meanwhile, note that the use of the Application Cache is mainly applicable if you wish to provide your users with offline content.

In my opinion you would be reinventing the wheel. Instead of trying to create a second cache on top of a browser's built-in cache, you should take advantage of a proxy like CloudFlare to handle caching of static assets for you.
As for the issue of cached files not updating, a common technique to force resources to be re-requested is to add a query string parameter containing the file's last modification time (e.g. /js/script.js?1441538979), which normally forces the browser to re-download the file.
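A minimal Node.js sketch of that cache-busting technique, assuming the script URL maps to a file on disk; versionedUrl and versionedUrlForFile are hypothetical helper names:

```javascript
const fs = require("fs");

// Append a version (the last-modification time in whole seconds) to a URL,
// so the URL changes whenever the file changes and browsers fetch it again.
function versionedUrl(url, mtimeMs) {
  const version = Math.floor(mtimeMs / 1000);
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}${version}`;
}

// Hypothetical convenience wrapper: read the mtime of the file behind the URL.
function versionedUrlForFile(url, filePath) {
  return versionedUrl(url, fs.statSync(filePath).mtimeMs);
}
```

For example, versionedUrl("/js/script.js", 1441538979000) returns "/js/script.js?1441538979". Because the URL only changes when the file does, the files can otherwise keep a long cache lifetime.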

Related

Server side rendering issue over a CDN

I have recently launched a site that uses server-side rendering (with next.js). The site has login functionality: if a user's request includes an authentication cookie, the server renders a logged-in view for that user and returns it to the user's browser. If the request has no authentication cookie, the server renders a logged-out view and returns that to the user's browser.
Currently it works great, but I have hit a snag when trying to serve the site over a CDN. The CDN caches the server's response to speed things up, so the first user to hit the website through the CDN has their logged-in view cached and returned to the browser. Because that view is cached, other users who hit the site then see that user's logged-in view instead of their own. Not ideal.
I'm trying to work out the best way to solve this problem. What is the best-practice way to get around it?
One option I have considered is to always return a logged-out view on the first page visit, do the authentication/login client-side, and from then on always do the authentication on the server. However, this would only work if next.js does server-side rendering only on the first request and lets subsequent requests do all rendering on the client, and I'm not sure whether that's the case.
Thanks, and I would love all the help/suggestions I can get!
UPDATE
From what I can gather from the answers so far, it seems the best way for me to get around this is to serve a CDN-cached logged-out view to every user when they first visit the site. I can then log them in manually from the frontend if an authentication token is present in their cookies. All pages after the first page they land on will have to return a logged-in view. Is this possible with Next.js? Would this be a good way to go about it? Here is a summary of these steps:
1. The user lands on any webpage.
2. A request is made to the server for that page along with the user's cookies.
3. Because this is the first page they are visiting, the cookies are ignored and a "logged out" view is returned to the user's browser (this view will have been cached in the CDN).
4. The frontend then loads the logged-out view. Once loaded, it checks for an authentication token and, if one is present, makes a call to the API to log the user in.
5. Any other page navigation after that is returned from the server as a "logged in" view (i.e. the authentication cookie is not ignored this time). This avoids having to repeat step 4 on every page, which would be annoying for the user.

For well-behaved caching proxies (which your CDN should be), there are two response headers you should use:
Cache-Control: private
Setting this response header means that intermediary proxies are not allowed to cache the response. (The browser can still cache it, if it's appropriate to do so. If you want to prevent any caching, you'd use no-store instead.)
See also: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
Vary: Cookie
This response header indicates that the data in the response is dependent on the Cookie request header. That is, if my request has the header Cookie: asdf and your request has the header Cookie: zxcv, then the requests are considered different, and will be cached independently. Note that using this response header may drastically impact your caching if cookies are used for anything on your domain... and I'd bet that they are.
See also: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary
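A minimal sketch of applying both headers, assuming the server can tell a personalized response from the presence of an auth cookie. The cookie name auth_token and the 300-second TTL for the shared logged-out view are hypothetical values. Vary: Cookie is sent on the logged-out response too: otherwise a CDN could store that copy under the plain URL and hand it to logged-in users as well.

```javascript
// Hypothetical cookie name; use whatever your login code actually sets.
const AUTH_COOKIE = "auth_token";

// True when the raw Cookie request header contains the auth cookie.
function hasAuthCookie(cookieHeader) {
  return (cookieHeader || "")
    .split(";")
    .some((pair) => pair.trim().startsWith(AUTH_COOKIE + "="));
}

// Caching headers for an HTML response, based on the request's Cookie header.
function cacheHeadersFor(cookieHeader) {
  if (hasAuthCookie(cookieHeader)) {
    // Personalized view: only the user's own browser may cache it.
    return { "Cache-Control": "private", Vary: "Cookie" };
  }
  // Logged-out view is the same for everyone, so shared caches may keep it
  // (the 300-second TTL is an arbitrary example). Vary: Cookie keeps caches
  // from serving this copy to requests that do carry cookies.
  return { "Cache-Control": "public, max-age=300", Vary: "Cookie" };
}
```

With next.js, getServerSideProps receives the Node response as context.res, so calling context.res.setHeader for each of these headers should work; check your CDN's documentation for how it honors Vary.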
An alternative...
A common alternative approach these days is to handle all the user facing dynamic data client-side. That way, you can make a request to some API server which has no caching CDN at all. The page is then filled client-side with the data needed. The static parts of the site are served directly from the CDN.
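A minimal browser-side sketch of that alternative. The endpoint https://api.example.com/me and the function loadCurrentUser are hypothetical; credentials: "include" sends the auth cookie along, which for a cross-origin API also requires the API to allow credentialed CORS requests.

```javascript
// Hypothetical API endpoint that returns the current user's data. It is not
// behind the caching CDN; only the static page is.
const USER_API_URL = "https://api.example.com/me";

// Fetch user-specific data after the static page has loaded.
// Resolves to null for a logged-out visitor. fetchImpl defaults to fetch.
async function loadCurrentUser(fetchImpl = fetch) {
  const res = await fetchImpl(USER_API_URL, {
    credentials: "include", // send the auth cookie to the API
    cache: "no-store",      // per-user data: never reuse a cached copy
  });
  if (res.status === 401) return null; // not logged in
  if (!res.ok) throw new Error(`User API responded with ${res.status}`);
  return res.json();
}
```

The page can render its logged-out shell straight from the CDN and then call loadCurrentUser() to fill in the user-specific parts.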

All CDNs rely on the cache headers in the HTTP response to decide what to cache and distribute. Keep these two simple points in mind to get the best performance without losing the benefits of the CDN.
1. No-cache header for dynamic content (HTML responses, APIs, ...):
Make sure that all dynamic content (HTML responses, APIs, ...) is served with the response header Cache-Control: no-cache.
If you're using next.js, you can either serve your app from a custom server (express.js), which gives you full control over the response headers, or change the next.js config.
2. Long-term cache header for static content (JS, CSS, images, ...):
Make sure that all static content (JS, CSS, images, ...) is served with the response header Cache-Control: max-age=31536000.
If you're using next.js, every build gives all assets unique names, so you can safely set a long-term cache for static assets.
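The two rules can be expressed as a small path-based function, sketched below. The /_next/static/ prefix is where next.js serves its build assets by default, but treat it as an assumption about your setup; cacheControlForPath is a hypothetical name.

```javascript
// Assumed prefix for content-hashed build assets (next.js default).
const STATIC_PREFIX = "/_next/static/";

// Pick a Cache-Control value for a request path, following the two rules.
function cacheControlForPath(path) {
  if (path.startsWith(STATIC_PREFIX)) {
    // File names change on every build, so a one-year lifetime is safe.
    return "max-age=31536000";
  }
  // HTML pages and API responses: always revalidate with the server.
  return "no-cache";
}
```

With an express custom server, for example, a middleware such as app.use((req, res, next) => { res.setHeader("Cache-Control", cacheControlForPath(req.path)); next(); }) could apply it to every response.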

Try adding a Cache-Control header to your pages that require authentication:
Cache-Control: private
The private response directive indicates that a resource is user specific—it can still be cached, but only on a client device. For example, a web page response marked as private can be cached by a desktop browser, but not a content delivery network (CDN).

As I understand your question, when a user logs in, the logged-in view gets cached on the CDN, and even after the user logs out, the site is still shown in the logged-in view from the CDN cache.
Here are some solutions to this issue:
Set a TTL (Time To Live) on the CDN so that it automatically invalidates cached data after a specific time.
Since you want to deliver the site fast, you want low latency. To get it, cache only the big files from the website (images, videos, documents, etc.) on the CDN, rather than the entire website. Every user request is then served by your regular server, while the media files come from the CDN. Because the media files come from the CDN cache, the page loads quickly, and authentication still happens on the server side.
Another solution is to invalidate the cookie and the authentication after a certain period of inactivity; after that, when the user returns, the site should render a logged-out view.

How to force no-caching of specific external resources?

My website displays traffic cams, among other resources. They change every minute, so I have to append &nonce= to override the caching and get an update every minute. However, ALL of those images get cached, and the storage profile (specifically image caching) grows into gigabytes quickly.
As the traffic cam resources are out of my control (they don't specify no-cache, but they DO prevent CORS), I see these options to prevent caching of the images (while keeping caching for other resources):
Specify (what?) in the request so that it's not cached.
Using XHR to specify no-cache and then createObjectURL would fail because of CORS. And I can't bypass CORS because it's a PWA, not meant to have a local proxy server.
Override the response (headers!) with some middleware? (Which?)
Clear only the images in the cache every minute. (How?)
A better option I'm missing?
(Using plain JS, no jQuery.)

Leverage browser caching for some CSS and JavaScript files only

Is there any way to enable browser caching for only some CSS and JavaScript files through the .htaccess file?
I have three CSS files:
http://www.example.com/css/main.css
http://www.example.com/css/star_rating.css
http://www.example.com/js/jquery.autocomplete.css
"main.css" may change daily. I want caching only for star_rating.css and jquery.autocomplete.css, not for main.css. How can I achieve this?
Also, is there any way to cache the Google AdSense JavaScript files?
https://www.gstatic.com/swiffy/v7.1/runtime.js
http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js
https://pagead2.googlesyndication.com/pagead/osd.js

Set a Cache-Control header in your HTTP response via .htaccess; this is already answered here: How can i add cache control code to htaccess?
You will need a subsequent rule to reduce the cache interval of main.css, to whatever you need. However, before you go ahead with that...
Personally, I wouldn't bother with such fine-grained control; just set your cache time so that resources are only requested once per typical browsing session (24 hours?). Although some browser caches can be rather large, there's no guarantee a busy user will still have your resources cached the next time they visit your site: if they fill their cache, the less frequently used or stale items will be removed.
For long-term caching strategies, I would just check that ETag support is working on your servers. If a browser already has one of your items cached, it will send the request with an If-None-Match header containing the ETag it holds for your resource.
If the resource has not been modified (if the ETag values match), your server will respond with a 304 (Not Modified) instead of a 200, a good saving for large resources.
You cannot influence the response headers if you hot-link to the Google AdSense JavaScript files instead of hosting them yourself, but I would expect them to have sensible Cache-Control headers (set by Google) anyway.

Is there a way to use browser cache for AngularJS JSON requests ($http/$resource)?

We're developing an app with AngularJS and RESTful services. The data returned by services is changed infrequently and I very much would like to cache responses for a period of time. I'm setting Cache-Control: no-transform, max-age=604800 in the response.
Is there a way to have AngularJS JSON requests ($http/$resource) respect browser cache instead of using completely parallel built-in AngularJS cache (http://www.metaltoad.com/blog/angularjs-vs-browser-http-cache) or angular-cache library (http://angular-data.pseudobry.com/documentation/api/angular-cache)? From what I can see watching the network, by default $http requests are ignoring Cache-Control headers.

The browser will respect the cache time set by the response for that particular asset. Any subsequent GET should look to the cache until the timeout is reached.
It's possible that your devtools are ignoring this.

Where I was stumbling was page reloads, which behave differently.
Let's divide use cases into two:
1. Page hit: simply going to a previously visited page.
Here I see what you see: most of the content is retrieved from cache. Chrome shows it better than Firefox/Firebug. Firebug simply does not show cache hits in the Network panel.
2. Regular page reloads.
Pretty much all browsers have two shortcuts to refresh a page: regular reloads (Ctrl+R in Chrome/Windows) and reloads that ignore the cache (Shift+F5 in Chrome/Windows). I'm talking about regular reloads, since if the cache is ignored, there is nothing to discuss.
What seems to be happening is that the browser issues If-Modified-Since requests for all resources on the page. The server then responds with 304 Not Modified for static resources, and the browser gets them from its cache.
The issue was that we were not handling If-Modified-Since in our services. We were simply setting Cache-Control with the expiration age.
Updating the server code to handle If-Modified-Since resolved the issue.
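The poster's server code isn't shown, so here is a hypothetical JavaScript sketch of what handling If-Modified-Since involves: send Last-Modified alongside the Cache-Control header, and answer 304 Not Modified when the client's copy is at least as new. respondWithLastModified is a made-up name; the Cache-Control value is the one from the question.

```javascript
// Answer a conditional GET using Last-Modified / If-Modified-Since.
// lastModified is a Date for the resource's most recent change.
function respondWithLastModified(body, lastModified, ifModifiedSince) {
  const headers = {
    "Cache-Control": "no-transform, max-age=604800",
    "Last-Modified": lastModified.toUTCString(), // HTTP-date, e.g. "Sun, 06 Sep 2015 12:00:00 GMT"
  };
  // HTTP dates have one-second resolution, so compare whole seconds.
  const modifiedSec = Math.floor(lastModified.getTime() / 1000);
  const sinceMs = ifModifiedSince ? Date.parse(ifModifiedSince) : NaN;
  if (!Number.isNaN(sinceMs) && modifiedSec <= Math.floor(sinceMs / 1000)) {
    return { status: 304, headers, body: "" }; // client's copy is current
  }
  return { status: 200, headers, body }; // no valid condition, or resource changed
}
```

The browser stores the Last-Modified value and echoes it back as If-Modified-Since on a regular reload, so the service only sends the full JSON again when the data has actually changed.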
BTW, here is a background article on browser caching that I found quite useful: https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers

Elegant methods for caching search results from RESTful service?

I have a RESTful web service which I access from the browser using JavaScript. As an example, say that this web service returns a list of all the Message resources assigned to me when I send a GET request to /messages/me. For performance reasons, I'd like to cache this response so that I don't have to re-fetch it every time I visit my Manage Messages web page. The cached response would expire after 5 minutes.
If a Message resource is created "behind my back", say by the system admin, it's possible that I won't know about it for up to 5 minutes, until the cached search response expires and is re-fetched. This is acceptable, because it creates no confusion for me.
However if I create a new Message resource which I know should be part of the search response, it becomes confusing when it doesn't appear on my Manage Messages page immediately. In general, when I knowingly create/delete/update a resource that invalidates a cached search response, I need that cached response to be expired/flushed immediately.
The core problem which I can't figure out:
I see no simple way of connecting the task of creating/deleting/updating a resource with the task of expiring the appropriate cached responses. In this example it seems simple, I could manually expire the cached search response whenever I create/delete/update a(ny) Message resource. But in a more complex system, keeping track of which search responses to expire under what circumstances will get clumsy quickly.

Use ETag and If-None-Match headers to ensure that the client is always accessing the most up-to-date information.
The downside is that you will always make a call to the server to find out whether anything has changed. The entire message will not be re-transmitted if nothing changed; the server will/should simply respond with 304 Not Modified in that case. If the content has changed, the new message(s) will be transmitted in the response.
If the server is responsive (10-50 ms), then most users with a decent latency (50-500 ms) should see no noticeable difference.
This increases the load on the server, as it has to verify for each request whether the received ETag matches the current ETag for that resource. Clients never assume that a resource is valid/stale/expired; they always ping the server and find out.

To quote Phil Karlton: "There are only two hard problems in Computer Science: cache invalidation and naming things."
If you are using a comprehensive data access layer, that would be the place to handle cache invalidation (although it's still not easy). You'd just tie in some cache invalidation logic to your logic for saving a Message so it clears the search cache for the assignee of the message.
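A minimal JavaScript sketch of that idea, assuming a client-side cache of search responses and a save function that knows which searches a Message affects. ResponseCache, searchKeyFor and saveMessage are hypothetical names; the 5-minute TTL comes from the question.

```javascript
// A tiny cache of search responses with a time-to-live and explicit invalidation.
class ResponseCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, which makes testing easier
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: caller must re-fetch
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  invalidate(key) {
    this.entries.delete(key);
  }
}

// Hypothetical cache key for "messages assigned to this user".
function searchKeyFor(assigneeId) {
  return `messages:assignee:${assigneeId}`;
}

// Hypothetical data-access function: save the Message, then drop the cached
// search results of its assignee so the next read goes back to the server.
async function saveMessage(cache, message, sendToServer) {
  await sendToServer(message);
  cache.invalidate(searchKeyFor(message.assignee));
}
```

Keeping the invalidation next to the save logic means each write path only has to name the search keys it affects, rather than every page tracking what to expire.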

The browser cache should automatically invalidate the cache when you do a POST to the same URI that you did a GET from. See this article, particularly the section on POST invalidation.

The simplest solution would be to use a server-side cache (like EhCache, for example) :)
You will have less problems with consistency (as you wouldn't need to push changes to your JavaScript) and expiration.
