What are your tricks for getting the caching of a web application just right?
Make the expiry date too long and we'll have a lot of stale caches; too short and we risk the servers being overloaded with unnecessary requests.
How do you make sure that every change refreshes all caches?
How do you embed the SVN revision into the code/URL?
Does having multiple versions side by side really help address the version-mismatch problem?
Look at the minify project. It's written in PHP, but you could use it as a blueprint for any language.
Key features:
a config file to combine & minify several JS or CSS files into one
always uses the last-modified date of the most recently modified file in a config group as a URL parameter
An example resource might look like:
<script type="text/javascript" src="/min/?g=js1&1248185458"></script>
which would fetch the 'js1' group of JavaScript files in your configuration with the version number "1248185458", which is really just the last-modified date converted to epoch time.
When you put updated JS files on your production servers, they'll have a new modified date, which automatically becomes a new version number - no stale caches, no manual versioning.
It's a very cool project with some really well-thought-out ideas about optimization and caching. I've modified the process slightly to insert the YUI Compressor into the build process. You can optimize it even more by preventing the last-modified lookups from the browser, by modifying your server's cache headers.
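The mtime-as-version idea is easy to replicate outside PHP. Here is a minimal sketch in Python (the function name and URL shape here are my own for illustration, not part of the minify project itself):

```python
import os

def versioned_url(base_url, file_paths):
    """Build a cache-busting URL whose version parameter is the most
    recent modification time (epoch seconds) of any file in the group."""
    latest = max(int(os.path.getmtime(p)) for p in file_paths)
    return "%s&%d" % (base_url, latest)

# versioned_url("/min/?g=js1", ["js/a.js", "js/b.js"])
# -> "/min/?g=js1&<newest mtime>"
```

Because the parameter only changes when a file actually changes, browsers can cache aggressively with a far-future expiry and still never see a stale group.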
I think you are on the right track with putting version numbers on your JS and CSS files. You may also want to use a build tool like http://ant.apache.org/ or http://nant.sourceforge.net/ to put all of this together for you.
A couple of ways to deal with this issue:
Following the clue given about using version numbers: if that presents difficulties in your build environment, it is just as effective to put a version parameter at the end of your URL. Browsers will treat each URL with a different version parameter as a URL not in their cache and will download the file again. For static content, the servers won't care that the parameter is there.
So, for example, http://mydomain.com/js/main.js can be included in your HTML as http://mydomain.com/js/main.js?v1.5. It might be easier for you to pass version numbers into your server-side scripts and append them onto your client-side include URLs.
The second method I've seen work well is to use a server-side controller to deliver your code. Facebook makes use of this - you will see includes in script tags that end in ".php" all the time.
E.g.
<script src="http://static.ak.connect.facebook.com/js/api_lib/v0.4/FeatureLoader.js.php" type="text/javascript"></script>
Their backend determines what JS needs to be sent to the client based on the environment sent up in the request.
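A minimal sketch of this controller approach as a Python WSGI app (the query parameter, feature names, and JS snippets below are all invented for illustration; Facebook's actual loader is of course far more involved):

```python
from urllib.parse import parse_qs

BASE_JS = "function init() { /* core library */ }\n"
FEATURES = {
    "chat": "function initChat() {}\n",
    "share": "function initShare() {}\n",
}

def loader_app(environ, start_response):
    """One endpoint decides which JS to send based on the request."""
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    body = BASE_JS
    # Append only the feature modules this client asked for.
    for name in qs.get("features", [""])[0].split(","):
        body += FEATURES.get(name, "")
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/javascript"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

The client just includes one script URL (e.g. `/loader.js?features=chat`), and the server assembles the right payload per environment.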
Related
I am providing an infrastructure that requires the developer to include only one simple JavaScript file, which then includes a lot of others, for instance the DOJO toolkit. Later, DOJO loads all/some of my infrastructure files.
When I'm updating the version, I simply tell my clients to include the version number in the <script src="...?ver=1.2"> so the file will not be taken from the cache.
My problem is that while this simple file is reloaded, the rest of my files, which are loaded by DOJO, are still being loaded from the cache.
Is there a way to use the same technique, or maybe another way, to force the browser to take the files from the server this time, and not from the cache?
As usual, I am posting the question and answering it myself. I'm sharing the answer rather than deleting the post so it can help others.
Using dojoConfig property cacheBust is the solution.
var dojoConfig = {
    ...
    cacheBust: "v=1.2.3",
    ...
};
The DOJO documentation states that when you pass true, the current time is appended as a query string, which means every load comes from the server and never from the cache. What we can do instead is pass a constant string, as above (v=1.2.3). This string is appended as the query string as well, giving us control over when the version is loaded from the cache versus the server.
We have a piece of Javascript which is served to millions of browsers daily.
In order to handle the load, we decided to go for Google App Engine.
One particular thing about this piece of Javascript is that it is (very) slightly different per company using our service.
So far we are handling this by serving everything through main.py which basically goes:
- Read the JS static file and print it
- Print custom code
We do this on every load, and the costs are really starting to add up.
Apart from having a static version of the file per customer, is there any other way you can think of to reduce our bill? Would using memcache instead of reading a file reduce the price in any way?
Thanks a lot.
I'm assuming you're paying a lot in instance hours. Reading from the GAE filesystem is rather slow, so the easiest optimization is to read the static file only once at instance startup, keep the JS file in memory (i.e., a module-level global variable), and print that.
Secondly, make sure your JS is being cached by the clients, so when they reload your page you don't have to serve the JS to them again unnecessarily.
Next way is to serve the js file as a static file if possible. This would save you some money if the js file is big and you're consuming CPU cycles just printing it. In this case have your handler that generates the HTML insert the appropriate URL to the appropriate js file instead of regenerating the entire js each time. You'll save money because you won't get charged instance hours for files served as static files, plus they can get cached in the edge cache (GAE's CDN), and you won't get billed anything at all for them.
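To illustrate the first suggestion - read the file once and keep it in instance memory - here is a rough Python sketch (the path, the per-customer hook, and the function names are placeholders, not the poster's actual code):

```python
# Module-level global: survives across requests on the same instance.
_JS_CACHE = None

def get_static_js(path="static/main.js"):
    """Return the shared JS, reading the filesystem only once."""
    global _JS_CACHE
    if _JS_CACHE is None:  # slow filesystem read happens only on first request
        with open(path) as f:
            _JS_CACHE = f.read()
    return _JS_CACHE

def render_js(custom_code, path="static/main.js"):
    """Only the small per-customer part is built per request."""
    return get_static_js(path) + "\n" + custom_code
```

After the first request, every subsequent request on that instance is served from RAM, which is where the instance-hour savings come from.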
Here are some ways to optimize it further without using a CDN.
Yes, do add a memcache layer to cache the whole output, and add an additional instance cache that uses the instance's own memory. This can be done simply with a module-global dict holding your key/value pairs, but you can also use an LRU caching library so you don't overload your instances.
Finally, the cheapest option would be to use a CDN and point the origin to your App Engine app; if your output doesn't change too frequently, you can cache these results for a short or long time.
Here is a complete blog post about instance caching by Ben Kamens:
http://bjk5.com/post/2320616424/layer-caching-in-app-engine-with-memcache-and-cachepy
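The layered lookup described in that post boils down to a pattern like this, sketched in Python (a plain dict stands in for the memcache client here, so the names and tiers are illustrative only):

```python
instance_cache = {}  # this instance's RAM: fastest tier
shared_cache = {}    # stand-in for the memcache client: shared tier

def get_output(key, generate):
    """Check instance memory, then the shared cache, then regenerate."""
    if key in instance_cache:
        return instance_cache[key]
    if key in shared_cache:
        value = shared_cache[key]
    else:
        value = generate()          # expensive step runs only on a full miss
        shared_cache[key] = value
    instance_cache[key] = value     # promote into instance memory
    return value
```

With real memcache, the shared tier also survives instance restarts, so new instances warm up without regenerating anything.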
Assuming you currently serve the JavaScript as static files:
You can use memcache (it reduces cost since the handler serves faster - fewer instances).
You can set cache headers to allow client-side caching (it reduces re-reads - not instances).
You can support advanced HTTP headers (this requires rewriting the Google static files handler; it reduces re-reads, and speeds them up when nothing has changed - fewer or faster instances).
I know this question was asked quite a few times and the most common answers were:
Auto-versioning using the .htaccess file.
Although this is not at all recommended, using a version number as a query parameter.
For example: '/scripts/script1?v=1.0.0'. This makes the browser treat each version as a new URL, but it does the job.
I am handling some post release issues and since we don't follow a software project life cycle as such, we update the site as and when the issues are tested and fixed. So, we may have to update the site several times a day sometimes versus no updates for a week.
I am not sure if there is a way I can still get the benefit of caching while not requiring users to refresh the page or clear their cache to see the latest changes.
Is there a way I can implement the .htaccess solution in ASP.NET, if that's what I need to do?
I really appreciate any help.
Here is the solution I've used for CSS files; it should work fine for JS too:
In the .htaccess, have a rule:
RewriteRule ^(.*)_ver_.*(\..*)$ $1$2 [NC,L]
That takes a file name such as "Style_ver_12345.css" and rewrites it to Style.css.
Then, when you include the file, append the LastWriteTime of the actual file (File.GetLastWriteTime(filePath).Ticks.ToString() is how I do it) as the version number. An example file name would be Style_ver_634909902200823172.css.
This ensures that any change to the file immediately produces a new version number, while the physical file does not need a different name, and the file will still be cached by the browser.
The user would still have to refresh the page, but they wouldn't have to clear their cache. If you needed to, maybe you could force a refresh by having an ajax call that would compare the version number of the script loaded with the version number on the server. A newer version on the server could then force a refresh.
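The same scheme can be sketched in a few lines of Python (using epoch seconds where the answer uses .NET ticks; the function names are mine, and `strip_version` plays the role of the RewriteRule):

```python
import os
import re

def versioned_name(path):
    """Turn 'Style.css' into 'Style_ver_<mtime>.css'."""
    base, ext = os.path.splitext(os.path.basename(path))
    return "%s_ver_%d%s" % (base, int(os.path.getmtime(path)), ext)

def strip_version(name):
    """Server-side equivalent of the RewriteRule: map the versioned
    name back to the physical file name."""
    return re.sub(r"_ver_[^.]*(\.[^.]*)$", r"\1", name)
```

The HTML references the versioned name, while only the plain `Style.css` ever exists on disk.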
I've come across sites with CSS and JS filenames like this:
css_pbm0lsQQJ7A7WCCIMgxLho6mI_kBNgznNUWmTWcnfoE.css
What's causing this or why would you do it?
Edit: Parts of each answer below could apply to this scenario, but given the sites I've found this on, the serving/caching explanation seems the most accurate.
Versioning - making sure that the correct version of static resources is being served.
If you have a high traffic website and you serve lots of users you will have several layers of caching: CDN, caching headers on files, etc.
Sometimes it can be hard to invalidate caches while keeping the same filename. The server might send the correct headers, but the client might disregard them and still load the cached version. Serving a different file name prevents that and ensures you get the correct version of CSS/JS and other static resources.
As you can probably tell, no human came up with that name.
Typically it's the result of combining multiple CSS files into a single file. This is done for performance reasons (requesting one file is faster than requesting two).
The name is likely the result of a deterministic algorithm on the input (i.e. a hash), such that if you perform the combination again but haven't changed the CSS, the output will be given the same name.
When the content (CSS) changes, the name of the output file will change. This is useful because it makes it impossible for a browser to cache the old version.
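A content-hash naming scheme like the one described can be sketched in a few lines of Python (the hash length and `css_` prefix are arbitrary choices for illustration; real pipelines often use base64url-encoded hashes like the one in the question):

```python
import hashlib

def hashed_filename(css_sources, ext=".css"):
    """Combine several CSS sources and name the result after a hash
    of its content, so any change yields a new filename."""
    combined = "\n".join(css_sources)
    digest = hashlib.sha1(combined.encode("utf-8")).hexdigest()[:16]
    return "css_%s%s" % (digest, ext), combined
```

Identical input always produces the identical name, so an unchanged bundle keeps its URL (and its cache entry), while any edit automatically busts the cache.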
It looks like the file was generated, server-side, for minification.
The website you're visiting might have had multiple CSS files (perhaps combined with @import statements) and JS files (jQuery, jQuery UI, jQuery plugins, some custom code, etc). Rather than have the developer manually minify and combine the files, the server might do it for them (ASP.NET 4.5 does this, for example). In this case it uses an arbitrary (random? GUID-based?) filename to ensure it doesn't conflict with anything.
It may be the technology used by the website.
I.e. if you use GWT (Java compiled to JavaScript) or something else that preprocesses code and outputs JavaScript, you are likely to get weird filenames.
I have had some thoughts recently on how to handle shared javascript and css files across a web application.
In a current web application that I am working on, I have quite a large number of different JavaScript and CSS files placed in a folder on the server. Some of the files are reused, while others are not.
On a production site, it's quite stupid to have a high number of HTTP requests and many kilobytes of unnecessary JavaScript and redundant CSS being loaded. The solution is of course to create one big bundled file per page that contains only the necessary code, which is then minimized and sent compressed (gzip) to the client.
There's no problem with creating a bundle of JavaScript files and minimizing them manually if you only do it once, but since the app is continuously maintained and things change and develop, it soon becomes a headache to do this by hand while pushing out new updates that change the JavaScript and/or CSS files in production.
What's a good approach to handle this? How do you handle this in your application?
I built a library, Combres, that does exactly that, i.e. minify, combine etc. It also automatically detects changes to both local and remote JS/CSS files and pushes the latest version to the browser. It's free & open source. Check this article out for an introduction to Combres.
I am dealing with the exact same issue on a site I am launching.
I recently found out about a project named SquishIt (see on GitHub). It is built for the ASP.NET framework. If you aren't using ASP.NET, you can still learn about the principles behind what he's doing here.
SquishIt allows you to create named "bundles" of files and then to render those combined and minified file bundles throughout the site.
CSS files can be categorized and partitioned into logical parts (like common, print, etc.), and then you can use CSS's @import feature to load them. Reusing these small files also makes client-side caching possible.
As for JavaScript, I think you can solve this problem server-side: multiple script files are added to the page, and you can also generate the script file dynamically on the server, but for client-side caching to work, these parts should have different and static addresses.
I wrote an ASP.NET handler some time ago that combines, compresses/minifies, gzips, and caches the raw CSS and Javascript source code files on demand. To bring in three CSS files, for example, it would look like this in the markup...
<link rel="stylesheet" type="text/css"
href="/getcss.axd?files=main;theme2;contact" />
The getcss.axd handler reads the query string and determines which files it needs to read in and minify (in this case, it would look for files called main.css, theme2.css, and contact.css). When it's done reading in the files and compressing them, it stores the big minified string in server-side cache (RAM) for a few hours. It always looks in the cache first so that on subsequent requests it does not have to re-compress.
I love this solution because...
It reduces the number of requests as much as possible
No additional steps are required for deployment
It is very easy to maintain
Only down-side is that all the style/script code will eventually be stored within server memory. But RAM is so cheap nowadays that it is not as big of a deal as it used to be.
Also, one thing worth mentioning: make sure that the query string is not susceptible to any harmful path manipulation (only allow A-Z and 0-9).
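A rough Python equivalent of the handler described above, including the in-memory cache and the restricted character set (the file layout, the crude "minification", and the cache policy are simplified placeholders, not the actual .axd code):

```python
import os
import re

_cache = {}  # server-side RAM cache, keyed by the raw files parameter

def get_css(files_param, css_dir="css"):
    """'main;theme2;contact' -> combined, whitespace-squeezed CSS,
    cached in memory after the first request."""
    if files_param in _cache:
        return _cache[files_param]
    parts = []
    for name in files_param.split(";"):
        # Reject anything outside a safe alphabet to block path tricks.
        if not re.fullmatch(r"[A-Za-z0-9]+", name):
            raise ValueError("illegal file name: %r" % name)
        with open(os.path.join(css_dir, name + ".css")) as f:
            parts.append(f.read())
    combined = "".join(parts)
    minified = re.sub(r"\s+", " ", combined).strip()  # crude stand-in for a real minifier
    _cache[files_param] = minified
    return minified
```

The first request for a given combination pays the file-read and compression cost; every later request is a dictionary lookup.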
What you are talking about is called minification.
There are many libraries and helpers for different platforms and languages to help with this. As you did not post what you are using, I can't really point you towards something more relevant.
Here is one project on Google Code - minify.
Here is an example of a .NET HTTP handler that does all of this on the fly.