We have a piece of JavaScript that is served to millions of browsers daily.
In order to handle the load, we decided to go for Google App Engine.
One particular thing about this piece of JavaScript is that it is (very) slightly different for each company using our service.
So far we have been handling this by serving everything through main.py, which basically goes:
- Read the JS static file and print it
- Print custom code
We do this on every load, and the costs are really starting to add up.
Apart from having a static version of the file per customer, is there any other way that you could think about in order to reduce our bill? Would using memcache instead of reading a file reduce the price in any way?
Thanks a lot.
I'm assuming you're paying a lot in instance hours. Reading from the GAE filesystem is rather slow, so the easiest optimization is to read the static file only once, at instance startup, keep the JS file in memory (i.e., a global variable), and print it from there.
Secondly, make sure your JS is being cached by the customers' browsers, so that when they reload your page you don't have to serve the JS to them again unnecessarily.
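The handler in question is Python on GAE, but the pattern is language-neutral; here is a minimal Node-style JavaScript sketch of both ideas, where the file path, the customCodeFor helper, and the one-hour cache lifetime are all assumptions:

// Minimal sketch: read the shared JS once at instance startup, keep it
// in memory, and send caching headers so browsers don't re-request it.
var fs = require('fs');
var http = require('http');

// Read once when the process starts, not on every request.
var sharedJs = fs.readFileSync('./static/main.js', 'utf8');

// Hypothetical helper: look up the small customer-specific snippet.
function customCodeFor(customerId) {
  return '/* custom code for ' + customerId + ' */';
}

http.createServer(function (req, res) {
  var customerId =
    new URL(req.url, 'http://localhost').searchParams.get('c') || 'default';
  res.setHeader('Content-Type', 'application/javascript');
  // Let browsers (and proxies) cache the response for an hour.
  res.setHeader('Cache-Control', 'public, max-age=3600');
  res.end(sharedJs + '\n' + customCodeFor(customerId));
}).listen(8080);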
The next step is to serve the JS as a static file if possible. This would save you money if the JS file is big and you're consuming CPU cycles just printing it. In this case, have the handler that generates the HTML insert the URL of the appropriate JS file instead of regenerating the entire JS each time. You won't get charged instance hours for files served as static files, plus they can get cached in the edge cache (GAE's CDN), in which case those requests may not hit your app at all.
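For instance, the generated HTML could simply reference a per-customer static bundle; the URL pattern here is purely hypothetical:

<!-- served by the static file handler / edge cache, without touching main.py -->
<script src="/static/js/widget.acme-corp.js"></script>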
Here are some ways to optimize it further without using a CDN.
Yes, do add a memcache layer to cache the whole output, and add an additional instance cache that uses the instance's own memory. This can be done simply with a module-level global dict holding your keys/values, but you can also use an LRU-caching library so you don't overload your instances' memory.
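A minimal sketch of that two-layer lookup, again as Node-style JavaScript for illustration; the memcache client object and the rebuild callback are assumptions standing in for your real memcache API and output generation:

// Check instance memory first, then shared memcache, and only rebuild
// the output on a miss in both layers.
var instanceCache = {}; // module-level dict: lives as long as the instance

async function getOutput(key, memcache, rebuild) {
  if (key in instanceCache) return instanceCache[key]; // layer 1: free and fast
  var value = await memcache.get(key);                 // layer 2: shared
  if (value == null) {
    value = await rebuild(key);                        // slow path
    await memcache.set(key, value, 3600);              // share for an hour
  }
  instanceCache[key] = value;                          // warm layer 1
  return value;
}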
Finally, the cheapest option would be a CDN with its origin pointed at your App Engine app; if your output doesn't change too frequently, you could cache the results there for a short or long time.
Here is a complete blog post about instance caching by Ben Kamens:
http://bjk5.com/post/2320616424/layer-caching-in-app-engine-with-memcache-and-cachepy
If you serve the JavaScript as static files (I assume that's what you do now), there are a few options:
- You can use memcache (it reduces cost since the handler serves faster, so you need fewer instances).
- You can use web caching so browsers cache the file (it reduces re-reads, not instances).
- You can support advanced HTTP headers, e.g. conditional requests with ETag / If-Modified-Since (you'd need to rewrite the Google static files handler yourself); this reduces re-reads and speeds up re-reads when the file hasn't changed, so you get fewer or faster instances.
I'm used to working with Java, in which (as we know) each class is defined in its own file (generally speaking). I like this. I think it makes code easier to work with and manage.
I'm beginning to work with JavaScript and I find myself wanting to use separate files for the different scripts I'm using on a single page. I'm currently limiting myself to only a couple of .js files because I'm afraid that using more will inconvenience me later in some way I'm currently failing to foresee. Perhaps circular references?
In short, is it bad practice to break my scripts up into multiple files?
There are lots of correct answers here, depending on the size of your application, whom you're delivering it to (by whom, I mean intended devices, et cetera), and how much work you can do server-side to ensure that you're targeting the correct devices (this is still a long way from 100% viable for most non-enterprise mortals).
When building your application, "classes" can reside in their own files, happily.
When splitting an application across files, or when dealing with classes whose constructors assume too much (like instantiating other classes), circular references or dead-end references ARE a large concern.
There are multiple patterns to deal with this, but the best one, of course, is to build your app with DI/IoC (dependency injection / inversion of control) in mind, so that circular references don't happen.
You can also look into require.js or other dependency-loaders. How intricate you need to get is a function of how large your application is, and how private you would like everything to be.
When serving your application, the baseline for serving JS is to concatenate all of the scripts you need (in the correct order, if you're going to instantiate stuff which assumes other stuff exists), and serve them as one file at the bottom of the page.
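As a hedged illustration of that baseline, a tiny build step can do the concatenation; the file names here are placeholders:

// Concatenate scripts in dependency order into one bundle, which the
// page then loads with a single <script> tag at the bottom.
var fs = require('fs');

var files = ['js/base.js', 'js/widgets.js', 'js/app.js']; // order matters
var bundle = files.map(function (f) {
  return fs.readFileSync(f, 'utf8');
}).join(';\n'); // the ';' guards against files missing a trailing semicolon

fs.writeFileSync('dist/bundle.js', bundle);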
But that's baseline.
Other methods might include "lazy/deferred" loading.
Load all of the stuff that you need to get the page working up-front.
Meanwhile, if you have applets or widgets which don't need 100% of their functionality on page-load, and which in fact require user interaction or a time delay before doing anything, then make loading the scripts for those widgets a deferred event. Load the script for a tabbed widget at the point where the user hits mousedown on the tab. Now you've only loaded the scripts that you need, and only when needed, and nobody will really notice the tiny lag in downloading.
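A minimal sketch of that mousedown-triggered loading; the element id, the script URL, and the TabbedWidget global are all hypothetical:

// Load a widget's script the first time the user shows intent,
// then initialize it once the script arrives.
function loadScript(src, onload) {
  var s = document.createElement('script');
  s.async = true;
  s.src = src;
  s.onload = onload;
  document.head.appendChild(s);
}

var tab = document.getElementById('tab-1');
tab.addEventListener('mousedown', function handler() {
  tab.removeEventListener('mousedown', handler); // only fetch once
  loadScript('/js/tabbed-widget.js', function () {
    window.TabbedWidget.init(tab); // assumed entry point of the widget
  });
});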
Compare this to people trying to stuff 40,000-line applications into one file.
Only one HTTP request and only one download, but the parsing/compiling time now becomes a noticeable fraction of a second.
Of course, lazy-loading is not an excuse for leaving every class in its own file.
At that point, you should be packing them together into modules, and serving the file which will run that whole widget/applet/whatever (unless there are other logical places, where functionality isn't needed until later, and it's hidden behind further interactions).
You could also put the loading of these modules on a timer.
Load the baseline application stuff up-front (again at the bottom of the page, in one file), and then set a timeout for a half-second or so, and load other JS files.
You're now not getting in the way of the page's operation, or of the user's ability to move around. This, of course, is the most important part.
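A sketch of that timer-based approach; the module paths are placeholders, and the delay is the "half-second or so" from above:

// The core bundle is already on the page; after a short delay,
// pull in the non-critical modules without blocking anything.
window.addEventListener('load', function () {
  setTimeout(function () {
    ['/js/comments.js', '/js/share-widget.js'].forEach(function (src) {
      var s = document.createElement('script');
      s.async = true;
      s.src = src; // hypothetical module paths
      document.head.appendChild(s);
    });
  }, 500);
});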
Update from 2020: this answer is very old by internet standards and is far from the full picture today, but still sees occasional votes, so I feel the need to provide some hints on what has changed since it was posted. Good support for async script loading, HTTP/2's server push capabilities, and general browser optimisations to the loading process over the years have all had an impact on how breaking up JavaScript into multiple files affects loading performance.
For those just starting out with Javascript, my advice remains the same (use a bundler / minifier and trust it to do the right thing by default), but for anybody finding this question who has more experience, I'd invite them to investigate the new capabilities brought with async loading and server push.
Original answer from 2013-ish:
Because of download times, you should always try to serve your scripts as a single big file. HOWEVER, if you use a minifier (which you should), it can combine multiple source files into one for you. So you can keep working on multiple files, then minify them into a single file for distribution.
The main exception to this is public libraries such as jQuery, which you should always load from public CDNs (more likely the user has already loaded them, so doesn't need to load them again). If you do use a public CDN, always have a fallback for loading from your own server if that fails.
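A common shape for that fallback; the local path is a placeholder, and the jQuery version is just illustrative (document.write is old-school, matching the era of this answer):

<script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
<script>
  // If the CDN failed, window.jQuery won't exist; load our own copy.
  window.jQuery || document.write(
    '<script src="/js/vendor/jquery-3.7.1.min.js"><\/script>');
</script>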
As noted in the comments, the true story is a little more complex;
Scripts can be loaded synchronously (<script src="blah"></script>) or asynchronously (s=document.createElement('script');s.async=true;...). Synchronous scripts block loading other resources until they have loaded. So for example:
<script src="a.js"></script>
<script src="b.js"></script>
will request a.js, wait for it to load, then load b.js. In this case, it's clearly better to combine a.js with b.js and have them load in one fell swoop.
Similarly, if a.js has code to load b.js, you will have the same situation no matter whether they're asynchronous or not.
But if you load them both at once and asynchronously, it can be faster, depending on the state of the client's connection to the server and a whole bunch of considerations that can only truly be determined by profiling.
(function (d) {
  // Grab an existing <script> element so we can insert before it.
  var s = d.getElementsByTagName('script')[0],
      f = d.createElement('script');
  f.type = 'text/javascript';
  f.async = true; // don't block parsing while this downloads
  f.src = 'a.js';
  s.parentNode.insertBefore(f, s);
  // Build the second script the same way; a.js and b.js now download
  // in parallel without blocking each other.
  f = d.createElement('script');
  f.type = 'text/javascript';
  f.async = true;
  f.src = 'b.js';
  s.parentNode.insertBefore(f, s);
})(document);
It's much more complicated, but will load both a.js and b.js without blocking each other or anything else. Eventually the async attribute will be supported properly, and you'll be able to do this as easily as loading synchronously. Eventually.
There are two concerns here: a) ease of development, and b) client-side performance while downloading JS assets.
As far as development is concerned, modularity is never a bad thing; there are also JavaScript module loaders (like RequireJS, which implements the AMD format) you can use to help manage your modules and their dependencies.
However, to address the second point, it is better to combine all your JavaScript into a single file and minify it, so that the client doesn't spend too much time downloading all your resources. There are tools (RequireJS's optimizer, for example) that let you do this as well (i.e., combine all your dependencies into a single file).
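For instance, a minimal RequireJS optimizer (r.js) build file might look like this, with hypothetical paths; you'd run it with node r.js -o build.js:

// build.js
({
  baseUrl: 'js',            // where the module files live
  name: 'main',             // entry-point module
  out: 'dist/main-built.js' // single combined, minified output file
})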
It depends on the protocol you are using. If you are using HTTP/2, I suggest splitting the JS into multiple files. If you are on HTTP/1.x, I advise serving a single minified JS file.
Here is a sample of a website using HTTP/1.x and HTTP/2.
Thanks, hope it helps.
It does not really matter much. If you use the same JavaScript on multiple pages, it can certainly be good to keep it in one shared file to fetch from, so you only need to update the script in one place.
Which would result in greater speed/efficiency: Referencing one JavaScript file for all files in the directory OR referencing a different JavaScript file for each file in the directory?
So basically, referencing the same JavaScript file in all web pages vs a unique JavaScript file for every webpage.
Note: I thought that referencing the single file would be slower, as there is code in there that is irrelevant to some pages, thus running useless code and making things less efficient.
There are tradeoffs involved so you may ultimately need to measure your specific circumstances to be sure. But, I'll explain some of the tradeoffs.
1. If you have giant amounts of data or giant amounts of code that are only used in one or a few pages, then you will probably want to separate that out into its own file just so you can ONLY load it, initialize it and have it take memory when it's actually needed. But note that with the amount of memory in modern computers (even phones these days), the data or code has to be pretty large to warrant a separate download.
2. Other than item 1, you pretty much always want to optimize for maximum caching efficiency. Retrieving a file (even a larger file than needed) from the cache is so massively much faster than retrieving any file (even a small file) over the network that you really want to optimize for caching. And the time to retrieve these files generally dwarfs any of the JS parse time (CPUs are pretty fast these days), so triggering an extra download to save some JS parse time is unlikely to be faster.
3. The best way to optimize for caching is to have most of your pages reference the same common script files. Then they get loaded once when the viewer first hits your site, and all subsequent loads come right from the browser cache. This is ideal. This caching efficiency easily overcomes having some unused or untriggered code in the master file that is not used on some pages.
4. Lots of small downloads (even from the cache) is less efficient than one larger download. More separate requests generally just isn't as efficient for either the browser or the server. So combining JS files into larger concatenated files is generally a good thing.
5. There are limits to all of this. If you had completely separate code for 100 separate pages all concatenated together, and each piece of code would search the DOM for multiple page elements (and not find them 99% of the time), then that's probably not an efficient way to do things either. But usually you can make your shared code smarter than that by breaking things into categories based on a high-level class name. So, for example, based on the presence of a class name on the <body> tag, you would then run only part of the initialization code, skipping the rest because its classification is not present (see the sketch after this list). So, when combining code, much of which won't be relevant on any given page, it's wise to be smart in how you decide what initialization code in the shared file to actually run.
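A hedged sketch of that dispatch pattern; the class names and initializers are placeholders:

// Shared bundle: each page tags <body> with a class, and only the
// matching initializer runs, skipping the rest.
var initializers = {
  'page-home':    function () { /* home-page setup */ },
  'page-contact': function () { /* contact-form setup */ },
  'page-gallery': function () { /* gallery setup */ }
};

Object.keys(initializers).forEach(function (cls) {
  if (document.body.classList.contains(cls)) {
    initializers[cls](); // run only what this page needs
  }
});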
You need to measure for your specific case, as every site/page has its own balance between loading fewer files and loading extra unnecessary scripts (the same applies to CSS too).
Generally a single file is faster over HTTP/1.x, as there are restrictions on the total number of parallel downloads; HTTP/2 should remove most of the difference.
I have had some thoughts recently on how to handle shared JavaScript and CSS files across a web application.
In the web application I am currently working on, I have quite a large number of different JavaScript and CSS files placed in a folder on the server. Some of the files are reused, while others are not.
On a production site, it's quite wasteful to make a high number of HTTP requests and load many kilobytes of unnecessary JavaScript and redundant CSS. The solution, of course, is to create one big bundled file per page that contains only the necessary code, which is then minified and sent compressed (gzip) to the client.
There's no trouble in bundling and minifying the JavaScript files by hand if you were only going to do it once, but since the app is continuously maintained and things change and develop, it soon becomes a headache to do this manually while pushing out updates that change the JavaScript and/or CSS files in production.
What's a good approach to handle this? How do you handle this in your application?
I built a library, Combres, that does exactly that, i.e., minify, combine, etc. It also automatically detects changes to both local and remote JS/CSS files and pushes the latest to the browser. It's free and open source. Check this article out for an introduction to Combres.
I am dealing with the exact same issue on a site I am launching.
I recently found out about a project named SquishIt (see it on GitHub). It is built for the ASP.NET framework. If you aren't using ASP.NET, you can still learn about the principles behind what he's doing here.
SquishIt allows you to create named "bundles" of files and then to render those combined and minified file bundles throughout the site.
CSS files can be categorized and partitioned into logical parts (like common, print, etc.), and then you can use CSS's @import feature to load them. Reusing these small files also makes client-side caching possible.
When it comes to JavaScript, I think you can solve this problem server-side rather than adding multiple script files to the page: you can dynamically generate the combined script file on the server, but for client-side caching to work, these parts should have distinct, stable addresses.
I wrote an ASP.NET handler some time ago that combines, compresses/minifies, gzips, and caches the raw CSS and JavaScript source code files on demand. To bring in three CSS files, for example, it would look like this in the markup...
<link rel="stylesheet" type="text/css"
href="/getcss.axd?files=main;theme2;contact" />
The getcss.axd handler reads in the query string and determines which files it needs to read in and minify (in this case, it would look for files called main.css, theme2.css, and contact.css). When it's done reading in the file and compressing it, it stores the big minified string in server-side cache (RAM) for a few hours. It always looks in cache first so that on subsequent requests it does not have to re-compress.
I love this solution because:
- It reduces the number of requests as much as possible
- No additional steps are required for deployment
- It is very easy to maintain
The only downside is that all the style/script code will eventually be stored in server memory. But RAM is so cheap nowadays that it is not as big of a deal as it used to be.
Also, one thing worth mentioning: make sure the query string is not susceptible to any harmful path manipulation (only allow A-Z and 0-9).
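The original handler is ASP.NET, but here is a hedged Node-style sketch of the same on-demand combine/minify/cache idea, including the query-string sanitization just mentioned; minifyCss is a naive stand-in for a real minifier:

// GET /getcss?files=main;theme2;contact
var fs = require('fs');
var http = require('http');

var cache = {}; // server-side RAM cache, keyed by the file list

function minifyCss(css) { // stand-in: drop comments, collapse whitespace
  return css.replace(/\/\*[\s\S]*?\*\//g, '').replace(/\s+/g, ' ');
}

http.createServer(function (req, res) {
  var names =
    new URL(req.url, 'http://localhost').searchParams.get('files') || '';
  // Only allow safe names so the query string can't walk the filesystem.
  if (!/^[A-Za-z0-9;]+$/.test(names)) { res.statusCode = 400; return res.end(); }
  if (!(names in cache)) { // look in cache first; compress only on a miss
    cache[names] = minifyCss(names.split(';').map(function (n) {
      return fs.readFileSync('./css/' + n + '.css', 'utf8');
    }).join('\n'));
  }
  res.setHeader('Content-Type', 'text/css');
  res.end(cache[names]);
}).listen(8080);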
What you are talking about is called minification.
There are many libraries and helpers for different platforms and languages to help with this. As you did not post what you are using, I can't really point you towards something more relevant to yourself.
Here is one project on Google Code: minify.
Here is an example of a .NET HTTP handler that does all of this on the fly.
What are your tricks for getting the caching part of a web application just right?
Make the expiry date too long and we'll have a lot of stale caches; too short and we risk overloading the servers with unnecessary requests.
How to make sure that all changes refresh all caches?
How to embed the SVN revision into the code/URL?
Does having multiple versions side by side really help address the version mismatch problem?
Look at the minify project. It's written in PHP but you could use it as a blueprint for any language.
Key features:
- a config file to combine & minify several JS or CSS files into one
- always uses the last-modified date of the most recently modified file in a config group as a URL parameter
An example resource might look like:
<script type="text/javascript" src="/min/g=js1&1248185458"></script>
which would fetch the 'js1' group of JavaScript files in your configuration, with the version number "1248185458", which is really just the last-modified date converted to epoch time.
When you put updated js files on your production servers, they'll have a new modified date which automatically becomes a new version number - no stale caches, no manual versioning.
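A hedged sketch of that mtime-as-version idea; the file paths are placeholders, and the URL shape mirrors the example above:

// Use the newest file's modification time (epoch seconds) as the
// version parameter in the group's URL.
var fs = require('fs');

function versionedUrl(group, files) {
  var newest = Math.max.apply(null, files.map(function (f) {
    return fs.statSync(f).mtimeMs;
  }));
  return '/min/g=' + group + '&' + Math.floor(newest / 1000);
}

// e.g. "/min/g=js1&1248185458"
console.log(versionedUrl('js1', ['js/a.js', 'js/b.js']));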
It's a very cool project with some really well-thought-out ideas about optimization and caching. I've modified the process slightly to insert the YUI Compressor into the build process. You can optimize even further by preventing the last-modified lookups from the browser by modifying your server's headers (here and here).
I think you are on the right track with putting version numbers on your JS/CSS files. You may also want to use a build tool like Ant (http://ant.apache.org/) or NAnt (http://nant.sourceforge.net/) to put all of this together for you.
A couple of ways to deal with this issue:
Following the clue given about using version numbers: if that presents difficulties in your build environment, it is just as effective to put a URL parameter at the end of your URL. Browsers will treat each URL with a different version parameter as a URL not in their cache and will download the file again. For static content, the servers won't care that the parameter is there.
So, for example, http://mydomain.com/js/main.js can be included in your HTML as http://mydomain.com/js/main.js?v1.5. It might be easier for you to pass version numbers into your server-side scripts and append them onto your client-side include URLs.
The second method I've seen work well is to use a server-side controller to deliver your code. Facebook makes use of this; you will see includes in script tags that end in ".php" all the time.
E.g.
<script src="http://static.ak.connect.facebook.com/js/api_lib/v0.4/FeatureLoader.js.php" type="text/javascript"></script>
Their backend determines what JS needs to be sent to the client based on the environment that was sent up in the request.