HTML5 LocalStorage as cache and single asset request


I would like to know the limits and drawbacks of the following concept:
Requirements:
Browser with LocalStorage support.
Server-side asynchronous non-blocking I/O technology.
Let's imagine the following request flow:
client GET / request -> server. We call this stage "greeting", which is an interesting stage because the client now sends (also through headers, of course):
ip
browser
browser version
language
charset
server -> client (200 OK)
client -> IF OK
-> establish a websocket with the server
once the websocket has been established we enter the "asset stream" stage.
server -> looks for matching assets (stylesheets, images, javascript files, fonts etc.) that are specific to the client's language, browser and resolution, and STREAMS them through the websocket.
server -> request (websocket, async stream of assets)
BENEFIT 1. No multiple requests through the wire avoiding DNS lookups etc.
BENEFIT 2. Cache the hell out of these assets in localStorage, which is the following stage.
request -> put in LocalStorage cache.
request -> render website.
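The caching stage of this flow could be sketched roughly as follows. This is only an illustration under assumptions: the message shape `{name, version, body}`, the `asset:` key prefix and the function names are hypothetical, and `storage` is anything with `getItem`/`setItem` (`window.localStorage` in a browser). Note that localStorage stores only strings and has a per-origin quota, so binary assets such as images or fonts would have to be encoded (for example as base64) before being cached this way.

```javascript
// Sketch only: the asset shape {name, version, body} is an assumption.
function createAssetCache(storage) {
  const keyFor = (name) => 'asset:' + name;
  return {
    // Store one streamed asset; returns false if the write fails (e.g. quota exceeded).
    put(asset) {
      try {
        storage.setItem(
          keyFor(asset.name),
          JSON.stringify({ version: asset.version, body: asset.body })
        );
        return true;
      } catch (err) {
        return false;
      }
    },
    // Return the cached body, or null if it is missing or has a different version.
    get(name, version) {
      const raw = storage.getItem(keyFor(name));
      if (raw === null) return null;
      const entry = JSON.parse(raw);
      return entry.version === version ? entry.body : null;
    },
  };
}

// Browser wiring: each WebSocket message is assumed to carry one JSON-encoded asset.
function attachAssetStream(socket, cache, onAsset) {
  socket.onmessage = (event) => {
    const asset = JSON.parse(event.data);
    cache.put(asset);
    onAsset(asset);
  };
}
```

In a browser this would be wired up with something like `attachAssetStream(new WebSocket(url), createAssetCache(window.localStorage), render)`, where `url` and `render` are placeholders.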
I would like to get some opinions: what might be a good idea, what might not, etc.
My first thoughts were:
CDNs are not supported in this architecture
We need one single request to get the javascript / html to start WebSocket etc.
I hope my question was clear.

Interesting approach, it's definitely worth thinking about. Let me be your devil's advocate:
BENEFIT 1. No multiple requests through the wire avoiding DNS lookups etc.
This is true, although it's only an issue when you're accessing a page/site for the first time. It's also somewhat mitigated by prefetching that modern browsers implement. It's important to remember that browsers will download multiple resources in parallel, which could be faster, and definitely more progressively responsive, than downloading the whole payload in bulk.
With today's technologies you can already serve full-fledged pages and applications with only a handful of resources as far as a web client is concerned (all of them could be gzipped!):
HTML
combined and minified CSS files as one resource
same for JS
image sprite
BENEFIT 2. Cache the hell out of these assets in localStorage...
Browsers already cache the hell out of such assets! In addition, there are proven and intelligent techniques to invalidate those caches (which is the second biggest challenge in software development).
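One widely used invalidation technique is to put a content hash into the file name and let the browser cache the file for a long time; when the content changes, the name changes and the old cache entry is simply never requested again. A minimal Node sketch, assuming a hypothetical naming scheme of `name.<hash>.ext` and an 8-character hash prefix:

```javascript
const crypto = require('crypto');

// Build a cache-busting file name from the file's content.
// The "name.<8 hex chars>.ext" scheme is an assumption for illustration.
function fingerprint(fileName, content) {
  const hash = crypto.createHash('sha256').update(content).digest('hex').slice(0, 8);
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return fileName + '.' + hash;
  return fileName.slice(0, dot) + '.' + hash + fileName.slice(dot);
}

// Headers suited to fingerprinted files: cacheable for a year, no revalidation needed.
const LONG_LIVED_HEADERS = { 'Cache-Control': 'public, max-age=31536000, immutable' };
```

The HTML page itself would keep a short cache lifetime so that it can point at the new file names after a deploy.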
Other things to consider:
Don't underestimate CDNs. They are life savers when it comes to latency. Your approach is not latency-friendly during the first request.
AJAX and progressive enhancement approaches can already optimize the web app experience to make it feel like a desktop app.
You will need to reinvent or modify tools like Firebug to work with one stream containing all resources. Web development is hard to imagine nowadays without those tools.
If browsers don't support this approach natively, then you will still have a hell of a time programming and letting the browser know what your stream contains and how to handle it. By the time you process the stream and fire all necessary events (in the optimal sequence!) you might not gain as many benefits as you hoped for.
Good luck!

Related

When do you want more/less http requests? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
It seems that to have your page load fast, you would want a series of small HTTP requests.
If it was one big one, the user might have to wait much longer to see that the page was there at all.
However, I've heard that minimizing your HTTP requests is more efficient. For example, this is why sprites are created for multiple images.
Is there a general guideline for when you want more and when you want fewer?

Multiple requests create overhead from both the connection and the headers.
It's like downloading the contents of an FTP site: one site has a single 1GB blob, another has 1,000,000 files totalling a few MB. On a good connection, the 1GB file could be downloaded in a few minutes, but the other is sure to take all day because the transfer negotiation ironically takes more time than the transfer itself.
HTTP is a bit more efficient than FTP, but the principle is the same.
What is important is the initial page load, which needs to be small enough to show some content to the user, then load additional assets outside of the user's view. A page with a thousand tiny images will always benefit from a sprite, because the negotiations would not only strain the connection but potentially also the client computer.
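The FTP comparison above can be put into rough numbers. The sketch below uses a deliberately crude model, and every figure in it is an assumption chosen for illustration: each request pays a fixed 0.1 s of negotiation overhead, requests run one after another, and the link moves 10 MB per second.

```javascript
// Rough model: every request pays a fixed overhead, then bytes flow at full bandwidth.
// Requests are assumed to run strictly one after another (no parallelism).
function estimateTransferSeconds(fileCount, totalBytes, overheadSecondsPerRequest, bytesPerSecond) {
  return fileCount * overheadSecondsPerRequest + totalBytes / bytesPerSecond;
}

// One 1 GB blob: almost all of the time is spent moving bytes (about 100 seconds).
const oneBigFile = estimateTransferSeconds(1, 1e9, 0.1, 10e6);

// 1,000,000 files totalling 5 MB: almost all of the time is overhead
// (about 100,000 seconds, which is more than a day).
const manySmallFiles = estimateTransferSeconds(1e6, 5e6, 0.1, 10e6);
```

Real browsers soften this with parallel and reused connections, so the gap is smaller in practice, but the shape of the trade-off is the same.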

EDIT 2 (25-08-2017)
Another update here; Some time has passed and HTTP2 is (becoming) a real thing. I suggest reading this page for more information about it.
Taken from the second link (at the time of this edit):
It is expected that HTTP/2.0 will:
Substantially and measurably improve end-user perceived latency in most cases, over HTTP/1.1 using TCP.
Address the "head of line blocking" problem in HTTP.
Not require multiple connections to a server to enable parallelism, thus improving its use of TCP, especially regarding congestion control.
Retain the semantics of HTTP/1.1, leveraging existing documentation (see above), including (but not limited to) HTTP methods, status codes, URIs, and where appropriate, header fields.
Clearly define how HTTP/2.0 interacts with HTTP/1.x, especially in intermediaries (both 2->1 and 1->2).
Clearly identify any new extensibility points and policy for their appropriate use.
The item about not requiring multiple connections (bold in the original, emphasis mine) explains how HTTP2 will handle requests differently from HTTP1. Whereas HTTP1 will create ~8 (differs per browser) simultaneous (or "parallel") connections to fetch as many resources as possible, HTTP2 will re-use the same connection. This removes the time and network latency required to create each new connection, which in turn speeds up asset delivery. Additionally, your web server will have an easier time keeping roughly 8 times fewer connections open. Imagine the gains there :)
HTTP2 is also already quite widely supported in major browsers, caniuse has a table for it :)
EDIT (30-11-2015)
I recently found this article on the topic of page speed. The post is very thorough, and at worst it's an interesting read, so I'd definitely give it a shot.
Original
There are too many answers to this question but here's my 2 cents.
If you want to build a website you'll need a few basic things in your tool belt like HTML, CSS, JS - maybe even PHP / Rails / Django (or one of the 10000+ other web frameworks) and MySQL.
The front-end part is basically all that gets sent to the client every request. The server-sided language calculates what needs to be sent which is how you build your website.
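Now when it comes to managing assets (images, CSS, JS) you're diving into HTTP land, since you'll want to do as few requests as possible. The reason for this is that every request carries a penalty: a DNS lookup for each new hostname, plus connection setup and round-trip latency. I'll call this the DNS penalty.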
This DNS penalty however does not dictate your entire website of course. It's all about the balance between amount of requests and read- / maintainability for the programmers building the website.
Some frameworks like rails allow you to combine all your JS and CSS files into a big meta-like JS and CSS file before you deploy your application on your server. This ensures that (unless done otherwise) for instance ALL the JS and ALL the CSS used in the website get sent in one request per file.
Imagine having a popup script and something that fetches articles through AJAX. These will be two different scripts, and when deploying without combining them, each page load that includes the popup and article scripts will send two requests, one for each file.
In practice this is not entirely true, because browsers cache whatever they can whenever they can; in the end, browsers and the people who build websites want the same thing. The best experience for our users!
This means that during the first request your website ever answers for a client, the browser will cache as much as possible to make subsequent page loads faster.
This is kind of like the browser way of helping websites become faster.
Now when the brilliant browserologists think of something it's more or less our job to make sure it works for the browser. Usually these sorts of things with caching etc are trivial and not hard to implement (thank god for that).
Having a lot of HTTP requests in a page load isn't an end-of-the-world thing, since it'll only slow your first request, but overall having fewer requests makes this "DNS-penalty" thing appear less often and will give your users more of an instant page load.
There are also other techniques besides file-merging that you could use to your advantage. When including a JavaScript file you can mark it as async or defer.
For async, the script is downloaded in the background and executed as soon as it has arrived, regardless of its order of inclusion within the HTML. The HTML parser is paused while the script executes.
For defer it's a bit different. It's kind of like async, but files will be executed in the correct order and only after the HTML parser is done.
Something you wouldn't want to be "async" would be jQuery for instance, it's the key library for a lot of websites and you'll want to use it in other scripts so using async and not being sure when it's downloaded and executed is not a good plan.
Something you would want to be "async" is a google analytics script for instance, it's effectively optional for the end-user and thus should be labelled as not important - no matter how much you care about the stats your website isn't built for you but by you :)
To get back to requests and blend all this talk about async and deferred together, you can have multiple JS on your page for instance and not have the HTML parser pause to execute some JS - instead you can make this script defer and you'll be fine since the user's HTML and CSS will load while the JS parser waits nicely for the HTML parser.
This is not an example of reducing HTTP requests but it is an example of an alternative solution should you have this "one file" that doesn't really belong anywhere except in a separate request.
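In markup, the two modes are written as `<script src="..." async>` and `<script src="..." defer>`. The ordering rules described above can be illustrated with a small simulation. This is a simplified model, not how a browser is implemented: the file names and timings are made up, and script execution is treated as instantaneous.

```javascript
// Simplified model of script scheduling; all times are milliseconds from page start.
// Each script is { name, mode: 'async' | 'defer', downloadedAt }.
function executionOrder(scripts, parsingDoneAt) {
  let lastDeferAt = parsingDoneAt;
  const timeline = scripts.map((script, index) => {
    let runsAt;
    if (script.mode === 'async') {
      // async: runs as soon as it has arrived, whatever the document order
      runsAt = script.downloadedAt;
    } else {
      // defer: waits for the parser and for every earlier deferred script
      runsAt = Math.max(lastDeferAt, script.downloadedAt);
      lastDeferAt = runsAt;
    }
    return { name: script.name, runsAt, index };
  });
  // Sort by run time; ties keep document order.
  timeline.sort((a, b) => a.runsAt - b.runsAt || a.index - b.index);
  return timeline.map((entry) => entry.name);
}

const order = executionOrder([
  { name: 'jquery.js', mode: 'defer', downloadedAt: 300 },
  { name: 'plugin.js', mode: 'defer', downloadedAt: 50 },
  { name: 'analytics.js', mode: 'async', downloadedAt: 120 },
], 200);
// order is ['analytics.js', 'jquery.js', 'plugin.js']
```

Even though `plugin.js` finishes downloading first, it still runs after `jquery.js` because both are deferred, while the async analytics script runs whenever it happens to arrive.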
You will also never be able to build a perfect website, nor will http://github.com or http://stackoverflow.com but it doesn't matter, they are fast enough for our eyes to not see any crazy flashing content and those things are truly important for end-users.
If you are curious about how many requests are normal - don't be. It's different for every website and depends on the purpose of the website, though I agree some things do go over the top sometimes. It is what it is, and all we have to do is support browsers like they are supporting us - even looking at IE / Edge there, since they are also improving (slowly but steadily anyway).
I hope my story made sense to you, I did re-read before the post but couldn't find anything while scouting for irregular typing or other kinds of illogical things.
Good luck!

The HTTP protocol is verbose, so the ratio of header size to payload size makes it more efficient to have a larger payload. On top of that, this is still distributed communication, which makes it inherently slow. Unless the connection is kept alive and reused, you also have to set up and tear down a TCP connection for each request.
Also, I have found, that the small requests repeat data between themselves in an attempt to achieve RESTful purity (like including user data in every response).
The only time small requests are useful is when the data may not be needed at all, so you only load it when needed. However, even then it may be more performant to simply retrieve it all in one go.

You always want fewer requests.
The reason we separate any javascript/css code in other files is we want the browser to cache them so other pages on our website will load faster.
If we have a single page website with no common libraries (like jQuery) it's best if you include all the code in your html.

Node Packages vs Browser ones

For example, packages like highlight.js work in node just like in the browser. What is considered best practice/faster/ideal?
In this case, highlight.js beautifies a <code> tag with color schemes. Example: In a blog where you use it, there are 2 cases:
Fetch post, show post to user and let the browser/client version beautify the code, or
Fetch post, pass the contents to the highlight node function, and show the entire results to the user.
My concerns:
Free up server stress. Show website earlier, since it doesn't need to parse any data.
Avoid browser incompatibility (not a big deal tbh).
Save some static requests if not using CDN. Maybe faster?
I don't know what else I'm missing or what should be considered. What do you think?
PS: Every day more packages are browser/node compatible, but I think this is the best example I can provide.

The answer to that question can vary, but I would prefer to do it on the client side. Here are some pros and cons of the client-side route:
PRO: The one you mentioned, server load reduced. Remember, you're paying for your server and your client is paying for the connection (sometimes figuratively, as in wait time). If you process server-side, you pay more; if you process client-side, the client pays more. I would let the client pay!
CON: On the other hand, the syntax highlighting will load faster if you process server-side, because you can process once then cache for all subsequent clients.
CON: Browser incompatibility, like you said.
PRO: Semantics. You're augmenting highlighting on top of the raw data, rather than having the raw data strung up between <span>s. Think about non-JS machines trying to process your page.
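The "process once then cache for all subsequent clients" point from the server-side CON could look roughly like this. It is a sketch under assumptions: `highlight` stands in for whatever highlighting function you use (any function from source text to HTML), and the cache is keyed by a hypothetical post id.

```javascript
// `highlight` is any function (source) => html; it stands in for a real highlighter.
function createCachedRenderer(highlight) {
  const cache = new Map(); // post id -> rendered HTML
  return function render(postId, source) {
    if (!cache.has(postId)) {
      // First request for this post: do the expensive work once.
      cache.set(postId, highlight(source));
    }
    return cache.get(postId);
  };
}
```

Because the cache is keyed by post id only, an edited post would need its entry removed or the key extended with a revision number.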

GWT server with get() and post() built on client end

This is more of a curiosity really, to see if someone has done anything similar, or if it is possible at all.
I'm working on a project that will receive notifications from external sources. Now I could go about doing this by having notifications come to my server and having a comet setup between my client and server.
BUT
I was wondering if I could write server logic into my client and listen out for notifications from external sources. Immediately one issue I see is that external sources would need a callback URL etc., which I don't know if you could provide from the client side (unless one could use the IP address in that way).
As you can see it is more ideas and discussions if such a thing was possible, this is somewhat inspired by P2P models whereby you wouldn't be mediating things through your central server.
Thanks in advance!

GWT compiles (nearly) Java source into JavaScript, so compiled GWT apps can't do anything that traditional JavaScript running in the browser cannot do. The major advantage of bringing Java into the picture isn't automatic access to any/all JVM classes, but the ability both to maintain Java sources, which tend to be easier to refactor and test as well as keep consistent with the server, and to compile that statically defined code into JavaScript, performing all kinds of optimizations at compile time that aren't possible for normal JavaScript.
So no, while you can have some code shared by the client (in a browser) and the server (running in a JVM), you can't run Tomcat/Jetty/etc in the browser just by using GWT to compile the java code into JS.
As you point out, even if this was possible, it would be difficult to get different clients to talk back and forth, without also requiring that the browsers can see and connect at will to one another. BitTorrent and Skype have different ways for facilitating this, and currently browsers do not allow anything like this - they are designed to make connections to other servers, not to allow connections to be made to them.
Push notifications from the web server to the browser are probably the best way forward, either through wrapping comet or the like, or through an existing GWT library like Atmosphere (see https://github.com/Atmosphere/atmosphere/tree/master/samples/gwt-demo for a demo).

Why do web applications send HTML over the wire?

This question pertains to web applications. I have very little web app development experience, so might be missing some very obvious points/issues. Please point them out.
As I understand, in most web applications, a web server sends HTML over the wire to a client (browser). This happens every time an HTTP request is made. I feel this is very wasteful of bandwidth.
1) Since browsers can run JavaScript, why don't we just send a JavaScript program which can generate the webpage's HTML content (which the browser then renders).
2) Further a browser might cache the JavaScript program and next time the server only need send the data. The protocol might involve the browser sending the "program version" it has.
Consider an example of a relatively simple website Hacker News [http://news.ycombinator.com]. Let us separate the data (30 posts + their metadata) from its presentation. Assuming 1) above, the server can just send the data (say in JSON) + a JavaScript program to generate HTML. This gist shows the idea. The data for the 30 posts is in JSON [http://www.json.org/js.html] format. For this particular example the data transferred is cut in 1/2 (size of data+JavaScript / size of HTML). Further if browsers can do 2) above, it reduces the data transferred on each visit to 1/4 (size of data / size of HTML). [Note: this analysis is without considering compression; gzip,deflate is very successful in reducing the size of HTML. But isn't prevention better than cure?]
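A minimal sketch of the idea: the server sends only a JSON array, and a small, cacheable script turns it into markup. The field names (`title`, `url`, `points`), the function names and the element id in the final comment are assumptions for illustration, not the contents of the gist mentioned above.

```javascript
// Escape text so post titles and URLs cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn the JSON payload into the list markup the server would otherwise send.
function renderPosts(posts) {
  const items = posts.map((post) =>
    '<li><a href="' + escapeHtml(post.url) + '">' + escapeHtml(post.title) + '</a> (' +
    Number(post.points) + ' points)</li>');
  return '<ol>' + items.join('') + '</ol>';
}

const html = renderPosts([
  { title: 'Show HN: <b>demo</b>', url: 'http://example.com/?a=1&b=2', points: 42 },
]);
// In a browser: document.getElementById('posts').innerHTML = html;
```

The escaping step is part of the cost of this approach: once the client builds HTML from data, the client code has to take over the sanitising that a server-side template engine would normally do.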
I see at least the following advantages of this :-
* For most web pages, it will reduce the size of data transferred over the wire.
* Forces web apps to separate data from its presentation.
Disadvantages might include - more complex browsers, time to run the JavaScript program to generate HTML (this might get offset by the reduction in data size).
Now my question is - why are web applications not developed this way, or, why do web applications send HTML over the wire? Surely the web server (sending out HTML) doesn't care about HTML at all, so why should it, first, generate it, and then send it over the wire?

There are a few reasons, some of them historical. This is by no means a complete list, just some of my experiences:
HTML predates JS, and a lot of scripts and libraries predate JS
Older browsers (think IE<=6) had rubbish, inconsistent JS engines, their rendering engines were much more consistent in how they treat HTML. So many more libraries and scripts predate consistent JS
It is a nightmare to debug applications written as you suggest if they are not constructed right (we have one at my work, it takes 30 minutes to find where a piece of html is actually generated)
It is a lot more work to do it right - why not use templates or static docs or something much simpler
It's not really a problem - HTML compresses really well
What you suggest is done - it's called AJAX (OK, so AJAX is more general than this, but you all know what I mean)
It simply doesn't work for most plain-text user agents, including those used by most search engines. If this page is serving most of your content, it's generally a good idea to make it easy for Google to parse

Well the obvious reason on why this is the case is that JavaScript wasn't around when we started sending HTML around, and HTML was an improvement to sending around plaintext documents.
The reason we don't do this now: we eschew complex solutions to problems that aren't really problems.
Average internet connections download nearly 1M bytes per second, and web browsers are quite adept at parsing and starting to render this HTML before it has even finished arriving. They're also great at parallelizing the downloading of resources on the page. If we want to save a few bytes at the cost of some compute cycles, we gzip content before sending it. Problem solved.
And for the record, we do this with AJAX in complex webpages (check out Github's source browsing for a great example of how awesome this can be).

What you suggest can be, and is, done. Remember, web pages used to be static documents. Full blown web-based applications are a relatively recent idea.
I might also suggest that it isn't necessarily more efficient, especially when your pages are sent gzipped.

What you suggest is basically what a JavaScript full stack framework like ExtJS does. You can create rich, data intensive applications without writing any HTML -- well, only enough to reference the necessary .js libraries. The complex DOM needed for layouts, grids, forms etc is all created by the framework.

The simple answer is that HTML is older. Why is C99 not fully implemented with a lot of compilers? They figure 1989 is new enough for them. Also, JavaScript exercises a lot more control over people's browsers than they seem to want. Conditional statements and encoded data pose a security concern, and some people want to keep that can of worms closed to begin with. True, HTML is a very inefficient markup, but the size is insignificant compared to the images you download from the internet. That favicon takes up as much data as the page itself, and it's only 16 pixels across.

A good reason that the server-side code of a web application might do lots of HTML template work on the server side is that in many server environments it's not made easy to bundle up server-side data structures (object graphs) for easy delivery to the client. There may be information kept in server-side data structures that really shouldn't be delivered out to the client. Thus in order to send out a "pure" data-only response, the server would have to trim off sensitive data before delivering out the JSON. That's not an unsolvable problem, but I don't know of many server frameworks that facilitate a solution.
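The trimming step described above is often done with an explicit allow-list, so that a newly added server-side field is private by default. A minimal sketch, with hypothetical field names:

```javascript
// Only fields named here ever leave the server (field names are hypothetical).
const PUBLIC_USER_FIELDS = ['id', 'displayName', 'karma'];

// Copy the allowed fields from a server-side record and serialise the result.
function toPublicJson(record, allowedFields) {
  const safe = {};
  for (const field of allowedFields) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      safe[field] = record[field];
    }
  }
  return JSON.stringify(safe);
}
```

A call such as `toPublicJson(userRecord, PUBLIC_USER_FIELDS)` would drop anything like a password hash or email address that is present on the record but not on the list.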
The server has direct, unfettered access to the database and to everything else that makes an application work: user preferences, history, account details, system settings, etc. To build an application that's client-centric for rendering purposes would mean concocting ways of keeping all that information intact and up-to-date on the client. For a lot of applications, that might not be terribly easy.
Finally, it's only relatively recently that it would make sense to trust a browser to provide a stable enough platform for building a long-lived "application environment" as a continually-updating web page. By building a web app such that pages are sometimes completely reloaded, there are lots of little "reboots". That's a cheap and dumb way of keeping a lid on at least some kinds of memory leaks.

Most implementations of sites with heavy Javascript use won't start executing until the DOM has fully loaded; then you'll get every page with 'loading screens' when the page wrapper has downloaded, but none of the content has.
Also, do remember that not all users have Javascript enabled, and not all browsers support high-level Javascript (think mobiles).

I would send HTML in a response if I wanted my application to work without Javascript. I would write HTML rendering code in my server-side language (most of the time not Javascript), which could then be used for two purposes: serving whole HTML pages, and serving bits of HTML in response to XHRs.
If the Javascript code is restricted to things like reporting UI events and replacing innerHTML with server-generated code, I don't have to duplicate any of my application logic across languages/frameworks. This duplication problem is one of the reasons why server-side Javascript is getting people excited.

Should I link to Google API's cloud for JS libraries?

I'm looking for the pros/cons of pulling jQuery & other JS libraries from Google API's cloud as opposed to downloading files and deploying directly.
What say you?
My decision
The likelihood of the lib already being cached on the user's system is the overriding factor for me, so I'm going with a permalink to googleapis.com (e.g. ajax.googleapis.com/ajax/libs/…). I agree with others here that loss of access to the Google server cloud is a minimal concern.

Con
Users in countries embargoed by the U.S. (e.g. Iran) won't get a response from Google

Pros: It may already be cached on the user's system. Google has big pipes. You don't pay for the bandwidth.
Cons: You now have two different ways for your site to become unavailable: A service interruption on your server or one on Google's server.

I've been looking at the real-world performance of the Google loader for jQuery, particularly, and here's what I've found:
Google's servers are quick and plenty reliable.
They are serving from a CDN, which means if you have a lot of overseas users they'll get much better load times.
They are not serving gzipped files. So they're serving a lot more bytes than they need to.
If you know what you're doing in Apache, Lighttpd, or whatever you're serving files with, you could set your cache headers just like Google's and significantly reduce the amount of data your end user has to download by serving it from your own server. You could also combine your scripts at that point and reduce your overall HTTP requests.
Bottom line: Google's performance is good but not great. If you have many many overseas users then Google is probably better, if your users are mostly US-based and maximum performance is your concern, learn about caching, Etags, gzipping, etc. and serve it yourself.
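For the "serve it yourself" route, the response for a library file might be assembled along these lines in Node. This is a sketch under assumptions: the one-year cache lifetime is an arbitrary policy choice, the function name is hypothetical, and the Accept-Encoding check is simplified (it ignores quality values such as `gzip;q=0`).

```javascript
const zlib = require('zlib');

// Build headers and body for a static script.
// `acceptEncoding` is the request's Accept-Encoding header value (may be undefined).
function buildScriptResponse(source, acceptEncoding) {
  const headers = {
    'Content-Type': 'application/javascript',
    'Cache-Control': 'public, max-age=31536000', // one year, an assumed policy
    'Vary': 'Accept-Encoding',
  };
  // Simplified check: does not parse q-values.
  const wantsGzip = /\bgzip\b/.test(acceptEncoding || '');
  let body = Buffer.from(source, 'utf8');
  if (wantsGzip) {
    body = zlib.gzipSync(body);
    headers['Content-Encoding'] = 'gzip';
  }
  headers['Content-Length'] = String(body.length);
  return { headers, body };
}
```

In practice this is usually configured in the web server itself (Apache, Lighttpd and so on) rather than written by hand; the sketch only shows which headers are involved.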

Pros:
Google's connectivity is probably way better than yours
It's a free CDN (content distribution network)
Your webapp might load faster, since you're using a CDN
Cons:
If/when you need to optimize by repackaging a subset of that third-party JS library, you're on your own, and your webapp might then load slower

In addition to points made by others I'll point out two additional cons:
An additional external HTTP request, so assuming you have a Javascript file of your own (almost certain) that's two minimum instead of one minimum; and
IMHO because jQuery load is async your entire page can load before the library has loaded so the effects that you do on document ready are sometimes visibly noticeable to the user when they are applied. I think this is not a great user experience.

The pros are quite obvious and are in the other answers :
you save bandwidth
google is probably more reliable than your server
probably cached in most browsers (does anyone have stats on this?)
But the cons can be very tricky :
If your page is served over https and you include the library over plain http, most browsers will warn about (or block) the mixed content. This is a major issue for https, so make sure you reference the https URL of the library.

I think what would be cool to do is run A/B tests and see what the latency is to load minified version of jquery from Google's servers vs your server. Hopefully that'll put things into perspective. Chances are the Google server might be faster, but in terms of accepting responsibility of down time, nothing beats hosting it yourself.

Pro:
Google's Ajaxlibs offer a very fine-grained "version control" for the included libraries. You can enforce a certain version (e.g. JQuery 1.3.2) or automatically request the latest version from a certain branch (e.g. JQuery 1.3 series -> would currently deliver 1.3.2, but maybe soon 1.3.3).
The latter definitely has benefits: you'll profit from smaller bugfixes/performance improvements without breaking your scripts/plugins.
Maintaining such a multi-library repository on your own can become quite resource intensive.

Con:
If you are worried about DNS poisoning, or about untrusted public wireless networks, keep in mind that the non-SSL versions might not actually be served by Google at all, opening the door to drive-by installation of malware. (But: caching is set to a full year, so even though many browsers will issue an If-Modified-Since request for cached content when hitting refresh, this might still be a theoretical issue, as most users will already have cached the resources while using another network.)
When taking extreme care for your visitors' privacy, you might not want Google to record visits to your site by using their CDN. (Quite theoretical as well, as the same note on caching applies.)
