Page Load alternative in a Pure HTML AJAX Website

Page Load alternative in a Pure HTML AJAX Website - javascript

I am working on a pure HTML website, all pages are HTML with no relation to any server side code.
Basically every request to the server is made using AJAX, I send data from forms, I process this data in Handlers, then I return a JSON string that will be processed back on the client side.
Let's say the page is loaded with parameters in the URL, something like question.html?id=1. Earlier, I used to read this query string on Page Load method, then read data from the database and so on...
Now, since its pure HTML pages, I'm trying to think of an approach that will allow me to do the same, I have an idea but its 99% a bad idea.
The idea is to read URL parameters using JS (after the page has loaded), and then make an AJAX request, and then fetch the data and show them on the page. I know that instead of having 1 request to the server (Web Forms), we are now having 2 Requests, the first request to get the page, and the second request is the AJAX request. And of course this has lots of delays, since the page will be loaded at the beginning without the actual data that I need inside it.
Is my goal impossible or there's a mature approach out there?

Is my goal impossible or there's a mature approach out there?
Lately there are a good handful of JavaScript frameworks designed around this very concept ("single page app") of having a page load up without any data pre-loaded in it, and accessing all of the data over AJAX. Some examples of such frameworks are AngularJS, Backbone.js, Ember.js, and Knockout. So no, this is not at all impossible. I recommend learning about these frameworks and others to find one that seems right for the site you are making.
The idea is to read URL parameters using JS (after the page has loaded), and then make an AJAX request, and then fetch the data and show them on the page.
This sounds like a fine idea.
Here is an example of how you can use JavaScript to extract the query parameters from the current page's URL.
I know that instead of having 1 request to the server (Web Forms), we are now having 2 Requests, the first request to get the page, and the second request is the AJAX request. And of course this has lots of delays, since the page will be loaded at the beginning without the actual data that I need inside it.
Here is why you should not worry about this:
A user's browser will generally cache the HTML file and associated JavaScript files, so the second time they visit your site, the browser will send requests to check whether the files have been modified. If not, the server will send back a short message simply saying that they have not been modified and the files will not need to be transmitted again.
The AJAX response will only contain the data that the page needs and none of the markup. So retrieving a page generated on the server would involve more data transfer than an approach that combines a cacheable .html file and an AJAX request.
So the total load time should be less even if you make two requests instead of one. If you are concerned that the user will see a page with no content while the AJAX data is loading, you can (a) have the page be completely blank while the data is loading (as long as it's not too slow, this should not be a problem), or (b) Throw up a splash screen to tell the user that the page is loading. Again, users should generally not have a problem with a small amount of load time at the beginning if the page is speedy after that.

I think you are overthinking it. I'd bet that the combined two calls that you are worried about are going to run in roughly the same amount of time as the single webforms page_load would if you coded otherwise - only difference now being that the initial page load is going to be really fast (because you are only loading a lightweight, html/css/images page with no slowdown for running any server code.
Common solution would be to then have a 'spinner' or some sort (an animated GIF) that gives the user an visual indication that the page isn't done loading while your ajax calls wait to complete.
Watch a typical page load done from almost any major website in any language, you are going to see many, many requests that make up a single page being loaded, wether it be pulling css/images from a CDN, js from a CDN, loading google analytics, advertisements from ad networks etc. Trying to get 100% of your page to load in a single call is not really a goal you should be worried about.

I don't think the 2-requests is a "bad idea" at all. In fact there is no other solution if you want to use only static HTML + AJAX (that is the moderm approach to web development since this allow to reuse AJAX request for other non-HTML clients like Android or iOS native apps). Also performance is very relative. If your client can cache the first static HTML it will be much faster compared to server-generated approach even if two requests are needed. Just use a network profiler to convince yourself.
What you can do if you don't want the user to notice any lag in the GUI is to use a generic script that shows a popup hiding/blocking all the full window (maybe with a "please wait") until the second request with the AJAX is received and a "data-received" (or similar) event is triggered in the AJAX callback.
EDIT:
I think that probably what you need is to convert your website into a webapp using a manifest to list "cacheable" static content. Then query your server only for dynamic (AJAX) data:
http://diveintohtml5.info/offline.html
(IE 10+ also support Webapp manifests)
Moderm browsers will read the manifest to know whether they need to reload static content or not. Using a webapp manifest will also allow to integrate your web site within the OS. For example, on Android it will be listed in the recent-task list (otherwise only your browser, not your app is shown) and the user can add a shorcut to the desktop.

So, you have static HTMLs and user server side code only in handlers? Why you can't have one ASP .Net page (generated on server side) to load initial data and all other data will be processed using AJAX requests?

if its possible to use any backed logic to determine what to load on server side, that will be easy to get the data.
say for example if you to load json a int he page cc.php by calling the page cc.php?json=a, you can determine from the PHP code to put a json into the page it self and use as object in your HTML page
if you are using query string to read and determine, what to load you have to make two calls.

The primary thing you appear to want is what is known as a router.
Since you seem to want to keep things fairly bare metal, the traditional answer would be Backbone.js. If you want even faster and leaner then the optimised Backbone fork ExoSkeleton might be just the ticket but it doesn't have the following that Backbone proper has. Certainly better than cooking your own thing.
There are some fine frameworks around, like Ember and Angular which have large user bases. I've been using Ember recently for a fairly complex application as it has a very sophisticated router, but based on my experiences I'm more aligned with the architecture available today in React/Flux (not just React but the architectural pattern of Flux).
React/Flux with one of the add-on router components will take you very far (Facebook/Instrgram) and in my view offers a superior architecture for web applications than traditional MVC; it is currently the fastest framework for updating the DOM and also allows isomorphic applications (run on both client and server). This represents the so called "holy grail" of web apps as it sends the initial rendered page from the server and avoids any delays due to framework loading, subsequent interactions then use ajax.
Above all, checkout some of the frameworks and find what works best for you. You may find some value comparing framework implementations over at TodoMVC but in my view the Todo app is far too simple and contrived to really show how the different frameworks shine.
My own evolution has been jQuery -> Backbone -> Backbone + Marionette -> Ember -> React/Flux so don't expect to get a good handle on what matters most to you until you have used a few frameworks in anger.

The main issue is from a UX / UI point of view.
Once you get your data from the server (in Ajax) after the page has been loaded - you'll get a "flickering" behavior, once the data is injected into the page.
You can solve this by presenting the page only after the data has arrived, OR use a pre-loader of some kind - to let the user know that the page is still getting its data, but then you'll have a performance issue as you already mentioned.
The ideal solution in this case is to get the "basic" data that the page needs (on the first request to the server), and manipulate it via the client - thus ease-in the "flickering" behavior.
It's the consideration between performance and "flickering" / pre-loading indication.
The most popular library for this SPA (Single Page Application) page - is angularJS

If I understand your inquiry correctly. You might want to look more about:
1) window.location.hash
Instead of using the "?", you can make use of the "#" to manipulate your page based on query string.
Reference: How to change the querystring on the same page without postback
2) hashchange event
This event fires whenever there's a changed in the fragment/hash("#") of the url. Also, you might want to track the hash to compare between the previous hash value and the current hash value.
e.g.
$(window).on('hashchange', function () {
//your manipulation for query string goes here...
prevHash = location.hash;
});
var prevHash = location.hash; //For tracking the previous hash.
Reference: On - window.location.hash - Change?
3) For client-side entry-point or similar to server-side PageLoad, you may make use of this,
e.g.
/* Appends a method - to be called after the page(from server) has been loaded. */
function addLoadEvent(func) {
var oldonload = window.onload;
if (typeof window.onload != 'function') {
window.onload = func;
} else {
window.onload = function () {
if (oldonload) {
oldonload();
}
func();
}
}
}
function YourPage_PageLoad()
{
//your code goes here...
}
//Client entry-point
addLoadEvent(YourPage_PageLoad);
Since you're doing pure ajax, the benefit of this technique is you would be able to easily handle the previous/next button click events from the browser and present the proper data/page to the user.

I would prefer AngularJS. This will be a good technology and you can do pagination with one HTML. So i think this will be good framework for you as your using static content.
In AngularJS MVC concept is there please read the AngularJS Tutorial. So this framework will be worth for your new project. Happy coding

Related

How to help search engines to search in your AJAX driven website? [duplicate]

There are a lot of cool tools for making powerful "single-page" JavaScript websites nowadays. In my opinion, this is done right by letting the server act as an API (and nothing more) and letting the client handle all of the HTML generation stuff. The problem with this "pattern" is the lack of search engine support. I can think of two solutions:
When the user enters the website, let the server render the page exactly as the client would upon navigation. So if I go to http://example.com/my_path directly the server would render the same thing as the client would if I go to /my_path through pushState.
Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path the server should give him a JavaScript heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.
The first solution is discussed further here. I have been working on a website doing this and it's not a very nice experience. It's not DRY and in my case I had to use two different template engines for the client and the server.
I think I have seen the second solution for some good ol' Flash websites. I like this approach much more than the first one and with the right tool on the server it could be done quite painlessly.
So what I'm really wondering is the following:
Can you think of any better solution?
What are the disadvantages with the second solution? If Google in some way finds out that I'm not serving the exact same content for the Google bot as a regular user, would I then be punished in the search results?

While #2 might be "easier" for you as a developer, it only provides search engine crawling. And yes, if Google finds out your serving different content, you might be penalized (I'm not an expert on that, but I have heard of it happening).
Both SEO and accessibility (not just for disabled person, but accessibility via mobile devices, touch screen devices, and other non-standard computing / internet enabled platforms) both have a similar underlying philosophy: semantically rich markup that is "accessible" (i.e. can be accessed, viewed, read, processed, or otherwise used) to all these different browsers. A screen reader, a search engine crawler or a user with JavaScript enabled, should all be able to use/index/understand your site's core functionality without issue.
pushState does not add to this burden, in my experience. It only brings what used to be an afterthought and "if we have time" to the forefront of web development.
What your describe in option #1 is usually the best way to go - but, like other accessibility and SEO issues, doing this with pushState in a JavaScript-heavy app requires up-front planning or it will become a significant burden. It should be baked in to the page and application architecture from the start - retrofitting is painful and will cause more duplication than is necessary.
I've been working with pushState and SEO recently for a couple of different application, and I found what I think is a good approach. It basically follows your item #1, but accounts for not duplicating html / templates.
Most of the info can be found in these two blog posts:
http://lostechies.com/derickbailey/2011/09/06/test-driving-backbone-views-with-jquery-templates-the-jasmine-gem-and-jasmine-jquery/
and
http://lostechies.com/derickbailey/2011/06/22/rendering-a-rails-partial-as-a-jquery-template/
The gist of it is that I use ERB or HAML templates (running Ruby on Rails, Sinatra, etc) for my server side render and to create the client side templates that Backbone can use, as well as for my Jasmine JavaScript specs. This cuts out the duplication of markup between the server side and the client side.
From there, you need to take a few additional steps to have your JavaScript work with the HTML that is rendered by the server - true progressive enhancement; taking the semantic markup that got delivered and enhancing it with JavaScript.
For example, i'm building an image gallery application with pushState. If you request /images/1 from the server, it will render the entire image gallery on the server and send all of the HTML, CSS and JavaScript down to your browser. If you have JavaScript disabled, it will work perfectly fine. Every action you take will request a different URL from the server and the server will render all of the markup for your browser. If you have JavaScript enabled, though, the JavaScript will pick up the already rendered HTML along with a few variables generated by the server and take over from there.
Here's an example:
<form id="foo">
Name: <input id="name"><button id="say">Say My Name!</button>
</form>
After the server renders this, the JavaScript would pick it up (using a Backbone.js view in this example)
FooView = Backbone.View.extend({
events: {
"change #name": "setName",
"click #say": "sayName"
},
setName: function(e){
var name = $(e.currentTarget).val();
this.model.set({name: name});
},
sayName: function(e){
e.preventDefault();
var name = this.model.get("name");
alert("Hello " + name);
},
render: function(){
// do some rendering here, for when this is just running JavaScript
}
});
$(function(){
var model = new MyModel();
var view = new FooView({
model: model,
el: $("#foo")
});
});
This is a very simple example, but I think it gets the point across.
When I instante the view after the page loads, I'm providing the existing content of the form that was rendered by the server, to the view instance as the el for the view. I am not calling render or having the view generate an el for me, when the first view is loaded. I have a render method available for after the view is up and running and the page is all JavaScript. This lets me re-render the view later if I need to.
Clicking the "Say My Name" button with JavaScript enabled will cause an alert box. Without JavaScript, it would post back to the server and the server could render the name to an html element somewhere.
Edit
Consider a more complex example, where you have a list that needs to be attached (from the comments below this)
Say you have a list of users in a <ul> tag. This list was rendered by the server when the browser made a request, and the result looks something like:
<ul id="user-list">
<li data-id="1">Bob
<li data-id="2">Mary
<li data-id="3">Frank
<li data-id="4">Jane
</ul>
Now you need to loop through this list and attach a Backbone view and model to each of the <li> items. With the use of the data-id attribute, you can find the model that each tag comes from easily. You'll then need a collection view and item view that is smart enough to attach itself to this html.
UserListView = Backbone.View.extend({
attach: function(){
this.el = $("#user-list");
this.$("li").each(function(index){
var userEl = $(this);
var id = userEl.attr("data-id");
var user = this.collection.get(id);
new UserView({
model: user,
el: userEl
});
});
}
});
UserView = Backbone.View.extend({
initialize: function(){
this.model.bind("change:name", this.updateName, this);
},
updateName: function(model, val){
this.el.text(val);
}
});
var userData = {...};
var userList = new UserCollection(userData);
var userListView = new UserListView({collection: userList});
userListView.attach();
In this example, the UserListView will loop through all of the <li> tags and attach a view object with the correct model for each one. it sets up an event handler for the model's name change event and updates the displayed text of the element when a change occurs.
This kind of process, to take the html that the server rendered and have my JavaScript take over and run it, is a great way to get things rolling for SEO, Accessibility, and pushState support.
Hope that helps.

I think you need this: http://code.google.com/web/ajaxcrawling/
You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.
Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)

So, it seem that the main concern is being DRY
If you're using pushState have your server send the same exact code for all urls (that don't contain a file extension to serve images, etc.) "/mydir/myfile", "/myotherdir/myotherfile" or root "/" -- all requests receive the same exact code. You need to have some kind url rewrite engine. You can also serve a tiny bit of html and the rest can come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
(test the link's validity by converting the link to your url scheme and testing against existence of content by querying a static or a dynamic source. if it's not valid send a 404 response.)
When the request is not from a google bot, you just process normally.
If the request is from a google bot, you use phantom.js -- headless webkit browser ("A headless browser is simply a full-featured web browser with no visual interface.") to render html and javascript on the server and send the google bot the resulting html. As the bot parses the html it can hit your other "pushState" links /somepage on the server mylink, the server rewrites url to your application file, loads it in phantom.js and the resulting html is sent to the bot, and so on...
For your html I'm assuming you're using normal links with some kind of hijacking (e.g. using with backbone.js https://stackoverflow.com/a/9331734/1595913)
To avoid confusion with any links separate your api code that serves json into a separate subdomain, e.g. api.mysite.com
To improve performance you can pre-process your site pages for search engines ahead of time during off hours by creating static versions of the pages using the same mechanism with phantom.js and consequently serve the static pages to google bots. Preprocessing can be done with some simple app that can parse <a> tags. In this case handling 404 is easier since you can simply check for the existence of the static file with a name that contains url path.
If you use #! hash bang syntax for your site links a similar scenario applies, except that the rewrite url server engine would look out for _escaped_fragment_ in the url and would format the url to your url scheme.
There are a couple of integrations of node.js with phantom.js on github and you can use node.js as the web server to produce html output.
Here are a couple of examples using phantom.js for seo:
http://backbonetutorials.com/seo-for-single-page-apps/
http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering

If you're using Rails, try poirot. It's a gem that makes it dead simple to reuse mustache or handlebars templates client and server side.
Create a file in your views like _some_thingy.html.mustache.
Render server side:
<%= render :partial => 'some_thingy', object: my_model %>
Put the template your head for client side use:
<%= template_include_tag 'some_thingy' %>
Rendre client side:
html = poirot.someThingy(my_model)

To take a slightly different angle, your second solution would be the correct one in terms of accessibility...you would be providing alternative content to users who cannot use javascript (those with screen readers, etc.).
This would automatically add the benefits of SEO and, in my opinion, would not be seen as a 'naughty' technique by Google.

Interesting. I have been searching around for viable solutions but it seems to be quite problematic.
I was actually leaning more towards your 2nd approach:
Let the server provide a special website only for the search engine
bots. If a normal user visits http://example.com/my_path the server
should give him a JavaScript heavy version of the website. But if the
Google bot visits, the server should give it some minimal HTML with
the content I want Google to index.
Here's my take on solving the problem. Although it is not confirmed to work, it might provide some insight or idea's for other developers.
Assume you're using a JS framework that supports "push state" functionality, and your backend framework is Ruby on Rails. You have a simple blog site and you would like search engines to index all your article index and show pages.
Let's say you have your routes set up like this:
resources :articles
match "*path", "main#index"
Ensure that every server-side controller renders the same template that your client-side framework requires to run (html/css/javascript/etc). If none of the controllers are matched in the request (in this example we only have a RESTful set of actions for the ArticlesController), then just match anything else and just render the template and let the client-side framework handle the routing. The only difference between hitting a controller and hitting the wildcard matcher would be the ability to render content based on the URL that was requested to JavaScript-disabled devices.
From what I understand it is a bad idea to render content that isn't visible to browsers. So when Google indexes it, people go through Google to visit a given page and there isn't any content, then you're probably going to be penalised. What comes to mind is that you render content in a div node that you display: none in CSS.
However, I'm pretty sure it doesn't matter if you simply do this:
<div id="no-js">
<h1><%= #article.title %></h1>
<p><%= #article.description %></p>
<p><%= #article.content %></p>
</div>
And then using JavaScript, which doesn't get run when a JavaScript-disabled device opens the page:
$("#no-js").remove() # jQuery
This way, for Google, and for anyone with JavaScript-disabled devices, they would see the raw/static content. So the content is physically there and is visible to anyone with JavaScript-disabled devices.
But, when a user visits the same page and actually has JavaScript enabled, the #no-js node will be removed so it doesn't clutter up your application. Then your client-side framework will handle the request through it's router and display what a user should see when JavaScript is enabled.
I think this might be a valid and fairly easy technique to use. Although that might depend on the complexity of your website/application.
Though, please correct me if it isn't. Just thought I'd share my thoughts.

Use NodeJS on the serverside, browserify your clientside code and route each http-request's(except for static http resources) uri through a serverside client to provide the first 'bootsnap'(a snapshot of the page it's state). Use something like jsdom to handle jquery dom-ops on the server. After the bootsnap returned, setup the websocket connection. Probably best to differentiate between a websocket client and a serverside client by making some kind of a wrapper connection on the clientside(serverside client can directly communicate with the server). I've been working on something like this: https://github.com/jvanveen/rnet/

Use Google Closure Template to render pages. It compiles to javascript or java, so it is easy to render the page either on the client or server side. On the first encounter with every client, render the html and add javascript as link in header. Crawler will read the html only but the browser will execute your script. All subsequent requests from the browser could be done in against the api to minimize the traffic.

This might help you : https://github.com/sharjeel619/SPA-SEO
Logic
A browser requests your single page application from the server,
which is going to be loaded from a single index.html file.
You program some intermediary server code which intercepts the client
request and differentiates whether the request came from a browser or
some social crawler bot.
If the request came from some crawler bot, make an API call to
your back-end server, gather the data you need, fill in that data to
html meta tags and return those tags in string format back to the
client.
If the request didn't come from some crawler bot, then simply
return the index.html file from the build or dist folder of your single page
application.

Best practise to give JS its initial dataset on page load

Imagine a page that shows a newsfeed. As soon as a user request said page, we go to the database (or whatever) and get the newsfeed. After the page is already loaded, we also have new news items added dynamically (through ajax/json). This means we effectively have two mechanisms to build a newsfeed. One with our server side language for the initial page request, and one with Javascript for any new items.
This is hard to maintain (because when something changes, we have to change both the JS mechanism and the Server side mechanism).
What is a good solution for this? And why? I've come up with the following scenarios:
Giving javascript an intial set, somewhere in the html, and let it build the initial view when document is ready;
Letting javascript do an ajax request on document ready to get the initial data; or
Keep it as described above, having a JS version and a SS version.
I'm leaning towards the first scenario, And for that I have a followup question: How do you give JS the dataset? in a hidden div or something?

Doing one more AJAX request to get the data isn't really costly and lets you have one simple architecture. This is a big benefit.
But another benefit you seem to forget is that by serving always the same static resources, you let them be cached.
It seems to me there's no benefit in integrating data in your initial page, use only one scheme, AJAX, and do an initial request.

Use a separate news provider to be loaded from page providing data as-is. This will keep things simple and make it load very quickly to be available nearly as fast as any embedded but hidden data set.

best way to send JSON to the page at creation time

I need to send some data to a web page, ideally in json format and I wonder what method is considered best, and why. Overall what good or bad experiences and surprises you had with them.
<script>var myJson = <? echo json_encode($myVar);
?>;</script>
advantage: the json is directly in javascript, were it will be used.
inconvenient: <script> in the middle of html/dom is bad (js belong
to .js files).
<div data-myJson='<? echo json_encode($myVar); ?>'>
advantage: html5 data thing is easy to work with. inconvenient:
bunch of data in the dom, it doesn't look elegant note: in my
case, I can afford to ignore "old" browsers.
ajax everything.
advantage: the json doesn't even need to be sent in this case, as it
will be already available (no page change). inconvenient: not
really an option as I would need to rewrite the full website.
instead of sending the full json, store it in the session and send a
key.
advantage: less data moving around inconvenient: the
data/session couple needs to be kept track of, and I like my session to be kept clean and tidy. (even if user just close the page before the flow is
finished) (which won't close the session).
Cookies.
advantage: herr.. is reverse evil a good thing? inconvenient: like session variables, but out of the cage.
Store the json in the session, and ajax it when the page is loaded.
advantage: somewhat elegant conceptually. inconvenient: heavy, as the ajax instruction has
to be added to a js file, and the session has to be managed. (and
cleansed. if the page load doesn't finish, the json will stay until
I cleanse it or the session finishes). Plus the html header means more bandwidth, and the we have to wait for the success to use the object.
other?
edit: as there seems to be a bit of confusion, with option 3 "ajax everything" I was meaning one page load, and all content loaded by ajax, even if you go through menus, links to other pages, forms submit, and such. I consider a more traditional navigation, (pages sent by the server as new a pages), with a page doing an ajax request to retrieve some value (here, my json object) on the server, as point 4 "session", as the main data has to remain on the server after the page has been sent to be later fetched by the ajax request. I did add option 6 for this.

I unhesitatingly recommend #1. You want to use your data in javascript, right? #1 is the simplest way and most direct way to ensure that your data exists, as a plain-old javascript object, when the page loads. I transfer data from the server side to the browser side all the time this way and it works beautifully.
You could arguably create better separation between your data and your UI by loading your data in an ajax call, but this is an additional http request, which will slow your page load.

Been a few years since this was asked, but for anyone else who finds themselves here and curious, I've been doing a variation of option #1 for quite a while. Additionally, Nike Plus does this as well. When the page loads, Nike sets window.np = {}
I've never really found a convention I love, but I've tried:
window.data
window.app.data (inspired by Symfony, literally uses the attribute app)
window.[app_name].data (inspired by Nike Plus)
window.initData (inspired by Google+)
In my case, I overwrite these JS objects with Backbone models/collections when the main Backbone app loads.

"Single-page" JS websites and SEO

While #2 might be "easier" for you as a developer, it only provides search engine crawling. And yes, if Google finds out your serving different content, you might be penalized (I'm not an expert on that, but I have heard of it happening).
Both SEO and accessibility (not just for disabled person, but accessibility via mobile devices, touch screen devices, and other non-standard computing / internet enabled platforms) both have a similar underlying philosophy: semantically rich markup that is "accessible" (i.e. can be accessed, viewed, read, processed, or otherwise used) to all these different browsers. A screen reader, a search engine crawler or a user with JavaScript enabled, should all be able to use/index/understand your site's core functionality without issue.
pushState does not add to this burden, in my experience. It only brings what used to be an afterthought and "if we have time" to the forefront of web development.
What your describe in option #1 is usually the best way to go - but, like other accessibility and SEO issues, doing this with pushState in a JavaScript-heavy app requires up-front planning or it will become a significant burden. It should be baked in to the page and application architecture from the start - retrofitting is painful and will cause more duplication than is necessary.
I've been working with pushState and SEO recently for a couple of different application, and I found what I think is a good approach. It basically follows your item #1, but accounts for not duplicating html / templates.
Most of the info can be found in these two blog posts:
http://lostechies.com/derickbailey/2011/09/06/test-driving-backbone-views-with-jquery-templates-the-jasmine-gem-and-jasmine-jquery/
and
http://lostechies.com/derickbailey/2011/06/22/rendering-a-rails-partial-as-a-jquery-template/
The gist of it is that I use ERB or HAML templates (running Ruby on Rails, Sinatra, etc) for my server side render and to create the client side templates that Backbone can use, as well as for my Jasmine JavaScript specs. This cuts out the duplication of markup between the server side and the client side.
From there, you need to take a few additional steps to have your JavaScript work with the HTML that is rendered by the server - true progressive enhancement; taking the semantic markup that got delivered and enhancing it with JavaScript.
For example, i'm building an image gallery application with pushState. If you request /images/1 from the server, it will render the entire image gallery on the server and send all of the HTML, CSS and JavaScript down to your browser. If you have JavaScript disabled, it will work perfectly fine. Every action you take will request a different URL from the server and the server will render all of the markup for your browser. If you have JavaScript enabled, though, the JavaScript will pick up the already rendered HTML along with a few variables generated by the server and take over from there.
Here's an example:
<form id="foo">
Name: <input id="name"><button id="say">Say My Name!</button>
</form>
After the server renders this, the JavaScript would pick it up (using a Backbone.js view in this example)
FooView = Backbone.View.extend({
events: {
"change #name": "setName",
"click #say": "sayName"
},
setName: function(e){
var name = $(e.currentTarget).val();
this.model.set({name: name});
},
sayName: function(e){
e.preventDefault();
var name = this.model.get("name");
alert("Hello " + name);
},
render: function(){
// do some rendering here, for when this is just running JavaScript
}
});
$(function(){
var model = new MyModel();
var view = new FooView({
model: model,
el: $("#foo")
});
});
This is a very simple example, but I think it gets the point across.
When I instante the view after the page loads, I'm providing the existing content of the form that was rendered by the server, to the view instance as the el for the view. I am not calling render or having the view generate an el for me, when the first view is loaded. I have a render method available for after the view is up and running and the page is all JavaScript. This lets me re-render the view later if I need to.
Clicking the "Say My Name" button with JavaScript enabled will cause an alert box. Without JavaScript, it would post back to the server and the server could render the name to an html element somewhere.
Edit
Consider a more complex example, where you have a list that needs to be attached (from the comments below this)
Say you have a list of users in a <ul> tag. This list was rendered by the server when the browser made a request, and the result looks something like:
<ul id="user-list">
<li data-id="1">Bob
<li data-id="2">Mary
<li data-id="3">Frank
<li data-id="4">Jane
</ul>
Now you need to loop through this list and attach a Backbone view and model to each of the <li> items. With the use of the data-id attribute, you can find the model that each tag comes from easily. You'll then need a collection view and item view that is smart enough to attach itself to this html.
UserListView = Backbone.View.extend({
attach: function(){
this.el = $("#user-list");
this.$("li").each(function(index){
var userEl = $(this);
var id = userEl.attr("data-id");
var user = this.collection.get(id);
new UserView({
model: user,
el: userEl
});
});
}
});
UserView = Backbone.View.extend({
initialize: function(){
this.model.bind("change:name", this.updateName, this);
},
updateName: function(model, val){
this.el.text(val);
}
});
var userData = {...};
var userList = new UserCollection(userData);
var userListView = new UserListView({collection: userList});
userListView.attach();
In this example, the UserListView will loop through all of the <li> tags and attach a view object with the correct model for each one. it sets up an event handler for the model's name change event and updates the displayed text of the element when a change occurs.
This kind of process, to take the html that the server rendered and have my JavaScript take over and run it, is a great way to get things rolling for SEO, Accessibility, and pushState support.
Hope that helps.

I think you need this: http://code.google.com/web/ajaxcrawling/
You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.
Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)

So, it seem that the main concern is being DRY
If you're using pushState have your server send the same exact code for all urls (that don't contain a file extension to serve images, etc.) "/mydir/myfile", "/myotherdir/myotherfile" or root "/" -- all requests receive the same exact code. You need to have some kind url rewrite engine. You can also serve a tiny bit of html and the rest can come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
(test the link's validity by converting the link to your url scheme and testing against existence of content by querying a static or a dynamic source. if it's not valid send a 404 response.)
When the request is not from a google bot, you just process normally.
If the request is from a google bot, you use phantom.js -- headless webkit browser ("A headless browser is simply a full-featured web browser with no visual interface.") to render html and javascript on the server and send the google bot the resulting html. As the bot parses the html it can hit your other "pushState" links /somepage on the server mylink, the server rewrites url to your application file, loads it in phantom.js and the resulting html is sent to the bot, and so on...
For your html I'm assuming you're using normal links with some kind of hijacking (e.g. using with backbone.js https://stackoverflow.com/a/9331734/1595913)
To avoid confusion with any links separate your api code that serves json into a separate subdomain, e.g. api.mysite.com
To improve performance you can pre-process your site pages for search engines ahead of time during off hours by creating static versions of the pages using the same mechanism with phantom.js and consequently serve the static pages to google bots. Preprocessing can be done with some simple app that can parse <a> tags. In this case handling 404 is easier since you can simply check for the existence of the static file with a name that contains url path.
If you use #! hash bang syntax for your site links a similar scenario applies, except that the rewrite url server engine would look out for _escaped_fragment_ in the url and would format the url to your url scheme.
There are a couple of integrations of node.js with phantom.js on github and you can use node.js as the web server to produce html output.
Here are a couple of examples using phantom.js for seo:
http://backbonetutorials.com/seo-for-single-page-apps/
http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering

If you're using Rails, try poirot. It's a gem that makes it dead simple to reuse mustache or handlebars templates client and server side.
Create a file in your views like _some_thingy.html.mustache.
Render server side:
<%= render :partial => 'some_thingy', object: my_model %>
Put the template your head for client side use:
<%= template_include_tag 'some_thingy' %>
Rendre client side:
html = poirot.someThingy(my_model)

To take a slightly different angle, your second solution would be the correct one in terms of accessibility...you would be providing alternative content to users who cannot use javascript (those with screen readers, etc.).
This would automatically add the benefits of SEO and, in my opinion, would not be seen as a 'naughty' technique by Google.

Use NodeJS on the serverside, browserify your clientside code and route each http-request's(except for static http resources) uri through a serverside client to provide the first 'bootsnap'(a snapshot of the page it's state). Use something like jsdom to handle jquery dom-ops on the server. After the bootsnap returned, setup the websocket connection. Probably best to differentiate between a websocket client and a serverside client by making some kind of a wrapper connection on the clientside(serverside client can directly communicate with the server). I've been working on something like this: https://github.com/jvanveen/rnet/

Use Google Closure Template to render pages. It compiles to javascript or java, so it is easy to render the page either on the client or server side. On the first encounter with every client, render the html and add javascript as link in header. Crawler will read the html only but the browser will execute your script. All subsequent requests from the browser could be done in against the api to minimize the traffic.

This might help you : https://github.com/sharjeel619/SPA-SEO
Logic
A browser requests your single page application from the server,
which is going to be loaded from a single index.html file.
You program some intermediary server code which intercepts the client
request and differentiates whether the request came from a browser or
some social crawler bot.
If the request came from some crawler bot, make an API call to
your back-end server, gather the data you need, fill in that data to
html meta tags and return those tags in string format back to the
client.
If the request didn't come from some crawler bot, then simply
return the index.html file from the build or dist folder of your single page
application.

How do modern web-apps/sites do postbacks? javascript/ajax, <form> or X?

I am curious to know how "modern" web-apps/sites do postbacks, meaning when they are sending back user-input to the site be processed.
Do modern sites still use the old fashion <form>-tags or do they use some sort of javascript/ajax implementation? or is there even a third option?

I tend to use both, where a full page refresh from a normal <form> submit works, but if JavaScript is enabled you hook up to the submit event and override it. This gives you fancy AJAX if possible, and graceful degradation if it's not.
Here's a quick example (jQuery for brevity):
<form type="POST" action="myPage.htm">
<input type="text" name="UserName" />
<button type="submit">Submit me!</button>
</form>
If the user has no JavaScript, no problem the form works. If they do (and most will) I can do this:
$(function() {
$("form").submit(function() {
$.ajax({
url: this.action,
type: this.type,
data: $(this).serialize(),
success: function(data) {
//do something with the result
}
});
});
});
This allows the graceful degradation, which can be very important (depends on your attitude/customer base I suppose). I try and not screw over users without JavaScript, or if a JavaScript error happens on my part, and you can see from above this can be done easily/generically enough to not be painful either.
If you're curious about content, on the server you can check for the X-Requested-With header from most libraries and if it's present, return just the <form> or some success message (since it was an AJAX request), if it's not present then assume a full page load, and send the whole thing.

It will vary depending on what exactly is being done, but most web-apps will do a combination of AJAX for short content (esp where live/interactive updates are helpful without requiring a page refresh) and "old-fashioned" forms for longer content wherein a full page load is not an issue.
Note that nothing about AJAX prevents its use for long content as well as short.
Also note that from the stand-point of the code driving the server app, there is not necessarily much difference at all between a request coming from a standard form or a request coming from an AJAX submission. You could easily structure a single script to properly respond to both types of request (though in some cases you can save bandwidth by sending a shorter "data-only" response to the AJAX version, since the client-side code can be responsible for parsing the data into meaningful content).

Basically, there are three models.
The traditional form model uses form postbacks for every action that requires going to the server, and may use javascript to perform client-side tasks that make the user's life easier without compromising security (e.g. pre-postback validation). The advantage of this is that such a web application, if written correctly, will work without needing any javascript at all, and it is easier to make it work on a wide range of devices (e.g. audio browsers).
The server-centric ajax model uses ajax calls to provide partial page refreshes; the page is built like in the traditional model, but instead of triggering a full page post-back, client-side script uses ajax calls to send clicks and other events, and retrieves information to replace a part of the document (usually in JSON or XHTML form). Because you don't post the entire page every time, the response is usually quicker. This model is useful if you want to use ajax where possible, but still want to be able to fall back to traditional postbacks if javascript isn't available.
Client-centric ajax takes a different path; the core idea is that you write your entire application in javascript, using ajax to exchange data, not markup. When you click a button, the application sends a request to the server, and receives data. The data is then used for further processing on the client. This is more flexible and usually faster than the other methods, but it requires good knowledge of javascript. Applications written in this style generally don't work at all when javascript is disabled on the client.

Most decent web applications try to put progressive enhancement. Meaning that a simple old fashioned button click results in a form post which can be handled by the server. This is for the scenario of people who use an old browser or turned off javascript for some reason.
The enhancement can be done by using hijaxing. Meaning that that same page can perform, with the help of some ajax, the form post to the server but not the entire page. This enables users with javascript enabled to have a better user experience.
old fashion -tags or do they use some sort of javascript/ajax implementation?
In order to have javascript to work on something you still need that somethingl old fashioned tags. Web applications can be structured down to 3 major parts:
Content: HTML
Layout: CSS
Behavior: javascript/ajax

Develop Reference

JavaScript is the programming language of the Web.