Dual-Side Templating vs. Server-Side DOM Manipulation - javascript

I'm making an app that requires dynamic content to be fully rendered on the page for search engine bots - potentially a problem if I use JS templating to control the content. Web spiders are supposedly getting better at indexing RIA sites, but I don't want to risk it. Also, as mobile internet is still spotty in most places, it seems like good practice to do as much rendering as possible on the server up front, so that basic functionality, styles, and dynamic content show up on your pages even if the client hasn't downloaded any JS libraries yet.
That's how I stumbled upon dual-side templating:
Problem: How can you allow for dynamic, Ajax-style, rendering in the browser, but at the same time output it from the server upon initial page load?
c. 2010: Dual-Side Templating A single template is used on both browser and server, to render content wherever it’s appropriate – typically the server as the page loads and the browser as the app progresses. For example, blog comments. You output all existing comments from the server, using your server-side template. Then, when the user makes a new comment, you render a preview of it – and the final version – using browser-side templating.
I want to try dual-side templating with Node.js and Eco templates, but I don't know how to proceed. I'm new to JavaScript and all things Node.
Node-Lift is said to help, but I don't understand what it's doing or why.
Can someone provide a high-level overview of how you might use dual-side templating in the context of a mobile web app?
Where does server-side DOM manipulation with jQuery and JSDOM fit into the equation?
TIA

Dav Glass gave a great talk about this last year: http://www.youtube.com/watch?v=bzCnUXEvF84
And here is a blog article that goes over some of the details: http://www.yuiblog.com/blog/2010/04/09/node-js-yui-3-dom-manipulation-oh-my/
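To make the idea concrete, here is a minimal sketch of dual-side templating in plain JavaScript: one template module shared by server and browser. The function and data names are illustrative, not Eco's or Node-Lift's API.

// comment-template.js - a single template module shared by server and browser
(function (exports) {
  exports.renderComment = function (comment) {
    return '<li class="comment"><b>' + comment.author + '</b>: ' + comment.text + '</li>';
  };
})(typeof module !== 'undefined' ? module.exports : (window.templates = {}));

// On the server (Node.js): render all existing comments into the initial page.
// var templates = require('./comment-template');
// var html = comments.map(templates.renderComment).join('');

// In the browser: render a preview of a newly posted comment with the same function.
// var preview = window.templates.renderComment(newComment);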

Related

How to download and query html pages where JS processing is necessary?

I often compile informal datasets by running some kind of XPath/XQuery on publicly available web pages. Usually the structure of the HTML is regular enough that useful information can be extracted easily.
But today I've come across tunefind.com. This website makes extensive use of the ReactJS framework, so most of the structure of the page is built client-side by JavaScript. The pages, when initially downloaded, are very basic and missing a lot of information; they are populated by a script that draws on a hopelessly messy blob of JSON data at the bottom of the page.
The only way I can think of to deal with this would be to use some kind of GUI-based web engine and just not display the GUI part. But that is a preposterous amount of work for these casual little CLI tools that I use to gather information.
Is there any way to perform the JavaScript preprocessing without dealing with unnecessary graphics?
Even if you could process the page without the graphics, the React JavaScript is geared towards running in a browser context: at the very least it will expect a functioning DOM to exist, and the application itself may also require clicks/transitions to happen before you can see some data.
Your best bet, then, is to load the page in a browser. To keep this simple, there are plenty of good browser automation frameworks designed for exactly this.
I've used a fair few libraries over the years, including PhantomJS, and recently I've gotten the most mileage out of Nightmare.js.
It runs an Electron browser for you and gives you a useful promisified JavaScript API to control it, with common browser functions such as clicking, following links, etc.
You can configure it to hide the browser, which is useful for making a CLI tool; however, it's a bit of a pseudo-headless mode and will still require a windowing/graphical context (e.g. X Window).
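For example, a minimal sketch that dumps the fully rendered HTML to stdout (the URL and wait selector are hypothetical; pick a selector that only appears once React has rendered):

var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: false }); // hidden window; still needs a graphical context on Linux (e.g. xvfb)

nightmare
  .goto('https://www.tunefind.com/show/some-show') // hypothetical URL
  .wait('.song-row') // hypothetical selector rendered by the client-side JS
  .evaluate(function () {
    return document.documentElement.outerHTML;
  })
  .end()
  .then(function (html) {
    process.stdout.write(html); // pipe into your existing XPath/XQuery tools
  })
  .catch(function (err) {
    console.error(err);
    process.exit(1);
  });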
Hope this helps.
PS - If you're at all used to Docker, it's not hard to make this just a running container!

PHP template system vs javascript AJAX template

My PHP template looks like this:
$html = file_get_contents("/path/to/file.html");
$replace = array(
    "{title}" => "Title of my webpage",
    "{other}" => "Other information",
    ...
);
foreach ($replace as $search => $replacement) {
    $html = str_replace($search, $replacement, $html);
}
echo $html;
I am considering switching to a JavaScript/AJAX template system. The AJAX will fetch the $replace array in JSON format and then I'll use JavaScript to replace the HTML.
The page would then be a plain .html file and a loading screen would be shown until the ajax was complete.
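Roughly, the client side of that plan might look like this (the endpoint name and helper functions are hypothetical; using the fetch API for brevity):

fetch('/api/page-data.json') // hypothetical endpoint returning {"{title}": "...", "{other}": "..."}
  .then(function (res) { return res.json(); })
  .then(function (data) {
    var html = document.body.innerHTML;
    for (var search in data) {
      html = html.split(search).join(data[search]); // same effect as PHP's str_replace
    }
    document.body.innerHTML = html;
    hideLoadingScreen(); // hypothetical helper
  })
  .catch(function () {
    showErrorMessage(); // hypothetical helper for when the AJAX fails
  });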
Are there any real advantages to this, or is the transition a waste of time?
A few of the reasons I think this will be beneficial:
The page will still load even if the MySQL or PHP services are down. If the AJAX fails I can handle it with an error message.
Bot traffic (and anything else that doesn't run JS) will cause very little load on my server, since the AJAX request will never be sent.
Please let me know what your thoughts are.
My 2 cents: it is better to do the logic on the template side (JavaScript). If you have a high-traffic site, you can offload some of the processing to each computer calling the site. Maybe fewer servers.
With JavaScript frameworks like AngularJS, the template stuff is pretty simple and efficient. And the framework will do caching for you.
Yes, SEO can be an issue with certain sites. There are proxy tools you can put in place that will render the site and return the static HTML to the bot. Plus, I think some bots render JavaScript these days.
Lastly, I like to template on the front-end because I like the backend to be a generic data provider (RESTful API). This way I can build one backend that drives web, mobile, and other platforms in a uniform way. The UI logic can be its own separate thing in JavaScript.
But it comes down to the design needs of your application. I build lots of Software as a service applications so a single page application works well for me.
I've worked with a similar design pattern in other projects. There are several ways to do this, and the task would involve managing multiple projects or application modules. I assume you are working with a team of developers and not using either a PHP or JavaScript MVC framework.
PHP Template
For many reasons, I'm against using the “search and replace” method, especially using a server-side scripting language to parse HTML documents as a templating kit.
Why?
As you maintain business rules and the project becomes larger, you will find yourself reading through a long list of regular expressions, parsing HTML into a DOM, and/or writing complicated algorithms for searching for nodes to replace with the correct text(s).
Having a placeholder such as {title} helps the script get by with fewer search-and-replace expressions, but the design pattern can still lead to messy sharing among multiple developers.
It is OK to parse one or two HTML files to manage the output, but not the entire template. The network response could be slower with multiple and repetitive trips to the server, and that's just for the template. There could be other scripts that are also making trips to the server for different reasons unrelated to the template.
AJAX/JavaScript
Initially, AJAX with JavaScript might sound like a neat idea but I'm still not convinced.
Why?
You can't assume the web browser is JavaScript-enabled on every mobile or desktop device. You might need to structure the HTML template in a few ways to manage the output for non-JavaScript browsers, perhaps including <noscript> and/or <iframe> tags on every page. And managing an alternative template for non-JavaScript browsers can be tedious.
Every web browser interprets JavaScript differently. Most developers should know a few differences between IE, Firefox, Chrome, and Safari, to name a few. You might need to create multiple JavaScript files to detect and then load JavaScript for a specific web browser. You update one feature, and you have to update the script for all web browsers.
JavaScript is visible in the page source. I wouldn't want to display confidential JavaScript functions that might include credentials, leak sensitive data about web services, and/or expose SQL queries. The idea is to secure your page as much as possible.
I'm not saying both are impossible. You could still do either PHP or JavaScript for templating.
However, my “ideal” web structure would consist of a reliable MVC framework like Zend, Spring, or Magnolia. Those MVC frameworks include many useful features such as web services, data mapping, and templating kits. Granted, the configuration requirements make it difficult for beginners to integrate an MVC framework into a project. But in the end, you could delegate tasks in configuration, MVC concepts, custom SQL queries, and test cases to developers. That's my two cents.
I think the most important aspects you forgot are:
SEO: What about search engine bots? They won't be able to index your content if it is set by JavaScript only.
Execution and Network Latency: When your service is working, the browser will wait until the page is loaded (let's say 800ms) before making the extra Ajax calls to get your values. This might add an extra 500ms (depending on network speed and geographic location...). If your server had sent all the generated data in the first place, it would have spent only ~1ms more preparing the complete response. Instead, your users spend a lot of time waiting on a blank page.
Caching: You could cache the generated pages in your web app. That way your load will be minimized as well. And if you still want to deliver content while your backend services (MySQL/PHP...) are down, you could even use Apache or Nginx caching.
But I guess it really depends on what you want to do.
For fast and simple pages, which seems to be your case, stick with backend enhancements.
For a dynamic/interactive app which can afford loading times and doesn't care about SEO, you can delegate most things to the front-end. But then use an advanced framework like Angular to handle templating, caching, etc.

Generate Static HTML From Client-Side JavaScript Generated Site

I'm generating an entire site using just an index.html with JS scripts.
The JS creates the HTML content based on JSON data received via the server-side API. This works great client-side and makes the site load and interaction very fast, but there is a snag... when a crawler comes to index the page, it will see a blank page.
The obvious solution is to provide an XML site map with static versions of all the pages. The problem is... how to generate static versions of each page when they are only generated client-side and all logic and templates are client-side?
This is not a new issue... I'm sure anyone generating pages dynamically client-side has hit this issue and solved it but I thought I'd ask the dev community before diving in and trying to solve this.
2019 Update
Tech has moved on significantly. I would encourage anyone looking to create SSR (server-side rendered) and client-side web apps in one isomorphic code base to take a look at the excellent Next.js.
Next.js wraps React with a server-side routing and rendering system built in Node.js, defines a standard interface to getting data for pages on server and client, and comes with some out of the box features that make it one of the best choices (IMHO) for both SSR and CSR web applications.
Oh... and they have a great tutorial too!
2013 Answer
I've managed to generate static pages from the client-side output by using PhantomJS and capturing the HTML output after the page and all JS have finished loading/executing. This method is slower than I would like and unlikely to scale well, but it's the only option I can think of so far.
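The capture script itself can be tiny. A sketch (the fixed timeout is crude; a ready flag set by the app would be more reliable):

// snapshot.js - run with: phantomjs snapshot.js http://example.com/some-page
var page = require('webpage').create();
var url = require('system').args[1];

page.open(url, function (status) {
  if (status !== 'success') {
    console.log('Failed to load ' + url);
    phantom.exit(1);
  }
  // give the client-side JS time to build the page before capturing it
  window.setTimeout(function () {
    console.log(page.content); // the rendered HTML, ready to save as a static page
    phantom.exit();
  }, 2000);
});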
The site already receives over 10,000 page views a day with over 8,000 unique visitors, so pages get updated regularly as new comments/posts are created; these changes are added to a queue which gets processed on a separate server that generates the static pages with Phantom.
The only other way I can think of doing this is to create a Node.js process that uses the same jsRender library and builds HTML output from the template files based on some data, but this would be time-consuming to set up and would not generate exactly the same output that the dynamic site creates. Google may frown on me serving it static pages that don't really represent the dynamic version that "normal" visitors see.
This seems like an unsolvable issue. Either I generate the pages entirely server-side, or crawlers cannot index the pages. :(

Using node.js to serve content from a Backbone.js app to search crawlers for SEO

Either my google-fu has failed me or there really aren't too many people doing this yet. As you know, Backbone.js has an Achilles heel: it cannot serve the HTML it renders to page crawlers such as Googlebot, because they do not run JavaScript (although given that it's Google, with their resources, the V8 engine, and the sobering fact that JavaScript applications are on the rise, I expect this to happen someday). I'm aware that Google has a hashbang workaround policy, but it's simply a bad idea. Plus, I'm using pushState. This is an extremely important issue for me, and I would expect it to be for others as well: SEO is something that cannot be ignored, and thus a client-side-only approach cannot be considered for the many applications out there that require or depend on it.
Enter node.js. I'm only just starting to get into this craze, but it seems possible to have the same Backbone.js app that exists on the client also live on the server, holding hands with node.js. node.js would then be able to serve HTML rendered from the Backbone.js app to page crawlers. It seems feasible, but I'm looking for someone who is more experienced with node.js, or even better, someone who has actually done this, to advise me.
What steps do I need to take to allow me to use node.js to serve my Backbone.js app to web crawlers? Also, my Backbone app consumes an API that is written in Rails which I think would make this less of a headache.
EDIT: I failed to mention that I already have a production app written in Backbone.js. I'm looking to apply this technique to that app.
First of all, let me add a disclaimer that I think this use of node.js is a bad idea. Second disclaimer: I've done similar hacks, but just for the purpose of automated testing, not crawlers.
With that out of the way, let's go. If you intend to run your client-side app on server, you'll need to recreate the browser environment on your server:
Most obviously, you're missing the DOM (Document Object Model) - basically the AST on top of your parsed HTML document. The node.js solution for this is jsdom.
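A minimal sketch using jsdom's current API (the markup is illustrative):

const { JSDOM } = require('jsdom');

const dom = new JSDOM('<!DOCTYPE html><html><body><div id="app"></div></body></html>');
const document = dom.window.document;

// client-side code can now manipulate `document` as it would in a browser
document.getElementById('app').innerHTML = '<p>rendered on the server</p>';

console.log(dom.serialize()); // the full HTML string to hand to the crawler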
That, however, will not suffice. Your browser also exposes the BOM (Browser Object Model) - access to browser features like, for example, history.pushState. This is where it gets tricky. There are two options. You can try to bend PhantomJS or CasperJS to run your app and then scrape the HTML off it; this is fragile, since you're running a huge full WebKit browser with the UI parts sawed off.
The other option is Zombie, which is a lightweight re-implementation of browser features in JavaScript. According to its page it supports pushState, but my experience is that the browser emulation is far from complete. Still, give it a try and see how far you get.
I'm going to leave it to you to decide whether pushing your rendering engine to the server side is a sound decision.
Because Node.js is built on V8 (Chrome's JavaScript engine), it will run JavaScript, like Backbone.js. Creating your models and so forth would be done in exactly the same way.
The Node.js environment of course lacks a DOM. So this is the part you need to recreate. I believe the most popular module is:
https://github.com/tmpvar/jsdom
Once you have an accessible DOM API in Node.js, you simply build its nodes as you would for a typical browser client (maybe using jQuery) and respond to server requests with the rendered HTML (via $("myDOM").html() or similar).
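For example, a hedged sketch combining jsdom with the jquery npm package (the data and markup are illustrative):

const { JSDOM } = require('jsdom');
const { window } = new JSDOM('<!DOCTYPE html><html><body><ul id="posts"></ul></body></html>');
const $ = require('jquery')(window); // the jquery package exports a factory that takes a window

// build nodes exactly as you would in the browser...
[{ title: 'First post' }, { title: 'Second post' }].forEach(function (post) {
  $('#posts').append($('<li>').text(post.title));
});

// ...then respond to the crawler with the rendered HTML
const html = '<!DOCTYPE html>' + window.document.documentElement.outerHTML;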
I believe you can take a fallback-strategy type of approach. Consider what would happen with JavaScript turned off and a link clicked, versus JS on. Anything you do on your page that can be crawled should have some reasonable fallback procedure for when JavaScript is turned off. Your links should always have the link to the server as the href, and the default action should be prevented with JavaScript.
I wouldn't say this is necessarily Backbone's responsibility. I mean, the only things Backbone can help you with here are modifying your URL when the page changes and letting your models/collections be both client- and server-side. The views and routers, I believe, would be strictly client-side.
What you can do, though, is make your Jade pages and partials renderable from the client side or server side, with or without content injected. In this way the same page can be rendered either way. That is, if you replace a big chunk of your page and change the URL, the HTML you are grabbing can come from the same template as if someone had gone directly to that page.
When your server receives a request, it should take you directly to that page rather than go through the main entry point, load Backbone, and have it manipulate the page into the state the URL implies.
I think you should be able to achieve this just by rearranging things in your app a bit. No real rewriting, just a good amount of moving things around. You may need to write a controller that will serve you HTML files with content either injected or not injected (see the sketch below). This will give your Backbone app the HTML it needs to couple with the data from the models. Like I said, those same templates can be used when you directly hit those links through the routers defined in Express/Node.js.
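A sketch of that controller arrangement, assuming Express with Jade (since renamed Pug); the route names and the loadPost helper are hypothetical:

const express = require('express');
const app = express();
app.set('view engine', 'pug');

// a direct hit (or a crawler) gets the page fully rendered server-side
app.get('/posts/:id', function (req, res) {
  loadPost(req.params.id, function (err, post) { // hypothetical data-access helper
    res.render('post', { post: post });
  });
});

// the Backbone router fetches just the data and fills in the same template client-side
app.get('/api/posts/:id', function (req, res) {
  loadPost(req.params.id, function (err, post) {
    res.json(post);
  });
});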
This is on my todo list of things to do with our app: have Node.js parse the Backbone routes (stored in memory when the app starts) and, at the very least, serve the main page's template as straight HTML. Anything more would probably be too much overhead/processing for the backend when you consider thousands of users hitting your site.
I believe Backbone apps like Airbnb do it this way as well, but only for robots like the Google crawler. You also need this for things like Facebook likes, as Facebook sends out a crawler to read your og: tags.
A working solution is to use Backbone everywhere (https://github.com/Morriz/backbone-everywhere), but it forces you to use Node as your backend.
Another alternative is to use the same templates on the server and front-end.
The front-end loads Mustache templates using the require.js text plugin, and the server renders the page using the same Mustache templates.
Another addition is to render the bootstrapped module data in a script tag as JSON, to be used immediately by Backbone to populate models and collections.
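A minimal sketch of this shared-template-plus-bootstrap approach (the template string and data are illustrative):

const Mustache = require('mustache');

// the same template string is loaded in the browser via the require.js text plugin
const commentTemplate = '<li class="comment">{{author}}: {{text}}</li>';

// server: render existing comments into the initial HTML
const html = Mustache.render(commentTemplate, { author: 'Ann', text: 'First!' });

// server: bootstrap the raw data into the page so Backbone can populate its
// models and collections immediately, without an extra round trip
const bootstrap = '<script>window.BOOTSTRAP = ' +
  JSON.stringify({ comments: [{ author: 'Ann', text: 'First!' }] }) + ';</script>';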
Basically you need to decide what it is that you're serving: is it a true app (i.e. something that could stand in as a replacement for a dedicated desktop application), or is it a presentation of content (i.e. classical "web page")? If you're concerned about SEO, it's likely that it's actually the latter ("content site") and in that case the "single-page app" model isn't appropriate; you really want the "progressively enhanced website" model instead (look up such phrases as "unobtrusive JavaScript", "progressive enhancement" and "adaptive Web design").
To amplify a little, "server sends only serialized data and client does all rendering" is only appropriate in the "true app" scenario. For the "content site" scenario, the appropriate model is "server does main rendering, client makes it look better and does some small-scale rendering to avoid disruptive page transitions when possible".
And, by the way, the objection that progressive enhancement means "making sure that a user who can see doesn't get anything better than a blind user who uses text-to-speech" is an expression of political resentment, not reality. Progressively enhanced sites can be as fancy as you want them to be from the perspective of a user with a high-end rendering system.

Handlebars.js and SEO

I have read a great deal of discussion about JavaScript templating and search engine optimization. Still, I haven't found a satisfying answer to the question (everything I found was either poorly documented or outdated).
Currently I am looking into handlebars.js as a client-side template solution, because I love the possibility of creating helper functions. But what about indexing for search engines? Does the bot index the generated content (as intended) or only the source with the ugly JavaScript pseudo-variables? I know that there are lots of threads going on about this matter, but I feel that nobody knows the exact answer.
If engines like Google do not index these templates properly, why would anyone bother using them for public websites?
Another question within this context: is it possible to render Handlebars.js templates on the server side and then deliver them to the client side? Obviously, to avoid all this SEO discussion.
For DOM crunching client-side, most web bots (i.e. Google and others) don't interpret JS on the fly and parse the newly rendered content for indexing. Instead, Google (and now Bing) support the 'Google Ajax Crawling Scheme' (https://developers.google.com/webmasters/ajax-crawling/docs/getting-started), which basically states that IF you want JS-rendered DOM content to be indexed (i.e. the results of AJAX calls), you must be able to:
Trigger the async JS rendering via the URL using hashbangs #! (i.e. http://www.mysite.com/#!my-state), and
Serve a prerendered DOM snapshot of your site AFTER JS modification, on request.
If you are using a client-side MVC framework like Backbone.js or Spine, you will need to provide this service if you want your web app indexed.
Generally this means you intercept a request made by the web bot (explained at the link above), scrape your site server-side using a headless browser (i.e. QT + capybara-webkit, HtmlUnit, etc.), then deliver the generated DOM back to the requesting bot.
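In Express terms the interception might look roughly like this (renderWithHeadlessBrowser is a hypothetical wrapper around whichever headless browser you choose):

app.use(function (req, res, next) {
  var fragment = req.query._escaped_fragment_; // Google rewrites #! URLs to this parameter
  if (fragment === undefined) return next(); // a normal browser: serve the JS app as usual

  // re-assemble the #! URL and have a headless browser render it
  renderWithHeadlessBrowser('http://www.mysite.com/#!' + fragment, function (err, html) {
    if (err) return next(err);
    res.send(html); // the prerendered DOM snapshot for the bot
  });
});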
I've started a gem to do this in Ruby (now taking pull requests) at https://github.com/benkitzelman/google-ajax-crawler
It does this as Rack middleware using capybara-webkit (and soon PhantomJS).
I do not know about Handlebars.js specifically, but to my understanding SEO has problems with content generated by JavaScript. Make sure your content is visible to search engines (use a spider simulator for some tests). Avoiding spider traps would generally be the way to go. Hope it helps.
Search engines don't run JavaScript, so if you want to have your content indexed you'll need to render your templates on the server as well. You can use Handlebars in Node (server-side JS) to render your template there when the page request comes from a spider. It's more work, but it's possible. GitHub, Google Plus, and Twitter all do something similar.
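A minimal sketch of that server-side path (the template source and data are illustrative):

const Handlebars = require('handlebars');

const source = '<h1>{{title}}</h1><ul>{{#each posts}}<li>{{this}}</li>{{/each}}</ul>';
const template = Handlebars.compile(source); // the same template the browser uses

// when the request comes from a spider, send the rendered HTML instead of the raw template
const html = template({ title: 'My blog', posts: ['First post', 'Second post'] });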
You could use Distal templates, which keep the templates as part of the HTML, for SEO.
See Spiderable for a temporary solution the Meteor project (which uses Handlebars.js) uses for SEO purposes.
http://docs.meteor.com/#spiderable
Does the bot index the generated content (as intended) or only the source with the ugly javascript pseudo-variables?
Neither, because indexer bots don't run JavaScript and you don't serve up templates as HTML documents.
Build a site that works without JavaScript, then build on top of it.
