hashbang vs hijax - javascript

Can anyone give me a good reason not to use the hijax (progressive enhancement) method in addition to the hashbang method Google proposes? As far as I can see, the hijax method is still the better one:
it works for no-javascript browsers
all search engines can index
The only counter argument I have found so far is that when a user clicks a link in a search engine result and has JavaScript enabled, you need to redirect them to the JavaScript-enabled version (the URL with the # fragment).
With Google's hashbang approach it's difficult to supply a no-JavaScript version, and Bing and Yahoo can't crawl your website.
Kind regards,
Daan

The "value allocation" answer isn't quite correct.
The question is about surfacing content for search engines, and hashbang is Google's answer for that. That said, a user (or another search engine or social network scraper that doesn't support hashbang) who doesn't have JS enabled will never see your content. Google can see it because they're the ones checking for hashbang.
Hijax, on the other hand, always allows non-JS users/bots to see your content because it does not rely on hash/hashbang. Hijax relies on standard query string parameters. This means your application must have back-end logic to render your content for non-JS user agents. In the end, with Hijax JS enabled users get the asynchronous experience and non-JS enabled users get full page loads.
Google continues to recommend Hijax. Hashbang is their offering for non-hijax apps already out there in the wild, and/or JS apps that don't have a back-end.
http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
(see progressive enhancement section)
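To make the hijax pattern concrete, here is a minimal sketch, assuming markup like <a class="hijax" href="/stuff?page=fluff">Fluff</a> and a #content container (both invented for this example). The real href keeps working for bots and no-JS users, while JS users get the asynchronous load:

// Hijax sketch: the href is a real, server-rendered URL; with JS we hijack the
// click, load the same content asynchronously and reflect the state in the hash.
var link = document.querySelector("a.hijax");
link.addEventListener("click", function (event) {
  event.preventDefault();                       // JS users stay on the current page
  var xhr = new XMLHttpRequest();
  xhr.open("GET", link.href);                   // same URL Googlebot and no-JS browsers get
  xhr.onload = function () {
    document.getElementById("content").innerHTML = xhr.responseText;
    location.hash = "fluff";                    // bookmarkable state, e.g. /stuff#fluff
  };
  xhr.send();
});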

I think this is not an issue any more, since Bing (and therefore Yahoo as well) has started crawling AJAX pages using Google's hashbang proposal!
See this lens about AJAX crawling in Bing.

The reason is value allocation.
Hijax
OK, let's say a user links to http://www.example.com/stuff#fluff
The link actually counts as a link to http://www.example.com/stuff#fluff, but since http://www.example.com/stuff#fluff and http://www.example.com/stuff serve the same HTML content, Google will canonicalize (consolidate) the value allocation to http://www.example.com/stuff
The URL www.example.com/stuff/fluff that you exposed to non-JavaScript clients (Googlebot) never comes up in this process.
Bottom line: a link to http://www.example.com/stuff#fluff is seen by Google as a vote for http://www.example.com/stuff
Hashbang
A user links to http://www.example.com/stuff#!fluff
Googlebot interprets it as www.example.com/stuff?_escaped_fragment_=fluff
And since it serves different content (i.e. different content from www.example.com/stuff), Google will not canonicalize (consolidate) it with any other URL.
Google will display http://www.example.com/stuff#!fluff to its users.
Bottom line: a link to http://www.example.com/stuff#!fluff is seen by Google as a vote for www.example.com/stuff?_escaped_fragment_=fluff (but displayed to its users as http://www.example.com/stuff#!fluff)
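To make that mapping concrete, here is a minimal Node.js sketch (purely illustrative; renderSnapshot and the port are invented) of a back-end answering Googlebot's _escaped_fragment_ request for the #! URL above:

var http = require("http");
var url = require("url");

// In a real app this would render the same HTML the #!<fragment> state shows client-side.
function renderSnapshot(fragment) {
  return "<html><body><h1>Snapshot for " + fragment + "</h1></body></html>";
}

http.createServer(function (req, res) {
  var query = url.parse(req.url, true).query;
  res.writeHead(200, { "Content-Type": "text/html" });
  if (query._escaped_fragment_ !== undefined) {
    // Googlebot fetched /stuff?_escaped_fragment_=fluff for a link to /stuff#!fluff
    res.end(renderSnapshot(query._escaped_fragment_));
  } else {
    // Regular users get the JavaScript application shell
    res.end("<html><body><div id='content'></div><script src='/app.js'></script></body></html>");
  }
}).listen(8080);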

Use dual links (AJAX links and normal links); they are compatible with Bing, Yahoo and others.
Take a look at Single Page Interface and Search Engine Optimization.

Have a look at this example http://www.amitpatil.me/create-gmail-like-app-using-html5-history-api-and-hashbang/

Related

Dynamic web application without hashbang #!

How is it possible that web applications like Google Maps and Mixcloud update their URLs without the use of a hashbang (also known as #!)?
Notice for example the coordinates right after the # sign in the URL while swiping the view in Google Maps. Or note that the music keeps playing while following some links.
I'm looking for a programmatic way to achieve the same functionality and I would also like to know how this works.
The HTML5 history API is a standardized way to manipulate the browser history via script. Part of this API — navigating the history — has been available in previous versions of HTML. The new parts in HTML5 include a way to add entries to the browser history, to visibly change the URL in the browser location bar (without triggering a page refresh), and an event that fires when those entries are removed from the stack by the user pressing the browser’s back button. This means that the URL in the browser location bar can continue to do its job as a unique identifier for the current resource, even in script-heavy applications that don’t ever perform a full page refresh.
Source: http://diveintohtml5.info/history.html
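As a minimal sketch of that API (the element IDs and section names here are invented for the example):

// Update the address bar without a reload, and restore state on Back/Forward.
function showSection(name) {
  // Stand-in for whatever actually swaps the visible content.
  document.getElementById("content").textContent = "Now viewing: " + name;
}

document.getElementById("contact-link").addEventListener("click", function (event) {
  event.preventDefault();
  showSection("contact");
  history.pushState({ section: "contact" }, "", "/contact"); // URL changes, no page refresh
});

window.addEventListener("popstate", function (event) {
  // Fires when the user moves back/forward through entries we pushed.
  showSection(event.state ? event.state.section : "home");
});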
Have you taken a look at ASP.NET MVC? It uses the single-page application concept. I'm not entirely sure what you're looking for, but this is a good example: http://www.microsoftvirtualacademy.com/training-courses/introduction-to-asp-net-mvc
You also might want to look at AngularJS, which makes routing URLs really easy.

Twitter Cards using Backbone's HTML5 History

I'm working on a web app which uses Backbone's HTML5 History option. In order to avoid having to code everything on the client and on the server, I'm using this method to route every request to index.html
I was wondering if there is a way to get Twitter Cards to work with this setup, as currently it can't read the page, since everything is loaded dynamically with JavaScript.
I was thinking about using User Agents to detect whether it's the TwitterBot, and if it is, serving a static version of the page with the required meta-tags. Would this work?
Thanks.
Yes.
At one job we did this for all the SEO/search/facebook stuff etc.
We would sniff the user-agent, and if it was one of the following sniffers
Facebook Open Graph
Google
Bing
Twitter
Yandex
(a few others I can't remember)
we would redirect to a special page that was written to dump all the relevant data about the page for SEO purposes into a nicely formatted (but completely unstyled) page.
This allowed us to retain our Google index position and proper Facebook sharing even though our site was a total single-page app in Backbone.
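A rough sketch of that setup, assuming an Express server (the crawler list, renderSeoPage helper and routes are all invented for the example):

var express = require("express");
var app = express();

// User agents we want to hand a pre-rendered, unstyled page instead of the Backbone app.
var CRAWLERS = /facebookexternalhit|Googlebot|bingbot|Twitterbot|YandexBot/i;

// Plain page containing titles, descriptions and Twitter Card meta tags for the given path.
function renderSeoPage(path) {
  return "<html><head>" +
         "<meta name='twitter:card' content='summary'>" +
         "<meta name='twitter:title' content='Title for " + path + "'>" +
         "</head><body><h1>Content for " + path + "</h1></body></html>";
}

app.get("*", function (req, res) {
  if (CRAWLERS.test(req.headers["user-agent"] || "")) {
    res.send(renderSeoPage(req.path));           // bots get the static version
  } else {
    res.sendFile(__dirname + "/index.html");     // everyone else gets the single-page app
  }
});

app.listen(3000);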
Yes, serving a specific page for Twitterbot with the right meta data markup will work.
You can test your results while developing using the card's preview tool.
https://dev.twitter.com/docs/cards/preview (with your static URL or just the tags).

How should I handle tracking fragment page views in Google Analytics?

I've been searching through the Google Analytics documentation, but I still don't understand how I should track page views for a single "page" site that uses ajax to reveal different views. I use shebang URLs and _escaped_fragment_ to help search engines understand the site layout, but our analytics guy told me to strip out the #! part of the URL when tracking, so when you visit mysite.com/#!/fish/bonker we would run:
_gaq.push(["_trackPageview", "/fish/bonker"]);
but that seems wrong to me. Wouldn't we want our tracked URLs to align with what Google actually spiders? Is there anything wrong with tracking _gaq.push(["_trackPageview", "#!/fish/bonker"]);?
It's important to recognize that there is a wall between Google Analytics and Google Search. There's no reason you would be penalized by having your URLs in one not correspond to what the other sees.
_escaped_fragment_ is purely a semi-standard for crawlers seeking to crawl AJAX content.
By default, Google Analytics does the equivalent when you don't pass a custom pageview value:
_gaq.push(["_trackPageview", location.pathname+location.search]);
If you want to have it also track the anchor value, you can simply pass it on your own:
_gaq.push(["_trackPageview", location.pathname+location.search+location.hash]);
The benefit here is that the URLs will correspond with "real" URLs.
Long story short: You're perfectly fine doing your proposed method; I would prefer the latter (explicitly passing the actual location.hash, not a hacked version of it), but both work.
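For example, a minimal sketch (assuming the classic _gaq async snippet is already loaded) that sends a virtual pageview whenever the fragment changes:

// Report each #! state change as its own pageview, using the same full-URL
// form as the explicit _trackPageview call shown above.
window.addEventListener("hashchange", function () {
  _gaq.push(["_trackPageview", location.pathname + location.search + location.hash]);
});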

How to implement #! based links?

I have always wondered how to navigate instantly through pages using # or #! in URLs. Many websites, like Google on http://www.google.com/nexus/, use this: when a user clicks any of the links, the page doesn't reload and things open instantly; only the URL changes, for example www.example.com/#contact or www.example.com/#home
How can I do this with 8 of my pages? (Home, Features, Rates, Contact, Support)
You may want to take a look at a basic AJAX tutorial (such as http://marc.info/?l=php-general&m=112198633625636&w=2). The real reason the URLs use #! is to have them get indexed by Google. If you want your AJAX'ed URLs to be indexed by Google, you'll have to implement support for _escaped_fragment_ (see: http://code.google.com/web/ajaxcrawling/docs/specification.html).
The only reason this is used is to show the state of an AJAX-enhanced page in the URL. This way, you can copy and bookmark the URL to come back to the same state.
Older browsers don't allow you to change the URL in the address bar without the page being reloaded. The latest browsers do (search for pushState). To work around this, you can change the hash of the URL. This is the part that is normally used to jump to an anchor, but you can use it for other purposes using JavaScript.
The ! isn't strictly necessary for this process. The ! is a convention adopted by Google. It allows these URLs to be indexed. Normally hashes aren't indexed separately, because they mark only a different part of the same page (an anchor). But by adding the !, you create a shebang or hashbang, which is indexed by Google.
Without explaining everything here, you should find a lot of information when you search for AJAX, hashbang and pushState.
Addition: check out History.js. It is a wrapper for the pushState API that falls back to using hashes on older browsers.
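A minimal sketch of plain hash-based navigation for a handful of pages (the pages object and #content container are invented for this example):

// Show a different "page" when the fragment changes, e.g. www.example.com/#contact.
// Nothing is reloaded; only the hash in the address bar changes.
var pages = {
  home:     "<h1>Home</h1>",
  features: "<h1>Features</h1>",
  rates:    "<h1>Rates</h1>",
  contact:  "<h1>Contact</h1>",
  support:  "<h1>Support</h1>"
};

function render() {
  var name = location.hash.replace(/^#!?/, "") || "home"; // "#contact" -> "contact"
  document.getElementById("content").innerHTML = pages[name] || pages.home;
}

window.addEventListener("hashchange", render); // fires on every hash change
window.addEventListener("load", render);       // render the initial state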

SEO and AJAX (Twitter-style)

Okay, so I'm trying to figure something out. I am in the planning stages of a site, and I want to implement "fetch data on scroll" via jQuery, much like Facebook and Twitter, so that I don't pull all data from the DB at once.
But I have some problems regarding SEO: how will Google be able to see all the data? Because the page fetches more data automatically when the user scrolls, I can't include any links in the style of "go to page 2"; I want Google to just index that one page.
Any ideas for a simple and clever solution?
Put links to page 2 in place.
Use JavaScript to remove them if you detect that your autoloading code is going to work.
Progressive enhancement is simply good practice.
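A minimal sketch of that approach, assuming jQuery (as the question mentions) and invented URLs and selectors; the markup keeps real pagination links for crawlers and no-JS users, and the script hides them and loads the same partials on scroll:

var nextPage = 2;
var loading = false;

$(function () {
  $(".pagination").hide(); // crawlers still see the "page 2" links in the served HTML

  $(window).on("scroll", function () {
    var nearBottom = $(window).scrollTop() + $(window).height() >= $(document).height() - 200;
    if (!nearBottom || loading) return;

    loading = true;
    $.get("/items?page=" + nextPage, function (html) {
      $("#items").append(html); // same partial the hidden pagination links point at
      nextPage += 1;
      loading = false;
    });
  });
});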
You could use PHP (or another server-side script) to detect the user agent of webcrawlers you specifically want to target such as Googlebot.
In the case of a webcrawler, you would have to use non-JavaScript-based techniques to pull down the database content and lay out the page. I would recommend not paginating the search-engine-targeted content, assuming that you are not paginating the "human" version. The URLs discovered by the webcrawler should be the same as those your (human) visitors will visit. In my opinion, the page should only deviate from the "human" version by having more content pulled from the DB in one go.
A list of webcrawlers and their user agents (including Google's) is here:
http://www.useragentstring.com/pages/Crawlerlist/
And yes, as stated by others, don't rely on JavaScript for content you want to appear in search engines. In fact, it is quite frequently used where a developer doesn't want something to appear in search engines.
All of this comes with the rider that it assumes you are not paginating at all. If you are, then you should use a server-side script to paginate your pages so that they are picked up by search engines. Also, remember to put sensible limits on the amount of your DB that you pull for the search engine. You don't want it to time out before it gets the page.
Create a Google webmaster tools account, generate a sitemap for your site (manually, automatically or with a cronjob - whatever suits) and tell Google webmaster tools about it. Update the sitemap as your site gets new content. Google will crawl this and index your site.
The sitemap will ensure that all your content is discoverable, not just the stuff that happens to be on the homepage when the googlebot visits.
Given that your question is primarily about SEO, I'd urge you to read this post from Jeff Atwood about the importance of sitemaps for Stackoverflow and the effect it had on traffic from Google.
You should also add paginated links that get hidden by your stylesheet and are a fallback for when your endless scroll is disabled by someone not using JavaScript. If you're building the site right, these will just be partials that your endless scroll loads anyway, so it's a no-brainer to make sure they're on the page.
