I've been searching through the Google Analytics documentation, but I still don't understand how I should track page views for a single "page" site that uses ajax to reveal different views. I use shebang URLs and _escaped_fragment_ to help search engines understand the site layout, but our analytics guy told me to strip out the #! part of the URL when tracking, so when you visit mysite.com/#!/fish/bonker we would run:
_gaq.push(["_trackPageview", "/fish/bonker"]);
but that seems wrong to me. Wouldn't we want our tracked URLs to align with what Google actually spiders? Is there anything wrong with tracking _gaq.push(["_trackPageview", "#!/fish/bonker"]);?
It's important to recognize that there is a wall between Google Analytics and Google Search. You won't be penalized if the URLs tracked in one don't correspond to what the other sees.
_escaped_fragment_ is purely a semi-standard convention for crawlers that want to crawl AJAX content.
By default, when you don't pass a custom pageview value, Google Analytics does the equivalent of:
_gaq.push(["_trackPageview", location.pathname+location.search]);
If you want to have it also track the anchor value, you can simply pass it on your own:
_gaq.push(["_trackPageview", location.pathname+location.search+location.hash]);
The benefit here is that the URLs will correspond with "real" URLs.
Long story short: You're perfectly fine doing your proposed method; I would prefer the latter (explicitly passing the actual location.hash, not a hacked version of it), but both work.
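To make that concrete, here is a minimal sketch (assuming the classic ga.js snippet has already set up the _gaq queue) that re-tracks a pageview on every hash change, so each AJAX "view" shows up as its own pageview:
// track the initial view, including the hash
_gaq.push(["_trackPageview", location.pathname + location.search + location.hash]);

// re-track whenever the hash changes (i.e. the user navigates to another "view")
window.addEventListener("hashchange", function () {
  _gaq.push(["_trackPageview", location.pathname + location.search + location.hash]);
});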
Related
I'm reading this specification which is an agreement between web servers and search engine crawlers that allows for dynamically created content to be visible to crawlers.
It's stated there that in order for a crawler to index an HTML5 application, one must implement routing using #! in URLs. In Angular, html5Mode(true) gets rid of this hashed part of the URL. I'm wondering whether this is going to prevent crawlers from indexing my website.
Short answer - No, html5mode will not mess up your indexing, but read on.
Important note: Both Google and Bing can crawl AJAX-based content without HTML snapshots
I know the documentation you link to says otherwise, but a year or two ago they officially announced that they handle AJAX content without the need for HTML snapshots, as long as you use pushState. Unfortunately, a lot of the documentation is old and not updated.
SEO using pushState
The requirement for AJAX crawling to work out of the box is that you change your URL using pushState. That is exactly what html5Mode in Angular does (and what a lot of other frameworks do as well). When pushState is used, the crawlers will wait for AJAX calls to finish and for JavaScript to update the page before they index it. You can even update things like the page title or meta tags in your router and they will be indexed properly. In essence you don't need to do anything; there is no difference between server-side and client-side rendered sites in this case.
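For example, a rough AngularJS 1.x sketch (the route, template name and title values here are invented for illustration) of turning on html5Mode and updating the title from the router could look like this:
// enable pushState-style URLs; html5Mode also requires a <base href="/"> tag in index.html
angular.module("app", ["ngRoute"])
  .config(function ($locationProvider, $routeProvider) {
    $locationProvider.html5Mode(true);
    $routeProvider.when("/rates", {
      templateUrl: "rates.html",
      title: "Rates"  // custom property, read in the run block below
    });
  })
  // update the document title whenever a route finishes loading
  .run(function ($rootScope, $route) {
    $rootScope.$on("$routeChangeSuccess", function () {
      document.title = ($route.current && $route.current.title) || "My App";
    });
  });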
To be clear, a lot of SEO-analysis tools (such as Moz) will spit out warnings on pages using pushstates. That's because those tools (and their reps if you talk to them) are at the time of writing not up to date, so ignore them.
Finally, make sure you are not using the fragment meta tag (shown below) when doing this. If you have that tag, the crawlers will think that you want to use the non-pushState method and things might get messed up.
SEO without pushState
There is very little reason not to use pushState with Angular, but if you don't, you need to follow the guidelines linked in the question. In short, you create snapshots of the HTML on your server and then use the fragment meta tag to signal that your "#" URLs should be treated as if they were "#!" URLs:
<meta name="fragment" content="!" />
When a crawler finds a page like this, it will remove the fragment part of the URL and instead request the URL with the parameter _escaped_fragment_, and you can serve your snapshotted page in response, giving the crawler a normal static page to index.
Note that the fragment meta tag should only be used if you want to trigger this behaviour. If you are using pushState and want the page to be indexed that way, don't use this tag.
Also, when using snapshots in Angular you can have html5Mode on. In html5Mode the fragment is hidden, but it technically still exists and will still trigger the same behaviour, assuming the fragment meta tag is set.
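As a rough illustration of the server side of that hand-off (a Node/Express sketch; the snapshots directory and file layout are hypothetical, and the same idea works in any server stack):
var express = require("express");
var path = require("path");
var app = express();

app.get("/", function (req, res) {
  var fragment = req.query._escaped_fragment_;
  if (fragment !== undefined) {
    // crawler request such as /?_escaped_fragment_=fish/bonker:
    // serve the pre-rendered snapshot for that view
    // (in real code, sanitize `fragment` before using it in a file path)
    res.sendFile(path.join(__dirname, "snapshots", fragment + ".html"));
  } else {
    // normal visitor: serve the JavaScript application shell
    res.sendFile(path.join(__dirname, "index.html"));
  }
});

app.listen(3000);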
A warning - Facebook crawler
While both Google and Bing will crawl your AJAX pages without problems (if you are using pushState), Facebook will not. Facebook does not understand AJAX content and still requires special solutions, like HTML snapshots served specifically to the Facebook bot (user agent facebookexternalhit/1.1).
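If you do need to cover Facebook, one common approach (sketched here as Express middleware, assuming the same Express app as the sketch above; renderSnapshot is a placeholder for whatever produces your HTML snapshot) is to check the user agent and serve the snapshot only to that bot:
// serve a static snapshot only to the Facebook scraper
app.use(function (req, res, next) {
  var ua = req.headers["user-agent"] || "";
  if (ua.indexOf("facebookexternalhit") !== -1) {
    res.send(renderSnapshot(req.path)); // placeholder for your snapshot logic
  } else {
    next(); // everyone else gets the normal JS application
  }
});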
Edit - I should probably mention that I have deployed sites with all of these variants: with html5Mode, the fragment meta tag and snapshots, and without any snapshots, relying purely on pushState crawling. It all works fine, except for pushState and Facebook as noted above.
To allow indexing of your AJAX application, you have to add a special meta tag in the head section of your document:
<meta name="fragment" content="!" />
Source:
https://docs.angularjs.org/guide/$location#crawling-your-app
Towards the bottom, look for "Crawling your app".
I always wondered how to instantly navigate through pages using # or #! in URLs. Many websites, like Google on http://www.google.com/nexus/, are using it: when a user clicks any of the links, nothing reloads and things open instantly; only the URL changes, for example: www.example.com/#contact or www.example.com/#home
How can I do this with 8 of my pages? (Home, Features, Rates, Contact, Support)
You may want to take a look at a basic AJAX tutorial (such as http://marc.info/?l=php-general&m=112198633625636&w=2). The real reason the URLs use #! is to have them indexed by Google. If you want your AJAX'ed URLs to be indexed by Google, you'll have to implement support for _escaped_fragment_ (see: http://code.google.com/web/ajaxcrawling/docs/specification.html).
The only reason this is used is to show the state of an AJAX-enhanced page in the URL. This way, you can copy and bookmark the URL to come back to the same state.
Older browsers don't allow you to change the URL in the address bar without the page being reloaded. The latest browsers do (search for pushState). To work around this, you can change the hash of the URL. This is the part that is normally used to jump to an anchor, but you can use it for other purposes with JavaScript.
The ! isn't strictly necessary for this process. The ! convention is implemented by Google and allows these URLs to be indexed. Normally hashes aren't indexed separately, because they mark only a different part of the same page (an anchor). But by adding the !, you create a shebang or hashbang, which is indexed by Google.
Without explaining everything here, you should find a lot of information when you search for Ajax, HashBang and PushState.
Addition: Check History.js. It is a wrapper for the pushState API that falls back to using hashes on older browsers.
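To make the hash-based approach concrete, here is a minimal navigation sketch (the section ids are placeholders for your own pages) that shows one "page" at a time and keeps the URL in sync:
// show the section whose id matches the current hash, hide the rest
function showPage() {
  var page = location.hash.replace(/^#!?\/?/, "") || "home";
  var sections = document.querySelectorAll(".page");
  for (var i = 0; i < sections.length; i++) {
    sections[i].style.display = (sections[i].id === page) ? "block" : "none";
  }
}

window.addEventListener("hashchange", showPage); // fires on #home -> #contact etc.
window.addEventListener("load", showPage);       // handle the URL the visitor arrived with
Links are then just plain anchors like <a href="#contact">Contact</a>, so bookmarking and the back button keep working.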
Okay, so I'm trying to figure something out. I am in the planning stages of a site, and I want to implement "fetch data on scroll" via jQuery, much like Facebook and Twitter, so that I don't pull all data from the DB at once.
But I have some problems regarding SEO: how will Google be able to see all the data? Because the page will fetch more data automatically when the user scrolls, I can't include any links in the style of "go to page 2"; I want Google to just index that one page.
Any ideas for a simple and clever solution?
Put links to page 2 in place.
Use JavaScript to remove them if you detect that your autoloading code is going to work.
Progressive enhancement is simply good practice.
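A sketch of that idea (the .pagination selector and the loadNextPage() helper are placeholders for your own markup and AJAX code):
// progressive enhancement: real pagination links exist in the HTML,
// and we only hide them once we know the scroll-loading code will run
var pager = document.querySelector(".pagination");
if (pager) {
  pager.style.display = "none";
  window.addEventListener("scroll", function () {
    var nearBottom = window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
    if (nearBottom) {
      loadNextPage(); // placeholder: fetches the next partial via AJAX and appends it
    }
  });
}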
You could use PHP (or another server-side script) to detect the user agent of the webcrawlers you specifically want to target, such as Googlebot.
In the case of a webcrawler, you would have to use non-JavaScript-based techniques to pull down the database content and lay out the page. I would recommend not paginating the search-engine-targeted content, assuming that you are not paginating the "human" version. The URLs discovered by the webcrawler should be the same as those your (human) visitors will visit. In my opinion, the page should only deviate from the "human" version by having more content pulled from the DB in one go.
A list of webcrawlers and their user agents (including Google's) is here:
http://www.useragentstring.com/pages/Crawlerlist/
And yes, as stated by others, don't rely on JavaScript for content you want to be seen by search engines. In fact, it is quite frequently used where a developer doesn't want something to appear in search engines.
All of this comes with the rider that it assumes you are not paginating at all. If you are, then you should use a server-side script to paginate your pages so that they are picked up by search engines. Also, remember to put sensible limits on the amount of your DB that you pull for the search engine. You don't want it to time out before it gets the page.
Create a Google webmaster tools account, generate a sitemap for your site (manually, automatically or with a cronjob - whatever suits) and tell Google webmaster tools about it. Update the sitemap as your site gets new content. Google will crawl this and index your site.
The sitemap will ensure that all your content is discoverable, not just the stuff that happens to be on the homepage when the googlebot visits.
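For instance, a hedged sketch of generating a basic sitemap with Node (in practice the URL list would come from your database or router rather than being hard-coded):
// build a minimal sitemap.xml from a list of URLs
var fs = require("fs");

var urls = ["http://www.example.com/", "http://www.example.com/page/2"];
var xml = '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
  urls.map(function (u) { return "  <url><loc>" + u + "</loc></url>"; }).join("\n") +
  "\n</urlset>\n";

fs.writeFileSync("sitemap.xml", xml); // then reference it in Google Webmaster Tools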
Given that your question is primarily about SEO, I'd urge you to read this post from Jeff Atwood about the importance of sitemaps for Stackoverflow and the effect it had on traffic from Google.
You should also add paginated links that get hidden by your stylesheet and are a fallback for when your endless-scroll is disabled by someone not using javascript. If you're building the site right, these will just be partials that your endless scroll loads anyway, so it's a no-brainer to make sure they're on the page.
Can anyone give me a good reason not to use the hijax (progressive enhancement) method in addition to the hashbang method Google proposes? As far as I can see, the hijax method is still the better one:
it works for no-JavaScript browsers
all search engines can index it
The only counter-argument I have found so far is that when someone clicks a link in a search engine and has JavaScript enabled, you'll need to redirect to the JavaScript-enabled version (with the # fragment).
For Google's hashbang version it's difficult to supply a no-JavaScript version, and Bing and Yahoo can't crawl your website.
Kind regards,
Daan
The "value allocation" answer isn't quite correct.
The question is about surfacing content for search engines. Hashbang is Google's answer for that. That said, a user (or another search engine or social-network scraper that doesn't support hashbang) who doesn't have JS enabled will never see your content. Google can see it because they're the ones checking for hashbang.
Hijax, on the other hand, always allows non-JS users/bots to see your content because it does not rely on the hash/hashbang. Hijax relies on standard query string parameters. This means your application must have back-end logic to render your content for non-JS user agents. In the end, with Hijax, JS-enabled users get the asynchronous experience and non-JS users get full page loads.
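A small sketch of the hijax pattern (the element ids, URL and partial parameter are made up for illustration): the markup contains a real, crawlable link, and JavaScript hijacks the click for users who have it enabled.
// markup: <a id="fluff-link" href="/stuff?view=fluff">Fluff</a>
var link = document.getElementById("fluff-link");
link.addEventListener("click", function (e) {
  e.preventDefault(); // JS-enabled users stay on the page...
  fetch("/stuff?view=fluff&partial=1")      // ...and fetch just the content fragment
    .then(function (res) { return res.text(); })
    .then(function (html) {
      document.getElementById("content").innerHTML = html;
    });
});
// non-JS users and crawlers simply follow the href and get a full page load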
Google continues to recommend Hijax. Hashbang is their offering for non-hijax apps already out there in the wild, and/or JS apps that don't have a back-end.
http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
(see progressive enhancement section)
I think this is not an issue any more, since Bing (and this means Yahoo as well) started crawling AJAX pages using Google's hashbang proposal!
Lense about ajax-crawling in Bing
The reason is value allocation
Hijax
OK, let's say a user links to http://www.example.com/stuff#fluff
The link actually counts as a link to http://www.example.com/stuff#fluff, but as
http://www.example.com/stuff#fluff and http://www.example.com/stuff serve the same HTML content, Google will canonicalize (consolidate) the value allocation to http://www.example.com/stuff
The URL www.example.com/stuff/fluff that you communicated to non-JavaScript clients (Googlebot) does not come up in this whole process
Bottom line: a link to http://www.example.com/stuff#fluff is seen by Google as a vote for http://www.example.com/stuff
Hashbang
A user links to http://www.example.com/stuff#!fluff
Googlebot interprets it as www.example.com/stuff?_escaped_fragment_=fluff
And as it offers different content (i.e. different content from www.example.com/stuff), Google will not canonicalize (consolidate) it with any other URL.
Google will display http://www.example.com/stuff#!fluff to its users
Bottom line: a link to http://www.example.com/stuff#!fluff is seen by Google as a vote for www.example.com/stuff?_escaped_fragment_=fluff (but displayed to its users as http://www.example.com/stuff#!fluff)
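For reference, the rewriting Googlebot performs here is mechanical; a small sketch of it in JavaScript (illustrative only):
// reproduce Googlebot's hashbang rewriting for a given URL
function escapedFragmentUrl(url) {
  var parts = url.split("#!");
  if (parts.length < 2) return url; // no hashbang, nothing to rewrite
  var sep = parts[0].indexOf("?") === -1 ? "?" : "&";
  return parts[0] + sep + "_escaped_fragment_=" + encodeURIComponent(parts[1]);
}

console.log(escapedFragmentUrl("http://www.example.com/stuff#!fluff"));
// -> http://www.example.com/stuff?_escaped_fragment_=fluff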
Use dual links (AJAX and normal links); they are compatible with Bing, Yahoo and others.
Take a look at Single Page Interface and Search Engine Optimization
Have a look at this example http://www.amitpatil.me/create-gmail-like-app-using-html5-history-api-and-hashbang/
I'm using Google Maps for a website. All the data available via the map on the home page is also listed on other pages throughout the site, but with JavaScript turned on these pages redirect to the home page: e.g. http://host.com/county/essex redirects to http://host.com/#county/essex, which makes the map load all the data for Essex.
I'd like Google to index all the pages on my site, as otherwise none of the data included in the map will be searchable. But I know that, for very good reasons, Google doesn't normally index pages which get redirected. Is there a way to get Google to index my pages? Or, failing that, is there a way to submit all the data to Google in some other way?
Edit:
All my pages are linked to from my main navigation (it's just that JavaScript overrides the default link action)... the upshot being that there should be no need for a sitemap, as all the pages are discoverable by Googlebot using normal link following.
Regarding submitting data to Google, there is a thing called Google Sitemaps that is supposed to notify Google of all URIs/locations that exist on a given site and are/should be indexed. Truth is, however, that sites that aren't crawlable by default rarely benefit much from the aforementioned sitemap.
Have a look at Sitemaps; they allow you to include the URLs you need indexed.
You write:
with javascript turned on these pages redirect to the home page
Googlebot doesn't execute JavaScript. If your redirects are made in JavaScript, Googlebot will simply index the page and ignore the redirect.