SEO and AJAX (Twitter-style) - javascript

Okay, so I'm trying to figure something out. I am in the planning stages of a site, and I want to implement "fetch data on scroll" via jQuery, much like Facebook and Twitter, so that I don't pull all the data from the DB at once.
But I have some concerns regarding SEO: how will Google be able to see all the data? Because the page fetches more data automatically as the user scrolls, I can't include any links in the style of "go to page 2"; I want Google to just index that one page.
Any ideas for a simple and clever solution?

Put links to page 2 in place.
Use JavaScript to remove them if you detect that your autoloading code is going to work.
Progressive enhancement is simply good practice.
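A minimal sketch of that pattern, assuming jQuery; the .pagination class and the autoloader object are hypothetical stand-ins for your own markup and fetch-on-scroll code:

    // Progressive enhancement: the server renders real "go to page 2" links
    // inside a .pagination nav. If this script runs, JavaScript works, so the
    // links can be removed and the autoloader can take over.
    $(function () {
      // "autoloader" is a hypothetical stand-in for your fetch-on-scroll code.
      if (window.autoloader && autoloader.isSupported()) {
        $('.pagination').remove();
        autoloader.init();
      }
    });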

You could use PHP (or another server-side script) to detect the user agent of webcrawlers you specifically want to target, such as Googlebot.
In the case of a webcrawler, you would have to use non-JavaScript-based techniques to pull down the database content and lay out the page. I would recommend not paginating the search-engine-targeted content, assuming that you are not paginating the "human" version. The URLs discovered by the webcrawler should be the same as those your (human) visitors will visit. In my opinion, the page should only deviate from the "human" version by having more content pulled from the DB in one go.
A list of webcrawlers and their user agents (including Google's) is here:
http://www.useragentstring.com/pages/Crawlerlist/
And yes, as stated by others, don't rely on JavaScript for content you want seen by search engines. In fact, it is quite frequently used precisely where a developer doesn't want something to appear in search engines.
All of this comes with the rider that it assumes you are not paginating at all. If you are, then you should use a server-side script to paginate your pages so that they are picked up by search engines. Also, remember to put sensible limits on the amount of your DB that you pull for the search engine. You don't want it to time out before it gets the page.
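As a sketch of that idea (the answer suggests PHP; this uses Node/Express to keep all the examples here in one language, and the bot pattern is illustrative, not exhaustive):

    // Hypothetical Express route: crawlers get the full, non-paginated content
    // in one go (with a sensible cap); humans get page one plus the scroll script.
    const express = require('express');
    const db = require('./db'); // hypothetical data-access module
    const app = express();

    const CRAWLER_RE = /googlebot|bingbot|slurp|duckduckbot|yandex/i; // illustrative

    app.get('/feed', async (req, res) => {
      const isCrawler = CRAWLER_RE.test(req.get('User-Agent') || '');
      const limit = isCrawler ? 500 : 20; // cap the bot query so it can't time out
      const items = await db.fetchItems({ limit }); // hypothetical helper
      const list = items.map((it) => `<li>${it.title}</li>`).join('');
      res.send(`<ul id="items">${list}</ul>` +
               (isCrawler ? '' : '<script src="/autoload.js"></script>'));
    });

    app.listen(3000);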

Create a Google webmaster tools account, generate a sitemap for your site (manually, automatically or with a cronjob - whatever suits) and tell Google webmaster tools about it. Update the sitemap as your site gets new content. Google will crawl this and index your site.
The sitemap will ensure that all your content is discoverable, not just the stuff that happens to be on the homepage when the googlebot visits.
Given that your question is primarily about SEO, I'd urge you to read this post from Jeff Atwood about the importance of sitemaps for Stack Overflow and the effect it had on traffic from Google.
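For the "automatically or with a cronjob" part, a minimal sketch (Node assumed; the data-access module and the domain are hypothetical):

    // Hypothetical cron-run script: rebuild sitemap.xml from the DB so every
    // content URL is discoverable, not just what's on the homepage today.
    const fs = require('fs');
    const db = require('./db'); // hypothetical data-access module

    async function buildSitemap() {
      const ids = await db.allContentIds(); // hypothetical query
      const entries = ids.map(
        (id) => `  <url><loc>https://www.example.com/content/${id}</loc></url>`
      );
      const xml = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
        ...entries,
        '</urlset>',
      ].join('\n');
      fs.writeFileSync('public/sitemap.xml', xml);
    }

    buildSitemap();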
You should also add paginated links that are hidden by your stylesheet, as a fallback for when your endless scroll is unavailable because someone isn't using JavaScript. If you're building the site right, these will just be partials that your endless scroll loads anyway, so it's a no-brainer to make sure they're on the page.
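A sketch of that reuse (jQuery assumed; the .pagination markup and #items container are hypothetical): the scroll handler fetches whatever URL the hidden "next" link points at, so crawlers and no-JS visitors follow exactly the same URLs.

    // The stylesheet hides .pagination when JS is on; the scroll handler
    // reuses its "next" href, so both paths hit the same paginated partials.
    $(window).on('scroll', function () {
      var $next = $('.pagination a.next'); // hidden, but still in the DOM
      var nearBottom =
        $(window).scrollTop() + $(window).height() >= $(document).height() - 200;
      if (!nearBottom || !$next.length || $next.data('loading')) return;

      $next.data('loading', true);
      $.get($next.attr('href'), function (html) {
        var $page = $('<div>').append(html); // wrap so .find() sees top-level nodes
        $('#items').append($page.find('#items').children()); // splice in new rows
        var nextHref = $page.find('.pagination a.next').attr('href');
        if (nextHref) {
          $next.attr('href', nextHref).data('loading', false);
        } else {
          $next.remove(); // no more pages
        }
      });
    });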

Related

How to hide the source URL in an iFrame on Wordpress

I have been reading up on here about how to hide the source URL in an iFrame using JavaScript and jQuery, but for some reason I haven't found a solution that works with WordPress so far.
So, short and sweet: I am trying to hide a URL for competitive reasons. I'm about to launch a new web page that uses an iFrame to display content from another website that I subscribe to and pay quite a bit of money for, so I don't want my competition to know about them and copy what I do.
What would be the best way to mask the HTML source link in the iFrame? I use the latest version of WordPress to host my site.
Alternatively, I am open to more advanced solutions.
Thanks in advance.
I'm not stealing anything. I'm a businessman who pays taxes, and I pay $1400 per year for the web service that I use. I don't want my competitors STEALING MY setup.
OK, I'll take you at face value. As has been pointed out in the comments, it's impossible to do this with plain HTML or JavaScript. It simply cannot be done. There is no way to prevent a user of a browser from seeing the URIs that are contacted. Period.
That being said you could:
Retrieve the HTML from the 3rd party on your web server, using something like cURL in PHP.
Take the result of the above and send it out on a URL of your choosing.
Create HTML with an <iframe> or whatever else you want to display the content from the 3rd party and point it at your new URL on your web server.
The above is not trivial to do correctly and will require some knowledge and work on your part. One issue with this method will be passing along JavaScript, images, CSS, etc. correctly, so that they appear to come from your server.
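A sketch of those three steps (the answer names cURL in PHP; this uses Node/Express to keep one language across these examples, and every URL here is hypothetical):

    // Hypothetical proxy: fetch the vendor's page server-side and re-serve it
    // from our own URL, so the browser never contacts the vendor directly.
    const express = require('express');
    const app = express();

    const VENDOR = 'https://vendor.example.com'; // hypothetical subscription service

    app.get('/embedded', async (req, res) => {
      const upstream = await fetch(VENDOR + '/dashboard'); // global fetch, Node 18+
      let html = await upstream.text();
      // Naive pass at the hard part: point root-relative assets back at the
      // vendor. A real version must also handle scripts, cookies and redirects.
      html = html.replace(/(src|href)="\//g, `$1="${VENDOR}/`);
      res.type('html').send(html);
    });

    app.listen(3000);

Your page's iFrame would then point at /embedded on your own domain instead of at the vendor.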

Using JS to create a "link" to a password protected site

Quite a few of the sites used by the schools I work in keep their content behind user accounts, to protect it from people who haven't paid for it, which means that the users (aged 5+) have to type in some pretty weird usernames and passwords before they can do their work.
I was wondering if it is possible to use JavaScript to create a page that would let me do something along the lines of:
Fetch the Login Page
Fill out the form
Submit it
Redirect the user to the site
Steps 1-3 would happen in the background without the user seeing them.
In most cases these accounts are shared and the details are on displays etc. in the classrooms, so there is no issue with the details being publicly accessible.
I have used Mechanize in Ruby before and would imagine a solution like it, but running client-side.
I know that some inspection of the target site will be needed but once I have an in-principle example I should be able to tailor it to each site later.
If you have a standardized browser, you should consider building a plugin for that browser; that's the easiest way to interact with the web pages. Otherwise you'll run into issues with anti-CSRF protections and cross-domain policies.
As for the language: Chrome extensions are written in JavaScript and are pretty easy to build. For the other browsers I don't know.
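A minimal sketch of the Chrome-extension route: a content script that manifest.json targets at the vendor's login page. All selectors and credentials below are hypothetical and would need tailoring per site.

    // content-script.js -- runs inside the login page itself, so the CSRF and
    // cross-domain issues above don't apply; hypothetical field names throughout.
    (function () {
      var form = document.querySelector('form#login'); // hypothetical form id
      if (!form) return; // not the login page, or already logged in
      form.querySelector('input[name="username"]').value = 'class-account';
      form.querySelector('input[name="password"]').value = 'class-password';
      form.submit(); // the site then redirects the pupil to the content
    })();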

How can I make sure search engines can see the content on my "single page" Backbone.js site?

So I have to build a website, ideally based on Backbone.js. This website will be a sort of complex gallery, let's say hosted at www.example.com, and I need every piece of content opened from this gallery to be searchable on Google, at, say, www.example.com/content/contentIDNumber. So I use the router class to define this route and handle page changes as you normally would with Backbone. All fine till here.
The gallery will be filled with an infinite list of dynamically loaded content. The content is created via a custom CMS, so we won't really be able to predict the list of pages, or create a sitemap in advance or something like that.
This said, I know I can easily change the title and description of the HTML document dynamically when visiting a new page, but will that be enough for the site to show up on Google? My client hasn't requested that we do proper SEO; they just want to know that specific pages will show up on Google if searched for. So if the title of the www.example.com/content/contentIDNumber page is "chihuahua specialties", they just want to know that searching for "example.com chihuahua specialties" will find it on Google.
Sorry if I didn't explain myself too well; hope someone can help!
Please ask if anything isn't clear.
If you want the content to be indexed by search engines then you need to have a real URL for each page, and for the page to load the important content even if JavaScript is not available.
This will involve replicating your Backbone logic server-side.
Then, when you update the view with JS, use the history API to change the URL to one that will generate the same view server-side.
Note that the Google #! approach is a hack that predates the history API.
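A sketch of the client half in Backbone (the route and view code are illustrative; the server must be able to render the same /content/:id URLs):

    // Real URLs instead of #fragments: the same /content/:id route must also
    // be renderable server-side for crawlers and no-JS visitors.
    var GalleryRouter = Backbone.Router.extend({
      routes: {
        'content/:id': 'showContent' // matches www.example.com/content/123
      },
      showContent: function (id) {
        // render the item view client-side (model fetch omitted for brevity)
      }
    });

    new GalleryRouter();
    Backbone.history.start({ pushState: true }); // history API: real URLs, no hashes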

Twitter Cards using Backbone's HTML5 History

I'm working on a web app which uses Backbone's HTML5 History option. To avoid having to code everything on both the client and the server, I'm using this method to route every request to index.html.
I was wondering if there is a way to get Twitter Cards to work with this setup, as currently Twitter can't read the page because everything is loaded in dynamically with JavaScript.
I was thinking about using user agents to detect whether the visitor is Twitterbot and, if it is, serving a static version of the page with the required meta tags. Would this work?
Thanks.
Yes.
At one job we did this for all the SEO/search/Facebook stuff, etc.
We would sniff the user agent, and if it was one of the following sniffers:
Facebook Open Graph
Google
Bing
Twitter
Yandex
(a few others I can't remember)
we would redirect to a special page that was written to dump all the relevant data about the page for SEO purposes into a nicely formatted (but completely unstyled) page.
This allowed us to retain our Google index position and proper Facebook sharing even though our site was a total single-page app in Backbone.
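A sketch of that sniff step (Express assumed; the scraper pattern and the metadata lookup are illustrative, not a complete list):

    // Hypothetical middleware: known scrapers get a bare, crawlable page that
    // carries the Twitter Card tags; everyone else falls through to index.html.
    const express = require('express');
    const db = require('./db'); // hypothetical page-metadata module
    const app = express();

    const SCRAPER_RE = /twitterbot|facebookexternalhit|googlebot|bingbot|yandex/i;

    app.use(async (req, res, next) => {
      if (!SCRAPER_RE.test(req.get('User-Agent') || '')) return next();
      const page = await db.pageMeta(req.path); // hypothetical lookup
      const html = [
        '<!DOCTYPE html><html><head>',
        `<title>${page.title}</title>`,
        '<meta name="twitter:card" content="summary">',
        `<meta name="twitter:title" content="${page.title}">`,
        `<meta name="twitter:description" content="${page.description}">`,
        `</head><body><h1>${page.title}</h1><p>${page.description}</p></body></html>`,
      ].join('\n');
      res.send(html); // unstyled, but everything the scraper needs
    });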
Yes, serving a specific page to Twitterbot with the right metadata markup will work.
You can test your results while developing using the Cards preview tool:
https://dev.twitter.com/docs/cards/preview (with your static URL or just the tags).

hashbang vs hijax

Can anyone give me a good reason not to use the hijax (progressive enhancement) method in addition to the hashbang method Google proposes? As far as I can see, the hijax method is still the better one:
it works for no-JavaScript browsers
all search engines can index it
The only counter-argument I've found so far: when users click a link in a search engine and have JavaScript enabled, you'll need to redirect them to the JavaScript-enabled version (with the # tag).
With Google's hashbang version it's difficult to supply a no-JavaScript version, and Bing and Yahoo can't crawl your website.
Kind regards,
Daan
The "value allocation" answer isn't quite correct.
The question is about surfacing content for search engines, and hashbang is Google's answer for that. That said, a user (or another search engine or social-network scraper that doesn't support hashbang) who doesn't have JS enabled will never see your content. Google can see it because they're the ones checking for hashbang.
Hijax, on the other hand, always allows non-JS users/bots to see your content, because it does not rely on the hash/hashbang. Hijax relies on standard query-string parameters. This means your application must have back-end logic to render your content for non-JS user agents. In the end, with hijax, JS-enabled users get the asynchronous experience and non-JS users get full page loads.
Google continues to recommend Hijax. Hashbang is their offering for non-hijax apps already out there in the wild, and/or JS apps that don't have a back-end.
http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
(see progressive enhancement section)
I think this is not an issue any more, since Bing (and that means Yahoo as well) has started crawling AJAX pages using Google's hashbang proposal!
See this lens about AJAX crawling in Bing.
The reason is value allocation.
Hijax
OK, let's say a user links to http://www.example.com/stuff#fluff
The link actually counts as a link to http://www.example.com/stuff#fluff, but as
http://www.example.com/stuff#fluff and http://www.example.com/stuff serve the same HTML content, Google will canonicalize (consolidate) the value allocation to http://www.example.com/stuff
The URL www.example.com/stuff/fluff that you communicated to non-JavaScript clients (Googlebot) does not come up in this whole process.
Conclusion: a link to http://www.example.com/stuff#fluff is seen by Google as a vote for http://www.example.com/stuff
Hashbang
A user links to http://www.example.com/stuff#!fluff
Googlebot interprets it as www.example.com/stuff?_escaped_fragment_=fluff
And as that URL offers different content (i.e., different from www.example.com/stuff), Google will not canonicalize (consolidate) it with any other URL.
Google will display http://www.example.com/stuff#!fluff to its users.
Conclusion: a link to http://www.example.com/stuff#!fluff is seen by Google as a vote for www.example.com/stuff?_escaped_fragment_=fluff (but displayed to its users as http://www.example.com/stuff#!fluff)
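The server side of that scheme, as a sketch (an Express app and the content lookup are assumed, as in the sketches above): Googlebot rewrites /stuff#!fluff into /stuff?_escaped_fragment_=fluff, and that URL must answer with rendered content.

    // Hypothetical handler for Google's AJAX-crawling scheme:
    // /stuff#!fluff reaches the server as /stuff?_escaped_fragment_=fluff.
    app.get('/stuff', async (req, res) => {
      const fragment = req.query._escaped_fragment_;
      if (fragment !== undefined) {
        // Crawler: return a full HTML snapshot of the #! state.
        const content = await db.contentFor(fragment); // hypothetical lookup
        return res.render('snapshot', { content });
      }
      res.render('app'); // humans get the JS app, which reads location.hash
    });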
Use dual links (AJAX links and normal links); they are compatible with Bing, Yahoo and others.
Take a look at Single Page Interface and Search Engine Optimization.
Have a look at this example http://www.amitpatil.me/create-gmail-like-app-using-html5-history-api-and-hashbang/
