SEO and the use of #! in a URL - javascript

I read somewhere about how you can create a website that loads each section of a page with AJAX while still providing SEO. It had something to do with the use of #! in a URL, similar to what Twitter does. I can't seem to find anything about it anywhere. Does anyone know what I'm talking about?

Is this what you are looking for?
Quoting:
Slightly modify the URL fragments for stateful AJAX pages
Stateful AJAX pages display the same content whenever accessed directly. These are pages that could be referred to in search results. Instead of a URL like http://example.com/page?query#state we would like to propose adding a token to make it possible to recognize these URLs: http://example.com/page?query#[FRAGMENTTOKEN]state. Based on a review of current URLs on the web, we propose using "!" (an exclamation point) as the token for this. The proposed URL that could be shown in search results would then be: http://example.com/page?query#!state.
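
For completeness: under this AJAX crawling scheme, the crawler rewrites the #! fragment into an _escaped_fragment_ query parameter before requesting the page from your server. A rough sketch of that rewrite (toCrawlerUrl is a made-up helper name, not part of any library):

    // Sketch of how a crawler rewrites a hashbang URL before fetching it.
    // toCrawlerUrl is a made-up helper name, not part of any library.
    function toCrawlerUrl(url) {
      var parts = url.split('#!');
      if (parts.length < 2) return url;             // no hashbang, nothing to do
      var base = parts[0];
      var state = encodeURIComponent(parts[1]);     // fragment must be URL-encoded
      var sep = base.indexOf('?') === -1 ? '?' : '&';
      return base + sep + '_escaped_fragment_=' + state;
    }

    // "http://example.com/page?query#!state" is requested by the crawler as
    // "http://example.com/page?query&_escaped_fragment_=state"
    console.log(toCrawlerUrl('http://example.com/page?query#!state'));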

#! is called a "hashbang", and they are the root of all that is evil in web development.
Basically, weak web developers decided to use #anchor names as a kludgy hack to get "web 2.0" things to work on their pages, then complained to Google that their page rank suffered. Google made a workaround for their kludge by enabling the hashbang.
Weak web developers took this workaround as gospel. Don't use it. It is a crutch.
Web development that depends on hashbangs is web development done wrong.
This article is far better worded than I could ever manage, and deals with the Gawker Media fiasco that followed their migration to a (failed) hashbang-centric website. It tells you WHAT is happening and why it's bad.
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs

Take a look at this article:
http://eliperelman.com/blog/2011/10/06/handling-googles-ajax-crawling-hashbang-number-navigation-in-asp-dot-net-mvc-3/
It explains how to implement hashbang navigation so that Google can index your site.
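That article targets ASP.NET MVC 3, but the same idea applies elsewhere. As a rough illustration only (not the article's code), here is what the check might look like in a Node/Express server; renderSnapshotFor is a made-up placeholder for however you produce static HTML:

    // Rough Node/Express illustration (not the article's ASP.NET code):
    // serve a pre-rendered HTML snapshot when the crawler asks for the
    // _escaped_fragment_ version of a page.
    var express = require('express');
    var app = express();

    // Placeholder: produce full static HTML for the given fragment state.
    function renderSnapshotFor(state) {
      return '<!DOCTYPE html><html><body><h1>' + state + '</h1></body></html>';
    }

    app.use(function (req, res, next) {
      var fragment = req.query._escaped_fragment_;
      if (fragment === undefined) return next();    // normal visitors get the SPA
      res.send(renderSnapshotFor(fragment));        // crawlers get static markup
    });

    app.listen(3000);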

Check out "Modify Address Bar URL in AJAX App to Match Current State". This can also be done from Flash and Ajax with SWFAddress (http://www.asual.com/swfaddress).
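The underlying idea is to keep the application state in the URL fragment and re-apply it on load and on hash changes. A minimal plain-JavaScript sketch, where loadSection is a placeholder for your own rendering code:

    // Keep the application state in the URL fragment so it survives reloads
    // and bookmarking. loadSection is a placeholder for your own rendering code.
    function loadSection(name) {
      console.log('loading section:', name);
    }

    function applyHash() {
      var section = window.location.hash.replace(/^#!?/, '') || 'home';
      loadSection(section);
    }

    window.addEventListener('hashchange', applyHash);  // back/forward buttons
    window.addEventListener('load', applyHash);        // direct visits and bookmarks

    // Navigating within the app is then just: location.hash = '#!about';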

I believe you are looking for this forum question here: http://www.google.com/support/forum/p/Webmasters/thread?tid=55f82c8e722ecbf2&hl=en

Related

How do you use React.js for SEO?

Articles on React.js like to point out that React.js is great for SEO purposes. Unfortunately, I've never read how you actually do it.
Do you simply implement _escaped_fragment_ as in https://developers.google.com/webmasters/ajax-crawling/docs/getting-started and let React render the page on the server when the URL contains _escaped_fragment_, or is there more to it?
Not having to rely on _escaped_fragment_ would be great, as probably not all crawlers (e.g. the ones behind sharing features) implement it.
I'm pretty sure anything you've seen promoting React as being good for SEO has to do with being able to render the requested page on the server, before sending it to the client. So it will be indexed just like any other static page, as far as search engines are concerned.
Server rendering is made possible via ReactDOMServer.renderToString. The visitor will receive the already-rendered markup, which the React application will detect once it has downloaded and run. Instead of replacing the content when ReactDOM.render is called, it will just add the event bindings. For the rest of the visit, the React application takes over and further pages are rendered on the client.
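As a rough sketch of that flow (assuming Express; the App component and bundle.js are placeholders for your own code, not part of React itself):

    // A minimal server-rendering sketch (assumes Express; the App component
    // and bundle.js are placeholders, not part of React itself).
    var express = require('express');
    var React = require('react');
    var ReactDOMServer = require('react-dom/server');

    // Tiny placeholder component; replace with your real application.
    function App(props) {
      return React.createElement('h1', null, 'You asked for ' + props.url);
    }

    var app = express();

    app.get('*', function (req, res) {
      // Crawlers and users alike receive real markup in the initial response.
      var html = ReactDOMServer.renderToString(React.createElement(App, { url: req.url }));
      res.send(
        '<!DOCTYPE html><html><body>' +
        '<div id="root">' + html + '</div>' +
        '<script src="/bundle.js"></script>' +  // client bundle takes over from here
        '</body></html>'
      );
    });

    app.listen(3000);

On the client, calling ReactDOM.render into the same #root element then detects the server-generated markup and only attaches the event bindings, as described above.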
If you are interested in learning more about this, I suggest searching for "Universal JavaScript" or "Universal React" (formerly known as "isomorphic react"), as this is becoming the term for JavaScript applications that use a single code base to render on both the server and client.
As the other responder said, what you are looking for is an isomorphic approach. This allows the page to come from the server with rendered content that search engines can parse. As another commenter mentioned, this might make it seem like you are stuck using node.js as your server-side language. While it is true that having JavaScript run on the server is needed to make this work, you do not have to do everything in Node. For example, this article discusses how to achieve an isomorphic page using Scala and React:
Isomorphic Web Design with React and Scala
That article also outlines the UX and SEO benefits of this sort of isomorphic approach.
Two nice example implementations:
https://github.com/erikras/react-redux-universal-hot-example: Uses Redux, my favorite app state management framework
https://github.com/webpack/react-starter: Uses Flux, and has a very elaborate webpack setup.
Try visiting https://react-redux.herokuapp.com/ with javascript turned on and off, and watch the network in the browser dev tools to see the difference…
Going to have to disagree with a lot of the answers here since I managed to get my client-side React App working with googlebot with absolutely no SSR.
Have a look at the SO answer here. I only managed to get it working recently but I can confirm that there are no problems so far and googlebot can actually perform the API calls and index the returned content.
It is also possible via ReactDOMServer.renderToStaticMarkup:
Similar to renderToString, except this doesn't create extra DOM attributes such as data-react-id, that React uses internally. This is useful if you want to use React as a simple static page generator, as stripping away the extra attributes can save lots of bytes.
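For example, a minimal sketch of using it as a static page generator (the Page component is only for illustration):

    // build.js: write a plain static HTML file with no React-specific attributes.
    var fs = require('fs');
    var React = require('react');
    var ReactDOMServer = require('react-dom/server');

    // Illustrative component; swap in your own page.
    function Page() {
      return React.createElement('h1', null, 'Hello, crawlers');
    }

    var markup = ReactDOMServer.renderToStaticMarkup(React.createElement(Page));
    fs.writeFileSync('index.html', '<!DOCTYPE html>' + markup);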
There is nothing you need to do if you only care about your site's rank on Google, because Google's crawler can handle JavaScript very well! You can check how your site is indexed by searching site:your-site-url.
If you also care about your rank on search engines such as Baidu, and your server side is implemented in PHP, you may need this: react-php-v8js.

Will Javascript URLs hurt SEO?

I'm making a word cloud with the jQuery plugin jQCloud, in which each word in the cloud is associated with a URL. I want each of those URLs crawled and indexed by Google/Bing.
jQCloud takes a hash specifying the word, rank, and URL. So if the bots read the JavaScript they will read the URL, but there will be no HREF without the JavaScript being rendered.
Based on Google's SEO documentation, I presume the bots won't index those URLs. Is this right? If so, what would be the most SEO-friendly approach to this wordcloud?
In short, yes. Search bots are not going to bother parsing your JS because you could not be bothered to provide static, accessible content.
Don't use "javascript URLs"; they are an anti-accessibility feature. Some reading:
Broken Links
Hash, Bang, Wallop.
Breaking the Web with hash-bangs
Going Postel
That's why browsers implement the HTML5 pushState API, which lets you keep the original URL style while still loading content via Ajax, and keeps the browser's navigation buttons (back/forward) working.
Take a look at the History.js project, a wrapper that helps you use the API.
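If you only need to support browsers with native pushState, a bare-bones sketch looks like this (loadContent is a placeholder for your own Ajax rendering code):

    // Navigate without a full reload while keeping a clean, crawlable URL.
    function loadContent(url) {
      // Placeholder: fetch and render the content for `url` here.
      console.log('loading', url);
    }

    function navigate(url) {
      history.pushState({ url: url }, '', url);  // update the address bar
      loadContent(url);                          // render the new content
    }

    // Keep the back/forward buttons working.
    window.addEventListener('popstate', function (event) {
      loadContent(event.state ? event.state.url : location.pathname);
    });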
One possibility is to have your cloud degrade gracefully. For example, you could have a static list of your links created server-side with the page; if JavaScript is enabled, you could replace this list with your prettier cloud.
This has a benefit apart from being more transparent to search engines: people with JavaScript off will be able to see your links and it will improve accessibility.
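A rough sketch of that approach, assuming jQCloud's documented {text, weight, link} word format (the plugin method is jQCloud in the 1.x versions and jqcloud in 2.x, so adjust for your version):

    // Server-rendered markup that is always present and crawlable:
    //   <ul id="word-list">
    //     <li><a href="/tags/javascript" data-weight="10">javascript</a></li>
    //     <li><a href="/tags/seo" data-weight="6">seo</a></li>
    //   </ul>
    // If JavaScript runs, build the cloud from those links and hide the list.
    // Assumes jQuery and the jQCloud plugin are already loaded on the page.
    $(function () {
      var words = $('#word-list a').map(function () {
        return {
          text: $(this).text(),
          weight: Number($(this).data('weight')),
          link: $(this).attr('href')
        };
      }).get();

      $('#word-list').hide();
      $('<div id="word-cloud">')
        .css({ width: 500, height: 300 })   // jQCloud needs explicit dimensions
        .insertAfter('#word-list')
        .jqcloud(words);                    // jQCloud() with a capital C in 1.x
    });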

Versioning and routing for single-page-app?

Background
I'm working on an educational JavaScript application/site (SPA) that will eventually have thousands of dynamic URLs that I'd like to make crawlable.
I'm now researching how to implement versioning, routing and SEO (and i18n).
My general idea is to use hashbangs and have resources like:
example.com/#!/v1?page=story1&country=denmark&year=1950
The "page" parameter here decides which controllers/views that need to be loaded and the subsequent parameters could instruct controllers to load corresponding content.
Versioning of parameters could then be handled by just replacing the "v1" part of the url - and have a specific route handler mapping deprecated parameters for each version.
SEO would be improved by having node.js or other backend delivering an "escaped fragment" version of the content.
i18n should probably be handled by node.js as well? This way, what gets delivered to the crawler is already translated?
Is this a viable approach to making a Single-page-application versioned and crawlable?
I'm using Backbone.js now, what would you add to the mix to help out with the above?
1) Hell no. (well it could work, but designing your application from the ground up with hashbangs is a bad idea)
2) node.js and backbone are a good combination. Personally I like express for routing/templating on the server.
The argument against hashbangs: there is so much good information on the web that I will defer to it.
here: http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
here: http://www.pixelflips.com/blog/do-clean-urls-still-matter/
and this wonderful library: https://github.com/browserstate/History.js/
and this wiki page from that library: https://github.com/browserstate/history.js/wiki/Intelligent-State-Handling
Then check out this Chrome extension that will ajaxify Stack Overflow (or any other site with normal URLs) using that library: https://chrome.google.com/webstore/detail/oikegcanmmpmcmbkdopcfdlbiepmcebg
Are 15 parameters absolutely necessary? Put the content parameters (page, country) in the URL and the presentational ones (e.g. sortby=author) in the query string.
In response to "You are still stuck with hash tag serialization" I give this:
Every route should point to a valid resource location, e.g. /v1/page/denmark/some-slug-for-post should be a resource location, and when you change it to a new post/page, that should be a resource location too. What I'm saying is that if you can't use the URL to bookmark the page, then the implementation is broken.
Also, are you planning on breaking all your links with every version? I'm not sure why you're including the version in the url.
I hope this helps.
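Since the question mentions Backbone.js: a minimal sketch of hashbang-free routing with pushState, along the lines suggested above (the route and handler are placeholders):

    // Hashbang-free Backbone routing: content parameters live in the path and
    // pushState keeps the address bar clean. Assumes Backbone (with jQuery and
    // Underscore) is already loaded; the route and handler are placeholders.
    var AppRouter = Backbone.Router.extend({
      routes: {
        'v1/:page/:country/:year': 'showStory'
      },
      showStory: function (page, country, year) {
        // e.g. /v1/story1/denmark/1950
        // Load the matching controller/view and render its content here.
      }
    });

    new AppRouter();
    Backbone.history.start({ pushState: true });
    // Browsers without pushState fall back to #-based URLs automatically,
    // and your server must serve the app shell for every route above.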
In answer to number 1, the requirement is that all "pages" have a unique URL and can be found and viewed without JavaScript.
You MUST provide a sitemap (which you can reference from robots.txt) or some other listing of all your unique URLs so the crawlers can find them.
I'm not sure exactly what you mean by SEO in this context. It seems like you are suggesting you will give different content to the crawlers than to the browsers. That's typically not a great idea unless your site is so dynamic there is no other way.

How does disqus work?

Does anyone know how disqus works?
It manages comments on a blog, but the comments are all held on third-party site. Seems like a neat use of cross-site communication.
The general pattern used is JSONP.
It's actually implemented in a fairly sophisticated way (at least on the jQuery site)... they defer the loading of the disqus.js and thread.js files until the user scrolls to the comment section.
The thread.js file contains JSON content for the comments, which is rendered into the page after it's loaded.
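A rough sketch of that general pattern (deferred script loading plus a JSONP-style callback); the URL and function names are illustrative, not Disqus's actual endpoints:

    // Deferred, asynchronous embed: only load the third-party script once the
    // visitor scrolls near the comment section. The URL and function names are
    // illustrative, not Disqus's actual endpoints.
    function loadComments() {
      var s = document.createElement('script');
      s.async = true;
      s.src = 'https://comments.example.com/embed.js?callback=renderComments';
      document.body.appendChild(s);
    }

    // JSONP-style callback: the third-party response is a script that calls this
    // global function with the comment data, sidestepping the same-origin policy.
    window.renderComments = function (comments) {
      var target = document.getElementById('comments');
      comments.forEach(function (c) {
        var p = document.createElement('p');
        p.textContent = c.author + ': ' + c.text;
        target.appendChild(p);
      });
    };

    // Trigger the load the first time the comment section scrolls into view.
    var commentsLoaded = false;
    window.addEventListener('scroll', function () {
      var el = document.getElementById('comments');
      if (!commentsLoaded && el && el.getBoundingClientRect().top < window.innerHeight) {
        commentsLoaded = true;
        loadComments();
      }
    });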
You have three options when adding Disqus commenting to a site:
Use one of the many integrated solutions (WordPress, Blogger, Tumblr, etc. are supported)
Use the universal JavaScript code
Write your own code to communicate with the Disqus API
The main advantage of the integrated solutions is that they're easy to set up. In the case of WordPress, for example, it's as easy as activating a plug-in.
Having the ability to communicate with the API directly is very useful, and offers two advantages over the other options. First, it gives you as the developer complete control over the markup. Secondly, you're able to process comments server-side, which may be preferable.
It looks like they use the easyXDM library, which picks the best mechanism available in the current browser to communicate with the other site.
Quoting Anton Kovalyov's (former engineer at Disqus) answer to the same question on a different site that was really helpful to me:
Disqus is a third-party JavaScript application that runs in your browser and injects itself on publishers' websites. These publishers need to install a small snippet of JavaScript code that makes the first request to our servers and loads initial JavaScript loader. This loader then creates all necessary iframe elements, gets the data from our servers, renders templates and injects the result into some element on the page.
As you can probably guess there are quite a few different technologies supporting what seems like a simple operation. On the back-end you have to run and scale a gigantic web application that serves millions of requests (mostly read). We use Python, Django, PostgreSQL and Redis (for our realtime service).
On the front-end you have to minimize your payload, make sure your app is super fast and that it doesn't break in extremely hostile environments (you will be surprised how screwed up publisher websites can be). Cross-domain communication—ability to send messages from hosting website to your servers—can be tricky as well.
Unfortunately, it is impossible to explain how everything works in a comment on Quora, or even in an article. So if you're interested in the back-end side of Disqus just learn how to write, run and operate highly-scalable websites and you'll be golden. And if you're interested in the front-end side, Ben Vinegar and myself (both front-end engineers at Disqus) wrote a book on the topic called Third-party JavaScript (http://thirdpartyjs.com/).
I'm planning to read the book he mentioned, I guess it will be quite helpful.
Here's also a link to the official answer to this question on the Disqus site.
Short answer? AJAX. You get your own URL, e.g. "site.com/?comments=ID", included via JavaScript... but with real-time updates like that you would need a polling server.
I think they keep the content on their site and your site only sends and receives the data to/from Disqus. Now I wonder what happens if you decide that you want to bring your commenting in-house without losing all existing comments! How easily could you get to your data, I wonder? They claim that the data belongs to you, but they have control over it, and there is not much explanation on their site about this.
I often leave comments on the Disqus platform. Sometimes a comment seems to be removed once you refresh, and sometimes it isn't. I think the ones that disappear are held for moderation without it saying so.

URL Redirects for SEO (in Flash)?

I am creating a Flash site and am trying to make it SEO-friendly. I'm thinking a possible solution would be to render HTML to any search engine bot, or to anyone who needs accessibility, and render the Flash site for the rest of the users.
First question is, is this acceptable for google, and SEO in general?
This would mean redirecting Flash users from site.com/home.html to site.com/#/home, but only if they aren't a bot of some sort.
Second question is, is it possible to do this in javascript or rails?
I would do this by capturing the URL and checking who the user is (is it Google, or is it a human?). I'm just not sure how to do this with JavaScript/Rails, whatever is needed. Then once I found "hey, this is Google", I would return the HTML page; if it was a user, I'd return the Flash.
Would that work? Is there something better?
It'd be worth reading up on Google's policies toward cloaking, sneaky Javascript redirects, and doorway pages.
Personally, I'd build the site in HTML and use the Flash for progressive enhancement where appropriate.
It's not doable in JavaScript, because JavaScript is executed after the page is sent, so the damage is already done.
Your web server would have to recognize the Google user agent when the page request is made and serve a different page accordingly. Then you can avoid the whole redirect nonsense entirely. I know you can configure most web servers to do that, but I do not know the required steps, and it depends on which web server you are using.
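As an illustration only, here is roughly what that server-side check could look like in Node/Express (not Rails); note the warnings elsewhere in this thread that serving bots different content can be treated as cloaking:

    // Illustration only (Node/Express, not Rails): branch on the User-Agent
    // header at request time. Other answers here warn that serving bots
    // different content can be treated as cloaking by Google.
    var express = require('express');
    var app = express();

    app.get('/home.html', function (req, res) {
      var ua = req.get('User-Agent') || '';
      if (/googlebot|bingbot/i.test(ua)) {
        res.sendFile(__dirname + '/static/home.html');  // plain HTML for crawlers
      } else {
        res.redirect('/#/home');                        // Flash version for people
      }
    });

    app.listen(3000);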
I'm not going to comment on the merits/demerits of flash based websites.
This is a form of SEO called cloaking that's widely considered unscrupulous (though your intended use doesn't sound malicious to me). It can get you banned by Google.
Have you looked in to using SWFAddress?
The Flash framework Gaia uses separate XHTML pages for its SEO solution. From its site:
"The Search Engine Optimization Scaffolding engine in Gaia creates an XHTML file for every page you specify in the site.xml, as well as a sitemap.xml file that follows sitemaps.org protocols.
The purpose of SEO Scaffolding is to provide search engines and non-Flash users with easy access to the content on your site, as well a convenient single data source for the copy on your site, organized by page.
This technique is white hat compliant, and is discussed on the Gaia forums."
More information here: http://www.gaiaflashframework.com/wiki/index.php?title=SEO
