Versioning and routing for single-page-app? - javascript

Background
I'm working on an educational JavaScript application/site (SPA) that will eventually have thousands of dynamic URLs that I'd like to make crawlable.
I'm now researching how to implement versioning, routing and SEO (and i18n).
My general idea is to use hashbangs and have resources like:
example.com/#!/v1?page=story1&country=denmark&year=1950
The "page" parameter here decides which controllers/views that need to be loaded and the subsequent parameters could instruct controllers to load corresponding content.
Versioning of parameters could then be handled by just replacing the "v1" part of the url - and have a specific route handler mapping deprecated parameters for each version.
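Roughly, I picture the route handler working something like the sketch below; all of the names (paramAliases, the renamed "land" parameter, the "page:" event) are placeholders, not anything that exists yet:

```javascript
// Sketch only: match fragments like "!/v1?page=story1&country=denmark&year=1950"
// and map deprecated parameter names per version before dispatching.
var AppRouter = Backbone.Router.extend({
  initialize: function () {
    this.route(/^!\/(v\d+)\?(.*)$/, 'dispatch');
  },

  // Hypothetical per-version table of deprecated -> current parameter names.
  paramAliases: {
    v1: { land: 'country' } // e.g. v1 used "land", which later became "country"
  },

  dispatch: function (version, query) {
    var params = parseQuery(query);
    var aliases = this.paramAliases[version] || {};
    for (var oldName in aliases) {
      if (params.hasOwnProperty(oldName)) {
        params[aliases[oldName]] = params[oldName];
        delete params[oldName];
      }
    }
    // "page" decides which controller/view gets loaded; the rest selects content.
    this.trigger('page:' + params.page, params);
  }
});

function parseQuery(query) {
  var params = {};
  (query || '').split('&').forEach(function (pair) {
    var parts = pair.split('=');
    if (parts[0]) params[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1] || '');
  });
  return params;
}

var router = new AppRouter();
Backbone.history.start(); // hash-based routing; "#!/v1?page=..." hits dispatch()
```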
SEO would be improved by having node.js or another backend deliver an "escaped fragment" version of the content.
i18n should probably be handled by node.js as well? This way, what gets delivered to the crawler is already translated?
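On the server, I imagine something along these lines (Express on node.js; renderSnapshot() is just a placeholder for whatever pre-rendering/translation mechanism I end up with):

```javascript
// Sketch only: crawlers that support the #! scheme request ?_escaped_fragment_=...,
// and get back pre-rendered, already translated HTML. Everything else falls through
// to the static SPA shell.
var express = require('express');
var app = express();

app.get('/', function (req, res, next) {
  var fragment = req.query._escaped_fragment_;
  if (fragment === undefined) return next(); // normal browsers get the SPA

  // fragment is the part after "#!", e.g. "/v1?page=story1&country=denmark&year=1950"
  var lang = req.acceptsLanguages('en', 'da') || 'en'; // i18n decided on the server
  renderSnapshot(fragment, lang, function (err, html) {
    if (err) return next(err);
    res.send(html);
  });
});

app.use(express.static(__dirname + '/public'));
app.listen(3000);

// Placeholder: a real implementation would run the same templates and translation
// tables the client uses, so crawler and browser see equivalent content.
function renderSnapshot(fragment, lang, callback) {
  callback(null, '<!doctype html><title>snapshot</title><p>' + lang + ': ' + fragment + '</p>');
}
```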
Is this a viable approach to making a Single-page-application versioned and crawlable?
I'm using Backbone.js now, what would you add to the mix to help out with the above?

1) Hell no. (well it could work, but designing your application from the ground up with hashbangs is a bad idea)
2) node.js and backbone are a good combination. Personally I like express for routing/templating on the server.
The argument against hashbangs: there is so much good information on the web that I will defer to it.
here: http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
here: http://www.pixelflips.com/blog/do-clean-urls-still-matter/
and this wonderful library: https://github.com/browserstate/History.js/
and this wiki page from that library: https://github.com/browserstate/history.js/wiki/Intelligent-State-Handling
Then check out this Chrome extension that will ajaxify StackOverflow (or any other site with normal URLs) using that library: https://chrome.google.com/webstore/detail/oikegcanmmpmcmbkdopcfdlbiepmcebg
Are 15 parameters absolutely necessary? Put the content parameters (page, country) in the URL and the presentational ones (e.g. sortby=author) in the query string.
In response to "You are still stuck with hash tag serialization", I give this:
Every route should point to a valid resource location, i.e. /v1/page/denmark/some-slug-for-post should be a resource location, and when you change it to a new post/page, that should be a resource location too. What I'm saying is that if you can't use the URL to bookmark the page, then the implementation is broken.
Also, are you planning on breaking all your links with every version? I'm not sure why you're including the version in the URL.
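Something like this sketch (routes and handler names made up) is what I mean by real resource locations, using Backbone's pushState support instead of hashbangs:

```javascript
// Sketch only: every page gets a real, bookmarkable URL. The server just has to
// return the app shell for these paths (and ideally real content for crawlers).
var Router = Backbone.Router.extend({
  routes: {
    'page/:country/:slug': 'showPage' // e.g. /page/denmark/some-slug-for-post
  },
  showPage: function (country, slug) {
    // load the matching view/content here
  }
});

var router = new Router();
Backbone.history.start({ pushState: true }); // degrades to hash URLs in old browsers

// Navigating keeps the address bar pointing at a valid resource:
// router.navigate('page/denmark/some-slug-for-post', { trigger: true });
```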
I hope this helps.

In answer to number 1, the requirement is that all "pages" have a unique URL and can be found and viewed without JavaScript.
You must provide a sitemap (and reference it from robots.txt) so the crawlers can find all of your unique URLs.
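For example, a sitemap can be generated from the same data that drives the app; a rough node.js sketch (the URLs are made up), with robots.txt then just pointing at it via a Sitemap: line:

```javascript
// Sketch only: write sitemap.xml from the list of crawlable page URLs.
// In practice the urls array would come from whatever stores your content.
var fs = require('fs');

var urls = [
  'http://example.com/v1/story1/denmark/1950',
  'http://example.com/v1/story2/sweden/1972'
];

var xml = '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
  urls.map(function (url) {
    return '  <url><loc>' + url + '</loc></url>';
  }).join('\n') +
  '\n</urlset>\n';

fs.writeFileSync('sitemap.xml', xml);
// robots.txt then only needs: Sitemap: http://example.com/sitemap.xml
```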
I'm not sure exactly what you mean by SEO in this context. It seems like you are suggesting you will give different content to the crawlers than to the browsers. That's typically not a great idea unless your site is so dynamic there is no other way.

Related

Client-side Website Localization Using URL Path

I'm working on localizing a website that I recently built - https://xmllint.com
The project is rather small, and I mostly use it to teach myself javascript along with Webpack and other web-related technologies/frameworks.
The website is 100% browser-based and does not have a lot of content. For that reason, I decided to go with this approach to translate the content itself.
The replacement of the placeholders with the 'real' content happens via JavaScript at the bottom of the HTML. Ultimately I want to have the content ready before the page renders, just so that search engines can index the new pages nicely.
What I want to achieve is that the page itself detects the language code (e.g., https://xmllint.com/es/ for Spanish) from the URL and then performs the translation based on that value.
What I'm struggling with is how to handle the language part of the URL in the web page itself, as that directory does not actually exist on the server.
So far, I tried redirecting all HTTP 404 codes to the index.html file itself (on the hosting side), as suggested for SPAs.
This leads to problems loading the resources, as the relative paths now include the language-code part of the URL.
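A common way to avoid that symptom, sketched here with assumed entry and file names, is to have Webpack emit absolute asset URLs via output.publicPath, so the bundle resolves the same way whether the page is served from / or /es/:

```javascript
// webpack.config.js - sketch only; entry point and output names are assumptions.
// With an absolute publicPath, emitted asset URLs start with "/" instead of being
// relative, so the 404 -> index.html rewrite keeps working under /es/ and friends.
const path = require('path');

module.exports = {
  entry: './src/index.js',
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'main.js',
    publicPath: '/' // absolute, not relative
  }
};
```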
Two ideas came to mind.
Improve the current Webpack build so that I only deliver a single file including all assets. That way I would not have problems with relative paths and I should be good. (Is Single page application just one page using for entire web application?)
Should I introduce a routing framework like Vue?
What I'm not asking for is
How to parse the URL itself.
For SEO reasons I also don't want to use URL parameters.
Hacky ideas or workarounds. I have no time pressure and want to know how this is done best.
Any help/ideas are greatly appreciated.
Under the circumstances that you have no time pressure, I'd personally recommend using a JavaScript framework - or more specifically - Vue.js. Since you already mentioned it, I assume you have basic knowledge of it.
I see various ways to benefit from choosing this path:
The actual problem you're facing will no longer be an issue. The application will handle all the routing, so all you have to do is return the index.html and you're good to go (see the sketch after this list)
The developer experience (build process, hot reload, deployment, ...) will dramatically improve your daily work
Your bundle size will very likely shrink
You're prepared for future growth of your application
Best of all: you're challenging yourself by using a technology you probably don't have much experience with. Speaking for myself, that should be reason enough. :-)
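Here is a minimal sketch of what I mean, assuming Vue 2 with vue-router; the route pattern and the loadMessages() helper are illustrative, not an existing API:

```javascript
// Sketch only: the :lang segment of the path (e.g. /es/) decides which
// translations to apply; history mode gives real paths with no hash or query.
import Vue from 'vue';
import VueRouter from 'vue-router';

Vue.use(VueRouter);

const Home = {
  render(h) {
    return h('div', 'Home'); // real app: components rendered with translated strings
  }
};

const router = new VueRouter({
  mode: 'history', // requires the host to return index.html for these paths
  routes: [
    { path: '/:lang(es|de)?', component: Home } // no prefix = default language
  ]
});

router.beforeEach((to, from, next) => {
  const lang = to.params.lang || 'en';
  loadMessages(lang); // placeholder for whatever swaps in the translated content
  next();
});

// Placeholder translation loader.
function loadMessages(lang) {
  console.log('loading translations for', lang);
}

new Vue({ router, render: h => h('router-view') }).$mount('#app');
```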
Happy coding!

SEO and the use of !# in a url

I read somewhere about how you can create a website that loads each section of a page with AJAX while still providing SEO. It had something to do with the use of !# in a url. Similar to what twitter does. I can't seem to find anything about it anywhere. Does anyone know what I'm talking about?
Is this what you are looking for? Quoting from Google's proposal for making AJAX crawlable:
Slightly modify the URL fragments for stateful AJAX pages
Stateful AJAX pages display the same content whenever accessed directly. These are pages that could be referred to in search results. Instead of a URL like http://example.com/page?query#state we would like to propose adding a token to make it possible to recognize these URLs: http://example.com/page?query#[FRAGMENTTOKEN]state. Based on a review of current URLs on the web, we propose using "!" (an exclamation point) as the token for this. The proposed URL that could be shown in search results would then be: http://example.com/page?query#!state.
#! is called a "hashbang" and they are the root of all that is evil in web development.
Basically, weak web developers decided to use #anchor names as a kludgy hack to get "web 2.0" things to work on their page, then complained to Google that their page rank suffered. Google made a workaround for their kludge by enabling the hashbang.
Weak web developers took this workaround as gospel. Don't use it. It is a crutch.
Web development that depends on hashbangs is web-development done wrong.
This article is far better worded than I could ever manage, and deals with the Gawker Media fiasco that resulted from their migration to a (failed) hashbang-centric website. It tells you WHAT is happening and why it's bad.
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
Take a look at this article:
http://eliperelman.com/blog/2011/10/06/handling-googles-ajax-crawling-hashbang-number-navigation-in-asp-dot-net-mvc-3/
It explains the implementation of hashbang navigation, allowing Google to index your site.
Check out Modify Address Bar URL in AJAX App to Match Current State; this can also be done from Flash and Ajax with SWFAddress (http://www.asual.com/swfaddress).
I believe you are looking for this forum question here: http://www.google.com/support/forum/p/Webmasters/thread?tid=55f82c8e722ecbf2&hl=en

Will Javascript URLs hurt SEO?

I'm making a word cloud with the jQuery plugin jQCloud, in which each word in the cloud is associated with a URL. I want each of those URLs crawled and indexed by Google/Bing.
jQCloud takes a hash specifying the word, rank, and URL. So if the bots read the JavaScript they will read the URL, but there will be no HREF without the JavaScript being rendered.
Based on Google's SEO documentation, I presume the bots won't index those URLs. Is this right? If so, what would be the most SEO-friendly approach to this word cloud?
In short, yes. Search bots are not going to bother parsing your JS because you could not be bothered to provide static, accessible content.
Don't use "javascript URLs", they are an anti-accessibility feature. Some reading:
Broken Links
Hash, Bang, Wallop.
Breaking the Web with hash-bangs
Going Postel
That's why browsers now implement the HTML5 pushState API, which keeps the original URL while the page loads content via Ajax, and still supports the browser's navigation buttons (back/forward).
Take a look at the History.js project, a wrapper that helps you use the API.
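For illustration, the bare native API looks roughly like this (History.js wraps it and adds fallbacks for older browsers); loadContent() is a placeholder for the app's own Ajax rendering:

```javascript
// Sketch only: keep a real URL in the address bar while content loads via Ajax.
// Assumes a modern browser (Element.closest, pushState).
document.addEventListener('click', function (event) {
  var link = event.target.closest('a[data-ajax]');
  if (!link) return;
  event.preventDefault();
  loadContent(link.href);                                // fetch and render the new section
  history.pushState({ url: link.href }, '', link.href);  // address bar shows the clean URL
});

// Back/forward fire popstate; re-render whatever state they point at.
window.addEventListener('popstate', function (event) {
  if (event.state) loadContent(event.state.url);
});

// Placeholder for the application's own rendering.
function loadContent(url) {
  console.log('load', url);
}
```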
One possibility is to have your cloud degrade gracefully. For example, you could have a static list of your links created server-side with the page; if JavaScript is enabled, you could replace this list with your prettier cloud.
This has a benefit apart from being more transparent to search engines: people with JavaScript off will be able to see your links and it will improve accessibility.
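A rough sketch of that, assuming jQCloud's word-array format (text/weight/link) and made-up element ids:

```javascript
// Sketch only. The page ships a plain, crawlable list of links, e.g.
//   <ul id="word-list"><li><a href="/tag/seo" data-weight="8">seo</a></li>...</ul>
// and an empty <div id="word-cloud"></div>. With JavaScript on, the list becomes
// the cloud's data and is hidden; without it, the plain links remain.
$(function () {
  var words = $('#word-list a').map(function () {
    return {
      text: $(this).text(),
      weight: Number($(this).data('weight')) || 1,
      link: this.href // jQCloud renders each word as a normal <a href="...">
    };
  }).get();

  $('#word-list').hide();
  $('#word-cloud').jQCloud(words);
});
```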

Pure Javascript and HTML app & deployment via CDN ... good idea?

A big and general question, though NOT a discussion
A friend and I are discussing a web application that is being developed. Currently it uses PHP, but the PHP doesn't store anything and it is all OAuth-based. The whole thing talks to an independent API. The PHP is really just mirroring a lot of the JavaScript logic for browsers without JavaScript support.
If it were decided to enforce JavaScript as a requirement (let's not go into that ... whole other issue):
Are there any technical, fundamental problems with serving the app as HTML+JavaScript hosted on a CDN? That is, 100% static JavaScript and HTML with no server-side logic, since the JavaScript is just as capable of doing all the API calls as the PHP. Are any existing sites doing this?
We can't think of any show-stoppers, but it seems like a scary thought to make a "web" app 100% client-side ... so we're looking for more input.
(To clarify, the question is about deploying using ONLY javascript and HTML and abandoning server-side processing outside the JSON API or whatever)
Thanks in advance!
One issue is with search engines.
Search engine crawlers index the raw HTML source code of a web page. If you use JavaScript to load new data and generate new content, crawlers won't execute it, so that content won't get indexed.
However, Google is offering a solution for this - read here: http://code.google.com/web/ajaxcrawling/
Other than this, I can't think of any other issue...
Amazon has been offering static website hosting on S3 for a little while now: http://aws.typepad.com/aws/2011/02/host-your-static-website-on-amazon-s3.html . Essentially this allows you to specify a default index page and error pages. Beyond that, you just upload your HTML to S3 and point the www CNAME on your domain at the Amazon S3 bucket or CloudFront CDN.
The one thing that is not possible this way: if a user ends up typing example.com instead of www.example.com, you need to ensure that your DNS correctly forwards them to www, and S3 will not be able to handle a naked domain (http://example.com/).
Regarding how good an idea it is, it sounds good to us as well, and we are currently exploring the option. So far it appears to work fine. What we have done is set up beta.example.com to point to a CDN-hosted site (S3) and we are testing to see if it gives us everything we need. Performance is great, though!

URL Redirects for SEO (in Flash)?

I am creating a Flash site and am trying to make it SEO-friendly. I'm thinking a possible solution would be to render HTML to any search engine bot, or to anyone who needs accessibility, and render the Flash site for the rest of the users.
First question is, is this acceptable for google, and SEO in general?
This would mean I would redirect URLs for Flash users from site.com/home.html to site.com/#/home, but only if they weren't a bot of some sort.
Second question is, is it possible to do this in JavaScript or Rails?
I would do this by capturing the URL and checking who the user is (is it Google, or is it a human?); I'm just not sure how to do this with JavaScript/Rails, whichever is needed. Then, once I found "hey, this is Google", I would return the HTML page; if it was a user, I'd return Flash.
Would that work? Is there something better?
It'd be worth reading up on Google's policies toward cloaking, sneaky Javascript redirects, and doorway pages.
Personally, I'd build the site in HTML and use the Flash for progressive enhancement where appropriate.
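For example, with SWFObject the crawlable HTML stays in the page and is only replaced when the Flash plugin is actually available (a sketch; paths, sizes, and ids are made up):

```javascript
// Sketch only: progressive enhancement rather than user-agent sniffing.
// The page contains <div id="home-content">...full HTML version...</div>.
// With Flash 10+ present, the div is swapped for the movie; bots, screen
// readers, and plugin-less visitors keep the HTML.
swfobject.embedSWF(
  '/swf/site.swf', // movie URL
  'home-content',  // id of the element to replace
  '960', '600',    // width, height
  '10.0.0'         // minimum Flash Player version
);
```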
It's not doable in JavaScript, because JavaScript is executed after the page has been sent, so the damage is already done.
Your web server would have to recognize the Google user agent when the page request is made and serve a different page accordingly. Then you can avoid the whole redirect nonsense entirely. I know you can configure most web servers to do that; however, I do not know the required steps, and it depends on which web server you are using.
I'm not going to comment on the merits/demerits of flash based websites.
This is a form of SEO called cloaking that's widely considered unscrupulous (though your intended use doesn't sound malicious to me). It can get you banned by Google.
Have you looked in to using SWFAddress?
The Flash framework Gaia uses separate XHTML pages for its SEO solution. From its site:
"The Search Engine Optimization Scaffolding engine in Gaia creates an XHTML file for every page you specify in the site.xml, as well as a sitemap.xml file that follows sitemaps.org protocols.
The purpose of SEO Scaffolding is to provide search engines and non-Flash users with easy access to the content on your site, as well as a convenient single data source for the copy on your site, organized by page.
This technique is white hat compliant, and is discussed on the Gaia forums."
More information here: http://www.gaiaflashframework.com/wiki/index.php?title=SEO
