How can I make an indexable website that uses a JavaScript router? - javascript

I have been working on a project that uses the Backbone.js router, and all data is loaded by JavaScript via RESTful requests. I know there is no way to detect on the server side whether JavaScript is enabled, but here are the scenarios I have thought of to make this website indexable:
I can append a query string to each link in sitemap.xml and add a <script> tag to detect whether JavaScript is enabled. The server renders the page with indexable data, and when a user visits it I can initialize the Backbone.js router manually. The problem is that I need to execute an SQL query to render the indexable data on the server side, which causes extra load when the visitor is not a bot. Also, when users share a URL of the website somewhere, it won't be the indexable version of the page, so web crawlers may not be able to identify the content of that URL. And the extra query string shown in search results may be annoying for users.
I can detect popular web crawlers such as Google, Yahoo, Bing, and Facebook on the server side from their user agents, but I suspect there will be some web crawlers that I miss.
Which way seems more convenient, or do you have any ideas or experience with making this kind of website indexable?
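The user-agent check described in the question could look roughly like the sketch below. It is written in Python only for illustration; the function name and the token list are assumptions, and the list is deliberately incomplete, which is exactly the weakness the question points out.

```python
import re

# Illustrative, deliberately incomplete list of crawler user-agent tokens
# (Google, Bing, Yahoo, Facebook). Any crawler not listed here is missed.
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|slurp|facebookexternalhit", re.IGNORECASE
)


def looks_like_crawler(user_agent):
    """Return True when the User-Agent header contains a known crawler token."""
    return bool(user_agent and CRAWLER_PATTERN.search(user_agent))
```

A server could call looks_like_crawler() with the request's User-Agent header and run the SQL query for the indexable page only when it returns True.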

As elias94xx suggested in his comment, one solid solution to this dilemma is to take advantage of Google's "AJAX crawling". In short, Google told the web community: "Look, we're not going to actually render your JS code for you, but if you want to render it server-side for us, we'll do our best to make it easy on you." They do that with two basic concepts: pretty URL => ugly URL translation, and HTML snapshots.
1) Google implemented a syntax web developers could use to specify client-side URLs that could still be crawled. This syntax for these "pretty URLs", as Google calls them, is: www.example.com?myquery#!key1=value1&key2=value2.
When you use a URL with that format, Google won't try to crawl that exact URL. Instead, it will crawl the "ugly URL" equivalent: www.example.com?myquery&_escaped_fragment_=key1=value1%26key2=value2. Since that URL has a ? instead of a #, the fragment is now part of the query string and is therefore sent to your server. Your server can then use the "HTML snapshot" technique.
2) The basic idea of that technique is that your web server runs a headless JS runner. When Google requests an "ugly URL" from your server, the server loads your Backbone router code in the headless runner, which generates (and then returns to Google) the same HTML that the code would have generated had it been run client-side.
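The pretty => ugly translation, and the server-side step of recovering the original fragment, can be sketched as follows. This is a minimal Python illustration rather than Google's implementation: the function names are made up, and the escaping via quote() is broader than the specification strictly requires.

```python
from urllib.parse import parse_qs, quote, urlsplit


def to_ugly_url(pretty_url):
    """Translate a '#!' URL into its '_escaped_fragment_' equivalent."""
    base, hashbang, fragment = pretty_url.partition("#!")
    if not hashbang:
        return pretty_url  # no '#!' present, nothing to translate
    joiner = "&" if "?" in base else "?"
    # Keep '=' readable; '&' and other special characters are percent-encoded.
    return f"{base}{joiner}_escaped_fragment_={quote(fragment, safe='=')}"


def requested_fragment(ugly_url):
    """Return the decoded fragment a crawler asked for, or None if absent."""
    query = urlsplit(ugly_url).query
    values = parse_qs(query, keep_blank_values=True).get("_escaped_fragment_")
    return values[0] if values else None
```

A server could call requested_fragment() on each incoming URL and, when it returns a value, hand that fragment to the headless runner to produce the snapshot.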
A full explanation of pretty=>ugly URLs can be found here:
https://developers.google.com/webmasters/ajax-crawling/docs/specification
A full explanation of HTML snapshots can be found here:
https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot
Oh, and while everything so far has been based on Google, Bing/Yahoo also adopted this syntax, as indicated by Squidoo here:
http://www.squidoo.com/ajax-crawling

Related

How does Google's analytics.js authenticate the hostname?

I'm building a JS-only plugin that will be installed on multiple websites, each with its own unique ID, which is passed to a Rails API along with some other data. My API will verify the hostname and ID provided by the JS plugin, but these values can of course be seen by anyone and used to fake impressions or events.
As far as I'm aware, there is no foolproof way of authenticating a website without an invisible, server-side key. That said, how does Google do it?
Analytics requires no server-side implementation, only an ID, which it of course checks against the hostname. Does this not mean that page views and events can be faked by a third party, and if so, why isn't it a prevalent issue?
Thanks in advance

Angular.js and SEO

I'd like to create a site with Angular (I'm new to it), but I also want the different "views" to be cacheable by search engines and to have their own URL routes. How would I achieve this with Angular, or is it best not to use it?
Enable pushState in Angular with $locationProvider.html5Mode(true); so that you have real URLs and make sure that, when the URL is requested by the client, you deliver the complete page for that URL from the server (and not a set of empty templates that you populate with JS).
When a link is followed, you'll go through an Angular view and update the existing DOM (while changing the URL with pushState) but the initial load should be a complete page.
This does mean duplicating effort (you need client and server side versions of the code for building each page). Isomorphic JS is popular for dealing with that issue.
If you want to expose Angular views to search engines and other bots, I suggest using an open source framework that we developed at Say Media. It uses node.js to render the pages on the server when it detects a bot vs a real user. You can find it here:
https://github.com/saymedia/angularjs-server
I would suggest not using different routes, however, as most search engines will penalize you for having duplicate content on multiple URLs. And while you might think they would just hit the bot version of your site, they are getting more sophisticated about crawling sites that behave like single-page apps. I would be cautious about duplicate routes for the same content.
Good Luck!

When does Googlebot execute javascript?

I have a few single-page web apps on multiple domains that rely heavily on JavaScript/AJAX to fetch and show content. Based on logs and search results, I can tell that Googlebot runs JavaScript on some of the domains but not on others. On some it indexes everything that's only available with JS; on others it doesn't seem to run JS at all.
Can anybody tell me how Googlebot decides what JS to run, and whether I can do anything to get it to run JS on my other domains?
PS: I know that normally I should use something like server-side rendering for this, but I'm not at all dependent on search results and rankings, so it's not really worth the effort. I'm just curious how Googlebot decides whether it should run JS or not, and whether there's anything easy I can do to change that on my other domains.
You can learn more about how Google renders AJAX-based websites, along with a list of best practices, directly from the Google developer website here:
https://webmasters.googleblog.com/2014/10/updating-our-technical-webmaster.html
https://developers.google.com/webmasters/ajax-crawling/
Regarding your specific problem, as a first step I suggest you analyse each domain with the "Fetch as Google" feature of Google Webmaster Tools and go through every technical aspect mentioned in Google's guide.
https://support.google.com/webmasters/answer/158587?hl=en
I think Google Updated Research on the Subject
http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157
The functionality to fetch your page as Googlebot and see the results has now moved into Google Search Console.
You can use the URL Inspection Tool to analyze your live URL.
I've tested it on an AngularJS app, and Googlebot was able to crawl page content with data fetched from an AJAX request.
One very important restriction is that the Googlebot does not allow AJAX requests while the page is loaded.
In my blog post I explain how to adapt a single-page application so that it becomes crawlable, without the need to render HTML snapshots on the server.

Using a server to send/receive information between a mobile phone and web page

I am trying to build a simple setup as follows:
I have a mobile app with a page consisting of 4 lines (4 HTML paragraph lines; I am using PhoneGap).
I want to use a web page from which I will input the data for those 4 lines. This information is sent to a server, and that server transfers it to the app on the mobile phone. Those 4 lines on the mobile phone are then filled with the new information.
Similarly, the user inputs information on another page consisting of 10 li (list) lines. This information is again sent to the server and then to the web page, where it is displayed.
I can almost feel the "internet police guys" getting all hyped and ready to vote this question down. But please understand that I have been on this site and various forums desperately looking for a tutorial to guide me through this and have not been able to find one.
I am trying to use AJAX for this setup, but I am confused about how I would use the PHP file. Information such as the username and password needed to connect to the server is going to go in that PHP file. But PHP is a server-side script and thus needs to sit in the public_html folder. How do I use the PHP file from my desktop? Do I write separate JavaScript to access it?
It is the concept that is confusing me. I am familiar with HTML, JS, and PHP.
I would appreciate any guidance, or maybe a link to a tutorial that would help me implement the concept I described. Thanks for listening.
You will need to create an API using PHP. This API is uploaded to your server and is usually designed to be "RESTful". Google a tutorial that fits your needs. You can set all sorts of rules in this API, such as requiring every request to carry an ID or access token.
Since you are using PhoneGap, your HTML and JS files live on the device, so you will need to allow requests to your API from any origin. You will have to speak to your hosting provider about this unless you know how to configure it yourself (some providers restrict this by default as an extra security precaution against cross-site attacks).
Next, you can either use jQuery or write the AJAX calls in plain JavaScript yourself.
The most efficient way for this to work is to send JSON objects to and from the API. You include a "command" in the JSON when sending from your app. On the PHP side, you read this command and use the rest of the data in the JSON object to process the request. Your API then needs to encode a JSON object to return (such as a user's profile information).
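The command-in-JSON pattern described above can be sketched like this. The answer talks about PHP; the sketch uses Python purely to show the dispatch logic, and the command names ("set_lines", "get_lines"), the in-memory STORE, and the response shape are all made up for illustration.

```python
import json

# In-memory stand-in for the server's storage; a real API would use a database.
STORE = {"lines": ["", "", "", ""]}


def set_lines(message):
    """Replace the 4 stored lines with the ones sent by the web page."""
    lines = message.get("lines")
    if not isinstance(lines, list) or len(lines) != 4:
        raise ValueError("expected exactly 4 lines")
    STORE["lines"] = [str(line) for line in lines]
    return {"lines": STORE["lines"]}


def get_lines(message):
    """Return the stored lines so the mobile app can display them."""
    return {"lines": STORE["lines"]}


# Maps the "command" field of the request to the function that handles it.
HANDLERS = {"set_lines": set_lines, "get_lines": get_lines}


def handle_request(body):
    """Decode a JSON request, run the requested command, return a JSON reply."""
    try:
        message = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid JSON"})
    if not isinstance(message, dict):
        return json.dumps({"ok": False, "error": "expected a JSON object"})
    command = message.get("command")
    handler = HANDLERS.get(command) if isinstance(command, str) else None
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown command"})
    try:
        result = handler(message)
    except ValueError as exc:
        return json.dumps({"ok": False, "error": str(exc)})
    return json.dumps({"ok": True, "result": result})
```

The web page would send a "set_lines" request and the app a "get_lines" request; adding a new feature then means adding one more entry to HANDLERS.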
Here is a basic PHP API tutorial to get you going that explains some of the features of a RESTful API: PHP API
Here is a simple AJAX function (you will probably want to make this much more modular): AJAX
As broad as your question is, it seems like the best and easiest thing for you to do is to first create a PHP web page that accesses a SQL database to update the records. In fact, this should serve all of your needs for your mobile users, assuming you don't need push notifications for live data updates.
I am assuming, since you are using PhoneGap, that you are more comfortable with web languages. Once the web page is fully operational, you can start building your app on that exact same SQL database. With mobile app development there are a lot more "what ifs" (what if the phone rings, what if the app is running in the background, what if there is no cellular service, etc.).
It is always easier to start with what you know and build on that, rather than starting with a new development platform and troubleshooting as problems arise.

Can I access an API without authentication in JavaScript?

Circumstances
I am developing a web app with AngularJS.
I have a RESTful API on the server side with GET and POST commands.
I want to use the API within my module (that is, in JavaScript) to display and edit my data.
I want to protect the API with some kind of authentication (basic auth with an API key, for example).
I don't want to protect the API when a user uses the app itself.
Actual question
Okay, I guess the last point is a bit unclear.
I want a user to be able to use the app in their browser without any authentication.
But when a third-party app wants to access the API, it has to authenticate.
Since JavaScript is executed on the client side, I of course can't write a master key into the JS or anything similar.
Is there any kind of pattern or solution for this problem?
More specifications
referring to #EliranMalka and #shaunhusain
On the server side I use Tornado with its built-in template engine. I actually use the template engine only to write the index page and insert CSS and JS dynamically.
The code for authentication would be something like:
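def is_authenticated(request):
    if 'api_key' not in request.arguments:
        return False
    # Tornado stores each argument as a list of byte strings. The key is bound as a
    # parameter instead of being formatted into the SQL (assumes sql() accepts parameters).
    return sql('SELECT id FROM keys WHERE key = %s', request.arguments['api_key'][0].decode()).count == 1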
My AngularJS module is doing something similar to:
$http.get('/api/foo?api_key=1234')
    .then(function (result) {
        $scope.data = result.data;
    });
As you can see, I'm writing my API key into the JS at the moment, but I want to avoid this.
Also, what do you mean exactly by third-party?
A request that is not third-party would be: using the app at http://app.example.com with a browser.
A third-party request would come from an Android app, for example. Something that comes from outside or from a remote client.
A JS request from the browser on the actual page would not count as remote (again: since it's JS it actually does come from a remote client, but I hope this makes it clearer).
Oh, and before I forget...
I'm aware that my plan is a bit weird, but it's just a learning-web-development-by-doing project.
Also, the API key is not strictly there to prevent abuse; it is rather meant to log third-party usage.
PS: I hope my question is clear.
Hmm, well, I'll try to address the questions, but here are a few things first.
The question isn't really appropriate in its current format for stackoverflow.com (questions should be programming questions of the form "I tried X and Y happened"). It is perhaps closer to a question for another StackExchange site, but it is still fairly open-ended.
Include more information about the specific languages and/or frameworks you're using on the server side, and any code you have that is relevant (authentication code?).
Putting the key into the client code and transmitting it from the client means anyone with a web proxy or packet sniffer (check out Charles or Wireshark) can grab the key, so, just to reiterate, you're right that this is not the way to go.
Check out how other organizations give you access to their APIs (for example Google, LinkedIn, Facebook, Twitter) to get a feel for how it works. In all of these cases you have to be signed in to the service to create an API key, and in some cases you have to specify which domain the requests with that API key will come from. If you take the same precautions, checking the API key sent with a request against a database of registered API users and verifying the domain in the request, then I'd say you're in pretty good shape.
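The check described in the last paragraph (API key looked up in a registry, plus a domain match) might be sketched like this. It is an illustration under stated assumptions, not the asker's Tornado code: the api_keys table, the sample key and domain, and the function names are hypothetical, and sqlite3 stands in for whatever database is actually used. Note that a non-browser client can send any Origin header it likes, so the domain check suits logging third-party usage better than strict protection.

```python
import sqlite3
from urllib.parse import urlsplit


def make_registry():
    """Create an in-memory registry holding one hypothetical API user."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE api_keys (api_key TEXT PRIMARY KEY, domain TEXT NOT NULL)"
    )
    db.execute(
        "INSERT INTO api_keys VALUES (?, ?)", ("1234", "partner.example.org")
    )
    return db


def is_authorized(db, api_key, origin):
    """Accept a request only if the key is registered for the Origin's host."""
    # The key is bound as a parameter, never formatted into the SQL string.
    row = db.execute(
        "SELECT domain FROM api_keys WHERE api_key = ?", (api_key,)
    ).fetchone()
    if row is None:
        return False
    # urlsplit().hostname is None for an empty or malformed Origin value.
    return urlsplit(origin or "").hostname == row[0]
```

In a request handler, api_key would come from the query string and origin from the request's Origin header.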
