When does Googlebot execute JavaScript?

I have a few single-page web apps on multiple domains that rely heavily on JavaScript/AJAX to fetch and show content. Based on logs and search results, I can tell that Googlebot runs JavaScript on some of the domains but not on others. On some it indexes everything that's only available via JS; on others it doesn't seem to run JS at all.
Can anybody tell me how Googlebot decides what JS to run, and whether there's anything I can do to get it to run JS on my other domains?
PS: I know that normally I should use something like server-side rendering for this, but I'm not at all dependent on search results and rankings, so it's not really worth the effort. I'm just curious how Googlebot decides whether it should run JS or not, and whether there's anything easy I can do to change that on my other domains.

You can learn more about how Google renders AJAX-based websites, along with a list of best practices, directly from Google's developer site:
https://webmasters.googleblog.com/2014/10/updating-our-technical-webmaster.html
https://developers.google.com/webmasters/ajax-crawling/
Regarding your specific problem: as a first step, I suggest you analyse each domain using the "Fetch as Google" functionality in Google Webmaster Tools and go through every technical aspect mentioned in Google's guide.
https://support.google.com/webmasters/answer/158587?hl=en

I think there is updated research on the subject:
http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157

The functionality to fetch your page as Googlebot and see the results has now moved into Google Search Console.
You can use the URL Inspection tool to analyze your live URL.
I've tested it on an AngularJS app, and Googlebot was able to crawl page content with data fetched from an AJAX request.

One very important restriction is that Googlebot does not allow AJAX requests while the page is being loaded.
In my blog post I explain how to adapt a single-page application so that it becomes crawlable, without the need to render HTML snapshots on the server.
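One common way to work around that restriction (a rough sketch of the general idea, not the contents of the post) is to have the server embed the initial data directly in the HTML, so the first render needs no AJAX call at all. The window.__INITIAL_DATA__ property, the /api/articles endpoint, and renderApp() below are hypothetical names:
// Bootstrap the SPA from data inlined into the initial HTML, and only fall
// back to an AJAX request when that data is missing (e.g. on later
// client-side navigation). All names here are illustrative placeholders.
function loadInitialData() {
  // The server can inline <script>window.__INITIAL_DATA__ = {...}</script>
  // into the page, so no AJAX request is needed during the initial load.
  if (window.__INITIAL_DATA__) {
    return Promise.resolve(window.__INITIAL_DATA__);
  }
  // Fallback for navigation after the first render.
  return fetch('/api/articles').then(function (response) {
    return response.json();
  });
}

loadInitialData().then(function (data) {
  renderApp(data); // whatever your SPA uses to draw the page
});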

Related

How to stop users from manipulating the popup while still letting Googlebot crawl my page

I have a very confusing problem.
I have a page which only paid users are allowed to view. If the user is not valid, I use a popup with a grey background to block them from viewing the page. There is a potential flaw with this, however: a clever user can find a workaround and bypass the popup using the browser's inspect-element tools. Another solution that comes to mind is to redirect the user to another page instead of showing a popup, like:
window.location = "http://www.example.com";
However, there may be a problem with this too, or maybe I am wrong about it:
I think that with the redirect approach Google's bots won't be able to crawl the page, since the redirection happens, whereas with the first approach Google will definitely be able to crawl it.
Now my question is: if I use the first approach, is there any way to stop the user from manipulating the popup, or any way to distinguish whether a user or Googlebot is browsing the page?
Also, if I use the second approach, will Googlebot be able to crawl the page?
You can't implement a paid block, or any kind of truly secure blocking, on the frontend. I would suggest preventing access to that page on the backend.
There's no clean, 100% reliable way to do this on the frontend; the user can always bypass it.
As for Google, it will be able to crawl the page since the content is still accessible in the rendered HTML; it does not care how the page is shown. It gets access to the content anyway, just as you would by fetching the HTML with a GET request outside a browser.
You could indeed just redirect, but do that on the backend too, not the frontend.
Your current solution does not make the page private: as you rightly point out, anyone can manipulate the page using the dev tools, and crawlers can read the whole source anyway. Using server-side scripts to block access, and/or varying the content based on an authorisation token, is the only way to secure it properly and ensure that only your legitimate paying users get privileged access.
You state a concern about the inability of Google (and other search engines, I assume) to crawl the page if you employ better security. But your logic is flawed: if you make it so that Googlebot can still crawl the page, then by definition it must be readable without authorisation. Anyone could view it in the Google cache, and parts of its content could show up in Google searches. This means it isn't private. Once that's the case, what are your users paying for, exactly?
What you might realistically want to do is have a cut-down version of the page that is displayed when the user is not authorised, containing enough information for search engines to get an idea of the overall content, and for visitors to be tempted into paying for the rest. Then if the user logs in, the server recognises that and displays the rest of the content as well when the page refreshes. That appears to be roughly what paid-content news sites do, for instance.
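A minimal sketch of that server-side approach, assuming an Express app; getArticle() and isPaidUser() are hypothetical placeholders for your own data access and authorisation check:
// Serve the full article only to authorised users; everyone else (including
// crawlers) gets a cut-down teaser. The paid body never leaves the server
// otherwise, so it cannot be "unhidden" with dev tools.
const express = require('express');
const app = express();

app.get('/articles/:id', function (req, res) {
  const article = getArticle(req.params.id);   // hypothetical: look the article up
  if (isPaidUser(req)) {                       // hypothetical: check a session or auth token
    res.send('<h1>' + article.title + '</h1>' + article.body);
  } else {
    // Teaser version: enough for search engines and curious visitors,
    // but the paid body is simply not part of the response.
    res.send('<h1>' + article.title + '</h1><p>' + article.summary +
      '</p><p><a href="/subscribe">Subscribe to read the full article</a></p>');
  }
});

app.listen(3000);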

How to get Google to index AngularJS web apps

I have a single-page application built using AngularJS. All requests get served the index.html, and from there Angular takes over the routing and queries a set of API endpoints to get the data to display.
The title, SEO metadata, and description for the site are obtained the same way. The catch is that the API endpoint is on a different domain, so the SPA is actually making cross-origin requests to get the data.
Everything works fine from a user's point of view. However, when Google crawls the site, it does not pick up any metadata or title; instead, it just shows the Angular tags.
Looking through the site logs, I can see Googlebot making only an OPTIONS request and never following up with the actual GET.
How can I get Google to index the page properly?
Here is a screenshot of what it looks like:
The site is https://www.careercontroller.com
Any help would be appreciated.
NOTE: I know I can get this to work by generating static HTML on the server using PhantomJS or something, but I'm looking to get Google to index it properly since, according to them, they crawl AngularJS apps just fine.
I have actually got this to work before, except the requests were not cross-domain, so could that be the problem?
This is a common issue that is solved by products like prerender.io.
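Such products generally work by detecting crawler user agents and serving a pre-rendered HTML snapshot, while normal visitors get the usual SPA shell. Here is a rough, generic sketch of that idea (not prerender.io's actual middleware); the user-agent list and getSnapshotFor() are hypothetical:
// Serve pre-rendered HTML to known crawlers, the regular SPA to everyone else.
const express = require('express');
const app = express();

const CRAWLER_UA = /googlebot|bingbot|yandex|baiduspider|facebookexternalhit/i;

app.use(function (req, res, next) {
  if (CRAWLER_UA.test(req.headers['user-agent'] || '')) {
    // Snapshot rendered earlier by a headless browser (hypothetical helper).
    res.send(getSnapshotFor(req.originalUrl));
  } else {
    next(); // normal visitors get index.html and the JS bundles
  }
});

app.use(express.static('public'));
app.listen(3000);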

How can I make an indexable website that uses a JavaScript router?

I have been working on a project that uses the Backbone.js router, and all data is loaded by JavaScript via RESTful requests. I know there is no way to detect on the server side whether JavaScript is enabled, but here are the scenarios I thought of to make this website indexable:
I can append a query string to each link in sitemap.xml and put a <script> tag in the page to detect whether JavaScript is enabled. The server renders this page with indexable data, and when a user visits it I can manually initialize the Backbone.js router. The problem is that I need to execute an SQL query to render the indexable data server-side, which causes extra load when the visitor is not a bot. Also, when users share a URL of the website somewhere, it won't be an indexable page, so web crawlers may not identify the content of that URL, and an extra string in the crawler's results page may be annoying for users.
I can detect popular web crawlers like Google, Yahoo, Bing, and Facebook on the server side from their user agents, but I suspect there will be some web crawlers that I miss.
Which way seems more convenient, or do you have any ideas or experience on making this kind of website indexable?
As elias94xx suggested in his comment, one solid solution to this dilemma is to take advantage of Google's "AJAX crawling". In short, Google told the web community "look, we're not going to actually render your JS code for you, but if you want to render it server-side for us, we'll do our best to make it easy on you." They do that with two basic concepts: pretty URL => ugly URL translation and HTML snapshots.
1) Google implemented a syntax web developers could use to specify client-side URLs that could still be crawled. The syntax for these "pretty URLs", as Google calls them, is: www.example.com?myquery#!key1=value1&key2=value2.
When you use a URL with that format, Google won't try to crawl that exact URL. Instead, it will crawl the "ugly URL" equivalent: www.example.com?myquery&_escaped_fragment_=key1=value1%26key2=value2. Since that URL has a ? instead of a #, this will of course result in a call to your server. Your server can then use the "HTML snapshot" technique.
2) The basic idea of that technique is that you have your web server run a headless JS runner. When Google requests an "ugly URL" from your server, the server loads your Backbone router code in the headless runner, and it generates (and then returns to Google) the same HTML that code would have generated had it been run client-side.
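A rough sketch of how a server might wire that up, assuming Express; renderSnapshot() stands in for whatever headless runner (PhantomJS or similar) you use, and is a hypothetical helper:
// When Google asks for ?_escaped_fragment_=..., rebuild the "pretty" URL and
// return an HTML snapshot instead of the bare SPA shell.
const express = require('express');
const app = express();

app.use(function (req, res, next) {
  const fragment = req.query._escaped_fragment_;
  if (fragment === undefined) {
    return next(); // normal visitor: serve the SPA as usual
  }
  const prettyUrl = req.path + '#!' + fragment; // the client-side URL the bot means
  renderSnapshot(prettyUrl).then(function (html) { // hypothetical headless render
    res.send(html); // return the generated markup to the crawler
  });
});

app.listen(3000);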
A full explanation of pretty=>ugly URLs can be found here:
https://developers.google.com/webmasters/ajax-crawling/docs/specification
A full explanation of HTML snapshots can be found here:
https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot
Oh, and while everything so far has been based on Google, Bing/Yahoo also adopted this syntax, as indicated by Squidoo here:
http://www.squidoo.com/ajax-crawling

How does Yahoo's new automatic loading work?

In the new Yahoo Mail inbox, when you click a message it is displayed in a tab automatically (I guess without server interaction). Does that mean Yahoo loads all the data first and then uses it with JavaScript when requested, or not? Anyhow, I don't have any idea, and I would appreciate it if someone could explain how it works, since I am planning to do the same with my application. I am sure this will boost application performance, and I am eager to know.
I guess Yahoo did something similar to what Hotmail describes here.
Basically, they decide based on several aspects what to preload and when.
I have not seen it, but what you're describing sounds like dynamic AJAX loading: only load information when it is requested by the user. This reduces network load and initial loading times. Most JS libraries have some form of AJAX helper. You can read more on AJAX here and here.
I am pretty sure it does have some server interaction. It is definitely using some sort of AJAX to fetch data from the server and show it to you. There are tons of tutorials about using AJAX which you can refer to. You can probably start with http://w3schools.com/ajax/default.asp
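For what it's worth, here is a minimal sketch of the dynamic-loading idea mentioned above: render the inbox list first and fetch a message body only when the user clicks it. The /messages/<id> endpoint, the .message-row class, and the reading-pane element are hypothetical:
// Load a message on demand instead of preloading the whole mailbox.
document.querySelectorAll('.message-row').forEach(function (row) {
  row.addEventListener('click', function () {
    fetch('/messages/' + row.dataset.id)        // ask the server for just this message
      .then(function (response) { return response.json(); })
      .then(function (message) {
        document.getElementById('reading-pane').textContent = message.body;
      });
  });
});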

Ajax: Load XML from different domain?

I have signed up (paid) for Google Site Search. They gave me the URL of a sort of web service: I can send a query to it, it searches my site, and it returns the search results as XML. I am trying to load this XML via AJAX from a page on my site, but I cannot. I can load anything from my own domain, so I am assuming the problem is that the XML is on Google's domain. There has got to be a way to load it, though; I don't think they would have given me the URL if I couldn't do anything with it. Does anyone know how to do this?
Thanks!
UPDATE:
this is what the page says on google that gave me the XML:
How to get XML
You can get XML results for your search engine by replacing query+terms with your search query in this URL:
http://www.google.com/cse?cx=MY_UNIQUE_KEY&client=google-csbe&output=xml_no_dtd&q=query+terms
Where MY_UNIQUE_KEY = my unique key.
You can't load external files with AJAX. However, you can set up a file on your own server that makes the content available from your own domain. For instance, in PHP you could write a file googlexml.php:
<?php
readfile("http://www.google.com/cse?cx=MY_UNIQUE_KEY&client=google-csbe&output=xml_no_dtd&q=query+terms");
?>
And then you could access that with AJAX. I'm not sure if Google's terms of use will let you do that, but if they do, then this is an option.
Does Google not offer the ability to forward a DNS address to the IP of your service, folding it into your domain? That way, in your AJAX calls you could use
googleAlias.mydomain.com
Google should support this, but I don't know for sure. I imagine they would, in the same way they do with Gmail and external-domain mail.
That removes your cross-domain JavaScript issues.
Edit: I expanded below, and another user helpfully pointed out that this should work (thanks Stobor).
Well, to get my company mail into Gmail, if I recall, I needed to change the MX record on my DNS to point to a Google IP. You may be able, if Google supports it, to add an A record to your domain so that an AJAX request to foo.yourdomain.com is the same as search.google.com or whatever. Google needs to recognize requests to your hostname in the A record and say "oh yes, that's me, on my client's behalf".
For those coming across this now, the AJAX Search API may be what you want: http://code.google.com/apis/ajaxsearch/documentation/
EDIT: Actually, upon further review, that may not hook in with the site search...

Categories

Resources