Angularjs vs SEO vs pushState

Angularjs vs SEO vs pushState - javascript

After reading this thread I decided to use pushstate api in my angularjs application which is fully API-based (independent frontend and independent backend).
Here is my test site: http://huyaks.com/index.html
I created a sitemap and uploaded to google webmaster tools.
From what I can see:
google indexed the main page, indexed the dynamic navigation (cool!) but did not index
any of dynamic urls.
Please take a look.
I examined the example site given in the related thread:
http://html5.gingerhost.com/london
As far as I can see, when I directly access a particular page the content which is presumed to be dynamic is returned by the server therefore it's indexed. But it's impossible in my case since my application is fully dynamic.
Could you, please, advise, what's the problem in my particular case and how to fix it?
Thanks in advance.
Note: this question is about pushState way. Please do not advise me to use escaped fragment or 3-d party services like prerender.io. I'd like to figure out how to use this approach.

Evidently Quentin didn't read the post you're referring to. The whole point of http://html5.gingerhost.com/london is that it uses pushState and proves that it doesn't require static html for the benefit of spiders.
"This site uses HTML5 wizrdry [sic] to load the 'actual content' asynchronusly [sic] to the rest of the code: this makes it faster for users, but it's still totally indexable by search engines."
Dodgy orthography aside, this demo shows that asynchronously-loaded content is indexable.

As far as I can see, when I directly access a particular page the content which is presumed to be dynamic is returned by the server
It isn't. You are loading a blank page with some JavaScript in it, and that JavaScript immediately loads the content that should appear for that URL.
You need to have the server produce the HTML you get after running the JavaScript and not depend on the JS.

Google does interpret Angular pages, as you can see on this quick demo page, where the title and meta description show up correctly in the search result.
It is very likely that if they interpret JS at all, they interpret it enough for thorough link analysis.
The fact that some pages are not indexed is due to the fact that Google does not index every page they analyze, even if you add it to a sitemap or submit it for indexing in webmaster tools. On the demo page, both the regular and the scope-bound link are currently not being indexed.
Update: so to answer the question specifically, there is no issue with pushState on the test site. Those pages simply do not contain value-adding content for Google. (See their general guidelines).

Sray, I recently opened up the same question in another thread and was advised that Googlebot and Bingbot do index SPAs that use pushState. I haven't seen an example that ensures my confidence, but it's what I'm told. To then cover your bases as far as Facebook is concerned, use open graph meta tags.
I'm still not confident about pushing forward without sending HTML snippets to bots, but like you I've found no tutorial telling how to do this while using pushState or even suggesting it. But here's how I imagine it would work using Symfony2...
Use prerender or another service to generate static snippets of all your pages. Store them somewhere accessible by your router.
In your Symfony2 routing file, create a route that matches your SPA. I have a test SPA running at localhost.com/ng-test/, so my route would look like this:
# Adding a trailing / to this route breaks it. Not sure why.
# This is also not formatting correctly in StackOverflow. This is yaml.
NgTestReroute:
----path: /ng-test/{one}/{two}/{three}/{four}
----defaults:
--------_controller: DriverSideSiteBundle:NgTest:ngTestReroute
--------'one': null
--------'two': null
--------'three': null
--------'four': null
----methods: [GET]
In your Symfony2 controller, check user-agent to see if it's googlebot or bingbot. You should be able to do this with the code below, and then use this list to target the bots you're interested in (http://www.searchenginedictionary.com/spider-names.shtml)...
if(strstr(strtolower($_SERVER['HTTP_USER_AGENT']), "googlebot"))
{
// what to do
}
If your controller finds a match to a bot, send it the HTML snippet. Otherwise, as in the case with my AngularJS app, just send the user to the index page and Angular will correctly do the rest.
Also, has your question been answered? If it has, please select one so I and others can tell what worked for you.
HTML snippets for AngularJS app that uses pushState?

Related

AngularJS content and Google

Recently I've seen articles stating that Google now crawls sites and renders CSS and Javascript. Example article by Google themselves: http://googlewebmastercentral.blogspot.co.uk/2014/05/understanding-web-pages-better.html
I have a single page application setup in Angular with HTML5 mode on the routing. An ng-view in my index.html is populated based on the URL like so:
app.config(function($routeProvider, $locationProvider){
$locationProvider.html5Mode(true);
$routeProvider.when("/", {
templateUrl: "/views/dashboard.html"
}).when("/portfolio", {
templateUrl: "/views/portfolio.html"
});
});
Google should now go to www.example.com/portfolio, execute the javascript which brings in the content from views/portfolio.html and be able to read all that content right?
That's what should happen according to those articles I've read. This one in particular explains it in detail regarding Angular: https://weluse.de/blog/angularjs-seo-finally-a-piece-of-cake.html
Here's the problem. When I use Google Webmaster Tools and the Fetch, or Fetch and Render functionality to see how Google sees my pages, it doesn't render the JS and just shows the initial HTML in my index.html.
Is it working? Have I done something wrong? How can I test it?

So as I mentioned in the comments, hopefully this answer gives more context of what I mean.
So when you do your declaration of html5Mode, also include the hashPrefix:
$locationProvider
.html5Mode(true)
.hashPrefix('!');
Then, in your <head>, include this tag:
<meta name="fragment" content="!">
What happens here is that you are providing a fallback measure for the History API, meaning for all users visiting with compliant browsers (basically everything nowadays) they will see this:
http://example.com/home/
And only on dinosaur browsers like IE9 would they see this:
http://example.com/#!/home/
Now, that is in real life with actual people as visitors. You asked specifically about being indexed by Google, which uses bots. They will try to go to example.com/home/ as an actual destination on your server (meaning /home/index.html), which obviously doesn't exist. By providing the <meta> tag above, you have provided the hint to the bot to instead go to an ?_escaped_fragment version of the site (like index.html?_escaped_fragment=home) and associated it with that URL of /home/ in the actual Google searches.
It is entirely on the backend, all visitors to your site will still see the clean URL, and is only necessary because under the hood Angular uses location.hash, which is not seen on server-side. Bottom line, your actual users will be unaffected and not have the ugly URL, unless they're on a browser that does not support the History API. For those users, all you've done is make the site start working for them (as before it would have been broken).
Hope this helps!
UPDATE
Since you are using a MEAN stack, you can also go a different direction which has been around a long time, which is to use HTML snapshots. There are npms that will provide snapshots (meaning static HTML from post-render) that can be served up from your server at the locations shown. That technique is a little outdated, but its been around since like 2012 and is proven to work.
Back when I did it, i used grunt-html-snapshot, but there are others out there. You can even use PhantomJS to make the snapshots, although I never did it.

How to modify bigcommerce frontend with api (add js code in app)

I am writing and app for biggommerce which should a bit of code JS to the header. Does bigcommerce allow to do something like that? (something like ScriptTag in Shopify)
Best Regards
Vahe Abelyan

There is no way to access/edit template files thru the Bigcommerce API. It would just be best to guide the user to pasting the JS into their header (as most apps already do).

Rob is correct, however, depending on how complex you want to get you could technically write an app that would fetch or put a file into the theme's template structure using WebDAV.
Basically
if /template/panels/htmlhead.html exists find it, add your script code before , then post file back to webdav
if /template/panels/htmlhead.html does not exist, allow user to upload the default one from their theme, and then have you app do the same thing.
However I'd recommend sticking with the easy way, just provide instructions for a user to place the code into the file themselves. If you want a great example of this, see YOTPO's installation instructions.

Converting a web app into an embeddable <script> tag

I just did a proof of concept/demo for a web app idea I had but that idea needs to be embedded on pages to work properly.
I'm now done with the development of the demo but now I have to tweak it so it works within a tag on any websites.
The question here is:
How do I achieve this without breaking up the main website's stylesheets and javascript?
It's a node.js/socket.io/angularjs/bootstrap based app for your information.
I basically have a small HTML file, a few css and js files and that's all. Any idea or suggestions?

If all you have is a script tag, and you want to inject UI/HTML/etc. into the host page, that means that an iframe approach may not be what you want (although you could possibly do a hybrid approach). So, there are a number of things that you'd need to do.
For one, I'd suggest you look into the general concept of a bookmarklet. While it's not exactly what you want, it's very similar. The problems of creating a bookmarklet will be very similar:
You'll need to isolate your JavaScript dependencies. For example, you can't load a version of a library that breaks the host page. jQuery for example, can be loaded without it taking over the $ symbol globally. But, not all libraries support that.
Any styles you use would also need to be carefully managed so as to not cause issues on the host page. You can load styles dynamically, but loading something like Bootstrap is likely going to cause problems on most pages that aren't using the exact same version you need.
You'll want your core Javascript file to load quickly and do as much async work as possible as to not affect the overall page load time (unless your functionality is necessary). You'll want to review content like this from Steve Souders.
You could load your UI via a web service or you could construct it locally.
If you don't want to use JSONP style requests, you'll need to investigate enabling CORS.
You could use an iframe and PostMessage to show some UI without needing to do complex wrapping/remapping of the various application dependencies that you have. PostMessage would allow you to send messages to tell the listening iFrame "what to do" at any given point, while the code that is running in the host page could move/manipulate the iframe into position. A number of popular embedded APIs have used this technique over the years. I think DropBox was using it for example.

Web crawler: Using Perl's MozRepl module to deal with Javascript

I am trying to save a couple of web pages by using a web crawler. Usually I prefer doing it with perl's WWW::Mechanize modul. However, as far as I can tell, the site I am trying to crawl has many javascripts on it which seem to be hard to avoid. Therefore I looked into the following perl modules
WWW::Mechanize::Firefox
MozRepl
MozRepl::RemoteObject
The Firefox MozRepl extension itself works perfectly. I can use the terminal for navigating the web site just the way it is shown in the developer's tutorial - in theory. However, I have no idea about javascript and therefore am having a hard time using the moduls properly.
So here is the source i like to start from: Morgan Stanley
For a couple of listed firms beneath 'Companies - as of 10/14/2011' I like to save their respective pages. E.g. clicking on the first listed company (i.e. '1-800-Flowers.com, Inc') a javascript function gets called with two arguments -> dtxt('FLWS.O','2011-10-14'), which produces the desired new page. The page I now like to save locally.
With perl's MozRepl module I thought about something like this:
use strict;
use warnings;
use MozRepl;
my $repl = MozRepl->new;
$repl->setup;
$repl->execute('window.open("http://www.morganstanley.com/eqr/disclosures/webapp/coverage")');
$repl->repl_enter({ source => "content" });
$repl->execute('dtxt("FLWS.O", "2011-10-14")');
Now I like to save the produced HTML page.
So again, the desired code I like to produce should visit for a couple of firms their HTML site and simply save the web page. (Here are e.g. three firms: MMM.N, FLWS.O, SSRX.O)
Is it correct, that I cannot go around the page's javascript functions and therefore cannot use WWW::Mechanize?
Following question 1, are the mentioned perl modules a plausible approach to take?
And finally, if you say the first two questions can be anwsered with yes, it would be really nice if you can help me out with the actual coding. E.g. in the above code, the essential part which is missing is a 'save'-command. (Maybe using Firefox's saveDocument function?)

The web works via HTTP requests and responses.
If you can discover the proper request to send, then you will get the proper response.
If the target site uses JS to form the request, then you can either execute the JS,
or analyse what it does so that you can do the same in the language that you are using.
An even easier approach is to use a tool that will capture the resulting request for you, whether the request is created by JS or not, then you can craft your scraping code
to create the request that you want.
The "Web Scraping Proxy" from AT&T is such a tool.
You set it up, then navigate the website as normal to get to the page you want to scrape,
and the WSP will log all requests and responses for you.
It logs them in the form of Perl code, which you can then modify to suit your needs.

newbie question about javascript embed code?

I am a javascript newbie. I am trying to write a requirements document, and need some help describing what I am looking for. We want our application to generate a javascript snippet like this:
<script src="http://www.jotform.com/jsform/10511502633"></script>
This will load a web form.
So my question is:
- How does a single script load an entire web form? Is this a JSON?
- What is this called? Is this a cross browser javascript?
- Can anyone point me in the direction of learning more about what this is?
Thank you for your help!

The javascript file is just hosted on an external site. It appears to be dynamically generated, so feel free to use some fancy words ;) But basically, you just include it here, as if it was on your own site.
You could say "The application will generate the required script-tags to include dynamically generated javascript file from an external, third-party site".
Offcourse you need to take special cautions for cases when the include won't work, because the other site is not reachable (site is down, DNS does not work, file is moved on other webserver, your application is on an intranet/behind a proxy/firewall...). Why can't you copy their file and mirror it locally? Or use a reliable Content Delivery Network, like Google or Amazon.

There are many names for this type of inclusion. The most common being widget.
What does it actually do:
take an id of some sort as parameter
use the id to fetch some specific data (most likely from a database)
generate some js and html based on the id/data
usually this involves iframes of some sort.
To use a script rather than an html iframe has multiple advantages
you can change what is actually delivered to the users browsers without changing the include
you can resize the iframe to fit certain predefined sizes
you can inject the necessary things into the page the widget is included (of course you need to make sure this is sanctioned)
We use this all the time and we never regreted it.
If you don't want to build the widget infrastructure yourself you can always use one of the widget providers like widgetbox:
http://www.widgetbox.com/widgets/make/
With those you are up and running in no time.

This is typically called a script include.

Google have lots of these types of items, and even they call them by many names,
widgets, custom javascript, snippets, custom code, etc. It really depending on who you are writing for... I would go with "cross platform embeddable javascript code" meaning that it would need to load all its dependancies. Also specify which browsers need to be supported and what should happen is the user has javascript turned off.
EDIT :
Actually since we are talking unique IDs, you will need 2 parts probably, the user/site unique "cross platform embeddable javascript code" and whatever serverside code to support it. Basically this is an API that is accessed using your own javascript widget. Feel free you point to examples in your requirements document, programmers love examples.

Develop Reference

JavaScript is the programming language of the Web.