AngularJS - SEO - S3 Static Pages - javascript

My application uses AngularJS for frontend and .NET for the backend.
In my application I have a list view. On clicking each list item, It will fetch a pre rendered HTML page from S3.
I am using angular state.
app.js
...
state('staticpage', {
url: "/staticpage",
templateUrl: function (){
return 'http://xxxxxxx.cloudfront.net/staticpage/staticpage1.html';
},
controller: 'StaticPageCtrl',
title: 'Static Page'
})
StaticPage1.html
<div>
Hello static world 1!
<div>
How do I do SEO here?
Do I really need to do HTML snapshot using PanthomJS or so.

Yes PhantomJS would do the trick or you can use prerender.io with that service you can just use their open source renderer and have your own server.
Another way is to use _escaped_fragment_ meta tag
I hope this helps, if you have any questions add comments and I will update my answer.

Do you know that google renders html pages and executes javascript code in the page and does not need any pre-rendering anymore?
https://webmasters.googleblog.com/2014/05/understanding-web-pages-better.html
And take a look at these :
http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157
http://wijmo.com/blog/how-to-improve-seo-in-angularjs-applications/

My project front-end also has biult on top of Angular and I decieded to solve SEO issue like this:
I've created an endpiont for all search engines (SE) where all the requests go with _escaped_fragment_ parameter;
I parse a HTTP Request for _escaped_fragment_ GET parameter;
I make cURL request with parsed category and article parameters and get the article content;
Then I render a simpliest (and seo friendly) template for SE with the article content or throw a 404 Not Found Exception if article does not exists;
In total: I do not need to prerender some html pages or use prrender.io, have a nice user interface for my users and Search Engines index my pages very well.
P.S. Do not forget to generate sitemap.xml and include there all urls (with _escaped_fragment_) wich you want to be indexed.
P.P.S. Unfortunately my project's back-end has built on top of php and can not show you suitable example for you. But if you want more explanations do not hesitate to ask.

Firstly you can not assume anything.
Google does say that there bots can very well understand javascript application but that is not true for all scenarios.
Start from using crawl as google feature from the webmaster for your link and see if page is rendered properly. If yes, then you need not read further.
In case, you see just your skeleton HTML, this is because google bot assumes page load complete before it actually completes. To fix this you need an environment where you can recognize that a request is from a bot and you need to return it a prerendered page.
To create such environment, you need to make some changes in code.
Follow the instructions Setting up SEO with Angularjs and Phantomjs
or alternatively just write code in any server side language like PHP to generate prerendered HTML pages of your application.
(Phantomjs is not mandatory)
Create a redirect rule in your server config which detects the bot and redirects the bot to prerendered plain html files (Only thing you need to make sure is that the content of the page you return should match with the actual page content else bots might not consider the content authentic).
It is to be noted that you also need to consider how will you make entries to sitemap.xml dynamically when you have to add pages to your application in future.
In case you are not looking for such overhead and you are lacking time, you can surely follow a managed service like prerender.
Eventually bots will get matured and they would understand your application and you will say goodbye to your SEO proxy infrastructure. This is just for time being.

At this point in time, the question really becomes somewhat subjective, at least with Google -- it really depends on your specific site, like how quickly your pages render, how much content renders after the DOM loads, etc. Certainly (as #birju-shaw mentions) if Google can't read your page at all, you know you need to do something else.
Google has officially deprecated the _escaped_fragment_ approach as of October 14, 2015, but that doesn't mean you might not want to still pre-render.
YMMV on trusting Google (and other crawlers) for reasons stated here, so the only definitive way to find out which is best in your scenario would be to test it out. There could be other reasons you may want to pre-render, but since you mentioned SEO specifically, I'll leave it at that.

If you have a server-side templating system (php, python, etc.) you can implement a solution like prerender.io
If you only have AngularJS-only files hosted on a static server (e.g. amazon s3) => Have a look at the answer in the following post : AngularJS SEO for static webpages (S3 CDN)

yes you need to prerender the page for the bots, prrender.io
can be used and your page must have the
meta tag
<meta name="fragment" content="!">

Related

Call web application from another web application

I need to call a react web application to another javascript web application, I cannot use the iframe or the object, I cannot use the server require, and I don't know the frontend technologies that I will found, then I must use the basically technologies of every application ( javascript or jquery ).
I tried some ways to do it, for example in javascript
qr=new XMLHttpRequest();
qr.open('get',
https://www.mywebapp.com/
,true);
qr.send('c1=v1&c2=v2');
qr.onload=function(){test2.innerHTML=qr.responseText}
and in jquery
var link = "https://mywebapp.com";
var params = "c1=v1&c2=v2"
$("#test").load(
link,
params,
function(){alert("ciao")}
);
});
also in jquery I have tried to use
ajax,ajaxSetup+ajax and other way, in every way i tried to change the value of every parameter like traditional true and false or content-type:'text/html' or every parameter i have found the result of every tried was that I was loaded the page, but the page cannot find the static resources ( javascript, css, images ), I need help
Thank you
Wow, after some clarification it sounds like the chatbot you have written needs to be embedded in to a page in another application. This way the chatbot doesn't know anything about the page it's being embedded on and the embedding page only needs a little bit of script to kick-start the loading of the chatbot.
You'll need to make sure you have JavaScript and CSS that can be downloaded from the chatbot server and dropped in to any page, regardless of which frameworks are in use.
Your JavaScript is then free to grab any other resources (HTML fragment/JavaScript/CSS/Images/etc.) as needed from the chatbot host. This is similar to how you would go about embedding Disqus or Google Maps on to a page.
I think helping implementing this is way out of scope for a single StackOverflow question but hopefully this will give you some idea of what is required and puts you on the right track.

Understanding ms-seo for meteor

I'm trying to use ms-seo package for meteor but I'm not understanding how it works.
It's supposed to add meta tags to your page for crawlers and social media (google, facebook, twitter, etc...)
To see it working according to the docs all I should have to do is
meteor add manuelschoebel:ms-seo
and then add some defaults
Meteor.startup(function () {
if(Meteor.isClient){
return SEO.config({
title: 'Manuel Schoebel - MVP Development',
meta: {
'description': 'Manuel Schoebel develops Minimal Viable Producs (MVP) for Startups',
},
og: {
'image': 'http://manuel-schoebel.com/images/authors/manuel-schoebel.jpg',
}
});
}
});
which I did but that code only executes on the client (browser). How is that helpful to search engines?
So I test it
curl http://localhost:3000
Results have no tags
If In the browser I go to http://localhost:3000 and inspect the elements in the debugger I see the tag but if I check the source I don't.
I don't understand how client side added tags have anything to do with SEO. I thought Google, Facebook, Twitter when scanning your page for meta tags basically just do a single request. Effectively the same as curl http://localhost:3000
So how does this package actually do anything useful? I feel stupid. 27k users it must work but I don't understand how. Does it require the spiderable package to get static pages generated?
You are correct. You need to use something like the spiderable package or prerender.io to get this to work. This package will add tags, but like any Meteor page, it's rendered on the client.
Try this with curl to see the result when using spiderable:
curl http://localhost:3000/?_escaped_fragment_=
Google will now render the JS itself so for Google to index your page correctly you don't need to use spiderable/prerender.io, but for other search engines I believe you still do have to.
An alternate answer:
Don't use spiderable, as it uses PhantomJS which is rather resource intensive when bots crawl your site.
Many Meteor devs are using Prerender these days, check it out.
If you still have some problems with social share buttons or the package, try to read this: https://webdevelopment7636.wordpress.com/2017/02/15/social-share-with-meteor/ . It was the only way I got mine to work. You don't have to worry about phantomJS or spiderable to make it work fine.
It is a complete tutorial using meteorhacks:ssr and meteorhacks:picker. You have to create a crawler filter on the server side and a route that will be called by it when it is activated. The route will send dynamically the template and the data to a html on the "private" folder, and will render the html to the crawler. The template on the private folder will be the that gets the metatags and the tag.
This is the file that will be on the private folder
I can't put the other links with the code here, but if you need anymore help, go to the first link and see if the tutorial helps.

PHP HttpRequest to create a web page - how to handle long response times?

I am currently using javascript and XMLHttpRequest on a static html page to create a view of a record in Zotero. This works nicely except for one thing: The page html title.
I can of course also change the <title>...</title> tag, but if someone wants to post the view to for example facebook the static title on the web page will be shown there.
I can't think of any way to fix this with just a static page with javascript. I believe I need a dynamically created page from a server that does something similar to XMLHttpRequest.
For PHP there is HTTPRequest. Now to the problem. In the javascript version I can use asynchronous calls. With PHP I think I need synchronous calls. Is that something to worry about?
Is there perhaps some other way to handle this that I am not aware of?
UPDATE: It looks like those trying to answer are not at all familiar with Zotero. I should have been more clear. Zotero is a reference db located at http://zotero.org/. It has an API that can be used through XMLHttpRequest (which is what I said above).
Now I can not use that in my scenario which I described above. So I want to call the Zotero server from my server instead. (Through PHP or something else.)
(If you are not familiar with the concepts it might be hard to understand and answer the question. Of course.)
UPDATE 2: For those interested in how Facebook scraps an URL you post there, please test here: https://developers.facebook.com/tools/debug
As you can see by testing there no javascript is run.
Sorry, im not sure if i understand what you are trying to ask, are you just wanting to change the pages title?
Why not use javascript?
document.title = newTitle
Facebook expects the title (or opengraph :title tags) to be present when it fetches the page. It won't execyte any JavaScript for you to fill in the blanks.
A cool workaround would be to detect the Facebook scraper with PHP by parsing the User Agent string, and serving a version of the page with the information already filled in by PHP instead of JavaScript.
As far as I know, the Facebook scraper uses this header for User Agent: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
You can check to see if part of that string is present in the header and load the page accordingly.
if (strpos($_SERVER['HTTP_USER_AGENT'], 'facebookexternalhit') !== false)
{
//synchronously load the title and opengraph tags here.
}
else
{
//load the page normally
}

Get URL and redirect before page load

So, i have an interesting situation. I've been working on re-organizing a directory on a website. I updated old files there's about 100 of them, they are in a new location. The old files have been taken down.
The problem I have is there are probably hundreds of people that have bookmarks directly to the URL of the old files. (e.i. "wahwah.com/subSite/pdfs/something.pdf") these files are 5 years old so they need to find the new ones anyways.
So instead of having a page for each individual file, Can I have something in the directory that used to house the files to watch for that URL and redirect to the new page?
It would watch for "wahwah.com/subSite/pdfs.." and redirect. Or maybe something in the main directory of this subSite to watch for the URL to have the /pdf path in it.
I know I can grab URLs in java script but that doesn't help me unless I can do what I stated above. I'm not sure how if at all I could do it in .NET. our servers support .NET because most of our site apps were made with it but I don't deal with those. I cannot use PHP, the servers don't use it.
I'm hoping JavaScript will be able to do it somehow, but it's something i've never tried before so just thinking about it i'm not sure I can. I'm not much for using JS libraries so Im not sure what is out there i've been searching a bit though.
I found Grunt but i'm not entirely sure how it works just yet. Just looking around maybe the file filter or matchBase. or some of the Global patterns.
If you have access to server, your best option is to set up redirect in there on wahwah.com/subSite/pdfs/ directory.
How to do this depends on if you're on IIS or unix.
In asp.net, 301 redirect is fairly efficient.
if (HttpContext.Contains("http://old.aspx"))
{
HttpContext.Current.Response.Status = "301 Moved Permanently";
HttpContext.Current.Response.AddHeader("http://www.new.aspx");
}
Or in page load you can write:
Response.Status = "301 Moved Permanently";
Response.AddHeader("Location","http://new.aspx");

Ajax page part load and Google

I have some div on page loaded from server by ajax, but in the scenario google and other search engine don't index the content of this div. The only solution I see, it's recognize when page get by search robot and return complete page without ajax.
1) Is there more simple way?
2) How distinguish humans and robots?
You could also provide a link to the non-ajax version in your sitemap, and when you serve that file (to the robot), you make sure to have included a canonical link-element to the "real" page you want users to see:
<html>
<head>
[...]
<link rel="canonical" href="YOUR_CANONICAL_URL_HERE" />
[...]
</head>
<body>
[...]
YOUR NON_AJAX_CONTENT_HERE
</body>
</html>
edit: if this solution is not appropriate (some comments below points out that that this solution is non-standard and only supported by the "big-three"), you might have to re-think whether you should make the non-ajax version the standard solution, and use JavaScript to hide/show the information instead of fetching it via AJAX. If it is business critical information that is fetched, you have to realize that not all users have JavaScript enabled, and thus they won't be able to see this information. A progressive enhancement approach might be more appropriate in this case.
Google gets antsy if you are trying to show different things to you users than to crawlers. I suggest simply caching your query or whatever it is that needs AJAX and then using AJAX to replace only what you need to change. You still haven't really explained what's in this div that only AJAX can provide. If you can do it without AJAX then you should be, not just for SEO but for braille readers, mobile devices and people without javascript.
You can specify a sitemap in your robots.txt. That sitemap should be a list of your static pages. You should not be giving to Google a different page at the same URL, so you should have a different URL with static and dynamic content. Typically, the static URL is .../blog/03/09/i-bought-a-puppy and dynamic URL is something like .../search/puppy.

Categories

Resources