Suppose, I have a javaScript-heavy single page web application. My Javascript render dom directly from model / datasource (Json).
I came up with an approach to generate simple html from datasource (on backend). This html is required only for search engines to index. After page is loaded, JavaScript will replace this quasi-html with the proper UI. Quasi-html can be removed from layout with display:none to avoid performance penalty on the browser.
Will it work?
Also I am concerned about legitimacy of the approach.
Thoughts?
It should work giving the search engines content to craw even if they don't read javascript. Now bots evolve and they read quite a bit of javascript nowadays, I've created a page that only has 2 sentences onBeforeLoad and uses Ajax to get the rest of the content and I see Google indexing a lot of the keywords delivered by Ajax. A problem would be misleading the search bot, like putting in content irrelevant to your other page content - something the bot might pick up at some point and penalize you for it. "I am concerned about legitimacy of the approach" - I wouldn't be, keep code valid and ride on
Related
Alright so I've been writing Backbone.js apps for over a year now and I love the framework model. I've learned how to avoid all the pitfalls and such, but there's one area I'm still quite weak as a single page app developer: how to SEO a public facing app.
I'm working on a blog project, and the easiest solution to my mind is to have a server generated list of all blog entries visible as a link from the /blog section that is rendered on page load, and to ensure that when hitting a /blog/:id url, the server loads the blog content into the very first div on the page, which will be set as display:none.
My question is if this should be sufficient for a good search engine index? SEO is still my weakest skill as a developer. Are there techniques for making sure a search engine crawls this content first and is able to use that content for its more complex indexing?
Also, is there a way to blacklist the generated app content on the page as I know Google has been testing crawling JavaScript apps? In my mind that could never be done at the level it needs to be without some sort of standard browser level event that can be triggered on a full page render or after all data has been loaded.
Anyways, this is more of an ambiguous ticket I know, but it could end up being useful to people in the future if we get a collection of good answers here.
Most of the major search engines (including Google) are rendering the content they receive from the website, in our (Google's) case with something close to a headless browser, so whatever you do for the users the search engines will also get it. Serving different stuff to search engines however will get you into a dangerous area, named cloaking.
Hiding the content with a display:none might backfire on you. We are giving hidden content way less weight in ranking.
I recent came across Javascript templates and have become quite intrigued.
I am building a large PHP application using the MVC pattern. Templating is handled by the rather awesome Twig.
I recently came across a javascript implementation of twig.
I have also read quite a bit about using javascipt template engines.
Now, in my application, the application generates a the full page for standard requests as fallback for users without javascript. For AJAX requests, it can generate the content part of the page (no <head>, <body> etc).
The ajax response object is currently just a the rendered HTML content, which is then inserted into DOM.
Should I instead return a response object containing a compiled javascript template and the objects to be inserted into the template? What are the benefits of doing that?
From the posts I have read, the javascript templates were only small snippets representing a small part of a page, for example displaying a comment on a blog post during the instant that the user has submitted it.
Are javascript templates only useful for inserting these sort of small "pieces" in a page?
Yes
A recent project I was on got "client-side template fever" and we used the dang things for every single template.
With every template library I've used (which is two or three), the error messages you get are not very good. If you have a huge template that operates on a fair amount of data, you'll quickly find the u.foo is null or not an object error message increasingly frustrating.
The best-practices I've settled on is:
Return a full HTML snippet (from the server) if it's a template that is seldom loaded. If you are only loading that HTML once on a page, then you might as well send it down populated right? This also encourages you to keep your logic up on the server, where it probably belongs.
Use client-side templates for small, repeated templates. Your blog comment example is probably a good one. I've found the most success when my client-side templates are pretty small (< 10 lines)
Use a logic-less template engine. The more logic allowable in a template, the harder they are to read/maintain. Plus, some of that logic should probably be in your business layer, not down in some JavaScript template. In other words, they force you to separate your presentation from your logic (which is good).
PS: It is cool that both your client and server-side templates could use the same templating engine. This will make developers on your project much more productive.
Depending on the scale and requirements of your application, you should take into considerations the following:
don't go rampant on Ajax; Ajax is not WebSockets, so use it sparingly. Plus, client-side execution speed is always key; AJAX is slow when compared to dumping as much resources as possible + using them when you need them; for example, you can send to javascript userdata = {name:'xxxx',address;'yyyy', ...} and using that, instead of requesting name and address via AJAX only when you need them.
it is recommended to use a global PHP var $sendData (or something like that) and right after plasting HTML, you send the $sendData with an easy <script>data = <?php echo json_encode($sendData); ?></script>
javascript templates add up to execution speed. which makes it reasonable to do the alternative, that is, separating everything that is dynamic and caching static resources like javascript functions
you can't, and I quote return a response object containing a compiled javascript template, not without residing to some server-side javascript engine that does the compile job prior to returning it
for your personal welfare, it will always boil down to how fast, easy and painlessly you respond to maintaining the application; there's no point in using super beta experimental frameworks, ports and kits over which you have limited control
Work smart, not hard.
Good luck mate.
You could also look into the Distal http://code.google.com/p/distal templating engine.
If I decided to use some javascipt in my website like
$('#body').load(URL);
or
$.get(URL, {param:value}, function(){ ... });
or
window.title = 'TEXT';
Is it good for SEO? Or am I recommended to use pure PHP for data on the page for SEO purposes?
The question of if javascript is good for SEO or not is missing the point. We should pretty much assume that any content which is only available by javascript will not be crawled by the search engines. Google at least claims to be able to crawl some javascript only content but is fairly tight lipped about what exactly they can crawl. Other search engines probably don't crawl it and it's certainly the case that not all do. So assume it doesn't get crawled.
That doesn't mean it's bad for SEO.
If the content will contribute to your SEO, then it's bad for SEO. If the content is neutral to SEO, then it's neutral for SEO. So the answer to your question really depends on the nature of your content. If the content is part of your SEO campaign, then stick with server-side HTML generation be it PHP or some other method. Otherwise the question of SEO has no bearing on the decision to to use javascript or not. Accessibility would be another thing to take into account. Javascript only content is terrible for that.
The larger search engines can/do render limited amounts of javascript. However, for SEO purposes your best bet is rendering the content via HTML rather than javascript. A good rule of thumb is to utilize HTML for content/expressing limited content structure (e.g. paragraph type text = p, lists = ul/ol, headings = h1/h2/h3, etc...), CSS for presentation, and JS for client side programming. With that being said, always ensure a good user experience first. If you can do the above while providing a great user experience, great! If you can't, users first. Its likely you can keep both users and bots happy 95% of the time if you take the time to do so.
Further reading (sorry, I can only post one link as a new user):
Matt Cutts Interview (Check out #26 on Google Javascript Rendering)
A spider's view of Web 2.0
EDIT Added that for "a new user" ;) ~ drachenstern
I think first you should consider what SEO means. It means "Search Engine Optimization" ... how does a search engine get data in the first place for it to be optimized?
It does a GET on the page and whatever data is returned in the GET is processed. No JS engine. No POST data. So you should be optimizing for whatever data is returned on a GET.
Additionally, you tagged this with PHP, but the question has nothing to do with PHP.
Have you seen any of the questions on this list?
https://stackoverflow.com/search?q=javascript+seo
No Sir, Google does not translate flash and java script properly so it may not crawl those area using java script or flash content. I suggest you should keep your website simple but if it is necessary to keep flashy/java script content then you should keep a text base backup.
The first thing you should be asking is not what is good for SEO, but what is good for users. For users, loading data with JavaScript will give them an interactive page, where they can start seeing the page immediately while it is still loading and where the page can update without having to reload it.
From Google's Webmaster Guidelines and article on Cloaking, you should not assume that crawlers can understand JavaScript. This does not mean that you should not use JavaScript on your website, but rather that you should provide the textual equivalent in noscript tags, for use both by users with JavaScript disabled
as well as for crawlers, bearing in mind that the content of these noscript tags should be roughly equivalent to what was shown with JavaScript enabled; showing different content to users and to search engines is called "cloaking" and is frowned upon to say the least.
Google doesn't (yet) execute a page's Javascript (JS). So if your JS replaces/creates content on a page then the content would normally be invisible to the crawlers (not good).
But, the Googlers have implemented a url hack that enables your server to create pages (from the server, not from JS), with all the different varients of your JS page's content.
This solves the SEO problem of Ajax powered pages. At least for Google searches...
See Crawable Ajax
Javascript or any scripts for that matter should never be used to house your sites content, ever! The entire web is driven by HTML and CSS, and in rare cases XML languages, everything else is a headache when it comes to SEO. Ask yourself this question, what exactly is SEO and what is it that search engines are indexing? Javascript and all programming/scripting languages are proprietary, this means that they are NOT standards as defined by the W3C, which means they are essentially worthless when it comes to indexing content. On the other hand, HTML, CSS, and XML are real standards developed for the web! It's ok to use scripts to add additional functionality to your pages, embed apps like social networking plugins, etc, but you should never use them to hold your websites HTML, CSS, or actual content ever, for any reason. Here's a link to a good article that will explain why you should be using HTML and CSS, and not a million scripts, optimizing webpages using proper html markup. Scripts cause other problems besides code that is hard for search engines to decipher. For one, they are harder for browsers to process, causing pages to load much slower than "static" pages made with HTML and CSS would. Pages made with PHP tend to create "dynamic" URL's that users and search engines cannot read. This is why Google recommends people who use jsp or PHP for their webpages include a sitemap, otherwise your links will never be found and might as well not exist. Stick to the conventions! Lets face it, we have standards for a reason. If every electronic component in your home required had a different type of plug that required a special socked, and all those devices had differing voltage and amperage requirements, what would happen? You would essentially burn down your house! And, you'd be spending 5 hours a day at the hardware store looking for those special adapters to fit your wall sockets with. If you plan on designing a website, use scripts for embedding apps or connecting with a database only, and use HTML and CSS to build "static" webpages. Also, use text links, as they are both human and search engine readable, and easy to index and make sense of. Never use scripts for your links. Programming and scripting can be fun, but not on the internet its not.
Search engines index HTML, CSS, and content (multi-media, graphics, videos, text, thats it!) everything else is pointless and annoying to both users and search engines alike. For best results use XML and design a custom language.
Google can crawl, index and rank javascript generated content.
But... it uses an old Chrome version (42) with an old javascript render engine.
The consequence is that your javascript code needs to work in older browsers and older chrome versions (older than 42). So no fancy ES6 functions, you need to use polyfills or use Babel for example.
Although you can do a lot with javascript (like click events or inject your mobile menu), it's recommended to still use a-href instead of a button with a javascript event and then using a function to get to a new page.
You can check the mobile testing tool from Google: https://search.google.com/test/mobile-friendly and check the errors/warnings/logs. If the rendered output looks like intended, Google will see your content.
In search console you can also ask to index the page. Sometimes the javascript crawler is first, sometimes the 'classic' crawler.
Doublecheck it some days afterward by googling a sentence or paragraph from your page.
There's no answer about if it's better or not. Content is content and Google should rank your website, SPA, PWA, AMP site, PDF document, online Doc, wikipage, and so on based on their content, not on the underlying technique.
If you are familiar with JavaScript, give it a go.
Regards, Peter
I'm having some trouble figuring out how to make the "page load" architecture of a website.
The basic idea is, that I would use XSLT to present it but instead of doing it the classic way with the XSL tags I would do it with JavaScript. Each link should therefore refer to a JavaScript function that would change the content and menus of the page.
The reason why I want to do it this way, is having the option of letting JavaScript dynamically show each page using the data provided in the first, initial XML file instead of making a "complete" server request for the specific page, which simply has too many downsides.
The basic problem of that is, that after having searched the web for a solution to access the "underlying" XML of the document with JavaScript, I only find solutions to access external XML files.
I could of course just "print" all the XML data into a JavaScript array fully declared in the document header, but I believe this would be a very, very nasty solution. And ugly, for that matter.
My questions therefore are:
Is it even possible to do what I'm
thinking of?
Would it be SEO-friendly to have all
the website pages' content loaded
initially in the XML file?
My alternative would be to dynamically load the specific page's content using AJAX on demand. However, I find it difficult to find a way that would be the least SEO-friendly. I can't imagine that a search engine would execute any JavaScript.
I'm very sorry if this is unclear, but it's really freaking me out.
Thanks in advance.
Is it even possible to do what I'm thinking of?
Sure.
Would it be SEO-friendly to have all the website pages' content loaded initially in the XML file?
No, it would be total insanity.
I can't imagine that a search engine would execute any JavaScript.
Well, quite. It's also pretty bad for accessibility: non-JS browsers, or browsers with a slight difference in JS implementation (eg new reserved words) that causes your script to have an error and boom! no page. And unless you provide proper navigation through hash links, usability will be terrible too.
All-JavaScript in-page content creation can be useful for raw web applications (infamously, GMail), but for a content-driven site it would be largely disastrous. You'd essentially have to build up the same pages from the client side for JS browsers and the server side for all other agents, at which point you've lost the advantage of doing it all on the client.
Probably better to do it like SO: primarily HTML-based, but with client-side progressive enhancement to do useful tasks like checking the server for updates and printing the “this question has new answers” announce.
maybe the following scenario works for you:
a browser requests your xml file.
once loaded, the xslt associated with the xml file is executed. result: your initial html is outputted together with a script tag.
in the javascript, an ajax call to the current location is made to get the "underlying" xml-dom. from then on, your javascript manages all the xml-processing.
you made sure that in step 3, the xml is not loaded from the server again but is taken from the browser cache.
that's it.
This post probably will need some modification. I'll do my best to explain...
Basically, as a tester, I have noticed that sometimes programers who use template-based web back ends push a lot of stuff into onload handlers that then do stuff like load menu items, change display values in forms, etc.
For example, a page that displays your network configuration loads blank (or dummy values) for the IP info, then loads a block of variables in an onload function that sets the values when the page is rendered.
My experience (and gut feeling) is that this is a really bad practice, for a couple reasons.
1- If the page is displayed in an environment where Javascript is off (such as using "Send Page") the page will not display properly in that environment.
2- The HTML page becomes very hard to diagnose, because what is actually on screen is needs to be pieced together by executing the javascript in your head (this problem is less prominent w/ Firefox because of Firebug).
3- Most of the time, this is not being done via a standard practice of feature of the environment. In other words, there isn't a service on the back-end, the back-end code looks just as spaghetti as the resulting HTML.
and, not really a reason, more a correlation:
I have noticed that most coders that do this are generally the coders that have a lot of code-related bugs or critical integration bugs.
So, I'm not saying we shouldn't use javascript, I think what I'm saying is, when you produce a page dynamically, the dynamic behavior should be isolated to the back-end, and you should avoid changing the displayed information after the page is loaded and rendered.
I think what you're saying is what we should be doing is Progressive Enhancement with JavaScript.
Also related: Progressive Enhancement with CSS, Understanding Progressive Enhancement and Test-Driven Progressive Enhancement.
So the actual question is "What are advantages/disadvantages" of javascript content generation?
here's one: a lot of the things designers want are hard in straight html/css, or not fully supported. using Jquery to do zebra-tables with ":odd" for instance. Sometimes the server-side framework doesn't have good ways to accomplish this, so the way to get the cleanest code is actually to split it up like that.