We are using an HTML parser called Html Agility Pack in our .NET web services. We parse some HTML pages with this parser and extract content from them. The parser is useful, but we are looking for better productivity. We are wondering whether we can load JavaScript and jQuery in our web services and use jQuery for extracting content, just as we use it on web pages. This would make our job a lot easier.
If we cannot do this, can we leverage the power of jQuery in some other way? We are really curious to know a solution.
You could send the HTML content to the user's browser and parse it with jQuery there.
Then show it to the user.
Or, if needed for others, post the result back to the server to cache it.
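A minimal sketch of that approach, assuming your service exposes an endpoint that returns the fetched page as raw HTML (the /api/... URLs and the #result element below are made up for illustration):

// Sketch only: ask the server for the raw HTML of the page to be scraped,
// then let jQuery parse it in the browser.
$.get('/api/fetch-page', function (rawHtml) {
    // jQuery builds a detached DOM fragment from the HTML string.
    var $doc = $('<div>').html(rawHtml);

    // Extract content with the same selectors you would use on a live page.
    var title = $doc.find('h1').first().text();
    var links = $doc.find('a').map(function () {
        return $(this).attr('href');
    }).get();

    // Show it to the user, and/or post the result back so the server can cache it.
    $('#result').text(title);
    $.post('/api/cache-result', { title: title, links: links });
});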
My PHP template looks like this:
$html = file_get_contents("/path/to/file.html");

// Map of placeholders to the values that should replace them.
$replace = array(
    "{title}" => "Title of my webpage",
    "{other}" => "Other information",
    ...
);

// Swap each placeholder for its value before sending the page.
foreach ($replace as $search => $value) {
    $html = str_replace($search, $value, $html);
}

echo $html;
I am considering switching to a JavaScript/AJAX template system. The AJAX call would fetch the $replace array in JSON format, and then I'd use JavaScript to do the replacements in the HTML.
The page would then be a plain .html file, and a loading screen would be shown until the AJAX call completed.
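Roughly, I imagine the client side would look something like this (the endpoint name and the hideLoadingScreen helper are placeholders):

// Sketch: fetch the replacement map as JSON and fill in the placeholders client-side.
$.getJSON('/replacements.php', function (replace) {
    var html = document.body.innerHTML;
    for (var search in replace) {
        // Same idea as the PHP str_replace loop, but running in the browser.
        html = html.split(search).join(replace[search]);
    }
    document.body.innerHTML = html;
    hideLoadingScreen(); // placeholder helper that removes the loading overlay
});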
Are there any real advantages to this, or is the transition a waste of time?
A few of the reasons I think this will be beneficial:
The page will still load even if the MySQL or PHP services are down. If the AJAX call fails I can handle it with an error message.
Bot traffic (and anything else that doesn't run JS) will cause very little load on my server, since the AJAX request will never be sent.
Please let me know what your thoughts are.
My two cents: it is better to do the logic on the template side (JavaScript). If you have a high-traffic site you can offload some of the processing to each computer calling the site, which may mean fewer servers.
With JavaScript frameworks like AngularJS, the templating work is pretty simple and efficient, and the framework will do caching for you.
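For example, a minimal AngularJS sketch along these lines (the module name and JSON endpoint are just illustrative) replaces the placeholder handling entirely:

<!-- Illustrative AngularJS sketch: the {{...}} bindings replace the {placeholder} strings. -->
<div ng-app="demoApp" ng-controller="PageCtrl">
    <h1>{{title}}</h1>
    <p>{{other}}</p>
</div>
<script>
    angular.module('demoApp', []).controller('PageCtrl', function ($scope, $http) {
        // Fetch the same data the PHP version would have injected, served as JSON.
        $http.get('/page-data.json').then(function (response) {
            $scope.title = response.data.title;
            $scope.other = response.data.other;
        });
    });
</script>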
Yes, SEO can be an issue with certain sites. There are proxy tools you can put in place that will render the site and return the static HTML to the bot. Plus, I think some bots render JavaScript these days.
Lastly, I like to template on the front end because I like the backend to be a generic data provider (a RESTful API). This way I can build a generic backend that drives web, mobile, and other platforms, and the UI logic can be its own separate thing in JavaScript.
But it comes down to the design needs of your application. I build lots of software-as-a-service applications, so a single-page application works well for me.
I've worked with a similar design pattern in other projects. There are several ways to do this, and the task would involve managing multiple projects or application modules. I assume you are working with a team of developers and not using either a PHP or a JavaScript MVC framework.
PHP Template
For many reasons, I'm against using the “search and replace” method, especially using a server-side scripting language to parse HTML documents as a templating kit.
Why?
As you maintain business rules and the project grows larger, you will find yourself reading through a long list of regular expressions, HTML-to-DOM parsing code, and/or complicated algorithms for finding the nodes to replace with the correct text.
A placeholder such as {title} helps the script get by with fewer search-and-replace expressions, but the design pattern can still lead to messy sharing among multiple developers.
It is OK to parse one or two HTML files to manage the output, but not the entire template. The network response could be slower with multiple, repetitive trips to the server, and that's just for the template; other scripts may also be making trips to the server for reasons unrelated to templating.
AJAX/JavaScript
Initially, AJAX with JavaScript might sound like a neat idea but I'm still not convinced.
Why?
You can't assume every mobile or desktop browser is JavaScript-enabled. You might need to structure the HTML template in a few ways to manage the output for non-JavaScript browsers, and you might need to include <noscript> and/or <iframe> tags on every page. Managing an alternative template for non-JavaScript browsers can be tedious.
Every web browser interprets JavaScript differently. Most developers know a few of the differences between IE, Firefox, Chrome, Safari, and so on. You might need to write browser-detection code and load different JavaScript for specific browsers; update one feature and you have to update the script for all of them.
JavaScript is visible in the page source. I wouldn't want to expose confidential JavaScript functions that might include credentials, leak sensitive details about web services, and/or contain SQL queries. The idea is to secure your page as much as possible.
I'm not saying either is impossible. You could still use PHP or JavaScript for templating.
However, my “ideal” web structure would be built on a reliable MVC framework like Zend, Spring, or Magnolia. Those MVC frameworks include many useful features such as web services, data mapping, and templating kits. Granted, the configuration requirements make it difficult for beginners to integrate an MVC framework into a project, but in the end you can delegate the configuration, the MVC concepts, custom SQL queries, and test cases across your developers. That's my two cents.
I think the most important aspects you forgot are:
SEO: What about search engine bots? They won't be able to index your content if it is set by JavaScript only.
Execution and network latency: when your service is working, the browser waits until the page is loaded (say 800 ms) before making the extra AJAX calls to get your values, which might add another 500 ms (depending on network speed and geographic location...). If you had sent all the generated data from your server instead, it would have cost only ~1 ms more to prepare the complete response. As it is, the user spends a lot of time waiting on a blank page.
Caching: you could cache the generated pages in your web app, so your load would be minimized as well. And if you still want to deliver content while your backend services (MySQL/PHP...) are down, you could even use Apache or Nginx caching.
But I guess it really depends on what you want to do.
For fast and simple pages, which seems to be your case, stick with backend enhancements.
For a dynamic/interactive app that can afford loading times and doesn't care about SEO, you can delegate most things to the front end. But then use a full framework like Angular to handle templating, caching, etc.
Suppose I have a JavaScript-heavy single-page web application. My JavaScript renders the DOM directly from the model / data source (JSON).
I came up with an approach: generate simple HTML from the data source on the backend. This HTML is required only so search engines can index the content. After the page has loaded, JavaScript replaces this quasi-HTML with the proper UI. The quasi-HTML can be removed from the layout with display:none to avoid a performance penalty in the browser.
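Roughly, the markup I have in mind would look like this (loadModel and renderUi are placeholders for my own code):

<!-- Server-rendered quasi-HTML, only for crawlers; hidden so it does not affect layout. -->
<div id="seo-content" style="display:none">
    <h1>Article title</h1>
    <p>Plain-text version of the data the JavaScript UI will render.</p>
</div>
<div id="app"></div>
<script>
    // After load, build the real UI from the JSON model and drop the quasi-HTML.
    loadModel(function (model) {          // loadModel: placeholder that fetches the JSON data source
        document.getElementById('app').innerHTML = renderUi(model); // renderUi: placeholder client-side renderer
        var seo = document.getElementById('seo-content');
        seo.parentNode.removeChild(seo);
    });
</script>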
Will it work?
Also I am concerned about legitimacy of the approach.
Thoughts?
It should work, giving the search engines content to crawl even if they don't read JavaScript. Bots evolve, though, and they read quite a bit of JavaScript nowadays: I've created a page that only has 2 sentences onBeforeLoad and uses AJAX to get the rest of the content, and I see Google indexing a lot of the keywords delivered by AJAX. A problem would be misleading the search bot, like putting in content irrelevant to the rest of your page content; that's something the bot might pick up on at some point and penalize you for. As for "I am concerned about legitimacy of the approach": I wouldn't be. Keep the code valid and ride on.
I have a site written in ASP.NET Web Forms. It uses AJAX heavily.
Most forms on the site are submitted with JavaScript. JavaScript validates the input and sends it to /ajax.ashx on the server. The server processes the request and sends back a JSON response. My JavaScript uses the JSON to create HTML, which it inserts into the DOM.
I'm making a new version of my website using ASP.NET MVC 3. I've been looking at tutorials on this subject, and some of them recommend doing AJAX in a different way. Rather than sending data and then building and inserting HTML with JavaScript, they create the HTML on the server and use JavaScript only to insert it into the DOM. For instance, in this tutorial.
Which way should I use? Using the new method will be quicker, but is it better?
That's a subjective question. Both approaches are possible and there is no better way. There are pros and cons of each approach.
Building the HTML on the server is easier and will require less effort on your part, but it consumes more bandwidth compared to the first approach.
If you decide to go the first way, you could use a client-side templating framework, which might help simplify the generation of DOM elements on the client.
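As a rough illustration of the two styles (the URLs and markup below are made up, not taken from your app):

// Style 1: the server returns JSON and the client builds the HTML.
$.getJSON('/ajax.ashx?action=comments', function (data) {
    var items = $.map(data, function (c) {
        return '<li>' + c.author + ': ' + c.text + '</li>';
    });
    $('#comments').html('<ul>' + items.join('') + '</ul>');
});

// Style 2: the server returns a rendered partial view and the client only inserts it.
$.get('/Comments/List', function (html) {
    $('#comments').html(html);
});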
Creating HTML directly on the server and injecting it with an AJAX call is very fast and simple; the real problem is that your service then becomes bound to that one specific application. By sending raw data you allow any app to use that data in any way, without tying the service to a specific client.
Returning JSON feels more flexible to me; you can change what happens with your JSON response, such as the layout it results in. If you return HTML you return data mixed with layout, which doesn't feel right to me.
I believe it is better to separate the layout from the actual data. That is why you should pass data between your scripts, not HTML.
If you go the HTML route, consider that you would have to build valid HTML and CSS, which might not sound hard at first, but then you'll start using CSS that is not loaded in the page making the AJAX call, and so on.
Always separate content (data) from layout. That's why there are HTML and CSS: to separate layout from data. So why mess things up by mingling HTML into the data?
Building the HTML server-side will probably be faster and won't bog down the client, which is important. Rendering data into HTML with JavaScript takes time, and not every browser is fast with JS (e.g. older versions of IE), so things can slow down if you're doing a lot of this.
Like previous posters said, it's kinda subjective because it depends on how much you're offloading to the client. I'm of the opinion that if you can do things server-side, you should.
If you are going to be using this service to return JSON to other applications/clients, then it's probably a good idea to just leave it as JSON and let each client do what it needs on its own side.
I have read a great deal of discussion about JavaScript templating and search engine optimization. Still, I haven't found a satisfying answer to the question (what I found was either poorly documented or outdated).
Currently I am looking into Handlebars.js as a client-side template solution, because I love the possibility of creating helper functions. But what about indexing for search engines? Does the bot index the generated content (as intended) or only the source with the ugly javascript pseudo-variables? I know there are lots of threads going on about this matter, but I feel that nobody knows the exact answer.
If engines like Google do not index these templates properly, why would anyone bother using them for public websites?
Another question within this context: is it possible to render Handlebars.js templates on the server side and then deliver them to the client, obviously to avoid this whole SEO discussion?
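For context, my client-side usage is roughly the following (the template and data are just illustrative):

// Compile an inline template once, then render it with a data object.
var source   = '<h1>{{title}}</h1><ul>{{#each items}}<li>{{this}}</li>{{/each}}</ul>';
var template = Handlebars.compile(source);
document.getElementById('content').innerHTML = template({
    title: 'My page',
    items: ['first', 'second']
});

A crawler that never runs this script only ever sees the empty #content container, which is the heart of my concern.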
For DOM crunching on the client side, most web bots (e.g. Google and others) don't interpret JS on the fly and parse the newly rendered content for indexing. Instead, Google (and now Bing) support the 'Google Ajax Crawling Scheme' (https://developers.google.com/webmasters/ajax-crawling/docs/getting-started), which basically states that IF you want JS-rendered DOM content to be indexed (e.g. AJAX call results), you must be able to:
Trigger the async js rendering via the url using hashbangs #! (i.e. http://www.mysite.com/#!my-state), and
Be able to serve a prerendered dom snapshot of your site AFTER js modification on request.
If using a client side MVC framework like Backbone.js, or Spine - you will need to provide this service if you want your web app indexed.
Generally this means you intercept a request made by the web bot (explained in the link above), scrape your site server-side using a headless browser (e.g. QT + capybara-webkit, HtmlUnit, etc.), and then deliver the generated DOM back to the requesting bot.
I've started a gem to do this in Ruby (now taking pull requests) at https://github.com/benkitzelman/google-ajax-crawler
It does this as Rack middleware using capybara-webkit (and soon PhantomJS).
I do not know about Handlebars.js, but to my understanding SEO has problems with content generated by JavaScript. Make sure your content is visible to search engines (use a spider simulator for some tests). Avoiding spider traps would generally be the way to go. Hope it helps.
Search engines don't run JavaScript, so if you want your content indexed you'll need to render your templates on the server as well. You can use Handlebars in Node (server-side JS) to render your template there when the page request comes from a spider. It's more work, but it's possible. GitHub, Google Plus, and Twitter all do something similar.
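A rough sketch of that fallback, assuming a Node server with the handlebars package (the bot detection below is deliberately simplistic and only illustrative):

// Sketch: render the same Handlebars template on the server when the request looks like a crawler.
var Handlebars = require('handlebars');
var fs = require('fs');
var http = require('http');

var template = Handlebars.compile(fs.readFileSync('page.hbs', 'utf8'));

http.createServer(function (req, res) {
    var isBot = /bot|crawler|spider/i.test(req.headers['user-agent'] || '');
    if (isBot) {
        // Crawlers get fully rendered HTML so the content can be indexed.
        res.end(template({ title: 'My page', items: ['first', 'second'] }));
    } else {
        // Regular visitors get the normal shell page and render the template client-side.
        res.end(fs.readFileSync('index.html', 'utf8'));
    }
}).listen(3000);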
You could use Distal templates which puts the templates as part of the HTML for SEO.
See Spiderable for a temporary solution that the Meteor project (which uses Handlebars.js) uses for SEO purposes.
http://docs.meteor.com/#spiderable
Does the bot index the generated content (as intended) or only the source with the ugly javascript pseudo-variables?
Neither, because indexer bots don't run JavaScript and you don't serve up templates as HTML documents.
Build a site that works without JavaScript, then build on top of it.
I'm having some trouble figuring out how to make the "page load" architecture of a website.
The basic idea is that I would use XSLT to present it, but instead of doing it the classic way with the XSL tags I would do it with JavaScript. Each link would therefore refer to a JavaScript function that changes the content and menus of the page.
The reason I want to do it this way is to have the option of letting JavaScript dynamically show each page using the data provided in the first, initial XML file, instead of making a "complete" server request for each specific page, which simply has too many downsides.
The basic problem is that, after searching the web for a way to access the "underlying" XML of the document with JavaScript, I only find solutions for accessing external XML files.
I could of course just "print" all the XML data into a JavaScript array fully declared in the document header, but I believe this would be a very, very nasty solution. And ugly, for that matter.
My questions therefore are:
Is it even possible to do what I'm thinking of?
Would it be SEO-friendly to have all the website pages' content loaded initially in the XML file?
My alternative would be to dynamically load each specific page's content using AJAX on demand. However, I find it difficult to see how that could be made SEO-friendly at all. I can't imagine that a search engine would execute any JavaScript.
I'm very sorry if this is unclear, but it's really freaking me out.
Thanks in advance.
Is it even possible to do what I'm thinking of?
Sure.
Would it be SEO-friendly to have all the website pages' content loaded initially in the XML file?
No, it would be total insanity.
I can't imagine that a search engine would execute any JavaScript.
Well, quite. It's also pretty bad for accessibility: non-JS browsers, or browsers with a slight difference in their JS implementation (e.g. new reserved words) that makes your script throw an error, and boom! No page. And unless you provide proper navigation through hash links, usability will be terrible too.
All-JavaScript in-page content creation can be useful for raw web applications (infamously, GMail), but for a content-driven site it would be largely disastrous. You'd essentially have to build up the same pages from the client side for JS browsers and the server side for all other agents, at which point you've lost the advantage of doing it all on the client.
Probably better to do it like SO: primarily HTML-based, but with client-side progressive enhancement for useful tasks like checking the server for updates and showing the “this question has new answers” notice.
Maybe the following scenario works for you:
1. A browser requests your XML file.
2. Once loaded, the XSLT associated with the XML file is executed. Result: your initial HTML is output, together with a script tag.
3. In the JavaScript, an AJAX call to the current location is made to get the "underlying" XML DOM. From then on, your JavaScript manages all the XML processing.
4. You make sure that in step 3 the XML is not loaded from the server again but is taken from the browser cache.
That's it.
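A sketch of step 3, assuming the XML was served with cache headers so the second request is answered from the browser cache (showPage stands in for your own rendering code):

// Step 3, sketched: re-request the current URL (answered from the browser cache)
// and take over the XML DOM in JavaScript.
var xhr = new XMLHttpRequest();
xhr.open('GET', window.location.href, true);
xhr.onload = function () {
    var xmlDoc = xhr.responseXML ||
        new DOMParser().parseFromString(xhr.responseText, 'application/xml');
    // From here on, render each "page" client-side from the XML instead of
    // making a complete server request per page.
    showPage(xmlDoc, 'home'); // showPage: placeholder that builds the visible HTML for one page
};
xhr.send();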