I am still very new to the concepts and design of ASP.NET MVC and AJAX, and I was wondering how secure the Controller is against unwanted users once the site is web-deployed.
I ask because for fun I made a little admin panel that requires a user name and password. Once input is entered, the information is submitted via AJAX to an ActionResult method in the Controller that just compares the strings to see if they match, then returns the response back to the AJAX call.
My question is, how easy is it for someone to get into my Controller and see the hard-coded password?
No professional-type person will ever try to break into this, as it is a free site for a university club, but I want to make sure that the average Computer Science student couldn't just "break in" if they happen to "rage" or get mad about something (you never know! haha).
Question: Is having a password validation within the Controller "decently" secure on an ASP.NET MVC web-deployed application? Why or why not?
Here is the actual code in case the use of it matters for the answer (domain is omitted for privacy)
Note: I understand this use of Javascript might be bad, but I am looking for an answer relative to AJAX and Controller security of the password check.
View (Admin/)
    // Runs preloadFunc as soon as the script loads
    preloadFunc();

    function preloadFunc() {
        var prompting = prompt("Please enter the password", "****");
        if (prompting != null) {
            $.ajax({
                url: "/Admin/magicCheck",
                type: "POST",
                // Pass an object so jQuery URL-encodes the value
                data: { magic: prompting },
                success: function (resp) {
                    if (resp.Success) {
                        //continue loading page
                    }
                    else {
                        //wrong password, re-ask
                        preloadFunc();
                    }
                },
                error: function () {
                    //re-ask
                    preloadFunc();
                }
            });
        }
        else {
            // Hitting cancel
            window.stop();
            window.location.replace("https://www.google.com");
        }
    }
Controller (ActionResult Snippet)
    [HttpPost]
    public ActionResult magicCheck(string magic)
    {
        bool success = false;
        if (magic == "pass")
        {
            success = true;
        }
        else
        {
            success = false;
        }
        return Json(new { Success = success });
    }
Again I am new to MVC and AJAX, let alone anything dealing with security so I am just wondering how secure the Controller is, specifically on webdeploy for this simple password setup.
During normal operation, there is no concern: your code is compiled, the DLL is prevented from being served, and there is no way for the browser to request that the controller divulge its own code.
However, it is not impossible (but quite rare) that unforeseen bugs, vulnerabilities, or misconfigurations of the server could lead to the server divulging compiled code, web.config, etc., whereby someone could disassemble the code (IL is easily decompiled) and reveal your secret.
More worrisome would be someone having physical access to the server just grabbing the binaries directly and disassembling to find your secret.
Another thing to consider is who, during normal situations, might see that secret and whether or not they should know it. A developer, tester, or reviewer may be allowed to write or inspect code, but you may not want them to know the secret.
One way to handle this is to not store secrets in plain text. Instead, create a hash of the valid value, then update your application to hash the user's input in the same manner, and compare the results. That way, if someone ever gets your source code, they can't read the original plain-text value or even copy/paste it into your UI. You can roll your own code to do the hashing, use the FormsAuthentication API, or something else.
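To make the hash-and-compare idea concrete, here is a minimal sketch in JavaScript (Node.js) rather than C#. It swaps in a salted, deliberately slow hash (scrypt from Node's built-in crypto module) instead of a plain hash, which is my assumption and not something the answer above prescribes. The names hashPassword and verifyPassword and the sample password are made up for illustration.

```javascript
const crypto = require("node:crypto");

// Hash a password with a random salt. Only the salt and the hash are stored,
// never the plain-text password.
function hashPassword(password, salt = crypto.randomBytes(16)) {
  const hash = crypto.scryptSync(password, salt, 32);
  return { salt: salt.toString("hex"), hash: hash.toString("hex") };
}

// Re-hash the candidate with the stored salt and compare in constant time.
function verifyPassword(candidate, stored) {
  const salt = Buffer.from(stored.salt, "hex");
  const expected = Buffer.from(stored.hash, "hex");
  const actual = crypto.scryptSync(candidate, salt, expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

// Hypothetical sample value, for illustration only.
const stored = hashPassword("example-password");
console.log(verifyPassword("example-password", stored));
console.log(verifyPassword("wrong-guess", stored));
```

In a real application the stored salt and hash would live in configuration or a database, so the plain-text value never appears in the source or the compiled assembly.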
Finally, do not rely on client-side enforcement of security. You can check security on the client side to have the UI react appropriately, but all server-side requests should be doing checks to make sure the user's security claims are valid.
The question really goes out of scope from here, regarding how to manage identities, passwords, and make security assertions. Spend a little time looking through the myriad articles on the subject. Also, the Visual Studio ASP.NET project templates include a lot of the security infrastructure already stubbed out for you to give you a head start.
Never leaving things to chance is an important policy. Learning about ASP.NET and MVC's various facilities for authentication and authorization is a worthwhile effort. Plus, there are numerous APIs you can plug in to do a lot of the heavy lifting for you.
As has already been pointed out, if you can get hold of the binaries for an app (or, for that matter, ANY .NET application, not just MVC), then it's definitely game over.
Sitting in front of me right now, I have 3 applications that make it child's play to see what's inside.
Telerik JustDecompile
ILSpy
Both are freely downloadable in seconds, and the former will take an entire compiled assembly and not just reverse engineer the code, but also create a solution file and other project assets, allowing me to load it straight back into Visual Studio.
Visual Studio meanwhile, will allow me to reference the binaries in another project, then let me browse into them to find out their calling structure using nothing more than the simple object browser.
You can obfuscate your assemblies, and there are plenty of apps to do this, but they still stop short of stopping you from de-compiling the code, and instead just make the reverse engineered code hard to read.
On the flip side
Even if you don't employ anything mentioned above, you can still use command line tools such as "Strings" or editors such as "Ultra Edit 32" and "Notepad++" that can display hex bytes and readable ASCII, to visually pick out interesting text strings. (This approach also works well on natively compiled code.)
If you're just worried about casual drive-by / accidental intrusions, then the first thing you'll want to do is make sure you DON'T keep your source code in the server folder.
It's amazing just how many production MVC sites I've come across where the developer has the active project files and development configuration actually on the server that's serving live to the internet.
Thankfully, in most cases, IIS7 is set with sensible defaults, which means that things like '*.CS' files, or 'web.config' files are refused when an attempt is made to download them.
It's by no means however an exact science, just try the following link to see what I mean!!
filetype:config inurl:web.config inurl:ftp
(Don't worry it's safe, it's just a regular Google Search link)
So, to avoid this kind of scenario of leaking documents, a few rules to follow:
Use the web publishing wizard, that will ensure that ONLY the files needed to run end up on the server
Don't point your live web based FTP root at your project root, in fact if you can don't use FTP at all
DO double check everything, and if possible get a couple of trusted friends to try and download things they shouldn't, even with a head start they should struggle
Moving on from the server config, you have a huge mountain of choices for security.
One thing I definitely don't advocate doing though, is rolling your own.
For years now .NET has had a number of very good security-based systems baked into its core, with the mainstay being "ASP.NET Membership" and the current newcomer being "ASP.NET Simple Membership".
Each product has its own strengths and weaknesses, but every one of them has something that the method you're using doesn't, and that's global protection.
As your existing code stands, it's a simple password on that controller only.
However, what if I don't give it a password?
What happens if I instead decide to try a few random URLs and happen to get lucky?
eg: http://example.com/admin/banned/
and, oh look I have the banned users page up.
This is EXACTLY the type of low hanging entry point that unskilled script kiddies and web-vandals look for. They wander around from site to site, trying random and pseudo random URL's like this, and often times they do get lucky, and find an unprotected page that allows them to get just far enough in, to run an automated script to do the rest.
The scary part is, small college club sites like yours are exactly the type of thing they look for too, a lot of them do this kind of thing for the bragging rights, which they then parade in front of friends with even less skill than themselves, who then look upon them as "Hacking Heroes" because they broke into a "College Site"
If, however, you employ something like ASP.NET Membership, then not only are you using security that's been tried and tested, but you're also placing this protection on every page in your site without having to add boilerplate code to each and every controller you write.
Instead you use simple data annotations to say "This controller is Unprotected" and "This one lets in users without admin status" letting ASP.NET apply site wide security that says "NO" to everything you don't otherwise set rules for.
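The "say NO to everything you don't otherwise set rules for" idea can be sketched independently of ASP.NET. The snippet below is plain JavaScript, not the Membership API. The rule table, the route prefixes, and the isAllowed helper are all hypothetical names made up for illustration.

```javascript
// Hypothetical rule table: route prefix -> who may access it.
const rules = [
  { prefix: "/admin", allow: (user) => user !== null && user.isAdmin === true },
  { prefix: "/members", allow: (user) => user !== null },
  { prefix: "/public", allow: () => true },
];

// Deny by default: a path with no matching rule is refused for everyone.
function isAllowed(path, user) {
  const rule = rules.find(
    (r) => path === r.prefix || path.startsWith(r.prefix + "/")
  );
  return rule ? rule.allow(user) : false;
}

console.log(isAllowed("/admin/banned/", null));
console.log(isAllowed("/admin/banned/", { name: "alice", isAdmin: true }));
console.log(isAllowed("/some/forgotten/page", { name: "alice", isAdmin: true }));
```

The point of the design is that a page someone forgot to protect is closed rather than open, which is the opposite of a per-controller password check.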
Finally, if you want the last word in ASP.NET security, MVC or otherwise, then go visit Troyhunt.com. I guarantee, if you weren't scared beforehand, you will be afterwards.
It looks like you are sending a password via AJAX POST. To answer your question: you should consider using SSL, or encrypting the password before sending it via POST. For an example and explanation, see the answer titled "SSL Alternative - encrypt password with JavaScript submit to PHP to decrypt".
As HackedByChinese said, the actual code being stored in your compiled files (DLL) wouldn't be too big of a deal. If you want to be extra paranoid, you can also store the password in your web.config and encrypt it there. For an example and explanation of that, see "How to encrypt username and password in Web.config in C# 2.0".
This code is not secure at all. Your JavaScript code can be replaced with anything the user wants, so someone can just get rid of your preloadFunc. An average computer science student will execute this code directly from the console:
    if (resp.Success) {
        //continue loading page
        //this code can be executed by hand, from console
    }
And that will be all when it comes to your security.
Authentication and authorization info should go to the server with every request. As a simple solution, you could use FormsAuthentication, by calling
    FormsAuthentication.SetAuthCookie("admin", false);
in /Admin/magicCheck, only if the password is correct. (The second argument controls whether the cookie persists across browser sessions.)
Then you should decorate the data retrieval methods with the [Authorize] attribute so that the cookie is checked on every request.
Using SSL to secure communication between browser and server would be wise too, otherwise password travels in clear text.
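The flow described above (issue a credential only after a correct password, then check it on every request) can be sketched without any framework. This is plain Node.js JavaScript, not the FormsAuthentication API. The in-memory session map, the logIn and handleAdminRequest functions, and the passwordIsValid callback are hypothetical names for illustration.

```javascript
const crypto = require("node:crypto");

// Hypothetical in-memory session store: token -> session data.
const sessions = new Map();

// Issue a random token only when the supplied check accepts the password.
// passwordIsValid is a caller-provided function (an assumption of this sketch).
function logIn(password, passwordIsValid) {
  if (!passwordIsValid(password)) {
    return null;
  }
  const token = crypto.randomBytes(32).toString("hex");
  sessions.set(token, { user: "admin" });
  return token;
}

// Every protected request re-checks the token on the server,
// regardless of what the client-side script claims.
function handleAdminRequest(token) {
  if (typeof token !== "string" || !sessions.has(token)) {
    return { status: 401, body: "Not authorized" };
  }
  return { status: 200, body: "admin data for " + sessions.get(token).user };
}

const token = logIn("guess", (p) => p.length > 100); // rejected in this demo
console.log(token, handleAdminRequest(token).status);
```

Skipping the JavaScript prompt no longer helps an attacker here, because the data itself is only returned when the server recognises the token.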
Related
I am a graphic designer working on a website for my employer. At last minute, they have asked if it is possible to hide/reveal certain parts of a page dependent on whether the user types a specific email domain. After some research—given I am not an expert web developer—I figure out this bit of Javascript:
    function validate()
    {
        var text = document.getElementById("email_input").value;
        var formslist = document.getElementById("forms");
        // "@" separates the local part from the domain; dots are escaped and
        // a single "|" separates the two accepted domains
        var regx = /^([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]{3,20})+@(email1\.com|email2\.com)$/;
        if (regx.test(text))
        {
            formslist.style.display = "block";
            document.getElementById("errortext").style.visibility="hidden";
        }
        else
        {
            // "none" is the display value that hides an element
            formslist.style.display = "none";
            document.getElementById("errortext").innerHTML="Our forms section requires an approved email address.";
            document.getElementById("errortext").style.visibility="visible";
            document.getElementById("errortext").style.color="gray";
        }
    }
And it works! But common sense tells me this seems too simple to be secure... How can I hide/hash/mask "email1.com" or "email2.com"? How could I decrease the odds of someone just going into the browser's developer view and seeing the accepted values?
(Sorry if I am repeating this question. I just can't figure out the correct search terms for what I want to do!)
You can use the digest method of the Web Crypto API (crypto.subtle.digest) and check the hashed input against the hashed email values.
What you want is probably not possible using only a client-side approach, or else a robust client-side approach is probably overkill.
A one-way hash function is a cryptographically sound approach to allow the client to check input without revealing what the desired input is. You can send a hashed value H(v1) without leaking information about v1 itself, and then have the client verify if the user's input v2 satisfies H(v1) == H(v2).
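A minimal sketch of the H(v1) == H(v2) check, written for Node.js with the built-in crypto module so it runs as-is. In a browser the digest would come from the Web Crypto API instead. The helper names sha256Hex and isApprovedEmail are hypothetical, and the digests are computed inline only to keep the example self-contained. A real page would ship precomputed digests and never the domain strings.

```javascript
const crypto = require("node:crypto");

// Hex-encoded SHA-256 digest of a string.
function sha256Hex(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// Stand-in for precomputed digests of the accepted domains.
const allowedDomainHashes = new Set(["email1.com", "email2.com"].map(sha256Hex));

// Hash the domain part of the input and look it up among the known digests.
function isApprovedEmail(email) {
  const at = email.lastIndexOf("@");
  if (at < 1) {
    return false;
  }
  const domain = email.slice(at + 1).toLowerCase();
  return allowedDomainHashes.has(sha256Hex(domain));
}

console.log(isApprovedEmail("jane@email1.com"));
console.log(isApprovedEmail("jane@elsewhere.org"));
```

Keep in mind that values as guessable as email domains can be recovered by simply hashing likely candidates, so this hides the list from a casual look at the source rather than from a determined user.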
However, what is the client then to do after verifying a match? If it's going to display information to the user, that same information must be sent to the client before displaying it. Though the page may be cryptographically sound in its decision of when to show the information on the page, any modestly savvy user may find that information using debug tools in the browser without making the page's script render it properly.
One actually cryptographically sound approach is to only grant the client access to the secret display-information in a form that has been encrypted with a symmetric-key cipher using the output of a key derivation function (KDF) like Encrypt(secretData, KDF(v1)), and attempt to perform the corresponding Decrypt(secretData, KDF(v2)) to decrypt the data using the user's input v2. It would probably be simpler to just send the input to the server and have it decide whether to send the secret data at all, but if you have no server (or no server that you trust with your secrets, or no server you believe will stay online for the useful life of your client application) then this is a viable approach.
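The Encrypt(secretData, KDF(v1)) / Decrypt(secretData, KDF(v2)) scheme might look roughly like this in Node.js. The choice of scrypt as the KDF and AES-256-GCM as the cipher is my assumption, since the answer does not name specific algorithms. The function names encryptFor and tryDecrypt and the sample strings are hypothetical too.

```javascript
const crypto = require("node:crypto");

// KDF step: turn the (possibly short) input into a 32-byte key.
function deriveKey(secretInput, salt) {
  return crypto.scryptSync(secretInput, salt, 32);
}

// Done ahead of time by the site owner: Encrypt(secretData, KDF(v1)).
function encryptFor(secretInput, plaintext) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", deriveKey(secretInput, salt), iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { salt, iv, tag: cipher.getAuthTag(), data };
}

// Done on the client with the user's input: Decrypt(secretData, KDF(v2)).
// Returns null when the input does not produce the right key.
function tryDecrypt(userInput, box) {
  try {
    const decipher = crypto.createDecipheriv("aes-256-gcm", deriveKey(userInput, box.salt), box.iv);
    decipher.setAuthTag(box.tag);
    return Buffer.concat([decipher.update(box.data), decipher.final()]).toString("utf8");
  } catch (err) {
    return null; // authentication failed: wrong input
  }
}

const box = encryptFor("email1.com", "<form>hidden forms section</form>");
console.log(tryDecrypt("email1.com", box));
console.log(tryDecrypt("other.org", box));
```

Only the encrypted box would be sent to the client, so inspecting the page reveals ciphertext rather than the hidden content. The protection is still only as strong as the input is hard to guess.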
If you want this to be completely hidden from a "clever" user - you need to implement a backend validation. I don't see other good ways of doing this.
Javascript that runs in browser can be easily read by a user and translated into a more human readable form. So, even if you encode your strings with btoa() - it can be decoded with atob(). Example - https://www.w3schools.com/jsref/met_win_atob.asp
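A quick illustration of why Base64 is not protection: whatever btoa() produces, atob() turns straight back. The sample value is taken from the question, and the snippet runs in a browser console or a recent Node.js.

```javascript
// Encoding is not encryption: anyone can reverse it without a key.
const hidden = btoa("email1.com");
const revealed = atob(hidden);

console.log(hidden);   // looks scrambled
console.log(revealed); // original value is back
```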
I'm pretty new to HTML, like 1 week new. I am making a web store and I want to be able to login into an "admin panel" to make it easier for me to manage my products. Add new, remove, rename etc. My problem is, I have my login information stored in the html code and I use if-statements to check the validity.
When I was testing the code, I was curious and wanted to inspect element. Unsurprisingly, there was my entire login information and anybody can have access to it.
I need to somehow hide it, or hide the login fields from users except me. But I do not know how to approach that. I thought of a few solutions like have a hidden part on the store page and if I click it a certain amount of times then it will show the fields. But I think I'm complicating it.
Any ideas are greatly appreciated. Thanks. Below is my function for logging in.
    function login()
    {
        var username = "test username";
        var password = "testpassword";
        if(document.getElementById("username field").value == username && document.getElementById("password field").value == password)
        {
            var btn = document.createElement("BUTTON");
            document.body.appendChild(btn);
            // hide the user name field after login
            document.getElementById("username field").hidden = true;
            // hide the password field after login
            document.getElementById("password field").hidden = true;
            // hide the login button after login
            document.getElementById("login btn").hidden = true;
            // show a message indicating login was successful
            window.alert("Login successful! Welcome back admin!")
        }
        else
        {
            window.alert("Sorry, you are not authorized to view this page.");
        }
    }
And this is a screenshot of the inspect element. I don't want anything too crazy like a database because I'm the only user, just a way to be able to access the admin panel without exposing myself. Thanks again.
Inspect Element Screenshot
EDIT:
I am not using my own server, I am using Wix.com to make the initial website and then using the HTML widget to create a webstore. I don't think they allow people to have any communication with their servers whatsoever.
Username and password validation should never be done on the client side. It should always be done on the server. Do not use javascript for this task. Allow your user to enter their username and password in a form, and then submit the form to a server side script to validate their credentials. Doing it on the client side will never be secure.
There's no easy solution to your particular request, but before I oblige you with the details I'd like to stress three very important points.
1: Javascript is not Safe
Javascript is a client side language, which means every piece of data you'll ever be dealing with that comes from your user can be directly modified. These include, but are not limited to, any values or attributes of HTML tags, inline Javascript, loaded image files, etc. Essentially, anything that is cached on the user's computer can be modified and might not be what you're expecting to receive.
As such, a Javascript authentication system is absolutely not safe by any definition of the word. For a local page that only you can access, it would do the job, but that raises the question of why you need authentication in the first place. Even then, as a new developer you'd be widely encouraged to never try to do it anyway. There's no point practising and learning how to do something in a completely insecure way and nobody is likely to suggest it.
2: Authentication is a tricky topic
Authenticating logins is not an easy thing to do. Yes, it's easy to make a login script but it's not easy to do it properly. I would never try to discourage anyone from making something themselves nor discourage a new developer from pursuing any goal, but authentication is not something you should be learning only a week into HTML. I'm sorry if that comes across as harsh, but there are people who have been masterminding applications for years who still don't do it securely.
3: Third Party are Best
It's possible to make your own authentication system that likely only the most determined of attackers could access, but it wouldn't involve Javascript authentication. Consider Javascript to be more of a convenience to the user than a solution for the developer. Javascript can do some remarkable things, but being a trusted source of data is something it will never do. Please understand this important point, because the source code you have provided is riddled with security flaws.
--
Now, on to what you want to do. Identifying that you're the "admin" user is something you're putting a password in to do. If you could figure out you're the owner of this site before putting in your password, you wouldn't need the password, right? In short, you can't do what you want to do; not reliably, anyway. It's possible to only show those forms if you're using a particular IP, but IPs can be masked, imitated and changed, which makes it insecure.
There are several third party authentication methods that you can use to do all the heavy lifting for you. All you do is put the fields on your page and they'll handle the rest. You can use any social media login (Facebook, Twitter, Google Plus, etc.) or you can use OAuth, which handles the authentication for you.
I don't mean to discourage you, nor anyone else, from pursuing their own authentication methods but if I'm honest with you I think this is something way beyond your skill level that you shouldn't be considering right now.
If you serve the pages via a server, you can enforce basic HTTP auth. Should be really simple to set up and you would have the benefit of a standard of security.
Here are the Apache docs for this, for example.
There are a lot of cool tools for making powerful "single-page" JavaScript websites nowadays. In my opinion, this is done right by letting the server act as an API (and nothing more) and letting the client handle all of the HTML generation stuff. The problem with this "pattern" is the lack of search engine support. I can think of two solutions:
When the user enters the website, let the server render the page exactly as the client would upon navigation. So if I go to http://example.com/my_path directly the server would render the same thing as the client would if I go to /my_path through pushState.
Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path the server should give him a JavaScript heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.
The first solution is discussed further here. I have been working on a website doing this and it's not a very nice experience. It's not DRY and in my case I had to use two different template engines for the client and the server.
I think I have seen the second solution for some good ol' Flash websites. I like this approach much more than the first one and with the right tool on the server it could be done quite painlessly.
So what I'm really wondering is the following:
Can you think of any better solution?
What are the disadvantages with the second solution? If Google in some way finds out that I'm not serving the exact same content for the Google bot as a regular user, would I then be punished in the search results?
While #2 might be "easier" for you as a developer, it only provides search engine crawling. And yes, if Google finds out you're serving different content, you might be penalized (I'm not an expert on that, but I have heard of it happening).
SEO and accessibility (not just for disabled people, but accessibility via mobile devices, touch screen devices, and other non-standard computing / internet enabled platforms) have a similar underlying philosophy: semantically rich markup that is "accessible" (i.e. can be accessed, viewed, read, processed, or otherwise used) to all these different browsers. A screen reader, a search engine crawler or a user with JavaScript enabled, should all be able to use/index/understand your site's core functionality without issue.
pushState does not add to this burden, in my experience. It only brings what used to be an afterthought and "if we have time" to the forefront of web development.
What you describe in option #1 is usually the best way to go - but, like other accessibility and SEO issues, doing this with pushState in a JavaScript-heavy app requires up-front planning or it will become a significant burden. It should be baked in to the page and application architecture from the start - retrofitting is painful and will cause more duplication than is necessary.
I've been working with pushState and SEO recently for a couple of different applications, and I found what I think is a good approach. It basically follows your item #1, but accounts for not duplicating html / templates.
Most of the info can be found in these two blog posts:
http://lostechies.com/derickbailey/2011/09/06/test-driving-backbone-views-with-jquery-templates-the-jasmine-gem-and-jasmine-jquery/
and
http://lostechies.com/derickbailey/2011/06/22/rendering-a-rails-partial-as-a-jquery-template/
The gist of it is that I use ERB or HAML templates (running Ruby on Rails, Sinatra, etc) for my server side render and to create the client side templates that Backbone can use, as well as for my Jasmine JavaScript specs. This cuts out the duplication of markup between the server side and the client side.
From there, you need to take a few additional steps to have your JavaScript work with the HTML that is rendered by the server - true progressive enhancement; taking the semantic markup that got delivered and enhancing it with JavaScript.
For example, I'm building an image gallery application with pushState. If you request /images/1 from the server, it will render the entire image gallery on the server and send all of the HTML, CSS and JavaScript down to your browser. If you have JavaScript disabled, it will work perfectly fine. Every action you take will request a different URL from the server and the server will render all of the markup for your browser. If you have JavaScript enabled, though, the JavaScript will pick up the already rendered HTML along with a few variables generated by the server and take over from there.
Here's an example:
<form id="foo">
Name: <input id="name"><button id="say">Say My Name!</button>
</form>
After the server renders this, the JavaScript would pick it up (using a Backbone.js view in this example)
    FooView = Backbone.View.extend({
        events: {
            "change #name": "setName",
            "click #say": "sayName"
        },
        setName: function(e){
            var name = $(e.currentTarget).val();
            this.model.set({name: name});
        },
        sayName: function(e){
            e.preventDefault();
            var name = this.model.get("name");
            alert("Hello " + name);
        },
        render: function(){
            // do some rendering here, for when this is just running JavaScript
        }
    });

    $(function(){
        var model = new MyModel();
        var view = new FooView({
            model: model,
            el: $("#foo")
        });
    });
This is a very simple example, but I think it gets the point across.
When I instantiate the view after the page loads, I'm providing the existing content of the form that was rendered by the server to the view instance as the el for the view. I am not calling render or having the view generate an el for me when the first view is loaded. I have a render method available for after the view is up and running and the page is all JavaScript. This lets me re-render the view later if I need to.
Clicking the "Say My Name" button with JavaScript enabled will cause an alert box. Without JavaScript, it would post back to the server and the server could render the name to an html element somewhere.
Edit
Consider a more complex example, where you have a list that needs to be attached (from the comments below this)
Say you have a list of users in a <ul> tag. This list was rendered by the server when the browser made a request, and the result looks something like:
<ul id="user-list">
<li data-id="1">Bob
<li data-id="2">Mary
<li data-id="3">Frank
<li data-id="4">Jane
</ul>
Now you need to loop through this list and attach a Backbone view and model to each of the <li> items. With the use of the data-id attribute, you can find the model that each tag comes from easily. You'll then need a collection view and item view that is smart enough to attach itself to this html.
    UserListView = Backbone.View.extend({
        attach: function(){
            // setElement (Backbone 0.9+) keeps this.el and this.$el in sync
            this.setElement($("#user-list"));
            // Inside each(), `this` is the <li>, so capture the collection first
            var collection = this.collection;
            this.$("li").each(function(index){
                var userEl = $(this);
                var id = userEl.attr("data-id");
                var user = collection.get(id);
                new UserView({
                    model: user,
                    el: userEl
                });
            });
        }
    });

    UserView = Backbone.View.extend({
        initialize: function(){
            this.model.bind("change:name", this.updateName, this);
        },
        updateName: function(model, val){
            // this.$el is the jQuery-wrapped element of the view
            this.$el.text(val);
        }
    });

    var userData = {...};
    var userList = new UserCollection(userData);
    var userListView = new UserListView({collection: userList});
    userListView.attach();
In this example, the UserListView will loop through all of the <li> tags and attach a view object with the correct model for each one. It sets up an event handler for the model's name change event and updates the displayed text of the element when a change occurs.
This kind of process, to take the html that the server rendered and have my JavaScript take over and run it, is a great way to get things rolling for SEO, Accessibility, and pushState support.
Hope that helps.
I think you need this: http://code.google.com/web/ajaxcrawling/
You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.
Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)
So, it seems that the main concern is being DRY
If you're using pushState, have your server send the same exact code for all URLs (that don't contain a file extension to serve images, etc.): "/mydir/myfile", "/myotherdir/myotherfile" or root "/" -- all requests receive the same exact code. You need to have some kind of URL rewrite engine. You can also serve a tiny bit of html and the rest can come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
(test the link's validity by converting the link to your url scheme and testing against existence of content by querying a static or a dynamic source. if it's not valid send a 404 response.)
When the request is not from a google bot, you just process normally.
If the request is from a Google bot, you use phantom.js, a headless WebKit browser ("A headless browser is simply a full-featured web browser with no visual interface."), to render the html and javascript on the server and send the bot the resulting html. As the bot parses the html it can follow your other "pushState" links (for example, a link to /somepage) on the server: the server rewrites the URL to your application file, loads it in phantom.js, and sends the resulting html to the bot, and so on...
For your html I'm assuming you're using normal links with some kind of hijacking (e.g. with backbone.js, see https://stackoverflow.com/a/9331734/1595913)
To avoid confusion with any links separate your api code that serves json into a separate subdomain, e.g. api.mysite.com
To improve performance you can pre-process your site pages for search engines ahead of time during off hours by creating static versions of the pages using the same mechanism with phantom.js and consequently serve the static pages to google bots. Preprocessing can be done with some simple app that can parse <a> tags. In this case handling 404 is easier since you can simply check for the existence of the static file with a name that contains url path.
If you use #! hash bang syntax for your site links a similar scenario applies, except that the rewrite url server engine would look out for _escaped_fragment_ in the url and would format the url to your url scheme.
There are a couple of integrations of node.js with phantom.js on github and you can use node.js as the web server to produce html output.
Here are a couple of examples using phantom.js for seo:
http://backbonetutorials.com/seo-for-single-page-apps/
http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering
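The routing decision described in this answer can be sketched as a single function. It leaves out phantom.js itself and only shows which kind of response each request gets. The crawler pattern, the snapshots map (path to pre-rendered HTML), and the function names are hypothetical and would need adapting to a real server.

```javascript
// Hypothetical crawler pattern; a real deployment would need a maintained list.
const BOT_PATTERN = /googlebot|bingbot/i;

// Requests for files such as /img/logo.png are served as static resources.
function hasFileExtension(path) {
  return /\.[a-z0-9]+$/i.test(path);
}

// snapshots: Map of URL path -> pre-rendered HTML (e.g. produced off-hours).
function chooseResponse(path, userAgent, snapshots) {
  if (hasFileExtension(path)) {
    return { kind: "static-file", status: 200 };
  }
  if (BOT_PATTERN.test(userAgent || "")) {
    // A missing snapshot doubles as the 404 check mentioned above.
    return snapshots.has(path)
      ? { kind: "snapshot", status: 200, html: snapshots.get(path) }
      : { kind: "snapshot", status: 404 };
  }
  // Everyone else gets the same application shell for every URL.
  return { kind: "app-shell", status: 200 };
}

const snapshots = new Map([["/mydir/myfile", "<h1>My file</h1>"]]);
console.log(chooseResponse("/mydir/myfile", "Googlebot/2.1", snapshots).kind);
console.log(chooseResponse("/mydir/myfile", "Mozilla/5.0", snapshots).kind);
```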
If you're using Rails, try poirot. It's a gem that makes it dead simple to reuse mustache or handlebars templates client and server side.
Create a file in your views like _some_thingy.html.mustache.
Render server side:
<%= render :partial => 'some_thingy', object: my_model %>
Put the template in your head for client-side use:
<%= template_include_tag 'some_thingy' %>
Render client side:
html = poirot.someThingy(my_model)
To take a slightly different angle, your second solution would be the correct one in terms of accessibility...you would be providing alternative content to users who cannot use javascript (those with screen readers, etc.).
This would automatically add the benefits of SEO and, in my opinion, would not be seen as a 'naughty' technique by Google.
Interesting. I have been searching around for viable solutions but it seems to be quite problematic.
I was actually leaning more towards your 2nd approach:
Let the server provide a special website only for the search engine
bots. If a normal user visits http://example.com/my_path the server
should give him a JavaScript heavy version of the website. But if the
Google bot visits, the server should give it some minimal HTML with
the content I want Google to index.
Here's my take on solving the problem. Although it is not confirmed to work, it might provide some insight or ideas for other developers.
Assume you're using a JS framework that supports "push state" functionality, and your backend framework is Ruby on Rails. You have a simple blog site and you would like search engines to index all your article index and show pages.
Let's say you have your routes set up like this:
    resources :articles
    get "*path", to: "main#index"
Ensure that every server-side controller renders the same template that your client-side framework requires to run (html/css/javascript/etc). If none of the controllers are matched in the request (in this example we only have a RESTful set of actions for the ArticlesController), then just match anything else and just render the template and let the client-side framework handle the routing. The only difference between hitting a controller and hitting the wildcard matcher would be the ability to render content based on the URL that was requested to JavaScript-disabled devices.
From what I understand it is a bad idea to render content that isn't visible to browsers. So when Google indexes it, people go through Google to visit a given page and there isn't any content, then you're probably going to be penalised. What comes to mind is that you render content in a div node that you display: none in CSS.
However, I'm pretty sure it doesn't matter if you simply do this:
<div id="no-js">
<h1><%= #article.title %></h1>
<p><%= #article.description %></p>
<p><%= #article.content %></p>
</div>
And then using JavaScript, which doesn't get run when a JavaScript-disabled device opens the page:
$("#no-js").remove() # jQuery
This way, for Google, and for anyone with JavaScript-disabled devices, they would see the raw/static content. So the content is physically there and is visible to anyone with JavaScript-disabled devices.
But, when a user visits the same page and actually has JavaScript enabled, the #no-js node will be removed so it doesn't clutter up your application. Then your client-side framework will handle the request through it's router and display what a user should see when JavaScript is enabled.
I think this might be a valid and fairly easy technique to use. Although that might depend on the complexity of your website/application.
Though, please correct me if it isn't. Just thought I'd share my thoughts.
Use NodeJS on the serverside, browserify your clientside code and route each http-request's(except for static http resources) uri through a serverside client to provide the first 'bootsnap'(a snapshot of the page it's state). Use something like jsdom to handle jquery dom-ops on the server. After the bootsnap returned, setup the websocket connection. Probably best to differentiate between a websocket client and a serverside client by making some kind of a wrapper connection on the clientside(serverside client can directly communicate with the server). I've been working on something like this: https://github.com/jvanveen/rnet/
Use Google Closure Template to render pages. It compiles to javascript or java, so it is easy to render the page either on the client or server side. On the first encounter with every client, render the html and add javascript as link in header. Crawler will read the html only but the browser will execute your script. All subsequent requests from the browser could be done in against the api to minimize the traffic.
This might help you : https://github.com/sharjeel619/SPA-SEO
Logic
A browser requests your single page application from the server,
which is going to be loaded from a single index.html file.
You program some intermediary server code which intercepts the client
request and differentiates whether the request came from a browser or
some social crawler bot.
If the request came from some crawler bot, make an API call to
your back-end server, gather the data you need, fill in that data to
html meta tags and return those tags in string format back to the
client.
If the request didn't come from some crawler bot, then simply
return the index.html file from the build or dist folder of your single page
application.
Let's assume that I have created my REST service smoothly and I am returning json results.
I also implemented an API key that my users need in order to communicate with my service.
Then Company A started using my service and I gave them an API key.
Then they created an HttpHandler as a bridge (I am not sure what the right term is) so that the API key is not exposed (I am also not sure this is the right way to do it).
For example, let's assume that my service URL is as follows:
www.myservice.com/service?apikey={key_comes_here}
Company A calls this service from the client side through the following URL:
www.companyA.com/services/service1.ashx
Then they start using it on the client side.
Company A protected the api key here. That's fine.
But there is another problem here. Somebody else can still grab the www.companyA.com/services/service1.ashx URL and start using my service through it.
What is the way of preventing others from doing that?
For the record, I am using WCF Web API in order to create my REST services.
UPDATE :
Company A's HttpHandler (the second link) only looks at the Host header to see whether the request is coming from www.companyA.com, but I guess that can be faked easily.
UPDATE 2 :
Is there any known way of implementing a token for the URL? For example, let's say that www.companyA.com/services/service1.ashx carries a query string parameter representing a token, so that the HttpHandler can check whether the request is a legitimate one.
But there are many things here to think about I guess.
You could always require the client to authenticate, using HTTP Basic Auth or some custom scheme. If your client requires the user to log in, you can at least keep the general public from obtaining the www.companyA.com/services/service1.ashx URL, since they will need to log in to find out about it.
It gets harder if you are also trying to protect the URL from unintended use by people who legitimately have access to the official client. You could try changing the service password at regular intervals, and updating the client along with it. That way a refresh of the client in-browser would pull the new password, but anyone who built custom code would be out of date. Of course, a really determined user could just write code to rip the password from the client JS programmatically when it changes, but you would at least protect against casual infringers.
With regard to the URL token idea you mentioned in update 2, it could work something like this. Imagine that every month, the www.companyA.com/services/service1.ashx URL requires a new token to work, e.g. www.companyA.com/services/service1.ashx?token=January. Once it's February, 'January' will stop working. The server will have to know to accept only the current month, and the client will have to know which token to send (determined at the time the client web page is loaded from the server in the browser).
(All pseudo-code since I don't know C# and which JS framework you will use)
Server-side code:
if (request.urlVars.token == Date.now.month) then
render "This is the real data: [2,5,3,5,3]"
else
render "401 Unauthorized"
Client code (dynamic version served by your service)
www.companyA.com/client/myajaxcode.js.asp
var dataUrl = 'www.companyA.com/services/service1.ashx?token=' + <%= Date.now.month %>
// below is JS code that does ajax call using dataUrl
...
So now we have service code that will only accept the current month as a token, and client code that picks up the latest token whenever it is refreshed in the browser (set dynamically to the current month). Since this scheme is really predictable and could be hacked, the remaining step is to run the token through a salted hash so that no one can guess what it is going to be.
if (request.urlVars.token == mySaltedHashMethod(Date.now.month)) then
and
var dataUrl = 'www.companyA.com/services/service1.ashx?token=' + <%= mySaltedHashMethod(Date.now.month) %>
Which would leave you with a URL like www.companyA.com/services/service1.ashx?token=gy4dc8dgf3f and would change tokens every month.
You would probably want tokens to expire faster than every month as well, which you could do by using the epoch hour instead of the month.
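To make the hourly version concrete, here is a minimal Node.js sketch. It swaps the "salted hash" for an HMAC keyed with a server-side secret, which is a closely related but different technique. The function names, the secret, and the choice to also accept the previous hour's token are my own assumptions, not part of the scheme described above.

```javascript
const crypto = require("crypto");

const HOUR_MS = 60 * 60 * 1000;

// Hours elapsed since the Unix epoch for a timestamp given in milliseconds.
function epochHour(nowMs) {
  return Math.floor(nowMs / HOUR_MS);
}

// Token for one hour: HMAC-SHA256 of the hour number, keyed with a secret
// that only the server knows (hypothetical helper, not from the answer).
function tokenForHour(secret, hour) {
  return crypto.createHmac("sha256", secret).update(String(hour)).digest("hex");
}

// Accept the token for the current hour or the previous one, so a page that
// was loaded just before the hour rolled over keeps working for a while.
function isValidToken(token, secret, nowMs) {
  const hour = epochHour(nowMs);
  return [hour, hour - 1].some(function (h) {
    const expected = Buffer.from(tokenForHour(secret, h));
    const given = Buffer.from(String(token));
    // timingSafeEqual throws on different lengths, so check the length first.
    return given.length === expected.length && crypto.timingSafeEqual(given, expected);
  });
}
```

The page that serves the client code would embed tokenForHour(secret, epochHour(Date.now())) in the data URL, and the handler would call isValidToken on every request. Anyone who loads the official page can still read the current token, so this only stops stale copies of the URL.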
I'd be interested to see if someone out there has solved this with some kind of encrypted client code!
What you're describing is generally referred to as a "proxy" -- companyA's public page is available to anyone, and behind the scenes, it makes the right calls to your system. It's not uncommon for applications to use proxies to get around security -- for example, the same-origin policy means that your javascript can't make Ajax calls to, say, Amazon -- but if you proxy it on your own system, you can get around this.
I can't really think of a technical way to prevent this; once they've pulled data from your service, they can use that data however they want. You have legal options, of course; you can make it a term of service that proxying isn't allowed, and pull their API key if they don't comply. But most likely, if you haven't already included that in the TOS, you'd have to wait for, say, a renewal of their subscription to your service.
Presumably if they're making server-side HTTP requests to your service, those requests are all coming from the same IP address, so you could block that address. You'd probably want to tell them first, and they could certainly get around that if they wanted to.
With the second link exposed by Company A I don't think you can do much. As I understand it, you can only check whether the incoming request comes from Company A or not.
But each request issued to www.companyA.com/.. can't be distinguished from original request from Company A. Everyone they let in uses their referrer as a disguise.
There are a lot of cool tools for making powerful "single-page" JavaScript websites nowadays. In my opinion, this is done right by letting the server act as an API (and nothing more) and letting the client handle all of the HTML generation stuff. The problem with this "pattern" is the lack of search engine support. I can think of two solutions:
1. When the user enters the website, let the server render the page exactly as the client would upon navigation. So if I go to http://example.com/my_path directly the server would render the same thing as the client would if I go to /my_path through pushState.
2. Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path the server should give him a JavaScript heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.
The first solution is discussed further here. I have been working on a website doing this and it's not a very nice experience. It's not DRY and in my case I had to use two different template engines for the client and the server.
I think I have seen the second solution for some good ol' Flash websites. I like this approach much more than the first one and with the right tool on the server it could be done quite painlessly.
So what I'm really wondering is the following:
Can you think of any better solution?
What are the disadvantages with the second solution? If Google in some way finds out that I'm not serving the exact same content for the Google bot as a regular user, would I then be punished in the search results?
While #2 might be "easier" for you as a developer, it only provides search engine crawling. And yes, if Google finds out you're serving different content, you might be penalized (I'm not an expert on that, but I have heard of it happening).
SEO and accessibility (not just for people with disabilities, but accessibility via mobile devices, touch screen devices, and other non-standard computing / internet enabled platforms) have a similar underlying philosophy: semantically rich markup that is "accessible" (i.e. can be accessed, viewed, read, processed, or otherwise used) to all these different browsers. A screen reader, a search engine crawler, or a user with JavaScript enabled should all be able to use/index/understand your site's core functionality without issue.
pushState does not add to this burden, in my experience. It only brings what used to be an afterthought and "if we have time" to the forefront of web development.
What you describe in option #1 is usually the best way to go - but, like other accessibility and SEO issues, doing this with pushState in a JavaScript-heavy app requires up-front planning or it will become a significant burden. It should be baked into the page and application architecture from the start - retrofitting is painful and will cause more duplication than is necessary.
I've been working with pushState and SEO recently for a couple of different applications, and I found what I think is a good approach. It basically follows your item #1, but avoids duplicating html / templates.
Most of the info can be found in these two blog posts:
http://lostechies.com/derickbailey/2011/09/06/test-driving-backbone-views-with-jquery-templates-the-jasmine-gem-and-jasmine-jquery/
and
http://lostechies.com/derickbailey/2011/06/22/rendering-a-rails-partial-as-a-jquery-template/
The gist of it is that I use ERB or HAML templates (running Ruby on Rails, Sinatra, etc) for my server side render and to create the client side templates that Backbone can use, as well as for my Jasmine JavaScript specs. This cuts out the duplication of markup between the server side and the client side.
From there, you need to take a few additional steps to have your JavaScript work with the HTML that is rendered by the server - true progressive enhancement; taking the semantic markup that got delivered and enhancing it with JavaScript.
For example, I'm building an image gallery application with pushState. If you request /images/1 from the server, it will render the entire image gallery on the server and send all of the HTML, CSS and JavaScript down to your browser. If you have JavaScript disabled, it will work perfectly fine. Every action you take will request a different URL from the server and the server will render all of the markup for your browser. If you have JavaScript enabled, though, the JavaScript will pick up the already rendered HTML along with a few variables generated by the server and take over from there.
Here's an example:
<form id="foo">
Name: <input id="name"><button id="say">Say My Name!</button>
</form>
After the server renders this, the JavaScript would pick it up (using a Backbone.js view in this example)
FooView = Backbone.View.extend({
  events: {
    "change #name": "setName",
    "click #say": "sayName"
  },
  setName: function(e){
    var name = $(e.currentTarget).val();
    this.model.set({name: name});
  },
  sayName: function(e){
    e.preventDefault();
    var name = this.model.get("name");
    alert("Hello " + name);
  },
  render: function(){
    // do some rendering here, for when this is just running JavaScript
  }
});
$(function(){
  var model = new MyModel();
  var view = new FooView({
    model: model,
    el: $("#foo")
  });
});
This is a very simple example, but I think it gets the point across.
When I instantiate the view after the page loads, I'm providing the existing content of the form that was rendered by the server to the view instance as the el for the view. I am not calling render or having the view generate an el for me when the first view is loaded. I have a render method available for after the view is up and running and the page is all JavaScript. This lets me re-render the view later if I need to.
Clicking the "Say My Name" button with JavaScript enabled will cause an alert box. Without JavaScript, it would post back to the server and the server could render the name to an html element somewhere.
Edit
Consider a more complex example, where you have a list that needs to be attached (from the comments below this)
Say you have a list of users in a <ul> tag. This list was rendered by the server when the browser made a request, and the result looks something like:
<ul id="user-list">
<li data-id="1">Bob
<li data-id="2">Mary
<li data-id="3">Frank
<li data-id="4">Jane
</ul>
Now you need to loop through this list and attach a Backbone view and model to each of the <li> items. With the use of the data-id attribute, you can find the model that each tag comes from easily. You'll then need a collection view and item view that is smart enough to attach itself to this html.
UserListView = Backbone.View.extend({
  attach: function(){
    // inside the each() callback, `this` is the <li> element,
    // so keep a reference to the collection out here
    var collection = this.collection;
    this.el = $("#user-list");
    $(this.el).find("li").each(function(index){
      var userEl = $(this);
      var id = userEl.attr("data-id");
      var user = collection.get(id);
      new UserView({
        model: user,
        el: userEl
      });
    });
  }
});
UserView = Backbone.View.extend({
  initialize: function(){
    this.model.bind("change:name", this.updateName, this);
  },
  updateName: function(model, val){
    // wrap el so this works whether el is a DOM node or a jQuery object
    $(this.el).text(val);
  }
});
var userData = {...};
var userList = new UserCollection(userData);
var userListView = new UserListView({collection: userList});
userListView.attach();
In this example, the UserListView will loop through all of the <li> tags and attach a view object with the correct model for each one. Each UserView sets up an event handler for its model's name change event and updates the displayed text of the element when a change occurs.
This kind of process, to take the html that the server rendered and have my JavaScript take over and run it, is a great way to get things rolling for SEO, Accessibility, and pushState support.
Hope that helps.
I think you need this: http://code.google.com/web/ajaxcrawling/
You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.
Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)
So, it seems that the main concern is being DRY.
If you're using pushState, have your server send the exact same code for all URLs (that don't contain a file extension, which is how you serve images, etc.): "/mydir/myfile", "/myotherdir/myotherfile" or root "/" all receive the exact same code. You need some kind of URL rewrite engine. You can also serve a tiny bit of html and the rest can come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
(test the link's validity by converting the link to your url scheme and testing against existence of content by querying a static or a dynamic source. if it's not valid send a 404 response.)
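As a rough illustration of the "same code for every URL without a file extension" rule, here is a small Node.js sketch. The function name isAppRoute is made up for this example, and a real rewrite engine (nginx, IIS, Apache, etc.) would express the same check in its own configuration syntax.

```javascript
const path = require("path");

// Returns true when the request should receive the single-page app shell,
// i.e. when the path has no file extension. Requests such as /images/logo.png
// are left to the static file handler instead.
function isAppRoute(requestUrl) {
  // The base is only there so that relative URLs like "/mydir/myfile" parse.
  const pathname = new URL(requestUrl, "http://localhost").pathname;
  return path.posix.extname(pathname) === "";
}
```

The query string is ignored on purpose, so a dot inside a parameter value does not turn an application route into a static file request.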
When the request is not from a google bot, you just process normally.
If the request is from a google bot, you use phantom.js, a headless WebKit browser ("A headless browser is simply a full-featured web browser with no visual interface."), to render the HTML and JavaScript on the server and send the resulting HTML to the google bot. As the bot parses the HTML, it can follow your other "pushState" links, such as <a href="/somepage">mylink</a>: it requests /somepage from the server, the server rewrites the URL to your application file, loads it in phantom.js, sends the resulting HTML to the bot, and so on...
For your html I'm assuming you're using normal links with some kind of hijacking (e.g. using with backbone.js https://stackoverflow.com/a/9331734/1595913)
To avoid confusion with any links separate your api code that serves json into a separate subdomain, e.g. api.mysite.com
To improve performance, you can pre-process your site pages for search engines ahead of time during off hours by creating static versions of the pages using the same mechanism with phantom.js, and then serve the static pages to google bots. Preprocessing can be done with some simple app that can parse <a> tags. In this case handling 404 is easier, since you can simply check for the existence of a static file whose name contains the URL path.
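The static file lookup could look roughly like this in Node.js. The naming rule (slashes and other special characters become underscores, the root becomes index.html) and both function names are assumptions made for this sketch. Note that with this rule two different paths can map to the same file name, so a real setup would want a stricter encoding.

```javascript
const fs = require("fs");
const path = require("path");

// Map a URL path to the file name of its pre-rendered snapshot.
function snapshotFileFor(snapshotDir, urlPath) {
  const name = urlPath
    .replace(/^\/+|\/+$/g, "")          // trim leading and trailing slashes
    .replace(/[^A-Za-z0-9_-]+/g, "_")   // anything unusual becomes "_"
    || "index";                         // the root path has no name of its own
  return path.join(snapshotDir, name + ".html");
}

// Returns the snapshot HTML, or null when no snapshot exists (send a 404).
function readSnapshot(snapshotDir, urlPath) {
  const file = snapshotFileFor(snapshotDir, urlPath);
  return fs.existsSync(file) ? fs.readFileSync(file, "utf8") : null;
}
```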
If you use #! hash bang syntax for your site links a similar scenario applies, except that the rewrite url server engine would look out for _escaped_fragment_ in the url and would format the url to your url scheme.
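For the hash bang case, the rewrite step boils down to turning the crawler's _escaped_fragment_ query parameter back into the #! URL that your application understands. A minimal Node.js sketch, with a function name I made up for illustration:

```javascript
// Turn a crawler URL carrying _escaped_fragment_ back into the "#!" URL.
// Returns null for ordinary requests that do not carry the parameter.
function hashBangUrlFromEscapedFragment(rawUrl) {
  const url = new URL(rawUrl);
  const fragment = url.searchParams.get("_escaped_fragment_");
  if (fragment === null) {
    return null;
  }
  // Drop the crawler-only parameter and restore the fragment.
  url.searchParams.delete("_escaped_fragment_");
  url.hash = "#!" + fragment;
  return url.toString();
}
```

The resulting URL is what you would load in phantom.js to produce the HTML for the bot.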
There are a couple of integrations of node.js with phantom.js on github and you can use node.js as the web server to produce html output.
Here are a couple of examples using phantom.js for seo:
http://backbonetutorials.com/seo-for-single-page-apps/
http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering
If you're using Rails, try poirot. It's a gem that makes it dead simple to reuse mustache or handlebars templates client and server side.
Create a file in your views like _some_thingy.html.mustache.
Render server side:
<%= render :partial => 'some_thingy', object: my_model %>
Put the template in your head for client side use:
<%= template_include_tag 'some_thingy' %>
Render client side:
html = poirot.someThingy(my_model)
To take a slightly different angle, your second solution would be the correct one in terms of accessibility...you would be providing alternative content to users who cannot use javascript (those with screen readers, etc.).
This would automatically add the benefits of SEO and, in my opinion, would not be seen as a 'naughty' technique by Google.
Interesting. I have been searching around for viable solutions but it seems to be quite problematic.
I was actually leaning more towards your 2nd approach:
Let the server provide a special website only for the search engine
bots. If a normal user visits http://example.com/my_path the server
should give him a JavaScript heavy version of the website. But if the
Google bot visits, the server should give it some minimal HTML with
the content I want Google to index.
Here's my take on solving the problem. Although it is not confirmed to work, it might provide some insight or ideas for other developers.
Assume you're using a JS framework that supports "push state" functionality, and your backend framework is Ruby on Rails. You have a simple blog site and you would like search engines to index all your article index and show pages.
Let's say you have your routes set up like this:
resources :articles
get "*path", to: "main#index"
Ensure that every server-side controller renders the same template that your client-side framework requires to run (html/css/javascript/etc). If none of the controllers are matched in the request (in this example we only have a RESTful set of actions for the ArticlesController), then just match anything else and just render the template and let the client-side framework handle the routing. The only difference between hitting a controller and hitting the wildcard matcher would be the ability to render content based on the URL that was requested to JavaScript-disabled devices.
From what I understand, it is a bad idea to render content that isn't visible to browsers. If Google indexes that content and people then arrive from Google at a page that doesn't show it, you're probably going to be penalised. The case that comes to mind is rendering content in a div node that you hide with display: none in CSS.
However, I'm pretty sure it doesn't matter if you simply do this:
<div id="no-js">
<h1><%= @article.title %></h1>
<p><%= @article.description %></p>
<p><%= @article.content %></p>
</div>
And then using JavaScript, which doesn't get run when a JavaScript-disabled device opens the page:
$("#no-js").remove(); // jQuery
This way, for Google, and for anyone with JavaScript-disabled devices, they would see the raw/static content. So the content is physically there and is visible to anyone with JavaScript-disabled devices.
But when a user visits the same page and actually has JavaScript enabled, the #no-js node will be removed so it doesn't clutter up your application. Then your client-side framework will handle the request through its router and display what a user should see when JavaScript is enabled.
I think this might be a valid and fairly easy technique to use. Although that might depend on the complexity of your website/application.
Though, please correct me if it isn't. Just thought I'd share my thoughts.
Use NodeJS on the server side, browserify your client-side code, and route the URI of each HTTP request (except requests for static resources) through a server-side client to produce the first 'bootsnap' (a snapshot of the page's state). Use something like jsdom to handle jQuery DOM operations on the server. After the bootsnap has been returned, set up the websocket connection. It is probably best to differentiate between a websocket client and a server-side client by making some kind of wrapper connection on the client side (the server-side client can communicate with the server directly). I've been working on something like this: https://github.com/jvanveen/rnet/
Use Google Closure Templates to render pages. They compile to JavaScript or Java, so it is easy to render the page on either the client or the server side. On the first encounter with every client, render the HTML and add the JavaScript as a link in the header. A crawler will read only the HTML, but the browser will execute your script. All subsequent requests from the browser can be made against the API to minimize traffic.
This might help you : https://github.com/sharjeel619/SPA-SEO
Logic
A browser requests your single page application from the server,
which is going to be loaded from a single index.html file.
You program some intermediary server code which intercepts the client
request and differentiates whether the request came from a browser or
some social crawler bot.
If the request came from some crawler bot, make an API call to
your back-end server, gather the data you need, fill in that data to
html meta tags and return those tags in string format back to the
client.
If the request didn't come from some crawler bot, then simply
return the index.html file from the build or dist folder of your single page
application.
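The logic above can be sketched in plain Node.js as follows. This is not code from the linked repository. The user-agent list is a deliberately short and incomplete assumption, and the function names and the shape of the data object (title, description) are hypothetical.

```javascript
// Hypothetical, incomplete list of crawler user-agent substrings.
const CRAWLER_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

function isCrawler(userAgent) {
  return CRAWLER_PATTERN.test(userAgent || "");
}

// Escape text before placing it inside HTML or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the meta-tag-only document that is returned to crawlers.
// `data` stands for whatever the back-end API call returned.
function renderMetaPage(data) {
  return [
    "<!DOCTYPE html>",
    "<html><head>",
    "<title>" + escapeHtml(data.title) + "</title>",
    '<meta property="og:title" content="' + escapeHtml(data.title) + '">',
    '<meta name="description" content="' + escapeHtml(data.description) + '">',
    "</head><body></body></html>"
  ].join("\n");
}

// Crawlers get the meta tags, everyone else gets the normal index.html.
function respond(userAgent, data, indexHtml) {
  return isCrawler(userAgent) ? renderMetaPage(data) : indexHtml;
}
```

In a real server, respond would be called from the request handler with the User-Agent header, the API result, and the contents of index.html from the build or dist folder.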