Elegant methods for caching search results from RESTful service? - javascript

I have a RESTful web service which I access from the browser using JavaScript. As an example, say that this web service returns a list of all the Message resources assigned to me when I send a GET request to /messages/me. For performance reasons, I'd like to cache this response so that I don't have to re-fetch it every time I visit my Manage Messages web page. The cached response would expire after 5 minutes.
If a Message resource is created "behind my back", say by the system admin, it's possible that I won't know about it for up to 5 minutes, until the cached search response expires and is re-fetched. This is acceptable, because it creates no confusion for me.
However if I create a new Message resource which I know should be part of the search response, it becomes confusing when it doesn't appear on my Manage Messages page immediately. In general, when I knowingly create/delete/update a resource that invalidates a cached search response, I need that cached response to be expired/flushed immediately.
The core problem which I can't figure out:
I see no simple way of connecting the task of creating/deleting/updating a resource with the task of expiring the appropriate cached responses. In this example it seems simple: I could manually expire the cached search response whenever I create/delete/update a(ny) Message resource. But in a more complex system, keeping track of which search responses to expire under which circumstances will get clumsy quickly.

Use ETag and If-None-Match headers to ensure that the client is always accessing the most up-to-date information.
The downside to this is that you will always make a call to the server to find out whether anything has changed. The entire message will not be re-transmitted if nothing changed; the server will/should simply respond with a 304 Not Modified in that case. If the content has changed, then the new message(s) will be transmitted as the response.
If the server is responsive (10-50 ms), then most users with decent latency (50-500 ms) should see no noticeable difference.
This increases the load on the server, as it will have to verify for each request whether the received ETag matches the current ETag for that resource. Clients never assume that a resource is valid/stale/expired; they always ping the server and find out.
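To illustrate, here is a minimal client-side sketch of the idea, assuming a fetch-based client that keeps the ETag in a variable (the /messages/me endpoint is the one from the question; everything else is hypothetical):

// Conditional GET with ETag / If-None-Match; the server is assumed to send
// an ETag and to honour If-None-Match with a 304 when nothing has changed.
let cachedEtag = null;
let cachedMessages = null;

async function loadMessages() {
  const headers = {};
  if (cachedEtag) {
    headers['If-None-Match'] = cachedEtag;
  }
  const response = await fetch('/messages/me', { headers });
  if (response.status === 304) {
    // Nothing changed on the server; reuse the cached copy.
    return cachedMessages;
  }
  cachedEtag = response.headers.get('ETag');
  cachedMessages = await response.json();
  return cachedMessages;
}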

To quote Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things."
If you are using a comprehensive data access layer, that would be the place to handle cache invalidation (although it's still not easy). You'd just tie in some cache invalidation logic to your logic for saving a Message so it clears the search cache for the assignee of the message.
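As a hedged sketch of what that tie-in might look like in JavaScript (the cache, the api helper, and the key scheme are all made up for illustration):

// Hypothetical data access layer: saving a Message also flushes the cached
// search response that the change would invalidate.
const searchCache = new Map(); // key -> { data, expiresAt }

function invalidateSearchesFor(assigneeId) {
  // Assumed key scheme: one cached search response per assignee.
  searchCache.delete('messages:assignee:' + assigneeId);
}

async function saveMessage(message) {
  const saved = await api.post('/messages', message); // 'api' is a hypothetical HTTP helper
  invalidateSearchesFor(saved.assigneeId);
  return saved;
}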

The browser cache should automatically invalidate the cache when you do a POST to the same URI that you did a GET from. See this article, particularly the section on POST invalidation.

The simplest solution would be using a server-side cache (like EhCache, for example) :)
You will have fewer problems with consistency (as you wouldn't need to push changes to your JavaScript) and expiration.

Related

How to deliberately refresh or crash browser tab using response from AJAX request

Yesterday we pushed code to production that included a polling mechanism (via setInterval() in Javascript) that makes an AJAX request every 15 seconds to keep the clients up-to-date with the server. Although only about 450 people were using our site at any given time today, it seems many of our users keep our site open even when they're not using it. Like, a lot of users.
Within 12 hours on a Sunday, we had effectively DDoS'd ourselves. About 3,500 people had left our site open on their browsers, meaning 200 requests per second to this PHP endpoint. With a KeepAlive of 5, this triggered our Apache server to quickly hit its MaxClients limit, which choked new connections from being established, causing random errors for existing users, etc. We raised that limit and lowered the KeepAlive time without issue, but the real fix came an hour later when we changed the setInterval() to also consider document.visibilityState == "visible", so that backgrounded tabs won't hammer our server with polling. (In case you're wondering by this point, we will be moving to silent push notifications instead of polling even sooner than we were planning after this experience).
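(For reference, the visibility-aware polling change looks roughly like this; the endpoint and handler names are placeholders.)

// Poll every 15 seconds, but skip the request when the tab is backgrounded.
setInterval(function () {
  if (document.visibilityState === 'visible') {
    fetch('/poll.php') // placeholder endpoint
      .then(function (response) { return response.json(); })
      .then(function (update) { applyUpdate(update); }); // applyUpdate is a placeholder
  }
}, 15000);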
That fix should work for new users, but it leaves us with those 3,500 users who still have our site open on their computer with the bad code that is indiscriminately hitting us with requests even when they're not using the site. We need them to get the new code ASAP to stop the DDoS, or induce their tab to freeze so that the web requests from their browser stop. We've tested a couple ideas on Chrome and Safari, but none of them worked.
The first was inducing a page refresh via PHP's header("Refresh:0");. We tried including a couple of variations of this in our endpoint, but it doesn't seem like a response header from an AJAX request can induce a page refresh. We also tried responding to the request with HTML echo '<meta http-equiv="refresh" content="0">'; but that didn't work either, possibly because the AJAX request is expecting JSON, not HTML, and changing the content type of the response wasn't enough.
The second was to crash the page by overloading the response to this endpoint with data. We tried adding multiple bin2hex(openssl_random_pseudo_bytes(5000000))s to the response as variables that get written to local storage in the browser. This did get the browser to freeze and use up to 1GB of RAM, but even with the interface completely unresponsive, the tab didn't "crash" and web requests continued going out, so this method didn't work either.
Update: a third thing we tried was doing a sleep(9999999) in the PHP file that they're hitting. Since browsers will only make up to 6 simultaneous requests to a given domain, we figured that once these clients had made 6 requests to the endpoint, further requests would not be made, since those 6 would hang indefinitely. We tried pushing this to production and things didn't go well: within 30 seconds Apache was even more overloaded than before, since now the requests were piling up without finishing. So we had to restart Apache (which in turn cancelled all the hung requests, returning us to the prior state). We think some variation of exploiting the fact that a browser will only make up to 6 simultaneous requests to a domain might work, but we're not sure how to use that fact.
What else can we try?
(I'm too new to comment, so I have to make this into an answer)
Handling a request at the server level, rather than at the application level, is often at least an order of magnitude cheaper, given that your application likely hits the database, restores the session, does a bunch of routing and so on before getting to the point where you can reject the request.
I would still suggest deprecating the problematic URL.
If you return an HTTP 410 Gone instead of a 404, and you add cache control headers, you might convince the browser to serve the results from cache instead of actually making the call.
Cache-Control: public, max-age=31536000
This assumes that you didn't use a cache-buster parameter in your polling mechanism, of course. If every URL is new and unique, caching won't save you.
I would suggest pushing a new version of the website with a changed url for the ajax request. After that you can add a rewrite rule to your .htaccess causing the old ajax url to return a 404 instead of being handled by your PHP application.
This should relieve the pressure.
Good luck!

Can POST request be an alternative to GET request in most of the scenarios?

I went through http://www.w3schools.com/tags/ref_httpmethods.asp and wondered why I should not always prefer a POST request over a GET request. I can think of two scenarios where I have to use a GET request instead of a POST request. These are:
1) Where I have a requirement to bookmark the URL.
2) Where my requirement is to cache the web page (as a POST request does not cache the web page), so that the next time the same URL is hit it can be obtained from the cache to optimize performance.
I agree a POST request is designed to create/update a resource whereas a GET request is designed to retrieve a resource, though technically they can be used vice versa as well.
So I was wondering: is it not always beneficial to use a POST request over a GET request (except for the two requirements I mentioned above), as POST is more secure? Is my understanding correct?
There are many reasons to use HTTP the way it was meant to be used. Here are a couple:
The value of the web is built on URLs. Every time you provide a page which is obtainable only via POST, you are denying the option to link to it, as well as to bookmark it. (Obviously a form button can still be made, but that's not as convenient.) Even if the page is some kind of “service”, there is still often value in linking — that you won't have thought of beforehand.
If the user reloads a page obtained via POST, most web browsers will warn that they are “resubmitting a form” and confirm the action. This is because in poorly designed applications this can result in things like placing a duplicate order or posting a duplicate message. Therefore, using GET for requests which do not have side effects eliminates this unnecessary warning. In fact, a useful practice for POSTs which have effects is to make the response to them be a redirect to a URL (which the browser will GET) for a page describing the results of the action (for example, if the POST posted a comment, it would then redirect to a link to the comment); this way the page can be reloaded (which could be implicit e.g. if the browser were restarted) without any ambiguity about whether it's re-executing the action.
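A hedged sketch of that redirect-after-POST pattern, written here as a hypothetical Express-style handler (Express and the route/helper names are assumptions, not something from the question):

// POST creates the comment, then redirects to a GET-able URL describing it.
// Assumes body-parsing middleware is installed; 'comments' is a hypothetical data layer.
app.post('/comments', async function (req, res) {
  const comment = await comments.create(req.body);
  res.redirect(303, '/comments/' + comment.id); // 303 See Other: the browser follows with a GET
});

app.get('/comments/:id', async function (req, res) {
  const comment = await comments.findById(req.params.id);
  res.render('comment', { comment }); // reloading this page re-runs only the safe GET
});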

Prevent recursive calls of XmlHttpRequest to server

I've been googling for hours for this issue, but did not find any solution.
I am currently working on this app, built on Meteor.
Now the scenario is: after the website is opened and all the assets have been loaded in the browser, the browser constantly makes recursive XHR calls to the server. These calls are made at a regular interval of 25 seconds.
This can be seen in the Network tab of the browser console. See the pending request in the last row of the image.
I can't figure out from where it originates, and why it is invoked automatically even when the user is idle.
Now the question is, How can I disable these automatic requests? I want to invoke the requests manually, i.e. when the menu item is selected, etc.
Any help will be appreciated.
[UPDATE]
In response to Jan Dvorak's comment:
When I type "e" in the search box, the list of events whose names start with the letter "e" is displayed.
The request goes with all valid parameters and the Payload like this:
["{\"msg\":\"sub\",\"id\":\"8ef5e419-c422-429a-907e-38b6e669a493\",\"name\":\"event_Coll_Search_by_PromoterName\",\"params\":[\"e\"]}"]
And this is the response, which is valid.
a["{\"msg\":\"data\",\"subs\":[\"8ef5e419-c422-429a-907e-38b6e669a493\"]}"]
The code for this action is posted here
But in the case of the automatic recursive requests, the request goes without the payload and the response is just the letter "h", which is strange, isn't it? How can I get rid of this?
Meteor has a feature called
Live page updates.
Just write your templates. They automatically update when data in the database changes. No more boilerplate redraw code to write. Supports any templating language.
To support this feature, Meteor needs to do some server-client communication behind the scenes.
Traditionally, HTTP was created to fetch "dead" data. The client tells the server it needs something, and it gets something. There is no way for the server to tell the client it needs something. Later, the need arose to push some data to the client. Several alternatives came into existence:
polling:
The client makes periodic requests to the server. The server responds with new data or says "no data" immediately. It's easy to implement and doesn't use many resources. However, it's not exactly live. It can be used for a news ticker, but it's not exactly good for a chat application.
If you increase the polling frequency, you improve the update rate, but the resource usage grows with the polling frequency, not with the data transfer rate. HTTP requests are not exactly cheap. One request per second from multiple clients at the same time could really hurt the server.
hanging requests:
The client makes a request to the server. If the server has data, it sends them. If the server doesn't have data, it doesn't respond until it does. The changes are picked up immediately, no data is transferred when it doesn't need to be. It does have a few drawbacks, though:
If a web proxy sees that the server is silent, it eventually cuts off the connection. This means that even if there is no data to send, the server needs to send a keep-alive response anyways to make the proxies (and the web browser) happy.
Hanging requests don't use up (much) bandwidth, but they do take up memory. Today's servers can handle many concurrent TCP connections, so it's less of an issue than it used to be. What does need to be considered is the amount of memory associated with the threads holding on to these requests, especially when the connections are tied to specific threads serving them.
Browsers have hard limits on the number of concurrent requests per domain and in total. Again, this is less of a concern now than it was before. Thus, it seems like a good idea to have one hanging request per session only.
Managing hanging requests feels kinda manual as you have to make a new request after each response. A TCP handshake takes some time as well, but we can live with a 300ms (at worst) refractory period.
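A rough client-side sketch of such a hanging-request loop (the /poll URL, the keep-alive convention, and the handler are assumptions, not Meteor's actual implementation):

// Long polling: keep one hanging request open; reopen it after each response.
function poll() {
  fetch('/poll') // placeholder endpoint
    .then(function (response) { return response.json(); })
    .then(function (message) {
      if (message.type !== 'keepalive') { // made-up keep-alive convention
        handleMessage(message); // placeholder handler
      }
      poll(); // immediately open the next hanging request
    })
    .catch(function () {
      setTimeout(poll, 3000); // back off briefly on errors
    });
}
poll();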
Chunked response:
The client creates a hidden iFrame with a source corresponding to the data stream. The server responds with an HTTP response header immediately and leaves the connection open. To send a message, the server wraps it in a pair of <script></script> tags that the browser executes when it receives the closing tag. The upside is that there's no connection reopening but there is more overhead with each message. Moreover, this requires a callback in the global scope that the response calls.
Also, this cannot be used with cross-domain requests as cross-domain iFrame communication presents its own set of problems. The need to trust the server is also a challenge here.
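A very rough sketch of the client half of this technique (all the names are invented; the server is assumed to keep the connection open and stream script chunks that call the global callback):

// The server is assumed to periodically write chunks such as:
//   <script>parent.handleStreamMessage({"msg": "..."});</script>
window.handleStreamMessage = function (message) { // must live in the global scope
  console.log('pushed from server:', message);
};

const frame = document.createElement('iframe');
frame.style.display = 'none';
frame.src = '/stream'; // placeholder streaming endpoint
document.body.appendChild(frame);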
Web Sockets:
These start as a normal HTTP connection but they don't actually follow the HTTP protocol later on. From the programming point of view, things are as simple as they can be. The API is a classic open/callback style on the client side and the server just pushes messages into an open socket. No need to reopen anything after each message.
There still needs to be an open connection, but it's not really an issue here with the browser limits out of the way. The browser knows the connection is going to be open for a while, so it doesn't need to apply the same limits as to normal requests.
These seem like the ideal solution, but there is one major issue: IE<10 doesn't know them. As long as IE8 is alive, web sockets cannot be relied upon. Also, the native Android browser and Opera mini are out as well (ref.).
Still, web sockets seem to be the way to go once IE8 (and IE9) finally dies.
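For comparison, the browser side of a web socket connection is about this small (the URL and payload are placeholders):

// WebSocket: one persistent connection, no re-opening between messages.
const socket = new WebSocket('wss://example.com/live'); // placeholder URL
socket.onopen = function () {
  socket.send(JSON.stringify({ msg: 'sub', name: 'events' })); // illustrative payload
};
socket.onmessage = function (event) {
  const message = JSON.parse(event.data);
  console.log('server pushed:', message);
};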
What you see are hanging requests with the timeout of 25 seconds that are used to implement the live update feature. As I already said, the keep-alive message ("h") is used so that the browser doesn't think it's not going to get a response. "h" simply means "nothing happens".
Chrome supports web sockets, so Meteor could have used them with a fallback to long requests, but, frankly, hanging requests are not at all bad once you've got them implemented (sure, the browser connection limit still applies).

Using non persistent Http Cookies to deliver out of band data to the browser

Imagine that your web application maintains a hit counter for one or multiple pages and that it also aggressively caches those pages for anonymous visitors. This poses the problem that at least the hitcount would be out of date for those visitors because although the hitcounter is accurately maintained on the server even for those visitors, they would see the old cached page for a while.
What if the server continued to serve them the cached page, but passed the updated counter in a non-persistent HTTP cookie to be read by a piece of JavaScript in the page that would inject the updated counter into the DOM?
Opinions?
You are never going to keep track of the visitors in this manner. If you are aggressively caching pages, intermediate proxies and browsers are also going to cache your pages. And so the request may not even reach your server for you to track.
The best way to do so would be to use an approach similar to Google Analytics. When the page is loaded, send an AJAX request to the server. This AJAX request would increment the current counter value on the server and return the latest value. The client side could then show the value returned by the server using JavaScript.
This approach allows you to cache as aggressively as you want without losing the ability to keep track of your visitors.
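A small sketch of that approach, assuming a hypothetical /hit endpoint that increments the counter and returns the new value, and a counter element in the page:

// On page load, bump the counter server-side and show the fresh value,
// so the cached page itself never needs to carry an up-to-date count.
fetch('/hit?page=' + encodeURIComponent(location.pathname), { method: 'POST' })
  .then(function (response) { return response.json(); })
  .then(function (data) {
    document.getElementById('hit-counter').textContent = data.count; // element id is assumed
  });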
You can also fetch the page programmatically (via ASP or PHP) out of the cache yourself and replace the hit counter.

Ajax Security

We have a heavily Ajax-dependent application. What are the good ways of making sure that requests to server-side scripts are not coming from standalone programs and are coming from an actual user sitting at a browser?
There aren't any really.
Any request sent through a browser can be faked up by standalone programs.
At the end of the day does it really matter? If you're worried then make sure requests are authenticated and authorised and your authentication process is good (remember Ajax sends browser cookies - so your "normal" authentication will work just fine). Just remember that, of course, standalone programs can authenticate too.
What are the good ways of making sure that requests to server-side scripts are not coming from standalone programs and are coming from an actual user sitting at a browser?
There are no ways. A browser is indistinguishable from a standalone program; a browser can be automated.
You can't trust any input from the client side. If you are relying on client-side co-operation for any security purpose, you're doomed.
There isn't a way to automatically block "non browser user" requests hitting your server side scripts, but there are ways to identify which scripts have been triggered by your application and which haven't.
This is usually done using something called "crumbs". The basic idea is that the page making the AJAX request should generate (server side) a unique token (typically a hash of a Unix timestamp + salt + secret). This token and the timestamp should be passed as parameters to the AJAX request. The AJAX handler script will first check the token (and the validity of the Unix timestamp, e.g. whether the current time falls within 5 minutes of the token's timestamp). If the token checks out, you can then proceed to fulfill the request. Usually, this token generation + checking can be coded up as an Apache module so that it is triggered automatically and kept separate from the application logic.
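A hedged sketch of that crumb scheme in JavaScript (a Node-style example; the secret, the hash choice, and the time window are illustrative, not a drop-in module):

// Server side: issue a crumb when rendering the page, verify it in the AJAX handler.
const crypto = require('crypto');
const SECRET = 'change-me'; // illustrative secret, keep it server-side only

function makeCrumb(timestamp, salt) {
  return crypto.createHash('sha256')
    .update(timestamp + salt + SECRET)
    .digest('hex');
}

function verifyCrumb(crumb, timestamp, salt) {
  const ageSeconds = Date.now() / 1000 - Number(timestamp);
  if (ageSeconds < 0 || ageSeconds > 300) return false; // 5-minute window
  return crumb === makeCrumb(timestamp, salt); // a constant-time compare is preferable in practice
}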
Fraudulent scripts won't be able to generate valid tokens (unless they figure out your algorithm) and so you can safely ignore them.
Keep in mind that storing a token in the session is also another way, but that won't buy any more security than your site's authentication system.
I'm not sure what you are worried about. From where I sit I can see three things your question can be related to:
First, you may want to prevent unauthorized users from making a valid request. This is resolved by using the browser's cookie to store a session ID. The session ID needs to be tied to the user, be regenerated every time the user goes through the login process, and must have an inactivity timeout. Any request coming in without a valid session ID you simply reject.
Second, you may want to prevent a third party from performing replay attacks against your site (i.e. sniffing an innocent user's traffic and then sending the same calls over). The easy solution is to go over HTTPS for this. The SSL layer will prevent somebody from replaying any part of the traffic. This comes at a cost on the server side, so you want to make sure that you really cannot take that risk.
Third, you may want to prevent somebody from using your API (that's what AJAX calls are, in the end) to implement his own client to your site. For this there is very little you can do. You can always look for the appropriate User-Agent, but that's easy to fake and will probably be the first thing somebody trying to use your API will think of. You can always implement some statistics, for example looking at the average AJAX requests per minute on a per-user basis and seeing whether some users are way above your average. It's hard to implement, and it's only useful if you are trying to prevent automated clients reacting faster than a human can.
Is Safari a web browser for you?
If it is, note that the same engine is embedded in many applications, for example those using Qt's QtWebKit libraries. So I would say there is no way to recognize it.
A user can forge any request they want, faking headers like User-Agent however they like...
One question: why would you want to do what you ask for? What difference does it make to you whether they request from a browser or from anything else?
I can't think of one reason you'd call this "security".
If you still want to do this, for whatever reason, think about making your own application with a browser embedded. It could somehow authenticate to the application in every request; then you'd only send valid responses to your application's browser.
The user would still be able to reverse engineer the application, though.
Interesting question.
What about browsers embedded in applications? Would you mind those?
You can probably think of a way of "proving" that a request comes from a browser, but it will ultimately be heuristic. The line between browser and application is blurry (e.g. embedded browser) and you'd always run the risk of rejecting users from unexpected browsers (or unexpected versions thereof).
As has been mentioned before, there is no way of accomplishing this... But there is a thing worth noting, useful for protecting against CSRF attacks that target the specific AJAX functionality: setting a custom header with the help of the AJAX object, and verifying that header on the server side.
And if, as the value of that header, you set a random (one-time-use) token, you can prevent automated attacks.
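A minimal sketch of that idea (the header name and the way the token reaches the page are assumptions):

// Client: attach a one-time token, issued by the server into the page,
// as a custom header that a plain cross-site form post cannot set.
fetch('/api/action', { // placeholder endpoint
  method: 'POST',
  headers: {
    'X-CSRF-Token': window.csrfToken, // assumed to be injected server-side into the page
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ action: 'doSomething' })
});
// Server side: reject the request unless the header matches the token it issued.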

Categories

Resources