Browser drops POST when Apache KeepAlive is on - javascript

For many years I have had apache keep-alive turned on for performance reasons. It allows connections to be reused and makes my pages load subtly faster. However, in the last several months a strange issue has started to happen.
Sometimes, a connection from a user's browser to my application gets dropped which causes data to not get saved and an error to be presented. I have done a considerable amount of testing and think that I have narrowed the problem down. It doesn't matter which browser I use. The database and server side scripting are not a factor. It only happens with POSTs not with GETs which is interesting. It goes away if I disable keep-alive.
Here is what I think is happening. I have KeepAliveTimeout set to 1 second. After 1 second, the server terminates the connection, but it takes a short amount of time (let's say 100ms) for the client to realize it was terminated. So, between 1 second and 1.1 seconds, if the client attempts to reuse the connection and POST some data, that POST will fail. I've reproduced this by making a script that POSTs some data at exactly 1 second intervals, and I can see every other connection from the client getting dropped. If I change the script to POST at 0.9 second intervals or 1.1 second intervals, it never drops a connection because the specific timing window is avoided. If I change KeepAliveTimeout to 2 seconds or some other number, then it just pushes out the timing window and doesn't really solve the problem.
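To make the suspected timing window concrete, here is a minimal sketch. The function name is hypothetical and the 100 ms detection lag is the guess from the paragraph above, not a measured value; it only illustrates why 0.9 s and 1.1 s intervals would miss the window while 1.0 s lands in it.

```javascript
// Hypothetical helper: returns true when a request sent after `idleMs` of
// connection idleness would land in the window where the server has already
// closed the keep-alive connection but the client has not yet noticed.
function inRaceWindow(idleMs, keepAliveTimeoutMs, detectionLagMs) {
  return idleMs >= keepAliveTimeoutMs &&
         idleMs < keepAliveTimeoutMs + detectionLagMs;
}

// KeepAliveTimeout of 1 s and an assumed 100 ms lag before the client sees the close.
console.log(inRaceWindow(900, 1000, 100));  // false: connection is still open
console.log(inRaceWindow(1000, 1000, 100)); // true: server closed, client unaware
console.log(inRaceWindow(1100, 1000, 100)); // false: client already knows and reconnects
```

Changing KeepAliveTimeout only moves the second argument, which is why the window shifts rather than disappears.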
My POSTs are coming from javascript (jquery.ajax), but I imagine it could happen from a regular form POST as well if you got the timing right.
In Safari and IE, the connection gets immediately dropped and fails. In Firefox and Chrome the browser stalls for dozens of seconds and then re-sends the request on a new connection which succeeds.
If this is just a fundamental problem with keepalive it is confusing to me why this worked for years and only started doing this in the last few months. Temporarily, I have disabled keep-alive, but I would like to find a way to use it if possible. And I am hoping that someone here knows of a solution.

Related

How to deliberately refresh or crash browser tab using response from AJAX request

Yesterday we pushed code to production that included a polling mechanism (via setInterval() in Javascript) that makes an AJAX request every 15 seconds to keep the clients up-to-date with the server. Although only about 450 people were using our site at any given time today, it seems many of our users keep our site open even when they're not using it. Like, a lot of users.
Within 12 hours on a Sunday, we had effectively DDoS'd ourselves. About 3,500 people had left our site open on their browsers, meaning 200 requests per second to this PHP endpoint. With a KeepAlive of 5, this triggered our Apache server to quickly hit its MaxClients limit, which choked new connections from being established, causing random errors for existing users, etc. We raised that limit and lowered the KeepAlive time without issue, but the real fix came an hour later when we changed the setInterval() to also consider document.visibilityState == "visible", so that backgrounded tabs won't hammer our server with polling. (In case you're wondering by this point, we will be moving to silent push notifications instead of polling even sooner than we were planning after this experience).
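The visibility check described above might look roughly like the following sketch. The function names and the `poll` callback are placeholders, not our production code; `doc` is passed in so the guard can be exercised outside a browser.

```javascript
// Hypothetical sketch: poll only while the tab is visible.
// In a browser, pass `document` as `doc`.
function shouldPoll(doc) {
  return doc.visibilityState === "visible";
}

// `poll` is an assumed function that performs the AJAX request.
function startVisiblePolling(doc, poll, intervalMs) {
  return setInterval(function () {
    if (shouldPoll(doc)) {
      poll();
    }
  }, intervalMs);
}

// Browser usage (fetchUpdates is a placeholder name):
// const timer = startVisiblePolling(document, fetchUpdates, 15000);
// clearInterval(timer) stops the polling again.
```

The timer keeps firing in backgrounded tabs, but the request itself is skipped, so hidden tabs no longer reach the server.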
That fix should work for new users, but it leaves us with those 3,500 users who still have our site open on their computer with the bad code that is indiscriminately hitting us with requests even when they're not using the site. We need them to get the new code ASAP to stop the DDoS, or induce their tab to freeze so that the web requests from their browser stop. We've tested a couple ideas on Chrome and Safari, but none of them worked.
The first was inducing a page refresh via PHP's header("Refresh:0");. We tried including a couple of variations of this in our endpoint, but it doesn't seem like a response header from an AJAX request can induce a page refresh. We also tried responding to the request with HTML echo '<meta http-equiv="refresh" content="0">'; but that didn't work either, possibly because the AJAX request is expecting JSON, not HTML, and changing the content type of the response wasn't enough.
The second was to crash the page by overloading the response to this endpoint with data. We tried adding multiple bin2hex(openssl_random_pseudo_bytes(5000000))s to the response as variables that get written to local storage in the browser. This did get the browser to freeze and use up to 1GB of RAM, but even with the interface completely unresponsive, the tab didn't "crash" and web requests continued going out, so this method didn't work either.
Update: a third thing we tried was doing a sleep(9999999) in the PHP file that they're hitting. Since browsers will only make up to 6 simultaneous requests to a given domain, we figured that once these clients have made 6 requests to the endpoint, further requests will not be made since these 6 will hang indefinitely. We tried pushing this to production and things didn't go well: within 30 seconds Apache was even more overloaded than before, since now the requests were piling up without finishing. So we had to restart Apache (which in turn cancelled all the hung requests, returning us to the prior state). We think some variation of exploiting the fact that a browser will only make up to 6 simultaneous requests to a domain might work, but we're not sure how to use that fact.
What else can we try?
(I'm too new to comment, so I have to make this into an answer)
Handling a request at the server level rather than at the application level is often at least an order of magnitude cheaper, given that your application likely hits the database, restores the session, does a bunch of routing and so on before getting to the point where it can reject the request.
I would still suggest deprecating the problematic url.
If you return an HTTP 410 Gone instead of a 404 and you add cache control headers, you might convince the browser to serve the results from cache instead of actually making the call:
Cache-Control: public, max-age=31536000
This assumes that you didn't use a cache-buster parameter in your polling mechanism, of course. If every URL is new and unique, caching won't save you.
I would suggest pushing a new version of the website with a changed url for the ajax request. After that you can add a rewrite rule to your .htaccess causing the old ajax url to return a 404 instead of being handled by your PHP application.
This should relieve the pressure.
Good luck!

web page shows 507 insufficient storage once in a while

I have a website up and running. The website worked fine on localhost with no such errors but after I put it online it started showing 507 insufficient storage page whenever two or three users used the same page at the same time.
For example there is a webpage chat.php which runs an ajax request to update the chats every 700 milliseconds. Side by side two ajax requests keep checking for messages and notifications. These ajax requests are completed using javascript's setInterval method. When this page is accessed concurrently by two or more users the page does not load and shows the error and sometimes the page shows 429 too many requests error. So at the same time maximum 4 requests can occur at the user's end and that too if the scripts run at the same time. Could this occur because of the limited entry processes? The hosting provides me with 10 limited entry processes by default. Please reply and leave a comment if you want me to post the setInterval method code even though I think the problem is something else here.
"For example there is a webpage chat.php which runs an ajax request to update the chats every 700 milliseconds."
"These ajax requests are completed using javascript's setInterval method."
"When this page is accessed concurrently by two or more users the page does not load and shows the error and sometimes the page shows 429 too many requests error."
"So at the same time maximum 4 requests can occur at the user's end and that too if the scripts run at the same time."
"The hosting provides me with 10 limited entry processes by default."
Please take some time to read through (your own) quotes.
You state that you send an AJAX request to the server every 700 ms, and that you do so using setInterval. There is a maximum of 4 requests per user and 10 in total. If there are 2 or more visitors, things go haywire.
I think multiple things may be causing issues here:
You hit the 10 requests limit because of multiple users.
When 2 users each make 4 requests you're at 8; if anything else makes a request to the server, you very quickly hit the maximum of 10. With 3 users making 4 requests each you're at 12, which according to your question exceeds your limit.
You might be DoSing your own server.
Using setInterval to make AJAX requests is bad, really bad. The problem is that if you request your server every 700 ms and the server needs more than those 700 ms to respond, you'll be stacking up requests. You will eventually hit whatever the limit is with just one user (although in certain cases the browser might protect you).
How to fix
I think 10 connections (if it's actually 10 connections, which is unclear to me) is very low. However, you should refactor your code to avoid using setInterval. You should use something like a Promise to keep track of when a request ends before scheduling the next one, so that you prevent requests from piling up. As a rule of thumb, you should never use setInterval unless you have a very good reason to do so. It's almost always better to use some more intelligent scheduling.
You also might want to look into being more efficient with those requests, can you merge the call to check for messages and notifications?
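As a rough sketch of the Promise-based scheduling suggested above: the next request is only scheduled once the previous one has settled. The names `pollSequentially`, `request` and `shouldContinue` are made up for illustration; `request` is assumed to be any function that returns a Promise, for example a wrapper around your AJAX call.

```javascript
function sleep(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Schedule the next poll only after the previous one has settled, so slow
// responses cannot pile up the way they can with setInterval.
async function pollSequentially(request, delayMs, shouldContinue) {
  while (shouldContinue()) {
    try {
      await request();
    } catch (err) {
      // Ignore the failure here and simply try again on the next round.
    }
    await sleep(delayMs);
  }
}

// Browser usage (checkMessages is a placeholder returning a Promise):
// pollSequentially(checkMessages, 700, function () { return true; });
```

With this shape there is never more than one request in flight per loop, however slow the server is; the 700 ms becomes the pause between requests rather than a fixed firing rate.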

Prevent recursive calls of XmlHttpRequest to server

I've been googling for hours for this issue, but did not find any solution.
I am currently working on this app, built on Meteor.
Now the scenario is, after the website is opened and all the assets have been loaded in browser, the browser constantly makes recursive xhr calls to server. These calls are made at the regular interval of 25 seconds.
This can be seen in the Network tab of browser console. See the Pending request of the last row in image.
I can't figure out from where it originates, and why it is invoked automatically even when the user is idle.
Now the question is, How can I disable these automatic requests? I want to invoke the requests manually, i.e. when the menu item is selected, etc.
Any help will be appreciated.
[UPDATE]
In response to Jan Dvorak's comment:
When I type "e" in the search box, the list of events whose names start with the letter "e" is displayed.
The request goes with all valid parameters and the Payload like this:
["{\"msg\":\"sub\",\"id\":\"8ef5e419-c422-429a-907e-38b6e669a493\",\"name\":\"event_Coll_Search_by_PromoterName\",\"params\":[\"e\"]}"]
And this is the response, which is valid.
a["{\"msg\":\"data\",\"subs\":[\"8ef5e419-c422-429a-907e-38b6e669a493\"]}"]
The code for this action is posted here
But in the case of the automatic recursive requests, the request goes out without a payload and the response is just the letter "h", which is strange, isn't it? How can I get rid of this?
Meteor has a feature called "Live page updates":
"Just write your templates. They automatically update when data in the database changes. No more boilerplate redraw code to write. Supports any templating language."
To support this feature, Meteor needs to do some server-client communication behind the scenes.
Traditionally, HTTP was created to fetch dead data. The client tells the server it needs something, and it gets something. There is no way for the server to tell the client it needs something. Later, it became necessary to push some data to the client. Several alternatives came into existence:
polling:
The client makes periodic requests to the server. The server responds with new data or says "no data" immediately. It's easy to implement and doesn't use much resources. However, it's not exactly live. It can be used for a news ticker but it's not exactly good for a chat application.
If you increase the polling frequency, you improve the update rate, but the resource usage grows with the polling frequency, not with the data transfer rate. HTTP requests are not exactly cheap. One request per second from multiple clients at the same time could really hurt the server.
hanging requests:
The client makes a request to the server. If the server has data, it sends them. If the server doesn't have data, it doesn't respond until it does. The changes are picked up immediately, no data is transferred when it doesn't need to be. It does have a few drawbacks, though:
If a web proxy sees that the server is silent, it eventually cuts off the connection. This means that even if there is no data to send, the server needs to send a keep-alive response anyways to make the proxies (and the web browser) happy.
Hanging requests don't use up (much) bandwidth, but they do take up memory. Modern servers can handle many concurrent TCP connections, so it's less of an issue than it was before. What does need to be considered is the amount of memory associated with the threads holding on to these requests, especially when the connections are tied to specific threads serving them.
Browsers have hard limits on the number of concurrent requests per domain and in total. Again, this is less of a concern now than it was before. Thus, it seems like a good idea to have one hanging request per session only.
Managing hanging requests feels kinda manual as you have to make a new request after each response. A TCP handshake takes some time as well, but we can live with a 300ms (at worst) refractory period.
Chunked response:
The client creates a hidden iFrame with a source corresponding to the data stream. The server responds with an HTTP response header immediately and leaves the connection open. To send a message, the server wraps it in a pair of <script></script> tags that the browser executes when it receives the closing tag. The upside is that there's no connection reopening but there is more overhead with each message. Moreover, this requires a callback in the global scope that the response calls.
Also, this cannot be used with cross-domain requests as cross-domain iFrame communication presents its own set of problems. The need to trust the server is also a challenge here.
Web Sockets:
These start as a normal HTTP connection but they don't actually follow the HTTP protocol later on. From the programming point of view, things are as simple as they can be. The API is a classic open/callback style on the client side and the server just pushes messages into an open socket. No need to reopen anything after each message.
There still needs to be an open connection, but it's not really an issue here with the browser limits out of the way. The browser knows the connection is going to be open for a while, so it doesn't need to apply the same limits as to normal requests.
These seem like the ideal solution, but there is one major issue: IE<10 doesn't know them. As long as IE8 is alive, web sockets cannot be relied upon. Also, the native Android browser and Opera mini are out as well (ref.).
Still, web sockets seem to be the way to go once IE8 (and IE9) finally dies.
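To illustrate the "manual" feel of the hanging-request approach from the list above, here is a minimal sketch of the client loop. It is not Meteor's actual code; `hangingRequest`, `onMessage` and `isOpen` are hypothetical names, and `hangingRequest` is assumed to return a Promise that resolves whenever the server finally answers.

```javascript
// Sketch of a hanging-request (long-polling) loop: as soon as one response
// arrives, handle it and immediately issue the next request.
async function longPoll(hangingRequest, onMessage, isOpen) {
  while (isOpen()) {
    try {
      const body = await hangingRequest();
      onMessage(body);
    } catch (err) {
      // Network error or a proxy cutting the connection: pause briefly
      // before reconnecting (the 1 s pause is an arbitrary choice).
      await new Promise(function (resolve) { setTimeout(resolve, 1000); });
    }
  }
}
```

Compared with plain polling there is no fixed interval: the server decides when each request completes, and the client's only job is to reopen the request.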
What you see are hanging requests with the timeout of 25 seconds that are used to implement the live update feature. As I already said, the keep-alive message ("h") is used so that the browser doesn't think it's not going to get a response. "h" simply means "nothing happens".
Chrome supports web sockets, so Meteor could have used them with a fallback to long requests, but, frankly, hanging requests are not at all bad once you've got them implemented (sure, the browser connection limit still applies).
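If you want to tell the two kinds of responses apart while debugging, a small sketch like the one below could classify them. It assumes only what the captured traffic in the question shows: a bare "h" for the keep-alive, and an "a" followed by a JSON array of JSON-encoded messages for real data. The function name is made up.

```javascript
// Sketch: classify the raw response bodies quoted in the question.
function parseFrame(raw) {
  if (raw === "h") {
    return { type: "heartbeat", messages: [] };
  }
  if (raw.charAt(0) === "a") {
    const encoded = JSON.parse(raw.slice(1)); // array of JSON strings
    return {
      type: "messages",
      messages: encoded.map(function (s) { return JSON.parse(s); })
    };
  }
  return { type: "unknown", messages: [] };
}

// The response body from the question, written as a JavaScript string literal.
const frame = parseFrame(
  'a["{\\"msg\\":\\"data\\",\\"subs\\":[\\"8ef5e419-c422-429a-907e-38b6e669a493\\"]}"]'
);
console.log(frame.messages[0].msg); // "data"
console.log(parseFrame("h").type);  // "heartbeat"
```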

Do AJAX applications that use POST requests always fail in Internet Explorer?

I have recently discovered that the intermittent failures seen by users running my application in Internet Explorer are due to a bug in Internet Explorer. The bug is in the HTTP stack, and should be affecting all applications using POST requests from IE. The result is a failure characterized by a request that seems to hang for about 5 minutes (depending on server type and configuration), then fail from the server end. The browser application will error out of the POST request after the server gives up. I will explain the IE bug in detail below.
As far as I can tell this will happen with any application using XMLHttpRequest to send POST requests to the server, if the request is sent at the wrong moment. I have written a sample program that attempts to send the POSTS at just those times. It attempts to send continuous POSTs to the server at the precise moment the server closes the connections. The interval is derived from the Keep-Alive header sent by the server.
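For readers who want to reproduce the timing, one way to derive the interval is to read the timeout out of the Keep-Alive response header. The sketch below is only an illustration and is not taken from the linked test program; it assumes the header value has the common form "timeout=5, max=100", and the function name is made up.

```javascript
// Sketch: extract the timeout (in seconds) from a Keep-Alive header value.
// Returns null when no timeout parameter is present.
function parseKeepAliveTimeout(headerValue) {
  const match = /(?:^|,)\s*timeout\s*=\s*(\d+)/i.exec(headerValue || "");
  return match ? parseInt(match[1], 10) : null;
}

console.log(parseKeepAliveTimeout("timeout=5, max=100")); // 5
console.log(parseKeepAliveTimeout("max=100"));            // null
```

Sending the next POST that many seconds after the previous response is what puts the request on the wire at about the moment the server closes the idle connection.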
I am finding that when running from IE to a server with a bit of latency (i.e. not on the same LAN), the problem occurs after only a few POSTs. When it happens, IE locks up so hard that it has to be force closed. The ticking clock is an indication that the browser is still responding.
You can try it by browsing to: http://pubdev.hitech.com/test.post.php. Please take care that you don't have any important unsaved information in any IE session when you run it, because I am finding that it will crash IE.
The full source can be retrieved at: http://pubdev.hitech.com/test.post.php.txt. You can run it on any server that has php and is configured for persistent connections.
My questions are:
What are other people's experiences with this issue?
Is there a known strategy for working around this problem (other than "use another browser")?
Does Microsoft have better information about this issue than the article I found (see below)?
The problem is that web browsers and servers by default use persistent connections as described in RFC 2616 section 8.1 (see http://www.ietf.org/rfc/rfc2616.txt). This is very important for performance--especially for AJAX applications--and should not be disabled. There is however a small timing hole where the browser may start to send a POST on a previously used connection at the same time the server decides the connection is idle and decides to close it. The result is that the browser's HTTP stack will get a socket error because it is using a closed socket. RFC 2616 section 8.1.4 anticipates this situation, and states, "...clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction..."
Internet Explorer does resend the POST when this happens, but when it does it mangles the request. It sends the POST headers, including the Content-Length of the data as posted, but it does not send the data. This is an improper request, and the server will wait an unspecified amount of time for the promised data before failing the request with an error. I have been able to demonstrate this failure 100% of the time using a C program that simulates an HTTP server, which closes the socket of an incoming POST request without sending a response.
Microsoft seems to acknowledge this failure in http://support.microsoft.com/kb/895954. They say that it affects IE versions 6 through 9. They provide a hotfix for this problem, which has shipped with all versions of IE since IE 7. The hotfix does not seem satisfactory for the following reasons:
It is not enabled unless you use regedit to add a key called FEATURE_SKIP_POST_RETRY_ON_INTERNETWRITEFILE_KB895954 to the registry. This is not something I would expect my users to have to do.
The hotfix does not actually fix the broken POST. Instead, if the socket gets closed as anticipated by the RFC, it simply errors out immediately without trying to resend the POST. The application still fails--it just fails sooner.
The following example is a self-contained PHP program that demonstrates the bug. It attempts to send continuous POSTs to the server at the precise moment the server closes the connections. The interval is derived from the Keep-Alive header sent by the server.
We've encountered this problem with IE on a regular basis. There is no good solution. The only solution that is guaranteed to solve the problem is to ensure that the web server keepalive timeout is higher than the browser keepalive timeout (by default with IE this is 60s). Any situation where the web server is set to a lower value can result in IE attempting to reuse the connection and sending a request that gets rejected with a TCP RST because the socket has been closed. If the web server keepalive timeout value is higher than IE's keepalive timeout, then IE stops reusing a connection before the server would close it, so it never sends on a closed socket. With high-latency connections you'll also have to allow for the latency, as the time spent in transit could be an issue.
Keep in mind however, that increasing the keepalive on the server means that an idle connection is using server sockets for that much longer. So you may need to size the server to handle a large number of inactive idle connections. This can be a problem as it may result in a burst of load to the server that the server isn't able to handle.
Another thing to keep in mind: you note that RFC 2616 section 8.1.4 states, "...clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction..."
You forgot a very important part. Here's the full text:
"Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2). Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s). Confirmation by user-agent software with semantic understanding of the application MAY substitute for user confirmation. The automatic retry SHOULD NOT be repeated if the second sequence of requests fails."
An HTTP POST is non-idempotent as defined by 9.1.2. Thus the behavior of the registry hack is actually technically correct per the RFC.
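In application code, the rule quoted above could be sketched roughly as follows. This is an illustration of the RFC's distinction, not what any browser actually does internally; the function names are made up, and `send` is assumed to be a function returning a Promise that rejects when the connection is closed underneath the request.

```javascript
// Methods that RFC 2616 section 9.1.2 treats as idempotent.
const IDEMPOTENT_METHODS = ["GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"];

function mayAutoRetry(method) {
  return IDEMPOTENT_METHODS.indexOf(String(method).toUpperCase()) !== -1;
}

// Retry at most once, and only for idempotent methods.
async function sendWithSingleRetry(method, send) {
  try {
    return await send();
  } catch (err) {
    if (!mayAutoRetry(method)) {
      throw err; // e.g. POST: surface the error instead of silently resending
    }
    return send(); // one retry only; a second failure propagates to the caller
  }
}
```

For a POST, the caller gets the error and can decide (or ask the user) whether resending is safe, which is the behaviour the RFC leaves room for.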
No, generally POST works in IE. What you are describing may be an issue, but it isn't such a major issue as to deserve this huge a post.
And when you issue a POST AJAX request, to make sure every browser inconsistency is covered, just use jQuery.
One more thing:
No one sane will tell you to "use another browser", because IE is widely used and needs to be taken care of (well, except IE6 and, for some, maybe even some newer versions).
So, POST has to work in IE, but to cover yourself against unexpected buggy behavior, use jQuery and you can sleep well.
I have never encountered this issue. And our clients mostly run IE6.
I suspect you've configured your keep-alive timer too long. Most people configure it to be under 1 second because persistent connections are only meant to speed up page loading not service Ajax calls.
If you have keep-alive configured too long you'll face much more severe problems than IE crashing - your server will run out of file descriptors to open sockets!*
* note: Incidentally, opening and not closing connections to HTTP servers is a well known DOS attack that tries to force the server to reach its max open socket limit. Which is why most server admins also configure connection timeouts to avoid having sockets open for too long.

Javascript timers & Ajax polling/scheduling

I've been looking for a simpler way than Comet or Long-Polling to push some very basic ajax updates to the browser.
In my research, I've seen that people do in fact use Javascript timers to send Ajax calls at set intervals. Is this a bad approach? It almost seems too easy. Also consider that the updates I'll be sending are not critical data, but they will be monitoring a process that may run for several hours.
As an example - Is it reliable to use this design to send an ajax call every 10 seconds for 3 hours?
Thanks, Brian
Generally, using timers to update content on a page via Ajax is at least as robust as relying on a long-lived stream connection like Comet. Firewalls, short DHCP leases, etc., can all interrupt a persistent connection, but polling will re-establish a client connection on each request.
The trade-off is that polling often requires more resources on the server. Even a handful of clients polling for updates every 10 seconds can put a lot more load on your server than normal interactive users, who are more likely to load new pages only every few minutes, and will spend less time doing so before moving to another site. As one data point, a simple Sinatra/Ajax toy application I wrote last year had 3-5 unique visitors per day to the normal "text" pages, but its Ajax callback URL quickly became the most-requested portion of any site on the server, including several sites with an order of magnitude (or more) higher traffic.
One way to minimize load due to polling is to separate the Ajax callback server code from the general site code, if at all possible, and run it in its own application server process. That "service middleware" service can handle polling callbacks, rather than giving up a server thread/Apache listener/etc. for what effectively amounts to a question of "are we there yet?"
Of course, if you only expect to have a small number (say, under 10) users using the poll service at a time, go ahead and start out running it in the same server process.
I think that one thing that might be useful here is that polling at an unchanging interval is simple, but is often unnecessary or undesirable.
One method that I've been experimenting with lately is having positive and negative feedback on the poll. Essentially, an update is either active (changes happened) or passive (no newer changes were available, so none were needed). Updates that are passive increase the polling interval. Updates that are active set the polling interval back to the baseline value.
So for example, on this chat that I'm working on, different users post messages. The polling interval starts off at its most frequent value of 5 seconds. If other site users are chatting, you get updated every 5 secs about it. If activity slows down, and no one has chatted since the latest message was displayed, the polling interval gets longer by about a second each time, eventually capping at once every 3 minutes. If, an hour later, someone sends a chat message again, the polling interval suddenly drops back to 5 second updates and starts slowing again.
High activity -> frequent polling. Low activity -> eventually very infrequent polling.
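A minimal sketch of that feedback rule is below. The defaults mirror the numbers in the description above (5 second baseline, about 1 second slower per passive update, capped at 3 minutes); the function and option names are hypothetical.

```javascript
// Sketch of feedback-driven polling intervals.
function nextPollInterval(currentMs, hadChanges, options) {
  const opts = Object.assign({ baseMs: 5000, stepMs: 1000, maxMs: 180000 }, options);
  if (hadChanges) {
    return opts.baseMs; // active update: snap back to the baseline
  }
  return Math.min(currentMs + opts.stepMs, opts.maxMs); // passive update: slow down
}

console.log(nextPollInterval(5000, false));   // 6000
console.log(nextPollInterval(180000, false)); // 180000
console.log(nextPollInterval(60000, true));   // 5000
```

After each response, the client would feed the result into setTimeout for the next poll rather than using a fixed setInterval.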
