Socket.IO detection of client's browser sometimes failing - javascript

I am working on a real time tracking application that uses Node.js and Socket.IO. In my tracking code that goes on a target site I have some code that grabs the user agent string of the browser and sends it back to the server. This USUALLY works fine but there are a few times where this data is set to undefined (this is where it happens).
For now, I just have a huge try/catch block on the server so it doesn't crash when running a method I've defined to detect what browser it is (it crashes when it tries to run the match() method). I'm assuming this is happening either from bots or from some other browser that has no user agent or has been tampered with. Am I wrong on that? Are there other reasons?
Does Socket.IO provide anything for browser detection? Either way I know I need to make the browser detection feature more robust but I'm just getting this project off the ground.
If there's no better way to do this, am I better off just checking to see if the data that was sent to the server is undefined and consider it as an "Other" browser?
See the difference in total connections and total browser numbers? At the moment, there's a difference of a little over 100. If this browser tracking issue wasn't happening the numbers should be exactly the same (because EVERY connection would have a browser, resolution, operating system, and a URL).

I'm not very familiar with socket.io, but it looks like you can get the request headers in socket.handshake.headers (Socket.io's Wiki – Authorizing). I don't know which browsers run JavaScript but don't have navigator.userAgent set, but maybe they'll send the User-Agent HTTP Header?

This actually had nothing to do with bots or tampered data like I thought it did. My problem was due to a rare race condition where the client would connect to the server but disconnect before sending the data to the server (the "beacon" event I have set up is where the client sends the data and the server receives it). So when the client went to disconnect, my server would look up the client but would return a undefined result because the data was never sent and stored in the first place. A rewarding experience but what a pain!

Related

Does server sent events is "Busy Wait"?

I'm building a web site using flask and I wish to do a Push to the client. I've followed real-time-events-python and I was able to create the website.
One thing that I've noticed is that when accessing the Javascript console, there is a GET every 500ms, so I'm wondering if the EventSource of javascript actually sends a GET to the server periodically to see if there are any updates, causing it to be a Busy Wait.
For information, I'm using Flask (python framework) for developing the website and chrome to access it.
Server Sent Events specification
According to the link you provided, yes, the browser sends GETs as an implementation for server-sent events:
The actual protocol for Server-Sent Events is very simple. The client
will open a standard connection to the server and make a GET request.
It expects the server to hold open the socket and send new events by
prefixing them with data: and terminating with two newline characters.
So, on the server side, the connection is supposed to remain open, while data is still being streamed through it. Keep in mind that Server-Sent Events allows for automatic reconnection, so if you are experiencing a lot of reconnects (which I imagine is what all those gets are, unless your client-side code is not written correctly), you should check to make sure that your server side is not closing the connection, which causes the browser to reopen the connection.
As for the "busy wait", if I'm understanding you correctly, you don't need to worry about this. This is handled by the browser, so your code doesn't block while waiting for something.

Prevent recursive calls of XmlHttpRequest to server

I've been googling for hours for this issue, but did not find any solution.
I am currently working on this app, built on Meteor.
Now the scenario is, after the website is opened and all the assets have been loaded in browser, the browser constantly makes recursive xhr calls to server. These calls are made at the regular interval of 25 seconds.
This can be seen in the Network tab of browser console. See the Pending request of the last row in image.
I can't figure out from where it originates, and why it is invoked automatically even when the user is idle.
Now the question is, How can I disable these automatic requests? I want to invoke the requests manually, i.e. when the menu item is selected, etc.
Any help will be appriciated.
[UPDATE]
In response to the Jan Dvorak's comment:
When I type "e" in the search box, the the list of events which has name starting with letter "e" will be displayed.
The request goes with all valid parameters and the Payload like this:
["{\"msg\":\"sub\",\"id\":\"8ef5e419-c422-429a-907e-38b6e669a493\",\"name\":\"event_Coll_Search_by_PromoterName\",\"params\":[\"e\"]}"]
And this is the response, which is valid.
a["{\"msg\":\"data\",\"subs\":[\"8ef5e419-c422-429a-907e-38b6e669a493\"]}"]
The code for this action is posted here
But in the case of automatic recursive requests, the request goes without the payload and the response is just a letter "h", which is strange. Isn't it? How can I get rid of this.?
Meteor has a feature called
Live page updates.
Just write your templates. They automatically update when data in the database changes. No more boilerplate redraw code to write. Supports any templating language.
To support this feature, Meteor needs to do some server-client communication behind the scenes.
Traditionally, HTTP was created to fetch dead data. The client tells the server it needs something, and it gets something. There is no way for the server to tell the client it needs something. Later, it became needed to push some data to the client. Several alternatives came to existence:
polling:
The client makes periodic requests to the server. The server responds with new data or says "no data" immediately. It's easy to implement and doesn't use much resources. However, it's not exactly live. It can be used for a news ticker but it's not exactly good for a chat application.
If you increase the polling frequency, you improve the update rate, but the resource usage grows with the polling frequency, not with the data transfer rate. HTTP requests are not exactly cheap. One request per second from multiple clients at the same time could really hurt the server.
hanging requests:
The client makes a request to the server. If the server has data, it sends them. If the server doesn't have data, it doesn't respond until it does. The changes are picked up immediately, no data is transferred when it doesn't need to be. It does have a few drawbacks, though:
If a web proxy sees that the server is silent, it eventually cuts off the connection. This means that even if there is no data to send, the server needs to send a keep-alive response anyways to make the proxies (and the web browser) happy.
Hanging requests don't use up (much) bandwidth, but they do take up memory. Nowadays' servers can handle multiple concurrent TCP connections, so it's less of an issue than it was before. What does need to be considered is the amount of memory associated with the threads holding on to these requests - especially when the connections are tied to specific threads serving them.
Browsers have hard limits on the number of concurrent requests per domain and in total. Again, this is less of a concern now than it was before. Thus, it seems like a good idea to have one hanging request per session only.
Managing hanging requests feels kinda manual as you have to make a new request after each response. A TCP handshake takes some time as well, but we can live with a 300ms (at worst) refractory period.
Chunked response:
The client creates a hidden iFrame with a source corresponding to the data stream. The server responds with an HTTP response header immediately and leaves the connection open. To send a message, the server wraps it in a pair of <script></script> tags that the browser executes when it receives the closing tag. The upside is that there's no connection reopening but there is more overhead with each message. Moreover, this requires a callback in the global scope that the response calls.
Also, this cannot be used with cross-domain requests as cross-domain iFrame communication presents its own set of problems. The need to trust the server is also a challenge here.
Web Sockets:
These start as a normal HTTP connection but they don't actually follow the HTTP protocol later on. From the programming point of view, things are as simple as they can be. The API is a classic open/callback style on the client side and the server just pushes messages into an open socket. No need to reopen anything after each message.
There still needs to be an open connection, but it's not really an issue here with the browser limits out of the way. The browser knows the connection is going to be open for a while, so it doesn't need to apply the same limits as to normal requests.
These seem like the ideal solution, but there is one major issue: IE<10 doesn't know them. As long as IE8 is alive, web sockets cannot be relied upon. Also, the native Android browser and Opera mini are out as well (ref.).
Still, web sockets seem to be the way to go once IE8 (and IE9) finally dies.
What you see are hanging requests with the timeout of 25 seconds that are used to implement the live update feature. As I already said, the keep-alive message ("h") is used so that the browser doesn't think it's not going to get a response. "h" simply means "nothing happens".
Chrome supports web sockets, so Meteor could have used them with a fallback to long requests, but, frankly, hanging requests are not at all bad once you've got them implemented (sure, the browser connection limit still applies).

Do AJAX applications that use POST requests always fail in Internet Explorer?

I have recently discovered that problems with intermittent failures for users running my application using Internet Explorer is due to a bug in Internet Explorer. The bug is in the HTTP stack, and should be affecting all applications using POST requests from IE. The result is a failure characterized by a request that seems to hang for about 5 minutes (depending on server type and configuration), then fail from the server end. The browser application will error out of the post request after the server gives up. I will explain the IE bug in detail below.
As far as I can tell this will happen with any application using XMLHttpRequest to send POST requests to the server, if the request is sent at the wrong moment. I have written a sample program that attempts to send the POSTS at just those times. It attempts to send continuous POSTs to the server at the precise moment the server closes the connections. The interval is derived from the Keep-Alive header sent by the server.
I am finding that when running from IE to a server with a bit of latency (i.e. not on the same LAN), the problem occurs after only a few POSTs. When it happens, IE locks up so hard that it has to be force closed. The ticking clock is an indication that the browser is still responding.
You can try it by browsing to: http://pubdev.hitech.com/test.post.php. Please take care that you don't have any important unsaved information in any IE session when you run it, because I am finding that it will crash IE.
The full source can be retrieved at: http://pubdev.hitech.com/test.post.php.txt. You can run it on any server that has php and is configured for persistent connections.
My questions are:
What are other people's experiences with this issue?
Is there a known strategy for working around this problem (other than "use another browser")?
Does Microsoft have better information about this issue than the article I found (see below)?
The problem is that web browsers and servers by default use persistent connections as described in RFC 2616 section 8.1 (see http://www.ietf.org/rfc/rfc2616.txt). This is very important for performance--especially for AJAX applications--and should not be disabled. There is however a small timing hole where the browser may start to send a POST on a previously used connection at the same time the server decides the connection is idle and decides to close it. The result is that the browser's HTTP stack will get a socket error because it is using a closed socket. RFC 2616 section 8.1.4 anticipates this situation, and states, "...clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction..."
Internet Explorer does resend the POST when this happens, but when it does it mangles the request. It sends the POST headers, including the Content-Length of the data as posted, but it does not send the data. This is an improper request, and the server will wait an unspecified amount of time for the promised data before failing the request with an error. I have been able to demonstrate this failure 100% of the time using a C program that simulates an HTTP server, which closes the socket of an incoming POST request without sending a response.
Microsoft seems to acknowledge this failure in http://support.microsoft.com/kb/895954. They say that it affects IE versions 6 through 9. That provide a hotfix for this problem, that has shipped with all versions of IE since IE 7. The hotfix does not seem satisfactory for the following reasons:
It is not enabled unless you use regedit to add a key called FEATURE_SKIP_POST_RETRY_ON_INTERNETWRITEFILE_KB895954 to the registry. This is not something I would expect my users to have to do.
The hotfix does not actually fix the broken POST. Instead, if the socket gets closed as anticipated by the RFC, it simply errors out immediately without trying to resent the POST. The application still fails--it just fails sooner.
The following example is a self contained php program that demonstrates the bug. It attempts to send continuous POSTs to the server at the precise moment the server closes the connections. The interval is derived from the Keep-Alive header sent by the server.
We've encountered this problem with IE on a regular basis. There is no good solution. The only solution that is guaranteed to solve the problem is to ensure that the web server keepalive timeout is higher than the browser keepalive timeout (by default with IE this is 60s). Any situation where the web server is set to a lower value can result in IE attempting to reuse the connection and sending a request that gets rejected with a TCP RST because the socket has been closed. If the web server keepalive timeout value is higher than IE's keepalive timeout then IE's reuse of the connections ensure that the socket won't be closed. With high latency connections you'll have to consider the latency time as the time spent in-transit could be an issue.
Keep in mind however, that increasing the keepalive on the server means that an idle connection is using server sockets for that much longer. So you may need to size the server to handle a large number of inactive idle connections. This can be a problem as it may result in a burst of load to the server that the server isn't able to handle.
Another thing to keep in mind. You note that the RFC section 8.1.4 states :"...clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction..."
You forgot a very important part. Here's the full text:
Client software SHOULD reopen the
transport connection and retransmit the aborted sequence of requests
without user interaction so long as the request sequence is
idempotent (see section 9.1.2). Non-idempotent methods or sequences
MUST NOT be automatically retried, although user agents MAY offer a
human operator the choice of retrying the request(s). Confirmation by
user-agent software with semantic understanding of the application
MAY substitute for user confirmation. The automatic retry SHOULD NOT
be repeated if the second sequence of requests fails
An HTTP POST is non-idempotent as defined by 9.1.2. Thus the behavior of the registry hack is actually technically correct per the RFC.
No, generally POST works in IE. It may be an issue, what you are saying,
but it isn't such a major issue to deserve this huge a post.
And when you issue POST ajax request, to make sure every browser inconsistency is covered, just use jquery.
One more thing:
Noone sane will tell you to "use another browser" because IE is widely used and needs to be taken care of (well, except IE6 and for some, maybe even some newer versions)
So, POST has to work in IE, but to make yourself covered for unexpected buggy behavior, use jquery and you can sleep well.
I have never encountered this issue. And our clients mostly runs IE6.
I suspect you've configured your keep-alive timer too long. Most people configure it to be under 1 second because persistent connections are only meant to speed up page loading not service Ajax calls.
If you have keep-alive configured too long you'll face much more severe problems than IE crashing - your server will run out file descriptors to open sockets!*
* note: Incidentally, opening and not closing connections to HTTP servers is a well known DOS attack that tries to force the server to reach its max open socket limit. Which is why most server admins also configure connection timeouts to avoid having sockets open for too long.

How exactly does Server-Sent Events work?

I am trying to get into the web push technology so I started looking around. I have basically found 2 technologies, that is Websockets and SSE. After ruling out Websockets because of lack of perl support, I wanted to try out the more native SSE-approach.
Now, trying to get SSE to work is a real pain in the arse. Every documentation has conflicting information and there does not seem to be a general consensus on how SSE works. Some say hat you need an <event-listen src="events.pm"> tag, others say you only need an EventSource JS object. Even with the EventSource object, I found around 4 possible implementations and none of them seem to work.
Here is what I have. I have an events.pm, which uses mod-perl. If you call that file, it returns data: I haz a websocket. That is sent with the content-type application/x-dom-event-stream.
The HTML and JS files have been rewritten so often with different implementations that I have given up on them. Can you guys please give me a working example?
Also: I do not understand how you can send specific messages to the client. Sending a predefined message seems to be fine. However, if I imagine a situation where someone sends me a message, I do not understand how exactly that information ('there is a new message for you') is transmitted to that exact browser that needs that information. Every post I found on this is vague at best.
EDIT
Basically, what I need is a way to say 'hey, are you allowed to get this notification? show me your id/session/token first!' on a per connected client basis. I wonder if it is at all possible with SSE.
I would say that Server-sent events are described pretty good in Stream Updates with Server-Sent Events article, namely the Event Stream Format. With Firefox 6 SSEs are now supported by majority of modern browsers (unfortunately except IE).
I do not understand how you can send specific messages to the client.
Sending a predefined message seems to be fine. However, if I imagine a
situation where someone sends me a message, I do not understand how
exactly that information ('there is a new message for you') is
transmitted to that exact browser that needs that information.
Clients are connected with the SSE server by keep-alive event stream, uni-directional connection which isn't closed until browser/tab is closed, user invokes close method or network connection is lost.
Basically, what I need is a way to say 'hey, are you allowed to get
this notification? show me your id/session/token first!' on a per
connected client basis. I wonder if it is at all possible with SSE.
This kind of authentication logic would be required to do on the server side, so it's dependent on the language or platform you are using. Unless there is library for that in your environment you would probably have to write it.
The spec has changed/evolved. Current version uses EventSource object. The HTML element is gone.
The current MIME type is text/event-stream. The x-dom-event-stream was used only in early implementations (Opera in 2006).
To identify/authenticate clients you can either use cookies or connect to an event stream URL that includes some kind of authentication token (session ID), and then you'll know that a particular TCP/IP connection is associated with that user.
http://html5doctor.com/server-sent-events/
Adding to #porneL's answer; you can tell EventSource to include cookies to the request it sends. Just create the EventSource object like this:
var evtsrc = new EventSource('./url_of/event_stream/',{withCredentials:true});
I tested this with latest version of chrome and firefox.
Source: http://www.sitepoint.com/server-sent-events/
edit: apparently this is not necessary / useless. See #Tamlyn's comment.

Ajax Security

We have a heavy Ajax dependent application. What are the good ways of making it sure that the request to server side scripts are not coming through standalone programs and are through an actual user sitting on a browser
There aren't any really.
Any request sent through a browser can be faked up by standalone programs.
At the end of the day does it really matter? If you're worried then make sure requests are authenticated and authorised and your authentication process is good (remember Ajax sends browser cookies - so your "normal" authentication will work just fine). Just remember that, of course, standalone programs can authenticate too.
What are the good ways of making it sure that the request to server side scripts are not coming through standalone programs and are through an actual user sitting on a browser
There are no ways. A browser is indistinguishable from a standalone program; a browser can be automated.
You can't trust any input from the client side. If you are relying on client-side co-operation for any security purpose, you're doomed.
There isn't a way to automatically block "non browser user" requests hitting your server side scripts, but there are ways to identify which scripts have been triggered by your application and which haven't.
This is usually done using something called "crumbs". The basic idea is that the page making the AJAX request should generate (server side) a unique token (which is typically a hash of unix timestamp + salt + secret). This token and timestamp should be passed as parameters to the AJAX request. The AJAX handler script will first check this token (and the validity of the unix timestamp e.g. if it falls within 5 minutes of the token timestamp). If the token checks out, you can then proceed to fulfill this request. Usually, this token generation + checking can be coded up as an Apache module so that it is triggered automatically and is separate from the application logic.
Fraudulent scripts won't be able to generate valid tokens (unless they figure out your algorithm) and so you can safely ignore them.
Keep in mind that storing a token in the session is also another way, but that won't buy any more security than your site's authentication system.
I'm not sure what you are worried about. From where I sit I can see three things your question can be related to:
First, you may want to prevent unauthorized users from making a valid request. This is resolve by using the browser's cookie to store a session ID. The session ID needs to tied to the user, be regenerated every time the user goes through the login process and must have an inactivity timeout. Anybody request coming in without a valid session ID you simply reject.
Second, you may want to prevent a third party from doing a replay attacks against your site (i.e. sniffing an inocent user's traffic and then sending the same calls over). The easy solution is to go over https for this. The SSL layer will prevent somebody from replaying any part of the traffic. This comes at a cost on the server side so you want to make sure that you really cannot take that risk.
Third, you may want to prevent somebody from using your API (that's what AJAX calls are in the end) to implement his own client to your site. For this there is very little you can do. You can always look for the appropriate User-Agent but that's easy to fake and will be probably the first thing somebody trying to use your API will think of. You can always implement some statistics, for example looking at the average AJAX requests per minute on a per user basis and see if some user are way above your average. It's hard to implement and it's only usefull if you are trying to prevent automated clients reacting faster than human can.
Is Safari a webbrowser for you?
If it is, the same engine you got in many applications, just to say those using QT QWebKit libraries. So I would say, no way to recognize it.
User can forge any request one wants - faking the headers like UserAgent any they like...
One question: why would you want to do what you ask for? What's the diffrence for you if they request from browser or from anythning else?
Can't think of one reason you'd call "security" here.
If you still want to do this, for whatever reason, think about making your own application, with a browser embedded. It could somehow authenticate to the application in every request - then you'd only send a valid responses to your application's browser.
User would still be able to reverse engineer the application though.
Interesting question.
What about browsers embedded in applications? Would you mind those?
You can probably think of a way of "proving" that a request comes from a browser, but it will ultimately be heuristic. The line between browser and application is blurry (e.g. embedded browser) and you'd always run the risk of rejecting users from unexpected browsers (or unexpected versions thereof).
As been mentioned before there is no way of accomplishing this... But there is a thing to note, useful for preventing against CSRF attacks that target the specific AJAX functionality; like setting a custom header with help of the AJAX object, and verifying that header on the server side.
And if in the value of that header, you set a random (one time use) token you can prevent automated attacks.

Categories

Resources