I have implemented pubnub to create a socket connection for receiving real-time messages.
One thing I noticed in my developer tools is that the PubNub heartbeat request shows a pending state for a particular interval, mostly between 4.3 and 5 minutes.
After going through their documentation, I realised the timeout can be modified and that the default value is 320 seconds. After implementing this feature on my website I notice some lag, and I am not sure whether PubNub is causing the issue.
Please help me understand the idea behind the pending state. Also, does it have an impact on memory? If yes, how is that impact related to increasing or decreasing the heartbeat interval?
FYI, my PubNub settings only consist of the publish key, subscribe key, uuid and ssl (true).
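For reference, a minimal sketch of what that configuration looks like with the PubNub JavaScript SDK (v4-style; the key values and uuid below are placeholders, not real credentials):

// Minimal PubNub v4-style initialization matching the settings described above.
// The keys and uuid are placeholders.
const pubnub = new PubNub({
  publishKey: 'pub-c-xxxxxxxx',
  subscribeKey: 'sub-c-xxxxxxxx',
  uuid: 'my-client-uuid',
  ssl: true
});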
PubNub Subscribe Connection and Long Poll Cycle
You are seeing the heartbeat query param but that is not the "presence heartbeat" API. That is the subscribe long poll connection which will remain open until:
a message is published on one of the channels you are subscribed to
or, if no messages were published on any of the subscribed channels within 280s, the connection is closed (200 response with no messages) and the SDK opens a new subscribe connection (a rough sketch of this cycle is shown below).
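To make that cycle concrete, here is a simplified, conceptual sketch. This is not the SDK's actual implementation; longPoll is a hypothetical stand-in for one subscribe request, simulated here with a timer:

// Conceptual sketch of the subscribe long-poll cycle - NOT the real SDK internals.
// longPoll() stands in for one subscribe request that the server holds open
// until messages arrive or ~280s pass; here it is simulated with a timer.
function longPoll(channels, timetoken) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ timetoken: Date.now(), messages: [] });
    }, 280000);
  });
}

async function subscribeLoop(channels) {
  let timetoken = 0;
  while (true) {
    // This request is what shows as "pending" in dev tools until it resolves.
    const response = await longPoll(channels, timetoken);
    timetoken = response.timetoken;               // resume point for the next poll
    response.messages.forEach(function (msg) {
      console.log('received', msg);               // deliver any published messages
    });
    // The loop immediately opens the next long-poll connection.
  }
}

subscribeLoop(['my_channel']);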
PENDING Connection
PENDING just means the subscribe connection is Open and waiting for messages to be published. This is expected.
I highly recommend that you do not change this value unless there is a good reason. Did you make it longer or shorter?
A shorter long poll has little value and practically no harm, technically speaking, but it will result in more subscribe/edge transactions.
A longer long poll has an actual technical downside: your client will disconnect after the 280s expiration but will not reconnect until the end of the new, longer expiration time you set for the client.
The only time you should set the value shorter is if you have an ISP that proactively closes "idle" (pending) connections quicker than 280s. This is very rare but it does happen.
And you will likely see that the subscribe connection gets CANCELED. This happens when the client app changes its channel subscription list: subscribe to a new channel or unsubscribe from an existing channel.
No Impact on Memory
But you are asking if there is some sort of impact on memory. The answer is that it should NOT have a negative impact. If you follow Nicolas Fodor's answer/advice, you might be able to confirm that, but with thousands of customers using this, we have not seen any memory issues with our JavaScript SDK related to it. Just be sure you are using the latest version of our SDKs and report any bugs/issues you find to PubNub Support with full details.
Presence Heartbeat
One more thing about the heartbeat query param value: it typically defaults to 300 (seconds), which is only important when you are using PubNub Presence. If the PubNub server doesn't hear from a client within that 300-second (or whatever it is set to) period, a presence timeout event is sent, on behalf of that client, to anyone listening for presence events. A timeout is like a delayed leave event.
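If you do use Presence, a minimal sketch of listening for those events with the JavaScript SDK looks roughly like this ('chat' is a placeholder channel name, and pubnub is the configured client instance):

// Assumes a configured pubnub client instance; 'chat' is a placeholder channel.
pubnub.addListener({
  presence: function (event) {
    // event.action is 'join', 'leave', 'timeout' or 'state-change'
    console.log(event.action, 'from', event.uuid, 'on', event.channel);
  }
});

pubnub.subscribe({
  channels: ['chat'],
  withPresence: true   // also listen on the channel's presence counterpart
});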
See also:
Connection Management Docs
Detect and Manage Presence Events
A simple way to find out would be to check performance under load testing before and after the parameter change, without changing any other parameter. If a cause is established, you can then vary the parameter value to assess the elasticity of the side effect.
Related
I have a query related to a Firebase presence app using the JavaScript SDK. I have seen that there is a 60-second buffer after the internet is disconnected, after which the entries are removed from the Firebase Realtime Database. Is it possible to configure this time from 60 seconds to, let's say, 30 seconds? I basically want the entries to be removed from presence as soon as the internet is disconnected; if not immediately, then at least sooner than a minute. Does anybody have any idea on this?
There are two ways an onDisconnect handler can be triggered on the Firebase servers:
A clean disconnect is when the client has time to inform the server that it is about to disconnect, and in that case the server executes the onDisconnect handlers immediately.
A dirty disconnect is when the client disconnects without informing the server. In this case the server detects that the client is gone when the socket from that client times out, and then it executes the onDisconnect handlers.
Detecting dirty disconnects takes some time (up to a few minutes), and there's no way for you to configure it.
If you want more granular presence detection, the common approach is to periodically write a timestamp into the database from the client. This indicates when the client was last active, and can be used by the other clients to then indicate the liveliness of that user/client.
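A minimal sketch of that approach with the Firebase JavaScript SDK (the /status path, the 30-second interval, and the staleness threshold are all choices you would make yourself):

// Sketch: periodically record a server-side "last seen" timestamp for this client.
// Assumes the app is initialized and the user is signed in; /status/<uid> is arbitrary.
var uid = firebase.auth().currentUser.uid;
var statusRef = firebase.database().ref('/status/' + uid);

setInterval(function () {
  statusRef.update({ lastSeen: firebase.database.ServerValue.TIMESTAMP });
}, 30000);

// Other clients can treat a lastSeen older than, say, 60 seconds as "offline"
// instead of waiting for the server to detect the dirty disconnect.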
I discovered SSE (Server-Sent Events) pretty late, and I can't seem to figure out the use cases where it would be more efficient than using setInterval() and ajax.
I guess that if we had to update the data multiple times per second, then having one single connection would produce less overhead. But apart from this case, when would one really choose SSE?
I was thinking of this scenario:
A new user comment from the website is added to the database
The server periodically queries the DB for changes. If it finds a new comment, it sends a notification to the client with SSE
Also, this SSE question came to mind after having to make a simple "live" website change (when someone posts a comment, notify everybody who is on the site). Is there really another way of doing this without periodically querying the database?
Nowadays web technologies are used to implement all sorts of applications, including those which need to fetch constant updates from the server.
As an example, imagine to have a graph in your web page which displays real time data. Your page must refresh the graph any time there is new data to display.
Before Server Sent Events the only way to obtain new data from the server was to perform a new request every time.
Polling
As you pointed out in the question, one way to look for updates is to use setInterval() and an ajax request. With this technique, our client will perform a request once every X seconds, no matter if there is new data or not. This technique is known as polling.
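For illustration, a minimal polling sketch (the /api/comments endpoint, the since parameter and the 5-second interval are placeholders):

// Polling sketch: ask the server for new data every 5 seconds,
// whether or not anything has changed. The endpoint is a placeholder.
var lastSeenId = 0;

function renderComment(comment) {
  console.log('new comment:', comment.text);
  lastSeenId = comment.id;
}

setInterval(function () {
  fetch('/api/comments?since=' + lastSeenId)
    .then(function (res) { return res.json(); })
    .then(function (comments) { comments.forEach(renderComment); });
}, 5000);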
Events
Server-Sent Events, on the contrary, are asynchronous: the server itself notifies the client when there is new data available.
In the scenario of your example, you would implement SSE in such a way that the server sends an event immediately after adding the new comment, rather than by polling the DB.
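A minimal sketch of both sides, assuming a plain Node.js server and an /events path (the path and port are placeholders, and error handling is omitted):

// server.js - minimal SSE endpoint: hold connections open, push new comments.
const http = require('http');
const clients = [];

http.createServer(function (req, res) {
  if (req.url === '/events') {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    });
    clients.push(res);                                          // keep the connection open
    req.on('close', function () { clients.splice(clients.indexOf(res), 1); });
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(3000);

// Call this right after a new comment is saved - no DB polling needed.
function broadcastComment(comment) {
  var payload = 'data: ' + JSON.stringify(comment) + '\n\n';
  clients.forEach(function (res) { res.write(payload); });
}

// client.js - one persistent connection; the browser receives pushed events.
var source = new EventSource('/events');
source.onmessage = function (event) {
  var comment = JSON.parse(event.data);
  console.log('new comment:', comment);
};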
Comparison
Now the question may be: when is it advisable to use polling vs SSE? Aside from compatibility issues (not all browsers support SSE, although there are some polyfills which essentially emulate SSE via polling), you should focus on the frequency and regularity of the updates.
If you are uncertain about the frequency of the updates (how often new data becomes available), SSE may be the solution because it avoids all the extra requests that polling would perform.
However, it is wrong to say in general that SSE produces less overhead than polling. That is because SSE requires an open TCP connection to work. This essentially means that some resources on the server (e.g. a worker and a network socket) are allocated to one client until the connection is over. With polling, on the other hand, the connection may be reset after each request is answered.
Therefore, I would not recommend using SSE if the average number of connected clients is high, because this could create some overhead on the server.
In general, I advise using SSE only if your application requires real-time updates. As a real-life example, I developed data acquisition software in the past and had to provide a web interface for it. In this case, a lot of graphs were updated every time a new data point was collected. That was a good fit for SSE because the number of connected clients was low (essentially, only one), the user interface had to update in real time, and the server was not flooded with requests as it would be with polling.
Many applications do not require real time updates, and thus it is perfectly acceptable to display the updates with some delay. In this case, polling with a long interval may be viable.
I'm working on a node application that monitors users' online status. It uses socket.io to update the online status of users we "observe" (as in, the users we are aware of on the page we're at). What I would like to introduce now is an idle status, which would basically mean that after X time of inactivity (as in, no request) the status would change from online to idle.
I do monitor all the sockets, so I know when a connection was made, and I thought of using this.
My idea is to use setTimeout on every connection for this particular purpose (clearing the previous one if it exists), and in the setTimeout callback I would simply change the user's status to idle and emit that status change to observers.
What I'm concerned about is the performance and scalability of setting and clearing the timeout on every connection. So the question is: are there any issues, in terms of the two above, with such an approach? Is there a better way of doing it, perhaps a library that is better at handling such things?
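A minimal sketch of the timer-per-connection idea described above, assuming Socket.IO v3+ (the event name, payload, port and 5-minute threshold are placeholders):

// Sketch: reset an idle timer whenever a socket shows activity; if it fires,
// mark the user idle and notify observers. Names and timings are placeholders.
const io = require('socket.io')(3000);             // standalone Socket.IO server
const IDLE_AFTER_MS = 5 * 60 * 1000;
const idleTimers = new Map();                      // socket.id -> timeout handle

io.on('connection', function (socket) {
  function resetIdleTimer() {
    clearTimeout(idleTimers.get(socket.id));       // clear the previous timer, if any
    idleTimers.set(socket.id, setTimeout(function () {
      socket.broadcast.emit('status-change', { user: socket.id, status: 'idle' });
    }, IDLE_AFTER_MS));
  }

  resetIdleTimer();                                // user starts out active
  socket.onAny(resetIdleTimer);                    // any incoming event counts as activity
  socket.on('disconnect', function () {
    clearTimeout(idleTimers.get(socket.id));
  });
});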
I have a WebSocket-based chat application (HTML5).
The browser opens a socket connection to a Java-based WebSocket server over wss.
When browser connects to server directly (without any proxy) everything works well.
But when the browser is behind an enterprise proxy, the browser's socket connection closes automatically after approximately 2 minutes of no activity.
Browser console shows "Socket closed".
In my test environment I have a Squid-Dansguardian proxy server.
IMP: this behaviour is not observed if the browser is connected without any proxy.
To keep some activity going, I embedded a simple jQuery script which makes an HTTP GET request to another server every 60 seconds, but it did not help. I still get "socket closed" in my browser console after about 2 minutes of no action.
Any help or pointers are welcome.
Thanks
This seems to me to be a feature, not a bug.
In production applications there is an issue related to what is known as "half-open" sockets - see this great blog post about it.
It happens that connections are lost abruptly, causing the TCP/IP connection to drop without informing the other party to the connection. This can happen for many different reasons - wifi signals or cellular signals are lost, routers crash, modems disconnect, batteries die, power outages...
The only way to detect if the socket is actually open is to try and send data... BUT, your proxy might not be able to safely send data without interfering with your application's logic*.
After two minutes, your proxy assumes that the connection was lost and closes the socket on its end to save resources and allow new connections to be established.
If your proxy didn't take this precaution, on a long enough timeline all your available resources would be taken by dropped connections that would never close, preventing access to your application.
Two minutes is a lot. On Heroku they set the proxy timeout to 50 seconds (more reasonable). For HTTP connections, these timeouts are often much shorter.
The best option for you is to keep sending websocket data within the 2 minute timeframe.
The WebSocket protocol resolves this issue with a built-in ping mechanism - use it. These pings should be sent by the server, and the browser responds to them with a pong directly (without involving the JavaScript application).
The JavaScript API (at least in the browser) doesn't let you send ping frames (it's a security thing, I guess, that prevents people from using browsers for DoS attacks).
A common practice by some developers (which I think is misguided) is to implement a JSON ping message that is either ignored by the server or results in a JSON pong.
Since you are using Java on the server, you have access to the Ping mechanism and I suggest you implement it.
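Since the server here is Java, the JSR 356 API is the place to look (RemoteEndpoint exposes a sendPing method, if I remember correctly). Purely to illustrate the idea in this thread's language, here is roughly what a periodic server-side ping looks like with Node's ws library (the port and interval are placeholders):

// Illustration only - the question uses a Java server, but the pattern is the same:
// send protocol-level pings more often than the proxy's idle timeout.
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

setInterval(function () {
  wss.clients.forEach(function (client) {
    if (client.readyState === WebSocket.OPEN) {
      client.ping();   // browsers answer ping frames with a pong automatically
    }
  });
}, 50000);             // keep this below the proxy's ~2 minute idle timeout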
I would also recommend (if you have control of the Proxy) that you lower the timeout to a more reasonable 50 seconds limit.
* The situation during production is actually even worse...
Because there is a long chain of intermediaries (home router/modem, NAT, ISP, Gateways, Routers, Load Balancers, Proxies...) it's very likely that your application can send data successfully because it's still "connected" to one of the intermediaries.
This should start a chain reaction that will only reach the application after a while, and again ONLY if it attempts to send data.
This is why ping frames expect pong frames to be returned (meaning the chain of connections is intact).
P.S.
You should probably also complain about the Java application not closing the connection after a certain timeout. During production, this oversight might force you to restart your server every so often or experience a DoS situation (all available file handles will be used for the inactive old connections and you won't have room for new connections).
Check squid.conf for the request_timeout value; you can change this behaviour via that directive. It will affect more than just WebSockets. For instance, in an environment I frequently work in, a Perl script is hit to generate various configurations, and execution can take upwards of 5-10 minutes to complete. The timeout value on both our httpd and the Squid server had to be raised to compensate for this.
Also look at the connect_timeout value; that defaults to one minute.
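If you do need to raise them, the relevant squid.conf lines look roughly like this (the values are examples only, not recommendations):

# squid.conf - example values only
request_timeout 10 minutes
connect_timeout 2 minutes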
I was thinking about extending the functionality of a node.js server running Socket.io, which I am currently using to allow a client (an iOS app) to communicate with the server, so that it could have persistent session data between connections.
On initial connection the server passes the client a session id, which the client stores and passes back to the server if it reconnects after disconnecting. This would allow the client to resume its session without having to re-provide the server with certain information about its current state (obviously, the actual implementation will be more secure than this).
I want the session to eventually expire, so it has a maximum lifetime, or it times out if it hasn't been continued after a certain time. To do this I was thinking of using timers for each session. I'm not actually sure how node.js or JavaScript timers (setTimeout) work in the background, and I am concerned that having thousands of session timers could lead to a lot of memory/CPU usage. Could this be a potential issue? Should I instead have a garbage collector that cycles every minute or so and deletes expired session data? What approach has the least impact on performance, or are timers already exactly that?
Timers are used frequently for timeouts, and are very efficient in terms of CPU.
// ten_thousand_timeouts.js
// Schedule one-second timeouts in a loop; each logs its index when it fires.
for (var i = 0; i <= 10000; i++) {
  (function (i) {
    setTimeout(function () {
      console.log(i);
    }, 1000);
  })(i);
}
With 10,000 timeouts, the run took only about 0.336 seconds beyond the one-second timer delay (1.336s total), and the act of logging to the console took most of that time.
//bash script
$> time node ten_thousand_timeouts.js
1
...
9999
10000
real 0m1.336s
user 0m0.275s
sys 0m0.146s
I cannot imagine this being an issue for your use case.
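Applied to the session-expiry case in the question, a minimal sketch of the timer-per-session approach might look like this (the 30-minute idle limit, the helper name and the in-memory Map store are placeholders):

// Sketch: expire a session after a period of inactivity, one timer per session.
// The idle limit and the Map-based store are illustrative choices only.
const IDLE_LIMIT_MS = 30 * 60 * 1000;
const sessions = new Map();                        // sessionId -> { data, timer }

function touchSession(sessionId, data) {
  const existing = sessions.get(sessionId);
  if (existing) {
    clearTimeout(existing.timer);                  // push the expiry back on activity
  }
  const timer = setTimeout(function () {
    sessions.delete(sessionId);                    // session expired: drop its data
  }, IDLE_LIMIT_MS);
  sessions.set(sessionId, { data: data || (existing && existing.data), timer });
}

The alternative mentioned in the question, a single periodic sweep that deletes sessions whose last-activity timestamp is too old, works just as well; given the numbers above, either approach should be cheap with thousands of sessions.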