Real-time pubsub chat with history via websockets - javascript

I'm interested in creating what Disqus have done with their commenting system: http://highscalability.com/blog/2014/5/7/update-on-disqus-its-still-about-realtime-but-go-demolishes.html
The most impressive part of the infrastructure is the Nginx push stream module:
Still runs on 5 Nginx machines.
Uses NginxPushStream, which supports EventSource, WebSocket, Long
Polling, and Forever Iframe.
All users are connected to these machines.
On a normal day each machine sees 3200 connections/s, 1 million
connections, 150K packets/s TX and 130K packets/s RX, 150 mbits/s TX
and 80 mbits/s RX, with <15ms delay end-to-end (which is faster than
Javascript can render a comment)
Had many issues with resource exhaustion at first. The configuration
for Nginx and the OS are given that help alleviate the problems,
tuning them to handle a scenario with many connections moving little
data.
Obviously this Nginx module doesn't provide persistent data storage. Only an in-memory mechanism is available through the push_stream_store_messages directive, and as the author said:
The main target to stored messages is to deliver the message to a
subscriber that was offline when the message was published.
It is clear that Disqus don't publish messages to Nginx directly; instead, the Go backend stores messages in Redis and then publishes them to Nginx via an internal POST, with Nginx holding the subscriber connections.
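For illustration, a rough sketch of that backend publish step might look like the following. This is only an assumption-laden sketch, not Disqus's actual code: it assumes Node.js with the ioredis client, Nginx listening locally on port 9080, and a publisher location configured roughly as "location /pub { push_stream_publisher admin; push_stream_channels_path $arg_id; }"; the channel and key names are invented.

    // Sketch of a Disqus-style publish path: persist the message in Redis,
    // then POST it to the Nginx push stream publisher location so that
    // currently connected subscribers receive it immediately.
    const http = require("http");
    const Redis = require("ioredis");

    const redis = new Redis();

    async function publishMessage(channel, message) {
      const payload = JSON.stringify(message);

      // 1. Persist the message so history can be rebuilt on page load.
      await redis.lpush(`chat:history:${channel}`, payload);
      await redis.ltrim(`chat:history:${channel}`, 0, 99); // keep last 100 messages

      // 2. Internal POST to the Nginx push stream publisher location.
      const req = http.request({
        host: "127.0.0.1",
        port: 9080,
        method: "POST",
        path: `/pub?id=${encodeURIComponent(channel)}`,
      });
      req.end(payload);
    }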
Does anyone have experience with fetching the message history from Redis on page load, before the push stream module takes over for newer messages? Do you push the old message history in one go, or render it as plain HTML and then rely on pub/sub for messages that appear after the page has loaded?
The logic needs to be decoupled as much as possible. I don't intend to introduce a blocking mechanism between the user and Nginx for real-time message communication. Would the solution below be a good one?
The client pushes the message from the web page (via websockets)
The Ajax request goes straight to the push stream location, and in the Ajax complete callback the client asks the backend to store the message in Redis (going through the backend first would block the real-time path)
When the user refreshes the page, the backend fetches the Redis list and displays the history
The user can see the history and can post new messages
Only two backend requests need to be developed: accept a message and store it in Redis, and fetch the stored data for display on page load. Preferably a light, non-blocking backend is required, such as an Nginx Lua module or even Webdis, an HTTP interface to Redis.
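As a sketch of the history-fetch part (again assuming Node and ioredis, with the same invented key names as above; with Webdis this would just be an HTTP call to LRANGE):

    // Sketch of the history endpoint hit on page load: read the stored list
    // from Redis and return it as JSON; the page then subscribes to the push
    // stream channel for anything newer.
    const http = require("http");
    const Redis = require("ioredis");

    const redis = new Redis();

    http.createServer(async (req, res) => {
      const url = new URL(req.url, "http://localhost");
      if (url.pathname === "/history") {
        const channel = url.searchParams.get("channel") || "lobby";
        const items = await redis.lrange(`chat:history:${channel}`, 0, 99);
        res.writeHead(200, { "Content-Type": "application/json" });
        // items are already JSON strings, newest first; reverse for display order
        res.end(`[${items.reverse().join(",")}]`);
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(3000);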
I'd like to hear smart people's opinions on this mechanism from an architectural point of view; no code example expected.

Related

What is the Best Approach for Notifying a Next Js Frontend about a Completed Process from a Backend?

Situation:
I'm building a Next Js frontend that communicates with a Spring Boot backend through a Next Js API (BFF). The backend performs a lengthy process (5 minutes) and returns status code 202 to indicate that the process has been accepted and is in progress asynchronously. Now, I need to notify the Next Js API and frontend client when the process is completed. In this case, what is the best approach for this notification?
Possibilities that I found:
Firebase Cloud Messaging
WebSocket
Server-Sent Events (SSE)
Short/Long Polling
What would be the best approach?
Because cloud providers usually have short timeouts, and costs are based on the duration of each operation, I wondered what the best approach to this scenario would be.
Analysing each possibility:
Please feel free to correct any wrong analyses I made when comparing. And feel free to add more possibilities.
Firebase Cloud Messaging:
Firebase Cloud Messaging seems like a good option, as it only notifies the client when there is a message to deliver, and there won't be any problems with timeouts or costs.
However, it's usually used for push notifications, and delivery of messages is not guaranteed, making it a less suitable option for this scenario.
Another positive thing about Cloud Messaging is that it is not necessary for the BFF to receive the message, because Cloud Messaging will not tell the origin of the publisher.
WebSocket:
In this approach, the Next Js API establishes a persistent connection with the Spring Boot backend using WebSockets. The backend can then send a message to the API to notify it when the process is finished.
The biggest issue with this approach is the need to open a WebSocket connection with each client on the Next Js API, which does not scale as well as the first option.
Additionally, there is a bigger problem here: popular Next Js hosting providers like Vercel do not support WebSockets, since their functions are stateless and have a maximum execution duration, making it impossible to maintain a WebSocket connection.
PS: I'm using Vercel, so it's not an available option.
Server-Sent Events (SSE):
In this approach, the Next Js API establishes a persistent connection with the Spring Boot backend using SSE. The backend can then send an event to the API to notify it when the process is finished.
However, this approach suffers from the same problem as WebSocket, as the Next Js API will have to keep a connection open for each client.
Also, some hosting providers may not support SSE for the same reason as WebSocket.
Short/Long Polling
In this approach, the Next Js frontend periodically sends a request to the Next Js API (BFF) to check the status of the process in the Spring Boot backend.
This is the least performant and cost-effective option, as each request counts as a separate request and is charged by the cloud provider. As an example of the performance problem: if you poll every 60 seconds to avoid too many charges, but the process finishes in 63 seconds, you will wait 120 seconds for the result instead of 63.
Looking at the options, however, polling seems to be the only viable one, because it ensures delivery and is supported by all cloud providers.
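For completeness, a minimal client-side polling sketch; the /api/status route, the { done } response shape, and the interval are all assumptions, not part of the question:

    // Minimal short-polling loop: ask the BFF for the job status until it is done.
    async function waitForCompletion(jobId, intervalMs = 15000) {
      for (;;) {
        const res = await fetch(`/api/status?jobId=${encodeURIComponent(jobId)}`);
        const { done } = await res.json();
        if (done) return;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
      }
    }

    // Usage: waitForCompletion("abc-123").then(() => refreshUI());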
Questions
Are there any other possibilities for this scenario that I may have missed?
What is the best approach for this scenario and why?

Socket.io and Node.Js multiple servers

I'm new to Web Sockets in general, but get the main concept.
I am trying to build a simple multiplayer game and would like to have a server selection where I can run sockets on multiple IPs, connecting each client through the server it picks, in order to spread connections out and improve performance. This is hypothetical for the case of there being thousands of players at once, but I would like some insight into how this would work and whether there are any resources I can use to integrate it beforehand, to prevent extra work at a later date. Is this at all possible? As I understand it, Node.Js runs on a server and uses the Socket.io dependency to create sockets within it, so I can't think of a way to route it through another server unless I had multiple sites running it separately.
The first question I have is this:
Are you hosting on AWS or in a local datacenter?
The reason I ask is that Socket.IO requires sticky sessions to work properly across multiple servers. Because Socket.IO will attempt to upgrade each connection, and that upgrade request must reach the original server that authorized the session, you'll need to route websocket (TCP) connections back to that original server via sticky sessions. Unfortunately AWS makes this extremely tricky and will require you to learn how to:
A) Modify elastic load balancer policies to forward protocol information
B) Split apart TCP connections from standard web requests using something like HA PROXY or NGINX. This is necessary in order to handle web socket UPGRADE requests properly, as you will be setting TCP to sticky and web requests to round-robin.
C) Attach your socket.io configuration to a common storage source, like Redis (elasticache).
Once you've figured out what's needed for AWS (or if you've got full control over request routing at your local datacenter), you'll want to architect your SOCKET application to use multicast rooms rather than direct socket messaging.
Example:
To send a message to users in game #4444, emit a message to room 'games:4444', rather than direct to the user's socket.
If your Socket.IO instance is configured to use Redis, Redis will automatically take care of maintaining the list of people who are connected to your 'games:4444' channel. Otherwise you'll need to maintain the list yourself using a database or other shared mechanism.
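A minimal sketch of that setup, assuming Socket.IO v4 with the @socket.io/redis-adapter package and a local Redis instance (package versions, the Redis URL, and the event names are assumptions to adapt to your stack):

    // Room-based broadcasting across multiple Socket.IO servers via Redis.
    const { createServer } = require("http");
    const { Server } = require("socket.io");
    const { createClient } = require("redis");
    const { createAdapter } = require("@socket.io/redis-adapter");

    const httpServer = createServer();
    const io = new Server(httpServer);

    const pubClient = createClient({ url: "redis://localhost:6379" });
    const subClient = pubClient.duplicate();

    Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
      io.adapter(createAdapter(pubClient, subClient));

      io.on("connection", (socket) => {
        // Each player joins the room for their game id.
        socket.on("join-game", (gameId) => socket.join(`games:${gameId}`));

        // Broadcast a move to everyone in the game's room, regardless of
        // which server instance they happen to be connected to.
        socket.on("move", (gameId, move) => {
          io.to(`games:${gameId}`).emit("move", move);
        });
      });

      httpServer.listen(3000);
    });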
Other than that, there are plenty of resources online that can help you figure out each step along the way. I'd start with understanding something like HA PROXY and how it can help split apart your SOCKETS from your web requests.

web socket connection closed when behind proxy

I have a web-sockets-based chat application (HTML5).
Browser opens a socket connection to a java based web sockets server over wss.
When browser connects to server directly (without any proxy) everything works well.
But when the browser is behind an enterprise proxy, the socket connection closes automatically after approximately 2 minutes of inactivity.
Browser console shows "Socket closed".
In my test environment I have a Squid-Dansguardian proxy server.
IMP: this behaviour is not observed if the browser is connected without any proxy.
To keep some activity going, I embedded a simple jQuery script which makes an HTTP GET request to another server every 60 seconds, but it did not help. I still get "Socket closed" in my browser console after about 2 minutes of inactivity.
Any help or pointers are welcome.
Thanks
This seems to me to be a feature, not a bug.
In production applications there is an issue related with what is known as "half-open" sockets - see this great blog post about it.
It happens that connections are lost abruptly, causing the TCP/IP connection to drop without informing the other party to the connection. This can happen for many different reasons - wifi signals or cellular signals are lost, routers crash, modems disconnect, batteries die, power outages...
The only way to detect if the socket is actually open is to try and send data... BUT, your proxy might not be able to safely send data without interfering with your application's logic*.
After two minutes, your proxy assumes that the connection was lost and closes the socket on its end to save resources and allow new connections to be established.
If your proxy didn't take this precaution, on a long enough timeline all your available resources would be taken by dropped connections that would never close, preventing access to your application.
Two minutes is a lot. Heroku sets its proxy timeout to 50 seconds (more reasonable). For HTTP connections, these timeouts are often much shorter.
The best option for you is to keep sending websocket data within the 2 minute timeframe.
The Websocket protocol resolves this issue by implementing an internal ping mechanism - use it. These pings should be sent by the server and the browser responds to them with a pong directly (without involving the javascript application).
The Javascript API (at least on the browser) doesn't let you send ping frames (it's a security thing I guess, that prevents people from using browsers for DoS attacks).
A common practice by some developers (which I think is misguided) is to implement a JSON ping message that is either ignored by the server or answered with a JSON pong.
Since you are using Java on the server, you have access to the Ping mechanism and I suggest you implement it.
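Purely for illustration (the server in the question is Java, but the same protocol-level ping/pong facility exists there), here is the classic heartbeat pattern with Node's "ws" library; the port, interval, and variable names are assumptions:

    // Protocol-level heartbeat: the server pings every client well inside the
    // proxy's idle timeout; browsers answer with a pong frame automatically.
    const WebSocket = require("ws");

    const wss = new WebSocket.Server({ port: 8080 });

    wss.on("connection", (ws) => {
      ws.isAlive = true;
      ws.on("pong", () => { ws.isAlive = true; });
    });

    // Ping every 50 seconds (comfortably under the 2-minute proxy timeout).
    const interval = setInterval(() => {
      wss.clients.forEach((ws) => {
        if (!ws.isAlive) return ws.terminate(); // no pong since last ping: drop it
        ws.isAlive = false;
        ws.ping();
      });
    }, 50000);

    wss.on("close", () => clearInterval(interval));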
I would also recommend (if you have control of the Proxy) that you lower the timeout to a more reasonable 50 seconds limit.
* The situation during production is actually even worse...
Because there is a long chain of intermediaries (home router/modem, NAT, ISP, Gateways, Routers, Load Balancers, Proxies...) it's very likely that your application can send data successfully because it's still "connected" to one of the intermediaries.
This should start a chain reaction that will only reach the application after a while, and again ONLY if it attempts to send data.
This is why Ping frames expect Pong frames to be returned (meaning the chain of connections is intact).
P.S.
You should probably also complain about the Java application not closing the connection after a certain timeout. During production, this oversight might force you to restart your server every so often or experience a DoS situation (all available file handles will be used for the inactive old connections and you won't have room for new connections).
Check squid.conf for the request_timeout value; you can change it there. This will affect more than just web sockets. For instance, in an environment I frequently work in, a Perl script is hit to generate various configurations, and execution can take upwards of 5-10 minutes to complete. The timeout values on both our httpd and the Squid server had to be raised to compensate for this.
Also look at the connect_timeout value; it defaults to one minute.
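A sketch of the two directives mentioned above as they would appear in squid.conf (the values are only illustrative; check your Squid version's defaults and documentation before changing them):

    # squid.conf - raise the timeouts named in the answer above
    request_timeout 10 minutes
    connect_timeout 2 minutes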

Advice on which technology to use for real time notifications

I have X activity sensors connected to a server that inserts data into a database every time a sensor is triggered. What I'm trying to do is create a web interface with a blueprint of the facility (SVG), and whenever a sensor is triggered, besides the DB insert, I want it to show some sort of alert on my blueprint. For that I think I need to keep an open connection to the server.
I was thinking of using web sockets, but it might be overkill since I only need to retrieve data from the server. But running an ajax call every second doesn't sound very efficient either. Are there any other alternatives?
Thank you
Some potential choices include:
WebSocket
Adobe® Flash® Socket
AJAX long polling
AJAX multipart streaming
Forever Iframe
JSONP Polling
Which transport you end up using will depend on your requirements for browser support and on the technology you are using on the server to handle these requests. The transport choice may also depend on your network topology - what types of load balancers you need to integrate with, proxies, etc.
There are many libraries available on both the client and server sides, many of which support more than one of these transports.
For example (not an exhaustive list):
socket.io for nodejs
  WebSocket
  Adobe® Flash® Socket
  AJAX long polling
  AJAX multipart streaming
  Forever Iframe
  JSONP Polling
SignalR for an asp/.net backend
  WebSockets
  Server-Sent Events
  ForeverFrame
  Long Polling
Atmosphere for a java backend
  WebSockets
  Server Side Events (SSE)
  Long-Polling
  Forever frame
  JSONP
IMO - Websockets is NOT overkill for this type of problem and would lend itself nicely to this type of application.
Without specifically discussing frameworks or knowing what is running in the backend of your server(s), we have a few options to consider for the frontend:
Websockets
Websockets are designed for bidirectional communication, although it is kind of shocking how many users are surfing the web in a browser that doesn't support websockets. I always recommend a fallback for this, such as the other methods listed below.
SSE
SSE is an HTML5 spec and is still shaky at best. Try scrolling on a page while an SSE event fires... It may be a little easier on the backend, but it sometimes hangs on the client side, since it runs inside the same thread the DOM is running in.
Long Polling
Keeps your connection open. It doesn't scale well with PHP, but performs swimmingly with Python+Twisted on the backend, or Node.Js
Good Old Ajax
Keep your requests small, and you still have a scalable solution. Yes, a full GET request is the most expensive option, but it is supported in just about every browser rolled out in the past ten years. It is also worth noting that GET requests are easy to scale horizontally with more hardware.
In a perfect world:
You would break up your application into a few components, operating behind a reverse proxy such as Nginx, and then use Node.Js + Socket.IO to handle the realtime aspects of your app.
Another option would be to use small Ajax requests, and offer websocket support for the browsers that support it. This is advice specifically for PHP in the backend.
WebSocket is certainly not overkill. On the contrary. With websockets, you have a bi-directional communication channel; this means, that the server can initiate communication whenever it seems fit (e.g. when sensor data changes).
In a previous project, I have used node.js together with socket.io, to monitor 50+ sensors. Data was updated in real-time in a browser. The data was visualized using smoothie.js.
Whenever a sensor value was updated, it was communicated to the browser. Some sensors only updated once a minute, others once a second, ...
Polling would have been overkill, because it would retrieve all data for all sensors, even from those that were not updated yet.
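As a rough sketch of that kind of setup (Node + Socket.IO v4 assumed; "sensorBus" is a hypothetical stand-in for however your server learns about a trigger - serial port, MQTT, a DB hook, ...):

    // Push sensor triggers to every connected dashboard as they happen.
    const { Server } = require("socket.io");
    const { EventEmitter } = require("events");

    const sensorBus = new EventEmitter(); // hypothetical source of trigger events
    const io = new Server(3000, { cors: { origin: "*" } });

    sensorBus.on("trigger", (sensorId) => {
      // Fan the alert out to every open dashboard; no client ever has to poll.
      io.emit("sensor-alert", { sensorId, at: Date.now() });
    });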
I had a similar problem and did a lot of research on this. As I understand it, there are three main options:
Short polling: Have an endpoint that your javascript client pings every second. This is the worst option, because the pings add latency up to one second to your communication, and depending on how you implement, the endpoint could query the database every second, adding unnecessary overhead.
Long polling: Have an endpoint that your javascript client pings that holds the connection until a) the event occurs or b) the connection times out. If the endpoint returns a response, the client gets the event information. If the endpoint does not return a response, no event has occurred, and the client sends a new request. This is a good option because the events can immediately trigger the response to the client, assuming you have an asynchronous interprocess communication layer (like 0MQ) to send the message without any sort of polling.
Websocket: Have your javascript client connect to a websocket server, which will send a message to your client immediately upon the event trigger.
I think a websocket is your best option, because it accommodates immediate communication of the event without all the request/response overhead. And most importantly, this is exactly what websockets are designed to do! As such, you will probably have to write the least amount of custom code with this solution.
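Tying this back to the server sketch above, the browser counterpart might look like the following; the element ids and CSS class are hypothetical, and it assumes the Socket.IO client script is loaded and the sensors are drawn as SVG elements with ids like "sensor-42":

    // Browser side: highlight the triggered sensor on the SVG blueprint.
    const socket = io("http://localhost:3000");

    socket.on("sensor-alert", ({ sensorId }) => {
      const el = document.getElementById(`sensor-${sensorId}`);
      if (!el) return;
      el.classList.add("alert");                      // e.g. a red pulse defined in CSS
      setTimeout(() => el.classList.remove("alert"), 5000);
    });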
There are two great commercial services that might work for you.
Firebase - a javascript hierarchical database and realtime messaging/synchronization platform; uses websockets and has other fallbacks
PubNub - a real-time message passing and queue system; uses websockets

Updating Messages with Browser Pulling Messages from Server

I am tasked with creating a web page (think twitter) that updates when new messages are added to the database. When a message is removed from the database, it also must be removed from the client. It is possible that multiple clients can be accessing the same messages at the same time. Other actions can occur, such as a stop command issued on the server. Once this happens all the messages on the server will stop showing.
What I am looking for is an architecture for solving this problem.
Technologies that I am using are .Net 4.5, ASP.Net MVC and KnockoutJs. Nodejs could be used, but I’d need to know the benefit of using nodejs over using SignalR.
My currently implementation is using a javascript timer which is polling the server every 30 seconds for new messages. It works, but the polling feels dirty.
Can't comment on ASP.NET - but I have used Node.js together with Knockout for this. I have used both WebSockets (via the socket.io library) and Server-Sent Events (SSE) to push updates to the client model.
Sounds like SSE would be a good fit in this case. The key is that whatever database technology you use should support emitting change events to your Node middleware, so that you can forward them to the browser.
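A minimal SSE sketch on the Node side; "messageBus" is a hypothetical stand-in for your database's change notifications, and the route and event shape are assumptions:

    // Minimal Server-Sent Events endpoint: the browser opens /events with
    // EventSource and receives added/removed message notifications as they occur.
    const http = require("http");
    const { EventEmitter } = require("events");

    const messageBus = new EventEmitter();

    http.createServer((req, res) => {
      if (req.url !== "/events") { res.writeHead(404); return res.end(); }

      res.writeHead(200, {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        Connection: "keep-alive",
      });

      const onChange = (change) => {
        // change = { type: "added" | "removed", message: {...} }
        res.write(`data: ${JSON.stringify(change)}\n\n`);
      };
      messageBus.on("change", onChange);
      req.on("close", () => messageBus.removeListener("change", onChange));
    }).listen(3000);

    // Browser side: new EventSource("/events").onmessage = (e) => applyChange(JSON.parse(e.data));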
After more research, the polling method is the optimal solution for the technologies involved.
The crux of the problem is that there is no notification of a new message which would prompt a change in the system. Currently, a new message is received when it is committed to the database, and SQL Server does not have a notification mechanism (this is not 100 percent true, but it is not a dependency I wish to take on). In the long run, the optimal system would be to implement a publisher/subscriber model using SignalR or nodejs, which would deliver real-time messages to the client. For that to happen, it would require a complete re-architecture of the application.
