I'm writing a web application that has a bunch of microservices. I'm currently exploring how to properly communicate between all these services, and I've decided to stick with a message bus, or more specifically Apache Kafka.
However, I have a few conceptual questions that I'm not sure how to get around.
I'm using an API Gateway service as the main entry point to the application. It acts as the main proxy, forwarding operations to the applicable microservices.
Consider the following scenario:
The user sends a POST request to the API Gateway with some information.
The Gateway produces a new message and publishes it to a Kafka topic.
Subscribed microservices pick up the message from the topic and process the data.
So, how am I now supposed to respond to the client from the Gateway? What if I need some data from that microservice? It feels like the HTTP request could time out. Should I use WebSockets between the client and the API Gateway instead?
Also, if the client sends a GET request to fetch some data, how am I supposed to approach that using Kafka?
Thanks.
Let's say you're going to create an order. This is how it should work:
Traditionally we used an auto-increment field or a sequence in the RDBMS table to create an order id. However, this means the order id is not generated until we save the order in the DB. When writing data to Kafka we are not immediately writing to the DB, and Kafka cannot generate the order id. Hence you need some scalable id-generation utility like Twitter Snowflake, or something with a similar architecture, so that you can generate an order id even before writing the order to Kafka.
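For illustration, here's a minimal sketch of a Snowflake-style generator in Node.js using BigInt. The bit layout (41 bits timestamp, 10 bits worker, 12 bits sequence) follows Snowflake; the custom epoch and the assumption that each gateway instance is assigned a distinct workerId are mine:

// Snowflake-style 64-bit id: 41 bits timestamp | 10 bits worker | 12 bits sequence.
// EPOCH is an arbitrary custom epoch (here 2020-01-01); workerId must be
// unique per instance (assumed to come from config or some coordination service).
const EPOCH = 1577836800000n;
let lastTs = -1n;
let sequence = 0n;

function nextOrderId(workerId) {
  let ts = BigInt(Date.now());
  if (ts === lastTs) {
    sequence = (sequence + 1n) & 4095n; // 12-bit sequence within one millisecond
    if (sequence === 0n) {
      // sequence exhausted for this millisecond: busy-wait for the next one
      while (BigInt(Date.now()) <= lastTs) { /* spin */ }
      ts = BigInt(Date.now());
    }
  } else {
    sequence = 0n;
  }
  lastTs = ts;
  return ((ts - EPOCH) << 22n) | (BigInt(workerId) << 12n) | sequence;
}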
Once you have the order id, write a single event message to a Kafka topic atomically (all-or-nothing). Once this has succeeded, you can send a success response back to the client. Do not write to multiple topics at this stage, as you would lose atomicity by writing to multiple topics. You can always have multiple consumer groups that write the event to multiple other topics; one consumer group should write the data to some persistent DB for querying.
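A minimal sketch of that single write using the kafkajs client; the broker address, the 'orders' topic name, and the order shape are all assumptions:

const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'api-gateway', brokers: ['kafka:9092'] });
const producer = kafka.producer();
const ready = producer.connect(); // connect once at startup, not per request

async function publishOrderCreated(order) {
  await ready;
  // One message to one topic: the write either lands or it doesn't.
  // Keying by order id keeps all events for an order in one partition, in order.
  await producer.send({
    topic: 'orders',
    messages: [{ key: String(order.id), value: JSON.stringify(order) }],
  });
}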
You now need to address read-your-own-writes: immediately after receiving the success response, the user will want to see the order, but your DB is probably not yet updated with the order data. To achieve this, write the order data to a distributed cache like Redis or Memcached immediately after writing it to Kafka and before returning the success response. When the user reads the order, the cached data is returned.
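For example, with the ioredis client (the key scheme and the one-hour TTL are assumptions):

const Redis = require('ioredis');
const redis = new Redis(); // defaults to localhost:6379

// Called after the Kafka write succeeds and before the success response,
// so the user can immediately read their own order back from the cache.
async function cacheOrder(order) {
  await redis.set(`order:${order.id}`, JSON.stringify(order), 'EX', 3600);
}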
Now you need to keep the cache updated with the latest order status. You can always do that with a Kafka consumer that reads the order status from a Kafka topic.
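A sketch of such a consumer with kafkajs, reusing the kafka and redis clients from the sketches above; the 'order-status' topic name and message shape are assumptions:

const consumer = kafka.consumer({ groupId: 'order-cache-updater' });

async function runCacheUpdater() {
  await consumer.connect();
  await consumer.subscribe({ topics: ['order-status'] });
  await consumer.run({
    // refresh the cached order whenever a status event arrives
    eachMessage: async ({ message }) => {
      const evt = JSON.parse(message.value.toString());
      await redis.set(`order:${evt.orderId}`, JSON.stringify(evt), 'EX', 3600);
    },
  });
}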
You don't need to keep all orders in cache memory; you can evict data based on LRU. If, while reading an order, the data is not in the cache, it will be read from the DB and written to the cache for future requests.
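The read path is then the classic cache-aside pattern. In this sketch db.findOrderById is an assumed accessor for your persistent store; Redis itself can do the LRU eviction if configured with maxmemory-policy allkeys-lru:

async function getOrder(orderId) {
  const cached = await redis.get(`order:${orderId}`);
  if (cached) return JSON.parse(cached); // cache hit

  const order = await db.findOrderById(orderId); // cache miss: fall back to DB
  if (order) {
    await redis.set(`order:${orderId}`, JSON.stringify(order), 'EX', 3600);
  }
  return order;
}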
Finally, if you want to ensure that the ordered item is reserved for the order so that no one else can take it, like booking a flight seat or the last copy of a book, you need a consensus mechanism. You can use Apache ZooKeeper for that and create a distributed lock on the item.
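A deliberately simplified sketch with the node-zookeeper-client package: it tries to create an ephemeral znode as the lock, so the reservation disappears automatically if the holder crashes. Note the real ZooKeeper lock recipe uses ephemeral-sequential nodes plus watches to avoid herd effects; the /locks path is an assumption and its parent node must already exist:

const zookeeper = require('node-zookeeper-client');

const client = zookeeper.createClient('zk:2181');
client.connect();

function tryReserveItem(itemId, callback) {
  client.create(
    `/locks/item-${itemId}`,
    zookeeper.CreateMode.EPHEMERAL, // released automatically if we crash
    (error) => {
      if (error) {
        if (error.getCode() === zookeeper.Exception.NODE_EXISTS) {
          return callback(null, false); // someone else already reserved it
        }
        return callback(error);
      }
      callback(null, true); // ours until we delete the node or disconnect
    }
  );
}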
Do you have an option to create more endpoints in the gateway?
I would have the POST endpoint dedicated just to producing the message to the Kafka topic, which the other microservice will consume. As the object returned from the endpoint, include some sort of reference or id for getting the status of the message.
Then create another GET endpoint in the gateway where you can retrieve the status of the message, using the reference you got when you created it.
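A minimal Express sketch of that pair of endpoints. The route names, the in-memory status map, and the UUID reference are assumptions; in production the statuses would live in a shared store that the consuming microservices update:

const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());

const statuses = new Map(); // ref -> status; sketch only

// POST: publish to Kafka (omitted here) and hand back a reference id right away.
app.post('/orders', (req, res) => {
  const ref = crypto.randomUUID();
  statuses.set(ref, 'PENDING');
  // ...produce { ref, ...req.body } to the Kafka topic here...
  res.status(202).json({ ref });
});

// GET: the client polls this with the reference it got back from the POST.
app.get('/orders/:ref/status', (req, res) => {
  const status = statuses.get(req.params.ref);
  if (!status) return res.sendStatus(404);
  res.json({ ref: req.params.ref, status });
});

app.listen(3000);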
I am using the Google Cloud Firebase Realtime database to store messages. The messages are saved in
users/$userID/messages/$topic[0..n]/message[0..n]
I am using the official JS library AngularFire. I am listening on new messages via the following code:
this.observable = this.db.list(`users/${user.uid}/topics/${topic}/`).valueChanges();
I can now subscribe to the observable. Imagine the user has 1 million messages in a given topic. Whenever I add a new message, I receive the 1 million messages in the callback.
My question is, how much data is actually transferred behind the scenes if I modify or add a new message? I know the library keeps a local copy of the database.
On top of that, how do I find out which message was modified? Or do I need to figure that out myself?
If you have an existing listener, and a new message is added, only that new message is sent to that client. You can easily verify this for yourself by looking at the web socket traffic in the network tab of your browser's developer tools.
But I would recommend using a query and a limit to reduce the number of messages retrieved, as it seems unlikely any user will read 1 million messages, and it's wasteful to retrieve (much) more data than the user will see.
Based on the AngularFire documentation on querying lists, that should be something like:
this.db.list(`users/${user.uid}/topics/${topic}/`,
ref => ref.orderByKey().limitToLast(20)
).valueChanges()
Hi, so first I apologize if my question seems unclear; it's my first time trying to do what I'm doing and I don't have a full grasp of the intricacies and lingo, lol.
So basically I'm running a Node.js web server with React handling my front end. I've got Express to help manage user sessions, and I just came across Server-Sent Events as a way to send one-way messages (which is what I need to do). So far I'm able to send updates and messages via cURL on the terminal and by running JS scripts; however, these updates/messages go to every active client session, and I want/need to be able to send them to specific active client sessions/connections.
Example: 5 client connections are established (session IDs A,B,C,D,E), now I want to send an alert message to session E only and manually.
I’m still green with NodeJs/Express and the concept of SSEs however I’m learning as I go for this pet project.
Send help
What you want is how SSE works. It is a dedicated connection between a client and a server process.
however these updates/messages go to every active client session
If that is what you see, then your Node script is running the exact same code for each client.
I think your real question is at a higher level: how to organize the data messaging? That is too big a topic for a single Stack Overflow question, because it will depend on so many factors specific to your use case.
But one way would be to have a SQL database with one record for each user. The Node script polls that table, and if the record for the current user changes, it sends the new data to them. Then, to send data to user E, you just edit the database record for user E.
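A minimal sketch of targeting a single SSE connection in Express, assuming the client passes its session id in the URL (the route and the id scheme are assumptions):

const express = require('express');
const app = express();

const clients = new Map(); // sessionId -> res, one entry per open SSE connection

app.get('/events/:sessionId', (req, res) => {
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  res.flushHeaders();
  clients.set(req.params.sessionId, res);
  req.on('close', () => clients.delete(req.params.sessionId));
});

// Push a message to one session only, e.g. sendTo('E', { alert: 'hello E' }).
function sendTo(sessionId, data) {
  const res = clients.get(sessionId);
  if (res) res.write(`data: ${JSON.stringify(data)}\n\n`);
}

app.listen(3000);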
Long story short, I have been developing a Discord Bot that requires a query to the database every time a message is sent in a server. It will then perform an action depending on the message etc. The query is asynchronous, therefore it will not block another message from being handled.
However, in terms of scalability, I don't believe querying a database every time a message is sent is very fast, and it could become a problem. Is there a better solution? I'm unaware of a way to store data within a particular Discord server, which would likely solve my issue.
My main idea is to have in-memory (heap) storage: the data for the most recently active servers (i.e. those that sent messages recently) is loaded into the heap, and when they become inactive it is removed. Is this a good solution, or is it better to just keep querying every time?
You could create a cache, and every time you fetch something from or insert something into your database, you also write it into the cache.
Then, if you need some data you can check if it's in the cache and if not, get it from the database and store it in the cache right after.
This prevents unnecessary access to the database because the database is only accessed if your bot does not have the required data stored locally.
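A minimal cache-aside sketch of what that could look like; db.getGuildSettings and db.saveGuildSettings are assumed accessors for whatever database you use:

const cache = new Map(); // guildId -> settings

async function getGuildSettings(guildId) {
  if (cache.has(guildId)) return cache.get(guildId); // hit: no DB round trip
  const settings = await db.getGuildSettings(guildId); // miss: query once
  cache.set(guildId, settings);
  return settings;
}

// Keep the cache in sync whenever the bot writes settings itself.
async function updateGuildSettings(guildId, settings) {
  await db.saveGuildSettings(guildId, settings);
  cache.set(guildId, settings);
}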
Note:
The cache will only be cleared when you restart the bot, but of course you can also clear it after a certain amount of time or on other triggers.
If you need an example, you can take a look at my guildMemberAdd event and the corresponding config command
I have built a web application using AngularJS (front-end) and PHP/MySQL (back-end).
I was wondering if there is a way to "watch" the MySQL database (without Node.js), so if one user adds some data to it, the changes are synced to other users too.
E.g. I know Firebase does that, but it's object oriented database and I am unable to do the advanced queries there like I do with SQL.
I was thinking of using $interval and $http to make Ajax requests, so that I could detect changes in the database. That's possible, but it would then make thousands of HTTP requests to the server every day, plus interpret PHP on each request.
I believe nothing is impossible; I just need an idea of how to do this, which I don't have, so that's why I'm asking for help here.
If you want a form of "real-time communication", you'll likely have to incorporate some form of long-polling from the client, unless you use WebSockets, but that's a big topic covering a bunch of different things. You're right to be concerned about bandwidth and demand on the DB, though. So here's my suggestion:
If you don't have experience with WebSockets, then log your events in a separate table/view and use the pub/sub method: subscribe entities to an event and broadcast that event to the table. Then long-poll against the watcher view to see when changes may have occurred; if one did, query for the exact value.
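On the AngularJS side, that first option could look like the sketch below: poll a lightweight event-log endpoint and only fetch the actual rows when something changed. The /events.php endpoint, its response shape, the 'app' module name, and the reloadChangedRows helper are all assumptions:

angular.module('app').controller('SyncCtrl', function ($scope, $http, $interval) {
  var lastEventId = 0;

  $interval(function () {
    // cheap request against the event log, not the real data
    $http.get('/events.php', { params: { since: lastEventId } }).then(function (res) {
      var events = res.data.events;
      if (!events.length) return; // nothing changed, no further work
      lastEventId = events[events.length - 1].id;
      $scope.reloadChangedRows(events); // only now fetch the affected rows
    });
  }, 5000);
});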
Another option would be to use a queue system with "deciders" that hold messages. Take a look at Amazon's SQS platform for a better explanation of how this could work. Basically, you have a queue that holds messages, and a decider chooses where to store each message using some hash or sorting method (to reduce run time). When the client requests an update, the decider finds any messages that apply based on the hash/sort and returns them. Then you just have to decide how and when to destroy the messages.
The second option would require a lot more tinkering, though, so it's really about your preference. I think the difficulty you'll find is that most solutions have to deal with the fact that a message may be delivered one or more times, so you'll need to track when someone has received it and whether it can then be deleted from the queue/event table or you still need to wait. Otherwise you'll consume a lot of memory.
I wrote a web page where there is a zone for user's comments.
Any authenticated users could post a comment.
As many users could post comments almost simultaneously, I want the comments list to be auto-refreshed.
Thus, I think about using WebSockets.
My thought are about a good/best practice for this use case:
Once a comment is posted, should the WebSocket process read the current comments list from the database and send a JSON response containing all the new comments? That would allow the client to directly append the new comments to the DOM (in JS).
Or should the WebSocket just check the database (or a queue, if using a message queue such as Redis or RabbitMQ) and merely signal: "Hey, I have new comments, click here if you want to see them!"? This solution would only announce the presence of new comments without pushing them to the client; retrieving the comments would then be initiated by the client (by clicking that sentence, for instance), e.g. using the traditional Ajax direction: client => server.
It is quite possible that a user posts a comment and then navigates to another page of the website, in which case a WebSocket response containing all the new comments would be wasted. A simple notification would then suffice, as most well-known websites do, for instance with a "+1" counter or, more relevant to the comments scenario, "1 new comment available".
Which way should I choose?
I think deciding which data to push is mostly a matter of UI usability / user experience, as opposed to which technology is used to interact with the server. We should avoid changing the UI with server-pushed data in a way that would negatively surprise the user, for example having the comment feed constantly grow without any intervention from them.
But in the case of a realtime chart, it's probably better to push the data directly into the chart; that is what the user expects.
In the case of the comment feed, the reason most sites go with the "click to load" approach is user experience, so I think that is probably the best approach.
I use a combination of both....
On some pages the WebSocket communication contains the actual data, sort of like a stock ticker update.
In other cases, the WebSocket communication just says: all users viewing xyz data, refresh it. The browser then performs an Ajax request to obtain the new data, and the grid is smartly refreshed so that only the changed cells are modified on screen using innerHTML, new rows are added, and deleted rows are removed.
In cases like Stack Overflow, it makes sense to show a message: "Got new stuff to show, want to see it?"
When I establish the WebSocket in the browser, I pass a page id in the URL, and the cookies are passed too. So the WebSocket server knows the user's cookie and the page being viewed.
Then the database (or middle-tier logic) communicates with the WebSocket server using messages such as: this message is for users viewing page 'xyz': smartly refresh grid 'abc'. And the WebSocket server broadcasts the message.
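A minimal sketch of that with the ws package, assuming the page id arrives as a query parameter on the WebSocket URL:

const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });
const pages = new Map(); // pageId -> Set of sockets currently viewing that page

wss.on('connection', (ws, req) => {
  // client connects to ws://host:8080/?page=xyz
  const pageId = new URL(req.url, 'ws://localhost').searchParams.get('page');
  if (!pages.has(pageId)) pages.set(pageId, new Set());
  pages.get(pageId).add(ws);
  ws.on('close', () => pages.get(pageId).delete(ws));
});

// Called by the middle tier: tell everyone viewing pageId to refresh a grid.
function notifyPage(pageId, gridId) {
  for (const ws of pages.get(pageId) || []) {
    ws.send(JSON.stringify({ action: 'refresh', grid: gridId }));
  }
}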
Because the protocol allows you to pass anything you like, you have the ability to make it anyway you like.
My advice is to do what's best in each particular situation.