Task queuing in RabbitMQ - javascript

I'm trying to set up task queuing with RabbitMQ, coupled to Node.js and React. I'm having trouble understanding how the task management actually works, and I cannot find a good example online.
I'm trying to dispatch a task (like generating a lot of images on a user click) to a queue so it does not block user navigation. Could anyone guide me through the process?
I have my RabbitMQ server up and running, and I am able to send/receive messages. I'm just having trouble turning this into a task management tool (i.e. sending/receiving task-related data). Any help/examples are welcome!

Here is an example of how The Grid handles resource-intensive tasks with work queues (task queues) in RabbitMQ: all computationally intensive work at The Grid (such as image analysis and image processing) is off-loaded as tasks/jobs to RabbitMQ. Instead of having a web server wait for a result immediately, it is free to keep processing other requests.
RabbitMQ task queues are also used to distribute time-consuming tasks among multiple workers, and the main idea behind using task queues (for them) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. A task can also be scheduled to be done later.
Another example is the architecture behind CloudAMQP. It is built from multiple small microservices, with RabbitMQ as the messaging system. RabbitMQ is responsible for distributing events/tasks to the services that listen for them, and you can send a message without having to know whether another service can handle it immediately. Tasks simply wait in the queue until the responsible service is ready.
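To make the pattern concrete, here is a minimal sketch of a task producer and worker using amqplib (the client library RabbitMQ's own Node.js tutorials use). The queue name, payload shape, and connection URL are assumptions for illustration, not taken from the question:

```javascript
// Build a task message: a plain JSON payload describing the work to do.
function makeTask(userId, imageCount) {
  return Buffer.from(
    JSON.stringify({ type: 'generate-images', userId, imageCount })
  );
}

// Producer: called from the HTTP handler; returns as soon as the task
// is queued, so the response can be sent immediately.
async function enqueueTask(task) {
  const amqp = require('amqplib'); // npm install amqplib
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('image_tasks', { durable: true });
  ch.sendToQueue('image_tasks', task, { persistent: true });
  await ch.close();
  await conn.close();
}

// Worker: a separate node process that consumes tasks one at a time.
async function startWorker() {
  const amqp = require('amqplib');
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('image_tasks', { durable: true });
  ch.prefetch(1); // take one task at a time
  ch.consume('image_tasks', async (msg) => {
    const task = JSON.parse(msg.content.toString());
    // ...do the slow image generation for task.userId here...
    ch.ack(msg); // acknowledge only after the work is done
  });
}
```

Because the worker acknowledges a task only after finishing it, a task whose worker crashes mid-way is redelivered to another worker, which is what makes the queue usable for "fire and forget" work from the web tier.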

Related

What is the Best Approach for Notifying a Next.js Frontend about a Completed Process from a Backend?

Situation:
I'm building a Next.js frontend that communicates with a Spring Boot backend through a Next.js API (BFF). The backend performs a lengthy process (about 5 minutes) and returns status code 202 to indicate that the process has been accepted and is running asynchronously. Now, I need to notify the Next.js API and frontend client when the process is completed. What is the best approach for this notification?
Possibilities that I found:
Firebase Cloud Message
WebSocket
Server-Sent Events (SSE)
Short/Long Polling
What would be the best approach?
Because cloud providers usually have short timeouts and bill per unit of execution time, I wondered what the best approach to this scenario would be.
Analysing each possibility:
Please feel free to correct any wrong analyses I made when comparing. And feel free to add more possibilities.
Firebase Cloud Messaging:
Firebase Cloud Messaging seems like a good option, as it only notifies the client when there is a message to deliver, and there are no problems with timeouts or costs.
However, it is usually used for push notifications, and delivery of messages is not guaranteed, which makes it less suitable for this scenario.
Another positive point about Cloud Messaging is that the BFF does not need to receive the message, because Cloud Messaging does not reveal the origin of the publisher.
WebSocket:
In this approach, the Next.js API establishes a persistent connection with the Spring Boot backend using WebSockets. The backend can then send a message to the API to notify it when the process is finished.
The biggest issue with this approach is the need to keep a WebSocket connection open with each client on the Next.js API, which does not scale as well as the first option.
Additionally, there is a bigger problem here: popular Next.js hosting providers like Vercel do not support WebSockets, since their functions are stateless and have a maximum execution duration, making it impossible to maintain a WebSocket connection.
PS: I'm using Vercel, so it's not an available option.
Server-Sent Events (SSE):
In this approach, the Next.js API establishes a persistent connection with the Spring Boot backend using SSE. The backend can then send an event to the API to notify it when the process is finished.
However, this approach suffers from the same problem as WebSockets, since the Next.js API has to keep a connection open for each client.
Also, some hosting providers may not support SSE for the same reason as WebSockets.
Short/Long Polling
In this approach, the Next.js frontend periodically sends a request to the Next.js API (BFF) to check the status of the process in the Spring Boot backend.
This is the least performant and cost-effective option, as each poll counts as a separate request and is charged by the cloud provider. As an example of the performance problem: if you use a polling interval of 60 seconds to avoid too many charges, but the backend finishes the process in 63 seconds, you will wait 120 seconds for the result instead of 63.
Looking at the options, however, polling seems to be the only viable one, because it ensures delivery and is supported by all cloud providers.
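If polling does turn out to be the only viable option, a minimal short-polling helper on the frontend could look like the following sketch. The fetchStatus callback and its { done } response shape are assumptions, not part of the actual backend:

```javascript
// Poll fetchStatus until it reports the job is done, or give up after
// maxAttempts. fetchStatus is expected to resolve to { done: boolean }.
async function pollUntilDone(fetchStatus, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status.done) return status; // process finished
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('polling timed out');
}
```

Usage from the browser would be something like `pollUntilDone(() => fetch('/api/status').then((r) => r.json()))`; the interval is the knob that trades provider cost against the worst-case extra wait described above.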
Questions
Are there any other possibilities for this scenario that I may have missed?
What is the best approach for this scenario and why?

Managing multiple long-running tasks concurrently in JS (Node.js)

Golang developer here, trying to learn JS (Node.js).
I'm used to working with goroutines in Go, which for the sake of simplicity let's assume are just threads (actually they're not exactly threads, more like Green Threads, but bear with me!).
Imagine now that I want to create some kind of service that can run some endlessTask which, for example, could be a function that receives data from a websocket and keeps an internal state updated, which can be queried later on. I want to be able to serve multiple users at the same time, and each of them should also be able to stop their specific ongoing task at some point. In Go, I could just spawn a goroutine for my endlessTask and store some kind of session in the request dispatcher to keep track of which user each task belongs to.
How can I implement something like this in JS? I looked through Node.js API documentation and I found some interesting things:
Cluster: doesn't seem to be exactly what I'm looking for
Child processes: could work, but I'd be spawning 1 process per client/user and the overhead would be huge I think
Worker threads: that's more like it, but the documentation states that they "are useful for performing CPU-intensive JavaScript operations" and "Node.js built-in asynchronous I/O operations are more efficient than Workers can be"
I'm not sure how I could handle this scenario without multi-threading or multi-processing. Would the worker threads solution be viable in this case?
Any input or suggestion would be appreciated. Thanks!
Imagine now that I want to create some kind of service that can run some endlessTask which, for example, could be a function that receives data from a websocket and keeps an internal state updated
So, rather than threads, you need to be thinking in terms of events and event handlers since that's the core of the nodejs architecture, particularly for I/O. So, if you want to be able to read incoming webSocket data and update some internal state when it arrives, all you do is set up an event handler for the incoming webSocket data. That event handler will then get called any time there's data waiting to be read and the interpreter is back to the event loop.
You don't have to create any thread structure for that or any type of loop or anything like that. Just add the right event handler and let it call you when there's incoming data available.
Now, I want to be able to serve multiple users at the same time and each of them can also stop their specific ongoing task at some point.
Just add an event listener to each webSocket and your nodejs server will easily serve multiple users. When the user disconnects their webSocket, the listener automatically goes away with it. There's nothing else to do or cleanup in that regard unless you want to update the internal state, in which case you can also listen for the disconnect event.
In Go, I could just spawn a goroutine for my endlessTask, store some kind of session in the request dispatcher to keep track to which user each task belongs.
I don't know goroutines, but there are lots of options for storing the user state. If it's just info that you need to be able to get to when you already have the webSocket, and you don't need it to persist beyond that, then you can just add the state directly to the webSocket object. That object will be available any time you get a webSocket event, so you can always update it when there's incoming data. You can also put the state in other places (a database, or a Map object indexed by socket, by username, or by whatever you need to look it up by) - it really depends on what exactly the state is.
I'm not sure how I could handle this scenario without multi-threading or multi-processing. Would the worker threads solution be viable in this case?
What you have described doesn't sound like anything that would require clustering, child processes or worker threads unless something you're doing with the data is CPU intensive. Just using event listeners for incoming data on each webSocket will let nodejs' very efficient and asynchronous I/O handling kick into gear. This is one of the things it is best at.
Keep in mind that I/O in nodejs may be a little inside-out from what you're used to. You don't create a blocking read loop waiting for incoming data on the webSocket. Instead, you just set up an event listener for incoming data and it will call you when incoming data is available.
The time you would involve clustering, child processes or Worker Threads is when you have more CPU processing in your Javascript than a single core can handle. I would only go there if/when you've proven you have a scalability issue with the CPU usage in your nodejs server. Then, you'd want to pursue an architecture that adds just a few other processes or threads to share the load (not one per connection). If you have specific CPU-heavy operations (custom encryption or compression are classic examples), then it may help to create a few other processes or Worker Threads that just handle a work queue for the CPU-heavy work. Or, if it's just about increasing the overall CPU cycles available to process incoming data, then you would probably go to clustering: let each incoming webSocket get assigned to a cluster worker and still use the same event handling logic previously described, but now the webSockets are split across several processes, so you have more CPU to throw at them.

How to run concurrent infinite jobs

I have multiple jobs (functions) that process data in my DB.
They should run indefinitely and concurrently. I was wondering about the best way to run them: should I write a bash file that starts node somejob.js for each job, use node workers from a JavaScript file, or some other method altogether?
While I don't know the best way to do such a job, I have recently worked on a similar problem.
Since this is a broad question, I will use a mailing service as an illustration.
I was asked to build a mailing service that other services can use to queue dissimilar raw emails with custom templates until they are sent.
A program (or worker) which I named dispatcher runs indefinitely and checks for queued emails in the DB. It fetches at most n queued emails (our email service has some threshold), sends them concurrently, waits for some seconds, and does that again.
To run dispatcher indefinitely I have used async.forever.
To run concurrent jobs, I have used async.map.
It seems you can do this in JS itself instead of using a bash file (or some cron setup). You can find lots of other useful methods in the async library.
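The same dispatcher loop can be sketched with plain promises; async.forever and async.map from the async package are the library equivalents of the loop and the concurrent send below. fetchQueued and sendEmail are stand-ins for the DB fetch and the actual mail call:

```javascript
// Run forever: fetch at most batchSize queued emails, send them
// concurrently, pause, repeat. keepRunning is injectable so the loop
// can be stopped (and tested).
async function dispatch({
  fetchQueued,
  sendEmail,
  batchSize = 10,
  pauseMs = 5000,
  keepRunning = () => true,
}) {
  let sent = 0;
  while (keepRunning()) {
    const batch = await fetchQueued(batchSize);       // at most n queued emails
    await Promise.all(batch.map(sendEmail));          // send them concurrently
    sent += batch.length;
    await new Promise((resolve) => setTimeout(resolve, pauseMs)); // wait, then again
  }
  return sent;
}
```

Making the loop's collaborators parameters rather than hard-coded calls is what keeps an "infinite" job testable and stoppable.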

How to use pdfkit npm in async manner

I have written an application in node.js which takes input from the user and generates PDF files based on a few templates.
I am using the pdfkit npm package for this purpose. My application is running in production, but it is very slow, for the reasons below:
What problem I am facing :
It works in a synchronous manner. To give an example: suppose a request comes in to generate a PDF; the app starts processing and, after processing, returns a response with the generated PDF URL. But if multiple requests come to the server, it processes each request one by one (synchronously).
All requests in the queue have to wait until the previous one is finished.
Most of the time my application returns a Timeout or an Internal Server Error.
Why I cannot change the library:
There are 40 templates I have written in JS for pdfkit, and each template is 1,000-3,000 lines.
If I change the lib, I have to rewrite those templates for the new library.
It would take many months to rewrite and test them properly.
What solution I am using now :
I am managing a queue now: once a request comes in, it gets queued and a confirmation message is sent back to the user in the response.
Why this solution is not feasible ?
The user should be given a valid PDF URL upon success of the request. But with the queue approach, the user only gets a confirmation message, and the PDF is processed later in the queue.
What kind of solution I am seeking now ?
Is there any way to make this application multi-threaded/asynchronous, so that it can handle multiple requests at a time without blocking?
Please save my life.
I hate to break it to you, but doing computation in the order tasks come in is a pretty fundamental part of node. It sounds like rendering these templates is a CPU-bound task, and since Node is single-threaded, it knocks these off the queue in the order they come in.
On the other hand, any framework would have a similar problem. Node being single-threaded actually makes it very efficient, because it doesn't lose cycles to context switching.
How many PDF-generations can your program handle at once? What type of hardware are you running this on? If it's failing on a few requests a second, then there's probably a programming fix.
For node, the more things you can make asynchronous the better. For example, any time you're reading a file in, it should be asynchronous.
Can you post the code for one of your PDF-creating request functions?

Can I use cluster in a node.js express app and still spawn child_process workers for specific requests?

EDIT:
I'm simplifying my question because, while #saintedlama's response is helpful information, it is tangential to what I'm trying to understand about using more than a single node process.
The crux of it is: how do I, or can I, manage manually spawned child processes, given that the app is already running using node's native cluster module?
Original question(s) below
I have an express.js app whose main function is to accept http requests and serve http responses via some MongoDB queries. For performance, the app uses node's native cluster module and is spawned across the available CPUs as worker processes at app start.
I now have some specific queries that may be long running - connecting to external services and APIs.
Is it worth spawning these specific queries to their own workers (using node's child_process)?
And if so, how will this be affected by the existing use of cluster?
Alternatively (or as well) if I set up a persistent worker queue using something like Monq or Agenda - and given I'm using cluster, how can I control which process handles the queue?
Spawning long-running queries off to a forked worker may not yield any benefit, depending on how much of the work is node.js JavaScript processing.
Node.js does all IO processing (the queries are IO) in a dedicated background thread (pool), so your node.js JavaScript process is not blocked while the database system processes the query.
If you do a lot of query-result post-processing in JavaScript, it may yield benefits, since the JavaScript processor is blocked while that post-processing runs.
Using a job queue to run these queries asynchronously has benefits, since you can start developing with job processors in the same process and later scale out easily by deploying job processors to dedicated machines in your environment. But be careful with this approach: very large query results may slow down your job queue.
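As a starting point for the "job processors in the same process" suggestion, here is a minimal in-memory job queue sketch; Monq or Agenda implement the same idea backed by MongoDB, which is what later lets processors move to dedicated machines. The processor callback and job shape are assumptions for illustration:

```javascript
// Minimal in-process job queue: jobs are processed one at a time in the
// background while the caller continues immediately.
class JobQueue {
  constructor(processor) {
    this.processor = processor; // async function handling one job
    this.jobs = [];
    this.results = [];
    this.running = false;
  }

  push(job) {
    this.jobs.push(job);
    if (!this.running) this.drain(); // kick off processing if idle
  }

  async drain() {
    this.running = true;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift();
      this.results.push(await this.processor(job)); // one job at a time
    }
    this.running = false;
  }
}
```

Swapping the in-memory array for a persistent store is the only structural change needed to scale out, which is why starting in-process is a low-risk first step.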
