How to run concurrent infinite jobs - JavaScript

I have multiple jobs (functions) that process data in my DB.
They should run indefinitely and concurrently, and I was wondering about the
best way to run them. Should I write a bash script that starts node somejob.js for each job, use Node workers from a JavaScript file, or use some other method altogether?

While I don't know the best way to do such a job, I have recently worked on a similar problem.
Since this is a broad question, I will illustrate with an example of a mailing service.
I was asked to build a mailing service that other services can use
to queue dissimilar raw emails with custom templates until they are sent.
A program (or worker) that I named the dispatcher runs indefinitely and checks the DB for queued
emails. It fetches at most n queued emails (our email provider has a threshold),
sends them concurrently, waits a few seconds, and repeats.
To run the dispatcher indefinitely I used async.forever.
To run concurrent jobs, I used async.map.
You can do this in JS itself instead of using a bash file (for cron-style scheduling). You can find lots of other useful methods in the async library.
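For illustration, here is a minimal sketch of such a dispatcher using async.forever and async.map; fetchQueuedEmails and sendEmail are hypothetical stand-ins for your own DB query and mailer call:

    const async = require('async');

    const BATCH_SIZE = 10; // at most n emails per cycle (provider threshold)
    const PAUSE_MS = 5000; // pause between cycles

    // Hypothetical stubs -- replace with your own DB query and mailer call.
    function fetchQueuedEmails(limit, cb) { cb(null, []); }
    function sendEmail(email, cb) { cb(null); }

    async.forever(
      (next) => {
        fetchQueuedEmails(BATCH_SIZE, (err, emails) => {
          if (err) return next(err);
          // Send the fetched batch concurrently.
          async.map(emails, sendEmail, (err) => {
            if (err) return next(err);
            setTimeout(next, PAUSE_MS); // wait a few seconds, then repeat
          });
        });
      },
      (err) => console.error('dispatcher stopped:', err)
    );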

Related

How to use pdfkit npm in async manner

I have written an application in node.js which takes input from the user and generates PDF files based on a few templates.
I am using the pdfkit npm package for this purpose. My application is running in production, but it is very slow, for the reasons below:
What problem I am facing:
It works in a sync manner. To give an example: suppose a request comes to the application to generate a PDF; it starts processing, and after processing it returns the response with the generated PDF URL. But if multiple requests come to the server, it processes each request one by one (in a sync manner).
All requests in the queue have to wait until the previous one is finished.
Most of the time my application gives a Timeout or Internal Server Error.
Why I cannot change the library:
There are 40 templates I have written in JS for pdfkit, and each template is 1000-3000 lines.
If I change the lib, I have to rewrite those templates for the new library.
It would take many months to rewrite and test them properly.
What solution I am using now:
I am managing a queue: once a request comes in, it gets queued and a confirmation message is sent back in the response to the user.
Why this solution is not feasible:
The user should be given a valid PDF URL on success of the request. But with the queue approach, the user gets only a confirmation message, and the PDF is processed later in the queue.
What kind of solution am I seeking now?
Is there any way to make this application multi-threaded/asynchronous, so that it is capable of handling multiple requests at a time without blocking resources?
Please save my life.
I hate to break it to you, but doing computation in the order tasks come in is a pretty fundamental part of node. It sounds like loading these templates is a CPU-bound task, and since Node is single-threaded, it knocks these off the queue in the order they come in.
On the other hand, any framework would have a similar problem. Node being single-threaded means it's actually very efficient, because it doesn't lose cycles to context switching.
How many PDF-generations can your program handle at once? What type of hardware are you running this on? If it's failing on a few requests a second, then there's probably a programming fix.
For node, the more things you can make asynchronous the better. For example, any time you're reading a file in, it should be asynchronous.
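For example, a generic sketch of the difference when loading a template file (the file name here is just illustrative):

    const fs = require('fs');

    // Blocking: the event loop is stuck until the whole file is read.
    const template = fs.readFileSync('template.js', 'utf8');

    // Non-blocking: the event loop stays free to serve other requests
    // while the file is read in the background.
    fs.readFile('template.js', 'utf8', (err, template) => {
      if (err) throw err;
      // ... build the PDF from the template here
    });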
Can you post the code for one of your PDF-creating request functions?

Task queuing in RabbitMQ

I'm trying to set up task queuing with RabbitMQ, coupled with Node.js & React.js. I'm having trouble understanding how the task management actually works, and I cannot find a good example online.
I'm trying to send a task (like generating a lot of images on a user click) to a queue so it does not block user navigation. Could anyone guide me through the process?
I have my RabbitMQ server up and running and am able to send/receive messages. I'm just having trouble converting this into a task management tool (like sending/receiving task-related data). Any help/examples are welcome here!
Here is an example of how The Grid is "Handling resource-intensive tasks with work queues (task queues) in RabbitMQ": all computationally intensive work at The Grid (such as image analysis and image processing) is off-loaded as tasks/jobs to RabbitMQ. Instead of having a web server wait for a result immediately, it is free to keep processing other requests.
RabbitMQ task queues are also used to distribute time-consuming tasks among multiple workers; the main idea behind using task queues (for them) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. A task can also be scheduled to be done later.
Another example is the architecture behind CloudAMQP. It is built upon multiple small microservices, with RabbitMQ used as the messaging system. RabbitMQ is responsible for distributing events/tasks to the services that listen for them, and you can send a message without having to know whether another service is able to handle it immediately. Tasks simply wait in the queue until the responsible service is ready.
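As a rough sketch of such a work queue in Node.js, assuming the amqplib package (the queue name, message shape, and generateImages function are illustrative):

    const amqp = require('amqplib');

    // Producer -- e.g. in the HTTP handler for the user's click.
    async function enqueueTask(task) {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('image_tasks', { durable: true });
      ch.sendToQueue('image_tasks', Buffer.from(JSON.stringify(task)), { persistent: true });
      await ch.close();
      await conn.close();
    }

    // Consumer -- a separate worker process.
    async function startWorker() {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('image_tasks', { durable: true });
      ch.prefetch(1); // hand each worker one task at a time
      ch.consume('image_tasks', async (msg) => {
        const task = JSON.parse(msg.content.toString());
        await generateImages(task); // hypothetical long-running job
        ch.ack(msg); // acknowledge only after the work is done
      });
    }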

Can I use cluster in a node.js express app and still spawn child_process workers for specific requests?

EDIT:
I'm simplifying my question, because while @saintedlama's response is helpful information, it is tangential to what I'm trying to understand about using more than a single Node process.
The crux of it is: how do I, or can I, manage manually spawned child processes, given the app is already running using Node's native cluster module?
Original question(s) below
I have an express.js app, the main function is to accept http requests and serve http responses via some MongoDB queries. For performance, the app uses node's native cluster module, and is spawned across available CPUs as worker processes at app start.
I now have some specific queries that may be long running - connecting to external services and APIs.
Is it worth spawning these specific queries in their own workers (using node's child_process)?
And if so, how will this be affected by the existing use of cluster?
Alternatively (or additionally), if I set up a persistent worker queue using something like Monq or Agenda, given that I'm using cluster, how can I control which process handles the queue?
Spawning long-running queries off to a forked worker may not yield any benefit, depending on how much of the work is done in Node.js JavaScript processing.
Node.js does all IO processing (and the queries are IO) in a dedicated background thread (pool), so your Node.js JavaScript process is not blocked while the database system processes the query.
If you do a lot of query-result post-processing in JavaScript, forking may yield benefits, since the JavaScript processor is blocked while that post-processing runs.
Using a job queue to run these queries asynchronously has benefits, since you can start developing with job processors in the same process and later scale out easily by deploying job processors to dedicated machines in your environment. But be careful with this approach: very large query results may slow down your job queue.
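For instance, a minimal sketch with Agenda, which the question mentions (the job name and MongoDB address are illustrative). Running this file as its own dedicated process also answers the "which process handles the queue" question: only the process that calls agenda.start() will pick up jobs.

    // worker.js -- run as its own process, outside the cluster workers
    const Agenda = require('agenda');
    const agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/agenda' } });

    agenda.define('external api query', async (job) => {
      const { url } = job.attrs.data;
      // ... call the slow external service here, then store the result
    });

    (async () => {
      await agenda.start(); // only this process polls and runs jobs
    })();

    // From any web worker, enqueue without processing:
    // await agenda.now('external api query', { url: someUrl });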

How does Node.js's internal thread pool work exactly?

I have read a lot of articles about how Node.js works, but I still cannot figure out exactly how Node.js's internal threads handle IO operations.
In this answer https://stackoverflow.com/a/20346545/1813428 , he said there are 4 internal threads in the thread pool of Node.js for processing IO operations. So what if I have 1000 requests coming in at the same time, and every request wants to do an IO operation like retrieving an enormous amount of data from the database? Node.js will deliver these requests to those 4 worker threads without blocking the main thread, so the maximum number of IO operations Node.js can handle at the same time is 4. Am I wrong?
If I am right, where will the remaining requests be handled? The main single thread is non-blocking and keeps driving requests to the corresponding operators, so where do these requests go while all the worker threads are full of tasks?
In the image below, all of the internal worker threads are full of tasks; assume all of them need to retrieve a lot of data from the database while the main single thread keeps driving new requests to these workers. Where will these requests go? Is there an internal task queue to store them?
The single, per-process thread pool provided by libuv creates 4 threads by default. The UV_THREADPOOL_SIZE environment variable can be used to alter the number of threads created when the node process starts, up to a maximum value of 1024 (as of libuv version 1.30.0).
When all of these threads are blocked, further requests to use them are queued. The API method to request a thread is called uv_queue_work.
This thread pool is used for any system calls that will result in blocking IO, which includes local file system operations. It can also be used to reduce the effect of CPU-intensive operations, as @Andrey mentions.
Non-blocking IO, as supported by most networking operations, doesn't need to use the thread pool.
If the source code for the database driver you're using is available and you're able to find reference to uv_queue_work then it is probably using the thread pool.
The libuv thread pool documentation provides more technical details, if required.
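You can observe the queuing behaviour with a small demo using crypto.pbkdf2, which runs on the libuv pool:

    const crypto = require('crypto');

    // With the default pool of 4 threads, four hashes run in parallel and
    // the fifth waits in libuv's queue for a free thread, so it finishes
    // noticeably later. Run with UV_THREADPOOL_SIZE=5 and all five finish
    // at roughly the same time.
    const start = Date.now();
    for (let i = 1; i <= 5; i++) {
      crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', () => {
        console.log(`task ${i} finished after ${Date.now() - start} ms`);
      });
    }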
In the image below, all of the internal worker threads are full of tasks; assume all of them need to retrieve a lot of data from the database while the main single thread keeps driving new requests to these workers
This is not how node.js uses those threads.
As per Node.js documentation, the threads are used like this:
All requests and responses are "handled" in the main thread. Your callbacks (and code after await) simply take turns to execute. The "loop" between the javascript interpreter and the "event loop" is usually just a while loop.
Apart from worker_threads that you yourself start, there are only 4 things node.js uses threads for: waiting for DNS responses, disk I/O, the built-in crypto library, and the built-in zip library. worker_threads are the only place where node.js executes JavaScript outside the main thread. All other uses of threads execute C/C++ code.
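For completeness, a tiny worker_threads sketch, the one built-in way to run JavaScript off the main thread:

    const { Worker, isMainThread, parentPort } = require('worker_threads');

    if (isMainThread) {
      // Main thread: spawn a worker running this same file.
      const worker = new Worker(__filename);
      worker.on('message', (msg) => console.log('from worker:', msg));
    } else {
      // Worker thread: has its own event loop and V8 instance.
      parentPort.postMessage('hello from another thread');
    }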
If you want to know more, I've written several answers to related questions:
Node js architecture and performance
how node.js server is better than thread based server
node js - what happens to incoming events during callback excution
Does javascript process using an elastic racetrack algorithm
Is there any other way to implement a "listening" function without an infinite while loop?
No, the main use case for the thread pool is offloading CPU-intensive operations. IO is performed in one thread: you don't need multiple threads if you are waiting for external data in parallel, and the event loop is exactly a technique to organise execution flow so that you wait for as much as possible in parallel.
Example:
You need to send 100 emails with a question (y/n) and then one more with the number of "y" answers. It takes about 30 seconds to write an email, two hours on average to get a reply, and 10 seconds to read a response. You start by writing all 100 emails (50 minutes of time), then you wait for the alert sound that wakes you up every time a reply arrives, and as you receive answers you increase the count of "y". In about 2 hours and 50 minutes you're done. This is an example of async IO and an event loop (no thread pools).
Blocking example: send an email, wait for the answer, repeat. Takes roughly eight days (four if you can clone yourself).
Async thread pool example: each response is in a language you don't know. You have 4 translator friends. You email the text to them, and they email the translated text back to you. (Or, more accurately: you print the text and put it into a "needs translation" folder; whenever a translator is available, the text is pulled from the folder.)
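In code, the event-loop version of the email example looks roughly like this, where askQuestion is a hypothetical function returning a Promise that resolves to 'y' or 'n':

    // Fire off all 100 "emails" up front, then count the "y" replies as
    // they arrive -- total time is about one round trip, not one hundred.
    async function survey(recipients) {
      const replies = await Promise.all(recipients.map(askQuestion));
      return replies.filter((r) => r === 'y').length;
    }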

Moving node.js server javascript processing to the client

I'd like some opinions on the practical implications of moving processing that would traditionally be done on the server to be handled instead by the client in a node.js web app.
Example case study:
The user uploads a CSV file containing a year's worth of their bank statement entries. We want to parse the file, categorise each entry, and calculate cumulative values for each category so that we can store the newly categorised statement in a db and display spending analysis to the user.
The entries are categorised by matching strings in the descriptions. There are many categories and many entries and it takes a fair amount of time to process.
In our node.js server, we can happily free up the event loop whilst waiting for network responses and so on, but if there is any data crunching or similar processing, the server will be blocked from responding to requests, and this seems unavoidable.
Traditionally, the CSV file would be passed to the server; the server would process it, save the result in the db, and send back the output of the processing.
It seems to make sense in our single-threaded node.js server for this processing to be handled by the browser, with the output displayed and sent to the server to be stored. Of course the client will have to wait while this is done, but their processing will not prevent the server from responding to requests from other clients.
I'm interested to see if anyone has experience building apps using this model.
So, the question is.. are there any issues in getting browsers rather than the server to handle, wherever possible, any processing that will block the event loop? Is this a good/sensible/viable approach to node.js application development?
I don't think trusting client-processed data is a good idea.
Instead, you should look into creating a work queue that a separate process listens on, separating the CPU-intensive tasks from your node.js process that handles HTTP requests.
My proposed data flow would be (a sketch follows the list):
1. HTTP upload request
2. App server (saves the raw file somewhere the worker process can access it)
3. Notification to the 'csv' work queue
4. Worker processes the uploaded CSV file
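A rough sketch of that flow, assuming an Express app and a Redis-backed queue such as bull (the route, queue name, and helper functions are illustrative):

    const Queue = require('bull');
    const csvQueue = new Queue('csv', 'redis://127.0.0.1:6379');

    // In the HTTP app server: save the upload, enqueue a job, respond at once.
    app.post('/upload', async (req, res) => {
      const path = await saveUploadedFile(req); // hypothetical helper
      await csvQueue.add({ path, userId: req.user.id });
      res.json({ status: 'queued' });
    });

    // In a separate worker process: parse and categorise off the web process.
    csvQueue.process(async (job) => {
      const statement = await parseAndCategorise(job.data.path); // hypothetical
      await saveToDb(job.data.userId, statement);
    });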
Although perfectly possible, simply shifting the processing to the client machine does not solve the basic problem.
Now the client's event loop is blocked, preventing the user from interacting with the browser. Browsers tend to detect this problem and stop execution of the page's script altogether, which your users will certainly hate.
There is no way around either delegating or splitting up the work-load.
Using a second process (for example a 2nd node instance) for doing the number crunching server-side has the added benefit of allowing the operating system to use a 2nd CPU core. Ideally you run as many Node instances as you have CPU cores in the server and balance your work-load between them. Have a look at the diode module for some inspiration on how to implement multi-process communication in node.
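A minimal sketch of that pattern with Node's built-in cluster module (./server is an illustrative module that starts your HTTP server):

    const cluster = require('cluster');
    const os = require('os');

    if (cluster.isMaster) {
      // One worker per CPU core; the OS schedules them across the cores.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    } else {
      require('./server'); // each worker runs the server / number cruncher
    }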
