I am trying to read multiple JSON files simultaneously, build a single array from the data in those files, and then process that array on a Node.js server.
I would like to read these files and do the processing tasks simultaneously using web workers.
I read a few interesting tutorials and articles about the subject, but none of them clearly explains how to process simultaneous tasks using web workers.
They talk about running a single separated task from the main thread. But I need to do multiple tasks at once.
I also know that creating multiple workers is not recommended according to the documentation of Node.js.
Maybe I have misunderstood how web workers function, or how to implement them in order to perform multiple tasks.
I also tried the threads.js library (https://threads.js.org/), but its documentation is still unclear about running multiple tasks.
Can anyone please explain the best-practice way to implement this kind of work, along with the pros and cons?
I would prefer a vanilla JS solution over a library, so that the explanation can also serve as a reference for other readers.
If possible, could someone also explain how to use the threads.js library, for future reference?
Thank you very much.
As I'm sure you have read, Node.js runs your JavaScript on a single thread, so your code cannot execute in parallel on the main thread. Worker threads do run in parallel, but they are designed for CPU-intensive work and add little for I/O such as reading files, which Node.js already handles asynchronously.
A worker thread is more for longer, CPU-intensive functions that you want to hand off so they don't block the main event loop. Think of uploading and processing an image: we don't want to hang the entire event loop while the image is processed, so we pass it to a worker thread, which tells the event loop when it's done and returns the result.
I think what you may be looking for is simply a promise. Take an array of the JSON file names, like ["file1.JSON", "file2.JSON"]; then, in your promise, loop over it, read the contents of each file, and push or concat the parsed JSON object onto the main array variable.
Once the promise resolves, you would use
.then(() => { /* do your processing of the full array */ })
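A minimal sketch of that idea, assuming each file contains a JSON array and that the built-in fs/promises API is available. The function name readJsonFiles and the file names in the usage comment are made up for illustration:

```javascript
const fs = require('node:fs/promises');

// Start all reads at once, wait for every one of them to finish,
// then merge the parsed contents into a single array.
async function readJsonFiles(paths) {
  const contents = await Promise.all(
    paths.map((path) => fs.readFile(path, 'utf8'))
  );
  // flatMap concatenates the arrays parsed from each file.
  return contents.flatMap((text) => JSON.parse(text));
}

// Usage (hypothetical file names):
// readJsonFiles(['file1.json', 'file2.json'])
//   .then((all) => { /* do your processing of the full array */ })
//   .catch((err) => console.error(err));
```

The reads overlap because the file I/O happens outside the JavaScript thread; only the JSON.parse calls and the later processing run on the main thread.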
Here's an example with a library (node-worker-threads-pool).
Thread/worker management is a complex endeavor, and I would not recommend trying to have some generic solution. Even the library I'm suggesting may not be correct.
// sample.js
const { StaticPool } = require('node-worker-threads-pool');
const start = async function () {
  const staticPool = new StaticPool({
    size: 4,
    task: async function (n) {
      const sleep = async function (ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
      };
      console.log(`thread ${n} started`);
      await sleep(1000 * n);
      return n + 1;
    }
  });
  // start 4 workers, each will run asynchronously and take a longer time to finish
  for (let index = 0; index < 4; index++) {
    staticPool.exec(index)
      .then((result) => {
        console.log(`result from thread pool for thread ${index}: ${result}`);
      })
      .catch((err) => console.error(`Error: ${err}`));
  }
};
start();
I ran this from the command line using node sample.js
As discussed in the other answer, it may not be useful (in terms of performance) to do this, but this example shows how it can be done.
The library also has examples where you give the tasks specific work.
Related
I have a question regarding this topic:
bcrypt.compare() is asynchronous, does that necessarily mean that delays are certain to happen?
Since I'm not allowed to put comments because of my membership level I had to open new topic.
My question is what are the downsides or is there any for using bcrypt.compareSync() instead of the async version of bcrypt.compare().
compareSync() definitely gives the correct result. So why not use it instead of compare() wrapped in Promises? Is it going to stop Node.js from serving other users?
The reason to use the async methods instead of the sync ones is explained quite well in the readme of the project.
Why is async mode recommended over sync mode?
If you are using bcrypt on a simple script, using the sync mode is perfectly fine. However, if you are using bcrypt on a server, the async mode is recommended. This is because the hashing done by bcrypt is CPU intensive, so the sync version will block the event loop and prevent your application from servicing any other inbound requests or events. The async version uses a thread pool which does not block the main event loop.
https://github.com/kelektiv/node.bcrypt.js#why-is-async-mode-recommended-over-sync-mode
So if you are using this in a web application or another environment where you don't want to block the main thread, you should use the async version.
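To illustrate the two modes without installing anything, here is a rough sketch that uses Node's built-in crypto.pbkdf2 and crypto.pbkdf2Sync as stand-ins for bcrypt.compare and bcrypt.compareSync. This is an assumption made for the example, not bcrypt's own API, and the function names, iteration count and key length are arbitrary:

```javascript
const crypto = require('node:crypto');
const { promisify } = require('node:util');

const pbkdf2 = promisify(crypto.pbkdf2);

// Sync mode: the event loop is blocked until the hash is computed.
function hashSync(password, salt) {
  return crypto.pbkdf2Sync(password, salt, 100000, 32, 'sha256').toString('hex');
}

// Async mode: the hash is computed on the thread pool, so the event loop
// stays free to serve other requests while this promise is pending.
async function hashAsync(password, salt) {
  const key = await pbkdf2(password, salt, 100000, 32, 'sha256');
  return key.toString('hex');
}
```

Both functions return the same digest for the same input; the difference is only in whether the main thread waits for it.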
Many of Node.js's native APIs have Sync variants, such as fs.writeFileSync, crypto.hkdfSync, and child_process.execSync. In the browser, native functions that would otherwise block the thread are exposed asynchronously, but the Sync methods in Node.js really do block the thread until the task is complete.
When you use callbacks or promises in Node.js and only asynchronous logic runs internally, you can manage several asynchronous tasks while the main thread carries on with other work (using a counter for callbacks, or Promise.all for promises).
A Sync method runs the next line only after its work is done, so the order of execution is easy to follow and the code is easy to write. However, the main thread is blocked, so you can't do more than one task at a time.
Consider the following example.
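const fs = require('fs');
// Reads the files one after another, blocking the main thread each time
const syncFunc = () => {
  for (let i = 0; i < 100; i++) fs.readFileSync(`/files/${i}.txt`);
  console.log('sync done');
};
// Starts all 100 reads at once and waits for them together
const promiseFunc = async () => {
  await Promise.all(Array.from({length: 100}, (_,i) => fs.promises.readFile(`/files/${i}.txt`)));
  console.log('promise done');
};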
The promise version finishes much faster, provided all 100 txt files can be read without errors.
The same applies to libraries written in C or C++. If you look at the following code, you can see the difference in implementation in C++.
compare
compareSync
In conclusion, I think it's a matter of choice. There is no problem using a Sync method if your code is single-threaded logic where it doesn't matter that the main thread is blocked (like a simple macro). However, if you are writing something like a server, where performance matters and the main thread should be blocked as little as possible, choose Promise or Callback.
Just learning promises.
Javascript is single threaded right?
So when it uses fetch api to make http requests it all happens in one thread?
How does it manage concurrency with PromisePool then?
var p = new Promise(...)
p.then(
  ...//stuff1
)
p.then(
  //stuff2
)
The two then calls above cannot run on multiple threads, right? Just in one thread?
Thanks
Javascript is single threaded right?
No. That's a common over-simplification.
JavaScript runs a main event loop, which can do only one thing at a time.
Generally all your JavaScript will run on that one event loop, so only one piece of JS will run at a time.
However, many JavaScript functions call code which isn't JavaScript. Take fetch in a browser, for example. The responsibility for making the HTTP request is taken care of by the browser outside the main event loop so it can be making multiple requests and waiting for the responses while the JS program continues to run other tasks.
Web Workers (browsers) and Worker Threads (Node.js) are tools to let you move JS code outside the main event loop.
These can be implemented using threads.
I have some code which searches the file system for audio files, and then extracts the metadata from them. Once all the metadata is collected, it is passed on for further processing.
My current implementation uses a for loop with await so that only one file is being processed for metadata at once.
My first attempt tried to do them in parallel and attempting to read hundreds of audio files simultaneously used up all the RAM on my system.
I could switch to Promise Pool and read, for example, 4 files at a time (1 per CPU core) to get the best of both worlds.
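As a rough vanilla-JS sketch of what such a pool does (this is not the API of any particular Promise Pool library, and mapWithLimit and readMetadata are hypothetical names), a fixed number of "lanes" can pull items from a shared queue:

```javascript
// Run `worker(item)` for every item, with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane takes the next unprocessed index until none are left.
  // No locking is needed: only one piece of JS runs at a time.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Usage (hypothetical): read metadata for 4 files at a time.
// const metadata = await mapWithLimit(audioFiles, 4, readMetadata);
```

Results come back in the same order as the input, and memory use is bounded by the limit rather than by the total number of files.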
Javascript is single threaded right?
Yes, one piece of JavaScript code always runs in one agent, and an agent can only execute one function at a time.
So when it uses fetch api to make http requests it all happens in one thread?
No, not really. While your JavaScript code can not run in parallel to other JavaScript code, the browser can do other things in parallel (such as rendering the page, garbage collection, ...) including doing requests for you. Once the response comes back, the result gets handed over back into JavaScript through resolving the promise.
How does it manage concurrency?
Whilst two functions cannot run concurrently, two async functions (or Promise chains) can, because they are composed of multiple execution units, and once one of them finishes a unit the other one can take over:
(async function first() {
  await undefined;
  console.log(2);
  await undefined;
  console.log(4);
})();
(async function second() {
  console.log(1);
  await undefined;
  console.log(3);
})();
So JavaScript does have some form of concurrency, but on the level of functions (or blocks in an async function) and not on the level of single instructions.
To get real parallel execution, you need multiple agents (WebWorkers, ServiceWorkers), they can then also share memory in a limited way.
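For Node.js, the equivalent of a Web Worker is the built-in worker_threads module. A minimal sketch, assuming the worker source is passed inline with the eval option so the example stays in one file (sumInWorker is a made-up name, and the summing loop is just placeholder CPU work):

```javascript
const { Worker } = require('node:worker_threads');

// Run a CPU-bound loop in a separate agent and resolve with its result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const source = `
      const { parentPort, workerData } = require('node:worker_threads');
      let sum = 0;
      for (let i = 1; i <= workerData; i++) sum += i;
      parentPort.postMessage(sum);
    `;
    const worker = new Worker(source, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Several workers started like this run in parallel with each other
// and with the main thread:
// Promise.all([sumInWorker(1e8), sumInWorker(1e8)]).then(console.log);
```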
I have a large array (say, over 1 000 000 elements) that I would like to sort asynchronously so that it doesn't block the execution of the rest of my program.
I'm fairly new to JavaScript, so I was wondering if this would work.
var sortFunction = function (arr) {
  return new Promise(function (resolve, reject) {
    arr.sort();
    resolve(arr);
  });
};
sortFunction(hugeArray).then(function (arr) {
  // do something
});
This is actually possible, but experimental. You need to offload the work to a separate thread. You need to use the sparsely supported but useful SharedArrayBuffer and offload the work to a web worker.
You have to use both the web worker and the SharedArrayBuffer; using only a web worker won't help you, because serializing the array will be too expensive. It has to be a zero-cost copy operation.
Here is an example gist on how to perform this.
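Since the gist itself is not reproduced here, the following is only a sketch of the general idea, written for Node.js worker_threads rather than browser web workers, and assuming the data is numeric so that it fits in a Float64Array. The name sortInWorker is made up for the example:

```javascript
const { Worker } = require('node:worker_threads');

// Sort the numbers stored in a SharedArrayBuffer on another thread.
// The buffer is shared, not copied, so nothing is serialized.
function sortInWorker(shared) {
  return new Promise((resolve, reject) => {
    const source = `
      const { parentPort, workerData } = require('node:worker_threads');
      new Float64Array(workerData).sort(); // sorts the shared memory in place
      parentPort.postMessage('done');
    `;
    const worker = new Worker(source, { eval: true, workerData: shared });
    worker.once('message', () => resolve(new Float64Array(shared)));
    worker.once('error', reject);
  });
}

// Usage: copy the values into shared memory once, then sort off-thread.
// const shared = new SharedArrayBuffer(hugeArray.length * 8);
// new Float64Array(shared).set(hugeArray);
// sortInWorker(shared).then((sorted) => { /* do something */ });
```

The main thread should not read or write the buffer while the worker is sorting it.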
So I started a little project in Node.js to learn a bit about it. It's a simple caching proxy for arch linux's package system as node provides most of the heavy lifting.
This has two "main" phases, server setup and serving.
Then serving has two main phases, response setup and response.
The "main" setup involves checking some files, loading some config from files, and loading some JSON from a web address, then launching the http server and proxy instance with this info.
setup logger/options - read config - read mirrors - read webmirror
start serving
Serving involves checking the request to see if the file exists, creating directories if needed, then providing a response.
check request - check dir - check file
proxy request or serve file
I keep referring to them as synchronisation points but searches don't lead to many results. Points where a set of async tasks have to be finished before the process can complete a next step. Perl's AnyEvent has conditional variables which I guess is what I'm trying to do, without the blocking.
To start with, I found I was "cheating" and using the synchronous versions of functions wherever they were provided, but that had to stop with the web requests, so I started restructuring things. Immediately, most searches led to using async or step to control the flow. At first I tried lots of series/parallel setups, but ran into issues: if there were any async calls underneath, the functions would "complete" straight away and the series would finish.
After much wailing and gnashing of teeth, I ended up with a "waiter" function using async.until that tests for some program state to be set by all the tasks finishing before launching the next function.
// wait for "test" to be true, execute "run",
// bail after "count" tries, waiting "sleep" ms between tries;
function waiter( test, run, count, sleep, message ) {
  var i=0;
  async.until(
    function () {
      if ( i > count ) { return true; }
      logger.debug('waiting for',message, test() );
      return test();
    },
    function (callback) {
      i++;
      setTimeout(callback, sleep );
    },
    function (err) {
      if ( i > count ) {
        logger.error('timeout for', message, count*sleep );
        return;
      }
      run()
    }
  );
}
It struck me as rather large and ugly, and it requires a module to implement something that I thought would be standard, so I am wondering what a better way would be. Am I still thinking in a non-async way? Is there something simple in Node I have overlooked? Is there a standard way of doing this?
I imagine that with this setup, if the program gets complex, there's going to be a lot of nested functions to describe the flow of the program, and I'm struggling to see a good way to lay it all out.
Any tips would be appreciated.
You can't really make everything synchronous. Node.js is designed to run asynchronously (which may of course torment you at times). But there are a few techniques to make it work in a synchronous-looking way (provided the pseudo-code is well thought out and the code is designed carefully):
Using callbacks
Using events
Using promises
Callbacks and events are easy to use and understand. But with these, sometimes the code can get real messy and hard to debug.
But with promises, you can avoid all that. You can make dependency chains, called 'promises' (for instance, perform Promise B only when Promise A is complete).
Earlier versions of node.js had an implementation of promises built in. A promise undertook to do some work and then had separate callbacks that would be executed for success and failure, as well as for handling timeouts.
But in later versions, that was removed. Removing promises from the node.js core made it possible to build modules with different implementations of promises that sit on top of the core. Some of these are node-promise, futures, and promises.
See these links for more info:
Framework
Promises and Futures
Deferred Promise - jQuery
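As an illustration of a promise-based synchronisation point, here is a small sketch that assumes a Node.js version with native Promise and async/await rather than one of the modules above. readConfig and readMirrors are hypothetical stand-ins for the real setup tasks, returning made-up values:

```javascript
// Hypothetical setup tasks; each returns a promise.
const readConfig = async () => ({ port: 8080 });
const readMirrors = async () => ['mirror-a', 'mirror-b'];

async function setup() {
  // Synchronisation point: both tasks run concurrently, and execution
  // continues past this line only once every one of them has finished.
  const [config, mirrors] = await Promise.all([readConfig(), readMirrors()]);
  return { config, mirrors };
}

// The next phase starts only after setup has resolved:
// setup().then(({ config, mirrors }) => { /* start serving */ });
```

If any task rejects, Promise.all rejects as well, so a failed setup step never reaches the serving phase.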
.:Disclaimer:.
Please note that I am trying to achieve something with node.js which might go against its design principles. However as node.js goes against the norm of "javascript is for client side", I am also trying something different here. Please bear with me.
.:Background:.
I have a requirement where JavaScript scripts need to be narrative (read from beginning to end), for simple scripts written by non-expert users. I will also offer an async scripting ability for more advanced users.
I understand the event driven methodology of node.js, and have a solution working based on Async callbacks. Now I am trying to simplify this for our more basic scripting requirements.
.:The Question:.
Is there a way to run a script (using its own sandbox) where execution can be paused while a result is being delivered.
E.g.
var util = require('util'),
    vm = require('vm'),
    user = require('user');
vm.runInThisContext("user.WaitForProceed(); console.log('Waiting over');");
console.log('Finished!');
User is my own module that will do the waiting. Basically I want it to sit there and block on that line in the vm object until it has received a result back. After which it will continue onto the console.log line.
The output of this example is unimportant, as it is also achievable through callbacks. The narrative nature of the example script is more important for this solution.
J
There is no way to pause the execution in Node. At least they tell us so :)
There are some libraries that support advanced flow control, like Invoke; maybe this could help, but I understand that's not what you asked for :)
Also, you could implement a busy-loop using nextTick().
Or you could implement a blocking call in C(++) and provide it as a library. I never did this.
One last way is to call readFileSync() on a named pipe which closes on a certain event.
As you already mentioned, it's against the language principles, therefore these solutions are all hacky.
If you really want to sleep the execution of the Node process, you can. However, as you state, it seems that you're fully aware of the implications.
Here is an NPM module to do this:
https://github.com/ErikDubbelboer/node-sleep
You can use it like so:
var sleep = require('sleep');
sleep.sleep(1); //sleep for 1 sec
sleep.usleep(2000000); //sleep for 2 sec
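If installing a native module is not an option, a similar blocking sleep can be sketched with the standard Atomics.wait, assuming a Node.js version that supports SharedArrayBuffer. This is a different technique from the module above, and sleepSync is a made-up name:

```javascript
// Block the current thread for `ms` milliseconds without a native addon.
function sleepSync(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  // cell[0] is 0 and nothing ever notifies it, so this call simply
  // waits until the timeout expires and then returns 'timed-out'.
  return Atomics.wait(cell, 0, 0, ms);
}

sleepSync(100); // blocks this thread for roughly 100 ms
```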
I partly write this for future visitors who arrive via Google search: you should not use the aforementioned technique. If you decide that you must, be aware that this will block your Node process, which won't be able to do any additional work until the sleep period is over. Additionally, you will violate the expectations of any user who knows it's a Node.js program, as Node.js programs are supposed to be non-blocking.
Good luck.
If you really want to pause execution while waiting for a result, you may try node-sync. It is built on node-fibers. Your application needs to be executed with the node-fibers script instead of node. node-sync adds a sync method to Function.prototype that allows a function to be run synchronously.
Also you need to wrap your call in the fiber (thread) so as not to block the event-loop.
var Sync = require('sync');
var someAsyncFunction = function(a, b, callback) {
  setTimeout(function() {
    var result = a + b;
    callback(undefined, result);
  }, 1000);
};
// Run in a fiber
Sync(function(){
  // This code runs in separate fiber and does not block the event loop
  // First argument is 'this' context
  var result = someAsyncFunction.sync(null, 2, 3);
  // Waiting one second
  console.log(result); // Will output 5
});
// Event loop here
Please be careful with it. You need to understand that this is not the Node way.