Node.js async & sync reading & writing of files - when to use which?

It's a very general question, but I don't quite understand. When would I prefer one over the other? I don't seem to understand what situations might arise, which would clearly favour one over the other. Are there strong reasons to avoid x / use x?

When would I prefer one over the other?
In a server intended to scale and serve the needs of many users, you would only use synchronous I/O during server initialization. In fact, require() itself uses synchronous I/O. In all other parts of your server that handle incoming requests once the server is already up and running, you would only use asynchronous I/O.
There are other uses for node.js besides creating a server. For example, suppose you want to create a script that parses through a giant file and looks for certain words or phrases. Such a script is designed to run by itself, processes one file, has no persistent server functionality, and has no particular reason to do I/O from multiple sources at once. In that case, it's perfectly OK to use synchronous I/O. For example, I created a node.js script that ages backup files (removing backup files that meet some particular age criteria), and my computer automatically runs that script once a day. There was no reason to use asynchronous I/O for that type of use, so I used synchronous I/O and it made the code simpler to write.
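That kind of standalone script might look something like the following minimal sketch (not the actual script; the backup directory, the 30-day threshold, and the deletion logic are made up for illustration):
// prune-backups.js -- hypothetical standalone maintenance script; sync I/O keeps it simple
const fs = require('fs');
const path = require('path');

const BACKUP_DIR = '/backups';                 // assumed location
const MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000;   // assumed threshold: 30 days

for (const name of fs.readdirSync(BACKUP_DIR)) {
    const file = path.join(BACKUP_DIR, name);
    if (Date.now() - fs.statSync(file).mtimeMs > MAX_AGE_MS) {
        fs.unlinkSync(file);  // nothing else is waiting on this thread, so blocking is fine
        console.log('removed', file);
    }
}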
I don't seem to understand what situations might arise, which would clearly favour one over the other. Are there strong reasons to avoid x / use x?
Avoid ever using synchronous I/O in the request handlers of a server. Because of the single-threaded nature of Javascript in node.js, synchronous I/O blocks the node.js Javascript thread so it can only do one thing at a time (which is death for a multi-user server), whereas asynchronous I/O does not block the Javascript thread (allowing it to potentially serve the needs of many users).
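To make that concrete, here is a minimal sketch (not from the question) of an HTTP handler: the async read leaves the Javascript thread free to accept other requests, whereas the sync variant would stall every connection for the duration of the read.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
    // Async: the JS thread is free to handle other requests while this file is read.
    fs.readFile('./data.json', (err, data) => {
        if (err) {
            res.writeHead(500);
            return res.end('read failed');
        }
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(data);
    });
    // By contrast, fs.readFileSync('./data.json') here would stall every other
    // connection until this one request's read had finished.
}).listen(3000);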
In non-multi-user situations (code that is only doing one thing for one user), synchronous I/O may be favored because writing the code is easier and there may be no advantages to using asynchronous I/O.
I thought of an electron application with nodejs, which is simply reading a file and did not understand what difference that would make really, if my software really just has to wait for that file to load anyways.
If this is a single user application and there's nothing else for your application to be doing while waiting for the file to be read into memory (no sockets to be responding to, no screen updates, no other requests to be working on, no other file operations to be running in parallel), then there is no advantage to using asynchronous I/O so synchronous I/O will be just fine and likely a bit simpler to code.

When would I prefer one over the other?
Use the non-Sync versions (the async ones) unless there's literally nothing else you need your program to do while the I/O is pending, in which case the Sync ones are fine; see below for details...
Are there strong reasons to avoid x / use x?
Yes. NodeJS runs your JavaScript code on a single thread. If you use the Sync version of an I/O function, that thread is blocked waiting on I/O and can't do anything else. If you use the async version, the I/O can continue in the background while the JavaScript thread gets on with other work; the I/O completion will be queued as a job for the JavaScript thread to come back to later.
If you're running a foreground Node app that doesn't need to do anything else while the I/O is pending, you're probably fine using Sync calls. But if you're using Node for processing multiple things at once (like web requests), best to use the async versions.
In a comment you added under the question you've said:
I thought of an electron application with nodejs, which is simply reading a file and did not understand what difference that would make really, if my software really just has to wait for that file to load anyways.
I have virtually no knowledge of Electron, but I note that it uses a "main" process to manage windows and then a "rendering" process per window (link). That being the case, using Sync functions will block the relevant process, which may affect application or window responsiveness. But I don't have any deep knowledge of Electron (more's the pity).
Until somewhat recently, using async functions meant using lots of callback-heavy code which was hard to compose:
// (Obviously this is just an example, you wouldn't actually read and write a file this way, you'd use streaming...)
fs.readFile("file1.txt", function(err, data) {
if (err) {
// Do something about the error...
} else {
fs.writeFile("file2.txt", data, function(err) {
if (err) {
// Do something about the error...
} else {
// All good
});
}
});
Then promises came along and if you used a promisified* version of the operation (shown here with pseudonyms like fs.promisifiedXYZ), it still involved callbacks, but they were more composable:
// (See earlier caveat, just an example)
fs.promisifiedReadFile("file1.txt")
    .then(function(data) {
        return fs.promisifiedWriteFile("file2.txt", data);
    })
    .then(function() {
        // All good
    })
    .catch(function(err) {
        // Do something about the error...
    });
Now, in recent versions of Node, you can use the ES2017+ async/await syntax to write synchronous-looking code that is, in fact, asynchronous:
// (See earlier caveat, just an example)
(async () => {
    try {
        const data = await fs.promisifiedReadFile("file1.txt");
        await fs.promisifiedWriteFile("file2.txt", data);
        // All good
    } catch (err) {
        // Do something about the error...
    }
})();
Node's API predates promises and has its own conventions. There are various libraries out there to help you "promisify" a Node-style callback API so that it uses promises instead. One is promisify but there are others.
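As an illustration (standing in for the promisifiedXYZ pseudonyms above), Node's built-in util.promisify can wrap the callback-style functions, and newer versions of Node also ship a ready-made promise API as fs.promises:
const fs = require("fs");
const { promisify } = require("util");

// Wrap the callback-style API with util.promisify...
const readFilePromised = promisify(fs.readFile);
const writeFilePromised = promisify(fs.writeFile);

// ...or, on newer Node versions, use the built-in promise API directly:
// const { readFile, writeFile } = require("fs").promises;

(async () => {
    try {
        const data = await readFilePromised("file1.txt");
        await writeFilePromised("file2.txt", data);
        // All good
    } catch (err) {
        // Do something about the error...
    }
})();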

Is it ever better to use Node's filesystem sync methods over the same async methods?

This is a question about performance more than anything else.
Node exposes three different types of methods to accomplish various filesystem tasks:
Promises API (async)
Callback API (async)
Synchronous API (sync)
I've read more articles and stackoverflow answers than I can count, all of which claim you never need the sync methods.
I recently wrote a script which required a couple of directories to be created if they didn't already exist. While doing this, I noticed that if I used the async/await methods (primarily fs.promises.mkdir and fs.promises.access), the event loop would simply continue to the next async bit of code, regardless of the fact that the next bits require those directories. This is expected behavior; after all, it's async.
I understand this could be solved with a nice little callback-hell session, but that isn't the question; the question is whether the promises API really can be used in place of all the other methods.
The question then becomes:
Is it ever better to use Node's filesystem sync methods over the same async methods?
Is it ever truly required in situations like this to block the process?
Or said differently:
Is it possible to completely avoid sync methods and ONLY use the promises api (NOT promises + callbacks)?
It seems like using the sync methods (given my situation above, where the directories are required to be there before any other call is made) can be EXTREMELY useful to write readable, clear code, even though it may negatively impact performance.
With that being said, there's an overwhelming amount of information out there saying that the sync API is completely useless and never required.
Again, this purely concerns the promises API. Yes, callbacks and promises are both async, but the difference between the job and message queues makes the two APIs completely different in this context.
PS: For additional context, I've provided a code sample below so you don't have to imagine my example ;)
Thanks! :)
// Checks if dir exists; if not, creates it. (Not the actual code, just an example.)
const fs = require("fs");

// Sync version
if (!fs.existsSync(dirPath)) {
    fs.mkdirSync(dirPath);
}

// Async version (inside an async function)
try {
    await fs.promises.access(dirPath);
} catch {
    await fs.promises.mkdir(dirPath);
}
It depends on the situation. The main benefit of the sync methods is that they allow for easier consumption of their results, and the main disadvantage is that they prevent all other code from executing while working.
If you find yourself in a situation where it doesn't matter that other code can't respond to events while the I/O runs (that is, the code in question has no chance of, or reason for, running in parallel with anything else), you might consider it reasonable to use the sync methods.
For example, you would definitely not want to use the sync methods inside, say, a server handling a request.
If your code requires reading some configuration files (or creating some folders) when the script first runs, and there aren't enough of them such that parallelism would be a benefit, you can consider using the sync methods.
That said, even if your current implementation doesn't require parallelism, keep in mind that the situation can change. If you later find that you do need to allow for parallel processing, you won't have to change any of your existing code if you started out with the promise-based methods in the first place. And if you understand the language, using promises properly is fairly easy, so if there's a chance of that, you might consider using the promises anyway.
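For example, here is a minimal sketch of that promise-based startup (the directory names are made up for illustration): mkdir with { recursive: true } resolves even if the directory already exists, so no separate existence check is needed, and awaiting it guarantees nothing downstream runs before the directories are in place.
const fs = require('fs');
const path = require('path');

async function main() {
    const dirs = ['./cache', './logs']; // hypothetical directories
    // mkdir with { recursive: true } succeeds even if the directory already exists.
    await Promise.all(dirs.map((d) => fs.promises.mkdir(d, { recursive: true })));

    // Nothing below runs until the directories exist.
    await fs.promises.writeFile(path.join('./logs', 'startup.log'), 'started\n');
}

main().catch(console.error);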

Writing custom, true Asynchronous functions in Javascript/Node

How do the NodeJS built in functions achieve their asynchronicity?
Am I able to write my own custom asynchronous functions that execute outside of the main thread? Or do I have to leverage the built in functions?
Just a side note: "true asynchronous" doesn't really mean anything by itself, but we can assume you mean parallelism.
Now, depending on what you're doing, you might find there is little to no benefit in using threads in Node. Take, for example, Node's file system: as long as you don't use the sync versions, it will automatically run multiple requests in parallel, because Node just passes those requests to worker threads.
That's why the common statement that Node is single-threaded is actually incorrect; it's only the JS engine that is. You can even prove this by looking at the number of threads a Node.js process uses in your process monitor of choice.
So then you might ask: why do we have worker threads in Node? Well, the V8 JS engine that Node uses is pretty fast these days, so let's say you wanted to calculate PI to a million digits using JS. You could do this in the main thread, but it would be a shame not to use the extra CPU cores that modern PCs have and to keep the main thread free for other things while PI is being calculated in another thread.
So what about file I/O in Node: would it benefit from being done in a worker thread? That depends on what you do with the result of the file I/O. If you were just reading and then writing blocks of data, then no, there would be no benefit. But if, say, you were reading a file and then doing some heavy calculation on it in JavaScript (e.g. some custom image compression), then a worker thread would help.
So, in a nutshell, worker threads are great when you need to use JavaScript for heavy calculations; using them for simple I/O may in fact slow things down, due to IPC overheads.
You don't mention in your question what you're trying to run in parallel, so it's hard to say whether doing so would be of benefit.
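To make the "heavy calculation in a worker" case concrete, here is a minimal sketch using Node's built-in worker_threads module (the busy loop and the numbers are purely illustrative):
// worker-demo.js -- run with: node worker-demo.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
    // Main thread: spawn a worker and keep doing other things while it runs.
    const worker = new Worker(__filename, { workerData: 50000000 });
    worker.on('message', (sum) => console.log('worker result:', sum));
    worker.on('error', console.error);
    console.log('main thread is not blocked while the worker runs');
} else {
    // Worker thread: a CPU-heavy loop as a stand-in for a real calculation.
    let sum = 0;
    for (let i = 0; i < workerData; i++) sum += Math.sqrt(i);
    parentPort.postMessage(sum);
}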
JavaScript is single-threaded; if you want to create 'threads' you can use https://nodejs.org/api/worker_threads.html.
But you may have heard about async functions and promises in JavaScript. An async function returns a promise by default, and promises are NOT threads. You can create an async function like this:
async function toto() {
    return 0;
}

toto().then((d) => console.log(d));
console.log('hello');
Here you will see hello displayed first, then 0.
But remember that the .then() callback only runs later because it's a promise; it is not running in parallel, it is just executed later.

bcrypt.compare() or bcrypt.compareSync()

I have a question regarding this topic:
bcrypt.compare() is asynchronous, does that necessarily mean that delays are certain to happen?
Since I'm not allowed to post comments because of my membership level, I had to open a new topic.
My question is: what are the downsides, if any, of using bcrypt.compareSync() instead of the async version, bcrypt.compare()?
compareSync() definitely gives the correct result. So why not use it instead of compare() wrapped in promises? Is it going to stop Node.js from serving other users?
The reasons to use the async methods instead of the sync ones are explained quite well in the project's readme:
Why is async mode recommended over sync mode?
If you are using bcrypt on a simple script, using the sync mode is perfectly fine. However, if you are using bcrypt on a server, the async mode is recommended. This is because the hashing done by bcrypt is CPU intensive, so the sync version will block the event loop and prevent your application from servicing any other inbound requests or events. The async version uses a thread pool which does not block the main event loop.
https://github.com/kelektiv/node.bcrypt.js#why-is-async-mode-recommended-over-sync-mode
So if you are using this in a web application or another environment where you don't want to block the main thread, you should use the async version.
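For example, here is a minimal, self-contained sketch of the async usage (the password and cost factor are made up; in a real application the hash would come from your user store):
const bcrypt = require('bcrypt');

async function demo() {
    const hash = await bcrypt.hash('s3cret', 10); // 10 = cost factor (salt rounds)

    // Async compare: the CPU-heavy work runs off the main event loop,
    // so the JS thread stays free to serve other requests in the meantime.
    const ok = await bcrypt.compare('s3cret', hash);
    console.log('password matches:', ok);

    // bcrypt.compareSync('s3cret', hash) returns the same boolean, but the
    // comparison runs on the JS thread and blocks everything else while it works.
}

demo().catch(console.error);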
Node.js native modules offer Sync variants of many methods, such as fs.writeFileSync, crypto.hkdfSync, and child_process.execSync. In the browser, every native function that would require blocking a thread is implemented asynchronously, but the Sync methods in Node.js really do block the thread until the task is complete.
When using callbacks or promises in Node.js, as long as only asynchronous logic is executed internally, you can manage asynchronous tasks while proceeding with other work without stopping the main thread (using a counter for callbacks, or Promise.all for promises).
A Sync method only runs the next line once the work is done, so it is easy to follow the order of execution and easy to code. However, the main thread is blocked, so you can't do more than one task at a time.
Consider the following example.
const fs = require('fs');

const syncFunc = () => {
    for (let i = 0; i < 100; i++) fs.readFileSync(`/files/${i}.txt`);
    console.log('sync done');
};

const promiseFunc = async () => {
    await Promise.all(Array.from({ length: 100 }, (_, i) => fs.promises.readFile(`/files/${i}.txt`)));
    console.log('promise done');
};
The promise function finishes much faster, assuming all 100 txt files can be read without problems.
This Sync distinction applies equally to libraries written in C. If you look at the following code, you can see the difference in the C++ implementations:
compare
compareSync
In conclusion, I think it's a matter of choice. There is no problem using the Sync methods if your code runs as a single sequence where it doesn't matter that the main thread is blocked (like a simple macro). However, if you are writing something where performance matters, such as a server, and the main thread should be blocked as little as possible, choose promises or callbacks.

Will Node.js get blocked when processing large file uploads?

Will Node.js get blocked when processing large file uploads?
Since Node.js only has one thread, is it true that while a large file upload is being processed, all other requests get blocked?
If so, how should I handle file uploads in nodejs?
Node.js handles all I/O operations using multiple threads internally; it's the programming interface to that I/O functionality that is single-threaded, event-based, and asynchronous.
So the big upload in your example is performed by a separate thread that's managed by Node.js, and when that thread completes its work, your callback is put onto the event loop queue.
It is CPU-intensive work that blocks. Say we have a task compute() that needs to run almost continuously and does some CPU-intensive calculations; while it runs on the main thread, nothing else can be processed.
As for the main question, "How should I handle file uploads in nodejs?": check in your code (or the library you use) where the file is saved on the server. Does it depend on writeFile() or writeFileSync()? If it uses writeFile(), it's asynchronous; if it uses writeFileSync(), it's the synchronous version.
Update, in response to a comment:
"The answer 'No, it won't block' is correct but explanation is completely wrong. JS is in one thread AND I/O is in one (same) thread. Event loop / asynchronous processing / callbacks make this possible. No multiple threads required." - andrey-sidorov
There is no async API for file operations, so Node.js uses a thread pool for them. You can see this in the code of libuv. If you look at the source for fs.readFile in lib/fs.js, you'll see binding.read. Whenever you see binding in Node's core modules, you're looking at a portal into the land of C++. This binding is made available using NODE_SET_METHOD(target, "read", Read). If you know any C, you might think this is a macro – it was originally, but it's now a function.
Going back to ASYNC_CALL in Read, one of the arguments is read: the read syscall. But wait, doesn't that function block?
Yes, but that's not the end of the story. An Introduction to libuv notes the following:
"The libuv filesystem operations are different from socket operations. Socket operations use the non-blocking operations provided by the operating system. Filesystem operations use blocking functions internally, but invoke these functions in a thread pool and notify watchers registered with the event loop when application interaction is required."
Summary:
The Node API method writeFile() is asynchronous, but that doesn't necessarily mean it's non-blocking underneath. As the libuv book points out, socket (network) code is non-blocking, but filesystem operations are more complicated. Some things are event-based (kqueue), others use a thread pool (as in this case).
For more information, consider going through the C code on which Node.js is built:
Unix fs.c
Windows fs.c
That depends on the functions used for accomplishing that task. If you are using asynchronous functions, then Node.js will not block. But there are also synchronous functions, for example fs.readFileSync (FileSystem Doc), that will block execution.
Just take care and choose asynchronous functions. This way Node.js will keep running while slow tasks/waits are completed by external libraries. Once those tasks are completed, the Event Loop will take care of the result and execute your callbacks.
You can read more about the Event Loop here: Understanding the Node.js event loop
That is the exact reason node.js is asynchronous.
For most (all?) functions in node.js that involve input/output operations (where the bottleneck is some device other than the CPU or RAM), the operation happens on a separate thread, letting your node.js server run other code while waiting.
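To make the non-blocking approach to uploads concrete, here is a minimal sketch (the URL check and target path are placeholders, not a production upload handler): piping the request stream into a write stream writes the upload in chunks as they arrive, without blocking the Javascript thread or buffering the whole file in memory.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
    if (req.method === 'POST' && req.url === '/upload') {
        const out = fs.createWriteStream('./upload.bin'); // placeholder target path
        // Each chunk is written to disk as it arrives; other requests keep being served.
        req.pipe(out);
        out.on('finish', () => res.end('upload complete\n'));
        out.on('error', () => { res.statusCode = 500; res.end('write failed\n'); });
    } else {
        res.end('ok\n');
    }
}).listen(3000);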

Struggling with async synchronisation in node.js

So I started a little project in Node.js to learn a bit about it. It's a simple caching proxy for arch linux's package system as node provides most of the heavy lifting.
This has two "main" phases, server setup and serving.
Then serving has two main phases, response setup and response.
The "main" setup involves checking some files, loading some config from files. loading some json from a web address. Then launching the http server and proxy instance with this info.
setup logger/options - read config - read mirrors - read webmirror
start serving
Serving involves checking the request to see if the file exists, creating directories if needed, then providing a response.
check request - check dir - check file
proxy request or serve file
I keep referring to them as synchronisation points, but searches don't lead to many results: points where a set of async tasks has to be finished before the process can complete the next step. Perl's AnyEvent has condition variables, which I guess is what I'm trying to do, without the blocking.
To start with I found I was "cheating" and using the synchronous versions of functions wherever they were provided, but that had to stop with the web requests, so I started restructuring things. Most searches immediately led to using async or Step to control the flow. At first I tried lots of series/parallel setups, but I ran into issues: if there were any async calls underneath, the functions would "complete" straight away and the series would finish.
After much wailing and gnashing of teeth, I ended up with a "waiter" function using async.until that tests for some program state to be set by all the tasks finishing before launching the next function.
// wait for "test" to be true, execute "run",
// bail after "count" tries, waiting "sleep" ms between tries;
function waiter( test, run, count, sleep, message ) {
var i=0;
async.until(
function () {
if ( i > count ) { return true; }
logger.debug('waiting for',message, test() );
return test();
},
function (callback) {
i++;
setTimeout(callback, sleep );
},
function (err) {
if ( i > count ) {
logger.error('timeout for', message, count*sleep );
return;
}
run()
}
);
}
It struck me as rather large and ugly, and as requiring a module to implement something I thought was standard, so I'm wondering what a better way would be. Am I still thinking in a non-async way? Is there something simple in Node I have overlooked? Is there a standard way of doing this?
I imagine that with this setup, if the program gets complex, there are going to be a lot of nested functions describing the flow of the program, and I'm struggling to see a good way to lay it all out.
Any tips would be appreciated.
You can't really make everything synchronous. Node.js is designed to perform asynchronously (which may of course torment you at times). But there are a few techniques to make it work in a synchronous way (provided the pseudo-code is well thought out and the code is designed carefully):
Using callbacks
Using events
Using promises
Callbacks and events are easy to use and understand, but with these the code can sometimes get really messy and hard to debug.
With promises, you can avoid all that. You can make dependency chains of promises (for instance, run promise B only when promise A is complete).
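For the "synchronisation point" described in the question (a set of async tasks that must all finish before the next step), the promise-based equivalent of the waiter is simply Promise.all. A rough sketch in modern syntax, with made-up task names and file paths:
const fs = require('fs');

// Hypothetical setup tasks, standing in for "read config", "read mirrors", etc.
const readConfig = () => fs.promises.readFile('./config.json', 'utf8');
const readMirrors = () => fs.promises.readFile('./mirrors.json', 'utf8');

async function setup() {
    // Synchronisation point: nothing after this line runs until both tasks finish.
    const [config, mirrors] = await Promise.all([readConfig(), readMirrors()]);
    console.log('ready to start serving', config.length, mirrors.length);
}

setup().catch(console.error);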
Earlier versions of node.js had an implementation of promises. They promised to do some work and then had separate callbacks that would be executed for success and failure, as well as handling timeouts.
But in later versions that was removed. By removing them from core node.js, it became possible to build modules with different implementations of promises that sit on top of the core. Some of these are node-promise, futures, and promises.
See these links for more info:
Framework
Promises and Futures
Deferred Promise - jQuery
