So I started a little project in Node.js to learn a bit about it. It's a simple caching proxy for Arch Linux's package system, since Node provides most of the heavy lifting.
This has two "main" phases, server setup and serving.
Then serving has two main phases, response setup and response.
The "main" setup involves checking some files, loading some config from files and loading some JSON from a web address, then launching the HTTP server and proxy instance with this info.
setup logger/options - read config - read mirrors - read webmirror
start serving
Serving involves checking the request to see if the file exists, creating directories if needed, then providing a response.
check request - check dir - check file
proxy request or serve file
I keep referring to them as synchronisation points, but searches don't turn up many results. By that I mean points where a set of async tasks has to finish before the process can move on to the next step. Perl's AnyEvent has condition variables, which I guess is what I'm trying to do, without the blocking.
To start with I found I was "cheating" by using the synchronous versions of any functions where they were provided, but that had to stop with the web requests, so I started restructuring things. Immediately, most searches led to using async or step to control the flow. At first I tried lots of series/parallel setups, but ran into issues: if there were any async calls underneath, the functions would "complete" straight away and the series would finish.
After much wailing and gnashing of teeth, I ended up with a "waiter" function using async.until that tests for some program state to be set by all the tasks finishing before launching the next function.
// wait for "test" to be true, execute "run",
// bail after "count" tries, waiting "sleep" ms between tries;
function waiter( test, run, count, sleep, message ) {
  var i=0;
  async.until(
    function () {
      if ( i > count ) { return true; }
      logger.debug('waiting for',message, test() );
      return test();
    },
    function (callback) {
      i++;
      setTimeout(callback, sleep );
    },
    function (err) {
      if ( i > count ) {
        logger.error('timeout for', message, count*sleep );
        return;
      }
      run();
    }
  );
}
It struck me as being rather large and ugly, and as needing a module to implement something I thought was standard, so I am wondering what a better way would be. Am I still thinking in a non-async way? Is there something simple in Node I have overlooked? Is there a standard way of doing this?
I imagine with this setup, if the program gets complex, there's going to be a lot of nested functions describing the flow of the program, and I'm struggling to see a good way to lay it all out.
Any tips would be appreciated.
You can't really make everything synchronous. Node.js is designed to work asynchronously (which may, of course, torment you at times). But there are a few techniques to make things run in sequence (provided the pseudo-code is well thought out and the code is designed carefully):
Using callbacks
Using events
Using promises
Callbacks and events are easy to use and understand, but with them the code can sometimes get really messy and hard to debug.
Promises let you avoid all that. You can build dependency chains of promises (for instance, start task B only when promise A has resolved).
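As a rough sketch of how this looks with the promises built into JavaScript today (no library needed), here is the setup phase from the question rewritten so that `Promise.all` is the synchronisation point. `readConfig`, `readMirrors`, `readWebMirrors` and `startServing` are hypothetical stand-ins that simply resolve after a timer; real code would do file and HTTP reads there.

```javascript
// Hypothetical async setup steps; each returns a promise.
const delay = (ms, value) => new Promise((resolve) => setTimeout(resolve, ms, value));
const readConfig = () => delay(20, { port: 8080 });
const readMirrors = () => delay(10, ["mirror-a", "mirror-b"]);
const readWebMirrors = () => delay(15, ["https://mirror.example/"]);

// Runs only once every setup promise has resolved.
function startServing(config, mirrors, webMirrors) {
  return `serving on ${config.port} with ${mirrors.length + webMirrors.length} mirrors`;
}

function setup() {
  // Promise.all resolves when all three tasks have finished,
  // or rejects as soon as one of them fails.
  return Promise.all([readConfig(), readMirrors(), readWebMirrors()])
    .then(([config, mirrors, webMirrors]) => startServing(config, mirrors, webMirrors));
}

setup()
  .then((message) => console.log(message))
  .catch((err) => console.error("setup failed:", err));
```

No polling is involved: the next step is triggered by the last task finishing, not by a timer checking some shared state.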
Early versions of Node.js had an implementation of promises. They promised to do some work and then had separate callbacks that would be executed for success and failure, as well as handling timeouts.
In later versions that was removed. Removing them from core Node.js made it possible to build modules with different promise implementations that sit on top of the core. Some of these are node-promise, futures, and promises. (Promises have since become part of JavaScript itself in ES2015, so current versions of Node support them natively.)
See these links for more info:
Framework
Promises and Futures
Deferred Promise - jQuery
This is a question about performance more than anything else.
Node exposes three different types of methods to accomplish various filesystem tasks:
Promises API (async)
Callback API (async)
Synchronous API (sync)
I've read more articles and Stack Overflow answers than I can count, all of which claim that you never need the sync methods.
I recently wrote a script which required a couple of directories to be made if they didn't already exist. While doing this, I noticed that if I used the async/await methods (primarily fs.promises.mkdir and fs.promises.access), the event loop would simply continue to the next async bit of code, regardless of the fact that the next bits require those directories. This is expected behavior; after all, it's async.
I understand this could be solved with a nice little callback-hell session, but that isn't the question; the question is whether the promises API can be used in place of all the other methods.
The question then becomes:
Is it ever better to use Node's filesystem sync methods over the same async methods?
Is it ever truly required in situations like this to block the process?
Or said differently:
Is it possible to completely avoid sync methods and ONLY use the promises api (NOT promises + callbacks)?
It seems like using the sync methods (given my situation above, where the directories are required to be there before any other call is made) can be EXTREMELY useful to write readable, clear code, even though it may negatively impact performance.
That being said, there's an overwhelming amount of information saying that the sync API is completely useless and never required.
Again, this is purely about the promises API. Yes, callbacks and promises are both async, but the difference between the job (microtask) queue and the message queue makes the two APIs completely different in this context.
PS: For additional context, I've provided a code sample so you don't have to imagine my example ;)
Thanks! :)
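// Checks if dir exists, if not, creates it. (not the actual code, just an example)
const fs = require("fs");
// Sync version
if (!fs.existsSync(dirPath)) {
  fs.mkdirSync(dirPath);
}
// Async version (await needs an async function, or top-level await in an ES module)
async function ensureDir(dirPath) {
  try {
    await fs.promises.access(dirPath);
  } catch {
    await fs.promises.mkdir(dirPath);
  }
}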
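For this particular case, a minimal promise-only sketch. `fs.promises.mkdir` with `{ recursive: true }` does not reject when the directory already exists, so the access/mkdir pair isn't needed, and any code after the `await` in the same async function only runs once the directory exists. Code outside that function does not wait unless it awaits the function too. The `writeLog` helper and the directory layout are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: ensure `dir` exists, then write `text` to dir/name.
async function writeLog(dir, name, text) {
  // recursive: true creates missing parents and does not reject
  // if the directory already exists, so no access() check is needed.
  await fs.promises.mkdir(dir, { recursive: true });
  // Nothing below runs until the mkdir above has finished.
  const file = path.join(dir, name);
  await fs.promises.writeFile(file, text);
  return file;
}

// Usage: everything that needs the directory goes after the await.
async function main() {
  const base = await fs.promises.mkdtemp(path.join(os.tmpdir(), "demo-"));
  const file = await writeLog(path.join(base, "logs", "2024"), "app.log", "hello\n");
  console.log("wrote", file);
  return file;
}

main().catch((err) => console.error(err));
```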
It depends on the situation. The main benefit of the sync methods is that they allow for easier consumption of their results, and the main disadvantage is that they prevent all other code from executing while working.
If you find yourself in a situation where other code not being able to respond to events is not an issue, you might consider it to be reasonable to use the sync methods - if the code in question has no chance of or reason for running in parallel with anything else.
For example, you would definitely not want to use the sync methods inside, say, a server handling a request.
If your code requires reading some configuration files (or creating some folders) when the script first runs, and there aren't enough of them such that parallelism would be a benefit, you can consider using the sync methods.
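A sketch of that split for a small HTTP server, assuming a JSON config file with hypothetical `greeting` and `root` fields: the config is read once with `readFileSync` before the server starts, while the request handler only uses the promise-based API.

```javascript
const fs = require("fs");
const http = require("http");
const path = require("path");

// Startup: nothing else is running yet, so a blocking read costs nothing.
// Falls back to hypothetical defaults if the config file doesn't exist.
function loadConfigSync(file) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return { greeting: "hello", root: process.cwd() };
    throw err;
  }
}

// Request time: only the promise-based API, so one slow read doesn't
// stop the server from handling other requests in the meantime.
function createServer(config) {
  return http.createServer(async (req, res) => {
    try {
      const body = await fs.promises.readFile(path.join(config.root, "index.txt"), "utf8");
      res.end(`${config.greeting}: ${body}`);
    } catch (err) {
      res.statusCode = 500;
      res.end("error\n");
    }
  });
}

// Usage:
// createServer(loadConfigSync(path.join(__dirname, "config.json"))).listen(8080);
```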
That said, even if your current implementation doesn't require parallelism, keep in mind that the situation may change and you may find that you do need to allow for parallel processing. If you had started out with the promise-based methods, you wouldn't have to rewrite that code. And if you understand the language, using Promises properly should be pretty easy, so if there's a chance of that, you might consider using Promises anyway.
How do the NodeJS built in functions achieve their asynchronicity?
Am I able to write my own custom asynchronous functions that execute outside of the main thread? Or do I have to leverage the built in functions?
Just a side note: "true asynchronous" doesn't really mean anything, but I assume you mean parallelism?
Depending on what you're doing, you might find there is little to no benefit in using threads in Node. Take Node's file system API, for example: as long as you don't use the Sync versions, it will automatically run multiple requests in parallel, because Node passes these requests to worker threads (libuv's thread pool).
This is why saying Node is single-threaded isn't quite correct; it's only the JavaScript execution that is. You can even confirm this by looking at the number of threads a Node.js process uses in your process monitor of choice.
So then you might ask: why do we have worker threads in Node? Well, the V8 JS engine that Node uses is pretty fast these days, so let's say you wanted to calculate PI to a million digits using JS. You could do this in the main thread, but it would block everything else until it finished. It would be a shame not to use the extra CPU cores that modern PCs have, and to keep the main thread doing other things while PI is being calculated in another thread.
So what about file I/O in Node; would it benefit from being in a worker thread? Well, this depends on what you do with the result. If you are just reading and then writing blocks of data, then no, there would be no benefit. But if, say, you are reading a file and then doing some heavy calculations on its contents in JavaScript (e.g. some custom image compression), then again a worker thread would help.
So in a nutshell, worker threads are great when you need to use JavaScript for heavy calculations; using them for simple I/O may in fact slow things down, due to the overhead of communicating between threads.
You don't mention in your question what you're trying to run in parallel, so it's hard to say whether doing so would be of benefit.
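A minimal sketch of offloading CPU-heavy JavaScript with the built-in `worker_threads` module. Summing primes by trial division stands in for "calculate PI"; the function name, the limit and the use of `eval: true` (to keep the worker code in one file) are choices made for this illustration only.

```javascript
const { Worker } = require("worker_threads");

// Worker code as a string (eval: true) so the example fits in one file.
// It sums the primes below `limit`: deliberately CPU-heavy busywork.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let sum = 0;
  for (let n = 2; n < workerData.limit; n++) {
    let prime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { prime = false; break; }
    }
    if (prime) sum += n;
  }
  parentPort.postMessage(sum);
`;

function sumPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker stopped with exit code ${code}`));
    });
  });
}

// The main thread stays free (the interval keeps firing) while the worker computes.
const ticker = setInterval(() => console.log("main thread is still free"), 50);
sumPrimesInWorker(200000)
  .then((sum) => console.log("sum of primes below 200000:", sum))
  .catch((err) => console.error(err))
  .finally(() => clearInterval(ticker));
```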
JavaScript is single-threaded; if you want to create a 'thread' you can use https://nodejs.org/api/worker_threads.html.
But you may have heard about async functions and promises in JavaScript. An async function always returns a promise, and promises are NOT threads. You can create an async function like this:
async function toto() {
return 0;
}
toto().then((d) => console.log(d));
console.log('hello');
This displays hello, then 0.
Remember that the .then() callback is only executed later because it's attached to a promise; it doesn't run in parallel, it's just deferred.
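To make the point concrete, a small sketch showing that marking a function `async` doesn't move its work to another thread: the body runs synchronously up to the first `await`, so a CPU-bound loop inside it still holds up the caller, and only the part after `await` is deferred. The names are illustrative.

```javascript
const order = [];

async function busyAsync() {
  order.push("async body starts");
  // CPU-bound work: runs on the main thread right now, async or not.
  let total = 0;
  for (let i = 0; i < 1e6; i++) total += i;
  order.push("loop done");
  await null; // everything after this line is deferred to a microtask
  order.push("after await");
  return total;
}

setTimeout(() => order.push("timer"), 0);
const pending = busyAsync();
order.push("caller continues");

pending.then((total) => {
  order.push(`result ${total}`);
  console.log(order.join(" -> "));
});
```

The caller only continues after the loop has finished, and the zero-delay timer runs last, after the promise callbacks.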
I have a question regarding this topic:
bcrypt.compare() is asynchronous, does that necessarily mean that delays are certain to happen?
Since I'm not allowed to post comments because of my reputation level, I had to open a new topic.
My question is: what are the downsides, if any, of using bcrypt.compareSync() instead of the async bcrypt.compare()?
compareSync() definitely gives the correct result. So why not use it instead of compare() wrapped in Promises? Is it going to stop Node.js from serving other users?
The reasons to use the async methods instead of the sync ones are explained quite well in the project's readme:
Why is async mode recommended over sync mode?
If you are using bcrypt on a simple script, using the sync mode is perfectly fine. However, if you are using bcrypt on a server, the async mode is recommended. This is because the hashing done by bcrypt is CPU intensive, so the sync version will block the event loop and prevent your application from servicing any other inbound requests or events. The async version uses a thread pool which does not block the main event loop.
https://github.com/kelektiv/node.bcrypt.js#why-is-async-mode-recommended-over-sync-mode
So if you are using this in a web application or any other environment where you don't want to block the main thread, you should use the async version.
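bcrypt is a third-party package, so as a stand-in this sketch uses Node's built-in `crypto.pbkdf2` / `crypto.pbkdf2Sync`, which follow the same pattern (the async variant runs on libuv's thread pool). It is not bcrypt's API. Each function records whether a zero-delay timer, standing in for "another incoming request", gets to run before hashing finishes. The iteration count is an arbitrary value chosen only to make hashing take noticeable time.

```javascript
const crypto = require("crypto");

const ITERATIONS = 200000; // arbitrary: just enough to make hashing slow

// Sync: the event loop is blocked, so the timer can only fire afterwards.
function hashSyncAndWatch() {
  const events = [];
  return new Promise((resolve) => {
    setTimeout(() => {
      events.push("timer");
      resolve(events);
    }, 0);
    crypto.pbkdf2Sync("secret", "salt", ITERATIONS, 32, "sha256");
    events.push("hash done");
  });
}

// Async: hashing runs on the thread pool while the event loop keeps going.
function hashAsyncAndWatch() {
  const events = [];
  return new Promise((resolve, reject) => {
    setTimeout(() => events.push("timer"), 0);
    crypto.pbkdf2("secret", "salt", ITERATIONS, 32, "sha256", (err) => {
      if (err) return reject(err);
      events.push("hash done");
      resolve(events);
    });
  });
}

hashSyncAndWatch()
  .then((events) => console.log("sync: ", events.join(", ")))
  .then(hashAsyncAndWatch)
  .then((events) => console.log("async:", events.join(", ")));
```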
Many Node.js built-in modules have Sync variants of their methods, like fs.writeFileSync, crypto.hkdfSync and child_process.execSync. In the browser, native APIs that would block a thread are generally exposed asynchronously, but the Sync methods in Node.js actually block the thread until the task is complete.
When using callbacks or Promises in Node.js, as long as only asynchronous logic runs internally, you can manage several asynchronous tasks while carrying on with other work, without stopping the main thread (using a counter for callbacks, or Promise.all).
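A minimal sketch of the counter approach for plain callbacks, with no library: start every task, count down as each one calls back, and run the next step when the count reaches zero. `afterAll` and the two tasks are hypothetical names for illustration, not part of Node or the async module.

```javascript
// Run all tasks in parallel and call done(err, results) exactly once:
// after every task has called back, or as soon as one reports an error.
function afterAll(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length; // the counter: tasks still outstanding
  let finished = false;

  if (pending === 0) return done(null, results);

  tasks.forEach((task, index) => {
    task((err, value) => {
      if (finished) return; // ignore callbacks that arrive after an error
      if (err) {
        finished = true;
        return done(err);
      }
      results[index] = value; // keep results in task order, not finish order
      pending -= 1;
      if (pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// Hypothetical callback-style tasks standing in for real setup steps.
const readConfig = (cb) => setTimeout(() => cb(null, { port: 8080 }), 20);
const readMirrors = (cb) => setTimeout(() => cb(null, ["mirror-a", "mirror-b"]), 10);

afterAll([readConfig, readMirrors], (err, results) => {
  if (err) throw err;
  const [config, mirrors] = results;
  console.log("start serving on", config.port, "with", mirrors.length, "mirrors");
});
```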
A Sync method only lets the next line run once its work is finished, so the order of execution is easy to follow and the code is easy to write. However, the main thread is blocked, so you can't do more than one task at a time.
Consider the following example.
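const fs = require('fs');
// Reads the 100 files one after another, blocking the thread for each one.
const syncFunc = () => {
  for (let i = 0; i < 100; i++) fs.readFileSync(`/files/${i}.txt`);
  console.log('sync done');
};
// Starts all 100 reads at once and waits until every one has finished.
const promiseFunc = async () => {
  await Promise.all(Array.from({length: 100}, (_, i) => fs.promises.readFile(`/files/${i}.txt`)));
  console.log('promise done');
};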
Assuming all 100 text files can be read without errors, the promise version finishes much faster, because the reads can overlap instead of running one at a time.
The same Sync/async split applies to native libraries written in C/C++. In the following code you can see how the two variants are implemented differently in C++:
compare
compareSync
In conclusion, I think it's a matter of choice. There is no problem using a Sync method if your code runs as a single sequence of steps where blocking the main thread doesn't matter (like a simple macro). However, if you are writing code where performance matters, such as a server, and the main thread should be blocked as little as possible, choose Promises or callbacks.
It's a very general question, but I don't quite understand. When would I prefer one over the other? I don't seem to understand what situations might arise, which would clearly favour one over the other. Are there strong reasons to avoid x / use x?
When would I prefer one over the other?
In a server intended to scale and serve the needs of many users, you would only use synchronous I/O during server initialization. In fact, require() itself uses synchronous I/O. In all other parts of your server that handle incoming requests once the server is already up and running, you would only use asynchronous I/O.
There are other uses for node.js besides creating a server. For example, suppose you want to create a script that will parse through a giant file and look for certain words or phrases. And, this script is designed to run by itself to process one file and it has no persistent server functionality and it has no particular reason to do I/O from multiple sources at once. In that case, it's perfectly OK to use synchronous I/O. For example, I created a node.js script that helps me age backup files (removing backup files that meet some particular age criteria) and my computer automatically runs that script once a day. There was no reason to use asynchronous I/O for that type of use so I used synchronous I/O and it made the code simpler to write.
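A sketch of that kind of one-shot script using only the Sync APIs. The `pruneOldBackups` name, the `.bak` extension and the age threshold are assumptions for illustration, not the author's actual script.

```javascript
const fs = require("fs");
const path = require("path");

const DAY_MS = 24 * 60 * 60 * 1000;

// Delete *.bak files in `dir` last modified more than `maxAgeDays` ago.
// Runs top to bottom with blocking calls: fine for a standalone script
// that has nothing else to do in the meantime.
function pruneOldBackups(dir, maxAgeDays, now = Date.now()) {
  const removed = [];
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith(".bak")) continue;
    const file = path.join(dir, name);
    const stats = fs.statSync(file);
    if (stats.isFile() && now - stats.mtimeMs > maxAgeDays * DAY_MS) {
      fs.unlinkSync(file);
      removed.push(name);
    }
  }
  return removed.sort();
}

// Usage, e.g. from a daily scheduled job:
// console.log("removed:", pruneOldBackups("/var/backups/myapp", 30));
```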
I don't seem to understand what situations might arise, which would clearly favour one over the other. Are there strong reasons to avoid x / use x?
Avoid ever using synchronous I/O in the request handlers of a server. Because of the single threaded nature of Javascript in node.js, using synchronous I/O blocks the node.js Javascript thread so it can only do one thing at a time (which is death for a multi-user server) whereas asynchronous I/O does not block the node.js Javascript thread (allowing it to potentially serve the needs of many users).
In non-multi-user situations (code that is only doing one thing for one user), synchronous I/O may be favored because writing the code is easier and there may be no advantages to using asynchronous I/O.
I thought of an electron application with nodejs, which is simply reading a file and did not understand what difference that would make really, if my software really just has to wait for that file to load anyways.
If this is a single user application and there's nothing else for your application to be doing while waiting for the file to be read into memory (no sockets to be responding to, no screen updates, no other requests to be working on, no other file operations to be running in parallel), then there is no advantage to using asynchronous I/O so synchronous I/O will be just fine and likely a bit simpler to code.
When would I prefer one over the other?
Use the non-Sync versions (the async ones) unless there's literally nothing else you need your program to do while the I/O is pending, in which case the Sync ones are fine; see below for details...
Are there strong reasons to avoid x / use x?
Yes. NodeJS runs your JavaScript code on a single thread. If you use the Sync version of an I/O function, that thread is blocked waiting on I/O and can't do anything else. If you use the async version, the I/O can continue in the background while the JavaScript thread gets on with other work; the I/O completion will be queued as a job for the JavaScript thread to come back to later.
If you're running a foreground Node app that doesn't need to do anything else while the I/O is pending, you're probably fine using Sync calls. But if you're using Node for processing multiple things at once (like web requests), best to use the async versions.
In a comment you added under the question you've said:
I thought of an electron application with nodejs, which is simply reading a file and did not understand what difference that would make really, if my software really just has to wait for that file to load anyways.
I have virtually no knowledge of Electron, but I note that it uses a "main" process to manage windows and then a "renderer" process per window. That being the case, using Sync functions will block the relevant process, which may affect application or window responsiveness. But I don't have any deep knowledge of Electron (more's the pity).
Until somewhat recently, using async functions meant using lots of callback-heavy code which was hard to compose:
// (Obviously this is just an example, you wouldn't actually read and write a file this way, you'd use streaming...)
const fs = require("fs");
fs.readFile("file1.txt", function(err, data) {
  if (err) {
    // Do something about the error...
  } else {
    fs.writeFile("file2.txt", data, function(err) {
      if (err) {
        // Do something about the error...
      } else {
        // All good
      }
    });
  }
});
Then promises came along, and if you used a promisified* version of the operation (shown here with pseudonyms like fs.promisifiedXYZ), it still involved callbacks, but they were more composable:
// (See earlier caveat, just an example)
fs.promisifiedReadFile("file1.txt")
.then(function(data) {
return fs.promisifiedWriteFile("file2.txt", data);
})
.then(function() {
// All good
})
.catch(function(err) {
// Do something about the error...
});
Now, in recent versions of Node, you can use the ES2017+ async/await syntax to write synchronous-looking code that is, in fact, asynchronous:
// (See earlier caveat, just an example)
(async () => {
  try {
    const data = await fs.promisifiedReadFile("file1.txt");
    await fs.promisifiedWriteFile("file2.txt", data);
    // All good
  } catch (err) {
    // Do something about the error...
  }
})();
Node's API predates promises and has its own conventions. There are various libraries out there to help you "promisify" a Node-style callback API so that it uses promises instead. One is promisify, but there are others; Node itself now also ships util.promisify and the promise-based fs.promises API.
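A sketch of the read-then-write example with the two options built into Node: `util.promisify` wraps a Node-style callback function so it returns a promise, and `fs.promises` is a promise-based version of the fs API. The function names and file names are illustrative, and the earlier streaming caveat still applies.

```javascript
const fs = require("fs");
const util = require("util");

// Option 1: promisify the callback-style functions yourself.
const readFile = util.promisify(fs.readFile);
const writeFile = util.promisify(fs.writeFile);

async function copyWithPromisify(from, to) {
  const data = await readFile(from);
  await writeFile(to, data);
}

// Option 2: use the promise-based API that ships with fs.
async function copyWithFsPromises(from, to) {
  const data = await fs.promises.readFile(from);
  await fs.promises.writeFile(to, data);
}

// Usage (errors, e.g. a missing file1.txt, surface as a rejected promise):
// copyWithFsPromises("file1.txt", "file2.txt").catch((err) => console.error(err));
```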
I am attempting to create a library to make API calls to a web application (Jira, if you care to know). My API calls work with no problem, but I am looking to make the code a bit more readable and usable. I have tried searching for my needs, but it turns out I am not sure what I need to be searching for.
I am having an issue with asynchronous calls that depend on each other. I understand that I have to wait until the callback has run before running my next item, but I am not sure of the best way to design this.
I would really like to make chaining a feature of my API, which I would hope to look like this:
createProject(jsonProjectStuff)
.setLeadUser("myusername")
.createBoard("boardName")
.setBoardPermissions(boardPermissionJSONvar)
.addRole(personRoleJSONvar);
With this example, everything would have to wait on createProject, as it returns the project. createBoard doesn't normally rely on the project, but used in this context the board should be "assigned" to the project just made; setting the board permissions only relies on createBoard having worked. addRole is specific to the project again.
The questions I have are:
Is it possible to switch context like this and keep data between the calls, without hard-coding the next function to run from the response?
If this is possible, is this a good idea? If not I am open to other schemes.
I can think of a couple of ways to make it work, including registering the function calls in a dependency tree and then fulfilling promises as we go, although that is mostly conceptual for me at this point as I am still deciding on the best approach.
Edit 2/19/2016
So I have looked into this more and have decided on a selective "then", used only when creating a new item that doesn't relate directly to the parent.
//Numbers are ID, string is Name
copyProject(IDorName)
.setRoles(JSONItem)
.setOwner("Project.Owner")
.setDefaultEmail("noreply#fake.com")
.then(
copyBoard(IDorName)
.setName("Blah blah Name {project.key}"),
saveFilterAs(IDorName, "Board {project.key}",
"project = {project.key} ORDER BY Rank ASC")
.setFilterPermissions({shareValuesJSON})
)
I like this solution a lot. The only thing I am unsure how to do is the string "variables"; I suppose it could be "Blah blah Name " + this.project.key.
Either way, I am unsure how to give copyBoard or saveFilterAs access to it via the "then" function.
Any thoughts?
I've been using Nightmare (a headless browser) lately.
It has a fluent API that uses a nice design pattern.
Calling the API doesn't execute the actions directly; it only queues them, and when you are ready to execute them you call the end function, which returns a promise. The promise is resolved when the queue has completed its async execution.
For example, in your situation:
createProject(jsonProjectStuff)
  .setLeadUser("myusername")
  .createBoard("boardName")
  .setBoardPermissions(boardPermissionJSONvar)
  .addRole(personRoleJSONvar)
  .end() // Execute the queue of operations.
  .then(() => {
    // All operations completed.
  })
  .catch(err => {
    // An error occurred.
  });
I feel like this pattern is quite elegant. It allows you to have a fluent API to build a sequence of actions. Then, when you are ready to execute those operations, you call end (or whatever you name it). The sequence of operations is then completed asynchronously, and you use the promise to handle completion and errors.
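A minimal sketch of that queue-then-execute pattern applied to the question's API. `ProjectBuilder`, its methods and the `api` object are hypothetical, and the fake API just resolves after a short delay so the example runs on its own. Each queued step receives a shared context object, which is one way to give later calls (like `createBoard`) access to the project created earlier.

```javascript
// Fake async API standing in for real Jira REST calls (hypothetical).
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const api = {
  createProject: (spec) => delay(10).then(() => ({ key: spec.key, lead: null, roles: [] })),
  setLead: (project, user) => delay(5).then(() => { project.lead = user; }),
  createBoard: (project, name) => delay(5).then(() => ({ name: `${name} ${project.key}`, permissions: null })),
  setBoardPermissions: (board, perms) => delay(5).then(() => { board.permissions = perms; }),
  addRole: (project, role) => delay(5).then(() => { project.roles.push(role); }),
};

class ProjectBuilder {
  constructor(spec) {
    // Queued steps; each receives a shared context object. Nothing runs yet.
    this.steps = [async (ctx) => { ctx.project = await api.createProject(spec); }];
  }
  setLeadUser(user) {
    this.steps.push((ctx) => api.setLead(ctx.project, user));
    return this; // returning `this` is what makes the calls chainable
  }
  createBoard(name) {
    this.steps.push(async (ctx) => { ctx.board = await api.createBoard(ctx.project, name); });
    return this;
  }
  setBoardPermissions(perms) {
    this.steps.push((ctx) => api.setBoardPermissions(ctx.board, perms));
    return this;
  }
  addRole(role) {
    this.steps.push((ctx) => api.addRole(ctx.project, role));
    return this;
  }
  // Runs the queued steps strictly in order; the first failure rejects.
  async end() {
    const ctx = {};
    for (const step of this.steps) await step(ctx);
    return ctx;
  }
}

const createProject = (spec) => new ProjectBuilder(spec);

createProject({ key: "DEMO" })
  .setLeadUser("myusername")
  .createBoard("boardName")
  .setBoardPermissions({ view: ["everyone"] })
  .addRole({ user: "alice", role: "Developers" })
  .end()
  .then((ctx) => console.log("done:", ctx.project.key, ctx.board.name))
  .catch((err) => console.error("failed:", err));
```

Running the steps strictly in order is the simplest choice; steps that don't depend on each other could be grouped and run with `Promise.all` instead.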