While debugging a set-up to control promise concurrency for some CPU intensive asynchronous tasks, I came across the following behavior that I can not understand.
My route to control the flow was to first create an array of Promises, and after that control the execution of these Promises by explicitly awaiting them. I produced the following snippet that shows the unexpected behavior:
import fs from 'node:fs';
import { setTimeout } from 'node:timers/promises';
async function innerAsync(n: number) {
return fs.promises.readdir('.').then(value => {
console.log(`Executed ${n}:`, value);
return value;
});
}
async function controllerAsync() {
const deferredPromises = new Array<Promise<string[]>>();
for (const n of [1, 2, 3]) {
const defer = innerAsync(n);
deferredPromises.push(defer);
}
const result = deferredPromises[0];
return result;
}
const result = await controllerAsync();
console.log('Resolved:', result);
await setTimeout(1000);
This snippet produces the following result (assuming 2 files are in .):
Executed 1: [ 'test.txt', 'test2.txt' ]
Resolved: [ 'test.txt', 'test2.txt' ]
Executed 2: [ 'test.txt', 'test2.txt' ]
Executed 3: [ 'test.txt', 'test2.txt' ]
Observed
The promises defined at line 15 are all being executed, regardless if they are returned/awaited (2 and 3).
Question
What causes the other, seemingly unused, promises to be executed? How can I make sure in this snippet only the first Promise is executed, where the others stay pending?
Node v19.2
Typescript v4.9.4
StackBlitz
What causes the other, seemingly unused, promises to be executed?
A promise is a tool used to watch something and provide an API to react to the something being complete.
Your innerAsync function calls fs.promises.readdir. Calling fs.promises.readdir makes something happen (reading a directory).
The then callback is called as a reaction when reading that directory is complete.
(It isn't clear why you marked innerAsync as async and then didn't use await inside it.)
If you don't use then or await, then that doesn't change the fact you have called fs.promises.readdir!
How can I make sure in this snippet only the first Promise is executed, where the others stay pending?
Focus on the function which does the thing, not the code which handles it being complete.
Or an analogy:
If you tell Alice to go to the shops and get milk, but don't tell Bob to put the milk in the fridge when Alice gets back, then you shouldn't be surprised that Alice has gone to the shops and come back with milk.
Promises aren't "executed" at all. A promise is a way to observe an asynchronous process that is already in progress. By the time you have a promise, the process it's reporting the result of is already underway.¹ It's the call to fs.readdir that starts the process, not calling then on a promise. (And even if it were, you are calling then on the promise from fs.readdir right away, in your innerAsync function.)
If you want to wait to start the operations, wait to call the methods starting the operations and giving you the promises. I'd show an edited version of your code, but it's not clear to me what it's supposed to do, particularly since controllerAsync only looks at the first element of the array and doesn't wait for anything to complete before returning.
A couple of other notes:
It's almost never useful to combine async/await with explicit calls to .then as in innerAsync. That function would be more clearly written as:
async function innerAsync(n: number) {
const value = await fs.promises.readdir(".");
console.log(`Executed ${n}:`, value);
return value;
}
From an order-of-operations perspective, that does exactly the same thing as the version with the explicit then.
There's no purpose served by declaring controllerAsync as an async function, since it never uses await. The only reason for making a function async is so you can use await within it.
¹ This is true in the normal case (the vast majority of cases), including the case of the promises you get from Node.js' fs/promises methods. There are (or were), unfortunately, a couple of libraries that had functions that returned promises but didn't start the actual work until the promise's then method was called. But those were niche outliers and arguably violate the semantics of promises.
Related
If you use promise's then() syntax, you can truly run codes concurrently.
See this example.
async function asyncFunc() {
return '2';
}
console.log('1')
asyncFunc().then(result => console.log(result))
console.log('3')
this prints 1 -> 3 -> 2 because asyncFunction() runs in "async" and rest of code ('3') runs without delay.
But if you use async/await syntax, doesn't it work like plain ordinary "sync" code?
async function asyncFunc() {
return '2';
}
console.log('1')
const result = await asyncFunc()
console.log(result)
console.log('3')
this would print 1->2->3 because it "defers" rest of the code to "await" asyncFunction()
So I'm curious about the purpose of async/await syntax. I guess it allows you to use async functions like sync function. With async/await, I can easily see how my code runs because it has the same flow as plain sync code. But then, why not just use sync code in the first place?
But I'm sure I'm missing something here right? What is the purpose of async/await syntax?
The idea of async/await is as follows:
It is a syntactic sugar for promises. It allows to use async code (like Web Api's - AJAX, DOM api's etc.) in synchronous order.
You ask "But then, why not just use sync code in the first place?" - because in your example you forced a sync code to be returned in a promise, basically you made it async, but it could run as sync in the first place, but there are Web APIs as mentioned above, that work ONLY ASYNC (meaning you have to wait for them to be executed), that's why you cannot turn them in sync "in the first place" :)
When JavaScript encounters the await expression, it pauses the async function execution and waits until the promise is fulfilled (the promise successfully resolved) or rejected (an error has occurred), then it continues the execution where it pauses in that function... The promise can be code from JS engine (as in your case) or it can be from another place like browser apis (truly async).
Is using async and await the crude person's threads?
Many moons ago I learned how to do multithreaded Java code on Android. I recall I had to create threads, start threads, etc.
Now I'm learning Javascript, and I just learned about async and await.
For example:
async function isThisLikeTwoThreads() {
const a = slowFunction();
const b = fastFunction();
console.log(await a, await b);
}
This looks way simpler than what I used to do and is a lot more intuitive.
slowFunction() would start first, then fastFunction() would start, and console.log() would wait until both functions resolved before logging - and slowFunction() and fastFunction() are potentially running at the same time. I expect it's ultimatly on the browser whether or not thesea are seperate threads. But it looks like it walks and talks like crude multithreading. Is it?
Is using async and await the crude person's threads?
No, not at all. It's just syntax sugar (really, really useful sugar) over using promises, which in turn is just a (really, really useful) formalized way to use callbacks. It's useful because you can wait asynchronously (without blocking the JavaScript main thread) for things that are, by nature, asynchronous (like HTTP requests).
If you need to use threads, use web workers, Node.js worker threads, or whatever multi-threading your environment provides. Per specification (nowadays), only a single thread at a time is allowed to work within a given JavaScript "realm" (very loosely: the global environment your code is running in and its associated objects, etc.) and so only a single thread at a time has access to the variables and such within that realm, but threads can cooperate via messaging (including transferring objects between them without making copies) and shared memory.
For example:
async function isThisLikeTwoThreads() {
const a = slowFunction();
const b = fastFunction();
console.log(await a, await b);
}
Here's what that code does when isThisLikeTwoThreads is called:
slowFunction is called synchronously and its return value is assigned to a.
fastFunction is called synchronously and its return value is assigned to b.
When isThisLikeTwoThreads reaches await a, it wraps a in a promise (as though you did Promise.resolve(a)) and returns a new promise (not that same one). Let's call the promise wrapped around a "aPromise" and the promise returned by the function "functionPromise".
Later, when aPromise settles, if it was rejected functionPromise is rejected with the same rejection reason and the following steps are skipped; if it was fulfilled, the next step is done
The code in isThisLikeTwoThreads continues by wrapping b in a promise (bPromise) and waiting for that to settle
When bPromise settles, if it was rejected functionPromise is rejected with the same rejection reason; if it was fulfilled, the code in isThisLikeTwoThreads continues by logging the fulfillment values of aPromise and bPromise and then fulfilling functionPromise with the value undefined
All of the work above was done on the JavaScript thread where the call to isThisLikeTwoThreads was done, but it was spread out across multiple "jobs" (JavaScript terminology; the HTML spec calls them "tasks" and specifies a fair bit of detail for how they're handled on browsers). If slowFunction or fastFunction started an asynchronous process and returned a promise for that, that asynchronous process (for instance, an HTTP call the browser does) may have continued in parallel with the JavaScript thread while the JavaScript thread was doing other stuff or (if it was also JavaScript code on the main thread) may have competed for other work on the JavaScript thread (competed by adding jobs to the job queue and the thread processing them in a loop).
But using promises doesn't add threading. :-)
I suggest you read this to understand that the answer is no: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop
To summarize, the runtime uses multiple threads for internal operations (network, disk... for a Node.js environment, rendering, indexedDB, network... for a browser environment) but the JavaScript code you wrote and the one you imported from different libraries will always execute in a single thread. Asynchronous operations will trigger callbacks which will be queued and executed one by one.
Basically what happens when executing this function:
async function isThisLikeTwoThreads() {
const a = slowFunction();
const b = fastFunction();
console.log(await a, await b);
}
Execute slowFunction.
Execute fastFunction.
Enqueue the rest of the code (console.log(await a, await b)) when a promise and b promise have resolved. The console.log runs in the same thread after isThisLikeTwoThreads has returned, and after the possible enqueued callbacks have returned. Assuming slowFunction and fastFunction both return promises, this is an equivalent of:
function isThisLikeTwoThreads() {
const a = slowFunction();
const b = fastFunction();
a.then(aResult => b.then(bResult => console.log(aResult, bResult)));
}
Similar but not the same. Javascript will only ever be doing 'one' thing at a time. It is fundamentally single-threaded. That said, it can appear odd as results may appear to be arriving in different orders - exacerbated by functions such as async and await.
For example, even if you're rendering a background process at 70fps there are small gaps in rendering logic available to complete promises or receive event notifications -- it's during these moments promises are completed giving the illusion of multi-threading.
If you REALLY want to lock up a browser try this:
let a = 1;
while (a == 1){
console.log("doing a thing");
}
You will never get javascript to work again and chrome or whatever will murder your script. The reason is when it enters that loop - nothing will touch it again, no events will be rendered and no promises will be delivered - hence, single threading.
If this was a true multithreaded environment you could break that loop from the outside by changing the variable to a new value.
I have an async function that runs by a setInterval somewhere in my code. This function updates some cache in regular intervals.
I also have a different, synchronous function which needs to retrieve values - preferably from the cache, yet if it's a cache-miss, then from the data origins
(I realize making IO operations in a synchronous manner is ill-advised, but lets assume this is required in this case).
My problem is I'd like the synchronous function to be able to wait for a value from the async one, but it's not possible to use the await keyword inside a non-async function:
function syncFunc(key) {
if (!(key in cache)) {
await updateCacheForKey([key]);
}
}
async function updateCacheForKey(keys) {
// updates cache for given keys
...
}
Now, this can be easily circumvented by extracting the logic inside updateCacheForKey into a new synchronous function, and calling this new function from both existing functions.
My question is why absolutely prevent this use case in the first place? My only guess is that it has to do with "idiot-proofing", since in most cases, waiting on an async function from a synchronous one is wrong. But am I wrong to think it has its valid use cases at times?
(I think this is possible in C# as well by using Task.Wait, though I might be confusing things here).
My problem is I'd like the synchronous function to be able to wait for a value from the async one...
They can't, because:
JavaScript works on the basis of a "job queue" processed by a thread, where jobs have run-to-completion semantics, and
JavaScript doesn't really have asynchronous functions — even async functions are, under the covers, synchronous functions that return promises (details below)
The job queue (event loop) is conceptually quite simple: When something needs to be done (the initial execution of a script, an event handler callback, etc.), that work is put in the job queue. The thread servicing that job queue picks up the next pending job, runs it to completion, and then goes back for the next one. (It's more complicated than that, of course, but that's sufficient for our purposes.) So when a function gets called, it's called as part of the processing of a job, and jobs are always processed to completion before the next job can run.
Running to completion means that if the job called a function, that function has to return before the job is done. Jobs don't get suspended in the middle while the thread runs off to do something else. This makes code dramatically simpler to write correctly and reason about than if jobs could get suspended in the middle while something else happens. (Again it's more complicated than that, but again that's sufficient for our purposes here.)
So far so good. What's this about not really having asynchronous functions?!
Although we talk about "synchronous" vs. "asynchronous" functions, and even have an async keyword we can apply to functions, a function call is always synchronous in JavaScript. An async function is a function that synchronously returns a promise that the function's logic fulfills or rejects later, queuing callbacks the environment will call later.
Let's assume updateCacheForKey looks something like this:
async function updateCacheForKey(key) {
const value = await fetch(/*...*/);
cache[key] = value;
return value;
}
What that's really doing, under the covers, is (very roughly, not literally) this:
function updateCacheForKey(key) {
return fetch(/*...*/).then(result => {
const value = result;
cache[key] = value;
return value;
});
}
(I go into more detail on this in Chapter 9 of my recent book, JavaScript: The New Toys.)
It asks the browser to start the process of fetching the data, and registers a callback with it (via then) for the browser to call when the data comes back, and then it exits, returning the promise from then. The data isn't fetched yet, but updateCacheForKey is done. It has returned. It did its work synchronously.
Later, when the fetch completes, the browser queues a job to call that promise callback; when that job is picked up from the queue, the callback gets called, and its return value is used to resolve the promise then returned.
My question is why absolutely prevent this use case in the first place?
Let's see what that would look like:
The thread picks up a job and that job involves calling syncFunc, which calls updateCacheForKey. updateCacheForKey asks the browser to fetch the resource and returns its promise. Through the magic of this non-async await, we synchronously wait for that promise to be resolved, holding up the job.
At some point, the browser's network code finishes retrieving the resource and queues a job to call the promise callback we registered in updateCacheForKey.
Nothing happens, ever again. :-)
...because jobs have run-to-completion semantics, and the thread isn't allowed to pick up the next job until it completes the previous one. The thread isn't allowed to suspend the job that called syncFunc in the middle so it can go process the job that would resolve the promise.
That seems arbitrary, but again, the reason for it is that it makes it dramatically easier to write correct code and reason about what the code is doing.
But it does mean that a "synchronous" function can't wait for an "asynchronous" function to complete.
There's a lot of hand-waving of details and such above. If you want to get into the nitty-gritty of it, you can delve into the spec. Pack lots of provisions and warm clothes, you'll be some time. :-)
Jobs and Job Queues
Execution Contexts
Realms and Agents
You can call an async function from within a non-async function via an Immediately Invoked Function Expression (IIFE):
(async () => await updateCacheForKey([key]))();
And as applied to your example:
function syncFunc(key) {
if (!(key in cache)) {
(async () => await updateCacheForKey([key]))();
}
}
async function updateCacheForKey(keys) {
// updates cache for given keys
...
}
This shows how a function can be both sync and async, and how the Immediately Invoked Function Expression idiom is only immediate if the path through the function being called does synchronous things.
function test() {
console.log('Test before');
(async () => await print(0.3))();
console.log('Test between');
(async () => await print(0.7))();
console.log('Test after');
}
async function print(v) {
if(v<0.5)await sleep(5000);
else console.log('No sleep')
console.log(`Printing ${v}`);
}
function sleep(ms : number) {
return new Promise(resolve => setTimeout(resolve, ms));
}
test();
(Based off of Ayyappa's code in a comment to another answer.)
The console.log looks like this:
16:53:00.804 Test before
16:53:00.804 Test between
16:53:00.804 No sleep
16:53:00.805 Printing 0.7
16:53:00.805 Test after
16:53:05.805 Printing 0.3
If you change the 0.7 to 0.4 everything runs async:
17:05:14.185 Test before
17:05:14.186 Test between
17:05:14.186 Test after
17:05:19.186 Printing 0.3
17:05:19.187 Printing 0.4
And if you change both numbers to be over 0.5, everything runs sync, and no promises get created at all:
17:06:56.504 Test before
17:06:56.504 No sleep
17:06:56.505 Printing 0.6
17:06:56.505 Test between
17:06:56.505 No sleep
17:06:56.505 Printing 0.7
17:06:56.505 Test after
This does suggest an answer to the original question, though. You could have a function like this (disclaimer: untested nodeJS code):
const cache = {}
async getData(key, forceSync){
if(cache.hasOwnProperty(key))return cache[key] //Runs sync
if(forceSync){ //Runs sync
const value = fs.readFileSync(`${key}.txt`)
cache[key] = value
return value
}
//If we reach here, the code will run async
const value = await fsPromises.readFile(`${key}.txt`)
cache[key] = value
return value
}
Now, this can be easily circumvented by extracting the logic inside updateCacheForKey into a new synchronous function, and calling this new function from both existing functions.
T.J. Crowder explains the semantics of async functions in JavaScript perfectly. But in my opinion the paragraph above deserves more discussion. Depending on what updateCacheForKey does, it may not be possible to extract its logic into a synchronous function because, in JavaScript, some things can only be done asynchronously. For example there is no way to perform a network request and wait for its response synchronously. If updateCacheForKey relies on a server response, it can't be turned into a synchronous function.
It was true even before the advent of asynchronous functions and promises: XMLHttpRequest, for instance, gets a callback and calls it when the response is ready. There's no way of obtaining a response synchronously. Promises are just an abstraction layer on callbacks and asynchronous functions are just an abstraction layer on promises.
Now this could have been done differently. And it is in some environments:
In PHP, pretty much everything is synchronous. You send a request with curl and your script blocks until it gets a response.
Node.js has synchronous versions of its file system calls (readFileSync, writeFileSync etc.) which block until the operation completes.
Even plain old browser JavaScript has alert and friends (confirm, prompt) which block until the user dismisses the modal dialog.
This demonstrates that the designers of the JavaScript language could have opted for synchronous versions of XMLHttpRequest, fetch etc. Why didn't they?
[W]hy absolutely prevent this use case in the first place?
This is a design decision.
alert, for instance, prevents the user from interacting with the rest of the page because JavaScript is single threaded and the one and only thread of execution is blocked until the alert call completes. Therefore there's no way to execute event handlers, which means no way to become interactive. If there was a syncFetch function, it would block the user from doing anything until the network request completes, which can potentially take minutes, even hours or days.
This is clearly against the nature of the interactive environment we call the "web". alert was a mistake in retrospect and it should not be used except under very few circumstances.
The only alternative would be to allow multithreading in JavaScript which is notoriously difficult to write correct programs with. Are you having trouble wrapping your head around asynchronous functions? Try semaphores!
It is possible to add a good old .then() to the async function and it will work.
Should consider though instead of doing that, changing your current regular function to async one, and all the way up the call stack until returned promise is not needed, i.e. there's no work to be done with the value returned from async function. In which case it actually CAN be called from a synchronous one.
New to NodeJS. Going through the promise tutorial ('promise-it-wont-hurt') I have the following script:
var Q = require('q');
var deferred = Q.defer();
deffered.resolve('SECOND');
deffered.promise.then(console.log);
console.log('FIRST');
The output:
FIRST
SECOND
I don't get it, I would have thought that since resolved is fired first I should see second first.
They explain that this happens because of 'Promise fires on the same turn of the event loop'. I don't understand what that means...
Basically, the point is that the promise's then handler will not run before the current flow of code has finished executing and control is returned to the execution environment (in this case Node).
This is an important characteristic of Promises/A+ compliant promises because it ensures predictability. Regardless of whether the promise resolves immediately:
function getAPromise() {
var Q = require('q');
var deferred = Q.defer();
deferred.resolve('SECOND');
return deferred.promise;
}
getAPromise().then(console.log);
console.log('FIRST');
or whether it resolves after 10 seconds:
function getAPromise() {
var Q = require('q');
var deferred = Q.defer();
setTimeout(function () {
deferred.resolve('SECOND');
}, 10000);
return deferred.promise;
}
getAPromise().then(console.log);
console.log('FIRST');
you can be assured that FIRST will always be logged first.
This is discussed at length in chapters 2 and 3 of You don't know JS - async & performance. Asynchronous operations that sometimes run asynchronously and sometimes run synchronously are called Zalgos, and they are not considered to be a good thing.
It's important to note that the widely used promises in jQuery do not obey this behavior and have a number of other problems as well. If you find yourself with a jQuery promise, wrap it in a proper promise and proceed:
Q($.ajax(...)).then(breatheWithEase);
They explain that this happens because of 'Promise fires on the same turn of the event loop'. I don't understand what that means...
Me neither, imho this part doesn't make much sense. It probably should mean that when you call resolve or reject, the promise is settled immediately, and from then on will not change its state. This has nothing to do with the callbacks yet.
What is more important to understand is the next sentence in that paragraph:
You can expect that the functions passed to the "then" method of a promise will be called on the NEXT turn of the event loop.
It's like a simple rule of thumb: then callbacks are always called asynchronously. Always, in every proper promise implementation; this is mandated by the Promises/A+ specification.
Why is this? For consistency. Unlike in your example snippet, you don't know how or when a promise is resolved - you just get handed it back from some method you called. And it might have already been resolved, or it might be still pending. Now you call the promise's then method, and you can know: it will be asynchronous. You don't need to deal with cases that might be synchronous or not and change the meaning of your code, it simply is always asynchronous.
Recently I made a webscraper in nodejs using 'promise'. I created a Promise for each url I wanted to scrape and then used all method:
var fetchUrlArray=[];
for(...){
var mPromise = new Promise(function(resolve,reject){
(http.get(...))()
});
fetchUrlArray.push(mPromise);
}
Promise.all(fetchUrlArray).then(...)
There were thousands of urls but only a few of them got timed out. I got the impression that it was handling 5 promises in parallel at a time.
My question is how exactly does promise.all() work. Does it:
Call each promise one by one and switch to the next one till the previous one is resolved.
Or does in process the promises in a batch of a few from the array.
Or does it fire all promises
What is the best way to solve this problem in nodejs. Because as it stands I can solve this problem way faster in Java/C#
What you pass Promise.all() is an array of promises. It knows absolutely nothing about what is behind those promises. All it knows is that those promises will get resolved or rejected sometime in the future and it will create a new master promise that follows the sum of all the promises you passed it. This is one of the nice things about promises. They are an abstraction that lets you coordinate any type of action (usually asynchronous) without regard for what type of action it is. As such, promises have literally nothing to do with the actual action. All they do is monitor the completion or error of the action and report that back to those agents following the promise. Other code actually runs the action.
In your particular case, you are immediately calling http.get() in a tight loop and your code (nothing to do with promises) is launching a zillion http.get() operations at once. Those will get fired as fast as the underlying transport can do them (likely subject to connection limits).
If you want them to be launched serially or in batches of say 10 at a time, then you have to code it that way yourself. Promises have nothing to do with that.
You could use promises to help you code them to launch serially or in batches, but it would take extra of your code to do that either way to make that happen.
The Async library is specifically built for running things in parallel, but with a maximum number in flight at any given time because this is a common scheme where you either have connection limits on your end or you don't want to overwhelm the receiving server. You may be interested in the parallelLimit option which lets you run a number of async operations in parallel, but with a maximum number in flight at any given time.
I would do it like this
Personally, I'm not a big fan of Promises. I think the API is extremely verbose and the resulting code is very hard to read. The method defined below results in very flat code and it's much easier to immediately understand what's going on. At least imo.
Here's a little thing I created for an answer to this question
// void asyncForEach(Array arr, Function iterator, Function callback)
// * iterator(item, done) - done can be called with an err to shortcut to callback
// * callback(done) - done recieves error if an iterator sent one
function asyncForEach(arr, iterator, callback) {
// create a cloned queue of arr
var queue = arr.slice(0);
// create a recursive iterator
function next(err) {
// if there's an error, bubble to callback
if (err) return callback(err);
// if the queue is empty, call the callback with no error
if (queue.length === 0) return callback(null);
// call the callback with our task
// we pass `next` here so the task can let us know when to move on to the next task
iterator(queue.shift(), next);
}
// start the loop;
next();
}
You can use it like this
var urls = [
"http://example.com/cat",
"http://example.com/hat",
"http://example.com/wat"
];
function eachUrl(url, done){
http.get(url, function(res) {
// do something with res
done();
}).on("error", function(err) {
done(err);
});
}
function urlsDone(err) {
if (err) throw err;
console.log("done getting all urls");
}
asyncForEach(urls, eachUrl, urlsDone);
Benefits of this
no external dependencies or beta apis
reusable on any array you want to perform async tasks on
non-blocking, just as you've come to expect with node
could be easily adapted for parallel processing
by writing your own utility, you better understand how this kind of thing works
If you just want to grab a module to help you, look into async and the async.eachSeries method.
First, a clarification: A promise does represent the future result of a computation, nothing else. It does not represent the task or computation itself, which means it cannot be "called" or "fired".
Your script does create all those thousands of promises immediately, and each of those creations does call http.get immediately. I would suspect that the http library (or something it depends on) has a connection pool with a limit of how many requests to make in parallel, and defers the rest implicitly.
Promise.all does not do any "processing" - it's not responsible for starting the tasks and resolving the passed promises. It only listens to them and checks whether they all are ready, and returns a promise for that eventual result.