How to convert a dynamic Set<Promise<T>> into AsyncIterable<T> (unordered)?
The resulting iterable must produce values as they get resolved, and it must end just as the source runs empty.
I have a dynamic cache of promises to be resolved, with their values reported as they arrive, disregarding the order.
NOTE: The source is dynamic, which means it can receive new Promise<T> elements while we progress through the resulting iterator.
UPDATE
After going through all the suggestions, I was able to implement my operator. And here are the official docs.
I'm adding a bounty to reward anyone who can improve it further, though at this point a PR is preferable (it is for a public library), or at least something that fits the same protocol.
Judging from your library implementation, you actually want to transform an AsyncIterable<Promise<T>> into an AsyncIterator<T> by racing up to N of the produced promises concurrently. I would implement that as follows:
async function* limitConcurrent<T>(iterable: AsyncIterable<Promise<T>>, n: number): AsyncIterator<T> {
  const pool = new Set<Promise<T>>();
  for await (const p of iterable) {
    const promise = Promise.resolve(p).finally(() => {
      pool.delete(promise); // FIXME see below
    });
    promise.catch(() => { /* ignore */ }); // mark rejections as handled
    pool.add(promise);
    if (pool.size >= n) {
      yield /* await */ Promise.race(pool);
    }
  }
  while (pool.size) {
    yield /* await */ Promise.race(pool);
  }
}
Notice that if one of the promises in the pool rejects, the returned iterator will end with the error and the results of the other promises that are currently in the pool will be ignored.
However, the above implementation presumes that the iterable is relatively fast, as it will need to produce n promises before the pool is raced for the first time. If it yields the promises slower than the promises take to resolve, the results are held up unnecessarily.
And worse, the above implementation may lose values. If the returned iterator is not consumed fast enough, or the iterable is not yielding fast enough, multiple promise handlers may delete their respective promise from the pool during one iteration of the loop, and the Promise.race will consider only one of them.
So this would work for a synchronous iterable, but if you actually have an asynchronous iterable, you need a different solution. Essentially you've got a consumer and a producer that are more or less independent, and what you need is some queue between them.
Yet with a single queue it still wouldn't handle backpressure: the producer just runs as fast as it can (given the iteration of promises and the concurrency limit) while filling the queue. What you really need is a channel that allows synchronisation in both directions, e.g. using two queues:
class AsyncQueue<T> {
  resolvers: null | ((res: IteratorResult<T> | Promise<never>) => void)[];
  promises: Promise<IteratorResult<T>>[];
  constructor() {
    // invariant: at least one of the arrays is empty.
    // when `resolvers` is `null`, the queue has ended.
    this.resolvers = [];
    this.promises = [];
  }
  putNext(result: IteratorResult<T> | Promise<never>): void {
    if (!this.resolvers) throw new Error('Queue already ended');
    if (this.resolvers.length) this.resolvers.shift()!(result);
    else this.promises.push(Promise.resolve(result));
  }
  put(value: T): void {
    this.putNext({done: false, value});
  }
  end(): void {
    if (!this.resolvers) return; // already ended
    for (const res of this.resolvers) res({done: true, value: undefined});
    this.resolvers = null;
  }
  next(): Promise<IteratorResult<T>> {
    if (this.promises.length) return this.promises.shift()!;
    else if (this.resolvers) return new Promise(resolve => { this.resolvers!.push(resolve); });
    else return Promise.resolve({done: true, value: undefined});
  }
  [Symbol.asyncIterator](): AsyncIterator<T> {
    // Todo: Use AsyncIterator.from()
    return this;
  }
}
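For illustration, here's a minimal sketch of the queue used on its own (nothing assumed beyond the class above): a consumer that calls next() before a value is available simply waits for the matching put(), and end() releases all waiting consumers:

const queue = new AsyncQueue<number>();
// consumer: awaits values as they are handed over
(async () => {
  for await (const value of queue) {
    console.log('got', value);
  }
  console.log('queue ended');
})();
// producer: hands over values, then signals the end
queue.put(1);
queue.put(2);
setTimeout(() => {
  queue.put(3);
  queue.end();
}, 100);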
function limitConcurrent<T>(iterable: AsyncIterable<Promise<T>>, n: number): AsyncIterator<T> {
  const produced = new AsyncQueue<T>();
  const consumed = new AsyncQueue<void>();
  (async () => {
    try {
      let count = 0;
      for await (const p of iterable) {
        const promise = Promise.resolve(p);
        promise.then(value => {
          produced.put(value);
        }, _err => {
          produced.putNext(promise); // with rejection already marked as handled
        });
        if (++count >= n) {
          await consumed.next(); // happens after any produced.put[Next]()
          count--;
        }
      }
      while (count) {
        await consumed.next(); // happens after any produced.put[Next]()
        count--;
      }
    } catch (e) {
      // ignore `iterable` errors?
    } finally {
      produced.end();
    }
  })();
  return (async function*() {
    for await (const value of produced) {
      yield value;
      consumed.put();
    }
  }());
}
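A minimal usage sketch, with a made-up delay helper and producer. The returned object is an async generator and hence itself async-iterable, so it can be fed to a for await loop (the cast is only needed because the declared return type is the narrower AsyncIterator):

const delay = <T>(ms: number, value: T) =>
  new Promise<T>(resolve => setTimeout(resolve, ms, value));

async function* producer(): AsyncGenerator<Promise<number>> {
  yield delay(300, 1);
  yield delay(100, 2);
  yield delay(200, 3);
}

(async () => {
  // at most 2 promises in flight; values arrive in resolution order
  for await (const value of limitConcurrent(producer(), 2) as AsyncGenerator<number>) {
    console.log(value); // likely 2, 1, 3, depending on timing
  }
})();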
function createCache() {
  const resolve = [];        // resolvers waiting, in insertion order
  const sortedPromises = []; // promises that settle in resolution order
  const noop = () => void 0;
  return {
    get length() {
      return sortedPromises.length
    },
    add(promiseOrValue) {
      const q = new Promise(r => {
        resolve.push(r);
        // when promiseOrValue settles, resolve the oldest waiting promise with it
        const _ = () => {
          resolve.shift()(promiseOrValue);
        }
        Promise.resolve(promiseOrValue).then(_, _);
      });
      q.catch(noop); // prevent q from throwing when rejected.
      sortedPromises.push(q);
    },
    next() {
      return sortedPromises.length ?
        { value: sortedPromises.shift() } :
        { done: true };
    },
    [Symbol.iterator]() {
      return this;
    }
  }
}
(async () => {
  const sleep = (ms, value) => new Promise(resolve => setTimeout(resolve, ms, value));
  const cache = createCache();
  const start = Date.now();
  function addItem() {
    const t = Math.floor(Math.random() ** 2 * 8000), // when to resolve
      val = t + Date.now() - start; // ensure that the resolved value is in ASC order.
    console.log("add", val);
    cache.add(sleep(t, val));
  }
  // add a few initial items
  Array(5).fill().forEach(addItem);
  // check error handling with a rejecting promise.
  cache.add(sleep(1500).then(() => Promise.reject("a rejected Promise")));
  while (cache.length) {
    try {
      for await (let v of cache) {
        console.log("yield", v);
        if (v < 15000 && Math.random() < .5) {
          addItem();
        }
        // slow down iteration, like if you'd await some API-call.
        // promises now resolve faster than we pull them.
        await sleep(1000);
      }
    } catch (err) {
      console.log("error:", err);
    }
  }
  console.log("done");
})()
Works with both for (const promise of cache) { ... } and for await (const value of cache) { ... }.
Error-handling:
for (const promise of cache) {
  try {
    const value = await promise;
  } catch (error) { ... }
}
// or
while (cache.length) {
  try {
    for await (const value of cache) {
      ...
    }
  } catch (error) { ... }
}
rejected Promises (in the cache) don't throw until you .then() or await them.
Also handles backpressure (when your loop is iterating slower than the promises resolve)
for await (const value of cache) {
  await somethingSlow(value);
}
Related
I am reading Node.js Design Patterns and am trying to understand the following example of a producer-consumer pattern in implementing limited parallel execution (my questions in comments):
export class TaskQueuePC extends EventEmitter {
  constructor(concurrency) {
    super();
    this.taskQueue = [];
    this.consumerQueue = [];
    for (let i = 0; i < concurrency; i++) {
      this.consumer();
    }
  }
  async consumer() {
    while (true) {
      try {
        const task = await this.getNextTask();
        await task();
      } catch (err) {
        console.error(err);
      }
    }
  }
  async getNextTask() {
    return new Promise((resolve) => {
      if (this.taskQueue.length !== 0) {
        return resolve(this.taskQueue.shift());
      }
      this.consumerQueue.push(resolve);
    });
  }
  runTask(task) {
    // why are we returning a promise here?
    return new Promise((resolve, reject) => {
      // why are we wrapping our task here?
      const taskWrapper = () => {
        const taskPromise = task();
        taskPromise.then(resolve, reject);
        return taskPromise;
      };
      if (this.consumerQueue.length !== 0) {
        const consumer = this.consumerQueue.shift();
        consumer(taskWrapper);
      } else {
        this.taskQueue.push(taskWrapper);
      }
    });
  }
}
In the constructor, we create queues for both tasks and consumers, and then execute the consumer method up to the concurrency limit.
This pauses each consumer on const task = await this.getNextTask(), which returns a pending promise.
Because there are no tasks yet in our task queue, the resolver for the promise is pushed to the consumer queue.
When a task is added with runTask, the consumer (the pending promise's resolver) is plucked off the queue and called with the task. This returns execution to the consumer method, which will run the task(), eventually looping again to await another task or sit in the queue.
What I cannot grok is the purpose of the Promise and taskWrapper in the runTask method. It seems we would have the same behavior if both the Promise and taskWrapper were omitted:
runTask(task) {
  if (this.consumerQueue.length !== 0) {
    const consumer = this.consumerQueue.shift();
    consumer(task);
  } else {
    this.taskQueue.push(task);
  }
}
In fact, when I execute this version I get the same results. Am I missing something?
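One observable difference the simplified version does have, even if the example doesn't show it: the original runTask returns a promise that settles when the task itself finishes, so a caller can await completion or catch failures; the stripped-down version returns undefined. A minimal sketch of what the wrapper buys you (TaskQueuePC as defined above):

(async () => {
  const queue = new TaskQueuePC(2);
  // with the original runTask, this resolves when the task is done
  // and rejects if the task throws:
  await queue.runTask(async () => { /* do work */ });
  console.log('task finished');
  // with the simplified runTask, the call returns undefined, so the
  // await resolves immediately, before the task has actually run,
  // and errors are only visible via console.error inside consumer().
})();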
It seems I'm unable to use an async function as the first argument to Array.find(). I can't see why this code would not work. What is happening under the hood?
function returnsPromise() {
  return new Promise(resolve => resolve("done"));
}

async function findThing() {
  const promiseReturn = await returnsPromise();
  return promiseReturn;
}

async function run() {
  const arr = [1, 2];
  const found = await arr.find(async thing => {
    const ret = await findThing();
    console.log("runs once", thing);
    return false;
  });
  console.log("doesn't wait");
}

run();
https://codesandbox.io/s/zk8ny3ol03
Simply put, find does not expect a promise to be returned, because it is not intended for asynchronous things. It loops through the array until one of the elements results in a truthy value being returned. An object, including a promise object, is truthy, and so the find stops on the first element.
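A quick way to see this: the async callback returns a promise, and a pending promise is truthy, so find stops at the first element regardless of what the promise later resolves to:

const found = [1, 2, 3].find(async () => false);
console.log(found); // 1 - the returned promise is truthy, so the first element "matches"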
If you want an asynchronous equivalent of find, you'll need to write it yourself. One consideration you'll want to have is whether you want to run things in parallel, or if you want to run them sequentially, blocking before you move on to the next index.
For example, here's a version that runs them all in parallel, and then once the promises are all resolved, it finds the first that yielded a truthy value.
async function findAsync(arr, asyncCallback) {
  const promises = arr.map(asyncCallback);
  const results = await Promise.all(promises);
  const index = results.findIndex(result => result);
  return arr[index]; // index is -1 when nothing matches, so this yields undefined
}
//... to be used like:
findAsync(arr, async (thing) => {
  const ret = await findThing();
  return false;
})
Here is a TypeScript version that runs sequentially:
async function findAsyncSequential<T>(
  array: T[],
  predicate: (t: T) => Promise<boolean>,
): Promise<T | undefined> {
  for (const t of array) {
    if (await predicate(t)) {
      return t;
    }
  }
  return undefined;
}
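Used the same way as the parallel version, e.g. (the predicate here is just an example):

const found = await findAsyncSequential([1, 2, 3], async n => {
  return n === 2; // imagine an awaited check here
});
// found === 2; elements after the match are never evaluated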
It might help you to note that Array.prototype.filter is synchronous, so it doesn't support async behaviour. I think the same applies to the find method. You can always define your own async version :) Hope this helps!
The other answers provide two solutions to the problem:
Running the promises in parallel and awaiting them all before returning the result
Running the promises sequentially, so awaiting every single promise before moving on to the next one.
Imagine you have five promises that finish at different times: the first after one second, the second after two seconds, etc., and the fifth after five seconds.
If I'm looking for the one that finished after three seconds:
The first solution will wait 5 seconds, until all promises are resolved. Then it looks for the one that matches.
The second one will evaluate the first three promises sequentially (1 + 2 + 3 = 6 seconds) before returning.
Here's a third option that should usually be faster: running the promises in parallel, but only waiting until the first match ("racing" them): 3 seconds.
function asyncFind(array, findFunction) {
  return new Promise(resolve => {
    let i = 0;
    array.forEach(async item => {
      if (await findFunction(await item)) {
        resolve(item);
        return;
      }
      i++;
      if (array.length == i) {
        resolve(undefined);
      }
    });
  });
}
//can be used either when the array contains promises
var arr = [asyncFunction(), asyncFunction2()];
await asyncFind(arr, item => item == 3);
//or when the find function is async (or both)
var arr = [1, 2, 3];
await asyncFind(arr, async item => {
  return await doSomething(item);
});
When looking for a non-existent item, solutions 1 and 3 will take the same amount of time (until all promises are evaluated, here 5 seconds). The sequential approach (solution 2) would take 1 + 2 + 3 + 4 + 5 = 15 seconds.
Demo: https://jsfiddle.net/Bjoeni/w4ayh0bp
I came up with a solution that doesn't appear to be covered here, so I figured I'd share. I had the following requirements.
Accepts an async function (or a function that returns a promise)
Supplies item, index, and array to the function
Returns the item with the fastest resolved promise (all are evaluated in parallel)
Does not exit early if the async function rejects (throws an error)
Supports generic types (in TypeScript)
With those requirements, I came up with the following solution using Promise.any. It will resolve with the first fulfilled value or reject with an AggregateError if none of the promises are fulfilled.
type Predicate<T> = (item: T, index: number, items: T[]) => Promise<boolean>

export const any = async <T>(array: T[], predicate: Predicate<T>): Promise<T> => {
  return Promise.any(
    array.map(async (item, index, items) => {
      if (await predicate(item, index, items)) {
        return item
      }
      throw new Error()
    })
  )
}
Now we can search an array in parallel with async functions and return the fastest resolved result.
const things = [{ id: 0, ... }, { id: 1, ... }]
const found = await any(things, async (thing) => {
  const otherThing = await getOtherThing()
  return thing.id === otherThing.id
})
Enumerate the array sequentially
If we want to enumerate the array in sequence, as Array.find does, then we can do that with some modifications.
export const first = async <T>(array: T[], predicate: Predicate<T>): Promise<T> => {
  for (const [index, item] of array.entries()) {
    try {
      if (await predicate(item, index, array)) {
        return item
      }
    } catch {
      // If we encounter an error, keep searching.
    }
  }
  // If we do not find any matches, "reject" by raising an error.
  throw new Error()
}
Now, we can search an array in sequence and return the first resolved result.
const things = [{ id: 0, ... }, { id: 1, ... }]
const found = await first(things, async (thing) => {
  const otherThing = await getOtherThing()
  return thing.id === otherThing.id
})
I would like to create an RxJS Observable from an iterable like the following:
const networkIterableFactory = (resource: string) => {
  let i = 0;
  return {
    [Symbol.iterator]() {
      return {
        next() {
          return {
            done: false,
            value: fetch(resource, {
              mode: 'cors',
            }).then(async response => {
              console.log('i = ', i);
              await throttle(10000); // Do some stuff
              i++;
              return {i: 'i'};
            }),
          };
        },
      };
    },
  };
};

function throttle(ms: number) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

let networkIterable = networkIterableFactory('google.com');
let network$ = rxjs.from(networkIterable).pipe(rxjs.operators.take(5));
network$.subscribe(() => console.log('yo!'));
The issue is that i prints 5 times as 0. It seems the iterable's iterator saves its state by updating the outer closure. rxjs.from takes the whole iterable as one emission, so a bunch of unresolved promises are returned, but I need the iterator state to be altered by logic within the promise callback. Is there a way to make the observable wait until the promise resolves before emitting the next item from the iterator? I would rather avoid using asyncIterable because I don't want to bring in IxRx.
Since the values of your iterable are returned asynchronously, you should implement Symbol.asyncIterator instead of Symbol.iterator. Try this instead:
const networkIterableFactory = (resource: string) => {
  let i = 0;
  return {
    [Symbol.asyncIterator]() {
      return {
        next() {
          return fetch(resource, { mode: 'cors' }).then(x => ({ done: false, value: x }));
        },
      };
    },
  };
};

function throttle(ms: number) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

let networkIterable = networkIterableFactory('google.com');
let network$ = rxjs.from(networkIterable).pipe(rxjs.operators.take(5));
network$.subscribe(() => console.log('yo!'));
Edit:
RxJS actually doesn't support Async iterators yet: https://github.com/ReactiveX/rxjs/issues/1624.
I also tried with an AsyncGenerator:
const { from } = require('rxjs');
async function* test() {};
const asyncGenerator = test();
from(asyncGenerator);
But it throws:
TypeError: You provided an invalid object where a stream was expected.
You can provide an Observable, Promise, Array, or Iterable.
So you won't be able to make it with this pattern, I actually believe that RxJS is not suited for "pulling" data like you do with take (If this pattern worked, it would end up in an infinite requests loop, even if you only take 5 results). It is rather designed to "push" things to whoever listens.
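That said, if you do want pull-based emission without bringing in IxRx, one workaround is a hand-rolled bridge that only asks the iterator for the next value after the previous one has been emitted. A minimal sketch (this is not an RxJS API, just the plain Observable constructor):

import { Observable } from 'rxjs';

function fromAsyncIterable<T>(iterable: AsyncIterable<T>): Observable<T> {
  return new Observable<T>(subscriber => {
    const iterator = iterable[Symbol.asyncIterator]();
    let cancelled = false;
    (async () => {
      try {
        while (!cancelled) {
          // pull the next value only after the previous one was emitted
          const { done, value } = await iterator.next();
          if (done) break;
          subscriber.next(value);
        }
        subscriber.complete();
      } catch (err) {
        subscriber.error(err);
      }
    })();
    return () => { cancelled = true; };
  });
}

With this, take(5) unsubscribes after five values, the cancelled flag flips, and no sixth request is started.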
Well, I am lost in await and async hell. The code below is supposed to loop through a list of files, check whether they exist, and return the ones that do. But I am getting a zero-length list.
Node V8 code: caller:
await this.sourceList()
if (this.paths.length == 0) {
  this.abort = true
  return
}
Called Functions: (I took out stuff not relevant)
const testPath = util.promisify(fs.access)

class FMEjob {
  constructor(root, inFiles, layerType, ticket) {
    this.paths = []
    this.config = global.app.settings.config
    this.sourcePath = this.config.SourcePath
  }
  async sourceList() {
    return await Promise.all(this.files.map(async (f) => {
      let source = path.join(this.sourcePath, f.path)
      return async () => {
        if (await checkFile(source)) {
          this.paths.push(source)
        }
      }
    }))
  }
  async checkFile(path) {
    let result = true
    try {
      await testPath(path, fs.constants.R_OK)
    }
    catch (err) {
      this.errors++
      result = false
      logger.addLog('info', 'FMEjob.checkFile(): File Missing Error: %s', err.path)
    }
    return result
  }
}
Your sourceList function is really weird. It returns a promise for an array of asynchronous functions, but it never calls those. Drop the arrow function wrapper.
Also, I recommend never mutating instance properties inside async methods; that'll cause insane bugs when multiple methods are executed concurrently.
this.paths = await this.sourceList()
if (this.abort = (this.paths.length == 0)) {
  return
}

async sourceList() {
  let paths = []
  await Promise.all(this.files.map(async (f) => {
    const source = path.join(this.sourcePath, f.path)
    // no function here, no return here!
    if (await this.checkFile(source)) {
      paths.push(source)
    }
  }))
  return paths
}
async checkFile(path) {
  try {
    await testPath(path, fs.constants.R_OK)
    return true
  } catch (err) {
    logger.addLog('info', 'FMEjob.checkFile(): File Missing Error: %s', err.path)
    this.errors++ // questionable as well - better let `sourceList` count these
  }
  return false
}
I'm looking for a promise function wrapper that can limit/throttle when a given promise is run, so that only a set number of instances of that promise are running at a given time.
In the case below, delayPromise should never run concurrently; they should all run one at a time in first-come-first-served order.
import Promise from 'bluebird'
function _delayPromise (seconds, str) {
  console.log(str)
  return Promise.delay(seconds)
}
let delayPromise = limitConcurrency(_delayPromise, 1)
async function a() {
  await delayPromise(100, "a:a")
  await delayPromise(100, "a:b")
  await delayPromise(100, "a:c")
}

async function b() {
  await delayPromise(100, "b:a")
  await delayPromise(100, "b:b")
  await delayPromise(100, "b:c")
}
a().then(() => console.log('done'))
b().then(() => console.log('done'))
Any ideas on how to get a queue like this set up?
I have a "debounce" function from the wonderful Benjamin Gruenbaum. I need to modify this to throttle a promise based on it's own execution and not the delay.
export function promiseDebounce (fn, delay, count) {
  let working = 0
  let queue = []
  function work () {
    if ((queue.length === 0) || (working === count)) return
    working++
    Promise.delay(delay).tap(function () { working-- }).then(work)
    var next = queue.shift()
    next[2](fn.apply(next[0], next[1]))
  }
  return function debounced () {
    var args = arguments
    return new Promise(function (resolve) {
      queue.push([this, args, resolve])
      if (working < count) work()
    }.bind(this))
  }
}
I don't think there are any libraries to do this, but it's actually quite simple to implement yourself:
function sequential(fn) { // limitConcurrency(fn, 1)
  let q = Promise.resolve();
  return function(x) {
    const p = q.then(() => fn(x));
    q = p.reflect();
    return p;
  };
}
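Note that reflect() is Bluebird-specific: it turns the promise into one that always fulfills, so the chain keeps going after a rejection. With native promises the same trick looks like this:

function sequential(fn) {
  let q = Promise.resolve();
  return function (x) {
    const p = q.then(() => fn(x));
    q = p.catch(() => {}); // settle regardless of outcome, like reflect()
    return p;
  };
}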
For multiple concurrent requests it gets a little trickier, but can be done as well.
function limitConcurrency(fn, n) {
  if (n == 1) return sequential(fn); // optimisation
  let q = Promise.resolve();
  const active = new Set();
  const fst = t => t[0];
  const snd = t => t[1];
  return function(x) {
    function put() {
      const p = fn(x);
      const a = p.reflect().then(() => {
        active.delete(a);
      });
      active.add(a);
      return [Promise.race(active), p];
    }
    if (active.size < n) {
      const t = put();
      q = fst(t);
      return snd(t);
    } else {
      const r = q.then(put);
      q = r.then(fst);
      return r.then(snd);
    }
  };
}
Btw, you might want to have a look at the actor model and CSP. They can simplify dealing with such things; there are a few JS libraries for them out there as well.
Example
import Promise from 'bluebird'

function sequential(fn) {
  var q = Promise.resolve();
  return (...args) => {
    const p = q.then(() => fn(...args))
    q = p.reflect()
    return p
  }
}

async function _delayPromise (seconds, str) {
  console.log(`${str} started`)
  await Promise.delay(seconds)
  console.log(`${str} ended`)
  return str
}

let delayPromise = sequential(_delayPromise)

async function a() {
  await delayPromise(100, "a:a")
  await delayPromise(200, "a:b")
  await delayPromise(300, "a:c")
}

async function b() {
  await delayPromise(400, "b:a")
  await delayPromise(500, "b:b")
  await delayPromise(600, "b:c")
}

a().then(() => console.log('done'))
b().then(() => console.log('done'))
// --> with sequential()
// $ babel-node test/t.js
// a:a started
// a:a ended
// b:a started
// b:a ended
// a:b started
// a:b ended
// b:b started
// b:b ended
// a:c started
// a:c ended
// b:c started
// done
// b:c ended
// done
// --> without calling sequential()
// $ babel-node test/t.js
// a:a started
// b:a started
// a:a ended
// a:b started
// a:b ended
// a:c started
// b:a ended
// b:b started
// a:c ended
// done
// b:b ended
// b:c started
// b:c ended
// done
Use the throttled-promise module:
https://www.npmjs.com/package/throttled-promise
var ThrottledPromise = require('throttled-promise'),
    promises = [
      new ThrottledPromise(function(resolve, reject) { ... }),
      new ThrottledPromise(function(resolve, reject) { ... }),
      new ThrottledPromise(function(resolve, reject) { ... })
    ];

// Run promises, but only 2 parallel
ThrottledPromise.all(promises, 2)
  .then( ... )
  .catch( ... );
I had the same problem, so I wrote a library to implement it. Code is here. I created a queue to hold all the promises. When you push some promises to the queue, the first several promises at the head of the queue are popped and run. Once one promise is done, the next promise in the queue is popped and run, again and again, until the queue has no tasks left. You can check the code for details. I hope this library helps you.
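The approach described there boils down to something like this minimal sketch (not the library's actual code; the task functions and concurrency number are the only inputs):

function createTaskQueue(concurrency) {
  const tasks = [];   // task functions waiting to run
  let running = 0;    // how many are currently in flight
  function runNext() {
    if (running >= concurrency || tasks.length === 0) return;
    running++;
    tasks.shift()().finally(() => {
      running--;
      runNext(); // one finished, pop and start the next
    });
  }
  return {
    push(task) { // task: () => Promise
      tasks.push(task);
      runNext();
    }
  };
}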
Advantages
you can define the number of concurrent promises (near-simultaneous requests)
consistent flow: once one promise resolves, another request starts; no need to guess the server capability
robust against data choke: if the server stops for a moment, it will just wait, and the next tasks will not start just because the clock allowed it
does not rely on a 3rd-party module; it is vanilla Node.js
1st thing is to make https a promise, so we can use await to retrieve data (removed from the example)
2nd, create a promise scheduler that submits another request as any promise gets resolved.
3rd, make the calls
Limiting the request rate by limiting the number of concurrent promises
const https = require('https')

function httpRequest(method, path, body = null) {
  const reqOpt = {
    method: method,
    path: path,
    hostname: 'dbase.ez-mn.net',
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": "no-cache"
    }
  }
  if (method == 'GET') reqOpt.path = path + '&max=20000'
  if (body) reqOpt.headers['Content-Length'] = Buffer.byteLength(body);
  return new Promise((resolve, reject) => {
    const clientRequest = https.request(reqOpt, incomingMessage => {
      let response = {
        statusCode: incomingMessage.statusCode,
        headers: incomingMessage.headers,
        body: []
      };
      let chunks = ""
      incomingMessage.on('data', chunk => { chunks += chunk; });
      incomingMessage.on('end', () => {
        if (chunks) {
          try {
            response.body = JSON.parse(chunks);
          } catch (error) {
            reject(error)
          }
        }
        console.log(response)
        resolve(response);
      });
    });
    clientRequest.on('error', error => { reject(error); });
    if (body) { clientRequest.write(body) }
    clientRequest.end();
  });
}

const asyncLimit = (fn, n) => {
  const pendingPromises = new Set();
  return async function(...args) {
    while (pendingPromises.size >= n) {
      await Promise.race(pendingPromises);
    }
    const p = fn.apply(this, args);
    const r = p.catch(() => {});
    pendingPromises.add(r);
    await r;
    pendingPromises.delete(r);
    return p;
  };
};
// httpRequest is the function whose request rate we want to limit
// in this case, we keep 8 requests running while not blocking other tasks (concurrency)
let ratedhttpRequest = asyncLimit(httpRequest, 8);

// this is our dataset and caller
let process = async () => {
  const patchData = [
    { path: '/rest/slots/80973975078587', body: { score: 3 } },
    { path: '/rest/slots/809739750DFA95', body: { score: 5 } },
    { path: '/rest/slots/AE0973750DFA96', body: { score: 5 } }]
  for (let i = 0; i < patchData.length; i++) {
    ratedhttpRequest('PATCH', patchData[i].path, patchData[i].body)
  }
  console.log('completed') // logs once all requests are scheduled, not when they finish
}
process()
The classic way of running async processes in series is to use async.js and use async.series(). If you prefer promise based code then there is a promise version of async.js: async-q
With async-q you can once again use series:
async.series([
  function() { return delayPromise(100, "a:a") },
  function() { return delayPromise(100, "a:b") },
  function() { return delayPromise(100, "a:c") }
])
.then(function() {
  console.log('done');
});
Running two of them at the same time will run a and b concurrently but within each they will be sequential:
// these two will run concurrently but each will run
// their array of functions sequentially:
async.series(a_array).then(()=>console.log('a done'));
async.series(b_array).then(()=>console.log('b done'));
If you want to run b after a then put it in the .then():
async.series(a_array)
  .then(() => {
    console.log('a done');
    return async.series(b_array);
  })
  .then(() => {
    console.log('b done');
  });
If instead of running each sequentially you want to limit each to run a set number of processes concurrently then you can use parallelLimit():
// Run two promises at a time:
async.parallelLimit(a_array, 2)
  .then(() => console.log('done'));
Read up on the async-q docs: https://github.com/dbushong/async-q/blob/master/README.md