This is the scenario:
Using AWS Kinesis.
To get records from Kinesis I need a shard iterator.
Kinesis won't return a new shard iterator until a request is complete.
The call to getRecords is asynchronous.
Attempting to iterate this process fails because the request hasn't resolved before the shard iterator is needed.
const getRecordsInShard = (shard, batchSize, streamName) => {
  const records = [];
  const loop = (response) => {
    if (_.isEmpty(response.NextShardIterator)) {
      return Promise.resolve(records);
    }
    records.push(response.Records);
    return getRecordsByShardIterator(response.NextShardIterator, batchSize).then(loop);
  };
  return getStreamIterator(shard.ShardId, shard.SequenceNumberRange.StartingSequenceNumber, (streamName || process.env.KINESIS_STREAM))
    .then(response => getRecordsByShardIterator(response.ShardIterator, batchSize))
    .then(loop);
};
The above code fails because the promise returned from loop doesn't resolve before the outer function returns. How can I iterate a promise-returning function sequentially, using the return value from the previous iteration as the next iteration's input?
Caveats:
Each iteration relies on the information returned by the previous one, and the result needs to collect the records from every iteration together.
I don't know why this became so complicated when the library provider gives good examples of how to consume a Kinesis stream:
https://github.com/awslabs/amazon-kinesis-client-nodejs/tree/master/samples
It has a basic sample and a clickstream sample.
Hope it helps.
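For what it's worth, here is a minimal sketch of the sequential pattern the question asks about, written with async/await. It assumes getStreamIterator and getRecordsByShardIterator return promises exactly as in the question's code; each getRecords call finishes before its NextShardIterator is fed into the next call, and the records are collected along the way.
const getRecordsInShard = async (shard, batchSize, streamName) => {
  const records = [];
  // Get the initial shard iterator for this shard.
  const { ShardIterator } = await getStreamIterator(
    shard.ShardId,
    shard.SequenceNumberRange.StartingSequenceNumber,
    streamName || process.env.KINESIS_STREAM
  );
  let iterator = ShardIterator;
  // Each getRecords call must finish before we know the next shard iterator.
  while (iterator) {
    const response = await getRecordsByShardIterator(iterator, batchSize);
    records.push(...response.Records);
    iterator = response.NextShardIterator; // empty/undefined once the shard is closed
  }
  return records;
};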
Here I have a function that takes an array of strings containing the usernames of GitHub accounts, and it returns a promise that resolves to an array of user data. There should be one fetch request per user, and the requests shouldn't wait for each other, so that the data arrives as soon as possible. If there's no such user, the function should return null at that position in the resulting array.
An example input would be ["iliakan", "remy", "no.such.users"], and after resolving, the returned promise should give us [null, Object, Object], where Object is the data containing info about a user.
Here is my attempt to solve this question.
function getUsers(names) {
  return new Promise(resolve => {
    const array = [];
    const url = "https://api.github.com/users/";
    const requests = names.map(name => {
      const endpoint = `${url}${name}`;
      return fetch(endpoint);
    });
    Promise.all(requests).then(responses => {
      responses.forEach(response => {
        if (response.status === 200) {
          response.json().then(data => {
            array.push(data);
          });
        } else {
          array.push(null);
        }
      });
      resolve(array);
    });
  });
}
It does work, i.e. it returns an array [null, Object, Object], and I thought it fulfilled the requirements I stated above. However, after looking at it closely, I feel like I can't fully make sense of it.
My question is about where we resolve this array: it is resolved immediately after the forEach loop. One thing I don't understand is why it contains all three items when some of them are pushed into it asynchronously, only after json() has finished. What I mean is, in the case where response.status === 200, the data resolved from json() is pushed into the array, and I would assume the json() operation should take some time. Since we didn't wait to resolve the array until after the json() operation finished, how come we still ended up with all the data resolved from json()?
Promise.all(requests).then(responses => {
  responses.forEach(response => {
    if (response.status === 200) {
      response.json().then(data => {
        array.push(data); // <--- this should take some time
      });
    } else {
      array.push(null);
    }
  });
  resolve(array); // <--- resolve the array immediately after the `forEach` loop
});
It looks to me like the array we get should only have one null in it, since at the time it is resolved, the .json() calls should not have finished.
You're right, the result is pushed later into the array.
Try to execute this:
const test = await getUsers(['Guerric-P']);
console.log(test.length);
You'll notice it displays 0: before the results are pushed into the array, its length is 0. It probably looked like it worked because you expanded the array in the console after the results had arrived; the console shows the array's current contents when you expand it, not what it contained at the moment it was logged.
You should do something like this:
function getUsers(names) {
  const url = "https://api.github.com/users/";
  const requests = names.map(name => {
    const endpoint = `${url}${name}`;
    return fetch(endpoint);
  });
  return Promise.all(requests).then(responses =>
    Promise.all(responses.map(x => x.status === 200 ? x.json() : null))
  );
}
You should avoid using the Promise constructor directly. Here, we don't need to use it at all.
const url = "https://api.github.com/users/";
const getUsers = names =>
  Promise.all(names.map(name =>
    fetch(url + name).then(response =>
      response.status === 200 ? response.json() : null)));
getUsers(["iliakan", "remy", "no.such.users"]).then(console.log);
The Promise constructor should only be used when you're creating new kinds of asynchronous tasks. In this case, you don't need to use the Promise constructor because fetch already returns a promise.
You also don't need to maintain an array and push to it because Promise.all resolves to an array. Finally, you don't need to map over the result of Promise.all. You can transform the promises returned by fetch.
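For contrast, here is a small illustration (not from the original answer) of the kind of case where the Promise constructor is appropriate: wrapping a callback-based API such as setTimeout, which does not already return a promise.
// setTimeout is callback-based, so the Promise constructor is the right tool here.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
delay(500).then(() => console.log('half a second later'));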
The thing is that the json() operation is really quick, especially if the response data is small, so it often has time to execute before you inspect the result. Second, since objects in JavaScript are passed by reference rather than by value, and an Array is an object, the data will still be pushed into the same array even after the promise has resolved, regardless of execution time.
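A small illustration of that point (my own example, not from the answer): the resolved value is a reference to the same array object, so mutations made after resolution are still visible through it.
const arr = [];
const p = Promise.resolve(arr); // resolve immediately with the (still empty) array
setTimeout(() => arr.push('late'), 0); // mutate the same array later
p.then(value => {
  console.log(value.length); // 0 here, because the push hasn't happened yet
  setTimeout(() => console.log(value), 10); // ['late']: same object, now mutated
});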
I'm dealing with a loop in Node.js that performs two tasks on every iteration. To simplify, the code boils down to:
1. Extract products metadata from a web page (blocking task).
2. Save all the products metadata to a database (asynchronous task).
The save operation (2) will perform about 800 operations in a database, and it doesn't need to block the main thread (I can keep extracting products metadata from the web pages in the meantime).
So, that being said, awaiting the products being saved doesn't make sense. But if I fire off the promises without awaiting them, the Node.js process exits after the last iteration of the loop and the pending operations never finish.
What is the best approach to solve this? Is it possible to achieve it without keeping a counter of finished promises or using emitters? Thanks.
for (let shop of shops) {
  // 1
  const products = await extractProductsMetadata(shop);
  // 2
  await saveProductsMetadata(products);
}
Collect the promises in an array, then use Promise.all on it:
const storePromises = [];
for (let shop of shops) {
  const products = await extractProductsMetadata(shop); // (1)
  storePromises.push(saveProductsMetadata(products)); // (2)
}
await Promise.all(storePromises);
// ... all done (3)
With that, (1) runs one after another, (2) runs in parallel, and (3) runs once everything has finished.
For sure you can also run (1) and (2) in parallel:
await Promise.all(shops.map(async shop => {
  const products = await extractProductsMetadata(shop); // (1)
  await saveProductsMetadata(products);
}));
And if an error occurs in one of the promises, you can handle it with a try/catch block to make sure the other shops aren't affected:
await Promise.all(shops.map(async shop => {
  try {
    const products = await extractProductsMetadata(shop); // (1)
    await saveProductsMetadata(products);
  } catch (error) {
    // handle it here
  }
}));
How do I signal Node to finish the process?
You could manually call process.exit(0);, but that hides the real problem: Node.js exits automatically once there is no pending work (open handles, listeners, timers) left. That means you should close all database connections, servers, etc. after the code above is done.
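A minimal sketch of that flow (my own addition; db.close() is a placeholder for whatever your database client uses to release its connections):
async function run() {
  const storePromises = [];
  for (const shop of shops) {
    const products = await extractProductsMetadata(shop);
    storePromises.push(saveProductsMetadata(products));
  }
  await Promise.all(storePromises); // wait for every pending save to settle
  await db.close(); // hypothetical cleanup: release handles so Node can exit on its own
}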
Here we create packs of data to process. When we process a pack, we do all the extraction sequentially and all the saving asynchronously.
I have not handled the failure case; I'll let you add that yourself. An appropriate try/catch or function encapsulation will do it.
/**
 * Call the given promise-returning functions one after another (a queue)
 * options = context/args
 */
function promiseQueue(promisesFuncs, options = {}, _i = 0, _ret = []) {
  return new Promise((resolve, reject) => {
    if (_i >= promisesFuncs.length) {
      return resolve(_ret);
    }
    // Call one function, then recurse with its result appended
    (promisesFuncs[_i]).apply(options.context || this, options.args || [])
      .then(ret => promiseQueue(promisesFuncs, options, _i + 1, [
        ..._ret,
        ret,
      ]))
      .then(resolve)
      .catch(reject);
  });
}
async function executePromisesAsPacks(arr, packSize, _i = 0) {
  const toExecute = arr.slice(_i * packSize, (_i + 1) * packSize);
  // Leave if we have executed all packs
  if (toExecute.length === 0) return true;
  // First we extract all the data sequentially
  const products = await promiseQueue(toExecute.map(x => () => extractProductsMetadata(x)));
  // Then save the products asynchronously
  // (no await here, so the saves run in the background)
  Promise.all(toExecute.map((x, xi) => saveProductsMetadata(products[xi])));
  // Call the next pack
  return executePromisesAsPacks(arr, packSize, _i + 1);
}
// Makes packs of data to process (we extract sequentially and save asynchronously)
// Made to handle huge datasets
await executePromisesAsPacks(shops, 50);
I am trying to loop and get different documents from Firestore. The document IDs are provided by an array named 'cart', as you can see in the code below.
The logic I have tried goes like this: on every iteration the while loop gets a document from Firestore; the first 'then' saves the data it has just received, and the second 'then' increments 'i' and moves on to the next cycle of the loop.
The problem is the while loop doesn't wait for the get request to finish. It just keeps looping and crashes.
And even if I somehow manage to get the loop part right, how would I manage the overall execution flow of the program so that the further code runs only after the loop has completed, since the code below uses the cart array which the loop updates?
let i = 0
while (i < cart.length) {
  let element = cart[i]
  db.collection(`products`).doc(element.productID).get().then((doc1) => {
    element.mrp = doc1.data().mrp
    element.ourPrice = doc1.data().ourPrice
    return console.log('added price details')
  }).then(() => {
    i++;
    return console.log(i)
  }).catch((error) => {
    // Re-throwing the error as an HttpsError so that the client gets the error details.
    throw new functions.https.HttpsError('unknown', error.message, error);
  });
}
return db.collection(`Users`).doc(`${uid}`).update({
  orderHistory: admin.firestore.FieldValue.arrayUnion({
    cart,
    status: 'Placed',
    orderPlacedTimestamp: timestamp,
    outForDeliveryTimestamp: '',
    deliveredTimestamp: ''
  })
}).then(() => {
  console.log("Order Placed Successfully");
})
Your question is not really about Firebase; you're asking about looping asynchronously. You can see some promise examples here, and async/await examples here.
You can use reduce on the promises.
Note that all the chain's promises are created up front, but the calls to the server are made one after the other.
cart.reduce(
  (promise, element) =>
    promise.then(() => {
      return db.collection(`products`)
        .doc(element.productID)
        .get()
        .then(doc1 => {
          element.mrp = doc1.data().mrp;
          element.ourPrice = doc1.data().ourPrice;
        });
    }),
  Promise.resolve()
);
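To see how the reduce sequences the requests, here is what the chain above unrolls to for a three-item cart (an illustrative expansion, not part of the original answer; fetchPrice is a hypothetical helper standing in for the body passed to .then above):
// Hypothetical helper: fetches one product document and copies its prices onto the cart element.
const fetchPrice = element =>
  db.collection(`products`).doc(element.productID).get().then(doc1 => {
    element.mrp = doc1.data().mrp;
    element.ourPrice = doc1.data().ourPrice;
  });
// The reduce builds a chain equivalent to this:
Promise.resolve()
  .then(() => fetchPrice(cart[0]))
  .then(() => fetchPrice(cart[1]))
  .then(() => fetchPrice(cart[2]));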
If you can, use async/await instead. Here all the promises are being created one after the other.
async function fetchCart() {
  for (const element of cart) {
    const doc1 = await db.collection(`products`).doc(element.productID).get();
    element.mrp = doc1.data().mrp;
    element.ourPrice = doc1.data().ourPrice;
    console.log('added price details');
  }
}
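To address the second part of the question (running the rest of the code only after the loop has completed), a sketch along these lines should work; it reuses the names from the question and simply chains the Users update after fetchCart() resolves:
return fetchCart().then(() =>
  db.collection(`Users`).doc(`${uid}`).update({
    orderHistory: admin.firestore.FieldValue.arrayUnion({
      cart, // by now every element has mrp and ourPrice filled in
      status: 'Placed',
      orderPlacedTimestamp: timestamp,
      outForDeliveryTimestamp: '',
      deliveredTimestamp: ''
    })
  })
);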
Each call to Cloud Firestore happens asynchronously. So your while loop fires off multiple such requests, but it doesn't wait for them to complete.
If you have code that needs all the results, you will need to use Promises to ensure the flow. You're already using a promise in the while loop to get doc1.data().mrp. If cart is an array, you can do the following to gather all the promises for when the data is loaded:
var promises = cart.map(function(element) {
  return db.collection(`products`).doc(element.productID).get().then((doc1) => {
    return doc1.data();
  });
});
Now you can wait for all data with:
Promise.all(promises).then(function(datas) {
  datas.forEach(function(data) {
    console.log(data.mrp, data.ourPrice);
  });
});
If you're on a modern environment, you can use async/await to abstract away the then:
const datas = await Promise.all(promises);
datas.forEach(function(data) {
  console.log(data.mrp, data.ourPrice);
});
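For completeness (my own sketch, not part of the original answer): await has to run inside an async function, so the same logic wrapped up could look like this; loadCartPrices is just an illustrative name.
async function loadCartPrices(cart) {
  // One get() per cart element, all started in parallel.
  const promises = cart.map(element =>
    db.collection(`products`).doc(element.productID).get().then(doc1 => doc1.data())
  );
  const datas = await Promise.all(promises);
  datas.forEach(data => console.log(data.mrp, data.ourPrice));
}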
I am using Firebase Cloud Firestore, however, I think this may be more of a JavaScript asynchronous vs synchronous promise return issue.
I am doing a query to get IDs from one collection, then I am looping over the results of that query to lookup individual records from another collection based on that ID.
Then I want to store each found record into an array and then return the entire array.
results.length is always 0 because return results fires before the forEach completes. If I print results.length from inside the forEach it has data.
How can I wait until the forEach is done before returning from the outer promise and the outer function itself?
getFacultyFavoritesFirebase() {
  var dbRef = db.collection("users").doc(global.user_id).collection("favorites");
  var dbQuery = dbRef.where("type", "==", "faculty");
  var dbPromise = dbQuery.get();
  var results = [];
  return dbPromise.then(function(querySnapshot) {
    querySnapshot.forEach(function(doc) {
      var docRef = db.collection("faculty").doc(doc.id);
      docRef.get().then(function(doc) {
        if (doc.exists) {
          results.push(doc);
        }
      })
    });
    console.log(results.length);
    return results;
  })
  .catch(function(error) {
    console.log("Error getting documents: ", error);
  });
}
The trick here is to populate results with promises rather than the result. You can then call Promise.all() on that array of promises and get the results you want. Of course, you can't check if doc.exists before pushing the promise so you will need to deal with that once Promise.all() resolves. For example:
function getFacultyFavoritesFirebase() {
  var dbRef = db.collection("users").doc(global.user_id).collection("favorites");
  var dbQuery = dbRef.where("type", "==", "faculty");
  var dbPromise = dbQuery.get();
  // return the main promise
  return dbPromise.then(function(querySnapshot) {
    var results = [];
    querySnapshot.forEach(function(doc) {
      var docRef = db.collection("faculty").doc(doc.id);
      // push promise from get into results
      results.push(docRef.get())
    });
    // dbPromise.then() resolves to a single promise that resolves
    // once all results have resolved
    return Promise.all(results)
  })
  .catch(function(error) {
    console.log("Error getting documents: ", error);
  });
}
getFacultyFavoritesFirebase()
  .then(results => {
    // use the results array here and check each doc for .exists
  });
If you have multiple items of work to perform at the same time that come from a loop, you can collect all the promises from all the items of work, and wait for them all to finish with Promise.all(). The general form of a possible solution looks like this:
const promises = [] // collect all promises here
items.forEach(item => {
  const promise = item.doWork()
  promises.push(promise)
})
Promise.all(promises).then(results => {
  // continue processing here
  // results[0] is the result of the first promise in the promises array
})
You can adapt this to something that suits your own specific form.
Use for...of with await instead of forEach, inside an async function. Like this (doSomethingAsync is a placeholder for whatever promise-returning work you do per item):
async function processAll(array) {
  for (const item of array) {
    await doSomethingAsync(item); // each iteration waits for the previous one to finish
  }
  console.log("finished");
}
"finished" will be logged only after the loop has finished all of the awaited work.
Well, I know the thread is old, but the problem is still the same. Because I ran into the same issue and none of the answers worked for me, I want to share my solution.
I think it will help someone out there. And maybe it will help me if I run into the same problem again. ;-)
The solution is super easy: Firebase implements map, not directly on the snapshot, but on snapshot.docs.
In combination with Promise.all it works just fine.
const promises = snapshot.docs.map((tenant) => {
  return CheckTenant(tenant.id);
});
Promise.all(promises)
  .then(result => {
    // do something with the result
  })
  .catch(error => {
    // handle an error from any of the CheckTenant calls
  });
Is there a pattern for making a stream iterable using ES6 generators?
See 'MakeStreamIterable' below.
import {createReadStream} from 'fs'
let fileName = 'largeFile.txt'
let readStream = createReadStream(fileName, {
  encoding: 'utf8',
  bufferSize: 1024
})
let myIterableAsyncStream = MakeStreamIterable(readStream)
for (let data of myIterableAsyncStream) {
  let str = data.toString('utf8')
  console.log(str)
}
I'm not interested in co or Bluebird's coroutine, or in blocking with deasync.
The goal is that MakeStreamIterable should be a valid function.
Is there a pattern for making a stream iterable using ES6 generators?
No, this cannot be achieved because generators are synchronous. They must know what they are yielding and when. Iteration of an asynchronous data source can only currently be achieved by using some kind of callback-based implementation. So, there is no way to make MakeStreamIterable 'a valid function' if what you mean by this is 'a valid function whose result can be given to a for-of loop'.
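To make the constraint concrete, here is a small illustration of my own (not from the answer) showing why a generator cannot simply wrap a stream: yield is only valid directly inside the generator's own body, so there is no way to hand a chunk back to the caller from inside the stream's event callbacks.
function* makeStreamGenerator (stream) {
  stream.on('data', function (chunk) {
    // You cannot use `yield chunk` here: this callback is an ordinary function,
    // not the generator, so the chunk can never reach a for-of loop this way.
  })
  // And the generator body itself has nothing to yield synchronously,
  // so iterating it produces no values.
}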
Streams are Asynchronous
A stream represents a potentially infinite amount of data received asynchronously over a potentially infinite amount of time. If we take a look at the definition of an iterator on MDN we can define in more detail what it is about a stream that makes it 'uniterable':
An object is an iterator when it knows how to access items from a collection one at a time, while keeping track of its current position within that sequence. In JavaScript an iterator is an object that provides a next() method which returns the next item in the sequence. This method returns an object with two properties: done and value.
(Emphasis is my own.)
Let's pick out the properties of an iterator from this definition. The object must...
know how to access items from a collection one at a time;
be able to keep track of its current position within the sequence of data;
and provide a method, next, that retrieves an object with a property that holds the next value in the sequence or notifies that iteration is done.
A stream doesn't meet any of the above criteria because...
it is not in control of when it receives data and cannot 'look into the future' to find the next value;
it cannot know when or if it has received all data, only when the stream has closed;
and it does not implement the iterator or iterable protocols, so it does not expose a next method which a for-of loop can use.
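For reference, here is a minimal synchronous iterator that does satisfy these criteria (my own illustration, not part of the answer): it knows its collection up front, keeps track of its position, and exposes next() returning { value, done }.
function makeArrayIterator (items) {
  let index = 0
  return {
    next () {
      return index < items.length
        ? { value: items[index++], done: false }
        : { value: undefined, done: true }
    },
    [Symbol.iterator] () { return this } // lets a for-of loop drive it directly
  }
}
for (let letter of makeArrayIterator(['a', 'b', 'c'])) {
  console.log(letter) // logs a, b, c
}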
______
Faking It(eration)
We can't actually iterate the data received from a stream (definitely not using a for-of), however we can build an interface that pretends to by using Promises (yay!) and abstracting away the stream's event handlers inside a closure.
// MakeStreamIterable.js
export default function MakeStreamIterable (stream) {
  let collection = []
  let index = 0
  let callback
  let promise
  let resolve, reject
  stream
    .on('error', err => reject && reject(err))
    .on('end', () => resolve && resolve(collection))
    .on('data', data => {
      collection.push(data)
      try {
        callback && callback(data, index++)
      } catch (err) {
        stream.destroy() // stop reading if the callback throws
        reject(err)
      }
    })
  function each (cb) {
    if (callback) {
      return promise
    }
    callback = (typeof cb === 'function') ? cb : null
    if (callback && collection.length) {
      // replay everything received before each() was called, forEach-style
      collection.forEach(callback)
      index = collection.length
    }
    return promise
  }
  promise = new Promise((res, rej) => {
    resolve = res
    reject = rej
  })
  promise.each = each
  return promise
}
And we can use it like this:
import MakeStreamIterable from './MakeStreamIterable'
let myIterableAsyncStream = MakeStreamIterable(readStream)
myIterableAsyncStream
  .each((data, i) => {
    let str = data.toString('utf8')
    console.log(i, str)
  })
  .then(() => console.log('completed'))
  .catch(err => console.log(err))
Things to note about this implementation:
It is not necessary to call each immediately on the 'iterable stream'.
When each is called, all values received prior to its call are passed to the callback one-by-one forEach-style. Afterwards all subsequent data are passed immediately to the callback.
The function returns a Promise which resolves the complete collection of data when the stream ends, meaning we actually don't have to call each at all if the method of iteration provided by each isn't satisfactory.
I have fostered the false semantics of calling this an iterator and am therefore an awful human being. Please report me to the relevant authority.
Soon you are going to be able to use async iterators and generators. In Node 9.8 you can use them by running with the --harmony command-line option.
async function* streamAsyncIterator(stream) {
  // Get a lock on the stream
  const reader = stream.getReader();
  try {
    while (true) {
      // Read from the stream
      const {done, value} = await reader.read();
      // Exit if we're done
      if (done) return;
      // Else yield the chunk
      yield value;
    }
  }
  finally {
    reader.releaseLock();
  }
}
async function example() {
  const response = await fetch(url);
  for await (const chunk of streamAsyncIterator(response.body)) {
    // …
  }
}
Thanks to Jake Archibald for the examples above.
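Note that the example above iterates a WHATWG ReadableStream (such as a fetch response body). For the Node.js fs stream in the original question, a sketch like the following works on Node 10 and later, where readable streams implement Symbol.asyncIterator themselves (my addition, not part of the original answer):
import {createReadStream} from 'fs'
async function printFile (fileName) {
  const readStream = createReadStream(fileName, { encoding: 'utf8' })
  // Node readable streams are async iterable, so for await...of consumes them directly.
  for await (const chunk of readStream) {
    console.log(chunk)
  }
  console.log('completed')
}
printFile('largeFile.txt')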
2020 Update:
It looks like streams will be "natively" iterable in the future - just waiting on browsers to implement it:
https://streams.spec.whatwg.org/#rs-asynciterator
https://github.com/whatwg/streams/issues/778
https://bugs.chromium.org/p/chromium/issues/detail?id=929585
https://bugzilla.mozilla.org/show_bug.cgi?id=1525852
https://bugs.webkit.org/show_bug.cgi?id=194379
for await (const chunk of stream) {
  // ...
}