JS, multiple awaits on a single promise - javascript

Will this code send more requests
const pr1 = request1();
const pr2 = request2();
const pr3 = request3();
await Promise.all([pr1, pr2, pr3])
const res1 = await pr1
const res2 = await pr2
const res3 = await pr3
than
const pr1 = request1();
const pr2 = request2();
const pr3 = request3();
const [res1, res2, res3] = await Promise.all([pr1, pr2, pr3])
What is the most efficient way to send multiple async requests using Node.js?

Will this code send more requests?
No.
It sends three requests.
Then it waits for all of them to resolve.
Then it waits for each of them to resolve (which takes no time because they have already resolved), assigning them to variables as it goes.
Waiting for something to resolve does not start it from scratch.
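A quick way to convince yourself of this: a promise remembers its settled value, so awaiting the same promise twice runs the underlying work only once. A minimal sketch (assuming it runs inside an async function):
const pr = new Promise(resolve => {
  console.log("work starts exactly once");
  setTimeout(resolve, 100, "done");
});
const a = await pr; // waits ~100ms
const b = await pr; // resolves immediately with the cached value
console.log(a === b); // true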
What is the most efficient way to send multiple async requests using Node.js?
I would be astonished if there were any practical performance differences between the two versions of the code.
The second version, which uses destructuring, is (subjectively) easier to read.
Write for the people maintaining your code (which will often be you in 6 months' time). Giving them code that is easy to maintain will save you more than chasing micro-optimisations.

If every request is independent of the others, you can use await Promise.all([]) as mentioned above. But if a request needs the result of a previous one, you have to await each request in turn.
const [res1, res2, res3] = await Promise.all([pr1, pr2, pr3])
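And for the dependent case, a minimal sketch of the sequential style (getUser, getOrders and getInvoice are hypothetical request functions):
async function loadDashboard(userId) {
  // each call needs the previous result, so they must run one after another
  const user = await getUser(userId);
  const orders = await getOrders(user.accountId);
  const invoice = await getInvoice(orders[0].id);
  return { user, orders, invoice };
}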

With Promises:
var promise1 = Promise.resolve(3);
var promise2 = 42;
var promise3 = new Promise(function(resolve, reject) {
  setTimeout(resolve, 100, 'foo');
});
Promise.all([promise1, promise2, promise3]).then(function(values) {
  console.log(values);
});
With async/await:
let [foo, bar] = await Promise.all([getFoo(), getBar()]);

const pr1 = request1();
const pr2 = request2();
const pr3 = request3();
const [res1, res2, res3] = await Promise.all([pr1, pr2, pr3])
This is the best approach: it triggers the 3 requests in parallel instead of in a sequential flow.
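One caveat worth keeping in mind: Promise.all rejects as soon as any of its input promises rejects, so if one failed request should not abort the whole batch, wrap the call in try/catch (or use Promise.allSettled). A minimal sketch, inside an async function:
try {
  const [res1, res2, res3] = await Promise.all([pr1, pr2, pr3]);
} catch (err) {
  // fires on the first rejection among pr1, pr2, pr3
  console.error("one of the requests failed:", err);
}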

Related

Promise.all is not resolving if second promise is added to promise array

Given that promise1 and promise2 are correctly resolved with the below code:
export const getExistingPayments = async (payments: Payment[]) => {
const promise1 = await getPayment(payments[0].paymentId)
const promise2 = await getPayment(payments[1].paymentId)
const results = [promise1, promise2]
return results
}
Can anybody help explain why the below code just hangs? The promises are never resolved or rejected:
export const getExistingPayments = async (payments: Payment[]) => {
const promise1 = getPayment(payments[0].paymentId)
const promise2 = getPayment(payments[1].paymentId)
const results = await Promise.all([promise1, promise2])
return results
}
It might also be worth mentioning that when only one promise is passed to Promise.all, the promise resolves as expected; the below code also works fine:
export const getExistingPayments = async (payments: Payment[]) => {
const promise1 = getPayment(payments[0].paymentId)
const results = await Promise.all([promise1])
return results
}
This largely depends on what happens inside getPayment. In general, your second example should work, but there are cases where it does not: in particular, if the two getPayment calls, due to some internal operation, wait for each other and reach a deadlock.
Based on the information you have given, we cannot tell you exactly what happens in your code, but whatever solution you apply should ensure that the getPayment calls can never deadlock when they are executed concurrently.
Try something like this, instead of assigning the results of the functions to variables first:
const [result1, result2] = await Promise.all([
getPayment(payments[0].paymentId),
getPayment(payments[1].paymentId)
])
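For illustration only, here is a contrived sketch of how two calls can deadlock by each waiting on something the other is supposed to provide (the names are hypothetical; this is not the real getPayment):
let releaseA, releaseB;
const lockA = new Promise(resolve => { releaseA = resolve; });
const lockB = new Promise(resolve => { releaseB = resolve; });

async function getPaymentX() {
  await lockB; // waits for getPaymentY to call releaseB
  releaseA();
  return "x";
}

async function getPaymentY() {
  await lockA; // waits for getPaymentX to call releaseA
  releaseB();
  return "y";
}

// Promise.all([getPaymentX(), getPaymentY()]) never settles:
// each function is stuck awaiting a release the other never reaches.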

Is this type of promise nesting good practice?

I just wanted to know if it is considered good practice to nest promises like in this example, or are there better alternatives?
getDatabaseModel.searchId(idNoms).then(function([noms, elements]) {
  getDatabaseModel.getLocalisations().then(function(localisations) {
    getDatabaseModel.getStates().then(function(states) {
      //Some code
    })
  })
})
Your promises are independent, so you should use Promise.all() to run them in parallel for the best performance.
The Promise.all() method takes an iterable of promises as an input, and returns a single Promise that resolves to an array of the results of the input promises.
var searchById = getDatabaseModel.searchId(idNoms);
var getLocalisations = getDatabaseModel.getLocalisations();
var getStates = getDatabaseModel.getStates();
var result = Promise.all([searchById, getLocalisations, getStates]);
result.then((values) => {
  console.log(values);
});
For example, let's say each promise takes 1s. Run sequentially, that would be 3s in total, but with Promise.all it actually takes just 1s in total.
var tick = Date.now();
const log = (v) => console.log(`${v} \n Elapsed: ${Date.now() - tick}`);
log("Starting... ");
var fetchData = (name, ms) => new Promise(resolve => setTimeout(() => resolve(name), ms));
var result = Promise.all([
  fetchData("searchById", 1000),
  fetchData("getLocalisations", 1000),
  fetchData("getStates", 1000)
]);
result.then((values) => {
  log("Complete...");
  console.log(values);
});
Besides, if you're drawn to async/await because it's more elegant, more concise, and reads like synchronous code, remember that the await keyword makes the code wait until the async request has finished before executing the next thing. Since those promises are independent, Promise.all is the better choice in your case.
var tick = Date.now();
const log = (v) => console.log(`${v} \n Elapsed: ${Date.now() - tick}`);
var fetchData = (name, ms) => new Promise(resolve => setTimeout(() => resolve(name), ms));
Run();
async function Run() {
  log("Starting... ");
  var a = await fetchData("searchById", 1000);
  var b = await fetchData("getLocalisations", 1000);
  var c = await fetchData("getStates", 1000);
  log("Complete...");
  console.log([a, b, c]);
}
Promises were made to avoid callback hell, but they are not that good at it either. People like promises until they discover async/await. The exact same code can be rewritten with async/await as:
async function getModel(idNoms) {
  const [noms, elements] = await getDatabaseModel.searchId(idNoms);
  const localisations = await getDatabaseModel.getLocalisations();
  const state = await getDatabaseModel.getStates();
  // do something using localisations & state, it'll work
}
getModel(idNoms);
IMO it's a little hard to read and understand. Compare with this:
getDatabaseModel.searchId(idNoms)
.then(([noms, elements]) => getDatabaseModel.getLocalisations())
.then(localization => getDatabaseModel.getStates());
As @deceze pointed out, there are two things to note:
These functions are called serially
They don't seem to depend on each other, as the noms, elements and localization are not used at all.
With Promise.all you can mix and match however you want:
// Call `searchId` and `getState` at the same time
// Call `getLocalisations` after `searchId` is done
// wait for all to finish
Promise.all([
  getDatabaseModel.searchId(idNoms).then(([noms, elements]) => getDatabaseModel.getLocalisations()),
  getDatabaseModel.getStates()
]).then(([result1, result2]) => console.log('done'));
// Call all 3 at the same time
// wait for all to finish
Promise.all([
  getDatabaseModel.searchId(idNoms),
  getDatabaseModel.getLocalisations(),
  getDatabaseModel.getStates(),
]).then(([result1, result2, result3]) => console.log('done'));

Async not awaiting function before running

I'm trying to parse a specification website from saved HTML on my computer. I can post the file upon request.
I'm burnt out trying to figure out why it won't run synchronously. The comments should log the CCCC's first, then BBBB's, then finally one AAAA.
The code I'm running will not wait at the first hurdle (it prints AAAA... first). Am I using request-promise incorrectly? What is going on?
Is this due to the .each() method of cheerio (I'm assuming it's synchronous)?
const rp = require('request-promise');
const fs = require('fs');
const cheerio = require('cheerio');
async function parseAutodeskSpec(contentsHtmlFile) {
  const topics = [];
  const contentsPage = cheerio.load(fs.readFileSync(contentsHtmlFile).toString());
  const contentsSelector = '.content_htmlbody table td div div#divtreed0e338374 nobr .toc_entry a.treeitem';
  contentsPage(contentsSelector).each(async (idx, topicsAnchor) => {
    const topicsHtml = await rp(topicsAnchor.attribs['href']);
    console.log("topicsHtml.length: ", topicsHtml.length);
  });
  console.log("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
  return topics;
}
Try it this way:
let hrefs = contentsPage(contentsSelector).map((idx, topicsAnchor) => {
  return topicsAnchor.attribs['href'];
}).get();

let topicsHtml;
for (const href of hrefs) {
  topicsHtml = await rp(href);
  console.log("topicsHtml.length: ", topicsHtml.length);
}
Now the await is outside of map or each, which don't quite work the way you think with async callbacks.
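To see why await inside map does not help: an async callback returns a promise immediately, so map just hands back an array of pending promises without waiting for any of them. A minimal sketch (run inside an async function):
const results = [1, 2, 3].map(async n => {
  await new Promise(resolve => setTimeout(resolve, 100));
  return n * 2;
});
console.log(results); // three pending Promise objects: map did not wait
console.log(await Promise.all(results)); // [2, 4, 6]: this is what waits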
As @lumio stated in his comment, I also think that this is because the each function is synchronous.
You should rather use the map method, and use Promise.all() on the result to wait for all the requests:
const obj = contentsPage(contentsSelector).map(async (idx, topicsAnchor) => {
  const topicsHtml = await rp(topicsAnchor.attribs['href']);
  console.log("topicsHtml.length: ", topicsHtml.length);
  const topicsFromPage = await parseAutodeskTopics(topicsHtml);
  console.log("topicsFromPage.length: ", topicsFromPage.length);
  // push into the surrounding function's topics array;
  // topics.concat alone would discard its result
  topics.push(...topicsFromPage);
});
const filtered = Object.keys(obj).filter(key => !isNaN(key)).map(key => obj[key]);
await Promise.all(filtered);
console.log("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
Based on the other answers here I came to a rather elegant conclusion. Note the avoidance of async/await in the .map() callback, as cheerio's callbacks (and, from what I've learned about async/await, callbacks in general) seem not to honour the synchronous nature of await well:
async function parseAutodeskSpec(contentsHtmlFile) {
  const contentsPage = cheerio.load(fs.readFileSync(contentsHtmlFile).toString());
  const contentsSelector = '.content_htmlbody table td div div#divtreed0e338374 nobr .toc_entry a.treeitem';
  const contentsReqs = contentsPage(contentsSelector)
    .map((idx, elem) => rp(contentsPage(elem).attr('href')))
    .toArray();
  // note the parentheses: .map must be called on the resolved array,
  // not on the promise returned by Promise.all
  const topicsReqs = (await Promise.all(contentsReqs))
    .map(req => parseAutodeskTopics(req));
  return await Promise.all(topicsReqs);
}

Crawling multiple URLs in a loop using Puppeteer

I have an array of URLs to scrape data from:
urls = ['url','url','url'...]
This is what I'm doing:
urls.map(async (url) => {
  await page.goto(url);
  await page.waitForNavigation({ waitUntil: 'networkidle' });
});
This seems to not wait for page load and visits all the URLs quite rapidly (I even tried using page.waitFor).
I wanted to know if I am doing something fundamentally wrong, or whether this type of functionality is not advised/supported.
map, forEach, reduce, etc., do not wait for the asynchronous operation within them before proceeding to the next element of the iterator they are iterating over.
There are multiple ways of going through each item of an iterator serially while performing an asynchronous operation, but the easiest in this case, I think, is to simply use a normal for loop, which does wait for the operation to finish.
const urls = [...]
for (let i = 0; i < urls.length; i++) {
  const url = urls[i];
  await page.goto(`${url}`);
  await page.waitForNavigation({ waitUntil: 'networkidle2' });
}
This would visit one URL after another, as you are expecting. If you are curious about iterating serially using await/async, you can have a peek at this answer: https://stackoverflow.com/a/24586168/791691
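A for...of loop is an equivalent, arguably cleaner way to write the same serial visit:
for (const url of urls) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  // scrape the page here before moving on to the next URL
}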
The accepted answer shows how to serially visit each page one at a time. However, you may want to visit multiple pages simultaneously when the task is embarrassingly parallel, that is, scraping a particular page isn't dependent on data extracted from other pages.
A tool that can help achieve this is Promise.allSettled, which lets us fire off a bunch of promises at once, determine which were successful, and harvest the results.
For a basic example, let's say we want to scrape usernames for Stack Overflow users given a series of ids.
Serial code:
const puppeteer = require("puppeteer"); // ^19.6.3
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const baseURL = "https://stackoverflow.com/users";
  const startId = 6243352;
  const qty = 5;
  const usernames = [];
  for (let i = startId; i < startId + qty; i++) {
    await page.goto(`${baseURL}/${i}`, {
      waitUntil: "domcontentloaded"
    });
    const sel = ".flex--item.mb12.fs-headline2.lh-xs";
    const el = await page.waitForSelector(sel);
    usernames.push(await el.evaluate(el => el.textContent.trim()));
  }
  console.log(usernames);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Parallel code:
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const baseURL = "https://stackoverflow.com/users";
  const startId = 6243352;
  const qty = 5;
  const usernames = (await Promise.allSettled(
    [...Array(qty)].map(async (_, i) => {
      const page = await browser.newPage();
      await page.goto(`${baseURL}/${i + startId}`, {
        waitUntil: "domcontentloaded"
      });
      const sel = ".flex--item.mb12.fs-headline2.lh-xs";
      const el = await page.waitForSelector(sel);
      const text = await el.evaluate(el => el.textContent.trim());
      await page.close();
      return text;
    })))
    .filter(e => e.status === "fulfilled")
    .map(e => e.value);
  console.log(usernames);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Remember that this is a technique, not a silver bullet that guarantees a speed increase on every workload. It will take some experimentation to find the optimal balance between the cost of creating more pages and the parallelization of network requests for a particular task and system.
The example here is contrived since it's not interacting with the page dynamically, so there's not as much room for gain as in a typical Puppeteer use case that involves network requests and blocking waits per page.
Of course, beware of rate limiting and any other restrictions imposed by sites (running the code above may anger Stack Overflow's rate limiter).
For tasks where creating a page per task is prohibitively expensive or you'd like to set a cap on parallel request dispatches, consider using a task queue or combining serial and parallel code shown above to send requests in chunks. This answer shows a generic pattern for this agnostic of Puppeteer.
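A minimal sketch of the chunking idea (the batch size of 3 is an arbitrary choice): run each slice of URLs in parallel, and the slices themselves serially.
const chunkSize = 3; // tune for your system and the target site
for (let i = 0; i < urls.length; i += chunkSize) {
  const batch = urls.slice(i, i + chunkSize);
  const settled = await Promise.allSettled(
    batch.map(async url => {
      const page = await browser.newPage();
      try {
        await page.goto(url, { waitUntil: "domcontentloaded" });
        return await page.title(); // stand-in for the real scraping work
      } finally {
        await page.close();
      }
    })
  );
  console.log(settled.map(s => s.status));
}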
These patterns can be extended to handle the case when certain pages depend on data from other pages, forming a dependency graph.
See also Using async/await with a forEach loop which explains why the original attempt in this thread using map fails to wait for each promise.
If you find that you are waiting on your promise indefinitely, the proposed solution is to use the following:
const urls = [...]
for (let i = 0; i < urls.length; i++) {
  const url = urls[i];
  const promise = page.waitForNavigation({ waitUntil: 'networkidle' });
  await page.goto(`${url}`);
  await promise;
}
As referenced in this GitHub issue.
The best way I found to achieve this:
const puppeteer = require('puppeteer');
(async () => {
  const urls = ['https://www.google.com/', 'https://www.google.com/'];
  for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto(`${url}`, { waitUntil: 'networkidle2' });
    await browser.close();
  }
})();
Something no one else mentions: if you are fetching multiple pages using the same page object, it is crucial that you set its navigation timeout to 0. Otherwise, once it has fetched the default 30 seconds' worth of pages, it will time out.
const browser = await puppeteer.launch();
const page = await browser.newPage();
page.setDefaultNavigationTimeout(0);
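Note that 0 disables the navigation timeout entirely. If you would rather keep a safety net, Puppeteer also accepts a timeout option per navigation, so you can raise the limit for a single goto instead (60 seconds here is an arbitrary choice):
await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });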

Best ES6 way to get name-based results with Promise.all

By default, the Promise.all([]) function returns an array indexed by number that contains the results of each promise.
var promises = [];
promises.push(myFuncAsync1()); //resolves to 1
promises.push(myFuncAsync2()); //resolves to 2
Promise.all(promises).then((results) => {
  //results = [1, 2]
});
What is the best vanilla way to return a named index of results with Promise.all()?
I tried with a Map, but it returns results in an array this way:
[key1, value1, key2, value2]
UPDATE:
My question seems unclear, so here is why I don't like order-based indexing:
it's hard to maintain: if you add a promise to your code, you may have to rewrite the whole results function because the indexes may have changed.
it's awful to read: results[42] (this can be fixed with jib's answer below)
it's not really usable in a dynamic context:
var promises = [];
if (...)
  promises.push(...);
else {
  [...].forEach(... => {
    if (...)
      promises.push(...);
    else
      [...].forEach(... => {
        promises.push(...);
      });
  });
}
Promise.all(promises).then((resultsArr) => {
  /* Here I am basically stuck without clear named results
     that don't rely on the promises' ordering in the array */
});
ES6 supports destructuring, so if you just want to name the results you can write:
var myFuncAsync1 = () => Promise.resolve(1);
var myFuncAsync2 = () => Promise.resolve(2);
Promise.all([myFuncAsync1(), myFuncAsync2()])
  .then(([result1, result2]) => console.log(result1 + " and " + result2)) //1 and 2
  .catch(e => console.error(e));
Works in Firefox and Chrome now.
Is this the kind of thing?
var promises = [];
promises.push(myFuncAsync1().then(r => ({name: "func1", result: r})));
promises.push(myFuncAsync2().then(r => ({name: "func2", result: r})));
Promise.all(promises).then(results => {
  var lookup = results.reduce((prev, curr) => {
    prev[curr.name] = curr.result;
    return prev;
  }, {});
  var firstResult = lookup["func1"];
  var secondResult = lookup["func2"];
});
If you don't want to modify the format of result objects, here is a helper function that allows assigning a name to each entry to access it later.
const allNamed = (nameToPromise) => {
  const entries = Object.entries(nameToPromise);
  return Promise.all(entries.map(e => e[1]))
    .then(results => {
      const nameToResult = {};
      for (let i = 0; i < results.length; ++i) {
        const name = entries[i][0];
        nameToResult[name] = results[i];
      }
      return nameToResult;
    });
};
Usage:
var lookup = await allNamed({
  rootStatus: fetch('https://stackoverflow.com/').then(rs => rs.status),
  badRouteStatus: fetch('https://stackoverflow.com/badRoute').then(rs => rs.status),
});
var firstResult = lookup.rootStatus; // = 200
var secondResult = lookup.badRouteStatus; // = 404
If you are using TypeScript, you can even specify the relationship between input keys and results using the keyof construct:
type ThenArg<T> = T extends PromiseLike<infer U> ? U : T;
export const allNamed = <
  T extends Record<string, Promise<any>>,
  TResolved extends {[P in keyof T]: ThenArg<T[P]>}
>(nameToPromise: T): Promise<TResolved> => {
  const entries = Object.entries(nameToPromise);
  return Promise.all(entries.map(e => e[1]))
    .then(results => {
      const nameToResult: TResolved = <any>{};
      for (let i = 0; i < results.length; ++i) {
        const name: keyof T = entries[i][0];
        nameToResult[name] = results[i];
      }
      return nameToResult;
    });
};
A great solution for this is to use async/await. Not exactly ES6 like you asked, but ES8! Since Babel supports it fully, here we go:
You can avoid relying on the array index by using async/await as follows.
An async function lets you effectively pause execution inside it by placing the await keyword before a promise. When an async function encounters await on a promise that hasn't yet resolved, the function immediately returns a pending promise; that returned promise resolves as soon as the function actually finishes later on. The function only resumes once the awaited promise resolves, at which point the whole await expression evaluates to that promise's resolution value, so you can assign it to a variable. This effectively lets you pause your code without blocking the thread. It's a great way to handle asynchronous work in JavaScript in general, because it makes your code read chronologically and therefore easier to reason about:
async function resolvePromiseObject(promiseObject) {
  await Promise.all(Object.values(promiseObject));
  const ret = {};
  for (const [key, value] of Object.entries(promiseObject)) {
    // these all resolve instantly due to the previous await
    ret[key] = await value;
  }
  return ret;
}
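Usage would look something like this (fetchUser and fetchPosts are hypothetical stand-ins for your own async calls):
const { user, posts } = await resolvePromiseObject({
  user: fetchUser(42),
  posts: fetchPosts(42),
});
console.log(user, posts);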
As with anything beyond ES5: please make sure that Babel is configured correctly so that users on older browsers can run your code without issue. You can make async/await work flawlessly even on IE11, as long as your Babel configuration is right.
In regards to @kragovip's answer, the reason you want to avoid that approach is shown here:
https://medium.com/front-end-weekly/async-await-is-not-about-making-asynchronous-code-synchronous-ba5937a0c11e
"...it’s really easy to get used to await all of your network and I/O calls.
However, you should be careful when using it multiple times in a row as the await keyword stops execution of all the code after it. (Exactly as it would be in synchronous code)"
Bad example (don't follow):
async function processData() {
const data1 = await downloadFromService1();
const data2 = await downloadFromService2();
const data3 = await downloadFromService3();
...
}
"There is also absolutely no need to wait for the completion of first request as none of other requests depend on its result.
We would like to have requests sent in parallel and wait for all of them to finish simultaneously. This is where the power of asynchronous event-driven programming lies.
To fix this we can use Promise.all() method. We save Promises from async function calls to variables, combine them to an array and await them all at once."
Instead:
async function processData() {
  const promise1 = downloadFromService1();
  const promise2 = downloadFromService2();
  const promise3 = downloadFromService3();
  const allResults = await Promise.all([promise1, promise2, promise3]);
}
