Node.js - Awaiting all the promises fired inside a loop - javascript

I'm dealing with a loop in Node.js that performs two tasks on every iteration. To simplify, the code can be summarized as:
Extract products metadata from a web page (blocking task).
Save all the products metadata to a database (asynchronous task).
The save operation (2) will perform about 800 operations in a database, and it doesn't need to block the main thread (I can keep extracting products metadata from the web pages).
That being said, waiting for the products to be saved makes no sense. But if I fire the promises without awaiting them, the Node.js process exits after the last iteration of the loop and the pending operations never finish.
What is the best approach to solve this? Is it possible without keeping a counter of finished promises or using emitters? Thanks.
for (let shop of shops) {
  // 1
  const products = await extractProductsMetadata(shop);
  // 2
  await saveProductsMetadata(products);
}

Collect the promises in an array, then use Promise.all on it:
const storePromises = [];
for (let shop of shops) {
  const products = await extractProductsMetadata(shop); //(1)
  storePromises.push(saveProductsMetadata(products)); //(2)
}
await Promise.all(storePromises);
// ... all done (3)
With this, the extractions (1) run one after another, the saves (2) run in parallel, and (3) runs after all of them have finished.
For sure you can also run (1) and (2) in parallel:
await Promise.all(shops.map(async shop => {
  const products = await extractProductsMetadata(shop); //(1)
  await saveProductsMetadata(products); //(2)
}));
And if an error occurred in one of the promises, you can handle it with a try / catch block, to make sure the other shops won't be affected:
await Promise.all(shops.map(async shop => {
  try {
    const products = await extractProductsMetadata(shop); //(1)
    await saveProductsMetadata(products);
  } catch(error) {
    // handle it here
  }
}));
How to signal Node to finish the process?
You could manually call process.exit(0);, but that hides the real problem: Node.js exits automatically once there are no open handles (connections, servers, timers) left. That means you should close all database connections / servers / etc. after the code above is done.
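For illustration, a minimal sketch of that shutdown step, assuming a hypothetical db client whose end() method closes the connection (the actual close call depends on your driver, e.g. pool.end() in node-postgres):

async function run() {
  const storePromises = [];
  for (let shop of shops) {
    const products = await extractProductsMetadata(shop);
    storePromises.push(saveProductsMetadata(products));
  }
  await Promise.all(storePromises);
  // Release the last open handle so the process can exit on its own.
  await db.end(); // hypothetical close method
}

run().catch(console.error);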

We create packs of data to process. When we process a pack, we do all the gets sequentially and all the saves asynchronously.
I have not handled the failure part; I'll let you add that. An appropriate try/catch or function encapsulation will do it.
/**
 * Call the given functions that return promises, one after another
 * options = context/args
 */
function promiseQueue(promisesFuncs, options = {}, _i = 0, _ret = []) {
  return new Promise((resolve, reject) => {
    if (_i >= promisesFuncs.length) {
      return resolve(_ret);
    }
    // Call one
    (promisesFuncs[_i]).apply(options.context || this, options.args || [])
      .then(ret => promiseQueue(promisesFuncs, options, _i + 1, [
        ..._ret,
        ret,
      ]))
      .then(resolve)
      .catch(reject);
  });
}
async function executePromisesAsPacks(arr, packSize, _i = 0) {
  const toExecute = arr.slice(_i * packSize, (_i + 1) * packSize);
  // Leave if we have executed all packs
  if (toExecute.length === 0) return true;
  // First we get all the data sequentially
  const products = await promiseQueue(toExecute.map(x => () => extractProductsMetadata(x)));
  // Then save the products asynchronously
  // We do not put await here so it's truly asynchronous
  Promise.all(toExecute.map((x, xi) => saveProductsMetadata(products[xi])));
  // Call next
  return executePromisesAsPacks(arr, packSize, _i + 1);
}
// Make packs of data to process (we extract sequentially and save asynchronously)
// Made to handle huge datasets
await executePromisesAsPacks(shops, 50);

Related

Best way to use async/promise/callbacks in a for loop [duplicate]

This question already has answers here:
How do I convert an existing callback API to promises?
(24 answers)
Closed 2 years ago.
I'm building a trading bot that needs to get stock names from separate files. But even though I have used async functions and await in my code, it doesn't work.
My index file's init method:
const init = async () => {
  const symbols = await getDownTrendingStock();
  console.log("All stocks that are down: " + symbols);
  const doOrder = async () => {
    //do stuff
  }
  doOrder();
}
My getDownTrendingStock file:
const downStocks = []

function getDownTrendingStock () {
  for (i = 0; i < data.USDTPairs.length; i++) {
    const USDTPair = data.USDTPairs[i] + "USDT";
    binance.prevDay(USDTPair, (error, prevDay, symbol) => {
      if (prevDay.priceChangePercent < -2) {
        downStocks.push(symbol)
      }
    });
  }
  return downStocks;
}
I have also tried to use async in the for loop, because the getDownTrendingStock function returns an empty array before the loop is finished. I didn't find the right way to do it because I was confused by all the async, promise, and callback stuff. What is the right approach in this situation?
Output:
All stocks that are down:
Wanted output:
All stocks that are down: [BTCUSDT, TRXUSDT, ATOMUSDT...]
I think the main issue in the code you posted is that you are mixing callbacks and promises.
This is what's happening in your getDownTrendingStock function:
You start iterating over the data.USDTPairs array, picking the first element
You call binance.prevDay. This does nothing yet, because it's an asynchronous function that takes a bit of time. Notably, no data is added to downStocks yet.
You continue doing 1-2; still, no data is added.
You return downStocks, which is still empty.
Your function is complete, you print the empty array
Now, at some point, the nodejs event loop continues and starts working on those asynchronous tasks you created earlier by calling binance.prevDay. Internally, it probably calls an API, which takes time; once that call is completed, it calls the function you provided, which pushes data to the downStocks array.
In summary, you didn't wait for the async code to complete. You can achieve this in multiple ways.
One is to wrap this in a promise and then await that promise:
const result= await new Promise((resolve, reject) => {
binance.prevDay(USDTPair, (error, prevDay, symbol) => {
if (error) {
reject(error);
} else {
resolve({prevDay, symbol});
}
});
});
if(result.prevDay.priceChangePercent < -2){
downStocks.push(result.symbol)
}
Note that you can probably also use promisify for this. Also, this means that you will wait for one request to finish before starting the next, which may slow down your code considerably, depending on how many calls you need; you may want to look into Promise.all as well.
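For illustration, a hedged sketch of the Promise.all variant, firing all the binance.prevDay calls at once (same callback signature as in the question, each call wrapped in a promise):

async function getDownTrendingStock() {
  const results = await Promise.all(
    data.USDTPairs.map(pair => new Promise((resolve, reject) => {
      binance.prevDay(pair + "USDT", (error, prevDay, symbol) => {
        if (error) return reject(error);
        resolve({ prevDay, symbol });
      });
    }))
  );
  // Keep only the symbols that dropped more than 2%.
  return results
    .filter(r => r.prevDay.priceChangePercent < -2)
    .map(r => r.symbol);
}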
Generally speaking, I use two techniques:
const asyncFunc = () => {smthAsync};
const arrayToProcess = [];

// 1
const result = await arrayToProcess.reduce(
  (acc, value) => acc.then(() => asyncFunc(value)),
  Promise.resolve(someInitialValue)
);

// 2
// here there will be eslint errors
for (let i = 0; i < arrayToProcess.length; i += 1) {
  const processResult = await asyncFunc(arrayToProcess[i]);
  // do with processResult what you want
}
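A minimal runnable sketch of the reduce-based chain from technique 1, with hypothetical delay/processItem helpers standing in for the real async work:

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
const processItem = async n => { await delay(100); console.log(`processed ${n}`); return n; };

// Each call is chained onto the previous promise, so the items run strictly in order.
[1, 2, 3].reduce(
  (acc, value) => acc.then(() => processItem(value)),
  Promise.resolve()
).then(() => console.log("all done"));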

How to wait for this function to finish?

I need to wait for a mapping function to finish before I send the data to the console. I know it has something to do with Promises. I've been trying for hours, and I couldn't get it to work even after reading so much about promises and async functions...
async function inactiveMemberWarner() {
  var msg = "```javascript\nI have sent warnings to members that have been inactive for 2 weeks.\n\n"
  var inactiveMembers = '';
  var count = 0;
  var guildMembers = client.guilds.find(g => g.name === mainGuild).members;
  const keyPromises = await guildMembers.map(async (member) => {
    if (isMod(member)) {
      connection.query(`SELECT * from users WHERE userID='${member.id}'`, (err, data) => {
        if (data[0]) {
          if (!data[0].warnedForInactivity && moment().isSameOrAfter(moment(data[0].lastMSGDate).add('2', 'week'))) {
            count++;
            var updateWarning = {warnedForInactivity: 1}
            connection.query(`UPDATE users SET ? WHERE userID='${data[0].userID}'`, updateWarning);
            member.send(`**[*]** WARNING: You've been inactive on \`\`${mainGuild}\`\` for 2 weeks. Members that have been inactive for at least a month will be kicked.`);
            inactiveMembers += `${count}. ${member.user.tag}\n`;
            return inactiveMembers;
          }
        }
      });
    }
  });
  await Promise.all(keyPromises).then(inactiveMembersData => console.log(inactiveMembers)); // RETURNS AN EMPTY STRING
  setTimeout(() => console.log(inactiveMembers), 5000); // RETURNS THE INACTIVE MEMBERS AFTER WAITING FOR 5 SECONDS (PRIMITIVE WAY)
}
inactiveMemberWarner();
Thank you in advance!
You're close, but not quite there.
First, some notes:
await can be used on any value, but it is entirely pointless to use it on anything that isn't a Promise. Your guildMembers.map(...); returns an array, not a Promise.
Mixing await and .then(...) works, but is kinda messy. You're already using await - why bother dealing with callbacks?
Using guildMembers.map(async ...) like this will ensure that all the requests are fired more or less instantaneously, and they could finish in any order. This is fine, but it is kind of a race condition and results in a more or less random order of results.
This is not a good approach even just conceptually! Any time you ever have to loop queries, try and investigate ways to do it in only one query. SQL is quite powerful.
The reason your current code doesn't work is because your connection.query function escapes the async control flow. What I mean by this is that the whole point of using async/await and Promises is basically to keep track of the callbacks locally, and to make use of promise chaining to dynamically add callbacks. If you call an async function which returns a Promise, you can now carry that Promise anywhere else in your code and attach a success handler to it dynamically: either with .then() or with the sugar await.
But the connection.query function doesn't return a Promise, it just has you pass another naked callback - this one is not being tracked by a Promise! The Promise doesn't have a reference to that callback, it can't know when that callback is getting called, and thus your async/await control flow is escaped and your promises resolve long before the queries have run.
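As an aside, a minimal sketch of wrapping such a callback API once with util.promisify, assuming a mysql-style connection.query(sql, callback) whose callback is (err, results) (promisify resolves with the first result argument only):

const util = require('util');

// One promisified wrapper, reusable everywhere instead of ad-hoc new Promise(...) blocks.
const query = util.promisify(connection.query).bind(connection);

// Usage inside an async function:
// const data = await query(`SELECT * from users WHERE userID='${member.id}'`);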
You can resolve this by making a new Promise in the async function:
async function inactiveMemberWarner() {
  var msg = "```javascript\nI have sent warnings to members that have been inactive for 2 weeks.\n\n"
  var inactiveMembers = '';
  var count = 0;
  var guildMembers = client.guilds.find(g => g.name === mainGuild).members;
  const keyPromises = guildMembers.map(async (member) => {
    if (isMod(member)) {
      return new Promise((resolve, reject) => {
        connection.query(`SELECT * from users WHERE userID='${member.id}'`, (err, data) => {
          if (err) return reject(err); // make errors bubble up so they can be handled
          if (data[0] && !data[0].warnedForInactivity && moment().isSameOrAfter(moment(data[0].lastMSGDate).add('2', 'week'))) {
            count++;
            var updateWarning = {warnedForInactivity: 1}
            connection.query(`UPDATE users SET ? WHERE userID='${data[0].userID}'`, updateWarning);
            member.send(`**[*]** WARNING: You've been inactive on \`\`${mainGuild}\`\` for 2 weeks. Members that have been inactive for at least a month will be kicked.`);
            resolve(`${count}. ${member.user.tag}\n`);
          } else resolve(""); // make sure to always resolve or the promise may hang
        });
      });
    }
  });
  let inactiveMembersData = await Promise.all(keyPromises); // Returns an array of inactive member snippets.
  inactiveMembers = inactiveMembersData.join(""); // join the array of snippets into one string
}
inactiveMemberWarner();
inactiveMemberWarner();
This will work, but there is a much much much better way. SQL supports the IN operator, which allows you to have conditions like WHERE userID IN (list_of_ids). In other words, you can do this in one query. You can even specify more conditions, such as warnedForInactivity = 0 and lastMSGDate BETWEEN (NOW() - INTERVAL 14 DAY) AND NOW(). This way you can offload all of your current processing logic onto the SQL server - something that you should try to do virtually every single time you can. It would simplify this code a lot too. I won't go any further as it's out of scope for this question but feel free to ask another if you can't figure it out.
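For illustration, a hedged sketch of that single-query idea, assuming a mysql-style driver with ? placeholders, a users table with the columns used above (userID, warnedForInactivity, lastMSGDate), and that guildMembers supports filter/map as in the snippets above:

// Build the list of moderator IDs once, then let SQL do the filtering.
const modIds = guildMembers.filter(isMod).map(m => m.id);
const placeholders = modIds.map(() => '?').join(',');

// One SELECT instead of one query per member.
connection.query(
  `SELECT userID FROM users
   WHERE userID IN (${placeholders})
     AND warnedForInactivity = 0
     AND lastMSGDate <= NOW() - INTERVAL 14 DAY`,
  modIds,
  (err, rows) => {
    // rows now contains exactly the members that need a warning
  }
);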
I can't test this, but this is what normally works for me when wanting to wait on something:
async function inactiveMemberWarner() {
  new Promise(async function (cb, rj) {
    var msg = "```javascript\nI have sent warnings to members that have been inactive for 2 weeks.\n\n"
    var inactiveMembers = '';
    var count = 0;
    var guildMembers = client.guilds.find(g => g.name === mainGuild).members;
    const keyPromises = await guildMembers.map(async (member) => {
      if (isMod(member)) {
        connection.query(`SELECT * from users WHERE userID='${member.id}'`, (err, data) => {
          if (data[0]) {
            if (!data[0].warnedForInactivity && moment().isSameOrAfter(moment(data[0].lastMSGDate).add('2', 'week'))) {
              count++;
              var updateWarning = {warnedForInactivity: 1}
              connection.query(`UPDATE users SET ? WHERE userID='${data[0].userID}'`, updateWarning);
              member.send(`**[*]** WARNING: You've been inactive on \`\`${mainGuild}\`\` for 2 weeks. Members that have been inactive for at least a month will be kicked.`);
              inactiveMembers += `${count}. ${member.user.tag}\n`;
              cb(inactiveMembers);
            }
          }
        });
      }
    });
    cb('No Members');
  }).then(inactiveMembersData => console.log(inactiveMembersData)); // SHOULD RETURN THE INACTIVE MEMBERS
}
inactiveMemberWarner();

node async/await not working for me (when using Postgres / Node - working with DB updates before going to next call) [duplicate]

This question already has answers here:
Using async/await with a forEach loop
(33 answers)
Closed 3 years ago.
await is not blocking as expected when a block of code updates the db (using postgres / node).
https://node-postgres.com
I have a list of async function calls, each call updates a database, and each subsequent call works on data updated by the previous call.
There are about eight calls in a row, and each call must update the complete set of data it is working with, 100% to completion, before going to the next.
I tried to make everything not async, but it appears I am forced to make everything async/await because of the library I am using (postgres / node).
Each function call must complete 100% before going on to the next function call, because the next step does a select on rows where a field is not null (where the previous step fills in a value).
I have an await in front of each call that does something (see code below):
loads the db from a csv,
next step selects all rows just inserted, calls an API and updates the database,
and so on,
but at one point, when the next function executes, NONE of the rows have been updated (as I trace through and verify, a SQL statement returns nothing back);
the code seems to pass right through to the second function call, not blocking and not honoring the await, and completes its code block.
If I comment out some of the latter rows (dependent on the previous), and let the program run to completion, the database gets updated.
There is nothing functionally wrong with the code, everything works, just not from beginning to completion.
After running two function calls at the beginning, letting that run, I can then comment out those rows, uncomment the later rows in the flow, and run again, and everything works as expected, but I cannot run to completion with both uncommented.
What can I do to make sure each function call completes 100%, has all updates completed in the database, before going to the next step?
async/await is not working for me.
This is not pseudo-code; it's the actual code that is executing, with only the function names changed. It is real working code, cut and pasted directly from my IDE.
// these are functions I call below (each in their own .js)
const insert_rows_to_db_from_csv = require('./insert_rows_to_db_from_csv')
const call_api_using_rows_from_function_above = require('./call_api_using_rows_from_function_above')
const and_so_on = require('./and_so_on')
const and_so_on_and_on = require('./and_so_on_and_on')
const and_so_on_and_on_and_on = require('./and_so_on_and_on_and_on')

// each of the above exports a main() function where I can call func.main()
// just like this one defined below (this is my main() entry point)
module.exports = {
  main: async function (csvFilePath) {
    console.log('service: upload.main()')
    try {
      const csvList = []
      let rstream = fs.createReadStream(csvFilePath)
        .pipe(csv())
        .on('data', (data) => csvList.push(data))
        .on('end', async () => {
          let num_rows = csvList.length
          // step one (if I run these two, with the step two calls below commented out, this works)
          await insert_rows_to_db_from_csv.main(csvList);
          await call_api_using_rows_from_function_above.main();
          // step two
          // blows up here, on the next function call:
          // no rows selected in sql statements; must comment out, let the above run to
          // completion, then comment out the rows above, and let these run separately
          await work_with_rows_updated_in_previous_call_above.main(); // sets
          await and_so_on.main();
          await and_so_on_and_on.main();
          await and_so_on_and_on_and_on.main();
        })
    } catch (err) {
      console.log(err.stack)
    } finally {
    }
  }
};
Here is the one-liner I am using to do the insert/update on the DB:
return await pool.query(sql, values);
that's it, nothing more. This is from using:
https://node-postgres.com/
npm install pg
PART 2 - continuing on.
I think the problem might be here. This is where I do each API call, then the insert (that the next function call is dependent upon); there is some code smell here that I can't sort out.
processBatch(batch) is called, which calls the API, gets a response back, and then within it calls handleResponseDetail(response), where the insert is happening. I think the problem is here, if there are any ideas?
this is a code block inside:
await call_api_using_rows_from_function_above.main();
It completes with no errors, inserts rows, and commits, then the next function is called, and this next function finds no rows (inserted here). But the await on the entire main() .js blocks and waits, so I don't understand.
/**
 * API call; within it, call handleResponseDetail which does the DB insert.
 * @param batch
 * @returns {Promise<*>}
 */
async function processBatch(batch) {
  console.log('Processing batch');
  return await client.send(batch).then(res => {
    return handleResponseDetail(res);
  }).catch(err => handleError(err));
}

// should this be async?
function handleResponseDetail(response) {
  response.lookups.forEach(async function (lookup) {
    if (typeof lookup.result[0] == "undefined") { // result[0] is Candidate #0
      ++lookup_fail;
      console.log('No response from API for this address.')
    } else {
      ++lookup_success;
      const id = await insert(lookup);
    }
  });
}
Given the code block from your Part 2 edit, the problem is now clear: all of your insert()s are being scheduled outside of the blocking context of the rest of your async/await code! This is because of that .forEach, see this question for more details.
I've annotated your existing code to show the issue:
function handleResponseDetail(response) { //synchronous function
response.lookups.forEach(async function (lookup) { //asynchronous function
//these async functions all get scheduled simultaneously
//without waiting for the previous one to complete - that's why you can't use forEach like this
if (typeof lookup.result[0] == "undefined") { // result[0] is Candidate #0
++lookup_fail;
console.log('No response from API for this address.')
} else {
++lookup_success;
const id = await insert(lookup); //this ONLY blocks the inner async function, not the outer `handleResponseDetail`
}
});
}
Here is a fixed version of that function which should work as you expect:
async function handleResponseDetail(response) {
  for (const lookup of response.lookups) {
    if (typeof lookup.result[0] == "undefined") { // result[0] is Candidate #0
      ++lookup_fail;
      console.log('No response from API for this address.')
    } else {
      ++lookup_success;
      const id = await insert(lookup); // blocks handleResponseDetail until done
    }
  }
}
Alternatively, if the order of insertion doesn't matter, you can use Promise.all for efficiency:
async function handleResponseDetail(response) {
  await Promise.all(response.lookups.map(async lookup => {
    if (typeof lookup.result[0] == "undefined") { // result[0] is Candidate #0
      ++lookup_fail;
      console.log('No response from API for this address.')
    } else {
      ++lookup_success;
      const id = await insert(lookup);
    }
  })); // waits until all insertions have completed before returning
}
To reiterate, you cannot easily use .forEach() with async/await because .forEach() simply calls the given function for each element of the array synchronously, with no regard for awaiting each promise before calling the next. If you need the loop to block between each element, or to wait for all elements to complete processing before returning from the function (this is your use case), you need to use a different for loop or alternatively a Promise.all() as above.
What your main function currently does is merely create the stream, assign the listeners, and return instantly. It does not wait for all the listeners to resolve, as you are trying to have it do.
You need to extract your file-reading logic into another function that returns a Promise which resolves only when the entire file is read, then await that Promise inside main:
function getCsvList(csvFilePath) {
  return new Promise((resolve, reject) => {
    const csvList = []
    fs.createReadStream(csvFilePath)
      .pipe(csv())
      .on('data', (data) => csvList.push(data))
      .on('end', () => {
        resolve(csvList)
      })
      .on('error', (e) => reject(e))
  })
}

module.exports = {
  main: async function (csvFilePath) {
    try {
      const csvList = await getCsvList(csvFilePath)
      await insert_rows_to_db_from_csv.main(csvList);
      await call_api_using_rows_from_function_above.main();
      await work_with_rows_updated_in_previous_call_above.main();
      await and_so_on.main();
      await and_so_on_and_on.main();
      await and_so_on_and_on_and_on.main();
    } catch (err) {
      console.log(err.stack)
    } finally {
    }
  }
};

await loop vs Promise.all [duplicate]

This question already has answers here:
Any difference between await Promise.all() and multiple await?
(6 answers)
Closed 4 years ago.
Having a set of async operations to perform on a db, I'm wondering what the performance difference is between a "blocking" await loop and a Promise.all.
let insert = (id, value) => {
  return new Promise(function (resolve, reject) {
    connection.query(`insert into items (id,value) VALUES (${id},"${value}")`, function (err, result) {
      if (err) return reject(err)
      return resolve(result);
    });
  });
};
Promise.all solution (it needs a for loop to build the array of promises..)
let inserts = [];
for (let i = 0; i < SIZE; i++) inserts.push(insert(i, "..string.."))
Promise.all(inserts).then(values => {
  console.log("promise all ends");
});
await loop solution
let inserts = [];
(async function loop() {
  for (let i = 0; i < SIZE; i++) {
    await insert(i, "..string..")
  }
  console.log("await loop ends");
})()
Edit: thanks for the answers, but I would like to dig into this a little more.
await is not really blocking, we all know that; it's blocking only within its own code block. An await loop fires requests sequentially, so if one request in the middle takes longer, the others wait for it.
Well, this is similar to Promise.all: if one request takes longer, the callback is not executed until ALL the responses have returned.
Your example of using Promise.all will create all promises first before waiting for them to resolve. This means that your requests will fire concurrently and the callback given to Promise.all(...).then(thisCallback) will only fire if all requests were successful.
Note: the promise returned from Promise.all will reject as soon as one of the promises in the given array rejects.
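A minimal sketch of that fail-fast behavior (timings are arbitrary):

const fast = new Promise((_, reject) => setTimeout(() => reject(new Error("boom")), 100));
const slow = new Promise(resolve => setTimeout(() => resolve("done"), 500));

Promise.all([fast, slow])
  .then(values => console.log(values))
  .catch(err => console.log(err.message)); // logs "boom" after ~100ms, without waiting for `slow`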
const SIZE = 5;
const insert = i => new Promise(resolve => {
  console.log(`started inserting ${i}`);
  setTimeout(() => {
    console.log(`inserted ${i}`);
    resolve();
  }, 300);
});

// your code
let inserts = [];
for (let i = 0; i < SIZE; i++) inserts.push(insert(i, "..string.."))
Promise.all(inserts).then(values => {
  console.log("promise all ends");
});

// requests are made concurrently
// output
// started inserting 0
// started inserting 1
// started inserting 2
// ...
// started inserting 4
// inserted 0
// inserted 1
// ...
// promise all ends
Note: It might be cleaner to use .map instead of a loop for this scenario:
Promise.all(
  Array.from(Array(SIZE)).map((_, i) => insert(i, "..string.."))
).then(values => {
  console.log("promise all ends");
});
Your example using await, on the other hand, waits for each promise to resolve before continuing and firing off the next one:
const SIZE = 5;
const insert = i => new Promise(resolve => {
  console.log(`started inserting ${i}`);
  setTimeout(() => {
    console.log(`inserted ${i}`);
    resolve();
  }, 300);
});

let inserts = [];
(async function loop() {
  for (let i = 0; i < SIZE; i++) {
    await insert(i, "..string..")
  }
  console.log("await loop ends");
})()

// no request is made until the previous one is finished
// output
// started inserting 0
// inserted 0
// started inserting 1
// ...
// started inserting 4
// inserted 4
// await loop ends
The implications for performance in the above cases are directly correlated to their different behavior.
If "efficient" for your use case means to finish up the requests as soon as possible, then the first example wins because the requests will be happening around the same time, independently, whereas in the second example they will happen in a serial fashion.
In terms of complexity, the time complexity for your first example is equal to O(longestRequestTime) because the requests will happen essentially in parallel and thus the request taking the longest will drive the worst-case scenario.
On the other hand, the await example has O(sumOfAllRequestTimes) because no matter how long individual requests take, each one has to wait for the previous one to finish and thus the total time will always include all of them.
To put things in numbers, ignoring all other potential delays due to the environment and application in which the code is run, for 1000 requests, each taking 1s, the Promise.all example would still take ~1s while the await example would take ~1000s.
Note: Promise.all won't actually run the requests exactly in parallel and the performance in general will greatly depend on the exact environment in which the code is running and the state of it (for instance the event loop) but this is a good approximation.
The major difference between the two approaches is that
The await version issues server requests sequentially in the loop. If one of them errors without being caught, no more requests are issued. If request errors are trapped using try/catch blocks, you can identify which request failed and perhaps code in some form of recovery or even retry the operation.
The Promise.all version will make server requests in or near parallel fashion, limited by browser restrictions on the maximum number of concurrent requests permitted. If one of the requests fails the Promise.all returned promise fails immediately. If any requests were successful and returned data, you lose the data returned. In addition if any request fails, no outstanding requests are cancelled - they were initiated in user code (the insert function) when creating the array of promises.
As mentioned in another answer, await is non blocking and returns to the event loop until its operand promise is settled. Both the Promise.all and await while looping versions allow responding to other events while requests are in progress.
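For illustration, a minimal sketch of that per-request recovery in an await loop, reusing the insert(i, value) helper from the question (the retry count is arbitrary):

async function insertAllWithRetry(size, retries = 1) {
  for (let i = 0; i < size; i++) {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        await insert(i, "..string..");
        break; // this item succeeded, move on to the next one
      } catch (err) {
        // we know exactly which request failed and can retry or give up
        if (attempt === retries) console.log(`insert ${i} failed:`, err.message);
      }
    }
  }
}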
Each has different advantages; it's up to us to pick the one that fits our problem.
await loop
for (let i = 0; i < SIZE; i++) {
  await promiseCall();
}
It calls the promises sequentially, one per iteration; a rejection does not affect the calls that already completed, but it stops the loop unless you catch the error.
This makes it straightforward to start the second iteration only once the first iteration has finished, as in the following example:
async function printFiles () {
  const files = await getFilePaths()
  for (const file of files) {
    const contents = await fs.readFile(file, 'utf8')
    console.log(contents)
  }
}
Promise.all()
var p1 = Promise.resolve(32);
var p2 = 123;
var p3 = new Promise((resolve, reject) => {
  setTimeout(() => {
    resolve("foo");
  }, 100);
});

Promise.all([p1, p2, p3]).then(values => {
  console.log(values); // [32, 123, "foo"]
});
This waits for all of the promises concurrently and finally resolves with the combined array of resolved values.
If any one of these promises gets rejected, the promise returned by Promise.all rejects with that promise's rejection reason only. See the following example:
var p1 = Promise.resolve(32);
var p2 = Promise.reject(123);
var p3 = new Promise((resolve, reject) => {
  setTimeout(() => {
    resolve("foo");
  }, 100);
});

Promise.all([p1, p2, p3]).then(values => {
  console.log(values);
}).catch(reason => {
  console.log(reason); // 123
});

Refactoring: return or push value to a new array from a mongoose callback

Actually I'm not sure that the title of my question is 'correct'; if you have a better idea, leave a comment and I'll rename it.
I am trying to rewrite my old function, which makes HTTP requests and inserts many objects into MongoDB via mongoose. I already have a working version, but I face a problem while using it. Basically, when I try to insertMany 20 arrays from 20+ requests, with ~50'000 elements per request, it causes a huge memory leak, even with MongoDB optimization.
Logic of my code:
function main() {
  server.find({locale: "en_GB"}).exec(function (err, server) {
    for (let i = 0; i < server.length; i++) { // for example 20 servers
      rp({url: server[i].slug}).then(response => {
        auctions.count({
          server: server[i].name,
          lastModified: {$gte: response.data.files[0].lastModified}
        }).then(function (docs) {
          if (docs < 0) {
            // We don't insert data if they are already up-to-date
          }
          else {
            // I needed response.data.files[0].url and server[i].name from the prev. block
            // And here is my problem:
            // requests & insertMany and then => loop main()
          }
        })
      }).catch(function (error) {
        console.log(error);
      })
    }
  })
}
main()
Actually, I have already tried many different things to fix it. First of all, I tried to add a setTimeout after the else block like this:
setTimeout(function () {
  // request every server with an interval, instead of all at once
}, 1000 * (i + 1));
but I created another problem for myself, because I needed to call my main() function recursively right after. So I can't use if (i === server.length - 1) to call the garbage collector or to restart main(), because not all servers skip the count validation.
Or let's see another example of mine:
I changed the for (let i = 0; i < server.length; i++) on the 3rd line to .map and moved it down close to the else block, but setTimeout doesn't work with the .map version; as you may already understand, the script loses the correct order and I can't make a delay with it.
Actually, I already understand how to fix it: just re-create the array via let array_new = [] and array_new.push(response.data.files[0].url) with the use of async/await. But I'm not a big expert in it, so I have already wasted a couple of hours. The only problem for now is that I don't know how to return values from the else block.
As for now, I'm trying to form the array inside the else block:
function main() {
  // added:
  let array_new = [];
  // [v1]
  array_new.url += response.data.files[0].url;
  // [v2]
  array_new.push(response.data.files[0].url);
  return array_new
and then consume the array_new array via .then, but none of these works for now. So maybe someone can give me a tip or show me an already-answered question on Stack Overflow that could be useful in my situation.
Since you are essentially dealing with promises, you can refactor your function logic to use async await as follows:
async function main() {
  try {
    const servers = await server.find({locale: "en_GB"}).exec()
    const results = await Promise.all(servers.map(async ({ name, slug }) => {
      const response = await rp({ url: slug })
      const { lastModified, url } = response.data.files[0]
      const count = await auctions.count({
        server: name,
        lastModified: { $gte: lastModified }
      })
      let result = {}
      if (count > 0) result = { name, url }
      return result
    }))
    const data = results.filter(d => Object.keys(d).length > 0)
    await Model.insertMany(data)
  } catch (err) {
    console.error(err)
  }
}
Your problem is with logic obscured by your promises. Your main function recursively calls itself N times, where N is the number of servers. This builds up to eat memory in both the node process and MongoDB handling all the requests.
Instead of jumping into async / await, start by using the promises and waiting for the batch of N queries to complete before starting another batch. You can use Promise.all for this.
function main() {
  server.find({locale: "en_GB"}).exec(function (err, server) {
    // need to keep track of each promise for each server
    let promises = []
    for (let i = 0; i < server.length; i++) {
      let promise = rp({
        url: server[i].slug
      }).then(function (response) {
        // instead of nesting promises, return the promise so it is handled by
        // the next then in the chain.
        return auctions.count({
          server: server[i].name,
          lastModified: {
            $gte: response.data.files[0].lastModified
          }
        });
      }).then(function (docs) {
        if (docs > 0) {
          // do whatever you need to here regarding making requests and
          // inserting into DB, but don't call main() here.
          return requestAndInsert();
        }
      }).catch(function (error) {
        console.log(error);
      })
      // add the above promise to our list.
      promises.push(promise)
    }
    // register a new promise to run once all of the above promises generated
    // by the loop have been completed
    Promise.all(promises).then(function () {
      // now you can call main again, optionally in a setTimeout so it waits a
      // few seconds before fetching more data.
      setTimeout(main, 5000);
    })
  })
}
main()
