HTTP Request promises and concurrency - javascript

Imagine a request triggered with:
http.get('/endpoint')
Imagine 10 requests triggered with:
const promises = _.range(10).map(i => http.get('/endpoint'))
If I want to execute all the requests simultaneously, I generally use
const results = await Promise.all(promises)
Now, let's imagine that I only want to execute 2 requests at a time: is calling Promise.all() on just two items at a time enough to limit execution to 2 requests, or will all the requests still be triggered at the same time?
const promises = _.range(10).map(i => http.get('/endpoint'))
let results = []
for (let chunk of _.chunk(promises, 2)) {
results = results.concat(await Promise.all(chunk))
}
If it still executes the 10 requests at the same time, how could I prevent this behavior?
Note: _ refers to the lodash library in order to make the question simpler

Use bluebird's Promise.map, which allows you to configure concurrency - the concurrency limit applies to the promises returned by the mapper function, so it effectively limits how many promises are created (and therefore how many requests run) at a time.
Promise.map(_.range(10), () => {...}, {concurrency: 2})
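A slightly fuller sketch of the same idea, assuming bluebird is installed and that http.get is the promise-returning client from the question (not Node's core http.get, which takes a callback):
const Promise = require('bluebird');
const _ = require('lodash');

// at most 2 requests are in flight at any time; the next promise is only
// created once one of the previous ones has settled
Promise.map(_.range(10), i => http.get('/endpoint'), { concurrency: 2 })
  .then(results => console.log(results.length, 'responses received'));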

You can use mapLimit from async.
https://caolan.github.io/async/docs.html#mapLimit
import mapLimit from 'async/mapLimit';
const ids = _.range(10);
const maxParallel = 2;
mapLimit(ids, maxParallel, (id, callback) => {
  http.get(`/endpoint/${id}`)
    .then(result => callback(null, result))
    .catch(error => callback(error, null));
}, (error, results) => {
  // Some stuff
});
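As far as I know, async v3 also returns a promise when the final callback is omitted and accepts an async function as the iteratee, so the same thing can be written without callbacks. A sketch of that form (verify it against the async version you actually have installed):
import mapLimit from 'async/mapLimit';

async function fetchAll(ids, maxParallel) {
  // with no final callback, mapLimit returns a promise of the results array
  return mapLimit(ids, maxParallel, async id => http.get(`/endpoint/${id}`));
}

fetchAll(_.range(10), 2).then(results => console.log(results));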

Related

Too many simultaneous requests with NodeJS+request-promise

I have a NodeJS project with a BIG array (about 9000 elements) containing URLs. Those URLs are going to be requested using the request-promise package. However, neither the server nor the client is happy with 9000 concurrent GET requests to the same website from the same client, so I want to spread them out over time. I have looked around a bit and found Promise.map together with the {concurrency: int} option here, which sounded like it would do what I want. But I cannot get it to work. My code looks like this:
const rp = require('request-promise');
var MongoClient = require('mongodb').MongoClient;
var URLarray = []; //This contains 9000 URLs
function getWebsite(url) {
  rp(url)
    .then(html => { /* Do some stuff */ })
    .catch(err => { console.log(err) });
}

MongoClient.connect('mongodb://localhost:27017/some-database', function (err, client) {
  Promise.map(URLArray, (url) => {
    db.collection("some-collection").findOne({URL: url}, (err, data) => {
      if (err) throw err;
      getWebsite(url, (result) => {
        if(result != null) {
          console.log(result);
        }
      });
    }, {concurrency: 1});
  });
I think I probably misunderstand how to deal with promises. In this scenario I would have thought that, with the concurrency option set to 1, each URL in the array would in turn be used in the database search and then passed as a parameter to getWebsite, whose result would be displayed in its callback function. THEN the next element in the array would be processed.
What actually happens is that a few (maybe 10) of the URLs are fetched correctly, then the server starts to respond sporadically with 500 internal server errors. After a few seconds, my computer freezes and then restarts (which I guess is due to some kind of panic?).
How can I attack this problem?
If the problem is really about concurrency, you can divide the work into chunks and chain the chunks.
Let's start with a function that does a mongo lookup and a get....
// answer a promise that resolves to data from mongo and a get from the web
// for a given a url, return { mongoResult, webResult }
// (assuming this is what OP wants. the OP appears to discard the mongo result)
//
function lookupAndGet(url) {
  // use the promise-returning variant of findOne
  let result = {}
  return db.collection("some-collection").findOne({URL: url}).then(mongoData => {
    result.mongoData = mongoData
    return rp(url)
  }).then(webData => {
    result.webData = webData
    return result
  })
}
lodash and underscore both offer a chunk method that breaks an array into an array of smaller arrays. Write your own or use theirs.
const _ = require('lodash')
let chunks = _.chunk(URLArray, 5) // say 5 is a reasonable concurrency
Here's the point of the answer, make a chain of chunks so you only perform the smaller size concurrently...
let chain = chunks.reduce((acc, chunk) => {
  // wait for the previous chunks to finish before creating this chunk's promises
  return acc.then(results =>
    Promise.all(chunk.map(url => lookupAndGet(url)))
      .then(chunkResults => [...results, chunkResults])
  )
}, Promise.resolve([]))
Now execute the chain. The chunk promises will return chunk-sized arrays of results, so your reduced result will be an array of arrays. Fortunately, lodash and underscore both have a method to "flatten" the nested array.
// turn [ url, url, ...] into [ { mongoResult, webResult }, { mongoResult, webResult }, ...]
// running only 5 requests at a time
chain.then(result => {
console.log(_.flatten(result))
})
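The same chunk-at-a-time idea can also be written with async/await, which some may find easier to follow. A sketch under the same assumptions about lookupAndGet, _.chunk and URLArray:
async function lookupAndGetAll(urls, chunkSize = 5) {
  const results = [];
  for (const chunk of _.chunk(urls, chunkSize)) {
    // wait for this chunk to finish before creating the next chunk's promises
    const chunkResults = await Promise.all(chunk.map(url => lookupAndGet(url)));
    results.push(...chunkResults);
  }
  return results; // already flat: [ { mongoData, webData }, ... ]
}

lookupAndGetAll(URLArray, 5).then(results => console.log(results.length));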

Node.js - Await for all the promises thrown inside a loop

I'm dealing with a loop in Node.js that performs two tasks on every iteration. To simplify, the code is summarized as:
Extract products metadata from a web page (blocking task).
Save all the products metadata to a database (asynchronous task).
The save operation (2) will perform about 800 operations in a database, and it doesn't need to block the main thread (I can keep extracting products metadata from the web pages).
So, that being said, awaiting for the products to be saved doesn't make any sense. But if I fire the promises without awaiting them, on the last iteration of the loop the Node.js process exits and the pending operations never finish.
What is the best approach to solve this? Is it possible to achieve it without having a counter for finished promises or emitters? Thanks.
for (let shop of shops) {
// 1
const products = await extractProductsMetadata(shop);
// 2
await saveProductsMetadata(products);
}
Collect the promises in an array, then use Promise.all on it:
const storePromises = [];
for (let shop of shops) {
const products = await extractProductsMetadata(shop); //(1)
storePromises.push(saveProductsMetadata(products)); //(2)
}
await Promise.all(storePromises);
// ... all done (3)
With this, (1) will run one after the other, (2) will run in parallel, and (3) will run afterwards.
For sure you can also run (1) and (2) in parallel:
await Promise.all(shops.map(async shop => {
const products = await extractProductsMetadata(shop); //(1)
await saveProductsMetadata(products);
}));
And if an error occurred in one of the promises, you can handle that with a try / catch block, to make sure all the other shops won't be affected:
await Promise.all(shops.map(async shop => {
  try {
    const products = await extractProductsMetadata(shop); //(1)
    await saveProductsMetadata(products);
  } catch(error) {
    // handle it here
  }
}));
How do I signal Node to finish the process?
You could manually call process.exit(0);, but that hides the real problem: NodeJS exits automatically if there is no listener attached anymore. That means that you should close all database connections / servers / etc. after the code above is done.
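A minimal sketch of that shape, assuming the MongoDB client from the earlier snippets is the only thing keeping the process alive (client.close() is the usual way to release it; adapt this to whatever handles your app actually holds):
const storePromises = [];
for (const shop of shops) {
  const products = await extractProductsMetadata(shop); // (1) sequential
  storePromises.push(saveProductsMetadata(products));   // (2) fire, don't await yet
}
await Promise.all(storePromises);                        // (3) wait for all saves
await client.close();                                    // nothing left to keep Node alive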
We create packs (batches) of data to process. When we process a pack, we do all the gets sequentially and all the saves asynchronously.
I have not handled the failure part; I'll let you add that. An appropriate try/catch or function encapsulation will do it.
/**
 * Call the given promise-returning functions in a queue (one after another)
 * options = context/args
 */
function promiseQueue(promisesFuncs, options = {}, _i = 0, _ret = []) {
  return new Promise((resolve, reject) => {
    if (_i >= promisesFuncs.length) {
      return resolve(_ret);
    }
    // Call one
    (promisesFuncs[_i]).apply(options.context || this, options.args || [])
      .then(ret => promiseQueue(promisesFuncs, options, _i + 1, [
        ..._ret,
        ret,
      ]))
      .then(resolve)
      .catch(reject);
  });
}
async function executePromisesAsPacks(arr, packSize, _i = 0) {
  const toExecute = arr.slice(_i * packSize, (_i + 1) * packSize);
  // Leave if we did execute all packs
  if (toExecute.length === 0) return true;
  // First we get all the data sequentially
  const products = await promiseQueue(toExecute.map(x => () => extractProductsMetadata(x)));
  // Then save the products asynchronously
  // We do not put await here so it's truly asynchronous (fire and forget)
  Promise.all(toExecute.map((x, xi) => saveProductsMetadata(products[xi])));
  // Call next
  return executePromisesAsPacks(arr, packSize, _i + 1);
}
// Make packs of data to process (we extract sequentially and save asynchronously)
// Made to handle huge datasets
await executePromisesAsPacks(shops, 50);

Node.js - How to return callback with array from for loop with MySQL query?

I'm trying to get a list of virtual communities on my Node.js app and then return it with a callback function. When I call the getList() method with a callback, it returns an empty array.
const mysqli = require("../mysqli/connect");
class Communities {
  getList(callback){
    var list = [];
    mysqli.query("SELECT * FROM communities", (err, communities) => {
      for(let i = 0; i < communities.length; i++){
        mysqli.query("SELECT name FROM users WHERE id='"+ communities[i].host +"'", (err, host) => {
          list.push({
            "id": communities[i].id,
            "name": communities[i].name,
            "hostID": communities[i].host,
            "hostName": host[0].name,
            "verified": communities[i].verified,
            "people": communities[i].people
          });
        });
      }
      callback(list);
    });
  }
}
new Communities().getList((list) => {
console.log(list);
});
I need to make the for loop asynchronous and call the callback when the loop ends. Please let me know how to do this. Thanks.
Callbacks get really ugly if you have to combine multiple of them; that's why Promises were invented to simplify that. To use Promises in your case, you first have to create a Promise when querying the database¹:
const query = q => new Promise((resolve, reject) => mysqli.query(q, (err, result) => err ? reject(err) : resolve(result)));
Now doing multiple queries will return multiple promises, which can be combined using Promise.all into one single promise²:
async getList(){
  const communities = await query("SELECT * FROM communities");
  const result = await/*³*/ Promise.all(communities.map(async community => {
    const host = await query(`SELECT name FROM users WHERE id='${community.host}'`);/*⁴*/
    return {
      ...community,
      hostName: host[0].name,
    };
  }));
  return result;
}
Now you can easily get the result with:
new Communities().getList().then(list => {
console.log(list);
});
Read on:
Working with Promises - Google Developers
Understanding async / await - Ponyfoo
Notes:
¹: If you do that more often, you should probably rather use a mysql library that supports promises natively; that saves a lot of work (a short sketch follows these notes).
²: This way the requests are done in parallel, which means it is way faster than doing them one after another (which could be done using a for loop and awaiting inside of it).
³: That await is superfluous, but I prefer to keep it to mark it as an asynchronous action.
⁴: I guess that could also be done using one SQL query, so if it is too slow for your usecase (which I doubt) you should optimize the query itself.
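For note ¹, a sketch of what that could look like with the mysql2 package's promise API (assuming mysql2 is installed; the connection options here are made up):
const mysql = require('mysql2/promise');

async function getList() {
  const pool = mysql.createPool({ host: 'localhost', user: 'root', database: 'mydb' });
  const [communities] = await pool.query('SELECT * FROM communities');
  return Promise.all(communities.map(async community => {
    // the ? placeholder also avoids the hand-built SQL string from the question
    const [host] = await pool.query('SELECT name FROM users WHERE id = ?', [community.host]);
    return { ...community, hostName: host[0].name };
  }));
}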

Sending API calls in batches

I'm currently trying to simulate half a million IoT devices pushing payloads to Azure IoT Hub using Node.js. Since Node fires the calls asynchronously without waiting for them, it's flooding IoT Hub with data and I am getting network errors.
I also tried the async/await approach, but that takes a lot of time to push the data to IoT Hub.
Is there a way to only run 100 calls in parallel, wait for all of them to complete and then run the next 100 in node?
Much appreciated!
Build your batches as a nested array of promise-returning functions, then use Promise.all on each batch in a loop that awaits each Promise.all to resolve.
// This is a mock request function, could be a `request` call
// or a database query; whatever it is, it MUST return a Promise.
const sendRequest = () => {
  return new Promise((resolve) => {
    setTimeout(() => {
      console.log('request sent')
      resolve()
    }, 1000)
  })
}

// 5 batches * 2 requests = 10 requests.
const batches = Array(5).fill(Array(2).fill(sendRequest))

;(async function() {
  for (const batch of batches) {
    try {
      console.log('-- sending batch --')
      await Promise.all(batch.map(f => f()))
    } catch(err) {
      console.error(err)
    }
  }
})()
If you are using lodash you can make it a bit easier by using chunk, which will divide an array into chunks of a provided max size.
So in your case you can use it like this, given a variable calls (an array of, let's say, 550 promise-returning functions):
const batchCalls = _.chunk(calls, 100);
for (const batchCall of batchCalls) {
  await Promise.all(batchCall.map(call => call())) // runs up to a hundred calls in parallel, one batch at a time
}
You can readily use bluebird's Promise.map with the concurrency option. It keeps at most that many items in flight at a time, picking up the next one as soon as a slot frees up.
example:
Promise.map(devices, device => { /* send one payload, return a promise */ }, { concurrency: 100 })
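A slightly fuller sketch of that call (bluebird's Promise.map; devices and sendToIoTHub are placeholder names for your data and your promise-returning send function):
const Promise = require('bluebird');

// no more than 100 sends are in flight at once; a new one starts
// as soon as one of the 100 settles
Promise.map(devices, device => sendToIoTHub(device), { concurrency: 100 })
  .then(() => console.log('all payloads sent'))
  .catch(err => console.error('a send failed:', err));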
limited-request-queue could be used to queue the requests. There are options to set the maximum number of connections at any given time. Below is the code we used to send 5 requests every second; there will also be at most 5 requests in flight at any given time.
/*
Requests passed to Target App (5 requests per second)
Get the response for each request and pass the response to Source App
maxSockets: The maximum number of connections allowed at any given time. A value of 0 will prevent anything from going out. A value of Infinity will provide no concurrency limiting.
maxSocketsPerHost: The maximum number of connections per host allowed at any given time. A value of 0 will prevent anything from going out. A value of Infinity will provide no per-host concurrency limiting.
rateLimit: The number of milliseconds to wait before each batch of maxSocketsPerHost connections
*/
var queue1 = new RequestQueue({'maxSockets': 5, 'maxSocketsPerHost': 5, 'rateLimit': 1000}, {
  item: function(input, done) {
    request(input.url, function(error, response) {
      input.res.send(response.body);
      done();
    });
  },
  end: function() {
    console.log("Queue 1 completed!");
  }
});

// To queue a request - a for loop could be used to enqueue multiple requests
queue1.enqueue({'url': ''});
If I'm not mistaken, you can use the 'array' of items with the Promise.all() method (or, in your case, .allSettled() to see the result of each call) and then process each one inside it like this:
function chunk (items, size) {
  const chunks = [];
  items = [].concat(...items);
  while (items.length) { chunks.push(items.splice(0, size)); }
  return chunks;
}
async function ProcessDevice(device) {
// do your work here
}
// splice your items into chunks of 100, then process each chunk
// catching the result of each ProcessDevice in the chunk.map
// the results of the chunk are passed into the .then( )
// and you have a .catch( ) in case there's an error anywhere in the items
var jobArray = chunk(items, 100);
for (let i = 0; i < jobArray.length; i++) {
  // await each batch (inside an async function) so only one chunk runs at a time
  await Promise.allSettled(jobArray[i].map(ja => ProcessDevice(ja)))
    .then(function(results) { console.log("PromiseResults: " + results); })
    .catch((err) => { console.log("error: " + err); });
}

await loop vs Promise.all [duplicate]

This question already has answers here:
Any difference between await Promise.all() and multiple await?
(6 answers)
Closed 4 years ago.
Having a set of async operations on db to do, I'm wondering what's the difference performance-wise of doing a "blocking" await loop versus a Promise.all.
let insert = (id,value) => {
  return new Promise(function (resolve, reject) {
    connection.query(`insert into items (id,value) VALUES (${id},"${value}")`, function (err, result) {
      if (err) return reject(err)
      return resolve(result);
    });
  });
};
Promise.all solution (it needs a for loop to build the array of promises..)
let inserts = [];
for (let i = 0; i < SIZE; i++) inserts.push(insert(i,"..string.."))
Promise.all(inserts).then(values => {
console.log("promise all ends");
});
await loop solution
let inserts = [];
(async function loop() {
  for (let i = 0; i < SIZE; i++) {
    await insert(i, "..string..")
  }
  console.log("await loop ends");
})()
Edit: thanks for the answers, but I would like to dig into this a little more.
await is not really blocking, we all know that; it only blocks within its own code block. An await loop fires requests sequentially, so if one request in the middle takes longer, the following ones wait for it.
Well, this is similar to Promise.all: if one request takes longer, the callback is not executed until ALL the responses have returned.
Your example of using Promise.all will create all promises first before waiting for them to resolve. This means that your requests will fire concurrently and the callback given to Promise.all(...).then(thisCallback) will only fire if all requests were successful.
Note: promise returned from Promise.all will reject as soon as one of the promises in the given array rejects.
const SIZE = 5;
const insert = i => new Promise(resolve => {
  console.log(`started inserting ${i}`);
  setTimeout(() => {
    console.log(`inserted ${i}`);
    resolve();
  }, 300);
});

// your code
let inserts = [];
for (let i = 0; i < SIZE; i++) inserts.push(insert(i, "..string.."))
Promise.all(inserts).then(values => {
  console.log("promise all ends");
});
// requests are made concurrently
// output
// started inserting 0
// started inserting 1
// started inserting 2
// ...
// started inserting 4
// inserted 0
// inserted 1
// ...
// promise all ends
Note: It might be cleaner to use .map instead of a loop for this scenario:
Promise.all(
Array.from(Array(SIZE)).map((_, i) => insert(i,"..string.."))
).then(values => {
console.log("promise all ends");
});
Your example of using await, on the other hand, waits for each promise to resolve before continuing and firing off the next one:
const SIZE = 5;
const insert = i => new Promise(resolve => {
  console.log(`started inserting ${i}`);
  setTimeout(() => {
    console.log(`inserted ${i}`);
    resolve();
  }, 300);
});

let inserts = [];
(async function loop() {
  for (let i = 0; i < SIZE; i++) {
    await insert(i, "..string..")
  }
  console.log("await loop ends");
})()
// no request is made until the previous one is finished
// output
// started inserting 0
// inserted 0
// started inserting 1
// ...
// started inserting 4
// inserted 4
// await loop ends
The implications for performance in the above cases are directly correlated to their different behavior.
If "efficient" for your use case means to finish up the requests as soon as possible, then the first example wins because the requests will be happening around the same time, independently, whereas in the second example they will happen in a serial fashion.
In terms of complexity, the time complexity for your first example is equal to O(longestRequestTime) because the requests will happen essentially in parallel and thus the request taking the longest will drive the worst-case scenario.
On the other hand, the await example has O(sumOfAllRequestTimes) because no matter how long individual requests take, each one has to wait for the previous one to finish and thus the total time will always include all of them.
To put things in numbers, ignoring all other potential delays due to the environment and application in which the code is run, for 1000 requests, each taking 1s, the Promise.all example would still take ~1s while the await example would take ~1000s.
Note: Promise.all won't actually run the requests exactly in parallel and the performance in general will greatly depend on the exact environment in which the code is running and the state of it (for instance the event loop) but this is a good approximation.
The major difference between the two approaches is that
The await version issues server requests sequentially in the loop. If one of them errors without being caught, no more requests are issued. If request errors are trapped using try/catch blocks, you can identify which request failed and perhaps code in some form of recovery or even retry the operation.
The Promise.all version will make server requests in or near parallel fashion, limited by browser restrictions on the maximum number of concurrent requests permitted. If one of the requests fails the Promise.all returned promise fails immediately. If any requests were successful and returned data, you lose the data returned. In addition if any request fails, no outstanding requests are cancelled - they were initiated in user code (the insert function) when creating the array of promises.
As mentioned in another answer, await is non blocking and returns to the event loop until its operand promise is settled. Both the Promise.all and await while looping versions allow responding to other events while requests are in progress.
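If losing the data from the successful requests on a single failure is a concern, Promise.allSettled (Node 12.9+) is one way to keep both outcomes. A sketch, to be run inside an async function, reusing the insert helper from the question:
const settled = await Promise.allSettled(
  Array.from(Array(SIZE)).map((_, i) => insert(i, "..string.."))
);

const values  = settled.filter(r => r.status === 'fulfilled').map(r => r.value);
const reasons = settled.filter(r => r.status === 'rejected').map(r => r.reason);
console.log(`${values.length} succeeded, ${reasons.length} failed`);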
Each has different advantages; it's up to us to pick whichever one solves our problem.
await loop
for (let i = 0; i < SIZE; i++) {
  await promiseCall();
}
It calls the promises one after another: each iteration waits for the previous one to finish. If a promise rejects and the error is not caught, the loop stops at that point, but the iterations that already completed are unaffected.
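A sketch of catching errors per iteration so that one rejected promise does not stop the rest of the loop (same promiseCall as above, inside an async function):
for (let i = 0; i < SIZE; i++) {
  try {
    await promiseCall();
  } catch (err) {
    // this iteration failed; log it and carry on with the next one
    console.error(`call ${i} failed:`, err);
  }
}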
ES2018 also added for await...of for iterating over async sources. For the common case of starting the next iteration only after the previous one has finished, a plain for...of with await inside is enough, e.g.:
async function printFiles () {
  const files = await getFilePaths()
  for (const file of files) {
    const contents = await fs.promises.readFile(file, 'utf8')
    console.log(contents)
  }
}
Promise.all()
var p1 = Promise.resolve(32);
var p2 = 123;
var p3 = new Promise((resolve, reject) => {
  setTimeout(() => {
    resolve("foo");
  }, 100);
});

Promise.all([p1, p2, p3]).then(values => {
  console.log(values); // [32, 123, "foo"]
});
Promise.all does not run the promises sequentially: they are already running from the moment they were created; it simply waits for all of them and resolves to a combined array of the resolved values.
If any one of these promises gets rejected, the combined promise rejects with the reason of that rejected promise (the other results are lost). See the following example:
var p1 = Promise.resolve(32);
var p2 = Promise.reject(123);
var p3 = new Promise((resolve, reject) => {
  setTimeout(() => {
    resolve("foo");
  }, 100);
});

Promise.all([p1, p2, p3])
  .then(values => { console.log(values); })   // never reached
  .catch(reason => { console.log(reason); }); // 123
