Sending API calls in batches - javascript

I'm currently trying to simulate half a million IoT devices pushing payloads to Azure IoT Hub using Node.js. Since Node is asynchronous/non-blocking in nature, it's flooding IoT Hub with data and I am getting network errors.
I also tried the async/await approach, but that takes a lot of time to push the data to IoT Hub.
Is there a way to run only 100 calls in parallel, wait for all of them to complete, and then run the next 100 in Node?
Much appreciated!

Build your batches as a nested array of Promise-returning functions, then use Promise.all
on each batch inside a loop that awaits each Promise.all before moving on to the next batch.
// This is a mock request function, could be a `request` call
// or a database query; whatever it is, it MUST return a Promise.
const sendRequest = () => {
  return new Promise((resolve) => {
    setTimeout(() => {
      console.log('request sent')
      resolve()
    }, 1000)
  })
}

// 5 batches * 2 requests = 10 requests.
const batches = Array(5).fill(Array(2).fill(sendRequest))

;(async function() {
  for (const batch of batches) {
    try {
      console.log('-- sending batch --')
      await Promise.all(batch.map(f => f()))
    } catch (err) {
      console.error(err)
    }
  }
})()

If you are using lodash you can make this a bit easier with chunk, which divides an array into chunks of a given maximum size.
In your case you can use it like this (assuming calls is an array of, say, 550 Promise-returning functions):
const batchCalls = _.chunk(calls, 100);
for (const batchCall of batchCalls) {
  await Promise.all(batchCall.map(call => call())) // runs up to a hundred calls in parallel, one batch at a time
}
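Applied to the original question, a minimal sketch might look like this; sendDeviceMessage and deviceIds are hypothetical placeholders for your real IoT Hub client call and device list:
const _ = require('lodash');

// Hypothetical: replace with your real IoT Hub send call.
const sendDeviceMessage = (deviceId) =>
  new Promise((resolve) => setTimeout(() => resolve(deviceId), 100));

async function pushInBatches(deviceIds, batchSize = 100) {
  for (const batch of _.chunk(deviceIds, batchSize)) {
    // at most `batchSize` sends are in flight at once;
    // the next batch starts only after this one has fully settled
    await Promise.all(batch.map((id) => sendDeviceMessage(id)));
  }
}

// e.g. 500,000 simulated devices
const deviceIds = Array.from({ length: 500000 }, (_x, i) => `device-${i}`);
pushInBatches(deviceIds, 100).then(() => console.log('all batches sent'));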

You can use bluebird's Promise.map with the concurrency option. It processes at most concurrency items at a time, picking up the next item as soon as one of them finishes.
Example:
Promise.map(calls, (call) => call(), { concurrency: 100 })
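A fuller sketch using bluebird (sendDeviceMessage is a hypothetical stand-in for your real IoT Hub call); unlike fixed batches, bluebird starts the next call as soon as any of the 100 in-flight calls finishes:
const Promise = require('bluebird');

// Hypothetical sender; swap in your real request / IoT Hub client call.
const sendDeviceMessage = (deviceId) =>
  new Promise((resolve) => setTimeout(() => resolve(deviceId), 100));

const deviceIds = Array.from({ length: 1000 }, (_x, i) => `device-${i}`);

// keep at most 100 sends pending at any moment
Promise.map(deviceIds, (id) => sendDeviceMessage(id), { concurrency: 100 })
  .then((results) => console.log(`sent ${results.length} messages`));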

limited-request-queue could be used to queue the requests. There are options to set the maximum number of connections at any given time. Below is the code we used to send 5 requests every second; there will also only ever be 5 requests in flight at any given time.
/*
Requests passed to the Target App (5 requests per second).
Get the response for each request and pass the response back to the Source App.

maxSockets: The maximum number of connections allowed at any given time. A value of 0 will prevent anything from going out. A value of Infinity will provide no concurrency limiting.
maxSocketsPerHost: The maximum number of connections per host allowed at any given time. A value of 0 will prevent anything from going out. A value of Infinity will provide no per-host concurrency limiting.
rateLimit: The number of milliseconds to wait before each batch of maxSocketsPerHost requests.
*/
// Note: the exact import form may differ between versions of limited-request-queue.
var RequestQueue = require('limited-request-queue');
var request = require('request');

var queue1 = new RequestQueue({'maxSockets': 5, 'maxSocketsPerHost': 5, 'rateLimit': 1000}, {
  item: function(input, done) {
    request(input.url, function(error, response) {
      input.res.send(response.body);
      done();
    });
  },
  end: function() {
    console.log("Queue 1 completed!");
  }
});

// To queue a request - a for loop could be used to enqueue multiple requests
queue1.enqueue({'url': ''});

If I'm not mistaken, you can take the array of items and use the Promise.all() method (or in your case Promise.allSettled(), to just see the results of each call), and then process each chunk like this:
function chunk(items, size) {
  const chunks = [];
  items = [].concat(...items);
  while (items.length) { chunks.push(items.splice(0, size)); }
  return chunks;
}

async function ProcessDevice(device) {
  // do your work here
}
// split your items into chunks of 100, then process each chunk,
// collecting the result of each ProcessDevice call via chunk.map.
// The results of the chunk are passed into the .then( )
// and the .catch( ) fires in case there's an error anywhere in the chunk.
// (Run this inside an async function so each chunk can be awaited.)
var jobArray = chunk(items, 100);
for (let i = 0; i < jobArray.length; i++) {
  await Promise.allSettled(jobArray[i].map(ja => ProcessDevice(ja)))
    .then(function(results) { console.log("PromiseResults: ", results); })
    .catch((err) => { console.log("error: " + err); });
}

Related

Firestore batch insert 375 documents per commit, but not 500 documents. Why?

I'm trying to insert a little more than 1400 objects into my Firestore database from a Cloud Function (with a 540-second timeout) using this code:
...
const response = await fetch(url)
if (response.ok) {
  const json = await response.json()
  if (json.hasOwnProperty('data')) {
    const teams = json[`data`]
    var players = teams.flatMap((team) => {
      return team.squad.data
    })
    var playersBatch = []
    while (players.length > 0) {
      const playerBatch = players.splice(0, 375)
      playersBatch.push(playerBatch)
    }
    for (playerBatch of playersBatch) {
      const batch = database.batch()
      for (player of playerBatch) {
        const reference = database
          .collection(`players`)
          .doc(`${player.player_id}`)
        batch.set(reference, player, { merge: true })
      }
      await batch.commit()
    }
  } else {
    ...
  }
} else {
  ...
}
...
The above code works for me, but only when I insert 375 documents per batch. When I try to insert 500 documents, the batch commit does not finish in the first loop and gives me a timeout exception:
Function execution took 540005 ms, finished with status: 'timeout'
Can a batch produce a timeout? Does a batch have any limitation when inserting a huge number of documents? Why can I insert 375 per batch but not 500?
TL;DR: It's not the batch write that is timing out; the whole job simply needs more than 9 minutes to complete, and your function times out after 9 minutes, which is its upper limit, which is why you get this error.
The problem in this case is the cloud function itself, not the batched write.
As you can see in this documentation:
A batched write can contain up to 500 operations
However, the cloud functions documentation for timeout says the following:
Function execution time is limited by the timeout duration, which you can specify at function deployment time. By default, a function times out after 1 minute, but you can extend this period up to 9 minutes.
If you convert 540005 ms you get roughly 9 minutes, which is the upper limit for Cloud Functions execution before timeout. This is why you can't complete the run with 500-record batches but you can with 375-record batches: the latter finishes under the 9-minute timeout limit of the Cloud Function.
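For reference, that timeout is set at deploy time. A minimal sketch with the first-generation firebase-functions SDK (the HTTPS trigger and memory setting here are illustrative, not taken from the question):
const functions = require('firebase-functions');

// 540 seconds (9 minutes) is the hard upper limit for 1st-gen Cloud Functions.
exports.syncPlayers = functions
  .runWith({ timeoutSeconds: 540, memory: '1GB' })
  .https.onRequest(async (req, res) => {
    // ... fetch, batch and commit the players here ...
    res.status(200).send('done');
  });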
If I've understood what you are doing correctly, these are the steps you are trying to perform:
Fetch the URL
For each team in the response, extract a list of all players
For each player, update their data in the database
Slightly shuffling your code around and moving await batch.commit() out of the for loop (which made your code wait for each batch to complete before moving to the next), gives:
const response = await fetch(url)
if (!response.ok || response.status === 204) {
  // A 204 code will break response.json() with a parsing error
  // You might want to check for the 429 status here
  throw new Error(`Unexpected status code ${response.status}!`)
}

const json = await response.json() // note: empty bodies will throw a parsing error
if (!json.hasOwnProperty("data")) {
  throw new Error(`Unexpected response body!`, json)
}

const teams = json["data"]
const players = teams.flatMap((team) => {
  return team.squad.data // the array of players in this team
})

const playersInBatches = []
while (players.length > 0) {
  const thisPlayerBatch = players.splice(0, 500)
  playersInBatches.push(thisPlayerBatch)
}

const batches = playersInBatches.map((playersInThisBatch) => {
  const dbBatch = database.batch()
  for (let player of playersInThisBatch) {
    const reference = database
      .collection("players")
      .doc(`${player.player_id}`)
    dbBatch.set(reference, player, { merge: true })
  }
  return dbBatch
})

// commit all batches in parallel and wait for them to finish
await Promise.all(batches.map((b) => b.commit()))
console.log("Synced successfully!")
Notes:
As an expansion to what you are doing, you may want to store the response's caching headers like the ETag or Last-Modified. This allows you to ask the third-party server if the data has changed at all before downloading it.
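For example, a conditional-request sketch (assuming you persisted the ETag from the previous successful sync somewhere; the savedEtag variable is illustrative):
// savedEtag is whatever ETag you stored after the last successful fetch
const response = await fetch(url, {
  headers: savedEtag ? { 'If-None-Match': savedEtag } : {}
})
if (response.status === 304) {
  // Nothing has changed since last time, so skip the whole sync
  return
}
const newEtag = response.headers.get('etag') // persist this for the next run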
I've replaced your if (condition) { /* do lots of work */ } else { /* do small amount of work to handle error */ } with if (!condition) { /* do small amount of work to handle error */ return; } /* do lots of work */. This is known as "failing-fast" and is used to prevent large and/or nested if-else trees while also showing your error handling next to what's causing the error.
If any one batch fails, then the others may not be successfully written to the database, but this bug is in your original code too. You could change the last lines to the following to make them not kill the other batches:
// commit all batches in parallel and wait for them to finish
const results = await Promise.all(batches.map(
  (b) => b.commit().then(
    () => ({ success: true }),
    (error) => ({ success: false, error })
  )
))

let succeeded = 0, failed = 0
results.forEach(result => result.success ? succeeded++ : failed++)

if (failed > 0) {
  console.log(`Synced ${succeeded}/${results.length} batches of players successfully!`)
  return
}
console.log("Synced all players successfully!")
You also aren't the first to encounter this problem of batching your writes to your database. It's common enough that there is this MultiBatch utility class that handles the batches for you.
const response = await fetch(url)
if (!response.ok || response.status === 204) {
  // A 204 code will break response.json() with a parsing error
  // You might want to check for the 429 status here
  throw new Error(`Unexpected status code ${response.status}!`)
}

const json = await response.json() // note: empty bodies will throw a parsing error
if (!json.hasOwnProperty("data")) {
  throw new Error(`Unexpected response body!`, json)
}

const teams = json["data"]
const multiBatch = new MultiBatch(database)
const playersColRef = database.collection("players")

teams.forEach((team) => {
  team.squad.data // the array of players in this team
    .forEach(player => {
      const reference = playersColRef.doc(`${player.player_id}`)
      multiBatch.set(reference, player, { merge: true })
    })
})

await multiBatch.commit(/* pass true here to suppress errors */)
console.log("Synced successfully!")

Too many simultaneous requests with NodeJS+request-promise

I have a Node.js project with a BIG array (about 9000 elements) containing URLs. Those URLs are going to be requested using the request-promise package. However, 9000 concurrent GET requests to the same website from the same client is something neither the server nor the client likes, so I want to spread them out over time. I have looked around a bit and found Promise.map together with the {concurrency: int} option here, which sounded like it would do what I want. But I cannot get it to work. My code looks like this:
const rp = require('request-promise');
var MongoClient = require('mongodb').MongoClient;

var URLarray = []; //This contains 9000 URLs

function getWebsite(url) {
  rp(url)
    .then(html => { /* Do some stuff */ })
    .catch(err => { console.log(err) });
}

MongoClient.connect('mongodb://localhost:27017/some-database', function (err, client) {
  Promise.map(URLArray, (url) => {
    db.collection("some-collection").findOne({URL: url}, (err, data) => {
      if (err) throw err;
      getWebsite(url, (result) => {
        if (result != null) {
          console.log(result);
        }
      });
    }, {concurrency: 1});
  });
});
I think I probably misunderstand how to deal with promises. In this scenario I would have thought that, with the concurrency option set to 1, each URL in the array would in turn be used in the database search and then passed as a parameter to getWebsite, whose result would be displayed in its callback function. THEN the next element in the array would be processed.
What actually happens is that a few (maybe 10) of the URLs are fetched correctly, then the server starts to respond sporadically with 500 internal server error. After a few seconds, my computer freezes and then restarts (which I guess is due to some kind of panic?).
How can I attack this problem?
If the problem is really about concurrency, you can divide the work into chunks and chain the chunks.
Let's start with a function that does a mongo lookup and a get....
// answer a promise that resolves to data from mongo and a get from the web
// for a given url, return { mongoResult, webResult }
// (assuming this is what OP wants. the OP appears to discard the mongo result)
//
function lookupAndGet(url) {
  // use the promise-returning variant of findOne
  let result = {}
  return db.collection("some-collection").findOne({ URL: url }).then(mongoData => {
    result.mongoData = mongoData
    return rp(url)
  }).then(webData => {
    result.webData = webData
    return result
  })
}
lodash and underscore both offer a chunk method that breaks an array into an array of smaller arrays. Write your own or use theirs.
const _ = require('lodash')
let chunks = _.chunk(URLArray, 5) // say 5 is a reasonable concurrency
Here's the point of the answer: make a chain of chunks so that you only run one chunk's worth of requests concurrently...
let chain = chunks.reduce((acc, chunk) => {
  // wait for the previous chunks before starting this chunk's requests,
  // then run this chunk's lookups in parallel and append their results
  return acc.then(results =>
    Promise.all(chunk.map(url => lookupAndGet(url)))
      .then(chunkResults => [...results, chunkResults])
  )
}, Promise.resolve([]))
Now execute the chain. The chunk promises will return chunk-sized arrays of results, so your reduced result will be an array of arrays. Fortunately, lodash and underscore both have a method to "flatten" the nested array.
// turn [ url, url, ...] into [ { mongoResult, webResult }, { mongoResult, webResult }, ...]
// running only 5 requests at a time
chain.then(result => {
  console.log(_.flatten(result))
})

Node, wait and retry api calls that fail

So I fetch an array of urls from api with a rate limit, currently I handle this by adding a timeout to each call like this:
const calls = urls.map((url, i) =>
  new Promise(resolve => setTimeout(resolve, 250 * i))
    .then(() => fetch(url))
);
const data = await Promise.all(calls);
forcing a 250ms wait between each call. This ensures that the rate limit is never exceeded.
The thing is, this isn't really necessary. I've tried with a 0ms wait time, and in most cases I have to repeatedly reload the page four or five times before the API starts to return:
{ error: { status: 429, message: 'API rate limit exceeded' } }
and most of the time you only have to wait a second or so before you can safely reload the page and get all the data.
A more reasonable approach would be to collect the calls that return 429 (if they do), wait for a set amount of time and then retry them (and perhaps redo this a set amount of times).
Problem is, I'm a bit stumped as to how one would go about achieving this.
EDIT:
Just got home and will look through the answers, but there seems to have been an assumption made which I don't believe is necessary: the calls do not have to be sequential; they can be fired (and returned) in any order.
The term for what you want is exponential backoff. You can modify your code so that it continues trying on a certain failure condition:
const max_wait = 2000;

async function wait(ms) {
  return new Promise(resolve => {
    setTimeout(resolve, ms);
  });
}

const calls = urls.map(async (url) => {
  let retry = 0, result;
  do {
    // back off exponentially (2, 4, 8, ... ms) before each retry
    if (retry !== 0) { await wait(Math.pow(2, retry)); }
    result = await fetch(url);
    retry++;
    // keep retrying while we are rate limited and under the backoff cap
  } while (result.status === 429 && Math.pow(2, retry) <= max_wait);
  return result;
});
Or you can try using a library to handle the backoff for you like https://github.com/MathieuTurcotte/node-backoff
If I understand the question right, you're trying to:
a) Execute fetch() calls sequentially (with a possibly optional delay)
b) Retry failed requests with a backoff delay
As you likely found out, .map() does not really help with a) as it does not wait for any async stuff when iterating (which is why you create a greater and greater timeout with i*250).
I personally find it the easiest to keep things sequential by using a for of loop instead, as this will work nicely with async/await:
const fetchQueue = async (urls, delay = 0, retries = 0, maxRetries = 3) => {
  const wait = (timeout = 0) => {
    if (timeout) { console.log(`Waiting for ${timeout}`); }
    return new Promise(resolve => {
      setTimeout(resolve, timeout);
    });
  };

  for (const url of urls) {
    try {
      await wait(retries ? retries * Math.max(delay, 1000) : delay);

      let response = await fetch(url);
      let data = await (
        response.headers.get('content-type').includes('json')
          ? response.json()
          : response.text()
      );

      response = {
        headers: [...response.headers].reduce((acc, header) => {
          return { ...acc, [header[0]]: header[1] };
        }, {}),
        status: response.status,
        data: data,
      };

      // in reality, only do that for errors
      // that make sense to retry
      if ([404, 429].includes(response.status)) {
        throw new Error(`Status Code ${response.status}`);
      }

      console.log(response.data);
    } catch (err) {
      console.log('Error:', err.message);
      if (retries < maxRetries) {
        console.log(`Retry #${retries + 1} ${url}`);
        await fetchQueue([url], delay, retries + 1, maxRetries);
      } else {
        console.log(`Max retries reached for ${url}`);
      }
    }
  }
};

// populate some real URLs to fetch
// index 0 will generate an inexistent URL to test error behaviour
const urls = new Array(101).fill(null).map((x, i) => `https://jsonplaceholder.typicode.com/todos/${i}`);

// fetch urls one after another (sequentially)
// and delay each request by 250ms
fetchQueue(urls, 250);
If a request fails (e.g. you get one of the errors specified in the array with error status codes), the above function will retry them a maximum of 3 times (by default) with a backoff delay that increases by a second on each retry.
As you wrote, the delay between requests is probably not necessary, so you could just remove the 250 in the function call. Because each request is executed one after the other, you're less likely to run into rate limit issues but if you do, it's very easy to add some custom delay.
Here is an example that allows you to handle an array of promises sequentially, by setting a delay expressed in milliseconds and accepting a third callback that determines whether the request should be retried.
In the below code, some sample requests are mocked to:
Test a successful response.
Test an error response. If the error response contains an error code and the error code is 403, true is returned and the call is retried in the next run (delayed by x milliseconds).
Test an error response without an error code.
There is a global counter below that gives up on the promise after N tries (in the example below, 5); all of that is handled in this code:
const result = await resolveSequencially(promiseTests, 250, (err) => {
  return ++errorCount, !!(err && err.error && err.error.status === 403 && errorCount <= 5);
});
Here the error count is first increased, and true is returned if the error is defined, has an error property, and its status is 403.
Of course, the example is just to test things out, but I think you're looking for something allowing you to have a cleverer control over the promise loop cycle, hence here is a solution doing just that.
I will add some comments below, you can run the test below to check what happens directly in the console.
// Nothing that relevant, this one is just for testing purposes!
let errorCount = 0;

// Declare the function.
const resolveSequencially = (promises, delay, onFailed, onFinished) => {
  // store the results.
  const results = [];
  // Define a self-invoking recursiveHandle function.
  let recursiveHandle;
  (recursiveHandle = (current, max) => { // current is the index of the currently looped promise, max is the maximum needed.
    console.log('recursiveHandle invoked, current is', current, 'max is', max);
    if (current === max) onFinished(results); // <-- if all the promises have been looped, resolve.
    else {
      // Define a method to handle the promise.
      let handlePromise = () => {
        console.log('about to handle promise');
        const p = promises[current];
        p.then((success) => {
          console.log('success invoked!');
          results.push(success);
          // if it's successful, push the result and invoke the next element.
          recursiveHandle(current + 1, max);
        }).catch((err) => {
          console.log('An error was caught. Invoking callback to check whether I should retry! Error was:', err);
          // otherwise, invoke the onFailed callback.
          const retry = onFailed(err);
          // if retry is true, invoke again the recursive function with the same indexes.
          console.log('retry is', retry);
          if (retry) recursiveHandle(current, max);
          else recursiveHandle(current + 1, max); // <-- otherwise, proceed regularly.
        });
      };
      if (current !== 0) setTimeout(() => { handlePromise() }, delay); // <-- if it's not the first element, invoke the promise after the desired delay.
      else handlePromise(); // otherwise, invoke immediately.
    }
  })(0, promises.length); // Invoke the IIFE with an initial index 0, and a maximum index which is the length of the promise array.
}

const promiseTests = [
  Promise.resolve(true),
  Promise.reject({
    error: {
      status: 403
    }
  }),
  Promise.resolve(true),
  Promise.reject(null)
];

const test = () => {
  console.log('about to invoke resolveSequencially');
  resolveSequencially(promiseTests, 250, (err) => {
    return ++errorCount, !!(err && err.error && err.error.status === 403 && errorCount <= 5);
  }, (done) => {
    console.log('finished! results are:', done);
  });
};

test();

Node.js - Await for all the promises thrown inside a loop

I'm dealing with a loop in Node.js that performs two tasks on every iteration. To simplify, the code is summarized as:
Extract products metadata from a web page (blocking task).
Save all the products metadata to a database (asynchronous task).
The save operation (2) will perform about 800 operations in the database, and it doesn't need to block the main thread (I can keep extracting products metadata from the web pages).
So, that being said, awaiting for the products to be saved doesn't make any sense. But if I fire the promises without awaiting them, on the last iteration of the loop the Node.js process exits and all the pending operations are left unfinished.
What is the best approach to solve this? Is it possible to achieve it without having a counter for finished promises or emitters? Thanks.
for (let shop of shops) {
  // 1
  const products = await extractProductsMetadata(shop);

  // 2
  await saveProductsMetadata(products);
}
Collect the promises in an array, then use Promise.all on it:
const storePromises = [];

for (let shop of shops) {
  const products = await extractProductsMetadata(shop); // (1)
  storePromises.push(saveProductsMetadata(products)); // (2)
}

await Promise.all(storePromises);
// ... all done (3)
With that, (1) will run one after the other, (2) will run in parallel, and (3) will run afterwards.
For sure you can also run (1) and (2) in parallel:
await Promise.all(shops.map(async shop => {
  const products = await extractProductsMetadata(shop); // (1)
  await saveProductsMetadata(products);
}));
And if an error occurred in one of the promises, you can handle that with a try / catch block, to make sure all the other shops won't be affected:
await Promise.all(shops.map(async shop => {
  try {
    const products = await extractProductsMetadata(shop); // (1)
    await saveProductsMetadata(products);
  } catch (error) {
    // handle it here
  }
}));
how to signal node to finish the process ?
You could manually call process.exit(0);, but that hides the real problem: NodeJS exits automatically if there is no listener attached anymore. That means that you should close all database connections / servers / etc. after the code above is done.
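For instance, building on the storePromises snippet above and assuming your database driver exposes a promise-returning close() (as the MongoDB driver's client.close() does; the client name here is illustrative):
await Promise.all(storePromises);
// once the connection is closed there is nothing left for the event loop to wait on,
// so the process can exit on its own
await client.close();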
We create packs of data to treat. When we treat a pack, we do all the gets sequentially and all the saves asynchronously.
I have not handled the failure part; I'll let you add it. An appropriate try/catch or function encapsulation will do it.
/**
 * Call the given functions that return promises in a queue
 * options = context/args
 */
function promiseQueue(promisesFuncs, options = {}, _i = 0, _ret = []) {
  return new Promise((resolve, reject) => {
    if (_i >= promisesFuncs.length) {
      return resolve(_ret);
    }

    // Call one
    (promisesFuncs[_i]).apply(options.context || this, options.args || [])
      .then((ret) => promiseQueue(promisesFuncs, options, _i + 1, [
        ..._ret,
        ret,
      ]))
      .then(resolve)
      .catch(reject);
  });
}

async function executePromisesAsPacks(arr, packSize, _i = 0) {
  const toExecute = arr.slice(_i * packSize, (_i + 1) * packSize);

  // Leave if we did execute all packs
  if (toExecute.length === 0) return true;

  // First we get all the data sequentially
  const products = await promiseQueue(toExecute.map(x => () => extractProductsMetadata(x)));

  // Then save the products asynchronously
  // We do not put await here so it's truly asynchronous
  Promise.all(toExecute.map((x, xi) => saveProductsMetadata(products[xi])));

  // Call next
  return executePromisesAsPacks(arr, packSize, _i + 1);
}

// Makes packs of data to treat (we extract sequentially and save asynchronously)
// Made to handle huge datasets
await executePromisesAsPacks(shops, 50);

HTTP Request promises and concurrency

Imagine a request triggered with:
http.get('/endpoint')
Imagine 10 requests triggered with:
const promises = _.range(10).map(i => http.get('/endpoint'))
If I want to execute all the requests simultaneously, I generally use
const results = await Promise.all(promises)
Now, let's imagine that I want to only execute 2 requests at a time: will Promise.all() on only two items be enough to execute only 2 requests at a time, or will all requests still be triggered at the same time?
const promises = _.range(10).map(i => http.get('/endpoint'))

let results = []
for (let chunk of _.chunk(promises, 2)) {
  results = results.concat(await Promise.all(chunk))
}
If it still executes the 10 requests at the same time, how could I prevent this behavior?
Note: _ refers to the lodash library in order to make the question simpler
Use bluebird's Promise.map, which allows you to configure concurrency: the concurrency limit applies to the Promises returned by the mapper function, and it effectively limits how many of them exist (and therefore how many requests are in flight) at a time.
Promise.map(_.range(10), () => {...}, {concurrency: 2})
You can use mapLimit from async.
https://caolan.github.io/async/docs.html#mapLimit
import mapLimit from 'async/mapLimit';

const ids = _.range(10);
const maxParallel = 2;

mapLimit(ids, maxParallel, (id, callback) => {
  http.get(`/endpoint/${id}`)
    .then(result => callback(null, result))
    .catch(error => callback(error, null));
}, (error, results) => {
  // Some stuff
});
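To stay dependency-free (apart from lodash, which the question already uses), the key is to chunk the inputs rather than the promises and to create the http.get calls inside the loop, because a promise starts its work the moment it is created. A sketch, assuming http.get returns a promise as in the question:
const ids = _.range(10)
let results = []
for (const group of _.chunk(ids, 2)) {
  // the requests are created here, so only 2 are in flight per iteration
  results = results.concat(await Promise.all(group.map(() => http.get('/endpoint'))))
}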
