I'm implementing a query engine that mass fetches and processes requests. I am using async/await.
Right now the flow of execution runs in a hierarchy: there is a list of items containing queries, and each of those queries has a fetch.
What I am trying to do is bundle the items in groups of n, so that even if each of them has m queries with fetches inside, only n*m requests run simultaneously; and, especially, only one request at a time is made to the same domain.
The problem is that when I await the execution of the items (at the outer level, in a while loop that groups items and should pause iterating until the promises resolve), those promises resolve as soon as the execution of an inner query is deferred by the inner await of the fetch.
That causes my queuing while loop to pause only momentarily, instead of waiting for the inner promises to resolve too.
This is the outer, queuing class:
class AsyncItemQueue {
constructor(items, concurrency) {
this.items = items;
this.concurrency = concurrency;
}
run = async () => {
let itemPromises = [];
const bundles = Math.ceil(this.items.length / this.concurrency);
let currentBundle = 0;
while (currentBundle < bundles) {
console.log(`<--------- FETCHING ITEM BUNDLE ${currentBundle} OF ${bundles} --------->`);
const lowerRange = currentBundle * this.concurrency;
const upperRange = (currentBundle + 1) * this.concurrency;
itemPromises.push(
this.items.slice(lowerRange, upperRange).map(item => item.run())
);
await Promise.all(itemPromises);
currentBundle++;
}
};
}
export default AsyncItemQueue;
This is the simple item class that the queue is running. I'm omitting superfluous code.
class Item {
// ...
run = async () => {
console.log('Item RUN', this, this.name);
return await Promise.all(this.queries.map(query => {
const itemPromise = query.run(this.name);
return itemPromise;
}));
}
}
And these are the queries contained inside items. Every item has a list of queries. Again, some code is removed as it's not interesting.
class Query {
// ...
run = async (item) => {
// Step 1: If requisites, await.
if (this.requires) {
await this.savedData[this.requires];
}
// Step 2: Resolve URL.
this.resolveUrl(item);
// Step 3: If provides, create promise in savedData.
const fetchPromise = this.fetch();
if (this.saveData) {
this.saveData.forEach(sd => (this.savedData[sd] = fetchPromise));
}
// Step 4: Fetch.
const document = await fetchPromise;
// ...
}
}
The while in AsyncItemQueue is stopping correctly, but only until the execution flow reaches step 3 in Query. As soon as it reaches that fetch, which is a wrapper for the standard fetch functions, the outer promise resolves, and I end up with all the requests being performed at the same time.
I suspect the problem is somewhere in the Query class, but I am stumped as to how to avoid the resolution of the outer promise.
I tried making the Query class run function return document, just in case, but to no avail.
Any idea or guidance would be greatly appreciated. I'll try to answer any questions about the code or provide more if needed.
Thanks!
PS: Here is a codesandbox with a working example: https://codesandbox.io/s/goofy-tesla-iwzem
As you can see in the console output, the while loop is iterating before the fetches finish, and they are all being performed at the same time.
I've solved it.
The problem was in the AsyncItemQueue class. Specifically:
itemPromises.push(
this.items.slice(lowerRange, upperRange).map(item => item.run())
);
That was pushing a list of promises into the list, and so, later on:
await Promise.all(itemPromises);
Did not find any promises to wait for in that list (because it contained more lists, with promises inside).
The solution was to change the code to:
await Promise.all(this.items.slice(lowerRange, upperRange).map(item => item.run()));
Now it is working perfectly. Items are being run in batches of n, and a new batch will not run until the previous has finished.
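To make the difference concrete, here is a minimal sketch. The `delay` helper and the timings are made up for illustration and are not part of my project. Promise.all treats any entry that is not a promise, including an array of promises, as an already-resolved value, so it does not wait for whatever is inside it.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

async function demo() {
  const started = Date.now();

  // Buggy shape: an array that contains an array of promises.
  // Promise.all sees a single non-promise entry and fulfills right away.
  const nested = [];
  nested.push([delay(200, 'a'), delay(200, 'b')]);
  const nestedResult = await Promise.all(nested);
  const nestedElapsed = Date.now() - started;

  // Fixed shape: pass the promises themselves.
  const flatResult = await Promise.all([delay(200, 'c'), delay(200, 'd')]);
  const flatElapsed = Date.now() - started;

  return { nestedResult, nestedElapsed, flatResult, flatElapsed };
}

demo().then(({ nestedElapsed, flatElapsed }) => {
  console.log(`nested returned after ~${nestedElapsed} ms`);
  console.log(`flat returned after ~${flatElapsed} ms`);
});
```

In the nested case the awaited result is still an array of pending promises, which is exactly why my while loop kept going.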
I'm not sure this will help anyone but me, but I'll leave it here in case somebody finds a similar problem someday. Thanks for the help.
Related
I have a piece of code that executes a long list of http requests, and I've written the code in such a way that it always has 4 requests running parallel. It just so happens that the server can handle 4 parallel requests the fastest. With less the code would work slower and with more the requests would take longer to finish. Anyway, here is the code:
const itemsToRemove = items.filter(
// ...
)
const removeItem = (item: Item) => item && // first it checks if item isn't undefined
// Then it creates a DELETE request
itemsApi.remove(item).then(
// And then whenever a request finishes,
// it adds the next request to the queue.
// This ensures that there will always
// be 4 requests running parallel.
() => removeItem(itemsToRemove.shift())
)
// I start with a chunk of the first 4 items.
const firstChunk = itemsToRemove.splice(0, 4)
await Promise.allSettled(
firstChunk.map(removeItem)
)
Now the problem with this code is that if the list is very long (as in thousands of items), at some point the browser tab just crashes. That is a little unhelpful, because I don't get to see a specific error message that tells me what went wrong.
But my guess is that this part of the code:
itemsApi.remove(item).then(
() => removeItem(itemsToRemove.shift())
)
May be creating a Maximum call stack size exceeded issue? Because in a way I'm constantly adding to the call stack, aren't I?
Do you think my guess is correct? And regardless of if your answer is yes or no, do you have an idea how I could achieve the same goal without crashing the browser tab? Can I refactor this code in a way that doesn't add to the call stack? (If I'm indeed doing that?)
The issue with your code is in
await Promise.allSettled(firstChunk.map(removeItem))
The argument passed to Promise.allSettled needs to be an array of Promises as per the documentation:
The Promise.allSettled() method returns a promise that fulfills after all of the given promises have either fulfilled or rejected, with an array of objects that each describes the outcome of each promise.
Your recursive function then runs all of the requests one after the other throwing a Maximum call stack size exceeded error and crashing your browser.
The solution I came up with (it could probably be shortened) is like so:
let items = []
while (items.length < 20) {
items = [...items, `item-${items.length + 1}`]
}
// A mockup of the API function that executes an asynchronous task and returns once it is resolved
async function itemsApi(item) {
await new Promise((resolve) => setTimeout(() => {resolve(item)}, 1000))
}
async function f(items) {
const itemsToRemove = items
// call the itemsApi and resolve the promise after the itemsApi function finishes
const removeItem = (item) => item &&
new Promise((resolve, reject) =>
itemsApi(item)
.then((res) => resolve(res))
.catch(e => reject(e))
)
// Recursive function that removes a chunk of 4 items after the previous chunk has been removed
function removeChunk(chunk) {
// exit the function once there is nothing left to process
// (checking the chunk rather than itemsToRemove, so the final chunk is not skipped)
if (chunk.length === 0) return
console.log(itemsToRemove)
// after the first 4 requests finish, keep making a new chunk of 4 requests until the itemsToRemove array is empty
Promise.allSettled(chunk.map(removeItem))
.then(() => removeChunk(itemsToRemove.splice(0, 4)))
}
const firstChunk = itemsToRemove.splice(0, 4)
// initiate the recursive function
removeChunk(firstChunk)
}
f(items)
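If you would rather keep exactly 4 requests in flight at all times (instead of waiting for each chunk of 4 to finish), one alternative is a small worker pool. This is a different technique from the chunking above: it starts a fixed number of async workers, and each one pulls the next item in a plain while loop, so there is no recursion at all. The names `runPool` and `worker` are made up for this sketch, and it assumes the task is a function that returns a promise.

```javascript
// Runs `task(item)` for every item with at most `limit` tasks in flight.
// Resolves with allSettled-style results in the original item order.
async function runPool(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  async function worker() {
    // A plain loop: no recursion and no ever-growing promise chain.
    while (next < items.length) {
      const index = next++;
      try {
        const value = await task(items[index]);
        results[index] = { status: 'fulfilled', value };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With the code from the question, the call would look something like `await runPool(itemsToRemove, 4, item => itemsApi.remove(item))`.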
I hope this answers your question
this is the code from my route:
const hotspots = await Hotspot.find({user: req.user._id});
if(!hotspots) return res.render("hotspot");
let hotspotData = [];
let rewards = [];
function getApiData(){
return new Promise((resolve,reject)=>{
let error = false;
for (let hotspot of hotspots){
axios.get(`https://api.helium.io/v1/hotspots/${hotspot.hotspot}`)
.then((resp)=>hotspotData.push(resp.data));
axios.get(`https://api.helium.io/v1/hotspots/${hotspot.hotspot}/rewards/sum`)
.then((resp)=> {
console.log("first console log",resp.data.data.total)
rewards.push(resp.data.data.total) ;
console.log("Data gained from axios",resp.data.data.total);
})
}
if(!error) resolve();
else reject("Something went wrong");
})
}
getApiData().then(()=> {
console.log(hotspotData);
console.log(rewards);
res.render("hotspot",{hotspots: hotspotData, rewards: rewards});
});
Everything else is fine, but when I log hotspotData and rewards, I get empty arrays, even though, as you can see, I've already pushed the data I got from the APIs to those arrays. But still I'm getting empty arrays. Please help me. I have logged the API data and yes, it's sending the correct data, but I don't know why that data does not get pushed to the arrays.
Edit: This is the output which suggests my code is not getting executed in the way I want it to:
What happens here is that the code that sends back the response is executed before the axios responses arrive:
First, a call to a database (is that correct?) using an async/await approach.
Then you define a function that returns a Promise and call it, processing the response in the then() function.
That piece of code is executed when the Promise is resolve()d (watch out, there's no code to handle a rejected response).
Now, the promise is resolved as soon as the for loop ends; it doesn't wait for the axios promises to return, it just triggers the requests and continues. The code block after they return will run at a later moment.
If you want to solve this, one thing you could do is to use async/await everywhere, and make the calls sequentially, something like this:
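const hotspots = await Hotspot.find({user: req.user._id});
async function getApiData() {
  const hotspotData = [];
  const rewards = [];
  for (const hotspot of hotspots) {
    // each await pauses the loop until that response has arrived
    const hotspotResp = await axios.get(`https://api.helium.io/v1/hotspots/${hotspot.hotspot}`);
    const rewardResp = await axios.get(`https://api.helium.io/v1/hotspots/${hotspot.hotspot}/rewards/sum`);
    // add both to the arrays
    hotspotData.push(hotspotResp.data);
    rewards.push(rewardResp.data.data.total);
  }
  return { hotspotData, rewards };
}
const { hotspotData, rewards } = await getApiData();
console.log(hotspotData);
console.log(rewards);
res.render("hotspot",{hotspots: hotspotData, rewards: rewards});
Note: I haven't run the code above, so treat it as a sketch rather than a drop-in replacement, ok?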
I think this could be an interesting read: https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Async_await
Also, notice all the axios calls seem to be independent from each other, so perhaps you can run them concurrently and wait for all the promises to be resolved before continuing. Check this out for more info about it: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all
(make sure it works sequentially, first)
I think it should be something like an array of promises (one per hotspot and reward in the for loop), then returning something like:
return Promise.all([promise1, promise2, promise3]);
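A sketch of what that parallel version might look like, assuming the same two endpoints as in the question. `fetchAllHotspots` and its `get` parameter are names I made up; `get` stands in for axios.get so the shape can be tried without the network.

```javascript
// `get` is expected to behave like axios.get: it returns a promise for
// an object whose `data` property holds the parsed response body.
async function fetchAllHotspots(hotspots, get) {
  const base = 'https://api.helium.io/v1/hotspots';

  // Start every request up front; nothing is awaited inside map.
  const hotspotPromises = hotspots.map(h => get(`${base}/${h.hotspot}`));
  const rewardPromises = hotspots.map(h => get(`${base}/${h.hotspot}/rewards/sum`));

  // Wait for both groups. Results keep the order of `hotspots`.
  const [hotspotResps, rewardResps] = await Promise.all([
    Promise.all(hotspotPromises),
    Promise.all(rewardPromises),
  ]);

  return {
    hotspotData: hotspotResps.map(resp => resp.data),
    rewards: rewardResps.map(resp => resp.data.data.total),
  };
}
```

In the route this would be called as `const { hotspotData, rewards } = await fetchAllHotspots(hotspots, url => axios.get(url));`. Keep in mind that if any single request fails, Promise.all rejects, so the call should sit inside a try/catch.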
I'm making requests to an API, but their server only allows a certain number of active connections, so I would like to limit the number of ongoing fetches. For my purposes, a fetch is only completed (not ongoing) when the HTTP response body arrives at the client.
I would like to create an abstraction like this:
const fetchLimiter = new FetchLimiter(maxConnections);
fetchLimiter.fetch(url, options); // Returns the same thing as fetch()
This would make things a lot simpler, but there seems to be no way of knowing when a stream being used by other code ends, because streams are locked while they are being read. It is possible to use ReadableStream.tee() to split the stream into two, use one and return the other to the caller (possibly also constructing a Response with it), but this would degrade performance, right?
Since fetch uses promises, you can take advantage of that to make a simple queue system.
This is a method I've used before for queuing promise based stuff. It enqueues items by creating a Promise and then adding its resolver to an array. Of course until that Promise resolves, the await keeps any later promises from being invoked.
And all we have to do to start the next fetch when one finishes is just grab the next resolver and invoke it. The promise resolves, and then the fetch starts!
Best part, since we don't actually consume the fetch result, there's no worries about having to clone or anything...we just pass it on intact, so that you can consume it in a later then or something.
*Edit: since the body is still streaming after the fetch promise resolves, I added a third option so that you can pass in the body type, and have FetchLimiter retrieve and parse the body for you.
These all return a promise that is eventually resolved with the actual content.
That way you can just have FetchLimiter parse the body for you. I made it so it would return an array of [response, data], that way you can still check things like the response code, headers, etc.
For that matter, you could even pass in a callback or something to use in that await if you needed to do something more complex, like different methods of parsing the body depending on response code.
Example
I added comments to indicate where the FetchLimiter code begins and ends...the rest is just demo code.
It's using a fake fetch using a setTimeout, which will resolve between 0.5-1.5 secs. It will start the first three requests immediately, and then the actives will be full, and it will wait for one to resolve.
When that happens, you'll see the comment that the promise has resolved, then the next promise in the queue will start, and then you'll see the then in the for loop resolve. I added that then just so you could see the order of events.
(function() {
const fetch = (resource, init) => new Promise((resolve, reject) => {
console.log('starting ' + resource);
setTimeout(() => {
console.log(' - resolving ' + resource);
resolve(resource);
}, 500 + 1000 * Math.random());
});
function FetchLimiter() {
this.queue = [];
this.active = 0;
this.maxActive = 3;
this.fetchFn = fetch;
}
FetchLimiter.prototype.fetch = async function(resource, init, respType) {
// if at max active, enqueue the next request by adding a promise
// ahead of it, and putting the resolver in the "queue" array.
if (this.active >= this.maxActive) {
await new Promise(resolve => {
this.queue.push(resolve); // push, adds to end of array
});
}
this.active++; // increment active once we're about to start the fetch
const resp = await this.fetchFn(resource, init);
let data;
if (['arrayBuffer', 'blob', 'json', 'text', 'formData'].indexOf(respType) >= 0)
data = await resp[respType]();
this.active--; // decrement active once fetch is done
this.checkQueue(); // time to start the next fetch from queue
return [resp, data]; // return value from fetch
};
FetchLimiter.prototype.checkQueue = function() {
if (this.active < this.maxActive && this.queue.length) {
// shift, pulls from start of array. This gives first in, first out.
const next = this.queue.shift();
next('resolved'); // resolve promise, value doesn't matter
}
}
const limiter = new FetchLimiter();
for (let i = 0; i < 9; i++) {
limiter.fetch('/mypage/' + i)
.then(x => console.log(' - .then ' + x));
}
})();
Caveats:
I'm not 100% sure if the body is still streaming when the promise resolves...that seems to be a concern for you. However if that's a problem you could use one of the Body mixin methods like blob or text or json, which doesn't resolve until the body content is completely parsed (see here)
I intentionally kept it very short (like 15 lines of actual code) as a very simple proof of concept. You'd want to add some error handling in production code, so that if the fetch rejects because of a connection error or something that you still decrement the active counter and start the next fetch.
Of course it's also using async/await syntax, because it's so much easier to read. If you need to support older browsers, you'd want to rewrite with Promises or transpile with babel or equivalent.
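As a rough sketch of the error handling mentioned in the caveats: wrapping the fetch in try/finally releases the slot whether the request succeeds or fails. `SafeFetchLimiter` is a made-up name, and passing the fetch function into the constructor is my assumption (it makes it easy to swap in a fake). This variant also hands a freed slot directly to the next waiter rather than decrementing and re-incrementing the counter, so a call arriving in between cannot take the slot.

```javascript
class SafeFetchLimiter {
  constructor(fetchFn, maxActive = 3) {
    this.fetchFn = fetchFn;
    this.maxActive = maxActive;
    this.active = 0;
    this.queue = []; // resolvers of callers waiting for a free slot
  }

  async fetch(resource, init, respType) {
    if (this.active >= this.maxActive) {
      // Wait here; the request that finishes next hands its slot to us.
      await new Promise(resolve => this.queue.push(resolve));
    } else {
      this.active++;
    }
    try {
      const resp = await this.fetchFn(resource, init);
      let data;
      if (['arrayBuffer', 'blob', 'json', 'text', 'formData'].includes(respType)) {
        data = await resp[respType]();
      }
      return [resp, data];
    } finally {
      // Runs on success and on failure alike.
      const next = this.queue.shift();
      if (next) next();    // pass the slot on, the active count is unchanged
      else this.active--;  // nobody is waiting, so free the slot
    }
  }
}
```

A rejected fetch still rejects the promise returned to the caller, so each call site decides how to handle its own error; the limiter only guarantees that the queue keeps moving.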
I understand that this is a basic question, but I can't figure out by myself how to export my variable "X" (which is actually a JSON object) out of the "for" loop. I have tried various ways, but in my case the function returns not the JSON object itself but a "promise.pending".
I guess that someone more experienced with this can help me out. My code:
for (let i = 0; i < server.length; i++) {
const fetch = require("node-fetch");
const url = ''+(server[i].name)+'';
const getData = async url => {
try {
const response = await fetch(url);
return await response.json();
} catch (error) {
console.log(error);
}
};
getData(url).then(function(result) { //promise.pending w/o .then
let x = result; //here is real JSON that I want to export
});
}
console.log(x); // -element is not exported :(
Here's some cleaner ES6 code you may wish to try:
const fetch = require("node-fetch");
Promise.all(
server.map((srv) => {
const url = String(srv.name);
return fetch(url)
.then((response) => response.json())
.catch((err) => console.log(err));
})
)
.then((results) => {
console.log(results);
})
.catch((err) => {
console.log('total failure!');
console.log(err);
});
How does it work?
Using Array.map, it transforms the list of servers into a list of promises which are executed in parallel. Each promise does two things:
fetch the URL
extract JSON response
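If either step fails, the .catch attached to that request logs the error and that promise fulfills with undefined, so the remaining results still arrive. Without that per-request .catch, a single failure would cause the whole Promise.all to reject immediately.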
Why do I think this is better than the accepted answer? In a word, it's cleaner. It doesn't mix explicit promises with async/await, which can make asynchronous logic muddier than necessary. It doesn't import the fetch library on every loop iteration. It converts the server URL to a string explicitly, rather than relying on implicit coercion. It doesn't create unnecessary variables, and it avoids the needless for loop.
Whether you accept it or not, I offer it up as another view on the same problem, solved in what I think is a maximally elegant and clear way.
Why is this so hard? Why is async work so counterintuitive?
Doing async work requires being comfortable with something known as "continuation passing style." An asynchronous task is, by definition, non-blocking -- program execution does not wait for the task to complete before moving to the next statement. But we often do async work because subsequent statements require data that is not yet available. Thus, we have the callback function, then the Promise, and now async/await. The first two solve the problem with a mechanism that allows you to provide "packages" of work to do once an asynchronous task is complete -- "continuations," where execution will resume once some condition obtains. There is absolutely no difference between a boring node-style callback function and the .then of a Promise: both accept functions, and both will execute those functions at specific times and with specific data. The key job of the callback function is to act as a receptacle for data about the asynchronous task.
This pattern complicates not only basic variable scoping, which was your main concern, but also the issue of how best to express complicated workflows, which are often a mix of blocking and non-blocking statements. If doing async work requires providing lots of "continuations" in the form of functions, then we know that doing this work will be a constant battle against the proliferation of a million little functions, a million things needing names that must be unique and clear. This is a problem that cannot be solved with a library. It requires adapting one's style to the changed terrain.
The less your feet touch the ground, the better. :)
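To illustrate the point about continuations, here is the same made-up task consumed three ways. `loadValue` is a hypothetical stand-in for any async operation. In each case the code that needs the value is the continuation; only the syntax for supplying it differs.

```javascript
// Hypothetical async task in node-callback style: calls back with (err, value).
function loadValue(callback) {
  setTimeout(() => callback(null, 42), 10);
}

// 1. Callback: the continuation is the function we pass in.
function viaCallback(done) {
  loadValue((err, value) => (err ? done(err) : done(null, value + 1)));
}

// 2. Promise: the continuation is the function given to .then.
function loadValuePromise() {
  return new Promise((resolve, reject) =>
    loadValue((err, value) => (err ? reject(err) : resolve(value))));
}
function viaThen() {
  return loadValuePromise().then(value => value + 1);
}

// 3. async/await: the continuation is the rest of the function body.
async function viaAwait() {
  const value = await loadValuePromise();
  return value + 1;
}
```

All three hand back the loaded value plus one; what changes is where the "rest of the work" lives and how much naming and nesting it costs.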
JavaScript builds on the concept of promises. When you ask getData to do its work, what it says is: "OK, this is going to take some time, but I promise that I'll let you know after the work is done. So please have faith in my promise; I'll let you know once the work is complete", and it immediately gives you a promise.
That's what you see as promise.pending. It's pending because it is not completed yet. Now you should register a task (a function) with that promise for getData to call when it completes the work.
function doSomething(){
var promiseArray = [];
for (let i = 0; i < server.length; i++) {
const fetch = require("node-fetch");
const url = ''+(server[i].name)+'';
const getData = async url => {
try {
const response = await fetch(url);
return await response.json();
} catch (error) {
console.log(error);
}
};
promiseArray.push(getData(url)); // keeping track of all promises
}
return Promise.all(promiseArray); //see, I'm not registering anything to promise, I'm passing it to the consumer
}
function successCallback(result) {
console.log("It succeeded with " + result);
}
function failureCallback(error) {
console.log("It failed with " + error);
}
let promise = doSomething(); // do something is the function that does all the logic in that for loop and getData
promise.then(successCallback, failureCallback);
I would appreciate your help with the following case:
Given function:
async function getAllProductsData() {
try {
allProductsInfo = await getDataFromUri(cpCampaignsLink);
allProductsInfo = await getCpCampaignsIdsAndNamesData(allProductsInfo);
await Promise.all(allProductsInfo.map(async (item) => {
item.images = await getProductsOfCampaign(item.id);
}));
allProductsInfo = JSON.stringify(allProductsInfo);
console.log(allProductsInfo);
return allProductsInfo;
} catch(err) {
handleErr(err);
}
}
That function is fired when the server is started, and it gathers campaign information from another site: it gets the data (getDataFromUri()), then extracts the name and id from that data (getCpCampaignsIdsAndNamesData()), then gets the product images for each campaign (getProductsOfCampaign()).
I also have an Express app with the following piece of code:
app.get('/products', async (req, res) => {
if (allProductsInfo.length === undefined) {
console.log('Pending...');
allProductsInfo = await getAllProductsData();
}
res.status(200).send(allProductsInfo);
});
Problem description:
Launch server, wait few seconds until getAllProductsData() gets executed, do '/products' GET request, all works great!
Launch server and IMMEDIATELY fire '/products' GET request' (for that purpose I added IF with console.log('Pending...') expression), I get corrupted result back: it contains all campaigns names, ids, but NO images arrays.
[{"id":"1111","name":"Some name"},{"id":"2222","name":"Some other name"}],
while I was expecting
[{"id":"1111","name":"Some name","images":["URI","URI"...]}...]
I will highly appreciate your help about:
Explaining the flow of what is happening with async execution, and why the result is being sent without waiting for images arrays to be added to object?
If you know some useful articles/specs part covering my topic, I will be thankful for.
Thanks.
There's a huge issue with using a global variable allProductsInfo, and then firing multiple concurrent functions that use it asynchronously. This creates race conditions of all kinds, and you have to consider yourself lucky that the only symptom was the missing images data.
You can easily solve this by making allProductsInfo a local variable, or at least not use it to store the intermediate results from getDataFromUri and getCpCampaignsIdsAndNamesData - use different (local!) variables for those.
However, even if you do that, you're potentially firing getAllProductsData multiple times, which should not lead to errors but is still inefficient. It's much easier to store a promise in the global variable, initialise this once with a single call to the info gathering procedure, and just to await it every time - which won't be noticeable when it's already fulfilled.
async function getAllProductsData() {
const data = await getDataFromUri(cpCampaignsLink);
const allProductsInfo = await getCpCampaignsIdsAndNamesData(data);
await Promise.all(allProductsInfo.map(async (item) => {
item.images = await getProductsOfCampaign(item.id);
}));
console.log(allProductsInfo);
return JSON.stringify(allProductsInfo);
}
const productDataPromise = getAllProductsData();
productDataPromise.catch(handleErr);
app.get('/products', async (req, res) => {
res.status(200).send(await productDataPromise);
});
Of course you might also want to start your server (or add the /products route to it) only after the data is loaded, and simply serve status 500 until then. Also you should consider what happens when the route is hit after the promise is rejected - not sure what handleErr does.
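One possible way to deal with the rejected case, sketched with made-up names (`createCachedLoader`, `load`): cache the promise as above, but clear the cache when it rejects, so that the next request triggers a fresh attempt instead of replaying the old error forever.

```javascript
// Wraps an async `load` function so that concurrent and later callers
// share one promise, while a failed attempt is not cached.
function createCachedLoader(load) {
  let cached = null;
  return function get() {
    if (!cached) {
      cached = load().catch(err => {
        cached = null; // forget the failure so a later call retries
        throw err;     // still report it to everyone awaiting this attempt
      });
    }
    return cached;
  };
}
```

With the code above this might be used as `const getProductData = createCachedLoader(getAllProductsData);`, with the route awaiting `getProductData()` inside a try/catch and sending a 500 status when it throws. Whether retrying on the next request is appropriate depends on why the load failed, so treat this as a starting point.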