What's the best design approach for NodeJS promised & loops? [duplicate] - javascript

This question already has answers here:
Correct way to write loops for promise.
(13 answers)
Closed 4 years ago.
I have a PHP script which does a loop (several thousand times). Each loop requests a paginated URL which returns some results in JSON. Each result requires at least 2 further API hits to request more information. Once all this is complete, I'd build this into a packet for indexing/storing.
In PHP this is quite easy as it's naturally synchronous.
I'm in the process of moving this to a NodeJS setup and finding it difficult to elegantly do this in NodeJS.
If I make a for-loop, the initial HTTP request (which is often promise based) would complete almost instantly, leaving the resolution of the request to the promise handlers. This means all several-thousand page requests would end up being fired pretty much in parallel. It would also mean I'd have to chain up the sub-requests within the promise resolution (which leads to further promise chains as I have to wait for those to be resolves).
I've tried using the async/await approach, but they dont seem to play nicely with for-loops (or forEach, at least).
This is roughly what I'm working with right now, but I'm starting to think its entirely wrong and I could be using callbacks to trigger the next loop?!
async processResult(result) {
const packet = {
id: result.id,
title: result.title,
};
const subResponse1 = await getSubThing1();
packet.thing1 = subResponse1.body.thing1;
const subResponse2 = await getSubThing2();
packet.thing2 = subResponse2.body.thing2;
}
(async () => {
for (let page = START_PAGE; page < MAX_PAGES; page += 1) {
console.log(`Processing page ${page}...`);
getList(page)
.then(response => Promise.all(response.body.list.map(processResult)))
.then(data => console.log('DATA DONE', data));
console.log(`Page ${page} done!`);
}
})();

I've tried using the async/await approach, but they dont seem to play nicely with for-loops (or forEach, at least).
Yes, async/await does not play nicely with forEach. Do not use forEach! But putting an await in your normal for loop will work just fine:
(async () => {
for (let page = START_PAGE; page < MAX_PAGES; page += 1) {
console.log(`Processing page ${page}...`);
const response = await getList(page);
const data = await Promise.all(response.body.list.map(processResult));
console.log('DATA DONE', data);
console.log(`Page ${page} done!`);
}
})();

Related

How to know when a fetch ends without locking the body stream?

I'm making requests to an API, but their server only allows a certain number of active connections, so I would like to limit the number of ongoing fetches. For my purposes, a fetch is only completed (not ongoing) when the HTTP response body arrives at the client.
I would like to create an abstraction like this:
const fetchLimiter = new FetchLimiter(maxConnections);
fetchLimiter.fetch(url, options); // Returns the same thing as fetch()
This would make things a lot simpler, but there seems to be no way of knowing when a stream being used by other code ends, because streams are locked while they are being read. It is possible to use ReadableStream.tee() to split the stream into two, use one and return the other to the caller (possibly also constructing a Response with it), but this would degrade performance, right?
Since fetch uses promises, you can take advantage of that to make a simple queue system.
This is a method I've used before for queuing promise based stuff. It enqueues items by creating a Promise and then adding its resolver to an array. Of course until that Promise resolves, the await keeps any later promises from being invoked.
And all we have to do to start the next fetch when one finishes is just grab the next resolver and invoke it. The promise resolves, and then the fetch starts!
Best part, since we don't actually consume the fetch result, there's no worries about having to clone or anything...we just pass it on intact, so that you can consume it in a later then or something.
*Edit: since the body is still streaming after the fetch promise resolves, I added a third option so that you can pass in the body type, and have FetchLimiter retrieve and parse the body for you.
These all return a promise that is eventually resolved with the actual content.
That way you can just have FetchLimiter parse the body for you. I made it so it would return an array of [response, data], that way you can still check things like the response code, headers, etc.
For that matter, you could even pass in a callback or something to use in that await if you needed to do something more complex, like different methods of parsing the body depending on response code.
Example
I added comments to indicate where the FetchLimiter code begins and ends...the rest is just demo code.
It's using a fake fetch using a setTimeout, which will resolve between 0.5-1.5 secs. It will start the first three requests immediately, and then the actives will be full, and it will wait for one to resolve.
When that happens, you'll see the comment that the promise has resolved, then the next promise in the queue will start, and then you'll see the then from in the for loop resolve. I added that then just so you could see the order of events.
(function() {
const fetch = (resource, init) => new Promise((resolve, reject) => {
console.log('starting ' + resource);
setTimeout(() => {
console.log(' - resolving ' + resource);
resolve(resource);
}, 500 + 1000 * Math.random());
});
function FetchLimiter() {
this.queue = [];
this.active = 0;
this.maxActive = 3;
this.fetchFn = fetch;
}
FetchLimiter.prototype.fetch = async function(resource, init, respType) {
// if at max active, enqueue the next request by adding a promise
// ahead of it, and putting the resolver in the "queue" array.
if (this.active >= this.maxActive) {
await new Promise(resolve => {
this.queue.push(resolve); // push, adds to end of array
});
}
this.active++; // increment active once we're about to start the fetch
const resp = await this.fetchFn(resource, init);
let data;
if (['arrayBuffer', 'blob', 'json', 'text', 'formData'].indexOf(respType) >= 0)
data = await resp[respType]();
this.active--; // decrement active once fetch is done
this.checkQueue(); // time to start the next fetch from queue
return [resp, data]; // return value from fetch
};
FetchLimiter.prototype.checkQueue = function() {
if (this.active < this.maxActive && this.queue.length) {
// shfit, pulls from start of array. This gives first in, first out.
const next = this.queue.shift();
next('resolved'); // resolve promise, value doesn't matter
}
}
const limiter = new FetchLimiter();
for (let i = 0; i < 9; i++) {
limiter.fetch('/mypage/' + i)
.then(x => console.log(' - .then ' + x));
}
})();
Caveats:
I'm not 100% sure if the body is still streaming when the promise resolves...that seems to be a concern for you. However if that's a problem you could use one of the Body mixin methods like blob or text or json, which doesn't resolve until the body content is completely parsed (see here)
I intentionally kept it very short (like 15 lines of actual code) as a very simple proof of concept. You'd want to add some error handling in production code, so that if the fetch rejects because of a connection error or something that you still decrement the active counter and start the next fetch.
Of course it's also using async/await syntax, because it's so much easier to read. If you need to support older browsers, you'd want to rewrite with Promises or transpile with babel or equivalent.

Waiting for loops to finish by using await on the loop

Synchronicity in js loops is still driving me up the wall.
What I want to do is fairly simple
async doAllTheThings(data, array) {
await array.forEach(entry => {
let val = //some algorithm using entry keys
let subVal = someFunc(/*more entry keys*/)
data[entry.Namekey] = `${val}/${subVal}`;
});
return data; //after data is modified
}
But I can't tell if that's actually safe or not. I simply don't like the simple loop pattern
for (i=0; i<arrayLength; i++) {
//do things
if (i === arrayLength-1) {
return
}
}
I wanted a better way to do it, but I can't tell if what I'm trying is working safely or not, or I simply haven't hit a data pattern that will trigger the race condition.
Or perhaps I'm overthinking it. The algorithm in the array consists solely of some MATH and assignment statements...and a small function call that itself also consists solely of more MATH and assignment statements. Those are supposedly fully synchronous across the board. But loops are weird sometimes.
The Question
Can you use await in that manner, outside the loop itself, to trigger the code to wait for the loop to complete? Or is the only safe way to accomplish this the older manner of simply checking where you are in the loop, and not returning until you hit the end, manually.
One of the best ways to handle async and loops is to put then on a promise and wait for Promise.all remember that await returns a Promise so you can do:
async function doAllTheThings(array) {
const promises = []
array.forEach((entry, index) => {
promises.push(new Promise((resolve) => {
setTimeout(() => resolve(entry + 1), 200 )
}))
});
return Promise.all(promises)
}
async function main () {
const arrayPlus1 = await doAllTheThings([1,2,3,4,5])
console.log(arrayPlus1.join(', '))
}
main().then(() => {
console.log('Done the async')
}).catch((err) => console.log(err))
Another option is to use generators but they are a little bit more complex so if you can just save your promises and wait for then that is an easier approach.
About the question at the end:
Can you use await in that manner, outside the loop itself, to trigger the code to wait for the loop to complete? Or is the only safe way to accomplish this the older manner of simply checking where you are in the loop, and not returning until you hit the end, manually.
All javascript loops are synchronous so the next line will wait for the loop to execute.
If you need to do some async code in loop a good approach is the promise approach above.
Another approach for async loops specially if you have to "pause" or get info from outside the loop is the iterator/generator approach.

Difference between Serial and parallel

I am fetching some persons from an API and then executing them in parallel, how would the code look if i did it in serial?
Not even sure, if below is in parallel, i'm having a hard time figuring out the difference between the two of them.
I guess serial is one by one, and parallel (promise.all) is waiting for all promises to be resolved before it puts the value in finalResult?
Is this understood correct?
Below is a snippet of my code.
Thanks in advance.
const fetch = require('node-fetch')
const URL = "https://swapi.co/api/people/";
async function fetchPerson(url){
const result = await fetch(url);
const data = await result.json().then((data)=>data);
return data ;
}
async function printNames() {
const person1 = await fetchPerson(URL+1);
const person2 = await fetchPerson(URL+2);
let finalResult =await Promise.all([person1.name, person2.name]);
console.log(finalResult);
}
printNames().catch((e)=>{
console.log('There was an error :', e)
});
Let me translate this code into a few different versions. First, the original version, but with extra work removed:
const fetch = require('node-fetch')
const URL = "https://swapi.co/api/people/";
async function fetchPerson(url){
const result = await fetch(url);
return await result.json();
}
async function printNames() {
const person1 = await fetchPerson(URL+1);
const person2 = await fetchPerson(URL+2);
console.log([person1.name, person2.name]);
}
try {
await printNames();
} catch(error) {
console.error(error);
}
The code above is equivalent to the original code you posted. Now to get a better understanding of what's going on here, let's translate this to the exact same code pre-async/await.
const fetch = require('node-fetch')
const URL = "https://swapi.co/api/people/";
function fetchPerson(url){
return fetch(url).then((result) => {
return result.json();
});
}
function printNames() {
let results = [];
return fetchPerson(URL+1).then((person1) => {
results.push(person1.name);
return fetchPerson(URL+2);
}).then((person2) => {
results.push(person2.name);
console.log(results);
});
}
printNames().catch((error) => {
console.error(error);
});
The code above is equivalent to the original code you posted, I just did the extra work the JS translator will do. I feel like this makes it a little more clear. Following the code above, we will do the following:
Call printNames()
printNames() will request person1 and wait for the response.
printNames() will request person2 and wait for the response.
printNames() will print the results
As you can imagine, this can be improved. Let's request both persons at the same time. We can do that with the following code
const fetch = require('node-fetch')
const URL = "https://swapi.co/api/people/";
function fetchPerson(url){
return fetch(url).then((result) => {
return result.json();
});
}
function printNames() {
return Promise.all([fetchPerson(URL+1).then((person1) => person1.name), fetchPerson(URL+2).then((person2) => person2.name)]).then((results) => {
console.log(results);
});
}
printNames().catch((error) => {
console.error(error);
});
This code is NOT equivalent to the original code posted. Instead of performing everything serially, we are now fetching the different users in parallel. Now our code does the following
Call printNames()
printNames() will
Send a request for person1
Send a request for person2
printNames() will wait for a response from both of the requests
printNames() will print the results
The moral of the story is that async/await is not a replacement for Promises in all situations, it is syntactic sugar to make a very specific way of handling Promises easier. If you want/can perform tasks in parallel, don't use async/await on each individual Promise. You can instead do something like the following:
const fetch = require('node-fetch')
const URL = "https://swapi.co/api/people/";
async function fetchPerson(url){
const result = await fetch(url);
return result.json();
}
async function printNames() {
const [ person1, person2 ] = await Promise.all([
fetchPerson(URL+1),
fetchPerson(URL+2),
]);
console.log([person1.name, person2.name]);
}
printNames().catch((error) => {
console.error(error);
});
Disclaimer
printNames() (and everything else) doesn't really wait for anything. It will continue executing any and all code that comes after the I/O that does not appear in a callback for the I/O. await simply creates a callback of the remaining code that is called when the I/O has finished. For example, the following code snippet will not do what you expect.
const results = Promise.all([promise1, promise2]);
console.log(results); // Gets executed immediately, so the Promise returned by Promise.all() is printed, not the results.
Serial vs. Parallel
In regards to my discussion with the OP in the comments, I wanted to also add a description of serial vs. parallel work. I'm not sure how familiar you are with the different concepts, so I'll give a pretty abstraction description of them.
First, I find it prudent to say that JS does not support parallel operations within the same JS environment. Specifically, unlike other languages where I can spin up a thread to perform any work in parallel, JS can only (appear to) perform work in parallel if something else is doing the work (I/O).
That being said, let's start off with a simple description of what serial vs. parallel looks like. Imagine, if you will, doing homework for 4 different classes. The time it takes to do each class's homework can be seen in the table below.
Class | Time
1 | 5
2 | 10
3 | 15
4 | 2
Naturally, the work you do will happen serially and look something like
You_Instance1: doClass1Homework() -> doClass2Homework() -> doClass3Homework() -> doClass4Homework()
Doing the homework serially would take 32 units of time. However, wouldn't be great if you could split yourself into 4 different instances of yourself? If that were the case, you could have an instance of yourself for each of your classes. This might look something like
You_Instance1: doClass1Homework()
You_Instance2: doClass2Homework()
You_Instance3: doClass3Homework()
You_Instance4: doClass4Homework()
Working in parallel, you can now finish your homework in 15 units of time! That's less than half the time.
"But wait," you say, "there has to be some disadvantage to splitting myself into multiple instances to do my homework or everybody would be doing it."
You are correct. There is some overhead to splitting yourself into multiple instances. Let's say that splitting yourself requires deep meditation and an out of body experience, which takes 5 units of time. Now finishing your homework would look something like:
You_Instance1: split() -> split() -> doClass1Homework()
You_Instance2: split() -> doClass2Homework()
You_Instance3: doClass3Homework()
You_Instance4: doClass4Homework()
Now instead of taking 15 units of time, completing your homework takes 25 units of time. This is still cheaper than doing all of your homework by yourself.
Summary (Skip here if you understand serial vs. parallel execution)
This may be a silly example, but this is exactly what serial vs. parallel execution looks like. The main advantage of parallel execution is that you can perform several long running tasks at the same time. Since multiple workers are doing something at the same time, the work gets done faster.
However, there are disadvantages. Two of the big ones are overhead and complexity. Executing code in parallel isn't free, no matter what environment/language you use. In JS, parallel execution can be quite expensive because this is achieved by sending a request to a server. Depending on various factors, the round trip can take 10s to 100s of milliseconds. That is extremely slow for modern computers. This is why parallel execution is usually reserved for long running processes (completing homework) or when it just cannot be avoided (loading data from disk or a server).
The other main disadvantage is the added complexity. Coordinating multiple tasks occurring in parallel can be difficult (Dining Philosophers, Consumer-Producer Problem, Starvation, Race Conditions). But in JS the complexity also comes from understanding the code (Callback Hell, understanding what gets executed when). As mentioned above, a set of instructions occurring after asynchronous code does not wait to execute until the asynchronous code completes (unless it occurs in a callback).
How could I get multiple instances of myself to do my homework in JS?
There are a couple of different ways to accomplish this. One way you could do this is by setting up 4 different servers. Let's call them class1Server, class2Server, class3Server, and class4Server. Now to make these servers do your homework in parallel, you would do something like this:
Promise.all([
startServer1(),
startServer2(),
startServer3(),
startServer4()
]).then(() => {
console.log("Homework done!");
}).catch(() => {
console.error("One of me had trouble completing my homework :(");
});
Promise.all() returns a Promise that either resolves when all of the Promises are resolved or rejects when one of them is rejected.
Both functions fetchPerson and printNames run in serial as you are awaiting the results. The Promise.all use is pointless in your case, since the both persons already have been awaited (resolved).
To fetch two persons in parallel:
const [p1, p2] = await Promise.all([fetchPerson(URL + '1'), fetchPerson(URL + '2')])
Given two async functions:
const foo = async () => {...}
const bar = async () => {...}
This is serial:
const x = await foo()
const y = await bar()
This is parallel:
const [x,y] = await Promise.all([foo(), bar()])

Move object/variable away from async function

I understand that this is a basic question, but I can't figure it out myself, how to export my variable "X" (which is actually a JSON object) out of "for" cycle. I have tried a various ways, but in my case function return not the JSON.object itself, but a "promise.pending".
I guess that someone more expirienced with this will help me out. My code:
for (let i = 0; i < server.length; i++) {
const fetch = require("node-fetch");
const url = ''+(server[i].name)+'';
const getData = async url => {
try {
const response = await fetch(url);
return await response.json();
} catch (error) {
console.log(error);
}
};
getData(url).then(function(result) { //promise.pending w/o .then
let x = result; //here is real JSON that I want to export
});
}
console.log(x); // -element is not exported :(
Here's some cleaner ES6 code you may wish to try:
const fetch = require("node-fetch");
Promise.all(
server.map((srv) => {
const url = String(srv.name);
return fetch(url)
.then((response) => response.json())
.catch((err) => console.log(err));
})
)
.then((results) => {
console.log(results);
})
.catch((err) => {
console.log('total failure!');
console.log(err);
});
How does it work?
Using Array.map, it transforms the list of servers into a list of promises which are executed in parallel. Each promise does two things:
fetch the URL
extract JSON response
If either step fails, that one promise rejects, which will then cause the whole series to reject immediately.
Why do I think this is better than the accepted answer? In a word, it's cleaner. It doesn't mix explicit promises with async/await, which can make asynchronous logic muddier than necessary. It doesn't import the fetch library on every loop iteration. It converts the server URL to a string explicitly, rather than relying on implicit coercion. It doesn't create unnecessary variables, and it avoids the needless for loop.
Whether you accept it or not, I offer it up as another view on the same problem, solved in what I think is a maximally elegant and clear way.
Why is this so hard? Why is async work so counterintuitive?
Doing async work requires being comfortable with something known as "continuation passing style." An asynchronous task is, by definition, non-blocking -- program execution does not wait for the task to complete before moving to the next statement. But we often do async work because subsequent statements require data that is not yet available. Thus, we have the callback function, then the Promise, and now async/await. The first two solve the problem with a mechanism that allows you to provide "packages" of work to do once an asynchronous task is complete -- "continuations," where execution will resume once some condition obtains. There is absolutely no difference between a boring node-style callback function and the .then of a Promise: both accept functions, and both will execute those functions at specific times and with specific data. The key job of the callback function is to act as a receptacle for data about the asynchronous task.
This pattern complicates not only basic variable scoping, which was your main concern, but also the issue of how best to express complicated workflows, which are often a mix of blocking and non-blocking statements. If doing async work requires providing lots of "continuations" in the form of functions, then we know that doing this work will be a constant battle against the proliferation of a million little functions, a million things needing names that must be unique and clear. This is a problem that cannot be solved with a library. It requires adapting one's style to the changed terrain.
The less your feet touch the ground, the better. :)
Javascript builds on the concept of promises. When you ask getData to to do its work, what is says is that, "OK, this is going to take some time, but I promise that I'll let you know after the work is done. So please have faith on my promise, I'll let you know once the work is complete", and it immediately gives you a promise to you.
That's what you see as promise.pending. It's pending because it is not completed yet. Now you should register a certain task (or function) with that promise for getData to call when he completes the work.
function doSomething(){
var promiseArray = [];
for (let i = 0; i < server.length; i++) {
const fetch = require("node-fetch");
const url = ''+(server[i].name)+'';
const getData = async url => {
try {
const response = await fetch(url);
return await response.json();
} catch (error) {
console.log(error);
}
};
promiseArray.push(getData(url)); // keeping track of all promises
}
return Promise.all(promiseArray); //see, I'm not registering anything to promise, I'm passing it to the consumer
}
function successCallback(result) {
console.log("It succeeded with " + result);
}
function failureCallback(error) {
console.log("It failed with " + error);
}
let promise = doSomething(); // do something is the function that does all the logic in that for loop and getData
promise.then(successCallback, failureCallback);

Need help related async functions execution flow (await, promises, node.js)

I will appreciate if you help me with the following case:
Given function:
async function getAllProductsData() {
try {
allProductsInfo = await getDataFromUri(cpCampaignsLink);
allProductsInfo = await getCpCampaignsIdsAndNamesData(allProductsInfo);
await Promise.all(allProductsInfo.map(async (item) => {
item.images = await getProductsOfCampaign(item.id);
}));
allProductsInfo = JSON.stringify(allProductsInfo);
console.log(allProductsInfo);
return allProductsInfo;
} catch(err) {
handleErr(err);
}
}
That function is fired when server is started and it gathers campaigns information from other site: gets data(getDataFromUri()), then extracts from data name and id(getCpCampaignsIdsAndNamesData()), then gets products images for each campaign (getProductsOfCampaign());
I have also express.app with following piece of code:
app.get('/products', async (req, res) => {
if (allProductsInfo.length === undefined) {
console.log('Pending...');
allProductsInfo = await getAllProductsData();
}
res.status(200).send(allProductsInfo);
});
Problem description:
Launch server, wait few seconds until getAllProductsData() gets executed, do '/products' GET request, all works great!
Launch server and IMMEDIATELY fire '/products' GET request' (for that purpose I added IF with console.log('Pending...') expression), I get corrupted result back: it contains all campaigns names, ids, but NO images arrays.
[{"id":"1111","name":"Some name"},{"id":"2222","name":"Some other name"}],
while I was expecting
[{"id":"1111","name":"Some name","images":["URI","URI"...]}...]
I will highly appreciate your help about:
Explaining the flow of what is happening with async execution, and why the result is being sent without waiting for images arrays to be added to object?
If you know some useful articles/specs part covering my topic, I will be thankful for.
Thanks.
There's a huge issue with using a global variable allProductsInfo, and then firing multiple concurrent functions that use it asynchronously. This creates race conditions of all kinds, and you have to consider yourself lucky that you got only not images data.
You can easily solve this by making allProductsInfo a local variable, or at least not use it to store the intermediate results from getDataFromUri and getCpCampaignsIdsAndNamesData - use different (local!) variables for those.
However, even if you do that, you're potentially firing getAllProductsData multiple times, which should not lead to errors but is still inefficient. It's much easier to store a promise in the global variable, initialise this once with a single call to the info gathering procedure, and just to await it every time - which won't be noticeable when it's already fulfilled.
async function getAllProductsData() {
const data = await getDataFromUri(cpCampaignsLink);
const allProductsInfo = await getCpCampaignsIdsAndNamesData(allProductsInfo);
await Promise.all(allProductsInfo.map(async (item) => {
item.images = await getProductsOfCampaign(item.id);
}));
console.log(allProductsInfo);
return JSON.stringify(allProductsInfo);
}
const productDataPromise = getAllProductsData();
productDataPromise.catch(handleErr);
app.get('/products', async (req, res) => {
res.status(200).send(await productDataPromise);
});
Of course you might also want to start your server (or add the /products route to it) only after the data is loaded, and simply serve status 500 until then. Also you should consider what happens when the route is hit after the promise is rejected - not sure what handleErr does.

Categories

Resources