Node fetch loop too slow - javascript

I have an API route (a JS file) that I call with a POST method, passing as the body an array of about 26 objects, each containing a site URL. With the code below I loop through this array (sites), check whether each object's URL returns JSON by appending "/items.json" to it, and if so push the JSON content into another, final array, siteLists, which I send back as the response.
The problem is that for just 26 URLs this API call takes more than 5 seconds to complete. Am I doing it the wrong way, or is it just the way fetch works in Node.js?
The content of sites looks like:
[{label: "JonLabel", name: "Jon", url: "jonurl.com"},{...},{...}]
The code is:
export default async (req, res) => {
  if (req.method === 'POST') {
    const body = JSON.parse(req.body)
    const sites = body.list // this content shown above
    var siteLists = []
    if (sites?.length > 0){
      var b=0, idd=0
      while (b < sites.length){
        let url = sites?.[b]?.url
        if (url){
          let jurl = `${url}/items.json`
          try {
            let fUrl = await fetch(jurl)
            let siteData = await fUrl.json()
            if (siteData){
              let items = []
              let label = sites?.[b]?.label || ""
              let name = sites?.[b]?.name || ""
              let base = siteData?.items
              if (base){
                var c = 0
                while (c < base.length){
                  let img = base[c].images[0].url
                  let titl = base[c].title
                  let obj = {
                    url: url,
                    img: img,
                    title: titl
                  }
                  items.push(obj)
                  c++
                }
                let object = {
                  id: idd,
                  name: name,
                  label: label,
                  items: items
                }
                siteLists.push(object)
                idd++
              }
            }
          } catch (err) {
            //console.log(err)
          }
        }
        b++
      }
      res.send({ sites: siteLists })
    }
    res.end()
  }
}
EDIT (solution?):
So the promise-based code suggested below and marked as the solution does work, in the sense that it is faster. The funny thing, though, is that it still takes more than 5 seconds to load and still throws a "Failed to load resource: the server responded with a status of 504 (Gateway Time-out)" error, since Vercel, where the app is hosted, caps serverless functions at a 5-second timeout, so the content never makes it into the response. Locally, where I have no timeout limits, it loads visibly faster, but it surprises me that such a query takes so long to complete when it should be a matter of milliseconds.

The biggest problem I see here is that you are awaiting each fetch before you loop around to start the next one, effectively running them serially. If you rewrote your script to run them all in parallel, you could push each request's promise into an array, pass that array to Promise.all, and then process the results when they return.
Think of it like this: if each request takes a second to complete and you have 26 requests, waiting for one to complete before starting the next will take 26 seconds altogether. However, if you run them all together and each still takes only one second, the whole thing will take just one second.
An example in pseudocode:
You want to change this:
const urls = ['url1', 'url2', 'url3'];
for (let url of urls) {
  const result = await fetch(url);
  process(result)
}
...into this:
const urls = ['url1', 'url2', 'url3'];
const requests = [];
for (let url of urls) {
  requests.push(fetch(url));
}

Promise.all(requests)
  .then(
    (results) => results.forEach(
      (result) => process(result)
    )
  );

While await is great syntactic sugar, sometimes it's better to stick with then:
export default async (req, res) => {
  if (req.method === 'POST') {
    const body = JSON.parse(req.body)
    const sites = body.list // this content shown above
    const siteListsPromises = []
    if (sites?.length > 0){
      var b = 0
      while (b < sites.length){
        let url = sites?.[b]?.url
        if (url) {
          let jurl = `${url}/items.json`
          // #1
          const promise = fetch(jurl)
            // #2
            .then(async (fUrl) => {
              let siteData = await fUrl.json()
              if (siteData){
                ...
                return {
                  // #3
                  id: -1,
                  name: name,
                  label: label,
                  items: items
                }
              }
            })
            // #4
            .catch(err => {
              // console.log(err)
            })
          siteListsPromises.push(promise)
        }
        b++
      }
    }
    // #5
    const siteLists = (await Promise.all(siteListsPromises))
      // #6
      .filter(el => el !== undefined)
      // #7
      .map((el, i) => ({ id: i, ...el }))
    res.send({ sites: siteLists })
  }
  res.end()
}
Look for the // #N comments in the snippet.
#1: Don't await each request's completion. Instead, iterate over sites and send all the requests at once.
#2: Chain the json() call and the siteData processing after the fetch with then. The more computationally heavy your processing of siteData is, the more sense this makes, instead of performing all of it only after every promise has resolved. The ... stands for the same item-building work as in the question; see the sketch after this list.
#3: If you (or someone on your team) have trouble with closures, don't bother setting the id of the siteData elements inside the loop. I won't dive into this here, but it is addressed further down.
#4: Use .catch() instead of try {} catch {}, because without await the try/catch won't catch a rejected promise.
#5: await the results of all requests with Promise.all().
#6: Filter out the entries where siteData was falsy.
#7: Finally, set the id field.
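For reference, one way to flesh out the ... at // #2 is to move the whole per-site step into its own async function. This is a sketch only, adapted from the question's item-building loop; fetchSiteEntry is a hypothetical helper, not part of the answer, and it takes the whole site object so the callback never reads the shared loop index b after it has moved on:

// Hypothetical helper (not in the answer): fetches one site and returns the
// entry that the // #2 .then callback would produce, or undefined on bad data.
const fetchSiteEntry = async (site) => {
  const response = await fetch(`${site.url}/items.json`)
  const siteData = await response.json()
  if (!siteData?.items) return undefined              // dropped by the filter at // #6
  const items = siteData.items.map(item => ({
    url: site.url,
    img: item.images?.[0]?.url,                       // first image, as in the question
    title: item.title
  }))
  return { id: -1, name: site.name || "", label: site.label || "", items }  // real id set at // #7
}

// Usage in place of // #1 and // #2:
// const promise = fetchSiteEntry(sites[b]).catch(() => undefined)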

Related

Why does this recursive function not run asynchronously?

I have a start(node, array) function that should perform a DFS by traversing an object tree via recursive calls to an API through callMsGraph(token, end) until image properties are found at the end of the tree, at which point they are pushed to array. The function seems like it works, but I can't get the output unless I wrap it in a 2-second setTimeout, which indicates the recursion is not being waited on to complete. I would want to play around with async/await more, but it's not at the top level.
I'm not sure if the nextNode.then is doing anything, or maybe callMsGraph() needs to be awaited differently from how I know. A solution would be much appreciated.
shelfdb.data = async (accessToken) => {
  const token = accessToken;
  const endpoint = 'https://graph.microsoft.com/v1.0/sites/webgroup.sharepoint.com,23e7ef7a-a529-4dde-81ba-67afb4f44401,0fa8e0f7-1c76-4ad0-9b6e-a485f9bfd63c/drive/items/01GNYB5KPQ57RHLPZCJFE2QMVKT5U3NYY3/children'

  function start(node, array) {
    if (node.value.length > 0) {
      node.value.forEach(function(child) {
        var end = 'https://graph.microsoft.com/v1.0/sites/webgroup.sharepoint.com,23e7ef7a-a529-4dde-81ba-67afb4f44401,0fa8e0f7-1c76-4ad0-9b6e-a485f9bfd63c/drive/items/' + child.id + '/children';
        var nextNode = callMsGraph(token, end);
        nextNode.then(function(currResult) {
          if (currResult.value.length > 0) {
            if ('image' in currResult.value[0]) {
              currResult.value.forEach(function(imgChild) {
                let img = {
                  'name': imgChild.name,
                  'job': imgChild.parentReference.path.split("/")[6],
                  'path': imgChild.webUrl,
                  'id': imgChild.id
                }
                array.push(img);
              })
              // complete storing images at tail object, go one level up after loop
              return;
            }
            // if no 'image' or value, go into child
            start(currResult, array);
          }
        }).catch(function(e) {
          console.error(e.message);
        })
      })
    }
    return array;
  }

  var res = await callMsGraph(token, endpoint); // start recursion
  var output = start(res, []);
  console.log(output); // only displays value if wrapped in setTimeout
  return output; // empty []
}
Each query to the API via callMsGraph() returns an object like this, where subsequent queries are made with the id of each object/folder in value (as the new endpoint) until an object with an image property is found. The MS Graph API requires that folders be expanded at each level to access their children.
{
  id: '01GNYB5KPQ57RHLPZCJFE2QMVKT5U3NYY3',
  value: [
    {
      id: '01GNYB5KJMH5T4GXADUVFZRSITWZWNQROS',
      name: 'Folder1',
    },
    {
      id: '01GNYB5KMJKILOFDZ6PZBZYMXY4BGOI463',
      name: 'Folder2',
    }
  ]
}
This is the callMsGraph() helper:
function callMsGraph(accessToken, graphEndpoint) {
  const headers = new Headers();
  const bearer = `Bearer ${accessToken}`;
  headers.append("Authorization", bearer);

  const options = {
    method: "GET",
    headers: headers
  };

  return fetch(graphEndpoint, options)
    .then(response => response.json())
    .catch(error => {
      console.log(error);
      throw error;
    });
}
The rule with promises is that once you opt into one (more likely, are forced into it by a library), all code that needs to block for a result anywhere after it also has to await. You can't "go back" to sync and if even a single piece of the promise chain between where the promise starts and where you want its result isn't awaited, the result will be unreachable*.
Taking a snippet of the code:
function start(node, array) { // not async!
  // ..
  node.value.forEach(function(child) { // doesn't await!
    // ..
    nextNode.then(function(currResult) {
      // this promise is not hooked up to anything!
      start(...) // recurse without await!
There's no await in front of then, start doesn't return a promise and isn't awaited recursively, and forEach has no way to await its callback's asynchronous results, so each promise in the nextNode.then chain is orphaned into the void forever*.
The solution is a structure like this:
async function start(node, array) {
  // ..
  for (const child of node.value) {
    // ..
    const currResult = await callMsGraph(token, end);
    // ..
    await start(...);
    array.push(currResult);
  }
  // returns a promise implicitly
}

// ..
await start(...);
// `array` is populated here
Or Promise.all, which runs in parallel and returns an array (which could replace the parameter array):
function start(node, array) {
  return Promise.all(node.value.map(async child => {
    const currResult = await callMsGraph(token, end);
    // ..
    await start(...);
    return currResult;
  }));
}
I'd be happy to provide a minimal, runnable example, but the code you've provided isn't runnable, so you'll have to massage this a bit to work for you. If you make sure to await everything, you're good to go (and generally avoid mixing .then and async/await--the latter seems easier for this use case).
* (for all practical intents and purposes)
There are a few places where you are not handling the promises returned in your code. The nextNode.then inside your forEach loop is just "called"; the next line of code does not wait for it to complete, and the forEach loop finishes executing before the then callbacks ever run.
I changed your code a bit, but I have no way to check whether it works correctly, since I would need to populate dummy data for callMsGraph. If you encounter any issues, tell me and I'll modify the answer.
shelfdb.data = async (accessToken) => {
  const token = accessToken;
  const endpoint = 'https://graph.microsoft.com/v1.0/sites/webgroup.sharepoint.com,23e7ef7a-a529-4dde-81ba-67afb4f44401,0fa8e0f7-1c76-4ad0-9b6e-a485f9bfd63c/drive/items/01GNYB5KPQ57RHLPZCJFE2QMVKT5U3NYY3/children'
  const images = [];

  async function start(node, array) {
    if (node.value.length <= 0) return array; // or === 0 or whatever

    for (const child of node.value) {
      const end = `https://graph.microsoft.com/v1.0/sites/webgroup.sharepoint.com,23e7ef7a-a529-4dde-81ba-67afb4f44401,0fa8e0f7-1c76-4ad0-9b6e-a485f9bfd63c/drive/items/${child.id}/children`;
      const nextNode = await callMsGraph(token, end);

      if (nextNode.value.length > 0) {
        if ('image' in nextNode.value[0]) {
          const mapped = nextNode.value.map(imgChild => {
            return {
              'name': imgChild.name,
              'job': imgChild.parentReference.path.split("/")[6],
              'path': imgChild.webUrl,
              'id': imgChild.id
            }
          });
          array.push(...mapped);
        }
        // if no 'image' or value, go into child
        await start(nextNode, array);
      }
    }
    return array;
  }

  var res = await callMsGraph(token, endpoint);
  var output = await start(res, []);
  console.log(output);
  return output;
}
Also, please feel free to add try/catch blocks wherever you need them; I skipped them.
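For instance, a minimal sketch of one such try/catch around the Graph call inside the for...of loop; placement is a judgment call, and this variant simply skips a failing folder and keeps traversing:

let nextNode;
try {
  nextNode = await callMsGraph(token, end);
} catch (e) {
  console.error(e.message);
  continue; // skip this child and move on to the next one
}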

What's the optimal way to perform a large quantity of http requests in Node.js?

Assume there is a shop with 500 products, each with an ID ranging from 0 to 500, and each having its data stored in a JSON file living under a URL (e.g. myshop.com/1.json, ...2.json, etc.).
Using a Node.js script, I would like to download all of these JSON files and store them locally. I can do it consecutively:
const totalProductsCount = 500;

try {
  let currentItem = 1;
  while (currentItem < (totalProductsCount + 1)) {
    const product = await axios.get(`https://myshop.com/${currentItem}.json`);
    fs.writeFileSync(`./product-${currentItem}.json`, JSON.stringify(product.data, null, 2));
    currentItem++;
  }
} catch (e) {
  return;
}
Which works. However, I'd like to download these files fast, really fast. So I am trying to split all of my requests into groups, and get these groups in parallel. What I have is the following:
const _ = require('lodash');
const fs = require('fs');
const axios = require('axios');

const getChunk = async (chunk, index) => {
  // The counter here is used for logging purposes only
  let currentItem = 1;
  try {
    // Iterate through the items 1-50
    await chunk.reduce(async (promise, productId) => {
      await promise;
      const product = await axios.get(`https://myshop.com/${productId}`);
      if (product && product.data) {
        console.log('Got product', currentItem, 'from chunk', index);
        fs.writeFileSync(`./product-${productId}.json`, JSON.stringify(product.data, null, 2));
      }
      currentItem++;
    }, Promise.resolve());
  } catch (e) {
    throw e;
  }
}

const getProducts = async () => {
  const totalProductsCount = 500;
  // Create an array of 500 elements => [1, 2, 3, 4, ..., 499, 500]
  const productIds = Array.from({ length: totalProductsCount }, (_, i) => i + 1);
  // Using lodash, I am chunking that array into 10 groups of 50 each
  const chunkBy = Math.ceil(productIds.length / 10);
  const chunked = _.chunk(productIds, chunkBy);
  // Run the `getChunkProducts` on each of the chunks in parallel
  const products = await Promise.all([
    ...chunked.map((chunk, index) => getChunk(chunk, index))
  ])
  // If the items are to be returned here, it should be with a single-level array
  return _.flatten(products);
};

(async () => {
  const products = await getProducts();
})()
This seems to work most of the time, especially when I use it on a smaller number of items. However, there is a behaviour I cannot explain, where the script hangs when I ask for larger quantities of items.
What would be the best way / best practice to achieve this, while being able to catch any files that hang or that may not have been downloaded? (My thought is that I can download whatever I can with the chunking approach, get back an array of all the product IDs that failed to download, and then download those consecutively using the first method.)
You are writing files synchronously in the middle of an async action! Change writeFileSync to the async version. This should be an immediate improvement. As an additional performance enhancement, you would ideally use a code path that does not parse the response at all if you want the results written directly into a file. It looks like you can use responseType: 'stream' in your request config to accomplish this, which avoids the overhead of parsing the response into a JS object before writing it to the file.
It also sounds like you may want to lower the timeout on your HTTP requests, so that a request you expect to fail gives up after a few seconds instead of hanging. If you refer to the docs, there is a timeout param on the request config that you could set to a few seconds: https://axios-http.com/docs/req_config
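A rough sketch of both suggestions combined, assuming axios's responseType and timeout options and Node's stream/promises pipeline helper; the URL pattern is the one from the question:

const axios = require('axios');
const { createWriteStream } = require('fs');
const { pipeline } = require('stream/promises');

async function downloadProduct(productId) {
  // Ask axios for a raw stream and give up after 5 seconds instead of hanging.
  const response = await axios.get(`https://myshop.com/${productId}.json`, {
    responseType: 'stream',
    timeout: 5000,
  });
  // Pipe the response body straight to disk without parsing it into a JS object.
  await pipeline(response.data, createWriteStream(`./product-${productId}.json`));
}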

How to pull data from Paginated JSON

I have, say, 300 items, with 10 shown per page. The page loads the JSON data and is limited to 10 items per page (this cannot be changed).
I want to scrub through the 30-odd pages, pulling each item and listing it.
url.com/api/some-name?page=1 etc
Ideally the script will use the above URL as a pattern and step through the pages in increments of 1 until all 10 items from each page have been collected.
Can this be done? How would I go about it? Any advice or assistance would help me greatly in learning from and looking at the methods people suggest.
const getInfo = async function(pageNo) {
  const jsonUrl = "https://website.com/api/some-title";
  let actualUrl = jsonUrl + `?page=${pageNo}`;
  let jsonResults = await fetch(actualUrl).then(response => {
    return response.json();
  });
  return jsonResults;
};

const getEntireList = async function(pageNo) {
  const results = await getInfo(pageNo);
  console.log("Retrieving data from API for page: " + pageNo);
  if (results.length > 0) {
    return results.concat(await getEntireList(pageNo));
  } else {
    return results;
  }
};

(async () => {
  const entireList = await getEntireList();
  console.log(entireList);
})();
I can see some issues in your code:
The initial call to getEntireList() should be initialised with the index of the first page, e.g. const entireList = await getEntireList(1);
The page number will need to be incremented at some point.
results.concat() probably won't have the desired effect: response.json() can resolve to an object, an array, or a primitive value (depending on the server), and concat() only behaves as expected when results is actually an array of items. A minimal sketch with these fixes applied follows.
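A minimal sketch with those fixes applied, assuming the API returns an array per page and an empty array once you request a page past the end; names follow the question's code:

const getEntireList = async function (pageNo = 1) {
  const results = await getInfo(pageNo);
  console.log("Retrieving data from API for page: " + pageNo);
  if (Array.isArray(results) && results.length > 0) {
    // Recurse with the *next* page number, otherwise this never terminates.
    return results.concat(await getEntireList(pageNo + 1));
  }
  return results;
};

(async () => {
  const entireList = await getEntireList(1);
  console.log(entireList);
})();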

How do I loop through multiple pages in an API?

I am using the Star Wars API https://swapi.co/. I need to pull in starship information; the results for starships span 4 pages, but a GET call returns only 10 results per page. How can I iterate over multiple pages and get the info that I need?
I have used the fetch API to GET the first page of starships and added that array of 10 to my totalResults array, then created a while loop that checks whether next !== null (next is the next-page property in the data; if we were viewing the last page, i.e. page 4, then next would be null). So as long as next does not equal null, my while loop should fetch the data and add it to my totalResults array. I update the value of next at the end, but it seems to loop forever and crash.
function getData() {
  let totalResults = [];
  fetch('https://swapi.co/api/starships/')
    .then(res => res.json())
    .then(function (json) {
      let starships = json;
      totalResults.push(starships.results);
      let next = starships.next;
      while (next !== null) {
        fetch(next)
          .then(res => res.json())
          .then(function (nextData) {
            totalResults.push(nextData.results);
            next = nextData.next;
          })
      }
    });
}
The code keeps looping, which means my next = nextData.next assignment does not seem to be working.
You have to await the response inside the while loop; otherwise the loop runs synchronously while the results arrive asynchronously, in other words the while loop runs forever:
async function getData() {
  const results = [];
  let url = 'https://swapi.co/api/starships/';
  do {
    const res = await fetch(url);
    const data = await res.json();
    url = data.next;
    results.push(...data.results);
  } while (url)
  return results;
}
You can do it with async/await functions more easily:
async function fetchAllPages(url) {
  const data = [];
  do {
    const response = await fetch(url);
    const page = await response.json();
    url = page.next;
    data.push(...page.results);
  } while (url);
  return data;
}
This way you can reuse this function for other API calls.
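For example (assuming the SWAPI starships endpoint from the question):

(async () => {
  const starships = await fetchAllPages('https://swapi.co/api/starships/');
  console.log(`Fetched ${starships.length} starships`);
})();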

Axios.all in NodeJS failing with 404 error

I hope you can help me, because it has been HOURS of trying to get this problem resolved. I've googled so much and tried all of the solutions I found, but I keep getting the same error.
I am trying to make an axios GET request to an API that is paginated with 1 result per page, loop through all of the pages, and resolve the promises from the promise array.
I have verified that without the loop, just making 1 request, everything works. I have successful writes to MongoDB using the MongoDB driver and it's fine. Once I bring the loop in, I cannot get the promises to resolve. I was able to console.log that the promise array does, indeed, have x number of pending promises in it.
const MongoClient = require('mongodb')
const axios = require('axios');

const url = 'https://catfact.ninja/fact'

let db = null;
let client = null;

//this one works great
const getMetaData = function () {
  let data = axios.get(url+"s")
    .then(response => {
      return response.data
    }).catch(error => console.log(error));
  return data;
}

//this one will not resolve
const dataArray = async function (total) {
  let baseUrl = url+"s/facts?page="
  let res =[];
  let promises = [];
  for (let page = 1; page <= total; page++){
    promises.push(axios.get(baseUrl+page))
  }
  axios.all(promises).then(result => console.log(result))
  //originally i wanted to map the result to an array of json
  //objects, but if i could even get a console.log it would be
  //a win. spread operator not working, Promise.all not working
  //i must have tried 12 different stackoverflow responses from
  //other questions. until i can resolve the promises I can't do anything.
}

exports.connect = async function(url, done) {
  if (db) return done();

  // let data = await getMetaData()
  // let total = data['total']
  let arr = dataArray(5);
  //console.log("arr is "+arr)

  MongoClient.connect(url, {useNewUrlParser: true}, function (err, client){
    if (err) return done(err);
    client = client;
    db = client.db('morefun');
    /*
    db.collection('catfacts').insertMany(dataArray, function(err, res){
      if (err) throw err;
      console.log("Inserted: " + res.insertedCount);
    })*/
    done();
  });
}

exports.get = function() {
  return db;
}

//make sure this is correct
exports.close = function(done) {
  if (db) {
    client.close(function(err, result) {
      db = null;
      mode = null;
      done(err);
    });
  }
}
I need an array of JSON objects for the insertMany function to work. Please, someone help me: what am I doing wrong?
In the for loop, you are creating a URL like this: https://catfact.ninja/facts/facts?page=1 – this is incorrect, the correct URL should be https://catfact.ninja/facts?page=1 (with facts only once).
Also, the keyword async is not needed here, and you should return the result of axios.all.
A correct version of your code:
const dataArray = function (total) {
  let baseUrl = url+"s?page="
  let res =[];
  let promises = [];
  for (let page = 1; page <= total; page++){
    promises.push(axios.get(baseUrl+page))
  }
  return axios.all(promises).then(result => {
    console.log(result);
    return result;
  });
}
You can then get your data like this:
let arr = await dataArray(5);
Getting the actual data the way you want it
From your comments, I see that what you really want is to post-process the data obtained from the API to ultimately get one array that contains only the cat data.
You can do this by “massaging” the data with map and reduce, like this:
return axios
  .all(promises)
  .then(result => result.map(({ data }) => data.data).reduce((acc, curr) => acc.concat(curr), []));
Note: I've left out the console.log statement here for brevity.
The actual data is nested as a data property of an object that is itself the data property of the response, so the map call retrieves that.
That gives us an array of arrays, each containing objects with cat data; the reduce call flattens this into a single array of cat facts.
We get a result that looks like this, which is hopefully what you want 😁:
[
  {
    "fact": "Cats see six times better in the dark and at night than humans.",
    "length": 63
  },
  {
    "fact": "The ability of a cat to find its way home is called “psi-traveling.” Experts think cats either use the angle of the sunlight to find their way or that cats have magnetized cells in their brains that act as compasses.",
    "length": 220
  },
  {
    "fact": "Cat's urine glows under a black light.",
    "length": 38
  },
  {
    "fact": "There are more than 500 million domestic cats in the world, with approximately 40 recognized breeds.",
    "length": 100
  },
  {
    "fact": "A tomcat (male cat) can begin mating when he is between 7 and 10 months old.",
    "length": 76
  }
]
Not sure if it's the answer, but I know I've run into issues when not using the exact syntax axios wants for axios.all:
axios.all([fetch1request(), fetch2request()])
  .then(axios.spread((fetch1, fetch2) => {
    // whatever logic you need; at this point the requests are complete
  }))
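Applied to the question's setup, that shape might look like this (a sketch only; the facts endpoint and a page count of 5 are taken from the question's own code):

const promises = [];
for (let page = 1; page <= 5; page++) {
  promises.push(axios.get(`https://catfact.ninja/facts?page=${page}`));
}

axios.all(promises).then(axios.spread((...responses) => {
  // Each response is one page; the cat facts live under response.data.data.
  const facts = responses.flatMap(response => response.data.data);
  console.log(facts);
}));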
