Writing files in a loop - JS

I seem to be having an issue when trying to write to a file in a loop. The loop keeps iterating even though the first file has not been created (I think I am either not understanding promises or the asynchronous nature of the script).
So on the command line I will run node write_file_script.js premier_league
// teams.js
module.exports = {
premier_league: [
{ team_name: "Team 1", friendly_name: "name 1"},
{ team_name: "Team 2", friendly_name: "name 2"}
]
}
My Script
const args = process.argv.slice(2);
const TEAM = require('./teams');
const Excel = require('exceljs');
const workbook = new Excel.Workbook();
for (var team = 0; team < TEAM[args].length; team++) {
console.log("Starting Excel File generation for " + TEAM[args][team]['team_name']);
var fhcw = require('../data_files/home/fhcw/' + TEAM[args][team]['friendly_name'] + '_home' + '.json');
fhcw = fhcw.map(Number);
workbook.xlsx.readFile('./excel_files/blank.xlsx')
.then(function() {
var worksheet = workbook.getWorksheet(1);
// Write FHCW
for (i=0; i < fhcw.length; i++) {
col = i+6;
worksheet.getCell('E'+ col).value = fhcw[i];
}
console.log(TEAM[args][team])
workbook.xlsx.writeFile('./excel_files/' + TEAM[args] + '/' + TEAM[args][team]['friendly_name'] + '.xlsx');
});
}
The output I get when running this is
Starting Excel File generation for Team 1
Starting Excel File generation for Team 2
undefined
(node:75393) UnhandledPromiseRejectionWarning: Unhandled promise rejection
(rejection id: 1): TypeError: Cannot read property 'friendly_name' of undefined
So it seems like the file is not being written but the loop continues. How can I ensure the file is written before moving on to the next iteration of the loop?
Thanks

If the function is returning a Promise, and you want to do them serially (one at a time) instead of in parallel (start all at the same time), you'll need to wait for each before you start the next using then().
Also, note that your TEAM is just an export of an array (at least as presented), so you can't give it args, which is where the other error comes from.
When you have a list of things to do, the best way to do them is to have a queue which you run until it is empty. In this case, it looks like your TEAM array is your queue; since this is an export, I'd recommend not changing it directly, but instead copying it to another array which you can alter:
const args = process.argv.slice(2);
const TEAM = require('./teams');
const Excel = require('exceljs');
const workbook = new Excel.Workbook();
const writeNextFile = (queue) => {
// If we have nothing left in the queue, we're done.
if (!queue.length) {
return Promise.resolve();
}
const team = queue.shift(); // get the first element, take it out of the array
console.log("Starting Excel File generation for " + team.team_name);
var fhcw = require('../data_files/home/fhcw/' + team.friendly_name + '_home' + '.json');
fhcw = fhcw.map(Number);
// return this promise chain
return workbook.xlsx.readFile('./excel_files/blank.xlsx')
.then(function() {
var worksheet = workbook.getWorksheet(1);
// Write FHCW
for (i=0; i < fhcw.length; i++) {
col = i+6;
worksheet.getCell('E'+ col).value = fhcw[i];
}
console.log(team);
// not sure what you thought TEAM[args] would have translated to, but it wouldn't have been a string, so I put ??? for now
// also, making the assumption that writeFile() returns a Promise.
return workbook.xlsx.writeFile('./excel_files/' + team.??? + '/' + team.friendly_name + '.xlsx');
}).then(() => writeNextFile(queue));
}
writeNextFile(TEAM.slice(0)) // make a copy of TEAM so we can alter it
.then(() => console.log('Done'))
.catch(err => console.error('There was an error', err));
Basically, our function takes an array and will write the first team, then call itself recursively to write the next. Eventually it'll all resolve, and what you end up with is a Promise that resolves at the end.
When it comes to Promises, you basically always need to chain them together. You won't be able to use a plain for loop for them, or any other standard loop for that matter; a reduce-based chain (sketched below) is another common way to build the same sequence.
If you wanted to write them all at once, it is a little cleaner, because you can just do a non-recursive map for each and use Promise.all to know when they are done.
const writeTeamFile = (team) => {
console.log("Starting Excel File generation for " + team.team_name);
var fhcw = require('../data_files/home/fhcw/' + team.friendly_name + '_home' + '.json');
fhcw = fhcw.map(Number);
// return this promise chain
return workbook.xlsx.readFile('./excel_files/blank.xlsx')
.then(function() {
var worksheet = workbook.getWorksheet(1);
// Write FHCW
for (i=0; i < fhcw.length; i++) {
col = i+6;
worksheet.getCell('E'+ col).value = fhcw[i];
}
console.log(team);
// not sure what you thought TEAM[args] would have translated to, but it wouldn't have been a string, so I put ??? for now
// also, making the assumption that writeFile() returns a Promise.
return workbook.xlsx.writeFile('./excel_files/' + team.??? + '/' + team.friendly_name + '.xlsx');
});
}
Promise.all(TEAM.map(writeTeamFile))
.then(() => console.log('Done'))
.catch(err => console.error('Error', err));
Promises are for async, and generally when you have a group of things, you want to do them in parallel because it's faster. That's why the parallel version is a lot cleaner; we don't have to call things recursively.

Since you are not writing to the same file, you can loop through the files and perform read/write operations.
You can bind the index value in the loop, and move ahead.
Example code.
var totalFiles = TEAM[args].length;
var filesWrittenCount = 0;
for(var team = 0; team < TEAM[args].length; team++){
(function(t){
// call your async function here (fhcw is built per team, as in your original loop).
var fhcw = require('../data_files/home/fhcw/' + TEAM[args][t]['friendly_name'] + '_home' + '.json').map(Number);
workbook.xlsx.readFile('./excel_files/blank.xlsx').then(function() {
var worksheet = workbook.getWorksheet(1);
// Write FHCW
for (i=0; i < fhcw.length; i++) {
col = i+6;
worksheet.getCell('E'+ col).value = fhcw[i];
}
console.log(TEAM[args][t])
workbook.xlsx.writeFile('./excel_files/' + TEAM[args] + '/' + TEAM[args][t]['friendly_name'] + '.xlsx')
.then(function() {
// update number of files written once this write has finished.
filesWrittenCount++;
if (totalFiles == filesWrittenCount) {
// final callback function - this states that all files are updated
done();
}
});
});
}(team));
}
function done() {
console.log('All data has been loaded');
}
Hope this helps.

Related

Saving data from spawned process into variables in Javascript

I am having issues saving the results from a spawned python process. After converting data into json, I push the data to an array defined within the function before the spawn process is called, but the array keeps returning undefined. I can console.log and show the data correctly, but the array that is returned from the function is undefined. Any input would be greatly appreciated. Thanks in advance.
function sonar_projects(){
const projects = [];
let obj;
let str = '';
const projects_py = spawn('python', ['sonar.py', 'projects']);
let test = projects_py.stdout.on('data', function(data){
let projects = [];
let json = Buffer.from(data).toString()
str += json
let json2 = json.replace(/'/g, '"')
obj = JSON.parse(json2)
console.log(json2)
for(var dat in obj){
var project = new all_sonar_projects(obj[dat].key, obj[dat].name, obj[dat].qualifier, obj[dat].visibility, obj[dat].lastAnalysisDate);
projects.push(project);
}
for (var i= 0; i < projects.length; i++){
console.log(projects[i].key + ' ' + projects[i].name + ' ' + projects[i].qualifier + ' ' + projects[i].visibility + ' ' + projects[i].lastAnalysisDate)
}
console.log(projects)
return projects;
});
}
First of all, going through the NodeJS documentation, we have
Child Process
child_process.spawn(command[, args][, options])
Child Process Class
Stream
Stream Readable Event "data"
Even though projects_py.stdout.on(event_name, callback) accepts a callback, it returns either the EventEmitter-like object on which the events are registered (in this case stdout, whose on method was called) or the parent element (the ChildProcess named projects_py) - not the callback's return value.
That's because the callback function will be called every time the "data" event occurs. If registering the event returned whatever the callback returned, it could only do so a single time, and every later "data" event would still be processed by the callback, but nothing could be done with its result.
In this kind of situation, we need a way to collect and compile the data from the projects_py.stdout.on("data", callback) events after the stream is done.
You already have the collecting part. Now see the other:
Right before you create the on "data" event, we create a promise to encapsulate the process:
// A promise says "we promise" to have something in the future,
// but it can end not happening
var promise = new Promise((resolve, reject) => {
// First of all, we collect only the string data
// as things can come in parts
projects_py.stdout.on('data', function(data){
let json = Buffer.from(data).toString()
str += json
});
// When the stream data is all read,
// we say we get what "we promised", and give it to "be resolved"
projects_py.stdout.on("end", () => resolve(str));
// When something bad occurs,
// we say what went wrong
projects_py.stdout.on("error", e => reject(e));
// With every data collected,
// we parse it (it's most your code now)
}).then(str => {
let json2 = str.replace(/'/g, '"')
// I changed obj to arr 'cause it seems to be an array
let arr = JSON.parse(json2)
//console.log(json2)
const projects = []
// With for-of, it's easier to get elements of
// an object / an array / any iterable
for(var dat of arr){
var project = new all_sonar_projects(
dat.key, dat.name, dat.qualifier,
dat.visibility, dat.lastAnalysisDate
);
projects.push(project);
}
// Template strings (`${variable or expression}` inside backticks)
// make it easier to build a big string, and are still fast
for(var i = 0; i < projects.length; i++)
console.log(
`${projects[i].key} ${projects[i].name} ` +
`${projects[i].qualifier} ${projects[i].visibility} ` +
projects[i].lastAnalysisDate
)
console.log(projects)
// Your projects array, now full of data
return projects;
// Finally, we catch any error that might have happened,
// and show it on the console
}).catch(e => console.error(e));
}
Now, if you want to do anything with your array of projects, there are two main options:
Promise (then / catch) way
// Your function
function sonar_projects(){
// The new promise
var promise = ...
// As the working to get the projects array
// is already all set up, you just use it, but in an inner scope
promise.then(projects => {
...
});
}
Also, you can just return the promise variable and do the promise-things with it out of sonar_projects (with then / catch and callbacks).
async / await way
// First of all, you need to convert your function into an async one:
async function sonar_projects(){
// As before, we get the promise
var promise = ...
// We tell the function to 'wait' for it's data
var projects = await promise;
// Do whatever you would do with the projects array
...
}

Async/ await in Firebase Cloud Function says Error Expression has type `void`

I'm trying to use async/await in my firebase function but I am getting an error. I have marked the function as async, but when I try to use await inside of it I get the error: Expression has type `void`. Put it on its own line as a statement.
I think this is strange because each await function call is already on its own line as a statement. So, I'm not sure what to do. Any help would be appreciated.
For clarity, I am making a request with the Cheerio web scrape library, and then trying to make two async function calls during each loop with the .each method.
export const helloWorld = functions.https.onRequest((req, response) => {
const options = {
uri: 'https://www.cbssports.com/nba/scoreboard/',
transform: function (body) {
return cheerio.load(body);
}
};
request(options)
.then(($) => {
$('.live-update').each((i, element) => {
const homeTeamAbbr = $(element).find('tbody').children('tr').eq(0).find('a').html().split("alt/").pop().split('.svg')[0];
const awayTeamAbbr = $(element).find('tbody').children('tr').eq(1).find('a').html().split("alt/").pop().split('.svg')[0];
const homeTeam = $(element).find('tbody').children('tr').eq(0).find('a.team').text().trim();
const awayTeam = $(element).find('tbody').children('tr').eq(1).find('a.team').text().trim();
let homeTeamStatsURL = $(element).find('tbody').children('tr').eq(0).find('td').html();
let awayTeamStatsURL = $(element).find('tbody').children('tr').eq(1).find('td').html();
const gameTime = $(element).find('.pregame-date').text().trim();
homeTeamStatsURL = homeTeamStatsURL.match(/href="([^"]*)/)[1] + "roster";
awayTeamStatsURL = awayTeamStatsURL.match(/href="([^"]*)/)[1] + "roster";
const matchupString = awayTeamAbbr + "#" + homeTeamAbbr;
const URLString = "NBA_" + urlDate + "_" + matchupString;
// var docRef = database.collection('NBASchedule').doc("UpcommingSchedule");
// var boxScoreURL = "www.cbssports.com/nba/gametracker/boxscore/" + URLString;
// var setAda = docRef.set({[URLString]:{
// homeTeam: homeTeam,
// awayTeam: awayTeam,
// date: gameTime,
// homeTeamAbbr: homeTeamAbbr,
// awayTeamAbbr: awayTeamAbbr,
// homeTeamStatsURL: homeTeamStatsURL,
// awayTeamStatsURL: awayTeamStatsURL,
// boxScoreURL: boxScoreURL
// }}, { merge: true });
getTeamPlayers(homeTeamStatsURL, matchupString);
getTeamPlayers(awayTeamStatsURL, matchupString);
console.log("retrieved schedule for "+ matchupString + " on " + urlDate)
});
response.send("retrieved schedule");
})
.catch(function (err) {
console.log("error " + err);
});
});
The function I am calling just makes another request and then I'm trying to log some data.
function getTeamPlayers(playerStatsURL, matchupString) {
const options = {
uri: playerStatsURL,
transform: function (body) {
return cheerio.load(body);
}
};
console.log(playerStatsURL + " stats url");
request(options)
.then(($) => {
console.log('inside cheerio')
$('tbody').children('tr').each(function(i, element){
const playerName = $(element).children('td').eq(1).children('span').eq(1).find('a').text().trim();
const injury = $(element).children('td').eq(1).children('span').eq(1).children('.icon-moon-injury').text().trim();
const news = $(element).children('td').eq(1).children('span').eq(1).children('.icon-moon-news').text().trim();
const playerUrl = $(element).children('td').eq(1).children('span').eq(1).find('a').attr('href');
const playerLogsUrl = "https://www.cbssports.com" + playerUrl.replace('playerpage', 'player/gamelogs/2018');
console.log(playerName + ": Inj: " + injury + " News: " + news);
// database.collection('NBAPlayers').add({[playerName]:{
// '01 playerName': playerName,
// '03 playerLogsUrl': playerLogsUrl,
// '04 inj': injury,
// '05 news': news
// }})
// .then(docRef => {
// console.log("ID " + docRef.id);
// //getPlayerLogs(playerLogsUrl, playerName, docRef.id);
// })
// .catch(error => console.error("Error adding document: ", error));
});
});
}
async/await is supported in Node version 8 (which is deployable as Cloud Functions for Firebase). Specifically, use 8.6.1 (at the time of this writing).
As for awaiting twice inside a loop - I don't think this is best practice.
Instead, push all these requests into an array, then use Promise.all so as to fetch them all in parallel, as sketched below.
Just in case someone comes after: as Ron stated before, you shouldn't use async/await inside a traditional for loop if the second call doesn't need the value of the first, as you will increase the run time of the code.
Furthermore, you can't use async/await inside a forEach => or a map => callback, as it won't stop and wait for the promise to resolve (see the small illustration below).
The most efficient way is to use Promise.all([]). Here I leave a great YouTube video by an expert that explains async/await and Promises => https://www.youtube.com/watch?v=vn3tm0quoqE&t=6s
As to one of the questions in the comments by DarkHorse:
But all I want is for getTeamPlayers() to execute and write to the database so I don't know why I need to return anything.
In Firebase Functions all functions need to return something before the final response.
For example, in this case he has created an HTTP function. Before he ends the function with response.send("retrieved schedule"); every function that was triggered needs to finish, because Firebase Functions will clean up after the final response, erasing and stopping anything that is still running. So any function that hasn't finished will be killed before doing its job.
Returning a promise is the way Firebase Functions knows when all executions have finished and can clean up.
Hope it helps :)

How do I parse multiple pages?

I have been attempting to parse a site's table data into a JSON file, which I can do if I do each page one by one, but seeing as there are 415 pages that would take a while.
I have seen and read a lot of StackOverflow questions on this subject but I don't seem able to modify my script so that it:
Scrapes each page and extracts the 50 items with item IDs per page
Does so in a rate-limited way so I don't negatively affect the server
Waits until all requests are done so I can write each item + item ID to a JSON file.
I believe you should be able to do this using request-promise and promise.all but I cannot figure it out.
The actual scraping of the data is fine; I just cannot make the code scrape a page, then go to the next URL with a delay or pause in between requests.
The code below is the closest I have got, but I get the same results multiple times and I cannot slow the request rate down.
Example of the page URLS:
http://test.com/itemlist/1
http://test.com/itemlist/2
http://test.com/itemlist/3 etc. (up to 415)
for (var i = 1; i <= noPages; i++) {
urls.push({url: itemURL + i});
console.log(itemURL + i);
}
Promise.map(urls, function(obj) {
return rp(obj).then(function(body) {
var $ = cheerio.load(body);
//Some calculations again...
rows = $('table tbody tr');
$(rows).each(function(index, row) {
var children = $(row).children();
var itemName = children.eq(1).text().trim();
var itemID = children.eq(2).text().trim();
var itemObj = {
"id" : itemID,
"name" : itemName
};
itemArray.push(itemObj);
});
return itemArray;
});
},{concurrency : 1}).then(function(results) {
console.log(results);
for (var i = 0; i < results.length; i++) {
// access the result's body via results[i]
//console.log(results[i]);
}
}, function(err) {
// handle all your errors here
console.log(err);
});
Apologies for perhaps misunderstanding node.js and its modules; I don't really use the language, but I needed to scrape some data and I really don't like Python.
Since you need requests to be run only one by one, Promise.all() would not help.
A recursive promise (I'm not sure if that's the correct name) would.
function fetchAllPages(list) {
if (!list || !list.length) return Promise.resolve(); // trivial exit
var urlToFetch = list.pop();
return fetchPage(urlToFetch).
then(<wrapper that returns Promise will be resolved after delay >).
then(function() {
return fetchAllPages(list); // recursion!
});
}
This code still lacks error handling.
Also I believe it can become much more clear with async/await:
for(let url of urls) {
await fetchAndProcess(url);
await <wrapper around setTimeout>;
}
but you need to find/write your own async implementations of fetch() and the setTimeout() wrapper; a minimal sketch follows.
After input from @skyboyer suggesting recursive promises, I was led to a GitHub Gist called Sequential execution of Promises using reduce().
Firstly I created my array of URLS
for (var i = 1; i <= noPages; i++) {
//example urls[0] = "http://test.com/1"
//example urls[1] = "http://test.com/2"
urls.push(itemURL + i);
console.log(itemURL + i);
}
Then
var sequencePromise = urls.reduce(function(promise, url) {
return promise.then(function(results) {
//fetchIDsFromURL async function (it returns a promise in this case)
//when the promise resolves I have my page data
return fetchIDsFromURL(url)
//wait for the delay before passing the page data on to the next step
.then(itemArr => promiseWithDelay(9000).then(() => itemArr))
.then(itemArr => {
results.push(itemArr);
//calling return inside the .then method will make sure the data you want is passed onto the next
return results;
});
});
}, Promise.resolve([]));
// async
function fetchIDsFromURL(url)
{
return new Promise(function(resolve, reject){
request(url, function(err,res, body){
//console.log(body);
var $ = cheerio.load(body);
rows = $('table tbody tr');
$(rows).each(function(index, row) {
var children = $(row).children();
var itemName = children.eq(1).text().trim();
var itemID = children.eq(2).text().trim();
var itemObj = {
"id" : itemID,
"name" : itemName
};
//push the 50 per page scraped items into an array and resolve with
//the array to send the data back from the promise
itemArray.push(itemObj);
});
resolve(itemArray);
});
});
}
//returns a promise that resolves after the timeout
function promiseWithDelay(ms)
{
return new Promise(function(resolve){
setTimeout(resolve, ms);
});
}
Then finally I call .then on the sequence of promises. The only issue I had with this was ending up with multiple arrays inside results, each containing the same data; since all the data is the same in each array, I just take the first one, which has all my parsed items with IDs in it, and then I write it to a JSON file.
sequencePromise.then(function(results){
var lastResult = results.length;
console.log(results[0]);
writeToFile(results[0]);
});

How to handle async Node.js in a loop

I have a loop like this:
var i,j,temparray,chunk = 200;
for (i=0,j=document.mainarray.length; i<j; i+=chunk) {
temparray = document.mainarray.slice(i,i+chunk);
var docs = collection.find({ id: { "$in": temparray}}).toArray();
docs.then(function(singleDoc)
{
if(singleDoc)
{
console.log("single doc length : " + singleDoc.length);
var t;
for(t = 0, len = singleDoc.length; t < len;t++)
{
fs.appendFile("C:/Users/x/Desktop/names.txt", singleDoc[t].name + "\n", function(err) {
if(err) {
return console.log(err);
}
});
}
}
});
}
The loop iterates twice. In the first iteration it gets 200 elements; in the second, it gets 130 elements. And when I open the .txt file, I see only 130 names. I guess because of the async nature of Node.js, only the second part of the array is processed. What should I do to get all parts of the array processed? Thanks in advance.
EDIT: I finally turned the code into this:
var generalArr = [];
var i,j,temparray,chunk = 200;
for (i=0,j=document.mainarray.length; i<j; i+=chunk) {
temparray = document.mainarray.slice(i,i+chunk);
generalArr.push(temparray);
}
async.each(generalArr, function(item, callback)
{
var docs = collection.find({ id: { "$in": item}}).toArray();
docs.then(function(singleDoc)
{
if(singleDoc)
{
console.log("single doc length : " + singleDoc.length);
var t;
for(t = 0, len = singleDoc.length; t < len;t++)
{
fs.appendFile("C:/Users/x/Desktop/names.txt", singleDoc[t].name + "\n", function(err) {
if(err) {
return console.log(err);
}
});
}
}
});
callback(null);
})
When I change this line :
var docs = collection.find({ id: { "$in": item}}).toArray();
To this line :
var docs = collection.find({ id: { "$in": item}}).project({ name: 1 }).toArray();
It works, I'm able to print all names. I guess there is a problem with memory when I try without .project(). How can I make this work without using project? Should I change some memory limits? Thanks in advance.
I think your code is unnecessarily complicated, and appending to a file in a loop is very expensive compared to in-memory computation. A better way would be to write to the file just once.
var generalArr = [], i, j, temparray, chunk = 200;
for (i = 0, j = document.mainarray.length; i < j; i += chunk) {
temparray = document.mainarray.slice(i, i + chunk);
generalArr.push(temparray);
}
const queryPromises = [];
generalArr.forEach((item, index) => {
queryPromises.push(collection.find({ id: { "$in": item } }).toArray());
});
let stringToWrite = '';
Promise.all(queryPromises).then((result) => {
result.forEach((item) => {
item.forEach((element) => {
//create a single string which you want to write
stringToWrite = stringToWrite + "\n" + element.name;
});
});
fs.appendFile("C:/Users/x/Desktop/names.txt", stringToWrite, function (err) {
if (err) {
return console.log(err);
} else {
// call your callback or return
}
});
});
In the code above, I do the following:
1. Wait for all the DB queries to finish.
2. Iterate over the results and build the single string we need to write to the file.
3. Write to the file.
Once you go asynchronous you cannot go back - all your code needs to be asynchronous. In Node 8 you handle this with the async and await keywords. In older versions you can use Promise - async/await is just syntax sugar for it anyway.
However, most of the APIs in Node are older than Promise, and so they use callbacks instead. There is a promisify function to convert callback-style functions into promise-returning ones.
There are two ways to handle this: you can let all the asynchronous actions happen at the same time, or you can chain them one after another (which preserves order but takes longer).
So, collection.find is asynchronous; it either takes a callback function or returns a Promise. I'm going to assume that the API you're using does the latter, but your problem could be the former (in which case look up promisify).
var findPromise = collection.find({ id: { "$in": item}});
Now, at this point findPromise holds the running find action. We say this is a promise that resolves (completes successfully) or rejects (throws an error). We want to queue up an action to do once it completes, and we do that with then:
// The result of collection.find is the collection of matches
findPromise.then(function(docs) {
// Any code we run here happens asynchronously
});
// Code here will run first
Inside the promise we can return further promises (allowing them to be chained - complete one async, then complete the next, then fire the final resolve once all done) or use Promise.all to let them all happen in parallel and resolve once done:
var p = new Promise(function(resolve, reject) {
var findPromise = collection.find({ id: { "$in": item}});
findPromise.then(function(docs) {
var singleDocNames = [];
for(var i = 0; i < docs.length; i++) {
var singleDoc = docs[i];
if(!singleDoc)
continue;
for(var t = 0; t < singleDoc.length; t++)
singleDocNames.push(singleDoc[t].name);
}
// Resolve the outer promise with the final result
resolve(singleDocNames);
});
});
// When the promise finishes log it to the console
p.then(console.log);
// Code inline here will fire before the promise
The chained version above is much easier in Node 8 with async/await:
async function p() {
// Await puts the rest of this function in the .then() of the promise
const docs = await collection.find({ id: { "$in": item}});
const singleDocNames = [];
for(var i = 0; i < docs.length; i++) {
// ... synchronous code unchanged ...
}
// Resolve the outer promise with the final result
return singleDocNames;
}
// async functions can be treated like promises
p().then(console.log);
If you need to write the results to a text file asynchronously, there are a couple of ways to do it - you can wait until the end and write all of them, or chain a promise to write them after each find, though I find parallel IO operations tend to be at more risk of deadlocks. A rough sketch of the first option follows.
The code above has multiple issues with asynchronous control flow. Similar code can work, but only if you use the ES7 async/await operators on all async operations.
Of course, you can also easily achieve a solution with a sequence of promises. Solution:
let flowPromise = Promise.resolve();
const chunk = 200;
for (let i=0,j=document.mainarray.length; i<j; i+=chunk) {
flowPromise = flowPromise.then(() => {
const temparray = document.mainarray.slice(i,i+chunk);
const docs = collection.find({ id: { "$in": temparray}}).toArray();
return docs.then((singleDoc) => {
let innerFlowPromise = Promise.resolve();
if(singleDoc) {
console.log("single doc length : " + singleDoc.length);
for(let t = 0, len = singleDoc.length; t < len;t++) {
innerFlowPromise = innerFlowPromise.then(() => new Promise((resolve, reject) =>
fs.appendFile(
"C:/Users/x/Desktop/names.txt", singleDoc[t].name + "\n",
err => (err ? reject(err) : resolve())
)
));
}
}
return innerFlowPromise;
});
});
}
flowPromise.then(() => {
console.log('Done');
}).catch((err) => {
console.log('Error: ', err);
})
When using async-style control flow based on Promises, always remember that loops and function call sequences will not pause execution until the async operation is done, so build all the then sequences manually, or use async/await syntax.
Which version of nodejs are you using? You should use the native async/await support which is built into newer versions of nodejs (no libraries required). Also note, fs.appendFile is asynchronous, so you need to either use a helper like promisify to transform the callback into a promise or just use appendFileSync and suffer the blocking IO (which might be okay for you, depending on the use case).
async function(){
...
for(var item of generalArr) {
var singleDoc = await collection.find({ id: { "$in": item}}).toArray();
// if(singleDoc) { this won't do anything, since collection.find will always return something even if its just an empty array
console.log("single doc length : " + singleDoc.length);
var t;
for(t = 0, len = singleDoc.length; t < len;t++){
fs.appendFileSync("C:/Users/x/Desktop/names.txt", singleDoc[t].name + "\n");
}
};
}
var docs = collection.find({ id: { "$in": document.mainarray}}), // returns a cursor
doc,
names = [],
toInsert;
function saveToFile(cb) {
toInsert = names.splice(0,100);
if(!toInsert.length) return cb();
fs.appendFile("C:/Users/x/Desktop/names.txt", toInsert.join("\n"), cb);
}
(function process() {
if(docs.hasNext()) {
doc = docs.next();
doc.forEach(function(d) {
names.push(d.name);
});
if(names.length === 100) {
// save when we have 100 names in memory and clear the memory
saveToFile(function(err) {
process();
});
} else {
process();
}
} else {
saveToFile(function(){
console.log('All done');
});
}
}()); // invoke the function
If you can't solve your issue using core modules and basic nodejs, there is most likely a lack of understanding of how things work or insufficient knowledge about a library (in this case the FileSystem module).
Here is how you can solve your issue without third-party libraries and such.
'use strict';
const
fs = require('fs');
let chunk = 200;
// How many rounds of array chunking we expect
let rounds = Math.ceil(mainArray.length/chunk);
// copy to temp (for the counter)
let tempRounds = rounds;
// set file name
let filePath = './names.txt'
// Open writable Stream
let myFileStream = fs.createWriteStream(filePath);
// from round: 0-${rounds}
for (let i = 0; i < rounds; i++) {
// assume array has ${chunk} elements left in this round
let tempChunk = chunk;
// if ${chunk} is to big i.e. i=3 -> chunk = 600 , but mainArray.length = 512
// This way we adjust the last round for "the leftovers"
if (mainArray.length < i*chunk) tempChunk = Math.abs(mainArray.length - i*chunk);
// slice it for this round
let tempArray = mainArray.slice(i*chunk, i*chunk + tempChunk);
// get stuff from DB
let docs = collection.find({ id: { "$in": tempArray}}).toArray();
docs.then(function(singleDoc){
// for each name in the doc
for (let j = 0; j < singleDoc.length; j++) {
// write to stream
myFileStream.write(singleDoc[j].name + "\n");
}
// declare round done (reduce tempRounds) and check if it hits 0
if (!--tempRounds) {
// if all rounds are done, end the stream
myFileStream.end();
// BAM! you done
console.log("Done")
}
});
}
The key is to use fs.WritableStreams :)
link here to docs

node.js looping through GETs with promise

I'm new to promises and I'm sure there's an answer/pattern out there but I just couldn't find one that was obvious enough to me to be the right one. I'm using node.js v4.2.4 and https://www.promisejs.org/
This should be pretty easy I think...I need to do multiple blocks of async in a specific order, and one of the middle blocks will be looping through an array of HTTP GETs.
//New Promise = asyncblock1 - FTP List, resolve the returned list array
//.then(asynchblock2(list)) - loop through list array and HTTP GET needed files
//.then(asynchblock3(list)) - update local log
I tried creating a new Promise, resolving it, passing the list to the .then, doing the GET loop, then the file update. I tried using a nested promise.all inside asynchblock2, but it's actually going in reverse order, 3, 2, and 1 due to the timing of those events. Thanks for any help.
EDIT: Ok, this is the pattern that I'm using which works, I just need a GET loop in the middle one now.
var p = new Promise((resolve, reject) => {
setTimeout(() => {
console.log('2 sec');
resolve(1);
},
2000);
}).then(() => {
return new Promise((resolve) => {
setTimeout(() => {
console.log('1.5 sec');
// instead of this section, here I'd like to do something like:
// for(var i = 0; i < dynamicarray.length; i++){
// globalvar[i] = ftpclient.getfile(dynamicarray[i])
// }
// after this loop is done, resolve
resolve(1);
},
1500);
});
}).then(() => {
return new Promise((resolve) => {
setTimeout(() => {
console.log('1 sec');
resolve(1);
},
1000);
});
});
EDIT Here is the almost working code!
var pORecAlert = (function(){
var pa;
var newans = [];
var anstodownload = [];
var anfound = false;//anfound in log file
var nexttab;
var lastchar;
var po;
var fnar = [];
var antext = '';
//-->> This section works fine; it's just creating a JSON object from a local file
try{
console.log('trying');
porfile = fs.readFileSync('an_record_files.json', 'utf8');
if(porfile == null || porfile == ''){
console.log('No data in log file - uploaded_files_data.json being initialized!');
plogObj = [];
}
else{
plogObj = JSON.parse(porfile);
}
}
catch(jpfp){
console.log('Error parsing log file for PO Receiving Alert: ' + jpfp);
return endPORecAlertProgram();
};
if((typeof plogObj) === 'object'){
console.log('an_record_files.json log file found and parsed for PO Receiving Alert!');
}
else{
return mkError(ferror, 'pORecAlert');
};
//finish creating JSON Object
pa = new Client();
pa.connect(ftpoptions);
console.log('FTP Connection for FTP Check Acknowledgement begun...');
pa.on('greeting', function(msg){
console.log('FTP Received Greeting from Server for ftpCheckAcknowledgement: ' + msg);
});
pa.on('ready', function(){
console.log('on ready');
//START PROMISE LIST
var listpromise = new Promise((reslp, rejlp) => {
pa.list('/public_html/test/out', false, (cerr, clist) => {
if(cerr){
return mkError(ferror, 'pORecAlert');
}
else{
console.log('Resolving clist');
reslp(clist);
}
});
});
listpromise.then((reclist) => {
ftpplist:
for(var pcl = 0; pcl < reclist.length; pcl++){
console.log('reclist iteration: ' + pcl);
console.log('checking name: ', reclist[pcl].name);
if(reclist[pcl].name.substring(0, 2) !== 'AN'){
console.log('Not AN - skipping');
continue ftpplist;
}
else{//found an AN
for(var plc = 0; plc < plogObj.length; plc++){
if(reclist[pcl].name === plogObj[plc].anname){
//console.log('Found reclist[pcl].name in local log');
anfound = true;
};
};
if(anfound === false){
console.log('Found AN file to download: ', reclist[pcl].name);
anstodownload.push(reclist[pcl].name);
};
};
};
console.log('anstodownload array:');
console.dir(anstodownload);
return anstodownload;
}).then((fnar) => {
//for simplicity/transparency, here is the array being overwritten
fnar = new Array('AN_17650_37411.699.txt', 'AN_17650_37411.700', 'AN_17650_37411.701', 'AN_17650_37411.702.txt', 'AN_17650_37411.801', 'AN_17650_37411.802.txt');
return Promise.all(fnar.map((gfname) => {
var nsalertnames = [];
console.log('Getting: ', gfname);
debugger;
pa.get(('/public_html/test/out/' + gfname), function(err, anstream){//THE PROBLEM IS THAT THIS GET GETS TRIGGERED AN EXTRA TIME FOR EVERY OTHER FILE!!!
antext = '';
console.log('Get begun for: ', gfname);
debugger;
if(err){
ferror.nsrest_trace = 'Error - could not download new AN file!';
ferror.details = err;
console.log('Error - could not download new AN file!');
console.log('************************* Exiting *************************')
logError(ferror, gfname);
}
else{
// anstream.on('data', (anchunk) => {
// console.log('Receiving data for: ', gfname);
// antext += anchunk;
// });
// anstream.on('end', () => {
// console.log('GET end for: ', gfname);
// //console.log('path to update - gfname ', gfname, '|| end text.');
// fs.appendFileSync(path.resolve('test/from', gfname), antext);
// console.log('Appended file');
// return antext;
// });//end end
};
});//get end
}));//end Promise.all and map
}).then((res99) => {
// pa.end();
// return Promise(() => {
console.log('end all. res99: ', res99);
// //res4(1);
// return 1;
// });
});
});
})();
-->> What happens here:
So I added the almost working code. What is happening is that for every other file, an additional Get request gets made (I don't know how it's being triggered), which fails with an "Unable to make data connection".
So for my iteration over this array of 6, there ends up being 9 Get requests. Element 1 gets requested (works and expected), then 2 (works and expected), then 2 again (fails and unexpected/don't know why it was triggered). Then 3 (works and expected), then 4 (works and expected), then 4 again (fails and unexpected) etc
What you need is Promise.all(); sample code for your app:
...
}).then(() => {
return Promise.all(arry.map(item => ftpclient.getFile(item)))
}).then((resultArray) => {
...
So thanks for the help (and the negative votes with no useful direction!).
I actually reached out to a good nodejs programmer and he said that there seemed to be a bug in the ftp module I was using, and even when trying to use a blackbird .map, the quick succession of requests somehow kicked off an error. I ended up using promise-ftp, blackbird, and promiseTaskQueue - the kicker was that I needed an interval. Without it the ftp would end up causing a strange, illogical error in the ftp module.
You need the async library. Use async.eachSeries in situations where you need to perform asynchronous operations within a loop and then execute a function when all of those are complete. There are many variations depending on the flow you want, but this library does it all.
https://github.com/caolan/async
async.eachSeries(theArrayToLoop, function(item, callback) {
// Perform async operation on item here.
doSomethingAsync(item).then(function(){
callback();
})
}, function(err){
//All your async calls are finished continue along here
});
