nodejs - Help "promisifying" a file read with nested promises - javascript

So I've recently delved into trying to understand promises and the purpose behind them, due to JavaScript's asynchronous behavior. While I "think" I understand, I still struggle with how to promisify something to return a future value, then execute a new block of code to do something else. The two main node modules I'm using:
pg-promise
exceljs
What I'd like to do is read a file, then once it's fully read, iterate over each worksheet executing DB commands. Then once all worksheets are processed, go back and delete the original file I read. Here is the code I have. I have it working to the point where everything writes into the database just fine, even when there are multiple worksheets. What I don't have working is identifying when all the worksheets have been fully processed, so I can then go remove the file.
workbook.csv.readFile(fileName)
    .then(function () {
        // This array I was going to use to somehow populate with true/false values.
        // Then when done with each sheet, push a true into the array.
        // When all elements were true, that could signify all the processing is done...
        // but I have no idea how to utilize this!
        // So I left it in to take up space because... wtf.
        var arrWorksheetComplete = [];
        workbook.eachSheet(function (worksheet) {
            console.log(worksheet.name);
            db.tx(function (t) {
                var insertStatements = [];
                for (var i = 2; i <= worksheet._rows.length; i++) {
                    // Here we create a new array from the worksheet row, as we need a 0-index-based array.
                    // The worksheet values actually begin at element 1, so we splice to dump the
                    // undefined element at index 0. This allows the batch promises to work
                    // correctly... otherwise everything would be offset by 1.
                    var arrValues = Array.from(worksheet.getRow(i).values);
                    arrValues.splice(0, 1);
                    // These queries are upserts: the insert occurs first, but if it errors
                    // on the constraint, an update occurs instead.
                    insertStatements.push(t.one('insert into rq_data' +
                        '(col1, col2, col3) ' +
                        'values($1, $2, $3) ' +
                        'ON CONFLICT ON CONSTRAINT key_constraint DO UPDATE SET ' +
                        '(prodname) = ' +
                        '($3) RETURNING autokey',
                        arrValues));
                }
                return t.batch(insertStatements);
            })
            .then(function (data) {
                console.log('Success:', 'Inserted/Updated ' + data.length + ' records');
            })
            .catch(function (error) {
                console.log('ERROR:', error.message || error);
            });
        });
    });
I would like to be able to say:
.then(function () {
    // everything processed!
    removeFile(fileName);
    // this probably also wouldn't work, as by now fileName is out of scope?
});
But I'm super confused about having a promise inside a promise. I have the db.tx call, which is essentially a promise, nested inside the .eachSheet function.
Please help a dumb programmer understand! I've been beating my head against the wall for hours on this one. :)
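The core trick with a promise inside a promise is to return the inner promise from the .then callback, which flattens the chain so the next .then waits for it. A minimal sketch with stand-in async steps (readFile and processSheet here are hypothetical placeholders for workbook.csv.readFile and db.tx):

```javascript
// Stand-in async steps; in the real code these would be
// workbook.csv.readFile and db.tx respectively.
function readFile(name) {
    return Promise.resolve(name + ' contents');
}
function processSheet(contents) {
    return Promise.resolve('processed ' + contents);
}

function run(fileName) {
    return readFile(fileName)
        .then(function (contents) {
            // Returning the inner promise flattens the nesting:
            // the next .then waits for processSheet to finish.
            return processSheet(contents);
        })
        .then(function (result) {
            // Runs only after both async steps have completed.
            return result;
        });
}

run('data.csv').then(function (r) {
    console.log(r); // 'processed data.csv contents'
});
```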

If I understand correctly, you're trying to chain promises.
I suggest reading this great article on Promise anti-patterns (see the 'The Collection Kerfuffle' section).
If you need to execute promises in series, that article suggests using reduce.
I'd rewrite your snippet as:
workbook.csv.readFile(fileName).then(function () {
    processWorksheets().then(function () {
        // all worksheets processed!
    });
});
function processWorksheets() {
    var worksheets = [];
    // first, build an array of worksheets
    workbook.eachSheet(function (worksheet) {
        worksheets.push(worksheet);
    });
    // then chain promises using Array.reduce
    return worksheets.reduce(function (promise, item) {
        // promise is the value returned by the previous invocation of the callback.
        // item is a worksheet.
        // When the previous promise resolves, call saveWorksheet on the next worksheet.
        return promise.then(function (result) {
            return saveWorksheet(item, result);
        });
    }, Promise.resolve()); // start the chain with a 'fake' promise
}
// this function returns a promise
function saveWorksheet(worksheet, result) {
    return db.tx(function (t) {
        var insertStatements = [];
        for (var i = 2; i <= worksheet._rows.length; i++) {
            // Create a new array from the worksheet row, as we need a 0-index-based array.
            // The worksheet values actually begin at element 1, so we splice to dump the
            // undefined element at index 0. This allows the batch promises to work
            // correctly... otherwise everything would be offset by 1.
            var arrValues = Array.from(worksheet.getRow(i).values);
            arrValues.splice(0, 1);
            // These queries are upserts: inserts occur first, but if they error on the
            // constraint, an update occurs instead.
            insertStatements.push(t.one('insert into rq_data' +
                '(col1, col2, col3) ' +
                'values($1, $2, $3) ' +
                'ON CONFLICT ON CONSTRAINT key_constraint DO UPDATE SET ' +
                '(prodname) = ' +
                '($3) RETURNING autokey',
                arrValues));
        }
        return t.batch(insertStatements);
    })
    // the two handlers below could be removed; note there is no need to wrap
    // them in new Promise(...) - plain return/throw keeps the chain intact
    .then(function (data) {
        console.log('Success:', 'Inserted/Updated ' + data.length + ' records');
    })
    .catch(function (error) {
        console.log('ERROR:', error.message || error);
        throw error; // keep the chain rejected with the original error
    });
}
Don't forget to include the promise module (only needed on old Node versions; current Node provides a global Promise):
var Promise = require('promise');
I haven't tested my code, so it could contain some typos.
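The reduce pattern above can be exercised end to end with plain values; saveItem below is a hypothetical stand-in for saveWorksheet, and the final .then is where the file cleanup (e.g. fs.unlink) would go:

```javascript
// Stand-in for saveWorksheet: resolves after recording the item.
function saveItem(item, out) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            out.push(item);
            resolve(out);
        }, 10);
    });
}

function processAll(items) {
    var out = [];
    // Chain one promise per item, starting from an already-resolved promise.
    return items.reduce(function (promise, item) {
        return promise.then(function () {
            return saveItem(item, out);
        });
    }, Promise.resolve(out));
}

processAll(['sheet1', 'sheet2', 'sheet3']).then(function (out) {
    // All items processed, in order; this is where the file cleanup
    // (e.g. fs.unlink(fileName, callback)) would go.
    console.log(out); // ['sheet1', 'sheet2', 'sheet3']
});
```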

Related

Array of objects in javascript not returning as expected

I have a function which returns a list of objects in Javascript, and I'm calling this function from another and attempting to use some of the values from it, but whenever I try to access said values, they come back undefined.
This is my function which generates the list - the idea is that it creates a sqlite3 database if it does not exist, and returns an array containing every event.
function listAllEvents() {
    const sqlite3 = require('sqlite3').verbose();
    const db = new sqlite3.Database('schedule.db');
    const selectionArray = [];
    db.serialize(() => {
        db.run(`
            CREATE TABLE IF NOT EXISTS todo (
                name text,
                date text,
                id text primary key
            )
        `);
        db.all('SELECT * FROM todo ORDER BY date', [], (err, rows) => {
            if (err) {
                throw err;
            }
            rows.forEach((row) => {
                selectionArray.push(row);
            });
        });
    });
    return selectionArray;
}
I call this function from another, but when I try to access values from the array, they don't seem to be working and I can't quite figure it out.
function displayUpcomingEvents() {
    const events = listAllEvents();
    // console.log(events); <-- This line here! In the console, it correctly states the length of the array
    // console.log(events.length) <-- This line, however, returns 0. Why?
    // console.log(events[0]) <-- This doesn't work either, it just returns "undefined".
    for (let i = 0; i < events.length; i += 1) {
        $('#upcomingEvents').after('<li>asdf</li>');
    }
}
For example, if I create two events in the database, then in the console:
events is an Array(2) with indices
- 0: {name: "my_event", date: "2019-06-04", id: "c017c392d446d4b2"}
- 1: {name: "my_event_2", date: "2019-06-04", id: "6d655ac8dd02e3fd"}
events.length returns 0,
and events[0] returns undefined.
Why is this, and what can I do to fix it?
The likely reason this is happening is the async nature of JS: all the console.log statements are executed before listAllEvents() has actually finished its work.
So my suggestion is to use promises, and perform all the actions that follow listAllEvents() only once the promise it returns has resolved.
You can also try making the function async and using await to wait for it to complete. (The smarter choice here is async.)
Link to ASYNC Functions and Usage
Link to Promises
You can also check the validity of this answer by doing console.log(row) where you push rows into the array. You will observe that console.log(row) executes last, after the events and other log statements have printed.
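The timing problem can be reproduced without a database; fakeDbAll below is a hypothetical stand-in for db.all that delivers rows via a callback on a later tick:

```javascript
// Stand-in for db.all: delivers rows via a callback on a later tick.
function fakeDbAll(callback) {
    setTimeout(function () {
        callback(null, [{ name: 'my_event' }]);
    }, 10);
}

// Broken: returns before the callback has run.
function listAllEventsBroken() {
    const selectionArray = [];
    fakeDbAll(function (err, rows) {
        rows.forEach(function (row) { selectionArray.push(row); });
    });
    return selectionArray; // still empty at this point
}

// Fixed: wrap the callback in a promise and wait for it.
function listAllEvents() {
    return new Promise(function (resolve, reject) {
        fakeDbAll(function (err, rows) {
            if (err) return reject(err);
            resolve(rows);
        });
    });
}

console.log(listAllEventsBroken().length); // 0
listAllEvents().then(function (events) {
    console.log(events.length); // 1
});
```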
The problem is that your function returns the variable before a value is set. The db.serialize function runs asynchronously (outside the normal flow of the program) and the return statement runs immediately after. One thing you can do is use async/await in conjunction with a Promise. In this case the variable results will wait for the promise to be resolved before continuing to the next line.
async function listAllEvents() {
    const selectionArray = [];
    let promise = new Promise(function (resolve, reject) {
        db.serialize(() => {
            db.run(`
                CREATE TABLE IF NOT EXISTS todo (
                    name text,
                    date text,
                    id text primary key
                )
            `);
            db.all('SELECT * FROM todo ORDER BY date', [], (err, rows) => {
                if (err) {
                    reject(err); // reject the promise instead of throwing
                    return;
                }
                rows.forEach((row) => {
                    selectionArray.push(row);
                });
                resolve(selectionArray); // resolve the promise
            });
        });
    });
    let results = await promise;
    return results;
}
async function displayUpcomingEvents() {
    const events = await listAllEvents();
    // console.log(events); <-- This line here! In the console, it correctly states the length of the array
    // console.log(events.length) <-- This line, however, returns 0. Why?
    // console.log(events[0]) <-- This doesn't work either, it just returns "undefined".
    for (let i = 0; i < events.length; i += 1) {
        $('#upcomingEvents').after('<li>asdf</li>');
    }
}
Note here that the displayUpcomingEvents function will also need to be async, or you cannot use the await keyword.
Additional reading for Promise: MDN: Promise
Additional reading for Async/Await: MDN: Async/Await

How do I parse multiple pages?

I have been attempting to parse a site's table data into a JSON file, which I can do if I do each page one by one, but seeing as there are 415 pages that would take a while.
I have seen and read a lot of StackOverflow questions on this subject but I don't seem able to modify my script so that it:
1. Scrapes each page and extracts the 50 items with item IDs per page
2. Does so in a rate-limited way so I don't negatively affect the server
3. Waits until all requests are done, so I can write each item + item ID to a JSON file.
I believe you should be able to do this using request-promise and Promise.all, but I cannot figure it out.
The actual scraping of the data is fine; I just cannot make the code scrape a page, then go to the next URL with a delay or pause between requests.
Code below is the closest I have got, but I get the same results multiple times and I cannot slow the request rate down.
Example of the page URLS:
http://test.com/itemlist/1
http://test.com/itemlist/2
http://test.com/itemlist/3 etc (upto 415)
for (var i = 1; i <= noPages; i++) {
    urls.push({url: itemURL + i});
    console.log(itemURL + i);
}

Promise.map(urls, function (obj) {
    return rp(obj).then(function (body) {
        var $ = cheerio.load(body);
        // Some calculations again...
        rows = $('table tbody tr');
        $(rows).each(function (index, row) {
            var children = $(row).children();
            var itemName = children.eq(1).text().trim();
            var itemID = children.eq(2).text().trim();
            var itemObj = {
                "id": itemID,
                "name": itemName
            };
            itemArray.push(itemObj);
        });
        return itemArray;
    });
}, {concurrency: 1}).then(function (results) {
    console.log(results);
    for (var i = 0; i < results.length; i++) {
        // access the result's body via results[i]
        //console.log(results[i]);
    }
}, function (err) {
    // handle all your errors here
    console.log(err);
});
Apologies if I've misunderstood node.js and its modules; I don't really use the language, but I needed to scrape some data and I really don't like Python.
Since you need the requests to run one by one, Promise.all() would not help.
A recursive promise (I'm not sure if that's the correct name) would.
function fetchAllPages(list) {
    if (!list || !list.length) return Promise.resolve(); // trivial exit
    var urlToFetch = list.pop();
    return fetchPage(urlToFetch)
        .then(<wrapper that returns a Promise resolved after a delay>)
        .then(function () {
            return fetchAllPages(list); // recursion!
        });
}
This code still lacks error handling.
I also believe it can become much clearer with async/await:
for (let url of urls) {
    await fetchAndProcess(url);
    await <wrapper around setTimeout>;
}
but you need to find/write your own implementations of fetch() and setTimeout() that are async (promise-based).
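A concrete version of that loop, with a promise wrapper around setTimeout and a stand-in fetchAndProcess (both hypothetical names carried over from the sketch above):

```javascript
// Promise wrapper around setTimeout.
function sleep(ms) {
    return new Promise(function (resolve) {
        setTimeout(resolve, ms);
    });
}

// Stand-in for the real page fetch + parse.
const processed = [];
async function fetchAndProcess(url) {
    processed.push(url);
}

async function fetchAllPages(urls, delayMs) {
    for (const url of urls) {
        await fetchAndProcess(url); // one request at a time
        await sleep(delayMs);       // pause before the next one
    }
    return processed;
}

fetchAllPages(['http://test.com/itemlist/1',
               'http://test.com/itemlist/2'], 20)
    .then(function (done) {
        console.log(done.length); // 2
    });
```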
After input from @skyboyer suggesting recursive promises, I was led to a GitHub Gist called Sequential execution of Promises using reduce().
Firstly I created my array of URLs:
for (var i = 1; i <= noPages; i++) {
    // example: urls[0] = "http://test.com/1"
    // example: urls[1] = "http://test.com/2"
    urls.push(itemURL + i);
    console.log(itemURL + i);
}
Then
var sequencePromise = urls.reduce(function (promise, url) {
    return promise.then(function (results) {
        // fetchIDsFromURL is an async function (it returns a promise);
        // when the promise resolves I have my page data
        return fetchIDsFromURL(url)
            // wait out the delay, then pass the page data along; note the
            // delay must be started inside a callback, otherwise .then
            // ignores it and the chain never actually waits
            .then(function (itemArr) {
                return promiseWithDelay(9000).then(function () {
                    return itemArr;
                });
            })
            .then(itemArr => {
                results.push(itemArr);
                // returning inside .then passes the data on to the next link in the chain
                return results;
            });
    });
}, Promise.resolve([]));
// async
function fetchIDsFromURL(url) {
    return new Promise(function (resolve, reject) {
        request(url, function (err, res, body) {
            //console.log(body);
            var $ = cheerio.load(body);
            rows = $('table tbody tr');
            $(rows).each(function (index, row) {
                var children = $(row).children();
                var itemName = children.eq(1).text().trim();
                var itemID = children.eq(2).text().trim();
                var itemObj = {
                    "id": itemID,
                    "name": itemName
                };
                // push the 50 scraped items per page into an array (note: itemArray
                // is shared across pages) and resolve with the array to send the
                // data back from the promise
                itemArray.push(itemObj);
            });
            resolve(itemArray);
        });
    });
}
// returns a promise that resolves after the timeout
// (no clearTimeout needed - the original version called clearTimeout on the
// Promise object itself, which is a no-op)
function promiseWithDelay(ms) {
    return new Promise(function (resolve) {
        setTimeout(resolve, ms);
    });
}
Then finally I call .then on the sequence of promises. The only issue I had was that results contained multiple arrays with the same data in each (because every page pushes into the same shared itemArray), so since all the data is the same in each array, I just take the first one, which has all my parsed items with IDs in it, and write it to a JSON file.
sequencePromise.then(function (results) {
    var lastResult = results.length;
    console.log(results[0]);
    writeToFile(results[0]);
});
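The duplicated arrays come from pushing every page's rows into one shared itemArray; a variant where each fetch resolves with its own fresh array avoids that (fakeFetch is a hypothetical stand-in for fetchIDsFromURL):

```javascript
// Stand-in for fetchIDsFromURL: each call resolves with its own array.
function fakeFetch(url) {
    return Promise.resolve([{ id: url, name: 'item from ' + url }]);
}

function fetchAllInSequence(urls) {
    return urls.reduce(function (promise, url) {
        return promise.then(function (results) {
            return fakeFetch(url).then(function (pageItems) {
                // pageItems belongs to this page only, so no duplicates.
                return results.concat(pageItems);
            });
        });
    }, Promise.resolve([]));
}

fetchAllInSequence(['u1', 'u2']).then(function (all) {
    console.log(all.length); // 2
});
```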

Node js : execute function with all iteration

Maybe this is a general issue, and I need a solution for my case: due to the non-blocking nature of JavaScript, I can't find a way to execute my function with every iteration of a for loop. Here is my example:
var text_list = [];
for (var i = 0; i < 10; i++) {
    var element = array[index];
    tesseract.process("img" + i + ".jpg", options, function (err, text) {
        if (err) {
            return console.log("An error occured: ", err);
        }
        text_list.push(text);
    });
}
console.log(text_list);
And the result is as if I had done:
tesseract.process("img"+9+".jpg"...
tesseract.process("img"+9+".jpg"...
tesseract.process("img"+9+".jpg"...
.
.
.
and what I need is:
tesseract.process("img"+0+".jpg"...
tesseract.process("img"+1+".jpg"...
tesseract.process("img"+2+".jpg"...
.
.
.
Your question does not really explain what result you are getting, and your code looks like it's missing parts. So, all I can really do here to help is explain generically (using your code where possible) how to solve this class of problem.
If you are ending up with a lot of results that all reference the last value of i in your loop, then you are probably trying to reference i in an async callback, but because the callback is called some time later, the for loop has already finished long before the callback executes. Thus, your value of i is sitting on the last value it had in the for loop. But your question doesn't actually show code that does that, so this is just a guess based on the limited result you describe. To solve that type of issue, you have to make sure you're separately keeping track of i for each iteration of the loop. There are many ways to do that. In ES6, using let in the for loop definition solves the entire issue for you. One can also construct a closure, use .forEach(), etc.
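The var-versus-let capture difference described above is easy to demonstrate with a deferred callback:

```javascript
function collect(keyword) {
    return new Promise(function (resolve) {
        const seen = [];
        // With `var`, every callback sees the final value of the counter;
        // with `let`, each iteration gets its own binding.
        if (keyword === 'var') {
            for (var i = 0; i < 3; i++) {
                setTimeout(function () { seen.push(i); }, 0);
            }
        } else {
            for (let j = 0; j < 3; j++) {
                setTimeout(function () { seen.push(j); }, 0);
            }
        }
        setTimeout(function () { resolve(seen); }, 10);
    });
}

collect('var').then(function (s) { console.log(s); }); // [3, 3, 3]
collect('let').then(function (s) { console.log(s); }); // [0, 1, 2]
```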
Async operations with a loop require extra work and coding to deal with. The modern solution is to convert your async operations to use promises and then use features such as Promise.all() to both tell you when all the async operations are done and to keep the results in order for you.
You can also code it manually without promises. Here's a manual version:
const len = 10;
let text_list = new Array(len);
let doneCnt = 0;
let errFlag = false;

// using let here so each invocation of the loop gets its own value of i
for (let i = 0; i < len; i++) {
    tesseract.process("img" + i + ".jpg", options, function (err, text) {
        if (err) {
            console.log("An error occured: ", err);
            // make sure err is wrapped in an Error object
            // so you can tell errors in the text_list array from values
            if (!(err instanceof Error)) {
                err = new Error(err);
            }
            text_list[i] = err;
            errFlag = true;
        } else {
            text_list[i] = text;
        }
        // see if we're done with all the requests
        if (++doneCnt === len) {
            if (errFlag) {
                // deal with situation where there were some errors
            } else {
                // put code here to process finished text_list array
            }
        }
    });
}
// you can't process results here because the async operations are not
// done yet when this code runs
Or, using promises, you can make a "promisified" version of tesseract.process() and then use promise functionality to track multiple async operations:
// make a promisified version of tesseract.process()
tesseract.processP = function (img, options) {
    return new Promise(function (resolve, reject) {
        tesseract.process(img, options, function (err, text) {
            if (err) {
                reject(err);
            } else {
                resolve(text);
            }
        });
    });
};

const len = 10;
let promises = [];
for (let i = 0; i < len; i++) {
    promises.push(tesseract.processP("img" + i + ".jpg", options));
}
Promise.all(promises).then(function (results) {
    // process results array here (in order)
}).catch(function (err) {
    // handle error here
});

Optimising Parse.com object creation

I have a problem with performance in a relatively simple parse.com routine:
We have two classes and we want to make a cross product of them. One class contains a single object of boilerplate for the new objects (description etc.); the other class contains "large" sets (only 1000s of objects) of variable data (name, geopoint etc.). Each new object also has some of its own columns, not just data from its parents.
To do this, we query the second class and perform an each operation. In each callback, we create and populate our new object and give it pointers to its parents. We call save() on each object (with some then() clauses for retries and error handling) and push the returned promise into an array. Finally, we return status inside a when() promise on that array of save promises.
We originally created all the objects and then performed a saveAll on them, but couldn't get good enough error handling out of it - so we moved to when() with chains of promises and retries.
Trouble is, it's slow. It doesn't feel like the type of thing a NoSQL database should be slow at, so we're blaming our design.
What's the best practice for cloning a bunch of objects from one class to another? Or is it possible to get better results from saveAll failures?
My current code looks like this:
var doMakeALoadOfObjects = function (aJob, aCaller) {
    Parse.Cloud.useMasterKey();
    return aJob.save().then(function (aJob) {
        theNumTasksPerLoc = aJob.get("numberOfTasksPerLocation");
        if (theNumTasksPerLoc < 1) {
            theNumTasksPerLoc = 1;
        }
        var publicJob = aJob.get("publicJob");
        return publicJob.fetch();
    }).then(function (publicJob) {
        var locationList = aJob.get("locationList");
        return locationList.fetch();
    }).then(function (locationList) {
        publicReadACL = new Parse.ACL();
        publicReadACL.setPublicReadAccess(true);
        publicReadACL.setRoleReadAccess("Admin", true);
        publicReadACL.setRoleWriteAccess("Admin", true);
        var taskSaverArray = [];
        // Can't create a promise chain inside the loop so use this function.
        var taskSaver = function (task) {
            return task.save().then(function success() {
                numTasksMade++;
            },
            function errorHandler(theError) {
                numTimeOuts++;
                numTaskCreateFails++;
                logger.log("FAIL: failed to make a task for job " + aJob.get("referenceString") + " Error: " + JSON.stringify(theError));
            });
        };
        var taskSaverWithRetry = function (task) {
            return task.save().then(function () {
                numTasksMade++;
                return Parse.Promise.as();
            }, function (error) {
                logger.log("makeJobLive: FAIL saving task. Will try again. " + JSON.stringify(error));
                numTimeOuts++;
                return task.save().then(function () {
                    numTasksMade++;
                    return Parse.Promise.as();
                }, function (error) {
                    numTimeOuts++;
                    numTaskCreateFails++;
                    logger.log("makeJobLive: FAIL saving task. Give up. " + JSON.stringify(error));
                    return Parse.Promise.as();
                });
            });
        };
        for (var j = 0; j < theNumTasksPerLoc; j++) {
            var Task = Parse.Object.extend("Task");
            var task = new Task();
            task.set("column", stuff);
            // Can't create a promise chain in the loop so use the function above.
            taskSaverArray.push(taskSaverWithRetry(task));
        }
        return Parse.Promise.when(taskSaverArray);
    }).then(function () {
    }).then(function () {
        // happy happy
    }, function (error) {
        // we never land here.
    });
};
I say "looks" like because I've deleted a lot of the object-creation code and some housekeeping we do at the same time. I may have deleted some variable definitions too, so I doubt this would run as-is.
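The inline retry chain above can be factored into a small reusable helper; retry below is a generic sketch (not Parse-specific), shown against a hypothetical flakySave that fails twice before succeeding:

```javascript
// Retry a promise-returning operation up to `attempts` times.
function retry(operation, attempts) {
    return operation().catch(function (error) {
        if (attempts <= 1) {
            throw error; // out of retries, give up
        }
        return retry(operation, attempts - 1);
    });
}

// Example: an operation that fails twice, then succeeds.
let calls = 0;
function flakySave() {
    calls++;
    return calls < 3
        ? Promise.reject(new Error('timeout'))
        : Promise.resolve('saved');
}

retry(flakySave, 5).then(function (result) {
    console.log(result, 'after', calls, 'attempts'); // saved after 3 attempts
});
```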

Building a dynamic array of functions for Q.all() in Jscript

I'm trying to pass a variable number of functions into Q.all().
It works fine if I code the array manually - however I want to build it up in a loop, as the system won't know how many times to call the function until runtime - and it needs to pass a different ID into each AJAX call.
I've tried various methods with no success (e.g. array[i] = function() {func}) - I guess eval() could be a last resort.
Any help would be massively appreciated.
// Obviously this array loop won't work as it just executes the functions in the loop,
// but the idea is to build up an array of functions to pass into Q
var arrayOfFunctions = [];
for (var i in NumberOfPets) {
    arrayOfFunctions[i] = UpdatePets(i);
}
// Execute sequence of Ajax calls
Q.try(CreatePolicy)
    .then(updateCustomer)
    .then(function () {
        // This doesn't work - Q just ignores it
        return Q.all(arrayOfFunctions);
        // This code below works fine (waits for all pets to be updated) - I am passing in the ID of the pet to be updated
        // - But how can I create and pass in a dynamic array of functions to achieve this?
        // return Q.all([UpdatePets(1), UpdatePets(2), UpdatePets(3), UpdatePets(4), UpdatePets(5), UpdatePets(5)]);
    })
    .then(function () {
        // do something
    })
    .catch(function (error) {
        // error handling
    })
    .done();
Thanks in advance.
Q.all doesn't expect an array of functions, but an array of promises. Use
Q.try(CreatePolicy)
    .then(updateCustomer)
    .then(function () {
        var arrayOfPromises = [];
        var numberOfPets = pets.length;
        for (var i = 0; i < numberOfPets; i++)
            arrayOfPromises[i] = updatePet(pets[i], i); // or something
        return Q.all(arrayOfPromises);
    })
    .then(function () {
        // do something
    })
    .catch(function (error) {
        // error handling
    });
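The same array can be built with map, which makes it clearer that the functions are being called (producing promises), not passed as functions; updatePet here is a stand-in for the real AJAX call:

```javascript
// Stand-in for the real AJAX call; resolves with a message for the pet's id.
function updatePet(id) {
    return Promise.resolve('updated pet ' + id);
}

const petIds = [1, 2, 3];
// map calls updatePet immediately for each id; the resulting array
// holds promises, which is exactly what Q.all / Promise.all wants.
const arrayOfPromises = petIds.map(function (id) {
    return updatePet(id);
});

Promise.all(arrayOfPromises).then(function (results) {
    console.log(results); // ['updated pet 1', 'updated pet 2', 'updated pet 3']
});
```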
