I have found that the write() method of the stream.Writable class does not write data sequentially. When I am sending an attachment to the server in chunks, this code assembles the data chunks in the wrong order if no delay occurs. If I put a debug message like console.log() in the middle of the loop (to dump the data and watch what is being written), the bug disappears. So what is the race condition in this code? It looks like I am enforcing sequential assembly of the file, so I do not understand what is wrong.
My code:
function join_chunks(company_id,attachment_id,num_chunks) {
var stream;
var file;
var output_filename=ATTACHMENTS_PATH + '/comp' + company_id + '/' + attachment_id + '.data';
var input_filename;
var chunk_data;
var chunk_count=0;
stream=fs.createWriteStream(output_filename,{flags:'w+',mode: 0666});
console.log('joining files:');
for(var i=0;i<num_chunks;i++) {
input_filename=ATTACHMENTS_PATH + '/comp' + company_id + '/' + attachment_id + '-' + (i+1) + '.chunk';
console.log(input_filename);
fs.readFile(input_filename , (err, chunk_data) => {
if (err) throw err;
stream.write(chunk_data,function() {
chunk_count++;
if (chunk_count==num_chunks) {
console.log('join finished. closing stream');
stream.end();
}
});
});
}
}
The console:
joining files:
/home/attachments/comp-2084830518/67-1.chunk
/home/attachments/comp-2084830518/67-2.chunk
/home/attachments/comp-2084830518/67-3.chunk
/home/attachments/comp-2084830518/67-4.chunk
join finished. closing stream
Node version: v6.9.2
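stream.write itself is not the problem: writes to a single stream are queued and flushed in the order in which you call write(). What is asynchronous here is fs.readFile. The loop starts all the reads at once, and their callbacks fire in whatever order the reads happen to complete, so chunk 3 can reach stream.write before chunk 2. Adding a console.log most likely just changes the timing, which is why the bug seems to disappear.
If you want the chunks to be written in order, start reading chunk i+1 only after chunk i has been read and written (for example from the callback argument of stream.write), or read the chunks with fs.readFileSync. Note that stream.Writable has no writeSync method.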
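A minimal sketch of the sequential approach, assuming the chunk paths are already known in the right order. The helper name joinChunksInOrder and its done callback are made up for illustration and are not part of the original code:

```javascript
const fs = require('fs');

// Hypothetical helper: appends each file in chunkPaths to outputPath, strictly in array order.
function joinChunksInOrder(chunkPaths, outputPath, done) {
  const out = fs.createWriteStream(outputPath);
  let settled = false;

  // Report the outcome exactly once, whichever event arrives first.
  function finish(err) {
    if (settled) return;
    settled = true;
    done(err || null);
  }
  out.on('error', finish);
  out.on('finish', () => finish(null));

  function next(i) {
    if (settled) return;
    if (i === chunkPaths.length) {
      out.end(); // all chunks written; 'finish' fires once the data is flushed
      return;
    }
    // The next read starts only after the previous chunk has been written.
    fs.readFile(chunkPaths[i], (err, data) => {
      if (err) {
        out.destroy();
        return finish(err);
      }
      out.write(data, (writeErr) => {
        if (!writeErr) next(i + 1);
      });
    });
  }
  next(0);
}
```

In the question's code, chunkPaths would be the list of '-1.chunk', '-2.chunk', ... file names built in the for loop, and the 'join finished' message would go into the done callback.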
Related
I have a function that generates a bunch of data into a DB for testing, then at the end sends a textual response to the browser. I'm new to node and see the response and text is printing before any of the execution has even completed.
I've tried a bunch of different ways to see sequential execution, below is one of the crazy poor ways, but still not working. What can I do to ensure the for-loop data completes, c.end() executes and then DONE is called?
//Generate the new data to persist into the database
function data_generation(call_back) {
//Force generate random entries into the test DB (eventually change to monitor individual key requests)
for (var i = 0; i < NUM_ENTRIES; i++) {
c.query("INSERT INTO individual_key_log \
(reference_key, access_count, last_updated, last_ip) \
VALUES ('Johnson" + Math.random()*100 + "', " + (Math.random()*100) +", '"+Date.now()+"', '"+req.connection.remoteAddress+"'\
);",
function(err, rows) {
if (err) {
throw err;
}
else {
console.log("Counter: " + counter);
counter++;
}
});
call_back();
}
};
function completion_func() {
console.log("DONE");
res.send("done");
};
data_generation(completion_func);
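One possible way to get the behaviour asked for here is to count finished callbacks and call the completion function only when the last one reports back, instead of calling call_back() inside the loop. The helper below is a sketch under that assumption; the name afterAll and its signature are invented for illustration:

```javascript
// Hypothetical helper: starts `count` async jobs and calls `done` once,
// after every job has called its callback (or as soon as one fails).
function afterAll(count, job, done) {
  if (count === 0) return done(null);
  let pending = count;
  let failed = false;
  for (let i = 0; i < count; i++) {
    job(i, (err) => {
      if (failed) return;          // an earlier job already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      pending -= 1;
      if (pending === 0) done(null); // the last job has finished
    });
  }
}
```

With the code above, job would issue one c.query(...) and pass its callback through, and done would call c.end() and then completion_func.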
I got a pretty awkward problem.
I create a pool, connect to the database, create a connection and query, get the results, do a bunch of stuff, then I have to create another connection and query, but actually it has to be dynamically so I loop over my Array teacherHours containing the data.
Then more Code is happening, and I have to create an extra loop, because certain elements of my teacherHours Array have to try multiple times to get the correct response from the upcoming query.
So another loop follows, which is supposed to loop as long as availableHours > 0. Now, here is where it all goes left.
A bunch of code happens inside the second loop, I prepare my second query, call connection.query() and inside of the callback function I prepare my third query (after doing some other stuff) and this is actually where Node kicks me out.
It gives me TypeError: Cannot read property 'tid' of undefined. tid needs to be accessed for my third query, so I try to access it just like I did before but Node doesn't allow it.
I know that the queries return useful data (rows) so it can't be a problem of querying but receiving no data. Actually I console.log("the rowRIDS"+rowRIDS); the result of the second query and I see that it returns 2 rows and just after that it gives me the error.
What is also strange to me: all the console.logs inside my two loops are logged first, and the console.log of the second query (containing the 2 rows) is logged only after the loops have run through. Since the queries are nested, shouldn't the 2 returned rows and the error appear within the first iteration of the loop, since the code should reach the second query at that point?
BTW, I've tried to set a hardcoded number instead of the tid just to get the next property datum to be an error. I kind of got a feeling as if the variable teacherHours is out of scope, but it is supposed to be a global variable.
To give a better feeling of what I'm talking about, I copied the code and commented out all the JavaScript where I populate and calculate stuff. Any help would be really great; it's been almost 7 hours of trial and error without any luck. Thank you!
pool.getConnection(function(err, connection){
if (err) throw err;
connection.query('SELECT * FROM teachers_teaching_tbl WHERE fwdid = 1 ', function(err, rows, fields) {
if (err) {
console.error('error querying: ' + err.stack);
return;
}
rowArray=rows;
console.log(rowArray);
//
// HERE HAPPENS
// A LOOOOT OF STUFF
//
// AND teacherHours IS BEING POPULATED
//
// THEN COMES A FOR LOOP
for(var i=0; i<teacherHours.length;i++){
//
// MORE STUFF
//
//AND ANOTHER LOOP
while(availableHours>0){//do{ ORIGINALLY I TRIED WITH A DO WHILE LOOP
//
// AGAIN A BUNCH OF STUFF
//
// NOW I'M PREPARING MY NEXT QUERY
//
var myQueryGetFreeRoom=" SELECT rms.rid FROM rooms_tbl as rms WHERE NOT EXISTS ( ";
myQueryGetFreeRoom+=" SELECT NULL FROM classes_tbl as cls ";
myQueryGetFreeRoom+=" WHERE ( (cls.bis > '"+bisMinus1+"' AND cls.bis <= '"+realBis+"' ) OR ( cls.von > '"+bisMinus1+"' AND cls.von < '"+realBis+"' ) ) AND (cls.datum = '"+teacherHours[i].datum.getFullYear()+"-"+(teacherHours[i].datum.getMonth()+1)+"-"+teacherHours[i].datum.getDate()+"') AND (cls.rid=rms.rid) ) ";
//
//
connection.query(myQueryGetFreeRoom, function(err, rowRIDS, fields) {
if (err) {
console.error('error querying: ' + err.stack);
return;
}
roomIDs=rowRIDS;
console.log("the rowRIDS"+rowRIDS);
//
// MORE STUFF
// HAPPENING
//
if(roomIDs.length>0){
//
// PREPARING QUERY NO.3 - WHICH IS WHERE MY ERROR POINTS - TO THE USE OF tid PROPERTY
//
var myQueryBookClass = " INSERT INTO classes_tbl ( rid , tid , belegtAnz, datum, von , bis ) ";
myQueryBookClass+=" VALUES ( "+Math.floor(Math.random() * roomIDs.length)+", "+teacherHours[i].tid+" , 0, '"+teacherHours[i].datum.getFullYear()+"-"+(teacherHours[i].datum.getMonth()+1)+"-"+teacherHours[i].datum.getDate()+"' , '"+bisMinus1+"' , '"+realBis+"' ) ";
console.log("myQueryBookClass: "+myQueryBookClass);
availableHours = 0;
//
// HERE WAS SUPPOSED TO FOLLOW QUERY 3 - myQueryBookClass
//
// BUT SINCE I DONT EVEN GET INSIDE HERE IT IS IN COMMENTS
//
/*connection.query(myQueryBookClass, function(err, insertRows, fields){
if(err){
console.error('error querying: '+err.stack);
return;
}
console.log("Inserted Rows: "+ insertRows);
}); */
} else {
availableHours= availableHours - 1;
//
// STUFF HAPPENING
//
}
});
availableHours= availableHours - 1;
}//while(availableHours>0);
//
}
connection.release(function(err){
if (err){
console.error('error disconnecting: ' + err.stack);
return;
}
});
});
});
I think you are coming from a non-async language like Python, Java, etc. which is why Node, i.e. JavaScript, seems to screw things up for you, but actually it isn't.
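The problem in your code is that you start async functions like query inside a synchronous while loop, so all of them are issued at once and none of their callbacks runs until the loops have finished. By then i equals teacherHours.length, so teacherHours[i] is undefined inside the callback, which is exactly the TypeError: Cannot read property 'tid' of undefined you are seeing. You need to use a module like async, which helps to run async functions in sequence and collect their results.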
Here is the updated code.
var async = require('async'),
connection;
async.waterfall([
function (cb) {
pool.getConnection(cb);
},
function (conn, cb) {
connection = conn;
connection.query('SELECT * FROM teachers_teaching_tbl WHERE fwdid = 1', cb);
},
function (rows, fields, cb) {
rowArray = rows;
console.log(rowArray);
// HERE HAPPENS
// A LOOOOT OF STUFF
//
// AND teacherHours IS BEING POPULATED
//
// THEN COMES A FOR LOOP
async.eachSeries(teacherHours, function (teacherHour, done) {
// MORE STUFF
//
//AND ANOTHER LOOP
async.whilst(function () {
return availableHours > 0;
}, function (cb) {
// AGAIN A BUNCH OF STUFF
//
// NOW I'M PREPARING MY NEXT QUERY
//
var myQueryGetFreeRoom =
"SELECT rms.rid FROM rooms_tbl AS rms WHERE NOT EXISTS ("
+ "SELECT NULL FROM classes_tbl AS cls"
+ " WHERE ("
+ "(cls.bis > '" + bisMinus1 + "' AND cls.bis <= '" + realBis + "')"
+ " OR (cls.von > '" + bisMinus1 + "' AND cls.von < '" + realBis + "')"
+ ") AND ("
+ "cls.datum = '" + teacherHour.datum.getFullYear() + "-" + (teacherHour.datum.getMonth() + 1) + "-" + teacherHour.datum.getDate() + "'"
// The final ")" closes the NOT EXISTS ( ... ) subquery.
+ ") AND cls.rid = rms.rid)";
async.waterfall([
function (cb) {
connection.query(myQueryGetFreeRoom, cb);
},
function(rowRIDS, fields, cb) {
roomIDs = rowRIDS;
console.log("the rowRIDS" + rowRIDS);
//
// MORE STUFF
// HAPPENING
//
if (roomIDs.length > 0) {
//
// PREPARING QUERY NO.3 - WHICH IS WHERE MY ERROR POINTS - TO THE USE OF tid PROPERTY
//
var myQueryBookClass = "INSERT INTO classes_tbl (rid, tid, belegtAnz, datum, von, bis) VALUES ("
+ Math.floor(Math.random() * roomIDs.length)
+ ", " + teacherHour.tid
+ ", 0, '" + teacherHour.datum.getFullYear() + "-" + (teacherHour.datum.getMonth() + 1) + "-" + teacherHour.datum.getDate() + "', '" + bisMinus1 + "', '" + realBis + "')";
console.log("myQueryBookClass: " + myQueryBookClass);
availableHours = 0;
//
// HERE WAS SUPPOSED TO FOLLOW QUERY 3 - myQueryBookClass
//
// BUT SINCE I DONT EVEN GET INSIDE HERE IT IS IN COMMENTS
//
connection.query(myQueryBookClass, function (err, insertRows, fields) {
if (err) {
console.error('error querying: '+err.stack);
// Pass the error on; returning without calling cb would stall the whole flow.
return cb(err);
}
console.log("Inserted Rows: "+ insertRows);
// Here do whatever you need to do, then call the callback;
cb();
});
} else {
--availableHours;
//
// STUFF HAPPENING
//
cb();
}
}
], function (err) {
if (!err) {
// Notice that you are decrementing the `availableHours` twice here and above.
// Make sure this is what you want.
--availableHours;
}
cb(err);
});
}, done);
}, cb); // once every teacherHour has been processed, continue to the final callback
}
], function (err) {
// Release the connection exactly once, whether or not an error occurred.
if (connection) { connection.release(); }
if (err) { throw err; }
});
Next time, please format your code properly for better readability, which will help you get answers faster, and split your question text into paragraphs for the same purpose.
Explanation
There are four nested async flows:
async.waterfall
-> async.eachSeries
-> async.whilst
-> async.waterfall
Basically, the async.waterfall function allows you to execute a list of functions in series.
Every next function will be executed only after the previous function has returned the response.
To indicate that a function is completed and the results are available, it has to call the callback, in our case it is cb (you can call it whatever you like, eg. callback). The rule is to call it, otherwise, the next function will never be executed because the previous one doesn't seem to have finished its work.
Once the previous function has completed, it calls the provided cb with the following signature:
cb(err, connection);
If there was an error while requesting a connection, the entire async.waterfall is interrupted and the final callback function is executed with that error.
If there was no error, the connection is provided as the second argument. The async module passes all results of the previous function to the next function as its first, second, etc. arguments, which is why the second function receives conn as its first argument.
Every next function will receive the callback cb as the last argument, which you must eventually call when the job is done.
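To make the argument passing concrete, here is a tiny hand-rolled imitation of the waterfall pattern. It is only a sketch for illustration (the name miniWaterfall is made up) and not the async module's actual implementation:

```javascript
// Hypothetical imitation of the waterfall pattern: each step receives the
// results of the previous step plus a callback as its last argument.
function miniWaterfall(steps, finalCb) {
  function run(index, args) {
    if (index === steps.length) {
      return finalCb(null, ...args); // all steps done: hand over the last results
    }
    steps[index](...args, (err, ...results) => {
      if (err) return finalCb(err);  // an error skips the remaining steps
      run(index + 1, results);
    });
  }
  run(0, []);
}

miniWaterfall([
  (cb) => cb(null, 'connection'),                      // like pool.getConnection(cb)
  (conn, cb) => cb(null, ['row1', 'row2'], 'fields'),  // like connection.query(sql, cb)
  (rows, fields, cb) => cb(null, rows.length),
], (err, count) => {
  if (err) throw err;
  console.log('rows received:', count);
});
```

The second step gets conn as its first argument because the first step called cb(null, 'connection'); the leading error argument is stripped off before the results are passed on.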
Thus, in the first async.waterfall flow:
It requests a new database connection.
Once the connection is available, the next function is executed which sends a query to the database.
Waits for the query results, then once the results are available, it is ready to run the next function which iterates over each row.
async.eachSeries lets you iterate over a given array of values sequentially.
In the second async.eachSeries flow:
It iterates over each element in the teacherHours array sequentially.
Once each element is processed (however you want), you must call the done callback. Again, you could have called this as cb like in the previous async.waterfall or callback. done is just for clarity that the process is done.
Then we have the async.whilst which provides the same logic as the normal while () {} statement but handles the loop asynchronously.
In this third async.whilst flow:
Calls the first function. Its return value indicates whether it has to continue the loop, i.e. call the second asynchronous function.
If the return value is truthy (availableHours > 0), then the second function is called.
When the async function is done, it must call the provided callback cb to indicate that it is over. Then async module will call the first function to check if it has to continue the loop.
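The loop described in these steps can be imitated in a few lines. This is a simplified sketch with a synchronous test function, as used in the code above; miniWhilst is a made-up name and not the async module's code:

```javascript
// Hypothetical imitation of an asynchronous while loop:
// test() is checked before every round, body(cb) must call cb when it is done.
function miniWhilst(test, body, done) {
  function loop() {
    if (!test()) return done(null);  // condition is false: the loop is over
    body((err) => {
      if (err) return done(err);     // stop at the first error
      // Defer the next round so a synchronous body cannot grow the call stack.
      setImmediate(loop);
    });
  }
  loop();
}

let availableHours = 3;
miniWhilst(
  () => availableHours > 0,
  (cb) => {
    console.log('hours left:', availableHours);
    availableHours -= 1;
    setTimeout(cb, 10);              // stands in for an asynchronous query
  },
  (err) => console.log(err ? 'failed' : 'loop finished')
);
```

Unlike a plain while loop, the condition is only checked again after the body has called its callback, so each round sees the result of the previous query.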
In this asynchronous function inside async.whilst we have another async.waterfall because you need to send queries to the database for each teacherHour.
In this fourth and last async.waterfall flow:
It sends the SELECT query to the database.
Waits for the response. Once the rowRIDS are available, it calls the second function in the waterfall.
If there are rowRIDS (roomIDs.length > 0), it sends the INSERT query to the database.
Once done, it calls the callback cb.
If there were no rowRIDs, it calls the callback cb, too, to indicate that the job is done.
It is a great thing that JavaScript is asynchronous. It might be difficult at the beginning when you come from synchronous languages, but once you get the idea, it will be hard to think synchronously. It becomes so intuitive that you will start wondering why other languages don't work asynchronously.
I hope I could explain the above code thoroughly. Enjoy JavaScript! It's awesome!
I have a little FTP script which basically transfers an entire directory tree (by walking it with fs.readdir) to an FTP server one file at a time (I have to do some analysis on each file as it's uploaded, hence the one-at-a-time behaviour).
However, the bit that does a single file (there's another bit for directories which uses c.mkdir rather than c.put) looks like this:
console.log('Transferring [' + ival + ']');
var c = new Ftp();
c.on('ready', function() {
c.put(ival, ival, function(err) {
console.log(err);
});
c.end();
});
As you can see, it's using a very simple method of logging in that failures simply get sent to the console.
Unfortunately, since the FTPs are done asynchronously, errors are being delivered to the console in a sequence totally unrelated to the file name output.
Is there a way to force the FTP to be done synchronously so that errors would immediately follow the file name? Basically, I want the entire sequence from the initial console.log to the final }); to be done before moving on to the next file.
Even if there is, it's not recommended. You generally don't want to block the event loop with such a long synchronous operation.
What would probably be more useful is using recursion or Promises to ensure that things happen in a sequence.
Example:
let ivals = [/* lots of ivals here */];
function putItems(ivals) {
let ival = ivals[0];
console.log('Transferring [' + ival + ']');
var c = new Ftp();
c.on('ready', function() {
c.put(ival, ival, function(err) {
console.log(err);
c.end();
// Don't continue if we're out of items.
if (ivals.length === 1) { return; }
putItems(ivals.slice(1)); // Call again with the rest of the items.
});
});
}
putItems(ivals);
It can probably be done more intelligently by using a nested function and a single FTP context. But you get the point.
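For the Promise variant mentioned above, a sketch could look like the following. It assumes a Node version with async/await and a function uploadOne(item) that returns a Promise; both uploadSequentially and uploadOne are hypothetical names, not part of any FTP library:

```javascript
// Hypothetical helper: uploads the items strictly one after another.
// uploadOne(item) must return a Promise that settles when the transfer is over.
async function uploadSequentially(items, uploadOne) {
  const failures = [];
  for (const item of items) {
    console.log('Transferring [' + item + ']');
    try {
      await uploadOne(item);            // the next item starts only after this settles
    } catch (err) {
      console.log('[' + item + ']', err); // the error is logged right after its file name
      failures.push({ item: item, err: err });
    }
  }
  return failures;
}
```

Here uploadOne would wrap the c.put call from the question in a new Promise that resolves or rejects inside the put callback, so the log lines for one file always appear together.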
Without making things synchronous, you can solve your error logging problem by just logging the name with the error. You can just wrap this in a closure so you can keep track of ival that goes with a particular error:
(function(ival) {
console.log('Transferring [' + ival + ']');
var c = new Ftp();
c.on('ready', function() {
c.put(ival, ival, function(err) {
console.log('[' + ival + ']', err);
});
c.end();
});
})(ival);
Why don't you just push the errors to an array? When all uploads are done, you will have that array with all those errors in order.
I will do something like this:
var errArray = [];
console.log('Transferring [' + ival + ']');
var c = new Ftp();
c.on('ready', function() {
c.put(ival, ival, function(err) {
errArray.push( err );
});
c.end();
});
c.on('end', function() {
errArray.forEach( function( err ){
console.log( err );
})
});
I am running the code below to download and unzip files. It works as intended when I try with one file, but when I do multiple at the same time I get the following error:
Error: incorrect header check at Zlib._handle.onerror
var downloadUnzipFile = function (mID) {
try {
// Read File
console.log("Started download/unzip of merchant: " + mID + " # " + new Date().format('H:i:s').toString());
request(linkConst(mID))
// Un-Gzip
.pipe(zlib.createGunzip())
// Write File
.pipe(fs.createWriteStream(fileName(mID)))
.on('error', function (err) {
console.error(err);
})
.on('finish', function() {
console.log("CSV created: " + fileName(mID));
console.log("Completed merchant: " + mID + " # " + new Date().format('H:i:s').toString());
//console.log("Parsing CSV...");
//csvReader(fileName);
});
} catch (e) {
console.error(e);
}
}
module.exports = function(sMerchants) {
var oMerchants = JSON.parse(JSON.stringify(sMerchants));
oMerchants.forEach(function eachMerchant(merchant) {
downloadUnzipFile(merchant.merchant_aw_id);
})
};
Any ideas?
Thanks
EDIT:
To clarify, I'd like to run through each item (merchant) in the array (merchants) and download a file and unzip it. The way I currently do it means that the downloading/unzipping occurs at the same time (which I think might be causing the error). When I remove the forEach loop and just try to download/unzip one merchant, the code works.
Yeah, as you suggest, it's likely that if you try to unzip too many files concurrently, you will run out of memory. Because you are handling streams, the unzip operations are asynchronous, meaning your forEach loop will continue to be called before each unzip operation completes. There are plenty of node packages that allow you to handle async operations so you can run the unzip function sequentially, but the simplest approach might just be to use a recursive function call. E.g.:
var downloadUnzipFile = function (mID) {
try {
// Read File
console.log("Started download/unzip of merchant: " + mID + " # " + new Date().format('H:i:s').toString());
return request(linkConst(mID))
// Un-Gzip
.pipe(zlib.createGunzip())
// Write File
.pipe(fs.createWriteStream(fileName(mID)))
} catch (e) {
console.log(e);
return false;
}
}
module.exports = function(sMerchants) {
var merchants = JSON.parse(JSON.stringify(sMerchants)),
count = 0;
// Process one merchant, then move on to the next when its stream finishes or fails.
function next() {
if (!merchants[count]) { return; }
var stream = downloadUnzipFile(merchants[count++].merchant_aw_id);
// downloadUnzipFile returns false if it threw; skip to the next merchant.
if (!stream) { return next(); }
stream
.on('error', function(err){
console.log(err);
// continue unzipping files, even if you encounter an error. You can also remove this call if you want the script to stop.
next();
})
.on('finish', next);
}
next();
};
Haven't tested, of course. The main idea should work, though: call downloadUnzipFile recursively whenever the previous call errors out or finishes, as long as there are still items in the merchants array.
I am trying to use ImageMagick to resize my images, then pass them off to image compression tools to compress. I am trying to use pngquant.
Here is a snippet of my code:
// Rename and move original file
if (fs.existsSync(tmpPath + '/' + file)) {
fs.renameSync(tmpPath + '/' + file, filePath + '/' + fileName);
// Create new versions of each file
Object.keys(geddy.config.uploader.imageVersions).forEach(function (version) {
console.log(version);
counter += 1;
var opts = geddy.config.uploader.imageVersions[version];
console.log(fileName);
imageMagick.resize({
width: opts.width,
srcPath: filePath + '/' + fileName,
dstPath: filePath + '/' + fileName.replace(/\.[^/.]+$/, "") + '_' + version + fileType
}, finish(filePath + '/' + fileName.replace(/\.[^/.]+$/, "") + '_' + version + fileType));
});
} else {
console.log('Unable to find tmp file!');
}
Then here is my callback finish:
finish = function (file) {
execFile(pngquantPath, ['256', '--force', file], function(err, data) {
console.log(err, data);
});
};
However, every time, pngquant says that no file was found. If I take the parameter file and run pngquant file in the shell, the process runs. So I am assuming it is an asynchronous issue (the files are not there yet when it tries to run the process).
This is the error I end up with in the console:
{ [Error: spawn EACCES] code: 'EACCES', errno: 'EACCES', syscall: 'spawn' } ''
Any help is appreciated.
imageMagick.resize() expects two arguments: an options object and a callback.
For the second argument, you are not passing finish as your callback. Instead, you are invoking finish and passing whatever finish returns (probably undefined) as the callback argument.
Because you are invoking finish at the same time you make the resize() call, it is running immediately, before ImageMagick can complete its resize operation.
Try changing the resize() line to something like this:
imageMagick.resize({
// (your resize options)
},
function(err) {
if (err) { return console.error(err); } // whatever error handling you need
var thePath = filePath + '/' + fileName.replace(/\.[^/.]+$/, "") + '_' +
version + fileType;
finish(thePath);
});
Now you’re passing a callback function as the second argument, and that function will not be invoked until imageMagick.resize() is done. And of course it has the signature of all Node callbacks, which is that the first argument to the callback is err. When it’s invoked, the callback does some error-checking and then calls finish.
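The difference between invoking a function and passing it as a callback can be seen in isolation with a stand-in for the resize call. Everything in this sketch (fakeResize, the file names, the events array) is made up for illustration and is not part of imagemagick:

```javascript
const events = [];

// Stand-in for an async API such as a resize call: it completes on a later tick.
function fakeResize(options, callback) {
  setTimeout(() => {
    events.push('resize done');
    if (typeof callback === 'function') callback(null);
  }, 10);
}

function finish(file) {
  events.push('finish ' + file);
}

// Wrong: finish('a.png') runs right now, and its return value (undefined)
// is what gets passed as the callback.
fakeResize({}, finish('a.png'));

// Right: the arrow function is passed as the callback and runs only after
// the resize has completed.
fakeResize({}, (err) => {
  if (err) return console.error(err);
  finish('b.png');
});

setTimeout(() => console.log(events), 50);
```

In the logged array, 'finish a.png' comes before any 'resize done', while 'finish b.png' comes after one.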