Use Asynchronous IO better - javascript

I am really new to JS, and even newer to node.js. So using "traditional" programming paradigms my file looks like this:
var d = require('babyparse');
var fs = require('fs');
var file = fs.readFile('SkuDetail.txt');
d.parse(file);
So this has many problems:
It's not asynchronous
My file is bigger than the default max file size (this one is about 60 MB), so it currently breaks (not 100% sure if that's the reason).
My question: how do I load a big file (and future files will be significantly bigger than 60 MB) asynchronously, parsing as the information arrives? Then, as a follow-up, how do I know when everything is completed?

You should create a ReadStream. A common pattern looks like this; you can parse data as it becomes available on the data event.
var fs = require('fs');

function readFile(filePath, done) {
  var stream = fs.createReadStream(filePath),
      out = '';

  // Make done optional
  done = done || function(err) { if (err) throw err; };

  stream.on('data', function(data) {
    // Parse data as it arrives, or accumulate it
    out += data;
  });

  stream.on('end', function() {
    done(null, out); // All data has been read
  });

  stream.on('error', function(err) {
    done(err);
  });
}
You can use the method like:
readFile('SkuDetail.txt', function(err, out) {
  // Handle error
  if (err) throw err;
  // File has been read and parsed
});
If you add the parsed data to the out variable the entire parsed file will be sent to the done callback.
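If you want to parse as the data arrives instead of buffering the whole file, babyparse (a Node port of Papa Parse) accepts a step callback that fires once per parsed row. A minimal sketch, assuming your babyparse version mirrors Papa Parse's step option, that feeds each chunk's complete lines to the parser and carries any partial line over to the next chunk:

var Papa = require('babyparse');
var fs = require('fs');

var stream = fs.createReadStream('SkuDetail.txt', { encoding: 'utf8' });
var leftover = '';

stream.on('data', function(chunk) {
  var text = leftover + chunk;
  var lastNewline = text.lastIndexOf('\n');
  if (lastNewline === -1) {
    leftover = text; // no complete line in this chunk yet
    return;
  }
  leftover = text.slice(lastNewline + 1);

  Papa.parse(text.slice(0, lastNewline), {
    step: function(row) {
      // row.data holds the parsed fields for one line (exact shape depends on the babyparse version)
    }
  });
});

stream.on('end', function() {
  if (leftover) {
    Papa.parse(leftover, { step: function(row) { /* last row */ } });
  }
  // Everything has been parsed at this point
});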

It already is asynchronous: I/O in Node.js is asynchronous by default, so no extra effort is needed on your part. Does your code even work, though? Your parse call should be inside the callback of readFile; otherwise readFile hasn't finished when parse runs and file is undefined.
In normal situations, any I/O code you write will appear to be "skipped": the code that follows it runs first, and the I/O callback runs later, once the operation completes.
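A small illustration of that ordering (a sketch; 'SkuDetail.txt' stands in for whatever file you read):

var fs = require('fs');

fs.readFile('SkuDetail.txt', 'utf8', function(err, data) {
  if (err) throw err;
  console.log('second: the file contents are only available here');
});

console.log('first: this line runs before the file has been read');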

For the first question: since you want to process chunks as they arrive, Streams might be what you are looking for. #pstenstrm has an example in his answer.
Also, you can check this Node.js documentation link for Streams: https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
If you want a brief description and example of Streams, check this link: http://www.sitepoint.com/basics-node-js-streams/
You can pass a callback to the fs.readFile function to process the content once the file read is complete. This would answer your second question.
fs.readFile('SkuDetail.txt', function(err, data) {
  if (err) {
    throw err;
  }
  processFile(data);
});
You can see Get data from fs.readFile for more details.
Also, you could use Promises for cleaner code with other added benefits. Check this link: http://promise-nuggets.github.io/articles/03-power-of-then-sync-processing.html
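For example, here is a minimal sketch of wrapping fs.readFile in a Promise by hand (readFilePromise is just an illustrative name):

var fs = require('fs');

function readFilePromise(path) {
  return new Promise(function(resolve, reject) {
    fs.readFile(path, 'utf8', function(err, data) {
      if (err) return reject(err);
      resolve(data);
    });
  });
}

readFilePromise('SkuDetail.txt')
  .then(function(data) {
    // Parse the file contents here
  })
  .catch(function(err) {
    console.error(err);
  });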

Related

JavaScript Why is some code getting executed before the rest?

I've mostly learned coding with OOP languages like Java.
I have a personal project where I want to import a bunch of plaintext into a mongodb. I thought I'd try to expand my horizons and do this with using node.js powered JavaScript.
I got the code working fine but I'm trying to figure out why it is executing the way it is.
The output from the console is:
1. done reading file
2. closing db
3. record inserted (n times)
var fs = require('fs'),
    readline = require('readline'),
    instream = fs.createReadStream(config.file),
    outstream = new (require('stream'))(),
    rl = readline.createInterface(instream, outstream);

rl.on('line', function (line) {
  var split = line.split(" ");
  _user = "#" + split[0];
  _text = "'" + split[1] + "'";
  _addedBy = config._addedBy;
  _dateAdded = new Date().toISOString();
  quoteObj = { user: _user, text: _text, addedby: _addedBy, dateadded: _dateAdded };
  db.collection("quotes").insertOne(quoteObj, function(err, res) {
    if (err) throw err;
    console.log("record inserted.");
  });
});

rl.on('close', function (line) {
  console.log('done reading file.');
  console.log('closing db.');
  db.close();
});
(full code is here: https://github.com/HansHovanitz/Import-Stuff/blob/master/importStuff.js)
When I run it I get the message 'done reading file' and 'closing db', and then all of the 'record inserted' messages. Why is that happening? Is it because of the delay in inserting a record in the db? The fact that I see 'closing db' first makes me think the db is getting closed first, so how are the records still being inserted?
Just curious to know why the program is executing in this order for my own peace of mind. Thanks for any insight!
In short, it's because of the asynchronous nature of the I/O operations in the functions you use, which is quite common in Node.js.
Here's what happens. First, the script reads all the lines of the file, and for each line initiates a db.insertOne() operation, supplying a callback for each of them. Note that each callback will be called when the corresponding operation is finished, not in the middle of this process.
Eventually the script reaches the end of the input file, logs two messages, then invokes db.close(). Note that even though the 'insert' callbacks (which log the 'inserted' message) have not been called yet, the database interface has already received all the 'insert' commands.
Now the tricky part: whether or not the DB interface manages to store all the records (in other words, whether or not it waits until all the insert operations are completed before closing the connection) depends on the DB interface and on its speed. If the write operations are fast enough (faster than reading the file lines), you'll probably end up with all the records inserted; if not, you may miss some of them. That's why the safest bet is to close the database connection not in the file's close handler (when reading is complete), but in the insert callbacks (when writing is complete):
let linesCount = 0;
let eofReached = false;

rl.on('line', function (line) {
  ++linesCount;
  // parsing skipped for brevity
  db.collection("quotes").insertOne(quoteObj, function (err, res) {
    --linesCount;
    if (linesCount === 0 && eofReached) {
      db.close();
      console.log('database closed');
    }
    // the rest skipped
  });
});

rl.on('close', function () {
  console.log('reading complete');
  eofReached = true;
  // In case every insert already finished before the file was fully read
  if (linesCount === 0) {
    db.close();
    console.log('database closed');
  }
});
This question describes a similar problem, along with several different approaches to solving it.
Welcome to the world of asynchronicity. Inserting into the DB happens asynchronously. This means that the rest of your (synchronous) code will execute completely before this task is complete. Consider the simplest asynchronous JS function, setTimeout. It takes two arguments: a function, and a time (in ms) after which to execute the function. In the example below, "hello!" is logged before "set timeout executed", even though the time is set to 0. Crazy, right? That's because setTimeout is asynchronous.
This is one of the fundamental concepts of JS and it's going to come up all the time, so watch out!
setTimeout(() => {
  console.log("set timeout executed")
}, 0)

console.log("hello!")
When you call db.collection("quotes").insertOne you're actually creating an asynchronous request to the database. A good way to tell whether a call will be asynchronous is to check whether one (or more) of its parameters is a callback.
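A quick illustration of that heuristic using Node's fs module (a sketch; 'quotes.txt' is just a placeholder file name):

var fs = require('fs');

// Synchronous: no callback parameter, the result is the return value,
// and the next line does not run until the file has been read.
var contents = fs.readFileSync('quotes.txt', 'utf8');

// Asynchronous: the last parameter is a callback, so the result arrives later
// and the code below this call runs first.
fs.readFile('quotes.txt', 'utf8', function (err, data) {
  if (err) throw err;
  console.log('file read finished');
});

console.log('this logs before "file read finished"');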
So the order you're running it is actually expected:
1. You instantiate rl
2. You bind your event handlers to rl
3. Your stream starts processing & calling your 'line' handler
4. Your 'line' handler opens asynchronous requests
5. Your stream ends and rl closes
...
4.5. Your asynchronous requests return and execute their callbacks
I labelled the callback execution as 4.5 because technically your requests can return at any time after step 4.
I hope this is a useful explanation; most modern JavaScript relies heavily on asynchronous events, and it can be a little tricky to figure out how to work with them!
You're on the right track. The key is that the database calls are asynchronous. As the file is being read, it starts a bunch of async calls to the database. Since they are asynchronous, the program doesn't wait for them to complete at the time they are called. The file then closes. As the async calls complete, your callbacks run and the console.logs execute.
Your code reads lines and, immediately after that, makes a call to the db - both asynchronous processes. When the last line is read, the last request to the db is made, and it takes some time for this request to be processed and the callback of insertOne to be executed. Meanwhile rl has done its job and triggers the close event.
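If your MongoDB driver returns a promise when insertOne is called without a callback (an assumption about the driver version you're using), another way to close the connection only after every insert has finished is to collect those promises and wait on all of them. A rough sketch, reusing the variables from the question's code:

var pending = [];

rl.on('line', function (line) {
  var split = line.split(" ");
  var quoteObj = {
    user: "#" + split[0],
    text: "'" + split[1] + "'",
    addedby: config._addedBy,
    dateadded: new Date().toISOString()
  };
  // No callback passed, so insertOne returns a promise
  pending.push(db.collection("quotes").insertOne(quoteObj));
});

rl.on('close', function () {
  console.log('done reading file.');
  Promise.all(pending)
    .then(function () {
      console.log('all records inserted, closing db.');
      db.close();
    })
    .catch(function (err) {
      console.error(err);
      db.close();
    });
});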

Meteor synchronous and asynchronous call to read a file

I am new to Meteor. I am using following code to read a file stored at server.
Client side
Meteor.call('parseFile', (err, res) => {
  if (err) {
    alert(err);
  } else {
    Session.set("result0", res[0]);
    Session.set("result1", res[1]);
    Session.set("result2", res[2]);
  }
});
let longitude = Session.get("result0");
let latitude = Session.get("result1");
var buildingData = Session.get("result2");
Server Side
Meteor.methods({
  'parseFile'() {
    var csv = Assets.getText('buildingData.csv');
    var rows = Papa.parse(csv).data;
    return rows;
  }
});
The problem is that the call takes time to send the result back, so wherever I use latitude and longitude they are undefined and the page breaks. Is there any solution to avoid this problem? One solution could be to make a synchronous call and wait for the result to be returned.
You can make the server method run synchronously using the futures package, which should force the client to wait for the method to complete.
It might look something like this:
var Future = Npm.require('fibers/future');

Meteor.methods({
  'parseFile'() {
    var future = new Future();
    var csv = Assets.getText('buildingData.csv');
    var rows = Papa.parse(csv).data;
    future.return(rows);
    return future.wait();
  }
});
This would require installing the futures package linked above and setting up your includes properly in the file containing your Meteor.methods() definitions. You might also look into good error handling inside your method.
UPDATE:
The link to the Future package is an NPM package, which you can read about here. The link above is to the atmosphere package, which looks like an old wrapper package.
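As a rough sketch of what that error handling might look like (assuming the fibers package is available on the server via Npm.require, and using Meteor.Error to report failures to the client):

var Future = Npm.require('fibers/future');

Meteor.methods({
  'parseFile'() {
    var future = new Future();
    try {
      var csv = Assets.getText('buildingData.csv');
      var rows = Papa.parse(csv).data;
      future.return(rows);
    } catch (err) {
      // Deliver the failure to the waiting caller instead of the rows
      future.throw(new Meteor.Error('parse-failed', err.message));
    }
    return future.wait();
  }
});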

Meteor synchronous call to function

Good Day.
Been wracking the 'ol noggin for a way to solve this.
In a nutshell, I have a form that has a number of text inputs as well as an input file element to upload said file to AWS S3 (via lepozepo:s3 ver5.1.4 package). The nice thing about this package is that it does not need the server, thus keeping resources in check.
This S3 package uploads the file to my configured bucket and returns the URL to access the image among a few other data points.
So, back to the form. I need to put the AWS URL returned into the database along with the other form data. HOWEVER, the S3 call takes more time than what the app waits for since it is async, thus the field within my post to Meteor.call() is undefined only because it hasn't waited long enough to get the AWS URL.
I could solve this by putting the Meteor.call() right into the callback of the S3 call. However, I was hoping to avoid that as I'd much rather have the S3 upload be its own Module or helper function or even a function outside of any helpers as it could be reused in other areas of the app for file uploads.
Pseudo-code:
Template.contacts.events({
  'submit #updateContact': function(e, template) {
    s3.upload({file: inputFile, path: client}, function(error, result) {
      if (error) {
        // throw error
      } else {
        var uploadInfo = result;
      }
    });

    formInfo = {name: $('[name=name]').val(), file: uploadInfo}; // <= file is undefined because S3 hasn't finished yet

    Meteor.call('serverMethod', formInfo, function(e, r) {
      if (e) {
        // throw error message
      } else {
        // show success message
      }
    });
  }
});
I could put the formInfo and the Meteor.call() in the s3 callback, but that would result in more complex code and less code reuse where IMO this is a perfect place for code reuse.
I've tried wrapping the s3 call in its own function, with and without a callback. I've tried using reactiveVars. I would think that updating the db another time with just the s3 file info would make the s3 abstraction more complex, as it'd need to know the _id and such...
Any ideas?
Thanks.
If you are using JavaScript, it's best to embrace callbacks!
What is it about using callbacks like this that you do not like, or believe is not modular or reusable?
As shown below, the uploader function does nothing but wrap s3.upload. But you mention this is pseudocode, so I presume you left out logic you want included in the modular call to s3.upload (include it here), while decoupling the logic around handling the response (pass in a callback).
uploader = function(s3_options, cb) {
  s3.upload(s3_options, function(error, result) {
    if (error) {
      cb(error);
    } else {
      cb(null, result);
    }
  });
};
Template.contacts.events({
  'submit #updateContact': function(e, template) {
    cb = function(error, uploadInfo) {
      if (error) {
        // handle the upload error
        return;
      }
      formInfo = {name: $('[name=name]').val(), file: uploadInfo};
      Meteor.call('serverMethod', formInfo, function(e, r) {
        if (e) {
          // throw error message
        } else {
          // show success message
        }
      });
    };
    uploader({file: inputFile, path: client}, cb); // you don't show where `inputFile` or `client` come from
  }
});

createReadStream().pipe() Callback

Sorry in advance, I have a couple of questions on createReadStream() here.
Basically what I'm doing is dynamically building a file and streaming it to the user using fs once it is finished. I'm using .pipe() to make sure I'm throttling correctly (stop reading if the buffer's full, start again once it's not, etc.). Here's a sample of the code I have so far.
var http = require('http');
var fs = require('fs');

http.createServer(function(req, res) {
  var stream = fs.createReadStream('<filepath>/example.pdf', {bufferSize: 64 * 1024});
  stream.pipe(res);
}).listen(3002, function() {
  console.log('Server listening on port 3002');
});
I've read in another Stack Overflow question (sorry, lost it) that if you're using the regular res.send() and res.end(), .pipe() works great, as it calls .send and .end for you and adds throttling.
That works fine for most cases, except that I want to remove the file once the stream is complete, and not using .pipe() means I'd have to handle throttling myself just to get a callback.
So I'm guessing that I'll want to create my own fake "res" object that has .send() and .end() methods that do what res usually does; however, in the .end() I'll put additional code to clean up the generated file. My question is basically: how would I pull that off?
Help with this would be much appreciated, thanks!
The first part about downloading can be answered by Download file from NodeJS Server.
As for removing the file after it has all been sent, you can just add your own event handler to remove the file once everything has been sent.
var stream = fs.createReadStream('<filepath>/example.pdf', {bufferSize: 64 * 1024});
stream.pipe(res);

var had_error = false;
stream.on('error', function(err) {
  had_error = true;
});
stream.on('close', function() {
  if (!had_error) {
    fs.unlink('<filepath>/example.pdf', function(err) {
      if (err) console.error(err);
    });
  }
});
The error handler isn't 100% needed, but then you don't delete the file if there was an error while you were trying to send it.
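On newer Node versions (an assumption about your runtime), stream.pipeline gives you a single callback that fires once everything has been sent or an error occurred, which keeps the cleanup in one place. A rough sketch:

var http = require('http');
var fs = require('fs');
var pipeline = require('stream').pipeline;

http.createServer(function(req, res) {
  var filePath = '<filepath>/example.pdf'; // placeholder path from the question
  pipeline(fs.createReadStream(filePath), res, function(err) {
    if (err) {
      // Something went wrong while streaming; keep the file for inspection
      console.error(err);
      return;
    }
    // Everything was sent, so the generated file can be removed
    fs.unlink(filePath, function(unlinkErr) {
      if (unlinkErr) console.error(unlinkErr);
    });
  });
}).listen(3002);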

How do I read a file in Node.js?

In Node.js, I want to read a file, and then console.log() each line of the file separated by \n. How can I do that?
Try this:
var fs = require('fs');

fs.readFile('/path/to/file', 'utf8', function (err, data) {
  if (err) throw err;
  var arr = data.split('\n');
  arr.forEach(function(v) {
    console.log(v);
  });
});
Try reading the fs module documentation.
Please refer to the File System APIs in Node.js; there are also a few similar questions on SO, and here is one of them.
There are many ways to read a file in Node. You can learn about all of them in the Node documentation about the File System module, fs.
In your case, let's assume that you want to read a simple text file, countries.txt, that looks like this:
Uruguay
Chile
Argentina
New Zealand
First you have to require() the fs module at the top of your JavaScript file, like this:
var fs = require('fs');
Then to read your file with it, you can use the fs.readFile() method, like this:
fs.readFile('countries.txt', 'utf8', function (err, data) {});
Now, inside the {}, you can interact with the results of the readFile method. If there was an error, the results will be stored in the err variable; otherwise, the results will be stored in the data variable. You can log the data variable here to see what you're working with:
fs.readFile('countries.txt', 'utf8', function (err, data) {
  console.log(data);
});
If you did this right, you should get the exact contents of the text file in your terminal:
Uruguay
Chile
Argentina
New Zealand
I think that's what you want. Your input was separated by newlines (\n), and the output will be as well, since readFile doesn't change the contents of the file. If you want, you can make changes to the data before logging the results:
fs.readFile('countries.txt', 'utf8', function (err, data) {
  // Split each line of the file into an array
  var lines = data.split('\n');
  // Log each line separately, including a newline
  lines.forEach(function(line) {
    console.log(line, '\n');
  });
});
That will add an extra newline between each line:
Uruguay
Chile
Argentina
New Zealand
You should also account for any possible errors that happen while reading the file by adding if (err) throw err on the line right before you first access data. You can put all of that code together in a script called read.js, like this:
var fs = require('fs');

fs.readFile('countries.txt', 'utf8', function (err, data) {
  if (err) throw err;
  // Split each line of the file into an array
  var lines = data.split('\n');
  // Log each line separately, including a newline
  lines.forEach(function(line) {
    console.log(line, '\n');
  });
});
You can then run that script in your Terminal. Navigate to the directory that contains both countries.txt and read.js, and then type node read.js and hit enter. You should see the results logged out on the screen. Congratulations! You've read a file with Node!

Categories

Resources