Why is some code getting executed before the rest? - javascript

I've mostly learned coding with OOP languages like Java.
I have a personal project where I want to import a bunch of plaintext into MongoDB. I thought I'd try to expand my horizons and do this using Node.js-powered JavaScript.
I got the code working fine, but I'm trying to figure out why it executes in the order it does.
The output from the console is:
1. done reading file
2. closing db
3. record inserted (n times)
var fs = require('fs'),
    readline = require('readline'),
    instream = fs.createReadStream(config.file),
    outstream = new (require('stream'))(),
    rl = readline.createInterface(instream, outstream);

rl.on('line', function (line) {
    var split = line.split(" ");
    _user = "#" + split[0];
    _text = "'" + split[1] + "'";
    _addedBy = config._addedBy;
    _dateAdded = new Date().toISOString();
    quoteObj = { user: _user, text: _text, addedby: _addedBy, dateadded: _dateAdded };
    db.collection("quotes").insertOne(quoteObj, function(err, res) {
        if (err) throw err;
        console.log("record inserted.");
    });
});

rl.on('close', function (line) {
    console.log('done reading file.');
    console.log('closing db.');
    db.close();
});
(full code is here: https://github.com/HansHovanitz/Import-Stuff/blob/master/importStuff.js)
When I run it I get the messages 'done reading file' and 'closing db', and then all of the 'record inserted' messages. Why is that happening? Is it because of the delay in inserting a record into the db? Seeing 'closing db' first makes me think the db is being closed, so how are the records still being inserted afterwards?
Just curious why the program executes in this order, for my own peace of mind. Thanks for any insight!

In short, it's because of the asynchronous nature of the I/O operations in the functions used, which is quite common in Node.js.
Here's what happens. First, the script reads all the lines of the file, and for each line initiates a db.insertOne() operation, supplying a callback for each of them. Note that each callback will be called when the corresponding operation finishes, not somewhere in the middle of the process.
Eventually the script reaches the end of the input file, logs the two messages, then invokes db.close(). Note that even though the 'insert' callbacks (which log the 'inserted' message) have not been called yet, the database interface has already received all the 'insert' commands.
Now the tricky part: whether the DB interface manages to store all the records (in other words, whether it waits until all the insert operations are complete before closing the connection) depends both on the DB interface and on its speed. If the write operations are fast enough (faster than reading a line of the file), you'll probably end up with all the records inserted; if not, you may miss some of them. That's why the safest bet is to close the database connection not in the 'close' handler (when reading is complete), but in the insert callbacks (when writing is complete):
let linesCount = 0;
let eofReached = false;

rl.on('line', function (line) {
    ++linesCount;
    // parsing skipped for brevity
    db.collection("quotes").insertOne(quoteObj, function(err, res) {
        --linesCount;
        if (linesCount === 0 && eofReached) {
            db.close();
            console.log('database closed');
        }
        // the rest skipped
    });
});

rl.on('close', function() {
    console.log('reading complete');
    eofReached = true;
    // Edge case: if every insert already finished (or the file was empty),
    // no pending callback is left to close the db, so do it here.
    if (linesCount === 0) {
        db.close();
        console.log('database closed');
    }
});
This question describes a similar problem, along with several different approaches to solving it.

Welcome to the world of asynchronicity. Inserting into the DB happens asynchronously. This means that the rest of your (synchronous) code will execute completely before this task is complete. Consider the simplest asynchronous JS function, setTimeout. It takes two arguments: a function, and a time (in ms) after which to execute the function. In the example below, "hello!" is logged before "set timeout executed", even though the timeout is set to 0. Crazy, right? That's because setTimeout is asynchronous.
This is one of the fundamental concepts of JS, and it's going to come up all the time, so watch out!
setTimeout(() => {
    console.log("set timeout executed");
}, 0);

console.log("hello!");

When you call db.collection("quotes").insertOne you're actually creating an asynchronous request to the database. A good way to determine whether a piece of code will be asynchronous is to check whether one (or more) of its parameters is a callback.
So the order you're seeing is actually the expected one:
1. You instantiate rl
2. You bind your event handlers to rl
3. Your stream starts processing and calling your 'line' handler
4. Your 'line' handler opens asynchronous requests
5. Your stream ends and rl closes
...
4.5. Your asynchronous requests return and execute their callbacks
I labelled the callback execution as 4.5 because technically your requests can return at any time after step 4.
I hope this is a useful explanation. Most modern JavaScript relies heavily on asynchronous events, and it can be a little tricky to figure out how to work with them!

You're on the right track. The key is that the database calls are asynchronous. As the file is being read, it starts a bunch of async calls to the database. Since they are asynchronous, the program doesn't wait for them to complete at the time they are called. The file then closes. As the async calls complete, your callbacks run and the console.logs execute.

Your code reads lines and immediately after that makes a call to the db; both are asynchronous processes. When the last line is read, the last request to the db is made, and it takes some time for this request to be processed and for the insertOne callback to be executed. Meanwhile rl has done its job and triggers the close event.

Related

Code not executed in sequence

I have a document in my Cloudant db with an _id of mittens13. I tried to query it and alert twice: once inside the query callback and once outside it.
However, the alert outside the query callback was called first and showed undefined, and then the other alert showed hello, which was the item in the doc. May I know why?
Javascript code
function queryDB() {
    var price;
    db.get("mittens13", function (err, response) {
        console.log(err || response);
        alert(response.title);
        price = response.title;
    });
    alert(price);
}
Details of the doc in my db
{
    "_id": "mittens13",
    "_rev": "1-78ef016a3534df0764bbf7178c35ea11",
    "title": "hello",
    "occupation": "kitten123"
}
Question: Why is alert(price); producing undefined?
The reason your alert(price) shows undefined, even though the code is written after your db.get call, is that db.get is asynchronous.
Because it is an asynchronous call, your program does not wait for the response of db.get before it continues. So before db.get comes back, your program has already reached the alert(price); line. At that point the only other code that has run regarding price is var price;, which, if you tried to print it, would result in undefined.
You should research ajax and callbacks.
db.get is asynchronous, so when alert(price) is called, the operation started before it is actually still pending (JavaScript runs your code on a single thread; it's the underlying I/O that happens elsewhere). I think the correct way would be:
db.get("mittens13", function (err, response) {
console.log(err || response);
alert(response.title);
price = response.title;
}).then(function(){ alert(price) };
The .then allows the alert(price) to run only after the previous task has finished (this assumes db.get returns a promise; the callbacks themselves all run on the same single JavaScript thread). Also a small side note: you should probably add some error checking, and if you catch an error, be sure to cancel the task continuation (.then).

nodeJS wait for event that cannot be promisified

I'm trying to read an STDIN pipe from my Node.js file and make a POST request to a URL with every line given from STDIN, then wait for the response, read the next line, send it, wait, etc.
'use strict';
const http = require('http');
const rl = require('readline').createInterface(process.stdin, null);

rl.on('line', function (line) {
    makeRequest(line); // I need to wait to call the next callback until the previous one finishes
}).on('close', function () {
    process.exit(0);
});
The problem is that rl.on('line') will instantly read thousands of lines from my pipe and launch thousands of requests at once, which will lead to an EMFILE exception. I know this is the expected behavior of non-blocking I/O, but in this case one cannot simply use promises/futures, because .on('line') is a callback itself and I cannot manipulate it to not trigger without losing data from my input.
So, if callbacks cannot be used and timeout hacks aren't elegant enough, how can one break out of the curse of non-blocking I/O?
Keep a counter of active requests (increment on send, decrement on response). Once the counter exceeds a constant (say, 200), call rl.pause() (check on every 'line' event). On every response, check whether the counter is smaller than your constant, and if it is, call rl.resume(). This should limit the rate of requests and the number of lines held in memory, and fix your problem.
Node's readline class has pause and resume functions that defer to the underlying stream equivalents. These functions are specifically made for throttling parts of a pipeline to assist with bottlenecks. See the following example from the stream.Readable.pause documentation:
var readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
    console.log('got %d bytes of data', chunk.length);
    readable.pause();
    console.log('there will be no more data for 1 second');
    setTimeout(() => {
        console.log('now data will start flowing again');
        readable.resume();
    }, 1000);
});
That gives you fine-grained control over how much data flows into your URL-fetching code.

Javascript event sequence, pg, postgresql, why?

I am trying to write a JavaScript file in Express to talk to a PostgreSQL database. More precisely, I want to write a function that takes SQL as an input parameter and returns the stringified JSON. I can assume memory is not an issue given these table sizes. This is paid work making an internal-use tool for a private business.
My most recent attempt involved the query callback putting the value into a global variable, but even that still fails, because the outermost function returns before the JSON string is defined. Here is the relevant code:
var dbjson;

function callDB(q) {
    pg.connect(connectionString, function(err, client, done) {
        if (err) {
            console.error('error fetching client from pool', err);
        } else {
            client.query(q, [], function(err, result) {
                client.query('COMMIT');
                done();
                if (err) {
                    console.error('error calling query ' + q, err);
                } else {
                    dbjson = JSON.stringify(result.rows);
                    console.log('1 ' + dbjson);
                }
                console.log('2 ' + dbjson);
            });
            console.log('3 ' + dbjson);
        }
        console.log('4 ' + dbjson);
    });
    console.log('5 ' + dbjson);
}
The SQL in my test is "select id from users".
The relevant console output is:
5 undefined
GET /db/readTable?table=users 500 405.691 ms - 1671
3 undefined
4 undefined
1 [{"id":1},{"id":2},{"id":3},{"id":4}]
2 [{"id":1},{"id":2},{"id":3},{"id":4}]
Why do the console logs occur in the order that they do?
They are consistent in the order.
I attempted to write a polling loop to wait for the global variable to be set, using setTimeout in the caller and clearing the timeout within the callback, but that failed, I think, because JavaScript is single-threaded and my loop did not allow other activity to proceed. Perhaps I was doing it wrong.
While I know I could have each function handle its own database connection and error logging, I really hate repeating the same code.
What is a better way to do this?
I am relatively new to express and javascript but considerably more experienced with other languages.
Presence of the following line will break everything for you:
client.query('COMMIT');
You are trying to execute an asynchronous command in a synchronous manner, and you are calling done(), releasing the connection, before that query gets a chance to execute. The result of such an invalid disconnection is unpredictable, especially since you are not handling any errors in that case.
And why are you calling COMMIT there in the first place? That in itself looks completely invalid: COMMIT closes the current transaction, which you never even opened, so it doesn't exist.
There is a bit of misunderstanding here in terms of both asynchronous code usage and the database. If you want a good start with both, I would suggest having a look at pg-promise.

Parse Cloud Code Ending Prematurely?

I'm writing a job that I want to run every hour in the background on Parse. My database has two tables. The first contains a list of Questions, while the second lists all of the user/question agreement pairs (QuestionAgreements). Originally my plan was just to have the client count the QuestionAgreements itself, but I'm finding that this results in a lot of requests that could really be done away with, so I want this background job to run the count and then update a field directly on Question with it.
Here's my attempt:
Parse.Cloud.job("updateQuestionAgreementCounts", function(request, status) {
    Parse.Cloud.useMasterKey();
    var query = new Parse.Query("Question");
    query.each(function(question) {
        var agreementQuery = new Parse.Query("QuestionAgreement");
        agreementQuery.equalTo("question", question);
        agreementQuery.count({
            success: function(count) {
                question.set("agreementCount", count);
                question.save(null, null);
            }
        });
    }).then(function() {
        status.success("Finished updating Question Agreement Counts.");
    }, function(error) {
        status.error("Failed to update Question Agreement Counts.");
    });
});
The problem is, this only seems to be running on a few of the Questions, and then it stops, appearing in the Job Status section of the Parse Dashboard as "succeeded". I suspect the problem is that it's returning prematurely. Here are my questions:
1 - How can I keep this from returning prematurely? (Assuming this is, in fact, my problem.)
2 - What is the best way of debugging cloud code? Since this isn't client side, I don't have any way to set breakpoints or anything, do I?
status.success is called before the asynchronous success callbacks of count have finished. To prevent this, you can use promises here. Check the docs for Parse.Query.each.
Iterates over each result of a query, calling a callback for each one. If the callback returns a promise, the iteration will not continue until that promise has been fulfilled.
So, you can chain the count promise:
return agreementQuery.count().then(function (count) {
    question.set("agreementCount", count);
    return question.save(null, null);
});
You can also use parallel promises to make it more efficient.
There are no breakpoints in cloud code, which makes Parse really hard to use. The only way is to log your variables with console.log.
I was able to utilize promises, as suggested by knshn, to make sure that my code completed before calling success.
Parse.Cloud.job("updateQuestionAgreementCounts", function(request, status) {
    Parse.Cloud.useMasterKey();
    var promises = []; // Set up a list that will hold the promises being waited on.
    var query = new Parse.Query("Question");
    query.each(function(question) {
        var agreementQuery = new Parse.Query("QuestionAgreement");
        agreementQuery.equalTo("question", question);
        agreementQuery.equalTo("agreement", 1);
        // Make sure that the count finishes running first!
        promises.push(agreementQuery.count().then(function(count) {
            question.set("agreementCount", count);
            // Returning the save chains it onto the same promise,
            // so the object is actually saved before that promise resolves.
            return question.save(null, null);
        }));
    }).then(function() {
        // Before exiting, make sure all the promises have been fulfilled!
        Parse.Promise.when(promises).then(function() {
            status.success("Finished updating Question Agreement Counts.");
        });
    });
});

Node js for loop timer

I'm trying to test which database usable from Node.js has the fastest time on certain tasks.
I'm working with a few DBs here, and every time I try to time a for loop, the timer finishes quickly, but the task isn't actually done. Here is the code:
console.time("Postgre_write");

for (var i = 0; i < limit; i++) {
    var worker = {
        id: "worker" + i.toString(),
        ps: Math.random(),
        com_id: Math.random(),
        status: (i % 3).toString()
    };
    client.query("INSERT INTO test(worker, password, com_id, status) values($1, $2, $3, $4)",
        [worker.id, worker.ps, worker.com_id, worker.status]);
}

console.timeEnd("Postgre_write");
When I tried a few hundred inserts, I thought the results were accurate. But when I test higher numbers, like 100,000, the console outputs 500 ms, yet when I check the pgAdmin app, the inserts are still processing.
Obviously I got something wrong here, but I don't know what. I rely heavily on the timer data, and I need to find a way to time these operations correctly. Help?
Node.js is asynchronous. This means that your code will not necessarily execute in the order that you have written it.
Take a look at this article http://www.i-programmer.info/programming/theory/6040-what-is-asynchronous-programming.html which offers a pretty decent explanation of asynchronous programming.
In your code, your query is sent to the database and begins to execute. Since Node is asynchronous, it does not wait for that line to finish executing, but instead moves on to the next line (and here, past the loop, to the line that stops your timer).
When your query is finished running, a callback will occur and notify Node that the query has finished. If you specify a callback function, it will be run at that point. Think of it as event-driven programming, with the query's completion being an event.
To solve your problem, stop the timer in the callback function of the query.
