I'm learning Node.js through the learnyounode project. I have completed the first few assignments and they all seemed reasonably straightforward.
Then, I got to the 'Async Juggling' one, and the assignment's description went completely over my head in terms of what I need to do.
The gist of it is that I need to write a JavaScript program that accepts 3 URLs as arguments and associates each response with the correct server. The assignment itself notes that you cannot naively assume that responses will be properly associated with the correct URL.
The (incorrect) code I came up with proved that restriction true:
var http = require('http');
var bl = require('bl');
var httpCallback = function (response) {
    var pipeHandler = function (err, data) {
        if (err)
            return console.error(err);
        console.log(data.toString());
    };
    response.pipe(bl(pipeHandler));
};
var juggleAsyncConnections = function (connA, connB, connC) {
    http.get(connA, httpCallback);
    http.get(connB, httpCallback);
    http.get(connC, httpCallback);
};
juggleAsyncConnections(process.argv[2], process.argv[3], process.argv[4]);
The problem, and thus my question, is, what is the correct way to handle asynchronous connection juggling, and what are the underlying concepts I need to understand to do it correctly?
Note: I've seen other questions, like "OMG why doesn't my solution work?" I'm not asking that, I deliberately set out to see the 'naive' solution fail for myself. I don't understand the underlying principles of why it doesn't work, or what principles actually do work. Additionally, I'm not asking for someone to 'solve the problem for me.' If the general algorithm can be explained, I can probably implement it on my own.
Counting callbacks is one of the fundamental ways of managing async in Node. [...]
That's an important piece.
You know how many inputs there are (3), and, because of that, you know how many outputs there should be. Keep a running tally as responses come back, then check if you received the full set before printing to the screen. You also want to keep the original order in mind (now if there were only a datatype that had numeric indexes... :grin:).
Good luck!
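The tally described above can be sketched roughly as follows. This only illustrates the counting idea and is not the assignment's official solution: collectInOrder and fakeFetch are hypothetical names, and setTimeout stands in for the real HTTP requests so that the example runs on its own.

```javascript
// Hypothetical helper: starts every task at once, stores each result at the
// task's own index, and calls done once the tally reaches zero.
function collectInOrder(tasks, done) {
    var results = new Array(tasks.length);
    var pending = tasks.length;
    var failed = false;

    if (pending === 0) return done(null, results);

    tasks.forEach(function (task, index) {
        task(function (err, value) {
            if (failed) return;           // an earlier task already reported an error
            if (err) {
                failed = true;
                return done(err);
            }
            results[index] = value;       // slot by input position, not arrival order
            pending -= 1;
            if (pending === 0) done(null, results);
        });
    });
}

// Hypothetical stand-in for an HTTP request that answers after delayMs.
function fakeFetch(label, delayMs) {
    return function (callback) {
        setTimeout(function () { callback(null, label); }, delayMs);
    };
}

collectInOrder(
    [fakeFetch('first', 30), fakeFetch('second', 10), fakeFetch('third', 20)],
    function (err, results) {
        if (err) return console.error(err);
        results.forEach(function (body) { console.log(body); });
    }
);
```

Even though the simulated responses arrive as second, third, first, the output is printed as first, second, third, because each result is stored at the index of the request that produced it.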
Related
I have a question: I don't understand how this code works.
ans.map((val, indx) => {
    const options = {
        host: 'www.xxxx.com',
        path: '/path',
        port: 80,
        method: 'GET',
    };
    console.log(val);
    send.getJSON(options, (code, result) => {
        console.log("oke");
    });
});
For [1,2,3], the output I get is:
1
2
3
oke
oke
oke
Why is the output not the following instead?
1
oke
2
oke
3
The issue is that your .getJSON is asynchronous but is being called from synchronous code. That isn't bad in itself, but the results have to be handled slightly differently.
Node.js uses an event loop to provide asynchronicity in a single-threaded language like JavaScript.
So your callback won't actually be called until .getJSON() has completed, and by then the synchronous loop has already logged every value.
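A small sketch of that ordering, assuming a stand-in called fakeGetJSON (a hypothetical name; it uses setTimeout instead of a real request so that it runs on its own):

```javascript
var log = [];

// Hypothetical stand-in for send.getJSON: the callback is queued on the
// event loop and only runs after the current synchronous code has finished.
function fakeGetJSON(options, callback) {
    setTimeout(function () {
        callback(200, { value: options.value });
    }, 0);
}

[1, 2, 3].forEach(function (val) {
    log.push(val);                  // runs right away, during the loop
    console.log(val);
    fakeGetJSON({ value: val }, function (code, result) {
        log.push('oke ' + result.value);   // runs later, from the event loop
        console.log('oke ' + result.value);
    });
});

// The loop has finished, but no callback has run yet.
console.log('after loop: ' + log.join(' '));
```

This prints 1, 2, 3, then "after loop: 1 2 3", and only then the three "oke" lines. With real network requests the callbacks could also come back in a different order from the one in which they were started.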
https://jsfiddle.net/uva5o10d/
Have a look at the fiddle linked above. I've made an example to demonstrate what I mean: I simply fill an array with values (all 1s for this example) and set up a callback function using setTimeout (which delays by 1s). Notice, however, that the program continues to run in the meantime (the event loop).
At the bottom of the file, notice the call to test(). It runs a version very similar to what you currently have, including the call to a longer-running job, like retrieving data from an API.
Comment out test() and uncomment working() and you'll see the difference. You may want to use console.log(val + " oke") inside your .getJSON callback to produce the results you're looking for.
https://codeforgeek.com/asynchronous-programming-in-node-js/
Also, as a side note, forEach would probably be a better method for iterating over an array, unless you want a transformed array back (map).
I've attached some resources I think you may find helpful:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array
https://www.techopedia.com/definition/3821/iteration (you're already on track but always worth having a read if you're not overly familiar).
As I understand it, there are three somewhat distinct reasons to put multiple IndexedDB operations in a single transaction rather than using a unique transaction for each operation:
Performance. If you’re doing a lot of writes to an object store, it’s much faster if they happen in one transaction.
Ensuring data is written before proceeding. Waiting for the “oncomplete” event is the only way to be sure that a subsequent IndexedDB query won’t return stale data.
Performing an atomic set of DB operations. Basically, “do all of these things, but if one of them fails, roll it all back”.
#1 is fine, most databases have the same characteristic.
#2 is a little more unique, and it causes issues when considered in conjunction with #3. Let’s say I have some simple function that writes something to the database and runs a callback when it's over:
function putWhatever(obj, cb) {
    var tx = db.transaction("whatever", "readwrite");
    tx.objectStore("whatever").put(obj);
    tx.oncomplete = function () { cb(); };
}
That works fine. But now if you want to call that function as a part of a group of operations you want to atomically commit or fail, it's impossible. You'd have to do something like this:
function putWhatever(tx, obj, cb) {
    tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
}
This second version of the function is very different than the first, because the callback runs before the data is guaranteed to be written to the database. If you try to read back the object you just wrote, you might get a stale value.
Basically, the problem is that you can only take advantage of one of #2 or #3. Sometimes the choice is clear, but sometimes not. This has led me to write horrible code like:
function putWhatever(tx, obj, cb) {
    if (tx === undefined) {
        tx = db.transaction("whatever", "readwrite");
        tx.objectStore("whatever").put(obj);
        tx.oncomplete = function () { cb(); };
    } else {
        tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
    }
}
However, even that is still not a general solution and could fail in some scenarios.
Has anyone else run into this problem? How do you deal with it? Or am I simply misunderstanding things somehow?
The following is just opinion as this doesn't seem like a 'one right answer' question.
First, performance is an irrelevant consideration. Avoid this factor entirely, unless later profiling suggests a material problem. Chances of perf issues are ridiculously low.
Second, I prefer to organize requests into transactions solely to maintain integrity. Integrity is paramount. Integrity as I define it here simply means that the database at any one point in time does not contain conflicting or erratic data. Essentially the database is never able to enter into a 'bad' state. For example, to impose a rule that cross-store object references point to valid and existing objects in other stores (a.k.a. referential integrity), or to prevent duplicated requests such as a double add/put/delete. Obviously, if the app were something like a bank app that credits/debits accounts, or a heart-attack monitor app, things could go horribly wrong.
My own experience has led me to believe that code involving indexedDB is not prone to the traditional facade pattern. I found that what worked best, in terms of organizing requests into different wrapping functions, was to design functions around transactions. I found that quite often there are very few DRY violations because every request is nearly always unique to its transactional context. In other words, while a similar 'put object' request might appear in more than one transaction, it is so distinct in its behavior given its separate context that it merits violating DRY.
If you go the function-per-request route, I am not sure why you are checking whether the transaction parameter is undefined. Have the caller create the transaction and then pass it to each request in turn. Expect the tx to always be defined and do not over-zealously guard against it. If it is ever not defined, there is a serious bug either in indexedDB or in your calling function.
Explicitly, something like:
function doTransaction1(db, onComplete) {
    var tx = db.transaction(...);
    // the event handler property is all lowercase: oncomplete
    tx.oncomplete = onComplete;
    doRequest1(tx);
    doRequest2(tx);
    doRequest3(tx);
}
function doRequest1(tx) {
    var store = tx.objectStore(...);
    // ...
}
// ...
If the requests should not execute in parallel, and must run in a series, then this indicates a larger and more difficult design issue.
I am new to databases and these MongoDB/Mongoose.js async functions have annoyed the hell out of me over the last few hours. I have written and rewritten this bit so many times:
router.get('/districts', function(req, res) {
    var districtNames = [];
    // I'm using mongoose-simpledb, so db.District refers to the districts collection
    db.District.find(function(err, found) {
        found.forEach(function(element) {
            findParentProv(element, districtNames);
        });
        res.render('districts', {title: "Districts page", district_list: districtNames});
    })
});
function findParentProv(element, namesArray) {
    db.Province.findById(element.parent, function(err, found) {
        console.log(found.name);
        namesArray.push(element.name + " " + found.name);
    });
}
I want to get all items in the districts collection, follow their parent field (which contains an ObjectID), find that item from the provinces collection and push both their names as a string into districtNames.
How should I do this?
Well, you do seem to be on the right track.
The one major issue I recognize in your solution is that after kicking off all the async queries for parents, you immediately call res.render with the (most likely empty) districtNames array, without waiting for the queries to finish.
This is indeed very annoying, and not surprisingly so. MongoDB is a non-relational DB, and so join operations like what you're trying to do aren't easy to get right.
The solution that would probably require the least fundamental changes to what you're doing would be to wait on all the queries before calling res.render. The most basic way to do this would be to check the length of namesArray/districtNames after pushing each element, and once you see it's gotten to the desired size, only then calling render. There are, however, more standardized ways of doing this, and I'd suggest looking into something like Async (specifically async.parallel) or a Promise framework such as Bluebird.
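A rough sketch of that "wait until everything has come back" idea, under stated assumptions: collectDistrictNames and fakeFindById are hypothetical names, and the lookup function is passed in so the example can run without a database. In the real handler the lookup would be a thin wrapper around db.Province.findById, and the final callback is where res.render would go.

```javascript
// Hypothetical helper: findProvinceById is any function with the
// (id, callback(err, province)) shape.
function collectDistrictNames(districts, findProvinceById, done) {
    var names = new Array(districts.length);
    var remaining = districts.length;
    var finished = false;

    if (remaining === 0) return done(null, names);

    districts.forEach(function (district, index) {
        findProvinceById(district.parent, function (err, province) {
            if (finished) return;
            if (err) {
                finished = true;
                return done(err);
            }
            names[index] = district.name + ' ' + province.name;
            remaining -= 1;
            if (remaining === 0) {     // the last answer has arrived
                finished = true;
                done(null, names);
            }
        });
    });
}

// In-memory stand-in for the provinces collection (made-up data).
var provinces = { p1: { name: 'North' }, p2: { name: 'South' } };
function fakeFindById(id, callback) {
    setTimeout(function () { callback(null, provinces[id]); }, 0);
}

collectDistrictNames(
    [{ name: 'Alpha', parent: 'p2' }, { name: 'Beta', parent: 'p1' }],
    fakeFindById,
    function (err, names) {
        if (err) return console.error(err);
        console.log(names);            // res.render(...) would be called here
    }
);
```

Counting down a separate remaining value, rather than checking the array's length, lets each name be stored at its district's index, so the list keeps the order of the original query.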
Now, another approach to solving this problem is de-normalizing the data. For someone with a relational background this probably sounds appalling, but in Mongo it might actually be a valid solution to just include the province names along with their IDs in the districts collection, in which case your one initial query should be sufficient.
Another approach, which might be suitable if you're dealing with relatively small collections, would be to run 2 queries, 1 for all the districts and 1 for all the provinces, and do the correlation in-app. Obviously, this isn't a very efficient solution, and should definitely be avoided if there's any chance the collections contain, or will in the future contain, more than a handful of objects.
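For small collections, the in-app correlation could look something like the sketch below. joinDistrictNames is a hypothetical name, and the sample documents are made up; it assumes each province has an _id and each district a parent field holding that id, as in the question.

```javascript
// Hypothetical helper: builds a lookup table from the provinces once, then
// maps every district to "<district name> <province name>".
function joinDistrictNames(districts, provinces) {
    var provinceNameById = {};
    provinces.forEach(function (province) {
        // String() so that ObjectIDs and plain strings are compared alike
        provinceNameById[String(province._id)] = province.name;
    });
    return districts.map(function (district) {
        return district.name + ' ' + provinceNameById[String(district.parent)];
    });
}

// Made-up sample data standing in for the results of the two queries.
var sampleProvinces = [{ _id: 'p1', name: 'North' }, { _id: 'p2', name: 'South' }];
var sampleDistricts = [{ name: 'Alpha', parent: 'p2' }, { name: 'Beta', parent: 'p1' }];

console.log(joinDistrictNames(sampleDistricts, sampleProvinces));
```

The two find calls would still have to finish before this function is called, so the waiting problem described earlier applies here too, just with two queries instead of one per district.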
Your best bet moving forward is to use ES6 Promise patterns to help with your callback patterns.
Suggested modules:
lodash [optional] has a lot of useful methods, not needed here, but you may need, for example _.flatten, or _.assign
i-promise will give you a native Promise (node 0.11.3+) or a scripted implementation
es6-promise is the fallback for i-promise to use
promisify-patch is an inline promisify for specific methods.
Install the modules required for your use (in this example).
npm install --save es6-promise i-promise promisify-patch
Use Promise pattern with your example.
require('promisify-patch').patch();
var Promise = require('i-promise');
//returns a promise to resolve to your list for display
function getDistricts() {
    //gets all of the db.District
    return db.District.find.bind(db.District).promise()
        //after districts retrieved
        .then(function(districts){
            //resolve an array of promises, will return an array of results
            return Promise.all(districts.map(getDistrictProv)); //map each district via getDistrictProv
        });
}
//returns a promise to resolve a specific district/province name
function getDistrictProv(district){
    return db.Province.findById.bind(db.Province).promise(district.parent)
        .then(function(province){
            return district.name + ' ' + province.name;
        });
}
...
//express handler
router.get('/districts', function(req, res, next) {
    //get the district names
    getDistricts()
        //then process the rendering with the names
        .then(function(names){
            res.render('districts', {title: "Districts page", district_list: names});
        })
        //if there was an error in the promise chain
        // pass it along, so it can be handled by another express plugin
        .catch(next)
});
Disclosure: I made i-promise and promisify-patch to make situations like this easier to convert node-style callbacks into promise chains.
NOTE: If you are creating general purpose libraries for Node or the Browser that are not flow-control related, you should at least implement the node-style callback implementation.
Further, you may wish to look into co, koa for using generators as well.
The question seemed to be how to control the flow of data, for which promises are likely the best answer. If your issue is trying to fit non-relational data into a relational box or vice versa, you may want to re-evaluate your data structure as well...
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3
You should probably have some key data for parents/children replicated to those affected documents in other collections. There are configuration options via Mongoose to support this, but that doesn't mean you should avoid the consideration.
If you do many/large join operations like this it will negatively affect your performance. This isn't meant to be a religious comment only that MongoDB/SQL or other SQL vs. NoSQL considerations should be made depending on your actual needs.
The data in question seems to be highly cacheable data that may well be better with a relational/sql database.
The problem is simple: fetch rows from a database and pass them to an interface. For example, one implementation of this interface will write this data to an XML file.
I'm looking for a pattern so:
the interface only has one method instead of 3: beginWrite / write / endWrite
it shouldn't fetch all rows at once, but instead 'feed' the interface row-by-row.
I don't want to pass the mongodb cursor to the interface, because interface implementation should not rely on a specific database driver.
Interface
function IBackend(implementation){
    // removed code that merges implementation with this interface
    // because it is irrelevant to this question.
}
IBackend.prototype.beginWrite = function(callback) {};
IBackend.prototype.write = function(row, callback) {};
IBackend.prototype.endWrite = function(callback) {};
Idea
Something I came up with was to only define one function:
IBackend.prototype.writeAll = function(callback) {};
then the implementation of this interface calls the callback passing a writeOne and end callback as arguments so it can be used as:
backend.writeAll(function(writeOneCallback, endCallback) {
    collection.find().each(function(err, doc){
        if (err) throw new Error(err);
        writeOneCallback(doc);
    });
    endCallback();
});
But then, the passing of the writeOne and end callback depends on the implementation and isn't specified in the interface. So if anyone knows an elegant solution for this, I would love to hear it :)
The first thing that comes to mind is streams. While a stream has more than one method, and as such doesn't strictly answer your question, it:
Is proven
Seems appropriate
Plays nicely with many other parts of the node.js ecosystem
Doesn't need all rows at once
There are also many streams already available that will reduce your development time, and many other streams you can pipe your stream's output to that will also reduce dev time (xml, zip, send over http, etc.).
See the excellent Streams Handbook (https://github.com/substack/stream-handbook) for more, and examples of such "other streams".
If you are still not convinced, I can think of only 2 scenarios:
Your interface also guarantees that the implementation does not need to be "flushed" (that is, it does not maintain a state between calls to write() that will eventually need to be summarized, flushed, output, etc. Examples of such states: buffers, stats that will be appended/prepended to the output, ...).
Your interface does not guarantee that.
If #1 above is correct, then simply have write(), which accepts either an array of rows or a single row and does the processing immediately. The guarantee by #1 above implies that at any given moment, the output from your interface constitutes a valid, coherent state.
If #2 above is correct (as would be the example for outputting an XML file), then I don't see a way around at least write() and end().
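As a rough illustration of the stream route, the sketch below uses Node's built-in stream.Writable in object mode, with write playing the role of beginWrite/write and final playing the role of endWrite. createXmlBackend and the sink function are hypothetical names, the XML is deliberately simplistic (no escaping, ids assumed to be numbers), and a runtime where Writable supports the final option is assumed.

```javascript
var Writable = require('stream').Writable;

// Hypothetical XML backend: sink is any function that accepts a string,
// e.g. something that appends to a file.
function createXmlBackend(sink) {
    var started = false;
    return new Writable({
        objectMode: true,                  // accept row objects instead of buffers
        write: function (row, encoding, callback) {
            if (!started) {                // plays the role of beginWrite
                sink('<rows>');
                started = true;
            }
            sink('<row id="' + row.id + '"/>');
            callback();                    // ready for the next row
        },
        final: function (callback) {       // plays the role of endWrite
            sink(started ? '</rows>' : '<rows/>');
            callback();
        }
    });
}

var output = [];
var backend = createXmlBackend(function (text) { output.push(text); });
backend.on('finish', function () { console.log(output.join('')); });

// Rows are fed one at a time; the backend never sees a database cursor.
backend.write({ id: 1 });
backend.write({ id: 2 });
backend.end();
```

The caller still has to signal the end, which matches the point above: when the output needs closing, something equivalent to end() cannot be avoided, but the stream API at least makes it a standard one.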
EDIT
Thanks for all the answers.
In the end I decided to use something like Step:
all I need is "flow control", and I don't want anything else that might slow down performance (I don't know exactly how much it would affect things, or whether the effect can simply be ignored).
So I just created a little tool for flow control:
line.js
/**
 * Create the "next" function
 *
 * @param {Array} tasks
 * @param {Number} index
 * @param {Number} last
 */
var next = function(tasks, index, last) {
    if (index == last) {
        return tasks[index + 1];
    }
    else {
        return function(data) {
            var nextIndex = index + 1;
            tasks[nextIndex](next(tasks, nextIndex, last), data);
        };
    }
};
/**
 * Invoke functions in line.
 */
module.exports = function() {
    var tasks = arguments,
        last = tasks.length - 2;
    tasks[0](next(tasks, 0, last));
};
usage:
// a relative path is needed to load a local file
var line = require("./line.js");
line(function(next) {
    someObj.find(function(err, docs) {
        // codes
        next(docs);
    });
}, function(next, docs) {
    // codes
});
Hope this helps.
EDIT END
As everyone knows,
Node's built-in and third-party modules often provide async APIs
and use a "callback" function to deliver the results.
That's cool, but sometimes the code ends up looking like this:
//some codes
}
}
}
}
Code like this is hard to read.
I know a "deferred" library can solve this problem.
Is there a good "deferred" module for Node?
And how is the performance if I write Node code with "deferred"?
It is a large problem with Node-based code; you frequently grow "callback pyramids". There are several approaches to dealing with the problem:
Code style:
Use this annoyance as an opportunity to break your code into bite sized chunks. It means you're likely going to have a proliferation of tiny named funcs - that's probably just fine, though! You might also find more opportunities for reuse.
Flow-control Libraries
There are exactly 593.72 billion flow control libraries out there. Here are some of the more popular ones:
Step: super basic serial & parallel flow management.
seq: a heavier but more feature-full flow control library.
There are plenty more. Search the npm registry for "flow" and "flow control" (sorry, doesn't appear to be linkable).
Language Extensions
There are several attempts to provide a more synchronous-feeling syntax on top of JavaScript (or CoffeeScript), often based on the concepts behind the tame paper.
TameJS is the OkCupid team's answer to this.
IcedCoffeeScript: they've also ported TameJS to CoffeeScript, as a fork.
streamline.js is very similar to TameJS.
StratifiedJS is a heavier approach to the problem.
This route is a deal-breaker for some:
It's not standard JavaScript; if you are building libraries/frameworks/etc, finding help will be more difficult.
Variable scope can behave in unexpected ways, depending on the library.
The generated code can be difficult to debug & match to the original source.
The Future:
The node core team is very aware of the problem, and is also working on lower-level components to help ease the pain. It looks like they'll be introducing a basic version of domains in v0.8, which provide a way of rolling up error handling (primarily avoiding the common "return err if err" pattern).
This should start to lay a great foundation for cleaner flow control libraries, and start to pave the way for a more consistent way of dealing with callback pyramids. There's too much choice out there right now, and the community isn't close to agreeing on even a handful of standards yet.
References:
Mixu's Node book has an awesome chapter on this subject.
There are tons of "deferred" libraries. Have a look at http://eirikb.github.com/nipster/#promise and http://eirikb.github.com/nipster/#deferred. Picking one is only a matter of style & simplicity :)
If you really don't like that, there's always the alternative of using named functions, which will reduce the indentation.
Instead of
setTimeout(function() {
    fs.readFile('file', function (err, data) {
        if (err) throw err;
        console.log(data);
    })
}, 200);
You can do this:
function dataHandler(err, data)
{
    if (err) throw err;
    console.log(data);
}
function getFile()
{
    fs.readFile('file', dataHandler);
}
setTimeout(getFile, 200);
The same thing, no nesting.
There are some libraries that may be useful in some scenarios, but as a whole you won't be excited after using them for everything.
As for the performance concern: since node.js is async, the wrapper functions add very little overhead.
You could look here for deferred-like library
https://github.com/kriszyp/node-promise
Also this question is very similar
What nodejs library is most like jQuery's deferreds?
And as a final bonus, I suggest you take a look at CoffeeScript. It is a language that compiles to JavaScript and has a cleaner syntax, since the function braces are removed.
I usually like to use the async.js library, as it offers a few different options for how to execute the code.