The problem is simple: fetch rows from a database and pass them to an interface. For example, one implementation of this interface will write the data to an XML file.
I'm looking for a pattern so:
the interface only has one method instead of 3: beginWrite / write / endWrite
it shouldn't fetch all rows at once, but instead 'feed' the interface row-by-row.
I don't want to pass the mongodb cursor to the interface, because an interface implementation should not rely on a specific database driver.
Interface
function IBackend(implementation){
// removed code that merges implementation with this interface
// because it is irrelevant to this question.
}
IBackend.prototype.beginWrite = function(callback) {};
IBackend.prototype.write = function(row, callback) {};
IBackend.prototype.endWrite = function(callback) {};
Idea
Something I came up with was to only define one function:
IBackend.prototype.writeAll = function(callback) {};
then the implementation of this interface calls the callback, passing a writeOne and an end callback as arguments, so it can be used as:
backend.writeAll(function(writeOneCallback, endCallback) {
    collection.find().each(function(err, doc) {
        if (err) throw err;
        if (doc === null) return endCallback(); // cursor exhausted
        writeOneCallback(doc);
    });
});
But then, passing the writeOne and end callbacks depends on the implementation and isn't specified in the interface. So if anyone knows an elegant solution for this, I would love to hear it :)
The first thing that comes to mind is streams. While a stream has more than one method, and as such doesn't strictly answer your question, it is:
Proven
Seems appropriate
Plays nicely with many other parts of the node.js ecosystem
Doesn't need all rows at once
There are also many streams already available that will reduce your development time, and many other streams you can pipe your stream's output to that will also reduce dev time (xml, zip, send over http, etc.).
See the excellent Streams Handbook (https://github.com/substack/stream-handbook) for more, and examples of such "other streams".
If you are still not convinced, I can think of only 2 scenarios:
Your interface also guarantees that the implementation does not need to be "flushed" (that is: it does not maintain state between calls to write() that will eventually need to be summarized, flushed, output, etc. Examples of such state: buffers, stats that will be appended/prepended to the output, ...).
Your interface does not guarantee that.
If #1 above is correct, then simply have write(), which accepts either an array of rows or a single row, and does the processing immediately. The guarantee from #1 implies that at any given moment, the output from your interface constitutes a valid, coherent state.
If #2 above is correct (as would be the case for outputting an XML file), then I don't see a way around having at least write() and end().
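For illustration, here is a minimal sketch of the stream approach (createBackendStream is a hypothetical name; it assumes the backend exposes write()/endWrite() as in your question, and that your driver can expose the cursor as a readable object stream):

var Writable = require('stream').Writable;

// Wrap a backend behind an object-mode Writable stream so callers just
// pipe rows into it and never touch the database cursor directly.
function createBackendStream(backend) {
    var ws = new Writable({ objectMode: true });
    ws._write = function (row, encoding, callback) {
        backend.write(row, callback); // one row at a time, with backpressure
    };
    ws.on('finish', function () {
        backend.endWrite(function () {}); // flush any buffered state
    });
    return ws;
}

// Usage sketch: collection.find().stream().pipe(createBackendStream(backend));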
Context
I'm building a general-purpose game-playing A.I. framework/library that uses the Monte Carlo Tree Search algorithm. The idea is quite simple: the framework provides the skeleton of the algorithm, namely the four main steps: Selection, Expansion, Simulation and Backpropagation. All the user needs to do is plug in four simple(ish) game-related functions of his own making:
a function that takes in a game state and returns all possible legal moves to be played
a function that takes in a game state and an action and returns a new game state after applying the action
a function that takes in a game state and determines if the game is over and returns a boolean and
a function that takes in a state and a player ID and returns a value based on whether the player has won, lost, or the game is a draw.
With that, the algorithm has all it needs to run and select a move to make; a sketch of these four functions is given below.
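For illustration, a rough sketch of those four user-supplied functions (the names are mine; the signatures follow the description above):

// All four operate on plain game data; the framework never inspects their internals.
function getLegalMoves(state) { /* return an array of playable moves */ }
function applyMove(state, move) { /* return the new state after applying the move */ }
function isGameOver(state) { /* return true when play has ended */ }
function getResult(state, playerId) { /* e.g. 1 = win, 0.5 = draw, 0 = loss */ }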
What I'd like to do
I would love to make use of parallel programming to increase the strength of the algorithm and reduce the time it needs to run each game turn. The problem I'm running into is that, when using child processes in Node.js, you can't pass functions to the child process, and my framework is built entirely on functions passed by the user.
Possible solution
I have looked at this answer, but I am not sure it would be the correct implementation for my needs. I don't need to keep passing functions through messages to the child process; I just need to initialize it with the functions that are passed in by my framework's user when the framework is initialized.
I thought about one way to do it, but it seems so inelegant, on top of probably not being the most secure, that I find myself searching for other solutions. I could, when the user initializes the framework and passes his four functions to it, get a script to write those functions to a new js file (let's call it my-funcs.js) that would look something like:
const func1 = {... function implementation...}
const func2 = {... function implementation...}
const func3 = {... function implementation...}
const func4 = {... function implementation...}
module.exports = {func1, func2, func3, func4}
Then, in the child process worker file, I guess I would have to find a way to lazily require my-funcs.js. Or maybe I wouldn't; I guess it depends on how and when Node.js loads the worker file into memory. This all seems very convoluted.
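For concreteness, a sketch of the file-writing idea (all names hypothetical; note that toString() loses any variables the functions close over):

var fs = require('fs');

// Serialize the user's four functions into my-funcs.js so a worker can require it.
function writeUserFuncs(funcs) {
    var src = funcs.map(function (fn, i) {
        return 'const func' + (i + 1) + ' = ' + fn.toString() + ';';
    }).join('\n');
    src += '\nmodule.exports = {func1, func2, func3, func4};\n';
    fs.writeFileSync('./my-funcs.js', src);
}

// The worker would then simply do: var funcs = require('./my-funcs.js');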
Can you describe other ways to get the result I want?
child_process is less about running a user's function and more about spawning a separate process to exec a file or command.
Node is inherently a single-threaded system, so for I/O-bound things, the Node Event Loop is really good at switching between requests, getting each one a little farther. See https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
What it looks like you're doing is trying to get JavaScript to run multiple threads simultaneously. Short answer: you can't ... or rather, it's really hard. See "is it possible to achieve multithreading in nodejs?"
So how would we do it anyway? You're on the right track: child_process.fork(). But it needs a hard-coded module to run. So how do we get user-generated code into place?
I envision a datastore where you take userFn.toString() and save it to a queue. Then fork the process and let it pick up the next unhandled item in the queue, marking that it did so. It then writes the results to another queue, which this "GUI" thread polls, returning the calculated results back to the user. At this point, you've got multi-threading ... and race conditions.
Another idea: create a REST service that accepts the userFn.toString() content and execs it. Then in this module, you call out to the other "thread" (service), await the results, and return them.
Security: Yeah, we just flung this out the window. Whether you're executing the user's function directly, calling child_process#fork to do it, or shimming it through a service, you're trusting untrusted code. Sadly, there's really no way around this.
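As a rough illustration of the fork route (file names hypothetical, and with the security caveat above fully in force):

// parent.js
var fork = require('child_process').fork;
var child = fork('./worker.js');

// Ship the user's function across as source text, plus the data it needs.
child.send({ fn: userFn.toString(), state: gameState });
child.on('message', function (msg) {
    console.log('result from worker:', msg.result);
});

// worker.js
process.on('message', function (msg) {
    var fn = eval('(' + msg.fn + ')'); // rebuild the function from its source
    process.send({ result: fn(msg.state) });
});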
Assuming that security isn't an issue, you could do something like this.
// Client side
<input id="func1"> <!-- For example the user inputs '(gamestate)=>{return 1}' -->
<input id="func2">
<input id="func3">
<input id="func4">
<script>
socket.on('syntax_error',function(err){alert(err)});
function submit_funcs_strs() {
  // Get the function strings from the user inputs and put them into an array
  socket.emit('functions', [document.getElementById('func1').value,
                            document.getElementById('func2').value,
                            document.getElementById('func3').value,
                            document.getElementById('func4').value]);
}
</script>
// Server side
// Socket listener is async
socket.on('functions', (funcs_strs) => {
  let funcs = [];
  for (let i = 0; i < funcs_strs.length; i++) {
    try {
      funcs.push(eval(funcs_strs[i]));
    } catch (e) {
      if (e instanceof SyntaxError) {
        socket.emit('syntax_error', e.message);
        return;
      }
    }
  }
  // Run algorithm here
});
I am using protractor to test a series of webpages (actually mostly one Angular webpage, but others as well). I have created a series of page objects to handle this. To minimize code maintenance I have created an object with key-value pairs like so:
var object = {
pageLogo: element(by.id('logo')),
searchBar: element.all(by.className('searchThing')),
...
};
The assumption being that I will only need to add something to this object to make it usable everywhere in the page object file. Of course, the file has functions (in case you are not familiar with the page object pattern), as such:
var pageNamePageObject = function () {
    var object = {...}; //list of elements
    this.get = function() {
        browser.get('#/someWebTag');
    };
    this.getElementText = function(someName){
        if (typeof someName == 'number')
            ... (convert or handle exception, whatever)
        return object[someName].getText();
    };
    ...
*Note: these are just examples, and these promises can be handled in a variety of ways in the page object or the main tests.
The issue comes from attempting to "cycle" through the object. The particular test is attempting to verify, among other things, that all the elements are present on the particular web page, so I am attempting to loop through these objects using the isPresent() function. I have made many attempts, and for brevity's sake I will not list them here; they include creating a wrapper promise (using Q, which I must admit I have no idea how it works) and attempting to run the function in the expect, hoping that the jasmine core would wait for all the looping promises to resolve and then read the output (it was more of a last-ditch effort, really).
You should loop like you did before over all of the elements; if you want a particular order, create a recursive function that calls itself with the next element in the JSON.
Now, to handle jasmine specs finishing too early and that sort of thing: this function needs to be added to protractor's control flow for it to wait before continuing; read more about it here. Also, don't use Q in protractor; use protractor's implementation of WebDriverJS promises.
Also, consider using isDisplayed instead, assuming you want the element to be displayed on the page.
So basically, your code skeleton will look like this:
it('should ...', function () {
    var flow = webdriver.promise.controlFlow();
    return flow.execute(function () {
        // your checks on the page here;
        // if needed, extract them to an external function as described in my first paragraph
    });
});
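For the looping part itself, a minimal sketch (assuming protractor's jasminewd adapter, which resolves promises passed to expect, and that each entry in object is a single ElementFinder):

Object.keys(object).forEach(function (name) {
    // isDisplayed() returns a promise; the jasmine adapter waits for it
    // to resolve before letting the spec finish.
    expect(object[name].isDisplayed()).toBeTruthy();
});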
Well, I think that should provide you enough info on how to handle waiting for promises in protractor. Hope I helped.
I'm working on a small blog engine where the user can create blog entries and link tags to an entry. It is a many-to-many relation, but because Breeze cannot yet manage this relation, I have to expose the join table to Breeze so that I can persist the data step by step. And my problem is here.
Tables:
BlogEntry
BlogEntryTag
Tag
Scenario:
user opens the "new blog entry" form or selects an existing one to be edited
enters the text, etc
selects one or more tags
Business logic:
create a new entity by Breeze / query the selected one
save the blog entry (1st server call which gives back the blog_id if the blog entry is new one)
check the already existing connections between the tags and the blog entry; if the blog entry is being edited, the existing blogEntry-tag relations might change (2nd server call)
based on the tag name, select the tag_id from the tag table (3rd server call)
create the BlogEntryTag entities by Breeze
persist the BlogEntryTag entities into the database (4th server call)
I think these steps must be executed consecutively, in this order.
I have this code, and as you can see in the attached screenshot, the console logging marked by '_blogEntryEntity' does not wait until the data returns from the server: it is executed before the console logging marked by '_blogEntryEntity inside'. The code will throw a reference exception when it tries to set the title property a few lines later.
var blogEntryEntityQueryPromise = datacontext.blogentry.getById(_blogsObject.id);
blogEntryEntityQueryPromise.then(function (result)
{
    console.log('result', result);
    _blogEntryEntity = result[0];
    console.log('_blogEntryEntity inside', _blogEntryEntity);
    //if I need synchronous execution then I have to put the code here which must be executed consecutively
});
console.log('_blogEntryEntity', _blogEntryEntity);

//mapping the values we got
_blogEntryEntity.title = _blogsObject.title;
_blogEntryEntity.leadWithMarkup = _blogsObject.leadWithMarkup;
_blogEntryEntity.leadWithoutMarkup = _blogsObject.leadWithoutMarkup;
_blogEntryEntity.bodyWithMarkup = _blogsObject.bodyWithMarkup;
_blogEntryEntity.bodyWithoutMarkup = _blogsObject.bodyWithoutMarkup;
console.log('_blogEntryEntity', _blogEntryEntity);
The example comes from here.
My question is: why does it not wait until the data comes back? What is the right way to handle cases like this?
However, I figured out that if I need synchronous execution, I should place the code into the success method that follows the data retrieval from the promise. I really don't like this solution, because my code will become ugly after a while and hard to maintain.
The datacontext.blogentry.getById method is shown below; its implementation lives in an abstract class, whose code you can find below as well. The whole repository pattern comes from John Papa's course on Pluralsight.
Repository class method
function getById(id)
{
return this._getById(this.entityName, id);
}
Abstract repository class method. According to Breeze's documentation page the EntityQuery class' execute method returns a Promise.
function _getById(resource, id) {
var self = this;
var manager = self.newManager;
var Predicate = breeze.Predicate;
var p1 = new Predicate('id', '==', id);
return EntityQuery.from(resource)
.where(p1)
.using(manager).execute()
.then(success).catch(_queryFailed);
function success(data) {
return data.results;
}
}
I appreciate your help in advance!
I don't think you need all these round trips. I'd do this:
Query all available Tag entities, so they'll be in the EntityManager's cache (you need these to populate the UI anyway).
If it's an existing BlogEntry, just query the BlogEntry and all its associated BlogEntryTag entities; Breeze will connect the BlogEntryTags to their associated Tags in the cache. You'll add/delete BlogEntryTags if the user selects/unselects Tags for the BlogEntry.
var query = EntityQuery.from("BlogEntries").where("id", "==", id).expand("BlogEntryTags");
If it's a new BlogEntry, it won't have any BlogEntryTags. You'll create these when you save, after the user selects some tags.
Save the added/updated BlogEntry and any added/deleted BlogEntryTag entities to the database in a single saveChanges call.
See the Presenting Many-to-Many doc and its associated plunker for a deeper dive. The UI is different from what you want, but the underlying concepts are useful.
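As a rough sketch of the save step (the property names blogEntryId/tagId are assumptions about your join-table schema; plain property access is shown, though with Knockout-backed entities these would be observables):

// Create a join entity for each newly selected tag ...
selectedTags.forEach(function (tag) {
    manager.createEntity('BlogEntryTag', { blogEntryId: blogEntry.id, tagId: tag.id });
});
// ... mark unselected links for deletion ...
removedBlogEntryTags.forEach(function (blogEntryTag) {
    blogEntryTag.entityAspect.setDeleted();
});
// ... and persist everything in a single round trip.
manager.saveChanges().then(function () {
    console.log('blog entry and tag links saved in one call');
});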
why does it not wait until the data comes back?
Because promises don't magically synchronize execution. They're still asynchronous, they still rely on callbacks.
What is the way of handling cases like this?
You need to put the code that should wait in the then callback.
However, I really don't like this solution because my code will be ugly after a while and hard to maintain.
Not really, you can write concise and elegant asynchronous code with promises. If your code is becoming too much spaghetti, abstract parts of it in own functions. You should be able to get to a clean and flat promise chain.
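For example, a sketch of your snippet restructured this way (mapBlogValues and saveBlogEntry are hypothetical helpers you would extract):

datacontext.blogentry.getById(_blogsObject.id)
    .then(function (results) {
        var entity = results[0];
        mapBlogValues(entity, _blogsObject); // the synchronous title/body mapping
        return entity;
    })
    .then(saveBlogEntry) // returns a promise, keeping the chain flat
    .catch(handleError);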
As I understand it, there are three somewhat distinct reasons to put multiple IndexedDB operations in a single transaction rather than using a unique transaction for each operation:
Performance. If you’re doing a lot of writes to an object store, it’s much faster if they happen in one transaction.
Ensuring data is written before proceeding. Waiting for the “oncomplete” event is the only way to be sure that a subsequent IndexedDB query won’t return stale data.
Performing an atomic set of DB operations. Basically, “do all of these things, but if one of them fails, roll it all back”.
#1 is fine, most databases have the same characteristic.
#2 is a little more unusual, and it causes issues when considered in conjunction with #3. Let's say I have some simple function that writes something to the database and runs a callback when it's done:
function putWhatever(obj, cb) {
var tx = db.transaction("whatever", "readwrite");
tx.objectStore("whatever").put(obj);
tx.oncomplete = function () { cb(); };
}
That works fine. But now if you want to call that function as a part of a group of operations you want to atomically commit or fail, it's impossible. You'd have to do something like this:
function putWhatever(tx, obj, cb) {
tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
}
This second version of the function is very different from the first, because the callback runs before the data is guaranteed to be written to the database. If you try to read back the object you just wrote, you might get a stale value.
Basically, the problem is that you can only take advantage of one of #2 or #3. Sometimes the choice is clear, but sometimes not. This has led me to write horrible code like:
function putWhatever(tx, obj, cb) {
if (tx === undefined) {
tx = db.transaction("whatever", "readwrite");
tx.objectStore("whatever").put(obj);
tx.oncomplete = function () { cb(); };
} else {
tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
}
}
However even that still is not a general solution and could fail in some scenarios.
Has anyone else run into this problem? How do you deal with it? Or am I simply misunderstanding things somehow?
The following is just opinion as this doesn't seem like a 'one right answer' question.
First, performance is an irrelevant consideration. Avoid this factor entirely, unless later profiling suggests a material problem. Chances of perf issues are ridiculously low.
Second, I prefer to organize requests into transactions solely to maintain integrity. Integrity is paramount. Integrity as I define it here simply means that the database at any one point in time does not contain conflicting or erratic data. Essentially the database is never able to enter into a 'bad' state. For example, to impose a rule that cross-store object references point to valid and existing objects in other stores (a.k.a. referential integrity), or to prevent duplicated requests such as a double add/put/delete. Obviously, if the app were something like a bank app that credits/debits accounts, or a heart-attack monitor app, things could go horribly wrong.
My own experience has led me to believe that code involving indexedDB does not lend itself to the traditional facade pattern. I found that what worked best, in terms of organizing requests into different wrapping functions, was to design functions around transactions. I found that quite often there are very few DRY violations, because every request is nearly always unique to its transactional context. In other words, while a similar 'put object' request might appear in more than one transaction, it is so distinct in its behavior given its separate context that it merits violating DRY.
If you go the function-per-request route, I am not sure why you are checking whether the transaction parameter is undefined. Have the caller create the transaction and then pass it to the requests in turn. Expect the tx to always be defined and do not over-zealously guard against it. If it is ever not defined, there is either a serious bug in indexedDB or in your calling function.
Explicitly, something like:
function doTransaction1(db, onComplete) {
var tx = db.transaction(...);
tx.oncomplete = onComplete;
doRequest1(tx);
doRequest2(tx);
doRequest3(tx);
}
function doRequest1(tx) {
var store = tx.objectStore(...);
// ...
}
// ...
If the requests should not execute in parallel, and must run in a series, then this indicates a larger and more difficult design issue.
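That said, if a series is unavoidable, a minimal sketch is to chain each request from the previous one's onsuccess, all within the same transaction:

function doSeries(db, obj1, obj2, onComplete) {
    var tx = db.transaction("whatever", "readwrite");
    tx.oncomplete = onComplete; // fires once, after both requests commit
    var store = tx.objectStore("whatever");
    store.put(obj1).onsuccess = function () {
        store.put(obj2); // only issued after the first put succeeds
    };
}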
I am new to databases, and these MongoDB/Mongoose.js async functions have annoyed the hell out of me over the last few hours. I have written and rewritten this bit so many times:
router.get('/districts', function(req, res) {
var districtNames = [];
// I'm using mongoose-simpledb, so db.District refers to the districts collection
db.District.find(function(err, found) {
found.forEach(function(element) {
findParentProv(element, districtNames);
});
res.render('districts', {title: "Districts page", district_list: districtNames});
})
});
function findParentProv(element, namesArray) {
db.Province.findById(element.parent, function(err, found) {
console.log(found.name);
namesArray.push(element.name + " " + found.name);
});
}
I want to get all items in the districts collection, follow their parent field (which contains an ObjectID), find that item from the provinces collection and push both their names as a string into districtNames.
How should I do this?
Well, you do seem to be on the right track.
The one major issue I see in your solution is that after kicking off all the async queries for the parents, you immediately render the (most likely empty) districtNames array, without waiting for the queries to finish.
This is indeed very annoying, and not surprisingly so. MongoDB is a non-relational DB, and so join operations like what you're trying to do aren't easy to get right.
The solution that would probably require the least fundamental changes to what you're doing would be to wait on all the queries before calling res.render. The most basic way to do this would be to check the length of namesArray/districtNames after pushing each element, and once you see it's gotten to the desired size, only then calling render. There are, however, more standardized ways of doing this, and I'd suggest looking into something like Async (specifically async.parallel) or a Promise framework such as Bluebird.
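A minimal sketch of the counting approach, using the names from your code (error handling elided for brevity):

router.get('/districts', function (req, res) {
    db.District.find(function (err, districts) {
        var districtNames = [];
        var pending = districts.length;
        districts.forEach(function (district) {
            db.Province.findById(district.parent, function (err, province) {
                districtNames.push(district.name + ' ' + province.name);
                if (--pending === 0) { // last query finished; now it is safe to render
                    res.render('districts', {title: "Districts page", district_list: districtNames});
                }
            });
        });
    });
});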
Now, another approach to solving this problem is de-normalizing the data. For someone with a relational background this probably sounds appalling, but in Mongo it might actually be a valid solution to include the province names along with their IDs in the districts collection, in which case your one initial query would be sufficient.
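For example, a de-normalized district document might look like this (field names assumed), so that the one initial query returns everything needed for display:

{
    _id: ObjectId("..."),
    name: "Some District",
    parent: { _id: ObjectId("..."), name: "Some Province" }
}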
Another approach, which might be suitable if you're dealing with relatively small collections, would be to run 2 queries, 1 for all the districts and 1 for all the provinces, and do the correlation in-app. Obviously, this isn't a very efficient solution, and should definitely be avoided if there's any chance the collections contain, or will in the future contain, more than a handful of objects.
Your best bet moving forward is to use ES6 promise patterns to clean up your callback patterns.
suggested modules:
lodash [optional] has a lot of useful methods; not needed here, but you may want it, for example for _.flatten or _.assign
i-promise will give you a native Promise (node 0.11.3+) or a scripted implementation
es6-promise is the fallback for i-promise to use
promisify-patch is an inline promisify for specific methods.
Install the modules required for your use (in this example).
npm install --save es6-promise i-promise promisify-patch
Use Promise pattern with your example.
require('promisify-patch').patch();
var Promise = require('i-promise');
//returns a promise to resolve to your list for display
function getDistricts() {
//gets all of the db.District
return db.District.find.bind(db.District).promise()
//after districts retrieved
.then(function(districts){
//resolve an array of promises, will return an array of results
return Promise.all(districts.map(getDistrictProv)); //map each district via getDistrictProv
});
}
//returns a promise to resolve a specific district/province name
function getDistrictProv(district){
return db.Province.findById.bind(db.Province).promise(district.parent)
.then(function(province){
return district.name + ' ' + province.name;
});
}
...
//express handler
router.get('/districts', function(req, res, next) {
//get the district names
getDistricts()
//then process the rendering with the names
.then(function(names){
res.render('districts', {title: "Districts page", district_list: names});
})
//if there was an error in the promise chain
// pass it along, so it can be handled by another express plugin
.catch(next)
});
Disclosure: I made i-promise and promisify-patch to make situations like this easier to convert node-style callbacks into promise chains.
NOTE: If you are creating general-purpose libraries for Node or the browser that are not flow-control related, you should at least provide a node-style callback interface.
Further, you may wish to look into co, koa for using generators as well.
The question seemed to be about how to control the flow of data, for which promises are likely the best answer. If your issue is trying to fit non-relational data into a relational box or vice versa, you may want to re-evaluate your data structure as well...
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3
You should probably have some key data for parents/children replicated to those affected documents in other collections. There are configuration options via Mongoose to support this, but that doesn't mean you should avoid the consideration.
If you do many/large join operations like this, it will negatively affect your performance. This isn't meant to be a religious comment, only that MongoDB-vs-SQL and other NoSQL-vs-SQL considerations should be made based on your actual needs.
The data in question seems to be highly cacheable and might well be better served by a relational/SQL database.