As I understand it, there are three somewhat distinct reasons to put multiple IndexedDB operations in a single transaction rather than using a unique transaction for each operation:
Performance. If you’re doing a lot of writes to an object store, it’s much faster if they happen in one transaction.
Ensuring data is written before proceeding. Waiting for the “oncomplete” event is the only way to be sure that a subsequent IndexedDB query won’t return stale data.
Performing an atomic set of DB operations. Basically, “do all of these things, but if one of them fails, roll it all back”.
#1 is fine, most databases have the same characteristic.
#2 is a little more unique, and it causes issues when considered in conjunction with #3. Let’s say I have some simple function that writes something to the database and runs a callback when it's over:
function putWhatever(obj, cb) {
var tx = db.transaction("whatever", "readwrite");
tx.objectStore("whatever").put(obj);
tx.oncomplete = function () { cb(); };
}
That works fine. But now if you want to call that function as a part of a group of operations you want to atomically commit or fail, it's impossible. You'd have to do something like this:
function putWhatever(tx, obj, cb) {
tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
}
This second version of the function is very different than the first, because the callback runs before the data is guaranteed to be written to the database. If you try to read back the object you just wrote, you might get a stale value.
Basically, the problem is that you can only take advantage of one of #2 or #3. Sometimes the choice is clear, but sometimes not. This has led me to write horrible code like:
function putWhatever(tx, obj, cb) {
if (tx === undefined) {
tx = db.transaction("whatever", "readwrite");
tx.objectStore("whatever").put(obj);
tx.oncomplete = function () { cb(); };
} else {
tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
}
}
However even that still is not a general solution and could fail in some scenarios.
Has anyone else run into this problem? How do you deal with it? Or am I simply misunderstanding things somehow?
The following is just opinion as this doesn't seem like a 'one right answer' question.
First, performance is an irrelevant consideration. Avoid this factor entirely, unless later profiling suggests a material problem. Chances of perf issues are ridiculously low.
Second, I prefer to organize requests into transactions solely to maintain integrity. Integrity is paramount. Integrity as I define it here simply means that the database at any one point in time does not contain conflicting or erratic data. Essentially the database is never able to enter into a 'bad' state. For example, to impose a rule that cross-store object references point to valid and existing objects in other stores (a.k.a. referential integrity), or to prevent duplicated requests such as a double add/put/delete. Obviously, if the app were something like a bank app that credits/debits accounts, or a heart-attack monitor app, things could go horribly wrong.
My own experience has led me to believe that code involving indexedDB is not prone to the traditional facade pattern. I found that what worked best, in terms of organizing requests into different wrapping functions, was to design functions around transactions. I found that quite often there are very few DRY violations because every request is nearly always unique to its transactional context. In other words, while a similar 'put object' request might appear in more than one transaction, it is so distinct in its behavior given its separate context that it merits violating DRY.
If you go the function per request route, I am not sure why you are checking if the transaction parameter is undefined. Have the caller create the function and then pass it to the requests in turn. Expect the tx to always be defined and do not over-zealously guard against it. If it is ever not defined there is either a serious bug in indexedDB or in your calling function.
Explicitly, something like:
function doTransaction1(db, onComplete) {
var tx = db.transaction(...);
tx.onComplete = onComplete;
doRequest1(tx);
doRequest2(tx);
doRequest3(tx);
}
function doRequest1(tx) {
var store = tx.objectStore(...);
// ...
}
// ...
If the requests should not execute in parallel, and must run in a series, then this indicates a larger and more difficult design issue.
Related
Context
I'm building a general purpose game playing A.I. framework/library that uses the Monte Carlo Tree Search algorithm. The idea is quite simple, the framework provides the skeleton of the algorithm, the four main steps: Selection, Expansion, Simulation and Backpropagation. All the user needs to do is plug in four simple(ish) game related functions of his making:
a function that takes in a game state and returns all possible legal moves to be played
a function that takes in a game state and an action and returns a new game state after applying the action
a function that takes in a game state and determines if the game is over and returns a boolean and
a function that takes in a state and a player ID and returns a value based on wether the player has won, lost or the game is a draw. With that, the algorithm has all it needs to run and select a move to make.
What I'd like to do
I would love to make use of parallel programming to increase the strength of the algorithm and reduce the time it needs to run each game turn. The problem I'm running into is that, when using Child Processes in NodeJS, you can't pass functions to the child process and my framework is entirely built on using functions passed by the user.
Possible solution
I have looked at this answer but I am not sure this would be the correct implementation for my needs. I don't need to be continually passing functions through messages to the child process, I just need to initialize it with functions that are passed in by my framework's user, when it initializes the framework.
I thought about one way to do it, but it seems so inelegant, on top of probably not being the most secure, that I find myself searching for other solutions. I could, when the user initializes the framework and passes his four functions to it, get a script to write those functions to a new js file (let's call it my-funcs.js) that would look something like:
const func1 = {... function implementation...}
const func2 = {... function implementation...}
const func3 = {... function implementation...}
const func4 = {... function implementation...}
module.exports = {func1, func2, func3, func4}
Then, in the child process worker file, I guess I would have to find a way to lazy load require my-funcs.js. Or maybe I wouldn't, I guess it depends how and when Node.js loads the worker file into memory. This all seems very convoluted.
Can you describe other ways to get the result I want?
child_process is less about running a user's function and more about starting a new thread to exec a file or process.
Node is inherently a single-threaded system, so for I/O-bound things, the Node Event Loop is really good at switching between requests, getting each one a little farther. See https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
What it looks like you're doing is trying to get JavaScript to run multiple threads simultaniously. Short answer: can't ... or rather it's really hard. See is it possible to achieve multithreading in nodejs?
So how would we do it anyway? You're on the right track: child_process.fork(). But it needs a hard-coded function to run. So how do we get user-generated code into place?
I envision a datastore where you can take userFn.ToString() and save it to a queue. Then fork the process, and let it pick up the next unhandled thing in the queue, marking that it did so. Then write to another queue the results, and this "GUI" thread then polls against that queue, returning the calculated results back to the user. At this point, you've got multi-threading ... and race conditions.
Another idea: create a REST service that accepts the userFn.ToString() content and execs it. Then in this module, you call out to the other "thread" (service), await the results, and return them.
Security: Yeah, we just flung this out the window. Whether you're executing the user's function directly, calling child_process#fork to do it, or shimming it through a service, you're trusting untrusted code. Sadly, there's really no way around this.
Assuming that security isn't an issue you could do something like this.
// Client side
<input class="func1"> // For example user inputs '(gamestate)=>{return 1}'
<input class="func2">
<input class="func3">
<input class="func4">
<script>
socket.on('syntax_error',function(err){alert(err)});
submit_funcs_strs(){
// Get function strings from user input and then put into array
socket.emit('functions',[document.getElementById('func1').value,document.getElementById('func2').value,...
}
</script>
// Server side
// Socket listener is async
socket.on('functions',(funcs_strs)=>{
let funcs = []
for (let i = 0; i < funcs_str.length;i++){
try {
funcs.push(eval(funcs_strs));
} catch (e) {
if (e instanceof SyntaxError) {
socket.emit('syntax_error',e.message);
return;
}
}
}
// Run algorithm here
}
I am new to databases and these MongoDB/Mongoose.js async functions have annoyed the hell out of me over the last few hours. I have written and rewritten this bit so many times:
router.get('/districts', function(req, res) {
districtNames = [];
// I'm using mongoose-simpledb, so db.District refers to the districts collection
db.District.find(function(err, found) {
found.forEach(function(element) {
findParentProv(element, districtNames);
});
res.render('districts', {title: "Districts page", district_list: districtNames});
})
});
function findParentProv(element, namesArray) {
db.Province.findById(element.parent, function(err, found) {
console.log(found.name);
namesArray.push(element.name + " " + found.name);
});
}
I want to get all items in the districts collection, follow their parent field (which contains an ObjectID), find that item from the provinces collection and push both their names as a string into districtNames.
How should I do this?
Well, you do seem to be on the right track.
The one major issue I recognize in your solution is that after kicking off all the async queries for parents, you immediately return the (most likely empty) districtNames array, without waiting for the queries to finish.
This is indeed very annoying, and not surprisingly so. MongoDB is a non-relational DB, and so join operations like what you're trying to do aren't easy to get right.
The solution that would probably require the least fundamental changes to what you're doing would be to wait on all the queries before calling res.render. The most basic way to do this would be to check the length of namesArray/districtNames after pushing each element, and once you see it's gotten to the desired size, only then calling render. There are, however, more standardized ways of doing this, and I'd suggest looking into something like Async (specifically async.parallel) or a Promise framework such as Bluebird.
Now, another approach to solving this problem is de-normalizing the data. For someone with a relational background this probably sound appalling, but in Mongo it might actually be a valid solution to just include the province names along with their IDs in the districts collection, in which case your one initial query should be sufficient.
Another approach, which might be suitable if you're dealing with relatively small collections, would be to run 2 queries, 1 for all the districts and 1 for all the provinces, and do the correlation in-app. Obviously, this isn't a very efficient solution, and should definitely be avoided if there's any chance the collections contain, or will in the future contain, more than a handful of objects.
Best bet moving forward is to use ES6 Promise patterns to help with your callback patterns..
suggested modules:
lodash [optional] has a lot of useful methods, not needed here, but you may need, for example _.flatten, or _.assign
i-promise will give you a native Promise (node 0.11.3+) or a scripted implementation
es6-promise is the fallback for i-promise to use
promisify-patch is an inline promisify for specific methods.
Install the modules required for your use (in this example).
npm install --save es6-promise i-promise promisify-patch
Use Promise pattern with your example.
require('promisify-patch').patch();
var Promise = require('i-promise')
;
//returns a promise to resolve to your list for display
function getDistricts() {
//gets all of the db.District
return db.District.find.bind(db.District).promise()
//after districts retrieved
.then(function(districts){
//resolve an array of promises, will return an array of results
return Promise.all(districts.map(getDistrictProv)); //map each district via getDistrictProv
});
}
//returns a promise to resolve a specific district/province name
function getDistrictProv(district){
return db.Provice.findById.bind(db.Province).promise(element.parent)
.then(function(province){
return district.name + ' ' + province.name;
});
}
...
//express handler
router.get('/districts', function(req, res, next) {
//get the district names
getDistricts()
//then process the rendering with the names
.then(function(names){
res.render('districts', {title: "Districts page", district_list: names});
})
//if there was an error in the promise chain
// pass it along, so it can be handled by another express plugin
.catch(next)
});
Disclosure: I made i-promise and promisify-patch to make situations like this easier to convert node-style callbacks into promise chains.
NOTE: If you are creating general purpose libraries for Node or the Browser that are not flow-control related, you should at least implement the node-style callback implementation.
Further, you may wish to look into co, koa for using generators as well.
The question seemed to be how to control the flow of data, in which promises are likely the best answer. If your issue is trying to fit non-relational data into a relational box or vice-versa, may want to re-evaluate your data structure as well...
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3
You should probably have some key data for parents/children replicated to those affected documents in other collections. There are configuration options via Mongoose to support this, but that doesn't mean you should avoid the consideration.
If you do many/large join operations like this it will negatively affect your performance. This isn't meant to be a religious comment only that MongoDB/SQL or other SQL vs. NoSQL considerations should be made depending on your actual needs.
The data in question seems to be highly cacheable data that may well be better with a relational/sql database.
The problem is simple: fetch rows from a database and pass them to an interface. F.e. one implementation of this interface will write this data to an XML file.
I'm looking for a pattern so:
the interface only has one method instead of 3: beginWrite / write / endWrite
it shouldn't fetch all rows at once, but instead 'feed' the interface row-by-row.
I don't want to pass the mongodb cursor to the interface, because interface implementation should not rely on a specific database driver.
Interface
function IBackend(implementation){
// removed code that merges implementation with this interface
// because it is irrelevant to this question.
}
IBackend.prototype.beginWrite = function(callback) {};
IBackend.prototype.write = function(row, callback) {};
IBackend.prototype.endWrite = function(callback) {};
Idea
Something I came up with was to only define one function:
IBackend.prototype.writeAll = function(callback) {};
then the implementation of this interface calls the callback passing a writeOne and end callback as arguments so it can be used as:
backend.writeAll(function(writeOneCallback, endCallback) {
collection.find().each(function(err, doc){
if (err) throw new Error(err);
writeOneCallback(doc);
});
endCallback();
});
But then, the passing of the writeOne and end callback depends on the implementation and isn't specified in the interface. So if anyone knows an elegant solution for this, I would love to hear it :)
The first thing that comes to mind is streams. While it has more than method, and as such doesn't answer your question, it is:
Proven
Seems appropriate
Plays nicely with many other parts of the node.js ecosystem
Doesn't need all rows at once, e.g.
There are also many streams already available that will reduce your development time, and many other streams you can pipe your stream's output to that will also reduce dev time (xml, zip, send over http, etc.).
See the excellent Streams Handbook (https://github.com/substack/stream-handbook) for more, and examples of such "other streams".
If you are still not convinced, I can think of only 2 scenarios:
Your interface also guarantees that the implementation does not need to be "flushed" (that is: does not maintain a state between calls to write() that will eventually need to be summarized, flushed, output, etc. Examples of such states: buffers, stats that will be appended/prepended to the output, ...).
Your interface does not guarantee that.
If #1 above is correct, then simply have write(), that accepts either an array of rows of a single row, and does the processing immediately. The guarantee by #1 above implies that at any given moment, the output from your interface constitutes a valid, coherent state.
If #2 above is correct (as would be the example for outputting an XML file), then I don't see a way around at least write() and end().
I'm creating some algorithms that are very performance heavy, e.g. evolutionary and artificial intelligence. What matters to me is that my update function gets called often (precision), and I just can't get setInterval to update faster than once per millisecond.
Initially I wanted to just use a while loop, but I'm not sure that those kinds of blocking loops are a viable solution in the Node.js environment. Will Socket.io's socket.on("id", cb) work if I run into an "infinite" loop? Does my code somehow need to return to Node.js to let it check for all the events, or is that done automatically?
And last (but not least), if while loops will indeed block my code, what is another solution to getting really low delta-times between my update functions? I think threads could help, but I doubt that they're possible, my Socket.io server and other classes need to somehow communicate, and by "other classes" I mean the main World class, which has an update method that needs to get called and does the heavy lifting, and a getInfo method that is used by my server. I feel like most of the time the program is just sitting there, waiting for the interval to fire, wasting time instead of doing calculations...
Also, I'd like to know if Node.js is even suited for these sorts of tasks.
You can execute havy algorithms in separate thread using child_process.fork and wait results in main thread via child.on('message', function (message) { });
app.js
var child_process = require('child_process');
var child = child_process.fork('./heavy.js', [ 'some', 'argv', 'params' ]);
child.on('message', function(message) {
// heavy results here
});
heavy.js
while (true) {
if (Math.random() < 0.001) {
process.send({ result: 'wow!' });
}
}
I have a dojo.store.Memory wrapped in a dojo.data.ObjectStore which I am then plugging into a dataGrid. I want to delete an item from the store and have the grid update. I have tried every combonation I can think of with no success. For example:
var combinedStore = new dojo.data.ObjectStore({objectStore: new dojo.store.Memory({data: combinedItems})});
combinedStore.fetch({query:{id: 'itemId'}, onComplete: function (items) {
var item = items[0];
combinedStore.deleteItem(item);
combinedGrid.setStore(combinedStore);
}});
combinedGrid.setStructure(gridLayout);
This throws no errors but combinedStore.objectStore.data still has the item that was meant to be deleted and the grid still displays the item. (The also seems to be a complete mismatch between combinedStore.objectStore.data and combinedStore.objectStore.index);
There's a simple solution, luckily! The delete is successfully happening, however, you need to save the ObjectStore after the deletion for it to be committed.
Change your code to look like this:
onComplete: function (items) {
var item = items[0];
combinedStore.deleteItem(item);
combinedStore.save();
combinedGrid.setStore(combinedStore);
}
That little save should do the trick. (Please note: the save must occur after the deleteItem - if you put it outside the fetch block, do to being asynchronous, it will actually happen before the onComplete!)
Working example: http://pastehtml.com/view/b34z5j2bc.html (Check your console for results.)
This does seem rather poorly documented at present in the new dojo.store documentation.
The old dojo.data.api.Write documentation make it fairly clear. An excerpt from http://dojotoolkit.org/reference-guide/dojo/data/api/Write.html:
Datastores that implement the Write interface act as a two-phase
intermediary between the client and the ultimate provider or service
that handles the data. This allows for the batching of operations,
such as creating a set of new items and then saving them all back to
the persistent store with one function call.
The save API is defined as asynchronous. This is because most
datastores will be talking to a server and not all I/O methods for
server communication can perform synchronous operations.
Datastores track all newItem, deleteItem, and setAttribute calls on
items so that the store can both save the items to the persistent
store in one chunk and have the ability to revert out all the current
changes and return to a pristine (unmodified) data set.
Revert should only revert the store items on the client side back to
the point the last save was called.
dojo.store has evolved from dojo.data and seems to follow many of its behavioral aspects.
The new dojo.store documentation http://www.sitepen.com/blog/2011/02/15/dojo-object-stores/ and http://www.sitepen.com/blog/2011/02/15/dojo-object-stores/ manages to talk specifically about the delete operation without mentioning having to call save() (in fact I can't find the word 'save' on that page at all).
I'm staying away from dojo.store as long as possible, hopefully it will be easier to follow in 1.7 or later, whenever I'm forced to use it for real :)