Automatic persistence of node.js objects in database - javascript

While experimenting with some data indexing using node.js objects (arrays, maps...) that take some time to populate (from DB data) at every startup of the script, I wished my node.js objects could be automatically and transparently persisted in my database.
Having used MongoDB from node.js, as well as other databases (including SQL-based), for some time now, I'm quite aware that the query/update mechanisms are very different between JavaScript objects (synchronous access, no queries) and database records (asynchronous access, queries, ...). However, I'm still hoping that a solution to make a JavaScript variable persistent, at least for indices, can exist and be helpful.
Basically, I'm thinking of something like HTML5's LocalStorage, but for node.js servers.
Do you think this idea is interesting, feasible, or maybe it already exists?
EDIT: Work in progress: https://github.com/adrienjoly/persistent-harmony

A first thing to make clear is that databases serve two purposes: persistence and convenient/efficient querying.
If you only need the first because you absolutely know up front that no other program is going to access your persistent data, you could take a look at the literature on orthogonal persistence, which is exactly the concept that you're suggesting here. An example is the KEN protocol that was successfully implemented in the WaterKen Java server.
There is some work to integrate the protocol into Google's V8 JavaScript runtime, which could lead to Nodeken, a Node.js with orthogonal persistence.
One of the difficulties of getting orthogonal persistence right is to map transactional semantics onto, for example, an object-oriented programming system. The approach taken by V8-ken is to treat a single event loop execution of your JavaScript runtime as a transaction. In other words, the state of the virtual machine is persisted at the end of each "turn" in response to some event (incoming web request, server reply, user interface event, any asynchronous operation (I/O), etc.). This however requires a modified runtime such as V8-ken, but evolutions in ECMAScript, such as proxies, look promising for implementing such features more conveniently.
In many cases (think web applications) though, persistent data needs to be accessible by different programs, requiring a "real" database for data to be easily exported, migrated, queried, etc.
Hybrid approaches could of course be possible...
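To illustrate the proxy angle in userland (no modified runtime needed): here's a minimal sketch that intercepts writes and flushes the object to a store at the end of the current turn. Everything here is hypothetical; my-store and its put() are stand-ins for whatever persistence layer you'd plug in.

var store = require('./my-store'); // hypothetical async key/value store
var dirty = false;

function persisted(name, target) {
  return new Proxy(target, {
    set: function (obj, prop, value) {
      obj[prop] = value;
      if (!dirty) {                  // schedule a single flush at the end of this turn
        dirty = true;
        setImmediate(function () {
          dirty = false;
          store.put(name, obj, function (err) {
            if (err) console.error('persist failed:', err);
          });
        });
      }
      return true;                   // signal that the write succeeded
    }
  });
}

var index = persisted('index', {});
index.foo = 42; // transparently scheduled for persistence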

%> npm search persistent storage
closet JSON persistent storage with methods chainability and callbacks for asynchronous use. =ganglio 2013-01-29 18:41 0.0.7 json persistent storag
ewdDOM Persistent lightweight DOM using Mumps Global Storage =robtweed 2013-02-02 14:39 0.0.4
fs-persistent-object Tiny Node library for persisting small runtime objects on filesystem =oleksiyk 2013-04-09 09:13 0.0.1 persistent tiny storage
headstorage A persistent storage for Node.js =headhsu2568 2012-11-20 13:41 0.0.0 storage
level-store A streaming storage engine based on LevelDB. =juliangruber 2013-06-21 19:55 3.3.2 leveldb levelup stream persistent
node-persist Super-easy (and fast) persistent data structures in Node.js, modeled after HTML5 localStorage =benmonro 2013-04-09 17:33 0.0.1 node persist
persistent-hash-trie Pure string:val storage, using structural sharing =hughfdjackson 2013-05-24 19:24 0.4.1 persistent hash trie pure functional d
perstore Perstore is a cross-platform JavaScript object store interface for mapping persistent objects to various different storage mediums using an in
shelf.js A modular, powerful wrapper library for persistent objects in the browser and Node.js =shakty 2013-05-24 08:10 0.4.7 persistance localStorag
stay Persistent scuttlebutt instances for browser and node =juliangruber 2012-12-11 21:54 0.1.0 persistent scuttlebutt persistence loc
Looks like the closest match would be node-persist
=)
EDIT: Here may be a better alternative solution...
@adrienjoly You know prototyping is still fairly high-level and may not be (in the long run) as efficient as you are thinking.
You may be better off developing a module in C/C++ exposing a high level API for node.js to take advantage of.
I think I have a post about getting your feet wet with this type of node.js development (it stemmed from an original tutorial I followed here).
I believe that method is outdated, however, and the newer method is to use the node-gyp tool. Some additional resources and examples: node-gyp projects, uRSA (I have a small pull request with this one), bcrypt, etc.
My assumption is that you would bind the module extension to a DB API such as Oracle or Postgres. By writing a low-level module that links to a low-level API while exposing a high-level API for developers (with calls for persistent configuration options, slicing, indices, etc.), the performance would be optimal compared to having node.js interpret your prototyping shim.

Maybe this is what you're looking for?
https://github.com/yangli1990/flydb
or
npm install flydb
Now in javascript
var flydb = require('flydb');
flydb.test = "hello world"; //flydb.test will now persist
To run this
node --harmony-proxies <your commands>
e.g.
node --harmony-proxies app

Related

Is memcached necessary?

I've recently used memcached and faced a question. Why can't we just use a dictionary from the language instead of Memcached (e.g. Set() or Map() in javascript)? Because in fact, Memcached is just a dictionary for temporary data. It seems to me that we violate the single responsibility principle, but I don't understand how to form a fully correct answer.
Using memcached or a language construct depends on what you are trying to solve and the context of your application.
If you can share memory using a language construct (like Set or Map) throughout the entire application and the amount of data to cache fits well within the limits of a single process, it's hard to justify the cost of doing a network call to fetch cache data + the overhead of maintaining a memcache server.
On the other hand, if you can't share memory throughout your entire application (e.g. imagine many servers / isolated processes), or the amount of data to cache is greater than what you'd like in a process, then using a cache server (like memcached) makes more sense.
These are not mutually exclusive, applications can leverage both (cache server + in-memory) to achieve multi-level cache. Check out this post about how Stack Overflow does it.
The only case where I can see an application's local cache violating the SRP, is if it stores data unrelated to the application and makes it available to others.
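To make the in-process option concrete, here's a minimal sketch of a Map-based cache with a TTL (the helper names and the lazy eviction policy are my own, not from any library):

const cache = new Map();

function cacheSet(key, value, ttlMs) {
  cache.set(key, { value: value, expires: Date.now() + ttlMs });
}

function cacheGet(key) {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() > entry.expires) { // lazily evict expired entries on read
    cache.delete(key);
    return undefined;
  }
  return entry.value;
}

cacheSet('user:42', { name: 'Ada' }, 60000);
cacheGet('user:42'); // => { name: 'Ada' } until the TTL elapses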

Options for updating/migrating remote client standalone db schema during updates

Scenario:
I am currently working on a desktop-based application involving multiple independent production sites, each having its own server, database and local clients.
The software undergoes continuous automated updates. As part of these updates, each production site's database (PostgreSQL) will need to undergo schema migrations (while preserving and transforming the existing data during the migration).
Accessing the client machines manually (e.g. SSH / Remote Desktop) to run scripts/perform the update is not an option. The client would need to be self-sufficient in checking, downloading and installing the updates.
Problem:
I have been looking around for a few months and have not yet found a suitable option apart from writing SQL scripts and using a separate service to run them one by one, or using the updated app to run them.
Question:
What tools would you suggest would be suitable for this scenario to ease the migration process?
It is a generally accepted practice to run SQL scripts when doing database migrations. Tools like ActiveRecord (in the Ruby world) provide an intermediate layer (a DSL), but are essentially wrappers around SQL.
There is a good reason for that - a migration is what is being done on the SQL server, and the only strictly defined interface for manipulating tables in SQL servers is the PL/SQL language flavor that the particular RDBMS server is using.
This means, that if SQL scripts are an acceptable option for migrations (acceptable in the sense that the team knows it, nobody objects to this, etc.) you should use them.
To make the task easier (the way ruby does it) you could take a look at: http://migrate4j.sourceforge.net/
They describe the tool as: "The initial intent of migrate4j was to make a Java version of Ruby's db:migrate". But this is not a fundamental requirement.
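As a rough illustration of the plain-SQL-scripts approach in a self-updating client: ship numbered .sql files with the update and apply only the ones the local database hasn't seen yet. This sketch assumes the node-postgres (pg) driver and a migrations/ folder layout of my own invention:

const fs = require('fs');
const path = require('path');
const { Client } = require('pg'); // node-postgres

async function migrate(client, dir) {
  await client.query(
    'CREATE TABLE IF NOT EXISTS schema_migrations (version text PRIMARY KEY)');
  const applied = new Set(
    (await client.query('SELECT version FROM schema_migrations')).rows
      .map(r => r.version));
  // Apply numbered scripts (001_init.sql, 002_add_index.sql, ...) in order.
  for (const file of fs.readdirSync(dir).sort()) {
    if (applied.has(file)) continue;
    await client.query('BEGIN');
    try {
      await client.query(fs.readFileSync(path.join(dir, file), 'utf8'));
      await client.query('INSERT INTO schema_migrations (version) VALUES ($1)', [file]);
      await client.query('COMMIT');
    } catch (err) {
      await client.query('ROLLBACK');
      throw err;
    }
  }
}

Run once at startup, this keeps each production site self-sufficient: every installation applies exactly the scripts its own database is missing, each inside a transaction.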

Does Node.js or another JS server tech support native shared memory between requests without serialization?

When I search for node.js shared memory, I find only third party solutions that rely on inter-process communication, OS memory mapped files, third party caching servers, and/or string serialization. These are less efficient methods compared to accessing native objects directly inside the same process.
To clarify some more, I'd prefer shared memory that works this way:
node.js would have some kind of global object scope that every request can see.
The size of the global scope would only be limited by system memory size.
Node would need to provide "named locks" or some other method of locking. (I think node.js doesn't need this unless you use workers.)
The solution must be written in pure Javascript (no "C" add-on).
There is no serialization of the stored object.
I'm primarily a ColdFusion/CFML developer. All I have to do is this to share native objects between requests.
application.myVar={ complex: "object", arr:["anything", "i", "want"] };
I already wrote an application that stores massive lookup tables, partial html, and more in memory using ColdFusion. There is no I/O or interprocess communication for most requests. Can this be done in node.js?
Node.js is already a single-threaded server, so any request would be handled within the same scope.
And there are already libs just for that: In memory storage on nodejs server
If you need to scale you can move on to some other store later on (redis, mongodb etc).
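For example, state kept at module scope is created once and shared by every request the process handles. A minimal sketch using only the built-in http module (the appScope name is mine):

const http = require('http');

// Module scope: created once, lives for the life of the process,
// and is visible to every request the process handles.
const appScope = {
  myVar: { complex: 'object', arr: ['anything', 'i', 'want'] },
  hits: 0
};

http.createServer((req, res) => {
  appScope.hits++; // direct native-object access: no IPC, no serialization
  res.end('hits so far: ' + appScope.hits);
}).listen(3000);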

What are the potential complexities / problems unique to a large NodeJS Application

I come from a "traditional web application" background: think Java, .NET, PHP, ColdFusion, etc.
In evaluating NodeJS for use as the primary server-side technology for non-trivial applications, I'm wondering what complexities, problems, challenges a team of developers and operations folks might expect to face which are unique to NodeJS. In short, I'd like to reduce my unknowns. Some (not all) examples:
How well does it lend itself to large-team development? What unique challenges exist, for Node, in a team of 20 or 50 or 200 developers?
What unique challenges exist with respect to Database access? "Enterprisey" data access concerns are handled mostly trivially in Java (connection pools, security, etc via Spring). Is this the case with Node?
Reporting-heavy applications often require Excel, PDF, even PNG export... How does Node fare in this type of application?
Does Node present any unique challenges with respect to development-time debugging?
What unique operations challenges exist? Everything from server restarts / hot code swap / load balancing to tools for monitoring and managing a production cluster.
And so forth. What lessons exist for developing, maintaining, and production-managing a 100+K LoC codebase, deployed across a farm of servers, touched by dozens of developers?
I'll try to answer some of your questions
How well does it lend itself to large-team development? What unique challenges exist, for Node, in a team of 20 or 50 or 200 developers?
It presents the same challenges as every other language: none that are unique. Teams of programmers normally use version control systems such as git, svn, mercurial, etc. to deal with co-workers working on the same files.
What unique challenges exist with respect to Database access? "Enterprisey" data access concerns are handled mostly trivially in Java (connection pools, security, etc via Spring). Is this the case with Node?
Node is database-agnostic. You can use any database with Node for which drivers/wrappers exist (same as with PHP). There are many drivers/wrappers for relational databases (MySQL, SQLite) and NoSQL stores (MongoDB).
Reporting-heavy applications often require Excel, PDF, even PNG export... How does Node fare in this type of application?
Node can, as PHP and others do, access the normal shell. So if you handle an image with PHP, you are using a wrapper for ImageMagick or GD, and the underlying tool needs to be installed on the web server for it to work. Same with Node: find a wrapper (http://search.npmjs.org) and use the feature you desire.
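For instance, resizing an image by shelling out to ImageMagick's convert uses nothing beyond the built-in child_process module (this assumes ImageMagick is installed on the server; the file names are placeholders):

const { execFile } = require('child_process');

// Assumes ImageMagick's `convert` binary is installed and on the PATH.
execFile('convert', ['input.png', '-resize', '50%', 'output.png'], (err) => {
  if (err) return console.error('convert failed:', err);
  console.log('image resized');
});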
Does Node present any unique challenges with respect to development-time debugging?
Node runs JavaScript, which is not compiled ahead of time but JIT-compiled at runtime, so a failure will only be detected when the code path is executed. You'd want a local environment for every developer, next to your staging/dev machine and live servers, so they can test the code prior to committing it to the staging server.
What unique operations challenges exist? Everything from server restarts / hot code swap / load balancing to tools for monitoring and managing a production cluster.
No "Unique" challanges I am aware of. You'll deal with monitoring/heartbeats as with all other languages (yeah, there are languages that do this, like Erlang). You'll need a service like forever or supervisord to startup node on server restarts, but thats the same with Apache+PHP/Perl.
(Node has a Cluster module which helps you handle multiple workers on multi-core servers.)
Look at Git for managing your code, and choose the language based on what you want to be doing (high concurrency, scalability).
I will comment on the things for which I am qualified:
Connection Pooling to data sources.
The standard Node HTTP(S) server tools implement pooling by default, and there are knobs that you can configure to improve or control performance (a small sketch follows the sidenote below). The community being very active, there are lots of other projects (like poolee) that implement either general-purpose or specialized connection pooling. You will need to look around. In fact, given your traditional Webdev background...
1.1 Sidenote: Third-party libraries
When developing with Node, you may find yourself using a lot of third-party libraries. Depending on your particular background, this may seem strange. To understand why, consider .NET or Java: the standard libraries for those platforms are gargantuan. As a result, when using those platforms you might pick very few third-party tools or plugins to help you get your work done. Node, by comparison, has an intentionally narrow and strictly controlled standard library. All the necessary APIs to unify "the way things are written" in the interest of server performance are there, but no more.
To help manage third-party libraries, the npm package manager was designed and included very early with node. It's known to be of high quality. The authors clearly anticipated a lot of third-party use.
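To make the pooling knobs from point 1 concrete, here's a small sketch using Node's built-in outbound HTTP agent (keepAlive, maxSockets and maxFreeSockets are real http.Agent options; the values are arbitrary):

const http = require('http');

// Reuse TCP connections instead of opening a new one per request.
const agent = new http.Agent({
  keepAlive: true,     // hold sockets open between requests
  maxSockets: 50,      // cap on concurrent sockets per host
  maxFreeSockets: 10   // idle sockets kept around for reuse
});

http.get({ host: 'example.com', path: '/', agent: agent }, (res) => {
  res.resume(); // drain the response so the socket returns to the pool
});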
Compute Power
You mentioned image export. Javascript libraries for all of these things exist, so as far as "it can be done easily", it can. Keep in mind that if you have a compute-heavy task, Javascript may not be the most effective language to implement the core computation. The v8 engine allows you to write modules in C, if you need to, but forwarding a request to a specialized backend server is the thing that Node does extremely well.
Operations Challenges
Node.js will not scale up to your hardware by itself. If your server has multiple cores (which by now it most certainly does), you will need to run multiple server processes on the same physical hardware to achieve high utilization. This may change the picture for operations: lots more processes than you would normally see, grouped by physical or virtual machine.
Breaking your server into processes is not a bad thing for reliability, by the way: One large traditional process will crash on an unhandled exception, taking down with it eight (or whatever) cores worth of active sessions. Multiple Node processes will still crash individually, but with a proportionally smaller impact. There's probably room to explore the point, "how many processes is too many?", taking into account both performance and monitorability. It may actually be worth it to use more Node processes per server than there are available cores.

Is there a way to do multi-threaded coding in NodeJS?

Based on my understanding, only I/O in NodeJS is non-blocking. If we do, for example, lots of heavy math operations, other users cannot access the server until it's done.
I was wondering if there is a non-blocking way to do heavy computation in NodeJS? Just curious.
If you have long-running calculations that you want to do with Node, then you will need to start up a separate process to handle those calculations. Generally this would be done by creating some number of separate worker processes and passing the calculations off to them. By doing this, you keep the main Node event loop unblocked.
On the implementation side of things, you have two main options.
The first is to manually spin off child processes using Node's child_process API functions. This one is nice because your calculations wouldn't even have to be in JavaScript; the child process running the calculations could even be written in C or something.
Alternatively, the Web Worker specification has several implementations available through npm if you search for 'worker'. I can't speak to how well these work since I haven't used any, but they are probably the way to go.
Update
I'd go with option one now. The current child process APIs support sending messages and objects between processes easily in a worker-like way, so there is little reason to use a separate worker module.
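Here's a minimal sketch of that worker-like message passing using the built-in child_process.fork (the worker.js file name and the message shape are my own):

// main.js
const { fork } = require('child_process');

const worker = fork('./worker.js');
worker.on('message', (result) => console.log('sum =', result));
worker.send({ numbers: [1, 2, 3, 4] }); // plain objects cross the boundary

// worker.js
process.on('message', (msg) => {
  const sum = msg.numbers.reduce((a, b) => a + b, 0); // the "heavy" computation
  process.send(sum);
});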
You can use Hook.io to run a separate node process for your heavy computation and communicate between the two. Hook.io is particularly useful because it has auto-healing meshes, meaning that if one of your hooks (processes) crashes it can be restarted automatically.
Use multiple NodeJS instances and communicate over sockets.
Use multiple node instances and communicate over node-zeromq, HTTP, plain TCP sockets, IPC (e.g. unix domain sockets), JSON-RPC or other means. Or use the web workers API as suggested above.
The multiple instances approach has its advantages and disadvantages. The disadvantage is the burden of starting those instances and implementing your own exchange protocols. The advantage is that scaling to many computers (as opposed to many cores/processors within a single computer) is possible.
I think this is way too late, but this is an awesome feature of node.js you have to know about.
The only way to get multithreading is by spawning new processes, as stated above.
But node.js has a built-in messaging feature between spawned node forks.
http://nodejs.org/docs/latest/api/child_processes.html#child_process.fork
Great work from the devs: you can pass objects etc. as messages to your child processes and back.
You can use Node's cluster module:
https://nodejs.org/api/cluster.html
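A minimal sketch following the pattern in the linked docs: fork one worker per core, so blocking work in one worker doesn't stall the others.

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per core; the master only supervises.
  os.cpus().forEach(() => cluster.fork());
  cluster.on('exit', () => cluster.fork()); // replace a crashed worker
} else {
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(3000); // workers share this port
}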
I would use JXCore: it's a polished node.js-based engine which runs your code and has several options, including the multi-threading you are searching for. Running this in production is a charm!
Project's source: https://github.com/jxcore/jxcore
Features include:
Support for core Node.JS features
Embeddable Interface
Publish to Mobile Platforms (Android, iOS ..)
Supports Multiple JavaScript Engines
Multi-threading Capabilities
Process Configuration & Monitor
In-memory File System
Application Packaging
Support for the latest JavaScript features (ES6, ASM.JS ...)
Support for the Universal Windows Platform (UWP) API
