I've recently used Memcached and faced a question: why can't we just use a dictionary built into the language instead of Memcached (e.g. Set() or Map() in JavaScript)? After all, Memcached is essentially just a dictionary for temporary data. It seems to me that this violates the single responsibility principle, but I don't understand how to form a fully correct answer.
Whether to use memcached or a language construct depends on what you are trying to solve and the context of your application.
If you can share memory using a language construct (like a Set or Map) throughout the entire application, and the amount of data to cache fits comfortably within the limits of a single process, it's hard to justify the cost of a network call to fetch cache data plus the overhead of maintaining a Memcached server.
On the other hand, if you can't share memory throughout your entire application (e.g. imagine many servers or isolated processes), or the amount of data to cache is larger than what you'd want to keep in a single process, then using a cache server (like memcached) makes more sense.
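To make the in-process option concrete, here's a minimal sketch of a TTL cache built on a plain Map (the cacheSet/cacheGet helpers are illustrative, not a library API):

// Minimal in-process TTL cache on a plain Map (illustrative sketch).
const cache = new Map();

function cacheSet(key, value, ttlMs) {
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
}

function cacheGet(key) {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() > entry.expiresAt) {
    cache.delete(key); // lazily evict expired entries
    return undefined;
  }
  return entry.value;
}

No network hop and no serialization, but the data dies with the process and is invisible to other processes, which is exactly the trade-off described above.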
These are not mutually exclusive; applications can leverage both (cache server + in-memory) to achieve a multi-level cache. Check out this post about how Stack Overflow does it.
The only case where I can see an application's local cache violating the SRP is if it stores data unrelated to the application and makes it available to other applications.
When I search for node.js shared memory, I find only third-party solutions that rely on inter-process communication, OS memory-mapped files, third-party caching servers, and/or string serialization. These methods are less efficient than accessing native objects directly inside the same process.
To clarify some more, I'd prefer shared memory that works this way:
- node.js would have some kind of global object scope that every request can see.
- The size of the global scope would only be limited by system memory size.
- Node would need to provide "named locks" or some other method of locking. (I think node.js doesn't need this unless you use workers.)
- The solution must be written in pure JavaScript (no "C" add-on).
- There is no serialization of the stored object.
I'm primarily a ColdFusion/CFML developer. All I have to do to share native objects between requests is this:
application.myVar={ complex: "object", arr:["anything", "i", "want"] };
I already wrote an application that stores massive lookup tables, partial html, and more in memory using ColdFusion. There is no I/O or interprocess communication for most requests. Can this be done in node.js?
Node.js is already a single-threaded server, so any request would be handled within the same scope.
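As a rough sketch (plain Node http server; the shared object and port are illustrative):

// Sketch: a module-level object shared by all requests in this process.
const http = require('http');

const appScope = {
  myVar: { complex: 'object', arr: ['anything', 'i', 'want'] },
};

http.createServer((req, res) => {
  // Every request sees (and may mutate) the same native object;
  // no serialization, no I/O, no inter-process communication.
  res.end(JSON.stringify(appScope.myVar));
}).listen(3000);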
And there are already libs just for that: In memory storage on nodejs server
If you need to scale, you can move to some other store later on (Redis, MongoDB, etc.).
After reading the blog entry from PointBeing, I have some questions regarding stored JavaScript procedures.
1. Is there an advantage to storing my code in the DB? I mean functions like lookups for documents, not adding numbers like the example from PointBeing.
2. Is MongoDB stored JavaScript faster than Node.js JavaScript?
3. Are MongoDB stored JavaScript queries cached, and are they any faster?
I'm interested in the performance of MongoDB stored JavaScript compared to Node.js JavaScript.
Evaluating functions stored in db.system.js ("stored procedures", if you'd like to call them that) is deprecated. The articles on the db.eval shell function and the eval database command carry a "Deprecated since version 3.0" warning, and the article on server-side JavaScript doesn't mention it anymore. So you should avoid using it.

One reason is that you cannot run a JavaScript function when you use sharding, so when you build an application which requires eval, you prevent it from scaling in the future. Another is that JavaScript functions undermine the permission concept: they always need to be run as admin, which makes it impossible to establish a sane permission system. This is especially problematic from a security standpoint, considering that server-side scripts which use user-provided data can be vulnerable to arbitrary script injection.
The advantage of server-side JavaScript is that it runs on the database server. This reduces latency between the application server and the database server when you need to perform a large number of queries. But you can get the same advantage by opening a mongo shell on the database server and executing your script there.
The latency advantage is only relevant when you perform multiple queries from your script. With a single query, you still pay the invocation latency, so you gain nothing except unnecessary complexity.
There is no additional caching or other optimization for server-side JavaScript. Even worse: it gets reparsed and reinterpreted every time you run it. So it might even be slower than JavaScript running on your application server.
Further, many complex queries that would otherwise require script support on top of find() can be expressed using the aggregation framework, which will in most cases be far faster than doing the same with find() and JavaScript, because aggregation is implemented in C++ and has access to the raw BSON documents.
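For example, a per-customer summary that might otherwise be computed by looping over find() results in JavaScript can be written as a pipeline (collection and field names are purely illustrative):

// Illustrative aggregation: total order value per customer, computed
// declaratively instead of iterating find() results in JS.
db.orders.aggregate([
  { $match: { status: "shipped" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
]);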
The hilarious thing is that the blog post ( http://pointbeing.net/weblog/2010/08/getting-started-with-stored-procedures-in-mongodb.html ) was written when server-side JS execution still took a single-threaded global lock.
That means there were no concurrency features or more granular locks associated with it (the lock is still a problem, and concurrency is still only achieved through multiple isolates). Just because you see something in some random blog post does not mean it should be used.
To answer your questions directly:
1. Nope. In fact, the disadvantage is that the calling user needs full admin rights. This means you give every single privilege to your web user, since the built-in JS engine has hooks for everything, including administration functions, and as such it requires admin rights in order to run.
2. Calling JS from JS, which goes through C++ and back into JS? No.
3. No, MongoDB caching does not work like that. I recommend you read the fundamentals documentation: http://docs.mongodb.org/manual/faq/fundamentals/
While experimenting with some data indexing using node.js objects (arrays, maps...) that take some time to populate (from DB data) at every startup of the script, I wished my node.js objects could be automatically and transparently persisted in my database.
Having used MongoDB from node.js, as well as other databases (including SQL-based ones), for some time now, I'm quite aware that the query/update mechanisms are very different between JavaScript objects (synchronous access, no queries) and database records (asynchronous access, queries, ...). However, I'm still hoping that a solution to make a JavaScript var persistent, at least for indices, can exist and be helpful.
Basically, I'm thinking of something like HTML5's LocalStorage, but for node.js servers.
Do you think this idea is interesting, feasible, or maybe it already exists?
EDIT: Work in progress: https://github.com/adrienjoly/persistent-harmony
A first thing to make clear is that databases serve two purposes: persistence and convenient/efficient querying.
If you only need the first because you absolutely know up front that no other program is going to access your persistent data, you could take a look at the literature on orthogonal persistence, which is exactly the concept that you're suggesting here. An example is the KEN protocol that was successfully implemented in the WaterKen Java server.
There is some work to integrate the protocol into Google's V8 JavaScript runtime, which could lead to Nodeken, a Node.js with orthogonal persistence.
One of the difficulties of getting orthogonal persistence right is to map transactional semantics to, for example, an object-oriented programming system. The approach taken by V8-ken is to treat a single event-loop execution of your JavaScript runtime as a transaction. In other words, the state of the virtual machine is persisted at the end of each "turn" in response to some event (an incoming web request, a server reply, a user interface event, any asynchronous operation (I/O), etc.). This does, however, require a modified runtime such as V8-ken, but evolutions in ECMAScript, such as proxies, look promising as a way to implement such features more conveniently.
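As a rough sketch of the proxy idea (purely illustrative; saveToDisk is a hypothetical hook, not part of V8-ken or WaterKen):

// Sketch: an ES Proxy that persists state after every mutation.
const fs = require('fs');

function saveToDisk(state) {
  fs.writeFileSync('state.json', JSON.stringify(state)); // stand-in for a real DB write
}

function persistent(state) {
  return new Proxy(state, {
    set(target, prop, value) {
      target[prop] = value;
      saveToDisk(target); // persist at the end of each mutation
      return true;
    },
  });
}

const app = persistent({ counter: 0 });
app.counter += 1; // transparently written to state.json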
In many cases (think web applications) though, persistent data needs to be accessible by different programs, requiring a "real" database for data to be easily exported, migrated, queried, etc.
Hybrid approaches could of course be possible...
%> npm search persistent storage
closet JSON persistent storage with methods chainability and callbacks for asynchronous use. =ganglio 2013-01-29 18:41 0.0.7 json persistent storag
ewdDOM Persistent lightweight DOM using Mumps Global Storage =robtweed 2013-02-02 14:39 0.0.4
fs-persistent-object Tiny Node library for persisting small runtime objects on filesystem =oleksiyk 2013-04-09 09:13 0.0.1 persistent tiny storage
headstorage A persistent storage for Node.js =headhsu2568 2012-11-20 13:41 0.0.0 storage
level-store A streaming storage engine based on LevelDB. =juliangruber 2013-06-21 19:55 3.3.2 leveldb levelup stream persistent
node-persist Super-easy (and fast) persistent data structures in Node.js, modeled after HTML5 localStorage =benmonro 2013-04-09 17:33 0.0.1 node persist
persistent-hash-trie Pure string:val storage, using structural sharing =hughfdjackson 2013-05-24 19:24 0.4.1 persistent hash trie pure functional d
perstore Perstore is a cross-platform JavaScript object store interface for mapping persistent objects to various different storage mediums using an in
shelf.js A modular, powerful wrapper library for persistent objects in the browser and Node.js =shakty 2013-05-24 08:10 0.4.7 persistance localStorag
stay Persistent scuttlebutt instances for browser and node =juliangruber 2012-12-11 21:54 0.1.0 persistent scuttlebutt persistence loc
Looks like the closest match would be node-persist
=)
EDIT: Here may be a better alternative solution...
@adrienjoly You know, prototyping is still fairly high-level and may not be (in the long run) as efficient as you are thinking.
You may be better off developing a module in C/C++ exposing a high level API for node.js to take advantage of.
I think I have a post about getting your feet wet with this type of node.js development (it stemmed from an original tutorial I followed here)
I do believe that method is, however, outdated; a newer approach is to use the node-gyp tool. Some additional resources and examples: node-gyp projects, uRSA (I have a small pull request with this one), bcrypt, etc.
My assumption here is that you would bind the module extension to a database API such as Oracle or Postgres, and that by writing a low-level module linking to a low-level API, while exposing a high-level API for developers to configure persistence (with calls for slicing, indices, etc.), performance would be optimal compared to having node.js interpret your prototyping shim.
Maybe this is what you're looking for?
https://github.com/yangli1990/flydb
or
npm install flydb
Now, in JavaScript:
var flydb = require('flydb');
flydb.test = "hello world"; //flydb.test will now persist
To run this:
node --harmony-proxies <your commands>
e.g.
node --harmony-proxies app
How can I save the application state for a Node.js application that consists mostly of HTTP requests?
I have a script in Node.js that works with a RESTful API to import a large number (10,000+) of products into an e-commerce application. The API has a limit on the number of requests that can be made, and we are starting to brush up against that limit. On a previous run the script exited with an Error: connect ETIMEDOUT, probably due to exceeding API limits. I would like to be able to try connecting 5 times and, if that fails, resume after an hour when the limit has been restored.
It would also be beneficial to save progress throughout, in case of a crash (power goes down, network crashes, etc.), and to be able to resume the script from the point where it left off.
I know that Node.js operates as a giant event queue: all HTTP requests and their callbacks get put into that queue (together with some other events). This makes it a prime target for saving the state of the current execution. Another pleasant feature (not strictly necessary for this project) would be the ability to distribute the work among several machines on different networks to increase throughput.
So is there an existing way to do it? A framework perhaps? Or do I need to implement this myself, in that case, any useful resources on how this can be done would be appreciated.
I'm not sure what you mean when you say:
I know that Node.js operates as a giant event queue: all HTTP requests and their callbacks get put into that queue (together with some other events). This makes it a prime target for saving the state of the current execution.
Please feel free to comment or expound on this if you find it relevant to the answer.
That said, if you're simply looking for a persistence mechanism for this particular task, I might recommend Redis, for a few reasons:
- It allows atomic operations on many data types. For example, if you had an entry in Redis called num_requests_made that represented the number of requests made, you could increment it with INCR num_requests_made; the operation is guaranteed to be atomic, making it easier to scale to multiple workers.
- It has several data types that could prove useful for your needs. For example, a simple string could represent the number of API requests made during a certain period of time (as in the previous bullet point), and you might store details of failed API requests that need to be resubmitted in a list. Both are shown in the sketch after this list.
- It provides pub/sub mechanisms which would allow you to communicate easily between multiple instances of the program.
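As a sketch of the first two points, using the node-redis client (key names and the payload are illustrative):

// Sketch with the node-redis client (npm install redis); names are illustrative.
const { createClient } = require('redis');

async function main() {
  const client = createClient();
  await client.connect();

  // Atomically count the API requests made in the current period.
  const made = await client.incr('num_requests_made');

  // Queue a failed request's payload in a list for later resubmission.
  await client.rPush('failed_requests', JSON.stringify({ productId: 42 }));

  console.log('requests made so far: ' + made);
  await client.quit();
}

main().catch(console.error);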
If this sounds interesting or useful and you're not already familiar with Redis, I highly recommend trying out the interactive tutorial, which introduces you to a few data types and commands for them. Another good piece of reading material is A fifteen minute introduction to Redis data types.
If your web application must run on embedded or low-memory devices, is there any facility in JavaScript to manage low-memory conditions at runtime, so that you can use as much memory as possible for caching data, yet reliably purge that cache as required?
An example would be an application that has a local logical data store, like a hash map of data objects that it uses rather than repeatedly making new requests to the server. I'd like to be able to fill that cache up to a watermark that can be determined at runtime in my JavaScript application.
I've not found anything thus far, but I'm hopeful I'm just missing something.
No. The browser doesn't expose memory usage statistics to JavaScript.
If you're trying to implement caching, you're probably better off leveraging the browser's cache (e.g., using Expires headers on AJAX responses) rather than trying to implement your own cache in JS.
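For instance, the server can mark a response as cacheable so the browser takes care of storage and eviction; a sketch with a plain Node http server (route and lifetime are illustrative):

// Sketch: let the browser cache a response for an hour.
const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, {
    'Content-Type': 'application/json',
    'Cache-Control': 'max-age=3600',
    'Expires': new Date(Date.now() + 3600 * 1000).toUTCString(),
  });
  res.end(JSON.stringify({ cached: true }));
}).listen(3000);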
You can use localStorage or IndexedDB to cache results for you (on most devices/browsers the actual size limits are quite well known), so there would be no need to know the memory consumption in JavaScript.
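As a sketch of that approach with localStorage (the 'cache:' key prefix is illustrative): when the quota is hit, cached entries are purged and the write is retried.

// Sketch: cache in localStorage and purge cached entries when the quota is exceeded.
function cachePut(key, value) {
  try {
    localStorage.setItem('cache:' + key, JSON.stringify(value));
  } catch (e) {
    // Quota exceeded: drop all cached entries and retry once.
    for (let i = localStorage.length - 1; i >= 0; i--) {
      const k = localStorage.key(i);
      if (k && k.indexOf('cache:') === 0) localStorage.removeItem(k);
    }
    try {
      localStorage.setItem('cache:' + key, JSON.stringify(value));
    } catch (e2) {
      // Still failing: the value is simply not cached.
    }
  }
}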
If you are using PhoneGap (http://www.phonegap.com) you can catch memory warnings easily within native code and dispatch them to JavaScript. I did that in several situations where it was possible to clear up the DOM and recalculate it later if needed.