What is a good way to store big JSON objects in CouchDB? - javascript

I work on a web app which stores project data. The data is saved in a CouchDB database A. The app pulls and pushes data through a local PouchDB database B, which is synced with A.
So the app can also work offline. When the user gets their connection back, the changes made on the local DB B while offline are sent to A using a classic replication.
I store one document per project in CouchDB; it is a big JSON object with lots of data (project todos, collaborators, progress, risks, problems, etc.).
It works like a charm, but I have some problems, and it seems I am using PouchDB the wrong way. Example situation:
1. User A is offline and adds a todo to project 1.
2. User B is online and adds a new collaborator to project 1.
3. User B's changes are pushed to CouchDB by the automatic sync.
4. Project 1's _rev is incremented.
5. User B pulls his own changes back from CouchDB, because the app downloads all documents whenever any CouchDB change is detected. Weird... I don't know how to prevent that. But the app still works fine, so it's not a big problem.
6. User A gets his connection back.
7. User A's changes are ignored because of the older _rev. But the user modified a different project property; can CouchDB detect that by itself and merge it with the newer _rev?
I clearly see that my problem is using one document per project. I could use thousands of documents to store each property of each project and my problem wouldn't happen, but that seems quite weird: to retrieve all the data of a project I would have to scan my whole database, check each document's type (collaborator, todo, ...?), and check whether the document is linked to the project via a new _projectId property added to every document.
Currently I just have to request one document, which contains all the project data, and then I can manipulate my JSON easily. It's quite convenient to handle.
How should I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.

But the user modified a different project property; can CouchDB detect that by itself and merge it with the newer _rev?
PouchDB/CouchDB conflict handling is described in the PouchDB guide: http://pouchdb.com/guides/conflicts.html
the app downloads all documents whenever any CouchDB change is detected. Weird... I don't know how to prevent that.
This is standard PouchDB/CouchDB behavior - you asked it to sync the whole database, so it synced the whole database. :) You can prevent it with filtered replication: http://pouchdb.com/api.html#filtered-replication.
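For example, a minimal sketch of pull replication with a design-document filter (the filter name app/by_project and the projectId parameter are illustrative, not from the question):

var localDB = new PouchDB('projects');
var remoteDB = new PouchDB('http://localhost:5984/projects');

// Only pull documents that pass the filter function defined in a design
// document on the server, e.g.
// function (doc, req) { return doc.projectId === req.query.projectId; }
localDB.replicate.from(remoteDB, {
  filter: 'app/by_project',
  query_params: { projectId: 'project_1' }
});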
How should I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.
It really, really depends on your data: how frequently it may change, what the unique identifier of a single "property" is... Storing 10,000 separate documents in PouchDB/CouchDB is not a crazy idea, though, and may help you out when it comes to conflicts, since only those individual documents can ever be in conflict.
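As a sketch of what that split could look like (all field names are illustrative, and db.createIndex/db.find require the pouchdb-find plugin):

var db = new PouchDB('projects');

// One small document per todo, tagged with its type and project.
db.put({
  _id: 'todo_' + Date.now(),
  type: 'todo',
  projectId: 'project_1',
  text: 'Write the spec'
});

// Retrieve everything belonging to a project in one query.
db.createIndex({ index: { fields: ['projectId'] } }).then(function () {
  return db.find({ selector: { projectId: 'project_1' } });
}).then(function (result) {
  console.log(result.docs); // all todos, collaborators, ... of project_1
});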
In general, I'd recommend you read the guide to conflict resolution as described above and review your options. There's also a plugin that may help you with conflict resolution: https://github.com/jo/pouch-resolve-conflicts

Related

How to store an object that is retrievable from the client-side in Javascript

Beginner question: I built a simple draggable to-do list that caches its state in a single object (tasks, containers and index) - currently it's stored in local storage. I am working on the server side using Express and Node.js, but I am confused as to where I would simply store the object. Would a database like MongoDB be a good choice... or is there an even simpler option? I assume I can keep the project static and have the server side just receive and serve up JSON? Thanks!
If you plan to integrate it with a backend server, it is actually a good idea to store the object in a database. The benefit is that you can maintain the state of your to-do list no matter which machine you log in from. Whether you access your to-do-list app from the browser on your smartphone or on your desktop, both point to a single source of truth: your database. Think of it as a Trello board that is in sync on every device. In your database, you may record the task status, task ID, description, etc. If you want to go further, you can group this information per user, so every user has their own to-do list (which is not possible if you rely on conventional local storage). With a database, you can extend the functionality beyond a simple to-do list. Alternatively, you may consider a much simpler solution: record the object as a JSON file and store it on your server. This solution is feasible, albeit with limited flexibility.
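A minimal sketch of that JSON-file option with Express (the file name and routes are illustrative):

const express = require('express');
const fs = require('fs');
const app = express();

app.use(express.json());

// Serve the saved to-do state (an empty object if nothing is saved yet).
app.get('/state', (req, res) => {
  fs.readFile('state.json', 'utf8', (err, data) => {
    res.json(err ? {} : JSON.parse(data));
  });
});

// Overwrite the saved state with whatever the client sends.
app.post('/state', (req, res) => {
  fs.writeFile('state.json', JSON.stringify(req.body), (err) => {
    res.sendStatus(err ? 500 : 204);
  });
});

app.listen(3000);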
I would recommend MongoDB Atlas or Firebase Realtime Database, as both are beginner-friendly and easy to use. Both are free of charge for limited usage and hosted in the cloud.

How to work with databases in pouchdb

I'm building a task list to learn how to use PouchDB/CouchDB. The application is quite simple: it has authentication, and each user creates their own tasks.
My question is about how to store each user's information in the database. Should I create a database for each user with their tasks? Or is there a way to put all tasks of all users into a single database called "Tasks" and somehow filter the synchronization so that PouchDB does not synchronize the whole database (including other users' tasks) from the server?
(I have read the PouchDB documentation a few times and have not been able to settle this; if it is documented, please point me to where.)
You can use both approaches to fulfill your use case:
Database per user
A database per user is the db-per-user pattern in CouchDB. CouchDB can handle the database creation/deletion each time a user is created/deleted in CouchDB. In this case each PouchDB client replicates the complete user database.
You can enable it in the server config.
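In CouchDB 2.x+ this is the couch_peruser option; one way to turn it on is through the config HTTP API (the node name and credentials here are illustrative):

fetch('http://localhost:5984/_node/_local/_config/couch_peruser/enable', {
  method: 'PUT',
  headers: {
    Authorization: 'Basic ' + btoa('admin:password'),
    'Content-Type': 'application/json'
  },
  body: JSON.stringify('true') // CouchDB config values are JSON strings
});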
This is a proper approach if the users' data is isolated and you don't need to share information between users. In this case you can run into scalability issues if you need to sync many user databases into another one in CouchDB. See this post.
Single database for all users
You need to use the filtered-replication feature of CouchDB/PouchDB. This post explains how to use it.
With this approach you can replicate a subset of the CouchDB database to PouchDB.
As you have a single database, it is easier to share info between users.
But this approach has some performance problems: the filtering process is very inefficient, as it has to process the whole dataset, including deleted documents, to determine the set of documents to be included in the replication. The filtering is done in an external CouchDB process on the server, which adds even more cost.
If you need the filtering approach, it is better to use a Mango selector for this purpose, as it is evaluated in the CouchDB main process and can be indexed. See options.selector in the PouchDB replication filtering options.
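A minimal sketch of selector-based replication (the owner field is illustrative):

var local = new PouchDB('tasks');
var remote = new PouchDB('https://example.com/db/tasks');

// The selector is evaluated natively inside CouchDB, avoiding the
// external-process overhead of a JavaScript filter function.
local.replicate.from(remote, {
  selector: { owner: 'alice' }
});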
Conclusion
Which is better? It depends on your use case... In any case you should consider the scalability issues of both approaches:
In the case of filtered replication, you will face issues as the number of documents grows if you have to filter the complete dataset. Mango selectors are reported to be about 10x faster than JavaScript filter functions.
In the case of db-per-user, you will have issues if you need to consolidate the different user databases into a single one as the number of users grows.
Both patterns are valid. The only difference is that in order to use filtered replication, you need to give clients access to the main database.
Since the client is JavaScript, it's easy to extract the credentials and then access the main database directly. This would give users the ability to see everyone's data.
A more secure approach is the database-per-user pattern, where each database is protected by its owner's credentials.

Improve client-server data sync functionality with deltas

The app
I have a web app that currently uses AppCache for offline functionality, since users of the system need to create documents offline. The document is first created offline, and when internet access is available the user can click "sync", which sends the document to the server and saves it as a revision. To be more specific, the app does not save the change delta as a revision (the exact field modified) but rather the whole document in its entirety. In other words, a "snapshot" document is saved.
The problem
Users can log in from different browsers and devices and work on their documents. When they click "sync", if the server's document is newer, the entire client version is overridden by the server's. This leads to one main issue, depicted in the image below.
The scenario above occurs because the current implementation does not rely on deltas (small changes) but on snapshot revisions.
Some questions
1) My research indicates that I should upgrade the "sync" mechanism to be expressed in deltas (small changes that can be applied independently). Is this a sound approach?
2) Should each delta be applied independently?
3) According to my research, revision deltas have a numeric value, not a timestamp. What should this value be exactly? How would I ensure both the server and the client agree on what the revision number should be?
Stack information
Angular on the frontend
IndexedDB to save documents locally (offline mode)
Postgres DB with JSONB in the backend
What you're describing is a version-control issue, like in this question. The choice of how to resolve it is yours. Here are a few examples of how other products handle this problem:
Google docs: A makes edit offline, B makes edit online, A goes online, Sync, Google Docs combines A and B's edits
Apple notes: Same as Google Docs
Git/Subversion: Throw an error, ask user to resolve conflicts
Wunderlist: Last edit overwrites previous
For your case, the simplest solution is Wunderlist's approach, but it seems that may cause a usability issue. What do your users expect to happen?
Answering your questions directly:
1) A custom sync implementation is necessary if you don't want overwrites.
2) This is a usability decision; what does the user expect?
3) True, revisions are numeric (e.g. r1, r2). To get server agreement, alter the return value of the last sync request. You can return the entire model to the client each time (or just a 200 OK if a normal sync happened). If a model is returned to the client, update the client with that latest model.
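A hedged sketch of that last exchange (the endpoint, payload shape, and status codes are all illustrative):

// The client sends its last known revision plus the deltas made since.
// The server either applies them (204) or, when the client is behind,
// returns the authoritative model together with the new revision.
async function sync(docId, lastRev, deltas) {
  const res = await fetch('/api/docs/' + docId + '/sync', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ rev: lastRev, deltas: deltas })
  });
  if (res.status === 204) {
    return { rev: lastRev + 1 }; // normal sync: server accepted the deltas
  }
  return res.json(); // e.g. { rev: 7, model: {...} } - update the client
}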
In any case, the server should always be the source of truth. This post provides some good advice on server/mobile referential integrity:
To track inserts you need a Created timestamp ... To track updates you need to track a LastUpdate timestamp on your rows ... To track deletes you need a tombstone table.
Note that when you do a sync, you need to check the time offset between the server and the mobile device, and you need to have a method for resolving conflicts. Inserts are no big deal (they shouldn't conflict), but updates could conflict, and a delete could conflict with an update.

How do I give users immediate feedback in a CQRS web application

I have a CQRS application with eventual consistency between the event store and the read model. In it I have a list of items, and under the list a "Create new" button. When a user successfully creates a new item he is directed back to the list, but since the read model has not been updated yet (eventual consistency), the item is missing from the list.
I want to fake the entry in the list until the read model has been updated.
How do I best do that and how do I remove it when the new item is present in the actual list? I expect delays of about 60 seconds for the read model to catch up.
I do realize that there are simpler ways to achieve this behavior without CQRS but the rest of the application really benefits from CQRS.
If it matters, the application is a C# MVC 4 application. I've been thinking of solutions involving HTML5 Web Storage but want to know what the best practice is for solving this kind of problem.
In this situation, you can present the result in the UI with total confidence. There is no difference between presenting this information directly and reading it from the read model.
Your domain objects are up to date with the UI, and that's what really matters here. Moreover, if you correctly validate your aggregate root (AR) state in every operation and you track concurrency with the AR's version, then you're safe and your model is protected against invalid operations.
In the end, what is the probability of your UI going out of sync? This can happen if many users are modifying the information you're displaying at the same time. It can be avoided by creating a task-based UI and following the rule of one command/operation on the AR per request.
The read model can stay unsynced until the denormalizers do their job.
On the other hand, if the command will trigger a conversation (a long-running operation) between a saga and ARs, then you cannot do this and must warn the user about it.
It doesn't matter that it's an ASP.NET MVC app. The only solution I see, besides just telling the user to wait a bit, is to have another, but this time synchronous, event handler that generates the same read model (of course the actual model generation should be encapsulated in a service) and puts it in a memory cache.
Everything being in memory makes it very fast, and being synchronous means it's automatically executed before the request ends. I'm assuming the command is executed synchronously too.
Then in your query repository you also consider results from the cache, removing an entry once that result is already returned by the db.
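A sketch of that cache merge (in JavaScript for brevity, though the app is C#; all names are illustrative):

const cache = new Map();

// Synchronous handler: runs in the same request as the command and
// parks the freshly built read model entry in the cache.
function onItemCreated(event) {
  cache.set(event.id, { id: event.id, name: event.name });
}

// Query side: merge cached entries with the (possibly stale) read model,
// evicting entries the read model has already caught up with.
async function listItems(readModelDb) {
  const items = await readModelDb.list(); // assumed read-model interface
  items.forEach(function (item) { cache.delete(item.id); });
  return items.concat(Array.from(cache.values()));
}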
Personally, for things that I know I want to be available to the user and where the read model generation is trivial, I would use only synchronous event handlers. The user doesn't mind waiting a few seconds when submitting something and if updating a read model takes a few seconds, you know you have a backend problem.
As I see it, eventual consistency is only applicable if the application environment has multiple front-end servers hosting the application, each with its own copy of the read model. All servers use the same event store.
When something is written to the event store, the read model used to show the result to that user must be updated in sync with the event store. The rest of the servers, and the read models they manage, can be updated with eventual consistency.
This way the result to the user (the list of items) can be read from the local read model copy, because it has already been updated in sync. No need for special, complex fake updates/rollbacks.
The only case where a user can see an incomplete list is when he hits F5 to refresh the list right after a change and the load balancer directs the request to a front-end server whose read model is not yet updated (the 60-second delay). But this can be avoided by making the load balancer keep a user on the same server for the whole session.
So, if the application has only one front-end server, eventual consistency is not very useful, or it does not give any benefit, without some special fake updates/rollbacks in the read model...

Firebase offline cache & original firebase.js source code

My question is a follow-up to this topic. I love the simplicity and performance of Firebase from what I have seen so far.
As I understand it, firebase.js syncs data snapshots from the server into an object in JavaScript memory. However, there is currently no functionality to cache this data to disk.
As a result:
Applications are required to have a connection when they start up, so there is no true offline access.
Bandwidth is wasted every time an app starts up by re-transmitting all previous data.
Since the snapshot data is sitting in memory as a JavaScript object, it should be quite trivial to serialize it as JSON and save it to localStorage, so the exact application state can be loaded the next time the app is started, online or not. But as the firebase.js code is minified and cryptic, I have no idea where to look.
PouchDB handles this very well on a CouchDB backend. (But it lacks the quick response time and simplicity of Firebase.)
So my questions are:
1. What data would I need to serialize to save a snapshot to localStorage? How can I then load this back into Firebase when the app starts?
2. Where can I download the original non-minified dev source code for firebase.js?
(By the way, two features that would help Firebase blow the competition out of the water: offline caching and map reduce.)
Offline caching and map reduce-like functionality are both in development. The firebase.js source is available here for dev and debugging.
You can serialize a snapshot locally using exportVal to preserve all priority data. If you aren't using priorities, a simple value will do:
var fb = new Firebase(URL);
fb.once('value', function (snapshot) {
  console.log('values with priorities', snapshot.exportVal());
  console.log('values without priorities', snapshot.val());
});
Later, if Firebase is offline (use .info/connected to help determine this) when your app is loaded, you can call .set() to put that data back into the local Firebase. When/if Firebase comes online, it will be synced.
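Putting those two pieces together, a hedged sketch of the localStorage round trip (the cache key is illustrative):

var fb = new Firebase(URL);

// Save: serialize the snapshot, priorities included.
fb.once('value', function (snapshot) {
  localStorage.setItem('fb-cache', JSON.stringify(snapshot.exportVal()));
});

// On a later, offline startup: push the cached data back in.
// It will be synced to the server when/if the connection returns.
var cached = localStorage.getItem('fb-cache');
if (cached) {
  fb.set(JSON.parse(cached));
}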
However, this is really only suitable for static data that only one person will access and change. Consider, for example, the fallout if I download the data, keep it locally for a week while it's modified by several other users, then load my app offline, make one minor change, and come back online: my stale changes would blow away all the work done in between.
There are lots of ways to deal with this--conflict resolution, using security rules and update counters/timestamps to detect stale data and prevent regressions--but this isn't a simple affair and needs deep consideration before you head down this route.
