I'm building a task-list app to learn how to use PouchDB / CouchDB. The application is quite simple: it has authentication, and each user creates their own tasks.
My question is about how to store each user's information in the database. Should I create a database for each user with their tasks? Or is there a way to put all users' tasks into a single database called "Tasks" and somehow filter the synchronization so that PouchDB does not synchronize the whole server-side database (including other users' tasks)?
(I have read the PouchDB documentation a few times and have not been able to settle this; if it is documented, please point me to where.)
You can use either approach to fulfill your use case:
Database per user
A database per user is the db-per-user pattern in CouchDB. CouchDB can handle database creation/deletion automatically each time a user is created/deleted. In this case each PouchDB client replicates the complete user database.
You can enable it in the server config.
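For example, with CouchDB 2.x+ this is the couch_peruser option (a minimal local.ini sketch; delete_dbs is optional):

    [couch_peruser]
    ; create a userdb-{hex(name)} database for every user document
    enable = true
    ; also delete the user's database when the user document is removed
    delete_dbs = true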
This is the right approach if users' data is isolated and you don't need to share information between users. You may run into scalability issues if you need to sync many user databases into another central database in CouchDB. See this post.
Single database for all users
You need to use the filtered-replication feature in CouchDB/PouchDB. This post explains how to use it.
With this approach you can replicate a subset of the CouchDB database into PouchDB.
Since you have a single database, it is easier to share information between users.
But this approach has a performance problem: the filtering process is very inefficient, as it has to process the whole dataset, including deleted documents, to determine the set of documents to include in the replication. The filtering is also done in an external CouchDB process on the server, which adds further cost.
If you need the filtering approach, it is better to use a Mango selector, as it is evaluated in the main CouchDB process and can be indexed. See options.selector in the PouchDB replication filtering options.
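For illustration, a selector-based replication might look like this (the userId field and the URL are placeholders; options.selector requires CouchDB 2.x+ on the server):

    var PouchDB = require('pouchdb');
    var localDB = new PouchDB('tasks');

    // Pull only this user's documents; the selector is evaluated inside
    // CouchDB itself, with no JavaScript filter function involved.
    localDB.replicate.from('https://example.com/db/tasks', {
      selector: { userId: 'alice' }
    });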
Conclusion
Which is better? It depends on your use case. In either case you should consider the scalability issues:
In the case of filtered replication, you will face issues as the number of documents grows, since the complete dataset has to be filtered. Mango selectors are reported to be about 10x faster than JavaScript filter functions.
In the case of db-per-user, you will have issues if you need to consolidate the different user databases into a single one as the number of users grows.
Both patterns are valid. The difference is that in order to use filtered replication, you need to give clients access to the main database.
Since the client code is JavaScript, it is easy to extract the credentials and then access the main database directly. This would give users the ability to see everyone's data.
A more secure approach would be to use a database-per-user pattern. Each database will be protected by the user's credentials.
Related
I have built a web application using AngularJS (front-end) and PHP/MySQL (back-end).
I was wondering if there is a way to "watch" the MySQL database (without Node.js), so if one user adds some data to it, the changes are synced to other users too.
E.g. I know Firebase does that, but it's an object-oriented database and I can't run the kind of advanced queries there that I do with SQL.
I was thinking of using $interval and $http to make AJAX requests so that I could detect changes in the database. That's possible, but it would mean thousands of HTTP requests to the server every day, plus interpreting PHP on each request.
I believe nothing is impossible; I just need an idea for how to do this, which I don't have, so that's why I am asking for help here.
If you want a form of "real-time communication" you'll likely have to incorporate some form of long-polling from the client, unless you use WebSockets, but that's a big topic of its own. You're right to be concerned about the bandwidth and the demand on the DB, though. So here's my suggestion:
If you don't have experience with WebSockets, then log your events in a separate table/view and use the pub/sub pattern: subscribe entities to an event and broadcast that event to the table. Then long-poll against the watcher view to see when changes may have occurred; if one did, query for the exact value.
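As a rough sketch of the client side of that long-poll loop (the /api/events endpoint and its response shape are hypothetical):

    // Re-issues the request as soon as the previous one returns; the server
    // holds each request open until an event is logged or a timeout elapses.
    function poll($http, lastId, onChange) {
      $http.get('/api/events', { params: { since: lastId } })
        .then(function (res) {
          var events = res.data.events || [];
          if (events.length) {
            onChange(events);
            lastId = events[events.length - 1].id;
          }
          poll($http, lastId, onChange);
        })
        .catch(function () {
          // back off briefly on errors instead of hammering the server
          setTimeout(function () { poll($http, lastId, onChange); }, 5000);
        });
    }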
Another option would be to use some queue system with "deciders" that hold messages. Take a look at Amazon's SQS platform for a better explanation of how this could work. Basically you have a queue that holds messages, and a decider chooses where to store each message using some hash or sorting method (to reduce run time). When the client requests an update, the decider finds any messages that apply based on the hash/sort and returns them. Then you just have to decide how and when to destroy the messages.
The second option would require a lot more tinkering, though, so it's really about your preference. I think the difficulty you'll find is that most solutions have to deal with the fact that a message has to be delivered one or more times, so you'll need to track when someone has received a message and whether it can then be deleted from the queue/event table or you still need to wait. Otherwise you'll consume a lot of memory.
I am developing a web application using the NodeJS & SailsJS frameworks. Now I am going to develop search functionality. There are around 5000 records that I want to search on one attribute.
I know I can search using a MongoDB query. What if I fetch all the records into JavaScript on the frontend and search there? Which is the better way to search: at the backend using a DB query, or at the frontend using JavaScript?
If you search in the frontend then you have to load the entire dataset into the frontend and keep it synchronised for every query. This is not a good idea.
Use database queries - that is what they are designed for, and you only need to transfer the results.
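For example, a Sails controller action along these lines keeps the scan inside the database (the Record model and name attribute are made-up names; assumes Sails 1.x):

    // api/controllers/RecordController.js
    module.exports = {
      search: async function (req, res) {
        var term = req.param('q') || '';
        // Waterline translates `contains` into a database-side query,
        // so only the matching rows are sent back to the client.
        var results = await Record.find({ name: { contains: term } }).limit(100);
        return res.json(results);
      }
    };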
It's all about your app and your users' expectations of it. You definitely shouldn't use client-side search if you have:
Short-lived data which can't be cached (like a list of users who are online).
A huge dataset which a) can't be cached or b) wouldn't be cached (most visitors wouldn't use search). The size limit depends on the app, though.
Complex, computation-intensive search (like full-text search).
In other cases it can work. Searching even millions of records can run in under 100 ms, which is faster than the typical network delay of a round trip to the server (a toy sketch follows the lists below).
Advantages of client search:
fast: no network latency.
powerful queries: query can use all JS capabilities with engine optimization advantages.
Disadvantages:
load full dataset (critical on huge amounts of data).
require synchronization strategy: full reload, partial updates, CRDT, etc.
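When client-side search does fit, it can be as simple as a linear scan over the cached dataset (a toy sketch; the record shape is arbitrary):

    // For a few thousand cached records a filter like this is effectively instant.
    function searchRecords(records, term) {
      var needle = term.toLowerCase();
      return records.filter(function (r) {
        return r.name.toLowerCase().indexOf(needle) !== -1;
      });
    }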
Do it in the backend using a DB query; that is good practice and will reduce execution time.
You should not do this kind of search on the client side, as you would have to send the whole database to the client and loop through the records several times to fetch the desired ones.
I am considering using Node.js to build an API-like service but am having trouble deciding whether to keep the information in an in-memory data structure or store it in a database/text file.
Basically the program would let a user come online and collect that user's geo-coded location. The service would then store that information either in a JavaScript data structure or in a database or text file. When another user logs on, I would connect them with the user who is closest to them.
My question is, if I have a data structure (some sort of custom-implemented sorted list based off of geo-codes), would all of that information be volatile, and would I lose it if the program crashed?
Would it be preferable to store the information in a text file or database even though the access and writes of that information would take longer?
Also, if I was using the data structure approach, would that make it more difficult to scale the application if I needed to expand to additional servers?
Any thoughts?
My question is, if I have a data structure (some sort of custom-implemented sorted list based off of geo-codes), would all of that information be volatile, and would I lose it if the program crashed?
Yes, it would be volatile and you would lose it if the program crashed. All JavaScript data in a Node.js process is kept in RAM.
Would it be preferable to store the information in a text file or database even though the access and writes of that information would take longer?
When exactly to save data to a persistent store is highly dependent upon the details of the situation. You could have only a disk store or you could have a RAM store that is periodically stored to disk or you could have a combination (a RAM cache in front of a persistent store). Using a database will generally do a lot of caching for you.
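One common middle ground is a RAM structure plus periodic snapshots to disk; a minimal sketch (the file name and interval are arbitrary):

    var fs = require('fs');

    var locations = new Map(); // userId -> { lat: ..., lon: ... }

    // Restore the last snapshot on startup, if one exists.
    try {
      locations = new Map(JSON.parse(fs.readFileSync('locations.json', 'utf8')));
    } catch (e) { /* no snapshot yet */ }

    // Snapshot every 10 seconds: a crash then loses at most one interval of updates.
    setInterval(function () {
      fs.writeFile('locations.json', JSON.stringify([...locations]), function (err) {
        if (err) console.error('snapshot failed', err);
      });
    }, 10000);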
Also, if I was using the data structure approach, would that make it more difficult to scale the application if I needed to expand to additional servers?
If you want to share a live data store among multiple servers, then you have to use something other than just Javascript data stored in node.js memory. The usual solution is to go with some sort of external database which itself can be either in-memory (like Redis) or disk-store (like Mongo or Couchbase) which all the different servers can then access.
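For the "nearest user" case specifically, Redis even has geo commands built in, so the shared store could look roughly like this (a sketch using the node-redis v4 client; the key and names are made up):

    var { createClient } = require('redis');

    async function findNearby(lon, lat) {
      var client = createClient(); // defaults to redis://localhost:6379
      await client.connect();
      // Positions are written with GEOADD, e.g. when a user comes online:
      await client.geoAdd('user:locations',
        { longitude: -122.4, latitude: 37.8, member: 'user42' });
      // GEOSEARCH returns the members within the given radius of the point.
      return client.geoSearch('user:locations',
        { longitude: lon, latitude: lat },
        { radius: 5, unit: 'km' });
    }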
Progressing along with my isomorphic JavaScript crusade, I put Meteor on hold while I played a bit more with the MEAN stack. To ward off any further procrastination, I've decided to finish my original prototype community application. Now, my biggest issue with Meteor isn't reactivity, it's session/common data.
I know Meteor's native session system is based on the reactive concept, and cookies don't "exist" because Meteor operates on "the wire". But let's say I were building an application on the LAMP or MEAN stack and creating a user interface: I'd use cookies/sessions to control user activity. If Meteor operates off of reactivity, how do I maintain persistence?
I have searched through Atmosphere for packages that fit my criteria, and I ran into a couple that store "persistent sessions". But these operate on the client, not the server; hence my code would be exposed on the client, setting the application up for exploitation.
All that being said, I know Meteor has its standard user interface. What I'm trying to do here is understand Meteor and gain experience for future endeavours.
Meteor has a built-in login system that keeps track of the logged-in user, which is one of the main reasons people use cookies. If you want to store other data on the client in a persistent way, you can use the HTML5 localStorage API.
I think what you're referring to is that something like PHP lets you store data in a "SESSION" variable that is actually stored on the server, but persisted between different requests from the same client.
If this is what you are looking for, there are several approaches that will give you similar functionality:
Store data associated with a particular user, and use the userId that Meteor gives you to only publish it for that user (using Meteor.publish)
Have a randomly generated client ID that is stored in localStorage, and pass that in when calling subscriptions or methods to authenticate as that client. This will work in the case where the user is not logged in, and will give you a very similar result to cookies/session in PHP. You will still store the actual data in the database on the server, but you will know which data is associated with a particular client by the unique ID.
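A rough sketch of the second approach (Items and the clientId field are made-up names; Random.id() comes from Meteor's random package):

    // Client: persist a random ID across page loads.
    var clientId = localStorage.getItem('clientId');
    if (!clientId) {
      clientId = Random.id();
      localStorage.setItem('clientId', clientId);
    }
    Meteor.subscribe('clientData', clientId);

    // Server: publish only the documents tagged with that ID.
    Meteor.publish('clientData', function (clientId) {
      check(clientId, String);
      return Items.find({ clientId: clientId });
    });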
It's true that Meteor's Session variable is named in a way that can be confusing if you are coming from PHP where SESSION means something totally different.
Does this answer your question?
I work on a web app which stores project data. Data are saved in a CouchDB database A. The app pulls and pushes data with a local PouchDB database B, which is kept in sync with A.
So the app can also work offline. When the user gets their connection back, changes made to the local DB B while offline are sent to A using a classic replication.
I store one document per project in CouchDB; it is a big JSON object with lots of data (project todos, collaborators, advancements, risks, problems, etc.).
It is working like a charm, but I have some problems, and it seems I am using PouchDB the wrong way. An example situation:
User A is offline and he adds a todo on project 1.
User B is online and he adds a new collaborator on project 1.
User B changes are pushed to couchDb by the automatic sync.
The project 1 _rev has been incremented.
User B pulls his own changes back from CouchDB, because the app downloads all documents whenever any CouchDB change is detected. Weird; I don't know how to prevent that. But the app still works fine, so it's not a big problem.
User A gets its connection back.
User A's changes are ignored because of the older _rev. But the user modified a different project property; can CouchDB detect that by itself and merge with the newer _rev?
I clearly see that my problem is using one document per project. I could use thousands of documents to store each property of each project, and my problem wouldn't happen, but it seems quite weird: to retrieve all the data of a project I would have to scan my whole database, check each document's type (collaborator, todo, ...?), and check whether the document is linked to the project via a new _projectId property added to every document.
Currently I just request one document, which contains all the project data, and then manipulate the JSON easily. It's quite convenient to handle.
How do I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.
But the user modified a different project property; can CouchDB detect that by itself and merge with the newer _rev?
PouchDB/CouchDB conflict handling is described in the PouchDB guide: http://pouchdb.com/guides/conflicts.html
the app downloads all documents whenever any CouchDB change is detected. Weird; I don't know how to prevent that.
This is standard PouchDB/CouchDB behavior - you asked it to sync the whole database, so it synced the whole database. :) You can prevent it by using filtered-replication: http://pouchdb.com/api.html#filtered-replication.
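A selector-based variant (assuming each document carries a projectId field, which ties into the last point below) might look like:

    var localDB = new PouchDB('projects');

    // Live two-way sync restricted to one project's documents;
    // the selector is evaluated by CouchDB 2.x+ on the server side.
    localDB.sync('https://example.com/db/projects', {
      live: true,
      retry: true,
      selector: { projectId: 'project-1' }
    });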
How do I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.
It really depends on your data: how frequently it may change, what the unique identifier of a single "property" is, and so on. Storing 10,000 separate documents in PouchDB/CouchDB is not a crazy idea, though, and may help you when it comes to conflicts, since only the individual documents involved can ever be in conflict.
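To make that concrete, splitting a project into small typed documents could look like this (the field names are illustrative; the query requires the pouchdb-find plugin):

    // One small document per item instead of one big project document:
    var project = { _id: 'project-1', type: 'project', name: 'Website redesign' };
    var todo = { _id: 'todo:abc123', type: 'todo', projectId: 'project-1', text: 'Fix header' };

    // With pouchdb-find, fetching everything for a project is one Mango query,
    // and a conflict on one todo no longer touches the rest of the project:
    db.find({ selector: { projectId: 'project-1' } }).then(function (result) {
      console.log(result.docs);
    });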
In general, I'd recommend you read the guide to conflict resolution as described above and review your options. There's also a plugin that may help you with conflict resolution: https://github.com/jo/pouch-resolve-conflicts