I have a group chat app. Each chat document has a node called
chatUsers:[1,2,3,4,5]
where 1-5 are user IDs on that node.
I need to pull all chats where I am a user, so I use the array-contains operator. My issue is that there is also another node called archivedChat, which records which users have archived the chat.
ie:
archivedChat:[1,2]
meaning users 1 and 2 have archived this chat. I want to get all chats where I am a user and I have not archived, and then all chats I am a user and have archived.
Firebase prevents using these two operators together, and I understand I can filter on the front end, but then I'd need to retrieve all records first. I could have 1000 chat rooms/documents, so I do not want to query the entire collection; I'd much prefer doing 2 separate queries. Here is where I am at:
```
query(
  roomsRef,
  where(USERS_PATH, 'array-contains', currentUserId),
  where(ARCHIVE_USERS_FIELD, 'not-in', [[currentUserId]]),
  orderBy(LAST_UPDATED_FIELD, 'desc'),
  limit(roomsPerPage),
  startAfter(lastRoom)
);
```
I can think of no way to do this. Since the chat is the same whether it is archived or not, and my archived flag just shows archived chats in a different area and affects how I display them, I really do not want to move them into another collection...
Any help?
I'd recommend adding a third field that essentially combines the information from the other user lists. If you only want to show the document for users that are in the USERS_PATH and are not in ARCHIVE_USERS_FIELD, add a field (say SHOW_USERS) that contains just the UIDs of those users.
This type of data duplication is quite common when using NoSQL databases, where you often have to model/augment your data to fit with the specific use-cases you have.
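As a sketch of that denormalization (the helper name computeShowUsers is illustrative, not part of the Firestore SDK), the derived field is just the set difference of the two existing arrays:

```javascript
// Hypothetical helper: derive the denormalized SHOW_USERS field from the two
// existing arrays. Run it whenever chatUsers or archivedChat changes, e.g. in
// the client before writing the document, or in a Cloud Function trigger.
function computeShowUsers(chatUsers, archivedChat) {
  return chatUsers.filter((uid) => !archivedChat.includes(uid));
}

// Example: users 1 and 2 archived the chat, so only 3, 4 and 5 still see it.
const showUsers = computeShowUsers([1, 2, 3, 4, 5], [1, 2]);
// showUsers → [3, 4, 5]
```

With the SHOW_USERS field maintained this way, the "not archived" list becomes a single where(SHOW_USERS, 'array-contains', currentUserId) query, and the "archived" list is the same shape of query against ARCHIVE_USERS_FIELD, so the unsupported not-in clause disappears entirely.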
After learning the limitations, I think I am going to move to MongoDB's cloud offering and ditch Firestore.
Firestore is very good, but if you want complex queries (e.g. users are part of teams, and we want team chats pulled along with user chats, team chats a user is not on, archived user chats, and team chats archived from a user's view), there is no great way to query.
Lastly, if I could have 1000 team chats, all with listeners on the room users' online statuses, etc., I can easily exceed quota limits.
MongoDB requires a server layer, but there are no server limits, and it has better query capabilities. It is much more complex to build, since subscribing to documents has to be pushed via sockets instead of the very clean front-end SDK that Firestore has.
Each has its perks, but a large-scale chat app with 1000+ rooms and complex querying feels like forcing a square peg through a round hole here :(
Related
I am trying to build an application in which data is separated by company. Within each company there can be multiple locations that contain different data. Users should be able to view each location's data separately, and I would also like to include user permissions so that only admins can edit data. I don't have a lot of experience working with MongoDB, but I'm trying to use it to gain some experience. What is the best way of structuring this?
I'm not sure I understand your use case, but maybe:
You could associate each user with a company.
Before performing a query, you pass the query through a function, say limitResults().
limitResults(user, query) adds a filter (a WHERE-like clause) to the query so it only returns data for the company the user is in; if the user is an admin, it returns the query unchanged so they can see all results.
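A minimal sketch of what limitResults() could look like, assuming a MongoDB-style filter object and illustrative field names (companyId, isAdmin are assumptions about your user schema):

```javascript
// Sketch: scope a MongoDB-style filter object to the caller's company,
// unless the caller is an admin, in which case the filter passes through.
function limitResults(user, query) {
  if (user.isAdmin) return query; // admins see all companies
  return { ...query, companyId: user.companyId }; // everyone else is scoped
}

const query = { locationId: 'loc-42' };
const scoped = limitResults({ companyId: 'acme', isAdmin: false }, query);
// scoped → { locationId: 'loc-42', companyId: 'acme' }
```

The scoped object would then be passed to a driver call such as collection.find(scoped); keeping the scoping in one function means no query can forget the company filter.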
It may be beneficial to read up on the difference between authentication and authorisation. Once you have a system for authenticating someone, you need a system to determine what that person is allowed to do.
I'm making a task list app to learn how to use PouchDB/CouchDB. The application is quite simple: it would have authentication, and each user would create their own tasks.
My question is about how to store each user's information in the database. Should I create a database for each user with their tasks? Or is there a way to put all of the tasks of all users into one database called "Tasks" and somehow filter the synchronization so that PouchDB does not synchronize the whole database (including other users' tasks) from the server?
(I have read the PouchDB documentation a few times and have not been able to work this out; if it is documented, please point me to where.)
You can use both approaches to fulfill your use case:
Database per user
A database per user is the db-per-user pattern in CouchDB. CouchDB can handle the database creation/deletion each time a user is created/deleted. In this case each PouchDB client will replicate the complete user database.
You can enable it in the server config.
This is a proper approach if the user data is isolated and you don't need to share information between users. In this case you can have scalability issues if you need to sync many user databases into another one in CouchDB. See this post.
Single database for all users
You need to use the filtered-replication feature in CouchDB/PouchDB. This post explains how to use it.
With this approach you can replicate a subset of the CouchDB database in PouchDB.
As you have a single database, it is easier to share info between users.
But this approach has some performance problems: the filtering process is very inefficient, as it has to process the whole dataset, including deleted documents, to determine the set of documents to be included in the replication. This filtering is done in an external CouchDB process on the server, which adds more cost to the process.
If you need to use the filtering approach it is better to use a Mango Selector for this purpose as it is evaluated in the CouchDB main process and it could be indexed. See options.selector in the PouchDB replication filtering options.
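As a hedged sketch, the per-user replication options might be built like this (the owner field name is an assumption about how task documents are tagged; the selector shape follows PouchDB's options.selector):

```javascript
// Illustrative: build replication options that use a Mango selector instead
// of a filter function. The selector is evaluated in CouchDB's main process
// and can be backed by an index, unlike a JavaScript filter function.
function replicationOptionsFor(username) {
  return {
    live: true,
    retry: true,
    selector: { owner: username }, // assumes each task doc has an `owner` field
  };
}

// With PouchDB this would be used roughly as:
//   localDb.replicate.from(remoteDb, replicationOptionsFor('alice'));
const opts = replicationOptionsFor('alice');
```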
Conclusion
Which is better? It depends on your use case... In any case, you should consider the scalability issues of both approaches:
In the case of filtered replication, you will face issues as the number of documents grows if you have to filter the complete dataset. Filtering is reported to be 10x faster when using Mango selectors.
In the case of db-per-user, you will have issues if you need to consolidate the different user databases into a single one as the number of users grows.
Both patterns are valid. The only difference is that in order to use filtered replication, you need to provide access to the main database.
Since it's in JavaScript, it's easy to extract the credentials and then access the main database. This would give users the ability to see everyone's data.
A more secure approach would be to use a database-per-user pattern. Each database will be protected by the user's credentials.
I'm working on a pretty fun web app project that could become rather big, and I have a chance to play around with this handy thing called PubNub as the main real-time engine of the application.
It's a web application with a Node.js backend that involves a potentially huge number of chat rooms between users and real-time notifications sent to users by the backend when some data in the DB is updated.
Usually, developing with Socket.io, I would just subscribe each user to a channel named by his unique DB ID, and also to channels representing the different chat rooms.
This way I can handle chat rooms and authentication on the backend, and after storing a personal notification in the DB I can easily push it to the channel named by the user's ID; if the user is online he gets it, and if not, fine, he will see it on next login since the notification is already in the DB. And theoretically this monstrosity should scale just fine horizontally with the help of Redis pub/sub.
Thing that worries me about PubNub in this case is scalability. As I obviously have no insight on what is going on in PubNub backend's dark corners, I want to make sure that app is built in the way that it will be prepared to handle some obscure enormously huge amount of simultaneous users.
My question is, what is the best approach to building such a system with PubNub?
Am I correct in assuming that, when I need to push a notification to a specific user, it will be better to subscribe to that user's PubNub channel, push the note, and unsubscribe? Because if I keep all online-user channels open, then there is no point in PubNub instead of WebSockets on my server, as the server will be under load from all of those open online-user channels anyway and would need to be scaled just to maintain the huge quantity of them.
What about user authorisation? Without involving my backend, how can I be sure that a user posting a message will not be able to fake his identity, and will have exactly the same identity he authenticated with inside the application?
And generally (and via PubNub), what is the best practice for handling huge numbers of chats per user? Over the application's life, each user may accumulate a decent number of garbage chat rooms that have some users in them but haven't been touched by anyone for a long time, and users are just too lazy to leave them manually.
Thanks for your patience in reading this wall of text!
UPDATED Dec 5, 2021
If you are implementing a chat app, please refer to the PubNub Chat use-case documentation for full details. It has new features and UI components that are built upon the PubNub Platform.
UPDATED May 15, 2020
We have some new docs that will explain much of the below in much clearer terms.
And new features that can be applied to many of the questions/answers below:
Message Actions
Message Counts
Batch History (multi-channel message fetch)
Objects (Users, Channels and Memberships Metadata)
NOTE: I've sprinkled some of the above links down in the answers below.
First, let's address this...
Thing that worries me about PubNub, in this case, is scalability. As I
obviously have no insight on what is going on in PubNub backend's dark
corners, I want to make sure that the app is built in a way that it will
be prepared to handle some obscure enormously huge amount of
simultaneous users.
and this...
then there is no point in PubNub instead of WebSockets on my server,
as the server will be anyway under the load of all of those opened online-user
channels and should be scaled just to maintain the huge quantity of them
This is sort of backward, because you would use a service like PubNub precisely to ensure that your application scales to handle millions of users. PubNub has thousands of customers that scale to millions of users and hundreds of billions of messages. Not knowing how PubNub does this frees you to implement the business logic of your application.
But I think I get what you are saying. You are under the impression that your server has to be involved in each and every chat room interaction for every user, but that is only partially true. Mostly, your server will be used for authentication, some subscription maintenance (optional), and probably for sending messages out to one, many, or all end users, as required (depends on your requirements).
Here are some attempts to answer your questions. They are a bit all over the place, so I will do my best to answer what I think you are asking.
Question 1
This question seems to be directed at maintaining lots of subscriptions to channels and the scalability of that.
Generally speaking, every end-user initializes PubNub and subscribes to channels they need to listen to and publish to channels they need to send messages on. Typically, the channels (chat rooms in your case, I assume) they are publishing on are the same channels they are subscribing to, but they are different kinds of use cases. And you can subscribe to thousands of channels at a time (up to 20K per client). If you did this with WebSockets, how would you go about scaling this to millions of users? You would implement and operate (to scale) something similar to PubNub (not easy and not cheap).
Now, if a user is subscribed to a bunch of chat room channels but some or many are stale (the user hasn't viewed or posted to them in a while), you could have some code on your server (or the client) that monitors user activity and unsubscribes them from those stale channels. This is possible using channel groups. Each end user would have their own channel group that contains all the channels they are listening to, and the client code or the server code can add and remove channels to/from those end users' channel groups.
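A hypothetical pruning step for those stale channels might look like this (pure bookkeeping; the actual add/remove calls would go through the SDK's channel-group methods):

```javascript
// Illustrative: given a map of channel → last-activity timestamp (ms),
// return the channels the user hasn't touched within `maxIdleMs`, so the
// server (or client) can remove them from that user's channel group.
function staleChannels(lastActivity, now, maxIdleMs) {
  return Object.keys(lastActivity).filter(
    (ch) => now - lastActivity[ch] > maxIdleMs
  );
}

const DAY = 24 * 60 * 60 * 1000;
const stale = staleChannels(
  { 'room-a': 100, 'room-b': 30 * DAY }, // room-a idle ~30 days, room-b fresh
  30 * DAY + 1000,                        // "now"
  7 * DAY                                 // prune after a week of inactivity
);
// stale → ['room-a']
```

The channels returned would then be removed from the user's channel group (e.g. via the Stream Controller / channel-group API in the PubNub SDK).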
Question 2
UPDATED DOCS: https://www.pubnub.com/docs/platform/security/access-control
Now this question is a bit more clear and focused and is asking about authentication (login) and how to ensure someone is who they say they are and how to handle authorization (what they can and cannot do) and where/who controls this.
The answer is: you control the authentication (login) to prove that the person is who they say they are. Your log-in process checks for a valid username/password, and in the user record you will have a list of access controls for that user. With that, you generate an auth-key to which you grant read and/or write access on one or more channels. This grant is a PubNub operation that your server invokes. The auth-key is passed back to the client, and the client code initializes the PubNub instance using the pub/sub keys and this auth-key, which PubNub servers use to check for access based on the channel and the operation being requested (subscribe to this channel, publish to that channel, etc.). If the auth-key does not have the proper access, the PubNub server will deny access (403 response).
There's more to all of this but this is a good start. Read up on PubNub Access Manager for the SDK you will be using on our docs page. For example, you can start with the JavaScript SDK Access Manager docs and tutorials.
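To make the flow concrete, here is an illustrative server-side helper that turns a user record's access list into grant parameters (the field names and the exact grant signature are assumptions; check the Access Manager docs for your SDK version):

```javascript
// Sketch: map the access controls stored on the user record to the
// parameters of a PubNub grant call, after the login check has passed.
function grantParamsFor(userRecord, authKey) {
  return {
    authKeys: [authKey],                    // the key handed back to the client
    channels: userRecord.allowedChannels,   // channels this user may use
    read: true,                             // everyone may subscribe
    write: userRecord.canPost,              // e.g. read-only users can't publish
    ttl: 60,                                // minutes before the grant expires
  };
}

const params = grantParamsFor(
  { allowedChannels: ['room-1', 'room-2'], canPost: true },
  'auth-abc123'
);
// The server would then invoke something like pubnub.grant(params, callback)
// and return 'auth-abc123' to the client for its PubNub initialization.
```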
Question 3
UPDATED DOCS: https://www.pubnub.com/docs/platform/channels/receive#subscribe-to-channels
I believe I answered this sufficiently with question 1 - Channel Groups. Start with the JavaScript SDK Stream Controller (which provides Channel Group feature) docs and tutorials.
I hope I have managed to move you a few steps further along your journey to a highly successful real-time, data stream application using PubNub. Please reply with any additional questions you may still have.
*Answers to your new comments:*
Thanks for your follow-up comments. It is very clear what you are asking now.
I will need to compare the chat room timestamp with the user's personal last-read timestamp for this, so it seems that I need to listen to those channels from the back end and update the user's last-reads, or trust the front end and get the timestamps from the user directly
No, you do not have to listen to the channels on your server. Yes, from the client app, you will keep the timestamp of the last received message. When the user comes back online, you use this timestamp to get history for the channels the client was subscribed to. Many have done this successfully and we are going to be releasing some amazing features in the coming months that will simplify this considerably.
pushing real-time notifications to users from the back-end. Do I need to be subscribed to all of my user channels if I want to push notes to them at any time?
You can publish on any channel without actually subscribing to it first. So your server can publish to channels as it needs to.
And as before, keep coming with more questions as you require.
*Great follow-up questions again. Here's what I suggest*
... it makes sense to not request all of those chat rooms from DB and join via pubnub all of them, but rather implement pagination... how user can be aware of new messages that may appear in his old chat rooms?
Again, you can stay subscribed to 20K channels using channel groups. You can subscribe to 10 channel groups with 2K channels per channel group, but I'd recommend limiting the user to 100 or fewer because that seems like a sufficient limit to impose in your app. But pick whatever upper limit you want, and when the user hits that limit, force them to leave another chat room first, or suggest they leave one of the top 10 most inactive, or whatever algorithm makes sense for your app.
UPDATED DOCS: https://www.pubnub.com/docs/platform/channels/receive#subscribe-to-channels
Getting the # of missed messages does require a full history fetch, but we are going to be providing improved APIs to make this simpler in the near future. But if the user is registered for push notifications on all these channels, the device would be able to receive these push messages and your app can keep that count locally. We will have a "how to update the badge count in background" article being published soon. You could also use that to keep track of the number of missed messages per channel (chat room).
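The local bookkeeping for that count can be as simple as comparing timetokens from a history fetch against the stored last-read timetoken (a sketch with made-up message shapes):

```javascript
// Sketch: count missed messages on one channel from a batch history fetch,
// given the last-read timetoken the device stored before going inactive.
function countMissed(messages, lastReadTimetoken) {
  return messages.filter((m) => m.timetoken > lastReadTimetoken).length;
}

// Hypothetical history result for one channel (shape is illustrative).
const history = [
  { timetoken: 100, text: 'hi' },
  { timetoken: 200, text: 'hello' },
  { timetoken: 300, text: 'anyone?' },
];
// countMissed(history, 150) → 2 (the two messages after the last read)
```

The same per-channel count is what a push-notification badge updater would maintain incrementally, so the app can show unread counts without a full fetch on every launch.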
For now I just want to limit the number of rooms available for users to let's say a hundred and request and join them without pagination.
UPDATED DOCS: https://www.pubnub.com/docs/platform/channels/retrieve
We do have customers that do this without worrying about pagination. They just retrieve history on the 100 channels the device is subscribed to. With the background badge count updater strategy, you will have the advantage to know which channels to fetch from when the app becomes active. I will post the link to that article here once it is published.
I'm currently running into an issue creating or updating a large number of users (200+) at once. Here are the issues I've found so far:
Existing solution - Keeping users as Users, I must fetch all users matching my list of email addresses, saveAll() changes to the users with existing email addresses, and then signUp() the users who do not exist. The problem here is running into the request limit.
Problematic solution - I believe I could use saveAll() if the users were not Users, but then I lose the ability to store passwords and use the signIn() method associated with the User class.
Problematic solution - Use a background job: I haven't tried this, but "Jobs that are initiated after the maximum concurrent limit has been reached will be terminated immediately" scares me a bit as well.
The reason I need this is because I need a company to be able to upload their users for use in our application. The users have information stored about them and can log in to check the information, so I would really like being able to use the User class.
Another option would be to pay to raise the request limit, but if we jump to creating/updating 2000+ users, we'll be right back in the same scenario...
Hi I am building a "Twitter clone" for my school project.
I want to implement a publish subscribe pattern for realtime updates.
Users can "follow" other users
When a user is online, and a "follower" posts a new message, the user should get a realtime notification.
I am using Node.js, Socket.io, Redis, and MySQL as the database provider. Should I use a message queue, and what do people use message queues for?
Thanks for any help and answers.
Update
The problem is not there when you are small, but when you get big, the fanout (forwarding a message to all followers) is going to be expensive, and you want to do it offline using an MQ. Like Twitter, you store all active tweets in memory. When a tweet is posted, you put (SET) that tweet in memory at a unique key. You could use something like Twitter's Snowflake to generate that key.
Next the fanout process happens: for every follower, you need to put that unique key (tweet ID) in their list so that they can retrieve the tweet from memory. When your site is small, I guess you could do this without a message queue, but when you need to distribute a message from a user like, for example, Scoble, with 274,776 followers and who tweets a lot, this can get pretty expensive.
A lot of users are offline, so these tweets do not need to be delivered to them immediately. You design your system like this because you need to keep everything in memory. I think that is the only way to do this effectively.
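The steps above can be sketched in memory like this (illustrative only; in production the two maps would be Redis structures, the fanout loop would run in an MQ worker, and IDs would come from something like Snowflake):

```javascript
// In-memory fanout sketch: store the tweet once under a unique id, then push
// that id onto each follower's timeline list. With Redis this would be one
// SET for the tweet plus one LPUSH per follower.
const tweets = new Map();    // tweetId → tweet body
const timelines = new Map(); // userId → [tweetId, ...], newest first

function fanout(tweetId, tweet, followerIds) {
  tweets.set(tweetId, tweet); // store the body exactly once
  for (const uid of followerIds) {
    if (!timelines.has(uid)) timelines.set(uid, []);
    timelines.get(uid).unshift(tweetId); // only the key is duplicated per user
  }
}

fanout('t1', { author: 'scoble', text: 'hello' }, ['u1', 'u2']);
// timelines.get('u1') → ['t1']; the body is fetched once from `tweets`
```

The point of the MQ is that the fanout loop (potentially hundreds of thousands of list pushes for one tweet) runs asynchronously in workers instead of blocking the posting request.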
You should use an MQ just like Twitter does. They have even open-sourced their own MQ: Kestrel. The High Scalability blog has a really interesting article: Scaling Twitter: Making Twitter 10000 Percent Faster. I advise you to study at least the popular articles on the High Scalability blog to learn how the big players scale their websites. Some other links explaining how Twitter scales:
http://highscalability.com/blog/2009/10/13/why-are-facebook-digg-and-twitter-so-hard-to-scale.html
http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html
http://highscalability.com/blog/2009/4/20/some-things-about-memcached-from-a-twitter-software-develope.html
I also assume you have read:
http://redis.io/topics/twitter-clone
Also I would have a look at all the projects Twitter has open-sourced:
https://github.com/twitter
I would have a look at the popular MQs like for example:
Redis
Beanstalkd
Gearman
I recently worked on a similar use case, and I used nodejs, socketio and redis pubsub.
The code is available at https://github.com/roshansingh/realtime-notifications.
Now coming back to your questions:
Users can "follow" other users
When a user is online, and a "follower" posts a new message, the user should get a realtime notification.
You can achieve both by creating rooms using Socket.io and channels with the same names in Redis pub/sub.
The flow can be something like this:
You can make users join Socket.io rooms (say John, Dan, etc.) as soon as they log in, having saved all their subscribed rooms in the database. At the same time, you subscribe to Redis pub/sub with these channel names (like John). Updates received on a channel can then be broadcast to the corresponding room, and hence to all the online users in it.
You will have to publish John's activities to Redis on the same channel name (John).
Please read the code on the link pasted above. Let me know if you need any help.