What we have?
An API built in Node.js (using Moleculer.js for microservices and PostgreSQL for storing data) that keeps track of users and user groups. We have on average 3k users per group, and a user can be part of multiple groups.
What we want to achieve?
We want to create a special service which will send text messages. The admins will select multiple groups, the code will remove the duplicated users and send them an SMS.
After a selection we can have around 1 million users. How can we send them text messages in an efficient way?
What have we tried?
Paginate the users and for each page send a request to the SMS service.
const users = db.getPage(1); // [{ id: 1, phone: '+123456789' }, ...]
smsClient.sendBulk(users);
PROBLEM: The user list in the database can change in the process and can affect the pagination by giving us duplicates or skipping some users.
Load all the results in the memory and send all the users to the SMS service.
const users = db.getAll(); // [..., { id: 988123, phone: '+987654321' }]
smsClient.sendBulk(users);
PROBLEM: We think it's a bad idea, resource-wise, to run this kind of query against the database and keep the results in memory. At the same time, we don't want to send 1 million entities through a single HTTP request to the SMS service.
How can we select 1 million users and send them an SMS message without worrying about duplicates, skipped data or any other alteration to the admin's selection? We were thinking of queues as a necessary step, but only after we find a solution for the cases mentioned above. Or is the queue part of the solution?
How can we select 1 million users and send them an SMS message without worrying about duplicates, skipped data, or any other alteration to the admin's selection?
To manage duplicates, you could use an additional DB that holds a hash table of the users that have already been handled. This is a bit more expensive because you need to check each user before sending the SMS.
Making sure nobody is skipped is a bit trickier, because you need to add more recipients to an ongoing SMS transaction. You need a way to detect (hook) when a user is added to a group, and then add that user as a recipient to the ongoing transactions.
You will need a fast DB, and you should save each user in a hash set for fast set and get operations (O(1)).
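The de-duplication idea can be sketched as follows. This is a minimal illustration that uses an in-memory `Map` keyed by user id rather than an external store; the group shape and the `dedupeRecipients` helper are hypothetical, and at 1 million recipients an external set or a `SELECT DISTINCT` in PostgreSQL may be a better fit.

```javascript
// Hypothetical helper: collapse the members of several groups into unique recipients.
function dedupeRecipients(groups) {
  const byId = new Map(); // user id -> user; Map lookups are O(1) on average
  for (const group of groups) {
    for (const user of group.users) {
      if (!byId.has(user.id)) byId.set(user.id, user);
    }
  }
  return [...byId.values()];
}

// Example data: user 2 belongs to both groups.
const groups = [
  { id: 'a', users: [{ id: 1, phone: '+123456789' }, { id: 2, phone: '+111111111' }] },
  { id: 'b', users: [{ id: 2, phone: '+111111111' }, { id: 3, phone: '+987654321' }] },
];
const recipients = dedupeRecipients(groups);
console.log(recipients.map((u) => u.id)); // [ 1, 2, 3 ]
```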
We were thinking about queues as a necessary step but after we find a solution for the cases mentioned above. Or, is the queue part of the solution?
Definitely. A queue is the correct way to go for this scenario (queueing many small tasks). Some queues come with a re-queue feature that re-queues any task that didn't get an acknowledgment.
You should check out RabbitMQ for message-driven microservices.
Have you considered creating an indirect state between the user and sent SMS? Something like SmsRequest / SmsTask / however you'd call it.
It'd consist of necessary user-data, message content, status of the request (to-send, sending, sent, failed, ...) and some additional metadata depending on your needs.
Then the first step is to prepare these requests and store them in the db, effectively making a queue out of a table. You can add constraints on user and message type that prevent any duplicates, and then start a second, asynchronous process that simply fetches requests in the to-send state, sets the state to sending, and then saves the outcome.
This also gives you the benefit of an audit trail, and you can batch the outgoing messages.
Of course it'd increase your data volume significantly but I guess it's cheap nowadays anyway.
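A rough sketch of the table-as-queue idea, assuming an in-memory stand-in for the request table so that it runs without a database. The class, method and status names are all hypothetical. In PostgreSQL the `Map` key would correspond to a unique constraint on (user, message), and the claiming step would be a transaction that flips rows from to-send to sending.

```javascript
// In-memory stand-in for an sms_request table; all names here are hypothetical.
class SmsRequestTable {
  constructor() {
    this.rows = new Map(); // key "userId:messageId" plays the role of a unique constraint
  }

  // Insert one request per (user, message); duplicates are ignored.
  enqueue(user, messageId) {
    const key = `${user.id}:${messageId}`;
    if (this.rows.has(key)) return false;
    this.rows.set(key, { userId: user.id, phone: user.phone, messageId, status: 'to-send' });
    return true;
  }

  // Claim up to `limit` pending requests by moving them to 'sending'.
  claimBatch(limit) {
    const batch = [];
    for (const row of this.rows.values()) {
      if (batch.length === limit) break;
      if (row.status === 'to-send') {
        row.status = 'sending';
        batch.push(row);
      }
    }
    return batch;
  }

  countByStatus(status) {
    return [...this.rows.values()].filter((r) => r.status === status).length;
  }
}

// Worker loop: send batch by batch and record the outcome of each batch.
async function drain(table, smsClient, batchSize) {
  let batch;
  while ((batch = table.claimBatch(batchSize)).length > 0) {
    try {
      await smsClient.sendBulk(batch.map((r) => ({ id: r.userId, phone: r.phone })));
      batch.forEach((r) => { r.status = 'sent'; });
    } catch (err) {
      batch.forEach((r) => { r.status = 'failed'; });
    }
  }
}

// Usage with a fake SMS client that only records what it was asked to send.
const table = new SmsRequestTable();
const sentBatches = [];
const fakeSmsClient = { sendBulk: async (users) => { sentBatches.push(users); } };

// User 1 appears twice (member of two groups) but is enqueued only once.
[{ id: 1, phone: '+123456789' }, { id: 2, phone: '+111111111' }, { id: 1, phone: '+123456789' }]
  .forEach((user) => table.enqueue(user, 'promo-1'));

const done = drain(table, fakeSmsClient, 1);
```

Because the requests are written once, up front, later changes to group membership no longer affect which users the worker processes.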
Related
I'm building a chat app and trying to work out the most efficient way to request multiple conversation threads (both private & group) from a MongoDB database.
My current idea is to loop through the user's contacts on the client side and send a 'getConversation' request to my REST API for each contact. This happens after the user profile data has first been retrieved on the server and sent to the client, in order to populate some of the chat interface as quickly as possible, although I'm not sure if this is optimal given the number of additional requests I'm making (easily 25 - 50 at a time).
I currently think that there are 3 methods I could use:
1.) Send a request to the server for the user data > loop through each contact (private & group) on the server > get each conversation from the DB > send the entire bundle back to the client and separate data into the relevant (Vue / Vuex) modules. Total Requests: 1 / Data Requested: Large
2.) * What I'm doing now: Send an initial request for the user data > receive it on the client > loop through the contacts on the client side > send a separate API request for each contact > populate the conversations as they arrive back on the client. Total Requests: > 20 / Data Requested: Small
3.) Send the initial request for user data > receive it > send a single request for all conversations. I expect this to take longer than option 2, but I could be wrong. Total Requests: 2 / Data Requested: Medium
My objective is to retrieve both user data & conversations as quickly + as efficiently as possible, so I welcome any suggestions or techniques you've used to achieve this kind of thing.
Cheers :)
Notes:
I'm using Vue / Vuex / MongoDB / Express / SocketIO.
TL;DR I'd stay with the second option.
Since you want your app to load as fast as possible and be responsive, you should avoid requesting big chunks of data which you might even end up not using in the app. I'd fetch the first (latest) 5-10 conversations since those would probably be the ones the user would like to read first. Then, if the user wants to read more conversations you haven't fetched from the server yet, you can fetch those (and maybe some conversations from around that time). As for your concern about sending lots of requests to the server: many small requests shouldn't be significantly slower than a single big request, and they make the app feel much faster and snappier.
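A minimal sketch of that "latest first, rest on demand" approach. `createConversationLoader`, the `lastMessageAt` field and the fake fetcher are assumptions made for illustration; in the real app the fetcher would wrap the `getConversation` REST call.

```javascript
// Hypothetical client-side loader: fetch the most recent conversations first,
// then fetch older ones only when the user asks for them.
function createConversationLoader(fetchConversation, contacts, pageSize) {
  // Contacts are assumed to carry a lastMessageAt timestamp; sort newest first.
  const ordered = [...contacts].sort((a, b) => b.lastMessageAt - a.lastMessageAt);
  let offset = 0;
  return {
    async loadNext() {
      const page = ordered.slice(offset, offset + pageSize);
      offset += page.length;
      // One small request per contact, issued in parallel.
      return Promise.all(page.map((c) => fetchConversation(c.id)));
    },
    hasMore: () => offset < ordered.length,
  };
}

// Usage with a fake fetcher standing in for the 'getConversation' request.
const contacts = [
  { id: 'ann', lastMessageAt: 30 },
  { id: 'bob', lastMessageAt: 10 },
  { id: 'cat', lastMessageAt: 20 },
];
const fakeFetch = async (id) => ({ contactId: id, messages: [] });
const loader = createConversationLoader(fakeFetch, contacts, 2);
const firstPage = loader.loadNext();
firstPage.then((convs) => console.log(convs.map((c) => c.contactId))); // [ 'ann', 'cat' ]
```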
For further discussion on this subject check out this question.
The database stores some data about the user which almost never changes, although it might change occasionally, for example if the user edits their name.
The data consists of each user's name, username and company data.
The first two are shown in the navigation bar all the time using ejs (for example "User_1 is logged in"); the company profile data is needed when the user creates an invoice.
My current approach is to fetch the user data in middleware using router.use, so the extracted information is always available in all routes/views, for example:
router.use(function (req, res, next) { // runs as middleware on every route
  req.getConnection(function (err, conn) {
    if (err) {
      console.log(err);
      return next("Mysql error, check your query");
    }
    var uid = req.user.id; // declared locally to avoid leaking a global
    conn.query('SELECT * FROM user_profile WHERE uid = ?', [uid], function (err, rows) {
      if (err) {
        console.log(err);
        return next(err); // next() takes a single error argument
      }
      res.locals.userData = rows; // expose the result to later routes and views
      return next();
    });
  });
});
I understand that this is not an optimal way of passing user profile data to every route/view since it makes new DB queries every time the user navigates through the application.
What would be a better way of having this data available without repeating the same query in each route, yet having it re-fetched once the user changes a portion of this data, like their full name?
You've just stumbled into the world of "caching", welcome! Caching is a very popular choice for use cases like this, as well as many others. A cache is essentially somewhere to store data that you can get back much quicker than making a full DB query, or a file read, etc.
Before we go any further, it's worth considering your use case. If you're serving only a few users and have a low load on your service, caching might be over-engineering and in fact making a DB request might be the simplest idea. Adding caching can add a lot of complexity to your code as things move forward, not enough to scare you, but enough to cause hard to trace bugs. So consider for a moment your service load, if it's not very high (say an internal application for somewhere you work with only maybe a few requests every few minutes) then just reading from the DB is probably not going to slow down a request too much. In this case, reading from the DB is the simplest and probably best solution. However, if you're noticing that this DB request is slowing down your application for requests or making it harder to scale up, then caching might be for you.
A really popular approach for this would be to get something like "redis" which is a key-value database that holds everything in memory (RAM). Redis can sit as a service like MySQL and has a very basic query language. It is blindingly fast and can scale to enormous loads. If you're using Express, there are a number of NPM modules that help you access a redis instance. Simply push in your credentials and you can then make GET and SET requests (to get data or to set data).
In your example, you may wish to store a user's profile in JSON format against their user id or username in redis. Then, create a function called getUserProfile which takes in the ID or username. It first looks the profile up in redis; if it finds the record, it returns it to your main controller logic. If it does not, it looks the profile up in your MySQL database, saves it in redis, and then returns it to the controller logic (so it can be served from the cache next time).
Your next problem is known for being a very pesky one in computer science: "cache invalidation". In this case, if a user's profile updates, you want to "invalidate" your cache. One way of doing this is to update the cached version when the user updates their profile (or any other saved data). Alternatively, you can just remove the cached version from redis; the next time it's requested through getUserProfile, it will be fetched fresh from the DB and then put into redis for next time.
There are many other ways to approach this, but this will most likely solve your problem in the simplest way without too much overhead. It will also be easy to expand in the future!
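The read-through and invalidation steps described above can be sketched like this. To keep the example runnable on its own, an in-memory `Map` stands in for redis and a fake object stands in for MySQL; `createProfileService`, `findProfile` and `updateProfile` are hypothetical names, not part of any library.

```javascript
// Cache-aside sketch. `cache` is an in-memory Map standing in for redis and
// `db` is a fake standing in for MySQL; both are assumptions for illustration.
function createProfileService(db, cache = new Map()) {
  return {
    async getUserProfile(uid) {
      if (cache.has(uid)) return JSON.parse(cache.get(uid)); // cache hit
      const profile = await db.findProfile(uid);             // cache miss: ask the DB
      if (profile) cache.set(uid, JSON.stringify(profile));  // store as JSON for next time
      return profile;
    },
    async updateUserProfile(uid, changes) {
      await db.updateProfile(uid, changes);
      cache.delete(uid); // invalidate; the next read repopulates the cache
    },
  };
}

// Usage with a fake DB that counts how often it is queried.
const rows = new Map([[7, { uid: 7, fullname: 'User_1' }]]);
let dbReads = 0;
const fakeDb = {
  findProfile: async (uid) => { dbReads += 1; return rows.get(uid) || null; },
  updateProfile: async (uid, changes) => { rows.set(uid, { ...rows.get(uid), ...changes }); },
};

const profiles = createProfileService(fakeDb);
const demo = (async () => {
  await profiles.getUserProfile(7);                              // miss, reads the DB
  await profiles.getUserProfile(7);                              // hit, no DB read
  await profiles.updateUserProfile(7, { fullname: 'User_One' }); // invalidates the entry
  const fresh = await profiles.getUserProfile(7);                // miss again, reads the DB
  return { fresh, dbReads };
})();
```

Deleting the entry on update is the simpler of the two invalidation options; writing the new value into the cache instead saves one DB read at the cost of keeping two write paths in sync.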
I'm using Firebase for an app and the built-in real-time capabilities seem well suited for instant messaging. I'm just having a hard time working out in my head how the database should be set up. Ideally, it's something like this:
messages: {
  <messageId>: {
    from: <userId>,
    to: <userId>,
    text: <String>,
    dateSent: <Date>,
    dateRead: <Date>
  }
}
And that's all fine for sending messages, but reading message threads becomes difficult. I need to query the (potentially huge) list of messages for messages that match the current thread's sender and receiver, and then order those by dateSent. If that is possible with Firebase's new querying API, then I have yet to figure out exactly how to do it.
Querying a huge list of messages is never a good idea. If you want a fast-performing Firebase/NoSQL application, you'll need to model the data to allow fast look up.
In a chat scenario that typically means that you'll model your chat rooms into the data structure. So instead of storing one long list of messages, store the messages for each chat "room" separately.
messages
  <roomId>
    <messageId1>: "..."
    <messageId2>: "..."
    <messageId3>: "..."
Now you can access the messages for the chat without a query, just ref.child(roomId).on(....
If you want a persistent mapping that ensures the same two users end up in the same room, have a look at Best way to manage Chat channels in Firebase
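One common way to get such a stable mapping is to derive the room id from the two user ids. The sketch below is an assumption on my part rather than the approach from the linked answer; `roomIdFor` and `messagesPath` are hypothetical helpers, and the scheme assumes user ids never contain the `_` separator.

```javascript
// Hypothetical helper: derive one stable room id for a pair of users by
// sorting their ids, so (alice, bob) and (bob, alice) map to the same room.
function roomIdFor(userIdA, userIdB) {
  return [userIdA, userIdB].sort().join('_');
}

// Path under which a room's messages would live in the structure above.
function messagesPath(roomId) {
  return `messages/${roomId}`;
}

console.log(roomIdFor('bob', 'alice'));               // alice_bob
console.log(messagesPath(roomIdFor('alice', 'bob'))); // messages/alice_bob
```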
Is there any way to drop a Mongo Database Collection from within the server side JavaScript code with Meteor? (really drop the whole thing, not just its contents with Meteor.Collection.remove({});)
In addition, is there also a way to drop a Meteor.Collection from within the server side JavaScript code without dropping the corresponding database collection?
Why do that?
Searching in the subdocuments (subdocuments of the user-document, e.g. userdoc.mailbox[12345]) with underscore or similar turns out to be quite slow (e.g. for large mailboxes).
On the other hand, putting all messages (in context of the mailbox-example) of all users in one big DB and then searching* all messages for one or more particular messages turns out to be very, very slow (for many users with large mailboxes), too.
There is also the size limit for Mongo documents, so if I store all messages of a user in his/her user-document, the mailbox's maximum size is < 16 MB together with all other user-data.
So I want to have a database for each of my users to use as a mailbox; then the maximum size for one message is 16 MB (very acceptable) and I can search a mailbox using mongo queries.
Furthermore, since I'm using Meteor, it would be nice to have this mongo db collection loaded as a Meteor.Collection whenever a user logs in. When a user deactivates his/her account, the db should of course be dropped; if the user just logs out, only the Meteor.Collection should be dropped (and restored when he/she logs in again).
To some extent, I got this working already: each user has their own db for the mailbox, but if anybody cancels his/her account, I have to delete this particular Mongo Collection manually. Also, I have to keep all mongo db collections alive as Meteor.Collections at all times because I cannot drop them.
This is a well working server-side code snippet for one-collection-per-user mailboxes:
var mailboxes = {};
Meteor.users.find({}, {fields: {_id: 1}}).forEach(function(user) {
  mailboxes[user._id] = new Meteor.Collection("Mailbox_" + user._id);
});
Meteor.publish("myMailbox", function(_query, _options) {
  if (this.userId) {
    return mailboxes[this.userId].find(_query, _options);
  }
});
while a client just subscribes with a certain query with this piece of client-code:
myMailbox = new Meteor.Collection("Mailbox_" + Meteor.userId());
Deps.autorun(function() {
  var filter = Session.get("mailboxFilter");
  if (_.isObject(filter) && filter.query && filter.options)
    Meteor.subscribe("myMailbox", filter.query, filter.options);
});
So if a client manipulates the session variable "mailboxFilter", the subscription is updated and the user gets a new bunch of messages in the minimongo.
It works very nicely; the only thing missing is dropping the db collection.
Thanks for any hint already!
*I previously wrote "dropping" here, which was a total mistake. I meant searching.
A solution that doesn't use a private method is:
myMailbox.rawCollection().drop();
This is better in my opinion because Meteor could randomly drop or rename the private method without any warning.
You can completely drop the collection myMailbox with myMailbox._dropCollection(), directly from meteor.
I know the question is old, but it was the first hit when I searched for how to do this
Searching in the subdocuments...
Why use subdocuments? A document per user I suppose?
each message must be its own document
That's a better way, a collection of messages, each is id'ed to the user. That way, you can filter what a user sees when doing publish subscribe.
dropping all messages in one db turns out to be very slow for many users with large mailboxes
That's because most NoSQL DBs (if not all) are geared towards read-intensive operations and not much with write-intensive. So writing (updating, inserting, removing, wiping) will take more time.
Also, some online services (I think it was Twitter or Yahoo) will tell you when deactivating the account: "Your data will be deleted within the next N days." or something that resembles that. One reason is that your data takes time to delete.
The user is leaving anyway, so you can just tell them that their account has been deactivated and that their data will be deleted from your databases in the following days. In addition, so that you can respond to the user immediately, do the remove operation asynchronously by passing it a blank callback.
How should I design an on-login middleware that checks whether the recurring subscription has failed? I know that Stripe fires events when things happen, and that the best practice is webhooks. The problem is, I can't use webhooks in the current implementation, so I have to check when the user logs in.
The Right Answer:
As you're already aware, webhooks.
I'm not sure what you're doing that webhooks aren't an option in the current implementation: they're just a POST to a publicly-available URL, the same as any end-user request. If you can implement anything else in Node, you can implement webhook support.
Implementing webhooks is not an all-or-nothing proposition; if you only want to track delinquent payments, you only have to implement processing for one webhook event.
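As a rough illustration of handling only one webhook event, here is a sketch of a handler for an already-parsed event object. It assumes the event type `invoice.payment_failed` is the one you care about and that the invoice carries the customer id in its `customer` field; `handleStripeEvent` and the `Set` used as storage are hypothetical, and signature verification and the HTTP route itself are left out.

```javascript
// Pure handler for an already-parsed Stripe event object. Only one event
// type is processed; everything else is ignored.
// `delinquentCustomers` is a hypothetical Set standing in for your user store.
function handleStripeEvent(event, delinquentCustomers) {
  if (event.type !== 'invoice.payment_failed') return false;
  // For invoice events, data.object is the invoice; `customer` is its customer id.
  delinquentCustomers.add(event.data.object.customer);
  return true;
}

// Usage with hand-written events shaped like webhook payloads.
const delinquentCustomers = new Set();
const handled = handleStripeEvent(
  { type: 'invoice.payment_failed', data: { object: { customer: 'cus_123' } } },
  delinquentCustomers
);
const ignored = handleStripeEvent(
  { type: 'customer.created', data: { object: { id: 'cus_456' } } },
  delinquentCustomers
);
console.log(handled, ignored, [...delinquentCustomers]); // true false [ 'cus_123' ]
```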
The This Has To Work Right Now, Customer Experience Be Damned Answer:
A retrieved Stripe Customer object contains a delinquent field. This field will be set to true if the latest invoice charge has failed.
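A sketch of what such a login-time check could look like as Express-style middleware. The Stripe client is injected so the example runs with a fake; with the real library it would be the client object created from your secret key. `requireGoodStanding`, `req.user.stripeCustomerId` and `req.subscriptionDelinquent` are hypothetical names.

```javascript
// Sketch of an Express-style login check. `stripeClient` is injected so the
// example can run against a fake. `req.user.stripeCustomerId` is a
// hypothetical field holding the customer's Stripe id.
function requireGoodStanding(stripeClient) {
  return async function (req, res, next) {
    try {
      const customer = await stripeClient.customers.retrieve(req.user.stripeCustomerId);
      req.subscriptionDelinquent = customer.delinquent === true;
      next();
    } catch (err) {
      next(err); // let the error handler deal with API failures
    }
  };
}

// Usage with a fake client; no network call is made here.
const fakeStripe = {
  customers: { retrieve: async (id) => ({ id, delinquent: id === 'cus_late' }) },
};
const middleware = requireGoodStanding(fakeStripe);
const req = { user: { stripeCustomerId: 'cus_late' } };
const checked = new Promise((resolve, reject) => {
  middleware(req, {}, (err) => (err ? reject(err) : resolve(req.subscriptionDelinquent)));
});
checked.then((delinquent) => console.log(delinquent)); // true
```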
N.B. This call may take several seconds—sometimes into the double digits—to complete, during which time your site will appear to have ceased functioning to your users. If you have a large userbase or short login sessions, you may also exceed your Stripe API rate limit.
I actually wrote the Stripe support team an email complaining about this issue (the need to loop through every invoice or customer if you're trying to pull out delinquent entries) and it appears that you can actually do this without webhooks or wasteful loops... it's just that the filtering functionality is undocumented. The current documentation shows that you can only modify queries of customers or invoices by count, created (date), and offset... but if you pass in other parameters the Stripe API will actually try to understand the query, so the cURL request:
https://api.stripe.com/v1/invoices?closed=false&count=100&offset=0
will look for only open invoices. You can also pass in a delinquent=true parameter when looking for delinquent customers. I've only tested this in PHP, where returning delinquent customers looks like this:
Stripe_Customer::all(array(
"delinquent" => true
));
But I believe this should work in Node.js:
stripe.customers.list(
  {delinquent: true},
  function(err, customers) {
    // asynchronously called
  }
);
The big caveat here is that because this filtering is undocumented it could be changed without notice... but given how obvious the approach is, I'd guess that it's pretty safe.