Dropping a Mongo Database Collection in Meteor - javascript

Is there any way to drop a Mongo database collection from within the server-side JavaScript code with Meteor? (really drop the whole thing, not just its contents via Meteor.Collection.remove({}))
In addition, is there also a way to drop a Meteor.Collection from within the server side JavaScript code without dropping the corresponding database collection?
Why do that?
Searching in the subdocuments (subdocuments of the user document, e.g. userdoc.mailbox[12345]) with underscore or similar turns out to be quite slow (e.g. for large mailboxes).
On the other hand, putting all messages (in context of the mailbox-example) of all users in one big DB and then searching* all messages for one or more particular messages turns out to be very, very slow (for many users with large mailboxes), too.
There is also the size limit for Mongo documents, so if I store all messages of a user in his/her user-document, the mailbox's maximum size is < 16 MB together with all other user-data.
So I want to have a database for each of my users to use as a mailbox; then the maximum size for one message is 16 MB (very acceptable) and I can search a mailbox using Mongo queries.
Furthermore, since I'm using Meteor, it would be nice to then have this Mongo DB collection loaded as a Meteor.Collection whenever a user logs in. When a user deactivates his/her account, the DB should of course be dropped; if the user just logs out, only the Meteor.Collection should be dropped (and restored when he/she logs in again).
To some extent, I got this working already: each user has their own DB for the mailbox, but if anybody cancels his/her account, I have to delete that particular Mongo collection manually. Also, I have to keep all Mongo DB collections alive as Meteor.Collections at all times because I cannot drop them.
This is a well-working server-side code snippet for one-collection-per-user mailboxes:
var mailboxes = {};

Meteor.users.find({}, {fields: {_id: 1}}).forEach(function (user) {
  mailboxes[user._id] = new Meteor.Collection("Mailbox_" + user._id);
});

Meteor.publish("myMailbox", function (_query, _options) {
  if (this.userId) {
    return mailboxes[this.userId].find(_query, _options);
  }
});
while a client just subscribes with a certain query using this piece of client code:
myMailbox = new Meteor.Collection("Mailbox_" + Meteor.userId());

Deps.autorun(function () {
  var filter = Session.get("mailboxFilter");
  if (_.isObject(filter) && filter.query && filter.options)
    Meteor.subscribe("myMailbox", filter.query, filter.options);
});
So if a client manipulates the session variable "mailboxFilter", the subscription is updated and the user gets a new bunch of messages in the minimongo.
It works very nicely; the only thing missing is dropping the DB collection.
Thanks for any hint already!
*I previously wrote "dropping" here, which was a total mistake. I meant searching.

A solution that doesn't use a private method is:
myMailbox.rawCollection().drop();
This is better in my opinion because Meteor could drop or rename the private method at any time without warning.
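For illustration, here is a minimal sketch of how that call could be wrapped in a server-side Meteor method. The method name and the ownership check are assumptions; the mailboxes map is the one from the question, and on current Meteor versions rawCollection().drop() returns a Promise.

Meteor.methods({
  dropMyMailbox: function () {
    if (!this.userId) {
      throw new Meteor.Error("not-authorized");
    }
    var box = mailboxes[this.userId];
    if (box) {
      // rawCollection() exposes the underlying Node MongoDB driver collection;
      // drop() removes the collection itself, not just its documents.
      return box.rawCollection().drop();
    }
  }
});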

You can completely drop the collection myMailbox with myMailbox._dropCollection(), directly from Meteor.
I know the question is old, but it was the first hit when I searched for how to do this.

Searching in the subdocuments...
Why use subdocuments? A document per user I suppose?
each message must be its own document
That's a better way: a collection of messages, each id'ed to the user. That way, you can filter what a user sees when doing publish/subscribe.
dropping all messages in one db turns out to be very slow for many users with large mailboxes
That's because most NoSQL DBs (if not all) are geared towards read-intensive operations rather than write-intensive ones. So writing (updating, inserting, removing, wiping) will take more time.
Also, some online services (I think it was Twitter or Yahoo) will tell you when deactivating the account: "Your data will be deleted within the next N days." or something like that. One reason is that your data takes time to delete.
The user is leaving anyway, so you can just tell them that their account has been deactivated and their data will be deleted from your databases in the following days. To add to that, so you can respond to the user immediately, do the remove operation asynchronously by passing it a blank callback.
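As a rough sketch of that idea (the Messages collection, the owner field and the method name are assumptions, not code from the question): on the server, passing a callback to remove() makes it run in the background so the response isn't held up.

Meteor.methods({
  deactivateAccount: function () {
    if (!this.userId) {
      throw new Meteor.Error("not-authorized");
    }
    // Fire-and-forget: the callback makes remove() asynchronous,
    // so the method returns before the wipe has finished.
    Messages.remove({ owner: this.userId }, function (err) {
      if (err) {
        console.log("Background mailbox cleanup failed:", err);
      }
    });
    return "deactivated";
  }
});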

Related

How to programmatically send SMS notifications to 1 million users using queues?

What we have?
An API built in Node.js (using Moleculer.js for microservices and PostgreSQL for storing data) which keeps track of users and user groups. We have on average 3k users per group, and a user can be part of multiple groups.
What we want to achieve?
We want to create a special service which will send text messages. The admins will select multiple groups, the code will remove the duplicate users, and then send each user an SMS.
After a selection we can have around 1 million users. How can we send them text messages in an efficient way?
What have we tried?
Paginate the users and for each page send a request to the SMS service.
const users = db.getPage(1); // [{ id: 1, phone: '+123456789' }, ...]
smsClient.sendBulk(users);
PROBLEM: The user list in the database can change during the process, which can affect the pagination, giving us duplicates or skipping some users.
Load all the results in the memory and send all the users to the SMS service.
const users = db.getAll(); // [..., { id: 988123, phone: '+987654321' }]
smsClient.sendBulk(users);
PROBLEM: We think it's a bad idea, resource-wise, to make this kind of query to the database and keep the results in memory. At the same time, we don't want to send 1 million entities through an HTTP request to the SMS service.
How can we select 1 million users and send them an SMS message without worrying about duplicates, skipped data or any other alteration to the admin's selection? We were thinking about queues as a necessary step, but only after we find a solution for the cases mentioned above. Or is the queue part of the solution?
How can we select 1 million users and send them an SMS message without worrying about duplicates, skipped data, or any other alteration to the admin's selection?
For managing duplicates you could use an additional DB to keep a hash table of the users that have already been handled. This is a bit more expensive because you will need to check each user before each SMS send.
Managing not skipping anyone is a bit trickier because you will need to add more recipients to an ongoing SMS transaction. You will need the ability to detect (hook) when a user is added to a group and add them as a recipient to the ongoing transactions accordingly.
You will need a fast DB and to save those users in a hash set, for fast set and get operations (O(1)).
We were thinking about queues as a necessary step, but only after we find a solution for the cases mentioned above. Or is the queue part of the solution?
Definitely. A queue is the correct way to go for this scenario (queueing many small tasks). Some queues come with a re-queue feature that will re-queue any task that didn't get an acknowledgment.
You should check out RabbitMQ and message-driven microservices.
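To make that concrete, here is a rough sketch of a queue consumer combined with a Redis set for de-duplication. It assumes the amqplib and ioredis packages and a hypothetical smsClient, so treat it as an outline rather than a drop-in implementation.

const amqp = require('amqplib');
const Redis = require('ioredis');

const redis = new Redis();
const QUEUE = 'sms-campaign'; // hypothetical queue name

async function startWorker(smsClient) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });

  ch.consume(QUEUE, async (msg) => {
    const user = JSON.parse(msg.content.toString()); // { id, phone }
    // SADD returns 1 only the first time a member is added -> O(1) duplicate check.
    const firstTime = await redis.sadd(QUEUE + ':handled', user.id);
    if (firstTime === 1) {
      await smsClient.send(user.phone, 'Hello!');
    }
    // Unacknowledged messages are re-queued if the worker dies mid-task.
    ch.ack(msg);
  });
}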
Have you considered creating an indirect state between the user and sent SMS? Something like SmsRequest / SmsTask / however you'd call it.
It'd consist of necessary user-data, message content, status of the request (to-send, sending, sent, failed, ...) and some additional metadata depending on your needs.
Then the first step you'd do is to prepare these requests and store them in the db, effectively making a queue out of a table. You can add some constraints on user and message type that'd prevent any duplicates, and then start a second asynchronous process that simply fetches requests in the to-send state, sets the state to sending, and then saves the outcome.
This also gives you the benefit of an audit trail, plus you can batch the outgoing messages.
Of course it'd increase your data volume significantly but I guess it's cheap nowadays anyway.
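A rough sketch of that table-as-a-queue flow follows; the sms_requests table, its columns and the db/smsClient helpers are all assumptions made for illustration.

// Preparation step (run once per campaign): one row per (user, campaign).
// A unique constraint such as UNIQUE (user_id, campaign_id) prevents duplicates
// even if the admin's group selection overlaps.

// Worker step: claim a batch of 'to-send' rows, send them, record the outcome.
async function processBatch(db, smsClient, campaignId) {
  const batch = await db.query(
    `UPDATE sms_requests
        SET status = 'sending'
      WHERE id IN (SELECT id FROM sms_requests
                    WHERE campaign_id = $1 AND status = 'to-send'
                    LIMIT 500)
      RETURNING id, phone, body`,
    [campaignId]
  );
  for (const row of batch.rows) {
    try {
      await smsClient.send(row.phone, row.body);
      await db.query(`UPDATE sms_requests SET status = 'sent' WHERE id = $1`, [row.id]);
    } catch (err) {
      await db.query(`UPDATE sms_requests SET status = 'failed' WHERE id = $1`, [row.id]);
    }
  }
}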

Right way of passing consistent data from DB to user without repeatedly querying

The database stores some data about the user which almost never changes. Well, sometimes information might change, if the user wants to edit his name for example.
The data in question is each user's name, username and his company data.
The first two are shown in his navigation bar all the time using EJS (like "User_1 is logged in"); his company profile data is used when he needs to create an invoice.
My current way is to fetch the user data through middleware using router.use, so the extracted information is always available in all routes/views, for example:
router.use(function (req, res, next) { // this block of code is called as middleware in every route
  req.getConnection(function (err, conn) {
    if (err) {
      console.log(err);
      return next("Mysql error, check your query");
    }
    var uid = req.user.id;
    conn.query('SELECT * FROM user_profile WHERE uid = ?', uid, function (err, rows) {
      if (err) {
        console.log(err);
        return next(err);
      }
      res.locals.userData = rows; // available to every subsequent route/view
      return next();
    });
  });
});
I understand that this is not an optimal way of passing user profile data to every route/view, since it makes a new DB query every time the user navigates through the application.
What would be a better way of having this data available without repeating the same query in each route, yet having it re-fetched once the user changes a portion of this data, like his full name?
You've just stumbled into the world of "caching", welcome! Caching is a very popular choice for use cases like this, as well as many others. A cache is essentially somewhere to store data that you can get back much quicker than making a full DB query, or a file read, etc.
Before we go any further, it's worth considering your use case. If you're serving only a few users and have a low load on your service, caching might be over-engineering and in fact making a DB request might be the simplest idea. Adding caching can add a lot of complexity to your code as things move forward, not enough to scare you, but enough to cause hard to trace bugs. So consider for a moment your service load, if it's not very high (say an internal application for somewhere you work with only maybe a few requests every few minutes) then just reading from the DB is probably not going to slow down a request too much. In this case, reading from the DB is the simplest and probably best solution. However, if you're noticing that this DB request is slowing down your application for requests or making it harder to scale up, then caching might be for you.
A really popular approach for this would be to get something like "redis" which is a key-value database that holds everything in memory (RAM). Redis can sit as a service like MySQL and has a very basic query language. It is blindingly fast and can scale to enormous loads. If you're using Express, there are a number of NPM modules that help you access a redis instance. Simply push in your credentials and you can then make GET and SET requests (to get data or to set data).
In your example, you may wish to store a user's profile in JSON format against their user ID or username in redis. Then, create a function called getUserProfile which takes in the ID or username. This can then look it up in redis; if it finds the record, it can return it to your main controller logic. If it does not, it can look it up in your MySQL database, save it in redis, and then return it to the controller logic (so it'll be able to get it from cache next time).
Your next problem is known for being a very pesky problem in computer science: "cache invalidation". In this case, if your user profile updates, you want to "invalidate" your cache. A way of doing this would be to update your cached version when the user updates their profile (or any other data saved). Alternatively, you could just remove the cached version from redis; the next time it's requested from getUserProfile, it will be fetched fresh from the DB and then put into redis for next time.
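Here is a rough sketch of that getUserProfile flow plus the invalidation step, assuming the ioredis and mysql2 npm packages; the key names and table schema are made up for illustration.

const Redis = require('ioredis');
const mysql = require('mysql2/promise');

const redis = new Redis();
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'app' });

async function getUserProfile(uid) {
  const cached = await redis.get('user_profile:' + uid);
  if (cached) return JSON.parse(cached);            // cache hit

  const [rows] = await pool.query('SELECT * FROM user_profile WHERE uid = ?', [uid]);
  const profile = rows[0];
  // Cache for next time; the expiry keeps stale entries from living forever.
  await redis.set('user_profile:' + uid, JSON.stringify(profile), 'EX', 3600);
  return profile;
}

async function updateUserProfile(uid, changes) {
  await pool.query('UPDATE user_profile SET ? WHERE uid = ?', [changes, uid]);
  // Invalidate: drop the cached copy so the next read fetches fresh data.
  await redis.del('user_profile:' + uid);
}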
There are many other ways to approach this, but this will most likely solve your problem in the simplest way without too much overhead. It will also be easy to expand in the future!

Couchdb / Pouchdb Relation between multiple users and multiple documents

I have a problematic here:
I'm building a mobile app with the Ionic framework that needs to be able to work offline.
I'd like to adopt the CouchDB / PouchDB solution. But to do that I need to know how to put my data in the NoSQL database (I was a MySQL user before...). So NoSQL is new to me, but it seems interesting.
So my app has a login part, hence a user database. And each user has documents that are attached to them. But many users can have many documents (shared documents). And I want to replicate the data of one user (so their information plus their documents) to the mobile app.
What I thought is this:
One database per user, plus one database for all documents with server-side filtering to send only the documents that belong to the user.
And on the client side I'd juste have to call :
var localDB = new PouchDB("myuser");
var remoteDB = new PouchDB("http://128.199.48.178:5984/myuser");
localDB.sync(remoteDB, {
  live: true
});
And like that, on the client side I'd have something like this:
{
  "username": "myuser",
  "birthday": "Date",
  "documents": [
    {
      "_id": "2",
      "szObject": "My Document"
    },
    {
      "_id": "85",
      "szObject": "My Document"
    }
  ]
}
Do you think something like that is possible using Couchdb and pouchdb, and if yes, am I thinking about it the right way?
I read it's not a problem to have one database per user, but I don't know if the replication will work the way I imagine it.
Plain CouchDB doesn't have any per-document access options, but these could be your solutions:
A. Create a view, then sync Pouch-to-Couch with a filter. But although this will only sync the documents that the user is supposed to see, anyone with enough knowledge could alter the code and view someone else's documents, or just do anything with the database, really (probably not what you're looking for).
B. Create a master DB with all documents, then a database for each user, and a filtered replication between the master and the per-user DBs (see the sketch after this list). Probably the simplest and most proper way to handle this.
C. Unfortunately there isn't a validate_doc_read (as there is a validate_doc_update), but perhaps you could make a simple HTTP proxy which would parse incoming JSON, check whether a particular user can view it and, if not, throw a 403 Forbidden. You'd also have to catch any views queried with include_docs=true.
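For option B, a rough sketch of the filtered replication setup could look like the following; it assumes the nano CouchDB client, a master database named "master", and an owners array on each document (all names made up for illustration).

const nano = require('nano')('http://admin:password@127.0.0.1:5984');

// Filter function stored in a design document on the master database.
const designDoc = {
  _id: '_design/app',
  filters: {
    by_owner: function (doc, req) {
      return doc.owners && doc.owners.indexOf(req.query.user) !== -1;
    }.toString()
  }
};

async function setupUser(username) {
  await nano.use('master').insert(designDoc).catch(function () { /* design doc already exists */ });

  const userDb = 'userdb-' + username;
  await nano.db.create(userDb);

  // Continuous filtered replication: master -> per-user db, only that user's docs.
  await nano.db.replicate('master', userDb, {
    continuous: true,
    filter: 'app/by_owner',
    query_params: { user: username }
  });
}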
(late reply, I hope it's still useful - or if not, that you found a good solution for your problem)

Prevent race condition when concurrent API calls that write to database occurs (Or when the server is slow)

Let's imagine a scenario where you have an endpoint used to create a user. This would be within a RESTful application, so let's imagine that a rich client calls this API endpoint.
exports.createUser = function (req, res) {
  if (req.body) {
    // Check if email has already been used
    db.User.find({ where: { email: req.body.email } }).success(function (user) {
      if (user === null || user === undefined) {
        // Create user
        res.send(201);
      } else {
        res.json(409, { error: 'User already exists' });
      }
    });
  } else {
    res.send(400);
  }
};
If I were to call this endpoint multiple times really fast, it would be possible to create multiple records with the same email in the database, even though I queried the user table to make sure there would be no duplicate.
I'm sure this is a common problem, but how would one go about preventing this issue? I thought about limiting the number of requests to certain endpoints, but that doesn't seem like a very good solution.
Any ideas?
Thank you very much!
The simplest option is to LOCK TABLE "users" IN EXCLUSIVE MODE at the beginning of the transaction that does the find and then the insert. This ensures that only one transaction can be writing to the table at a time.
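A rough sketch of that approach with the pg package (the table and column names follow the question; everything else is an assumption):

const { Pool } = require('pg');
const pool = new Pool();

async function createUser(email) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // Serialize all writers on the table so find-then-insert cannot race.
    await client.query('LOCK TABLE users IN EXCLUSIVE MODE');
    const existing = await client.query('SELECT 1 FROM users WHERE email = $1', [email]);
    if (existing.rowCount > 0) {
      await client.query('ROLLBACK');
      return { status: 409, error: 'User already exists' };
    }
    await client.query('INSERT INTO users (email) VALUES ($1)', [email]);
    await client.query('COMMIT');
    return { status: 201 };
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}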
For better concurrency, you can:
Define a UNIQUE constraint on email, then skip the find step. Attempt the insert and, if it fails, trap the error and report a duplicate (see the sketch after this list); or
Use one of the insert-if-not-exists techniques known to be concurrency-safe
If using a unique constraint, one thing to consider is that your app might mark users as disabled without deleting them, and probably doesn't want to force email addresses to be unique for disabled users. If so, you might want a partial unique index instead (see the docs).
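And a rough sketch of the unique-constraint variant, again assuming the pg package and a UNIQUE (or partial unique) index on users.email; error code 23505 is PostgreSQL's unique_violation.

const { Pool } = require('pg');
const pool = new Pool();

exports.createUser = async function (req, res) {
  if (!req.body || !req.body.email) {
    return res.send(400);
  }
  try {
    // No find step: just insert and let the constraint reject duplicates.
    await pool.query('INSERT INTO users (email) VALUES ($1)', [req.body.email]);
    res.send(201);
  } catch (err) {
    if (err.code === '23505') { // unique_violation
      return res.status(409).json({ error: 'User already exists' });
    }
    res.send(500);
  }
};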

Adding a user to PFRelation using Parse Cloud Code

I am using Parse.com with my iPhone app.
I ran into a problem earlier where I was trying to add the currently logged in user to another user's PFRelation key/column called "friendsRelation" which is basically the friends list.
The only problem, is that you are not allowed to save changes to any other users besides the one that is currently logged in.
I then learned, that there is a workaround you can use, using the "master key" with Parse Cloud Code.
I ended up adding the code here to my Parse Cloud Code: https://stackoverflow.com/a/18651564/3344977
This works great and I can successfully test this and add an NSString to a string column/key in the Parse database.
However, I do not know how to modify the Parse Cloud Code to let me add a user to another user's PFRelation column/key.
I have been trying everything for the past 2 hours with the Parse Cloud Code I linked to and could not get anything to work. Then I realized that my problem is with the actual Cloud Code, not with how I'm trying to use it in Xcode, because, like I said, I can get it to successfully add an NSString object for testing purposes.
My problem is that I do not know javascript and don't understand the syntax, so I don't know how to change the Cloud Code which is written in javascript.
I need to edit the Parse Cloud Code that I linked to above, which I will also paste below at the end of this question, so that I can add the currently logged in PFUser object to another user's PFRelation key/column.
The code that I would use to do this in objective-c would be:
[friendsRelation addObject:user];
So I am pretty sure it is the same as just adding an object to an array, but like I said, I don't know how to modify the Parse Cloud Code because it's in JavaScript.
Here is the Parse Cloud Code:
Parse.Cloud.define('editUser', function (request, response) {
  var userId = request.params.userId,
      newColText = request.params.newColText;

  var User = Parse.Object.extend('_User'),
      user = new User({ objectId: userId });

  user.set('new_col', newColText);

  Parse.Cloud.useMasterKey();
  user.save().then(function (user) {
    response.success(user);
  }, function (error) {
    response.error(error);
  });
});
And then here is how I would use it in Xcode using Objective-C:
[PFCloud callFunction:@"editUser" withParameters:@{
  @"userId": @"someuseridhere",
  @"newColText": @"new text!"
}];
Now it just needs to be modified for adding the current PFUser to another user's PFRelation column/key, which I am pretty sure is technically just adding an object to an array.
This should be fairly simple for someone familiar with javascript, so I really appreciate the help.
Thank you.
I would recommend that you rethink your data model and extract the followings out of the user table. When you plan a data model, especially for a NoSQL database, you should think about your queries first and plan your structure around that. This is especially true for mobile applications, as server connections are costly and often introduce latency issues if your app performs lots of connections.
Storing followings in the user class makes it easy to find who a person is following. But how would you solve the task of finding all users who follow YOU? You would have to check every user to see whether you are in their followings relation. That would not be an efficient query, and it does not scale well.
When planning a social application, you should build for scalabilty. I don't know what kind of social app you are building, but imagine if the app went ballistic and became a rapidly growing success. If you didn't build for scalability, it would quickly fall apart, and you stood the chance of losing everything because the app suddenly became sluggish and therefore unusable (people have almost zero tolerance for waiting on mobile apps).
Forget all previous priorities about consistency and normalization, and design for scalability.
For storing followings and followers, use a separate "table" (Parse class) for each of those two. For each user, store an array of all usernames (or their objectIds) they follow. Do the same for followers. This means that when YOU choose to follow someone, TWO tables need to be updated: you add the other user's username to the array of who you follow (in the followings table), and you also add YOUR username to the array in the other user's row of the followers table.
Using this method, getting a list of followers and followings is extremely fast.
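To illustrate, here is a rough Cloud Code sketch of that two-table update, written against the same (older) Parse SDK style as the question; the Followings/Followers class names and the usernames array column are assumptions, and the sketch assumes both rows already exist.

Parse.Cloud.define('follow', function (request, response) {
  Parse.Cloud.useMasterKey();
  var me = request.user.getUsername();
  var other = request.params.username;

  var Followings = Parse.Object.extend('Followings');
  var Followers = Parse.Object.extend('Followers');

  var myFollowings = new Parse.Query(Followings).equalTo('user', me).first();
  var theirFollowers = new Parse.Query(Followers).equalTo('user', other).first();

  Parse.Promise.when(myFollowings, theirFollowers).then(function (mine, theirs) {
    mine.addUnique('usernames', other);   // who I follow
    theirs.addUnique('usernames', me);    // who follows the other user
    return Parse.Promise.when(mine.save(), theirs.save());
  }).then(function () {
    response.success('followed');
  }, function (error) {
    response.error(error);
  });
});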
Have a look at this example implementation of Twitter for the Cassandra NoSQL database:
https://github.com/twissandra/twissandra
