Understanding Many-to-Many relationships in MongoDB and how to dereference collections - javascript

I've spent some time researching MongoDB alternatives for implementing many-to-many relationships, including several Stack Overflow articles (here and here) and these slides.
I am creating an app using the MEAN stack and I'm trying to get confirmation on my schema setup and best practices in dereferencing a collection of objects.
I have a basic many-to-many relationship between users and meetings (think scheduling meetings for users where a user can be in many meetings and a meeting contains several users).
Given my use case I think it's best that I use referencing rather than embedding. I believe (from what I've read) that it would be better to use embedding only if my meetings had users unique to a single meeting. In my case these same users are shared across meetings. Also, although updating users would be infrequent (e.g., change username, password) I still feel that using a reference feels right - although I'm open to opinions.
Assuming I went with references I have the following (simplified) schema:
var MeetingSchema = new Schema({
  description: {
    type: String,
    default: '',
    required: 'Please fill in a description for the meeting',
    trim: true
  },
  location: {
    type: String,
    default: '',
    required: 'Please fill in a location for the meeting',
    trim: true
  },
  users: [ {
    type: Schema.ObjectId,
    ref: 'User'
  } ]
});
var UserSchema = new Schema({
  firstName: {
    type: String,
    trim: true,
    default: '',
    validate: [validateLocalStrategyProperty, 'Please fill in your first name']
  },
  lastName: {
    type: String,
    trim: true,
    default: '',
    validate: [validateLocalStrategyProperty, 'Please fill in your last name']
  },
  email: {
    type: String,
    trim: true,
    default: '',
    validate: [validateLocalStrategyProperty, 'Please fill in your email'],
    match: [/.+\@.+\..+/, 'Please fill a valid email address']
  },
  username: {
    type: String,
    unique: true,
    required: 'Please fill in a username',
    trim: true
  },
  password: {
    type: String,
    default: '',
    validate: [validateLocalStrategyPassword, 'Password should be longer']
  }
});
First, you will notice that I don't have a collection of meetings in users. I decided not to add this collection because I believe I could use the power of a MongoDB find to obtain all meetings associated with a specific user - i.e.,
db.meetings.find({users:ObjectId('x123')});
Of course I would need to add some indexes.
Now if I'm looking to dereference my users for a specific meeting, how do I do that? For those who understand Rails and know the difference between :include and :join, I'm looking for a similar concept. I understand we are not dealing with joins in MongoDB, but to dereference the users collection from the meeting and get each user's first and last name, I would need to cycle through the collection of ids and perform some sort of db.users.find() for each id. I assume there's some easy MongoDB call I can make to do this in a performant way.

For a discussion of schema design in MongoDB, covering exactly this topic, I refer you to these postings on the MongoDB blog:
Part 1
Part 2
Part 3
In particular, look at the sample JavaScript code showing you how to do the application-level joins.
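As a rough illustration of an application-level join for the meeting/user schema in the question (a sketch, not the code from the blog posts): load the meeting, then load all of its users with a single `$in` query instead of one `find()` per id. The helper names `buildUsersFilter` and `attachUsers` are hypothetical, and plain string ids stand in for ObjectIds so the snippet runs without a database.

```javascript
// Build the filter for ONE query that loads every user of a meeting.
// In the shell this would be used as: db.users.find(buildUsersFilter(meeting))
function buildUsersFilter(meeting) {
  return { _id: { $in: meeting.users } };
}

// Replace the array of ids on the meeting with the matching user documents,
// keeping the order stored on the meeting. Ids with no matching user are dropped.
function attachUsers(meeting, userDocs) {
  const byId = new Map(userDocs.map((u) => [String(u._id), u]));
  const users = meeting.users
    .map((id) => byId.get(String(id)))
    .filter((u) => u !== undefined);
  return { ...meeting, users };
}

// Stand-in data; in the app these would be the results of the two queries.
const meeting = { description: 'Sprint review', location: 'Room 2', users: ['u2', 'u1'] };
const userDocs = [
  { _id: 'u1', firstName: 'Ada', lastName: 'Lovelace' },
  { _id: 'u2', firstName: 'Alan', lastName: 'Turing' },
];

const joined = attachUsers(meeting, userDocs);
console.log(joined.users.map((u) => `${u.firstName} ${u.lastName}`));
```

Since the question uses Mongoose, `populate('users')` on a meeting query performs a comparable second query on your behalf, using the `ref: 'User'` declared in MeetingSchema.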

Related

Biased Random w/ MongoDB and Javascript

I am building a system where users submit images, which are stored in a database by way of a Schema. Users can then run a command (through Discord/discord.js) to pull a random, vote-weighted image (and therefore document) from MongoDB and have it sent to them, with options to vote on the image or report it.
That is the idea. Here is where I am so far:
Users can submit images using a command through Discord, and it works.
The MongoDB document is made with these values:
const imageSchema = new Schema({
  imageId: { type: Number, required: true, index: { unique: true } },
  imageLocation: reqString,
  votes: { type: Number, required: false, default: 0 },
  verified: { type: Number, required: false, default: 0 },
});
Image of what it looks like in mongodb compass for more context
What I am completely stumped on is how to scan all the documents and pick one image at random, weighted by votes, using JavaScript. I don't want THE highest-voted image every time; a higher number of votes on a document should just make it show up more often.
Any suggestions on this would be great, code snippets explaining concepts would be good too.
You can use the aggregation pipeline with the $sample operator.
Something like this should do the trick:
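// aggregate() is a Model method, so call it on the compiled model, not on the schema.
// `Image` is an assumed name for mongoose.model('Image', imageSchema).
Image.aggregate([
  {
    $sample: {
      size: 1 // <- how many random documents you want
    }
  }
])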
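Note that `$sample` selects documents uniformly at random, so by itself it does not favour images with more votes. One way to get a vote-weighted pick is to do the weighting in JavaScript after loading the candidate documents. The sketch below assumes each image's weight is `votes + 1` (so images without votes can still appear) and clamps negative vote totals to zero; `pickWeighted` and its injectable `random` parameter are illustrative names, not part of Mongoose or discord.js.

```javascript
// Pick one image at random, with probability proportional to (votes + 1).
// `random` is injectable so the function can be tested deterministically.
function pickWeighted(images, random = Math.random) {
  if (images.length === 0) return null;
  const weights = images.map((img) => Math.max(0, img.votes) + 1);
  const total = weights.reduce((sum, w) => sum + w, 0);
  let threshold = random() * total; // a point in [0, total)
  for (let i = 0; i < images.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) return images[i];
  }
  return images[images.length - 1]; // guard against floating-point rounding
}

// Stand-in documents shaped like imageSchema.
const images = [
  { imageId: 1, imageLocation: 'a.png', votes: 0 }, // weight 1
  { imageId: 2, imageLocation: 'b.png', votes: 7 }, // weight 8
  { imageId: 3, imageLocation: 'c.png', votes: 1 }, // weight 2
];

const picked = pickWeighted(images);
console.log(picked.imageId);
```

With these weights, image 2 should come up about 8 times out of 11. Loading every document first is only reasonable for a modest number of images; for a large collection the weighting would have to move closer to the database.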

If I want certain workflow governance for different users, do I have to incorporate that in my data model?

I'm new to coding and am running into an issue that conceptually has my brain in a pretzel. I'm going to try my best to explain it and will link to my GitHub as well. The music app I'm building will have three different user "classifications" ("Class" in my code):
Artists
Venue owners
Fans
The intention is that when a user signs up they will select a persona (similar to Bandcamp), which then determines their functionality.
Here's the workflow:
Everyone is a "user"
Users either
a) register their venue, or
b) register their band, or
c) sign up as a fan
Then, venues can create "events" for bands to play but bands cannot create events. Users can browse both venues and bands to see which events they've hosted/played. Fans can attend events.
I can figure out the controls with authorization but I want to make sure I'm setting up my data model correctly -- and this is where I get confused.
Using my artist model as an example, if an artist is also a user, how do I incorporate the user into my artist model? Would adding a "user_id" field to the schema below create a circular reference?
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const artistSchema = new Schema({
  name: {
    type: String,
    required: [true, 'Artist must have a name']
  },
  genre: {
    type: String
  },
  email: {
    type: String,
    required: [true, 'Contact email required']
  },
  location: {
    type: String,
    required: [true, 'Hometown (so you can be paired with local venues)']
  },
})
module.exports = mongoose.model('Artist', artistSchema);
Here's a link to my github if you need more context on the four models. Thank you!!

Users schema with poking other users schema in node.js

I am making an application in which a user can poke other users. Here is the code for the schema designs I have considered. The first is using only a users schema:
const userSchema = new Schema({
name: { type : String},
pokes: [{ type : Schema.Types.ObjectId, ref: 'Users' ,default:null}],
});
Another way is using a pokes schema. Here I'm storing the object ids of the pokes schema in the users schema.
const pokesSchema = new Schema({
from_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
to_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
});
const userSchema = new Schema({
name: { type : String},
pokes: [{ type : Schema.Types.ObjectId, ref: 'Pokes' ,default:null}],
});
In the third way I totally remove the relation between the two schemas:
const pokesSchema = new Schema({
from_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
to_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
});
const userSchema = new Schema({
name: { type : String},
});
In the second and third ways I can query for pokes easily.
I want to know which of the three is the best design and why. Also, if userA pokes userB, userB can poke userA back. I'm currently learning Node.js and am confused about the design in MongoDB.
Alright, so here's the best I can do. You briefly answered my question in my comment above, but I'd like to point out that it's important to think about which operations you are doing (or expect to be doing) most often, and how much more often than the others. That aside though, let's take a look at each schema.
const userSchema = new Schema({
name: { type : String},
pokes: [{ type : Schema.Types.ObjectId, ref: 'Users' ,default:null}],
});
When we look at this first one it seems inadequate for your needs. It's a collection of users who have a name and an array of pokes they have made. If we need to know who a user has poked then that's a really easy and fast query - it's right there under their name, search by name or _id and we're done! But what happens when you want to look up who has poked this user? Unless the pokes array is indexed, every single user has to be read and every single pokes array searched for the original user. If we have m users and each has n pokes, that's m * n tests. Yikes. If m and n get big that's going to be a lot (think of the difference between 100 * 100 = 10,000 and 10,000 * 10,000 = 100,000,000, or even more!). Even if you personally are not coding that search in Node, Mongo is still doing it. So unless you're sure that looking up who has poked a user is going to be pretty rare, this is probably not a good option. Moving on:
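To make that cost concrete, the sketch below mimics the unindexed reverse lookup on in-memory data: every entry of every user's pokes array is compared against the target id. `findPokers` and its comparison counter are illustrative only, and string ids stand in for ObjectIds.

```javascript
// Find everyone who has poked `targetId` when pokes are stored only on the poker's document.
// The inner loop deliberately inspects every entry so the counter shows the m * n worst case.
function findPokers(users, targetId) {
  let comparisons = 0;
  const pokers = [];
  for (const user of users) {
    let found = false;
    for (const pokedId of user.pokes) {
      comparisons += 1;
      if (pokedId === targetId) found = true;
    }
    if (found) pokers.push(user._id);
  }
  return { pokers, comparisons };
}

// m = 3 users, n = 2 pokes each.
const users = [
  { _id: 'a', name: 'Ann', pokes: ['b', 'c'] },
  { _id: 'b', name: 'Bob', pokes: ['a', 'c'] },
  { _id: 'c', name: 'Cy', pokes: ['a', 'b'] },
];

const result = findPokers(users, 'c');
console.log(result.pokers, result.comparisons); // 3 users * 2 pokes each = 6 comparisons
```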
const pokesSchema = new Schema({
from_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
to_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
});
const userSchema = new Schema({
name: { type : String},
pokes: [{ type : Schema.Types.ObjectId, ref: 'Pokes' ,default:null}],
});
Now we have a pokes schema, nice! If we wanted to do the search we discussed above we can instead query pokes directly based on to_user_id, and then if we need the names of all the users who initiated the pokes we can just query users by _id. Not bad! We also still have the fast way to get the reverse, aka search for pokes a user has initiated, because there is still a pokes array in our user schema. What happens when a poke occurs, though? We have to update both collections. So not only will we do a (relatively easy) insert into pokes, we will also have to update the pokes array of the user who did the poking. This might not be so bad, but what happens if one update fails? Now our data is inconsistent - users and pokes don't match. We're also doubling our writes for every poke. This might not be a big deal, and if we're reading a user's pokes much more often than we're poking then it might be an ok trade-off, but it becomes a little riskier because we've introduced somewhere we can be inconsistent. Alright, last one:
const pokesSchema = new Schema({
from_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
to_user_id: { type : Schema.Types.ObjectId, ref: 'Users' ,default:null},
});
const userSchema = new Schema({
name: { type : String},
});
First, note that these schemas are still related - the pokes schema references users. It's just not doubly-related like the last one. Anyway, now we've removed the pokes array from the user schema. Ok! We don't run the risk of having inconsistent data anymore, noice! We're also not doing two updates for every poke, toit! The trade-off is that when we want the list of users a user has poked, we now have to do a query similar to the one we did above to get the list of users a user has been poked by. Which isn't so bad, but is certainly not as fast as having the pokes array already sitting there and waiting.
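With only a pokes collection, both directions become the same kind of query on a different field. The sketch below shows this on in-memory data; against MongoDB the equivalents would be `find({ from_user_id: id })` and `find({ to_user_id: id })`, each ideally backed by an index on that field (an assumption on my part, since indexes are not discussed above). The function names are hypothetical and string ids stand in for ObjectIds.

```javascript
// Each poke is its own document, as in the third design.
const pokes = [
  { from_user_id: 'a', to_user_id: 'b' },
  { from_user_id: 'b', to_user_id: 'a' }, // b pokes a back
  { from_user_id: 'c', to_user_id: 'a' },
];

// Ids of the users that `userId` has poked.
function usersPokedBy(allPokes, userId) {
  return allPokes
    .filter((p) => p.from_user_id === userId)
    .map((p) => p.to_user_id);
}

// Ids of the users who have poked `userId`.
function usersWhoPoked(allPokes, userId) {
  return allPokes
    .filter((p) => p.to_user_id === userId)
    .map((p) => p.from_user_id);
}

console.log(usersPokedBy(pokes, 'a'));  // [ 'b' ]
console.log(usersWhoPoked(pokes, 'a')); // [ 'b', 'c' ]
```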
In my opinion, unless you're searching for who users have poked (and not been poked by) significantly more often than doing anything else this third scenario is best. The schemas make logical sense, you're not updating twice. It's what I would use. But as I said, it's very important to consider your particular need and design.
Hope that helps!

MongoDB best practise - One to many relation

According to this post I should embed a "reference": MongoDB relationships: embed or reference?
User.js
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const userSchema = new Schema({
  email: {
    type: String,
    required: true
  },
  password: {
    type: String,
    required: true
  },
  createdEvents: ['Event']
});
module.exports = mongoose.model('User', userSchema);
Event.js
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const eventSchema = new Schema({
  title: {
    type: String,
    required: true
  },
  description: {
    type: String,
    required: true
  },
  price: {
    type: Number,
    required: true
  },
  date: {
    type: Date,
    required: true
  }
});
module.exports = mongoose.model('Event', eventSchema);
So an embedded event looks like this in the database:
My code works, but I'm curious whether this is the right way to embed the event, because every example of a one-to-many relation I can find uses references rather than embedding.
From my experience, using embedding or referencing depends on how much data we are dealing with.
When deciding which approach to pick, you should always consider:
1- One-To-Few: if you'll have a small number of events added to a user over time, I recommend the embedding approach, as it is simpler to deal with.
2- One-To-Many: if new events are frequently added, you should go for referencing to avoid future performance issues.
Why?
When you are frequently adding new events and they are embedded inside a user document, that document will grow larger and larger over time. In the future you will probably face issues like I/O overhead. You can catch a glimpse of the evils of large arrays in the article "Why shouldn't I embed large arrays in my documents?". Although it's hard to find it written anywhere, large arrays in MongoDB are considered a performance anti-pattern.
If you decide to go for referencing, I suggest reading Building with Patterns: The Bucket Pattern. The article can give you an idea on how to design your user_events collection in a non-relational way.
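For comparison, the two approaches produce documents shaped roughly like the plain objects below. This is only a sketch of the data layout: the ids, emails and event values are made up, and `resolveEvents` is a hypothetical helper standing in for the second query that referencing requires.

```javascript
// One-to-few: events are embedded, so one read returns everything,
// but the user document grows with every new event.
const embeddedUser = {
  _id: 'u1',
  email: 'ann@example.com',
  createdEvents: [
    { title: 'Launch party', description: 'Drinks', price: 10, date: new Date('2020-01-15') },
  ],
};

// One-to-many: the user stores only event ids; events live in their own collection.
const referencedUser = { _id: 'u1', email: 'ann@example.com', createdEvents: ['e1', 'e2'] };
const events = [
  { _id: 'e1', title: 'Launch party', price: 10 },
  { _id: 'e2', title: 'Workshop', price: 25 },
  { _id: 'e3', title: "Someone else's event", price: 5 },
];

// Resolving the references takes a second lookup (a second query in MongoDB).
function resolveEvents(user, allEvents) {
  const wanted = new Set(user.createdEvents);
  return allEvents.filter((e) => wanted.has(e._id));
}

console.log(embeddedUser.createdEvents.length);
console.log(resolveEvents(referencedUser, events).map((e) => e.title));
```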

Which is better in MongoDB: multiple indexes or multiple collections

I'm working on an application that authenticates users using 3rd party services (Facebook, Google, etc.). I give each user an internal id (uuid v4) which is associated with their 3rd party ids. Right now, my (mongoose) user document model looks something like this:
var user = new mongoose.Schema({
uuid: {type: String, required: true, unique: true, index: true, alias: 'userId'},
fbid: {type: String, required: false, index: true, alias: 'facebookId'},
gid: {type: String, required: false, index: true, alias: 'googleId'}
});
Because I can query on any of the IDs, I need indexes on all of them. I'm thinking that this could become an issue with a large number of users, or if I add more 3rd party logins (Twitter, LinkedIn, etc.). Now, my question is whether this is the correct way to do this, or if there is a better solution.
One idea I had is having multiple collections, one per ID type. Something like this:
var user = new mongoose.Schema({
uuid: {type: String, required: true, unique: true, index: true, alias: 'userId'},
});
var facebookUser = new mongoose.Schema({
fbid: {type: String, required: false, index: true, alias: 'facebookId'},
userId: {type: Schema.Types.ObjectId, ref: 'user'}
});
This has the advantage of not cluttering the user model and making sharding easier. However, it means more queries to retrieve a user and even more to create one: check the facebookUser collection to see whether the user exists; if not, create a new user and save it; then create a new facebookUser with a link to that new user and save that too.
Which way is "better" (scales well, handles load, etc.)?
The main thing to consider with indexes is that they should fit in memory. Whether you have three indexes in one collection or three collections with one index each is irrelevant (as far as the indexes are concerned). I would lean towards putting them all into one collection for ease of use.
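With everything in one collection, looking a user up by any provider id is a single query on the matching indexed field. The sketch below only builds the filter object; `PROVIDER_FIELDS` and `buildUserLookup` are hypothetical names, while the field names come from the schema in the question.

```javascript
// Map each login provider to the field that stores its id (fields from the question's schema).
const PROVIDER_FIELDS = new Map([
  ['internal', 'uuid'],
  ['facebook', 'fbid'],
  ['google', 'gid'],
]);

// Build the filter for a single-collection lookup, e.g. to pass to findOne().
function buildUserLookup(provider, id) {
  const field = PROVIDER_FIELDS.get(provider);
  if (!field) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return { [field]: id };
}

console.log(buildUserLookup('facebook', '12345')); // { fbid: '12345' }
```

Adding another provider then means one more optional field, one more index and one more map entry, rather than a new collection and an extra query per login.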
