I am writing an aggregation pipeline to return a win ratio. The values I compute with $sum inside $facet come out of the following $project stage wrapped in arrays, which has me confused. As a workaround I simply run $sum on the arrays when I calculate the winRatio, which works fine. How do I use $project without it putting the values into an array?
Round.aggregate([
{
$match: {
$and: query,
},
},
{
$facet: {
wins: [
{
$match: {
winner: user,
},
},
{
$group: {
_id: { user: '$scores.player', game: '$game' },
value: { $sum: 1 }, // value *not* within array
},
},
],
rounds: [
{
$unwind: '$scores',
},
{
$match: {
'scores.player': user,
},
},
{
$group: {
_id: { user: '$scores.player', game: '$game' },
value: { $sum: 1 }, // value *not* within array
},
},
],
},
},
{
$project: {
_id: '$rounds._id',
rounds: '$rounds.value', // value within an array
wins: '$wins.value', // value within an array
winRatio: { ... },
},
},
]);
Schema:
const schema = new mongoose.Schema(
{
game: { type: mongoose.Schema.ObjectId, required: true },
scores: [
{
player: { type: mongoose.Schema.ObjectId, ref: 'User', required: true },
playerName: { type: String }, // denormalise
score: { type: Number, required: true },
},
],
winner: { type: mongoose.Schema.ObjectId, required: true },
datePlayed: { type: Date },
},
{ timestamps: true },
);
You're asking why $sum 'works' and $project doesn't.
Let's start by understanding the output of the $facet stage.
{
"wins" : [
{
"_id" : {
"user" : [
"player1",
"player2"
],
"game" : 1.0
},
"value" : 2.0
}
],
"rounds" : [
{
"_id" : {
"user" : "player1",
"game" : 1.0
},
"value" : 3.0
}
]
}
As we can see, the result of each facet is an array, even though you grouped at the end. Think of each facet as its own aggregation: its return value is always an array (empty or not, depending on the results).
So when you $project on $rounds.value, you're telling MongoDB to take the value field from every element of that array. In our case there is only one element, but the result is still an array.
$sum, on the other hand, is an accumulator operator. From the docs:
With a single expression as its operand, if the expression resolves to an array, $sum traverses into the array to operate on the numerical elements of the array to return a single value.
A quick fix for your 'issue' is to wrap the values in $sum while projecting:
{
$project: {
_id: '$rounds._id',
rounds: {$sum: '$rounds.value'},
wins: {$sum: '$wins.value'},
winRatio: { ... },
},
},
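To see why the projected values arrive wrapped in arrays, here is a plain-JavaScript sketch that models the two behaviours described above. It is not MongoDB code: `resolvePath` and `sum` are hypothetical helpers that only imitate how a field path such as `$rounds.value` and the `$sum` operator treat an array. The last constant shows an alternative `$project` stage built on `$arrayElemAt`, which picks the first element instead of summing.

```javascript
// Output of the $facet stage, copied from the example above.
const facetOutput = {
  wins: [{ _id: { user: ['player1', 'player2'], game: 1 }, value: 2 }],
  rounds: [{ _id: { user: 'player1', game: 1 }, value: 3 }],
};

// Hypothetical helper: imitates resolving a path like '$rounds.value'.
// A path that crosses an array yields one entry per array element.
function resolvePath(doc, path) {
  const [field, subField] = path.replace(/^\$/, '').split('.');
  return doc[field].map((element) => element[subField]);
}

// Hypothetical helper: imitates $sum with a single array operand.
function sum(values) {
  return values
    .filter((v) => typeof v === 'number')
    .reduce((total, v) => total + v, 0);
}

const roundsProjected = resolvePath(facetOutput, '$rounds.value'); // still an array
const roundsSummed = sum(roundsProjected); // collapsed to one number
const winsSummed = sum(resolvePath(facetOutput, '$wins.value'));

// Alternative stage: take the first array element instead of summing.
const projectFirstElement = {
  $project: {
    rounds: { $arrayElemAt: ['$rounds.value', 0] },
    wins: { $arrayElemAt: ['$wins.value', 0] },
  },
};
```

One difference to keep in mind when choosing between the two: if a facet returns no documents, `$sum` yields 0, while `$arrayElemAt` on an empty array produces no value, so the field is omitted from the output.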
I have been trying to compute averageSum and averageRating, but I cannot get it done because I do not know how to populate inside an aggregate, or whether there is a workaround. I have heard of $lookup, but I am not sure how to use it, and I also see something saying that my Atlas tier does not support it. Is there another way around this? Can I populate and then aggregate, or can I compute averageSum and averageRating at the end using another method? Please help me.
here is how my schema looks:
const favoriteSchema = new mongoose.Schema({
user: {
type: mongoose.Schema.Types.ObjectId,
ref: "User",
unique: true,
},
favoriteSellers: [
//create array of object id, make sure they are unique for user not to add multiple sellers
{
type: mongoose.Schema.Types.ObjectId,
ref: "Seller",
unique: true,
},
],
});
and here is my Seller schema:
const sellerSchema = new mongoose.Schema({
user: {
type: mongoose.Schema.Types.ObjectId,
ref: "User",
unique: true,
},
business: businessSchema,
sellerType: [String],
reviews: [
{
by: {
type: mongoose.Schema.Types.ObjectId,
ref: "User",
unique: true,
},
title: {
type: String,
},
message: {
type: String,
},
rating: Number,
imagesUri: [String],
timestamp: {
type: Date,
default: Date.now,
},
},
],
...
});
So I have an array of favorite sellers, I want to populate the sellers, then populate the reviews.by and user paths, and then do the calculation for the average sum and do the average rating. If possible please help me. What are my options here? Just do it outside on the expressjs route logic?
Here is my aggregate:
aggregatePipeline.push({
$match: { user: req.user._id },
});
//****** Here is where I want to populate before start the rest **********
Then it continues with the following code. Because the fields (paths) are not populated, averageSum is always 0.
aggregatePipeline.push({
$addFields: {
ratingSum: {
$reduce: {
initialValue: 0,
input: "$favoriteSellers.reviews",
in: { $sum: ["$$value", "$$this.rating"] },
},
},
},
});
//get average of rating ex. seller1 has a 4.5 averageRating field
aggregatePipeline.push({
$addFields: {
averageRating: {
$cond: [
{ $eq: [{ $size: "$favoriteSellers.reviews" }, 0] }, //if it does not have any reviews, then we will just send 0
0, //set it to 0
{
$divide: ["$ratingSum", { $size: "$reviews" }], //else we can divide to get average Rating
},
],
},
},
});
let favList = await Favorite.aggregate(aggregatePipeline).exec();
When I run the query, the result looks like:
[
{
_id: new ObjectId("62a7ce9550094eafc7a61233"),
user: new ObjectId("6287e4e61df773752aadc286"),
favoriteSellers: [ new ObjectId("6293210asdce81d9f2ae1685") ],
}
]
Here is a sample on how I want it to look:
(each seller should have an averageRating field and an averageSum field)
_id: 'favorite_id.....'
user: 'my id',
favoriteSellers:[
{
_id: 'kjskjhajkhsjk',
averageRating: 4.6
reviews:[.....],
...
},
{
_id: 'id______hsjk',
averageRating: 2.6
reviews:[.....],
...
},
{
_id: 'kjid______khsjk....',
averageRating: 3.6
reviews:[.....],
...
}
]
User Schema
I have been building a social media application and I have to write a query that returns the followings of a user. The user schema is shown below.
const userSchema = Schema(
{
email: {
type: String,
unique: true,
required: [true, "Email is required"],
index: true,
},
active: {
type: Boolean,
default: true,
},
phone: {
type: String,
unique: true,
required: [true, "Phone is required"],
index: true,
},
name: {
type: String,
required: [true, "Name is required"],
},
bio: {
type: String,
},
is_admin: {
type: Boolean,
index: true,
default: false,
},
is_merchant: {
type: Boolean,
index: true,
default: false,
},
password: {
type: String,
required: [true, "Password is required"],
},
profile_picture: {
type: String,
},
followers: [
// meaning who has followed me
{
type: Types.ObjectId,
ref: "user",
required: false,
},
],
followings: [
// meaning all of them who I followed
{
type: Types.ObjectId,
ref: "user",
required: false,
},
],
},
{
timestamps: { createdAt: "created_at", updatedAt: "updated_at" },
toObject: {
transform: function (doc, user) {
delete user.password;
},
},
toJSON: {
transform: function (doc, user) {
delete user.password;
},
},
}
);
Follow/following implementation
I have implemented follow/following using the logic shown below. Each time a user follows another user, two queries are performed: one updates the follower's followings array using findOneAndUpdate with a $push of followee._id, and a second query updates the followee's followers array.
Query Response Pattern
I have written a query that should return the user's followings, with the following flags appended to each user:
{
doesViewerFollowsUser: boolean // whether we (the viewer) follow the person whose profile we are viewing
doesUserFollowsViewer: boolean // whether the person whose profile we are viewing follows us
}
The actual query
The query looks like this:
userModel
.aggregate([
{
$match: {
_id: {
$in: [new Types.ObjectId(userId), new Types.ObjectId(viewerId)],
},
},
},
{
$addFields: {
order: {
$cond: [
{
$eq: ["$_id", new Types.ObjectId(viewerId)], // testing for viewer
},
2,
1,
],
},
},
},
{
$group: {
_id: 0,
subjectFollowings: {
$first: "$followings",
},
viewerFollowings: {
$last: "$followings",
},
viewerFollowers: {
$last: "$followers",
},
},
},
{
$lookup: {
from: "users",
localField: "subjectFollowings",
foreignField: "_id",
as: "subjectFollowings",
},
},
{
$project: {
subjectFollowings: {
$map: {
input: "$subjectFollowings",
as: "user",
in: {
$mergeObjects: [
"$$user",
{
doesViewerFollowsUser: {
$cond: [
{
$in: ["$$user._id", "$viewerFollowers"],
},
true,
false,
],
},
},
{
doesUserFollowsViewer: {
$cond: [
{
$in: ["$$user._id", "$viewerFollowings"],
},
true,
false,
],
},
},
],
},
},
},
},
},
{
$project: {
"subjectFollowings.followings": 0,
"subjectFollowings.followers": 0,
"subjectFollowings.bio": 0,
"subjectFollowings.password": 0,
"subjectFollowings.is_admin": 0,
"subjectFollowings.is_merchant": 0,
"subjectFollowings.email": 0,
"subjectFollowings.phone": 0,
"subjectFollowings.created_at": 0,
"subjectFollowings.updated_at": 0,
"subjectFollowings.__v": 0,
},
},
])
The problem
I don't think the current query scales well. Its worst-case complexity is approximately O(n^2). So, please help me optimize this query.
The problem is with your data modeling. You should not store followers/followings in an array because:
MongoDB has a hard limit of 16 MB per document, which caps how much data a single document can hold.
Array lookups take linear time; the larger the array, the longer it takes to query.
What you can do instead is have a separate collection for user relationships, like so:
follower: user id
followee: user id
You can then create a compound index on (follower, followee) and efficiently check who follows whom. You can also enable timestamps here.
To get all followers of a user, create an index on the followee key; that query will then resolve quickly as well.
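The relationship collection described above could be sketched as plain data like this. Everything named here is an assumption for illustration: the collection name `follows`, the helper functions, and the `unique` flag on the compound index (added so the same follow cannot be recorded twice). The index specifications use the `{ key, ...options }` form accepted by the Node.js driver's `collection.createIndexes()`.

```javascript
// Hypothetical shape of one document in a separate `follows` collection.
function makeFollow(followerId, followeeId) {
  return { follower: followerId, followee: followeeId, created_at: new Date() };
}

// Index specifications for collection.createIndexes().
const followIndexes = [
  // Answers "does A follow B?" and, via its prefix, "whom does A follow?"
  { key: { follower: 1, followee: 1 }, unique: true },
  // Answers "who follows B?"
  { key: { followee: 1 } },
];

// Query filters for the two flags from the question. Each filter names both
// fields of the compound index, so each check is a single indexed lookup.
function followFlagFilters(viewerId, userId) {
  return {
    doesViewerFollowsUser: { follower: viewerId, followee: userId },
    doesUserFollowsViewer: { follower: userId, followee: viewerId },
  };
}

// Query filter for listing every follower of one user (followee index).
function followersOfFilter(userId) {
  return { followee: userId };
}
```

With this layout each flag becomes an existence check, for example a `findOne` with one of the filters, instead of an `$in` scan over an array that grows with the number of followers.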
Background:
A customer is an object that has a name field.
A line is an object that has the following fields:
inLine - an array of customers
currentCustomer - a customer
processed - an array of customers
The collection 'line' contains documents that are line objects.
Problem:
I'm trying to implement a procedure which would do the following:
Push currentCustomer to processed
Set currentCustomer to the 1st element in inLine
Pop the 1st element of inLine
Since the new value of a field depends on the previous value of another, atomicity is important here.
What I tried so far:
Naive approach
db.collection('line').findOneAndUpdate({
_id: new ObjectId(lineId),
}, {
$set: {
currentCustomer: '$inLine.0',
},
$pop: {
inLine: -1,
},
$push: {
processed: '$currentCustomer',
},
});
However, currentCustomer is set to a string which is literally "$inLine.0" and processed has a string which is literally "$currentCustomer".
Aggregation approach
db.collection('line').findOneAndUpdate({
_id: new ObjectId(lineId),
}, [{
$set: {
currentCustomer: '$inLine.0',
},
$pop: {
inLine: -1,
},
$push: {
processed: '$currentCustomer',
},
}]);
However, I got the following error:
MongoError: A pipeline stage specification object must contain exactly one field.
Multi-stage aggregation approach
db.collection('line').findOneAndUpdate({
_id: new ObjectId(lineId),
}, [{
$set: {
currentCustomer: '$inLine.0',
},
}, {
$pop: {
inLine: -1,
},
}, {
$push: {
processed: '$currentCustomer',
},
}]);
However, $pop and $push are Unrecognized pipeline stage names.
I tried making it using only $set stages, but it ended up very ugly and I still couldn't get it to work.
Based on turivishal's answer, it was solved like so:
db.collection('line').findOneAndUpdate({
_id: new ObjectId(lineId),
}, [{
$set: {
// currentCustomer = inLine.length === 0 ? null : inLine[0]
currentCustomer: {
$cond: [
{ $eq: [{ $size: '$inLine' }, 0] },
null,
{ $first: '$inLine' },
],
},
// inLine = inLine.slice(1)
inLine: {
$cond: [
{ $eq: [{ $size: '$inLine' }, 0] },
[],
{ $slice: ['$inLine', 1, { $size: '$inLine' }] },
],
},
// if currentCustomer !== null then processed.push(currentCustomer)
processed: {
$cond: [
{
$eq: ['$currentCustomer', null],
},
'$processed',
{
$concatArrays: [
'$processed', ['$currentCustomer'],
],
}
],
},
},
}]);
I don't think it's possible with a simple update using $push or $pop.
As your experiments show, an aggregation pipeline update does not support $push or $pop as stages, so I have corrected your query:
currentCustomer: if the size of inLine is 0, return null; otherwise take the first element of inLine using $arrayElemAt.
inLine: if the size of inLine is 0, return []; otherwise remove the first element of inLine using $slice and $size.
processed: concatenate two arrays using $concatArrays. $ifNull substitutes an empty array when processed is null or missing; the second array is [] when currentCustomer is null and [currentCustomer] otherwise.
db.collection('line').findOneAndUpdate(
{ _id: new ObjectId(lineId), },
[{
$set: {
currentCustomer: {
$cond: [
{ $eq: [{ $size: "$inLine" }, 0] },
null,
{ $arrayElemAt: ["$inLine", 0] }
]
},
inLine: {
$cond: [
{ $eq: [{ $size: "$inLine" }, 0] },
[],
{ $slice: ["$inLine", 1, { $size: "$inLine" }] }
]
},
processed: {
$concatArrays: [
{ $ifNull: ["$processed", []] },
{
$cond: [
{ $eq: ["$currentCustomer", null] },
[],
["$currentCustomer"]
]
}
]
}
}
}]
);
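The pipeline update above relies on every expression in the single $set stage reading the document as it was before the stage, so processed receives the old currentCustomer even though currentCustomer is overwritten in the same stage. The following plain-JavaScript model (not MongoDB code) mirrors that logic so it can be checked by hand. `advanceLine` and the customer names are made up for illustration, and the sketch assumes that inLine, currentCustomer and processed are always present on the document.

```javascript
// Plain-JavaScript model of the single $set stage in the update above.
function advanceLine(line) {
  // Read everything from the ORIGINAL document first, as one $set stage does.
  const { inLine, currentCustomer, processed } = line;
  return {
    ...line,
    // $cond on $size: null when the queue is empty, else the first element.
    currentCustomer: inLine.length === 0 ? null : inLine[0],
    // $slice from index 1: drop the first element ([] stays []).
    inLine: inLine.slice(1),
    // $concatArrays: append the old currentCustomer unless it was null.
    processed: currentCustomer === null ? processed : [...processed, currentCustomer],
  };
}

const before = {
  inLine: [{ name: 'Bob' }, { name: 'Carol' }],
  currentCustomer: { name: 'Alice' },
  processed: [],
};

// One step: Alice moves to processed, Bob becomes current, Carol waits.
const after = advanceLine(before);

// Three more steps: the queue drains and further calls change nothing.
const drained = advanceLine(advanceLine(advanceLine(after)));
```

Because the real update is a single findOneAndUpdate call on one document, all three fields change atomically, which is the property the question asks for.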
I'm trying to find stocks in the Stock collection where the sum of all owners' shares is less than 100. Here is my schema.
const stockSchema = new mongoose.Schema({
owners: [
{
owner: {
type: Schema.Types.ObjectId,
ref: "Owner"
},
shares: {
type: Number,
min: 0,
max: 100
}
}
]
});
const Stock = mongoose.model("Stock", stockSchema);
I've tried to use aggregate but it returns a single object computed over all stocks in the collection, as opposed to multiple objects with the sum of each stock's shares.
stockSchema.statics.getUnderfundedStocks = async () => {
const result = await Stock.aggregate([
{ $unwind: "$owners" },
{ $group: { _id: null, shares: { $sum: "$owners.shares" } } },
{ $match: { shares: { $lt: 100 } } }
]);
return result;
};
So, rather than getting:
[ { _id: null, shares: 150 } ] from getUnderfundedStocks, I'm looking to get:
[ { _id: null, shares: 90 }, { _id: null, shares: 60 } ].
I've come across $expr, which looks useful, but documentation is scarce and not sure if that's the appropriate path to take.
Edit: Some document examples:
/* 1 */
{
"_id" : ObjectId("5ea699fb201db57b8e4e2e8a"),
"owners" : [
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c72"),
"shares" : 85
}
]
}
/* 2 */
{
"_id" : ObjectId("5ea699fb201db57b8e4e2e1e"),
"owners" : [
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c72"),
"shares" : 20
},
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c73"),
"shares" : 50
},
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c74"),
"shares" : 30
}
]
}
I'd like to return an array that just includes document #1.
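You do not need to use $group here. Grouping with _id: null collapses all stocks into a single group, which is why you get one total. Simply use $project with the $sum operator: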
db.collection.aggregate([
{ "$project": {
"shares": { "$sum": "$owners.shares" }
}},
{ "$match": { "shares": { "$lt": 100 } } }
])
You do not even need aggregation here:
db.collection.find({
"$expr": { "$lt": [{ "$sum": "$owners.shares" }, 100] }
})
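As a sanity check, the condition used in both queries can be worked through by hand on the two sample documents from the question. The sketch below is plain JavaScript rather than a MongoDB query; `totalShares` is a hypothetical helper that stands in for `{ $sum: "$owners.shares" }`, and the ObjectIds are written as strings.

```javascript
// The two sample stocks from the question, reduced to the fields that matter.
const stocks = [
  { _id: '5ea699fb201db57b8e4e2e8a', owners: [{ shares: 85 }] },
  {
    _id: '5ea699fb201db57b8e4e2e1e',
    owners: [{ shares: 20 }, { shares: 50 }, { shares: 30 }],
  },
];

// Stand-in for { $sum: "$owners.shares" }: add up shares within ONE stock.
function totalShares(stock) {
  return stock.owners.reduce((total, owner) => total + owner.shares, 0);
}

// Stand-in for { $lt: [<sum>, 100] }: keep stocks whose total is below 100.
const underfunded = stocks.filter((stock) => totalShares(stock) < 100);
```

The first stock totals 85 and is kept; the second totals 20 + 50 + 30 = 100, which is not less than 100, so only document #1 is returned, as requested.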
I want to implement a retweet feature in my app. I use Mongoose and have User and Message models, and I store retweets as an array of objects of type {userId, createdAt}, where createdAt is the time when the retweet occurred. The Message model has its own createdAt field.
I need to create feed of original and retweeted messages merged together based on createdAt fields. I am stuck with merging, whether to do it in a single query or separate and do the merge in JavaScript. Can I do it all in Mongoose with a single query? If not how to find merge insertion points and index of the last message?
So far I just have fetching of original messages.
My Message model:
const messageSchema = new mongoose.Schema(
{
fileId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'File',
required: true,
},
userId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
required: true,
},
likesIds: [{ type: mongoose.Schema.Types.ObjectId, ref: 'User' }],
reposts: [
{
reposterId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
},
createdAt: { type: Date, default: Date.now },
},
],
},
{
timestamps: true,
},
);
Edit: Now I have this but pagination is broken. I am trying to use newCreatedAt field for cursor, that doesn't seem to work. It returns empty array in second call when newCreatedAt is passed from the frontend.
messages: async (
parent,
{ cursor, limit = 100, username },
{ models },
) => {
const user = username
? await models.User.findOne({
username,
})
: null;
const options = {
...(cursor && {
newCreatedAt: {
$lt: new Date(fromCursorHash(cursor)),
},
}),
...(username && {
userId: mongoose.Types.ObjectId(user.id),
}),
};
console.log(options);
const aMessages = await models.Message.aggregate([
{
$addFields: {
newReposts: {
$concatArrays: [
[{ createdAt: '$createdAt', original: true }],
'$reposts',
],
},
},
},
{
$unwind: '$newReposts',
},
{
$addFields: {
newCreatedAt: '$newReposts.createdAt',
original: '$newReposts.original',
},
},
{ $match: options },
{
$sort: {
newCreatedAt: -1,
},
},
{
$limit: limit + 1,
},
]);
const messages = aMessages.map(m => {
m.id = m._id.toString();
return m;
});
//console.log(messages);
const hasNextPage = messages.length > limit;
const edges = hasNextPage ? messages.slice(0, -1) : messages;
return {
edges,
pageInfo: {
hasNextPage,
endCursor: toCursorHash(
edges[edges.length - 1].newCreatedAt.toString(),
),
},
};
},
Here are the queries. The working one:
Mongoose: messages.aggregate([{
'$match': {
createdAt: {
'$lt': 2020-02-02T19:48:54.000Z
}
}
}, {
'$sort': {
createdAt: -1
}
}, {
'$limit': 3
}], {})
And the non working one:
Mongoose: messages.aggregate([{
'$match': {
newCreatedAt: {
'$lt': 2020-02-02T19:51:39.000Z
}
}
}, {
'$addFields': {
newReposts: {
'$concatArrays': [
[{
createdAt: '$createdAt',
original: true
}], '$reposts'
]
}
}
}, {
'$unwind': '$newReposts'
}, {
'$addFields': {
newCreatedAt: '$newReposts.createdAt',
original: '$newReposts.original'
}
}, {
'$sort': {
newCreatedAt: -1
}
}, {
'$limit': 3
}], {})
This can be done in one query, although it's a little hackish:
db.collection.aggregate([
{
$addFields: {
reposts: {
$concatArrays: [[{createdAt: "$createdAt", original: true}],"$reposts"]
}
}
},
{
$unwind: "$reposts"
},
{
$addFields: {
createdAt: "$reposts.createdAt",
original: "$reposts.original"
}
},
{
$sort: {
createdAt: -1
}
}
]);
You can add any other logic you want to the query using the original field: documents with original: true are the original posts, while the others are retweets.
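On the pagination problem from the edit in the question: in the logged non-working query, the $match on newCreatedAt appears before the $addFields stage that creates that field, and a filter on a field that does not exist yet matches nothing. A minimal sketch of a pipeline builder that always places the cursor filter after the field is created could look like this. `buildFeedPipeline` and its `cursorDate` parameter are hypothetical names, and the sketch leaves out the userId filter from the question for brevity.

```javascript
// Builds the feed pipeline as plain data; pass the result to Model.aggregate().
function buildFeedPipeline({ cursorDate = null, limit = 100 } = {}) {
  const pipeline = [
    // Treat the original message as the first entry of its own repost list.
    {
      $addFields: {
        newReposts: {
          $concatArrays: [[{ createdAt: '$createdAt', original: true }], '$reposts'],
        },
      },
    },
    // One output document per original post or repost.
    { $unwind: '$newReposts' },
    // Expose the per-entry timestamp that the feed is sorted and paged by.
    {
      $addFields: {
        newCreatedAt: '$newReposts.createdAt',
        original: '$newReposts.original',
      },
    },
  ];

  // The cursor filter is added only now, after newCreatedAt exists.
  if (cursorDate) {
    pipeline.push({ $match: { newCreatedAt: { $lt: cursorDate } } });
  }

  // Newest first; fetch one extra document to detect a next page.
  pipeline.push({ $sort: { newCreatedAt: -1 } }, { $limit: limit + 1 });
  return pipeline;
}
```

For example, `buildFeedPipeline({ cursorDate: new Date(fromCursorHash(cursor)), limit })` would produce the stages for the second and later pages, while calling it without `cursorDate` produces the first page.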