I have generated a histogram by the following command:
db.mydb.aggregate([{ $bucketAuto: { groupBy: "$userId", buckets: 1e9 } }])
Assuming I have fewer than 1 billion unique users (and sufficient memory), this gives me the count of documents for each user.
User Docs
===== ====
userA 3
userB 1
userC 5
userD 1
I want to take the result of this histogram and pivot to count the number of users for each document count.
The result would look like:
Docs Users
==== =====
1 2
2 0
3 1
4 0
5 1
Is there a simple, functional way of doing this in MongoDB?
You can start with a simple $group stage:
db.col.aggregate([
{
$group: {
_id: "$docs",
count: { $sum: 1 }
}
},
{
$project: {
_id: 0,
docs: "$_id",
users: "$count"
}
},
{
$sort: { docs: 1 }
}
])
This gives the following result:
{ "docs" : 1, "users" : 2 }
{ "docs" : 3, "users" : 1 }
{ "docs" : 5, "users" : 1 }
The document counts that have no users are still missing. You can add them either in your application or in MongoDB (shown below):
db.col.aggregate([
{
$group: {
_id: "$docs",
count: { $sum: 1 }
}
},
{
$group: {
_id: null,
histogram: { $push: "$$ROOT" }
}
},
{
$project: {
values: {
$map: {
input: { $range: [ { $min: "$histogram._id" }, { $add: [ { $max: "$histogram._id" }, 1 ] } ] },
in: {
docs: "$$this",
users: {
$let: {
vars: {
current: { $arrayElemAt: [ { $filter: { input: "$histogram", as: "h", cond: { $eq: [ "$$h._id", "$$this" ] } } }, 0 ] }
},
in: {
$ifNull: [ "$$current.count", 0 ]
}
}
}
}
}
}
}
},
{
$unwind: "$values"
},
{
$replaceRoot: {
newRoot: "$values"
}
}
])
The idea here is that we can $group by null, which produces a single document containing all docs from the previous stage. Knowing the $min and $max values, we can generate a $range of numbers and $map that range into either the existing counts or a default value of 0. Then we can use $unwind and $replaceRoot to get a single histogram point per document. Output:
{ "docs" : 1, "users" : 2 }
{ "docs" : 2, "users" : 0 }
{ "docs" : 3, "users" : 1 }
{ "docs" : 4, "users" : 0 }
{ "docs" : 5, "users" : 1 }
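The gap-filling logic of the pipeline above can also be sketched in plain JavaScript, which may be handy if you prefer to add the missing points in your application. This is only a rough illustration, not MongoDB code: the `fillGaps` helper and the `histogram` array are assumptions standing in for the output of the first $group stage.

```javascript
// Plain-JavaScript sketch of the gap-filling step (no MongoDB required).
// `histogram` is a hypothetical stand-in for the output of the first $group stage.
const histogram = [
  { _id: 1, count: 2 },
  { _id: 3, count: 1 },
  { _id: 5, count: 1 },
];

function fillGaps(points) {
  if (points.length === 0) return [];
  const ids = points.map((p) => p._id);
  const min = Math.min(...ids);
  const max = Math.max(...ids);
  // Index the known counts by document count for quick lookup.
  const byId = new Map(points.map((p) => [p._id, p.count]));
  const filled = [];
  // Like $range, walk from min to max inclusive (hence max + 1 in the pipeline).
  for (let docs = min; docs <= max; docs++) {
    // Like $ifNull: fall back to 0 when no user has this document count.
    filled.push({ docs, users: byId.get(docs) ?? 0 });
  }
  return filled;
}

const filledHistogram = fillGaps(histogram);
console.log(filledHistogram);
```

The Map lookup replaces the $filter plus $arrayElemAt combination, which scans the whole histogram array once per point in the range.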
mickl's answer definitely got me moving in the right direction. In particular, using $group is a nice improvement over $bucketAuto for this use-case. The trick to layering the histogram was just to use a $group stage more than once within the same aggregate. I guess it's obvious in hindsight.
The complete solution is here:
const h2 = db.mydb.aggregate([
{ $group: { _id: "$userId", count: { $sum: 1 } } },
{ $group: { _id: "$count", count: { $sum: 1 } } },
{ $project: { _id: 0, docs: "$_id", users: "$count" } },
{ $sort: { docs: +1 } }
])
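To see why chaining two $group stages yields the pivoted histogram, here is a small plain-JavaScript sketch of the same idea. The `countBy` helper and the sample `docs` array are hypothetical and only mimic what `{ $group: { _id: ..., count: { $sum: 1 } } }` does; they are not part of MongoDB.

```javascript
// Plain-JavaScript sketch of the two chained $group stages (no MongoDB required).
// `docs` is a hypothetical stand-in for the documents in the collection.
const docs = [
  { userId: "userA" }, { userId: "userA" }, { userId: "userA" },
  { userId: "userB" },
  { userId: "userC" }, { userId: "userC" }, { userId: "userC" },
  { userId: "userC" }, { userId: "userC" },
  { userId: "userD" },
];

// Counts how many items share each key, like a $group with { $sum: 1 }.
function countBy(items, keyOf) {
  const counts = new Map();
  for (const item of items) {
    const key = keyOf(item);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

// First pass: documents per user. Second pass: users per document count.
const perUser = countBy(docs, (d) => d.userId);
const perDocCount = countBy(perUser, (u) => u.count)
  .map(({ _id, count }) => ({ docs: _id, users: count }))
  .sort((a, b) => a.docs - b.docs);

console.log(perDocCount);
```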
I am trying to group by month, based on the From_Date field, and then calculate the sums of Gross_Amount, Net_Amount and Tax_Amount. Here is a sample MongoDB document:
{
"Partner_ID" : "0682047456",
"EarningsData" : [
{
"From_Date" : ISODate("2022-01-10T18:30:00.000Z"),
"Gross_Amount" : 300,
"Net_Amount" : 285,
"Tax_Amount" : 15
},
{
"From_Date" : ISODate("2022-10-01T18:30:00.000Z"),
"Gross_Amount" : 1958,
"Net_Amount" : 1860,
"Quantity" : 979,
"Tax_Amount" : 98
},
],
"createdAt" : ISODate("2023-01-23T16:23:02.430Z")
}
Below is the aggregation query I have written:
var projectQry = [
{
$match: {
"Partner_ID": userId
}
},
{
$unwind: "$EarningsData"
},
{
$group: {
_id: {
$month: "$EarningsData.From_Date"
},
Gross: {
$sum: "$EarningsData.Gross_Amount"
},
Tax: {
$sum: "$EarningsData.Tax_Amount"
},
Net: {
$sum: "$EarningsData.Net_Amount"
},
}
},
{
$project: {
_id: 0,
Month: "$_id",
Gross: 1,
Tax: 1,
Net: 1
}
}
];
Everything works and I get the expected output. However, I need to sort that output by month. I tried adding a $sort stage at the end of the pipeline, as follows:
{
$sort: {
Month: 1
}
},
The problem is that December of the previous year comes after January of the current year.
NOTE: The From_Date field only ever contains dates from the current year or the previous year, never anything older.
If I understand what you are trying to do, you should group by <year, month> and sort on both fields.
Note: check the data you posted in the question; it is inconsistent with your pipeline in places, although the intent is still clear.
The aggregation pipeline should look as follows:
db.getCollection("test01").aggregate([
{
$match: {
"Partner_ID": "0682047456"
}
},
{
$unwind: "$EarningsData"
},
{
$group: {
_id: {
year: { $year: "$EarningsData.From_Date" },
month: { $month: "$EarningsData.From_Date" }
},
Gross: {
$sum: "$EarningsData.Gross_Amount"
},
Tax: {
$sum: "$EarningsData.Tax_Amount"
},
Net: {
$sum: "$EarningsData.Net_Amount"
},
}
},
{
$project: {
_id: 0,
Date: "$_id",
Gross: 1,
Tax: 1,
Net: 1
}
},
{
$sort: {
"Date.year": 1,
"Date.month": 1,
}
}
]);
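The effect of grouping and sorting on <year, month> rather than month alone can be illustrated with a plain-JavaScript sketch. The `groupByYearMonth` helper is hypothetical, and the last two rows of `earnings` are invented sample data added to show the year boundary; the UTC getters are used on the assumption that no timezone is passed to $year and $month.

```javascript
// Plain-JavaScript sketch of grouping by <year, month> and sorting on both.
const earnings = [
  { From_Date: new Date("2022-01-10T18:30:00.000Z"), Gross_Amount: 300, Net_Amount: 285, Tax_Amount: 15 },
  { From_Date: new Date("2022-10-01T18:30:00.000Z"), Gross_Amount: 1958, Net_Amount: 1860, Tax_Amount: 98 },
  // Hypothetical extra rows, not from the question.
  { From_Date: new Date("2022-01-20T18:30:00.000Z"), Gross_Amount: 50, Net_Amount: 48, Tax_Amount: 2 },
  { From_Date: new Date("2021-12-05T18:30:00.000Z"), Gross_Amount: 100, Net_Amount: 95, Tax_Amount: 5 },
];

function groupByYearMonth(rows) {
  const groups = new Map();
  for (const row of rows) {
    const year = row.From_Date.getUTCFullYear();
    const month = row.From_Date.getUTCMonth() + 1; // getUTCMonth() is 0-based, $month is 1-based
    const key = `${year}-${month}`;
    const group = groups.get(key) || { Date: { year, month }, Gross: 0, Tax: 0, Net: 0 };
    group.Gross += row.Gross_Amount;
    group.Tax += row.Tax_Amount;
    group.Net += row.Net_Amount;
    groups.set(key, group);
  }
  // Sort by year first, then by month, like { "Date.year": 1, "Date.month": 1 }.
  return [...groups.values()].sort(
    (a, b) => a.Date.year - b.Date.year || a.Date.month - b.Date.month
  );
}

const monthly = groupByYearMonth(earnings);
console.log(monthly);
```

Sorting on month alone would place December 2021 (month 12) after January 2022 (month 1), which is the behaviour described in the question.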
For example, I have a document like this in my customers collection:
{
Max: {
shoping_list: {
food: { Pizza: 2, Ramen: 1, Sushi: 5 }
}
},
John: {
shoping_list: {
food: { Pizza: 2, Ramen: 1, Burger: 1 }
}
}
}
In my backend, I want to get the sum of each food:
const request = await customers.aggregate([
{
$group: {
_id: null,
Pizza: {
$sum: '$shoping_list.food.Pizza',
},
// ...and so on, one entry per food
},
},
]);
Is there a way to get (or update) the sums automatically, without manually listing every food in shoping_list?
The document design makes the query look complex, but it is still achievable.
1. $replaceRoot - Replace the input document with a new document.
1.1. $reduce - Iterate over an array and transform it into a new array.
1.2. input - Convert the key-value pairs of the current document ($$ROOT) into an array of objects such as [{ k: "", v: "" }] via $objectToArray.
1.3. initialValue - Start with an empty array; the result is accumulated into it.
1.4. in
1.4.1. $concatArrays - Combine the accumulated array ($$value) with the result of 1.4.2.
1.4.2. $cond - Skip the entry with { k: "_id" }; for every other entry, convert the current object's v.shoping_list.food to an array via $objectToArray.
2. $unwind - Deconstruct the foods array into multiple documents.
3. $group - Group by foods.k and sum foods.v.
db.collection.aggregate([
{
$replaceRoot: {
newRoot: {
foods: {
$reduce: {
input: {
$objectToArray: "$$ROOT"
},
initialValue: [],
in: {
$concatArrays: [
"$$value",
{
$cond: {
if: {
$ne: [
"$$this.k",
"_id"
]
},
then: {
$objectToArray: "$$this.v.shoping_list.food"
},
else: []
}
}
]
}
}
}
}
}
},
{
$unwind: "$foods"
},
{
$group: {
_id: "$foods.k",
sum: {
$sum: "$foods.v"
}
}
}
])
Demo # Mongo Playground
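For comparison, the same "sum every food without naming it" logic can be sketched in plain JavaScript on the backend. This is an illustration only: the `sumFoods` helper is hypothetical, and `customers` simply copies the sample document from the question.

```javascript
// Plain-JavaScript sketch of summing dynamic food keys (no MongoDB required).
const customers = {
  Max: { shoping_list: { food: { Pizza: 2, Ramen: 1, Sushi: 5 } } },
  John: { shoping_list: { food: { Pizza: 2, Ramen: 1, Burger: 1 } } },
};

function sumFoods(doc) {
  const totals = {};
  // Object.entries plays the role of $objectToArray: it yields [key, value] pairs.
  for (const [name, customer] of Object.entries(doc)) {
    if (name === "_id") continue; // skip the document id, as the $cond does
    for (const [food, quantity] of Object.entries(customer.shoping_list.food)) {
      totals[food] = (totals[food] || 0) + quantity;
    }
  }
  return totals;
}

const foodTotals = sumFoods(customers);
console.log(foodTotals);
```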
Using the following code, I do get totalAccount and totalBalance. But, no other field/data is showing up. How can I also get all data from my collection that matches my query (brcode)?
const test = await db.collection('alldeposit').aggregate([
{
$match: {
brcode: brcode
}
},
{
$group: {
_id: null,
totalAccount: {
$sum: 1
},
totalBalance: {
$sum: "$acbal"
}
}
}
]).toArray()
You have to specify which fields you want to see in the $group stage.
For example:
await db.collection('alldeposit').aggregate([
{
$match: {
brcode: brcode
}
},
{
$group: {
_id : null,
name : { $first: '$name' },
age : { $first: '$age' },
sex : { $first: '$sex' },
province : { $first: '$province' },
city : { $first: '$city' },
area : { $first: '$area' },
address : { $first: '$address' },
totalAccount: {
$sum: 1
},
totalBalance: {
$sum: "$acbal"
}
}
}]);
Edit:
Regarding our chat in the comments: unfortunately, I don't know of a way to do the operation you asked about in a single aggregation.
But with two steps, you can do it:
First step:
db.collection.aggregate([
{
$match: {
brcode: brcode
}
},
{
"$group": {
"_id": null,
totalAccount: {
$sum: 1
},
totalBalance: {
$sum: "$acbal"
}
}
}
])
And second step:
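// totalAccount and totalBalance are the values returned by the first step.
// updateMany is used so that every matching document is updated, not just the first.
db.collection.updateMany(
{ brcode: brcode },
{ $set: {
"totalAccount": totalAccount,
"totalBalance": totalBalance
}}
)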
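The two steps above (compute the totals, then attach them to every matching document) can be sketched in plain JavaScript as follows. The `withTotals` helper and the `deposits` sample data are hypothetical; only the field names brcode, acbal, totalAccount and totalBalance come from the question.

```javascript
// Plain-JavaScript sketch of the two-step approach (no MongoDB required).
// `deposits` is hypothetical sample data.
const deposits = [
  { brcode: "001", acbal: 100 },
  { brcode: "001", acbal: 250 },
  { brcode: "002", acbal: 75 },
];

function withTotals(rows, brcode) {
  // Step 1: like $match followed by $group with _id: null.
  const matching = rows.filter((row) => row.brcode === brcode);
  const totalAccount = matching.length;
  const totalBalance = matching.reduce((sum, row) => sum + row.acbal, 0);
  // Step 2: like the $set update, attach the totals to every matching document.
  return matching.map((row) => ({ ...row, totalAccount, totalBalance }));
}

const branchDeposits = withTotals(deposits, "001");
console.log(branchDeposits);
```

Note that the sketch returns new objects rather than modifying the input, whereas the update in the second step writes the totals back to the stored documents.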
I need to count records grouped by tags, but filter them before they are included in the groups.
// in db
{tags: ['video', 'Alex'], ... },
{tags: ['video', 'John'], ... },
{tags: ['video', 'John'], ... },
{tags: ['text', 'Alex'], ... },
{tags: ['text', 'John'], ... },
client.db('mydb').collection('Files').aggregate(
[
{ $group: { _id: { tags: '$tags' }, total: { $sum: 1 } } },
{ $match: { tags: 'video' } },
],
).toArray()
But sadly I get zero docs. If I remove the $group stage, I get 3 docs.
From the original query I expected 2 docs:
{ _id: ['video', 'Alex'], total: 1 },
{ _id: ['video', 'John'], total: 2 }
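In an aggregation, the order of the pipeline stages matters, because the output of each stage is fed to the next one. After $group, the documents only contain _id and total, so a $match on a top-level tags field finds nothing.
Given the expected output, your query is almost there. Just move the $match stage before the $group stage.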
Query:
db.collection.aggregate([
{
$match: {
"tags": "video"
}
},
{
$group: {
_id: {
tags: "$tags"
},
total: {
$sum: 1
}
}
}
]);
Working Example
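The effect of stage order can be reproduced with a small plain-JavaScript sketch. The `groupByTags` and `matchTag` helpers are hypothetical and only imitate the $group and $match stages from the question on the sample documents.

```javascript
// Plain-JavaScript sketch of why $match must come before $group here.
const files = [
  { tags: ["video", "Alex"] },
  { tags: ["video", "John"] },
  { tags: ["video", "John"] },
  { tags: ["text", "Alex"] },
  { tags: ["text", "John"] },
];

// Imitates { $group: { _id: { tags: "$tags" }, total: { $sum: 1 } } }.
function groupByTags(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(row.tags);
    const group = groups.get(key) || { _id: { tags: row.tags }, total: 0 };
    group.total += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

// Imitates { $match: { tags: tag } } on a top-level `tags` array.
function matchTag(rows, tag) {
  return rows.filter((row) => Array.isArray(row.tags) && row.tags.includes(tag));
}

// Grouped documents have no top-level `tags` field, so nothing matches.
const groupThenMatch = matchTag(groupByTags(files), "video");
// Filtering first leaves only the video documents to be grouped.
const matchThenGroup = groupByTags(matchTag(files, "video"));

console.log(groupThenMatch.length, matchThenGroup);
```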
I'm trying to find stocks in the Stock collection where the sum of all owners' shares is less than 100. Here is my schema.
const stockSchema = new mongoose.Schema({
owners: [
{
owner: {
type: Schema.Types.ObjectId,
ref: "Owner"
},
shares: {
type: Number,
min: 0,
max: 100
}
}
]
});
const Stock = mongoose.model("Stock", stockSchema);
I've tried to use aggregate but it returns a single object computed over all stocks in the collection, as opposed to multiple objects with the sum of each stock's shares.
stockSchema.statics.getUnderfundedStocks = async () => {
const result = await Stock.aggregate([
{ $unwind: "$owners" },
{ $group: { _id: null, shares: { $sum: "$owners.shares" } } },
{ $match: { shares: { $lt: 100 } } }
]);
return result;
};
So, rather than getting:
[ { _id: null, shares: 150 } ] from getUnderfundedStocks, I'm looking to get:
[ { _id: null, shares: 90 }, { _id: null, shares: 60 } ].
I've come across $expr, which looks useful, but documentation is scarce and not sure if that's the appropriate path to take.
Edit: Some document examples:
/* 1 */
{
"_id" : ObjectId("5ea699fb201db57b8e4e2e8a"),
"owners" : [
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c72"),
"shares" : 85
}
]
}
/* 2 */
{
"_id" : ObjectId("5ea699fb201db57b8e4e2e1e"),
"owners" : [
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c72"),
"shares" : 20
},
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c73"),
"shares" : 50
},
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c74"),
"shares" : 30
}
]
}
I'd like to return an array that just includes document #1.
You do not need $group here. Simply use $project with the $sum operator, which sums the shares within each document's owners array.
db.collection.aggregate([
{ "$project": {
"shares": { "$sum": "$owners.shares" }
}},
{ "$match": { "shares": { "$lt": 100 } } }
])
Or you can skip aggregation entirely and use $expr in a find query:
db.collection.find({
"$expr": { "$lt": [{ "$sum": "$owners.shares" }, 100] }
})
MongoPlayground
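The per-document sum that both queries rely on can be sketched in plain JavaScript. The `totalShares` helper and the shortened `_id` values are hypothetical; the share values are taken from the two example documents in the question.

```javascript
// Plain-JavaScript sketch of summing shares per stock and filtering on the sum.
// The _id values are shortened, hypothetical stand-ins for the ObjectIds.
const stocks = [
  { _id: "stock1", owners: [{ shares: 85 }] },
  { _id: "stock2", owners: [{ shares: 20 }, { shares: 50 }, { shares: 30 }] },
];

// Like { $sum: "$owners.shares" }: adds up the shares inside one document.
function totalShares(stock) {
  return stock.owners.reduce((sum, owner) => sum + owner.shares, 0);
}

// Like { $expr: { $lt: [ <sum>, 100 ] } }: keep stocks whose total is below 100.
const underfunded = stocks.filter((stock) => totalShares(stock) < 100);

console.log(underfunded.map((stock) => stock._id));
```

Because the sum is computed per document, there is no need to $unwind and regroup, which is what collapsed everything into a single total in the original attempt.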