histogram the result of a histogram - javascript

I have generated a histogram by the following command:
db.mydb.aggregate([{ $bucketAuto: { groupBy: "$userId", buckets: 1e9 } }])
Assuming I have fewer than 1 billion unique users (and sufficient memory), this gives me the count of documents for each user.
User Docs
===== ====
userA 3
userB 1
userC 5
userD 1
I want to take the result of this histogram and pivot to count the number of users for each document count.
The result would look like:
Docs Users
==== =====
1 2
2 0
3 1
4 0
5 1
Is there a simple, functional way of doing this in MongoDB?

You can start with a simple $group stage:
db.col.aggregate([
{
$group: {
_id: "$docs",
count: { $sum: 1 }
}
},
{
$project: {
_id: 0,
docs: "$_id",
users: "$count"
}
},
{
$sort: { docs: 1 }
}
])
This gives the following result:
{ "docs" : 1, "users" : 2 }
{ "docs" : 3, "users" : 1 }
{ "docs" : 5, "users" : 1 }
What is still missing are the document counts that no user has. You can add them either in your application or in MongoDB (shown below):
db.col.aggregate([
{
$group: {
_id: "$docs",
count: { $sum: 1 }
}
},
{
$group: {
_id: null,
histogram: { $push: "$$ROOT" }
}
},
{
$project: {
values: {
$map: {
input: { $range: [ { $min: "$histogram._id" }, { $add: [ { $max: "$histogram._id" }, 1 ] } ] },
in: {
docs: "$$this",
users: {
$let: {
vars: {
current: { $arrayElemAt: [ { $filter: { input: "$histogram", as: "h", cond: { $eq: [ "$$h._id", "$$this" ] } } }, 0 ] }
},
in: {
$ifNull: [ "$$current.count", 0 ]
}
}
}
}
}
}
}
},
{
$unwind: "$values"
},
{
$replaceRoot: {
newRoot: "$values"
}
}
])
The idea is to $group by null, which produces a single document containing all documents from the previous stage. Knowing the $min and $max values, we can generate a $range of numbers and $map that range to either the existing count or the default value of 0. Then $unwind and $replaceRoot give one histogram point per document. Output:
{ "docs" : 1, "users" : 2 }
{ "docs" : 2, "users" : 0 }
{ "docs" : 3, "users" : 1 }
{ "docs" : 4, "users" : 0 }
{ "docs" : 5, "users" : 1 }
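The application-side alternative mentioned above could look roughly like the following. This is only a sketch in plain JavaScript, assuming the sparse result of the first pipeline has already been loaded into an array; `fillGaps` is a hypothetical helper name, not part of any MongoDB API.

```javascript
// Sparse histogram, as returned by the $group / $project / $sort pipeline.
const sparse = [
  { docs: 1, users: 2 },
  { docs: 3, users: 1 },
  { docs: 5, users: 1 },
];

// Hypothetical helper: emit one entry per integer from min to max,
// using 0 for document counts that no user has.
function fillGaps(histogram) {
  if (histogram.length === 0) return [];
  const counts = new Map(histogram.map((h) => [h.docs, h.users]));
  const keys = [...counts.keys()];
  const min = Math.min(...keys);
  const max = Math.max(...keys);
  const dense = [];
  for (let docs = min; docs <= max; docs++) {
    dense.push({ docs, users: counts.get(docs) ?? 0 });
  }
  return dense;
}

const dense = fillGaps(sparse);
console.log(dense);
```

For the sample data this yields the same five points as the pipeline above, with 0 users for 2 and 4 documents.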

mickl's answer definitely got me moving in the right direction. In particular, using $group is a nice improvement over $bucketAuto for this use-case. The trick to layering the histogram was just to use a $group stage more than once within the same aggregate. I guess it's obvious in hindsight.
The complete solution is here:
const h2 = db.mydb.aggregate([
{ $group: { _id: "$userId", count: { $sum: 1 } } },
{ $group: { _id: "$count", count: { $sum: 1 } } },
{ $project: { docs: "$_id", users: "$count" } },
{ $sort: { docs: +1 } }
])
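To make the layering easier to see, here is the same two-pass counting done in memory with plain JavaScript. It is just an illustration on the sample data from the question, not a replacement for the pipeline; `countBy` is a hypothetical helper that plays the role of a `$group` with `{ $sum: 1 }`.

```javascript
// Sample documents; only userId matters here (counts match the question's table).
const documents = [
  ...Array(3).fill({ userId: "userA" }),
  { userId: "userB" },
  ...Array(5).fill({ userId: "userC" }),
  { userId: "userD" },
];

// Hypothetical helper mirroring { $group: { _id: <key>, count: { $sum: 1 } } }.
function countBy(items, keyFn) {
  const counts = new Map();
  for (const item of items) {
    const key = keyFn(item);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// First pass: documents per user. Second pass: users per document count.
const docsPerUser = countBy(documents, (d) => d.userId);
const usersPerDocCount = countBy(docsPerUser.values(), (n) => n);

const histogram = [...usersPerDocCount]
  .map(([docs, users]) => ({ docs, users }))
  .sort((a, b) => a.docs - b.docs);
console.log(histogram);
```

The second call simply feeds the counts produced by the first call back into the same grouping step, which is what the two consecutive $group stages do.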

Related

MongoDb How to group by month and then sort based on month?

I am trying to apply a group by operation based on month from field From_Date and then calculate the sum of Gross_Amount, Net_Amount and Tax_Amount. Have a look at below mongoDB document sample:
{
"Partner_ID" : "0682047456",
"EarningsData" : [
{
"From_Date" : ISODate("2022-01-10T18:30:00.000Z"),
"Gross_Amount" : 300,
"Net_Amount" : 285,
"Tax_Amount" : 15
},
{
"From_Date" : ISODate("2022-10-01T18:30:00.000Z"),
"Gross_Amount" : 1958,
"Net_Amount" : 1860,
"Quantity" : 979,
"Tax_Amount" : 98
},
],
"createdAt" : ISODate("2023-01-23T16:23:02.430Z")
}
Below is the aggregation query which I have written :
var projectQry = [
{
$match: {
"Partner_ID": userId
}
},
{
$unwind: "$EarningsData"
},
{
$group: {
_id: {
$month: "$EarningsData.From_Date"
},
Gross: {
$sum: "$EarningsData.Gross_Amount"
},
Tax: {
$sum: "$EarningsData.Tax_Amount"
},
Net: {
$sum: "$EarningsData.Net_Amount"
},
}
},
{
$project: {
_id: 0,
Month: "$_id",
Gross: 1,
Tax: 1,
Net: 1
}
}
];
Everything works and I get the expected output, but I need to sort that output by month. I tried adding a $sort stage at the end of the pipeline:
{
$sort: {
Month: 1
}
},
The problem is that December of the previous year is sorted after January of the current year.
NOTE: The From_Date field only contains dates from the current year or the previous year, never earlier.
If I understand correctly what you are trying to do, you should group by <year, month> and sort on both fields.
Note:
The sample data in the question is not fully consistent with your pipeline, but the intent is clear.
The aggregation pipeline should look as follows:
db.getCollection("test01").aggregate([
{
$match: {
"Partner_ID": "0682047456"
}
},
{
$unwind: "$EarningsData"
},
{
$group: {
_id: {
year: { $year: "$EarningsData.From_Date", },
month: { $month: "$EarningsData.From_Date" }
},
Gross: {
$sum: "$EarningsData.Gross_Amount"
},
Tax: {
$sum: "$EarningsData.Tax_Amount"
},
Net: {
$sum: "$EarningsData.Net_Amount"
},
}
},
{
$project: {
_id: 0,
Date: "$_id",
Gross: 1,
Tax: 1,
Net: 1
}
},
{
$sort: {
"Date.year": 1,
"Date.month": 1,
}
}
]);
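To see why the month alone is not enough as a sort key, here is a small in-memory sketch in plain JavaScript. The rows and amounts are made up for illustration and only mimic the shape produced by the $project stage above.

```javascript
// Grouped rows in the shape produced by the pipeline (values are illustrative).
const rows = [
  { Date: { year: 2023, month: 1 }, Gross: 300 },
  { Date: { year: 2022, month: 12 }, Gross: 500 },
  { Date: { year: 2022, month: 10 }, Gross: 1958 },
];

// Sorting by month alone puts January 2023 before December 2022.
const byMonthOnly = [...rows].sort((a, b) => a.Date.month - b.Date.month);

// Comparing the year first, then the month, restores chronological order.
const byYearThenMonth = [...rows].sort(
  (a, b) => a.Date.year - b.Date.year || a.Date.month - b.Date.month
);

console.log(byMonthOnly.map((r) => `${r.Date.year}-${r.Date.month}`));
console.log(byYearThenMonth.map((r) => `${r.Date.year}-${r.Date.month}`));
```

The second comparator corresponds to the compound `$sort` on `Date.year` and `Date.month`.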

MongoDB Aggregation - How to get/update sum

For example, my customers collection contains a document like this:
{
Max: {
shoping_list: {
food: { Pizza: 2, Ramen: 1, Sushi: 5 }
}
},
John: {
shoping_list: {
food: { Pizza: 2, Ramen: 1, Burger: 1 }
}
}
}
In my backend, I want to get the sum of each food:
const request = await customers.aggregate([
{
$group: {
_id: null,
Pizza: {
$sum: '$shoping_list.food.Pizza',
},
// ...and so on, one entry per food
},
},
]);
Is there a way to get or update the sums automatically, without manually writing out every food from the shoping_list?
The document design makes the query look complex, but it is still achievable.
1. $replaceRoot - Replace the input document with a new document.
1.1. $reduce - Iterate over the array and transform it into a new array.
1.2. input - Transform the key-value pairs of the current document ($$ROOT) into an array of objects such as: [{ k: "", v: "" }].
1.3. initialValue - Initialize the accumulator with an empty array, so the output is also an array.
1.4. in
1.4.1. $concatArrays - Combine the accumulated array ($$value) with the result of 1.4.2.
1.4.2. $cond - Skip the entry with { k: "_id" }; for every other entry, convert its v.shoping_list.food object into an array via $objectToArray.
2. $unwind - Deconstruct the foods array into multiple documents.
3. $group - Group by foods.k and sum foods.v.
db.collection.aggregate([
{
$replaceRoot: {
newRoot: {
foods: {
$reduce: {
input: {
$objectToArray: "$$ROOT"
},
initialValue: [],
in: {
$concatArrays: [
"$$value",
{
$cond: {
if: {
$ne: [
"$$this.k",
"_id"
]
},
then: {
$objectToArray: "$$this.v.shoping_list.food"
},
else: []
}
}
]
}
}
}
}
}
},
{
$unwind: "$foods"
},
{
$group: {
_id: "$foods.k",
sum: {
$sum: "$foods.v"
}
}
}
])
Demo @ Mongo Playground
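For comparison, the same summation can be done in application code. The sketch below is plain JavaScript over a single document shaped like the one in the question; `sumFoods` is a hypothetical helper and the `_id` value is a placeholder. Unlike the pipeline, it does not aggregate across all documents in the collection.

```javascript
// One document shaped like the sample in the question.
const customer = {
  _id: "example-id", // placeholder; real documents carry an ObjectId
  Max: { shoping_list: { food: { Pizza: 2, Ramen: 1, Sushi: 5 } } },
  John: { shoping_list: { food: { Pizza: 2, Ramen: 1, Burger: 1 } } },
};

// Hypothetical helper: sum every food across all customer keys of the document.
function sumFoods(doc) {
  const totals = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "_id") continue; // skip the document id, as the $cond does
    const food = value.shoping_list?.food ?? {};
    for (const [name, quantity] of Object.entries(food)) {
      totals[name] = (totals[name] ?? 0) + quantity;
    }
  }
  return totals;
}

const foodTotals = sumFoods(customer);
console.log(foodTotals);
```

For the sample document this gives Pizza 4, Ramen 2, Sushi 5 and Burger 1, without naming any food in the code.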

How to get grouped data as well as all data using mongodb?

Using the following code, I do get totalAccount and totalBalance. But, no other field/data is showing up. How can I also get all data from my collection that matches my query (brcode)?
const test = await db.collection('alldeposit').aggregate([
{
$match: {
brcode: brcode
}
},
{
$group: {
_id: null,
totalAccount: {
$sum: 1
},
totalBalance: {
$sum: "$acbal"
}
}
}
]).toArray()
You have to specify in the $group stage which fields you want to see.
For example:
await db.collection('alldeposit').aggregate([
{
$match: {
brcode: brcode
}
},
{
$group: {
_id : null,
name : { $first: '$name' },
age : { $first: '$age' },
sex : { $first: '$sex' },
province : { $first: '$province' },
city : { $first: '$city' },
area : { $first: '$area' },
address : { $first: '$address' },
totalAccount: {
$sum: 1
},
totalBalance: {
$sum: "$acbal"
}
}
}]);
Edit:
Regarding our chat in the comments, unfortunately I don't know a way to do the operation you asked in a single aggregation.
But with two steps, you can do it:
First step:
db.collection.aggregate([
{
$match: {
brcode: brcode
}
},
{
"$group": {
"_id": null,
totalAccount: {
$sum: 1
},
totalBalance: {
$sum: "$acbal"
}
}
}
])
And second step:
// totalAccount and totalBalance hold the values returned by the first step
db.collection.updateMany(
{ brcode: brcode },
{
$set: {
totalAccount: totalAccount,
totalBalance: totalBalance
}
}
)
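One possible single-pipeline variant, offered only as a sketch: collect the matched documents with `$push: "$$ROOT"` in the same $group stage that computes the totals. The helper name `buildDepositSummaryPipeline`, the output field `accounts` and the sample branch code `"B001"` are assumptions made for illustration.

```javascript
// Hypothetical helper; `brcode` is the branch code used in the question.
function buildDepositSummaryPipeline(brcode) {
  return [
    { $match: { brcode } },
    {
      $group: {
        _id: null,
        totalAccount: { $sum: 1 },
        totalBalance: { $sum: "$acbal" },
        // Keep every matched document alongside the totals
        // ("accounts" is an assumed field name).
        accounts: { $push: "$$ROOT" },
      },
    },
  ];
}

const pipeline = buildDepositSummaryPipeline("B001");
console.log(JSON.stringify(pipeline, null, 2));
```

The pipeline could then be passed to `db.collection('alldeposit').aggregate(pipeline).toArray()`. Because everything ends up in one result document, this only works while the matched documents together stay below the 16 MB BSON document limit.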

MongoDB aggregation with filtering in array

I need to count records grouped by tags, filtering the records before they are included in the groups.
// in db
{tags: ['video', 'Alex'], ... },
{tags: ['video', 'John'], ... },
{tags: ['video', 'John'], ... },
{tags: ['text', 'Alex'], ... },
{tags: ['text', 'John'], ... },
client.db('mydb').collection('Files').aggregate(
[
{ $group: { _id: { tags: '$tags' }, total: { $sum: 1 } } },
{ $match: { tags: 'video' } },
],
).toArray()
Sadly, I got zero documents. If I remove the $group stage, I get 3 documents.
With the original query I expected 2 documents:
{ _id: ['video', 'Alex'], total: 1 },
{ _id: ['video', 'John'], total: 2 }
In an aggregation, the order of the stages matters, because the output of each stage is fed to the next one. After $group, the documents only contain _id and total, so there is no top-level tags field left for $match to test.
Your query is almost there. Just move the $match stage before the $group stage.
Query:
db.collection.aggregate([
{
$match: {
"tags": "video"
}
},
{
$group: {
_id: {
tags: "$tags"
},
total: {
$sum: 1
}
}
}
]);
Working Example
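The effect of the stage order can be reproduced in memory. The following plain JavaScript sketch only imitates the two stages on the sample data from the question; `groupByTags` and `hasVideoTag` are hypothetical stand-ins, not MongoDB functions.

```javascript
const files = [
  { tags: ["video", "Alex"] },
  { tags: ["video", "John"] },
  { tags: ["video", "John"] },
  { tags: ["text", "Alex"] },
  { tags: ["text", "John"] },
];

// Rough stand-in for { $group: { _id: { tags: "$tags" }, total: { $sum: 1 } } }.
function groupByTags(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(doc.tags);
    const group = groups.get(key) ?? { _id: { tags: doc.tags }, total: 0 };
    group.total += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

// Rough stand-in for { $match: { tags: "video" } } on a top-level array field.
const hasVideoTag = (doc) => Array.isArray(doc.tags) && doc.tags.includes("video");

// Grouping first leaves no top-level `tags` field, so nothing matches.
const groupThenMatch = groupByTags(files).filter(hasVideoTag);

// Matching first keeps the three video files, which then form two groups.
const matchThenGroup = groupByTags(files.filter(hasVideoTag));

console.log(groupThenMatch.length, matchThenGroup);
```

In the first order the grouped documents carry the tags under `_id.tags`, which is why the filter on `tags` finds nothing.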

Mongo find by sum of subdoc array

I'm trying to find stocks in the Stock collection where the sum of all owners' shares is less than 100. Here is my schema.
const stockSchema = new mongoose.Schema({
owners: [
{
owner: {
type: Schema.Types.ObjectId,
ref: "Owner"
},
shares: {
type: Number,
min: 0,
max: 100
}
}
]
});
const Stock = mongoose.model("Stock", stockSchema);
I've tried to use aggregate but it returns a single object computed over all stocks in the collection, as opposed to multiple objects with the sum of each stock's shares.
stockSchema.statics.getUnderfundedStocks = async () => {
const result = await Stock.aggregate([
{ $unwind: "$owners" },
{ $group: { _id: null, shares: { $sum: "$owners.shares" } } },
{ $match: { shares: { $lt: 100 } } }
]);
return result;
};
So, rather than getting:
[ { _id: null, shares: 150 } ] from getUnderfundedStocks, I'm looking to get:
[ { _id: null, shares: 90 }, { _id: null, shares: 60 } ].
I've come across $expr, which looks useful, but documentation is scarce and I'm not sure whether that's the appropriate path to take.
Edit: Some document examples:
/* 1 */
{
"_id" : ObjectId("5ea699fb201db57b8e4e2e8a"),
"owners" : [
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c72"),
"shares" : 85
}
]
}
/* 2 */
{
"_id" : ObjectId("5ea699fb201db57b8e4e2e1e"),
"owners" : [
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c72"),
"shares" : 20
},
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c73"),
"shares" : 50
},
{
"owner" : ObjectId("5ea62a94ccb1b974d40a2c74"),
"shares" : 30
}
]
}
I'd like to return an array that just includes document #1.
You do not need $group here. Simply use $project with the $sum operator, which adds up the shares within each document's owners array.
db.collection.aggregate([
{ "$project": {
"shares": { "$sum": "$owners.shares" }
}},
{ "$match": { "shares": { "$lt": 100 } } }
])
Alternatively, you do not need aggregation at all:
db.collection.find({
"$expr": { "$lt": [{ "$sum": "$owners.shares" }, 100] }
})
MongoPlayground
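The per-document logic can be checked with a small in-memory sketch in plain JavaScript. The two stocks mirror the example documents from the question, with simplified `_id` values and without the owner references; `totalShares` is a hypothetical helper.

```javascript
// Simplified versions of the two example documents.
const stocks = [
  { _id: 1, owners: [{ shares: 85 }] },
  { _id: 2, owners: [{ shares: 20 }, { shares: 50 }, { shares: 30 }] },
];

// Sum the shares inside one stock's owners array (per document, not per collection).
const totalShares = (stock) =>
  (stock.owners ?? []).reduce((sum, owner) => sum + (owner.shares ?? 0), 0);

// Keep only stocks whose owners hold fewer than 100 shares in total.
const underfunded = stocks.filter((stock) => totalShares(stock) < 100);
console.log(underfunded.map((stock) => stock._id));
```

Only the first stock is kept: its owners hold 85 shares, while the second stock's owners hold exactly 100.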
