I want to group by name, then find the percentage of "fill" documents relative to the total number of documents per name. The data is given below (fill: 0 means not filled):
{"name":"Raj","fill":0}
{"name":"Raj","fill":23}
{"name":"Raj","fill":0}
{"name":"Raj","fill":43}
{"name":"Rahul","fill":0}
{"name":"Rahul","fill":23}
{"name":"Rahul","fill":0}
{"name":"Rahul","fill":43}
{"name":"Rahul","fill":43}
{"name":"Rahul","fill":43}
Expected result:
{
"name":"Raj",
fillcount:2,
fillpercentagetototaldocument: 50% // 2 (fill count, excluding 0 values) divided by 4 (total documents for Raj)
}
{
"name":"Rahul",
fillcount:4,
fillpercentagetototaldocument: 66% // 4 (fill count, excluding 0 values) divided by 6 (total documents for Rahul)
}
You want to use $group combined with a conditional count, like so:
db.collection.aggregate([
{
$group: {
_id: "$name",
total: {
$sum: 1
},
fillcount: {
$sum: {
$cond: [
{
$ne: [
"$fill",
0
]
},
1,
0
]
}
}
}
},
{
$project: {
_id: 0,
name: "$_id",
fillcount: 1,
fillpercentagetototaldocument: {
"$multiply": [
{
"$divide": [
"$fillcount",
"$total"
]
},
100
]
}
}
}
])
Mongo Playground
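If you want a whole-number percentage like the 50% / 66% in the expected result, you could truncate the computed value, for example by wrapping it in $trunc in the final stage (a small variation of the $project above; $round would be the alternative if you prefer rounding):
{
  $project: {
    _id: 0,
    name: "$_id",
    fillcount: 1,
    fillpercentagetototaldocument: {
      // truncate 66.66... to 66, matching the expected output
      $trunc: {
        $multiply: [
          { $divide: ["$fillcount", "$total"] },
          100
        ]
      }
    }
  }
}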
You can use MongoDB's aggregation framework for this. For example, the following groups by name and returns the sum of the fill values together with the number of documents per name:
db.getCollection('CollectionName').aggregate([
{
$group: {
_id: { name: '$name'},
fillpercentagetototaldocument: { $sum: '$fill' },
fillCount:{$sum:1}
},
},
{ $sort: { fillpercentagetototaldocument: -1 } },
]);
The result will look like this afterwards:
[
{
"_id" : {
"name" : "Rahul"
},
"fillpercentagetototaldocument" : 152,
"fillCount" : 6.0
},
{
"_id" : {
"name" : "Raj"
},
"fillpercentagetototaldocument" : 66,
"fillCount" : 4.0
}
]
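Note that fillpercentagetototaldocument in this output is just the sum of the fill values, not a percentage. If you need the percentage of non-zero fill documents as in the expected result, you would still need a conditional count plus a $project, along the lines of the first pipeline, for example:
db.getCollection('CollectionName').aggregate([
  {
    $group: {
      _id: { name: '$name' },
      total: { $sum: 1 },
      fillCount: { $sum: { $cond: [{ $ne: ['$fill', 0] }, 1, 0] } }
    }
  },
  {
    $project: {
      _id: 0,
      name: '$_id.name',
      fillCount: 1,
      fillpercentagetototaldocument: {
        $multiply: [{ $divide: ['$fillCount', '$total'] }, 100]
      }
    }
  }
]);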
I have a series of documents in MongoDB that look like this:
{
"_id" : ObjectId("63ceb466db8c0f5500ea0aaa"),
"Partner_ID" : "662347848",
"EarningsData" : [
{
"From_Date" : ISODate("2022-01-10T18:30:00.000Z"),
"Scheme_Name" : "CUSTOMERWINBACKJCA01",
"Net_Amount" : 256,
},
{
"From_Date" : ISODate("2022-02-10T18:30:00.000Z"),
"Scheme_Name" : "CUSTOMERWINBACKJCA01",
"Net_Amount" : 285,
}
],
"createdAt" : ISODate("2023-01-23T16:23:02.440Z")
}
Now, what I need to do is to get the sum of Net_Amount per Scheme_Name per month of From_Date for the specific Partner_ID.
For the above document, the output will look something like this:
[
  {
    "Month" : 1,
    "Scheme_Name" : 'CUSTOMERWINBACKJCA01',
    "Net_Amount" : 256
  },
  {
    "Month" : 2,
    "Scheme_Name" : 'CUSTOMERWINBACKJCA01',
    "Net_Amount" : 285
  }
]
I have tried to implement the aggregation pipeline and was successfully able to get the sum of Net_Amount per Scheme_Name but I am not able to figure out how to integrate the per month of From_Date logic.
Below is the query sample:
var projectQry = [
{
"$unwind": {
path : '$EarningsData',
preserveNullAndEmptyArrays: true
}
},
{
$match: {
"Partner_ID": userId
}
},
{
$group : {
_id: "$EarningsData.Scheme_Name",
Net_Amount: {
$sum: "$EarningsData.Net_Amount"
}
}
},
{
$project: {
_id: 0,
Scheme_Name: "$_id",
Net_Amount: 1
}
}
];
You need to fix a few issues:
$match: move this stage first for better performance; it can use an index if you have created one
$unwind: you don't need the preserveNullAndEmptyArrays option here (it only keeps documents whose array is missing, null, or empty)
$group: group by Scheme_Name and the month of From_Date, and sum Net_Amount with the $sum operator
$project: show the required fields
db.collection.aggregate([
{ $match: { "Partner_ID": "662347848" } },
{ $unwind: "$EarningsData" },
{
$group: {
_id: {
Scheme_Name: "$EarningsData.Scheme_Name",
Month: {
$month: "$EarningsData.From_Date"
}
},
Net_Amount: {
$sum: "$EarningsData.Net_Amount"
}
}
},
{
$project: {
_id: 0,
Net_Amount: 1,
Scheme_Name: "$_id.Scheme_Name",
Month: "$_id.Month"
}
}
])
Playground
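One caveat: $month alone merges the same calendar month from different years (e.g. January 2022 and January 2023) into one group. If the data can span multiple years and you want them kept apart, you could group on a year-month string instead, for example with $dateToString (a sketch reusing the same field names as above):
db.collection.aggregate([
  { $match: { "Partner_ID": "662347848" } },
  { $unwind: "$EarningsData" },
  {
    $group: {
      _id: {
        Scheme_Name: "$EarningsData.Scheme_Name",
        // "2022-01", "2022-02", ... instead of a bare month number
        Month: { $dateToString: { format: "%Y-%m", date: "$EarningsData.From_Date" } }
      },
      Net_Amount: { $sum: "$EarningsData.Net_Amount" }
    }
  },
  {
    $project: {
      _id: 0,
      Net_Amount: 1,
      Scheme_Name: "$_id.Scheme_Name",
      Month: "$_id.Month"
    }
  }
])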
I have a set of documents (messages) in a MongoDB collection, as below. I want to keep only the latest 500 records for each individual user pair. Users are identified by sentBy and sentTo.
/* 1 */
{
"_id" : ObjectId("5f1c1b00c62e9b9aafbe1d6c"),
"sentAt" : ISODate("2020-07-25T11:44:00.004Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54d6732319f899c704b21ef7"),
"sentTo" : ObjectId("54d6732319f899c704b21ef5"),
}
/* 2 */
{
"_id" : ObjectId("5f1c1b3cc62e9b9aafbe1d6d"),
"sentAt" : ISODate("2020-07-25T11:45:00.003Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54d6732319f899c704b21ef9"),
"sentTo" : ObjectId("54d6732319f899c704b21ef8"),
}
/* 3 */
{
"_id" : ObjectId("5f1c1b78c62e9b9aafbe1d6e"),
"sentAt" : ISODate("2020-07-25T11:46:00.003Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54d6732319f899c704b21ef6"),
"sentTo" : ObjectId("54d6732319f899c704b21ef8"),
}
/* 4 */
{
"_id" : ObjectId("5f1c1c2e1449dd9bbef28575"),
"sentAt" : ISODate("2020-07-25T11:49:02.012Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54cfcf93e2b8994c25077924"),
"sentTo" : ObjectId("54d6732319f899c704b21ef5"),
}
/* and so on... assume 10k+ documents */
The algorithm that came to my mind is:
Group first based on an OR condition (the pair can appear as sentBy/sentTo in either direction)
Sort the records in descending order by time
Limit to 500
Get the array of _id values that should be preserved
Pass those IDs to a new .deleteMany() query with a $nin condition
Please help, I have struggled a lot with this and have not had any success. Many thanks :)
Depending on scale I would do one of the following two things:
Assuming the scale is somewhat low and you can actually group the entire collection in a reasonable time, I would do something similar to what you suggested:
db.collection.aggregate([
{
$sort: {
sentAt: 1
}
},
{
$group: {
_id: {
$cond: [
{$gt: ["$sentBy", "$sentTo"]},
["$sentBy", "$sentTo"],
["$sentTo", "$sentBy"],
]
},
roots: {$push: "$$ROOT"}
}
},
{
$project: {
roots: {$slice: ["$roots", -500]}
}
},
{
$unwind: "$roots"
},
{
$replaceRoot: {
newRoot: "$roots"
}
},
{
$out: "this_collection"
}
])
The sort stage has to come first because you can't sort an inner array after the $group. The $cond in the group stage builds a single, order-independent key for the user pair (both A→B and B→A messages end up under the same [larger, smaller] key), since $or can't be used there. Finally, instead of retrieving the result and then using deleteMany with $nin, you can just use $out to rewrite the current collection.
If the scale is way too big to support this, then you should just iterate user by user and do what you suggested at first. Here is a quick example:
let userIds = await db.collection.distinct("sentBy");
let done = [1];
for (let i = 0; i < userIds.length; i++) {
let matches = await db.collection.aggregate([
{
$match: {
$and: [
{
$or: [
{
"sentTo": userIds[i]
},
{
"sentBy": userIds[i]
}
]
},
{ // this is not necessary; it's just to avoid processing the same pair twice (once as YxZ and once as ZxY)
$or: [
{
sentTo: {$nin: done}
},
{
sentBy: {$nin: done}
}
]
}
]
}
},
{
$sort: {
sentAt: 1
}
},
{
$group: {
_id: {
$cond: [
{$eq: ["$sentBy", userIds[i]]},
"$sentTo",
"$sentBy"
]
},
roots: {$push: "$$ROOT"}
}
},
{
$project: {
roots: {$slice: ["$roots", -500]}
}
},
{
$unwind: "$roots"
},
{
$group: {
_id: null,
keepers: {$push: "$roots._id"}
}
}
]).toArray();
if (matches.length) {
await db.collection.deleteMany(
{
$and: [
{
$or: [
{
"sentTo": userIds[i]
},
{
"sentBy": userIds[i]
}
]
},
{ // this is only necessary if you used it above.
$or: [
{
sentTo: {$nin: done}
},
{
sentBy: {$nin: done}
}
]
},
{
_id: {$nin: matches[0].keepers}
}
]
}
)
}
done.push(userIds[i])
}
I have documents with the following structure.
{
name: "John Doe",
City : "OK",
Prepaid: "Y"
},
{
name: "Jane Doe",
City : "CA",
Prepaid: "N"
},
{
name: "Jule Doe",
City : "OK",
Prepaid: "N"
},
{
name: "Jake Doe",
City : "OK",
Prepaid: "Y"
}
I would like to group these first by City and then by Prepaid, and get individual counts for each Prepaid type, something similar to this:
{
City : OK
Count : {
"filter": prepaid,
"count": {
Y : 2
N: 1
}
}
}
{
City : CA
Count : {
"filter": prepaid,
"count": {
Y : 0
N: 1
}
}
}
I tried aggregating on multiple fields, but it gives me the total count of documents and not the breakdown.
Here's what I tried for my aggregation pipeline:
db.collection.aggregate([
  { $match: matchquery },
  { $group: { _id: { city: '$city', prepaid: '$prepaid' }, count: { $sum: 1 } } }
])
You can run $group twice to count by prepaid first and then you can apply $arrayToObject to get Y/N as object keys:
db.collection.aggregate([
{
$group: {
_id: { city: "$City", prepaid: "$Prepaid" },
total: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.city",
Count: {
$push: {
k: "$_id.prepaid", v: "$total"
}
}
}
},
{
$project: {
_id: 0,
city: "$_id",
Count: { $mergeObjects: [ { filter: "prepaid" }, { count: { $arrayToObject: "$Count" } } ] }
}
}
])
Mongo Playground
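Note that $arrayToObject only creates keys for values that actually occur, so the CA group would come out as count: { N: 1 } with no Y entry. If you always want both keys present (as in the expected output with Y: 0), one option is to merge the computed counts onto a defaults object in the last stage, e.g.:
{
  $project: {
    _id: 0,
    city: "$_id",
    Count: {
      $mergeObjects: [
        { filter: "prepaid" },
        {
          count: {
            // assumed defaults; keys found in the data overwrite the zeros
            $mergeObjects: [{ Y: 0, N: 0 }, { $arrayToObject: "$Count" }]
          }
        }
      ]
    }
  }
}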
{
"_id" : ObjectId("15672"),
"userName" : "4567",
"library" : [
{
"serialNumber" : "Book_1"
},
{
"serialNumber" : "Book_2"
},
{
"serialNumber" : "Book_4"
}
]
},
{
"_id" : ObjectId("123456"),
"userName" : "123",
"library" : [
{
"serialNumber" : "Book_2"
}
]
},
{
"_id" : ObjectId("1835242"),
"userName" : "13526",
"library" : [
{
"serialNumber" : "Book_7"
},
{
"serialNumber" : "Book_6"
},
{
"serialNumber" : "Book_5"
},
{
"serialNumber" : "Book_4"
},
{
"serialNumber" : "Book_3"
},
{
"serialNumber" : "Book_5"
}
]
}
I want a query that gives me the userName of documents whose library contains duplicate serialNumber values. A serial number present in one user's library may also appear in another user's library; it just must not appear more than once within the same user's library.
Try this query :
db.collection.aggregate([
/** First match stage is optional if all of your docs are of type array & not empty */
{ $match: { $expr: { $and: [{ $eq: [{ $type: "$library" }, "array"] }, { $ne: ["$library", []] }] } } },
/** Add a new field allUnique to each doc, will be false where if elements in library have duplicates */
{
$addFields: {
allUnique: {
$eq: [
{
$size:
{
$reduce: {
input: "$library.serialNumber",
initialValue: [], // start with empty array
/** iterate over serialNumber's array from library & push current value if it's not there in array, at the end reduce would produce an array with uniques */
in: { $cond: [{ $in: ["$$this", "$$value"] }, [], { $concatArrays: [["$$this"], "$$value"] }] }
}
}
},
{
$size: "$library"
}
]
}
}
},
/** get docs where allUnique: false */
{
$match: {
allUnique: false
}
},
/** Project only needed fields & remove _id which is bydefault projected */
{
$project: {
userName: 1,
_id: 0
}
}
])
Another option would be to do this via $unwind, but that is not preferable on huge datasets since it multiplies the number of documents in the pipeline.
Test : MongoDB-Playground
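For reference, a rough sketch of that $unwind variant could look like the following (same collection and field names as above); it flags a document when any serialNumber appears more than once in its own library:
db.collection.aggregate([
  { $unwind: "$library" },
  {
    // count each serialNumber per source document
    $group: {
      _id: { doc: "$_id", serialNumber: "$library.serialNumber" },
      userName: { $first: "$userName" },
      count: { $sum: 1 }
    }
  },
  // keep only serial numbers that occur more than once within the same document
  { $match: { count: { $gt: 1 } } },
  // collapse back to one row per offending document
  { $group: { _id: "$_id.doc", userName: { $first: "$userName" } } },
  { $project: { _id: 0, userName: 1 } }
])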
Or, following @Dennis's answer in duplicate-entries-from-an-array, you can use $setUnion as below; since $setUnion removes duplicates, comparing its size with the original array size reveals whether there are duplicates:
db.collection.aggregate([
{
$match: {
$expr: {
$and: [
{
$eq: [
{
$type: "$library"
},
"array"
]
},
{
$ne: [
"$library",
[]
]
}
]
}
}
},
{
$addFields: {
allUnique: {
$eq: [
{
$size: {
"$setUnion": [
"$library.serialNumber",
[]
]
}
},
{
$size: "$library"
}
]
}
}
},
{
$match: {
allUnique: false
}
},
{
$project: {
userName: 1,
_id: 0
}
}
])
Test : MongoDB-Playground
My MongoDB data is like this; I want to filter memoryLine.
{
"_id" : ObjectId("5e36950f65fae21293937594"),
"userId" : "5e33ee0b4a3895a6d246f3ee",
"notes" : [
{
"noteId" : ObjectId("5e36953665fae212939375a0"),
"time" : ISODate("2020-02-02T17:24:06.460Z"),
"memoryLine" : [
{
"_id" : ObjectId("5e36953665fae212939375ab"),
"memoryTime" : ISODate("2020-02-03T17:54:06.460Z")
},
{
"_id" : ObjectId("5e36953665fae212939375aa"),
"memoryTime" : ISODate("2020-02-03T05:24:06.460Z")
}
]
}
]}
I want to get the items whose memoryTime is greater than now, with the expected result like this:
"userId" : "5e33ee0b4a3895a6d246f3ee",
"notes" : [
{
"noteId" : ObjectId("5e36953665fae212939375a0"),
"time" : ISODate("2020-02-02T17:24:06.460Z"),
"memoryLine" : [
{
"_id" : ObjectId("5e36953665fae212939375ab"),
"memoryTime" : ISODate("2020-02-03T17:54:06.460Z")
},
{
"_id" : ObjectId("5e36953665fae212939375aa"),
"memoryTime" : ISODate("2020-02-03T05:24:06.460Z")
}
]
}]
So I use the code below, with a $filter on memoryLine to get the right items.
aggregate([{
$match: {
"$and": [
{ userId: "5e33ee0b4a3895a6d246f3ee"},
]
}
}, {
$project: {
userId: 1,
notes: {
noteId: 1,
time: 1,
memoryLine: {
$filter: {
input: "$memoryLine",
as: "mLine",
cond: { $gt: ["$$mLine.memoryTime", new Date(new Date().getTime() + 8 * 1000 * 3600)] }
}
}
}
}
}]).then(doc => {
res.json({
code: 200,
message: 'success',
result: doc
})
});
But I got the result below, where memoryLine is null. Why? I tried changing $gt to $lt, but I also got null.
"userId" : "5e33ee0b4a3895a6d246f3ee",
"notes" : [
{
"noteId" : ObjectId("5e36953665fae212939375a0"),
"time" : ISODate("2020-02-02T17:24:06.460Z"),
"memoryLine" : null <<<------------- here is not right
}]
Your $filter returns null because input: "$memoryLine" refers to a top-level memoryLine field, which doesn't exist; the array lives inside each element of notes, so $filter receives nothing and yields null. You can use $addFields to replace the existing field, with $map for the outer notes array and $filter for the inner memoryLine array:
db.collection.aggregate([
{
$addFields: {
notes: {
$map: {
input: "$notes",
in: {
$mergeObjects: [
"$$this",
{
memoryLine: {
$filter: {
input: "$$this.memoryLine",
as: "ml",
cond: {
$gt: [ "$$ml.memoryTime", new Date() ]
}
}
}
}
]
}
}
}
}
}
])
$mergeObjects is used to avoid having to repeat the other fields (noteId, time) of each notes element.
Mongo Playground