I have a data in profile collection
[
{
name: "Harish",
gender: "Male",
caste: "Vokkaliga",
education: "B.E"
},
{
name: "Reshma",
gender: "Female",
caste: "Vokkaliga",
education: "B.E"
},
{
name: "Rangnath",
gender: "Male",
caste: "Lingayath",
education: "M.C.A"
},
{
name: "Lakshman",
gender: "Male",
caste: "Lingayath",
education: "B.Com"
},
{
name: "Reshma",
gender: "Female",
caste: "Lingayath",
education: "B.E"
}
]
here I need to calculate total Number of different gender, total number of different caste and total number of different education.
Expected o/p
{
gender: [{
name: "Male",
total: "3"
},
{
name: "Female",
total: "2"
}],
caste: [{
name: "Vokkaliga",
total: "2"
},
{
name: "Lingayath",
total: "3"
}],
education: [{
name: "B.E",
total: "3"
},
{
name: "M.C.A",
total: "1"
},
{
name: "B.Com",
total: "1"
}]
}
using mongodb aggregation how can I get the expected result.
There are different approaches depending on the version available, but they all essentially break down to transforming your document fields into separate documents in an "array", then "unwinding" that array with $unwind and doing successive $group stages in order to accumulate the output totals and arrays.
MongoDB 3.4.4 and above
Latest releases have special operators like $arrayToObject and $objectToArray which can make transfer to the initial "array" from the source document more dynamic than in earlier releases:
db.profile.aggregate([
{ "$project": {
"_id": 0,
"data": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$in": [ "$$this.k", ["gender","caste","education"] ] }
}
}
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": "$data",
"total": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
])
So using $objectToArray you make the initial document into an array of it's keys and values as "k" and "v" keys in the resulting array of objects. We apply $filter here in order to select by "key". Here using $in with a list of keys we want, but this could be more dynamically used as a list of keys to "exclude" where that was shorter. It's just using logical operators to evaluate the condition.
The end stage here uses $replaceRoot and since all our manipulation and "grouping" in between still keeps that "k" and "v" form, we then use $arrayToObject here to promote our "array of objects" in result to the "keys" of the top level document in output.
MongoDB 3.6 $mergeObjects
As an extra wrinkle here, MongoDB 3.6 includes $mergeObjects which can be used as an "accumulator" in a $group pipeline stage as well, thus replacing the $push and making the final $replaceRoot simply shifting the "data" key to the "root" of the returned document instead:
db.profile.aggregate([
{ "$project": {
"_id": 0,
"data": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$in": [ "$$this.k", ["gender","caste","education"] ] }
}
}
}},
{ "$unwind": "$data" },
{ "$group": { "_id": "$data", "total": { "$sum": 1 } }},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": {
"$mergeObjects": {
"$arrayToObject": [
[{ "k": "$_id", "v": "$v" }]
]
}
}
}},
{ "$replaceRoot": { "newRoot": "$data" } }
])
This is not really that different to what is being demonstrated overall, but simply demonstrates how $mergeObjects can be used in this way and may be useful in cases where the grouping key was something different and we did not want that final "merge" to the root space of the object.
Note that the $arrayToObject is still needed to transform the "value" back into the name of the "key", but we just do it during the accumulation rather than after the grouping, since the new accumulation allows the "merge" of keys.
MongoDB 3.2
Taking it back a version or even if you have a MongoDB 3.4.x that is less than the 3.4.4 release, we can still use much of this but instead we deal with the creation of the array in a more static fashion, as well as handling the final "transform" on output differently due to the aggregation operators we don't have:
db.profile.aggregate([
{ "$project": {
"data": [
{ "k": "gender", "v": "$gender" },
{ "k": "caste", "v": "$caste" },
{ "k": "education", "v": "$education" }
]
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": "$data",
"total": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
/*
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
*/
]).map( d =>
d.data.map( e => ({ [e.k]: e.v }) )
.reduce((acc,curr) => Object.assign(acc,curr),{})
)
This is exactly the same thing, except instead of having a dynamic transform of the document into the array, we actually "explicitly" assign each array member with the same "k" and "v" notation. Really just keeping those key names for convention at this point since none of the aggregation operators here depend on that at all.
Also instead of using $replaceRoot, we just do exactly the same thing as what the previous pipeline stage implementation was doing there but in client code instead. All MongoDB drivers have some implementation of cursor.map() to enable "cursor transforms". Here with the shell we use the basic JavaScript functions of Array.map() and Array.reduce() to take that output and again promote the array content to being the keys of the top level document returned.
MongoDB 2.6
And falling back to MongoDB 2.6 to cover the versions in between, the only thing that changes here is the usage of $map and a $literal for input with the array declaration:
db.profile.aggregate([
{ "$project": {
"data": {
"$map": {
"input": { "$literal": ["gender","caste", "education"] },
"as": "k",
"in": {
"k": "$$k",
"v": {
"$cond": {
"if": { "$eq": [ "$$k", "gender" ] },
"then": "$gender",
"else": {
"$cond": {
"if": { "$eq": [ "$$k", "caste" ] },
"then": "$caste",
"else": "$education"
}
}
}
}
}
}
}
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": "$data",
"total": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.k",
"v": {
"$push": { "name": "$_id.v", "total": "$total" }
}
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
/*
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
*/
])
.map( d =>
d.data.map( e => ({ [e.k]: e.v }) )
.reduce((acc,curr) => Object.assign(acc,curr),{})
)
Since the basic idea here is to "iterate" a provided array of the field names, the actual assignment of values comes by "nesting" the $cond statements. For three possible outcomes this means only a single nesting in order to "branch" for each outcome.
Modern MongoDB from 3.4 have $switch which makes this branching simpler, yet this demonstrates the logic was always possible and the $cond operator has been around since the aggregation framework was introduced in MongoDB 2.2.
Again, the same transformation on the cursor result applies as there is nothing new there and most programming languages have the ability to do this for years, if not from inception.
Of course the basic process can even be done way back to MongoDB 2.2, but just applying the array creation and $unwind in a different way. But no-one should be running any MongoDB under 2.8 at this point in time, and official support even from 3.0 is even fast running out.
Output
For visualization, the output of all demonstrated pipelines here has the following form before the last "transform" is done:
/* 1 */
{
"_id" : null,
"data" : [
{
"k" : "gender",
"v" : [
{
"name" : "Male",
"total" : 3.0
},
{
"name" : "Female",
"total" : 2.0
}
]
},
{
"k" : "education",
"v" : [
{
"name" : "M.C.A",
"total" : 1.0
},
{
"name" : "B.E",
"total" : 3.0
},
{
"name" : "B.Com",
"total" : 1.0
}
]
},
{
"k" : "caste",
"v" : [
{
"name" : "Lingayath",
"total" : 3.0
},
{
"name" : "Vokkaliga",
"total" : 2.0
}
]
}
]
}
And then either by the $replaceRoot or the cursor transform as demonstrated the result becomes:
/* 1 */
{
"gender" : [
{
"name" : "Male",
"total" : 3.0
},
{
"name" : "Female",
"total" : 2.0
}
],
"education" : [
{
"name" : "M.C.A",
"total" : 1.0
},
{
"name" : "B.E",
"total" : 3.0
},
{
"name" : "B.Com",
"total" : 1.0
}
],
"caste" : [
{
"name" : "Lingayath",
"total" : 3.0
},
{
"name" : "Vokkaliga",
"total" : 2.0
}
]
}
So whilst we can put some new and fancy operators into the aggregation pipeline where we have those available, the most common use case is in these "end of pipeline transforms" in which case we may as well simply do the same transformation on each document in the cursor results returned instead.
Related
I have a lot of documents with many attributes. After a specific $match pass, I end up with a subsection. Here it is simplified:
[
{"name": "foo", "code": "bbb"},
{"name": "foo", "code": "aaa"},
{"name": "foo", "code": "aaa"},
{"name": "foo", "code": "aaa"},
{"name": "bar", "code": "aaa"},
{"name": "bar", "code": "aaa"},
{"name": "bar", "code": "aaa"},
{"name": "baz", "code": "aaa"},
{"name": "baz", "code": "aaa"}
]
I would like to count the occurances of certain attributes so I end up with this (simplified):
{
"name": {
"foo": 4,
"bar": 3,
"baz": 2
},
"code": {
"bbb": 1,
"aaa": 8
}
}
(Or something close that I can 'translate' afterwards with Node.js)
I already do a $group stage to count other attributes (differently). Ideally I would $addToSet and also count how many times a similar value was added to the set. But I cannot figure out how.
Alternatively I was thinking to $push to end up with this (simplified):
{
"name": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "baz", "baz"],
"code": ["bbb", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", ]
}
But I can't figure out how to turn it into (something close to) the above hypothetical result either.
For single fields alone, the closest I can come is by using the above $push and then I can use $group:
"$group": {
"_id": {"_id": "$_id", "name": "$name"},
"nameCount": {"$sum": 1}
}
Now I have _id.name and nameCount. But I have lost all the previously counted attributes, 20 or so.
Is there a way to do (something close to) what I want?
Note: Using MongoDB 3.2
For MongoDB 3.2 you are pretty much limited to mapReduce if you want to return the "data" values as "keys" in a returned document. There is however the case to consider that you actually "do not need" MongoDB to do that part for you. But to consider the approaches:
Map Reduce
db.stuff.mapReduce(
function() {
emit(null, {
name: { [this.name]: 1 },
code: { [this.code]: 1 }
})
},
function(key,values) {
let obj = { name: {}, code: {} };
values.forEach(value => {
['name','code'].forEach(key => {
Object.keys(value[key]).forEach(k => {
if (!obj[key].hasOwnProperty(k))
obj[key][k] = 0;
obj[key][k] += value[key][k];
})
})
});
return obj;
},
{ "out": { "inline": 1 } }
)
Returns:
{
"_id" : null,
"value" : {
"name" : {
"foo" : 4.0,
"bar" : 3.0,
"baz" : 2.0
},
"code" : {
"bbb" : 1.0,
"aaa" : 8.0
}
}
}
Aggregate
For MongoDB 3.4 and upwards, you can use $arrayToObject to reshape as "key/value" objects. And a bit more efficiently than simply using $push to make two large arrays which would almost certainly break the BSON limit in real world cases.
This "more or less" mirrors the mapReduce() operations:
db.stuff.aggregate([
{ "$project": {
"_id": 0,
"data": [
{ "k": "name", "v": { "k": "$name", "count": 1 } },
{ "k": "code", "v": { "k": "$code", "count": 1 } }
]
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": { "k": "$data.k", "v": "$data.v.k" },
"count": { "$sum": "$data.v.count" }
}},
{ "$group": {
"_id": "$_id.k",
"v": { "$push": { "k": "$_id.v", "v": "$count" } }
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$map": {
"input": "$data",
"in": {
"k": "$$this.k",
"v": { "$arrayToObject": "$$this.v" }
}
}
}
}
}}
])
Which has similar output ( without forcing ordering of keys by applying $sort ):
{
"code" : {
"bbb" : 1.0,
"aaa" : 8.0
},
"name" : {
"baz" : 2.0,
"foo" : 4.0,
"bar" : 3.0
}
}
So it's only really in the final stage where we actually use the new features, and the output up to that point is pretty similar, and would be easy to reshape in code:
{
"_id" : null,
"data" : [
{
"k" : "code",
"v" : [
{
"k" : "bbb",
"v" : 1.0
},
{
"k" : "aaa",
"v" : 8.0
}
]
},
{
"k" : "name",
"v" : [
{
"k" : "baz",
"v" : 2.0
},
{
"k" : "foo",
"v" : 4.0
},
{
"k" : "bar",
"v" : 3.0
}
]
}
]
}
So in fact we can do just that:
db.stuff.aggregate([
{ "$project": {
"_id": 0,
"data": [
{ "k": "name", "v": { "k": "$name", "count": 1 } },
{ "k": "code", "v": { "k": "$code", "count": 1 } }
]
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": { "k": "$data.k", "v": "$data.v.k" },
"count": { "$sum": "$data.v.count" }
}},
{ "$group": {
"_id": "$_id.k",
"v": { "$push": { "k": "$_id.v", "v": "$count" } }
}},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$v" } }
}},
/*
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$map": {
"input": "$data",
"in": {
"k": "$$this.k",
"v": { "$arrayToObject": "$$this.v" }
}
}
}
}
}}
*/
]).map( doc =>
doc.data.map( d => ({
k: d.k,
v: d.v.reduce((acc,curr) =>
Object.assign(acc,{ [curr.k]: curr.v })
,{}
)
})).reduce((acc,curr) =>
Object.assign(acc,{ [curr.k]: curr.v })
,{}
)
)
Which just goes to show that simply because the aggregation framework does not have the features to use "named keys" in output for earlier versions, you generally do not need them. Since the only place we actually used the new features was in the "final" stage, but we can easily do the same by simply reshaping the final output in client code.
And of course, it's the same result:
[
{
"code" : {
"bbb" : 1.0,
"aaa" : 8.0
},
"name" : {
"baz" : 2.0,
"foo" : 4.0,
"bar" : 3.0
}
}
]
So it helps to learn the lesson of exactly "where" you actually need to apply such transformations. Here it's at the "end" since we do not need that during any "aggregation" stage, and thus you simply reshape the results that can be optimally provided from the aggregation framework itself.
The Bad Ways
As noted, your attempt so far may be fine for small data, but in most real world cases "pushing" all the items in a collection into a single document without reduction is going to break the 16MB BSON Limit.
Where it would actually stay under, then you can use something like this monster with $reduce:
db.stuff.aggregate([
{ "$group": {
"_id": null,
"name": { "$push": "$name" },
"code": { "$push": "$code" }
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$map": {
"input": [
{ "k": "name", "v": "$name" },
{ "k": "code", "v": "$code" }
],
"as": "m",
"in": {
"k": "$$m.k",
"v": {
"$arrayToObject": {
"$reduce": {
"input": "$$m.v",
"initialValue": [],
"in": {
"$cond": {
"if": {
"$in": [
"$$this",
{ "$map": {
"input": "$$value",
"as": "v",
"in": "$$v.k"
}}
]
},
"then": {
"$concatArrays": [
{ "$filter": {
"input": "$$value",
"as": "v",
"cond": { "$ne": [ "$$v.k", "$$this" ] }
}},
[{
"k": "$$this",
"v": {
"$sum": [
{ "$arrayElemAt": [
"$$value.v",
{ "$indexOfArray": [ "$$value.k", "$$this" ] }
]},
1
]
}
}]
]
},
"else": {
"$concatArrays": [
"$$value",
[{ "k": "$$this", "v": 1 }]
]
}
}
}
}
}
}
}
}
}
}
}}
])
Which produces:
{
"name" : {
"foo" : 4.0,
"bar" : 3.0,
"baz" : 2.0
},
"code" : {
"bbb" : 1.0,
"aaa" : 8.0
}
}
Or indeed the same reduction process in client code:
db.stuff.aggregate([
{ "$group": {
"_id": null,
"name": { "$push": "$name" },
"code": { "$push": "$code" }
}},
]).map( doc =>
["name","code"].reduce((acc,curr) =>
Object.assign(
acc,
{ [curr]: doc[curr].reduce((acc,curr) =>
Object.assign(acc,
(acc.hasOwnProperty(curr))
? { [curr]: acc[curr] += 1 }
: { [curr]: 1 }
),{}
)
}
),
{}
)
)
Which again has the same result:
{
"name" : {
"foo" : 4.0,
"bar" : 3.0,
"baz" : 2.0
},
"code" : {
"bbb" : 1.0,
"aaa" : 8.0
}
}
I have a dataset that looks something like this:
{
"id": "02741544",
"items": [{
"item": "A"
}]
}, {
"id": "02472691",
"items": [{
"item": "A"
}, {
"item": "B"
}, {
"item": "C"
}]
}, {
"id": "01316523",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316526",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316529",
"items": [{
"item": "A"
}, {
"item": "D"
}]
},
I'm trying to craft a query that will give me an output that looks like this:
{
"item": "A",
"ids": [{
"id": "02741544"
}, {
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}, {
"id": "01316529"
}]
}, {
"item": "B",
"ids": [{
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}]
}, {
"item": "C",
"ids": [{
"id": "02472691"
}]
}, {
"item": "D",
"ids": [{
"id": "02472691"
}]
},
Basically, I'm trying to get the distinct items from the item array in the object, and then returning an array of ids for each obj that has that item in it's item array.
Better use the aggregation framework in which you need to run an operation that consists of the following pipeline steps (in the given order):
$unwind - This initial step will flatten the items array i.e. it produces a copy of each document per array entry. This is necessary for processing the documents further down the pipeline as "denormalised" documents which you can aggregate as groups.
$group - This will group the flattened documents by the item subdocument key and create the ids list by using the $push accumulator operator.
-- UPDATE --
As #AminJ pointed out in the comments, if items can have duplicate item values and you don't want duplicate ids in the result you can use $addToSet instead of $push
The following example demonstrates this:
db.collection.aggregate([
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
])
Sample Output
{
"_id" : "A",
"ids" : [
{ "id" : "02741544" },
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" },
{ "id" : "01316529" }
]
}
/* 2 */
{
"_id" : "B",
"ids" : [
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" }
]
}
/* 3 */
{
"_id" : "C",
"ids" : [
{ "id" : "02472691" }
]
}
/* 4 */
{
"_id" : "D",
"ids" : [
{ "id" : "01316529" }
]
}
The result from an aggregate() function is a cursor to the documents produced by the final stage of the aggregation pipeline operation. So if you want the results in an array you can use the cursor's toArray() method which returns an array that contains all the documents from it.
For example:
var pipeline = [
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
],
results = db.collection.aggregate(pipeline).toArray();
printjson(results);
Here's a solution using an aggregation pipeline:
db.col.aggregate([
{
$unwind: "$items"
},
{
$project: {
id: 1,
item: "$items.item"
}
},
{
$group: {
_id: "$item",
ids: {
$push: "$id"
}
}
}
])
I have the following document in my collection.
{
"_id" : ObjectId("55961a28bffebcb8058b4570"),
"title" : "BackOffice 2",
"cts" : NumberLong(1435900456),
"todo_items" : [
{
"id" : "55961a42bffebcb7058b4570",
"task_desc" : "test 1",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1436161096)
},
{
"id" : "559639afbffebcc7098b45a6",
"task_desc" : "test 2",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1435911809)
},
{
"id" : "559a22f5bffebcb0048b476c",
"task_desc" : "test 3",
}
],
"uts" : NumberLong(1436164853)
}
I need an aggregation query to perform following, if there is field "completed_by" and "completed_date" and if there is a value which is not null push in to the "completed" array field, otherwise push them into the "incomplete" field.
Following is a sample result I want.
{
"_id" : ObjectId("55961a28bffebcb8058b4570"),
"completed" : [
{
"id":"557fccb5bffebcf7048b457c",
"title":"test 1",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1436161096)
},
{
"id":"557fccb5bffebcf7048b457c",
"title":"test 1",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1436161096)
}
],
"incomplete":[
{
"id" : "559a22f5bffebcb0048b476c",
"title" : "test 3"
}
]
}
As long as your "array" items have "distinct" identifiers ( which they have ) there are a couple of approaches to this;
Firstly, without actually "aggregating accross documents":
db.collection.aggregate([
{ "$project": {
"title": 1,
"cts": 1,
"completed": { "$setDifference": [
{ "$map": {
"input": "$todo_items",
"as": "i",
"in": {
"$cond": [
"$$i.completed_date",
"$$i",
false
]
}
}},
[false]
]},
"incomplete": { "$setDifference": [
{ "$map": {
"input": "$todo_items",
"as": "i",
"in": {
"$cond": [
"$$i.completed_date",
false,
"$$i"
]
}
}},
[false]
]}
}}
])
That requires that you at least have MongoDB 2.6 available on the server in order to use the required $map and $setDifference operators. It's pretty fast considering that all the work is done in a single $project stage.
The alternative, which you should only use when "aggregating across documents", is available to all versions supporting the aggregation framework post MongoDB 2.2:
db.collection.aggregate([
{ "$unwind": "$todo_items" },
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"cts": { "$first": "$cts" },
"completed": {
"$addToSet": {
"$cond": [
"$todo_items.completed_date",
"$todo_items",
null
]
}
},
"incomplete": {
"$addToSet": {
"$cond": [
"$todo_items.completed_date",
null,
"$todo_items",
]
}
}
}},
{ "$unwind": "$completed" },
{ "$match": { "completed": { "$ne": null } } },
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"cts": { "$first": "$cts" },
"completed": { "$push": "$completed" },
"incomplete": { "$first": "$incomplete" }
}}
{ "$unwind": "$incomplete" },
{ "$match": { "incomplete": { "$ne": null } } },
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"cts": { "$first": "$cts" },
"completed": { "$first": "$completed" },
"incomplete": { "$push": "$incomplete" }
}}
])
Which isn't entirely all there since you need to cater for conditions where an array may end up empty. But that is not the real lesson here since MongoDB 2.6 is already a couple of years in circulation.
In aggregation, you cannot really exclude the "null/false" results, but you can "filter" them.
Also, unless you are actually "aggregating accross documents" as mentioned already, then the second form with $unwind to process the arrays comes with a "lot" of overhead. So you really should be altering the array contents in your client code as each document is read.
Can you please check the below :
db.collection.aggregate([
{$unwind : "$todo_items"},
{$group: {_id : "$_id" , completed : {{$cond :
{
if : { $and : [ {"todo_items.completed_by" : {$exists: true, $ne : null }},
{"todo_items.completed_date" : {$exists : true, $ne : null}} ] } },
then : {$push : {"old_completed" : "$todo_items"}},
else: {$push : {"old_incompleted" : "$todo_items"}}
} } } },
{$project: {_id : "$_id", completed : "$completed.old_completed" ,
incompleted : "$completed.old_incompleted"}}
]);
This is my collection:
{
"_id" : 10926400,
"votes": 131,
"author": "Jesse",
"comments" : [
{
"id" : 1,
"votes": 31,
"author": "Mirek"
},
{
"id": 2,
"votes": 13,
"author": "Leszke"
}
]
},
{
"_id" : 10926401,
"votes": 75,
"author": "Mirek",
"comments" : [
{
"id" : 1,
"votes": 17,
"author": "Jesse"
},
{
"id": 2,
"votes": 29,
"author": "Mirek"
}
]
}
And I want $sum values of votes and comments.votes of each author
expected output(sort $votes: -1):
"Mirek" total votes: 31 + 75 + 29 = 135
"Jesse" total votes: 131 + 17 = 148
"Leszke total votes: 13
Not immediately visible but possible. What you need to do here is combine your top level document with the array of comments without duplicating it. Here's an approach to first join the content as two arrays into a singular array, then $unwind to group the content:
db.collection.aggregate([
{ "$group": {
"_id": "$_id",
"author": {
"$addToSet": {
"id": "$_id",
"author": "$author",
"votes": "$votes"
}
},
"comments": { "$first": "$comments" }
}},
{ "$project": {
"combined": { "$setUnion": [ "$author", "$comments" ] }
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined.author",
"votes": { "$sum": "$combined.votes" }
}},
{ "$sort": { "votes": -1 } }
])
Which gives the output:
{ "_id" : "Jesse", "votes" : 148 }
{ "_id" : "Mirek", "votes" : 135 }
{ "_id" : "Leszke", "votes" : 13 }
Even as skipping the first $group stage and making a combined array a different way:
db.collection.aggregate([
{ "$project": {
"combined": {
"$setUnion": [
{ "$map": {
"input": { "$literal": ["A"] },
"as": "el",
"in": {
"author": "$author",
"votes": "$votes"
}
}},
"$comments"
]
}
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined.author",
"votes": { "$sum": "$combined.votes" }
}},
{ "$sort": { "votes": -1 } }
])
Those use operators such as $setUnion and even $map which were introduced as of MongoDB 2.6. This makes it simplier, but it can still be done in earlier versions lacking those operators, following much the same principles:
db.collection.aggregate([
{ "$project": {
"author": 1,
"votes": 1,
"comments": 1,
"type": { "$const": ["A","B"] }
}},
{ "$unwind": "$type" },
{ "$unwind": "$comments" },
{ "$group": {
"_id": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
{
"id": "$_id",
"author": "$author",
"votes": "$votes"
},
"$comments"
]
}
}},
{ "$group": {
"_id": "$_id.author",
"votes": { "$sum": "$_id.votes" }
}},
{ "$sort": { "votes": -1 } }
])
The $const is undocumented but present in all versions of MongoDB where the aggregation framework is present ( from 2.2 ). MongoDB 2.6 Introduced $literal which essentially links to the same underlying code. It's been used in two cases here to either provide a template element for an array, or as introducing an array to unwind in order to provide a "binary choice" between two actions.
You could aggregate the results as below:
Unwind the comments array.
Group the records together to first calculate the sum of the votes
received by each author in his comments. Meanwhile keep the original
posts in tact.
Unwind by the original post array.
Now project the sum for each author.
Sort by name and votes of the author.
Select the first record from each group to eliminate duplicates.
Code:
db.collection.aggregate([
{$unwind:"$comments"},
{$group:{"_id":null,
"comments":{$push:"$comments"},
"post":{$addToSet:{"author":"$author",
"votes":"$votes"}}}},
{$unwind:"$comments"},
{$group:{"_id":"$comments.author",
"votes":{$sum:"$comments.votes"},
"post":{$first:"$post"}}},
{$unwind:"$post"},
{$project:{"_id":1,
"votes":{$cond:[{$eq:["$_id","$post.author"]},
{$add:["$votes","$post.votes"]},
"$votes"]}}},
{$sort:{"_id":-1,"votes":-1}},
{$group:{"_id":"$_id","votes":{$first:"$votes"}}}
])
Sample o/p:
{ "_id" : "Leszke", "votes" : 13 }
{ "_id" : "Jesse", "votes" : 148 }
{ "_id" : "Mirek", "votes" : 135 }
Is this the correct query for finding all docs that user1 received where archived = true for user1?
var query = {
"to.username": user1,
"to.section.archive": true
};
Models.Message.find( query ).sort([['to.updated','descending']]).exec(function (err, messages) {
A sample embedded 'To' array of a messages Schema looks like this:
"to" : [
{
"user" : ObjectId("53b96c735f4a3902008aa019"),
"username" : "user1",
"updated" : ISODate("2014-07-08T06:23:43.000Z"),
"_id" : ObjectId("53bb8e6f1e2e72fd04009dad"),
"section" : {
"in" : true,
"out" : false,
"archive" : true
}
}
]
The query should only return the doc above (user1 and archive is true)..not this next doc (archive is true, but not user1):
"to" : [
{
"user" : ObjectId("53b96c735f4a3902008aa019"),
"username" : "user2",
"updated" : ISODate("2014-07-08T06:24:42.000Z"),
"_id" : ObjectId("53bb8e6f1e2e72fd04009dad"),
"section" : {
"in" : true,
"out" : false,
"archive" : true
}
}
]
You want the $elemMatch operator to select the element that has both conditions and the positional $ operator for projection:
Models.Message.find(
{
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
},
{ "created": 1, "message": 1, "to.$": 1 }
).sort([['to.updated','descending']]).exec(function (err, messages) {
});
Please note that this only works in matching the "first" element for projection. Also you want to "sort" on the value of the matching array element, and you cannot do that with .find() and the .sort() modifier.
If you want more than one match in the array then you need to use the aggregate method. This does more complex "filtering" and "projection" than is possible otherwise:
Models.Message.aggregate([
// Match documents
{ "$match": {
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
}},
// Unwind to de-normalize
{ "$unwind": "$to" },
// Match the array elements
{ "$match": {
"to.username": "user2",
"to.section.archive": true
}},
// Group back to the original document
{ "$group": {
"_id": "$_id",
"created": { "$first": "$created" },
"message": { "$first": "$message" },
"to": { "$push": "$to" }
}}
// Sort the results "correctly"
{ "$sort": { "to.updated": -1 } }
],function(err,messages) {
});
Or you can avoid using $unwind and $group by applying some logic with the $map operator in MongoDB 2.6 or greater. Just watching that your array contents are "truly" unique as $setDifference is applied to the resulting "filtered" array:
Models.Message.aggregate([
{ "$match": {
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
}},
{ "$project": {
"created": 1,
"message": 1,
"_id": 1,
"to": {
"$setDifference": [
{
"$map": {
"input": "$to",
"as": "el",
"in": {
"$cond": [
{
"$and": [
{ "$eq": [ "$$el.username", "user2" ] },
"$$el.section.archive"
]
},
"$$el",
false
]
}
}
},
[false]
]
}
}},
{ "$sort": { "to.updated": -1 } }
],function(err,messages) {
});
Or even using $redact:
Models.Messages.aggregate([
{ "$match": {
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
}},
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [
{ "$ifNull": [ "$username", "user2" ] },
"user2"
] },
{ "$ifNull": [ "$section.archive", true ] }
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
{ "$sort": { "to.updated": -1 } }
],function(err,messages) {
});
But be careful as $redact operates over all levels of the document, so your result might be unexpected.
Likely your "to" array actually only has single entries that will match though, so generally the standard projection should be fine. But here is how you do "multiple" matches in an array element with MongoDB.