I have a dataset that looks something like this:
{
"id": "02741544",
"items": [{
"item": "A"
}]
}, {
"id": "02472691",
"items": [{
"item": "A"
}, {
"item": "B"
}, {
"item": "C"
}]
}, {
"id": "01316523",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316526",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316529",
"items": [{
"item": "A"
}, {
"item": "D"
}]
},
I'm trying to craft a query that will give me an output that looks like this:
{
"item": "A",
"ids": [{
"id": "02741544"
}, {
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}, {
"id": "01316529"
}]
}, {
"item": "B",
"ids": [{
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}]
}, {
"item": "C",
"ids": [{
"id": "02472691"
}]
}, {
"item": "D",
"ids": [{
"id": "01316529"
}]
},
Basically, I'm trying to get the distinct items from the items array in each object, and then return an array of ids for every object that has that item in its items array.
It is better to use the aggregation framework here. Run an operation that consists of the following pipeline steps (in the given order):
$unwind - This initial step will flatten the items array i.e. it produces a copy of each document per array entry. This is necessary for processing the documents further down the pipeline as "denormalised" documents which you can aggregate as groups.
$group - This will group the flattened documents by the item subdocument key and create the ids list by using the $push accumulator operator.
-- UPDATE --
As @AminJ pointed out in the comments, if items can have duplicate item values and you don't want duplicate ids in the result, you can use $addToSet instead of $push.
The following example demonstrates this:
db.collection.aggregate([
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
])
Sample Output
{
"_id" : "A",
"ids" : [
{ "id" : "02741544" },
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" },
{ "id" : "01316529" }
]
}
/* 2 */
{
"_id" : "B",
"ids" : [
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" }
]
}
/* 3 */
{
"_id" : "C",
"ids" : [
{ "id" : "02472691" }
]
}
/* 4 */
{
"_id" : "D",
"ids" : [
{ "id" : "01316529" }
]
}
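For intuition, the two stages can be mimicked in plain JavaScript outside the database. This is only an illustrative sketch, not how MongoDB executes the pipeline: `docs`, `unwindItems` and `groupByItem` are hypothetical names, and the data is a shortened version of the sample from the question.

```javascript
// Shortened copy of the sample data from the question.
const docs = [
  { id: "02741544", items: [{ item: "A" }] },
  { id: "02472691", items: [{ item: "A" }, { item: "B" }, { item: "C" }] },
  { id: "01316529", items: [{ item: "A" }, { item: "D" }] },
];

// Rough equivalent of { $unwind: "$items" }: one copy of the document per array entry.
function unwindItems(documents) {
  return documents.flatMap((doc) =>
    doc.items.map((entry) => ({ ...doc, items: entry }))
  );
}

// Rough equivalent of the $group stage: key on items.item and push { id } per document.
function groupByItem(flattened) {
  const groups = new Map();
  for (const doc of flattened) {
    const key = doc.items.item;
    if (!groups.has(key)) groups.set(key, { _id: key, ids: [] });
    groups.get(key).ids.push({ id: doc.id });
  }
  return [...groups.values()];
}

const grouped = groupByItem(unwindItems(docs));
console.log(JSON.stringify(grouped, null, 2));
```

Unlike this sketch, the real $group stage does not guarantee the order of its output documents, so add a $sort stage if the order matters.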
The result of aggregate() is a cursor over the documents produced by the final stage of the aggregation pipeline. If you want the results in an array, use the cursor's toArray() method, which returns an array containing all of the cursor's documents.
For example:
var pipeline = [
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
],
results = db.collection.aggregate(pipeline).toArray();
printjson(results);
Here's a solution using an aggregation pipeline:
db.col.aggregate([
{
$unwind: "$items"
},
{
$project: {
id: 1,
item: "$items.item"
}
},
{
$group: {
_id: "$item",
ids: {
$push: "$id"
}
}
}
])
I'm having trouble counting the items in an array without hard-coding their field names in my aggregation.
Data structure looks like this:
[
{
"_id": "1",
"title": "Vanella Icream",
"contain": "sugar",
"details": [
{
"flavour": "Vanella"
},
{
"weight": "10KG"
},
{
"sugar": "15KG"
}
]
},
{
"_id": "2",
"title": "Pretzels",
"contain": "salt",
"details": [
{
"flavour": "Wheat"
},
{
"weight": "10KG"
},
{
"sugar": "15KG"
}
]
},
{
"_id": "3",
"title": "Rasmalai Icream",
"contain": "sugar",
"details": [
{
"flavour": "Vanella"
},
{
"weight": "15KG"
},
{
"sugar": "12KG"
}
]
},
{
"_id": "4",
"title": "Vanella Icream",
"contain": "sugar",
"details": [
{
"flavour": "Vanella"
},
{
"weight": "15KG"
},
{
"sugar": "12KG"
}
]
}
]
Output I want:
[
{
"details": {
"flavour": {
"Vanella": 3, //Number of times Vanella present in each document.
"Wheat": 1,
},
"weight": {
"10KG": 2,
"15KG": 2
},
"sugar": {
"12KG": 2,
"15KG": 2
}
}
}
]
Query:
db.collection.aggregate([
{
"$unwind": {
"path": "$details"
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$details",
"$$ROOT"
]
}
}
},
{
"$facet": {
"flavour": [
{
"$group": {
"_id": "$flavour",
"sum": {
"$sum": 1
}
}
},
{
"$addFields": {
"flavour": "$_id"
}
},
{
"$project": {
"_id": 0
}
}
],
"weight": [
{
"$group": {
"_id": "$weight",
"sum": {
"$sum": 1
}
}
},
{
"$addFields": {
"weight": "$_id"
}
},
{
"$project": {
"_id": 0
}
}
]
}
},
{
"$addFields": {
"flavour": {
"$reduce": {
"input": {
"$filter": {
"input": {
"$map": {
"input": "$flavour",
"as": "w",
"in": {
"$cond": [
{
"$ne": [
"$$w.flavour",
null
]
},
{
"$let": {
"vars": {
"o": [
[
"$$w.flavour",
"$$w.sum"
]
]
},
"in": {
"$arrayToObject": "$$o"
}
}
},
null
]
}
}
},
"as": "f",
"cond": {
"$ne": [
"$$f",
null
]
}
}
},
"initialValue": {},
"in": {
"$let": {
"vars": {
"d": "$$value",
"p": "$$this"
},
"in": {
"$mergeObjects": [
"$$d",
"$$p"
]
}
}
}
}
},
"weight": {
"$reduce": {
"input": {
"$filter": {
"input": {
"$map": {
"input": "$weight",
"as": "w",
"in": {
"$cond": [
{
"$ne": [
"$$w.weight",
null
]
},
{
"$let": {
"vars": {
"o": [
[
"$$w.weight",
"$$w.sum"
]
]
},
"in": {
"$arrayToObject": "$$o"
}
}
},
null
]
}
}
},
"as": "f",
"cond": {
"$ne": [
"$$f",
null
]
}
}
},
"initialValue": {},
"in": {
"$let": {
"vars": {
"d": "$$value",
"p": "$$this"
},
"in": {
"$mergeObjects": [
"$$d",
"$$p"
]
}
}
}
}
}
}
},
{
"$project": {
"details": "$$ROOT"
}
}
])
Here I'm getting flavour and weight with their counts by manually adding those fields in the $filter stage. I want to do it without assuming those keys, so that even if there are 20 different items in the details array, the query maps them all and shows each one with its count.
I hope you guys understand.
Playground: https://mongoplayground.net/p/j1mzgWvcmvd
You need to change the schema. What you want to do is easy, but both of those queries are complicated and slow. Even the second, much smaller one has 2 $unwind and 3 $group stages with 3 $arrayToObject calls, 8 stages in total, because of the schema and the shape of the requested output.
Don't store data in the keys of documents. People who are new to MongoDB do this (I did it too), but it makes everything harder. (I can't say never do it, but you don't need it here.)
Your schema should be something like
{
"_id": "2",
"title": "Pretzels",
"contain": "salt",
"details": [
{
"type" : "flavour",
"value" : "Wheat"
},
{
"type" : "weight",
"value" : "10KG"
},
{
"type" : "sugar",
"value" : "15KG"
}
]
}
See this example. It converts your schema to the new schema and produces the results you want, but without data in keys. (You wouldn't need the first part; had you used that schema from the start, you would need only the query below.)
Query with the new Schema (no data in keys)
[{"$unwind": { "path": "$details"}},
{"$replaceRoot": {"newRoot": "$details"}},
{
"$group": {
"_id": {
"type": "$type",
"value": "$value"
},
"sum": {"$sum": 1}
}
},
{
"$replaceRoot": {
"newRoot": {"$mergeObjects": ["$_id","$$ROOT"]}
}
},
{"$project": {"_id": 0}},
{
"$group": {
"_id": "$type",
"values": {
"$push": {
"value": "$value",
"sum": "$sum"
}
}
}
},
{"$addFields": {"type": "$_id"}},
{"$project": {"_id": 0}}
]
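To show why the { type, value } shape makes the counting easy, here is a plain JavaScript sketch of the same grouping done in application code. It is an illustration under assumptions, not part of the query: `converted` and `countDetails` are hypothetical names and the two documents are made-up samples in the proposed schema.

```javascript
// Two documents already in the proposed { type, value } shape (hypothetical sample).
const converted = [
  { _id: "1", details: [{ type: "flavour", value: "Vanella" }, { type: "weight", value: "10KG" }] },
  { _id: "2", details: [{ type: "flavour", value: "Wheat" }, { type: "weight", value: "10KG" }] },
];

// Count each (type, value) pair without knowing the types in advance.
function countDetails(documents) {
  const counts = {};
  for (const doc of documents) {
    for (const { type, value } of doc.details) {
      counts[type] = counts[type] || {};
      counts[type][value] = (counts[type][value] || 0) + 1;
    }
  }
  return counts;
}

const detailCounts = countDetails(converted);
console.log(detailCounts);
```

Because the type is a value rather than a key, nothing in the loop has to name flavour or weight explicitly.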
MongoDB operators are not designed to support data in keys or dynamic (unknown) keys; to work with them you have to do complicated things like the above.
If you want to change your schema, either do it with an update in the database,
or take the documents to the application, convert them with JavaScript, and re-insert them.
Even if you solve this question, you will run into problems again with the next one.
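The application-side conversion could look roughly like the sketch below. `convertDetails` is a hypothetical helper name, and reading from and writing back to the collection is deliberately left out, since that depends on your driver.

```javascript
// Turn [{ flavour: "Wheat" }, { weight: "10KG" }] into
// [{ type: "flavour", value: "Wheat" }, { type: "weight", value: "10KG" }].
function convertDetails(doc) {
  const details = doc.details.flatMap((entry) =>
    Object.entries(entry).map(([type, value]) => ({ type, value }))
  );
  return { ...doc, details };
}

// One document in the original shape from the question.
const original = {
  _id: "2",
  title: "Pretzels",
  contain: "salt",
  details: [{ flavour: "Wheat" }, { weight: "10KG" }, { sugar: "15KG" }],
};

const reshaped = convertDetails(original);
console.log(JSON.stringify(reshaped, null, 2));
```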
I'm the guy from Mongodb Forum:
Try this out https://mongoplayground.net/p/tfyfpIkHilQ
I'm trying to learn the aggregation concept in MongoDB. I created an object like this for practice.
"_id": "601c4bb56e018211b02abbf8",
"isDeleted": false,
"name": "TeacherName1",
"class": "7",
"students": [
{ "_id": "601c4bb56e018211b02abbf9", isDeleted:true, "name": "student-1", "studentGroup": "A", "avgResult": 36},
{ "_id": "601c4bb56e018211b02abbfa", isDeleted:false, "name": "student-2", "studentGroup": "A", "avgResult": 55},
{ "_id": "601c4bb56e018211b02abbfb", isDeleted:false, "name": "student-3", "studentGroup": "B", "avgResult": 44.66},
{ "_id": "601c4bb56e018211b02abbfc", isDeleted:false, "name": "student-4", "studentGroup": "C", "avgResult": 83.66},
{ "_id": "601c4bb56e018211b02abbfd", isDeleted:true, "name": "student-5", "studentGroup": "B", "avgResult": 37},
{ "_id": "601c4bb56e018211b02abbfe", isDeleted:true, "name": "student-6", "studentGroup": "C", "avgResult": 39.66},
]
I want to get the teacher information along with only the deleted students (isDeleted=true). So I'm trying to get this result:
"_id": "601c4bb56e018211b02abbf8",
"isDeleted": false,
"name": "TeacherName1",
"class": "7",
"students": [
{ "_id": "601c4bb56e018211b02abbf9", isDeleted:true, ...},
{ "_id": "601c4bb56e018211b02abbfd", isDeleted:true, ...},
{ "_id": "601c4bb56e018211b02abbfe", isDeleted:true, ...},
]
I can get this result using $unwind and $filter. But can I get it with only $elemMatch?
If I use this query
this.aggregate([
{
$match: {
_id: mongoose.Types.ObjectId("601c4bb56e018211b02abbf8"),
isDeleted: false,
"students.isDeleted":true
},
},
]);
It returns the whole object.
If I try this
this.aggregate([
{
$match: {
_id: mongoose.Types.ObjectId("601c4bb56e018211b02abbf8"),
isDeleted: false,
students:{
$elemMatch:{
isDeleted:true
}
}
},
},
]);
It returns the whole object.
$match just gives you the whole document when it matches.
However, you can add another stage that uses $project with $filter.
Given:
db.dummy.insert({studs:[{isDeleted:true, name:'a'},{isDeleted: true, name:'b'},{name:'c'}]})
db.dummy.insert({studs:[{name:'c'}]})
> match = {$match:{studs:{$elemMatch: {isDeleted:true}}}}
> project = {$project: { deletedStuds: {$filter:{input: '$studs', as:'stud', cond:{ $eq: ['$$stud.isDeleted', true]} } } }}
{
"$project" : {
"deletedStuds" : {
"$filter" : {
"input" : "$studs",
"as" : "stud",
"cond" : {
"$eq" : [
"$$stud.isDeleted",
true
]
}
}
}
}
}
> db.dummy.aggregate(match, project)
{ "_id" : ObjectId("6020351eb965951ac8a1eb62"), "deletedStuds" : [ { "isDeleted" : true, "name" : "a" }, { "isDeleted" : true, "name" : "b" } ] }
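The difference between the two stages can be sketched in plain JavaScript. This only mimics the behaviour for illustration; `dummy`, `matched` and `projected` are hypothetical names, and the real pipeline also keeps the _id field, which is omitted here.

```javascript
// Same two documents as inserted into the dummy collection above.
const dummy = [
  { studs: [{ isDeleted: true, name: "a" }, { isDeleted: true, name: "b" }, { name: "c" }] },
  { studs: [{ name: "c" }] },
];

// Document-level step, like $match with $elemMatch: keep or drop whole documents.
const matched = dummy.filter((doc) => doc.studs.some((s) => s.isDeleted === true));

// Element-level step, like $project with $filter: trim the array inside each kept document.
const projected = matched.map((doc) => ({
  deletedStuds: doc.studs.filter((s) => s.isDeleted === true),
}));

console.log(JSON.stringify(projected));
```

After the first step the surviving document still carries all three students, which is why $match alone returns the whole object.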
My document in cosmosdb looks like this
{
"todayDate": "2017-12-08",
"data": [
{
"group": {"priority": 1, "total": 10},
"severity": 1
},
{
"group": {"priority": 2, "total": 13},
"priority": 2
}
]
}
The following query when issued from either mongoShell for cosmosdb in azure portal or using my spring data mongodb project works fine and returns results in no time:
db.myCollection.find({ "$or" : [ { "data" : { "$elemMatch" : { "priority" : 1}} , "$or" : [ { "data" : { "$elemMatch" : { "group.priority" : 1}}}] }]})
However, the following query along the same lines, with more OR conditions (basically two of the above queries combined with the OR operator), hangs indefinitely:
db.myCollection.find({ "$or": [ { "data" : { "$elemMatch" : { "priority" : 1}} , "$or" : [ { "data" : { "$elemMatch" : { "group.priority" : 1}}}] }, { "data" : { "$elemMatch" : { "severity" : 2}} , "$or" : [ { "data" : { "$elemMatch" : { "group.severity" : 2}}}] } ] })
Is there anything wrong with the last query that makes it hang indefinitely? Even if I replace initial OR with AND, still the same result i.e. hangs indefinitely.
I created 3 documents in my cosmos db according to the document template you provided.
[
{
"id": "1",
"todayDate": "2017-12-08",
"data": [
{
"group": {
"severity": 1,
"total": 10
},
"severity": 1
},
{
"group": {
"priority": 1,
"total": 13
},
"priority": 1
}
]
},
{
"id": "2",
"todayDate": "2017-12-09",
"data": [
{
"group": {
"priority": 3,
"total": 10
},
"severity": 1
},
{
"group": {
"priority": 3,
"total": 13
},
"priority": 1
}
]
},
{
"id": "3",
"todayDate": "2017-12-10",
"data": [
{
"group": {
"priority": 1,
"total": 10
},
"severity": 1
},
{
"group": {
"priority": 2,
"total": 13
},
"priority": 2
}
]
}
]
Then I used the Robo 3T tool to execute your query.
db.coll.find({
"$or": [
{ "data" : { "$elemMatch" : { "priority" : 1}} ,
"$or" : [
{ "data" : { "$elemMatch" : { "group.priority" : 1}}}
] },
{ "data" : { "$elemMatch" : { "severity" : 2}} ,
"$or" : [
{ "data" : { "$elemMatch" : { "group.severity" : 2}}}
] }
]
})
result:
The syntax of the $or that I found on the official document is:
{ $or: [ { <expression1> }, { <expression2> }, ... , { <expressionN> } ] }
It seems that your query executes normally even though it differs from the above syntax. In my experience, $or is generally nested inside $and (see "MongoDB Nested OR/AND Where?"), so I do not quite understand the purpose of the nested $or here.
An indefinite hang is probably caused by the data set being so large that the query runs for too long, in which case you need to optimize the query.
Hope it helps. If you have any concerns, please let me know.
Update Answer:
I modified my 3 sample documents appropriately, then retrieved the 2 eligible documents with the query below.
Query:
db.coll.find(
{
"$and": [
{
"$or": [
{
"data": {
"$elemMatch": {
"priority": 2
}
}
},
{
"data": {
"$elemMatch": {
"group.priority": 2
}
}
}
]
},
{
"$or": [
{
"data": {
"$elemMatch": {
"severity": 1
}
}
},
{
"data": {
"$elemMatch": {
"group.severity": 1
}
}
}
]
}
]
}
)
Results:
So I think your query is correct. Is the data in the database very large? If it has been hanging for a long time, have you seen any timeout error messages? You could also check whether the RU settings are the issue.
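To make the logic of the $and / $or query easier to check, here is a plain JavaScript sketch that evaluates the same conditions in memory. It is a simplification that only handles equality on dotted paths, and `getPath`, `anyElement`, `matches` and the two `sample` documents are hypothetical, not the documents from my test database.

```javascript
// Read a dotted path such as "group.priority" from a plain object.
function getPath(obj, path) {
  return path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// Rough stand-in for { data: { $elemMatch: { [path]: value } } }.
function anyElement(doc, path, value) {
  return doc.data.some((el) => getPath(el, path) === value);
}

// Mirrors the $and of two $or groups in the query above.
function matches(doc) {
  const priorityIs2 = anyElement(doc, "priority", 2) || anyElement(doc, "group.priority", 2);
  const severityIs1 = anyElement(doc, "severity", 1) || anyElement(doc, "group.severity", 1);
  return priorityIs2 && severityIs1;
}

// Hypothetical documents for illustration only.
const sample = [
  {
    id: "x",
    data: [
      { group: { priority: 1, total: 10 }, severity: 1 },
      { group: { priority: 2, total: 13 }, priority: 2 },
    ],
  },
  { id: "y", data: [{ group: { priority: 3, total: 10 }, severity: 1 }] },
];

const matchedIds = sample.filter(matches).map((d) => d.id);
console.log(matchedIds);
```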
I have an array like this:
var jsonResponse = [
{
"name": "abc",
"value": [
{ "label" : "Daily", "value":"Daily"}
]
},
{
"name": "ccc",
"value": [
{ "label" : "Daily", "value":"Daily"}
]
}
]
And I want to convert it to:
{
"abc" : {
"name": "abc",
"value": [
{ "label" : "Daily", "value":"Daily"}
]
},
"ccc": {
"name": "ccc",
"value": [
{ "label" : "Daily", "value":"Daily"}
]
}
}
I would prefer not to use forEach.
We can get partway there with Object.assign(arrayDetails, ...jsonResponse);
But how do I index the object by name?
let indexedResult = {};
jsonResponse.map(obj => indexedResult[obj.name] = obj)
console.log(JSON.stringify(indexedResult));
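As an alternative that avoids using map() purely for its side effects, the same index can be built as a single expression. This is a sketch of two common variants; `byName` and `byNameReduced` are names chosen here, and Object.fromEntries requires an ES2019-capable runtime.

```javascript
const jsonResponse = [
  { name: "abc", value: [{ label: "Daily", value: "Daily" }] },
  { name: "ccc", value: [{ label: "Daily", value: "Daily" }] },
];

// Build [key, value] pairs, then turn them into one object keyed by name.
const byName = Object.fromEntries(jsonResponse.map((obj) => [obj.name, obj]));

// Equivalent with reduce, for runtimes without Object.fromEntries.
const byNameReduced = jsonResponse.reduce((acc, obj) => {
  acc[obj.name] = obj;
  return acc;
}, {});

console.log(JSON.stringify(byName));
```

If two entries share the same name, the later one overwrites the earlier one in both variants.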
Is this the correct query for finding all docs that user1 received where archived = true for user1?
var query = {
"to.username": user1,
"to.section.archive": true
};
Models.Message.find( query ).sort([['to.updated','descending']]).exec(function (err, messages) {
A sample embedded 'To' array of a messages Schema looks like this:
"to" : [
{
"user" : ObjectId("53b96c735f4a3902008aa019"),
"username" : "user1",
"updated" : ISODate("2014-07-08T06:23:43.000Z"),
"_id" : ObjectId("53bb8e6f1e2e72fd04009dad"),
"section" : {
"in" : true,
"out" : false,
"archive" : true
}
}
]
The query should only return the doc above (user1 and archive is true)..not this next doc (archive is true, but not user1):
"to" : [
{
"user" : ObjectId("53b96c735f4a3902008aa019"),
"username" : "user2",
"updated" : ISODate("2014-07-08T06:24:42.000Z"),
"_id" : ObjectId("53bb8e6f1e2e72fd04009dad"),
"section" : {
"in" : true,
"out" : false,
"archive" : true
}
}
]
You want the $elemMatch operator to select the element that has both conditions and the positional $ operator for projection:
Models.Message.find(
{
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
},
{ "created": 1, "message": 1, "to.$": 1 }
).sort([['to.updated','descending']]).exec(function (err, messages) {
});
Please note that this only works in matching the "first" element for projection. Also you want to "sort" on the value of the matching array element, and you cannot do that with .find() and the .sort() modifier.
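On why $elemMatch is needed rather than the two dotted conditions from the question: with dot notation each condition may be satisfied by a different array element. The plain JavaScript sketch below mimics both behaviours; `message`, `dottedMatch` and `elemMatch` are hypothetical names used only for illustration.

```javascript
// Hypothetical message: user1 has NOT archived it, user2 has.
const message = {
  to: [
    { username: "user1", section: { archive: false } },
    { username: "user2", section: { archive: true } },
  ],
};

// Like { "to.username": name, "to.section.archive": true }:
// each condition may be satisfied by a different array element.
function dottedMatch(doc, name) {
  return (
    doc.to.some((el) => el.username === name) &&
    doc.to.some((el) => el.section.archive === true)
  );
}

// Like { to: { $elemMatch: { username: name, "section.archive": true } } }:
// a single element has to satisfy both conditions.
function elemMatch(doc, name) {
  return doc.to.some((el) => el.username === name && el.section.archive === true);
}

console.log(dottedMatch(message, "user1")); // true
console.log(elemMatch(message, "user1")); // false
```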
If you want more than one match in the array then you need to use the aggregate method. This does more complex "filtering" and "projection" than is possible otherwise:
Models.Message.aggregate([
// Match documents
{ "$match": {
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
}},
// Unwind to de-normalize
{ "$unwind": "$to" },
// Match the array elements
{ "$match": {
"to.username": "user2",
"to.section.archive": true
}},
// Group back to the original document
{ "$group": {
"_id": "$_id",
"created": { "$first": "$created" },
"message": { "$first": "$message" },
"to": { "$push": "$to" }
}},
// Sort the results "correctly"
{ "$sort": { "to.updated": -1 } }
],function(err,messages) {
});
Or you can avoid using $unwind and $group by applying some logic with the $map operator in MongoDB 2.6 or greater. Just make sure that your array contents are "truly" unique, since $setDifference is applied to the resulting "filtered" array:
Models.Message.aggregate([
{ "$match": {
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
}},
{ "$project": {
"created": 1,
"message": 1,
"_id": 1,
"to": {
"$setDifference": [
{
"$map": {
"input": "$to",
"as": "el",
"in": {
"$cond": [
{
"$and": [
{ "$eq": [ "$$el.username", "user2" ] },
"$$el.section.archive"
]
},
"$$el",
false
]
}
}
},
[false]
]
}
}},
{ "$sort": { "to.updated": -1 } }
],function(err,messages) {
});
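The $map plus $setDifference trick can be pictured in plain JavaScript as a map followed by removing the false placeholders. This is an illustrative sketch with a hypothetical `to` array, not what the server runs.

```javascript
// Hypothetical "to" array with two recipients.
const to = [
  { username: "user1", section: { archive: true } },
  { username: "user2", section: { archive: true } },
];

// $map step: keep the element when it matches, otherwise emit false.
const mapped = to.map((el) =>
  el.username === "user2" && el.section.archive ? el : false
);

// $setDifference step: drop the false placeholders.
// Note: the real operator also treats the array as a set, so exact duplicate
// elements would collapse into one; this sketch does not reproduce that.
const filtered = mapped.filter((el) => el !== false);

console.log(JSON.stringify(filtered));
```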
Or even using $redact:
Models.Message.aggregate([
{ "$match": {
"to": {
"$elemMatch": {
"username": "user2",
"section.archive": true
}
}
}},
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [
{ "$ifNull": [ "$username", "user2" ] },
"user2"
] },
{ "$ifNull": [ "$section.archive", true ] }
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
{ "$sort": { "to.updated": -1 } }
],function(err,messages) {
});
But be careful as $redact operates over all levels of the document, so your result might be unexpected.
Likely your "to" array actually only has single entries that will match though, so generally the standard projection should be fine. But here is how you do "multiple" matches in an array element with MongoDB.