Aggregating the result of the MongoDB model query - javascript

I have a model Book with a field "tags" which is of type array of String / GraphQLString.
Currently, I'm able to query the tags for each book.
{
books {
id
tags
}
}
and I get the result:
{
"data": {
"books": [
{
"id": "631664448cb20310bc25c89d",
"tags": [
"database",
"middle-layer"
]
},
{
"id": "6316945f8995f05ac71d3b22",
"tags": [
"relational",
"database"
]
}
]
}
}
I want to write a RootQuery where I can fetch all unique tags across all books. This is how far I am (which is not too much):
tags: {
type: new GraphQLList(GraphQLString),
resolve(parent, args) {
Book.find({}) // CAN'T FIGURE OUT WHAT TO DO HERE
return [];
}
}
Basically, I'm trying to fetch all books and then potentially merge all tags fields on each book.
I expect that if I query:
{
tags
}
I would get
["relational", "database", "middle-layer"]
I am just starting with Mongoose, MongoDB, and GraphQL, so I'm not 100% sure what keywords to look for, or even what the title of this question should be.
Appreciate the help.

You want to $unwind the arrays so they're flat; at that point you can just use $group to get the unique values, like so:
db.collection.aggregate([
{
"$unwind": "$data.books"
},
{
"$unwind": "$data.books.tags"
},
{
$group: {
_id: "$data.books.tags"
}
}
])
Mongo Playground

MongoDB + JavaScript Solution
const tags = await Book.aggregate([
{
$project: {
tags: 1,
_id: 0,
}
},
])
This returns an array of objects that contain only the tags value. $project is a pipeline stage that selects which keys to include (1) or exclude (0). _id is included by default, so it has to be excluded explicitly.
Then take the tags array that looks like this:
[
{
"tags": [
"database",
"middle-layer"
]
},
{
"tags": [
"relational",
"database"
]
}
]
Then reduce it to one flat array and turn that into a JavaScript Set, which drops duplicates. Convert it back to an Array at the end if you need array methods on it or want to write it back to the DB.
let allTags = tags.reduce((total, curr) => [...total, ...curr.tags], [])
allTags = Array.from(new Set(allTags))
const tags = [
{
"tags": [
"database",
"middle-layer"
]
},
{
"tags": [
"relational",
"database"
]
}
]
let allTags = tags.reduce((total, curr) => [...total, ...curr.tags], [])
allTags = Array.from(new Set(allTags))
console.log(allTags)
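If your runtime supports `Array.prototype.flatMap` (ES2019), the reduce-and-spread step can be written without copying the accumulator on every iteration. A small sketch on the same sample data; `uniqueTags` is a hypothetical helper name, not part of the answer above.

```javascript
// Hypothetical helper: flatten every document's `tags` array and drop duplicates.
// Assumes each element of `docs` looks like { tags: [...] }, as returned by the $project stage.
function uniqueTags(docs) {
  // flatMap concatenates the per-document arrays in one pass (missing tags count as []);
  // a Set keeps only the first occurrence of each value, in insertion order.
  return [...new Set(docs.flatMap((doc) => doc.tags ?? []))];
}

const sample = [
  { tags: ["database", "middle-layer"] },
  { tags: ["relational", "database"] },
];

console.log(uniqueTags(sample)); // [ 'database', 'middle-layer', 'relational' ]
```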
Pure MongoDB Solution
Book.aggregate([
{
$unwind: "$tags"
},
{
$group: {
_id: null,
tags: {
"$addToSet": "$tags"
}
}
},
{
$project: {
tags: 1,
_id: 0,
}
}
])
Steps in Aggregation Pipeline
$unwind
Outputs one document for each element of the tags array
$group
Merges the individual tags into a set called tags
A set only holds unique values, so duplicates are dropped
_id is a required field of $group; here it is a constant, so every document falls into the same group
_id is excluded in the final stage, so the exact constant doesn't matter
$project
Chooses which fields to pull from the previous step in the pipeline
Using it here to exclude _id from the results
Output (the order of the tags is not guaranteed, since $addToSet does not preserve insertion order)
[
{
"tags": [
"database",
"middle-layer",
"relational"
]
}
]
Mongo Playground Demo
While this solution gets the result with pure MongoDB queries, the output is still nested (an array holding one document with a tags field), so you still have to traverse it to reach the values. As far as I know, an aggregation pipeline always outputs documents, so there is no way to replace the root with a bare list of strings. So at the end of the day, a little JavaScript is still required.
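Since the pipeline returns an array holding at most one document, the GraphQL resolver only needs to unwrap it. A minimal sketch, assuming `Book` is the Mongoose model from the question; `makeTagsResolver` is a hypothetical helper that takes the model as a parameter so it can be tried with a stand-in object. It would be wired up as `tags: { type: new GraphQLList(GraphQLString), resolve: makeTagsResolver(Book) }`.

```javascript
// Sketch of the root `tags` resolver built on the pure-MongoDB pipeline above.
// Assumption: `Book` is a Mongoose model (or anything with a promise-returning aggregate()).
function makeTagsResolver(Book) {
  return async function resolve(parent, args) {
    const result = await Book.aggregate([
      { $unwind: "$tags" },
      { $group: { _id: null, tags: { $addToSet: "$tags" } } },
      { $project: { _id: 0, tags: 1 } },
    ]);
    // One document when there are tags, none when there are no tags at all.
    return result.length > 0 ? result[0].tags : [];
  };
}

// Usage with a stand-in model whose aggregate() resolves like the "Output" above.
const fakeBook = {
  aggregate: async () => [{ tags: ["database", "middle-layer", "relational"] }],
};
makeTagsResolver(fakeBook)(null, {}).then((tags) => console.log(tags));
```

A different route that the answers above don't use: Mongoose's `Model.distinct()`, e.g. `await Book.distinct("tags")`, returns the distinct values of a field as a plain array, and for an array field MongoDB treats each element as a separate value, so no unwrapping is needed.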

Related

Mongoose reverse lookup and delete

I am trying to run a health check on the references in one of my collections: I want to see whether the referenced objects still exist and, if not, delete that _id from the array.
I haven't found anything about this, so my idea is to get the reverse result of a $lookup.
Is it possible to get the reverse result of a lookup in MongoDB?
Here is an example of a collection and its taskList with references to the tasks collection.
Now I want to delete all the ids in there that do not have a matching document in the tasks collection.
How I solve it right now, which takes tons of queries:
Get all the ids from taskList
Send a query for every single one of them to check whether it has a match in the tasks collection
Send a query to pull each dangling reference out of the array
I think this does what you want, and it's OK even if you have big collections.
It isn't an update, though. After it you can add a $merge stage into tasklists (replace when _id matches; requires MongoDB >= 4.4), or an $out stage to another collection and then replace the tasklists collection with it.
Test code here
Data in
db={
"tasklists": [
{
"_id": 1,
"tasklist": [
1,
2,
3,
4
]
},
{
"_id": 2,
"tasklist": [
5,
6,
7
]
}
],
"tasks": [
{
"_id": 1
},
{
"_id": 2
},
{
"_id": 3
},
{
"_id": 5
}
]
}
db.tasklists.aggregate([
{
"$lookup": {
"from": "tasks",
"let": {
"tasklist": "$tasklist"
},
"pipeline": [
{
"$match": {
"$expr": {
"$in": [
"$_id",
"$$tasklist"
]
}
}
}
],
"as": "valid"
}
},
{
"$addFields": {
"valid": {
"$map": {
"input": "$valid",
"as": "v",
"in": "$$v._id"
}
}
}
},
{
"$addFields": {
"tasklist": {
"$filter": {
"input": "$tasklist",
"as": "t",
"cond": {
"$in": [
"$$t",
"$valid"
]
}
}
}
}
},
{
"$unset": [
"valid"
]
}
])
Results (tasks 4, 6 and 7 were not found in the tasks collection and were removed):
[
{
"_id": 1,
"tasklist": [
1,
2,
3
]
},
{
"_id": 2,
"tasklist": [
5
]
}
]
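To sanity-check what the first pipeline does, the same filtering can be mirrored in plain JavaScript on the sample data. This is only an illustration (`removeDanglingIds` is a hypothetical name); the ids here are numbers, as in the sample, whereas real ObjectId values would need to be converted to strings first, because a `Set` compares objects by reference.

```javascript
// In-memory mirror of the $lookup + $map + $filter pipeline above.
function removeDanglingIds(tasklists, tasks) {
  const existing = new Set(tasks.map((t) => t._id)); // like the "valid" ids from $lookup
  return tasklists.map((doc) => ({
    ...doc,
    tasklist: doc.tasklist.filter((id) => existing.has(id)), // like the $filter stage
  }));
}

const db = {
  tasklists: [
    { _id: 1, tasklist: [1, 2, 3, 4] },
    { _id: 2, tasklist: [5, 6, 7] },
  ],
  tasks: [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 5 }],
};

console.log(JSON.stringify(removeDanglingIds(db.tasklists, db.tasks)));
// [{"_id":1,"tasklist":[1,2,3]},{"_id":2,"tasklist":[5]}]
```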
Edit
If you want the $lookup to use an index, you can try this:
Test code here
tasks already has an index on _id, so there is no need to create one; if you don't join on _id, create an index on the field you join on.
db.tasklists.aggregate([
{
"$unwind": {
"path": "$tasklist"
}
},
{
"$lookup": {
"from": "tasks",
"localField": "tasklist",
"foreignField": "_id",
"as": "joined"
}
},
{
"$match": {
"$expr": {
"$gt": [
{
"$size": "$joined"
},
0
]
}
}
},
{
"$unset": [
"joined"
]
},
{
"$group": {
"_id": "$_id",
"tasklist": {
"$push": "$tasklist"
},
"afield": {
"$first": "$afield"
}
}
}
])
After that you can do $out or $merge with replace option.
But both lose any updates made while this was running.
The only solution for this (if it is a problem) is $merge with a pipeline:
In the pipeline above, also keep an extra array with the initial tasklist and remove the valid ids from it, so that only the invalid ones remain; then, in the $merge pipeline, filter the stored array and remove just those invalid ids. This is safe from data loss.
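A sketch of that safer variant, assuming MongoDB 4.4+ (needed to $merge into the collection being aggregated) and the field names from the sample data. Instead of carrying the initial array and subtracting the valid ids, it computes the invalid ids directly with $filter, which yields the same ids. `buildCleanupPipeline` and the `found`/`invalid` field names are hypothetical, and the pipeline has not been run against the asker's real schema, so try it on a copy first.

```javascript
// Hypothetical helper; run the result with db.tasklists.aggregate(pipeline)
// (or TaskList.aggregate(pipeline) in Mongoose).
function buildCleanupPipeline() {
  return [
    // Join each tasklist with the tasks it references (localField may be an array).
    { $lookup: { from: "tasks", localField: "tasklist", foreignField: "_id", as: "found" } },
    // invalid = referenced ids that have no matching task document.
    {
      $project: {
        invalid: {
          $filter: { input: "$tasklist", cond: { $not: [{ $in: ["$$this", "$found._id"] }] } },
        },
      },
    },
    // Documents with nothing to remove are dropped before the write.
    { $match: { "invalid.0": { $exists: true } } },
    {
      $merge: {
        into: "tasklists",
        on: "_id",
        // $$new is the document coming from this pipeline; unprefixed fields refer to
        // the document as currently stored, so concurrent additions are kept.
        whenMatched: [
          {
            $set: {
              tasklist: {
                $filter: { input: "$tasklist", cond: { $not: [{ $in: ["$$this", "$$new.invalid"] }] } },
              },
            },
          },
        ],
        whenNotMatched: "discard",
      },
    },
  ];
}

console.log(JSON.stringify(buildCleanupPipeline(), null, 2));
```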
I think the best approach, instead of all this, is to have an index on tasklist (a multikey index) and, whenever an _id is deleted from tasks, to delete that _id from the tasklist arrays. With the index this is fast, so you never need to check for invalid _ids.
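A minimal sketch of that approach with Mongoose, assuming models named `Task` and `TaskList` with a `tasklist` array field as in the sample data (these names and the `deleteTask` helper are hypothetical). The models are passed in as parameters so the function can be tried with stand-ins. The two writes are not atomic; if that matters, they would need to run inside a transaction.

```javascript
// Hypothetical helper: delete one task and pull its _id from every tasklist.
// Assumption: TaskList has a multikey index on `tasklist`, e.g.
// taskListSchema.index({ tasklist: 1 }), so the updateMany filter can use it.
async function deleteTask(Task, TaskList, taskId) {
  await Task.deleteOne({ _id: taskId });
  // Only documents that actually reference the task match the filter.
  return TaskList.updateMany({ tasklist: taskId }, { $pull: { tasklist: taskId } });
}

// Usage with stand-in models that just log what they are asked to do.
const Task = { deleteOne: async (filter) => console.log("deleteOne", filter) };
const TaskList = {
  updateMany: async (filter, update) => {
    console.log("updateMany", JSON.stringify(filter), JSON.stringify(update));
    return { modifiedCount: 1 };
  },
};
deleteTask(Task, TaskList, 4).then((res) => console.log(res));
```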
AFAIK there's no way other than the one you described to achieve the desired outcome, but you can greatly simplify the second step of finding the non-matching items: it is simply the set difference between the taskList ids and the existing task ids.
So you could use the $setDifference-operator to calculate that difference:
db.tasks.aggregate([
{
$group: {
_id: null,
ids: {
"$addToSet": "$_id"
}
}
},
{
$project: {
nonMatchingTaskIds: {
$setDifference: [
[
"taskId1",
"taskId2",
"taskId7",
"taskId8"
],
"$ids"
]
}
}
}
])
Assuming your tasks collection contains taskId1 and taskId2 (and other documents) but not taskId7 and taskId8, the query will return nonMatchingTaskIds containing taskId7 and taskId8.
Here's an example on mongoplayground: https://mongoplayground.net/p/75BpiGBJi3Q
What I ended up doing for now is a multi-step method.
This is quite fast, but since the number of task ids collected from Sets is currently much smaller than the total number of Sets, I imagine the $setDifference operator mentioned by eol will be faster once I have that many references.
// Types comes from mongoose: const { Types } = require('mongoose')
let taskIdsInSets = []
// Get all referenced task ids
const result = await this.setSchema.aggregate([
{
'$project': {
'taskList': 1
}
}
])
// Map all elements in one row
result.forEach(set => taskIdsInSets.push(...set.taskList.map(x=> x.toString())))
// Remove duplicate task ids (a Set keeps one copy of each string)
taskIdsInSets = [...new Set(taskIdsInSets)]
// Get the existing task ids that are referenced in a Set
const result2 = await this.taskSchema.aggregate([
{
'$match': {
'_id': {
'$in': taskIdsInSets.map(x => new Types.ObjectId(x))
}
}
}, {
'$project': {
'_id': 1
}
}
])
let existingIdsInTasks = []
// Getting ids from result2 Object into
result2.forEach(set => existingIdsInTasks.push(set._id.toString()))
// Filtering out the ids that don't actually exist
let nonExistingTaskIds = taskIdsInSets.filter(x => existingIdsInTasks.indexOf(x) === -1);
// Deleting the ids that don't actually exist but are in Sets
const finalResult = await this.setSchema.updateMany(
{}, // filter: every Set document
{
$pullAll: {
taskList: nonExistingTaskIds.map(x => new Types.ObjectId(x))
}
})
console.log(finalResult)
return finalResult // a summary of how many documents were matched/modified; unlike findOneAndUpdate with {new: true}, updateMany cannot return the updated documents
I have to do some casting because the ids coming back from the database are ObjectId objects, not strings, and objects are compared by reference, so indexOf and === don't find equal ids unless they are converted to strings first.

mongo aggregate based on conditions to filter the document for versioning

I am working on versioning. We have documents keyed by UUIDs and jobUuids, where jobUuids identify the documents associated with the current user. I have some aggregate queries on these collections that I need to update based on the job UUIDs.
The results fetched by the aggregate query should be such that:
if the current user's jobUuid document does not exist, the master document with jobUuid: "default" is returned (the document without a specific jobUuid);
if the jobUuid document exists, only that document is returned.
I use a $match to get these documents based on certain conditions; from those documents I need to filter based on the conditions above. An example is shown below.
The data looks like this:
[
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"name": "adam",
"jobUuid": "default",
},
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "adam"
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"name": "eve",
"jobUuid": "default",
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "eve"
},
{
"uuid": "26cba689-7eb6-4a9e-a04e-24ede0309e50",
"name": "john",
"jobUuid": "default",
}
]
Results for "jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12" should be:
[
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "adam"
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "eve"
},
{
"uuid": "26cba689-7eb6-4a9e-a04e-24ede0309e50",
"name": "john",
"jobUuid": "default",
}
]
Based on the conditions mentioned above, is it possible to filter the document within the aggregate query to extract the document of a specific job uuid?
Edit 1: I have a solution that works fine, but I want a better one that eliminates all those nested stages.
Edit 2: Updated the data with actual UUIDs. I included only name as an additional field; we have n other fields that aren't relevant here but are needed in the result (mentioning this for anyone who wants to use a projection over all the fields).
Update based on comment:
but the UUIDs are alphanumeric strings, as shown above. Does that have an effect on this sorting? And since we are not using conditions to get the results, I am worried it will cause issues.
You could use an additional field so that the sort order matches the order of the values in the $in expression. Make sure you list "default" as the last value.
[
{"$match":{"jobUuid":{"$in":["d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12","default"]}}},
{"$addFields":{ "order":{"$indexOfArray":[["d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12","default"], "$jobUuid"]}}},
{"$sort":{"uuid":1, "order":1}},
{
"$group": {
"_id": "$uuid",
"doc":{"$first":"$$ROOT"}
}
},
{"$project":{"doc.order":0}},
{"$replaceRoot":{"newRoot":"$doc"}}
]
example here - https://mongoplayground.net/p/wXiE9i18qxf
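The selection logic of the pipeline above (prefer the job document, fall back to "default") can be mirrored in plain JavaScript to check the expected result on the sample data. This is an illustration only; `pickVersions` is a hypothetical name. Like $indexOfArray, it ranks documents by position in the preference list, so the alphabetical order of the UUID strings never matters.

```javascript
// Mirrors: $match on jobUuid $in [jobUuid, "default"], rank with $indexOfArray,
// then keep the best-ranked document per uuid (like $sort + $group with $first).
function pickVersions(docs, jobUuid) {
  const preferred = [jobUuid, "default"]; // job document first, "default" last
  const best = new Map(); // uuid -> { rank, doc }
  for (const doc of docs) {
    const rank = preferred.indexOf(doc.jobUuid); // like $indexOfArray
    if (rank === -1) continue; // filtered out by the $match stage
    const current = best.get(doc.uuid);
    if (current === undefined || rank < current.rank) {
      best.set(doc.uuid, { rank, doc });
    }
  }
  return [...best.values()].map((entry) => entry.doc);
}

const JOB = "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12";
const data = [
  { uuid: "5cdb5a10-4f9b-4886-98c1-31d9889dd943", name: "adam", jobUuid: "default" },
  { uuid: "5cdb5a10-4f9b-4886-98c1-31d9889dd943", jobUuid: JOB, name: "adam" },
  { uuid: "b745baff-312b-4d53-9438-ae28358539dc", name: "eve", jobUuid: "default" },
  { uuid: "b745baff-312b-4d53-9438-ae28358539dc", jobUuid: JOB, name: "eve" },
  { uuid: "26cba689-7eb6-4a9e-a04e-24ede0309e50", name: "john", jobUuid: "default" },
];

for (const doc of pickVersions(data, JOB)) {
  console.log(doc.name, doc.jobUuid);
}
```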
Original
You could use the query below. It picks the non-default document for a uuid if one exists, or else the default document as the only one.
[
{"$match":{"jobUuid":{"$in":[1,"default"]}}},
{"$sort":{"uuid":1, "jobUuid":1}},
{
"$group": {
"_id": "$uuid",
"doc":{"$first":"$$ROOT"}
}
},
{"$replaceRoot":{"newRoot":"$doc"}}
]
example here - https://mongoplayground.net/p/KrL-1s8WCpw
Here is what I would do:
match stage with $in rather than an $or (for readability)
group stage with _id on $uuid, just as you did, but instead of pushing all the data into an array, be more selective. _id already stores $uuid, so there's no reason to capture it again. name must always be the same for each $uuid, so take only the first instance. Based on the match, there are only two possibilities for jobUuid; this assumes it will be either "default" or something else, and that there can be more than one occurrence of the non-"default" jobUuid. Use "$addToSet" instead of pushing to an array in case the same jobUuid occurs multiple times for a user, and before adding to the set, use a conditional to add only non-"default" jobUuids, using $$REMOVE to avoid inserting a null when the jobUuid is "default".
Finally, "$project" to clean things up. If element 0 of the jobUuids array does not exist (is null), the only possibility left for this user is that the jobUuid is "default", so use "$ifNull" to test for that and set "default" as appropriate. There could be more than one jobUuid here, depending on whether that is allowed in your db/application; it's up to you to decide how to handle that (take the highest, take the lowest, etc.).
Tested at: https://mongoplayground.net/p/e76cVJf0F3o
[{
"$match": {
"jobUuid": {
"$in": [
"1",
"default"
]
}
}
},
{
"$group": {
"_id": "$uuid",
"name": {
"$first": "$name"
},
"jobUuids": {
"$addToSet": {
"$cond": {
"if": {
"$ne": [
"$jobUuid",
"default"
]
},
"then": "$jobUuid",
"else": "$$REMOVE"
}
}
}
}
},
{
"$project": {
"_id": 0,
"uuid": "$_id",
"name": 1,
"jobUuid": {
"$ifNull": [{
"$arrayElemAt": [
"$jobUuids",
0
]
},
"default"
]
}
}
}]
I was able to solve this problem with the following aggregate query.
First, in the $match stage, we extract only the documents matching the jobUuid provided by the user or "default".
Then the results are grouped by uuid in a $group stage, which also counts them.
In the $replaceRoot stage, we first check the count of each group:
If the count is greater than or equal to 2, we pick the document that matches the provided jobUuid (the non-"default" one).
If it is 1 or less, we check whether it matches the "default" jobUuid and return it.
The query is below:
[
{
$match: {
$or: [{ jobUuid:1 },{ jobUuid: 'default'}]
}
},
{
$group: {
_id: '$uuid',
count: {
$sum: 1
},
docs: {
$push: '$$ROOT'
}
}
},
{
$replaceRoot: {
newRoot: {
$cond: {
if: {
$gte: [
'$count',
2
]
},
then: {
$arrayElemAt: [
{
$filter: {
input: '$docs',
as: 'item',
cond: {
$ne: [
'$$item.jobUuid',
'default'
]
}
}
},
0
]
},
else: {
$arrayElemAt: [
{
$filter: {
input: '$docs',
as: 'item',
cond: {
$eq: [
'$$item.jobUuid',
'default'
]
}
}
},
0
]
}
}
}
}
}
]

Meteor and Mongo AND tag filters with a twist

I have a Meteor application with a Mongo collection that has a tags field.
[{name: "ABC", tags: {"#Movie", "#free", "!R"}},
{name: "DEF", tags: {"#Movie", "!PG"}},
{name: "GHI", tags: {"#Sports", "#free"}}]
On my UI, there are three groups of checkboxes that are populated on the fly, based on the first letter of the tag name.
filter group 1: [ ]Movie [ ] Sports
filter group 2: [ ]free
filter group 3: [ ]PG [ ]R
The filter logic is the following:
If filter group is empty then do not filter by that filter group
If any checkbox from a filter group is checked, then apply that filter
$and should be applied between filter groups (if Movie and R are checked, only documents that have both the "#Movie" and "!R" tags should be selected)
I am struggling to build Mongo query criteria that follow the above logic. My code currently looks like spaghetti with lots of nested ifs (in pseudocode):
if (filter_group1 is empty) then if (filter_group2 is empty) then mongo_criteria= {_id: $in: $("input:checked", ".filtergroup1").map(function() {return this.value})}
What would be the right way of doing this?
Firstly, I'm sure you mean that "tags" is actually an array since otherwise the structure would be invalid:
{ "name": "ABC", "tags": ["#Movie", "#free", "!R"]},
{ "name": "DEF", "tags": ["#Movie", "!PG"]},
{ "name": "GHI", "tags": ["#Sports", "#free"]}
It's a novel idea to store "tags" data this way, but it does seem that your program logic to construct a query needs to be aware that there are at least "three" possible conditions that need to be considered in an $and combination.
In the simplest form, where you only allow one selection per filter group, you could get away with the $all operator. In simple MongoDB shell notation for brevity:
db.collection.find({ "tags": { "$all": [ "#Movie", "!R" ] } })
The problem there is that if you wanted multiple selections on a group, say the rating for example, then this would fail to get a result:
db.collection.find({ "tags": { "$all": [ "#Movie", "!R", "!PG" ] } })
No item actually contains both of those rating values, so this would match nothing. So you would rather do this:
db.collection.find({ "$and": [
{ "tags": { "$in": [ "#Movie" ] } },
{ "tags": { "$in": [ "!R", "!PG" ] } }
]})
That would correctly match all Movies with rating tags for "R" or "PG". Extending this to another group is basically pushing another array item onto the $and expression:
db.collection.find({ "$and": [
{ "tags": { "$in": [ "#Movie" ] } },
{ "tags": { "$in": [ "!R", "!PG" ] } },
{ "tags": { "$in": [ "#free" ] } }
]})
This returns only the documents that match a value in every selected filter group: the "PG" movie is excluded because it is not free, and the "Sports" item is excluded because it is not a movie ("Sports" was not added to the selection).
The basis of constructing the query is an array of selected options for $in in each filter group. You only append to the $and array when a filter group actually has a selection.
So start with a base $and like this:
var query = { "$and":[{}] };
Then add each checked option of a filter group to that group's own $in:
var inner = { "tags": { "$in": [] } };
inner.tags["$in"].push( item );
And then append to the base query:
query["$and"].push( inner );
Repeat this for each filter group that has a selection. This is perfectly valid, since the empty base condition just selects everything unfiltered, so no additional logic is needed:
db.collection.find({ "$and": [
{ },
{ "tags": { "$in": [ "#Movie" ] } },
{ "tags": { "$in": [ "!R", "!PG" ] } },
{ "tags": { "$in": [ "#free" ] } }
]})
So it really comes down to constructing the query the way MongoDB understands it. This is just simple JavaScript array manipulation to build the data structure, which is all MongoDB queries really are.
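Putting those pieces together, the query can be built from the checked boxes without nested ifs. A sketch, assuming the checked values have already been collected into one array per filter group (the `buildTagQuery` name and its input shape are hypothetical). It skips empty groups instead of seeding `$and` with `{}`, and returns `{}` when nothing is checked, because MongoDB rejects an empty `$and` array. The small `matches` function only imitates this one query shape in memory so the result can be checked on the sample data; in Meteor you would pass the query to `collection.find(query)`.

```javascript
// Build the selector from the ticked checkboxes, one array of values per filter group.
function buildTagQuery(groups) {
  const clauses = groups
    .filter((checked) => checked.length > 0) // an empty group does not filter
    .map((checked) => ({ tags: { $in: checked } })); // OR within a group
  // AND between groups; {} (match everything) when nothing is ticked.
  return clauses.length > 0 ? { $and: clauses } : {};
}

// Tiny in-memory evaluator for just this query shape, to check the logic.
function matches(doc, query) {
  return (query.$and ?? []).every((clause) =>
    clause.tags.$in.some((tag) => doc.tags.includes(tag))
  );
}

const docs = [
  { name: "ABC", tags: ["#Movie", "#free", "!R"] },
  { name: "DEF", tags: ["#Movie", "!PG"] },
  { name: "GHI", tags: ["#Sports", "#free"] },
];

const query = buildTagQuery([["#Movie"], ["#free"], ["!R", "!PG"]]);
console.log(JSON.stringify(query));
console.log(docs.filter((d) => matches(d, query)).map((d) => d.name)); // [ 'ABC' ]
```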

How to get embedded document in an array in MongoDB (with Mongoose)

I have a BSON object like this saved in MongoDB:
{
"title": "Chemistry",
"_id": "532d665f89ae4ae703b29730",
"__v": 0,
"sections": [
{
"week": 1,
"_id": "532d665f89ae4ae703b29731",
"assignments": [
{
"created_date": "2014-03-22T10:30:55.621Z",
"_id": "532d665f89ae4ae703b29733",
"questions": []
},
{
"created_date": "2014-03-22T10:30:55.621Z",
"_id": "532d665f89ae4ae703b29732",
"questions": []
}
],
"materials": []
}
],
"instructor_ids": [],
"student_ids": []
}
What I wish to do is to retrieve the 'assignment' with _id 532d665f89ae4ae703b29733. It is an element in the assignments array, which, in turn, belongs to an element of the sections array.
I am able to retrieve the entire document with the query
{ 'sections.assignments._id' : assignmentId }
However, what I want is just the assignment subdocument
{
"created_date": "2014-03-22T10:30:55.621Z",
"_id": "532d665f89ae4ae703b29733",
"questions": []
}
Is there a way to accomplish such a query? Should I resort to putting assignments in a different collection?
As of Mongoose 6.x, the accepted answer is no longer valid because $elemMatch cannot be used on nested documents; use aggregate instead.
If you want to use an _id to find the document, convert the _id you get as an argument to a native MongoDB ObjectId; aggregate does not cast it for you, so a plain string will not match.
const native_id = new mongoose.Types.ObjectId(id);
const assignment = await <your_model_here>.aggregate([
{ $unwind: "$sections" },
{ $unwind: "$sections.assignments" },
{ $match: { "sections.assignments._id": native_id } },
{ $project: { _id: true, sections: "$sections.assignments" } }
]
)
console.log(assignment) // you have what you want
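To see what those $unwind/$match stages do, here is a plain-JavaScript mirror run on the sample document. It is only an illustration: `findAssignments` is a hypothetical name, and ids are compared as strings because the sample stores them as strings (in a real collection they are ObjectIds). If you want the assignment itself at the top level instead of nested under a field, a final `{ $replaceRoot: { newRoot: "$sections.assignments" } }` stage (MongoDB 3.4+) can take the place of the $project stage.

```javascript
// Mirrors $unwind "$sections" -> $unwind "$sections.assignments" -> $match on _id.
function findAssignments(courses, assignmentId) {
  return courses
    .flatMap((course) => course.sections ?? []) // one entry per section
    .flatMap((section) => section.assignments ?? []) // one entry per assignment
    .filter((a) => String(a._id) === String(assignmentId)); // keep only the match
}

const course = {
  title: "Chemistry",
  _id: "532d665f89ae4ae703b29730",
  sections: [
    {
      week: 1,
      _id: "532d665f89ae4ae703b29731",
      assignments: [
        { created_date: "2014-03-22T10:30:55.621Z", _id: "532d665f89ae4ae703b29733", questions: [] },
        { created_date: "2014-03-22T10:30:55.621Z", _id: "532d665f89ae4ae703b29732", questions: [] },
      ],
      materials: [],
    },
  ],
};

console.log(findAssignments([course], "532d665f89ae4ae703b29733"));
```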
You can do an aggregate query like this:
db.collection.aggregate([
{$unwind: "$sections"},
{$unwind: "$sections.assignments"},
{$match: {"sections.assignments._id": "532d665f89ae4ae703b29733"}},
{$project: {_id: false, assignments: "$sections.assignments"}}
])
However, I recommend thinking about creating more collections, as you suggested.
More collections seem to me a better solution than this query.
To retrieve a subset of the elements of an array, you'll need to use the $elemMatch projection operator.
db.collection.find(
{"sections.assignments._id" : assignmentId},
{"sections.assignments":{$elemMatch:{"_id":assignmentId}}}
)
Note:
If multiple elements match the $elemMatch condition, the operator returns the first matching element in the array.

Retrieve n-level deep sub-document in MongoDB

I have a deeply nested document in MongoDB and I would like to fetch individual sub-objects.
Example:
{
"schoolName": "Cool School",
"principal": "Joe Banks",
"rooms": [
{
"number": 100,
"teacher": "Alvin Melvin",
"students": [
{
"name": "Bort",
"currentGrade": "A"
},
// ... many more students
]
},
// ... many more rooms
]
}
Recently Mongo updated to allow 1-level-deep sub-object retrieval using $elemMatch projection:
var projection = { _id: 0, rooms: { $elemMatch: { number: 100 } } };
db.schools.find({"schoolName": "Cool School"}, projection);
// returns { "rooms": [ /* array containing only the matching room */ ] }
But when I try to fetch a student (2 levels deep) in this same fashion, I get an error:
var projection = { _id: 0, "rooms.students": { $elemMatch: { name: "Bort" } } };
db.schools.find({"schoolName": "Cool School"}, projection);
// "$err": "Cannot use $elemMatch projection on a nested field (currently unsupported).", "code": 16344
Is there a way to retrieve arbitrarily deep sub-objects in a mongoDB document?
I am using Mongo 2.2.1
I recently asked a similar question and can provide a suitably general answer (see Using MongoDB's positional operator $ in a deeply nested document query)
This solution is only supported in MongoDB 2.6+, but from then on you can use the aggregation framework's $redact stage.
Here is an example query which should return just your student Bort.
db.schools.aggregate({
$match: { schoolName: 'Cool School' }
}, {
$project: {
_id: 0,
'schoolName': 1,
'rooms.number': 1,
'rooms.students': 1
}
}, {
$redact: {
$cond: {
"if": {
$or: [{
$gte: ['$schoolName', '']
}, {
$eq: ['$number', 100]
}]
},
"then": "$$DESCEND",
"else": {
$cond: {
"if": {
$eq: ['$name', 'Bort']
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}
}
}
});
$redact can be used to make sub-queries by matching or pruning sub-documents recursively in the matched documents.
You can read about $redact here to understand more about what's going on but the design pattern I've identified has the following requirements:
The redact condition is applied at each sub-document level, so you need a field that is unique to each level; e.g. you can't have number as a key on both rooms and students
It only works on data fields, not array indices, so if you want to know the position of a returned nested document (for example, to update it), you need to store and maintain that position in your documents
Each part of the $or statement in $redact should match the documents you want at a specific level
Therefore each part of the $or statement needs to include a match to the unique field of the document at that level. For example, $eq: ['$number', 100] matches the room with number 100
If you aren't specifying a query at a level, you still need to include the unique field. For example, if it is a string, you can match it with $gte: ['$uniqueField', '']
The last document level goes in the second if expression so that all of that document is kept.
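To make the $$DESCEND / $$KEEP / $$PRUNE walk more concrete, here is a plain-JavaScript model of how $redact applies the answer's condition level by level. It is an approximation for illustration only: the helper names are hypothetical, `$gte: ['$schoolName', '']` is modeled as "schoolName is a string", and the second student and second room in the sample are made up to show pruning.

```javascript
// Model of $redact: `decide` plays the role of the $cond expression and returns
// "DESCEND", "KEEP" or "PRUNE" for every (sub)document it is applied to.
function redact(doc, decide) {
  const verdict = decide(doc);
  if (verdict === "PRUNE") return undefined; // drop this (sub)document
  if (verdict === "KEEP") return doc; // keep it whole, without inspecting deeper levels
  // DESCEND: keep this level's plain fields and re-apply `decide` to embedded documents.
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (Array.isArray(value)) {
      out[key] = value
        .map((item) => (isDocument(item) ? redact(item, decide) : item))
        .filter((item) => item !== undefined); // pruned array elements disappear
    } else if (isDocument(value)) {
      const kept = redact(value, decide);
      if (kept !== undefined) out[key] = kept;
    } else {
      out[key] = value;
    }
  }
  return out;
}

function isDocument(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Same decision as the answer's $cond (schoolName check approximated).
function decide(doc) {
  if (typeof doc.schoolName === "string" || doc.number === 100) return "DESCEND";
  return doc.name === "Bort" ? "KEEP" : "PRUNE";
}

// Document shape after the answer's $project stage (principal and teacher removed).
const projected = {
  schoolName: "Cool School",
  rooms: [
    {
      number: 100,
      students: [
        { name: "Bort", currentGrade: "A" },
        { name: "Lisa", currentGrade: "B" }, // made-up extra student
      ],
    },
    { number: 101, students: [{ name: "Bort", currentGrade: "C" }] }, // made-up extra room
  ],
};

console.log(JSON.stringify(redact(projected, decide)));
// {"schoolName":"Cool School","rooms":[{"number":100,"students":[{"name":"Bort","currentGrade":"A"}]}]}
```

The Bort in room 101 is dropped too, because his room is pruned before its students are ever examined; this is why each level needs its own match in the $or.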
I don't have MongoDB 2.2 handy at the moment, so I can't test this, but have you tried this?
var projection = { _id: 0, rooms: { $elemMatch: { "students.name": "Bort" } } };
db.schools.find({"schoolName": "Cool School"}, projection);
