MongoDB 3.0.7 and Mongoose 4.3.4.
Schema:
var schema = new mongoose.Schema({
confirmed: { type: Boolean, default: false },
moves: [new mongoose.Schema({
name: { type: String, default: '' },
live: { type: Boolean, default: true }
})]
});
mongoose.model('Batches', schema);
Query:
var Batch = mongoose.model('Batches');
var query = {
confirmed: true,
moves: {
$elemMatch: {
live: true
}
}
};
Batch.find(query).exec(function(err, batches){
console.log('batches: ', batches);
});
I need to return all batches that are confirmed, and all moves within the returned batches that are live.
At the moment, the above is returning only the confirmed batches (which is what I want), but all the moves in each returned batch (which is not what I want). So the limiting of moves by the live flag is not working.
How do I limit the sub-documents that are returned..?
Ideally, I would like to keep everything that controls the data returned within query passed to find, and not have to call more methods on Batch.
For Mongoose versions >=4.3.0 which support MongoDB Server 3.2.x, you can use the $filter operator with the aggregation framework to limit/select the subset of the moves array to return based on the specified condition. This returns an array with only those elements that match the condition, so you will use it in the $project stage to modify the moves array based on the filter above.
The following example shows how you can go about this:
var Batch = mongoose.model('Batches'),
pipeline = [
{
"$match": { "confirmed": true, "moves.live": true }
},
{
"$project": {
"confirmed": 1,
"moves": {
"$filter": {
"input": "$moves",
"as": "el",
"cond": { "$eq": [ "$$el.live", true ] }
}
}
}
}
];
Batch.aggregate(pipeline).exec(function(err, batches){
console.log('batches: ', batches);
});
or with the fluent aggregate() API pipeline builder:
Batch.aggregate()
.match({
"$match": { "confirmed": true, "moves.live": true }
})
.project({
"confirmed": 1,
"moves": {
"$filter": {
"input": "$moves",
"as": "el",
"cond": { "$eq": [ "$$el.live", true ] }
}
}
})
.exec(function(err, batches){
console.log('batches: ', batches);
});
For Mongoose versions ~3.8.8, ~3.8.22, 4.x which support MongoDB Server >=2.6.x, you could filter out the false values using a combination of the $map and $setDifference operators:
var Batch = mongoose.model('Batches'),
pipeline = [
{
"$match": { "confirmed": true, "moves.live": true }
},
{
"$project": {
"confirmed": 1,
"moves": {
"$setDifference": [
{
"$map": {
"input": "$moves",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.live", true ] },
"$$el",
false
]
}
}
},
[false]
]
}
}
}
];
Batch.aggregate(pipeline).exec(function(err, batches){
console.log('batches: ', batches);
});
The query does not limit moves by the live flag. The query reads: find all confirmed batches with at least one live move.
There are 2 options to retrieve live moves only: retrieve all moves, and filter the array clientside; or map-reduce it serverside - unwind all moves, filter live ones, and group by document id.
The former is simpler to implement, but will result with more data transfer, cpu and memory consumption on the client side. The later is more efficient, but a bit more complex to implement - if you expect more than 16Mb in the response, you will need to use a temporary collection.
You can use the aggregation framework's $unwind method to split them into separate documents, here are sample codes.
Batches.aggregate(
{ $match: {'confirmed': true, 'moves.live': true}},
{$unwind: '$moves'},
{$project: {
confirmed: 1
name: '$moves.name',
live:'$moves.live'
}
}, function(err, ret){
})
Related
I am stuck in a problem where I have a field which is sometimes string and sometimes the output of that field is in array so how can i tackle that in $addField query
I am sharing my mongo query code
db.ledger_scheme_logs.aggregate([
{
$match:{
"type":{ $in: ["add","edit"]},
}
},
{
"$addFields": {
"trail_beginning": {
$substr: [ "$metadata.schemes._trail", 0, 36 ]
}
}
},
{
$group: {
"_id": {
"trail_beginning":"$trail_beginning"
},
"count": { $sum: 1 },
"items": { $push: "$$ROOT" },
}
},
{
"$sort": {
count: -1
}
}
])
In this query the "$metadata.schemes._trail" here schemes is in array in some array of objects and because of that I am getting mongo error -> "message" : "can't convert from BSON type array to String" so how can I solve this type of problem any help with example would be appreciated.
Thanks in advance!
The bigger and trickier question here is about what behavior you would like the system to have rather than how to actually make the database do it. There's a closely related topic around (consistent) schema design that naturally follows.
To directly answer your question, you can use the $cond operator to conditionally calculate the new trail_beginning field based on the data type of the source document currently being processed. An example would be something like:
{
"$addFields": {
"trail_beginning": {
"$cond": {
"if": {
$eq: [
{
$type: "$metadata.schemes"
},
"array"
]
},
"then": {
"$map": {
"input": "$metadata.schemes._trail",
"in": {
$substr: [
"$$this",
0,
3
]
}
}
},
"else": {
$substr: [
"$metadata.schemes._trail",
0,
3
]
}
}
}
}
}
Using two sample documents with different schemas yields the following as demonstrated in this playground example:
[
{
"_id": 1,
"metadata": {
"schemes": {
"_trail": "ABCDEFG"
}
},
"trail_beginning": "ABC"
},
{
"_id": 2,
"metadata": {
"schemes": [
{
"_trail": "HIJKLMN"
},
{
"_trail": "OPQRSTU"
}
]
},
"trail_beginning": [
"HIJ",
"OPQ"
]
}
]
Taking a glance at the rest of your pipeline though, I suspect (but can't say for sure) that this isn't actually what you want to do. This is because the subsequent $group will use the entire array of values to do the grouping, but I'm (again) guessing that you want to group based on individual values.
If my assumptions are correct, then logically what you really want to do is $unwind the array first before you do the substring transformation. This will correct the subsequent grouping logic and, as a side effect, it will also eliminate your problem of having different possible input types during the $addFields stage. Your full pipeline would look something like this:
db.ledger_scheme_logs.aggregate([
{
$match:{
"type":{ $in: ["add","edit"]},
}
},
{
$unwind: "$metadata.schemes"
},
{
"$addFields": {
"trail_beginning": {
$substr: [ "$metadata.schemes._trail", 0, 36 ]
}
}
},
{
$group: {
"_id": {
"trail_beginning":"$trail_beginning"
},
"count": { $sum: 1 },
"items": { $push: "$$ROOT" },
}
},
{
"$sort": {
count: -1
}
}
])
Playground demonstration (using a shorter substring) here.
This works because $unwind will treat non-array field paths as a single element array. However, having a discrepancy in the schema may frequently result in you having to put in special conditional logic to account for the difference in various places in the application. Consider simplifying development by making the schema consistent (converting the non-arrays to arrays with single values).
I am trying to make a healthcheck on references in one of my collections. so to see if objects referenced to still exist and if not I want to delete that _id in the array
I haven't found anything to that so my idea is to get the reversed result of a $lookup
Is it possible to get the reversed result of a lookup in MongoDB?
Here is an example of a collection and its taskList with references to the tasks collection.
Now I want to delete all the id's in there that do not have an existing result in the tasks collection.
How I solve it right now which is tons of queries:
get all the ids from taskList
Send a query for every single one of them to see if there is no match with the task collection
Send a query to pull that empty reference out of the array
I think this does what you want, its ok even if you have big collections.
But its not an update you can do after that a $merge stage, to the tasklists (if match on _id replace)(requires MongoDB >= 4.4) or you can do a $out stage to another collection, and replace the tasklist collection.
Test code here
Data in
db={
"tasklists": [
{
"_id": 1,
"tasklist": [
1,
2,
3,
4
]
},
{
"_id": 2,
"tasklist": [
5,
6,
7
]
}
],
"tasks": [
{
"_id": 1
},
{
"_id": 2
},
{
"_id": 3
},
{
"_id": 5
}
]
}
db.tasklists.aggregate([
{
"$lookup": {
"from": "tasks",
"let": {
"tasklist": "$tasklist"
},
"pipeline": [
{
"$match": {
"$expr": {
"$in": [
"$_id",
"$$tasklist"
]
}
}
}
],
"as": "valid"
}
},
{
"$addFields": {
"valid": {
"$map": {
"input": "$valid",
"as": "v",
"in": "$$v._id"
}
}
}
},
{
"$addFields": {
"tasklist": {
"$filter": {
"input": "$tasklist",
"as": "t",
"cond": {
"$in": [
"$$t",
"$valid"
]
}
}
}
}
},
{
"$unset": [
"valid"
]
}
])
Results (tasks 4,6,7 wasnt found in the task collection,and removed)
[
{
"_id": 1,
"tasklist": [
1,
2,
3
]
},
{
"_id": 2,
"tasklist": [
5
]
}
]
Edit
If you want to use index to do the $lookup you can try this
Test code here
Tasks have index on _id so no need to make one, if you dont join on _id make one.
db.tasklists.aggregate([
{
"$unwind": {
"path": "$tasklist"
}
},
{
"$lookup": {
"from": "tasks",
"localField": "tasklist",
"foreignField": "_id",
"as": "joined"
}
},
{
"$match": {
"$expr": {
"$gt": [
{
"$size": "$joined"
},
0
]
}
}
},
{
"$unset": [
"joined"
]
},
{
"$group": {
"_id": "$_id",
"tasklist": {
"$push": "$tasklist"
},
"afield": {
"$first": "$afield"
}
}
}
])
After that you can do $out or $merge with replace option.
But both lose the updated data if any while this was happening.
Only solution for this(if it is a problem) $merge with pipeline,
You need to keep also in the pipeline above an extra array with the initial tasklist, so you remove the valid ones, to have the invalid ones, and then on merge with pipeline to filter the array, and just removed those invalid. (this is safe, from data loss)
I think the best approach instead of doing all those is to have an index on tasklist(multikey index) and when an _id is deleted from tasks,to delete the _id from the array in tasklist.With index it will be fast, so you dont need to check for invalid _ids.
Afaik there's no other way than you described in order to achieve the desired outcome, but you can greatly simplify the second step to find the non-matching items. In fact it's the set difference between the taskList-ids and the existing task-ids.
So you could use the $setDifference-operator to calculate that difference:
db.tasks.aggregate([
{
$group: {
_id: "null",
ids: {
"$addToSet": "$_id"
}
}
},
{
$project: {
nonMatchingTaskIds: {
$setDifference: [
[
"taskId1",
"taskId2",
"taskId7",
"taskId8"
],
"$ids"
]
}
}
}
])
Assuming your tasks collection contains taskId1, task2 (and other documents), but not taskId7 and taskId8, the query will result in nonMatchingTaskIds containing taskId7 and taskId8.
Here's an example on mongoplayground: https://mongoplayground.net/p/75BpiGBJi3Q
So what I came to do now is a few stepped method.
This is quite fast but sicne the taskIds collected from Sets are currently way smaller than the entire amount of sets I imagine working with the $setDifference operator mentioned by eol will be faster once I get that many references.
let taskIdsInSets = []
// Get all referenced task ids
const result = await this.setSchema.aggregate([
{
'$project': {
'taskList': 1
}
}
])
// Map all elements in one row
result.forEach(set => taskIdsInSets.push(...set.taskList.map(x=> x.toString())))
// Delete duplicates of taskIds here
taskIdsInSets.filter((item, index) => taskIdsInSets.indexOf(item) != index)
// Get the existing task ids that are referenced in a Set
const result2 = await this.taskSchema.aggregate([
{
'$match': {
'_id': {
'$in': [...taskIdsInSets.map(x => Types.ObjectId(x.toString()))]
}
}
}, {
'$project': {
'_id': 1
}
}
])
let existingIdsInTasks = []
// Getting ids from result2 Object into
result2.forEach(set => existingIdsInTasks.push(set._id.toString()))
// Filtering out the ids that don't actually exist
let nonExistingTaskIds = taskIdsInSets.filter(x => existingIdsInTasks.indexOf(x) === -1);
// Deleting the ids that don't actually exist but are in Sets
const finalResult = await this.setSchema.updateMany(
{
$pullAll: {
taskList: [...nonExistingTaskIds.map(x => Types.ObjectId(x.toString()))]
}
})
console.log(finalResult)
return finalResult // returns the information how much got changed. unfortunately in mongoose there isn't the option to use findAndModify with `{new:true}` or atleast I didn't manage to make it work.
for some reason what the database returns neither matches the Mongo ObjectId nor strings so I have to do some castings there.
I am working on versioning, We have documents based on UUIDs andjobUuids, andjobUuids are the documents associated with the currently working user. I have some aggregate queries on these collections which I need to update based on the job UUIDs,
The results fetched by the aggregate query should be such that,
if the current usersjobUuid document does not exist then the master document with jobUuid: "default" will be returned(The document without any jobUuid),
if job uuid exists then only the document is returned.
I have a$match used to get these documents based on certain conditions, from those documents I need to filter out the documents based on the above conditions, and an example is shown below,
The data looks like this:
[
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"name": "adam",
"jobUuid": "default",
},
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "adam"
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"name": "eve",
"jobUuid": "default",
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "eve"
},
{
"uuid": "26cba689-7eb6-4a9e-a04e-24ede0309e50",
"name": "john",
"jobUuid": "default",
}
]
Results for "jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12" should be:
[
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "adam"
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "eve"
},
{
"uuid": "26cba689-7eb6-4a9e-a04e-24ede0309e50",
"name": "john",
"jobUuid": "default",
}
]
Based on the conditions mentioned above, is it possible to filter the document within the aggregate query to extract the document of a specific job uuid?
Edit 1: I got the following solution, which is working fine, I want a better solution, eliminating all those nested stages.
Edit 2: Updated the data with actual UUIDs and I just included only the name as another field, we do have n number of fields which are not relevant to include here but needed at the end (mentioning this for those who want to use the projection over all the fields).
Update based on comment:
but the UUIDs are alphanumeric strings, as shown above, does it have
an effect on these sorting, and since we are not using conditions to
get the results, I am worried it will cause issues.
You could use additional field to match the sort order to be the same order as values in the in expression. Make sure you provide the values with default as the last value.
[
{"$match":{"jobUuid":{"$in":["d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12","default"]}}},
{"$addFields":{ "order":{"$indexOfArray":[["d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12","default"], "$jobUuid"]}}},
{"$sort":{"uuid":1, "order":1}},
{
"$group": {
"_id": "$uuid",
"doc":{"$first":"$$ROOT"}
}
},
{"$project":{"doc.order":0}},
{"$replaceRoot":{"newRoot":"$doc"}}
]
example here - https://mongoplayground.net/p/wXiE9i18qxf
Original
You could use below query. The query will pick the non default document if it exists for uuid or else pick the default as the only document.
[
{"$match":{"jobUuid":{"$in":[1,"default"]}}},
{"$sort":{"uuid":1, "jobUuid":1}},
{
"$group": {
"_id": "$uuid",
"doc":{"$first":"$$ROOT"}
}
},
{"$replaceRoot":{"newRoot":"$doc"}}
]
example here - https://mongoplayground.net/p/KrL-1s8WCpw
Here is what I would do:
match stage with $in rather than an $or (for readability)
group stage with _id on $uuid, just as you did, but instead of pushing all the data into an array, be more selective. _id is already storing $uuid, so no reason to capture it again. name must always be the same for each $uuid, so take only the first instance. Based on the match, there are only two possibilities for jobUuid, but this will assume it will be either "default" or something else, and that there can be more than one occurrence of the non-"default" jobUuid. Using "$addToSet" instead of pushing to an array in case there are multiple occurrences of the same jobUuid for a user, also, before adding to the set, use a conditional to only add non-"default" jobUuids, using $$REMOVE to avoid inserting a null when the jobUuid is "default".
Finally, "$project" to clean things up. If element 0 of the jobUuids array does not exist (is null), there is no other possibility for this user than for the jobUuid to be "default", so use "$ifNull" to test and set "default" as appropriate. There could be more than 1 jobUuid here, depending if that is allowed in your db/application, up to you to decide how to handle that (take the highest, take the lowest, etc).
Tested at: https://mongoplayground.net/p/e76cVJf0F3o
[{
"$match": {
"jobUuid": {
"$in": [
"1",
"default"
]
}
}
},
{
"$group": {
"_id": "$uuid",
"name": {
"$first": "$name"
},
"jobUuids": {
"$addToSet": {
"$cond": {
"if": {
"$ne": [
"$jobUuid",
"default"
]
},
"then": "$jobUuid",
"else": "$$REMOVE"
}
}
}
}
},
{
"$project": {
"_id": 0,
"uuid": "$_id",
"name": 1,
"jobUuid": {
"$ifNull": [{
"$arrayElemAt": [
"$jobUuids",
0
]
},
"default"
]
}
}
}]
I was able to solve this problem with the following aggregate query,
We are first extracting the results matching only the jobUuid provided by the user or the "default" in the match section.
Then the results are grouped based on the uuid, using a group stage and we are counting the results as well.
Using the conditions in replaceRoot first we are checking the length of the grouped document,
If the grouped document length is greater than or equal to 2, we are
filtering the document that matches the provided jobUuid.
If it's less or equal to the 1, then we are checking if it's matching the default jobUuid and returning it.
The Query is below:
[
{
$match: {
$or: [{ jobUuid:1 },{ jobUuid: 'default'}]
}
},
{
$group: {
_id: '$uuid',
count: {
$sum: 1
},
docs: {
$push: '$$ROOT'
}
}
},
{
$replaceRoot: {
newRoot: {
$cond: {
if: {
$gte: [
'$count',
2
]
},
then: {
$arrayElemAt: [
{
$filter: {
input: '$docs',
as: 'item',
cond: {
$ne: [
'$$item.jobUuid',
'default'
]
}
}
},
0
]
},
else: {
$arrayElemAt: [
{
$filter: {
input: '$docs',
as: 'item',
cond: {
$eq: [
'$$item.jobUuid',
'default'
]
}
}
},
0
]
}
}
}
}
}
]
I currently have a MongoDB collection that looks like so:
{
{
"_id": ObjectId,
"user_id": Number,
"updates": [
{
"_id": ObjectId,
"mode": Number,
"score": Number
},
{
"_id": ObjectId,
"mode": Number,
"score": Number
},
{
"_id": ObjectId,
"mode": Number,
"score": Number
}
]
}
}
I am looking to find a way to find the users with the largest number of updates per mode. For instance, if I specify mode 0, I want it to load the users in order of greatest number of updates with mode: 0.
Is this possible in MongoDB? It does not need to be a fast algorithm, as it will be cached for quite a while, and it will run asynchronously.
The fastest way would be to store a count for each "mode" within the document as another field, then you could just sort on that:
var update = {
"$push": { "updates": updateDoc },
};
var countDoc = {};
countDoc["counts." + updateDoc.mode] = 1;
update["$inc"] = countDoc;
Model.update(
{ "_id": id },
update,
function(err,numAffected) {
}
);
Which would use $inc to increment a "counts" field for each "mode" value as a key for each "mode" pushed to the "updates" array. All the calculation happens on update, so it's fast and so is the query that can be applied with a sort on that value:
Model.find({ "updates.mode": 0 }).sort({ "counts.0": -1 }).exec(function(err,users) {
});
If you don't want to or cannot store such a field then the other option is to calculate at query time with .aggregate():
Model.aggregate(
[
{ "$match": { "updates.mode": 0 } },
{ "$project": {
"user_id": 1,
"updates": 1,
"count": {
"$size": {
"$setDifference": [
{ "$map": {
"input": "$updates",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.mode", 0 ] },
"$$el",
false
]
}
}},
[false]
]
}
}
}},
{ "$sort": { "count": -1 } }
],
function(err,results) {
}
);
Which isn't bad since the filtering of the array and getting the $size is fairly effecient, but it's not as fast as just using a stored value.
The $map operator allows inline processing of the array elements which are tested by $cond to see if it returns a match or false. Then $setDifference removes any false values. A much better way to filter array content than using $unwind, which can slow things down significantly and should not be used unless your intent to to aggregate array content across documents.
But the better approach is to store the value for the count instead, since this does not require runtime calculation and can even use an index
I think this is a duplicate of this question:
Mongo find query for longest arrays inside object
The accepted answer seem to be doing exactly what you ask for.
db.collection.aggregate( [
{ $unwind : "$l" },
{ $group : { _id : "$_id", len : { $sum : 1 } } },
{ $sort : { len : -1 } },
{ $limit : 25 }
] )
just replace "$l" with "$updates".
[edit:] and you probably do not want the result limited to 25, so you should also get rid of the { $limit : 25 }
I have a deeply nested document in mongoDB and I would like to fetch individual sub-objects.
Example:
{
"schoolName": "Cool School",
"principal": "Joe Banks",
"rooms": [
{
"number": 100
"teacher": "Alvin Melvin"
"students": [
{
"name": "Bort"
"currentGrade": "A"
},
// ... many more students
]
},
// ... many more rooms
]
}
Recently Mongo updated to allow 1-level-deep sub-object retrieval using $elemMatch projection:
var projection = { _id: 0, rooms: { $elemMatch: { number: 100 } } };
db.schools.find({"schoolName": "Cool School"}, projection);
// returns { "rooms": [ /* array containing only the matching room */ ] }
But when I try to fetch a student (2 levels deep) in this same fashion, I get an error:
var projection = { _id: 0, "rooms.students": { $elemMatch: { name: "Bort" } } };
db.schools.find({"schoolName": "Cool School"}, projection);
// "$err": "Cannot use $elemMatch projection on a nested field (currently unsupported).", "code": 16344
Is there a way to retrieve arbitrarily deep sub-objects in a mongoDB document?
I am using Mongo 2.2.1
I recently asked a similar question and can provide a suitably general answer (see Using MongoDB's positional operator $ in a deeply nested document query)
This solution is only supported for Mongo 2.6+, but from then you can use the aggregation framework's $redact function.
Here is an example query which should return just your student Bort.
db.users.aggregate({
$match: { schoolName: 'Cool School' }
}, {
$project: {
_id: 0,
'schoolName': 1,
'rooms.number': 1,
'rooms.students': 1
}
}, {
$redact: {
$cond: {
"if": {
$or: [{
$gte: ['$schoolName', '']
}, {
$eq: ['$number', 100]
}]
},
"then": "$$DESCEND",
"else": {
$cond: {
"if": {
$eq: ['$name', 'Bort']
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}
}
}
});
$redact can be used to make sub-queries by matching or pruning sub-documents recursively in the matched documents.
You can read about $redact here to understand more about what's going on but the design pattern I've identified has the following requirements:
The redact condition is applied at each sub-document level so you need a unique field at each level e.g. you can't have number as a key on both rooms and students say
It only works on data fields not array indices so if you want to know the returned position of a nested document (for example to update it) you need to include that and maintain it in your documents
Each part of the $or statement in $redact should match the documents you want at a specific level
Therefore each part of the $or statement needs to include a match to the unique field of the document at that level. For example, $eq: ['$number', 100] matches the room with number 100
If you aren't specifying a query at a level, you need to still include the unique field. For example, if it is a string you can match it with $gte: ['$uniqueField': '']
The last document level goes in the second if expression so that all of that document is kept.
I don't have mongodb 2.2 handy at the moment, so I can't test this, but have you tried?
var projection = { _id: 0, rooms: { $elemMatch: { "students.name": "Bort" } } };
db.schools.find({"schoolName": "Cool School"}, projection);