I have the following schema:
{ "_id": {
"$oid": "58c0204d9f10810115f13e5d"
},"OrgName": "A",
"modules": [
{
"name": "test",
"fullName": "john smith",
"_id": {
"$oid": "58c0204d9f10810115f13e5e"
},
"TimeSavedPlanning": 520,
"TimeSavedWorking": 1000,
"costSaved": 0
},
{
"name": "test1",
"fullName": "john smith",
"_id": {
"$oid": "58c020f85437c22215be92cc"
},
"TimeSavedPlanning": 0,
"TimeSavedWorking": 1000,
"costSaved": 500
}
]
}
I want to aggregate the data within the "modules" array for all documents where OrgName = A and output the following totals.
TimeSavedPlanning = 520 (because 520 + 0 = 520)
TimeSavedWorking = 2000 (because 1000 + 1000 = 2000)
costSaved = 500 (because 0 + 500 = 500)
Just supply each field to the $group accumulators, and use the "double barreled" $sum to "sum" both from arrays and from documents:
Model.aggregate([
  { "$match": { "OrgName": "A" } },
  { "$group": {
    "_id": null,
    "TimeSavedPlanning": { "$sum": { "$sum": "$modules.TimeSavedPlanning" } },
    "TimeSavedWorking": { "$sum": { "$sum": "$modules.TimeSavedWorking" } },
    "costSaved": { "$sum": { "$sum": "$modules.costSaved" } }
  }}
])
You have been able to use $sum like that since MongoDB 3.2. Since that release it has "two" functions:
Takes an "array" of values and "sums" them together.
Acts as an "accumulator" within $group to "sum" values provided from documents.
So here you use "both" functions by "reducing" the arrays down to numeric values per document, and then "accumulating" via the $group.
Of course the $match does the "selection" right at the beginning of the operation chain. It belongs there because it determines the selection of data, and also because a "first" stage like this can use an "index".
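To make the "two" functions concrete, here is a minimal sketch (assuming a collection named orgs holding the sample document above; planningPerDoc is just an illustrative output field) that runs only the "inner" $sum, reducing each document's array to a single number before any accumulation happens:
var perDoc = db.orgs.aggregate([
  { "$match": { "OrgName": "A" } },
  { "$project": {
    // "$modules.TimeSavedPlanning" resolves to an array of values per document,
    // and $sum over that array reduces it to a single number
    "planningPerDoc": { "$sum": "$modules.TimeSavedPlanning" }
  }}
])
For the sample document this yields planningPerDoc: 520, which is exactly the per-document value that the "outer" accumulator $sum then adds up across all matched documents.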
I'm using Mongoose in a Node.js backend and I need to update a subset of elements of an array within a document based on a condition. I used to perform the operations using save(), like this:
const channel = await Channel.findById(id);
channel.messages.forEach((i) =>
i._id.toString() === messageId && i.views < channel.counter
? i.views++
: null
);
await channel.save();
I'd like to change this code by using findByIdAndUpdate, since it is only an increment and, for my use case, there is no need to retrieve the document. Any suggestion on how I can perform the operation?
Of course, channel.messages is the array under discussion. views and counter are both of type Number.
EDIT - Example document:
{
"_id": {
"$oid": "61546b9c86a9fc19ac643924"
},
"counter": 0,
"name": "#TEST",
"messages": [{
"views": 0,
"_id": {
"$oid": "61546bc386a9fc19ac64392e"
},
"body": "test",
"sentDate": {
"$date": "2021-09-29T13:36:03.092Z"
}
}, {
"views": 0,
"_id": {
"$oid": "61546dc086a9fc19ac643934"
},
"body": "test",
"sentDate": {
"$date": "2021-09-29T13:44:32.382Z"
}
}],
"date": {
"$date": "2021-09-29T13:35:33.011Z"
},
"__v": 2
}
You can try the updateOne method if you don't want to retrieve the document in the result:
match both the document id and the messageId conditions
check an expression condition: $filter iterates over the messages array and keeps the element whose _id equals messageId and whose views is less than counter, and the $ne condition checks that the filtered result is not empty
$inc increments views by 1 using the $ positional operator when the query matches
messageId = mongoose.Types.ObjectId(messageId);
await Channel.updateOne(
{
_id: id,
"messages._id": messageId,
$expr: {
$ne: [
{
$filter: {
input: "$messages",
cond: {
$and: [
{ $eq: ["$$this._id", messageId] },
{ $lt: ["$$this.views", "$counter"] }
]
}
}
},
[]
]
}
},
{ $inc: { "messages.$.views": 1 } }
)
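As a small follow-up sketch (hedged; the result field names are those of Mongoose 6 / the recent Node driver), since updateOne does not return the document, you can inspect the write result to know whether the increment actually happened:
const res = await Channel.updateOne(
  { _id: id, "messages._id": messageId, /* plus the same $expr condition as above */ },
  { $inc: { "messages.$.views": 1 } }
);

// modifiedCount is 1 when a view was counted; 0 means no message matched,
// or its views had already reached the channel counter (given the $expr filter)
console.log(res.matchedCount, res.modifiedCount);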
I am trying to count distinct (not unique) Emp No values within the same department, but I am getting an error:
query failed: unknown group operator '$group'
Here is my code:
https://mongoplayground.net/p/UvYF9NB7vZx
db.collection.aggregate([
{
$group: {
_id: "$Department",
total: {
"$group": {
_id: "$Emp No"
}
}
}
}
])
Expected output
[
{
"_id": "HUAWEI”,
“total”:1
},
{
"_id": "THBS”,
“total”:2
}
]
THBS has two different Emp No values, A10088P2C and A20088P2C.
HUAWEI has only one Emp No, A1016OBW.
So, $group is a pipeline stage; you can only use it at the top level of the pipeline, not as an accumulator inside another $group.
But for your required output there are lots of ways, I believe;
we can do something like this as well:
db.collection.aggregate([
{
$group: {
_id: {
dept: "$Department",
emp: "$Emp No"
},
total: {
"$sum": 1
}
}
},
{
$group: {
_id: "$_id.dept",
total: {
"$sum": 1
}
}
}
])
Here, in the first stage we are grouping by Department and Emp No, and we also keep a count of how many documents there are for each Department/Emp No pair.
[You can remove this count though, as we are not using it.]
The result of this stage will be:
[
{
"_id": {
"dept": "THBS",
"emp": "A10088P2C"
},
"total": 2
},
{
"_id": {
"dept": "THBS",
"emp": "A20088P2C"
},
"total": 1
},
{
"_id": {
"dept": "HUAWEI",
"emp": "A1016OBW"
},
"total": 3
}
]
Next, on top of this intermediate data, I'm grouping again by the department, which now comes in $_id.dept, and counting in the same way, which gives the result in your required format.
[
{
"_id": "HUAWEI",
"total": 1
},
{
"_id": "THBS",
"total": 2
}
]
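As an alternative (a minimal sketch, assuming the same Department and Emp No field names), you can also collect the distinct Emp No values per department with $addToSet and then count them with $size:
db.collection.aggregate([
  {
    $group: {
      _id: "$Department",
      // gather the distinct Emp No values for each department
      emps: { "$addToSet": "$Emp No" }
    }
  },
  {
    $project: {
      // the number of distinct Emp No values
      total: { "$size": "$emps" }
    }
  }
])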
Is it possible to have $facet return an object instead of an array? It seems a bit counterintuitive to need to access result[0].total instead of just result.total.
Code (using Mongoose):
Model
.aggregate()
.match({
"name": { "$regex": name },
"user_id": ObjectId(req.session.user.id),
"_id": { "$nin": except }
})
.facet({
"results": [
{ "$skip": start },
{ "$limit": finish },
{
"$project": {
"map_levels": 0,
"template": 0
}
}
],
"total": [
{ "$count": "total" },
]
})
.exec()
Each field you get using $facet represents a separate aggregation pipeline, and that's why you always get an array. You can use $addFields to overwrite the existing total with its single element. To get that first item you can use $arrayElemAt:
Model
.aggregate()
.match({
"name": { "$regex": name },
"user_id": ObjectId(req.session.user.id),
"_id": { "$nin": except }
})
.facet({
"results": [
{ "$skip": start },
{ "$limit": finish },
{
"$project": {
"map_levels": 0,
"template": 0
}
}
],
"total": [
{ "$count": "total" },
]
})
.addFields({
"total": {
$arrayElemAt: [ "$total", 0 ]
}
})
.exec()
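With that $addFields stage, each result document should look roughly like this (a sketch; the values are placeholders), so you read total.total rather than total[0].total:
{
  "results": [ /* the projected documents */ ],
  "total": { "total": 42 }
}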
You can try this as well:
Model
.aggregate()
.match({
"name": { "$regex": name },
"user_id": ObjectId(req.session.user.id),
"_id": { "$nin": except }
})
.facet({
"results": [
{ "$skip": start },
{ "$limit": finish },
{
"$project": {
"map_levels": 0,
"template": 0
}
}
],
"total": [
{ "$count": "total" },
]
})
.addFields({
"total": {
"$ifNull": [{ "$arrayElemAt": [ "$total.total", 0 ] }, 0]
}
})
.exec()
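In this variant, total comes back as a plain number (again a sketch of the shape; values are placeholders), and $ifNull makes it default to 0 when no documents match:
{
  "results": [ /* the projected documents */ ],
  "total": 42
}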
Imagine that you want to pass the result of $facet to the next stage, say $match. $match accepts documents as input and returns the documents that match an expression; if the output of $facet were a bare value rather than an array of documents per field, you couldn't feed it into $match, because the output type of $facet would no longer line up with the input type of $match ($match is just an example). In my opinion it's better to keep the output of $facet as arrays, to avoid having to handle those kinds of situations.
PS: nothing official in what I said.
My data looks like this:
{
"foo_list": [
{
"id": "98aa4987-d812-4aba-ac20-92d1079f87b2",
"name": "Foo 1",
"slug": "foo-1"
},
{
"id": "98aa4987-d812-4aba-ac20-92d1079f87b2",
"name": "Foo 1",
"slug": "foo-1"
},
{
"id": "157569ec-abab-4bfb-b732-55e9c8f4a57d",
"name": "Foo 3",
"slug": "foo-3"
}
]
}
Where foo_list is a field in a model called Bar. Notice that the first and second objects in the array are complete duplicates.
Aside from the obvious solution of switching to PostgreSQL, what MongoDB query can I run to remove duplicate entries from foo_list?
Similar answers that do not quite cut it:
https://stackoverflow.com/a/16907596/432
https://stackoverflow.com/a/18804460/432
These answers address the case where the array holds bare strings. However, in my situation the array is filled with objects.
I hope it is clear that I am not just interested in querying the database; I want the duplicates to be gone from the database forever.
Purely from an aggregation framework point of view there are a few approaches to this.
You can either just apply $setUnion in modern releases:
db.collection.aggregate([
{ "$project": {
"foo_list": { "$setUnion": [ "$foo_list", "$foo_list" ] }
}}
])
Or more traditionally with $unwind and $addToSet:
db.collection.aggregate([
{ "$unwind": "$foo_list" },
{ "$group": {
"_id": "$_id",
"foo_list": { "$addToSet": "$foo_list" }
}}
])
Or if you were interested in the duplicates only, then by general grouping:
db.collection.aggregate([
{ "$unwind": "$foo_list" },
{ "$group": {
"_id": {
"_id": "$_id",
"foo_list": "$foo_list"
},
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$ne": 1 } } },
{ "$group": {
"_id": "$_id._id",
"foo_list": { "$push": "$_id.foo_list" }
}}
])
The last form could be useful to you if you actually want to "remove" the duplicates from your data with another update statement as it identifies the elements which are duplicates.
So in that last form the returned result from your sample data identifies the duplicate:
{
"_id" : ObjectId("53f5f7314ffa9b02cf01c076"),
"foo_list" : [
{
"id" : "98aa4987-d812-4aba-ac20-92d1079f87b2",
"name" : "Foo 1",
"slug" : "foo-1"
}
]
}
Results are returned per document from your collection that contains duplicate entries in the array, along with which entries are duplicated. This is the information you need for the update, and you loop over the results in order to build the update statements that remove the duplicates.
This is actually done with two update statements per document, as a simple $pull operation would remove "both" items, which is not what you want:
var cursor = db.collection.aggregate([
{ "$unwind": "$foo_list" },
{ "$group": {
"_id": {
"_id": "$_id",
"foo_list": "$foo_list"
},
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$ne": 1 } } },
{ "$group": {
"_id": "$_id._id",
"foo_list": { "$push": "$_id.foo_list" }
}}
])
var batch = db.collection.initializeOrderedBulkOp();
var count = 0;
cursor.forEach(function(doc) {
doc.foo_list.forEach(function(dup) {
batch.find({ "_id": doc._id, "foo_list": { "$elemMatch": dup } }).updateOne({
"$unset": { "foo_list.$": "" }
});
batch.find({ "_id": doc._id }).updateOne({
"$pull": { "foo_list": null }
});
});
count++;
if ( count % 500 == 0 ) {
batch.execute();
batch = db.collection.initializeOrderedBulkOp();
}
});
if ( count % 500 != 0 ) {
batch.execute();
}
That's the modern MongoDB 2.6 and above way to do it, with a cursor result from aggregation and Bulk operations for updates. But the principles remain the same:
Identify the duplicates in documents
Loop the results to issue the updates to the affected documents
Use $unset with the positional $ operator to set the "first" matched array element to null
Use $pull to remove the null entry from the array
So after processing the above operations your sample now looks like this:
{
"_id" : ObjectId("53f5f7314ffa9b02cf01c076"),
"foo_list" : [
{
"id" : "98aa4987-d812-4aba-ac20-92d1079f87b2",
"name" : "Foo 1",
"slug" : "foo-1"
},
{
"id" : "157569ec-abab-4bfb-b732-55e9c8f4a57d",
"name" : "Foo 3",
"slug" : "foo-3"
}
]
}
The duplicate is removed while the "duplicated" item itself stays intact. That is the process for identifying and removing the duplicate data from your collection.
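As a side note for newer deployments (a hedged sketch; it assumes MongoDB 4.2 or later, which postdates this answer), the same $setUnion trick can now be applied directly in an update with an aggregation pipeline, persisting the de-duplicated array without the identify-and-loop step:
db.collection.updateMany(
  { },
  [
    // pipeline-style update: $set accepts aggregation expressions here
    { "$set": { "foo_list": { "$setUnion": [ "$foo_list", "$foo_list" ] } } }
  ]
)
Note that $setUnion does not guarantee the original element order, so only use this where the ordering of foo_list does not matter.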
Can someone please help me update a collection based on another? I have a pickups collection like so.
{
"_id": {
"$oid": "53a46be700b94521574b6f75"
},
"created": {
"$date": 1403236800000
},
"receivers": [
{
"model": "somemodel1",
"serial": "someserial1",
"access": "someaccess1"
},
{
"model": "somemodel2",
"serial": "someserial2",
"access": "someaccess2"
},
{
"model": "somemodel3",
"serial": "someserial3",
"access": "someaccess3"
}
],
"__v": 0
}
I would like to iterate through the receivers array, search for each access value in another collection, and if found, add the activity it was found in.
Here is the workorders collection I want to search in.
{
"_id": {
"$oid": "53af72481b2aeade0b46d025"
},
"activityNumber": "someactivity",
"date": "06/28/2014",
"lines": [
{
"Line #": "1",
"Access Card #": "someaccess1"
},
{
"Line #": "2",
"Access Card #": "someaccess2"
},
{
"Line #": "3",
"Access Card #": "someacess3"
}
],
}
And this is what I would like to end up with.
{
"_id": {
"$oid": "53a46be700b94521574b6f75"
},
"created": {
"$date": 1403236800000
},
"receivers": [
{
"model": "somemodel1",
"serial": "someserial1",
"access": "someaccess1",
"activityNumber": "someactivity"
},
{
"model": "somemodel2",
"serial": "someserial2",
"access": "someaccess2",
"activityNumber": "someactivity"
},
{
"model": "somemodel3",
"serial": "someserial3",
"access": "someaccess3",
"activityNumber": "someactivity"
}
],
"__v": 0
}
I have created an array containing all the access values from pickups.
var prodValues = db.pickups.aggregate([
{ "$unwind":"$receivers" },
{ "$group": {
"_id": null,
"products": { "$addToSet": "$receivers.access"}
}}
])
I can easily iterate through the array, search the workorders collection, and return the activity these are used in. But I'm not sure how to perform a find and then update the pickups collection when a match is found.
db.workorders.find({ "lines.Access Card #": { "$in": prodValues.result[0].products }},{activityNumber:1})
Thank you for your help.
Really, I would loop this in the completely opposite order, as that should be more efficient:
var result = db.workorders.aggregate([
{ "$project": {
"activityNumber": 1,
"access": "$lines.Access Card #",
}}
]).result;
result.forEach(function(res) {
res.access.forEach(function(acc) {
db.pickups.update(
{ "receivers.access": acc },
{ "$set": { "receivers.$.activityNumber": res.activityNumber } }
);
});
});
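Note that .result relies on the pre-2.6 "inline" aggregation response; on a 2.6+ shell, where aggregate returns a cursor, the equivalent (a small sketch) is to materialize the cursor instead:
var result = db.workorders.aggregate([
  { "$project": {
    "activityNumber": 1,
    "access": "$lines.Access Card #"
  }}
]).toArray();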
With MongoDB 2.6 you can clean this up a bit with a cursor on the aggregate output and the use of the Bulk Operations API:
var batch = db.pickups.initializeOrderedBulkOp();
var counter = 0;

db.workorders.aggregate([
  { "$project": {
    "activityNumber": 1,
    "access": "$lines.Access Card #"
  }}
]).forEach(function(res) {
  res.access.forEach(function(acc) {
    batch.find({ "receivers.access": acc }).updateOne(
      { "$set": { "receivers.$.activityNumber": res.activityNumber } }
    );
  });
  counter++;

  // Flush the queued updates every 500 source documents
  if ( counter % 500 == 0 ) {
    batch.execute();
    batch = db.pickups.initializeOrderedBulkOp();
  }
});

// Flush anything still queued
if ( counter % 500 != 0 )
  batch.execute();
Either way, you are basically matching the document and the array position on the "access" values returned from the aggregation query for the current line. This allows the update of the related information at that specific position.
The MongoDB 2.6 improvement is that you are not pulling all the results out of the "workorders" collection into memory as an array; each document is only pulled in from the cursor as needed.
The Bulk operations store the "updates" in manageable blocks that should fall under the 16MB BSON limit, and you then send these blocks instead of individual updates. The driver implementation should handle most of this, but there is some "self management" added in just to be safe.
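For completeness, on current servers and shells the same batching is usually written with bulkWrite (a hedged sketch, assuming the same collection and field names), which takes care of splitting the queued operations into commands for you:
var ops = [];

db.workorders.aggregate([
  { "$project": { "activityNumber": 1, "access": "$lines.Access Card #" } }
]).forEach(function(res) {
  res.access.forEach(function(acc) {
    // queue one positional update per access value found in the workorder lines
    ops.push({
      updateOne: {
        filter: { "receivers.access": acc },
        update: { "$set": { "receivers.$.activityNumber": res.activityNumber } }
      }
    });
  });
});

if ( ops.length > 0 ) {
  db.pickups.bulkWrite(ops, { ordered: true });
}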