MongoDB aggregate merge two different fields as one and get count - javascript

I have following data in MongoDB:
[{id:3132, home:'NSH', away:'BOS'}, {id:3112, home:'ANA', away:'CGY'}, {id:3232, home:'MIN', away:'NSH'}]
Is it possible to get total game count for each team with aggregate pipeline?
desired result:
[{team: 'NSH', totalGames: 2}, {team: 'MIN', totalGames: 1}, ...]
I can get each one separately into its own array with two aggregate calls:
[{$group: {_id: "$home", totalGames: {$sum: 1}}}]
and
[{$group: {_id: "$away", totalGames: {$sum: 1}}}]
resulting
var homeGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 2 }, ...]
var awayGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 4 }, ...]
But I really want to get it working with just one query. If that is not possible, what would be the best way to combine these two results into one using JavaScript?
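(For reference, merging the two result arrays in plain JavaScript could be done with something like the sketch below, using the homeGames and awayGames shapes shown above:)
// Merge homeGames and awayGames into a single per-team total (a sketch).
const totals = {};
for (const { _id, totalGames } of [...homeGames, ...awayGames]) {
  totals[_id] = (totals[_id] || 0) + totalGames;
}
const combined = Object.keys(totals).map((team) => ({ team, totalGames: totals[team] }));
// => [{team: 'NSH', totalGames: 2}, ...]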

After some puzzling, I found a way to get it done using an aggregate pipeline. Here is the result:
db.games.aggregate([
  {
    $project: {
      isHome: { $literal: [true, false] },
      home: true,
      away: true
    }
  },
  {
    $unwind: '$isHome'
  },
  {
    $group: {
      _id: { $cond: { if: '$isHome', then: '$home', else: '$away' } },
      totalGames: { $sum: 1 }
    }
  }
]);
As you can see it consists of three stages. The first two are meant to duplicate each document into one for the home team and one for the away team. To do this, the project stage first creates a new isHome field on each document containing a true and a false value, which the unwind stage then splits into separate documents containing either the true or the false value.
Then in the group phase, we let the isHome field decide whether to group on the home or the away field.
It would be nicer if we could create a team field in the project step containing the array ["$home", "$away"], but MongoDB only supports adding array literals here, hence the workaround.
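(For what it's worth, newer MongoDB server versions do accept array expressions in $project, so on such servers a shorter variant along these lines should also work; this is just a sketch:)
db.games.aggregate([
  // Build a two-element array from the two field references (requires a server that supports this in $project).
  { $project: { team: ['$home', '$away'] } },
  // One document per team per game.
  { $unwind: '$team' },
  // Count the games per team.
  { $group: { _id: '$team', totalGames: { $sum: 1 } } }
]);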

Related

How can I improve my query speed in MongoDB, NodeJS?

I have a collection that holds values coming from a sensor. My collection looks like this.
import mongoose, { Schema } from 'mongoose';

const MainSchema: Schema = new Schema(
  {
    deviceId: {
      type: mongoose.Types.ObjectId,
      required: true,
      ref: 'Device',
    },
    sensorId: {
      type: mongoose.Types.ObjectId,
      default: null,
      ref: 'Sensor',
    },
    value: {
      type: Number,
    },
    date: {
      type: Date,
    },
  },
  {
    versionKey: false,
  }
);
I want to get data from this collection with my endpoint. The collection has more than 300,000 documents. I want to return the data together with the related sensor data (like name and description from "Sensor").
My Sensor Collection:
const Sensor: Schema = new Schema(
  {
    name: {
      type: String,
      required: true,
      min: 3,
    },
    description: {
      type: String,
      default: null,
    },
    type: {
      type: String,
    },
  },
  {
    timestamps: true,
    versionKey: false,
  }
);
I use two methods to get data from MainSchema. The first approach looks like this (using aggregate):
startDate, endDate and _sensorId are passed as parameters to these functions.
const data = await MainSchema.aggregate([
  {
    $lookup: {
      from: 'Sensor',
      localField: 'sensorId',
      foreignField: '_id',
      as: 'sensorDetail',
    },
  },
  {
    $unwind: '$sensorDetail',
  },
  {
    $match: {
      $and: [
        { sensorId: new Types.ObjectId(_sensorId) },
        {
          date: {
            $gte: new Date(startDate),
            $lt: new Date(endDate),
          },
        },
      ],
    },
  },
  {
    $project: {
      sensorDetail: {
        name: 1,
        description: 1,
      },
      value: 1,
      date: 1,
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
]);
The second approach looks like this (using find and populate):
const data = await MainSchema.find({
  sensorId: _sensorId,
  date: {
    $gte: new Date(startDate),
    $lte: new Date(endDate),
  },
})
  .lean()
  .sort({ date: 1 })
  .populate('sensorId', { name: 1, description: 1 });
Execution time for the same data set:
First approach: 25 - 30 seconds
Second approach: 11 - 15 seconds
So how can I get this data faster? Which one is best practice?
And what else can I do to improve the query speed?
Overall, @NeNaD's answer touches on a lot of the important points. What I'm going to say here should be considered in addition to that other information.
Index
Just to clarify, the ideal index here would be a compound index of { sensorId: 1, date: 1 }. This index follows the ESR Guidance for index key ordering and will provide the most efficient retrieval of the data according to the query predicates specified in the $match stage.
If the index: true annotation in Mongoose creates two separate single field indexes, then you should go manually create this index in the collection. MongoDB will only use one of those indexes to execute this query which will not be as efficient as using the compound index described above.
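A minimal sketch of defining that compound index, either through the Mongoose schema or directly in the shell (the collection name below is an assumption):
// In the Mongoose schema file: declares the { sensorId: 1, date: 1 } compound index.
MainSchema.index({ sensorId: 1, date: 1 });

// Or directly in the mongo shell (replace 'maindatas' with your actual collection name):
db.maindatas.createIndex({ sensorId: 1, date: 1 });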
Also regarding the existing approach, what is the purpose of the trailing $sort?
If the application (a chart in this situation) does not need sorted results then you should remove this stage entirely. If the client does need sorted results then you should:
Move the $sort stage earlier in the pipeline (behind the $match), and
Test if including the sort field in the index improves performance.
As written, the $sort is currently a blocking operation which is going to prevent any results from being returned to the client until they are all processed. If you move the $sort stage up and can change it to sort on date (which probably makes sense for sensor data), then it should automatically use the compound index that we mentioned earlier to provide the sort in a non-blocking manner.
Stage Ordering
Ordering of aggregation stages is important, both for semantic purposes as well as for performance reasons. The database itself will attempt to do various things (such as reordering stages) to improve performance so long as it does not logically change the result set in any way. Some of these optimizations are described here. As these are version specific anyway, you can always take a look at the explain plan to get a better indication of what specific changes the database has applied. The fact that performance did not improve when you manually moved the $match to the beginning (which is generally a best practice) could suggest that the database was able to automatically do that on your behalf.
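For example, a quick sketch of pulling the explain output through Mongoose (Aggregate#explain accepts an optional verbosity such as 'executionStats'):
// Inspect index usage and per-stage behaviour for the aggregation.
const plan = await MainSchema.aggregate([
  {
    $match: {
      sensorId: new Types.ObjectId(_sensorId),
      date: { $gte: new Date(startDate), $lt: new Date(endDate) },
    },
  },
  // ...remaining stages...
]).explain('executionStats');
console.log(JSON.stringify(plan, null, 2));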
Schema
I'm a little curious about the schema itself. Is there any reason that there are two separate collections here?
My guess is that this is mostly a play at 'normalization' to help reduce data duplication. That is mostly fine, unless you find yourself constantly performing $lookups like this for most of your read operations. You could certainly consider testing what performance (and storage) looks like if you combine them.
Also for this particular operation, would it make sense to just issue two separate queries, one to get the measurements and one to get the sensor data (a single time)? The aggregation matches on sensorId and the value of that field is what is then used to match against the _id field from the other collection. Unless I'm doing the logic wrong, this should be the same data for each of the source documents.
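A rough sketch of that two-query idea, assuming a Sensor model compiled from the schema shown earlier (the projections only keep the fields mentioned above):
// Fetch the sensor document once; it is the same for every measurement.
const sensorDetail = await Sensor.findById(_sensorId, { name: 1, description: 1 }).lean();

// Fetch the measurements without any $lookup, letting the { sensorId: 1, date: 1 } index do the work.
const measurements = await MainSchema.find(
  {
    sensorId: _sensorId,
    date: { $gte: new Date(startDate), $lt: new Date(endDate) },
  },
  { value: 1, date: 1 }
)
  .sort({ date: 1 })
  .lean();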
Time Series Collections
Somewhat related to schema, have you looked into using Time Series Collections? I don't know what your specific goals or pain points are, but it seems that you may be working with IoT data. Time Series collections are purpose-built to help handle use cases like that. Might be worth looking into as they may help you achieve your goals with less hassle or overhead.
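For example, creating a time series collection (MongoDB 5.0+) can look like this sketch; the collection name and granularity are illustrative:
// Each measurement stores its timestamp in 'date' and its sensor reference as metadata.
db.createCollection('sensorMeasurements', {
  timeseries: {
    timeField: 'date',
    metaField: 'sensorId',
    granularity: 'minutes',
  },
});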
First step
Create indexes for the sensorId and date properties in the collection. You can do that by specifying index: true in your model:
const MainSchema: Schema = new Schema(
  {
    deviceId: { type: mongoose.Types.ObjectId, required: true, ref: 'Device' },
    sensorId: { type: mongoose.Types.ObjectId, default: null, ref: 'Sensor', index: true },
    value: { type: Number },
    date: { type: Date, index: true },
  },
  {
    versionKey: false,
  }
);
Second step
Aggregation queries can leverage indexes only if your $match stage is the first stage in the pipeline, so you should change the order of the stages in your aggregation query:
const data = await MainSchema.aggregate([
  {
    $match: {
      sensorId: new Types.ObjectId(_sensorId),
      date: {
        $gte: new Date(startDate),
        $lt: new Date(endDate),
      },
    },
  },
  {
    $lookup: {
      from: 'Sensor',
      localField: 'sensorId',
      foreignField: '_id',
      as: 'sensorDetail',
    },
  },
  {
    $unwind: '$sensorDetail',
  },
  {
    $project: {
      sensorDetail: {
        name: 1,
        description: 1,
      },
      value: 1,
      date: 1,
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
]);

Mongo aggregate – return zero count

I need your help with aggregation in Mongo.
I have this aggregation:
const likes = await this.aggregate([
  {
    $match: { post: postId },
  },
  {
    $group: {
      _id: '$likeType',
      count: { $sum: 1 },
    },
  },
]);
It collects all likes/dislikes for a post and returns this:
[ { _id: 'pos', count: 40 }, { _id: 'neg', count: 3 } ]
I faced a problem: if there is only one type of like (for example only 'pos'), it returns this:
[ { _id: 'pos', count: 40 } ]
But I need this array to show zero value too:
[ { _id: 'pos', count: 40 }, { _id: 'neg', count: 0 } ]
Is there any way to set default values for all types of _ids?
I understand that it can't find any 'neg's and it can't return them. So I want to set defaults to let the system know, that there are only two types: 'pos' and 'neg'.
Are there any solutions for such cases?
Thanks!
My suggestion is:
Get the distinct ids: https://docs.mongodb.com/manual/reference/method/db.collection.distinct/
Do your search with your query param.
Filter out the distinct ids that are missing from the result and append default values for them.
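A minimal sketch of that idea, reusing the aggregation from the question and hard-coding the two known like types ('pos' and 'neg'):
const likeTypes = ['pos', 'neg']; // the full set of types we want in the output

const counts = await this.aggregate([
  { $match: { post: postId } },
  { $group: { _id: '$likeType', count: { $sum: 1 } } },
]);

// Append a zero-count entry for every type the aggregation did not return.
const likes = likeTypes.map(
  (type) => counts.find((c) => c._id === type) || { _id: type, count: 0 }
);
// => [ { _id: 'pos', count: 40 }, { _id: 'neg', count: 0 } ]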

Nested queries using mongoose

I'm using a collection espData which contains documents of the following type:
{
  mac: String,
  hash: String,
  rssi: Number
}
Using Mongoose, I want to group the documents by mac and hash and keep only the groups whose count is equal to 2. Then I want to perform an aggregation that returns each mac and the corresponding average of the rssi values.
I wrote this piece of code but it doesn't work.
EspDataModel.aggregate([
  {
    $group: {
      _id: {
        mac: "$mac",
        hash: "$hash",
      },
      count: { $sum: 1 }
    }
  },
  {
    $match: { count: 2 }
  }
], function (err, result) {
  result.map(function (doc) {
    EspDataModel.aggregate([
      {
        $match: { mac: doc._id.mac, hash: doc._id.hash }
      },
      {
        $group: {
          mac: "$_id.mac",
          averageRSSI: { $avg: "$rssi" }
        }
      }
    ], function (err, result) {
      console.log(result)
    })
  })
})
The first aggregate works and effectively selects the groups I'm interested in, but is there a proper way to match mac and hash against the original collection and compute the average?
Thank you for your help!
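One possible single-pipeline sketch, under the assumption that the average should be computed over each (mac, hash) pair that occurs exactly twice (field names are taken from the question):
EspDataModel.aggregate([
  {
    $group: {
      _id: { mac: "$mac", hash: "$hash" },
      count: { $sum: 1 },
      averageRSSI: { $avg: "$rssi" } // compute the average in the same pass
    }
  },
  { $match: { count: 2 } },
  {
    $project: {
      _id: 0,
      mac: "$_id.mac",
      averageRSSI: 1
    }
  }
], function (err, result) {
  console.log(result)
})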

Using $lookup on an array of objects to join two documents in MongoDB [duplicate]

This question already has answers here:
MongoDB join data inside an array of objects
(2 answers)
Closed 4 years ago.
I have many Shop documents that each contain a field products which is an array of objects where the key is the product ID and the value is the quantity in stock:
{ products: [{"a": 3}, {"b": 27}, {"c": 4}] }
I have a collection Products where each product has a document containing productId, name, etc:
{ productId: "a", "name": "Spanner"}
I would like to pull in/aggregate/join the product information for each of those items. I know how to do it when the field is a single ID, and I have seen this answer which describes how to do it for an array of IDs. But I am having a bit of trouble wrapping my head around what to do with an array of objects containing IDs.
Desired output:
{
products: [
{ {productId: "a", "name": "Spanner"}: 3 }
]
}
(And no, it is not within my control to switch to a relational database.)
I think if you want to use an ID as a reference, try to avoid placing it as an object key; instead, make it an object property, like { products: [{"productId": $_id, "quantity": 3}] }. That could be a reason for the downvote.
But if you can't change it, you can use $objectToArray in the aggregation to convert your array.
One more thing: your desired output is not possible, because an object key in JS cannot itself be an object.
Try it:
db.Shop.aggregate(
  // Pipeline
  [
    // Stage 1
    {
      $unwind: {
        path: "$products"
      }
    },
    // Stage 2
    {
      $project: {
        products: { $objectToArray: "$products" }
      }
    },
    // Stage 3
    {
      $unwind: {
        path: "$products"
      }
    },
    // Stage 4
    {
      $project: {
        productId: "$products.k",
        productQuantity: "$products.v"
      }
    },
    // Stage 5
    {
      $lookup: {
        "from": "products",
        "localField": "productId",
        "foreignField": "productId",
        "as": "products"
      }
    },
    // Stage 6
    {
      $unwind: {
        path: "$products"
      }
    },
    // Stage 7
    {
      $project: {
        productId: "$productId",
        productQuantity: "$productQuantity",
        productName: "$products.name"
      }
    }
  ]);
Good luck

Mongoose: Sorting

What's the best way to sort the following documents in a collection:
{"topic":"11.Topic","text":"a.Text"}
{"topic":"2.Topic","text":"a.Text"}
{"topic":"1.Topic","text":"a.Text"}
I am using the following:
find({topic: req.body.topic}).sort({topic: 1})
but it is not working (because the fields are strings, not numbers), so I get:
{"topic":"1.Topic","text":"a.Text"},
{"topic":"11.Topic","text":"a.Text"},
{"topic":"2.Topic","text":"a.Text"}
but I'd like to get:
{"topic":"1.Topic","text":"a.Text"},
{"topic":"2.Topic","text":"a.Text"},
{"topic":"11.Topic","text":"a.Text"}
I read in another post here that this would require complex sorting, which Mongoose doesn't have. So perhaps there is no real solution with this architecture?
Your help is greatly appreciated.
I suggest you make your topic field of type Number and create another field topic_text.
Your Schema would look like:
var documentSchema = new mongoose.Schema({
  topic: Number,
  topic_text: String,
  text: String
});
A normal document would look something like this:
{document1: [{"topic": 11, "topic_text": "Topic", "text": "a.Text"},
  {"topic": 2, "topic_text": "Topic", "text": "a.Text"},
  {"topic": 1, "topic_text": "Topic", "text": "a.Text"}]}
Thus, you will be able to use .sort({topic: 1}) and get the result you want.
When using the topic value, append topic_text to it.
find({topic: req.body.topic}).sort({topic: 1}).exec(function (err, result) {
  var topic = result[0].topic + result[0].topic_text; // use index i to extract the value from the result array
});
If you do not want to (or perhaps cannot) change the shape of your documents to include a numeric field for the topic number, then you can achieve your desired sorting with the aggregation framework.
The following pipeline essentially splits the topic strings like '11.Topic' by the dot '.' and then prefixes the first part of the resulting array with a fixed number of leading zeros so that sorting by those strings will result in 'emulated' numeric sorting.
Note however that this pipeline uses the $split and $strLenBytes operators, which are pretty new, so you may have to update your MongoDB instance - I used version 3.3.10.
db.getCollection('yourCollection').aggregate([
  {
    $project: {
      topic: 1,
      text: 1,
      tmp: {
        $let: {
          vars: {
            numStr: { $arrayElemAt: [{ $split: ["$topic", "."] }, 0] }
          },
          in: {
            topicNumStr: "$$numStr",
            topicNumStrLen: { $strLenBytes: "$$numStr" }
          }
        }
      }
    }
  },
  {
    $project: {
      topic: 1,
      text: 1,
      topicNumber: { $substr: [{ $concat: ["_0000", "$tmp.topicNumStr"] }, "$tmp.topicNumStrLen", 5] }
    }
  },
  {
    $sort: { topicNumber: 1 }
  },
  {
    $project: {
      topic: 1,
      text: 1
    }
  }
])
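As an aside, on MongoDB 3.4 and later a collation with numericOrdering can produce the same ordering on a plain find, without an aggregation pipeline (a sketch; the Topic model name is illustrative):
// Sorts '1.Topic', '2.Topic', '11.Topic' by the numeric value of their digit prefixes.
Topic.find({})
  .sort({ topic: 1 })
  .collation({ locale: 'en', numericOrdering: true })
  .exec(function (err, result) {
    console.log(result);
  });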
