How can I improve my query speed in MongoDB, NodeJS? - javascript

I have one collection that holds values coming from a sensor. My collection looks like this:
const MainSchema: Schema = new Schema(
  {
    deviceId: {
      type: mongoose.Types.ObjectId,
      required: true,
      ref: 'Device',
    },
    sensorId: {
      type: mongoose.Types.ObjectId,
      default: null,
      ref: 'Sensor',
    },
    value: {
      type: Number,
    },
    date: {
      type: Date,
    },
  },
  {
    versionKey: false,
  }
);
I want to get data from this collection through my endpoint. The collection should have more than 300,000 documents. I want to return the data together with the related sensor data (like name and description from "Sensor").
My Sensor Collection:
const Sensor: Schema = new Schema(
  {
    name: {
      type: String,
      required: true,
      min: 3,
    },
    description: {
      type: String,
      default: null,
    },
    type: {
      type: String,
    },
  },
  {
    timestamps: true,
    versionKey: false,
  }
);
I use two methods to get data from MainSchema. The first approach looks like this (using aggregate):
startDate, endDate and _sensorId are passed as parameters to these functions.
const data = await MainSchema.aggregate([
  {
    $lookup: {
      from: 'Sensor',
      localField: 'sensorId',
      foreignField: '_id',
      as: 'sensorDetail',
    },
  },
  {
    $unwind: '$sensorDetail',
  },
  {
    $match: {
      $and: [
        { sensorId: new Types.ObjectId(_sensorId) },
        {
          date: {
            $gte: new Date(startDate),
            $lt: new Date(endDate),
          },
        },
      ],
    },
  },
  {
    $project: {
      sensorDetail: {
        name: 1,
        description: 1,
      },
      value: 1,
      date: 1,
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
]);
The second approach looks like this (using find and populate):
const data = await MainSchema.find({
  sensorId: _sensorId,
  date: {
    $gte: new Date(startDate),
    $lte: new Date(endDate),
  },
})
  .lean()
  .sort({ date: 1 })
  .populate('sensorId', { name: 1, description: 1 });
Execution times for the same data set:
First approach: 25-30 seconds
Second approach: 11-15 seconds
So how can I get this data faster? Which one is best practice?
And what else can I do to improve the query speed?

Overall @NeNaD's answer touches on a lot of the important points. What I'm going to say in this one should be considered in addition to that other information.
Index
Just to clarify, the ideal index here would be a compound index of { sensorId: 1, date: 1 }. This index follows the ESR Guidance for index key ordering and will provide the most efficient retrieval of the data according to the query predicates specified in the $match stage.
If the index: true annotation in Mongoose creates two separate single field indexes, then you should go manually create this index in the collection. MongoDB will only use one of those indexes to execute this query which will not be as efficient as using the compound index described above.
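For reference, a minimal sketch of creating that compound index, either directly in the shell or declared on the schema (the collection name below is an assumption based on Mongoose's default pluralization, and the question reuses the name MainSchema for both the schema and the model, so adjust accordingly):
// In mongosh -- "mainschemas" is an assumed collection name.
db.mainschemas.createIndex({ sensorId: 1, date: 1 });

// Or declared on the Mongoose schema so it is built automatically:
MainSchema.index({ sensorId: 1, date: 1 });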
Also regarding the existing approach, what is the purpose of the trailing $sort?
If the application (a chart in this situation) does not need sorted results then you should remove this stage entirely. If the client does need sorted results then you should:
Move the $sort stage earlier in the pipeline (behind the $match), and
Test if including the sort field in the index improves performance.
As written, the $sort is currently a blocking operation which is going to prevent any results from being returned to the client until they are all processed. If you move the $sort stage up and can change it to sort on date (which probably makes sense for sensor data), then it should automatically use the compound index that we mentioned earlier to provide the sort in a non-blocking manner.
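A sketch of that reordered shape, reusing the field names from the question (illustrative only, not a tested drop-in replacement):
const data = await MainSchema.aggregate([
  {
    $match: {
      sensorId: new Types.ObjectId(_sensorId),
      date: { $gte: new Date(startDate), $lt: new Date(endDate) },
    },
  },
  // With { sensorId: 1, date: 1 } in place, this $sort can be satisfied by
  // the index instead of blocking in memory.
  { $sort: { date: 1 } },
  {
    $lookup: {
      from: 'Sensor',
      localField: 'sensorId',
      foreignField: '_id',
      as: 'sensorDetail',
    },
  },
  { $unwind: '$sensorDetail' },
  {
    $project: {
      sensorDetail: { name: 1, description: 1 },
      value: 1,
      date: 1,
    },
  },
]);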
Stage Ordering
Ordering of aggregation stages is important, both for semantic purposes as well as for performance reasons. The database itself will attempt to do various things (such as reordering stages) to improve performance so long as it does not logically change the result set in any way. Some of these optimizations are described here. As these are version specific anyway, you can always take a look at the explain plan to get a better indication of what specific changes the database has applied. The fact that performance did not improve when you manually moved the $match to the beginning (which is generally a best practice) could suggest that the database was able to automatically do that on your behalf.
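For example, a quick way to inspect the chosen plan from Mongoose (pipeline here is whatever array of stages you are testing):
// explain('executionStats') reports index usage and per-stage work.
const plan = await MainSchema.aggregate(pipeline).explain('executionStats');
console.log(JSON.stringify(plan, null, 2));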
Schema
I'm a little curious about the schema itself. Is there any reason that there are two separate collections here?
My guess is that this is mostly a play at 'normalization' to help reduce data duplication. That is mostly fine, unless you find yourself constantly performing $lookups like this for most of your read operations. You could certainly consider testing what performance (and storage) looks like if you combine them.
Also for this particular operation, would it make sense to just issue two separate queries, one to get the measurements and one to get the sensor data (a single time)? The aggregation matches on sensorId and the value of that field is what is then used to match against the _id field from the other collection. Unless I'm doing the logic wrong, this should be the same data for each of the source documents.
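A rough sketch of that two-query idea, assuming a Sensor model compiled from the schema in the question (treat it as illustrative):
// Fetch the sensor document once...
const sensor = await Sensor.findById(_sensorId, { name: 1, description: 1 }).lean();

// ...then fetch only the measurements, letting the compound index do the work.
const readings = await MainSchema.find(
  {
    sensorId: _sensorId,
    date: { $gte: new Date(startDate), $lt: new Date(endDate) },
  },
  { value: 1, date: 1 }
)
  .sort({ date: 1 })
  .lean();

// Attach the single sensor document in application code if the client needs it.
const data = readings.map((r) => ({ ...r, sensorDetail: sensor }));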
Time Series Collections
Somewhat related to schema, have you looked into using Time Series Collections? I don't know what your specific goals or pain points are, but it seems that you may be working with IoT data. Time Series collections are purpose-built to help handle use cases like that. Might be worth looking into as they may help you achieve your goals with less hassle or overhead.
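If you do explore that route, creating one looks roughly like this (requires MongoDB 5.0+; the collection name is hypothetical, and the field names mirror the question):
// A sketch only -- existing data would need to be migrated into it.
await mongoose.connection.createCollection('measurements', {
  timeseries: {
    timeField: 'date',     // the BSON date on each reading
    metaField: 'sensorId', // per-series metadata used for grouping
    granularity: 'minutes',
  },
});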

First step
Create indexes for the sensorId and date properties in the collection. You can do that by specifying index: true in your model:
const MainSchema: Schema = new Schema(
{
deviceId: { type: mongoose.Types.ObjectId, required: true, ref: 'Device' },
sensorId: { type: mongoose.Types.ObjectId, default: null, ref: 'Sensor', index: true },
value: { type: Number },
date: { type: Date, index: true },
},
{
versionKey: false,
}
);
Second step
Aggregation queries can leverage indexes only if your $match stage is the first stage in the pipeline, so you should change the order of the stages in your aggregation query:
const data = await MainSchema.aggregate([
  {
    $match: {
      sensorId: new Types.ObjectId(_sensorId),
      date: {
        $gte: new Date(startDate),
        $lt: new Date(endDate),
      },
    },
  },
  {
    $lookup: {
      from: 'Sensor',
      localField: 'sensorId',
      foreignField: '_id',
      as: 'sensorDetail',
    },
  },
  {
    $unwind: '$sensorDetail',
  },
  {
    $project: {
      sensorDetail: {
        name: 1,
        description: 1,
      },
      value: 1,
      date: 1,
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
]);

Related

How do I add a new subdocument if it doesn't exist already, then push another subdocument under it

I have a schema named user in Mongoose, and that schema has a lastExams property as below:
lastExams: [{
  lecture: {
    type: String,
    required: true
  },
  exams: [{
    examID: {type: [mongoose.Schema.Types.ObjectId], ref: Exam, required: true},
    resultID: {type: [mongoose.Schema.Types.ObjectId], ref: Result, required: true},
    date: {type: Date, required: true},
    result: {}
  }]
}]
With this, I want to keep the last 10 exams the user has taken for each lecture they have. So after each exam, I want to check if the corresponding 'lastExams.lecture' subdocument already exists; if so, push the result into that lastExams.$.exams array. Otherwise, upsert that subdocument with the result as the first element of the exams array.
For example, this is a user document without any exams on it:
user: {
  name: { firstName: '***', lastName: '***' },
  email: '****#****.***',
  photo: 'https://****.jpg',
  role: 0,
  status: true,
  program: {
    _id: 6017b829c878b5bf117dfb92,
    dID: '***',
    eID: '****',
    pID: '****',
    programName: '****',
    __v: 0
  },
  lectures: [
    {
      some data
    }
  ],
  currentExams: [
    some data
  ],
  lastExams: []
}
If the user sends exam data for the math-1 lecture, since there is no exam with that lecture name, I need to upsert that document so that the user document becomes as below:
user: {
  name: {
    firstName: '***',
    lastName: '***'
  },
  email: '****#****.***',
  photo: 'https://****.jpg',
  role: 0,
  status: true,
  program: {
    _id: 6017b829c878b5bf117dfb92,
    dID: '***',
    eID: '****',
    pID: '****',
    programName: '****',
    __v: 0
  },
  lectures: [{
    some data
  }],
  currentExams: [
    some data
  ],
  lastExams: [{
    lecture: 'math-1',
    exams: [{
      examID: 601ba71e62c3d45a4f10f080,
      resultID: '602c09b2148214693694b16c',
      date: 2021-02-16T18:06:42.559Z,
      result: {
        corrects: 11,
        wrongs: 9,
        empties: 0,
        net: 8.75,
        score: 43.75,
        qLite: [
          'some question objects'
        ]
      }
    }]
  }]
}
I can do that like this:
User.findOneAndUpdate({email: result.user}, {$addToSet: {'lastExams': {
  lecture: result.lecture,
  exams: [{
    examID: doc.examID, // btw, I don't know why, but these ids are saved to the database as arrays
    resultID: doc.id,
    date: doc.createdAt,
    result: doc.results
  }]
}}})
But this adds a new subdocument with the same lecture value each time, so I have to manually check first whether there is already a subdocument with that lecture value. If not, I run the above code; otherwise, to push just the exam data onto that lecture's subdocument, I use the code below:
User.findOneAndUpdate({email: result.user, 'lastExams.lecture': result.lecture}, {$addToSet: {'lastExams.$.exams': {
  examID: doc.examID,
  resultID: doc.id,
  date: doc.createdAt,
  result: doc.results
}}})
So I have to make a User.find() query first to see if that lecture is already there (and pop an item if its exams.length is 10), and then decide which kind of User.findOneAndUpdate() to use.
Do you think there is any way to do this in a single query, without going to the database 2-3 times for each exam save?
I know it's long, but I couldn't put it more briefly with my poor English. Sorry.
Two methods:
Like you already did, multiple queries.
In the first query, check whether the subdocument exists; if it does, update it in the second query, otherwise create the subdocument with the first item in the second query. A sketch of that flow is shown below.
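A rough sketch of that two-query flow, reusing the names from the question (result, doc); it uses $push with $slice instead of $addToSet for the existing-entry case so the array can be trimmed to the last 10 in the same update:
// Untested sketch -- adjust to your actual schema and error handling.
const exam = {
  examID: doc.examID,
  resultID: doc.id,
  date: doc.createdAt,
  result: doc.results
};

// Query 1: does this user already have an entry for the lecture?
const existing = await User.findOne(
  { email: result.user, 'lastExams.lecture': result.lecture },
  { 'lastExams.$': 1 }
).lean();

// Query 2: either push into the existing entry (keeping the last 10),
// or create the entry with its first exam.
if (existing) {
  await User.findOneAndUpdate(
    { email: result.user, 'lastExams.lecture': result.lecture },
    { $push: { 'lastExams.$.exams': { $each: [exam], $slice: -10 } } }
  );
} else {
  await User.findOneAndUpdate(
    { email: result.user },
    { $addToSet: { lastExams: { lecture: result.lecture, exams: [exam] } } }
  );
}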
Using $out in an aggregation pipeline
If multiple round trips are the only issue, check out the $out aggregation pipeline stage, which lets you write aggregation output to a collection. There you can first match your document and check whether the subdocument exists using $filter followed by $cond. Once you have the data ready, use $out to write it back to the collection.
NB: An aggregation pipeline is a more expensive operation than findOneAndUpdate IMO, so make sure you test the average latency of both the two-query method and the single aggregation method and decide which is faster in your case.
PS: Sorry for not providing an example; I simply don't know it well enough to create a working example for you. You can refer to the MongoDB docs for details.
https://docs.mongodb.com/manual/reference/operator/aggregation/out/
Also, there is a Jira ticket discussion going on for this specific use case in MongoDB. Hopefully a simple solution will be implemented in MongoDB in an upcoming version.

Mongoose: Sorting

What's the best way to sort the following documents in a collection:
{"topic":"11.Topic","text":"a.Text"}
{"topic":"2.Topic","text":"a.Text"}
{"topic":"1.Topic","text":"a.Text"}
I am using the following:
find({ topic: req.body.topic }).sort({ topic: 1 })
but it's not working (because the fields are strings, not numbers), so I get:
{"topic":"1.Topic","text":"a.Text"},
{"topic":"11.Topic","text":"a.Text"},
{"topic":"2.Topic","text":"a.Text"}
but I'd like to get:
{"topic":"1.Topic","text":"a.Text"},
{"topic":"2.Topic","text":"a.Text"},
{"topic":"11.Topic","text":"a.Text"}
I read in another post here that this would require complex sorting, which Mongoose doesn't have. So perhaps there is no real solution with this architecture?
Your help is greatly appreciated
I suggest you make your topic field of type Number and create another field, topic_text.
Your schema would look like:
var documentSchema = new mongoose.Schema({
  topic: Number,
  topic_text: String,
  text: String
});
A normal document would look something like this:
{document1:[{"topic":11,"topic_text" : "Topic" ,"text":"a.Text"},
{"topic":2,"topic_text" : "Topic","text":"a.Text"},
{"topic":1,"topic_text" : "Topic","text":"a.Text"}]}
Thus, you will be able to use .sort({ topic: 1 }) and get the result you want.
When using the topic value, append topic_text to it:
find({ topic: req.body.topic }).sort({ topic: 1 }).exec(function (err, result) {
  // use index i to extract the value from the result array
  var topic = result[0].topic + result[0].topic_text;
});
If you do not want (or maybe do not even can) change the shape of your documents to include a numeric field for the topic number then you can achieve your desired sorting with the aggregation framework.
The following pipeline essentially splits the topic strings like '11.Topic' by the dot '.' and then prefixes the first part of the resulting array with a fixed number of leading zeros so that sorting by those strings will result in 'emulated' numeric sorting.
Note however that this pipeline uses the $split and $strLenBytes operators, which are pretty new, so you may have to update your MongoDB instance - I used version 3.3.10.
db.getCollection('yourCollection').aggregate([
  {
    $project: {
      topic: 1,
      text: 1,
      tmp: {
        $let: {
          vars: {
            numStr: { $arrayElemAt: [{ $split: ["$topic", "."] }, 0] }
          },
          in: {
            topicNumStr: "$$numStr",
            topicNumStrLen: { $strLenBytes: "$$numStr" }
          }
        }
      }
    }
  },
  {
    $project: {
      topic: 1,
      text: 1,
      topicNumber: { $substr: [{ $concat: ["_0000", "$tmp.topicNumStr"] }, "$tmp.topicNumStrLen", 5] }
    }
  },
  {
    $sort: { topicNumber: 1 }
  },
  {
    $project: {
      topic: 1,
      text: 1
    }
  }
])

MongoDB aggregate merge two different fields as one and get count

I have the following data in MongoDB:
[{id:3132, home:'NSH', away:'BOS'}, {id:3112, home:'ANA', away:'CGY'}, {id:3232, home:'MIN', away:'NSH'}]
Is it possible to get total game count for each team with aggregate pipeline?
desired result:
[{team: 'NSH', totalGames: 2}, {team:'MIN', totalGames: 1}, ...}]
I can get each one separately into its own array with two aggregate calls:
[{$group: {_id: "$home", gamesLeft: {$sum: 1}}}]
and
[{$group: {_id: "$away", gamesLeft: {$sum: 1}}}]
resulting
var homeGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 2 }, ...]
var awayGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 4 }, ...]
But I really want to get it working with just one query. If that's not possible, what would be the best way to combine these two results into one using JavaScript?
After some puzzling, I found a way to get it done using an aggregate pipeline. Here is the result:
db.games.aggregate([
  {
    $project: {
      isHome: { $literal: [true, false] },
      home: true,
      away: true
    }
  },
  { $unwind: '$isHome' },
  {
    $group: {
      _id: { $cond: { if: '$isHome', then: '$home', else: '$away' } },
      totalGames: { $sum: 1 }
    }
  }
]);
As you can see it consists of three stages. The first two are meant to duplicate each document into one for the home team and one for the away team. To do this, the project stage first creates a new isHome field on each document containing a true and a false value, which the unwind stage then splits into separate documents containing either the true or the false value.
Then in the group phase, we let the isHome field decide whether to group on the home or the away field.
It would be nicer if we could create a team field in the project step, containing the array [$home, $away], but mongo only supports adding array literals here, hence the workaround.
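For what it's worth, newer MongoDB versions (3.2+) do let $project build an array from field paths directly, so a shorter pipeline along these lines should also work (untested sketch):
db.games.aggregate([
  { $project: { team: ['$home', '$away'] } }, // requires MongoDB 3.2+
  { $unwind: '$team' },
  { $group: { _id: '$team', totalGames: { $sum: 1 } } }
]);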

MongoDB queries optimisation

I wish to retrieve several pieces of information from my User model, which looks like this:
var userSchema = new mongoose.Schema({
  email: { type: String, unique: true, lowercase: true },
  password: String,
  created_at: Date,
  updated_at: Date,
  genre: { type: String, enum: ['Teacher', 'Student', 'Guest'] },
  role: { type: String, enum: ['user', 'admin'], default: 'user' },
  active: { type: Boolean, default: false },
  profile: {
    name: { type: String, default: '' },
    headline: { type: String, default: '' },
    description: { type: String, default: '' },
    gender: { type: String, default: '' },
    ethnicity: { type: String, default: '' },
    age: { type: String, default: '' }
  },
  contacts: {
    email: { type: String, default: '' },
    phone: { type: String, default: '' },
    website: { type: String, default: '' }
  },
  location: {
    formattedAddress: { type: String, default: '' },
    country: { type: String, default: '' },
    countryCode: { type: String, default: '' },
    state: { type: String, default: '' },
    city: { type: String, default: '' },
    postcode: { type: String, default: '' },
    lat: { type: String, default: '' },
    lng: { type: String, default: '' }
  }
});
On the homepage I have a location filter where you can browse users by country or city.
All the fields also contain the number of users in them:
United Kingdom
All Cities (300)
London (150)
Liverpool (80)
Manchester (70)
France
All Cities (50)
Paris (30)
Lille (20)
Nederland
All Cities (10)
Amsterdam (10)
Etc...
This is on the homepage; then I also have the Students and Teachers pages, where I only need to know how many teachers there are in those countries and cities...
What I'm trying to do is create a MongoDB query that retrieves all this information in a single query.
At the moment the query looks like this:
User.aggregate([
  {
    $group: {
      _id: { city: '$location.city', country: '$location.country', genre: '$genre' },
      count: { $sum: 1 }
    }
  },
  {
    $group: {
      _id: '$_id.country',
      count: { $sum: '$count' },
      cities: {
        $push: {
          city: '$_id.city',
          count: '$count'
        }
      },
      genres: {
        $push: {
          genre: '$_id.genre',
          count: '$count'
        }
      }
    }
  }
], function(err, results) {
  if (err) return next();
  res.json({
    res: results
  });
});
The problem is that I don't know how to get all the information I need:
I don't know how to get the total number of users across all countries.
I have the user count for each country.
I have the user count for each city.
I don't know how to get the same counts for a specific genre.
Is it possible to get all this information with a single query in Mongo?
Otherwise:
Creating a few promises with 2-3 different requests to Mongo, like this:
getSomething
.then(getSomethingElse)
.then(getSomethingElseAgain)
.done
I'm sure it would be easier to store the specific data every time, but is that good for performance when there are more than 5,000 / 10,000 users in the DB?
Sorry, but I'm still in the process of learning and I think these things are crucial to understanding MongoDB performance / optimisation.
Thanks
What you want is a "faceted search" result where you hold the statistics about the matched terms in the current result set. Subsequently, while there are products that "appear" to do all the work in a single response, you have to consider that most generic storage engines are going to need multiple operations.
With MongoDB you can use two queries to get the results themselves and another to get the facet information. This would give similar results to the faceted results available from dedicated search engine products like Solr or ElasticSearch.
But in order to do this effectively, you want to include this in your document in a way it can be used effectively. A very effective form for what you want is using an array of tokenized data:
{
"otherData": "something",
"facets": [
"country:UK",
"city:London-UK",
"genre:Student"
]
}
So "factets" is a single field in your document and not in multiple locations. This makes it very easy to index and query. Then you can effectively aggregate across your results and get the totals for each facet:
User.aggregate(
[
{ "$unwind": "$facets" },
{ "$group": {
"_id": "$facets",
"count": { "$sum": 1 }
}}
],
function(err,results) {
}
);
Or more ideally with some criteria in $match:
User.aggregate(
[
{ "$match": { "facets": { "$in": ["genre:student"] } } },
{ "$unwind": "$facets" },
{ "$group": {
"_id": "$facets",
"count": { "$sum": 1 }
}}
],
function(err,results) {
}
);
Ultimately giving a response like:
{ "_id": "country:FR", "count": 50 },
{ "_id": "country:UK", "count": 300 },
{ "_id": "city:London-UK", "count": 150 },
{ "_id": "genre:Student": "count": 500 }
Such a structure is easy to traverse and inspect for things like the discrete "country" and the "city" that belongs to a "country" as that data is just separated consistently by a hyphen "-".
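A tiny sketch of what that traversal might look like client-side (the result shape is assumed to match the aggregation output above, and the helper name is hypothetical):
// Turn the aggregation output into nested totals, e.g.
// { country: { UK: 300 }, city: { 'London-UK': 150 }, genre: { Student: 500 } }
function groupFacets(results) {
  var totals = {};
  results.forEach(function (doc) {
    var parts = doc._id.split(':');   // e.g. "city:London-UK"
    var type = parts[0];
    var value = parts[1];
    totals[type] = totals[type] || {};
    totals[type][value] = doc.count;
  });
  return totals;
}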
Trying to mash up documents within arrays is a bad idea. There is also a 16MB BSON size limit to respect, which mashing results together (especially if you are trying to keep document content) is quite likely to exceed in the response.
For something as simple as then getting the "overall count" of results from such a query, then just sum up the elements of a particular facet type. Or just issue your same query arguments to a .count() operation:
User.count({ "facets": { "$in": ["genre:Student"] } },function(err,count) {
});
As said here, particularly when implementing "paging" of results, the roles of getting the "Result Count", "Facet Counts" and the actual "Page of Results" are all delegated to "separate" queries to the server.
There is nothing wrong with submitting each of those queries to the server in parallel and then combining a structure to feed to your template or application looking much like the faceted search result from one of the search engine products that offers this kind of response.
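A hedged sketch of firing those in parallel from Node, reusing the query shapes above (the filter, page size and skip values are placeholders):
// Run the facet counts, the overall count, and one page of results in parallel.
const filter = { facets: { $in: ['genre:Student'] } };

const [facetCounts, totalCount, page] = await Promise.all([
  User.aggregate([
    { $match: filter },
    { $unwind: '$facets' },
    { $group: { _id: '$facets', count: { $sum: 1 } } }
  ]),
  User.count(filter), // or countDocuments() on newer Mongoose versions
  User.find(filter).skip(0).limit(20).lean()
]);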
Concluding
So put something in your document to mark the facets in a single place. An array of tokenized strings works well for this purpose. It also works well with query forms such as $in and $all for either "or" or "and" conditions on facet selection combinations.
Don't try to mash results or nest additions just to match some perceived hierarchical structure; rather, traverse the results received and use simple patterns in the tokens. It's very simple to do.
Run paged queries for the content as separate queries from either the facets or the overall counts. Trying to push all content into arrays and then limiting it just to get counts does not make sense. The same would apply to an RDBMS solution doing the same thing, where paging result counts and the current page are separate query operations.
There is more information written on the MongoDB Blog about faceted search with MongoDB that also explains some other options. There are also articles on integration with external search solutions using mongo-connector or other approaches.

query Marionette.js collections

I would like to perform fairly complex filtering on Marionette Collections.
Is there a way to search for models with DB-like queries, similar to the MongoDB API?
Example:
MarionetteCollection.find(
{
  type: 'product',
  $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ],
  $and: [ { active: true } ],
  $sortby: 'name',
  $order: 'asc'
});
Maybe an extension to Marionette.js?
There is nothing in Marionette to help you here and Marionette doesn't make any changes/additions to the regular Backbone.Collection.
You could take a look at backbone-query. It appears to do what you are wanting.
Backbone has a simple implementation of what you are asking: Collection.where() and Collection.findWhere() can take an object and will find the models matching it. But it doesn't support more complex matching like greater than, less than, etc.
MarionetteCollection.where({
  type: 'product',
  qty: 55,
  active: true
});
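For the more complex conditions in the question, a hedged sketch using the Underscore methods that Backbone collections proxy (names reused from the question's example):
// filter() takes an arbitrary predicate, so $or / $gt / $lt style logic
// can be expressed in plain JavaScript.
var matches = MarionetteCollection.filter(function (model) {
  return model.get('type') === 'product' &&
         model.get('active') === true &&
         (model.get('qty') > 100 || model.get('price') < 9.95);
});

// Sort the filtered models by name, ascending.
var sorted = _.sortBy(matches, function (model) {
  return model.get('name');
});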
