I've written a MongoDB aggregation query that uses a number of stages. At the end, I'd like the query to return my data in the following format:
{
data: // Array of the matching documents here
count: // The total count of all the documents, including those that are skipped and limited.
}
I'm going to use the skip and limit features to eventually pare down the results. However, I'd like to know the count of the documents matched before I skip and limit them. Presumably, the counting stage would have to occur somewhere after the $match stage but before the $skip and $limit stages.
Here's the query I've currently written (it's in an Express.js route, which is why I'm using so many variables):
const {
minDate,
maxDate,
filter, // Text to search
filterTarget, // Field to search for text
sortBy, // Field to sort by
sortOrder, // 1 or -1
skip, // rowsPerPage * pageNumber
rowsPerPage, // Limit value
} = req.query;
db[source].aggregate([
{
$match: {
date: {
$gt: minDate, // Filter out by time frame...
$lt: maxDate
}
}
},
{
$match: {
[filterTarget]: filter // Match the search text...
}
},
{
$sort: {
[sortBy]: sortOrder // Sort by date...
}
},
{
$skip: skip // Skip the first X number of documents...
},
{
$limit: rowsPerPage
},
]);
Thanks for your help!
We can use $facet to run parallel pipelines on the data and then merge the output of each pipeline.
The following is the updated query:
db[source].aggregate([
{
$match: {
date: {
$gt: minDate, // Filter out by time frame...
$lt: maxDate
}
}
},
{
$match: {
[filterTarget]: filter // Match the search text...
}
},
{
$set: {
[filterTarget]: { $toLower: `$${filterTarget}` } // Necessary to ensure that sort works properly...
}
},
{
$sort: {
[sortBy]: sortOrder // Sort by date...
}
},
{
$facet:{
"data":[
{
$skip: skip
},
{
$limit:rowsPerPage
}
],
"info":[
{
$count:"count"
}
]
}
},
{
$project:{
"_id":0,
"data":1,
"count":{
$let:{
"vars":{
"elem":{
$arrayElemAt:["$info",0]
}
},
"in":{
$trunc:"$$elem.count"
}
}
}
}
}
]);
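For reference, a minimal sketch of consuming this from the Express route in the question (assuming db[source] resolves to a native-driver collection, so the cursor needs toArray(); with Mongoose, aggregate() resolves to an array directly). The names pipeline and result are placeholders, not part of the original code:
// "pipeline" stands for the $facet version of the stage array shown above.
const [result] = await db[source].aggregate(pipeline).toArray();
// result looks roughly like:
// { data: [ /* up to rowsPerPage matched documents */ ], count: 123 }
res.json(result);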
I think I figured it out. But if someone knows that this answer is slow, or at least faulty in some way, please let me know!
The trick is to add a $group stage, passing null as the _id, then pushing each document ($$ROOT) into the data array and, for each one, incrementing count by 1 with the $sum operator.
Then, in the next $project stage, I simply remove the _id property, and slice down the array.
db[source].aggregate([
{
$match: {
date: {
$gt: minDate, // Filter out by time frame...
$lt: maxDate
}
}
},
{
$match: {
[filterTarget]: filter // Match the search text...
}
},
{
$set: {
[filterTarget]: { $toLower: `$${filterTarget}` } // Necessary to ensure that sort works properly...
}
},
{
$sort: {
[sortBy]: sortOrder // Sort by date...
}
},
{
$group: {
_id: null,
data: { $push: "$$ROOT" }, // Push each document into the data array.
count: { $sum: 1 }
}
},
{
$project: {
_id: 0,
count: 1,
data: {
$slice: ["$data", skip, rowsPerPage]
},
}
}
]);
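One caveat that applies to both versions: Express hands over req.query values as strings, while $skip, $limit, $slice, and $sort expect numbers, so the pagination parameters likely need casting first. A minimal sketch (the fallback values here are assumptions, not from the original):
// Hypothetical casting of the query-string parameters before building the pipeline.
const skipN = parseInt(skip, 10) || 0;
const limitN = parseInt(rowsPerPage, 10) || 10; // assumed default page size
const sortDir = Number(sortOrder) === -1 ? -1 : 1;
// Use skipN, limitN and sortDir in the $skip, $limit (or $slice) and $sort stages.
Also worth noting: the $group + $push form accumulates every matching document into a single document before slicing, so it can run into document-size and memory limits on large result sets sooner than the $facet version, which only carries rowsPerPage documents into the final result.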
Related
I'm trying to find users who logged in within the last day.
The userActivity field is an object on User. The userActivity object contains a field called hourly, which is an array of dates. I want to find any users whose hourly array contains a date greater than one day ago, using aggregation.
User schema
{
userName:"Bob",
userActivity:
{"hourly":
[
"2022-05-09T02:31:12.062Z", // the user logged in
"2022-05-09T19:37:42.870Z" // saved as date object in the db
]
}
}
query that didn't work
const oneDayAgo = new Date();
oneDayAgo.setDate(oneDayAgo.getDate() - 1);
const usersActiveToday = await User.aggregate([
{
$match: {
$expr: { $gt: [oneDayAgo, '$userActivity'] },
},
},
]);
If today is September 13 at 11pm, I'd expect the results of the above to show users who had activity between the 12th and the 13th.
Instead I am getting all users returned.
If you want to use an aggregation pipeline, then one option is to use $max to find if there are items that are greater than oneDayAgo:
db.collection.aggregate([
{$match: {$expr: {$gt: [{$max: "$userActivity.hourly"}, oneDayAgo]}}}
])
See how it works on the playground example - aggregation
But, you can also do it simply by using find:
db.collection.find({
"userActivity.hourly": {$gte: oneDayAgo}
})
See how it works on the playground example - find
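For completeness, a minimal sketch of the find variant written in the same Mongoose style as the question (assuming the model is named User, as above):
const oneDayAgo = new Date();
oneDayAgo.setDate(oneDayAgo.getDate() - 1);

// Matches users where at least one element of userActivity.hourly is >= oneDayAgo.
const usersActiveToday = await User.find({
  'userActivity.hourly': { $gte: oneDayAgo },
});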
This can be considered a 3-step process:
Find the max value in the hourly array and store it in some key (here, max)
Check if the max value is greater than or equal to the oneDayAgo timestamp
Unset the key that stored the max value
Working Code Snippet:
const oneDayAgo = new Date();
oneDayAgo.setDate(oneDayAgo.getDate() - 1);
const usersActiveToday = await User.aggregate([
{
$set: {
max: {
$max: {
$map: {
input: "$userActivity.hourly",
in: {
$max: "$$this",
},
},
},
},
},
},
{
$match: {
max: {
$gte: oneDayAgo.toISOString(),
},
},
},
{
$unset: "max",
},
]);
Here's code in action: Mongo Playground
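One note on the comparison, which rests on an assumption about how the data is stored: the snippet matches against oneDayAgo.toISOString(), which fits the playground where hourly holds ISO strings. If the values are stored as BSON Date objects (as the comment in the question suggests), the $match stage would likely compare against the Date itself:
// Hypothetical variant of the $match stage for BSON Date values in hourly.
{
  $match: {
    max: { $gte: oneDayAgo }, // compare Dates directly, no toISOString()
  },
},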
You can try something along these lines:
db.collection.aggregate([
{
"$addFields": {
"loggedInADayBefore": {
"$anyElementTrue": [
{
"$map": {
"input": "$userActivity.hourly",
"as": "time",
"in": {
"$gte": [
"$$time",
ISODate("2022-05-13T00:00:00.000Z")
]
}
}
}
]
}
}
},
{
"$match": {
loggedInADayBefore: true
}
},
{
"$project": {
loggedInADayBefore: 0
}
}
])
Here, we use $anyElementTrue to find whether any element in the hourly array is greater than the day before, and we store the result as a boolean in a new field. Then we filter the docs on the basis of that field using $match.
Here's the playground link.
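If you would rather compute the cutoff at query time than hardcode ISODate("2022-05-13T00:00:00.000Z"), here is a sketch of the same pipeline run from Node/Mongoose (assuming the model is named User and that the stored values compare correctly against a Date, i.e. they are BSON dates):
const oneDayAgo = new Date();
oneDayAgo.setDate(oneDayAgo.getDate() - 1);

const activeUsers = await User.aggregate([
  {
    $addFields: {
      loggedInADayBefore: {
        $anyElementTrue: [
          {
            $map: {
              input: '$userActivity.hourly',
              as: 'time',
              in: { $gte: ['$$time', oneDayAgo] }, // the driver serializes the Date
            },
          },
        ],
      },
    },
  },
  { $match: { loggedInADayBefore: true } },
  { $project: { loggedInADayBefore: 0 } },
]);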
I'm having some trouble with this aggregate function. It works correctly when I only have a single match argument (created_at); however, when I add a second one (release_date), it never returns any results, even though it should. I've also tried the matches with the '$and' parameter with no luck.
Here is the code. Anyone know what I'm doing wrong?
Thanks!
db.collection('votes').aggregate([
{
$match: {
$and:
[
{ created_at: { $gte: ISODate("2021-01-28T05:37:58.549Z") }},
{ release_date: { $gte: ISODate("2018-01-28T05:37:58.549Z") }}
]
}
},
{
$group: {
_id: '$title',
countA: { $sum: 1 }
}
},
{
$sort: { countA: -1 }
}
])
I have a large MongoDB dataset of around 34 GB, and I am using Fastify and Mongoose for the API. I want to retrieve a list of all unique userUuid values within a date range. I tried the distinct method from Mongoose.
These are my filters:
let filters = {
applicationUuid: opts.applicationUuid,
impressions: {
$gte: opts.impressions
},
date: {
$gte: moment(opts.startDate).tz('America/Chicago').format(),
$lt: moment(opts.endDate).tz('America/Chicago').format()
}
}
This is my distinct Mongoose function:
return await Model.distinct("userUuid", filters)
This method returns an array of unique userUuid values based on the filters.
This works fine for a small dataset, but it hits a 16MB cap when it comes to a huge dataset.
Therefore, I tried the aggregate method to achieve similar results, having read that it is better optimized. Nevertheless, the same filters object above does not work inside the $match stage, because aggregate does not accept the string dates that moment produces; only JavaScript Date objects are accepted. However, a plain JavaScript Date disregards the timezone, since it is Unix-epoch based.
This is my aggregate function to get distinct values based on filters.
return await Model.aggregate(
[
{
$match: filters
},
{
$group: {
_id: {userUuid: "$userUuid" }
}
}
]
).allowDiskUse(true);
As I said, $match does not work with the moment string, only with new Date(opts.startDate); however, JavaScript's new Date disregards moment's timezone, and it has no proper native timezone handling of its own. Any thoughts on how to achieve this array of unique ids based on the filters with Mongoose?
This is the solution I came up with, and it performs pretty well. Use this approach for a large dataset:
let filters = {
applicationUuid: opts.applicationUuid,
impressions: { $gte: opts.impressions },
$expr: {
$and: [
{
$gte: [
'$date',
{
$dateFromString: {
dateString: opts.startDate,
timezone: 'America/Chicago',
},
},
],
},
{
$lt: [
'$date',
{
$dateFromString: {
dateString: opts.endDate,
timezone: 'America/Chicago',
},
},
],
},
],
},
}
return Model.aggregate([
{ $match: filters },
{
$group: {
_id: '$userUuid',
},
},
{
$project: {
_id: 0,
userUuid: '$_id',
},
},
])
.allowDiskUse(true)
Which will return a list of unique ids i.e.
[
{ userUuid: "someId" },
{ userUuid: "someId" }
]
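If you need the same flat array of ids that distinct returns, you can map over the resolved aggregation result (results below is a placeholder for whatever the call above resolves to):
// Flatten [{ userUuid: "someId" }, ...] into ["someId", ...]
const userUuids = results.map((doc) => doc.userUuid);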
Use the following method on a small dataset, as it is more convenient:
let filters = {
applicationUuid: opts.applicationUuid,
impressions: {
$gte: opts.impressions
},
date: {
$gte: opts.startDate,
$lte: opts.endDate
}
}
return Model.distinct("userUuid", filters)
Which will return the following result:
[ "someId", "someOtherId" ]
In MongoDB shell version v4.4.6
the following code works perfectly.
db['pri-msgs'].findOne({tag:'aaa&%qqq'},{msgs:{$slice:-2}})
But in the Node.js MongoDB driver, the following code doesn't work.
db.collection('pri-msgs').findOne({
tag: 'aaa&%qqq'
}, {
msgs: {
slice: -2
}
})
My document:
{"_id":{"$oid":"60c4730fadf6891850db90f9"},"tag":"aaa&%qqq","msgs":[{"msg":"abc","sender":0,"mID":"ctYAR5FDa","time":1},{"msg":"bcd","sender":0,"mID":"gCjgPf85z","time":2},{"msg":"def","sender":0,"mID":"lAhc4yLr6","time":3},{"msg":"efg","sender":0,"mID":"XcBLC2rGf","time":4,"edited":true},{"msg":"fgh","sender":0,"mID":"9RWVcEOlD","time":5},{"msg":"hij","sender":0,"mID":"TJXVTuWrR","time":6},{"msg":"jkl","sender":0,"mID":"HxUuzwrYN","time":7},{"msg":"klm","sender":0,"mID":"jXEOhARC2","time":8},{"msg":"mno","sender":0,"mID":"B8sVt4kCy","time":9}]}
Actually, what I'm trying to do is get the last 2 items from the msgs array where time is greater than n, where n is a number.
You can use an aggregation pipeline to get the results you are looking for. The steps are the following:
Match the documents you want by tag.
Unwind the msgs array.
Sort descending by msgs.time.
Limit first 2 elements.
Match the time you are looking for using a range query.
Group the documents back by _id.
Your query should look something like this:
db['pri-msgs'].aggregate([
{ $match: { tag: 'aaa&%qqq' } },
{ $unwind: '$msgs' },
{
$sort: {
'msgs.time': -1 //DESC
}
},
{ $limit: 2 },
{
$match: {
'msgs.time': {
$gt: 2 //n
}
}
},
{
$group: {
_id: '$_id',
tag: { $first: '$tag' },
msgs: {
$push: { msg: '$msgs.msg', sender: '$msgs.sender', mID: '$msgs.mID', time: '$msgs.time' }
}
}
}
]);
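Since the question uses the Node.js driver, here is a minimal sketch of running the same pipeline there (aggregate() returns a cursor, so the results are read with toArray(); db is assumed to be a connected Db instance):
const docs = await db
  .collection('pri-msgs')
  .aggregate([
    { $match: { tag: 'aaa&%qqq' } },
    // ...remaining stages from the pipeline above ($unwind, $sort, $limit, $match, $group)...
  ])
  .toArray();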
How can I retrieve data with a custom sort in Mongoose?
There is a job starting date that needs to be sorted by month and year, but currently this script only sorts by the month part (from December down to January), ignoring the year.
router.get('/', (req, res) => {
Job.find()
.sort({ from: -1 })
.then(jobs => res.json(jobs))
.catch(err => res.status(404).json(err));
});
The problem is in the sort; the values for from are like 12.2018, 06.2019, 03.2020, 11.2009, and so on.
I want to sort these results first by the year (which is after the dot) and then by the month. I currently cannot change how the data is stored; it's a String in the model Schema.
You have to use the aggregation framework to first transform your string into a valid date by:
$split-ing it,
$convert-ing the parts from string to int,
and using $dateFromParts;
then you sort and finally remove the created field.
Here's the query :
db.collection.aggregate([
{
$addFields: {
date: {
$dateFromParts: {
year: {
$convert: {
input: {
$arrayElemAt: [
{
$split: [
"$from",
"."
]
},
1
]
},
to: "int"
}
},
month: {
$convert: {
input: {
$arrayElemAt: [
{
$split: [
"$from",
"."
]
},
0
]
},
to: "int"
}
},
}
}
}
},
{
$sort: {
date: -1
}
},
{
$project: {
date: 0
}
}
])
You can test it here
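If you want to plug this into the Mongoose route from the question, a minimal sketch (assuming the same Job model; pipeline is a placeholder for the stage array above):
// Hypothetical wiring into the original Express route.
router.get('/', (req, res) => {
  Job.aggregate(pipeline) // the $addFields -> $sort -> $project stages shown above
    .then(jobs => res.json(jobs))
    .catch(err => res.status(404).json(err));
});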