Lookup using an ID array with pipeline

I was trying to write a $lookup that takes an array of object IDs and timestamps referring to documents in collection y. This worked flawlessly with localField and foreignField, but I cannot reproduce the same result using pipeline.
(Names like y are made up to keep it general)
Working version:
$lookup: {
from: 'y',
localField: 'ys.object_id',
foreignField: '_id',
as: 'docs',
},
ys is an array of objects structured like this:
{
object_id: ObjectID(),
timestamp: Date(),
}
I would like to rewrite this expression to use pipeline, because I want to filter out some of the looked-up objects using their timestamp attribute.
What I have tried:
$lookup: {
from: 'y',
let: { ys: '$ys' },
pipeline: [
{
$match: { $expr: { $eq: ['$_id', '$$ys.object_id'] } },
},
],
as: 'docs',
},
Database size: 20.4GB
Full Query:
const query = [
{
$match: { 'ys.timestamp': { $lte: date, $gt: previousMonth } }, // I have shortened this part a little (it's not the same, but the logic was flawed anyway)
},
{
$limit: 100,
},
{
$lookup: {
from: 'y',
let: { ys: '$ys' },
pipeline: [
{
$match: { $expr: { $in: ['$_id', '$$ys.object_id'] } },
},
{
$sort: { timestamp: -1 },
},
{
$limit: 1,
},
],
as: 'doc',
},
},
];
The above solution doesn't work: it seems to get stuck and never returns anything (it times out after some time).
Is there a proper way of rewriting the working solution as a pipeline solution?
IMPORTANT:
I changed the query to look up one specific element by ID and then perform the lookup. This did work, but it took about 20 seconds. I am fairly certain this is why my query times out when I run it with my usual query. Can anyone explain why there is a performance difference between the two approaches, and whether I can somehow avoid it?

Very close - use $in instead of $eq:
$lookup: {
from: 'y',
let: { ys: '$ys' },
pipeline: [
{
$match: { $expr: { $in: ['$_id', '$$ys.object_id'] } },
},
],
as: 'docs',
},
With $eq you are testing whether _id is equal to the whole array, which never matches a single ObjectId. With $in you are testing whether _id is contained in that array (like Array.prototype.includes in JavaScript).
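The difference can be seen with a plain JavaScript analogy. This is a minimal sketch, not MongoDB code: the ids are made-up strings standing in for ObjectIds, and matchesEq and matchesIn are hypothetical helper names.

```javascript
// Plain-JavaScript analogy for $eq versus $in inside $expr.
// The ids are made-up strings standing in for ObjectIds.
const ys = [
  { object_id: 'a1', timestamp: new Date('2021-01-01T00:00:00Z') },
  { object_id: 'b2', timestamp: new Date('2021-02-01T00:00:00Z') },
];

// '$$ys.object_id' resolves to the array of all object_id values.
const objectIds = ys.map((y) => y.object_id);

// $eq compares the candidate _id with the whole array, so a single id never matches.
const matchesEq = (id) => id === objectIds;

// $in checks membership, like Array.prototype.includes.
const matchesIn = (id) => objectIds.includes(id);

console.log(matchesEq('a1')); // false
console.log(matchesIn('a1')); // true
console.log(matchesIn('zz')); // false
```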

Related

MongoDB Aggregation - match documents with array of objects, by another array of objects filter

I have documents that consist of an array of objects, and each object in this array consists of another array of objects.
For simplicity, irrelevant fields of the documents were omitted.
It looks like this (2 documents):
{
title: 'abc',
parts: [
{
part: "verse",
progressions: [
{
progression: "62a4a87da7fdbdabf787e47f",
key: "Ab",
_id: "62b5aaa0c9e9fe8a7d7240d3"
},
{
progression: "62adf477ed11cbbe156d5769",
key: "C",
_id: "62b5aaa0c9e9fe8a7d7240d3"
},
],
_id: "62b5aaa0c9e9fe8a7d7240d2"
},
{
part: "chorus",
progressions: [
{
progression: "62a4a51b4693c43dce9be09c",
key: "E",
_id: "62b5aaa0c9e9fe8a7d7240d9"
}
],
_id: "62b5aaa0c9e9fe8a7d7240d8"
}
],
}
{
title: 'def',
parts: [
{
part: "verse",
progressions: [
{
progression: "33a4a87da7fopvvbf787erwe",
key: "E",
_id: "62b5aaa0c9e9fe8a7d7240d3"
},
{
progression: "98opf477ewfscbbe156d5442",
key: "Bb",
_id: "62b5aaa0c9e9fe8a7d7240d3"
},
],
_id: "12r3aaa0c4r5me8a7d72oi8u"
},
{
part: "bridge",
progressions: [
{
progression: "62a4a51b4693c43dce9be09c",
key: "C#",
_id: "62b5aaa0c9e9fe8a7d7240d9"
}
],
_id: "62b5aaa0rwfvse8a7d7240d8"
}
],
}
The parameters that the client sends with a request are an array of objects:
[
{ part: 'verse', progressions: ['62a4a87da7fdbdabf787e47f', '62a4a51b4693c43dce9be09c'] },
{ part: 'chorus', progressions: ['62adf477ed11cbbe156d5769'] }
]
I want to retrieve, through MongoDB aggregation, the documents that match at least one of the objects in the input array above.
In this example, that means documents whose parts array contains an object with part equal to 'verse' and, in its progressions array, at least one object whose progression is one of ['62a4a87da7fdbdabf787e47f', '62a4a51b4693c43dce9be09c']; or documents whose parts array contains an object with part equal to 'chorus' and, in its progressions array, at least one object whose progression is '62adf477ed11cbbe156d5769'.
In this example, the matching document is the first one (with the title 'abc'), but in actual use, there might be many matching documents.
I tried to create an aggregation pipeline myself (using the mongoose 'aggregate' method):
// parsedProgressions = [
// { part: 'verse', progressions: ['62a4a87da7fdbdabf787e47f', '62a4a51b4693c43dce9be09c'] },
// { part: 'chorus', progressions: ['62adf477ed11cbbe156d5769'] }
// ]
songs.aggregate([
{
$addFields: {
"tempMapResults": {
$map: {
input: parsedProgressions,
as: "parsedProgression",
in: {
$cond: {
if: { parts: { $elemMatch: { part: "$$parsedProgression.part", "progressions.progression": mongoose.Types.ObjectId("$$parsedProgression.progression") } } },
then: true, else: false
}
}
}
}
}
},
{
$addFields: {
"isMatched": { $anyElementTrue: ["$tempMapResults"] }
}
},
{ $match: { isMatched: true } },
{ $project: { title: 1, "parts.part": 1, "parts.progressions.progression": 1 } }
]);
But it didn't work. As I understand it, that is because $elemMatch can only be used in the $match stage.
Anyway, I guess I overcomplicated the aggregation pipeline, so I would be glad if you could fix it or offer a better working one.

This is not a simple case, as these are nested arrays and we need to match both the part and the progressions, which are not on the same level.
One option looks a bit complicated, but keeps your data small:
To make things easier, $set a new array field called matchCond, in which each item holds an array called progs containing that part's progressions. Into each sub-object inside progs, merge the matching entry of the input array under the key input. We need to be careful here and handle the case where no entry of the input array matches the part, as is the case for the "bridge" part in the second document; the default {input: {progressions: []}} covers that.
Now we just need to check, for each of these progs items, whether its progression field matches one of the options in its input.progressions. This is done using $filter, and by $reduce-ing the results to a count.
Finally, $match the documents that have results and format the answer.
db.collection.aggregate([
{
$set: {
matchCond: {
$map: {
input: "$parts",
as: "parts",
in: {progs: {
$map: {
input: "$$parts.progressions",
in: {$mergeObjects: [
"$$this",
{input: {progressions: []}},
{input: {$first: {
$filter: {
input: inputData,
as: "inputPart",
cond: {$eq: ["$$inputPart.part", "$$parts.part"]}
}
}}}
]}
}
}}
}
}
}
},
{$set: {
matchCond: {
$reduce: {
input: "$matchCond",
initialValue: 0,
in: {$add: [
"$$value",
{$size: {
$filter: {
input: "$$this.progs",
as: "part",
cond: {$in: ["$$part.progression", "$$part.input.progressions"]}
}
}
}
]
}
}
}
}
},
{$match: {matchCond: {$gt: 0}}},
{$project: {title: 1, parts: 1}}
])
See how it works on the playground example
Another option is to use $unwind, which looks simpler but duplicates your data and is therefore likely to be slower:
db.collection.aggregate([
{$addFields: {inputData: inputData, cond: "$parts"}},
{$unwind: "$cond"},
{$unwind: "$cond.progressions"},
{$unwind: "$inputData"},
{$match: {
$expr: {
$and: [
{$eq: ["$cond.part", "$inputData.part"]},
{$in: ["$cond.progressions.progression", "$inputData.progressions"]}
]
}
}
},
{$project: {title: 1, parts: 1}}
])
See how it works on the playground example - unwind
There are several options between these two...
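For reference, the matching rule that both pipelines implement can be stated in plain JavaScript. This is only a sketch of the logic, not MongoDB code: the documents are trimmed to the fields the rule needs, and matchesInput is a hypothetical helper name.

```javascript
// Plain-JavaScript statement of the matching rule (not MongoDB code).
// Documents are trimmed to the fields the rule needs.
const songs = [
  {
    title: 'abc',
    parts: [
      { part: 'verse', progressions: [
        { progression: '62a4a87da7fdbdabf787e47f', key: 'Ab' },
        { progression: '62adf477ed11cbbe156d5769', key: 'C' },
      ] },
      { part: 'chorus', progressions: [
        { progression: '62a4a51b4693c43dce9be09c', key: 'E' },
      ] },
    ],
  },
  {
    title: 'def',
    parts: [
      { part: 'verse', progressions: [
        { progression: '33a4a87da7fopvvbf787erwe', key: 'E' },
        { progression: '98opf477ewfscbbe156d5442', key: 'Bb' },
      ] },
      { part: 'bridge', progressions: [
        { progression: '62a4a51b4693c43dce9be09c', key: 'C#' },
      ] },
    ],
  },
];

const inputData = [
  { part: 'verse', progressions: ['62a4a87da7fdbdabf787e47f', '62a4a51b4693c43dce9be09c'] },
  { part: 'chorus', progressions: ['62adf477ed11cbbe156d5769'] },
];

// A document matches when some part has the same name as an input entry
// and at least one of its progression ids is listed in that entry.
function matchesInput(song, input) {
  return song.parts.some((part) =>
    input.some((wanted) =>
      wanted.part === part.part &&
      part.progressions.some((p) => wanted.progressions.includes(p.progression))));
}

const matchedTitles = songs
  .filter((song) => matchesInput(song, inputData))
  .map((song) => song.title);

console.log(matchedTitles); // [ 'abc' ]
```

The "bridge" part of the second document shares a progression id with the input, but it is not matched because no input entry names that part.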

How to properly get distinct values with Mongoose in large dataset with date filters with timezone?

I have a large MongoDB dataset of around 34 GB, and I am using Fastify and Mongoose for the API. I want to retrieve the list of all unique userUuid values within a date range. I tried the distinct method from Mongoose:
These are my filters:
let filters = {
applicationUuid: opts.applicationUuid,
impressions: {
$gte: opts.impressions
},
date: {
$gte: moment(opts.startDate).tz('America/Chicago').format(),
$lt: moment(opts.endDate).tz('America/Chicago').format()
}
}
This is my distinct Mongoose function:
return await Model.distinct("userUuid", filters)
This method returns an array of unique userUuid values based on the filters.
This works fine for a small dataset, but the result is capped at 16 MB, which is a problem for a huge dataset.
Therefore, I tried the aggregate method to achieve similar results, having read that it is better optimized. However, the same filters object does not work inside the $match stage, because aggregate does not accept the date strings produced by moment; only a JavaScript Date is accepted. A JavaScript Date, however, disregards time zones, since it is based on Unix time.
This is my aggregate function to get distinct values based on filters.
return await Model.aggregate(
[
{
$match: filters
},
{
$group: {
_id: {userUuid: "$userUuid" }
}
}
]
).allowDiskUse(true);
As I said, $match does not work with moment, only with new Date(opts.startDate); however, JavaScript's new Date disregards moment's time zone, and it has no proper native time zone support. Any thoughts on how to get this array of unique IDs based on the filters with Mongoose?

This is the solution I came up with, and it performs well. Use it for large datasets:
let filters = {
applicationUuid: opts.applicationUuid,
impressions: { $gte: opts.impressions },
$expr: {
$and: [
{
$gte: [
'$date',
{
$dateFromString: {
dateString: opts.startDate,
timezone: 'America/Chicago',
},
},
],
},
{
$lt: [
'$date',
{
$dateFromString: {
dateString: opts.endDate,
timezone: 'America/Chicago',
},
},
],
},
],
},
}
return Model.aggregate([
{ $match: filters },
{
$group: {
_id: '$userUuid',
},
},
{
$project: {
_id: 0,
userUuid: '$_id',
},
},
])
.allowDiskUse(true)
Which will return a list of unique IDs, e.g.:
[
{ userUuid: "someId" },
{ userUuid: "someOtherId" }
]
For small datasets, the following method is more convenient:
let filters = {
applicationUuid: opts.applicationUuid,
impressions: {
$gte: opts.impressions
},
date: {
$gte: opts.startDate,
$lte: opts.endDate
}
}
return Model.distinct("userUuid", filters)
Which will return the following result:
[ "someId", "someOtherId" ]
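On the time zone concern raised in the question: a JavaScript Date holds an absolute instant (milliseconds since the Unix epoch), so a zone only matters while parsing or formatting. The sketch below assumes the input string carries an explicit UTC offset; the value -05:00 is hard-coded purely for illustration and is not taken from the original code.

```javascript
// A Date is an absolute instant; the offset in the input string is applied while parsing.
// Assumption: the caller supplies an ISO 8601 string with an explicit offset.
const startLocal = '2020-06-01T00:00:00-05:00'; // midnight at UTC-5 (hypothetical input)
const start = new Date(startLocal);

console.log(start.toISOString()); // 2020-06-01T05:00:00.000Z

// The same instant written in UTC produces an equal timestamp.
const sameInstant = new Date('2020-06-01T05:00:00Z');
console.log(start.getTime() === sameInstant.getTime()); // true
```

When the incoming strings carry no offset, the $dateFromString approach above lets the server interpret them in the named zone instead.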

Mongoose date $gte operator not working as expected

I am trying to write a query for the last week, but it is not working as expected in MongoDB.
[{
$lookup: {
from: 'reviews',
localField: 'groupReviews',
foreignField: '_id',
as: 'groupReviews'
}
}, {
$match: {
$and: [{
_id: {
$eq: ObjectId('5f247eea8ad8eb53883f4a9b')
}
},
{
"groupReviews.reviewCreated": {
$gte: ISODate('2020-06-20T10:24:51.303Z')
}
}
]
}
}, {
$project: {
count: {
$size: "$groupReviews",
},
groupReviews: {
$slice: ["$groupReviews", 0, 20],
}
}
}, {
$sort: {
"groupReviews.reviewCreated": -1
}
}]
Actual result: the code above returns reviews older than 2020-06-20.
Expected result: it should not return reviews older than 2020-06-20.

The $match stage matches entire documents, not individual array elements. If the array contains at least one element that satisfies the $gte condition, the whole document is matched and passed along the pipeline with all of its array elements.
If you want to remove the individual array elements that are older than the given date, you could either:
- $unwind the array before matching, then $group to rebuild it with only the matching entries, or
- use $filter in your $project stage to eliminate the unwanted elements prior to slicing.
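The distinction can be illustrated in plain JavaScript, followed by a sketch of what the $filter variant of the $project stage might look like. The sample document is made up, and the stage is untested against the asker's data; it assumes reviewCreated is stored as a date.

```javascript
// Plain-JavaScript analogy: $match keeps or drops whole documents,
// while $filter trims the array inside a document.
const cutoff = new Date('2020-06-20T10:24:51.303Z');

// Made-up sample document; only reviewCreated matters here.
const group = {
  groupReviews: [
    { text: 'old', reviewCreated: new Date('2020-05-01T00:00:00Z') },
    { text: 'new', reviewCreated: new Date('2020-07-01T00:00:00Z') },
  ],
};

// $match on "groupReviews.reviewCreated": true if ANY element qualifies.
const documentMatches = group.groupReviews.some((r) => r.reviewCreated >= cutoff);

// $filter: keeps only the qualifying elements.
const recentReviews = group.groupReviews.filter((r) => r.reviewCreated >= cutoff);

console.log(documentMatches); // true
console.log(recentReviews.map((r) => r.text)); // [ 'new' ]

// Sketch of a corresponding $project stage: filter first, then slice.
const projectStage = {
  $project: {
    groupReviews: {
      $slice: [
        {
          $filter: {
            input: '$groupReviews',
            as: 'review',
            cond: { $gte: ['$$review.reviewCreated', cutoff] },
          },
        },
        0,
        20,
      ],
    },
  },
};
```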

mongoDB project array element after filter

I have the following (simplified) aggregation:
Model.aggregate([
{
$lookup: {
from: 'orders',
localField: '_id',
foreignField: 'customer',
as: 'orders',
},
},
{
$project: {
openOrders: {
$filter: {
input: '$orders',
as: 'order',
cond: { $eq: ['$$order.status', 'open'] },
},
},
},
},
])
which returns the following:
{
_id: ...,
openOrders: [
[Object], [Object]
],
}
Those [Object]'s are simply the returned objects, persisted in the database, with all their fields.
I don't find a way to project/filter out those objects' fields and instead return only their _id's:
{
_id: ...,
openOrders: [
_id: ...,
_id: ....
],
}
EDIT: I'd rather prefer the following expected output:
{
_id: ...,
openOrders: [
{ _id: ... },
{ _id: ... }
],
}
I tried adding a new $project stage at various points of the aggregation with no success. Can someone help me?

You should add a second $project stage, after the one that builds openOrders, like below:
{
$project: {
openOrders: '$openOrders._id'
}
}
This will give the output like:
{
_id: ...,
openOrders: [
_id1,
_id2,
...
],
}
instead of
{
_id: ...,
openOrders: [
_id: ...,
_id: ....
],
}
I suggest this form because openOrders is effectively just an array of _ids, so wrapping each one in an object with a single _id field doesn't make sense.
If you still want the output to be an array of objects, you can use the following instead:
{
$project: {
'openOrders._id': 1
}
}
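The two output shapes correspond to the following plain JavaScript operations. This is an analogy with made-up order data, not MongoDB code.

```javascript
// Made-up orders, as they would look after the $filter step.
const openOrders = [
  { _id: 'o1', status: 'open', total: 10 },
  { _id: 'o2', status: 'open', total: 25 },
];

// Analogue of openOrders: '$openOrders._id' -> an array of bare ids.
const idsOnly = openOrders.map((order) => order._id);

// Analogue of 'openOrders._id': 1 -> an array of objects holding only _id.
const idObjects = openOrders.map((order) => ({ _id: order._id }));

console.log(JSON.stringify(idsOnly));   // ["o1","o2"]
console.log(JSON.stringify(idObjects)); // [{"_id":"o1"},{"_id":"o2"}]
```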

As you need an array of _id's like this
openOrders: [ _id: ..., _id: .... ]
but not an array of _id's in objects :
openOrders: [ {_id: ...}, {_id: ....} ]
You need to use $reduce instead of $filter.
Try the query below:
db.collection.aggregate([
{
$project: {
openOrders: {
$reduce: {
input: "$orders", // Same like `$filter` use reduce to iterate on array
initialValue: [], // consider an initial value
in: { // If condition is met, push value to array else return holding array as is.
$cond: [ { $eq: [ "$$this.status", "open" ] },
{ $concatArrays: [ "$$value", [ "$$this._id" ] ] },
"$$value"
]
}
}
}
}
}
])
Test : mongoplayground
Note: in JavaScript, if you are printing an object that contains nested objects, print it with JSON.stringify(yourObject), which converts it to a string, so that the console shows the actual objects rather than [Object], [Object].
Update:
If you need an array of objects with an _id field, just add another $project stage at the end, but I would strongly suggest using $reduce and getting an array of ids for your scenario:
{ $project: { "openOrders._id": 1 } } // which would just extract `_id` fields in each objects
Test : mongoplayground
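The $reduce expression in this answer behaves like Array.prototype.reduce in JavaScript. A minimal sketch with made-up orders, where value plays the role of $$value and order the role of $$this:

```javascript
// Made-up orders standing in for the looked-up "orders" array.
const orders = [
  { _id: 'o1', status: 'open' },
  { _id: 'o2', status: 'closed' },
  { _id: 'o3', status: 'open' },
];

// Start from an empty array; append the _id when the order is open,
// otherwise return the accumulated array unchanged.
const openOrderIds = orders.reduce(
  (value, order) => (order.status === 'open' ? value.concat([order._id]) : value),
  []
);

console.log(JSON.stringify(openOrderIds)); // ["o1","o3"]
```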

How to query documents by a condition on the subdocument with the latest date [duplicate]

This question already has an answer here:
Query on Last Array Value
(1 answer)
Closed 4 years ago.
I'm trying to figure out the best way to query documents based on a criteria on the latest subdocument.
So my data might look like this:
[{
_id: '59bb31efae69726bd5fc9391',
name: 'Something',
terms: [
{
_id: '58e54f5aad59a6000cdcd590',
begDate: '2017-06-13T07:00:00.000Z',
endDate: '2018-01-01T07:59:59.999Z'
},
{
_id: '59bb32765e651d28909ed706',
begDate: '2018-01-01T08:00:00.000Z',
endDate: '2019-01-01T07:59:59.999Z'
}
]
}, {
_id: '59f20ddeef426f6bca3abbf1',
name: 'Something',
terms: [
{
_id: '59f20e35c8257b5b0f22d2a6',
begDate: '2018-06-13T07:00:00.000Z',
endDate: '2019-01-01T07:59:59.999Z'
},
{
_id: '59f20e9394c8108d9db33bf9',
begDate: '2019-01-01T08:00:00.000Z',
endDate: '2020-01-01T07:59:59.999Z'
}
]
}]
What I want is to get all documents whose last term's endDate is 2019-01-01T07:59:59.999Z. This could be done either by getting the last term in the array or, more reliably, by sorting the terms and then grabbing the last one.
I can see how I could do this with $where, but I know that another way would be more performant if I can find one.
I also want to add, whatever I do here would accompany other query parameters. For example:
{
_id: {
'$in': [
ObjectId("591e5e37abddad14afe1b272"),
ObjectId("591e5e37abddad14afe1b123")
]
}
}
UPDATE:
As noted, this question has a duplicate (which was hard for me to find as the question referenced is difficult to understand). That being said, I'm not only looking for the last in an array but also the most recent (I agree that's not clear in the body of the question). I'm not arguing against the duplicate question reference, but for the sake of making this easier for future readers, you'll find in the accepted answer a clean solution for mongo 3.6+ as well as a reference to another question in the comments which should help if you want to query by date in subdocuments.

Use $expr to perform a 'complex' match, and $let to hold an intermediate variable with the last element of the array, obtained with "$arrayElemAt": [ "$terms", -1 ], so that its endDate can be compared with the date in question:
db.collection.find({
$expr: {
$let: {
vars: { "last": { $arrayElemAt: [ "$terms", -1 ] } },
in: { $eq: [ "$$last.endDate", "2019-01-01T07:59:59.999Z" ] }
}
}
})
With the input you provided, this returns the first record.
To leave room for additional filters, as per your requirements, you can combine them using $and:
db.collection.find({
$and: [
{ $expr: { $let: {
vars: { "last": { $arrayElemAt: [ "$terms", -1 ] } },
in: { $eq: [ "$$last.endDate", "2019-01-01T07:59:59.999Z" ] }
}}},
{ "_id": { $ne: "sss" } } // actually whatever additional filter
]
})
The exact same thing can be achieved with an aggregation pipeline, if you wish to run additional stages on the matching documents:
db.collection.aggregate([
{ $match: {
$and: [
{ $expr: { $let: {
vars: { "last": { $arrayElemAt: [ "$terms", -1 ] } },
in: { $eq: [ "$$last.endDate", "2019-01-01T07:59:59.999Z" ] }
}}},
{ "_id": { $ne: "sss" } }
]
}},
{ ... }
])
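The selection logic can be checked in plain JavaScript against the sample data from the question. The first filter mirrors $arrayElemAt with index -1; the second is a hypothetical variant (latestTerm is a made-up helper) that picks the most recent term regardless of array order. It assumes every terms array is non-empty and that all dates are ISO 8601 UTC strings in the same format, which compare chronologically as strings.

```javascript
// Sample data from the question, trimmed to the relevant fields.
const docs = [
  {
    _id: '59bb31efae69726bd5fc9391',
    terms: [
      { begDate: '2017-06-13T07:00:00.000Z', endDate: '2018-01-01T07:59:59.999Z' },
      { begDate: '2018-01-01T08:00:00.000Z', endDate: '2019-01-01T07:59:59.999Z' },
    ],
  },
  {
    _id: '59f20ddeef426f6bca3abbf1',
    terms: [
      { begDate: '2018-06-13T07:00:00.000Z', endDate: '2019-01-01T07:59:59.999Z' },
      { begDate: '2019-01-01T08:00:00.000Z', endDate: '2020-01-01T07:59:59.999Z' },
    ],
  },
];

const target = '2019-01-01T07:59:59.999Z';

// Equivalent of $arrayElemAt: [ "$terms", -1 ]: compare the last element by position.
const byLastPosition = docs.filter(
  (doc) => doc.terms[doc.terms.length - 1].endDate === target
);

// Variant: pick the term with the greatest endDate, whatever the array order.
function latestTerm(terms) {
  return terms.reduce((latest, term) => (term.endDate > latest.endDate ? term : latest));
}
const byLatestDate = docs.filter((doc) => latestTerm(doc.terms).endDate === target);

console.log(byLastPosition.map((doc) => doc._id)); // [ '59bb31efae69726bd5fc9391' ]
console.log(byLatestDate.map((doc) => doc._id));   // [ '59bb31efae69726bd5fc9391' ]
```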
