Hey, I am quite new to Mongoose and can't get my head around search.
models
User->resumes[]->employments[]
UserSchema
{
resumes: [ResumeSchema],
...
}
ResumeSchema
{
employments: [EmploymentSchema],
...
}
EmploymentSchema
{
jobTitle: {
type: String,
required: [true, "Job title is required."]
},
...
}
Background
A user has to enter a job title and needs suggestions drawn from the job titles of the employments in the resumes that already exist.
I have tried the following code.
let q = req.query.q; // Software
User.find({ "resumes.employments.jobTitle": new RegExp(req.query.q, 'ig') }, {
"resumes.employments.$": 1
}, (err, docs) => {
res.json(docs);
})
Output
[
{
_id: '...',
resumes:[
{
employments: [
{
jobTitle: 'Software Developer',
...
},
...
]
},
...
]
},
...
]
Expected output
["Software Developer", "Software Engineer", "Software Manager"]
Problem
1:) Too much data is returned; I only need jobTitle.
2:) All employments are returned, even though the query matched only one of them.
3:) Is there a better way to do this, via an index or via $search? I did not find much information in the Mongoose documentation about creating a search index (and I also don't really know how to create a compound index to make it work).
I know there might be a lot of existing answers, but none of them helped or I was not able to make them work. I am really new to MongoDB; I have been working with relational databases via SQL or through an ORM, so my MongoDB concepts and knowledge are limited.
So please let me know if there is a better solution, or a way to make the current one work.
You can use one of the aggregation queries below to get this result:
[
{
"jobTitle": [
"Software Engineer",
"Software Manager",
"Software Developer"
]
}
]
The query works as follows:
First, $unwind is used twice to deconstruct the arrays and get at the values.
Then $match filters by the values you want using $regex.
Then $group collects all the values together (using _id: null, and $addToSet so that duplicates are not added).
And finally $project shows only the field you want.
// Pass the stages as a single array; current Mongoose versions
// do not accept them as separate arguments.
User.aggregate([
  {
    "$unwind": "$resumes"
  },
  {
    "$unwind": "$resumes.employments"
  },
  {
    "$match": {
      "resumes.employments.jobTitle": {
        "$regex": "software",
        "$options": "i"
      }
    }
  },
  {
    "$group": {
      "_id": null,
      "jobTitle": {
        "$addToSet": "$resumes.employments.jobTitle"
      }
    }
  },
  {
    "$project": {
      "_id": 0
    }
  }
])
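Since the search term in the question comes from `req.query.q`, it is probably worth escaping regex metacharacters before placing it in `$regex`. The following is a minimal sketch of building the same pipeline from user input; `escapeRegex` and `buildJobTitlePipeline` are hypothetical helper names, not part of Mongoose.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a
// regular expression, so input such as "c++" is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the pipeline shown above for an arbitrary term.
function buildJobTitlePipeline(q) {
  return [
    { $unwind: "$resumes" },
    { $unwind: "$resumes.employments" },
    {
      $match: {
        "resumes.employments.jobTitle": { $regex: escapeRegex(q), $options: "i" }
      }
    },
    {
      $group: {
        _id: null,
        jobTitle: { $addToSet: "$resumes.employments.jobTitle" }
      }
    },
    { $project: { _id: 0 } }
  ];
}

// Usage (assumes a Mongoose model named User, as in the question):
// const [result] = await User.aggregate(buildJobTitlePipeline(req.query.q));
// res.json(result ? result.jobTitle : []);
```

When nothing matches, the `$group` stage emits no document at all, which is why the usage comment falls back to an empty array.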
Example here
Another option is to use $filter in a $project stage:
It is similar to the previous query, but filters each resume's employments with $filter first, so that only the matching entries are unwound.
// Pass the stages as a single array; current Mongoose versions
// do not accept them as separate arguments.
User.aggregate([
  {
    "$unwind": "$resumes"
  },
  {
    "$project": {
      "jobs": {
        "$filter": {
          "input": "$resumes.employments",
          "as": "e",
          "cond": {
            "$regexMatch": {
              "input": "$$e.jobTitle",
              "regex": "Software",
              "options": "i"
            }
          }
        }
      }
    }
  },
  {
    "$unwind": "$jobs"
  },
  {
    "$group": {
      "_id": null,
      "jobTitle": {
        "$addToSet": "$jobs.jobTitle"
      }
    }
  },
  {
    "$project": {
      "_id": 0
    }
  }
])
Example here
I am stuck on a problem where a field is sometimes a string and sometimes an array. How can I handle that in an $addFields stage?
Here is my Mongo query:
db.ledger_scheme_logs.aggregate([
{
$match:{
"type":{ $in: ["add","edit"]},
}
},
{
"$addFields": {
"trail_beginning": {
$substr: [ "$metadata.schemes._trail", 0, 36 ]
}
}
},
{
$group: {
"_id": {
"trail_beginning":"$trail_beginning"
},
"count": { $sum: 1 },
"items": { $push: "$$ROOT" },
}
},
{
"$sort": {
count: -1
}
}
])
In this query, for "$metadata.schemes._trail", schemes is an array of objects in some documents, and because of that I am getting the Mongo error "message" : "can't convert from BSON type array to String". How can I solve this type of problem? Any help with an example would be appreciated.
Thanks in advance!
The bigger and trickier question here is about what behavior you would like the system to have rather than how to actually make the database do it. There's a closely related topic around (consistent) schema design that naturally follows.
To directly answer your question, you can use the $cond operator to conditionally calculate the new trail_beginning field based on the data type of the source document currently being processed. An example would be something like:
{
"$addFields": {
"trail_beginning": {
"$cond": {
"if": {
$eq: [
{
$type: "$metadata.schemes"
},
"array"
]
},
"then": {
"$map": {
"input": "$metadata.schemes._trail",
"in": {
$substr: [
"$$this",
0,
3
]
}
}
},
"else": {
$substr: [
"$metadata.schemes._trail",
0,
3
]
}
}
}
}
}
Using two sample documents with different schemas yields the following as demonstrated in this playground example:
[
{
"_id": 1,
"metadata": {
"schemes": {
"_trail": "ABCDEFG"
}
},
"trail_beginning": "ABC"
},
{
"_id": 2,
"metadata": {
"schemes": [
{
"_trail": "HIJKLMN"
},
{
"_trail": "OPQRSTU"
}
]
},
"trail_beginning": [
"HIJ",
"OPQ"
]
}
]
Taking a glance at the rest of your pipeline though, I suspect (but can't say for sure) that this isn't actually what you want to do. This is because the subsequent $group will use the entire array of values to do the grouping, but I'm (again) guessing that you want to group based on individual values.
If my assumptions are correct, then logically what you really want to do is $unwind the array first before you do the substring transformation. This will correct the subsequent grouping logic and, as a side effect, it will also eliminate your problem of having different possible input types during the $addFields stage. Your full pipeline would look something like this:
db.ledger_scheme_logs.aggregate([
{
$match:{
"type":{ $in: ["add","edit"]},
}
},
{
$unwind: "$metadata.schemes"
},
{
"$addFields": {
"trail_beginning": {
$substr: [ "$metadata.schemes._trail", 0, 36 ]
}
}
},
{
$group: {
"_id": {
"trail_beginning":"$trail_beginning"
},
"count": { $sum: 1 },
"items": { $push: "$$ROOT" },
}
},
{
"$sort": {
count: -1
}
}
])
Playground demonstration (using a shorter substring) here.
This works because $unwind will treat non-array field paths as a single element array. However, having a discrepancy in the schema may frequently result in you having to put in special conditional logic to account for the difference in various places in the application. Consider simplifying development by making the schema consistent (converting the non-arrays to arrays with single values).
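To make that $unwind behavior concrete, here is a small in-memory sketch in plain JavaScript. It is not MongoDB code; `unwindValue` and `countByTrailPrefix` are hypothetical names, and the second sample document is made up for illustration.

```javascript
// Mimics $unwind on one field: an array yields one entry per element, any
// other non-null value is treated as a one-element array, and a missing,
// null, or empty-array value yields nothing (the document is dropped).
function unwindValue(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Counts entries per trail prefix, like $unwind + $addFields + $group.
function countByTrailPrefix(docs, length) {
  const counts = new Map();
  for (const doc of docs) {
    for (const scheme of unwindValue(doc.metadata.schemes)) {
      const prefix = scheme._trail.substring(0, length);
      counts.set(prefix, (counts.get(prefix) || 0) + 1);
    }
  }
  return counts;
}

const docs = [
  { _id: 1, metadata: { schemes: { _trail: "ABCDEFG" } } },
  { _id: 2, metadata: { schemes: [{ _trail: "HIJKLMN" }, { _trail: "ABCXYZ" }] } }
];
const counts = countByTrailPrefix(docs, 3);
console.log(counts.get("ABC"), counts.get("HIJ")); // 2 1
```

The object-valued and array-valued documents end up in the same groups, which is the effect the unwound pipeline relies on.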
I have two MongoDB queries: one is an aggregate and the other is a find.
They are coded so that if the "aggregate" query gives a result, the "find" query doesn't run; otherwise, if "aggregate" gives no result, the find query runs.
It works as follows:
var pipeline1 = [{
$match: { "user_id": "123" } //dynamic value based on request
}, {
$lookup: {
from: "config_rules",
localField: "group_id",
foreignField: "rule_type_value", //this field has group id mapped || or can be null
as: "rule"
}
},
{
$unwind: "$rule"
},{
$match:{ "rule.configtype": "profile" } //dynamic value based on request
}];
db.getCollection("user_group_mapping").aggregate(pipeline1);
If the above aggregate gives a result, that result is returned. Otherwise we run the following find query to get the config rule for the general user, and return it:
var query = {
$and: [
{ rule_type_value: null }, //null for general user rules
{ configtype: "profile" }
]
}
db.getCollection("config_rules").find(query)
In simple words: for a request, we check whether the requester is in a group. If so, we return the config rule based on this group.
If the requester is not in any group, we return the general config rule.
As seen above, these are two different queries running on different collections, and they require two separate Mongo calls. Can I somehow combine them into one query?
For example: if the given user is in a group, return the group-specific config, otherwise return the general config rule.
I want to combine these so that my code needs to make only one DB call (with both queries consolidated into it) instead of two.
Sample document in user_group_mapping collection
{ "user_id": "123",
"group_id": "beta_users"
},
{ "user_id": "213",
"group_id": "alpha_testers"
}
Sample data in config_rules :
{ "rule_type_value":"beta_users",
"configType": "help",
"configVersion": "1.1"
},
{ "rule_type_value":null,
"configType": "help",
"configVersion": "1.0"
},
{ "rule_type_value":"alpha_testers",
"configType": "help",
"configVersion": "1.3"
}
Sample Input:
Req 1 user_id: "123"
configType: "help"
Req 2 user_id : "678"
configType: "help"
Sample output: (I have only written rule content for simplicity)
Req 1 config v1.1 will be returned
{ "rule_type_value":"beta_users",
"configType": "help",
"configVersion": "1.1"
}
Req 2 v1.0 will be returned
{ "rule_type_value":null,
"configType": "help",
"configVersion": "1.0"
}
try:
https://mongoplayground.net/p/m3HxBQIuqpS
please set configType at line 23 and set user_id at line 31
db.config_rules.aggregate([
{
$lookup: {
from: "user_group_mapping",
localField: "rule_type_value",
foreignField: "group_id",
as: "rule"
}
},
{
$addFields: {
"ruleCount": {
$size: "$rule",
},
"user_id": {
$first: "$rule.user_id"
}
}
},
{
$match: {
"configType": "help"
}
},
{
$match: {
$or: [
{
user_id: {
$eq: "678"//123 or 678
}
},
{
user_id: {
$exists: false
}
}
]
}
},
{
$sort: {
"ruleCount": -1
}
},
{
$limit: 1
},
{
$project: {
"_id": 0,
"rule_type_value": 1,
"configType": 1,
"configVersion": 1
}
}
])
MongoDB aggregation does not have flow control, and it will not execute subsequent stages if there are no documents output from a stage.
If you want to retrieve 1 of 2 possible values from the linked collection, change the $lookup stage so that all potential documents are selected, and filter the returned list afterward. Perhaps something similar to:
[
{$match: { "user_id": "123" }},
{$lookup: {
from: "config_rules",
let: {targetgroup: "$group_id"},
pipeline: [{$match:{
configtype: "profile",
$or:[
{$expr:{$eq:["$rule_type_value","$$targetgroup"]}},
{ rule_type_value: null, }
]
}}],
as: "rule"
}},
{$set: {
rule: {$cond: {
if: {$in: ["$group_id", "$rule.rule_type_value"]},
then: {$filter: {
input: "$rule",
cond: {$eq: ["$group_id", "$$this.rule_type_value"]}
}},
else: "$rule"
}},
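The idea behind this approach (fetch every candidate rule, then keep the group-specific one if it is present) can be sketched in plain JavaScript. This only illustrates the selection logic on in-memory arrays, using the sample data from the question; `pickRule` is a hypothetical helper, not a MongoDB API.

```javascript
// Among the rules for a config type, prefer the one for the user's group and
// fall back to the general rule (rule_type_value === null).
function pickRule(rules, groupId, configType) {
  const candidates = rules.filter(
    (r) =>
      r.configType === configType &&
      (r.rule_type_value === groupId || r.rule_type_value === null)
  );
  const specific = candidates.find((r) => r.rule_type_value === groupId);
  return specific || candidates.find((r) => r.rule_type_value === null) || null;
}

// Sample data from the question.
const rules = [
  { rule_type_value: "beta_users", configType: "help", configVersion: "1.1" },
  { rule_type_value: null, configType: "help", configVersion: "1.0" },
  { rule_type_value: "alpha_testers", configType: "help", configVersion: "1.3" }
];

console.log(pickRule(rules, "beta_users", "help").configVersion); // 1.1
console.log(pickRule(rules, undefined, "help").configVersion); // 1.0
```

Note that a user with no row in user_group_mapping would not pass the initial $match on user_id at all, so that case may still need separate handling.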
I am trying to run a health check on the references in one of my collections: I want to see whether the referenced objects still exist and, if not, delete that _id from the array.
I haven't found anything on that, so my idea is to get the reversed result of a $lookup.
Is it possible to get the reversed result of a lookup in MongoDB?
Here is an example of a collection and its taskList with references to the tasks collection.
Now I want to delete all the ids in there that do not have an existing result in the tasks collection.
How I solve it right now, which takes tons of queries:
Get all the ids from taskList
Send a query for every single one of them to see if there is no match in the tasks collection
Send a query to pull each dangling reference out of the array
I think this does what you want, and it is fine even if you have big collections.
It is not an update, though. Afterwards you can add a $merge stage back into tasklists (replacing on an _id match; this requires MongoDB >= 4.4), or use a $out stage to write to another collection and then replace the tasklists collection.
Test code here
Data in
db={
"tasklists": [
{
"_id": 1,
"tasklist": [
1,
2,
3,
4
]
},
{
"_id": 2,
"tasklist": [
5,
6,
7
]
}
],
"tasks": [
{
"_id": 1
},
{
"_id": 2
},
{
"_id": 3
},
{
"_id": 5
}
]
}
db.tasklists.aggregate([
{
"$lookup": {
"from": "tasks",
"let": {
"tasklist": "$tasklist"
},
"pipeline": [
{
"$match": {
"$expr": {
"$in": [
"$_id",
"$$tasklist"
]
}
}
}
],
"as": "valid"
}
},
{
"$addFields": {
"valid": {
"$map": {
"input": "$valid",
"as": "v",
"in": "$$v._id"
}
}
}
},
{
"$addFields": {
"tasklist": {
"$filter": {
"input": "$tasklist",
"as": "t",
"cond": {
"$in": [
"$$t",
"$valid"
]
}
}
}
}
},
{
"$unset": [
"valid"
]
}
])
Results (tasks 4, 6 and 7 weren't found in the tasks collection and were removed)
[
{
"_id": 1,
"tasklist": [
1,
2,
3
]
},
{
"_id": 2,
"tasklist": [
5
]
}
]
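For comparison, the effect of this pipeline can be reproduced on in-memory arrays in plain JavaScript. This is only a sketch of the logic, using the sample data above; `pruneTaskLists` is a hypothetical helper name.

```javascript
// Keep only the task ids that exist in `tasks` (the $lookup + $filter steps).
function pruneTaskLists(tasklists, tasks) {
  const existing = new Set(tasks.map((t) => t._id));
  return tasklists.map((list) => ({
    ...list,
    tasklist: list.tasklist.filter((id) => existing.has(id))
  }));
}

const tasklists = [
  { _id: 1, tasklist: [1, 2, 3, 4] },
  { _id: 2, tasklist: [5, 6, 7] }
];
const tasks = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 5 }];

console.log(JSON.stringify(pruneTaskLists(tasklists, tasks)));
// [{"_id":1,"tasklist":[1,2,3]},{"_id":2,"tasklist":[5]}]
```

With ObjectId values, compare their string forms (for example `String(id)`), since a `Set` compares objects by reference.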
Edit
If you want the $lookup to use an index, you can try this:
Test code here
Tasks has an index on _id, so there is no need to create one; if you don't join on _id, create one.
db.tasklists.aggregate([
{
"$unwind": {
"path": "$tasklist"
}
},
{
"$lookup": {
"from": "tasks",
"localField": "tasklist",
"foreignField": "_id",
"as": "joined"
}
},
{
"$match": {
"$expr": {
"$gt": [
{
"$size": "$joined"
},
0
]
}
}
},
{
"$unset": [
"joined"
]
},
{
"$group": {
"_id": "$_id",
"tasklist": {
"$push": "$tasklist"
},
"afield": {
"$first": "$afield"
}
}
}
])
After that you can use $out, or $merge with the replace option.
But both lose any updates made to the data while this was running.
The only solution for this (if it is a problem) is $merge with a pipeline.
For that, the pipeline above also needs to keep an extra array with the initial tasklist, so that you can remove the valid ids from it to get the invalid ones; then the $merge pipeline filters the stored array and removes just those invalid ids. (This is safe from data loss.)
I think the best approach, instead of doing all of this, is to have a multikey index on tasklist and, whenever an _id is deleted from tasks, to also delete that _id from the tasklist arrays. With the index it will be fast, and you won't need to check for invalid _ids.
As far as I know, there's no way other than the one you described to achieve the desired outcome, but you can greatly simplify the second step, finding the non-matching items. What you need is the set difference between the taskList ids and the existing task ids.
So you could use the $setDifference operator to calculate that difference:
db.tasks.aggregate([
{
$group: {
_id: null,
ids: {
"$addToSet": "$_id"
}
}
},
{
$project: {
nonMatchingTaskIds: {
$setDifference: [
[
"taskId1",
"taskId2",
"taskId7",
"taskId8"
],
"$ids"
]
}
}
}
])
Assuming your tasks collection contains taskId1 and taskId2 (and other documents), but not taskId7 and taskId8, the query will result in nonMatchingTaskIds containing taskId7 and taskId8.
Here's an example on mongoplayground: https://mongoplayground.net/p/75BpiGBJi3Q
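As a rough illustration of what $setDifference computes, here is the same operation on plain JavaScript arrays. `setDifference` is a hypothetical helper, and "taskId3" stands in for the "other documents" in the collection.

```javascript
// Elements of `referenced` that are not present in `existing`,
// analogous to { $setDifference: [referenced, existing] }.
function setDifference(referenced, existing) {
  const existingSet = new Set(existing);
  return [...new Set(referenced)].filter((id) => !existingSet.has(id));
}

const referencedIds = ["taskId1", "taskId2", "taskId7", "taskId8"];
const existingIds = ["taskId1", "taskId2", "taskId3"];

console.log(setDifference(referencedIds, existingIds));
// [ 'taskId7', 'taskId8' ]
```

Unlike this sketch, MongoDB does not guarantee the order of the elements in the $setDifference result.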
So what I ended up with is a method in a few steps.
This is quite fast, but since the taskIds collected from Sets are currently far fewer than the total number of sets, I imagine working with the $setDifference operator mentioned by eol will be faster once I have that many references.
let taskIdsInSets = []
// Get all referenced task ids
const result = await this.setSchema.aggregate([
{
'$project': {
'taskList': 1
}
}
])
// Map all elements in one row
result.forEach(set => taskIdsInSets.push(...set.taskList.map(x=> x.toString())))
// Delete duplicates of taskIds here (filter() returns a new array and would
// not change taskIdsInSets, so build a de-duplicated array and assign it)
taskIdsInSets = [...new Set(taskIdsInSets)]
// Get the existing task ids that are referenced in a Set
const result2 = await this.taskSchema.aggregate([
{
'$match': {
'_id': {
'$in': [...taskIdsInSets.map(x => Types.ObjectId(x.toString()))]
}
}
}, {
'$project': {
'_id': 1
}
}
])
let existingIdsInTasks = []
// Getting ids from result2 Object into
result2.forEach(set => existingIdsInTasks.push(set._id.toString()))
// Filtering out the ids that don't actually exist
let nonExistingTaskIds = taskIdsInSets.filter(x => existingIdsInTasks.indexOf(x) === -1);
// Deleting the ids that don't actually exist but are in Sets
const finalResult = await this.setSchema.updateMany(
  {}, // empty filter: apply the update to every Set
  {
    $pullAll: {
      taskList: [...nonExistingTaskIds.map(x => Types.ObjectId(x.toString()))]
    }
  })
console.log(finalResult)
return finalResult // returns information about how many documents were changed
Unfortunately, in Mongoose there isn't the option to use findAndModify with `{new:true}` here, or at least I didn't manage to make it work.
For some reason, what the database returns matches neither the Mongo ObjectId nor strings, so I have to do some casting there.
I am working on versioning. We have documents based on UUIDs and jobUuids; the jobUuid documents are the ones associated with the currently working user. I have some aggregate queries on these collections which I need to update based on the job UUIDs.
The results fetched by the aggregate query should be such that:
if the current user's jobUuid document does not exist, then the master document with jobUuid: "default" is returned (the document without any specific jobUuid);
if the jobUuid document exists, then only that document is returned.
I have a $match used to get these documents based on certain conditions. From those documents I need to filter out the documents based on the above conditions. An example is shown below.
The data looks like this:
[
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"name": "adam",
"jobUuid": "default",
},
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "adam"
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"name": "eve",
"jobUuid": "default",
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "eve"
},
{
"uuid": "26cba689-7eb6-4a9e-a04e-24ede0309e50",
"name": "john",
"jobUuid": "default",
}
]
Results for "jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12" should be:
[
{
"uuid": "5cdb5a10-4f9b-4886-98c1-31d9889dd943",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "adam"
},
{
"uuid": "b745baff-312b-4d53-9438-ae28358539dc",
"jobUuid": "d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12",
"name": "eve"
},
{
"uuid": "26cba689-7eb6-4a9e-a04e-24ede0309e50",
"name": "john",
"jobUuid": "default",
}
]
Based on the conditions mentioned above, is it possible to filter the document within the aggregate query to extract the document of a specific job uuid?
Edit 1: I got the following solution, which works fine, but I want a better solution that eliminates all those nested stages.
Edit 2: Updated the data with actual UUIDs and I just included only the name as another field, we do have n number of fields which are not relevant to include here but needed at the end (mentioning this for those who want to use the projection over all the fields).
Update based on comment:
but the UUIDs are alphanumeric strings, as shown above, does it have
an effect on these sorting, and since we are not using conditions to
get the results, I am worried it will cause issues.
You could use an additional field so that the sort order matches the order of the values in the $in expression. Make sure you provide the values with default as the last value.
[
{"$match":{"jobUuid":{"$in":["d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12","default"]}}},
{"$addFields":{ "order":{"$indexOfArray":[["d275781f-ed7f-4ce4-8f7e-a82e0e9c8f12","default"], "$jobUuid"]}}},
{"$sort":{"uuid":1, "order":1}},
{
"$group": {
"_id": "$uuid",
"doc":{"$first":"$$ROOT"}
}
},
{"$project":{"doc.order":0}},
{"$replaceRoot":{"newRoot":"$doc"}}
]
example here - https://mongoplayground.net/p/wXiE9i18qxf
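The effect of the order field can be illustrated with a plain JavaScript sketch over in-memory documents. `pickPerUuid` is a hypothetical helper, and the short ids ("u1", "job-1") are stand-ins for the real UUIDs.

```javascript
// For each uuid, keep the document whose jobUuid comes first in the
// preference list [jobUuid, "default"], like $indexOfArray + $sort + $first.
function pickPerUuid(docs, jobUuid) {
  const preference = [jobUuid, "default"];
  const best = new Map();
  for (const doc of docs) {
    const order = preference.indexOf(doc.jobUuid);
    if (order === -1) continue; // filtered out, as by the $match stage
    const current = best.get(doc.uuid);
    if (!current || order < current.order) best.set(doc.uuid, { order, doc });
  }
  return [...best.values()].map((entry) => entry.doc);
}

const versions = [
  { uuid: "u1", name: "adam", jobUuid: "default" },
  { uuid: "u1", name: "adam", jobUuid: "job-1" },
  { uuid: "u2", name: "eve", jobUuid: "default" },
  { uuid: "u2", name: "eve", jobUuid: "job-1" },
  { uuid: "u3", name: "john", jobUuid: "default" }
];

const picked = pickPerUuid(versions, "job-1");
console.log(picked.map((d) => `${d.name}:${d.jobUuid}`).join(", "));
// adam:job-1, eve:job-1, john:default
```

Because the preference is an explicit position in the list rather than the sort order of the string values, it does not matter how the UUID strings compare with "default".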
Original
You could use below query. The query will pick the non default document if it exists for uuid or else pick the default as the only document.
[
{"$match":{"jobUuid":{"$in":[1,"default"]}}},
{"$sort":{"uuid":1, "jobUuid":1}},
{
"$group": {
"_id": "$uuid",
"doc":{"$first":"$$ROOT"}
}
},
{"$replaceRoot":{"newRoot":"$doc"}}
]
example here - https://mongoplayground.net/p/KrL-1s8WCpw
Here is what I would do:
A match stage with $in rather than $or (for readability).
A group stage with _id on $uuid, just as you did, but instead of pushing all the data into an array, be more selective. _id already stores $uuid, so there is no reason to capture it again. name must always be the same for each $uuid, so take only the first instance. Based on the match, there are only two possibilities for jobUuid, but this assumes it will be either "default" or something else, and that there can be more than one occurrence of the non-"default" jobUuid. Use "$addToSet" instead of pushing to an array in case there are multiple occurrences of the same jobUuid for a user. Also, before adding to the set, use a conditional to add only non-"default" jobUuids, with $$REMOVE to avoid inserting a null when the jobUuid is "default".
Finally, "$project" to clean things up. If element 0 of the jobUuids array does not exist (is null), there is no other possibility for this user than for the jobUuid to be "default", so use "$ifNull" to test and set "default" as appropriate. There could be more than one jobUuid here, depending on whether that is allowed in your db/application; it is up to you to decide how to handle that (take the highest, take the lowest, etc.).
Tested at: https://mongoplayground.net/p/e76cVJf0F3o
[{
"$match": {
"jobUuid": {
"$in": [
"1",
"default"
]
}
}
},
{
"$group": {
"_id": "$uuid",
"name": {
"$first": "$name"
},
"jobUuids": {
"$addToSet": {
"$cond": {
"if": {
"$ne": [
"$jobUuid",
"default"
]
},
"then": "$jobUuid",
"else": "$$REMOVE"
}
}
}
}
},
{
"$project": {
"_id": 0,
"uuid": "$_id",
"name": 1,
"jobUuid": {
"$ifNull": [{
"$arrayElemAt": [
"$jobUuids",
0
]
},
"default"
]
}
}
}]
I was able to solve this problem with the following aggregate query.
We first extract the results matching only the jobUuid provided by the user or "default" in the match stage.
Then the results are grouped by uuid using a group stage, and we count the results as well.
Using the conditions in replaceRoot, we first check the length of the grouped documents.
If the length is greater than or equal to 2, we keep the document that matches the provided jobUuid.
If it is less than or equal to 1, we check whether it matches the default jobUuid and return it.
The query is below:
[
{
$match: {
$or: [{ jobUuid:1 },{ jobUuid: 'default'}]
}
},
{
$group: {
_id: '$uuid',
count: {
$sum: 1
},
docs: {
$push: '$$ROOT'
}
}
},
{
$replaceRoot: {
newRoot: {
$cond: {
if: {
$gte: [
'$count',
2
]
},
then: {
$arrayElemAt: [
{
$filter: {
input: '$docs',
as: 'item',
cond: {
$ne: [
'$$item.jobUuid',
'default'
]
}
}
},
0
]
},
else: {
$arrayElemAt: [
{
$filter: {
input: '$docs',
as: 'item',
cond: {
$eq: [
'$$item.jobUuid',
'default'
]
}
}
},
0
]
}
}
}
}
}
]
I have a blogs collection which has almost the following schema:
{
title: { name: "My First Blog Post",
postDate: "01-28-11" },
content: "Here is my super long post ...",
comments: [ { text: "This post sucks!"
, name: "seanhess"
, created: 01-28-14}
, { text: "I know! I wish it were longer"
, name: "bob"
, postDate: 01-28-11}
]
}
I mainly want to run three queries:
Give me all the comments made only by bob.
Find all the comments made on the same day the post was written, i.e. comments.postDate = title.postDate.
Find all the comments made by bob on the same day the post was written.
My questions are as follows:
These three are going to be really frequent queries, so is it a good idea to use the aggregation framework?
For the third query, I can simply run a query like db.blogs.find({"comments.name":"bob"}, {"comments.name":1, "comments.postDate":1, "title.postDate":1}) and then do client-side post-processing to loop through the returned results. Is that a good idea? I'd like to note that this might return several thousand documents.
I will be happy if you can propose some ways to write the third query.
It is probably best practice here to break up your multiple questions into several questions, if only because the answer to one question might have led you to understand the others.
I am also not very keen on answering anything where there is no example shown of what you have tried to do. But with that said, and while "shooting myself in the foot", the questions are reasonable from a design point of view, so I will answer.
Point 1 : Comments by "bob"
Standard $unwind and filter the results. Use $match first so you don't process unneeded documents.
db.collection.aggregate([
// Match to "narrow down" the documents.
{ "$match": { "comments.name": "bob" }},
// Unwind the array
{ "$unwind": "$comments" },
// Match and "filter" just the "bob" comments
{ "$match": { "comments.name": "bob" }},
// Possibly wind back the array
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"content": { "$first": "$content" },
"comments": { "$push": "$comments" }
}}
])
Point 2: All comments on the same day
db.collection.aggregate([
// Try and match posts within a date or range
// { "$match": { "title.postDate": Date( /* something */ ) }},
// Unwind the array
{ "$unwind": "$comments" },
// Aha! Project out the same day. Not the time-stamp.
{ "$project": {
"title": 1,
"content": 1,
"comments": 1,
"same": { "$eq": [
{
"year" : { "$year": "$title.postDate" },
"month" : { "$month": "$title.postDate" },
"day": { "$dayOfMonth": "$title.postDate" }
},
{
"year" : { "$year": "$comments.postDate" },
"month" : { "$month": "$comments.postDate" },
"day": { "$dayOfMonth": "$comments.postDate" }
}
]}
}},
// Match the things on the "same" field
{ "$match": { "same": true } },
// Possibly wind back the array
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"content": { "$first": "$content" },
"comments": { "$push": "$comments" }
}}
])
Point 3: "bob" on the same date
db.collection.aggregate([
// Try and match posts within a date or range
// { "$match": { "title.postDate": Date( /* something */ ) }},
// Unwind the array
{ "$unwind": "$comments" },
// Aha! Project out the same day. Not the time-stamp.
{ "$project": {
"title": 1,
"content": 1,
"comments": 1,
"same": { "$eq": [
{
"year" : { "$year": "$title.postDate" },
"month" : { "$month": "$title.postDate" },
"day": { "$dayOfMonth": "$title.postDate" }
},
{
"year" : { "$year": "$comments.postDate" },
"month" : { "$month": "$comments.postDate" },
"day": { "$dayOfMonth": "$comments.postDate" }
}
]}
}},
// Match the things on the "same" field
{ "$match": { "same": true, "comments.name": "bob" } },
// Possibly wind back the array
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"content": { "$first": "$content" },
"comments": { "$push": "$comments" }
}}
])
Results
Honestly, and especially if you use an index to feed the initial $match stages of these operations, it should be very clear that this will "run rings" around trying to iterate over the data in code.
At the very least this reduces the records returned "over the wire", so there is less network traffic. And of course there is less (or nothing) to post-process once the query results have been received.
As a general convention, database server hardware tends to be rated an order of magnitude higher in performance than "application server" hardware. So again the general expectation is that anything executed on the server will run faster.
Is aggregation the right thing? "Yes", and by a long, long way. You even get a cursor very soon.
How can you do the queries you want? As shown, it is pretty simple. And in real-world code we never "hard code" this; we build it dynamically. So adding conditions and attributes should be as simple as all your normal data manipulation code.
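As an illustration of building the pipeline dynamically, here is a minimal sketch that assembles the three pipelines above from options. `buildCommentPipeline` and its `name` and `sameDay` options are hypothetical names chosen for this example.

```javascript
// Hypothetical builder: returns the aggregation stages for "comments by
// name", "comments on the same day as the post", or both combined.
function buildCommentPipeline({ name, sameDay } = {}) {
  const pipeline = [];

  // Narrow down the documents before unwinding when a name is given.
  if (name) pipeline.push({ $match: { "comments.name": name } });
  pipeline.push({ $unwind: "$comments" });

  if (sameDay) {
    // Compare calendar days rather than full timestamps.
    const dayOf = (field) => ({
      year: { $year: field },
      month: { $month: field },
      day: { $dayOfMonth: field }
    });
    pipeline.push({
      $project: {
        title: 1,
        content: 1,
        comments: 1,
        same: { $eq: [dayOf("$title.postDate"), dayOf("$comments.postDate")] }
      }
    });
  }

  // Filter the unwound comments on whichever conditions were requested.
  const filter = {};
  if (sameDay) filter.same = true;
  if (name) filter["comments.name"] = name;
  if (Object.keys(filter).length > 0) pipeline.push({ $match: filter });

  // Wind the array back up.
  pipeline.push({
    $group: {
      _id: "$_id",
      title: { $first: "$title" },
      content: { $first: "$content" },
      comments: { $push: "$comments" }
    }
  });
  return pipeline;
}

// Usage: db.collection.aggregate(buildCommentPipeline({ name: "bob", sameDay: true }))
```

Each option only adds stages, so the three queries from the question become `{ name: "bob" }`, `{ sameDay: true }`, and both together.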
So I would not normally answer this style of question. But say thank-you! Please?