Query nested document with mongoose - javascript

I know this question has been asked a lot of times but I'm kinda new to mongo and mongoose as well and I couldn't figure it out !
My problem:
I have a schema which looks like this:
var rankingSchema = new Schema({
userId : { type : Schema.Types.ObjectId, ref:'User' },
pontos : {type: Number, default:0},
placarExato : {type: Number, default:0},
golVencedor : {type: Number, default:0},
golPerdedor : {type: Number, default:0},
diferencaVencPerd : {type: Number, default:0},
empateNaoExato : {type: Number, default:0},
timeVencedor : {type: Number, default:0},
resumo : [{
partida : { type : Schema.Types.ObjectId, ref:'Partida' },
palpite : [Number],
quesito : String
}]
});
Which would return a document like this:
{
"_id" : ObjectId("539d0756f0ccd69ac5dd61fa"),
"diferencaVencPerd" : 0,
"empateNaoExato" : 0,
"golPerdedor" : 0,
"golVencedor" : 1,
"placarExato" : 2,
"pontos" : 78,
"resumo" : [
{
"partida" : ObjectId("5387d991d69197902ae27586"),
"_id" : ObjectId("539d07eb06b1e60000c19c18"),
"palpite" : [
2,
0
]
},
{
"partida" : ObjectId("5387da7b27f54fb425502918"),
"quesito" : "golsVencedor",
"_id" : ObjectId("539d07eb06b1e60000c19c1a"),
"palpite" : [
3,
0
]
},
{
"partida" : ObjectId("5387dc012752ff402a0a7882"),
"quesito" : "timeVencedor",
"_id" : ObjectId("539d07eb06b1e60000c19c1c"),
"palpite" : [
2,
1
]
},
{
"partida" : ObjectId("5387dc112752ff402a0a7883"),
"_id" : ObjectId("539d07eb06b1e60000c19c1e"),
"palpite" : [
1,
1
]
},
{
"partida" : ObjectId("53880ea52752ff402a0a7886"),
"quesito" : "placarExato",
"_id" : ObjectId("539d07eb06b1e60000c19c20"),
"palpite" : [
1,
2
]
},
{
"partida" : ObjectId("53880eae2752ff402a0a7887"),
"quesito" : "placarExato",
"_id" : ObjectId("539d0aa82fb219000054c84f"),
"palpite" : [
2,
1
]
}
],
"timeVencedor" : 1,
"userId" : ObjectId("539b2f2930de100000d7356c")
}
My question is, first: How can I filter the resumo nested document by quesito ? Is it possible to paginate this result, since this array is going to increase. And last question, is this a nice approach to this case ?
Thank you guys !

As noted, your schema implies that you actually have embedded data even though you are storing an external reference. So it is not clear if you are doing both embedding and referencing or simply embedding by itself.
The big caveat here is the difference between matching a "document" and actually filtering the contents of an array. Since you seem to be talking about "paging" your array results, the large focus here is on doing that, but still making mention of the warnings.
Multiple "filtered" matches in an array requires the aggregation framework. You can generally "project" the single match of an array element, but this is needed where you expect more than one:
Ranking.aggregate(
[
// This match finds "documents" that "contain" the match
{ "$match": { "resumo.quesito": "value" } },
// Unwind de-normalizes arrays as documents
{ "$unwind": "$resumo" },
// This match actually filters those document matches
{ "$match": { "resumo.quesito": "value" } },
// Skip and limit for paging, which really only makes sense on single
// document matches
{ "$skip": 0 },
{ "$limit": 2 },
// Return as an array in the original document if you really want
{ "$group": {
"_id": "$_id",
"otherField": { "$first": "$otherField" },
"resumo": { "$push": "$resumo" }
}}
],
function(err,results) {
}
)
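As a minimal sketch, the stages above could be parameterised for paging within one parent document. The helper name resumoPagePipeline, its parameters and the 1-based page numbering are assumptions for illustration, not part of the original answer. The function only builds the pipeline array, which would then be passed to Ranking.aggregate():

```javascript
// Hypothetical helper: builds the pipeline shown above for a single ranking document.
// `rankingId`, `page` (1-based) and `pageSize` are illustrative parameters.
// Note: mongoose does not cast values inside aggregation pipelines, so in real
// use `rankingId` would already need to be an ObjectId rather than a string.
function resumoPagePipeline(rankingId, quesito, page, pageSize) {
  return [
    // Select the one parent document that contains a matching element
    { $match: { _id: rankingId, "resumo.quesito": quesito } },
    // One output document per array element
    { $unwind: "$resumo" },
    // Keep only the elements that match
    { $match: { "resumo.quesito": quesito } },
    // Page through the remaining elements
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize },
    // Fold the page back into an array on the parent
    { $group: { _id: "$_id", resumo: { $push: "$resumo" } } }
  ];
}

const pipeline = resumoPagePipeline("some-id", "placarExato", 3, 2);
console.log(JSON.stringify(pipeline, null, 2));
```

Restricting the first $match to a single _id keeps $skip and $limit meaningful, since they then apply to the elements of one array only.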
Or, with MongoDB 2.6 and later, by "filtering" inside a $project using the $map operator. You still need to $unwind in order to "page" array positions, but there is possibly less processing as the array is "filtered" first:
Ranking.aggregate(
[
// This match finds "documents" that "contain" the match
{ "$match": { "resumo.quesito": "value" } },
// Filter with $map
{ "$project": {
"otherField": 1,
"resumo": {
"$setDifference": [
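{
"$map": {
"input": "$resumo",
"as": "el",
// Keep the matching element itself, or false where it does not match
"in": { "$cond": [ { "$eq": [ "$$el.quesito", "value" ] }, "$$el", false ] }
}
},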
[false]
]
}
}},
// Unwind de-normalizes arrays as documents
{ "$unwind": "$resumo" },
// Skip and limit for paging, which really only makes sense on single
// document matches
{ "$skip": 0 },
{ "$limit": 2 },
// Return as an array in the original document if you really want
{ "$group": {
"_id": "$_id",
"otherField": { "$first": "$otherField" },
"resumo": { "$push": "$resumo" }
}}
],
function(err,results) {
}
)
The inner usage of $skip and $limit here really only makes sense when you are processing a single document and just "filtering" and "paging" the array. It is possible to do this with multiple documents, but is very involved as there is no way to just "slice" the array. Which brings us to the next point.
Really with embedded arrays, for paging that does not require any filtering you just use the $slice operator, which was designed for this purpose:
Ranking.find({},{ "resumo": { "$slice": [0,2] } },function(err,docs) {
});
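A small sketch of turning a page number into the $slice projection used above. The helper name resumoSlice and the 1-based page numbering are assumptions made for illustration:

```javascript
// Hypothetical helper: converts a 1-based page number into a $slice projection.
function resumoSlice(page, pageSize) {
  const skip = (page - 1) * pageSize;
  // $slice takes [ number to skip, number to return ]
  return { resumo: { $slice: [skip, pageSize] } };
}

console.log(JSON.stringify(resumoSlice(3, 2))); // {"resumo":{"$slice":[4,2]}}

// Possible usage with the model from the question:
// Ranking.find({}, resumoSlice(3, 2), function(err, docs) { /* ... */ });
```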
Your alternative though is to simply reference the documents in the external collection and then pass the arguments to mongoose .populate() to filter and "page" the results. The change in the schema itself would just be:
"resumo": [{ "type": Schema.Types.ObjectId, "ref": "Partida" }]
With the external referenced collection now holding the object detail rather than embedding directly in the array. The use of .populate() with filtering and paging is:
Ranking.find().populate({
"path": "resumo",
"match": { "quesito": "value" },
"options": { "skip": 0, "limit": 2 }
}).exec(function(err,docs) {
// Keep only the parents whose "resumo" was actually populated by the match
docs = docs.filter(function(doc) {
return doc.resumo.length;
});
});
Of course the possible problem there is that you can no longer actually query for the documents that contain the "embedded" information as it is now in another collection. This results in pulling in all documents, though possibly by some other query condition, but then manually testing them to see if they were "populated" by the filtered query that was sent to retrieve those items.
So it really does depend on what you are doing and what your approach is. If you regularly intend to "search" on inner arrays then embedding will generally suit you better. Also if you are really only interested in "paging" then the $slice operator works well for this purpose with embedded documents. But beware of growing embedded arrays too large.
Using a referenced schema with mongoose helps with some size concerns, and there is methodology in place to assist with "paging" results and filtering them as well. The drawback is that you can no longer query "inside" those elements from the parent itself. So parent selection by the inner elements is not well suited here. Also keep in mind that while not all of the data is embedded, there is still the reference to the _id value of the external document. So you can still end up with large arrays, which may not be desirable.
For anything large, consider that you will likely be doing the work yourself, and working backwards from the "child" items to then match the parent(s).

I am not sure that you can filter sub-documents directly with mongoose. However you can get the parent document with Model.find({'resumo.quesito': 'THEVALUE'}) (you should also add an index on it).
Then when you have the parent you can get the child by comparing the quesito.
Additional documentation can be found here: http://mongoosejs.com/docs/subdocs.html
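A minimal sketch of that second step, filtering the children in application code once the parent has been fetched. The plain object below stands in for a document returned by Model.find(); its values and the helper name resumoByQuesito are made up for illustration:

```javascript
// Stand-in for a parent document already fetched from the database (values are made up)
const ranking = {
  resumo: [
    { quesito: "placarExato", palpite: [1, 2] },
    { palpite: [2, 0] },
    { quesito: "timeVencedor", palpite: [2, 1] },
    { quesito: "placarExato", palpite: [2, 1] }
  ]
};

// Hypothetical helper: keep only the sub-documents with the given quesito
function resumoByQuesito(doc, quesito) {
  return doc.resumo.filter(r => r.quesito === quesito);
}

const matches = resumoByQuesito(ranking, "placarExato");
console.log(matches.length); // 2
```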

Related

MongoDB aggregation - combine multiple values of a document from collection A to lookup a single value within a document from collection B

Say I have the following two collections, sites and webpages. I'm trying to understand how to create an aggregation that'll allow me to combine values of a document from the sites collection and use that to lookup a value from the webpages collection. In addition, I need to prepend the combined values with a string.
// sites collection
[
{ "_id" : 3, "host" : "www.example-foo.com", "path": "/bar", "hasVisited": false },
]
// webpages collection
[
{ "_id" : 5, "url" : "https://www.example-foo.com/bar" },
{ "_id" : 8, "url" : "https://www.fizz.com/buzz" },
]
Without an aggregation I would do something like the following.
const site = await db.sites.findOne({ hasVisited: { $eq: false } });
const pages = await db.webpages.find({
url: `https://${site.host}${site.path}`, // <--- how to construct this in a lookup aggregation? string + value + value
});
// pages = [{ "_id" : 5, "url" : "https://www.example-foo.com/bar" }]
This is essentially a translation of your code, combining the two find queries into one pipeline using $lookup.
Query
The first findOne becomes the $match plus the $limit of 1.
The $set of url does the string concatenation.
The second find becomes the $lookup (using the one site from the stages above).
*If you want to do it for more than one site, remove the $limit and project more fields, so you know which site the pages belong to.
Test code here
db.sites.aggregate([
{
"$match": {
"hasVisited": {
"$eq": false
}
}
},
{
"$limit": 1
},
{
"$set": {
"url": {
"$concat": [
"https://",
"$host",
"$path"
]
}
}
},
{
"$lookup": {
"from": "webpages",
"localField": "url",
"foreignField": "url",
"as": "pages"
}
},
{
"$project": {
"_id": 0,
"pages": 1
}
}
])
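For the multi-site variant mentioned in the note above, a sketch of the pipeline could look like the following. Which fields to keep in the final $project is an assumption here; the point is only that the $limit is gone and enough of the site is projected to tell the results apart:

```javascript
// Sketch: same stages as above, without $limit, keeping the site fields
const pipeline = [
  { $match: { hasVisited: false } },
  // $set is an alias of $addFields available from MongoDB 4.2
  { $set: { url: { $concat: ["https://", "$host", "$path"] } } },
  { $lookup: { from: "webpages", localField: "url", foreignField: "url", as: "pages" } },
  // Keep enough of the site to know which pages belong to it
  { $project: { host: 1, path: 1, url: 1, pages: 1 } }
];

// Possible usage: db.sites.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```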

Count and Sort On Array Intersection

I have this schema
module.exports = function(conn, mongoose) {
// var autoIncrement = require('mongoose-auto-increment');
var UsersSchema = new mongoose.Schema({
first_name: String,
last_name:String,
sex: String,
fk_hobbies: []
}
, {
timestamps: true
}, {collection: 'wt_users'});
return conn.model('wt_users', UsersSchema);
};
And for example I have these users in data base
{
"_id" : ObjectId("5aca2ac25c1d8adeb4a2dab0"),
first_name:"Pierro",
last_name:"pierre",
sex:"H",
fk_hobbies: [
{
"_id" : ObjectId("5ac9f84d5c1f8adeb4a2da97"),
"name" : "Art"
},
{
"_id" : ObjectId("5ac9f84d5c8d8adeb4a2da97"),
"name" : "Sport"
},
{
"_id" : ObjectId("5ac9f84d9c1d8adeb4a2da97"),
"name" : "Fete"
},
{
"_id" : ObjectId("5acaf84d5c1d8adeb4a2da97"),
"name" : "Série"
},
{
"_id" : ObjectId("6ac9f84d5c1d8adeb4a2da97"),
"name" : "Jeux vidéo"
}
]
},
{
"_id" : ObjectId("5ac9fa075c1d8adeb4a2da99"),
first_name:"jean",
last_name:"mark",
sex:"H",
fk_hobbies: [
{
"_id" : ObjectId("5ac7f84d5c1d8adeb4a2da97"),
"name" : "Musique"
},
{
"_id" : ObjectId("5ac9f24d5c1d8adeb4a2da97"),
"name" : "Chiller"
},
{
"_id" : ObjectId("5ac9f84c5c1d8adeb4a2da97"),
"name" : "Papoter"
},
{
"_id" : ObjectId("5ac9f84d2c1d8adeb4a2da97"),
"name" : "Manger"
},
{
"_id" : ObjectId("5ac9f84d5c1d8adeb4a2da97"),
"name" : "Film"
}
]
},
{
"_id" : ObjectId("5aca0a635c1d8adeb4a2da9d"),
first_name:"michael",
last_name:"ferrari",
sex:"H",
fk_hobbies: [
{
"_id" : ObjectId("5ac9f84d5c1d8adeb4a2ea97"),
"name" : "fashion"
},
{
"_id" : ObjectId("5ac9f84d5c1e8adeb4a2da97"),
"name" : "Voyage"
},
{
"_id" : ObjectId("5ac9f84c5c1d8adeb4a2da97"),
"name" : "Papoter"
},
{
"_id" : ObjectId("5ac9f84d2c1d8adeb4a2da97"),
"name" : "Manger"
},
{
"_id" : ObjectId("5ac9f84d5c1d8adeb4a2da97"),
"name" : "Film"
}
]
},
{
"_id" : ObjectId("5ac9fa074c1d8adeb4a2da99"),
first_name:"Philip",
last_name:"roi",
sex:"H",
fk_hobbies:
[
{
"_id" : ObjectId("5ac7f84d5c1d8adeb4a2da97"),
"name" : "Musique"
},
{
"_id" : ObjectId("5ac9f24d5c1d8adeb4a2da97"),
"name" : "Chiller"
},
{
"_id" : ObjectId("5ac9f84c5c1d8adeb4a2da97"),
"name" : "Papoter"
},
{
"_id" : ObjectId("5ac9f84d2c1d8adeb4a2da97"),
"name" : "Manger"
},
{
"_id" : ObjectId("5ac9f84d5c1d8adeb4a2da97"),
"name" : "Film"
}
]
}
I want to create a mongoose query that matches a user, fetched by id, against the other users in the database as follows:
the query should first return the users that share the maximum number of hobbies (that is, 5), then the users that share 4 hobbies, and so on.
I wrote a solution entirely in JavaScript / Node.js. Is there a way to do this with a MongoDB query?
This is my solution:
//var user : the current user that search other similar users : jean mark : 5ac9fa075c1d8adeb4a2da99
//var users : all other users
var tab = []
async.each(users, function(item, next1){
var j = 0;
var hobbies = item["fk_hobbies"]
for(var i = 0; i < 5; i++)
{
var index = hobbies.findIndex(x => x["_id"] == user[0]["fk_hobbies"][i]["_id"].toString());
if(index != -1)
j++
}
if(j != 0)
tab.push({nbHob:j, user:item})
next1()
}, function ()
{
var tab2 = tab.sort(compare)
res.json({success:true, data:tab2})
})
function compare(a,b) {
if (a.nbHob > b.nbHob)
return -1;
if (a.nbHob < b.nbHob)
return 1;
return 0;
}
the displayed result is like this
nbHob : represents the number of similar hobbies
{"success":true,"data":[{"nbHob":5,"user":{"_id":"5ac9fa074c1d8adeb4a2da99","u_first_name":"Akram","u_last_name":"Cherif","u_email":"","u_login":"","u_password":"","u_user_type":0,"u_date_of_birth":"","u_civility":0,"u_sex":"H","u_phone_number":"","u_facebook_id":"","u_google_id":"","u_twitter_id":"","u_profile_image":"","u_about":"","u_profession":"","u_fk_additional_infos":[null],"u_budget":0,"u_address":{"country":"France","state":"Paris","city":"TM","zip":76001},"u_fk_hobbies":[{"name":"Musique","_id":"5ac7f84d5c1d8adeb4a2da97"},{"name":"Chiller","_id":"5ac9f24d5c1d8adeb4a2da97"},{"name":"Papoter","_id":"5ac9f84c5c1d8adeb4a2da97"},{"name":"Manger","_id":"5ac9f84d2c1d8adeb4a2da97"},{"name":"Film","_id":"5ac9f84d5c1d8adeb4a2da97"}]}},{"nbHob":3,"user":{"_id":"5aca0a635c1d8adeb4a2da9d","u_first_name":"Chawki","u_last_name":"Gasmi","u_email":"","u_login":"","u_password":"","u_user_type":0,"u_date_of_birth":"","u_civility":0,"u_sex":"H","u_phone_number":"","u_facebook_id":"","u_google_id":"","u_twitter_id":"","u_profile_image":"","u_about":"","u_profession":"","u_fk_additional_infos":[null],"u_budget":{"min":500,"max":850},"u_address":{"country":"","state":"","city":"","zip":0},"u_fk_hobbies":[{"name":"fashion","_id":"5ac9f84d5c1d8adeb4a2ea97"},{"name":"Voyage","_id":"5ac9f84d5c1e8adeb4a2da97"},{"name":"Papoter","_id":"5ac9f84c5c1d8adeb4a2da97"},{"name":"Manger","_id":"5ac9f84d2c1d8adeb4a2da97"},{"name":"Film","_id":"5ac9f84d5c1d8adeb4a2da97"}]}}]}
Your question data seems a bit messed up, probably due to far too liberal copy/paste, since every hobby has the same ObjectId value. But I can correct that with a full self-contained example:
const { Schema } = mongoose = require('mongoose');
const uri = 'mongodb://localhost/people';
mongoose.Promise = global.Promise;
mongoose.set('debug', true);
const hobbySchema = new Schema({
name: String
});
const userSchema = new Schema({
first_name: String,
last_name: String,
sex: String,
fk_hobbies: [hobbySchema]
});
const Hobby = mongoose.model('Hobby', hobbySchema)
const User = mongoose.model('User', userSchema);
const userData = [
{
"first_name" : "Pierro",
"last_name" : "pierre",
"sex" : "H",
"fk_hobbies" : [
"Art", "Sport", "Fete", "Série", "Jeux vidéo"
]
},
{
"first_name": "jean",
"last_name" : "mark",
"sex" : "H",
"fk_hobbies" : [
"Musique", "Chiller", "Papoter", "Manger", "Film"
]
},
{
"first_name" : "michael",
"last_name" : "ferrari",
"sex" : "H",
"fk_hobbies" : [
"fashion", "Voyage", "Papoter", "Manger", "Film"
]
},
{
"first_name" : "Philip",
"last_name" : "roi",
"sex" : "H",
"fk_hobbies" : [
"Musique", "Chiller", "Papoter", "Manger", "Film"
]
}
];
const log = data => console.log(JSON.stringify(data, undefined, 2));
(async function() {
try {
const conn = await mongoose.connect(uri);
await Promise.all(
Object.entries(conn.models).map(([k,m]) => m.remove())
);
const hobbies = await Hobby.insertMany(
[
...userData
.reduce((o, u) => [ ...o, ...u.fk_hobbies ], [])
.reduce((o, u) => o.set(u,1) , new Map())
]
.map(([name,v]) => ({ name }))
);
const users = await User.insertMany(userData.map(u =>
({
...u,
fk_hobbies: u.fk_hobbies.map(f => hobbies.find(h => f === h.name))
})
));
let user = await User.findOne({
"first_name" : "Philip",
"last_name" : "roi"
});
let user_hobbies = user.fk_hobbies.map(h => h._id );
let result = await User.aggregate([
{ "$match": {
"_id": { "$ne": user._id },
"fk_hobbies._id": { "$in": user_hobbies }
}},
{ "$addFields": {
"numHobbies": {
"$size": {
"$setIntersection": [
"$fk_hobbies._id",
user_hobbies
]
}
},
"fk_hobbies": {
"$map": {
"input": "$fk_hobbies",
"in": {
"$mergeObjects": [
"$$this",
{
"shared": {
"$cond": {
"if": { "$in": [ "$$this._id", user_hobbies ] },
"then": true,
"else": "$$REMOVE"
}
}
}
]
}
}
}
}},
{ "$sort": { "numHobbies": -1 } }
]);
log(result);
mongoose.disconnect();
} catch(e) {
console.error(e);
} finally {
process.exit();
}
})()
Most of that is just "setup" to re-create the data set, but simply put we're just adding the users and their hobbies and keeping a "unique" identifier for each "unique hobby" by name. This is probably what you actually meant in the question, and it's the sort of model you should be following.
The interesting part is all in the .aggregate() statement, which is how we "query" then "count" the matching hobbies and enable the "server" to sort the results before returning to the client.
Given a current user ( and the last one in the list you included has the most interesting matches ), we then focus on this section of the code:
// Simulates getting the current user to compare against
let user = await User.findOne({
"first_name" : "Philip",
"last_name" : "roi"
});
// Just get the list of _id values from the current user for reference
let user_hobbies = user.fk_hobbies.map(h => h._id );
let result = await User.aggregate([
// Find all users not the current user with at least one of the hobbies
{ "$match": {
"_id": { "$ne": user._id },
"fk_hobbies._id": { "$in": user_hobbies }
}},
// Add the count of matches, "optionally" we are marking the matched
// hobbies in the array as well.
{ "$addFields": {
"numHobbies": {
"$size": {
"$setIntersection": [
"$fk_hobbies._id",
user_hobbies
]
}
},
"fk_hobbies": {
"$map": {
"input": "$fk_hobbies",
"in": {
"$mergeObjects": [
"$$this",
{
"shared": {
"$cond": {
"if": { "$in": [ "$$this._id", user_hobbies ] },
"then": true,
"else": "$$REMOVE"
}
}
}
]
}
}
}
}},
// Sort the results by the "most" hobbies, which is "descending" order
{ "$sort": { "numHobbies": -1 } }
]);
I've commented those steps for you but let's expand on that.
Firstly we presume you have the current user already returned from the database by whatever means you have already done. For the purposes of the rest of the operations, all you really need from that user is the _id of the "User" itself and of course the _id values from each of that user's chosen hobbies. We can do a quick .map() operation as shown here, but we keep a copy for ease of reference and to avoid repeating that through the remaining code.
Then we get to the actual aggregate statement. The first condition there is the $match, this works like a standard query expression with all the same operators. We want two things from these query conditions:
Get all users except the current user for consideration;
AND where those users contain at least one match on the same hobbies, by _id value.
So the condition for "everyone else" is essentially to supply the $ne "not equal to" operator in argument to the _id value, comparing of course to the current user _id. The second condition to get only those with the same hobbies uses the $in operator against the _id field of the fk_hobbies array. In MongoDB query parlance we denote this as "fk_hobbies._id" in order to match against the "inner" _id property values.
The $in operator itself takes a "list" as its argument and compares each value in the list supplied to the property the condition is assigned to. MongoDB itself does not care whether fk_hobbies is an array or a single value, and will simply look for a match for anything in the provided list. Think of $in as a short way of writing $or, except you don't need to explicitly include the same property name on every condition.
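To make the $in versus $or comparison concrete, here is a short sketch that builds both forms of the condition. The id values are placeholder strings; in the listing they would be the ObjectId values held in user_hobbies:

```javascript
// Placeholder ids for illustration only
const ids = ["id1", "id2", "id3"];

// The form used in the $match stage
const withIn = { "fk_hobbies._id": { $in: ids } };

// The longer equivalent, repeating the property name on every condition
const withOr = { $or: ids.map(id => ({ "fk_hobbies._id": id })) };

console.log(JSON.stringify(withIn));
console.log(JSON.stringify(withOr));
```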
Now you have the correct documents selected and have discarded any users who do not share any of the same hobbies we can move on to the next stage. Note also that the whole $match considers it logical that you only want those "matching" users. If you actually wanted to see "all users" including those with "no matches", then you can simply omit the whole $match pipeline stage. Your code is discarding anything that was not counted, so this code simply doesn't bother to count anything which "must" have a 0 count.
The $addFields pipeline stage is a quick way to "add new fields" to the document returned in results. The main output you want here is the "numHobbies" in addition to the other user details, so this pipeline stage operator is the optimal way to do this, but if your MongoDB server is a bit older then you can simply specify "all" fields you want to include in addition to any new ones using $project instead.
In order to "count" the number of hobbies in common we essentially use two aggregation operators, which are $setIntersection and $size. Both of these should be available in any MongoDB version you really should be using in production.
In respective order the $setIntersection operator "compares sets" which is in this case the list of _id values within fk_hobbies, both from the current selected user we stored earlier and from the present document being considered in the expression. The result from this operator is the list of values which are the "same" between both lists.
Naturally the $size operator looks at the returned list ( or set ) from $setIntersection and returns the number of entries in that list. This of course is the "matched count".
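As a rough plain-JavaScript analogue of what $setIntersection followed by $size computes, consider the sketch below. It is not how the server evaluates the expression; the function name countShared and the string ids are assumptions for illustration:

```javascript
// Hypothetical helper: counts the distinct ids present in both lists
function countShared(hobbyIds, userHobbyIds) {
  const mine = new Set(userHobbyIds.map(String));
  // A Set collapses duplicates, much as $setIntersection treats its inputs as sets
  const shared = new Set(hobbyIds.map(String).filter(id => mine.has(id)));
  return shared.size; // the analogue of $size
}

const philip = ["musique", "chiller", "papoter", "manger", "film"];
const michael = ["fashion", "voyage", "papoter", "manger", "film"];

console.log(countShared(michael, philip)); // 3
```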
The next part involves projecting a "re-written" form of the fk_hobbies array. This is totally optional and by my own design for demonstration purposes. "If" you wanted to do what I am doing here as well, then what this bit of code does is adds an additional property to the objects of the fk_hobbies array to indicate where that particular hobby was one of those which matched the list.
I'm saying this is "optional" because I'm actually demonstrating two features available from MongoDB 3.6 onwards only. These involve the usage of $mergeObjects on the inner array elements and the usage of Conditionally Excluding Fields.
Stepping through that, since fk_hobbies is an array we need to use the $map operator in order to "reshape" the objects inside it. This operator allows us to process each array member and return a new value based on the transformations we include as its argument. Its usage is much the same as .map() for JavaScript or any other language which implements a similar operation.
Therefore for each object in the array ( $$this ) we apply the $mergeObjects operator which will "merge" the result of its arguments. These are provided as the $$this for the current object as it already is, and the second argument in the expression which is doing something new and interesting.
Here we use the $cond operator, which is a "ternary" operator ( or if..then..else expression ) which considers a condition if and then returns either the then argument where that expression was true, or the else expression where it was false. The expression here is another form of $in used as an aggregation expression. In this form the first argument is a singular value $$this._id which will be compared to a list expression in the second argument. That second argument is of course the list of the current user hobby id's we kept earlier, and are using again for comparison.
That usage of $in alone would return either true or false where it was a match. But the extra demonstrated action here is that within the $cond expression, our else condition for false returns the new and special $$REMOVE value. What this means is that with our "shared" property we are adding to each object in the array, rather than assigning it a value of false where there was no match, we actually don't include that property in the output document at all.
That "optional" part is really just there as a "nice touch" to indicate which "hobbies" were matched in the conditions, rather than simply returning the count. If you like it then use it, and if you don't have MongoDB 3.6 with those features you can simply do that same alteration in the returned documents from the aggregation output anyway:
let result = await User.aggregate([
{ "$match": {
"_id": { "$ne": user._id },
"fk_hobbies._id": { "$in": user_hobbies }
}},
{ "$addFields": {
"numHobbies": {
"$size": {
"$setIntersection": [
"$fk_hobbies._id",
user_hobbies
]
}
}
}},
{ "$sort": { "numHobbies": -1 } }
]);
// map each result after return
result = result.map(r =>
({
...r,
fk_hobbies: r.fk_hobbies.map(h =>
({
...h,
...(( user_hobbies.map(i => i.toString() ).indexOf( h._id.toString() ) != -1 )
? { "shared": true } : {} )
})
)
})
)
Either way, the main thing you wanted out of any $addFields or $project statement was the actual "numHobbies" value indicating the count. And the main reason we did that on the server was so that we can also $sort on the server, which would in turn allow you to add things like $limit and $skip to larger result sets for purposes of paging where it simply would not be practical to get all the results from the collection, even if they were filtered in the initial match or regular query.
Anyhow, from the small sample of documents in the question as also generated in the sample listing, we get a result like this:
[
{
"_id": "5ad6bbe63365bc3428feed8a",
"first_name": "jean",
"last_name": "mark",
"sex": "H",
"fk_hobbies": [
{
"_id": "5ad6bbe63365bc3428feed7d",
"name": "Musique",
"__v": 0,
"shared": true
},
{
"_id": "5ad6bbe63365bc3428feed7e",
"name": "Chiller",
"__v": 0,
"shared": true
},
{
"_id": "5ad6bbe63365bc3428feed7f",
"name": "Papoter",
"__v": 0,
"shared": true
},
{
"_id": "5ad6bbe63365bc3428feed80",
"name": "Manger",
"__v": 0,
"shared": true
},
{
"_id": "5ad6bbe63365bc3428feed81",
"name": "Film",
"__v": 0,
"shared": true
}
],
"__v": 0,
"numHobbies": 5
},
{
"_id": "5ad6bbe63365bc3428feed90",
"first_name": "michael",
"last_name": "ferrari",
"sex": "H",
"fk_hobbies": [
{
"_id": "5ad6bbe63365bc3428feed82",
"name": "fashion",
"__v": 0
},
{
"_id": "5ad6bbe63365bc3428feed83",
"name": "Voyage",
"__v": 0
},
{
"_id": "5ad6bbe63365bc3428feed7f",
"name": "Papoter",
"__v": 0,
"shared": true
},
{
"_id": "5ad6bbe63365bc3428feed80",
"name": "Manger",
"__v": 0,
"shared": true
},
{
"_id": "5ad6bbe63365bc3428feed81",
"name": "Film",
"__v": 0,
"shared": true
}
],
"__v": 0,
"numHobbies": 3
}
]
So there are two users that were returned and we counted the matching hobbies as 5 and 3 respectively and returned the one with the most matched first. You can also see the addition of the "shared" property on each of the matched hobbies to indicate which of the hobbies in each of the returned users lists were also shared with the original user they were compared with.
NOTE: You were probably just "trying things" but your usage of async.each() in your question was not really necessary since none of the inner code is actually "async" itself. Even in the listing here, the only thing you actually need to "await" as an async call after you have the current user to compare is the .aggregate() response itself.
So if at any part of this you were presuming you would be "awaiting requests within a loop", then you were mistaken. Simply ask the database for the results and await their return.
One request to the database is all that is required.
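For comparison, if you did keep the counting in application code, the loop from the question needs nothing asynchronous at all. A minimal synchronous sketch, with the function name rankBySharedHobbies and the sample ids made up for illustration:

```javascript
// Hypothetical synchronous rewrite of the async.each() loop from the question
function rankBySharedHobbies(user, others) {
  const mine = new Set(user.fk_hobbies.map(h => String(h._id)));
  return others
    // Count the hobbies each other user shares with the current user
    .map(o => ({
      nbHob: o.fk_hobbies.filter(h => mine.has(String(h._id))).length,
      user: o
    }))
    // Discard users with nothing in common, as the original code did
    .filter(entry => entry.nbHob > 0)
    // Most shared hobbies first
    .sort((a, b) => b.nbHob - a.nbHob);
}

const toHobbies = ids => ids.map(_id => ({ _id }));
const current = { first_name: "jean", fk_hobbies: toHobbies([1, 2, 3, 4, 5]) };
const everyoneElse = [
  { first_name: "michael", fk_hobbies: toHobbies([6, 7, 3, 4, 5]) },
  { first_name: "Philip", fk_hobbies: toHobbies([1, 2, 3, 4, 5]) },
  { first_name: "Pierro", fk_hobbies: toHobbies([8, 9]) }
];

const ranked = rankBySharedHobbies(current, everyoneElse);
console.log(ranked.map(r => r.user.first_name + ":" + r.nbHob)); // [ 'Philip:5', 'michael:3' ]
```

This still pulls every candidate user to the client, which is exactly what the aggregation avoids.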
N.B It's also 2018, so you really should start to understand Promises and usage of async/await with them. The code is much cleaner that way and surely any newly developed application should be running in an environment with this support. So "callback helper" libraries like "node async", are a little "old hat" and outmoded in a modern context.

Count keys within array elements

Hi, I want to return all the documents that have an element with a key count of less than 2 inside the features array in MongoDB. I tried using $size but it is not possible.
I don't want to get the result and loop each of the features and count it. I want to return the productId 123 because it has a count of 1 in one of features array. Please take below document as an example:
{
"productId" : 123.0,
"features" : [
{
"a" : true
},
{
"a" : true,
"b" : true
}
]
},
{
"productId" : 456.0,
"features" : [
{
"a" : true,
"b" : true
},
{
"a" : true,
"b" : true
}
]
}
What you are actually asking for is matching on the "count of the number of keys" within the array elements. You have different approaches to this depending on the available MongoDB version.
MongoDB 3.4.4 and upwards
You can use $objectToArray to coerce each element into an "array" itself, representing the "key/value" pairs of the elements:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$anyElementTrue": {
"$map": {
"input": "$features",
"as": "f",
"in": {
"$lt": [
{ "$size": { "$objectToArray": "$$f" } },
2
]
}
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
You basically feed the condition to $redact: $map applies $objectToArray to each element and tests its $size, and $anyElementTrue then determines whether any of the tested array elements returned true.
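On MongoDB 3.6 and later, the same expression can also be placed in an ordinary query through the $expr operator instead of $redact. The sketch below only builds the query document; the helper name anyElementWithFewerKeys is hypothetical and this form is not part of the original answer:

```javascript
// Hypothetical helper: builds a find() filter matching documents where any
// element of the array `field` has fewer than `n` keys.
function anyElementWithFewerKeys(field, n) {
  return {
    $expr: {
      $anyElementTrue: {
        $map: {
          input: "$" + field,
          as: "f",
          // Number of key/value pairs in the element, compared to n
          in: { $lt: [{ $size: { $objectToArray: "$$f" } }, n] }
        }
      }
    }
  };
}

const query = anyElementWithFewerKeys("features", 2);
console.log(JSON.stringify(query, null, 2));

// Possible usage: db.collection.find(query)
```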
All Other Versions
Anywhere else, a somewhat more brief but less performant option is to use $where to apply a JavaScript expression to test the array elements. Same principle though, using Object.keys() and Array.some():
db.collection.find({
"$where": function() {
return this.features.some(f => Object.keys(f).length < 2 )
}
})
Same deal, but since the JavaScript requires interpretation and evaluation against every document, it actually runs quite a bit slower than the aggregation expression given.
Both return the same document, which is the one which has an element with "less than two keys" in the inner object, just as asked:
/* 1 */
{
"productId" : 123.0,
"features" : [
{
"a" : true
},
{
"a" : true,
"b" : true
}
]
}

Ordering by count of filtered subdocument array elements

I currently have a MongoDB collection that looks like so:
{
{
"_id": ObjectId,
"user_id": Number,
"updates": [
{
"_id": ObjectId,
"mode": Number,
"score": Number
},
{
"_id": ObjectId,
"mode": Number,
"score": Number
},
{
"_id": ObjectId,
"mode": Number,
"score": Number
}
]
}
}
I am looking to find a way to find the users with the largest number of updates per mode. For instance, if I specify mode 0, I want it to load the users in order of greatest number of updates with mode: 0.
Is this possible in MongoDB? It does not need to be a fast algorithm, as it will be cached for quite a while, and it will run asynchronously.
The fastest way would be to store a count for each "mode" within the document as another field, then you could just sort on that:
var update = {
"$push": { "updates": updateDoc },
};
var countDoc = {};
countDoc["counts." + updateDoc.mode] = 1;
update["$inc"] = countDoc;
Model.update(
{ "_id": id },
update,
function(err,numAffected) {
}
);
Which would use $inc to increment a "counts" field for each "mode" value as a key for each "mode" pushed to the "updates" array. All the calculation happens on update, so it's fast and so is the query that can be applied with a sort on that value:
Model.find({ "updates.mode": 0 }).sort({ "counts.0": -1 }).exec(function(err,users) {
});
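Documents that already exist would not have the "counts" field yet, so it would need to be computed once from their current "updates" array. A minimal sketch of that calculation in plain JavaScript, assuming a hypothetical helper named countsByMode; writing the result back (for example with a $set of "counts") is left out:

```javascript
// Hypothetical helper: tallies how many updates exist per mode
function countsByMode(updates) {
  const counts = {};
  for (const u of updates) {
    counts[u.mode] = (counts[u.mode] || 0) + 1;
  }
  return counts;
}

const counts = countsByMode([
  { mode: 0, score: 10 },
  { mode: 1, score: 5 },
  { mode: 0, score: 7 }
]);

console.log(JSON.stringify(counts)); // {"0":2,"1":1}
```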
If you don't want to or cannot store such a field then the other option is to calculate at query time with .aggregate():
Model.aggregate(
[
{ "$match": { "updates.mode": 0 } },
{ "$project": {
"user_id": 1,
"updates": 1,
"count": {
"$size": {
"$setDifference": [
{ "$map": {
"input": "$updates",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.mode", 0 ] },
"$$el",
false
]
}
}},
[false]
]
}
}
}},
{ "$sort": { "count": -1 } }
],
function(err,results) {
}
);
Which isn't bad since the filtering of the array and getting the $size is fairly efficient, but it's not as fast as just using a stored value.
The $map operator allows inline processing of the array elements which are tested by $cond to see if it returns a match or false. Then $setDifference removes any false values. A much better way to filter array content than using $unwind, which can slow things down significantly and should not be used unless your intent is to aggregate array content across documents.
But the better approach is to store the value for the count instead, since this does not require runtime calculation and can even use an index.
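On MongoDB 3.2 and later, the $filter operator expresses the same array filtering more directly than the $map and $setDifference combination. The following sketch is an alternative to the listing above, not the original answer's code, and the helper name countByModePipeline is hypothetical:

```javascript
// Hypothetical helper: builds a pipeline ordering users by their number of
// updates with the given mode, using $filter instead of $map + $setDifference.
function countByModePipeline(mode) {
  return [
    { $match: { "updates.mode": mode } },
    { $project: {
      user_id: 1,
      updates: 1,
      count: {
        $size: {
          $filter: {
            input: "$updates",
            as: "el",
            cond: { $eq: ["$$el.mode", mode] }
          }
        }
      }
    }},
    { $sort: { count: -1 } }
  ];
}

console.log(JSON.stringify(countByModePipeline(0), null, 2));

// Possible usage: Model.aggregate(countByModePipeline(0), function(err, results) { /* ... */ });
```

Unlike $setDifference, $filter does not treat the array as a set, so identical elements would each be counted.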
I think this is a duplicate of this question:
Mongo find query for longest arrays inside object
The accepted answer seems to be doing exactly what you ask for.
db.collection.aggregate( [
{ $unwind : "$l" },
{ $group : { _id : "$_id", len : { $sum : 1 } } },
{ $sort : { len : -1 } },
{ $limit : 25 }
] )
just replace "$l" with "$updates".
[edit:] and you probably do not want the result limited to 25, so you should also get rid of the { $limit : 25 }

Aggregation output to Nest Arrays

I have a dataset of records stored in MongoDB and I have been trying to extract a complex set of data from the records.
Sample records are as follows :-
{
bookId : '135wfkjdbv',
type : 'a',
store : 'crossword',
shelf : 'A1'
}
{
bookId : '13erjfn',
type : 'b',
store : 'crossword',
shelf : 'A2'
}
I have been trying to extract data such that for each bookId, I get a count (of records) for each shelf per store name that holds the book identified by bookId where the type of the book is 'a'.
I understand that the aggregation query allows a pipeline that allows grouping, matching etc, but I have not been able to reach a solution.
The desired output is of the form :-
{
bookId : '135wfkjdbv',
stores : [
{
name : 'crossword'
shelves : [
{
name : 'A1',
count : 12
},
]
},
{
name : 'granth'
shelves : [
{
name : 'C2',
count : 12
},
{
name : 'C4',
count : 12
},
]
}
]
}
The process isn't really that difficult when you look at it. The aggregation "pipeline" is exactly that, where each "stage" feeds a result into the next for processing. Just like a unix "pipe":
ps -ef | grep mongo | tee out.txt
So it's just adding stages, and in fact three $group stages where the first does the basic aggregation and the remaining two simply "roll up" the arrays required in the output.
db.collection.aggregate([
{ "$group": {
"_id": {
"bookId": "$bookId",
"store": "$store",
"shelf": "$shelf"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"bookId": "$_id.bookId",
"store": "$_id.store"
},
"shelves": {
"$push": {
"name": "$_id.shelf",
"count": "$count"
}
}
}},
{ "$group": {
"_id": "$_id.bookId",
"stores": {
"$push": {
"name": "$_id.store",
"shelves": "$shelves"
}
}
}}
])
You could possibly $project at the end to change the _id to bookId, but you should already know that is what it is and get used to treating _id as a primary key. There is a cost to such operations, so it is a habit you should not get into and learn doing things correctly from the start.
So all that really happens here is all the fields that would make up the grouping detail are made the primary key of $group with the other field being produced as count, to count the shelves within that grouping. Think the SQL equivalent:
GROUP BY bookId, store, shelf
All each other stage does is transpose each grouping level into array entries, first by shelf within the store and then the store within the bookId. Each time the fields in the primary grouping key are reduced down by the content going into the produced array.
When you start thinking in terms of "pipeline" processing, then it becomes clear. As you construct one form, then take that output and move it to the next form and so on. This is basically how you fold the results within two arrays.
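To see the "folding" without a database, here is a plain-JavaScript emulation of what the three $group stages compute. It is only an illustration under assumptions: the function name rollUp and the sample records are made up, and the filter on type 'a' stands in for a leading { "$match": { "type": "a" } } stage, which the question asks for but the pipeline above does not include:

```javascript
// Emulates: (optional) $match on type, then count per bookId/store/shelf,
// then roll shelves up into stores, and stores up into books.
function rollUp(records) {
  const books = new Map();
  for (const r of records.filter(rec => rec.type === 'a')) {
    if (!books.has(r.bookId)) books.set(r.bookId, new Map());
    const stores = books.get(r.bookId);
    if (!stores.has(r.store)) stores.set(r.store, new Map());
    const shelves = stores.get(r.store);
    // First $group: count per bookId + store + shelf
    shelves.set(r.shelf, (shelves.get(r.shelf) || 0) + 1);
  }
  // Second and third $group: push shelves into stores, stores into books
  return [...books].map(([bookId, stores]) => ({
    _id: bookId,
    stores: [...stores].map(([name, shelves]) => ({
      name,
      shelves: [...shelves].map(([shelf, count]) => ({ name: shelf, count }))
    }))
  }));
}

const out = rollUp([
  { bookId: 'b1', type: 'a', store: 'crossword', shelf: 'A1' },
  { bookId: 'b1', type: 'a', store: 'crossword', shelf: 'A1' },
  { bookId: 'b1', type: 'a', store: 'granth', shelf: 'C2' },
  { bookId: 'b2', type: 'b', store: 'crossword', shelf: 'A2' }
]);

console.log(JSON.stringify(out, null, 2));
```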
