Get count of siblings in subdocument with mongodb aggregate query - javascript

I have a document collection with a subdocument of tags.
{
title:"my title",
slug:"my-title",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag2', id:2},
{tagname:'tag3', id:3}]
}
{
title:"my title2",
slug:"my-title2",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag2', id:2}]
}
{
title:"my title3",
slug:"my-title3",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag3', id:3}]
}
{
title:"my title4",
slug:"my-title4",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag2', id:2},
{tagname:'tag3', id:3}]
}
[...]
Getting a count of every tag is quite simple with an $unwind + $group count aggregate.
However, I would like to find a count of which tags are found together, or more precisely, which sibling shows up most often beside one another, ordered by count. I have not found an example nor can I figure out how to do this without multiple queries.
Ideally the end result would be:
{'tag1':{
'tag2':3, // tag1 and tag2 were found in a document together 3 times
'tag3':3, // tag1 and tag3 were found in a document together 3 times
[...]}}
{'tag2':{
'tag1':3, // tag2 and tag1 were found in a document together 3 times
'tag3':2, // tag2 and tag3 were found in a document together 2 times
[...]}}
{'tag3':{
'tag1':3, // tag3 and tag1 were found in a document together 3 times
'tag2':2, // tag3 and tag2 were found in a document together 2 times
[...]}}
[...]

It simply is not possible to have the aggregation framework generate arbitrary key names from data. It's also not possible to do this kind of analysis in a single query.
But there is a general approach to doing this over your whole collection for an undetermined number of tag names. Essentially you are going to need to get a distinct list of the "tags" and process another query for each distinct value to get the "siblings" to that tag and the counts.
In general:
// Get the unique tags
db.collection.aggregate([
{ "$unwind": "$tags" },
{ "$group": {
"_id": "$tags.tagname"
}}
]).forEach(function(tag) {
var tagDoc = { };
tagDoc[tag._id] = {};
// Get the siblings count for that tag
db.collection.aggregate([
{ "$match": { "tags.tagname": tag._id } },
{ "$unwind": "$tags" },
{ "$match": { "tags.tagname": { "$ne": tag._id } } },
{ "$group": {
"_id": "$tags.tagname",
"count": { "$sum": 1 }
}}
]).forEach(function(sibling) {
// Set the value in the master document
tagDoc[tag._id][sibling._id] = sibling.count;
});
// Just emitting for example purposes in some way
printjson(tagDoc);
});
The aggregation framework can return a cursor in releases since MongoDB 2.6, so even with a large number of tags this can work in an efficient way.
So that's the way you would handle this, but there really is no way to have this happen in a single query. For a shorter run time you might look at frameworks that allow many queries to be run in parallel either combining the results or emitting to a stream.
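If the collection is small enough to read into application memory, the sibling counts can also be tallied client-side in a single pass. This is only a minimal sketch, assuming the documents have already been fetched as plain objects shaped like the samples in the question; `countSiblingTags` and `sampleDocs` are hypothetical names, not part of any MongoDB API.

```javascript
// Sample documents mirroring the four shown in the question.
const sampleDocs = [
  { slug: "my-title",  tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag2", id: 2 }, { tagname: "tag3", id: 3 }] },
  { slug: "my-title2", tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag2", id: 2 }] },
  { slug: "my-title3", tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag3", id: 3 }] },
  { slug: "my-title4", tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag2", id: 2 }, { tagname: "tag3", id: 3 }] }
];

// Hypothetical helper: for every tag, count how often each other tag
// appears in the same document.
function countSiblingTags(docs) {
  const result = {};
  for (const doc of docs) {
    // De-duplicate tag names so a repeated tag is counted once per document.
    const names = [...new Set((doc.tags || []).map(t => t.tagname))];
    for (const a of names) {
      for (const b of names) {
        if (a === b) continue; // a tag is not its own sibling
        result[a] = result[a] || {};
        result[a][b] = (result[a][b] || 0) + 1;
      }
    }
  }
  return result;
}

console.log(countSiblingTags(sampleDocs));
```

For the four sample documents this gives tag1 with tag2: 3 and tag3: 3, and tag2 and tag3 paired 2 times, matching the counts in the question. Ordering each tag's siblings by count would be a separate sort over the entries of each inner object.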

Related

mongoose find() sorting and organizing returned results from a product database in js

I have a problem organizing my MongoDB data to send to my page in my res, and can't figure out the correct JS. Here is a simplified version of my schema:
var productSchema = new mongoose.Schema({
medium: String,
brand: String,
group: String
});
Here is what a typical entry looks like
medium :"Acrylic",
brand :"liquitex",
group :"heavy body"
there are many more entries in the schema, but these are the only ones I need to be able to sort and organize the returned results with. The problem is I have a route that returns all colors in my database and I want to be able to display them in sections on my page that are grouped under Brand, and then has the individual colors listed under the correct group.
The problem is there are paints from other brands that fall into the heavy body group and so when I use a filter function to sort my data by group, some brands get mixed together. I cant filter by brand, because some brands have acrylic and watercolor so then those get lumped together.
I need some way to filter the returned results of a
mongoose.find({})
that can use the group data as a filter, but then filter those results by the brands so they get separated into the correct brand categories.
I have this so far:
this is all a stripped down version of my app.js file:
// finds all colors in the DB
Color.find({}).lean().exec(function (err, colors) {
var groups = [];
// find all groups in the database
colors.forEach( function(color){
groups.push(color["group"]);
});
// returns only unique names to filter out duplicates
var groupTypes = Array.from(new Set(groups));
var tempVariable = [];
// this sorts all returned paints into their respective group, but we get paints from multiple brands under the same group and that is not good
groupTypes.forEach( function(group){
var name = group;
var result = colors.filter(obj => { return obj.group === group });
tempVariable.push( {group : name, result } );
});
// the tempVariable gets sent to my page like so
res.render("landing", {colorEntry:tempVariable} );
});
and this works fine to allow me to display each paint by its grouping, but that fails when there is more than one paint from a different manufacturer that is considered the same group like a "heavy body". This is my ejs on my page that works fine:
<% colorEntry.forEach( function(entry){ %>
<div class="brandBlock">
<div class="brandTitle">
<span><%=entry.result[0].brand%> - <%=entry.result[0].group%></span>
For the life of me, I can't seem to figure out the combination of filter() and maybe map() that would allow this kind of processing to be done.
My database has like 600 documents, colors from a number of different manufacturers and I don't know how to get this as a returned structure: lets say this is a few colors in the DB that get returned from a mongoose find:
[{ medium: "Oil",
brand: "Gamblin",
group: "Artists oil colors"},
{ medium: "Acrylic",
brand: "Liquitex",
group: "Heavy Body"},
{ medium: "Acrylic",
brand: "Golden",
group: "Heavy Body"}
]
I need to organize it like this or something similar. It can be anything that sorts this data into a basic structure like this. I am not confined to any set standard; this is just for personal use and a site I am trying to build to learn more.
returnedColors = [ { brand: "Gamblin", group: "Artists oil colors", { 50 paints colors returned} },
{ brand: "liquitex" , group: "heavy body", { 20 paint colors returned } },
{ brand: "golden" , group: "heavy body",{ 60 paint colors returned} }
];
I am not a web developer and only write some web code every 6 months or so, and I have been trying to figure this out for the last 2 days. I can't wrap my head around some of the awesome filter and map combos I have seen and can't get this to work.
Any help or advice would be great. I am sure there are many areas for improvement in this code, but everything was working up until I entered paints that were from different brands that had the same group type and i had to try to rewrite this sorting code to deal with it.
It boils down to needing to be able to iterate over the entire set of returned documents from the DB and then sort them based off 2 values.
UPDATE:
I was able to get something that works and returns the data in the format that I need to be able to send it to my ejs file and display it properly. The code is rather ugly and probably very redundant, but it technically works. It starts off by using the group value to run over paints since each set of paints will have a group name, but can sometimes share a group name with a paint from another brand like "heavy body".
groupTypes.forEach( function(group){
var name = group;
var result = colors.filter(obj => { return obj.group === group });
// this gets brand names per iteration of this loop so that we will know if more than one brand of paint
// has the same group identity.
var brands = [];
result.forEach( function(color){
brands.push(color["brand"]);
});
// This filters the brand names down to a unique list of brands
var brandNames = Array.from(new Set(brands));
// if there is more than one brand, we need to filter this into two separate groups
if( brandNames.length > 1){
//console.log("You have duplicates");
brandNames.forEach( x => {
var tmpResult = [...result];
var resultTmp = result.filter(obj => { return obj.brand === x });
result = resultTmp;
//console.log("FILTERED RESULT IS: ", result);
tempVariable.push( {brand: x ,group : name, result } );
result = [...tmpResult];
});
}else{
tempVariable.push( {brand: result[0].brand ,group : name, result } );
}
});
if anyone can reduce this to something more efficient, I would love to see the "better" way or "right" way of doing something like this.
UPDATE2
Thanks to the answer below, I was put on the right track and was able to rewrite a bunch of that long code with this:
Color.aggregate([
{
$sort: { name: 1}
},
{
$group: {
_id: { brand: '$brand', group: '$group' },
result: { $push: '$$ROOT' }
}
},
{ $sort: { '_id.brand': 1 } }
], function( err, colors){
if(err){
console.log(err);
}else{
res.render("landing", {colorEntry:colors, isSearch:1, codes: userCodes, currentUser: req.user, ads: vs.randomAds()} );
}
});
Much cleaner and appears to achieve the same result.
Since you're using MongoDB, the "right" way is to use the aggregation framework, specifically the $group stage.
Product.aggregate([{
$group: {
_id: { group: '$group', brand: '$brand' },
products: { $push: '$$ROOT' }
}
}])
This outputs an array of objects, one for every combination of brand and group, with all relevant products pushed into the corresponding subarray.
Combine it with $project and $sort stages to shape your data further.
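For comparison, the same grouping can be done in plain JavaScript on the array that find() already returns, which may be enough for a few hundred documents. This is a sketch only: `groupByBrandAndGroup` and `sampleColors` are hypothetical names, and the output shape (brand, group, result) is assumed from the question rather than produced by Mongoose.

```javascript
// Sample data in the shape shown in the question.
const sampleColors = [
  { medium: "Oil",     brand: "Gamblin",  group: "Artists oil colors" },
  { medium: "Acrylic", brand: "Liquitex", group: "Heavy Body" },
  { medium: "Acrylic", brand: "Golden",   group: "Heavy Body" },
  { medium: "Acrylic", brand: "Liquitex", group: "Heavy Body" }
];

// Hypothetical helper: bucket colors by the (brand, group) pair.
function groupByBrandAndGroup(colors) {
  const buckets = new Map();
  for (const color of colors) {
    // A JSON-encoded pair is used as the key so that "a" + "bc" and "ab" + "c" cannot collide.
    const key = JSON.stringify([color.brand, color.group]);
    if (!buckets.has(key)) {
      buckets.set(key, { brand: color.brand, group: color.group, result: [] });
    }
    buckets.get(key).result.push(color);
  }
  // Buckets come out in the order their first color was seen.
  return [...buckets.values()];
}

console.log(groupByBrandAndGroup(sampleColors));
```

Doing the grouping in the database with $group avoids shipping every document to the application first, so the in-memory version is mainly useful when the documents are needed client-side anyway.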

mongoose mongodb - remove all where condition is true except one

If a collection has a list of dogs, and there are duplicate entries for some races, how do I remove all but a single specific or non-specific one with just one query?
I guess it would be possible to get all from a Model.find(), loop through every index except the first one and call Model.remove(), but I would rather have the database handle the logic through the query. How would this be possible?
pseudocode example of what i want:
Model.remove({race:"pitbull"}).where(notFirstOne);
To remove all but one, you need a way to get all the filtered documents, group them by the identifier, create a list of ids for the group and remove a single id from this list. Armed with this info, you can then run another operation to remove the documents with those ids. Essentially you will be running two queries.
The first query is an aggregate operation that collects the list of ids of the documents to be removed:
(async () => {
// Get the ids of the duplicate entries, minus the one to keep
const [doc] = await Module.aggregate([
{ '$match': { 'race': 'pitbull'} },
{ '$group': {
'_id': '$race',
'ids': { '$push': '$_id' },
'id': { '$first': '$_id' }
} },
// $setDifference keeps the elements of the first array that are not in the second
{ '$project': { 'idsToRemove': { '$setDifference': [ '$ids', ['$id'] ] } } }
]);
if (!doc) return; // no matching documents
const { idsToRemove } = doc;
// Remove the duplicate documents
await Module.remove({ '_id': { '$in': idsToRemove } });
})();
If the purpose is simply to keep only one document (for instance, in case of concurrent writes), you may as well just write:
Module.findOne({race:'pitbull'}).select('_id')
// ... use the _id returned above as idReturned
Module.remove({race:'pitbull', _id:{$ne:idReturned}})
If the purpose is to keep the very first one, note that MongoDB does not guarantee that results are sorted by increasing _id (natural order refers to the order on disk);
see "Does default find() implicitly sort by _id?"
So sort explicitly instead:
Module.find({race:'pitbull'}).sort({_id:1}).limit(1)
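The "keep the lowest _id, delete the rest" selection can be sketched in plain JavaScript over documents that have already been fetched. This is an illustration under assumptions: `idsToRemoveKeepingFirst` is a hypothetical helper, and it compares the string form of each _id, which orders 24-character ObjectId hex strings the same way as ascending _id.

```javascript
// Hypothetical helper: given documents that all match the duplicate filter,
// return the _id values to delete so that only the lowest _id survives.
function idsToRemoveKeepingFirst(docs) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...docs].sort((a, b) => {
    const x = String(a._id);
    const y = String(b._id);
    return x < y ? -1 : x > y ? 1 : 0;
  });
  // Everything after the first element is a duplicate to remove.
  return sorted.slice(1).map(d => d._id);
}

// Example with simple string ids standing in for ObjectIds.
const dogs = [
  { _id: "c", race: "pitbull" },
  { _id: "a", race: "pitbull" },
  { _id: "b", race: "pitbull" }
];
console.log(idsToRemoveKeepingFirst(dogs));
```

The returned list is what would go into the $in condition of the remove call.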

Trying to merge two collections together in meteor

I have two collections in my application that are parsed from two separate json files. I have inserted data from the two files into separate collections. The collections have corresponding numerical ID's and I want to match them up in a new collection. For example: the postmeta collection has a post_id value and the posts collection has a corresponding ID.
To explain this further here is a simple collections example. One thing to note is that there are over 730 collection posts and although there are matching ID's they are not sorted so when I view them they don't match each other.
The posts collection example:
{
"_id": "kTeQxenYZcQfPiaYv",
"ID": "44",
"post_content": "Today we talked about the letter Hh..."
}
The postsmeta collection example:
{
"_id": "otEGQYxvv6MkCABST",
"post_id": "44",
"meta_value": "http://www.mrskitson.ca/wp-content/uploads/2010/11/snackTime.jpg"
}
What I would like to do is parse through the collections and take for example posts collection where the ID matches the postsmeta collection. Once I find a match I want to insert the collections content (post_content & meta_value) into a new collection.
Here is all my code so far.
lib/collections/posts.js
Postsmeta = new Mongo.Collection('postsmeta');
Posts = new Mongo.Collection('posts');
server/publications.js
Meteor.publish('postsmeta', function() {
return Postsmeta.find();
});
Meteor.publish('posts', function() {
return Posts.find();
});
server/main.js
Meteor.startup(() => {
var postsmeta = JSON.parse(Assets.getText('postsmeta.json'));
var posts = JSON.parse(Assets.getText('posts.json'));
var length = postsmeta.length;
for(x=0; x < length; x++){
Posts.insert({
ID: posts[x].ID,
post_content: posts[x].post_content
});
Postsmeta.insert({
post_id: postsmeta[x].post_id,
meta_value: postsmeta[x].meta_value
});
}
});
Let's refactor your code a bit. We'll build the Postsmeta collection first and then jointly create the Posts and PostsCombined collections (PostsCombined needs to be declared alongside the other two collections). Since Postsmeta will already exist we can just search inside it to find matching documents.
Meteor.startup(() => {
const postsmeta = JSON.parse(Assets.getText('postsmeta.json'));
postsmeta.forEach(doc => {
Postsmeta.insert({ post_id: doc.post_id, meta_value: doc.meta_value });
});
const posts = JSON.parse(Assets.getText('posts.json'));
posts.forEach(doc => {
const post = { ID: doc.ID, post_content: doc.post_content}
Posts.insert(post); // omit if you don't need the uncombined collection
const metadoc = Postsmeta.findOne({post_id: doc.ID}); // essentially a JOIN
if (metadoc) post.meta_value = metadoc.meta_value; // guard against no matching meta
PostsCombined.insert(post);
});
});
The following IDs are not present in your postsmeta data:
["56", "322", "521", "563", "583", "608", "625", "671", "707", "708",
"711", "713", "754", "758", "930", "1068", "1126", "1235", "1237", "1238",
"1239", "1246", "1249", "1256", "1263", "1355", "1375", "1678", "1680", "1763",
"1956", "2107", "2121", "2148", "2197", "2249"]
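The join step can also be done on the parsed JSON arrays before anything is inserted, which avoids one findOne() per post. A minimal sketch, assuming the arrays have the fields shown in the question; `combinePosts` is a hypothetical helper, and when several meta rows share a post_id it keeps the first one it sees.

```javascript
// Hypothetical helper: merge posts with their meta rows by ID / post_id.
function combinePosts(posts, postsmeta) {
  // Index the meta rows once so each lookup is a Map access.
  const metaByPostId = new Map();
  for (const meta of postsmeta) {
    if (!metaByPostId.has(meta.post_id)) metaByPostId.set(meta.post_id, meta);
  }

  const combined = [];
  const missing = []; // IDs of posts that have no meta row
  for (const doc of posts) {
    const post = { ID: doc.ID, post_content: doc.post_content };
    const meta = metaByPostId.get(doc.ID);
    if (meta) {
      post.meta_value = meta.meta_value;
    } else {
      missing.push(doc.ID);
    }
    combined.push(post);
  }
  return { combined, missing };
}

// Example using the sample documents from the question plus one unmatched post.
const examplePosts = [
  { ID: "44", post_content: "Today we talked about the letter Hh..." },
  { ID: "56", post_content: "A post without meta" }
];
const exampleMeta = [
  { post_id: "44", meta_value: "http://www.mrskitson.ca/wp-content/uploads/2010/11/snackTime.jpg" }
];
console.log(combinePosts(examplePosts, exampleMeta));
```

Each element of `combined` could then be passed to PostsCombined.insert(), and `missing` gives a list of unmatched IDs like the one above.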
Do you want to combine the collections for querying? I ask because the insertion itself is correct for two separate collections.
Tip one
If it is only for querying, use find().map(). Inside the map function you receive each row of the first collection, and from there you can query the other collection by the matching ID and return a JSON object or array of what you need. I would not do it that way myself, but it is one way of putting the two collections together.
Best solution
The right approach is to stop thinking of NoSQL as a relational database like Postgres, MySQL and so on. Think of it as a dynamic store where a single collection can hold everything you need at that moment. So I would create a new collection that is the junction of the two, and write to it whenever you save data. That becomes your query collection: queries against it are lighter and should return data faster, my guess being roughly 5x faster than the example above.
I hope this helps. If you have any questions, I will be here. Hugs!

Mongoose/MongoDB: $in and .sort()

I hit an API which follows 50 members' data in a game once a day, and use mongoose to convert the JSON into individual documents in a collection. Between days there is data which is consistent, for example each member's tag (an id for the member in game), but there is data which is different (different scores etc.). Each document has a createdAt property.
I would like to find the most recent document for each member, and thus have an array with each member's tag.
I am currently using the following query to find all documents where tags match; however, it returns all documents, not just one per member. How do I sort/limit the documents to the most recent one, whilst keeping it as one query (or is there a more "mongodb way")?
memberTags = [1,2,3,4,5];
ClanMember.find({
'tag': {
$in: memberTags
}
}).lean().exec(function(err, members) {
res.json(members);
});
Thanks
You can query via the aggregation framework. The query is a pipeline whose stages process the input documents to give you the desired result. In your case, the pipeline starts with a $match stage, which acts as the initial filter. $match uses standard MongoDB query syntax, so you can still query using $in.
The next step is to sort the filtered documents by the createdAt field, using the $sort stage.
The final pipeline stage aggregates the ordered documents to return the top document for each group. The $group stage together with the $first accumulator makes this possible.
Putting this all together, you can run the following aggregate operation to get your desired result:
memberTags = [1,2,3,4,5];
ClanMember.aggregate([
{ "$match": { "tag": { "$in": memberTags } } },
{ "$sort": { "tag": 1, "createdAt": -1 } },
{
"$group": {
"_id": "$tag",
"createdAt": { "$first": "$createdAt" } /*,
include other necessary fields as appropriate
using the $first operator e.g.
"otherField1": { "$first": "$otherField1" },
"otherField2": { "$first": "$otherField2" },
...
*/
}
}
]).exec(function(err, members) {
res.json(members);
});
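Or tweak your current query using find() so that it sorts on createdAt (descending) and then tag (ascending), and limits the result to one document per requested tag (five here). Note that this returns the memberTags.length most recent documents overall, so it gives exactly one document per member only when every member has a document in the latest daily batch. Something like the following: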
memberTags = [1,2,3,4,5];
ClanMember.find(
{ 'tag': { $in: memberTags } }, // query
{}, // projection
{ // options
sort: { 'createdAt': -1, 'tag': 1 },
limit: memberTags.length,
skip: 0
}
).lean().exec(function(err, members) {
res.json(members);
});
or
memberTags = [1,2,3,4,5];
ClanMember.find({
'tag': {
$in: memberTags
}
}).sort('-createdAt tag')
.limit(memberTags.length)
.lean()
.exec(function(err, members) {
res.json(members);
});
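What the $sort plus $group/$first pipeline computes can be mirrored in plain JavaScript, which also shows how it differs from sort-and-limit. This is a sketch under assumptions: `latestPerTag` and the `score` field are hypothetical, and the documents are taken to be plain objects with a `tag` and a Date `createdAt`.

```javascript
// Hypothetical helper: keep only the newest document for each requested tag.
function latestPerTag(docs, tags) {
  const wanted = new Set(tags);
  const latest = new Map(); // tag -> newest document seen so far
  for (const doc of docs) {
    if (!wanted.has(doc.tag)) continue; // same role as the $in filter
    const current = latest.get(doc.tag);
    if (!current || doc.createdAt > current.createdAt) {
      latest.set(doc.tag, doc);
    }
  }
  return [...latest.values()];
}

// Member 2 has no document on the second day.
const history = [
  { tag: 1, createdAt: new Date("2020-01-01T00:00:00Z"), score: 10 },
  { tag: 2, createdAt: new Date("2020-01-01T00:00:00Z"), score: 7 },
  { tag: 1, createdAt: new Date("2020-01-02T00:00:00Z"), score: 12 },
  { tag: 9, createdAt: new Date("2020-01-02T00:00:00Z"), score: 3 }
];
console.log(latestPerTag(history, [1, 2]));
```

In this example a sort on createdAt descending with a limit of 2 would return both of member 1's documents and none for member 2, whereas grouping per tag returns one document for each.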
OK, so first, you could use findOne() so that you get only one document out of the request (note that this is one document in total, not one per tag).
Then, to sort by the newest document, you can use .sort({elementYouWantToSort: -1}) (-1 means descending, i.e. newest to oldest; 1 means ascending, oldest to newest).
I would recommend sorting on _id, which already encodes the creation date of the document.
Which gives us the following request:
ClanMember.findOne({
'tag': {
$in: memberTags
}
}).sort({_id: -1}).lean().exec(function(err, members) {
res.json(members);
});
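The reason sorting on _id approximates sorting by creation time is that the first four bytes of an ObjectId are a Unix timestamp in seconds. A small sketch of decoding it by hand; `objectIdToDate` is a hypothetical helper written for illustration (the driver's ObjectId type offers getTimestamp() for the same purpose), and the example id is one that appears elsewhere on this page.

```javascript
// Hypothetical helper: read the creation time embedded in an ObjectId hex string.
function objectIdToDate(hexId) {
  // The first 8 hex characters are the seconds since the Unix epoch.
  const seconds = parseInt(String(hexId).slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("536c55bf9c8fb24c21000095").toISOString());
// → 2014-05-09T04:12:47.000Z
```

The embedded timestamp has one-second resolution and reflects when the id was generated, so an explicit createdAt field remains the more precise sort key when one exists.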

MongoDB - $set to update or push Array element

In the products collection, I have an array, recentviews, whose elements have 2 fields: viewedby and vieweddate.
If I already have a record for a given viewedby, then I need to update it. For example, if I have an array like this:
"recentviews" : [
{
"viewedby" : "abc",
"vieweddate" : ISODate("2014-05-08T04:12:47.907Z")
}
]
And the user is abc, so I need to update the above; if there is no record for abc, I have to $push.
I have tried $set as follows :-
db.products.update( { _id: ObjectId("536c55bf9c8fb24c21000095") },
{ $set:
{ "recentviews":
{
viewedby: 'abc',
vieweddate: ISODate("2014-05-09T04:12:47.907Z")
}
}
}
)
The above query erases all my other elements in Array.
Doing what you describe is not a single operation, but I'll walk through the parts required to do it and cover the other possible situations.
What you are looking for is, in part, the positional $ operator. Part of your query also needs to "find" the element of the array you want.
db.products.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095"),
"recentviews.viewedby": "abc"
},
{
"$set": {
"recentviews.$.vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
)
So the $ stands for the matched position in the array so the update portion knows which item in the array to update. You can access individual fields of the document in the array or just specify the whole document to update at that position.
db.products.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095"),
"recentviews.viewedby": "abc"
},
{
"$set": {
"recentviews.$": {
"viewedby": "abc",
"vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
}
)
If the fields do not in fact change and you just want to insert a new array element when the exact same one does not exist, then you can use $addToSet. The query matches on _id only, so that the element can still be added when no entry for that user exists yet:
db.products.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095")
},
{
"$addToSet": {
"recentviews": {
"viewedby": "abc",
"vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
}
)
However, if you are looking to "push" to an array by a singular key value only where it does not already exist, then you need some more manual handling: first check whether the element exists in the array, then issue the $push where it does not.
You get some help from the mongoose methods in doing this by tracking the number of documents affected by the update:
Product.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095"),
"recentviews.viewedby": "abc"
},
{
"$set": {
"recentviews.$": {
"viewedby": "abc",
"vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
},
function(err,numAffected) {
if (numAffected == 0) {
// Document not updated so you can push onto the array
Product.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095")
},
{
"$push": {
"recentviews": {
"viewedby": "abc",
"vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
},
function(err,numAffected) {
}
);
}
}
);
The only word of caution here is that the write concern response changed between MongoDB 2.6 and earlier versions. I am unsure right now how the mongoose API actually implements the numAffected argument in the callback, so the difference could matter.
In prior versions, even if the data you sent in the initial update exactly matched an existing element and no real change was required, the "modified" amount would be returned as 1 even though nothing was actually updated.
From MongoDB 2.6 the write concern response contains two parts: one shows the number of modified documents and the other shows the number matched. So while the query portion would report a match for an existing element, the modified count would be 0 if no change was in fact required.
So depending on how the return number is actually implemented in mongoose, it might be safer to use the $addToSet operator on that inner update, in case the reason for the zero affected documents was simply that the exact element already existed.
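The intended end state of the "update the element if it exists, otherwise push" sequence can be illustrated on a plain array. This is a sketch of the logic only, not of how MongoDB applies it: `upsertRecentView` is a hypothetical helper, and plain Date objects stand in for ISODate values.

```javascript
// Hypothetical helper: replace the entry for `viewedby` if present, else append one.
function upsertRecentView(recentviews, viewedby, vieweddate) {
  const index = recentviews.findIndex(v => v.viewedby === viewedby);
  if (index === -1) {
    // No entry for this user: the $push branch.
    return [...recentviews, { viewedby, vieweddate }];
  }
  // Entry found: the positional "$set" branch, replacing only that element.
  return recentviews.map((v, i) => (i === index ? { viewedby, vieweddate } : v));
}

const views = [
  { viewedby: "abc", vieweddate: new Date("2014-05-08T04:12:47.907Z") }
];
const later = new Date("2014-05-09T04:12:47.907Z");

console.log(upsertRecentView(views, "abc", later)); // existing entry updated
console.log(upsertRecentView(views, "xyz", later)); // new entry appended
```

Unlike this in-memory version, the two-step database update is not atomic, so two concurrent requests for a new user could both take the $push branch.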
