I hit an API once a day that tracks the data of 50 members in a game, and I use mongoose to convert the JSON into individual documents in a collection. From day to day some data stays the same, for example each member's tag (the member's in-game id), while other data changes (scores etc.). Each document has a createdAt property.
I would like to find the most recent document for each member; I have an array containing each member's tag.
I am currently using the following query to find all documents whose tag matches, however it returns all of those documents, not just one per member. How do I sort/limit the documents to the most recent one per member whilst keeping it as one query (or is there a more "MongoDB way")?
memberTags = [1,2,3,4,5];
ClanMember.find({
'tag': {
$in: memberTags
}
}).lean().exec(function(err, members) {
res.json(members);
});
Thanks
You can query via the aggregation framework. Your query would be a pipeline whose stages process the input documents to give you the desired result. In your case, the pipeline would start with a $match stage, which acts as the initial filter. $match uses standard MongoDB query syntax, so you can still use $in.
The next step is to sort those filtered documents by the createdAt field. This is done using the $sort stage.
The final stage groups the sorted documents by tag and returns the top (newest) document for each group. The $group stage together with the $first accumulator makes this possible.
Putting this all together, you can run the following aggregate operation to get your desired result:
const memberTags = [1,2,3,4,5];
ClanMember.aggregate([
{ "$match": { "tag": { "$in": memberTags } } },
{ "$sort": { "tag": 1, "createdAt": -1 } },
{
"$group": {
"_id": "$tag",
"createdAt": { "$first": "$createdAt" } /*,
include other necessary fields as appropriate
using the $first operator e.g.
"otherField1": { "$first": "$otherField1" },
"otherField2": { "$first": "$otherField2" },
...
*/
}
}
]).exec(function(err, members) {
res.json(members);
});
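To see what the three stages compute without a database, here is a plain JavaScript sketch that applies the same $match, $sort and $group/$first logic to an in-memory array. It is an approximation for illustration, not how the server executes the pipeline; the function name latestPerTag, the score field (standing in for otherField1) and the sample documents are all hypothetical.

```javascript
// In-memory approximation of the aggregation above (not the MongoDB engine itself).
// Hypothetical documents of the shape { tag, score, createdAt }.
function latestPerTag(docs, memberTags) {
  const wanted = new Set(memberTags);

  // $match: { tag: { $in: memberTags } }
  const matched = docs.filter((doc) => wanted.has(doc.tag));

  // $sort: { tag: 1, createdAt: -1 } (Date objects subtract as milliseconds)
  matched.sort((a, b) => a.tag - b.tag || b.createdAt - a.createdAt);

  // $group by tag with $first: the first document seen per tag is its newest one
  const groups = new Map();
  for (const doc of matched) {
    if (!groups.has(doc.tag)) {
      groups.set(doc.tag, { _id: doc.tag, createdAt: doc.createdAt, score: doc.score });
    }
  }
  return [...groups.values()];
}

const clanDocs = [
  { tag: 1, score: 10, createdAt: new Date("2016-01-01T00:00:00Z") },
  { tag: 1, score: 12, createdAt: new Date("2016-01-02T00:00:00Z") },
  { tag: 2, score: 5, createdAt: new Date("2016-01-01T00:00:00Z") },
  { tag: 2, score: 7, createdAt: new Date("2016-01-02T00:00:00Z") },
  { tag: 9, score: 1, createdAt: new Date("2016-01-03T00:00:00Z") },
];

const latest = latestPerTag(clanDocs, [1, 2]);
console.log(latest.map((m) => `${m._id}: ${m.score}`).join(", ")); // 1: 12, 2: 7
```

One difference worth knowing: MongoDB does not guarantee the order of documents coming out of $group, so add a final { "$sort": { "_id": 1 } } stage if the order of the returned members matters.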
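Alternatively, tweak your current find() query so that it sorts on two fields, createdAt (descending) and tag (ascending), and then keep only the first memberTags.length documents (5 here) using limit, something like the following. Note that this returns exactly one document per member only if every member has a document in the most recent batch.
const memberTags = [1,2,3,4,5];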
ClanMember.find(
{ 'tag': { $in: memberTags } }, // query
{}, // projection
{ // options
sort: { 'createdAt': -1, 'tag': 1 },
limit: memberTags.length,
skip: 0
}
).lean().exec(function(err, members) {
res.json(members);
});
or, using the chained query builder:
const memberTags = [1,2,3,4,5];
ClanMember.find({
'tag': {
$in: memberTags
}
}).sort('-createdAt tag')
.limit(memberTags.length)
.lean()
.exec(function(err, members) {
res.json(members);
});
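One caveat with this approach: sorting by createdAt and limiting to memberTags.length returns the most recent documents overall, which is one per member only when every member has a document in the newest batch. The plain JavaScript sketch below approximates the sort() and limit() calls in memory (the function name newestN and the data are hypothetical) and shows a case where a member is missing from the newest batch.

```javascript
// In-memory approximation of .sort({ createdAt: -1, tag: 1 }).limit(memberTags.length)
function newestN(docs, memberTags) {
  const wanted = new Set(memberTags);
  return docs
    .filter((doc) => wanted.has(doc.tag))
    .sort((a, b) => b.createdAt - a.createdAt || a.tag - b.tag)
    .slice(0, memberTags.length);
}

// Hypothetical data: member 2 is missing from the newest batch (Jan 2).
const dailyDocs = [
  { tag: 1, createdAt: new Date("2016-01-01T00:00:00Z") },
  { tag: 2, createdAt: new Date("2016-01-01T00:00:00Z") },
  { tag: 1, createdAt: new Date("2016-01-02T00:00:00Z") },
];

const picked = newestN(dailyDocs, [1, 2]);
console.log(picked.map((doc) => doc.tag)); // member 1 appears twice, member 2 not at all
```

The $group approach does not have this problem, because it picks the newest document within each tag rather than across all tags.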
OK, first, use findOne() so that you get only one document out of the request (note that this is a single document across all the tags in memberTags, not one per member).
Then, to get the newest document, use .sort({ fieldYouWantToSortBy: -1 }) (-1 sorts in descending order, i.e. newest to oldest, and 1 from oldest to newest).
I would recommend sorting on _id, since an ObjectId already embeds the document's creation time (to one-second precision).
Which gives us the following request:
ClanMember.findOne({
'tag': {
$in: memberTags
}
}).sort({_id: -1}).lean().exec(function(err, members) {
res.json(members);
});
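Sorting on _id works as a proxy for creation time because the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds (the BSON ObjectId class used by mongoose exposes this as getTimestamp()). A minimal sketch that decodes it from the 24-character hex string; the function name is hypothetical and the example id is the one used further down this page. Since the resolution is one second, ordering by _id among documents created in the same second is not guaranteed to match creation order.

```javascript
// Decode the creation time embedded in an ObjectId's hex string.
// Bytes 0-3 (the first 8 hex characters) hold seconds since the Unix epoch.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new TypeError(`not a 24-character hex ObjectId: ${hexId}`);
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("536c55bf9c8fb24c21000095");
console.log(created.toISOString());
```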
I'm trying to use Sequelize's findAndCountAll method to get items.
I have to use distinct, together with offset and limit, because of my task.
The problem is that the associated model has a column of array type, and I need to order the parent model by that array's length.
My query looks like this:
const { rows, count } = await this.repo.findAndCountAll({
where: { someField: someValue },
col: 'someCol',
distinct: true,
include: [
{
model: someNestedModel,
as: 'someNestedModelAssociation',
include: [{ model: someInnerNestedModel, as: 'someInnerNestedAssociation' }]
}
],
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
order: this.getOrderByOptions(sortOrder, orderBy),
limit,
offset
});
getOrderByOptions(sortOrder, orderBy) {
switch (orderBy) {
case Sort_By_Array_Length:
return Sequelize.literal('json_array_length(someModelName.someColumnWithArrayName) ASC');
default:
return [[orderBy, sortOrder]];
}
}
The problem is that my order-by clause is used in both the subquery and the main query.
Using it in the subquery leads to an error, because the field does not exist there.
If I use the subQuery: false flag it works, but then the returned results get messed up, due to the problems with subQuery: false combined with offset and limit.
So the question is: is there a way to exclude the order-by field from the subquery?
P.S. The models have a many-to-many association through a join table.
If a collection has a list of dogs and there are duplicate entries for some races, how do I remove all but a single (specific or non-specific) one with just one query?
I guess it would be possible to get all of them with Model.find(), loop through every index except the first one and call Model.remove(), but I would rather have the database handle the logic through the query. How would this be possible?
Pseudocode example of what I want:
Model.remove({race:"pitbull"}).where(notFirstOne);
To remove all but one, you need a way to get all the filtered documents, group them by the identifier, build a list of ids for each group and remove a single id from that list. Armed with this info, you can then run another operation to remove the documents with those ids. Essentially you will be running two queries.
The first query is an aggregate operation that gets the list of ids of the documents to delete:
(async () => {
// Get the ids of the duplicate entries, minus the one to keep
const [doc] = await Model.aggregate([
{ '$match': { 'race': 'pitbull' } },
{ '$group': {
'_id': '$race',
'ids': { '$push': '$_id' },
'id': { '$first': '$_id' }
} },
// Elements of $ids that are not in [$id]: every id except the kept one
{ '$project': { 'idsToRemove': { '$setDifference': [ '$ids', ['$id'] ] } } }
]);
if (!doc) return; // no matching documents, nothing to remove
const { idsToRemove } = doc;
// Remove the duplicate documents
await Model.deleteMany({ '_id': { '$in': idsToRemove } });
})();
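$setDifference returns the elements of its first array that are not in its second, so the argument order decides the result: [ '$ids', [ '$id' ] ] gives every id except the kept one, while the reverse order gives an empty array. The plain JavaScript sketch below mirrors the $match, $group and $project stages in memory, with hypothetical string ids standing in for ObjectIds. Note that in MongoDB $first without a preceding $sort picks an arbitrary document, whereas the sketch simply takes the first one in array order.

```javascript
// Like $setDifference: elements of a that are not in b, without duplicates.
function setDifference(a, b) {
  const exclude = new Set(b);
  return [...new Set(a)].filter((x) => !exclude.has(x));
}

// In-memory mirror of the $match / $group / $project stages for one race.
function idsToRemove(docs, race) {
  const ids = docs.filter((d) => d.race === race).map((d) => d._id); // $match + $push
  if (ids.length === 0) return [];
  const keep = ids[0]; // $first (here simply the first in array order)
  return setDifference(ids, [keep]);
}

const dogs = [
  { _id: "a1", race: "pitbull" },
  { _id: "b2", race: "beagle" },
  { _id: "c3", race: "pitbull" },
  { _id: "d4", race: "pitbull" },
];

console.log(idsToRemove(dogs, "pitbull")); // the ids you would pass to deleteMany
console.log(setDifference(["a1"], ["a1", "c3", "d4"])); // reversed arguments: nothing to delete
```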
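If the purpose is only to keep one document, you may as well just write the following (like any two-step approach, this is not atomic, so concurrent writes can still slip in between the two calls):
const kept = await Model.findOne({ race: 'pitbull' }).select('_id');
if (kept) {
await Model.deleteMany({ race: 'pitbull', _id: { $ne: kept._id } });
}
If the goal is to keep the very first one, note that MongoDB does not guarantee results will be sorted by increasing _id (natural order refers to the order on disk),
see Does default find() implicitly sort by _id?
so sort explicitly instead:
const kept = await Model.findOne({ race: 'pitbull' }).sort({ _id: 1 }).select('_id');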
I've stumbled upon some very strange behavior with MongoDB. For my test case, I have a MongoDB collection with 9 documents. All documents have exactly the same structure, including the fields expired_at: Date and location: [lng, lat].
I now need to find all documents that are not expired yet and are within a bounding box; I show the matching documents on a map. For this I set up the following queries:
var qExpiry = {"expired_at": { $gt : new Date() } };
var qLocation = { "location" : { $geoWithin : { $box : [ [ 123.8766, 8.3269 ] , [ 122.8122, 8.24974 ] ] } } };
var qFull = { $and: [ qExpiry, qLocation ] };
Since the expiry date is far in the future, and when I set the bounding box large enough, the following queries give me all 9 documents as expected:
db.docs.find(qExpiry);
db.docs.find(qLocation);
db.docs.find(qFull);
db.docs.find(qExpiry).sort({"created_at" : -1});
db.docs.find(qLocation).sort({"created_at" : -1});
Now here's the deal: The following query returns 0 documents:
db.docs.find(qFull).sort({"created_at" : -1});
Just adding sort to the AND query ruins the result (note that I want to sort because I also apply a limit, to avoid cluttering the map at larger scales). Sorting by other fields yields the same empty result. What's going on here?
(Actually even stranger: When I zoom into my map, I sometimes get results for qFull, even with sorting. One could argue that qLocation is faulty. But when I only use qLocation, the results are always correct. And qExpiry is always true for all documents anyway)
You may want to try running the same query using the aggregation framework's $match and $sort pipeline stages:
db.docs.aggregate([
{ "$match": qFull },
{ "$sort": { "created_at": -1 } }
]);
or use $and implicitly by specifying a comma-separated list of expressions, as in:
db.docs.aggregate([
{
"$match": {
"expired_at": { "$gt" : new Date() },
"location" : {
"$geoWithin" : {
"$box" : [
[ 123.8766, 8.3269 ],
[ 122.8122, 8.24974 ]
]
}
}
}
},
{ "$sort": { "created_at": -1 } }
]);
Not really sure why that fails with find()
chridam's suggestion of using MongoDB's aggregation framework proved to be the way to go. My working query now looks like this:
db.docs.aggregate(
[
{ $match : { $and : [qExpiry, qLocation]} },
{ $sort: {"created_at": -1} },
{ $limit: 50 }
]
);
Nevertheless, if anyone can point out why my first approach did not work, that would be very useful. Simply adding sort() to a non-empty query shouldn't suddenly return 0 documents. Just to add, since I kept trying for a bit: .sort({}) returns all documents but is not very useful. Everything else failed, including .sort({'_id': 1}).
I have a document collection with a subdocument of tags.
{
title:"my title",
slug:"my-title",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag2', id:2},
{tagname:'tag3', id:3}]
}
{
title:"my title2",
slug:"my-title2",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag2', id:2}]
}
{
title:"my title3",
slug:"my-title3",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag3', id:3}]
}
{
title:"my title4",
slug:"my-title4",
tags:[
{tagname:'tag1', id:1},
{tagname:'tag2', id:2},
{tagname:'tag3', id:3}]
}
[...]
Getting a count of every tag is quite simple with an $unwind + group count aggregate
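For reference, a sketch of that per-tag count. tagCountPipeline is the array you would pass to db.collection.aggregate(), and countTags is a hypothetical plain JavaScript approximation of what $unwind and $group compute, run against the four sample documents above (titles omitted).

```javascript
// The aggregation you would pass to db.collection.aggregate():
const tagCountPipeline = [
  { $unwind: "$tags" },
  { $group: { _id: "$tags.tagname", count: { $sum: 1 } } },
];

// Plain-JS approximation of the same two stages.
function countTags(docs) {
  const counts = {};
  for (const doc of docs) {
    // $unwind: one row per array element (missing or empty arrays yield no rows)
    for (const tag of doc.tags || []) {
      counts[tag.tagname] = (counts[tag.tagname] || 0) + 1; // $group + $sum: 1
    }
  }
  return counts;
}

// The four sample documents from the question (titles omitted).
const sampleDocs = [
  { slug: "my-title", tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag2", id: 2 }, { tagname: "tag3", id: 3 }] },
  { slug: "my-title2", tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag2", id: 2 }] },
  { slug: "my-title3", tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag3", id: 3 }] },
  { slug: "my-title4", tags: [{ tagname: "tag1", id: 1 }, { tagname: "tag2", id: 2 }, { tagname: "tag3", id: 3 }] },
];

console.log(countTags(sampleDocs)); // { tag1: 4, tag2: 3, tag3: 3 }
```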
However, I would like to find a count of which tags are found together, or more precisely, which siblings show up most often alongside one another, ordered by count. I have not found an example, nor can I figure out how to do this without multiple queries.
Ideally the end result would be:
{'tag1':{
'tag2':3, // tag1 and tag2 were found in a document together 3 times
'tag3':3, // tag1 and tag3 were found in a document together 3 times
[...]}}
{'tag2':{
'tag1':3, // tag2 and tag1 were found in a document together 3 times
'tag3':2, // tag2 and tag3 were found in a document together 2 times
[...]}}
{'tag3':{
'tag1':3, // tag3 and tag1 were found in a document together 3 times
'tag2':2, // tag3 and tag2 were found in a document together 2 times
[...]}}
[...]
As stated earlier, it simply is not possible to have the aggregation framework generate arbitrary key names from data (at least before MongoDB 3.4.4, which added the $arrayToObject operator). It's also not possible to do this kind of analysis in a single query.
But there is a general approach to doing this over your whole collection for an undetermined number of tag names. Essentially you need to get a distinct list of the "tags" and run another query for each distinct value to get the "siblings" of that tag and their counts.
In general:
// Get the unique tags
db.collection.aggregate([
{ "$unwind": "$tags" },
{ "$group": {
"_id": "$tags.tagname"
}}
]).forEach(function(tag) {
var tagDoc = { };
tagDoc[tag._id] = {};
// Get the siblings count for that tag
db.collection.aggregate([
{ "$match": { "tags.tagname": tag._id } },
{ "$unwind": "$tags" },
{ "$match": { "tags.tagname": { "$ne": tag._id } } },
{ "$group": {
"_id": "$tags.tagname",
"count": { "$sum": 1 }
}}
]).forEach(function(sibling) {
// Set the value in the master document
tagDoc[tag._id][sibling._id] = sibling.count;
});
// Just emitting for example purposes in some way
printjson(tagDoc);
});
The aggregation framework can return a cursor in releases since MongoDB 2.6, so even with a large number of tags this can work in an efficient way.
So that's the way you would handle this, but there really is no way to have this happen in a single query. For a shorter run time you might look at frameworks that allow many queries to be run in parallel either combining the results or emitting to a stream.
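To check the shape this produces, here is a plain JavaScript sketch run on the question's four sample documents (reduced to tag names). Rather than one query per tag, it makes a single pass over the documents in memory, but it counts the same thing as the shell loop: for each distinct tag in a document, every other tag element alongside it. The function name siblingCounts is hypothetical.

```javascript
// Single-pass, in-memory counterpart of the per-tag shell loop.
function siblingCounts(docs) {
  const result = {};
  for (const doc of docs) {
    const names = (doc.tags || []).map((t) => t.tagname); // like $unwind
    for (const tag of new Set(names)) { // each distinct tag matched by this document
      result[tag] = result[tag] || {};
      for (const sibling of names) {
        if (sibling === tag) continue; // like { "$ne": tag._id }
        result[tag][sibling] = (result[tag][sibling] || 0) + 1; // like $group + $sum: 1
      }
    }
  }
  return result;
}

// The question's four sample documents, reduced to their tag names.
const posts = [
  { tags: [{ tagname: "tag1" }, { tagname: "tag2" }, { tagname: "tag3" }] },
  { tags: [{ tagname: "tag1" }, { tagname: "tag2" }] },
  { tags: [{ tagname: "tag1" }, { tagname: "tag3" }] },
  { tags: [{ tagname: "tag1" }, { tagname: "tag2" }, { tagname: "tag3" }] },
];

console.log(JSON.stringify(siblingCounts(posts)));
// {"tag1":{"tag2":3,"tag3":3},"tag2":{"tag1":3,"tag3":2},"tag3":{"tag1":3,"tag2":2}}
```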
In the products collection, I have an array recentviews whose elements have two fields, viewedby and vieweddate.
If a record for a given viewedby already exists, I need to update it. For example, if I have an array like this:
"recentviews" : [
{
"viewedby" : "abc",
"vieweddate" : ISODate("2014-05-08T04:12:47.907Z")
}
]
and the user is abc, I need to update the element above, and if there is no record for abc I have to $push a new one.
I have tried $set as follows:
db.products.update( { _id: ObjectId("536c55bf9c8fb24c21000095") },
{ $set:
{ "recentviews":
{
viewedby: 'abc',
vieweddate: ISODate("2014-05-09T04:12:47.907Z")
}
}
}
)
The above query wipes out all the other elements in my array.
Actually, what you seem to be describing is not a single operation, but I'll walk through the parts required to do it, and cover the other possible situations as well.
What you are looking for is in part the positional $ operator. Part of your query also needs to "find" the element of the array you want to update.
db.products.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095"),
"recentviews.viewedby": "abc"
},
{
"$set": {
"recentviews.$.vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
)
The $ stands for the matched position in the array, so the update knows which item in the array to change. You can set individual fields of that array element, or specify the whole element to replace at that position.
db.products.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095"),
"recentviews.viewedby": "abc"
},
{
"$set": {
"recentviews.$": {
"viewedby": "abc",
"vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
}
)
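To make the positional $ semantics concrete, here is a plain JavaScript approximation (not the server's implementation; setPositional is a hypothetical name) of the first update above: the query locates the array element, and only the first element it matched is changed. The second entry in the sample product is hypothetical data.

```javascript
// Approximates: update({ _id, "recentviews.viewedby": user },
//                      { $set: { "recentviews.$.vieweddate": date } })
function setPositional(product, user, date) {
  // $ refers to the FIRST array element matched by the query
  const index = product.recentviews.findIndex((view) => view.viewedby === user);
  if (index === -1) return false; // no element matches, so the document is not updated
  product.recentviews[index].vieweddate = date;
  return true;
}

const product = {
  recentviews: [
    { viewedby: "abc", vieweddate: new Date("2014-05-08T04:12:47.907Z") },
    { viewedby: "xyz", vieweddate: new Date("2014-05-07T10:00:00Z") },
  ],
};

setPositional(product, "abc", new Date("2014-05-09T04:12:47.907Z"));
console.log(product.recentviews.map((v) => `${v.viewedby} ${v.vieweddate.toISOString()}`));
```

Unlike the $set on "recentviews" in the question, which replaced the entire array, only the matched element changes here.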
If the fields do not in fact change and you just want to insert a new array element if the exact same one does not exist, then you can use $addToSet
db.products.update(
{
"_id": ObjectId("536c55bf9c8fb24c21000095")
},
{
$addToSet:{
"recentviews": {
"viewedby": "abc",
"vieweddate": ISODate("2014-05-09T04:12:47.907Z")
}
}
}
)
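Bear in mind that $addToSet compares whole embedded documents: an element only counts as a duplicate if it has the same fields with the same values in the same order. The plain JavaScript approximation below (addToSet is a hypothetical helper) uses JSON.stringify for the comparison because it is also sensitive to field order.

```javascript
// Approximates $addToSet for embedded documents: append only if no element is
// exactly equal, comparing field names, values and field order.
function addToSet(array, value) {
  const key = JSON.stringify(value);
  if (array.some((item) => JSON.stringify(item) === key)) return false;
  array.push(value);
  return true;
}

const views = [{ viewedby: "abc", vieweddate: new Date("2014-05-08T04:12:47.907Z") }];

addToSet(views, { viewedby: "abc", vieweddate: new Date("2014-05-08T04:12:47.907Z") }); // identical: not added
addToSet(views, { viewedby: "abc", vieweddate: new Date("2014-05-09T04:12:47.907Z") }); // new date: added
addToSet(views, { vieweddate: new Date("2014-05-09T04:12:47.907Z"), viewedby: "abc" }); // field order differs: added
console.log(views.length); // 3
```

So $addToSet only prevents exact duplicates; it will not replace an older entry for the same viewedby.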
However, if you are looking to "push" to an array keyed by a single value only when that value does not exist yet, you need some more manual handling: first check whether the element exists in the array (by trying to update it), and then issue the $push where it does not.
The mongoose methods help with this, since the update callback reports the number of documents affected:
Product.update(
{
"_id": "536c55bf9c8fb24c21000095", // mongoose casts this string to an ObjectId
"recentviews.viewedby": "abc"
},
{
"$set": {
"recentviews.$": {
"viewedby": "abc",
"vieweddate": new Date("2014-05-09T04:12:47.907Z")
}
}
},
function(err, numAffected) {
if (err) return console.error(err);
if (numAffected === 0) {
// Document not updated so you can push onto the array
Product.update(
{
"_id": "536c55bf9c8fb24c21000095"
},
{
"$push": {
"recentviews": {
"viewedby": "abc",
"vieweddate": new Date("2014-05-09T04:12:47.907Z")
}
}
},
function(err) {
if (err) return console.error(err);
}
);
}
}
);
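The logic of that callback (try the positional update, and push only when nothing was affected) can be sketched in plain JavaScript against an in-memory document; recordView and the sample data are hypothetical. Against a real server these are two separate operations, so a concurrent request can still slip in between them.

```javascript
// Step 1 mirrors the positional $set; step 2 mirrors the fallback $push.
function recordView(product, user, date) {
  const existing = product.recentviews.find((view) => view.viewedby === user);
  if (existing) {
    existing.vieweddate = date; // "recentviews.$" matched: update in place
    return "updated";
  }
  product.recentviews.push({ viewedby: user, vieweddate: date }); // no match: $push
  return "pushed";
}

const productDoc = {
  recentviews: [{ viewedby: "abc", vieweddate: new Date("2014-05-08T04:12:47.907Z") }],
};

console.log(recordView(productDoc, "abc", new Date("2014-05-09T04:12:47.907Z"))); // updated
console.log(recordView(productDoc, "def", new Date("2014-05-09T05:00:00Z"))); // pushed
console.log(productDoc.recentviews.length); // 2
```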
The only word of caution here is that the write concern responses changed in MongoDB 2.6 compared with earlier versions. I'm not sure right now how the mongoose API derives the numAffected argument passed to the callback, so the difference could matter.
In prior versions, even if the data you sent in the initial update exactly matched an existing element and no real change was required, the "modified" count would be returned as 1 even though nothing was actually updated.
From MongoDB 2.6 the write result contains two separate counts: the documents matched (nMatched) and the documents actually modified (nModified). So while the query portion would still match an existing element, the modified count would be 0 if no change was actually required.
So depending on how mongoose actually reports that number, it may be safer to use the $addToSet operator instead of $push in that inner update, so that if the zero count was only because the exact element already existed, no duplicate gets added.