I have 20 million documents in my database, structured as follows.
{
"_id": ObjectId("5bb84e931cb3d25a3b21d14e"),
"merchant": "menswearhouse.com",
"category": "Fashion > Clothing > Men's Clothing",
"feature": [
"-0.899652959529",
"-0.02401520125567913",
"0.08394625037908554",
"0.06319021433591843",
"-0.015963224694132805"
]
}
Now I have the array below, which I need to use to find documents.
const dummy = [
"-0.899652959529",
"-0.02401520125567913",
"0.08394625037908554",
"0.06319021433591843",
"-0.015963224694132805"
];
I need to:
Subtract each value of my dummy array from the value at the same index of feature, for all 5 values.
Square each difference.
Add up all 5 squared values.
Take the square root of the sum.
Sort all documents by that computed value and get only the top 5.
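In plain JavaScript, the first four steps amount to a Euclidean distance between the two arrays. The following is only a minimal sketch to pin down the arithmetic; the function name euclideanDistance is made up for illustration, and it assumes both arrays hold numeric strings of equal length, as in the sample document.

```javascript
// Hypothetical helper: Euclidean distance between two equal-length
// arrays of numeric strings (as stored in the "feature" field).
function euclideanDistance(feature, query) {
  if (feature.length !== query.length) {
    throw new Error("arrays must have the same length");
  }
  let sumOfSquares = 0;
  for (let i = 0; i < feature.length; i++) {
    // Convert the strings to numbers, subtract, then square.
    const diff = Number(feature[i]) - Number(query[i]);
    sumOfSquares += diff * diff;
  }
  return Math.sqrt(sumOfSquares);
}

// Identical arrays are at distance 0.
console.log(euclideanDistance(["0.5", "-0.25"], ["0.5", "-0.25"]));
```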
I am using the query below, which $projects the computed field, and it works when I add a $limit. But I need to $sort by the $projected field and take the top 5 documents. With 20 million documents, the query doesn't return anything and runs forever.
db.collection.aggregate([
{ $project: {
field: {
$sqrt: {
$sum: {
$map: {
input: { $range: [0, { $size: '$feature' }] },
as: "d",
in: {
$pow: [
{
$subtract: [
{ $toDouble: { $arrayElemAt: [dummy, "$$d"] }},
{ $toDouble: { $arrayElemAt: ["$feature", "$$d"] }}
]
},
2
]
}
}
}
}
}
}}
])
Can I use an index on a field that is created at runtime?
Thanks!!!
The short answer is no. You cannot create an index on a field that is computed at runtime. MongoDB, as of this writing, can't do what you want directly. But you can compute the values in parallel. Assuming your server has adequate resources (CPU and memory), you can divide the job in your application and run the pieces in parallel. For simple math, let's assume you have 20,000,000 docs and you divide them into 20 tasks. Each task processes 1,000,000 docs and returns its top 5 results. The pipeline for the first task will be
[
{
'$sort': {
'_id': 1
}
}, {
'$skip': 0
}, {
'$limit': 1000000
}, {
'$project': {
'field': {
'$sqrt': {
<do your thing>
}
}
}
}, {
'$sort': {
'field': 1
}
}, {
'$limit': 5
}
]
After all tasks (threads) have returned, merge the results (only 100 docs) in your application, sort them by field, and take your final top 5 documents. Note that you have to consider your hardware resources to come up with the optimal number of tasks.
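To make the split-and-merge idea concrete, here is a rough sketch of the application side. The names buildTaskPipeline and mergeTopK are hypothetical, distanceExpr stands for whatever expression computes the distance (the $sqrt expression from the question), and the sketch assumes the smallest distances are the ones wanted, so each task sorts by field before its final $limit.

```javascript
// Hypothetical helper: pipeline for task number `taskIndex` (0-based),
// where each task covers `chunkSize` documents ordered by _id.
function buildTaskPipeline(taskIndex, chunkSize, distanceExpr) {
  return [
    { $sort: { _id: 1 } },
    { $skip: taskIndex * chunkSize },
    { $limit: chunkSize },
    { $project: { field: distanceExpr } },
    // Assumption: smallest distance first, so the limit keeps the 5 closest.
    { $sort: { field: 1 } },
    { $limit: 5 },
  ];
}

// Hypothetical helper: merge the per-task result arrays and keep the
// k documents with the smallest `field` value.
function mergeTopK(taskResults, k) {
  return taskResults
    .flat()
    .sort((a, b) => a.field - b.field)
    .slice(0, k);
}

// Usage sketch with made-up results from two tasks.
const merged = mergeTopK(
  [[{ _id: "a", field: 0.3 }, { _id: "b", field: 0.1 }], [{ _id: "c", field: 0.2 }]],
  2
);
console.log(merged.map((doc) => doc._id));
```

Each pipeline would be passed to db.collection.aggregate(...) by its own task, and mergeTopK would run once all of them have returned.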
Related
I have the following document...
{
"_id": {
"$oid": "5bca528b49079d64dd4cea5d"
},
"song_queue": [
{
"song_id": "best",
"vote": 1
},
{
"song_id": "west",
"vote": 1
}
],
"room_id": "FHWAN",
"__v": 0
}
My issue is that I need to first update a vote for one of the songs and then re-arrange the order of the array based on the vote count.
I have a way of doing it with two separate queries...
// Give the song an upvote.
await Room.findOneAndUpdate(
{ "song_queue.song_id": song.song_id },
{$set: { "song_queue.$.vote": song.votes }},
{ new: true }
);
// Sort the queue.
room = await Room.findOneAndUpdate(
{ room_id: data.room_id },
{ $push: { song_queue: { $each: [], $sort: { vote: -1 } } } },
{ new: true }
);
Is there a way to merge the two queries into one query, or perhaps make it more concise? I would like one query that updates the song in song_queue and then sorts the array after the change.
There is no way to merge these queries, because the order of the operations inside the update parameter is not guaranteed to be preserved. Code like the following might fail to run correctly, because you can't be sure that the update of song_queue.$.vote will occur before song_queue is sorted:
await Room.findOneAndUpdate(
{ "song_queue.song_id": song.song_id },
{
$set: { "song_queue.$.vote": song.votes },
$push: { song_queue: { $each: [], $sort: { vote: -1 } } }
},
{ new: true }
);
What can be done to optimize the database access is to use an ordered bulk write operation, which makes only one round trip to MongoDB instead of two.
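A sketch of what the ordered bulk write could look like, assuming the Mongoose model Room from the question. The helper name buildVoteAndSortOps is made up, and adding room_id to the first filter is my assumption rather than something the question does.

```javascript
// Hypothetical helper: the two updates from the question expressed as
// bulkWrite operations, in the order they must run.
function buildVoteAndSortOps(roomId, songId, votes) {
  return [
    {
      // 1. Set the vote count on the matching song.
      updateOne: {
        // Assumption: also match on room_id so only this room is touched.
        filter: { room_id: roomId, "song_queue.song_id": songId },
        update: { $set: { "song_queue.$.vote": votes } },
      },
    },
    {
      // 2. Re-sort the queue by vote count, highest first.
      updateOne: {
        filter: { room_id: roomId },
        update: { $push: { song_queue: { $each: [], $sort: { vote: -1 } } } },
      },
    },
  ];
}

// Usage, inside an async function:
//   const ops = buildVoteAndSortOps(data.room_id, song.song_id, song.votes);
//   await Room.bulkWrite(ops, { ordered: true });
```

Note that bulkWrite returns a summary of the operations rather than the updated document, so if you need the new queue you would still have to read the room afterwards.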
I am trying to achieve pagination in Bookshelf.js, and limit not just the number of models (Author in my code), but also the number of items in every model (books in my code).
Here is the code:
Author.where('postedBy', userId).fetchPage({
pageSize: 10,
page: 1,
withRelated: [
{ 'books': function(qb) { qb.limit(10) } }
]
})
So, what I expect is:
{ author_1: [ book_1, ..., book_10 ], ..., author_10: [ book_1, ..., book_10 ] }
But instead I get:
{ author_1: [ book_1, ..., book_7 ], author_2: [ book_1, ..., book_3 ], author_N: [/*8 other collections are empty*/] }
So it limits the total number of Authors to 10, AND the overall number of books across the whole query to 10.
Instead I want to have 10 Authors and up to 10 books in every author's collection.
Is there a way to achieve that with Bookshelf.js & Knex.js?
I'm using the following query to populate items from MongoDB, in ascending order, according to a field called sortIndex.
Sometimes though items in the DB don't have the sortIndex field. With the following query, the items with a null sortIndex are showing up at the top, and I'm wondering how to get them to show up at the bottom. Would I need two queries for this or is there a way to use one query?
.populate({path: 'slides', options: { sort: { 'sortIndex': 'ascending' } } })
You can do something like this:
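db.collection.aggregate([
{ $addFields:
{
// 2 when "value" is null or missing, 1 otherwise ($ifNull also covers missing fields)
hasValue : { $cond: [ { $eq: [ { $ifNull: [ "$value", null ] }, null ] }, 2, 1 ] },
}
},
// sort inside the pipeline: documents with a value first, then by the value itself
{ $sort: { hasValue : 1, value : 1 } }
]);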
Duplicate of: How to keep null values at the end of sorting in Mongoose?
Anyway posting the same solution ...
I'm not sure about the solution I'm about to suggest. I can't test it because I don't have a MongoDB instance set up right now, but I think you can use <collection>.aggregate along with $project and $sort to achieve this.
Sample code:
db.inventory.aggregate(
[
{
$project: {
item: 1,
// replace a null or missing amount with -1 (<minimum value>) so it sorts at one end
amount: { $ifNull: [ "$amount", -1 ] }
}
},
{
$sort : {
amount : 1 // use -1 or 1 depending on the order you want
}
}
]
)
Hope this helps !!
I have a collection representing robots holding an inventory of products in positional slots, which will be incremented and decremented.
{
_id: "someid",
name: "",
inventory: [{
productId: "productId1",
count: 30
}, {
productId: "productId2",
count: 56
}, {
// ... up to 55 slots.
}]
}
I then have an API that will interact with this document on a PUT request. The request data will contain the index of the inventory to update and the number to decrement it by, eg:
[
{ "inventory": 3, "inc": -10 }, // remove 10 from robot.inventory[3]
{ "inventory": 54, "inc": -2 }, // remove 2 from robot.inventory[54]
]
I have the following code.
// robots submit to this api to keep their products up to date
MachineApiV1.addRoute('/products', {
authRequired: true,
roleRequired: 'machine'
}, {
put: function () {
// omit process to get data from above
var user = Users.findOne(this.request.headers['x-user-id']);
Robots.update(user.profiles.robot, {
$inc: { } // this is where I am lost.
});
}
});
I can't quite think of a way to do it in a single update. How can I increment multiple arbitrary indexes in a mongo document?
MongoDB makes it really simple - just specify the position in the array of sub-documents you want to update:
Robots.update(user.profiles.robot, {
$inc: {
'inventory.3.count': -10,
'inventory.54.count': -2
}
});
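Since the indexes arrive in the request, the $inc object has to be built dynamically. Here is a minimal sketch of one way to do that; buildIncUpdate is a made-up name, and the index check is an extra precaution I added because the value ends up inside a field path.

```javascript
// Hypothetical helper: turn the request array into a single $inc update.
function buildIncUpdate(changes) {
  const inc = {};
  for (const { inventory, inc: amount } of changes) {
    // Guard against anything that is not a plain array index.
    if (!Number.isInteger(inventory) || inventory < 0) {
      throw new Error("invalid inventory index: " + inventory);
    }
    const key = "inventory." + inventory + ".count";
    // Add up amounts if the same slot appears more than once.
    inc[key] = (inc[key] || 0) + amount;
  }
  return { $inc: inc };
}

const update = buildIncUpdate([
  { inventory: 3, inc: -10 },
  { inventory: 54, inc: -2 },
]);
console.log(JSON.stringify(update));
```

The result can then be passed as the second argument, as in Robots.update(user.profiles.robot, update).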
I have following data in MongoDB:
[{id:3132, home:'NSH', away:'BOS'}, {id:3112, home:'ANA', away:'CGY'}, {id:3232, home:'MIN', away:'NSH'}]
Is it possible to get total game count for each team with aggregate pipeline?
desired result:
[{team: 'NSH', totalGames: 2}, {team:'MIN', totalGames: 1}, ...]
I can get each one separately, in its own array, with two aggregate calls:
[{$group: {_id: "$home", totalGames: {$sum: 1}}}]
and
[{$group: {_id: "$away", totalGames: {$sum: 1}}}]
resulting
var homeGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 2 }, ...]
var awayGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 4 }, ...]
But I really want to get it working with just one query. If that's not possible, what would be the best way to combine these two results into one using JavaScript?
After some puzzling, I found a way to get it done using an aggregate pipeline. Here is the result:
db.games.aggregate([{
$project: {
isHome: { $literal: [true, false] },
home: true,
away: true
}
}, {
$unwind: '$isHome'
}, {
$group: {
_id: { $cond: { if: '$isHome', then: '$home', else: '$away' } },
totalGames: { $sum: 1 }
}
}
]);
As you can see it consists of three stages. The first two are meant to duplicate each document into one for the home team and one for the away team. To do this, the project stage first creates a new isHome field on each document containing a true and a false value, which the unwind stage then splits into separate documents containing either the true or the false value.
Then in the group phase, we let the isHome field decide whether to group on the home or the away field.
It would be nicer if we could create a team field in the project step, containing the array [$home, $away], but mongo only supports adding array literals here, hence the workaround.
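As for the fallback asked about in the question, combining the two separate results in JavaScript, something along these lines should work. This is only a sketch: mergeGameCounts is a made-up name, and it assumes both arrays have the { _id, totalGames } shape shown in the question.

```javascript
// Hypothetical helper: add up the home and away counts per team.
function mergeGameCounts(homeGames, awayGames) {
  const totals = new Map();
  for (const { _id, totalGames } of [...homeGames, ...awayGames]) {
    totals.set(_id, (totals.get(_id) || 0) + totalGames);
  }
  // Convert the map back into the desired [{ team, totalGames }] shape.
  return Array.from(totals, ([team, totalGames]) => ({ team, totalGames }));
}

const combined = mergeGameCounts(
  [{ _id: "NSH", totalGames: 1 }, { _id: "SJS", totalGames: 2 }],
  [{ _id: "NSH", totalGames: 1 }, { _id: "SJS", totalGames: 4 }]
);
console.log(combined);
```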