mongodb pagination by range query with another sort field - javascript

Here is a simplified version of my schema:
var MySchema = new Schema({
  createdDate: {
    type: Date,
    required: true,
    default: Date.now,
    index: true
  },
  vote: {
    type: Number,
    index: true,
    default: 0
  }
});
I have a large amount of data, so for paging with good performance I use a range query like .find({_id: {$gt: lastId}}).limit(20). Now I also want to sort my data by the vote field. How should I do this?

This is much the same as the "looking for a greater value" concept, but this time on "vote", and with another twist:
var query = Model.find({
  "vote": { "$lte": lastVoteValue },
  "_id": { "$nin": seenIds }
}).sort({ "vote": -1 }).limit(20);
Think about what is going on here: since you are doing a "descending" sort, you want values of vote that are "less than or equal to" the last value seen on your previous page.
The second part is the "list" of previously seen _id values, from just the last page or possibly more. How far back that list must reach depends on how "granular" your "vote" values are, in order to guarantee that none of the items already paged appear on the next page.
So the $nin operator does the exclusion here. You might want to track how the "vote" value varies to help you decide when it is safe to shrink the list of "seenIds".
That is the way to do range queries for paging. If you need to jump to a "specific" page by number, you would still need .limit() and .skip(); but this approach works for single-page jumps or plain incremental paging.
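To make the bookkeeping concrete, here is a small sketch in plain JavaScript (the buildNextPageQuery helper is hypothetical, not part of the answer above) of how the next page's query could be derived from the previous page's documents:

```javascript
// Sketch: derive the next-page query from the previous page's documents.
// Assumes each doc has { _id, vote } and we page in descending vote order.
function buildNextPageQuery(prevPageDocs) {
  const lastVoteValue = prevPageDocs[prevPageDocs.length - 1].vote;
  // Only ids sharing the boundary vote value can reappear on the next
  // page, so it is enough to exclude those.
  const seenIds = prevPageDocs
    .filter(doc => doc.vote === lastVoteValue)
    .map(doc => doc._id);
  return {
    vote: { $lte: lastVoteValue },
    _id: { $nin: seenIds }
  };
}

const page = [
  { _id: "a", vote: 9 },
  { _id: "b", vote: 7 },
  { _id: "c", vote: 7 }
];
const query = buildNextPageQuery(page);
// query: { vote: { $lte: 7 }, _id: { $nin: ["b", "c"] } }
```

This keeps the `$nin` list short by only tracking ids at the boundary vote value, which is the "granularity" concern the answer mentions.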

Related

MongoDB is going back to matching among all documents after $group stage

So I have a collection looking like this:
[
  { "url": "website.com/test", "links": [ { "url": "www.something.fr/page.html", "scoreDiff": 0.44 } ], "justUpdated": true, "score": 0.91 },
  { "url": "domain.com/", "links": [], "justUpdated": true, "score": 0.81 },
  { "url": "www.something.fr/page.html", "links": [], "justUpdated": false, "score": 0.42 }
]
The goal here is to get the third document, because its url appears as a value in the "links" array of one of the documents where "justUpdated" equals true (the first one here).
To achieve that, I tried:
First find all the documents where "justUpdated" equals true, then in Node.js concatenate all the urls from their "links" arrays (let's call this array urlsOfInterest), and finally do another query to find all the documents whose url is in urlsOfInterest.
The problem is that it takes some time to run the first query, process the result, and then run the second query.
So I thought maybe I could do it all at once in an aggregation query. I use $group (with $cond to check whether justUpdated equals true) to collect all the "links" arrays into one new variable named urlsOfInterest. At this point it is an array of arrays of objects, so I then use $project with $reduce to flatten all these {url: "...", scoreDiff: X} objects into one big array. Finally I use $project with $map to keep only the url, as the scoreDiff value doesn't interest me here.
So I get an output looking like this:
_id:"urlsOfInterest",
urlsOfInterest: ["www.something.fr/page.html"]
This is pretty great, but now I am stuck: I just need to get the documents whose url is in urlsOfInterest, except I can't, because all my documents have "disappeared" after the $group stage.
Please help me find a way to perform this final query :) Or if this isn't the right way to do it, please point me in the right direction!
PS: the real goal is, for every document where justUpdated equals true, to update every scoreDiff value in its links array. For our example: abs(0.91 - 0.42) = 0.49, so we replace our scoreDiff value of 0.44 with 0.49 (0.91 being the score of the document where justUpdated equals true, and 0.42 the score of the document where url equals www.something.fr/page.html, which is why I need to fetch that last document). I don't believe there is a way to do all of this at once, but if there is, please tell me!
You can use $lookup to get all matching documents in an array:
db.collection.aggregate([
  {
    "$match": {
      "justUpdated": true
    }
  },
  {
    "$lookup": {
      "from": "collection",
      "localField": "links.url",
      "foreignField": "url",
      "as": "result"
    }
  },
  {
    "$match": {
      "result": { "$gt": [] }
    }
  }
])
Then either $unwind and $replaceRoot the result array to get the documents as a cursor and do the math at the application level, or do the calculations in the same pipeline, e.g. with $reduce.
The "PS: the real goal" part is not quite clear, as it is based on a particular example, but if you play with it a little in the playground I am sure you can calculate the numbers as per your requirements.

Reverse data from Fauna DB

I have been looking at the index docs for FQL and FaunaDB. Is there a way to reverse the order of the data returned from a query?
Here is the code I have:
q.Map(
  q.Paginate(q.Match(q.Index("all_pages")), {
    //! Find way to change order
    size: 3
  }),
  ref => q.Get(ref)
)
The docs mention the reverse flag.
Does anyone know how to use it?
Imagine you have a collection that contains documents with a price field, let's call it ... ermm ... priceables!
Let's add two documents to the collection with price 1 and price 2.
[{
"price": 1
},
{
"price": 2
}]
[Image: the two documents shown in the UI console]
Create a regular index
Create a regular range index on that price value with the following snippet:
CreateIndex({
  name: "regular-value-index",
  unique: false,
  serialized: true,
  source: Collection("priceables"),
  terms: [],
  values: [
    {
      field: ["data", "price"]
    }
  ]
})
Create a reverse index
There can be multiple values (e.g. in a composite index), and the reverse field can be set per value. To create a reverse index, set reverse to true for the specific value:
CreateIndex({
  name: "reverse-value-index",
  unique: false,
  serialized: true,
  source: Collection("priceables"),
  terms: [],
  values: [
    {
      field: ["data", "price"],
      reverse: true
    }
  ]
})
If you go to the UI console and open the index, you will notice the values are sorted from highest to lowest.
[Image: the reverse index in the UI console]
I assume that what confused you is that you can't set the reverse boolean in the UI yet. But you can just go to the Shell and paste in the FQL code to create the index.
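Once the index exists, you query it like any other. A sketch in FQL (index name taken from the snippet above) that pages through the prices from highest to lowest:

```fql
Paginate(Match(Index("reverse-value-index")), { size: 3 })
```

Because the index stores the price as its value, the page contains the prices themselves; for the two sample documents the result should be { data: [2, 1] }.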
I just reversed an index successfully. The docs are helpful, but they don't give an example where the Map function is involved. The secret is to wrap the Map call as well with q.Reverse:
q.Reverse(
  q.Map(
    q.Paginate(q.Match(q.Index("all_pages")), {
      size: 3
    }),
    ref => q.Get(ref)
  )
)
This should work for you!
The best way to reverse the order of data is to use an index, per Brecht's answer. Fauna stores index entries in sorted order, so when your index uses reverse: true matching entries from the index are already sorted in reverse order.
WΔ_'s answer is technically correct, but probably not what you want. Using Reverse around a Paginate reverses the items in the page, but not the overall set. Suppose your set was the letters of the alphabet, and your page size is 3. Without Reverse, you'd expect to see pages with ABC, DEF, etc. If you're trying to change the order of the set, you'd expect to see ZYX, WVU, etc. WΔ_'s answer results in CBA, FED, etc.
Reverse can operate on a set. So you can Reverse(Match(Index("all_pages"))), and Paginate/Map that result. Note that for large sets, your query might fail since Reverse has to operate on the entire set.
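Putting that together with the original query, a sketch (assuming the same all_pages index and JS driver syntax as in the question):

```fql
q.Map(
  q.Paginate(q.Reverse(q.Match(q.Index("all_pages"))), {
    size: 3
  }),
  ref => q.Get(ref)
)
```

Here Reverse is applied to the set before Paginate, so in the alphabet example above the pages would come out as ZYX, WVU, etc.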

Deleting Documents With a Specific Count of Keys from MongoDB

I have a users collection from which I want to delete the documents that have only 2 fields. My general schema is like this:
{
  _id: 1,
  name: "af",
  city: "asd",
  transaction: 1,
  transactions: [{
    id: 1,
    product: "mobile",
    amount: 10
  },
  {
    id: 2,
    product: "tv",
    amount: 23
  }],
  "many-other-sub-docs": []
}
I want to delete the documents in which only the _id and transaction fields exist, and no others.
NOTE: I have around 30-40 fields.
One way to remove those documents is to specify in the query all the fields that shouldn't exist and the only fields that should exist.
For e.g. db.users.remove({_id: {$exists: true}, transaction: {$exists: true}, other_field1: {$exists: false}, other_field2: {$exists: false}, ...})
But I find this query absurd, and I would also have to find all the fields in my collection.
Is there any other simpler way?
Well, yes, there is a better way to do it. I cannot promise you blistering performance, but it's likely not much worse than what you are doing now: use the JavaScript evaluation of $where.
The _id key is always present, so all you are testing for is the presence of one other field; the total "field count" for the document is then 2. As in:
db.collection.remove({
  "transaction": { "$exists": true },
  "$where": "return Object.keys(this).length == 2"
})
So simply test the length of the array of document keys for the expected value.
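The $where expression is plain JavaScript, so its behavior is easy to check locally. A minimal sketch of the same predicate applied to sample documents:

```javascript
// The same test the $where clause runs server-side: a document qualifies
// for removal when it has a transaction field and exactly 2 keys in total.
function shouldRemove(doc) {
  return "transaction" in doc && Object.keys(doc).length === 2;
}

console.log(shouldRemove({ _id: 1, transaction: 1 }));             // true
console.log(shouldRemove({ _id: 2, transaction: 1, name: "af" })); // false
```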

MongoDB: how to find 10 random document in a collection of 100?

Is MongoDB capable of finding a number of random documents without making multiple queries?
E.g. I implemented this on the JS side after loading all the documents in the collection, which is wasteful - hence I just wanted to check if this can be done better with one db query.
The path I took on the JS side:
get all the data
make an array of the IDs
shuffle the array of IDs (random order)
splice the array down to the number of documents required
build the list of documents by selecting them by ID, one by one, from the whole collection
The two major drawbacks are that I am loading all the data - or I make multiple queries.
Any suggestions much appreciated.
This was answered a long time ago and, since then, MongoDB has greatly evolved.
As posted in another answer, MongoDB now supports sampling within the Aggregation Framework since version 3.2:
The way you could do this is:
db.products.aggregate([{$sample: {size: 5}}]); // You want to get 5 docs
Or:
db.products.aggregate([
  { $match: { category: "Electronic Devices" } }, // filter the results
  { $sample: { size: 5 } } // you want to get 5 docs
]);
However, there are some warnings about the $sample operator (as of Nov 6, 2017, when the latest version is 3.4). $sample performs a collection scan followed by a random sort to select N documents unless ALL of the following are met:
$sample is the first stage of the pipeline
N is less than 5% of the total documents in the collection
The collection contains more than 100 documents
The last example, with the $match before $sample, falls into the collection-scan case.
OLD ANSWER
You could always run:
db.products.find({category:"Electronic Devices"}).skip(Math.random()*YOUR_COLLECTION_SIZE)
But the order won't be random, and you will need two queries (a count to get YOUR_COLLECTION_SIZE), or an estimate of how big it is (about 100 records, about 1000, about 10000...).
You could also add a field to all documents with a random number and query by that number. The drawback is that you will get the same results every time you run the same query. To fix that you can always play with limit and skip, or even with sort; you could as well update those random numbers every time you fetch a record (which implies more queries).
I don't know if you are using Mongoose, Mongoid, or directly the Mongo driver for a specific language, so I'll write everything for the mongo shell.
Thus your, let's say, product record would look like this:
{
  _id: ObjectId("..."),
  name: "Awesome Product",
  category: "Electronic Devices"
}
and I would suggest to use:
{
  _id: ObjectId("..."),
  name: "Awesome Product",
  category: "Electronic Devices",
  _random_sample: Math.random()
}
Then you could do:
db.products.find({category:"Electronic Devices",_random_sample:{$gte:Math.random()}})
Then you could run this periodically to update the documents' _random_sample field:
var your_query = {}; // updating every record would impact performance if there are a lot of records
your_query = { category: "Electronic Devices" }; // better: restrict the update
// upsert = false, multi = true
db.products.update(your_query, { $set: { _random_sample: Math.random() } }, false, true)
Or, whenever you retrieve some records, you could update all of them or just a few (depending on how many records you've retrieved):
for (var i = 0; i < records.length; i++) {
  var query = { _id: records[i]._id };
  // upsert = false, multi = false
  db.products.update(query, { $set: { _random_sample: Math.random() } }, false, false);
}
EDIT
Be aware that
db.products.update(your_query, { $set: { _random_sample: Math.random() } }, false, true)
won't work very well, as it will update every product that matches your query with the same random number. The last approach (updating documents as you retrieve them) works better.
Since 3.2 there is an easier way to get a random sample of documents from a collection:
$sample
New in version 3.2.
Randomly selects the specified number of documents from its input.
The $sample stage has the following syntax:
{ $sample: { size: <positive integer> } }
Source: MongoDB Docs
In this case:
db.products.aggregate([{$sample: {size: 10}}]);
Here is what I came up with in the end:
var numberOfItems = 10;
// get a list of all IDs
SchemaNameHere.find({}, { '_id': 1 }, function(err, data) {
  if (err) return res.send(err);
  // shuffle the array, as per https://github.com/coolaj86/knuth-shuffle
  var arr = shuffle(data.slice(0));
  // keep only the first numberOfItems of the shuffled array
  arr.splice(numberOfItems, arr.length - numberOfItems);
  // new array to store all items
  var return_arr = [];
  // use async.each, as per http://justinklemm.com/node-js-async-tutorial/
  async.each(arr, function(item, callback) {
    // get the items one by one and add them to return_arr
    SchemaNameHere.findById(item._id, function(err, data) {
      if (err) return callback(err);
      return_arr.push(data);
      // go on to the next item, or to the done function when finished
      callback();
    });
  }, function(err) {
    // runs after looping through all items in arr
    if (err) return res.send(err);
    res.json(return_arr);
  });
});
skip didn't work out for me. Here is what I wound up with:
var randomDoc = db.getCollection("collectionName").aggregate([
  {
    $match: {
      // criteria to filter matches
    }
  },
  {
    $sample: { size: 1 }
  }
]).result[0];
This gets a single random result matching the criteria.
$sample may not be best, as you wouldn't get virtuals that way.
Instead, create a function in your back end that shuffles the results, then return the shuffled array instead of the raw MongoDB result.
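A back-end shuffle like the one suggested is only a few lines of plain JavaScript (a standard Fisher-Yates shuffle); a sketch:

```javascript
// Fisher-Yates shuffle: returns a new, randomly ordered copy of the input,
// so virtuals and other properties on the documents are preserved as-is.
function shuffle(docs) {
  const arr = docs.slice();
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

// take 3 random items from a result set
const sample = shuffle([1, 2, 3, 4, 5]).slice(0, 3);
```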

How to change result position based off parameter in a mongodb / mongoose query?

So I am using mongoose and node.js to access a mongodb database. I want to bump each result up based on a number (they are ordered by date created if none are bumped). For example:
{ name: 'A', bump: 0 },
{ name: 'B', bump: 0 },
{ name: 'C', bump: 2 },
{ name: 'D', bump: 1 }
These would be retrieved in the order: C, A, D, B. How can this be accomplished (without iterating through every entry in the database)?
Try something like this. Store a counter tracking the total number of threads; call it thread_count, initially set to 0, so you have a document somewhere that looks like {thread_count: 0}.
Every time a new thread is created, first call findAndModify() using {$inc: {thread_count: 1}} as the modifier - i.e., increment the counter by 1 and return its new value.
Then when you insert the new thread, use the counter's new value as the value of a field in its document; call it post_order.
So each document you insert has a value 1 greater each time. For example, the first 3 documents you insert would look like this:
{name:'foo', post_order:1, created_at:... } // value of thread_count is at 1
{name:'bar', post_order:2, created_at:... } // value of thread_count is at 2
{name:'baz', post_order:3, created_at:... } // value of thread_count is at 3
etc.
So effectively you can query and sort by post_order ASCENDING, and it will return them in order from oldest to newest (or DESCENDING for newest to oldest).
Then to "bump" a thread in its sorting order when it gets upvoted, you can call update() on the document with {$inc:{post_order:1}}. This will advance it by 1 in the order of result sorting. If two threads have the same value for post_order, created_at will differentiate which one comes first. So you will sort by post_order, created_at.
You will want to have an index on post_order and created_at.
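A minimal in-memory simulation of the counter pattern (plain JavaScript; the store and its method names are illustrative, not part of the answer) shows how the post_order values behave:

```javascript
// Simulates the atomic counter + insert pattern described above.
// In MongoDB, the increment would be findAndModify with {$inc: ...}.
function makeThreadStore() {
  let threadCount = 0; // the { thread_count: 0 } counter document
  let clock = 0;       // deterministic stand-in for created_at timestamps
  const threads = [];
  return {
    insertThread(name) {
      threadCount += 1; // findAndModify({ update: { $inc: { thread_count: 1 } }, new: true })
      threads.push({ name: name, post_order: threadCount, created_at: ++clock });
    },
    bump(name) {
      // update({ name: name }, { $inc: { post_order: 1 } })
      threads.find(t => t.name === name).post_order += 1;
    },
    ordered() {
      // sort by post_order, then created_at (oldest first)
      return threads
        .slice()
        .sort((a, b) => (a.post_order - b.post_order) || (a.created_at - b.created_at))
        .map(t => t.name);
    }
  };
}

const store = makeThreadStore();
["foo", "bar", "baz"].forEach(n => store.insertThread(n));
store.bump("foo");
store.bump("foo"); // foo now has post_order 3, tying baz; created_at breaks the tie
// store.ordered() is now ["bar", "foo", "baz"]
```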
Let's say your result is in the variable response (which is an array); then I would do:
response.sort(function(obj1, obj2) {
  return obj2.bump - obj1.bump;
});
or if you also want to take name order into account:
response.sort(function(obj1, obj2) {
  var diff = obj2.bump - obj1.bump;
  var nameDiff = (obj2.name > obj1.name) ? -1 : ((obj2.name < obj1.name) ? 1 : 0);
  return (diff == 0) ? nameDiff : diff;
});
Not a pleasant answer, but the solution you request is unrealistic. Here's my suggestion:
Add an OrderPosition property to your object instead of Bump.
Think of "bumping" as an event. It is best represented as an event-handler function. When an item gets "bumped" by whatever trigger in your business logic, the collection of items needs to be adjusted.
var currentOrder = this.OrderPosition;
this.OrderPosition = currentOrder - bump; // moves your object up the list
// write a foreach loop here, iterating every item AFTER this item's
// unadjusted order, +1 to move them all down the list one notch
This does require iterating through many items, and I know you are trying to prevent that, but I do not think there is any other way to safely ensure the integrity of the item ordering - especially relative to other collections pulled later down the road.
I don't think a purely query-based solution is possible with your document schema (I assume you have createdDate and bump fields). Instead, I suggest a single field called sortorder to keep track of your desired retrieval order:
sortorder is initially the creation timestamp. If there are no "bumps", sorting by this field gives the correct order.
If there is a "bump," the sortorder is invalidated. So simply correct the sortorder values: each time a "bump" occurs swap the sortorder fields of the bumped document and the document directly ahead of it. This literally "bumps" the document up in the sort order.
When querying, sort by sortorder.
You can remove fields bump and createdDate if they are not used elsewhere.
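The swap step can be sketched in plain JavaScript (in MongoDB it would be two updates on the two documents involved; the helper name is illustrative):

```javascript
// Simulates the "bump": exchange sortorder values with the document
// directly ahead in the current sort order.
function bump(docs, name) {
  const sorted = docs.slice().sort((a, b) => a.sortorder - b.sortorder);
  const i = sorted.findIndex(d => d.name === name);
  if (i > 0) {
    const ahead = sorted[i - 1];
    const me = sorted[i];
    [me.sortorder, ahead.sortorder] = [ahead.sortorder, me.sortorder];
  }
}

const docs = [
  { name: "A", sortorder: 100 }, // oldest
  { name: "B", sortorder: 200 },
  { name: "C", sortorder: 300 }
];
bump(docs, "C"); // C swaps sortorder with B
const order = docs
  .slice()
  .sort((a, b) => a.sortorder - b.sortorder)
  .map(d => d.name); // ["A", "C", "B"]
```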
As an aside, most social sites don't directly manipulate a post's display position based on its number of votes (or "bumps"). Instead, the number of votes is used to calculate a score. Then the posts are sorted and displayed by this score. In your case, you should combine createdDate and bumps into a single score that can be sorted in a query.
This site (StackOverflow.com) had a related meta discussion about how to determine "hot" questions. I think there was even a competition to come up with a new formula. The meta question also shared the formulas used by two other popular social news sites: Y Combinator Hacker News and Reddit.
