Ordering by several keys in MongoDB - javascript

I would like to order a collection based on several keys. E.g.:
db.collection.find().sort( { age: -1, score: 1 } );
This is all fine and dandy. However, I would like to guarantee that age comes first, and score comes next.
In JavaScript, the order in which an object's keys are listed is not guaranteed.
The explanation of sort in the official MongoDB documentation is not exactly clear.
So... is the way I showed actually reliable in terms of order in which the arguments are taken?
Merc.

The syntax you use only works if your JavaScript implementation preserves the order of keys in objects. Most do, most of the time, but it's up to you to be sure.
For the official Node.js driver, there is an alternate form that works with an Array:
db.collection.find().sort([['age', -1], ['score', 1]]);
Drivers for other languages might have something similar.
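As a minimal plain-JavaScript sketch of the concern (no MongoDB driver involved): in ES2015 and later engines, ordinary string keys are listed in insertion order, but integer-like keys are moved to the front in ascending order. An array of pairs states the priority explicitly. The describeSort helper is hypothetical and only there for illustration.

```javascript
// Plain-JavaScript illustration; nothing here talks to MongoDB.
const byName = { age: -1, score: 1 };
const byIndex = { 2: -1, 1: 1 };

// Ordinary string keys keep their insertion order...
const nameOrder = Object.keys(byName);
// ...but integer-like keys are listed first, in ascending numeric order.
const indexOrder = Object.keys(byIndex);

// An array of [field, direction] pairs makes the priority explicit.
const sortPairs = [["age", -1], ["score", 1]];

// Hypothetical helper: render the pairs in priority order for logging.
function describeSort(pairs) {
  return pairs
    .map(([field, direction]) => `${field} ${direction === -1 ? "desc" : "asc"}`)
    .join(", ");
}

console.log(nameOrder, indexOrder, describeSort(sortPairs));
```

So field names such as age and score are listed in the order written, while a document whose field names look like integers would not be.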

The aggregation documentation actually provides a little bit more information on the subject, in which it states:
db.users.aggregate(
{ $sort : { age : -1, posts: 1 } }
);
This operation sorts the documents in the users collection, in
descending order according by the age field and then in ascending
order according to the value in the posts field
Which would indicate that the .sort() method functions in the same way.

Related

Using Where and Order by different fields in Firestore query

I have a Firestore collection named channels, and I'd like to get the list of channels based on an array of IDs and order it by the createdAt field, this is my function :
const getChannels = () => {
const q = query(
collection(db, "channels"),
where(documentId(), "in", [
"F0mnR5rNdhwSLPZ57pTP",
"G8p6TWSopLN4dNHJLH8d",
"wMWMlJwa3m3lYINNjCLT",
]),
orderBy("createdAt")
);
const unsubscribe = onSnapshot(q, (snapshot) => {
snapshot.docs.map((doc) => {
console.log(doc.data());
});
});
return unsubscribe;
};
But I'm getting this error
FirebaseError: inequality filter property and first sort order must be the same: __name__ and createdAt.
It only works if I orderBy documentId().
I'm aware there is a limitation in the docs about this, but I'm wondering if there is a workaround for this type of situation.
Also the answer for this question isn't working anymore I guess.
The title of your question indicates that you are trying to use where and orderBy for different fields. But note that you are using documentId() in the where condition to filter, which is not a field in the Firestore document.
So if your filter is based on documentId(), you can only use documentId() in the orderBy() clause, and only in ascending order, because Firestore currently does not support sorting in descending order of documentId(), as mentioned in this answer.
Let’s take a look at the following examples -
const data=await db.collection("users").where(admin.firestore.FieldPath.documentId(),"in",["104","102","101"]).orderBy(admin.firestore.FieldPath.documentId()).get();
The above will work and sort the documents based on documentId() after filtering based on documentId().
But applying an orderBy() clause on documentId() is redundant here, because by default a Firestore query returns documents in ascending order of documentId(). That means the following yields the same result -
const data=await db.collection("users").where(admin.firestore.FieldPath.documentId(),"in",["104","102","101"]).get();
Firestore doesn't support sorting in descending order of documentId(), which means the following will not work -
const data=await db.collection("users").where(admin.firestore.FieldPath.documentId(),"in",["104","102","101"]).orderBy(admin.firestore.FieldPath.documentId(),"desc").get();
This will ask to create an index -
The query requires an index. You can create it here:
But if you go there to create an index it will say -
__name__ only indexes are not supported.
Now let's come to your query. What you are trying to do is filter on documentId() and then orderBy() the createdAt field, which is not possible and gives the following error -
inequality filter property and first sort order must be the same.
You may think to use two orderBy() clauses, something like this -
const data=await db.collection("users").where(admin.firestore.FieldPath.documentId(),"in",["104","102","101"]).orderBy(admin.firestore.FieldPath.documentId()).orderBy("createdAt").get();
Which will not work and give the following error
order by clause cannot contain more fields after the key
I am not sure of your use case but it’s not a great idea to filter based on documentId(). If it is required to filter based on documentId(), I would suggest creating a field in the Firestore document which will contain the documentIds and filter based on that.
Now considering the title of the question, yes it is possible to use where() and orderBy() clauses for different fields in Firestore. There are some limitations and you need to stick to that -
If you include a filter with a range comparison (<, <=, >, >=), your first ordering must be on the same field.
const data=await db.collection("users").where("number",">=", "101").orderBy("createdAt").get();
The above query doesn't work.
const data=await db.collection("users").where("number",">=", "101").orderBy("number").get();
The above query works, and you can still add further orderBy() clauses on different fields, something like the following -
const data=await db.collection("users").where("number",">=", "101").orderBy("number").orderBy("createdAt").get();
You cannot order your query by any field included in an equality (=) or in clause.
const data=await db.collection("users").where("number","in",["104","102","101"]).orderBy("number").get();
const data=await db.collection("users").where("number","==", "101").orderBy("number").get();
The above two don’t work.
Firestore's speed and efficiency come almost ENTIRELY from its use of indexes. Inequalities (INCLUDING in and not-in) are accomplished by sorting by the index, and using the value as a "cut-off" - thus REQUIRING (whether you want it or not) the orderBy() to be on the same field as the inequality.
The "answer not working anymore" was never really working in the first place, as the above shows. If you aren't trying to paginate, do the obvious and "filter" by the document IDs and sort on the client.
BUT...
...more importantly, it is ALMOST NEVER useful nor performant to use documentId's to select from the database, unless you both copy it to a field, AND are looking for a SPECIFIC id. In almost all cases, it would be FAR better to use a query on another field (however you got the list of documentId's in the first place), then orderBy. Yes, the inequality/orderBy is a limitation, but it's there for a reason.
Going forward, an important design decision is to understand what questions you want your data to answer, and design your entire database schema to support those queries - this is the fundamental nature of NoSQL.
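The client-side sort suggested in this answer can be sketched roughly as follows. It assumes the documents have already been fetched with the documentId() filter alone and mapped to plain objects, and that createdAt has been converted to epoch milliseconds. The sortByCreatedAt helper and the timestamp values are made up for illustration.

```javascript
// Hypothetical shape: each item is doc.data() plus its id, with createdAt
// already converted to epoch milliseconds (an assumption of this sketch).
function sortByCreatedAt(channels, direction = "asc") {
  const sign = direction === "desc" ? -1 : 1;
  // Copy first so the caller's array is not mutated.
  return [...channels].sort((a, b) => sign * (a.createdAt - b.createdAt));
}

// Made-up sample data standing in for the snapshot contents.
const fetched = [
  { id: "wMWMlJwa3m3lYINNjCLT", createdAt: 1700000300000 },
  { id: "F0mnR5rNdhwSLPZ57pTP", createdAt: 1700000100000 },
  { id: "G8p6TWSopLN4dNHJLH8d", createdAt: 1700000200000 },
];

const ordered = sortByCreatedAt(fetched);
console.log(ordered.map((c) => c.id));
```

With an "in" filter limited to a short list of IDs, the result set is small, so sorting it in memory should be cheap.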
Problem: The other link that you shared works perfectly, and the only solution available is to create an index. However, the reason you are not able to combine where and orderBy in the above example is that you cannot create an index on the document ID and createdAt.
Solution: Add the document ID as a field, say docID, in the document, then create an index on the fields docID and createdAt. This should work for you.
Note: I have not physically tested this. Will update once I have checked it

How to do an 'AND' statement in Firebase or equivalent?

I need to do a query where I can show only specific data using an 'AND' statement or equivalent to it. I have taken the example which is displayed in the Firebase Documentation.
// Find all dinosaurs whose height is exactly 25 meters.
var ref = firebase.database().ref("dinosaurs");
ref.orderByChild("height").equalTo(25).on("child_added", function(snapshot) {
console.log(snapshot.key);
});
I understand this line is going to retrieve all the dinosaurs whose height is exactly 25, BUT, I need to show all dinosaurs whose height is '25' AND name is 'Dino'. Is there any way to retrieve this information?
Thanks in advance.
Actually, Firebase only supports filtering/ordering on one property, but if you want to filter on more than one property, such as height and name in your case, you have to use composite keys.
There is a third-party library called Querybase which gives you some multi-property filtering capabilities. See https://github.com/davideast/Querybase
You cannot query by multiple keys.
If you need to sort by two properties your options are:
Create a hybrid key. In reference to your example, if you wanted to get all 'Dino' with height '25' then you would create a hybrid name_height key which could look something like Dino_25. This will allow you to query and search for items with exactly the same value but you lose the ability for ordering (i.e. height less than x).
Perform one query on Firebase and the other client side. You can query by name on Firebase and then iterate through the results and keep the results that match height 25.
Without knowing much about your schema I would advise you to make sure you're flattening your data sufficiently. Often I have found that many multi-level queries can be solved by looking at how I'm storing the data. This is not always the case and sometimes you may just have to take one of the routes I have mentioned above.
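Both options can be sketched in plain JavaScript on in-memory data. This is only an illustration of the idea, not Firebase code: the compositeKey helper, the name_height property (named after the question's height filter) and the sample records are all hypothetical.

```javascript
// Hypothetical helper: combine two properties into one queryable value.
function compositeKey(name, height) {
  return `${name}_${height}`;
}

// Made-up sample records, each storing its precomputed composite key.
const dinosaurs = [
  { name: "Dino", height: 25 },
  { name: "Dino", height: 30 },
  { name: "Rex", height: 25 },
].map((d) => ({ ...d, name_height: compositeKey(d.name, d.height) }));

// Option 1: exact match on the composite key (the role equalTo would play).
const byKey = dinosaurs.filter((d) => d.name_height === compositeKey("Dino", 25));

// Option 2: one condition stands in for the server query, the other runs on the client.
const heightMatches = dinosaurs.filter((d) => d.height === 25);
const byClientFilter = heightMatches.filter((d) => d.name === "Dino");

console.log(byKey, byClientFilter);
```

The composite key has to be written whenever either property changes, which is the main cost of the first option.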

Range query for MongoDB pagination

I want to implement pagination on top of a MongoDB. For my range query, I thought about using ObjectIDs:
db.tweets.find({ _id: { $lt: maxID } }, { limit: 50 })
However, according to the docs, the structure of the ObjectID means that "ObjectId values do not represent a strict insertion order":
The relationship between the order of ObjectId values and generation time is not strict within a single second. If multiple systems, or multiple processes or threads on a single system generate values, within a single second; ObjectId values do not represent a strict insertion order. Clock skew between clients can also result in non-strict ordering even for values, because client drivers generate ObjectId values, not the mongod process.
I then thought about querying with a timestamp:
db.tweets.find({ created: { $lt: maxDate } }, { limit: 50 })
However, there is no guarantee the date will be unique — it's quite likely that two documents could be created within the same second. This means documents could be missed when paging.
Is there any sort of ranged query that would provide me with more stability?
It is perfectly fine to use ObjectId() though your syntax for pagination is wrong. You want:
db.tweets.find().limit(50).sort({"_id":-1});
This says you want tweets sorted by _id value in descending order and you want the most recent 50. Your problem is the fact that pagination is tricky when the current result set is changing - so rather than using skip for the next page, you want to make note of the smallest _id in the result set (the 50th most recent _id value) and then get the next page with:
db.tweets.find( {_id : { "$lt" : <50th _id> } } ).limit(50).sort({"_id":-1});
This will give you the next "most recent" tweets, without new incoming tweets messing up your pagination back through time.
There is absolutely no need to worry about whether _id value is strictly corresponding to insertion order - it will be 99.999% close enough, and no one actually cares on the sub-second level which tweet came first - you might even notice Twitter frequently displays tweets out of order, it's just not that critical.
If it is critical, then you would have to use the same technique but with "tweet date" where that date would have to be a timestamp, rather than just a date.
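The paging pattern described above can be simulated on an in-memory array to see why new inserts do not disturb it. This is a sketch under assumptions: numeric _id values stand in for ObjectIds (larger means newer), and pageBefore is a hypothetical helper that mimics find({_id: {$lt: lastId}}).sort({_id: -1}).limit(n).

```javascript
// In-memory stand-in for the collection; the numeric _id values are made up.
const tweets = Array.from({ length: 7 }, (_, i) => ({ _id: i + 1 }));

// Hypothetical helper mimicking find({_id: {$lt: lastId}}).sort({_id: -1}).limit(n).
// Passing null for lastId returns the newest page.
function pageBefore(docs, lastId, pageSize) {
  return docs
    .filter((d) => lastId === null || d._id < lastId)
    .sort((a, b) => b._id - a._id)
    .slice(0, pageSize);
}

const first = pageBefore(tweets, null, 3);
const cursor = first[first.length - 1]._id; // smallest _id on the first page
const second = pageBefore(tweets, cursor, 3);

// A tweet arriving between requests does not shift the second page.
tweets.push({ _id: 8 });
const secondAgain = pageBefore(tweets, cursor, 3);

console.log(first, second, secondAgain);
```

With skip(3) instead of the cursor, the new tweet would have pushed one already-seen document onto the second page.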
Wouldn't a tweet's "actual" timestamp (i.e. time tweeted and the criteria you want it sorted by) be different from a tweet's "insertion" timestamp (i.e. time added to local collection)? This depends on your application, of course, but it's a likely scenario that tweet inserts could be batched or otherwise end up being inserted in the "wrong" order. So, unless you work at Twitter (and have access to collections inserted in correct order), you wouldn't be able to rely just on $natural or ObjectID for sorting logic.
Mongo docs suggest skip and limit for paging:
db.tweets.find({created: {$lt: maxID}}).
sort({created: -1, username: 1}).
skip(50).limit(50); //second page
There is, however, a performance concern when using skip:
The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get the offset or skip position before beginning to return result. As offset increases, cursor.skip() will become slower and more CPU intensive.
This happens because skip does not fit into the MapReduce model and is not an operation that would scale well, you have to wait for a sorted collection to become available before it can be "sliced". Now limit(n) sounds like an equally poor method as it applies a similar constraint "from the other end"; however with sorting applied, the engine is able to somewhat optimize the process by only keeping in memory n elements per shard as it traverses the collection.
An alternative is to use range based paging. After retrieving the first page of tweets, you know what the created value is for the last tweet, so all you have to do is substitute the original maxID with this new value:
db.tweets.find({created: {$lt: lastTweetOnCurrentPageCreated}}).
sort({created: -1, username: 1}).
limit(50); //next page
Performing a find condition like this can be easily parallelized. But how to deal with pages other than the next one? You don't know the begin date for pages number 5, 10, 20, or even the previous page! @SergioTulentsev suggests creative chaining of methods but I would advocate pre-calculating first-last ranges of the aggregate field in a separate pages collection; these could be re-calculated on update. Furthermore, if you're not happy with DateTime (note the performance remarks) or are concerned about duplicate values, you should consider compound indexes on timestamp + account tie (since a user can't tweet twice at the same time), or even an artificial aggregate of the two:
db.pages.
find({pagenum: 3})
> {pagenum: 3, begin: "01-01-2014#BillGates", end: "03-01-2014#big_ben_clock"}
db.tweets.
find({_sortdate: {$lt: "03-01-2014#big_ben_clock", $gt: "01-01-2014#BillGates"}}).
sort({_sortdate: -1}).
limit(50) //third page
Using an aggregate field for sorting will work "on the fold" (although perhaps there are more kosher ways to deal with the condition). This could be set up as a unique index with values corrected at insert time, with a single tweet document looking like
{
_id: ...,
created: ..., //to be used in markup
user: ..., //also to be used in markup
_sortdate: "01-01-2014#BillGates" //sorting only, use date AND time
}
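One way to build such an aggregate sort key is sketched below. Note the swap: instead of the day-first date strings shown above, which do not compare chronologically as strings, this sketch uses ISO 8601 UTC timestamps, which do. The makeSortKey helper and the sample times are hypothetical.

```javascript
// Hypothetical helper: ISO 8601 UTC timestamps compare correctly as strings,
// and the username breaks ties between tweets created at the same instant.
function makeSortKey(created, username) {
  return `${created.toISOString()}#${username}`;
}

// Made-up sample times.
const keyA = makeSortKey(new Date(Date.UTC(2014, 0, 1, 12, 0, 0)), "BillGates");
const keyB = makeSortKey(new Date(Date.UTC(2014, 0, 3, 9, 30, 0)), "big_ben_clock");
// Same instant as keyB, different user: still a distinct, comparable key.
const keyC = makeSortKey(new Date(Date.UTC(2014, 0, 3, 9, 30, 0)), "BillGates");

console.log(keyA, keyB, keyC);
```

Because the timestamp includes the time of day and the key ends with the username, two tweets would only collide if the same user tweeted twice in the same millisecond.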
The following approach will work even if multiple documents are inserted/updated in the same millisecond, even from multiple clients (each of which generates ObjectIds). For simplicity, in the following queries I am projecting only _id and lastModifiedDate.
For the first page, fetch the results sorted by lastModifiedDate (descending), then ObjectId (ascending).
db.product.find({},{"_id":1,"lastModifiedDate":1}).sort({"lastModifiedDate":-1, "_id":1}).limit(2)
Note down the ObjectId and lastModifiedDate of the last record fetched in this page. (loid, lmd)
For the second page, add a query condition to match documents where (lastModifiedDate = lmd AND _id > loid) OR (lastModifiedDate < lmd).
db.product.find({$or:[{"lastModifiedDate":{$lt:lmd}},{$and:[{"lastModifiedDate":lmd},{"_id":{$gt:loid}}]}]},{"_id":1,"lastModifiedDate":1}).sort({"lastModifiedDate":-1, "_id":1}).limit(2)
Repeat the same for subsequent pages.
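The page condition can be wrapped in a small function so each request only passes the last (lmd, loid) pair it saw. This is a sketch: nextPageFilter and isAfterCursor are hypothetical helpers, and plain numbers stand in for dates and ObjectIds. The filter is intended for sort({lastModifiedDate: -1, _id: 1}).

```javascript
// Hypothetical helper: filter for the page after the cursor (lmd, loid).
function nextPageFilter(lmd, loid) {
  return {
    $or: [
      { lastModifiedDate: { $lt: lmd } },
      { lastModifiedDate: lmd, _id: { $gt: loid } },
    ],
  };
}

// The same condition as a plain predicate, to check the logic in memory.
function isAfterCursor(doc, lmd, loid) {
  return (
    doc.lastModifiedDate < lmd ||
    (doc.lastModifiedDate === lmd && doc._id > loid)
  );
}

// Made-up documents, already in sort order; two share a modification time.
const products = [
  { _id: 1, lastModifiedDate: 300 },
  { _id: 2, lastModifiedDate: 200 },
  { _id: 3, lastModifiedDate: 200 },
  { _id: 4, lastModifiedDate: 100 },
];

// The first page ended at {_id: 2, lastModifiedDate: 200}.
const remaining = products.filter((p) => isAfterCursor(p, 200, 2));
console.log(JSON.stringify(nextPageFilter(200, 2)), remaining);
```

The tie on lastModifiedDate is what the second branch handles: document 3 is neither skipped nor repeated.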
ObjectIds should be good enough for pagination if you limit your queries to the previous second (or don't care about the subsecond possibility of weirdness). If that is not good enough for your needs then you will need to implement an ID generation system that works like an auto-increment.
Update:
To query the previous second of ObjectIds you will need to construct an ObjectID manually.
See the specification of ObjectId http://docs.mongodb.org/manual/reference/object-id/
Try using this expression to do it from a mongos.
{ _id :
{
$lt : ObjectId(Math.floor((new Date).getTime()/1000 - 1).toString(16)+"ffffffffffffffff")
}
}
The 'f's at the end are to max out the possible random bits that are not associated with a timestamp since you are doing a less than query.
I recommend doing the actual ObjectId creation on your application server rather than on the mongos since this type of calculation can slow you down if you have many users.
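On the application side, the hex string for that boundary ObjectId could be built roughly like this. The objectIdHexBefore helper is hypothetical; the resulting 24-character string would then be passed to whatever ObjectId constructor your driver provides.

```javascript
// Hypothetical helper: hex string of the largest ObjectId whose timestamp
// is one second before `now`.
function objectIdHexBefore(now) {
  const seconds = Math.floor(now.getTime() / 1000) - 1;
  // 4-byte timestamp = 8 hex digits; the remaining 8 bytes are maxed out.
  return seconds.toString(16).padStart(8, "0") + "f".repeat(16);
}

// Fixed sample instant so the result is reproducible: 0x5f000001 seconds after the epoch.
const boundaryHex = objectIdHexBefore(new Date(0x5f000001 * 1000));
console.log(boundaryHex);
```

Padding the timestamp to 8 digits keeps the string at 24 characters even for small test timestamps.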
I have built pagination using the MongoDB _id this way.
// import ObjectId from mongodb
let sortOrder = -1;
let query = {}; // find() expects a filter object, not an array
if (prev) {
  sortOrder = 1;
  query = {title: 'findTitle', _id: {$gt: ObjectId('_idValue')}};
}
if (next) {
  sortOrder = -1;
  query = {title: 'findTitle', _id: {$lt: ObjectId('_idValue')}};
}
db.collection.find(query).limit(10).sort({_id: sortOrder})

Meteor: Mongo sort on subkey

In Meteor's "Parties" example, there is a Party model which is represented by a document of the following schema:
Each party is represented by a document in the Parties collection:
owner: user id
x, y: Number (screen coordinates in the interval [0, 1])
title, description: String
public: Boolean
invited: Array of user id's that are invited (only if !public)
rsvps: Array of objects like {user: userId, rsvp: "yes"} (or "no"/"maybe")
I would like to find all Parties, and sort by the "rsvps" based on a specific user. For example, something like this:
Meteor.find({sort: {rsvps: {user: 'myself', rsvp: 'yes'}}})
But of course, this does not work, as it does not follow the sort specifier syntax. Also, there is a note in the same docs that say Minimongo (the local Mongo implementation on the client) does not support sorting on subkeys. However, I don't think the issue is simply sorting on subkeys, as I need to find a specific subkey and then sort on a different sibling subkey (whether they are attending or not, the rsvps.rsvp subkey).
Are there any ways, or workarounds, to achieve the sorted collection?
The minimongo sort file contains this comment :
// XXX sort does not yet support subkeys ('a.b') .. fix that!
So sadly it isn't supported at the moment. Although I have this pull request from which you can take the needed parts to implement this feature.
Check it out here :
https://github.com/meteor/meteor/pull/443
Lander Van Breda
Another option is to get the data out of the Cursor with '.fetch()' and then use something like underscore.js's _.sortBy to sort the resulting Array.
The resulting custom sorted array can then be passed on to handlebars and will retain its reactive features as well in Meteor.
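A sketch of that client-side sort, applied to the array returned by fetch(). It uses the built-in Array.prototype.sort rather than underscore's _.sortBy. The ranking of replies (yes, then maybe, then no, then no reply) is an assumption, and the helper names and sample parties are hypothetical.

```javascript
// Assumed ranking: "yes" first, then "maybe", then "no"; no reply sorts last.
const RSVP_RANK = new Map([["yes", 0], ["maybe", 1], ["no", 2]]);

// Hypothetical helper: rank one party by the given user's reply.
function rsvpRank(party, userId) {
  const entry = (party.rsvps || []).find((r) => r.user === userId);
  return entry && RSVP_RANK.has(entry.rsvp) ? RSVP_RANK.get(entry.rsvp) : 3;
}

// Hypothetical helper: return a sorted copy, leaving the fetched array untouched.
function sortPartiesByRsvp(parties, userId) {
  return [...parties].sort((a, b) => rsvpRank(a, userId) - rsvpRank(b, userId));
}

// Made-up sample data in the shape described by the schema above.
const parties = [
  { title: "A", rsvps: [{ user: "myself", rsvp: "no" }] },
  { title: "B", rsvps: [] },
  { title: "C", rsvps: [{ user: "myself", rsvp: "yes" }, { user: "other", rsvp: "no" }] },
  { title: "D", rsvps: [{ user: "myself", rsvp: "maybe" }] },
];

const sortedTitles = sortPartiesByRsvp(parties, "myself").map((p) => p.title);
console.log(sortedTitles);
```

Only the reply of the given user affects the rank, so other users' entries in rsvps are ignored.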

What is the maximum value for a compound CouchDB key?

I'm using what seems to be a common trick for creating a join view:
// a Customer has many Orders; show them together in one view:
function(doc) {
if (doc.Type == "customer") {
emit([doc._id, 0], doc);
} else if (doc.Type == "order") {
emit([doc.customer_id, 1], doc);
}
}
I know I can use the following query to get a single customer and all related Orders:
?startkey=["some_customer_id"]&endkey=["some_customer_id", 2]
But now I've tied my query very closely to my view code. Is there a value I can put where I put my "2" to more clearly say, "I want everything tied to this Customer"? I think I've seen
?startkey=["some_customer_id"]&endkey=["some_customer_id", {}]
But I'm not sure that {} is certain to sort after everything else.
Credit to cmlenz for the join method.
Further clarification from the CouchDB wiki page on collation:
The query startkey=["foo"]&endkey=["foo",{}] will match most array keys with "foo" in the first element, such as ["foo","bar"] and ["foo",["bar","baz"]]. However it will not match ["foo",{"an":"object"}]
So {} is late in the sort order, but definitely not last.
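For reference, the query string from the question can be built like this, with the keys JSON-encoded and then URL-encoded. This is a sketch: customerRangeQuery is a hypothetical helper, and it inherits the caveat above that {} does not sort after keys whose second element is a non-empty object.

```javascript
// Hypothetical helper: query string selecting all rows for one customer.
// Caveat from the collation rules: [id, {"an": "object"}] would not match.
function customerRangeQuery(customerId) {
  const startkey = JSON.stringify([customerId]);
  const endkey = JSON.stringify([customerId, {}]);
  return `startkey=${encodeURIComponent(startkey)}&endkey=${encodeURIComponent(endkey)}`;
}

const queryString = customerRangeQuery("some_customer_id");
console.log(queryString);
```

Since the view in the question only emits 0 or 1 as the second element, the {} end key is enough for that view.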
I have two thoughts.
Use timestamps
Instead of using simple 0 and 1 for their collation behavior, use a timestamp that the record was created (assuming they are part of the records) a la [doc._id, doc.created_at]. Then you could query your view with a startkey of some sufficiently early date (epoch would probably work), and an endkey of "now", eg date +%s. That key range should always include everything, and it has the added benefit of collating by date, which is probably what you want anyways.
or, just don't worry about it
You could just index by the customer_id and nothing more. This would have the nice advantage of being able to query using just key=<customer_id>. Sure, the records won't be collated when they come back, but is that an issue for your application? Unless you are expecting tons of records back, it would likely be trivial to simply pluck the customer record out of the list once you have the data retrieved by your application.
For example in ruby:
customers, orders = records.partition { |record| record.type == "customer" }
Anyways, the timestamps is probably the more attractive answer for your case.
Rather than trying to find the greatest possible value for the second element in your array key, I would suggest instead trying to find the least possible value greater than the first: ?startkey=["some_customer_id"]&endkey=["some_customer_id\u0000"]&inclusive_end=false.
CouchDB is mostly written in Erlang. I don't think there would be an upper limit for a string compound/composite key tuple sizes other than system resources (e.g. a key so long it used all available memory). The limits of CouchDB scalability are unknown according to the CouchDB site. I would guess that you could keep adding fields into a huge composite primary key and the only thing that would stop you is system resources or hard limits such as maximum integer sizes on the target architecture.
Since CouchDB stores everything using JSON, it is probably limited to the largest number values allowed by the ECMAScript standard. All numbers in JavaScript are stored as IEEE 754 double-precision floating-point values. I believe the 64-bit double can represent magnitudes from 5e-324 (the smallest positive value) up to 1.7976931348623157e+308, in either sign.
It seems like it would be nice to have a feature where endKey could be inclusive instead of exclusive.
This should do the trick:
?startkey=["some_customer_id"]&endkey=["some_customer_id", "\uFFFF"]
This should include anything that starts with a character less than \uFFFF (all unicode characters)
http://wiki.apache.org/couchdb/View_collation
