Firestore OR query - javascript

Imagine I have a collection in Firestore with objects of the following format:
{
/* ... */
ownerId: "id-1",
memberIds: [
"id-2",
"id-3",
"id-4"
]
}
How can I query Firestore for all documents where the string id-1 is present in the ownerId field OR the memberIds field?
I only found AND queries, like so:
return collection('whatever')
.where('memberIds', 'array-contains', 'id-1')
.where('ownerId', '==', 'id-1');
But those will only return me the documents where id-1 is included in both the ownerId field and the memberIds array.
I understand that I could perform multiple queries and join the results, but that would make pagination and ordering too much of a hassle to implement.

It's not possible in Firestore to execute an OR query like the one you describe.
One solution is to denormalize your data (a very common approach in the NoSQL world) and add, to the document, an extra field which "concatenates" the two other fields, as follows:
{
/* ... */
ownerId: "id-1",
memberIds: [
"id-2",
"id-3",
"id-4"
],
ownersAndMemberIds: [
"id-1",
"id-2",
"id-3",
"id-4"
]
}
Then you can easily query the doc with
return collection('whatever')
.where('ownersAndMemberIds', 'array-contains', 'id-1');
Of course it requires that you maintain this array aligned with the other fields, but this is not difficult. Either you do it from your front-end when you update one of the two "main" fields, or through a Cloud Function which is triggered on any change to the doc.
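As a minimal sketch of keeping the combined array aligned, the extra field can be derived from the two "main" fields every time the document is written. The helper name `buildOwnersAndMemberIds` is an assumption made for illustration, not part of any Firestore API:

```javascript
// Hypothetical helper: derives the combined array from the two "main" fields.
function buildOwnersAndMemberIds(ownerId, memberIds) {
  // A Set drops duplicates, e.g. when the owner is also listed as a member.
  return [...new Set([ownerId, ...(memberIds || [])])];
}

const doc = { ownerId: "id-1", memberIds: ["id-2", "id-3", "id-4"] };

// The object you would write to Firestore instead of the raw document.
const toWrite = {
  ...doc,
  ownersAndMemberIds: buildOwnersAndMemberIds(doc.ownerId, doc.memberIds),
};

console.log(toWrite.ownersAndMemberIds);
```

The same function could run in the front-end before a write or inside a Cloud Function that reacts to document changes.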

Related

Populate field then find - mongoose

I have a Cheques and a Payees collection, every cheque has its corresponding Payee ID.
What I'm trying to do is write some queries on cheques, but I need to perform the search after populating the payee (to get the name):
const search = req.query.search || "";
const cheques = await Cheque
.find({
isCancelled: false,
dueDate: { $gte: sinceDate, $lte: tillDate }
})
.select("_id serial dueDate value payee")
.skip(page * limit)
.limit(limit)
.sort({ dueDate: -1, serial: 1 })
.populate({
path: "payee",
select: "name"
})
I guess what I'm trying to do is fit this somewhere in my code,
match: {
name: { $regex: search, $options: "i" }
},
I have tried to put the match within the populate, but then it will still find all cheques even if they don't satisfy the population match but populate as null.
I hate this answer and I hope someone is going to post a better one, but I've surfed the web for that with no luck.
The only method I was able to find is to use the $lookup method in aggregation.
So you'll have to change your code from calling .find() to .aggregate().
That is not bad news: aggregation is great, stable and works without problems.
But I hated that, because it's going to change some patterns you might be following in your code.
const search = req.query.search || "";
const cheques = await Cheque
.aggregate([
{
$lookup: { // similar to .populate() in mongoose
from: 'payees', // the other collection name
localField: 'payee', // the field in the current collection that references the other collection
foreignField: '_id', // the field in the other collection that localField is matched against
as: 'payee' // the output field. this overwrites the payee id with the matched documents in the response (it only writes to the response, not to the database, no worries)
}
},
{ // this is where you'll place the filter object you used to place inside .find()
$match: {
isCancelled: false,
dueDate: { $gte: sinceDate, $lte: tillDate },
'payee.branch': 'YOUR_FILTER' // this is how you access the actual object from the other collection after population, using the dot notation but inside a string.
}
},
{ // this is similar to .select()
$project: {_id: 1, serial: 1, dueDate: 1, value: 1, payee: 1}
},
{
$unwind: '$payee' // this picks the only object in the field payee: [ { payeeDoc } ] --> { payeeDoc }
}
])
.sort({ dueDate: -1, serial: 1 }) // stages run in the order they are chained, so sort before paginating
.skip(page * limit)
.limit(limit)
Notice how you can no longer chain .select() and .populate() on the model query the way you used to do it on .find(), because now you're using .aggregate() which returns a different class instance in mongoose.
You can call .project() instead of doing it inside the aggregation array if you want to, but as far as I know, you can't use the .select() method.
My opinion-based solution to this problem is to include the payee information you need for filtering in the Cheque collection.
In my scenario, this happened when I was filtering by user roles and permissions, so that one user cannot see what another one is seeing.
It's up to you, but this makes it easier later when you want to generate the reports (I assume you're working on a payment service).
The populate() feature provided by mongoose first fetches all the Cheques with given conditions and then makes another query to payee with _ids to populate the fields you wanted.
https://mongoosejs.com/docs/api.html#query_Query-populate
By putting match in populate you're filtering which cheques get their payee populated, but not the cheques themselves.
A simple solution for this is to filter out the cheques whose payee was populated as null and return the remaining ones for your use.
If you see more queries of this sort and/or the collection is huge, it's better you add the payee name in the Cheques collection itself if that fits your purpose.
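A minimal sketch of that post-filtering step, using plain objects in place of real Mongoose documents. The sample data and the helper name `dropUnmatched` are assumptions for illustration:

```javascript
// Shape of what .populate({ path: "payee", match: ... }) hands back:
// cheques whose payee did not match still arrive, with payee set to null.
const populatedCheques = [
  { serial: 1, payee: { name: "Alice" } },
  { serial: 2, payee: null },
  { serial: 3, payee: { name: "Alicia" } },
];

// Hypothetical helper: keeps only the cheques whose payee survived the match.
function dropUnmatched(cheques) {
  return cheques.filter((cheque) => cheque.payee !== null && cheque.payee !== undefined);
}

const matched = dropUnmatched(populatedCheques);
console.log(matched.map((cheque) => cheque.serial));
```

Because this filter runs after .skip() and .limit(), a page can end up with fewer than `limit` cheques, which is one reason the aggregation or denormalization routes may fit better for large collections.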

Pass query criteria to mongoDB aggregation

Our current setup is: SPA frontend, Azure functions with mongoose middleware, MongoDB
(Maybe first read the question***)
Since we have a lot of documents in our DB and our customer wants to query them we are facing the following problem:
The user is assigned to his organization. He wants to search for Doc1s he has not responded to.
Doc1
{
_id
organization -> partitionKey
content
}
By creating doc2 with reference to doc1 he can respond.
Doc2
{
_id
organization -> partitionKey
Doc1ref
content
}
We have a 1:n relationship.
At the moment we filter just by query criteria of doc1 with limit and skip options.
But the new requirement is to filter the same way by referring doc2s.
I was thinking of:
Doing it in my code => Problem: after we have read with limit=100 and I filter it by my code, the result is not 100 anymore.
Extending doc1 by doc2 arrays => Must be the last option
Dynamic aggregation, prepared in the code and executed at runtime => Don't want to use dynamic aggregations, and the benefits of mongoose are almost lost.
Create a MongoDB view with lookup aggregation (populating doc1 by doc1.respondedOrganizations) => The problem I see here is the performance when searching a lot of documents and then joining them by a non-partitionKey field.
*** So, I come to my question:
Is it possible to pass a virtual (not existing) query criteria...
doc1.find({ alreadyResponded : my.organization } )
...and use it as input variable in an aggregation
{
$lookup: {
from: Doc2s,
localField: _id,
foreignField: Doc1ref,
as: < output array field >,
pipeline: [{
$match: {
$organization: {
$eq: $$alreadyResponded
}
}
}]
}
}
It would reduce query performance extremely.
Thanks

Resolve nested GraphQL queries that don't match DB schema

TLDR: How does one resolve nested GraphQL queries where the GQL Schema departs from the data stored inside of a MongoDB database?
We've got an application where each user in our DB has array of foreign keys that reference other documents, in this case "pets." These pets are in a separate collection.
{
"humans": [
{
"id": "1",
"name": "bob",
"pets": [
"jBWMVGjm50l5LGwepDoty1",
"jBWMVGjm50l5LGwepDoty12"
]
}
]
}
I've got a GraphQL API in front of this DB and I'm trying to write my resolvers to handle nested queries. The problem is that our GraphQL schema does not match the DB schema. Inside of the Human type, the pets field expects an array of pets, like this:
type Human {
id: ID!
name: String!
pets: [Pet!] # This is an array of pet objects...
gender: String!
hair: String!
favoriteNum: Int!
alive: Boolean!
createdAt: Int!
}
Currently, the humans resolver will query for a human by his/her ID, and return that human, like so:
human(parent, { input }, { models }) {
return models.Human.findOne({ id: input.id });
}
The issue here, obviously, is that the returned human from the DB does not conform to the GQL schema. The array of "pets" is not an array of objects, it's an array of IDs. What is the proper way of resolving a query like this?
We've tried adding another DB call for the human's pets inside of the humans resolver. The problem here is that if someone makes a query for just the human's name, our resolver would have to go and fetch all of the pets data, even though our user did not request it...
The same problem would crop up again, however, if our pets had foreign keys! How do we resolve this issue?
That looks like a design question. GraphQL should always resolve a graph, so the caller of your query can be sure to receive all references from this object. Let's take your Human type and look at it as a caller. When the caller sees that there are pets in the human object, they expect to receive those objects by running a query like
query {
human(input: { id: "1" }) {
pets {
stuffOnlyPetsHave
}
}
}
So one approach is to query the pets from your database in the human resolver. That should be the default way to get the complete graph.
But if you want to avoid the database query on every human query, then you should do a conditional pet query in your frontend.
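Another option worth considering is a field-level resolver for Human.pets, since a GraphQL executor only calls a field's resolver when that field is actually selected. The sketch below uses in-memory stand-ins for the models; the data, the `findByIds` method and the `demo` function are assumptions made for illustration and are not part of the original code:

```javascript
// In-memory stand-ins for the database models (hypothetical data).
const humans = [{ id: "1", name: "bob", pets: ["p1", "p2"] }];
const pets = [
  { id: "p1", name: "Rex" },
  { id: "p2", name: "Tom" },
];
const models = {
  Human: { findOne: async ({ id }) => humans.find((h) => h.id === id) },
  Pet: { findByIds: async (ids) => pets.filter((p) => ids.includes(p.id)) },
};

const resolvers = {
  Query: {
    // Unchanged: returns the human as stored, with pets as an array of IDs.
    human(parent, { input }, { models }) {
      return models.Human.findOne({ id: input.id });
    },
  },
  Human: {
    // parent is the human document; parent.pets holds the stored pet IDs.
    pets(parent, args, { models }) {
      return models.Pet.findByIds(parent.pets);
    },
  },
};

// Manual walk-through of the calls an executor would make
// for a query that selects human { pets { name } }.
async function demo() {
  const human = await resolvers.Query.human(null, { input: { id: "1" } }, { models });
  return resolvers.Human.pets(human, {}, { models });
}

demo().then((result) => console.log(result.map((p) => p.name)));
```

With this layout, a query that asks only for the human's name never triggers the pets lookup, and the same pattern repeats one level down if pets hold foreign keys of their own.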

How to do a simple join in GraphQL?

I am very new in GraphQL and trying to do a simple join query. My sample tables look like below:
{
phones: [
{
id: 1,
brand: 'b1',
model: 'Galaxy S9 Plus',
price: 1000,
},
{
id: 2,
brand: 'b2',
model: 'OnePlus 6',
price: 900,
},
],
brands: [
{
id: 'b1',
name: 'Samsung'
},
{
id: 'b2',
name: 'OnePlus'
}
]
}
I would like to have a query to return a phone object with its brand name in it instead of the brand code.
E.g. If queried for the phone with id = 2, it should return:
{id: 2, brand: 'OnePlus', model: 'OnePlus 6', price: 900}
TL;DR
Yes, GraphQL does support a sort of pseudo-join. You can see the books and authors example below running in my demo project.
Example
Consider a simple database design for storing info about books:
create table Book ( id string, name string, pageCount string, authorId string );
create table Author ( id string, firstName string, lastName string );
Because we know that Author can write many Books that database model puts them in separate tables. Here is the GraphQL schema:
type Query {
bookById(id: ID): Book
}
type Book {
id: ID
title: String
pageCount: Int
author: Author
}
type Author {
id: ID
firstName: String
lastName: String
}
Notice there is no authorId on the Book type but a type Author. The database authorId column on the book table is not exposed to the outside world. It is an internal detail.
We can pull back a book and its author using this GraphQL query:
{
bookById(id:"book-1"){
id
title
pageCount
author {
firstName
lastName
}
}
}
The result nests the Author details:
{
"data": {
"bookById": {
"id": "book-1",
"title": "Harry Potter and the Philosopher's Stone",
"pageCount": 223,
"author": {
"firstName": "Joanne",
"lastName": "Rowling"
}
}
}
}
The single GQL query resulted in two separate fetch-by-id calls into the database. When a single logical query turns into multiple physical queries we can quickly run into the infamous N+1 problem.
The N+1 Problem
In our case above a book can only have one author. If we only query one book by ID we only get a "read amplification" against our database of 2x. Imagine if you can query books with a title that starts with a prefix:
type Query {
booksByTitleStartsWith(titlePrefix: String): [Book]
}
Then we call it asking it to fetch the books with a title starting with "Harry":
{
booksByTitleStartsWith(titlePrefix:"Harry"){
id
title
pageCount
author {
firstName
lastName
}
}
}
In this GQL query we will fetch the books by a database query of title like 'Harry%' to get many books including the authorId of each book. It will then make an individual fetch by ID for every author of every book. This is a total of N+1 queries where the 1 query pulls back N records and we then make N separate fetches to build up the full picture.
The easy fix for that example is to not expose a field author on Book and force the person using your API to fetch all the authors in a separate query authorsByIds so we give them two queries:
type Query {
booksByTitleStartsWith(titlePrefix: String): [Book] # <- single database call
authorsByIds(authorIds: [ID]): [Author] # <- single database call
}
type Book {
id: ID
title: String
pageCount: Int
}
type Author {
id: ID
firstName: String
lastName: String
}
The key thing to note about that last example is that there is no way in that model to walk from one entity type to another. If the person using your API wants to load the books' authors at the same time, they simply call both queries in a single post:
query {
booksByTitleStartsWith(titlePrefix: "Harry") {
id
title
}
authorsByIds(authorIds: ["author-1","author-2","author-3"]) {
id
firstName
lastName
}
}
Here the person writing the query (perhaps using JavaScript in a web browser) sends a single GraphQL post to the server asking for both booksByTitleStartsWith and authorsByIds to be passed back at once. The server can now make two efficient database calls.
This approach shows that there is "no magic bullet" for how to map the "logical model" to the "physical model" when it comes to performance. This is known as the Object–relational impedance mismatch problem. More on that below.
Is Fetch-By-ID So Bad?
Note that the default behaviour of GraphQL is still very helpful. You can map GraphQL onto anything. You can map it onto internal REST APIs. You can map some types into a relational database and other types into a NoSQL database. These can be in the same schema and the same GraphQL end-point. There is no reason why you cannot have Author stored in Postgres and Book stored in MongoDB. This is because GraphQL doesn't by default "join in the datastore": it will fetch each type independently and build the response in memory to send back to the client. It may be the case that you can use a model that only joins to a small dataset that gets very good cache hits. You can then add caching into your system, avoid the problem, and still benefit from all the advantages of GraphQL.
What About ORM?
There is a project called Join Monster which does look at your database schema, looks at the runtime GraphQL query, and tries to generate efficient database joins on-the-fly. That is a form of Object Relational Mapping which sometimes gets a lot of "OrmHate". This is mainly due to Object–relational impedance mismatch problem.
In my experience, any ORM works if you write the database model to exactly support your object API. In my experience, any ORM tends to fail when you have an existing database model that you try to map with an ORM framework.
IMHO, if the data model is optimised without thinking about ORM or queries then avoid ORM. For example, if the data model is optimised to conserve space in classical third normal form. My recommendation there is to avoid querying the main data model and use the CQRS pattern. See below for an example.
What Is Practical?
If you do want to use pseudo-joins in GraphQL but you hit an N+1 problem, you can write code to map specific "field fetches" onto hand-written database queries. Carefully performance test using realistic data whenever any fields return an array.
Even when you can put in hand written queries you may hit scenarios where those joins don't run fast enough. In which case consider the CQRS pattern and denormalise some of the data model to allow for fast lookups.
Update: GraphQL Java "Look-Ahead"
In our case we use graphql-java and use pure configuration files to map DataFetchers to database queries. There is some generic logic that looks at the graph query being run and calls parameterized SQL queries that are in a custom configuration file. We saw the article Building efficient data fetchers by looking ahead, which explains that you can inspect at runtime what the person who wrote the query selected to be returned. We can use that to "look-ahead" at what other entities we would be asked to fetch to satisfy the entire query. At which point we can join the data in the database and pull it all back efficiently in a single database call. The graphql-java engine will still make N in-memory fetches to our code. The N requests to get the author of each book are satisfied by simple lookups in a hashmap that we loaded out of the single database call that joined the author table to the books table, returning N complete rows efficiently.
Our approach might sound a little like ORM yet we did not make any attempt to make it intelligent. The developer creating the API and our custom configuration files has to decide which graphql queries will be mapped to what database queries. Our generic logic just "looks-ahead" at what the runtime graphql query actually selects in total to understand all the database columns that it needs to load out of each row returned by the SQL to build the hashmap. Our approach can only handle parent-child-grandchild style trees of data. Yet this is a very common use case for us. The developer making the API still needs to keep a careful eye on performance. They need to adapt both the API and the custom mapping files to avoid poor performance.
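The hashmap idea can be sketched in a few lines. This is a JavaScript illustration of the technique only, not the graphql-java code described above; the row shape, the second book and the function name `fetchAuthor` are assumptions:

```javascript
// Rows as they might come back from ONE SQL join of the book and author tables
// (hypothetical data and column names).
const joinedRows = [
  { bookId: "book-1", title: "Harry Potter and the Philosopher's Stone", authorId: "author-1", firstName: "Joanne", lastName: "Rowling" },
  { bookId: "book-2", title: "Harry Potter and the Chamber of Secrets", authorId: "author-1", firstName: "Joanne", lastName: "Rowling" },
];

// Build the author hashmap once per request from the joined rows.
const authorsById = new Map();
for (const row of joinedRows) {
  authorsById.set(row.authorId, {
    id: row.authorId,
    firstName: row.firstName,
    lastName: row.lastName,
  });
}

// The books the top-level query returns; authorId stays internal.
const books = joinedRows.map((row) => ({
  id: row.bookId,
  title: row.title,
  authorId: row.authorId,
}));

// Field fetch for Book.author: an in-memory lookup, no database call.
function fetchAuthor(book) {
  return authorsById.get(book.authorId);
}

// The engine still asks N times, but each request is a map lookup.
console.log(books.map((book) => fetchAuthor(book).lastName));
```

The engine's N author fetches remain, yet none of them reaches the database, which is the point of looking ahead.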
GraphQL as a query language on the front-end does not support 'joins' in the classic SQL sense.
Rather, it allows you to pick and choose which fields in a particular model you want to fetch for your component.
To query all phones in your dataset, your query would look like this:
query myComponentQuery {
phone {
id
brand
model
price
}
}
The GraphQL server that your front-end is querying would then have individual field resolvers - telling GraphQL where to fetch id, brand, model etc.
The server-side resolver would look something like this:
Phone: {
id(root, args, context) {
pg.query('Select * from Phones where name = ?', ['blah']).then(d => {/*doStuff*/})
//OR
fetch(context.upstream_url + '/thing/' + args.id).then(d => {/*doStuff*/})
return {/*the result of either of those calls here*/}
},
price(root, args, context) {
return 9001
},
},
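Applied to the phones and brands data from the question, a field resolver for brand could translate the stored brand code into the brand name. This is a minimal sketch with the sample data held in memory; the `db` object and the resolver layout are assumptions, and a real server would read from its datastore instead:

```javascript
// The sample data from the question, held in memory for illustration.
const db = {
  phones: [
    { id: 1, brand: "b1", model: "Galaxy S9 Plus", price: 1000 },
    { id: 2, brand: "b2", model: "OnePlus 6", price: 900 },
  ],
  brands: [
    { id: "b1", name: "Samsung" },
    { id: "b2", name: "OnePlus" },
  ],
};

const resolvers = {
  Query: {
    // Returns the stored phone, where brand is still the brand code.
    phone(root, args) {
      return db.phones.find((p) => p.id === args.id);
    },
  },
  Phone: {
    // Field resolver: swaps the brand code for the brand name.
    brand(phone) {
      const brand = db.brands.find((b) => b.id === phone.brand);
      return brand ? brand.name : null;
    },
  },
};

// Manual walk-through of what the server does for phone(id: 2) { id brand model price }.
const phone = resolvers.Query.phone(null, { id: 2 });
const result = { ...phone, brand: resolvers.Phone.brand(phone) };
console.log(result);
```

The "join" happens in the resolver layer, one field at a time, which is also where the N+1 concern discussed above comes from when the query returns many phones.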

AngularFire2 with Firebase Realtime DB - Nested Data Query Angular 6 [duplicate]

The structure of the table is:
chats
--> randomId
-->--> participants
-->-->--> 0: 'name1'
-->-->--> 1: 'name2'
-->--> chatItems
etc
What I am trying to do is query the chats table to find all the chats that hold a participant by a passed in username string.
Here is what I have so far:
subscribeChats(username: string) {
return this.af.database.list('chats', {
query: {
orderByChild: 'participants',
equalTo: username, // How to check if participants contain username
}
});
}
Your current data structure is great to look up the participants of a specific chat. It is however not a very good structure for looking up the inverse: the chats that a user participates in.
A few problems here:
you're storing a set as an array
you can only index on fixed paths
Set vs array
A chat can have multiple participants, so you modelled this as an array. But this actually is not the ideal data structure. Likely each participant can only be in the chat once. But by using an array, I could have:
participants: ["puf", "puf"]
That is clearly not what you have in mind, but the data structure allows it. You can try to secure this in code and security rules, but it would be easier if you start with a data structure that implicitly matches your model better.
My rule of thumb: if you find yourself writing array.contains(), you should be using a set.
A set is a structure where each child can be present at most once, so it naturally protects against duplicates. In Firebase you'd model a set as:
participants: {
"puf": true
}
The true here is really just a dummy value: the important thing is that we've moved the name to the key. Now if I'd try to join this chat again, it would be a noop:
participants: {
"puf": true
}
And when you'd join:
participants: {
"john": true,
"puf": true
}
This is the most direct representation of your requirement: a collection that can only contain each participant once.
You can only index known properties
With the above structure, you could query for chats that you are in with:
ref.child("chats").orderByChild("participants/john").equalTo(true)
The problem is that this requires you to define an index on "participants/john":
{
"rules": {
"chats": {
"$chatid": {
"participants": {
".indexOn": ["john", "puf"]
}
}
}
}
}
This will work and perform great. But now each time someone new joins the chat app, you'll need to add another index. That's clearly not a scalable model. We'll need to change our data structure to allow the query you want.
Invert the index - pull categories up, flattening the tree
Second rule of thumb: model your data to reflect what you show in your app.
Since you are looking to show a list of chat rooms for a user, store the chat rooms for each user:
userChatrooms: {
john: {
chatRoom1: true,
chatRoom2: true
},
puf: {
chatRoom1: true,
chatRoom3: true
}
}
Now you can simply determine your list of chat rooms with:
ref.child("userChatrooms").child("john")
And then loop over the keys to get each room.
You'll likely have two relevant lists in your app:
the list of chat rooms for a specific user
the list of participants in a specific chat room
In that case you'll also have both lists in the database.
chatroomUsers
chatroom1
user1: true
user2: true
chatroom2
user1: true
user3: true
userChatrooms
user1:
chatroom1: true
chatroom2: true
user2:
chatroom1: true
user3:
chatroom2: true
I've pulled both lists to the top-level of the tree, since Firebase recommends against nesting data.
Having both lists is completely normal in NoSQL solutions. In the example above we'd refer to userChatrooms as the inverted index of chatroomUsers.
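Keeping the two lists in sync is usually done with a single multi-path update, so both entries are written together. A minimal sketch, assuming a hypothetical helper named `joinChatroomUpdate`; the resulting object is the kind of value you would pass to update() on the root reference:

```javascript
// Hypothetical helper: builds one multi-path update so both lists change together.
function joinChatroomUpdate(userId, chatroomId) {
  return {
    [`chatroomUsers/${chatroomId}/${userId}`]: true,
    [`userChatrooms/${userId}/${chatroomId}`]: true,
  };
}

const update = joinChatroomUpdate("john", "chatRoom1");
console.log(update);

// Joining twice produces the same keys again, so the write is a no-op,
// which is the set behaviour described earlier.
const again = joinChatroomUpdate("john", "chatRoom1");
console.log(Object.keys(again).length);
```

Leaving a chat room would use the same two paths with null as the value, since writing null removes a key in the Realtime Database.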
Cloud Firestore
This is one of the cases where Cloud Firestore has better support for this type of query. Its array-contains operator allows you to filter documents that have a certain value in an array, while arrayUnion and arrayRemove allow you to treat an array as a set. For more on this, see Better Arrays in Cloud Firestore.
