I have the following generic schema to represent different types of information.
var Record = new Schema (
{
type: {type: String}, // any string (foo, bar, foobar)
value: {type: String}, // any string value
o_id: {type:String}
}
);
Some of the records based on this schema have:
type="car"
value="ferrari" or
value="ford"
Some records have type "topspeed" with value "210" but they always share o_id (e.g. related "ferrari has this topspeed"). So if "ferrari has top speed 300", then both records have same o_id.
How can I make query to find "ferrari with topspeed 300" when I don't know o_id?
The only solution I found out is to select cars "ferrari" first and then with knowledge of all o_id for all "ferrari" use it to find topspeed.
In pseudocode:
Record.find({type:"car", value:"ferrari"}, function(err, docs)
{
var condition = [];// create array of all found o_id;
Record.find({type:"topspeed", value:"300"}...
}
I know that some merging or joining might not be possible, but what about some chaining these conditions to avoid recursion?
EDIT:
Better example:
Lets imagine I have a HTML document that contains DIV elements with certain id (o_id).
Now each div element can contain different type of microdata items (Car, Animal...).
Each microdata item has different properties ("topspeed", "numberOfLegs"...) based on the type (Car has a topspeed, animal numberOfLegs)
Each property has some value (310 kph, 4 legs)
Now I'm saving these microdata items to the database but in a general way, agnostic of the type and values they contain since the user can define custom schemas from Car, to Animal, to pretty much anything). For that I defined the Record schema: type consists of "itemtype_propertyname" and value is value of the property.
I would eventually like to query "Give me o_id(s) of all DIV elements that contain item Ferrari and item Dog" at the same time.
The reason for this general approach is to allow anyone the ability to define custom schema and corresponding parser that stores the values.
But I will have only one search engine to find all different schemas and value combinations that will treat all possible schemas as a single definition.
I think it'd be far better to combine all records that share an o_id into a single record. E.g.:
{
_id: ObjectId(...),
car: "ferarri",
topspeed: 300
}
Then you won't have this problem, and your schema will be more efficient both in speed and storage size. This is how MongoDB is intended to be used -- heterogenous data can be stored in a single collection, because MongoDB is schemaless. If you continue with your current design, then no, there's no way to avoid multiple round-trips to the database.
Related
I am quite new to using databases and have only used MongoDB so far. I was planning on making a app for a shop to create their menu(s).
There are several interfaces I would define beforehand.
interface FoodItem {
id: number
name: string,
price: number
}
interface Category {
id: number,
name: string,
food: FoodItem[]
}
interface Menu {
id: number
name:string,
categories: Category[]
}
From the above schemas, I was wondering if this is ok for a Postgres database, where all the data resides in the Menus table, and fetching a Menu will yield all nested Categories and FoodItems.
I am confused because from other examples of such data models, the array of Categories will simply be replaced by an array of IDs that points to their respective Categories and the same for FoodItems as well. In that case, to fetch a Menu, the database will also have to fetch, say 3 other Categories and in those categories fetch another say 10 other FoodItems via their IDs. Wouldn't that be a performance issue since the database has to query each nested item in the array, as opposed to doing it all in one single query fetch from the root Menu?
For my app that I'm designing, each Category and FoodItem is unique to their parents and will not be shared with other Categories/Menus. In that case there wouldn't be duplicate entries of FoodItem when using the first data model. Of course I would not mind using the second data model since it is much cleaner and easy to reference without deeply nested objects.
So for the second data model, would there a big performance decrease when my dataset starts to grow larger and larger?
I currently have a few issues with my Firestore querying technique. As per this stackoverflow post I made recently, Querying with two array with firestore security rules
The answer proposed to add the the "ids" into a object, with the key as the id, and the value simply being "true". I have completed this, and now my structure looks like so:
This leaves me with this query:
db.collection('Depots')
.where(`products.${productId}`, '==', true)
.where(`users.${userId}`, '==', true)
.where('created', '>', 1585998560500)
.orderBy('created', 'asc')
.get();
This query leaves me with throwing an error, asking to create an index:
The query requires an index. You can create it here: ...
However, this tries to index the specific object key, i.e. QXooVYGBIFWKo6C so products.QXooVYGBIFWKo6C. Which is certianly not what I want, as this query changes, and can have an infinite number of possibilities, which means I would have to create another index for each key entry in order to query it.
Is there any way to solve this issue? I am assuming it needs to index this query due to the different operators used in the query, so I was wondering if there were any workarounds to this issue.
Thank you very much in advance.
What you have here is a map field, for which indexes should usually be created automatically.
That indeed means that you'll have as many indexes as you have products, which means:
You are limited in how many products you can have, as there is a maximum of 40,000 index entries per document.
You pay more per document, as you pay for the storage of each index.
If these are not what you want, you'll have to switch back to your original model, with the query limitations you had there. There doesn't seem to be a solution that fits both of your requirements.
After our discussion in chat, this is the starting point I would suggest. Who knows what the end architecture would look like, but I think this or very close to this. You say that a user can exist in multiple depots at the same time and multiple depots can contain the same products, also at the same time. You also said that a depot can never have more than 40 users at a given time, so an array of 40 users would certainly not encroach on Firestore's document limit of 1,048,576 bytes.
[collection]
<documentId>
- field: value
[depots]
<UUID>
- depotId: string "depot456"
- productCount: num 5,000
<UUID>
- depotId: string "depot789"
- productCount: num 4,500
[products]
<UUID>
- productId: string "lotion123"
- depotId: string "depot456"
- users: [string] ["user10", "user27", "user33"]
<UUID>
- productId: string "lotion123"
- depotId: string "depot789"
- users: [string] ["user10", "user17", "user50"]
[users]
<userId>
- depots: [string] ["depot456", "depot999"]
<userId>
- depots: [string] ["depot333", "depot999"]
In NoSQL, storage is cheap and computation isn't so denormalize your data as much as you need to make your queries possible and efficient (fast and cheap).
To find all depots in a single query where user10 and lotion123 are both true, query the products collection where productId equals x and users array-contains y and collect the depotId values from those results. If you want to preserve the array-contains operation for something else, you'd have to denormalize your data further (replace the array for a single user). Or you could split this query into two separate queries.
With this model, when a user leaves a depot, get all products where users array-contains that user and remove that userId from the array. And when a user joins a depot, get all products where depotId equals x and append that userId to the array.
Watch this video, and others by Rick, to get a solid handle on NoSQL: https://www.youtube.com/watch?v=HaEPXoXVf2k
#danwillm If you are not sure about the number of users and products then your DB structure seems unfit for this situation because there are size and length limitations of the firestore document.
You should rather create a separate collection for products and users i.e normalize your data and have a reference for the user in the product collection.
User :
{
userId: documentId,
name: John,
...otherInfo
}
Product :
{
productId: documentId,
createdBy: userId,
createdOn:date,
productName:"exa",
...otherInfo
}
This way you there will be the size of the document would be limited, i.e try avoiding using maps/arrays in firestore if you are not sure about there size.
Also, in this case, the number of queries would be increased but you don't need many indexes in this case.
I am trying to find which approach is more scalable.
I have a user who has requested a seat in a carpool trip, and the user needs to be able to see all trips that apply to them. My models look like this:
var UserSchema = new mongoose.Schema({
id: String,
name: String,
trips: [String] // An array of strings, which holds the id of trips
});
var TripSchema = new mongoose.Schema({
id: String,
description: String,
passengers: [String] // An array of strings, which holds the id of users
});
So when the user goes to see all trips that apply to them, my backend will search through all the trips in the Mongo database.
I am deciding between 2 approaches:
Search through all trips and return the trips where the user's id is in the passengers array
Search through all trips and return the trips with an id matching an id in the user's trips array.
I believe approach #2 is better because it does not have to search deeper in the Trip model. I am just seeking confirmation and wondering if there is anything else I should consider.
If you don't do big data, I would simply say that it does not matter - both are good enough, but if you really have millions of queries on millions of users and trips...
for option 1 you only have one query but you would have to make sure, that you have your field passengers indexed, so you would need to maintain another index for this to be efficient. Another index impacts your write performance.
for option 2 you always have to do two queries.
First query for the user object in the user collection, then do an in style query to load the trip items that match any of those tripIds from user.trips. You will query on on the _id field which is always indexed. Of course, when you always load your user anyway there is only one query which really counts.
You would also have to consider whether write or read performance matters more. Your model is pretty inefficient for write because for every new trip you need to update two collections (the trip and the user). So currently you double your writes and usually writes are more expensive than reads.
And finally: to have easy and maintainable code is mostly more imporant than a bit of performance --> just use the mongoose populate feature, and all is done automatically for you. Don't store the references as Strings but as type ObjectId and use the ref keywoard in your model.
I am learning nodejs along with mongodb currently, and there are two things that confuse me abit.
(1),
When a new Schema and model name are used (not in db), the name is changed into its plural form. Example:
mongoose.model('person', personSchema);
in the database, the table will be called "people" instead.
Isn't this easy to confuse new developer, why has they implemented it this way?
(2),
Second thing is that whenever I want to refer to an existing model in mongoDb (assume that in db, a table called people exists). Then in my nodejs code, I still have to define a Schema in order to create a model that refer to the table.
personSchema = new mongoose.Schema({});
mongoose.model('person',personSchema);
The unusual thing is, it does not seem to matter how I define the schema, it can just be empty like above, or fill with random attribute, yet the model will always get the right table and CRUD operations performs normally.
Then what is the usage of Schema other than defining table structure for creating new table?
Many thanks,
Actually two questions, you usually do better asking one, just for future reference.
1. Pluralization
Short form is that it is good practice. In more detail, this is generally logical as what you are referring to is a "collection" of items or objects rather. So the general inference in a "collection" is "many" and therefore a plural form of what the "object" itself is named.
So a "people" collection implies that it is in fact made up of many "person" objects, just as "dogs" to "dog" or "cats" to "cat". Not necessarily "bovines" to "cow", but generally speaking mongoose does not really deal with Polymorphic entities, so there would not be "bull" or "bison" objects in there unless just specified by some other property to "cow".
You can of course change this if you want in either of these forms and specify your own name:
var personSchema = new Schema({ ... },{ "collection": "person" });
mongoose.model( "Person", personSchema, "person" );
But a model is general a "singular" model name and the "collection" is the plural form of good practice when there are many. Besides, every SQL database ORM I can think of also does it this way. So really this is just following the practice that most people are already used to.
2. Why Schema?
MongoDB is actually "schemaless", so it does not have any internal concept of "schema", which is one big difference from SQL based relational databases which hold their own definition of "schema" in a "table" definition.
While this is often actually a "strength" of MongoDB in that data is not tied to a certain layout, some people actually like it that way, or generally want to otherwise encapsulate logic that governs how data is stored.
For these reasons, mongoose supports the concept of defining a "Schema". This allows you to say "which fields" are "allowed" in the collection (model) this is "tied" to, and which "type" of data may be contained.
You can of course have a "schemaless" approach, but the schema object you "tie" to your model still must be defined, just not "strictly":
var personSchema = new Schema({ },{ "strict": false });
mongoose.model( "Person", personSchema );
Then you can pretty much add whatever you want as data without any restriction.
The reverse case though is that people "usually" do want some type of rules enforced, such as which fields and what types. This means that only the "defined" things can happen:
var personSchema = new Schema({
name: { type: String, required: true },
age: Number,
sex: { type: String, enum: ["M","F"] },
children: [{ type: Schema.Types.ObjectId, ref: "Person" }],
country: { type: String, default: "Australia" }
});
So the rules there break down to:
"name" must have "String" data in it only. Bit of a JavaScript idiom here as everything in JavaScript will actually stringify. The other thing on here is "required", so that if this field is not present in the object sent to .save() it will throw a validation error.
"age" must be numeric. If you try to .save() this object with data other than numeric supplied in this field then you will throw a validation error.
"sex" must be a string again, but this time we are adding a "constraint" to say what the valid value are. In the same way this also can throw a validation error if you do not supply the correct data.
"children" actually an Array of items, but these are just "reference" ObjectId values that point to different items in another model. Or in this case this one. So this will keep that ObjectId reference in there when you add to "children". Mongoose can actually .populate() these later with their actual "Person" objects when requested to do so. This emulates a form of "embedding" in MongoDB, but used when you actually want to store the object separately without "embedding" every time.
"country" is again just a String and requires nothing special, but we give it a default value to fill in if no other is supplied explicitly.
There are many other things you can do, I would suggest really reading through the documentation. Everything is explained in a lot of detail there, and if you have specific questions then you can always ask, "here" (for example).
So MongoDB does things differently to how SQL databases work, and throws out some of the things that are generally held in "opinion" to be better implemented at the application business logic layer anyway.
Hence in Mongoose, it tries to "put back" some of the good things people like about working with traditional relational databases, and allow some rules and good practices to be easily encapsulated without writing other code.
There is also some logic there that helps in "emulating" ( cannot stress enough ) "joins", as there are methods that "help" you in being able to retrieve "related" data from other sources, by essentially providing definitions where which "model" that data resides in within the "Schema" definition.
Did I also not mention that "Schema" definitions are again just objects and re-usable? Well yes they are an can in fact be tied to "many" models, which may or may not reside on the same database.
Everything here has a lot more function and purpose than you are currently aware of, the good advice here it to head forth and "learn". That is the usual path to the realization ... "Oh, now I see, that's what they do it that way".
I have found a couple of solutions for inheritance using Mongoose here and here. These seem to work fine when the documents are stored in a normal collection. But, I am having trouble figuring out how to be able to store an array of the subclassed documents in a sub document on a model.
Let's say we have an object 'drawer' and it contains a collection of 'clothing' objects, where clothing objects can actually be one of several types of clothing 'sock', 'shirt', 'shorts'.
Sock, Shirt, and Shorts are all subclasses of Clothing.
So, I want to have my model Drawer look something like this...
var drawerSchema = mongoose.Schema({
// some drawer properties here
// ...
contents: [clothingSchema]
});
I have tried this approach, but when saved, only the properties of the actual clothingSchema are saved to the DB. For example, if my clothing schema had a common size property, it would be saved, but a property on my shirt object called 'buttonDown' would not be saved.
Has anyone else had a need to do this kind of modeling and if so found a solution?
Thanks!
If you can have inherited Schemas work on independent collections, you can use some Foreign Keys.
var drawerSchema = new Schema({
contents: [{type: Schema.Types.ObjectId, ref: 'cloth'}],
});
var clothSchema = new Schema({
size: Number
});
// Then define extended Schemas (Shirt, Sock, Short...)
This approach also lets you have a cloth definition only once in your DB so prevents unnecessary repeat of same data.