Mongoose behavior and schema - javascript

I am currently learning nodejs along with mongodb, and there are two things that confuse me a bit.
(1),
When a new Schema and model name are used (not in db), the name is changed into its plural form. Example:
mongoose.model('person', personSchema);
In the database, the table will be called "people" instead.
Isn't this likely to confuse new developers? Why was it implemented this way?
(2),
Second, whenever I want to refer to an existing model in MongoDB (assume that a table called people exists in the db), I still have to define a Schema in my nodejs code in order to create a model that refers to the table.
personSchema = new mongoose.Schema({});
mongoose.model('person',personSchema);
The unusual thing is that it does not seem to matter how I define the schema. It can be empty like above, or filled with random attributes, yet the model always gets the right table and CRUD operations perform normally.
So what is the use of a Schema other than defining the table structure when creating a new table?
Many thanks,

Actually two questions, you usually do better asking one, just for future reference.
1. Pluralization
Short form is that it is good practice. In more detail, this is generally logical as what you are referring to is a "collection" of items or objects rather. So the general inference in a "collection" is "many" and therefore a plural form of what the "object" itself is named.
So a "people" collection implies that it is in fact made up of many "person" objects, just as "dogs" to "dog" or "cats" to "cat". Not necessarily "bovines" to "cow", but generally speaking mongoose does not really deal with Polymorphic entities, so there would not be "bull" or "bison" objects in there unless just specified by some other property to "cow".
You can of course change this if you want in either of these forms and specify your own name:
var personSchema = new Schema({ ... },{ "collection": "person" });
mongoose.model( "Person", personSchema, "person" );
But a model name is generally "singular", and the "collection" takes the plural form as good practice, since it holds many objects. Besides, every SQL database ORM I can think of also does it this way. So really this is just following the practice that most people are already used to.
2. Why Schema?
MongoDB is actually "schemaless", so it does not have any internal concept of "schema", which is one big difference from SQL based relational databases which hold their own definition of "schema" in a "table" definition.
While this is often actually a "strength" of MongoDB in that data is not tied to a certain layout, some people actually like it that way, or generally want to otherwise encapsulate logic that governs how data is stored.
For these reasons, mongoose supports the concept of defining a "Schema". This allows you to say "which fields" are "allowed" in the collection (model) this is "tied" to, and which "type" of data may be contained.
You can of course have a "schemaless" approach, but the schema object you "tie" to your model still must be defined, just not "strictly":
var personSchema = new Schema({ },{ "strict": false });
mongoose.model( "Person", personSchema );
Then you can pretty much add whatever you want as data without any restriction.
The reverse case though is that people "usually" do want some type of rules enforced, such as which fields and what types. This means that only the "defined" things can happen:
var personSchema = new Schema({
  name: { type: String, required: true },
  age: Number,
  sex: { type: String, enum: ["M","F"] },
  children: [{ type: Schema.Types.ObjectId, ref: "Person" }],
  country: { type: String, default: "Australia" }
});
So the rules there break down to:
"name" must have "String" data in it only. Bit of a JavaScript idiom here, as everything in JavaScript will actually stringify. The other thing here is "required", so that if this field is not present in the object sent to .save(), a validation error is thrown.
"age" must be numeric. If you try to .save() this object with data other than numeric supplied in this field, a validation error is thrown.
"sex" must be a string again, but this time we are adding a "constraint" to say what the valid values are. In the same way, this can also throw a validation error if you do not supply the correct data.
"children" is actually an Array of items, but these are just "reference" ObjectId values that point to different items in another model. Or, in this case, this one. So this will keep that ObjectId reference in there when you add to "children". Mongoose can actually .populate() these later with their actual "Person" objects when requested to do so. This emulates a form of "embedding" in MongoDB, but is used when you actually want to store the object separately without "embedding" every time.
"country" is again just a String and requires nothing special, but we give it a default value to fill in if no other is supplied explicitly.
There are many other things you can do, I would suggest really reading through the documentation. Everything is explained in a lot of detail there, and if you have specific questions then you can always ask, "here" (for example).
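To make the rules listed above more concrete, here is a rough plain-JavaScript approximation of the checks they imply. This is not Mongoose's own code and it ignores "children" entirely; the function name checkPerson and the error messages are made up for illustration, and the real library handles casting and validation in far more detail.

```javascript
// Plain-JavaScript approximation of the schema rules; NOT Mongoose's implementation.
function checkPerson(input) {
  // The default only applies when "country" is absent from the input.
  const doc = { country: "Australia", ...input };
  const errors = [];

  // "name": required; whatever is supplied gets stringified.
  if (doc.name === undefined || doc.name === null || doc.name === "") {
    errors.push("name is required");
  } else {
    doc.name = String(doc.name);
  }

  // "age": optional, but must be convertible to a number when present.
  if (doc.age !== undefined) {
    const age = Number(doc.age);
    if (Number.isNaN(age)) {
      errors.push("age must be numeric");
    } else {
      doc.age = age;
    }
  }

  // "sex": optional, but restricted to the listed values.
  if (doc.sex !== undefined && !["M", "F"].includes(doc.sex)) {
    errors.push("sex must be one of M, F");
  }

  return { doc, errors };
}
```

Calling checkPerson({ name: "Ann", age: "30", sex: "F" }) yields no errors and fills in the default country, while an input with no name, a non-numeric age and an unknown sex collects one error per broken rule.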
So MongoDB does things differently to how SQL databases work, and throws out some of the things that are generally held in "opinion" to be better implemented at the application business logic layer anyway.
Hence in Mongoose, it tries to "put back" some of the good things people like about working with traditional relational databases, and allow some rules and good practices to be easily encapsulated without writing other code.
There is also some logic there that helps in "emulating" ( cannot stress enough ) "joins", as there are methods that "help" you retrieve "related" data from other sources, essentially by stating within the "Schema" definition which "model" that data resides in.
Did I also not mention that "Schema" definitions are again just objects and re-usable? Well yes they are, and they can in fact be tied to "many" models, which may or may not reside on the same database.
Everything here has a lot more function and purpose than you are currently aware of, so the good advice here is to head forth and "learn". That is the usual path to the realization ... "Oh, now I see, that's why they do it that way".

Related

Graphql - Error when trying to create reference to object for field [duplicate]

I am reaching out to you all as I am learning Apollo and GraphQL and integrating them into one of my projects. So far it is going OK, but now I am trying to add some mutations and I am struggling with the Input type and the Query type. I feel like it is far more complicated than it should be, so I am looking for advice on how to handle my situation. The examples I found online always use very basic schemas, but reality is more complex: my schema is quite big and looks as follows (I'll copy just a part):
type Calculation {
  _id: String!
  userId: String!
  data: CalculationData
  lastUpdated: Int
  name: String
}
type CalculationData {
  Loads: [Load]
  validated: Boolean
  x: Float
  y: Float
  z: Float
  Inputs: [Input]
  metric: Boolean
}
Then Inputs and Loads are defined, and so on...
For this I want a mutation to save the "Calculation", so in the same file I have this:
type Mutation {
saveCalculation(data: CalculationData!, name: String!): Calculation
}
My resolver is as follows:
// Declared with const first: "export default resolvers = {...}" assigns to an undeclared variable
const resolvers = {
  Mutation: {
    saveCalculation(obj, args, context) {
      if (context.user && context.user._id) {
        const calculationId = Calculations.insert({
          userId: context.user._id,
          data: args.data,
          name: args.name
        });
        return Calculations.findOne({ _id: calculationId });
      }
      throw new Error('Need an account to save a calculation');
    }
  }
};
export default resolvers;
Then my mutation is the following :
import gql from 'graphql-tag';
export const SAVE_CALCULATION = gql`
mutation saveCalculation($data: CalculationData!, $name: String!){
saveCalculation(data: $data, name: $name){
_id
}
}
`
Finally I am using the Mutation component to try to save the data:
<Mutation mutation={SAVE_CALCULATION}>
  {(saveCalculation, { data }) => (
    <div onClick={() => saveCalculation({ variables: { data: this.state, name: 'name calcul' } })}>SAVE</div>
  )}
</Mutation>
Now I get the following error :
[GraphQL error]: Message: The type of Mutation.saveCalculation(data:)
must be Input Type but got: CalculationData!., Location: undefined,
Path: undefined
From my research and some other SO posts, I understand that I should define an Input type in addition to the Query type. However, an Input type can apparently only have Scalar types, while my schema depends on other schemas (which are not scalar). Can I create Input types that depend on other Input types, and so on, as long as the last one has only scalar types? I am kind of lost because it seems like a lot of redundancy. I would very much appreciate some guidance on best practice. I am convinced Apollo/GraphQL could help my project quite a lot over time, but I have to admit it is more complicated to implement than I thought when the schemas are a bit complex. Online examples generally stick to a String and a Boolean.
From the spec:
Fields may accept arguments to configure their behavior. These inputs are often scalars or enums, but they sometimes need to represent more complex values.
A GraphQL Input Object defines a set of input fields; the input fields are either scalars, enums, or other input objects. This allows arguments to accept arbitrarily complex structs.
In other words, you can't use regular GraphQLObjectTypes as the type for a GraphQLInputObjectType field -- you must use another GraphQLInputObjectType.
When you write out your schema using SDL, it may seem redundant to have to create a Load type and a LoadInput input, especially if they have the same fields. However, under the hood, the types and inputs you define are turned into very different classes of object, each with different properties and methods. There is functionality that is specific to a GraphQLObjectType (like accepting arguments) that doesn't exist on a GraphQLInputObjectType -- and vice versa.
Trying to use one in place of the other is kind of like trying to put a square peg in a round hole. "I don't know why I need a circle. I have a square. They both have a diameter. Why do I need both?"
Outside of that, there's a good practical reason to keep types and inputs separate. That's because in plenty of scenarios, you will expose plenty of fields on the type that you won't expose on the input.
For example, your type might include derived fields that are actually a combination of the underlying data. Or it might include fields for relationships with other data (like a friends field on a User). In both of these cases, it wouldn't make sense to make these fields part of the data that's submitted as an argument for some field. Likewise, you might have some input field that you wouldn't want to expose on its type counterpart (a password field comes to mind).
Yes, you can:
The fields on an input object type can themselves refer to input object types, but you can't mix input and output types in your schema. Input object types also can't have arguments on their fields.
Input types are meant to be defined in addition to normal types. Usually they'll have some differences, eg input won't have an id or createdAt field.
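As a sketch of what that could look like for the schema in the question: the SDL below declares input counterparts and uses them as the mutation argument. The names LoadInput, InputInput and CalculationDataInput are assumptions, and so are the placeholder fields on LoadInput and InputInput, since the question does not show what Load and Input contain. The SDL is kept in a plain template string so the snippet runs on its own; in a real project it would be merged with the existing output types.

```javascript
// Input counterparts for the question's types. All *Input names are assumptions.
const inputTypeDefs = `
  input LoadInput {
    # hypothetical placeholder; mirror the real Load fields here
    value: Float
  }

  input InputInput {
    # hypothetical placeholder; mirror the real Input fields here
    value: Float
  }

  input CalculationDataInput {
    Loads: [LoadInput]
    validated: Boolean
    x: Float
    y: Float
    z: Float
    Inputs: [InputInput]
    metric: Boolean
  }

  type Mutation {
    saveCalculation(data: CalculationDataInput!, name: String!): Calculation
  }
`;

// Small sanity check: list every "input" declaration found in the SDL.
const inputNames = [...inputTypeDefs.matchAll(/^\s*input\s+(\w+)/gm)].map((m) => m[1]);
console.log(inputNames);
```

The client-side mutation would then declare its variable as $data: CalculationDataInput! rather than CalculationData!.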

Suggest type of an object based on a given title

I am working on an Ember application that deals with geospatial data processing. Part of this project is importing a JSON object that describes a data layer which contains fields corresponding to data entries. For example, I suppose I am importing a data layer called "Laundry Facilities"; the JSON will look something like this:
{
  "key": "laundryFacilities",
  "label": "Laundry Facilities",
  "fields": [
    {
      "label": "Name of Facility",
      "key": "name"
    },
    {
      "label": "Number of Dryers",
      "key": "numberDryers"
    }
  ]
}
At some point in my data import workflow, the user must specify a type for each field. For example, the type for "Name of Facility" would be a string, and the type for "Number of Dryers" would be an integer. I'd like to be able to suggest a type to the user based on the label or key attribute rather than forcing them to specify the type for every field. Is there any kind of algorithm, package, framework, etc. that can guess a data type from something qualitative like a label describing the data field? Or does anyone know of another way I could implement this? I know not to expect 100% accuracy, but even a rough type guess would be extremely helpful. Bonus points if it's an Ember addon.
Your best bet is to write a simple heuristic, not much more complicated than a bunch of keywords mapping to types. As you've described, 'number' probably means a number type and 'name' likely means a string type.
In general, you're describing a classification problem. This is going to be difficult to solve with a (presumably) small set of training examples. If you can get a decent number of examples of column names, I would first try a decision tree or a logistic regression, which would take the presence of certain words as features, and produce a data type as the output variable.
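A minimal sketch of the keyword heuristic might look like the following. The keyword lists, the type names and the suggestType function are all assumptions for illustration; a real list would need tuning against your own field labels.

```javascript
// Hypothetical keyword rules, checked in order; the first match wins.
const TYPE_RULES = [
  { type: "integer", pattern: /\b(number|count|quantity|total)\b/i },
  { type: "date", pattern: /\b(date|time)\b/i },
  { type: "boolean", pattern: /^(is|has)\b/i },
  { type: "string", pattern: /\b(name|description|address)\b/i },
];

function suggestType(field) {
  // Split camelCase keys into words: "numberDryers" becomes "number Dryers".
  const keyWords = field.key.replace(/([a-z])([A-Z])/g, "$1 $2");
  const text = `${field.label} ${keyWords}`;
  const rule = TYPE_RULES.find((r) => r.pattern.test(text));
  // Fall back to "string" when no keyword matches.
  return rule ? rule.type : "string";
}

console.log(suggestType({ label: "Number of Dryers", key: "numberDryers" }));
console.log(suggestType({ label: "Name of Facility", key: "name" }));
```

Because the result is only a suggestion, the user can still override it in the import workflow.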

MongoDB and Mongoose: Nested Array of Document Reference IDs

I have been diving into a study of MongoDB and came across a particularly interesting pattern in which to store relationships between documents. This pattern involves the parent document containing an array of ids referencing the child document as follows:
//Parent Schema
export interface Post extends mongoose.Document {
  content: string;
  dateCreated: string;
  comments: Comment[];
}
let postSchema = new mongoose.Schema({
  content: {
    type: String,
    required: true
  },
  dateCreated: {
    type: String,
    required: true
  },
  comments: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Comment' }] //nested array of child reference ids
});
And the child being referenced:
//Child Schema
export interface Comment extends mongoose.Document {
  content: string;
  dateCreated: string;
}
let commentSchema = new mongoose.Schema({
  content: {
    type: String,
    required: true
  },
  dateCreated: {
    type: String,
    required: true
  }
});
This all seems fine and dandy until I go to send a request from the front end to create a new comment. The request has to contain the Post _id (to update the post) and the new Comment, which are both common to a request one would send when using a normal relational database. The issue appears when it comes time to write the new Comment to the database. Instead of one db write, like you would do in a normal relational database, I have to do 2 writes AND 1 read. The first write to insert the new Comment and retrieve the _id. Then a read to retrieve the Post by the Post _id sent with the request so I can push the new Comment _id to the nested reference array. Finally, a last write to update the Post back into the database.
This seems extremely inefficient. My question is two-fold:
Is there a better/more efficient way to handle this relationship pattern (parent containing an array of child reference ids)?
If not, what would be the benefit of using this pattern as opposed to A) storing the parent _id in a property on the child similar to a traditional foreign key, or B) taking advantage of MongoDB documents and storing an array of the Comments as opposed to an array of reference ids to the Comments.
Thanks in advance for your insight!
Regarding your first question:
You specifically ask for a better way to work with child-ids that are stored in the parent. I'm pretty sure that there is no better way to deal with this, if it has to be this pattern.
But this problem also exists in relational databases. If you want to save your post in a relational database (using that pattern), you also have to first create the comment, get its ID and then update the post. Granted, you can send all these tasks in a single request, which is probably more efficient than using mongoose, but the type of work that needs to be done is the same.
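The read step, at least, can be avoided: MongoDB's $push update operator appends to an array on the server, so the Post does not have to be loaded first. A minimal sketch, assuming Post and Comment are Mongoose models built from the schemas in the question (they are passed in as parameters here so the function stays self-contained; the name addComment is made up):

```javascript
// Two writes, no read. The models are passed in as arguments (an assumption
// made for this sketch) rather than imported.
async function addComment(Post, Comment, postId, commentData) {
  // Write 1: insert the comment; create() resolves with the saved document, including its _id.
  const comment = await Comment.create(commentData);

  // Write 2: append the new _id to the post's array on the server via $push.
  await Post.updateOne({ _id: postId }, { $push: { comments: comment._id } });

  return comment;
}
```

This still takes two writes, so the pattern is no cheaper than the relational equivalent described above, but nothing needs to be read back in between.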
Regarding your second question:
The benefit over variant A is that you can, for example, get the post and instantly know how many comments it has, without asking MongoDB to go through probably hundreds of documents.
The benefit over variant B is that you can store more references to comments in a single document (a single post) than whole comments, because of MongoDB's 16MB document size limit.
The downside, however, is the one you mentioned: it's inefficient to maintain that structure. I take it that this is only an example to showcase the scenario, so here is what I would do:
I would decide on a case-by-case basis what to use.
If the document will be read a lot, and not written to much, AND it is unlikely to grow larger than 16MB: embed the sub-document. This way you can get all the data in a single query.
If you need to reference the document from multiple other documents AND your data really must be consistent, then you have no choice but to reference it.
If you need to reference the document from multiple other documents BUT data consistency is not that important AND the restrictions from the first bullet point apply, then embed the sub-documents and write code to keep your data consistent.
If you need to reference the document from multiple other documents, and they are written to a lot but not read that often, you're probably better off referencing them, as this is easier to code, because you don't need to write code to sync duplicate data.
In this specific case (post/comment), referencing the parent from the child (letting the child know the parent's _id) is probably a good idea, because it's easier to maintain than the other way around, and the document might grow larger than 16MB if the comments were embedded directly. If I knew for sure that the document would NOT grow larger than 16MB, embedding them would be better, because it's faster to query the data that way.

Mongodb parent reference tree get current location and full path

I'm building a parent-reference tree with MongoDB and Mongoose. My schema looks like this:
var NodesSchema = new Schema({
  _id: {
    type: ShortId,
    len: 7
  },
  name: { // name of the file or folder
    type: String,
    required: true
  },
  isFile: { // is the node file or folder
    type: Boolean,
    required: true
  },
  location: { // location, null for root
    type: ShortId,
    default: null
  },
  data: { // optional if isFile is true
    type: String
  }
});
Note that files/folders are rename-able.
In my current setup if I want to get files in specific folder I perform the following query:
NodesModel.find({ location: 'LOCATION_ID' })
If I want to get a single file/folder I run:
NodesModel.findOne({ _id: 'ITEM_ID' })
The location field looks like f8mNslZ1, but if I want to get the location folder's name I need to do a second query.
Unfortunately, if I want to get the path to the root I need to do a recursive query, which might be slow if I have 300 nested folders.
So I have been searching and came up with the following possible solution:
Should I change the location field from a string to an object and save the information in it as follows:
location: {
_id: 'LOCATION_ID',
name: 'LOCATION_NAME',
fullpath: '/FOLDERNAME1/FOLDERNAME2'
}
The problem with this solution is that files/folders are rename-able, so on rename I would have to update all children. Renames occur much more rarely than indexing, but if the folder has 1000 items, it would be a problem I guess.
My questions are:
Is my suggestion with the location object instead of string viable? What problems might it cause?
Are there better ways to realize this?
How can I improve my code?
Looking at your Node Schema, if you change the location property to an object, you'll have two places where you state the Node's name, so be mindful of updating both name properties. Usually you want to keep your database as DRY as possible, and in most cases doing nested queries is quite common. That being said, you know your database much better than I do, and if you see a significant performance delay from doing more queries, then just be sure to update all name properties.
In addition to this, if you have your location's fullpath property be a string, and let's say you run into a case where you have to rename a folder, you'll have to analyze the whole string by breaking it down and comparing substrings to a new value for the new folder name. This can get tedious.
A possible solution could be to store the full path as an array instead of a string, having the order be the next folder in the chain, that way you can quickly compare and update when need be.
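As a rough sketch of the array idea, assume a hypothetical ancestors field that holds { _id, name } entries ordered from the root down. Neither that field name nor the two helper functions come from the original schema; they only illustrate how an array avoids string surgery on rename.

```javascript
// Hypothetical node shape: "ancestors" lists the containing folders, root first.
const node = {
  _id: "c3",
  name: "report.txt",
  ancestors: [
    { _id: "a1", name: "FOLDERNAME1" },
    { _id: "b2", name: "FOLDERNAME2" },
  ],
};

// Build the display path by joining ancestor names.
function fullPath(n) {
  return "/" + n.ancestors.map((a) => a.name).join("/");
}

// Rename one folder by matching on its _id; no substring comparison needed.
function renameAncestor(n, folderId, newName) {
  return {
    ...n,
    ancestors: n.ancestors.map((a) => (a._id === folderId ? { ...a, name: newName } : a)),
  };
}

console.log(fullPath(node));
console.log(fullPath(renameAncestor(node, "a1", "Projects")));
```

A rename still has to touch every descendant document, but each update becomes a lookup by folder _id instead of parsing a path string.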
The different ways to model tree structures are extensively covered in the MongoDB docs.
The way you are proposing is one of them.
Depending on how frequent folder renaming is expected to happen (and/or any other hierarchy changes more complex than adding a new leaf node) you might consider storing the "path" as an "array of ancestors" instead. But whichever way you happen to denormalize or materialize the path up the tree in each folder, the trade-off is that for faster look-ups, you will have slower and/or more complicated updates.
In your case it seems clear to optimize for the read and not for the rare update - in addition to being less frequent, it seems that renames could be done asynchronously where that's simply not possible with displaying names of parent folders.
While DRY is a great principle in programming, it's pretty much not applicable to non-relational databases. Unless you are using a strictly relational database and normal forms, don't apply it to your schema design. In fact, doing so would be specifically discouraged in MongoDB, as you would then be using the wrong tool for the job.

Mongoose - recursive query (merge from multiple results)

I have the following generic schema to represent different types of information.
var Record = new Schema({
  type: { type: String },  // any string (foo, bar, foobar)
  value: { type: String }, // any string value
  o_id: { type: String }
});
Some of the records based on this schema have:
type="car"
value="ferrari" or
value="ford"
Some records have type "topspeed" with value "210", and related records always share an o_id. So if "ferrari has top speed 300", then both records have the same o_id.
How can I write a query to find "ferrari with topspeed 300" when I don't know the o_id?
The only solution I have found is to select the "ferrari" cars first and then use the o_id values of all of them to find the topspeed.
In pseudocode:
Record.find({type:"car", value:"ferrari"}, function(err, docs)
{
var condition = [];// create array of all found o_id;
Record.find({type:"topspeed", value:"300"}...
}
I know that merging or joining might not be possible, but what about chaining these conditions to avoid recursion?
EDIT:
Better example:
Let's imagine I have an HTML document that contains DIV elements with a certain id (o_id).
Now each div element can contain different type of microdata items (Car, Animal...).
Each microdata item has different properties ("topspeed", "numberOfLegs"...) based on the type (Car has a topspeed, animal numberOfLegs)
Each property has some value (310 kph, 4 legs)
Now I'm saving these microdata items to the database, but in a general way, agnostic of the type and values they contain (since the user can define custom schemas from Car, to Animal, to pretty much anything). For that I defined the Record schema: type consists of "itemtype_propertyname" and value is the value of the property.
I would eventually like to query "Give me o_id(s) of all DIV elements that contain item Ferrari and item Dog" at the same time.
The reason for this general approach is to allow anyone the ability to define custom schema and corresponding parser that stores the values.
But I will have only one search engine to find all different schemas and value combinations that will treat all possible schemas as a single definition.
I think it'd be far better to combine all records that share an o_id into a single record. E.g.:
{
  _id: ObjectId(...),
  car: "ferrari",
  topspeed: 300
}
Then you won't have this problem, and your schema will be more efficient both in speed and storage size. This is how MongoDB is intended to be used -- heterogeneous data can be stored in a single collection, because MongoDB is schemaless. If you continue with your current design, then no, there's no way to avoid multiple round-trips to the database.
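To illustrate the reshaping in plain JavaScript, here is a sketch that folds records sharing an o_id into one object each. The sample data and the mergeByOid name are made up; values stay strings because that is how the question's Record schema stores them.

```javascript
// Hypothetical sample in the question's Record shape.
const records = [
  { o_id: "a", type: "car", value: "ferrari" },
  { o_id: "a", type: "topspeed", value: "300" },
  { o_id: "b", type: "car", value: "ford" },
  { o_id: "b", type: "topspeed", value: "210" },
];

// Fold every record with the same o_id into a single object keyed by type.
// If a type repeats within one o_id, the last value wins.
function mergeByOid(rows) {
  const merged = new Map();
  for (const { o_id, type, value } of rows) {
    if (!merged.has(o_id)) merged.set(o_id, { o_id });
    merged.get(o_id)[type] = value;
  }
  return [...merged.values()];
}

const merged = mergeByOid(records);

// With merged documents, both conditions are checked in one pass.
const matches = merged.filter((d) => d.car === "ferrari" && d.topspeed === "300");
console.log(matches);
```

Against a collection stored in this merged shape, the same lookup would be a single find with both fields in the filter.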
