I am learning MongoDB and I have a question regarding duplication of data. In the SQL world you try to normalize the data. For instance, I have a table with categories and another one with products. Each product may belong to many categories, so there is a join between these tables.
However, am I right that in MongoDB you don't think like this? Does each product instead have an embedded document (or documents) of categories? Is that just the way it is? You don't care if the data is duplicated?
In the SQL world you try to normalize the data
Not always: normalising to the extreme inflicts performance hits. But it is true that I personally do not apply the same normalisation to MongoDB as I do to SQL.
If you are aware of the normal forms ( http://en.wikipedia.org/wiki/Database_normalization ), I like to think of MongoDB as going to 1NF and then back down to denormalised again.
You don't care if the data is duplicated?
Oh yes, we do. Updating is a pain if the data is duplicated in the wrong places.
Let me give you an example: category and product would be two separate entities; there is no denying it. These two entities are normalised (the repeating data of product has been separated from category). Another way of thinking of it is: are all products only going to exist in one category?
So for top-level entities, as you can see, roughly the same rules apply, with 1NF easily applied to MongoDB.
As for duplication, you of course would not want to store each product separately within each category (I answered no to the question above), so you would naturally separate categories and products.
You would normally model a many-to-many relationship here with an intermediate (junction) table. This is where denormalisation can come in. If a category has a list of products that is unique to that category, you can denormalise the many-to-many junction table into the category row as a list (or, the other way around, into the product row). This will not generate duplication, since that list is (more than likely) unique to that category. Of course, this means that the category or product would hold a list of _ids of the related rows instead of the objects themselves.
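A minimal sketch of the "list of _ids" idea, using plain in-memory arrays as stand-ins for two MongoDB collections (the data and function names are hypothetical). Here the list lives on the product, the "other way around" variant mentioned above; the comments show the MongoDB queries each function roughly corresponds to.

```javascript
// In-memory stand-ins for two MongoDB collections (hypothetical sample data).
// Each product stores the _ids of its categories instead of embedding
// full category documents, so category data is never duplicated.
const categories = [
  { _id: "c1", name: "Books" },
  { _id: "c2", name: "Electronics" },
];

const products = [
  { _id: "p1", name: "E-reader", categoryIds: ["c1", "c2"] },
  { _id: "p2", name: "Novel", categoryIds: ["c1"] },
  { _id: "p3", name: "Headphones", categoryIds: ["c2"] },
];

// Roughly db.products.find({ categoryIds: categoryId }):
// matching a scalar against an array field matches any element.
function productsInCategory(categoryId) {
  return products.filter((p) => p.categoryIds.includes(categoryId));
}

// Resolving a product's references is a second query in MongoDB,
// e.g. db.categories.find({ _id: { $in: product.categoryIds } }).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.categoryIds.includes(c._id));
}

// Renaming a category touches exactly one document.
function renameCategory(categoryId, newName) {
  const category = categories.find((c) => c._id === categoryId);
  if (category) category.name = newName;
}
```

Because products only hold ids, a category rename is a single update, which is the "no duplication" property described above.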
There are times when duplication is necessary, mainly for optimisation or as a workaround for not having JOINs; this rule also applies to SQL if you have ever built a big enough site.
Typical uses of duplication are aggregated stats fields, such as a Facebook post's share and comment counts; even the 5 latest comments of that post might be duplicated onto the post row.
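A sketch of the "5 latest comments" duplication, assuming a hypothetical post/comment model held in memory. The comment inside `addComment` shows the MongoDB update operators (`$inc`, and `$push` with `$each`/`$slice`) that could do the same thing in a single update on the post document.

```javascript
// Hypothetical in-memory model: comments live in their own "collection",
// while the post keeps a duplicated comment count and the 5 latest comments.
const LATEST_LIMIT = 5;

const post = { _id: "post1", title: "Hello", commentCount: 0, latestComments: [] };
const comments = []; // the full, authoritative comment list

function addComment(text) {
  const comment = { _id: `c${comments.length + 1}`, postId: post._id, text };
  comments.push(comment);
  // Mirrors a MongoDB update on the post such as:
  //   { $inc: { commentCount: 1 },
  //     $push: { latestComments: { $each: [comment], $slice: -5 } } }
  post.commentCount += 1;
  post.latestComments.push(comment);
  if (post.latestComments.length > LATEST_LIMIT) {
    // Keep only the newest LATEST_LIMIT comments on the post.
    post.latestComments = post.latestComments.slice(-LATEST_LIMIT);
  }
  return comment;
}
```

The comments collection stays authoritative; the copy on the post only exists so a post page can be rendered without a second query.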
So it is not a case of ignoring schema design but of tuning it to MongoDB's characteristics. Normally, if you do that, you will find that you naturally design a good schema.
As an added reference you can refer here: http://docs.mongodb.org/manual/core/data-modeling
Related
I made a web page for selling items online. The website already has a lot of products and will probably have several thousand in the near future. It contains a search bar and I want to create a search results page, but I am not sure what the best way of doing this is. I thought about using JavaScript to loop through the list of all the products until it finds a match, but this process is probably too slow. My question is: what is the best way to store a large list of items, and what is the best way to find matches in the list for the search query? I know that many people use SQL databases for storing lists, but is that method any better than simply storing everything in a JavaScript list, and why? Also, how do I find a match in the list? Can I use JavaScript, or is it necessary or better to use a language like PHP?
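A sketch contrasting the loop described in the question with a simple prebuilt word index, on hypothetical product data. The index is an added illustration, not from the question: it is a tiny version of what a database index does for you, trading some memory and build time for lookups that no longer scan every product.

```javascript
// Hypothetical product list; in practice this would come from a database.
const products = [
  { id: 1, name: "Red Cotton T-Shirt" },
  { id: 2, name: "Blue Denim Jacket" },
  { id: 3, name: "Red Leather Wallet" },
];

// Approach described in the question: scan every product on every search.
// The cost grows linearly with the number of products.
function linearSearch(query) {
  const q = query.toLowerCase();
  return products.filter((p) => p.name.toLowerCase().includes(q));
}

// Alternative: build a word -> product ids index once, then answer
// single-word queries with a map lookup instead of a full scan.
function buildIndex(items) {
  const index = new Map();
  for (const item of items) {
    for (const word of item.name.toLowerCase().split(/\W+/)) {
      if (!word) continue;
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(item.id);
    }
  }
  return index;
}

const index = buildIndex(products);

function indexedSearch(word) {
  return [...(index.get(word.toLowerCase()) ?? [])];
}
```

The linear scan also matches substrings ("red" inside a longer word), while the word index only matches whole words; a real search feature usually needs more than either.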
I read the question "Many to Many relationship in Firebase", which describes how to create or structure a many-to-many relationship in Firebase (a NoSQL database), and it is pretty easy:
companyContractors
  companyKey1
    contractorKey1: true
    contractorKey3: true
  companyKey2
    contractorKey2: true
contractorCompanies
  contractorKey1
    companyKey1: true
    companyKey2: true
  contractorKey2
    companyKey2: true
  contractorKey3
    companyKey1: true
But when I want to get all the contractors for a specific company, can I do it with only one request? In this case I get only a list of ids, and after that I make multiple requests using a JS loop or forEach.
Usually, with other APIs, I only use
URL/contractor/:id/company or
URL/company/:id/contractors
How do I do this in Firebase?
It would be cool to have an example using angular2 and angularfire2.
Thanks
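A sketch of the client-side approach the question describes: one read for the list of ids, then the detail reads issued in parallel rather than one after another. `fetchNode`, the `contractors` node and its data are hypothetical stand-ins for real Firebase reads, so the example runs without a Firebase project; it is not angularfire2 code.

```javascript
// In-memory stand-in for the Firebase tree from the question, plus a
// hypothetical "contractors" node holding each contractor's details.
const db = {
  companyContractors: {
    companyKey1: { contractorKey1: true, contractorKey3: true },
    companyKey2: { contractorKey2: true },
  },
  contractors: {
    contractorKey1: { name: "Ellis Electric" },
    contractorKey2: { name: "Bob's Plumbing" },
    contractorKey3: { name: "Smith Roofing" },
  },
};

// Hypothetical helper standing in for one network read of a single path;
// it is NOT a Firebase API, just a promise-returning lookup.
function fetchNode(path) {
  const value = path.split("/").reduce((node, key) => (node ? node[key] : undefined), db);
  return Promise.resolve(value ?? null);
}

// One read for the id list, then all contractor reads started in parallel.
async function contractorsForCompany(companyKey) {
  const ids = Object.keys((await fetchNode(`companyContractors/${companyKey}`)) ?? {});
  const details = await Promise.all(ids.map((id) => fetchNode(`contractors/${id}`)));
  return ids.map((id, i) => ({ id, ...details[i] }));
}
```

This is still several reads, which is exactly what the question asks to avoid; it only removes the one-after-another waiting.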
With NoSQL, you have to think in terms of views first, then let the views dictate your schema. You can definitely do this with one query, but forget about normalization; that concept only applies to relational databases.
Let's say you have a view where you want to search by company and list all the contractors, with their info, or search by contractor and list all the companies with their info:
Schema:
companyContractorKey, contractorCompanyKey, contractorName, contractorSkill, companyIndustry, etc...
where companyContractorKey is a field containing a concatenation of the company name and contractor name, for example: 'Acme/Ellis Electric'. You can then do a range search from 'Acme/A' to 'Acme/z' and get all the contractors for Acme.
Similarly, contractorCompanyKey is a field containing a concatenation of the contractor name and company, for example 'Ellis Electric/Acme'. You can then do a range search from 'Ellis Electric/A' to 'Ellis Electric/z' to get all the companies for Ellis Electric.
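A sketch of the prefix range search over the concatenated keys, on hypothetical in-memory records. It uses the `"\uf8ff"` upper bound that the Firebase documentation suggests for prefix queries (with the JavaScript SDK this would be roughly `orderByChild("companyContractorKey").startAt("Acme/").endAt("Acme/\uf8ff")`); unlike an 'Acme/A' to 'Acme/z' range, it also catches names that start with a digit or with "z" followed by more characters.

```javascript
// Hypothetical listing records keyed by the concatenated fields from the answer.
const listings = [
  { companyContractorKey: "Acme/Ellis Electric", contractorCompanyKey: "Ellis Electric/Acme", contractorName: "Ellis Electric" },
  { companyContractorKey: "Acme/Smith Roofing", contractorCompanyKey: "Smith Roofing/Acme", contractorName: "Smith Roofing" },
  { companyContractorKey: "Globex/Ellis Electric", contractorCompanyKey: "Ellis Electric/Globex", contractorName: "Ellis Electric" },
];

// Prefix range scan: keep every key k with start <= k <= end, where end is the
// prefix followed by "\uf8ff", a very high code point, so keys that begin
// with the prefix sort before it. Results come back ordered by the key.
function rangeByPrefix(records, field, prefix) {
  const start = prefix;
  const end = prefix + "\uf8ff";
  return records
    .filter((r) => r[field] >= start && r[field] <= end)
    .sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0));
}
```

The same function serves both directions: query `companyContractorKey` with "Acme/" for a company's contractors, or `contractorCompanyKey` with "Ellis Electric/" for a contractor's companies.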
The drawback is that the information for a company is stored in multiple records (easily found using the companyContractorKey), and the information for a contractor is also stored in multiple records (found using the contractorCompanyKey), so updates and deletions will involve multiple records. But querying will be super fast, as long as you add an .indexOn rule for the two key fields. Firebase supports updating multiple records with one request, so this should not present a problem.
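A sketch of how such a multi-record update might be assembled: one object whose keys are full paths to every duplicated field, which is the shape the Firebase JavaScript SDK's `update()` accepts for a multi-location update (for example `firebase.database().ref().update(updates)`). The data, the `listings` node and `applyUpdates` are hypothetical; `applyUpdates` only mimics the effect locally.

```javascript
// Hypothetical flattened listing node: record id -> listing record.
const listings = {
  l1: { contractorCompanyKey: "Ellis Electric/Acme", contractorSkill: "wiring" },
  l2: { contractorCompanyKey: "Ellis Electric/Globex", contractorSkill: "wiring" },
  l3: { contractorCompanyKey: "Smith Roofing/Acme", contractorSkill: "roofing" },
};

// Collect every duplicated copy of one contractor's field into a single
// path -> value object, so all copies can be written in one request.
function buildSkillUpdate(contractorName, newSkill) {
  const updates = {};
  for (const [id, record] of Object.entries(listings)) {
    if (record.contractorCompanyKey.startsWith(`${contractorName}/`)) {
      updates[`listings/${id}/contractorSkill`] = newSkill;
    }
  }
  return updates;
}

// Local stand-in that writes each path into an in-memory tree.
function applyUpdates(root, updates) {
  for (const [path, value] of Object.entries(updates)) {
    const keys = path.split("/");
    const last = keys.pop();
    let node = root;
    for (const key of keys) node = node[key];
    node[last] = value;
  }
}
```

Against a real database the matching records would come from the contractorCompanyKey range query rather than a scan of every listing.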
Also, you will want to avoid putting all the information about a company or contractor in that schema node: keep only what is necessary for your views, and put all the details that are not in the "listing" view in separate schema nodes, one dedicated to companies and one to contractors.
In a Node.js app I use MongoDB/Mongoose. MongoDB doesn't have joins, and this is a big problem for me because I store data in separate collections. I have two collections, Users and Books, and I need a query like this:
SELECT Books.name FROM Users,Books WHERE Users.name='john' and Books.lang='en'
I'm sure we cannot do this with one single Mongoose query. Mongoose populate also has lots of problems. For example, if I do this:
Users.find({user_name:'john'}).populate({path:'books', match:{lang:'en'}})
With this code, all users named john will be fetched, and only a few of them have a book in English. Imagine there are 1 million users named john and only one of them has an English book! We have to fetch that huge amount of data and then pick out that one item with a secondary block of code. That is not a clean or good way.
And if I change the code to this, I get another problem:
Users.find({user_name:'john'}).populate({path:'books', match:{lang:'en'}}).limit(6)
It fetches the first 6 users and then populates their books; if nothing is populated, the result will be null. I need to fetch the first 6 items that belong to a specific user and have an English book.
Is there any other way? I have searched a lot about this but found no good result. In this scenario what annoys me is fetching all users and then filtering them again; this could be a slow architecture. What's the best way for me? What can I do?
I'm surprised that a database engine that claims to be enterprise-grade has this disadvantage. Don't other people have this problem? Is there something I'm missing?
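One common way around this, sketched here on hypothetical in-memory data that keeps the question's schema (users hold an array of book refs), is to let the database do the join, filter and limit in one aggregation, so only the final rows reach the app. The comment shows a roughly equivalent Mongoose `aggregate` pipeline using `$lookup` (array `localField` matching needs MongoDB 3.4 or later); the function below only simulates its stages.

```javascript
// Hypothetical data in the question's shape: users reference their books.
const users = [
  { _id: "u1", user_name: "john", books: ["b1", "b2"] },
  { _id: "u2", user_name: "mary", books: ["b3"] },
  { _id: "u3", user_name: "john", books: ["b4"] },
];
const books = [
  { _id: "b1", name: "Dune", lang: "en" },
  { _id: "b2", name: "Le Petit Prince", lang: "fr" },
  { _id: "b3", name: "Emma", lang: "en" },
  { _id: "b4", name: "Ulysses", lang: "en" },
];

// Simulates, stage by stage, a pipeline such as:
//   User.aggregate([
//     { $match: { user_name: "john" } },
//     { $lookup: { from: "books", localField: "books", foreignField: "_id", as: "bookDocs" } },
//     { $unwind: "$bookDocs" },
//     { $match: { "bookDocs.lang": "en" } },
//     { $limit: 6 },
//     { $project: { _id: 0, name: "$bookDocs.name" } },
//   ])
function englishBooksOfUser(userName, limit = 6) {
  const booksById = new Map(books.map((b) => [b._id, b]));
  const result = [];
  for (const user of users) {
    if (user.user_name !== userName) continue; // first $match
    for (const bookId of user.books) { // $lookup + $unwind
      const book = booksById.get(bookId);
      if (!book || book.lang !== "en") continue; // second $match
      result.push(book.name); // $project
      if (result.length === limit) return result; // $limit
    }
  }
  return result;
}
```

If only a handful of users match, a two-step alternative is to collect their book ids and run `Book.find({ _id: { $in: ids }, lang: "en" }).limit(6)`.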
Nothing important in this paragraph: I am still new to programming and getting the hang of Node.js, so for this question I am only asking for someone to explain the logical process I should take. I can figure it out from there, and I feel this would be a great example for beginners like myself, so I will post the code for the final solution.
The Situation
-I have one Mongo database containing customers (there are hundreds of customers). It has 3 fields (First, Last, Age).
-The customers receive a monthly service. Every day the company assigns each of its small number of employees a random list of these customers to take care of.
-I have already created a second schema in the Mongo database that has four fields: the employee assigned to the daily list, the date, an array filled with the ID fields of many customers, and a Boolean value (employee, date, array, boolean). {I need the second schema for archive purposes}
The problem -
I need to query the employee list (2nd) database for incomplete lists (i.e. false Boolean values), then create, show or activate a link to a view for each of the incomplete lists. Each view will be populated by querying the customer database (1st database) with the ID fields that I retrieve from the array in the 2nd database. I need to create that link and populate the view for each incomplete list. I am using Node.js, Express and Jade, but as I said in the paragraph you skipped, if you could just map out the logic I can get started and will post my final result.
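A sketch of the logic, using hypothetical in-memory data that mirrors the two schemas described above (the field names `customerIds` and `completed`, and the `/lists/:id` link, are assumptions). The comments show the Mongoose queries each step roughly corresponds to; with Mongoose, declaring the array as `[{ type: mongoose.Schema.Types.ObjectId, ref: "Customer" }]` would also let `.populate("customerIds")` handle step 2.

```javascript
// Hypothetical data mirroring the two schemas from the question.
const customers = [
  { _id: "c1", first: "Ann", last: "Lee", age: 34 },
  { _id: "c2", first: "Raj", last: "Patel", age: 51 },
  { _id: "c3", first: "Mia", last: "Cruz", age: 27 },
];
const dailyLists = [
  { _id: "d1", employee: "Sam", date: "2014-03-01", customerIds: ["c1", "c3"], completed: false },
  { _id: "d2", employee: "Jo", date: "2014-03-01", customerIds: ["c2"], completed: true },
  { _id: "d3", employee: "Jo", date: "2014-03-02", customerIds: ["c2", "c3"], completed: false },
];

// Step 1: find incomplete lists, like DailyList.find({ completed: false }).
// Step 2: for each list, load its customers, like
//   Customer.find({ _id: { $in: list.customerIds } }).
// Step 3: build one view model per list: a link plus the customers to show.
function incompleteListsWithCustomers() {
  const pending = dailyLists.filter((list) => !list.completed);
  return pending.map((list) => ({
    listId: list._id,
    employee: list.employee,
    date: list.date,
    link: `/lists/${list._id}`,
    customers: customers.filter((c) => list.customerIds.includes(c._id)),
  }));
}
```

An index page could render the links, and an Express route for `/lists/:id` could render one list's customers with a Jade template; in real code both queries are asynchronous.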
Thank you for any attempts. It is now almost 9pm EST; I will be monitoring my post until at least 11pm EST in case anyone needs clarification, and again tomorrow morning.
-Steven R
As I understand it, you have 2 databases. It seems a little simpler to me: use ObjectId together with a "boolean" or numeric field; this field should modify or change the connection to the database. That solves the problem of having multiple databases, and you can relate them. If you use ObjectId, you can search in other collections, and in this case in other databases. Use Mongoose; it will help you build better databases and search on different levels.
Are there any best practices for returning large lists of orders to users?
Let me try to outline the problem we are trying to solve. We have a list of customers that have 1-5,000+ orders associated with each. We pull these orders directly from the database and present them to the user in a paginated grid. The view we have is a very simple "select columns from orders", which worked fine when we were first starting, but as we grow it's causing performance/contention problems. There seem to be a million and one ways to skin this cat (return only a page's worth of data, only return the last 6 months of data, etc.), but as I said, I'm just wondering if there are any resources out there that provide a little more hand-holding on how to solve this problem.
We use SQL Server as our transaction database and select the data out in XML format. We then use a mixture of XSLT and Javascript to create our grid. We aren't married to the presentation solution but are married to the database solution.
My experience.
Always set default values in the UI for the user that are reasonable. You don't want them clicking "Retrieve" and getting everything.
Set a limit to the number of records that can be returned.
Only return from the database the records you are going to display.
If forward/backward consistency is important, store the entire result set from the query in a temp table and return just the page you need to display. When paging up/down, retrieve the next set from the temp table.
Make sure your indexes cover your queries.
Use different queries for different purposes. Think "Open Orders" vs "Closed Orders". These might perform much better as separate queries instead of one generic query.
Set parameter defaults in the stored procedures. Protect your query from a UI that is not setting reasonable limits.
I wish we did all these things.
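A sketch of several of these tips applied together, written against a hypothetical in-memory order array: filter to one purpose ("open" orders), apply a default page size, cap it, and return only one page plus a total count. On SQL Server 2012 or later the paging step maps to `OFFSET ... FETCH NEXT`, as in the comment; on SQL Server 2005/2008 the usual approach is `ROW_NUMBER()`. The limits and column names are made up.

```javascript
// Hypothetical limits; real values depend on the grid and the users.
const DEFAULT_PAGE_SIZE = 50;
const MAX_PAGE_SIZE = 200;

// Newest order first; ties on date broken by order id so paging is deterministic.
function byNewest(a, b) {
  if (a.orderDate !== b.orderDate) return a.orderDate < b.orderDate ? 1 : -1;
  return b.orderId - a.orderId;
}

// Returns one page of a customer's orders with the given status. Roughly the
// same as this SQL Server 2012+ query (column names are assumptions):
//   SELECT ... FROM orders
//   WHERE customer_id = @customerId AND status = @status
//   ORDER BY order_date DESC, order_id DESC
//   OFFSET (@page - 1) * @pageSize ROWS FETCH NEXT @pageSize ROWS ONLY;
function getOrdersPage(orders, { customerId, status = "open", page = 1, pageSize = DEFAULT_PAGE_SIZE } = {}) {
  // Clamp inputs so a careless caller cannot request "everything".
  const size = Math.min(Math.max(1, pageSize), MAX_PAGE_SIZE);
  const safePage = Math.max(1, page);
  const matching = orders
    .filter((o) => o.customerId === customerId && o.status === status)
    .sort(byNewest);
  const start = (safePage - 1) * size;
  return { total: matching.length, rows: matching.slice(start, start + size) };
}
```

In memory the filter still touches every row; the point is the contract (defaults, caps, one page out). In the database, a covering index on the filter and sort columns is what avoids reading everything.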
I'd recommend doing some profiling to find the actual bottlenecks. Perhaps you have access to Visual Studio Profiler? http://msdn.microsoft.com/en-us/magazine/cc337887.aspx There are plenty of good profilers out there.
Otherwise, my first stop would be pagination to bring back fewer records from the db, which is easier on the connection and the memory footprint. Take a look at this (I'm assuming you're on SQL Server >= 2005):
http://www.15seconds.com/issue/070628.htm
I'm not sure from the question exactly what UI problem you are trying to solve.
If it's that the customer can't work with a table that is just one big amorphous blob, then let him sort on the fields: order date, order number, your SKU number, maybe his SKU number, and I guess others, too. He might find a multi-column stable sort handy, too.
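A sketch of the multi-column sort, assuming hypothetical order rows. JavaScript's `Array.prototype.sort` has been required to be stable since ES2019, so rows that tie on every chosen column keep their previous order, which is what makes successive user-driven sorts behave predictably.

```javascript
// Hypothetical order rows as they might appear in the grid.
const orders = [
  { orderNumber: 1003, orderDate: "2010-05-02", sku: "B-200" },
  { orderNumber: 1001, orderDate: "2010-05-01", sku: "A-100" },
  { orderNumber: 1004, orderDate: "2010-05-02", sku: "A-100" },
  { orderNumber: 1002, orderDate: "2010-05-01", sku: "B-200" },
];

// Compare two values of the same column type (numbers or strings).
function compareValues(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sort by several columns in priority order; each entry is
// { key, dir } with dir = 1 (ascending) or -1 (descending).
// Returns a new array and leaves the input untouched.
function multiSort(rows, columns) {
  return [...rows].sort((a, b) => {
    for (const { key, dir } of columns) {
      const c = compareValues(a[key], b[key]);
      if (c !== 0) return c * dir;
    }
    return 0; // full tie: stability keeps the previous order
  });
}
```

For example, `multiSort(orders, [{ key: "orderDate", dir: -1 }, { key: "sku", dir: 1 }])` lists the newest day first and, within a day, orders by SKU.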
If it's that the table headers scroll up and disappear when he scrolls down through his orders, that's more difficult. Read the SO discussion to see if the method there gives a solution you can use.
There is also a jQuery mechanism for keeping the header within the viewport.
HTH
EDIT: plus I'll second @Iain's answer: do some profiling.
Another EDIT: @Scott Bruns's answer reminded me that when we started designing the UI, the biggest issue by far was limiting the number of records the user had to look at. So yes, I agree with Scott that you should give the user some way to see only a limited number of records right from the start; that is, before he ever sees a table, he has told you a lot about what he wants to see.
Stupid question, but have you asked the users of your application for input on what records that they would like to see initially?