NodeJS/Mongoose - Logical separation of same schema + multi-tenancy

I have 2 requirements in my application:
I have multiple clients, which should be completely separated
Each client can have multiple subsidiaries that they should be able to switch between without re-authenticating, but the data should be separated (e.g. all vendors in subsidiary 1 should not be shown in subsidiary 2)
As for the first requirement, I'm thinking of using a multi-tenancy architecture. That is, there will be one API instance, one frontend instance per customer, and one database per customer. Each request from the frontend will include a tenant ID, by which the API decides which database it needs to connect to / use. I would use mongoose's useDb method for this.
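Roughly what I have in mind is something like this (just a sketch; the "x-tenant-id" header name and the Vendor model are placeholders):

const mongoose = require("mongoose");

// One physical connection to the cluster; useDb() switches the logical
// database per tenant without opening a new connection pool.
const baseConnection = mongoose.createConnection("mongodb://localhost:27017/admin");

const vendorSchema = new mongoose.Schema({ name: String /* ... */ });

// Express-style middleware: pick the tenant database from the request header.
function tenantMiddleware(req, res, next) {
  const db = baseConnection.useDb(`tenant_${req.headers["x-tenant-id"]}`, { useCache: true });
  // Reuse the compiled model if this (cached) connection already has it.
  req.Vendor = db.models.Vendor || db.model("Vendor", vendorSchema);
  next();
}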
Question 1: is this method a good approach and/or are there any known drawbacks performance-wise? I'm using this article as a reference.
As for the second requirement, I would need to somehow logically separate certain schemas. E.g., I have my mongoose vendorSchema, but I would need to somehow separate the entries per subsidiary. I can only imagine adding a field to each of these "shared schemas", e.g.
const vendorSchema = new mongoose.Schema({
  /* other fields... */
  subsidiary: {
    type: mongoose.Schema.Types.ObjectId,
    ref: "Subsidiary",
    required: true
  }
})
and then having to pass this subsidiary ID with every request to the API so it can be used in the mongoose query to find the right data. That seems like a bad architectural decision and an unnecessary overhead though, and it doesn't seem very scalable.
Question 2: Is there a better approach to achieve this logical separation as per subsidiary for every "shared" schema?
Thanks in advance for any help!

To maybe answer part of your question..
A multi-tenant application is, well, normal.. I honestly don't know of any web app that would be single-tenant, unless it's just a personal app.
With that said, the architecture you have will work, but as noted in my comments there is no need to have a separate DB for each user; that would be a bit of overkill and is the reason why SQL or Mongo queries exist.
Performance-wise, database servers are in general very performant (that's what they are designed for), but this will depend on many factors:
Number of requests
Size of requests
DB optimization
Query optimization
Resources of DB server
I'm sure there are many more I didn't list but you get the idea..
To your second question: yes, you could add a 'subsidiary' field, which would hold the subsidiary ID, so that when you query Mongo you filter on subsidiary = <id>, and this would then return only the items for said subsidiary...
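In mongoose terms that filter would look something like this (just a sketch; the Vendor model and subsidiaryId come from your own schema and request):

// Only return vendors that belong to the currently selected subsidiary.
async function listVendors(subsidiaryId) {
  return Vendor.find({ subsidiary: subsidiaryId }).lean();
}

If repeating that filter everywhere feels like too much overhead, it can be centralised in a small helper or a mongoose query middleware/plugin so each route only supplies the subsidiary ID once.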
From the standpoint of multiple requests to Mongo for each API call, yeah, you want to try and limit the number of calls each time, but that's where caching comes in, using something like Redis to store the responses for x minutes etc. Then the response is mainly handled by Redis, but again this is going to depend a lot on the size of the responses, frequency etc.
But this actually leads into why I was asking about DB choices. Mongo works really well for frequently changing schemas with little to no relation to each other. We use Mongo for a chat application and it works really well for that because it's more or less just a JSON store for us with simple querying for chats, but the second you need to have data relate to each other it can start to get tricky and end up costing you more time and resources trying to hack around Mongo to do the same task.
I would say it could be worth doing an exercise where you look at your current data structure, where it is today and where it might go in the future. If you can foresee having your data related in any way in the future, or maybe even needing encryption (yes, Mongo does have this but it's only in the Enterprise version), then it may be something to look at.

Related

Storing in MySQL vs JavaScript Object

I have a set of data associating zipcodes to GPS coordinates (namely latitude and longitude). The very nature of the data makes it immutable, so it has no need to be updated.
What are the pro and cons of storing them in a SQL database vs directly as a JavaScript hashmap? The table resides on the server, it's Node.js, so this is not a server vs browser question.
When retrieving data, one is sync, the other async, but there are fewer than 10k elements, so I'm not sure whether storing these in MySQL and querying them justifies the overhead.
As there is no complex querying need, are there some points to consider that would justify having the dataset in a database?
* querying speed and CPU used for retrieving a pair,
* RAM used for a big dataset that would need to fit into working memory.
I guess that for a way bigger dataset, (like 100k, 1M or more), it would be too costly in memory and a better fit for the database.
Also, JavaScript objects use hash tables internally, so we can infer they perform well even with non-trivial datasets.
Still, would a database be more efficient at retrieving a value from an indexed key than a simple hashmap?
Anything else I'm not thinking about?
You're basically asking a scalability question... "At what point do I swap from storing things in a program to storing things in a database?"
Concurrency, persistence, maintainability, security, etc.... are all factors.
If the data is open knowledge, only used by one instance of one program, and will never change, then just hard code it or store it in a flat file.
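For example, a minimal sketch assuming a zipcodes.json flat file shaped like { "90210": { "lat": 34.09, "lng": -118.41 } }:

const fs = require("fs");

// Loaded once at startup; ~10k entries is a trivial amount of memory.
const zipcodes = JSON.parse(fs.readFileSync("./zipcodes.json", "utf8"));

// Synchronous O(1) lookup, e.g. lookup("90210") -> { lat: 34.09, lng: -118.41 }
function lookup(zip) {
  return zipcodes[zip];
}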
When you have many applications with different permissions calling a set of data and making changes, a database really shines.
Most basically, an SQL database will [probably ...] be "server side," while your JavaScript hash-table will be "client side." Does the data need to be persisted from one request to the next, and between separate invocations of the JavaScript program? If so, it must be stored ... somewhere.
The decision of whether to use "a hash table" is also up to you: hash tables are great when you are looking for explicit keys. But they're not the only data-structure available to you in JavaScript.
I'd say: carefully work out all the particulars of your exact situation, and use these to inform your decision. "An online web forum like this one" really can't step into your shoes on this. "You're the engineer ..."

When to use redis for better optimization?

I am a beginner with Redis and have used it in my Node.js project, and it's giving good results as a caching mechanism.
So basically, in a world where MySQL, Firebase and MongoDB are at the top in their respective areas, where would Redis fit? Can we use Redis for better optimization, replacing any of these most popular databases, or does it play a bigger role alongside specific technologies? Maybe it should be used more with JavaScript and its frameworks (e.g. Node.js pairs well with Redis)?
Redis is widely used for caching. Meaning, in a high availability infrastructure, when some data has to be accessed many times, you would store it in your database and then store it in redis with some unique key which you could rebuild easily with parameters. When the data is updated, you just clear that key in redis and add it again with the new data.
Example:
You have thousands of users.
They all connect many many times and go on their profile.
You might want to store their profile info in redis with a key {userid}_user_info.
The user tries to access his profile:
first check if data exists in redis,
if yes return it,
else get it from db and insert it in redis
The user updates his profile info, just refresh the redis value.
etc.
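A minimal cache-aside sketch of the above, using the node-redis v4 client (getUserFromDb and updateUserInDb are hypothetical database helpers):

const { createClient } = require("redis");

const redis = createClient(); // defaults to localhost:6379
// call (and await) redis.connect() once at application startup

async function getUserInfo(userId) {
  const key = `${userId}_user_info`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit

  const user = await getUserFromDb(userId); // hypothetical DB helper
  await redis.set(key, JSON.stringify(user), { EX: 300 }); // expire after 5 minutes
  return user;
}

async function updateUserInfo(userId, changes) {
  const user = await updateUserInDb(userId, changes); // hypothetical DB helper
  await redis.set(`${userId}_user_info`, JSON.stringify(user), { EX: 300 }); // refresh the cache
  return user;
}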
There is also another way Redis is used: for queuing tasks and synchronising websocket broadcasts across machines. Here is a useful article about it:
http://fiznool.com/blog/2016/02/24/building-a-simple-message-queue-with-redis/
As for using Redis as a database: for simple data it can be used, like settings where a simple key/value pair is enough. For storing complex data it's a bit of a hassle, especially if you want to use relational functionality or search features. Redis is fast because it does not have all these features and keeps data in memory (that's not the only reason, but it does contribute).

Established methods for websites to serve database results

Are there established methods or frameworks for serving database rows to web clients? So far I just have them submit a JSON object, e.g.
{
  "Query": "SELECT",
  "Schema": "icecream",
  "Table": "cones",
  "Fields": ["price", "flavor"],
  "Filters": [
    {
      "Comparison": "=",
      "Field": "flavor",
      "Value": "chocolate"
    }
  ]
}
I verify that the fields mentioned are authorized/correct, then construct a prepared-statement MySQL string, but are there any frameworks or standard methods of implementing this?
Truth be told, what you are doing is very unusual. You are effectively giving your web client full access to build queries for your database. You mention that you are validating the information before building the query, which is good, because the risk of SQL injection is very high. You are asking a very broad question, so I will respond with an equally broad answer:
No, there are no frameworks or standards for implementing this. The reason is that (unless you have a very specific reason for doing this) very few web applications are set up to give the web client such extensive control over the queries being built. Normally your backend APIs would intentionally be much more limited. You are effectively implementing an API method that says:
Give me the details of your query and I will build and execute it for you and return the result.
Normally standard operating procedure otherwise is to have much more specific and limited API methods. Rather than having a generic query builder you would have an API for each specific thing that has to happen:
Tell me how many records you want and a search value on this small handful of fields and I will return a list of matching users
Tell me how many records you want and which of these fields you want to sort by and I will return a list of matching books
That is not to say that there aren't perfectly valid reasons to do it the way you are trying to do it, but unless there is a reason why you specifically want to give users full control over the query building process, I think the first step is to refactor in a way that gives the web client less control, and your server-side application more.
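To illustrate the difference, a limited endpoint might look like this (a sketch using Express and mysql2; the table and columns are borrowed from your example, the rest is made up):

const express = require("express");
const mysql = require("mysql2/promise");

const app = express();
const pool = mysql.createPool({ /* connection config */ });

// The client only supplies a flavor; the SQL itself is fixed server-side
// and parameterised, so there is nothing for the client to inject.
app.get("/cones", async (req, res) => {
  const flavor = String(req.query.flavor || "");
  const [rows] = await pool.execute(
    "SELECT price, flavor FROM cones WHERE flavor = ?",
    [flavor]
  );
  res.json(rows);
});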
It looks like you're reinventing a query language!
Why not allow your users to type SQL queries directly? Or mongodb queries, or whatever DBMS you use. There is much less overhead.
When it comes to security, a good practice is to set up a clone of your database (a read-only replica), and have your clients hit the read-only replica instead of the main database node.
Your main node and your read-only replica can stay in sync using replication. Any good DBMS supports it.
A good example of this is the Stack Exchange Data Explorer
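In application code that can be as simple as keeping two connection pools and pointing the client-facing queries at the replica (a sketch with mysql2; the hostnames are placeholders):

const mysql = require("mysql2/promise");

const primary = mysql.createPool({ host: "db-primary.internal" /* ... */ });
const replica = mysql.createPool({ host: "db-replica.internal" /* ... */ });

// Writes go to the primary; client-facing / exploratory reads hit the replica.
const readQuery = (sql, params) => replica.execute(sql, params);
const writeQuery = (sql, params) => primary.execute(sql, params);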

Caching query results, to do or not to do, overkill or performance energizer?

Good evening,
my project uses the MEAN Stack and has a few collections and a single database from which the data is retrieved.
Thinking about how the user would interact with the web app I am going to build, I figured that my idea of the application is quite wasteful.
Now, the application is hosted on a private server on the LAN, making requests very fast, and it's running an Express server.
The application is built around employee management, services and places where the services can take place. Just describing, so you have an idea.
The "ring to rule them all" is pretty much the first collection, services, which starts the core of the application. There's a page that let's you add rows, one for each service that you intend to do and within that row you choose an employee to "run the service", based on characteristics that this employee has, meaning that if the service is about teaching Piano, the employee must know how to play Piano. The same logic works for the rest of the "columns" that will build up my row into a full service recognized by the app as such.
Now, what I said above is pretty much information retrieval from a database and logic to make the application model the information retrieved and build something with it.
My question, or rather my doubt, comes from how I imagined the querying to work for each field that is part of the service row. Right now I'm thinking about querying the database (MongoDB) each time I have to pick a value for a field, but if you consider that I might want to add 100 rows, each of which would have 10 fields, that would add up to a lot of requests to the database. Now, that doesn't seem elegant nor intelligent to me, but I can't come up with a better solution or idea.
Any suggestions or rule of thumbs for a MEAN newb?
Thanks in advance!
EDIT: Answer to a comment question which was needed.
No, the database is pretty static (unless the user willingly inserts a new value, say a new employee that can do a service). That wouldn't happen very often. Considering the query that would return all the employees for a given service, those employees would (ideally) be inside an associative array, with the possibility to be "pop'd" from it if chosen for a service, making them unavailable for further services (because one person can't do two services at the same time). Hope I was clear, I'm surely not the best at explaining myself.
It would query the database for who is available when a user looks at that page, and run another query if the user assigns an employee to do a service.
In general 1 query on page load and another when data is submitted is standard.
You would only want to use an in-memory cache for:
frequent queries (but most databases will do this automatically)
values that change frequently, like:
how many users are connected
the last query sent
something that happens on almost every query (>95%)
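If you do end up wanting an application-level cache for one of those frequently requested values, it can be as small as this (a sketch; the Employee model and query are placeholders):

const cache = new Map();

// Return a cached value if it is still fresh, otherwise run the query and cache it.
async function cachedFind(key, queryFn, ttlMs = 60000) {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;

  const value = await queryFn();
  cache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}

// Usage: the employees able to run a given service, cached for a minute.
// const employees = await cachedFind(`employees:${serviceId}`,
//   () => Employee.find({ skills: requiredSkill }).lean());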

Performance issues with EmberJS and Rails 4 API

I have an EmberJS application which is powered by a Rails 4 REST API. The application works fine the way it is; however, it is becoming very sluggish because of the kind of queries that are being performed.
Currently the API output is as follows:
"projects": [{
"id": 1,
"builds": [1, 2, 3, 4]
}]
The problem arises when a user has lots of projects with lots of builds split between them. EmberJS currently looks at the builds key and then makes a request to /builds?ids[]=1&ids[]=2, which is the kind of behaviour I want.
This question could have one of two solutions.
Update Rails to load the build_ids more efficiently
Update EmberJS to support different queries for builds
Option 1: Update Rails
I have tried various solutions regarding eager loading and manually grabbing the IDs using custom methods on the serializer. Both of these solutions add a lot of extra code that I'd rather not write, and they still do individual queries per project.
By default Rails also does SELECT *-style queries when doing has_many, and I can't figure out how to override this at the serializer layer. I also wrote a horrible solution which got the entire thing down to one fast query, but it involved writing raw SQL, which I know isn't the Rails way of doing things, and I'd rather not have such a huge, complex, untestable query as the default scope.
Option 2: Make Ember use different queries
Instead of requesting /builds?ids[]=1&ids[]=2 I would rather not include the builds key at all on the project, and make a request to /builds?project_id=1 when I access that variable within Ember. I think I can do this manually on a per-field basis by using something similar to this:
builds: function () {
  return this.store.find('builds', { project_id: this.get('id') });
}.property()
instead of the current:
builds: DS.hasMany('build', { async: true })
It's also worth mentioning that this doesn't only apply to "builds". There are 4 other keys on the project object that do the same thing so that's 4 queries per project.
Have you made sure that you have properly added indexes to your database? Adding an index on the builds table on project_id will make it work a lot faster.
Alternatively you should use the links attribute to load your records.
{"projects": [{
"id": 1,
"links": {
"builds": "/projects/1/builds"
}
}]}
This means that the builds table will only be queried when the relationship is accessed.
Things you can try:
Make sure your rails controller only selects the columns needed for JSON serialization.
Ensure you have indexes on the columns present in your where and join clauses, unless the column is boolean or has a low number of distinct values. Always ensure you have indexes on foreign key columns.
Be VERY VERY careful with how you are using ActiveRecord joins vs includes vs preload vs eager and references. This area is fraught with problems when composing scopes together, and subtle things can alter the SQL generated, the number of queries issued, and even what results are actually returned. I noticed minor point releases of AR 4 yielding different query results because of the join strategy AR would choose.
Often you want to aim to reduce the number of SQL statements issued to the database, but joining tables is not always the best solution. You will need to benchmark and use EXPLAIN to see what works better for your queries. Sometimes subqueries/sub-selects can be more efficient.
Querying by parent_id is a good option if you can get Ember Data to perform the request that way as the database has a simpler query.
You could consider using Ember-Model instead of Ember-Data. I am using it currently as it's much simpler and easier to adapt to my needs, and it supports multi-fetch to avoid 1+N request problems.
You may be able to use embedded models or side-loaded models so your server can reduce the number of web requests AND the number of SQLs and return what the client needs in one request / one SQL. Ember-Model supports both embedded and side-loaded models, so Ember-Data being more ambitious may also.
Although it appears from your question that Ember-Data is doing a multi-fetch, make sure you are using an SQL IN clause for those IDs instead of separate queries.
Make sure that the SQL on your Rails side is not fanning out in a 1+N pattern. Using the includes option to effect eager loading on AR relations may help avoid 1+N queries, or it may unnecessarily load models depending on the results needed in your response.
I also found that the Ruby JSON serializer libraries are less than optimal. I created a gem ToJson that speeds up JSON serializing many times over the existing solutions. You can try it and benchmark for yourself.
I found that ActiveRecord (including AR 4) didn't work well for me and I moved to Sequel in the end because it gave me so much more control over join types, join conditions, and query composition and tactical eager loading, plus it was just faster, has wider support for standard SQL features and excellent support for postgres features and extensions. These things can make a huge difference to the way you design your database schema and the performance and types of queries you can achieve.
Using Sequel and ToJson I can serve around 30-50 times more requests than I could with ActiveRecord + JBuilder for most of my queries, and in some instances it's hundreds of times better than what I was achieving with AR (especially creates/updates). Besides Sequel being faster at instantiating models from the DB, it also has a Postgres streaming adapter which makes it even faster again for large results.
Changing your data access/ORM layer and JSON serialisation can achieve 30-50 times faster performance, or alternatively mean managing 30-50 fewer servers for the same load. It's nothing to sneeze at.
