I'm currently working on a node/react application that will make use of a food database API like the one offered by the USDA.
My question is, how are these APIs best used in order to limit API calls?
Take autocomplete as an example: as a user types into the search field for a food, I want to give them a list of, say, 10-20 possible options based on what they have typed. However, as they continue to refine what they type, this is going to generate multiple API calls in order to return possible items. What is the best way to limit the number of calls?
Is it better to somehow store the data locally? It's a massive database that the USDA has, I'm not sure it's feasible or even allowed to be copied necessarily.
Any other time I have dealt with APIs it has just been a call here or there but nothing like this and I guess I'm just confused on where to get started with it.
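Two common ways to cut down the calls described above are to wait until the user pauses typing (debouncing) and to answer refinements from results you already have. Here is a minimal sketch of both. The names debounce and createPrefixCache are made up for illustration, and the local filtering assumes the API returns every item whose name contains the query, which may not match how a given food API actually searches.

```javascript
// Delays calling fn until delayMs have passed without another call.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Remembers results per query. A longer query can be answered locally when a
// cached shorter prefix returned fewer than `limit` items, because that means
// the cached list was the complete result set (assumption: substring matching).
function createPrefixCache(limit) {
  const cache = new Map(); // normalized query -> array of food names
  const normalize = (q) => q.trim().toLowerCase();

  return {
    store(query, results) {
      cache.set(normalize(query), results);
    },
    lookup(query) {
      const q = normalize(query);
      if (cache.has(q)) return cache.get(q);
      // Walk back through shorter prefixes of the query.
      for (let len = q.length - 1; len > 0; len--) {
        const hit = cache.get(q.slice(0, len));
        if (hit && hit.length < limit) {
          return hit.filter((name) => name.toLowerCase().includes(q));
        }
      }
      return null; // nothing usable cached, the caller has to call the API
    },
  };
}
```

In a component you would call lookup first and only fall through to a debounced API call (for example debounce(fetchFoods, 300), where fetchFoods is your own function) when it returns null, then store the response.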
I have 2 requirements in my application:
I have multiple clients, which should be completely separated
Each client can have multiple subsidiaries that he should be able to switch between without re-authenticating but the data should be separated (e.g. all vendors in subsidiary 1 should not be shown in subsidiary 2)
As for the first requirement, I'm thinking of using a multi-tenancy architecture. That is, there will be one API instance, one frontend instance per customer and one database per customer. Each request from the frontend will include a tenant ID by which the API decides which database it needs to connect to / use. I would use mongoose's useDb method for this.
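To make the database-per-tenant idea concrete, here is a rough sketch of the lookup step. It assumes rootConnection is a mongoose connection and uses its useDb method with the useCache option; the getTenantDb name, the tenant_ prefix, and the ID pattern are my own choices for illustration, not anything mongoose prescribes.

```javascript
// Only allow IDs that are safe to embed in a database name (assumed format).
const TENANT_ID_PATTERN = /^[a-z0-9_-]{1,32}$/;

// Maps the tenant ID sent by the frontend to that tenant's database handle.
// `rootConnection` is expected to be a mongoose Connection; only useDb is used.
function getTenantDb(rootConnection, tenantId) {
  if (typeof tenantId !== 'string' || !TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error('Invalid tenant ID');
  }
  // useCache asks mongoose to reuse one handle per database name instead of
  // creating a new one on every request.
  return rootConnection.useDb(`tenant_${tenantId}`, { useCache: true });
}
```

Validating the ID before using it matters here, because the value comes from the client and ends up in a database name.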
Question 1: is this method a good approach and/or are there any known drawbacks performance wise? I'm using this article as a reference.
As for the second requirement, I would need to somehow logically separate certain schemas. E.g., I have my mongoose vendorSchema, but I would need to separate the entries per subsidiary. The only option I can think of is adding a field to each of these "shared schemas", e.g.
const vendorSchema = new mongoose.Schema({
  /* other fields... */
  // Reference to the subsidiary that owns this vendor
  subsidiary: {
    type: mongoose.Schema.Types.ObjectId,
    ref: "Subsidiary",
    required: true
  }
})
and then having to send the subsidiary with every request to the API, so it can be used in the mongoose query to find the right data. That seems like a bad architectural decision and an overhead though, and it does not seem very scalable.
Question 2: Is there a better approach to achieve this logical separation as per subsidiary for every "shared" schema?
Thanks in advance for any help!
To maybe answer part of your question..
A multi tenant application is, well normal.. I honestly don't know of any web-app that would be single tenant, unless it's just a personal app.
With that said, the architecture you have will work, but as noted in my comments there is no need to have a separate DB for each user. That would be a bit overkill, and it is the reason why SQL or Mongo queries exist.
Performance wise, in general database servers are very performant, that's what they are designed for, but this will rely on many factors
Number of requests
size of requests
DB optimization
Query optimization
Resources of DB server
I'm sure there are many more I didn't list but you get the idea..
To your second question, yes, you could add a 'subsidiary' field holding the subsidiary ID, so that when you query Mongo you filter with something like where subsidiary = 'id'. That would then return only the items for that subsidiary.
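One way to avoid repeating that filter by hand in every query is a small helper that always adds it. This is only a sketch: scopeToSubsidiary is a hypothetical name, and where the subsidiary ID comes from (a header, the session, etc.) is up to you.

```javascript
// Returns a query filter that is always restricted to one subsidiary.
function scopeToSubsidiary(subsidiaryId, filter = {}) {
  if (!subsidiaryId) {
    throw new Error('subsidiaryId is required');
  }
  // Spread the caller's filter first so it cannot override the scope.
  return { ...filter, subsidiary: subsidiaryId };
}

// Possible usage with a mongoose model (Vendor is hypothetical here):
//   Vendor.find(scopeToSubsidiary(currentSubsidiaryId, { name: 'Acme' }))
```

Because the subsidiary key is written last, a filter coming from the client cannot switch the query to another subsidiary.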
From the standpoint of multiple requests to Mongo for each API call: yes, you want to try to limit the number of calls each time, but that's where caching comes in, using something like Redis to store the responses for x minutes. The response is then mostly served by Redis, but again this is going to depend a lot on the size of the responses, how often they are requested, and so on.
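To illustrate the "store the response for x minutes" idea without needing a Redis server, here is an in-memory stand-in. It is a sketch only: createTtlCache is a made-up name, and with Redis you would set the key with an expiry instead of tracking expiresAt yourself.

```javascript
// Tiny cache whose entries expire after ttlMs milliseconds.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map(); // key -> { value, expiresAt }

  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // expired, drop it and report a miss
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}
```

If you cache per-subsidiary data this way, the subsidiary ID should be part of the cache key, otherwise one subsidiary could be served another's cached response.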
But this actually leads into why I was asking about DB choices. Mongo works really well for frequently changing schemas with little to no relation to each other. We use Mongo for a chat application and it works really well for that, because it's more or less just a JSON store for us with simple queries for chats. But the second you need data to relate to each other, it can start to get tricky and end up costing you more time and resources trying to hack around Mongo to do the same task.
I would say it could be worth doing an exercise where you look at your current data structure, where it is today and where it might go in the future. If you can foresee your data being related in any way in the future, or maybe even needing crypto (yes, Mongo does have this, but it's only in the enterprise version), then a different database may be something to look at.
How do I make relations in a NoSQL database?
For example, in a Firebase database, where the data is stored in JSON format.
A NoSQL database means that the database has no relations. Either go with a SQL database and convert the JSON data, or face the truth that in NoSQL there are no relations.
Before anything else: I will give you an example of how to try to do this, but I want to tell you that I wouldn't actually do this again. Firebase is not a good replacement for MySQL.
Making relations in a NoSQL database is not possible, but you could always make them "manually" and decide how to work with them.
What I mean by "manually" is that you can duplicate data, but that's not a very good option. For example, a long time ago I made an Android app to manage neighbourhood communities, and because of the time I had to do it, I decided to build it with Firebase.
And I will never do it again, to be honest. I didn't want to lose time on an API, but I lost it anyway trying to structure everything nicely and with all the changes I had to make every 2 days so everything wouldn't fail.
Here you have an example. The database has 2 nodes, the communities and the users.
The users have these fields:
Meanwhile, the communities have a list of incidences, and each incidence stores the email of its author like any other field (the images are not related to the data; they are random ones).
So, TLDR: No, you can't make relations. The only way to do what you want is with duplicated data, like the email of its author on the incidence.
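Just to illustrate what that duplication looks like, here is a made-up structure shaped like the two nodes described above. All keys, field names, and values are hypothetical (this is not the original app's data), and findAuthor is only a sketch of doing the "join" by hand.

```javascript
// Hypothetical data: a users node and a communities node with incidences.
const db = {
  users: {
    u1: { email: 'ana@example.com', name: 'Ana' },
    u2: { email: 'ben@example.com', name: 'Ben' },
  },
  communities: {
    c1: {
      name: 'Elm Street 12',
      incidences: {
        // The author's email is copied into each incidence.
        i1: { title: 'Broken lift', authorEmail: 'ana@example.com' },
        i2: { title: 'Leaking roof', authorEmail: 'ben@example.com' },
      },
    },
  },
};

// The "relation" is resolved manually: look for the user with the copied email.
function findAuthor(database, communityId, incidenceId) {
  const incidence = database.communities[communityId].incidences[incidenceId];
  const author = Object.values(database.users)
    .find((user) => user.email === incidence.authorEmail);
  return author || null;
}
```

Note the downside: if a user changes their email, every incidence holding the old copy has to be updated as well.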
PS: I made a chat app with Firebase in my company, I would share the DB with you so you can see the structure, but it's confidential, you know.
Good evening,
my project uses the MEAN Stack and has a few collections and a single database from which the data is retrieved.
Thinking about how the user would interact with the web app I am going to build, I realized that my current design for the application is rather wasteful.
Now, the application is hosted on a private server on the LAN, making it very fast on requests and it's running an express server.
The application is made around employee management, services and places where the services can take place. Just describing, so to have an idea.
The "ring to rule them all" is pretty much the first collection, services, which is the core of the application. There's a page that lets you add rows, one for each service that you intend to do, and within that row you choose an employee to "run the service" based on characteristics that this employee has. That means that if the service is about teaching Piano, the employee must know how to play Piano. The same logic works for the rest of the "columns" that build up my row into a full service recognized by the app as such.
Now, what I said above is pretty much information retrieval from a database and logic to make the application model the information retrieved and build something with it.
My question, or rather my doubt, comes from how I imagined the querying to work for each field that is part of the service row. Right now I'm thinking about querying the database (MongoDB) each time I have to pick a value for a field, but if you consider that I might want to add 100 rows, each of which would have 10 fields, that would make for a lot of requests to the database. That seems neither elegant nor intelligent to me, but I can't come up with a better solution or idea.
Any suggestions or rules of thumb for a MEAN newb?
Thanks in advance!
EDIT: Answer to a comment question which was needed.
No, the database is pretty static (unless the user willingly inserts a new value, say a new employee that can do a service). That wouldn't happen very often. Considering the query that would return all the employees for a given service, those employees would (ideally) be inside an associative array, with the possibility to be "pop'd" from it if chosen for a service, making them unavailable for further services (because one person can't do two services at the same time). I hope that was clear; I'm surely not the best at explaining myself.
It would query the database on who is available when a user looks at that page and another query if the user assigns an employee to do a service.
In general 1 query on page load and another when data is submitted is standard.
You would only want to use an in-memory cache for:
frequent queries (but most databases will do this automatically)
values that change frequently, like:
  How many users are connected
  Last query sent
something that happens on almost every query (>95%)
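Putting the "one query on page load" advice together with the associative array from the question, the client-side part could look roughly like this. It is a sketch under assumptions: createEmployeePool and the shape of an employee ({ id, name, skills }) are invented for illustration, and the employees array stands for whatever the single page-load query returns.

```javascript
// Holds the employees loaded once on page load and tracks who is still free.
function createEmployeePool(employees) {
  const available = new Map(employees.map((e) => [e.id, e]));

  return {
    // Employees who are still free and have the required skill.
    availableFor(skill) {
      return [...available.values()].filter((e) => e.skills.includes(skill));
    },
    // Removes the employee from the pool so they cannot be picked twice.
    assign(employeeId) {
      const employee = available.get(employeeId);
      if (!employee) return null; // unknown or already assigned
      available.delete(employeeId);
      return employee;
    },
  };
}
```

Every dropdown in every row can then be filled from the pool without touching the database, and the second query only happens when the rows are submitted.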
I'm building an instant messenger on mobile client that interacts with RESTful API through HTTP requests. The pagination endpoint is quite standard - it has starting location (offset) and number of items in a page (limit). I'm having trouble figuring out how to ensure 100% data consistency with pagination when the database can rapidly change.
For example, with some dozen participants, there could be a dozen new messages in a conversation within a second. I don't think it's far-fetched to guess that some of those messages can alter the database within the time the HTTP request for pagination comes back from the server. Fortunately, since this is a messenger I do not have to consider the possibility of data deletion and consider only the data addition.
In my research, the following two links were quite helpful but didn't provide a clear solution:
How to ensure data integrity in paginated REST API?
How to implement robust pagination with a RESTful API when the resultset can change?
The only potential solution I can come up with is using the timestamp of the last object in the previously fetched page. So the HTTP query would have timestamp as a parameter, and the server would return a page of objects created after that timestamp.
Is there any potential problem I'm not seeing, or even better, a much better solution to this issue?
It seems that the method I've thought of has a name - cursor based pagination.
The link below has a great graphical description and explanation, plus an example in PHP.
http://www.sitepoint.com/paginating-real-time-data-cursor-based-pagination/
There's also a helpful guide from Django REST framework that compares two different pagination techniques (LimitOffsetPagination and CursorPagination).
http://www.django-rest-framework.org/api-guide/pagination/
Cursor-based pagination requires a unique, unchanging ordering of items. Facebook and Twitter use generated IDs. As for me, I've decided to simply use the timestamp taken at object creation, as it supports up to millisecond precision. That should be good enough for now.
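For reference, here is a small in-memory sketch of the idea. It is not what I actually run on the server (there it would be a database query); pageAfter and the message shape ({ id, createdAt }) are made up for illustration, and it adds the ID as a tie-breaker, which is an extra assumption on top of the plain timestamp so that two messages created in the same millisecond are neither skipped nor repeated.

```javascript
// Returns up to `limit` messages that come strictly after the cursor,
// ordered by (createdAt, id). A null cursor means "start from the beginning".
function pageAfter(messages, cursor, limit) {
  const isAfter = (m) =>
    !cursor ||
    m.createdAt > cursor.createdAt ||
    (m.createdAt === cursor.createdAt && m.id > cursor.id);

  const byTimeThenId = (a, b) =>
    a.createdAt - b.createdAt || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);

  // filter() returns a new array, so sorting does not touch the input.
  const items = messages.filter(isAfter).sort(byTimeThenId).slice(0, limit);

  const last = items[items.length - 1];
  return {
    items,
    // Keep the old cursor when the page is empty so the client can retry later.
    nextCursor: last ? { createdAt: last.createdAt, id: last.id } : cursor,
  };
}
```

The client sends back nextCursor with the next request, so messages added in the meantime just show up on later pages instead of shifting the offsets.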
I have a question about how to approach a certain scenario before I get halfway through it and figure out it was not the best option.
I work for a large company that has a team that creates tools for the team mates to use that aren’t official enterprise tools. We have no access to the database directly, just access to an internal server to store our files to run and be able to access the main site with javascript etc (same domain).
What I am working on is a tool that has a ton of options in it that allow you to select what I will call “data points” on a page.
There are things like “Account status, Balance, Name, Phone number, email etc” and have it save those to an excel sheet.
So you input account numbers, choose what you need and then using IE Objects it navigates to the page and scrapes data you request.
My question is as follows..
I want to make the scraping part pretty Dynamic in the way it works. I want to be able to add new datapoints on the fly.
My goal or idea is to store the regular expression needed to get the specific piece of data in the table with the “data point option”.
If I choose “Name”, it knows the expression for name in the database to run against the DOM.
What would be the best way to go about creating that type of function in JavaScript / jQuery?
I need to pass a Regex to a function, have it run against the DOM and then return the result.
I have a feeling that there will be things that require more than 1 step to get the information etc.
I am just trying to think of the best way to approach it without having to hardcode 200+ expressions into the file as the page may get updated and need to be changed.
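To show roughly what I have in mind, here is a sketch where the expressions live in a lookup table instead of being hardcoded in the logic. The two patterns and the page markup they assume are invented for illustration (the real ones would come from the table), and extractDataPoint is just a placeholder name.

```javascript
// Stand-in for the stored table: data point name -> regex source and flags.
const dataPoints = {
  Name: { pattern: 'Name:\\s*([^<]+)', flags: 'i' },
  Balance: { pattern: 'Balance:\\s*\\$([\\d,.]+)', flags: 'i' },
};

// Builds the regex from its stored string form and runs it against the page.
// Returns the first capture group if there is one, otherwise the whole match,
// or null when nothing matches.
function extractDataPoint(pageText, dataPoint) {
  const regex = new RegExp(dataPoint.pattern, dataPoint.flags);
  const match = regex.exec(pageText);
  if (!match) return null;
  const value = match[1] !== undefined ? match[1] : match[0];
  return value.trim();
}
```

In the browser, pageText could be document.body.innerHTML (or textContent), and adding a new data point would only mean adding a row to the table.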
Any ideas?
IRobotSoft scraper may be the tool you are looking for. Check this forum and see if questions are similar to what you are doing: http://irobotsoft.org/bb/YaBB.pl?board=newcomer. It is free.
What it uses is not regular expression but a language called HTQL, which may be more suitable for extracting web pages. It also supports regular expression, but not as the main language.
It organizes all your actions well with a visual interface, so you can dynamically compose actions or tasks for changing needs.