Storing and updating API Data in MongoDB - javascript

I am working on a web app where I have made a call to an API and stored the data in MongoDB. The data changes daily, so I need to be able to refresh it each day by clicking a button on the admin site. What is the best way to approach this?
I am new to using databases, so I do not know the best approach. The reason I want to store the data in a database is so I can load it into Redux or the Context API; that way, when someone goes to a page, the data is available immediately instead of having to make (and waste) a new API call on every visit.
My database contains about 630 documents at a time.
Issue:
I need to update the 630 documents in my database to match the 630 documents coming from the API, which change daily, so I need to figure out what to run against MongoDB to accomplish this.

You can use node-schedule.
It's very much like a cron job, but it runs inside the Node application. Make sure the scheduler runs on a 24-hour interval and put the database operation there.
Note that node-schedule is designed for in-process scheduling, i.e. scheduled jobs will only fire as long as your script is running, and the schedule will disappear when execution completes. If you need to schedule jobs that will persist even when your script isn't running, consider using actual cron.
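For illustration, a minimal sketch, assuming an updateFromApi() helper (hypothetical) that fetches the fresh data and writes it to MongoDB:

const schedule = require('node-schedule');

// Cron-style rule: minute hour day-of-month month day-of-week.
// This fires every day at midnight.
schedule.scheduleJob('0 0 * * *', async () => {
  try {
    await updateFromApi(); // hypothetical helper: fetch the API, update MongoDB
    console.log('Daily data refresh complete');
  } catch (err) {
    console.error('Daily data refresh failed:', err);
  }
});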

I ended up finding a way to solve my issue.
Previously I had only updated one document at a time, using something along the lines of db.collection.update(<query>,<update>). The issue I was facing was that I needed to update all the documents in the collection at once. By using db.collection.remove({}) I was able to remove all the documents in the collection, and then I used db.collection.insertMany(myData) to add the new, updated data.
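For reference, a minimal sketch of that remove-then-reinsert approach with the Node MongoDB driver (collection is your collection handle and freshData the ~630 documents fetched from the API; both names are illustrative):

// Wipe the collection, then bulk-insert the fresh API data.
async function replaceAll(collection, freshData) {
  await collection.deleteMany({});        // remove every existing document
  await collection.insertMany(freshData); // insert the updated set
}

One caveat: the collection is briefly empty between the two calls, so if readers can hit it mid-refresh, inserting into a staging collection and renaming it over the old one is a safer variant.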

Related

Proper way to process a large number of records in React or Java

I have a front end in ReactJS and a backend API in Spring Boot. I have to show a drop-down list with records from the API.
The scenario is: I get a list of user IDs from one endpoint, then for each record I have to call another API endpoint to get the address details associated with that user (there may be a large number of records); the drop-down shows the name of the address.
My problem is that if I loop through all the records and fetch the address details one by one, it times out and takes forever.
Can anyone suggest the correct way to do this, in Java or JavaScript?
I have read a little about observables, observers, etc., but I did not get anywhere. Is there any concept of updating an object continuously?
Thanks
The less data has to travel, the faster this will be. If you do this client side, the server has to send all the raw data over the network before the client can even start processing it. The best approach would be to write a single SQL query on the backend that does the right joins to get the data in one go, and then send the already-processed data to the client.
One good way to handle data is by implementing some kind of pagination. You don't need to display all of the data in one go, and it would be expensive to do so. For instance, if you have 1000 items on a list that you want to fetch, you could fetch them 10 or 20 at a time, depending on your preference. This way, you minimize the amount of data per request, making each one much faster.
Here is an example of doing that in React, only tapping into a fake online REST API.
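A minimal sketch of that idea, assuming JSONPlaceholder's /posts endpoint (json-server supports _page and _limit query parameters):

import { useEffect, useState } from 'react';

function PagedPosts() {
  const [posts, setPosts] = useState([]);
  const [page, setPage] = useState(1);

  useEffect(() => {
    // Fetch only one page of 20 records at a time.
    fetch(`https://jsonplaceholder.typicode.com/posts?_page=${page}&_limit=20`)
      .then((res) => res.json())
      .then(setPosts)
      .catch(console.error);
  }, [page]);

  return (
    <div>
      <ul>{posts.map((p) => <li key={p.id}>{p.title}</li>)}</ul>
      <button disabled={page === 1} onClick={() => setPage(page - 1)}>Prev</button>
      <button onClick={() => setPage(page + 1)}>Next</button>
    </div>
  );
}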
Hope this helps.

How to deal with database changes in a MEAN application

I have struggled to find many resources on this online. I am developing an application that multiple users will be using at the same time. This means that one user may edit the database after another user has loaded the data from it, so the second user will not have an up-to-date view of the current state of the database. What is the best way to subscribe to database changes and deal with them? I am using the MEAN stack.
If you are trying to develop a real-time system where database changes are reflected instantly, you need to make use of WebSockets. Since you are using Node.js as the backend, see Socket.io.
However, if you plan on implementing web sockets, you will have to make significant changes to both your Node.js and Angular code.
Another method (which I would not recommend) is to make periodic API calls for those views that need to reflect real-time changes; you can make use of setInterval for this.
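For illustration, a minimal Socket.io sketch (the Express route and the Item Mongoose model are hypothetical):

const http = require('http');
const express = require('express');
const { Server } = require('socket.io');

const app = express();
app.use(express.json());
const httpServer = http.createServer(app);
const io = new Server(httpServer);

// Hypothetical update route: persist the change, then push it to every client.
app.put('/api/items/:id', async (req, res) => {
  const updated = await Item.findByIdAndUpdate(req.params.id, req.body, { new: true });
  io.emit('item:updated', updated); // every connected client receives the new state
  res.json(updated);
});

httpServer.listen(3000);

On the Angular side, a socket.io-client listener for 'item:updated' can then patch the local view whenever the event arrives.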

API request limit of 1 call per hour

I use an API in Switzerland that allows me one request per hour in production.
I don't need more than one request each week, since it's event data, but I don't know what I have to do so that 200+ users per day can use this data.
Do I have to save the data somewhere like Firebase, or are there services for this? I'm very new to this field. Could you give me a hint?
Building on top of what Dr. cool said, you'll most likely want to use cron jobs: http://code.tutsplus.com/tutorials/scheduling-tasks-with-cron-jobs--net-8800
Also keep in mind that some APIs do not allow you to store the data they provide on your own server. Make sure you read the API provider's terms of use before doing so.
It's better to have a program on the server that can run once a week and load data from the API. This data should be saved in a database. Then when one of your users needs the data, it's ready to load from your database without hitting the API limit.
Yes, Firebase is a great option. Or you can use MySQL or other server-side databases.
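As a rough sketch, the weekly refresh could look like this, assuming node-cron, Node 18+ (for the built-in fetch), a placeholder API URL, and a hypothetical saveEvents() helper that writes to whichever database you pick:

const cron = require('node-cron');

// Every Monday at 03:00: fetch the event data once and cache it locally.
cron.schedule('0 3 * * 1', async () => {
  const res = await fetch('https://api.example.ch/events'); // placeholder URL
  const events = await res.json();
  await saveEvents(events); // hypothetical: persist to Firebase, MySQL, etc.
});

Your 200+ daily users then read from your own database, so the upstream API only ever sees one request per week.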

Writing entire SQL table whenever data changes on a Node server (weird one, so bear with me)

Okay, let me start by saying that I know this is weird. I do.
But here goes:
Let's say I have an SQL database which stores my data. And let's say I don't have a choice in this, it has to be SQL. The application I'm building has somewhere in the region of 100,000 records in its database, and once every single record has been processed by the users of the application, they all go off and get sent to a different application entirely. So for a short period of time, this application will be in use, and then stops being used until the same time next year. While the application is in use, no external sources will be touching the database at all.
When the (Node) server starts up, it loads everything from the database, into an object literal on the server.
The client-side of this application, on a very basic level, makes requests (to an API on the server) for data, and sends updated versions of records back to the server once they've been processed.
So here's where it gets weird: Let's say I don't want to have the client-side application have to directly retrieve records from the database, nor do I want it to be able to write to them. So the data from the entire database already exists in memory on the server. There's a module on the server that can handle changing the representation of that data already (again, because the client application only interacts with APIs on the server, the database module exists to facilitate this).
Multiple users access the system at once, but due to the way the system works, it is not possible for two users to be sent the same record, so two users will never be sending an update back for the same record (records are processed individually, and sequentially).
So, let's say that I decided that, since I was already managing all of this data in memory on the server, I would just send an updated version of the current data, in its entirety, back to the database, every time it changed.
The question is, where does this rank on the crazy scale?
Performance, writing an entire database rather than single records, would obviously suffer. But, in a database that is only read from once (on start-up of the application), is that even a concern? If every operation other than "Write all the stuff when any of the stuff changes" happened in memory on the server, does it matter how long those updates actually take? If a new update to the database comes in whilst it's being updated, surely SQL will take care of this?
It feels like the correct way to do this of course, is to have each user directly getting their info from the database, and directly making updates to the database too (or at least interacting with API endpoints to make this happen), but, is just...not doing that, utter lunacy?
Like I said, I know it's weird, but other than the fact that "it feels kind of wrong", I'm not sure I'm convinced that it is in fact entirely wrong. So I figured that this place would have an opinion.
The way that I think it currently works is:
[SQL DB] is updated whenever a change happens on {in-memory DB}.
{in-memory DB} is updated in various ways based on API calls to the server.
The client application makes requests for data, and sends updates to data, both of which are processed on the in-memory DB.
Multiple requests can happen at the same time from the application, but multiple users cannot see the same record, because records are allocated to a given user before they're sent.
Multiple updates can come from multiple users, each of which ultimately ends in the entire SQL database being overwritten with the contents of the in-memory DB.
(Note: I'm not saying "is this the best way to do this". I'm just asking, is there a significant argument for caring about the performance of a database being written to, if it's not going to be read from again unless the server needs to be restarted)
What I think that I would do, in this situation, is to add an attribute to each cached record to indicate that the record is "dirty." In other words, that something has been done to it, by someone, since it was originally read from the database.
(You could also add an attribute that indicates that someone "has this particular record 'checked-out,'" so that you can be sure that two users are not updating the same record at the same time.)
At some convenient moment, you can then walk through the collection, posting the "dirty" records back to the database. Use an SQL Transaction, not only for efficiency but also to be sure that the final update to the database is atomic.
You will need to be very mindful of the possibility of race conditions. One possible strategy is to use a Unix timestamp as the "dirty" indicator: a record is selected for posting to the database only if its dirty-time is greater than or equal to the timestamp of when the commit process was last run.
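A rough sketch of that strategy (the names are illustrative, and the transaction API shown is Knex-style, purely as an assumption):

let lastCommitTime = 0;

// Stamp a cached record whenever it changes.
function markDirty(record) {
  record.dirtyTime = Date.now();
}

// Post every record dirtied since the last commit, in one atomic transaction.
async function commitDirtyRecords(db, cache) {
  const commitStarted = Date.now();
  const dirty = Object.values(cache).filter(
    (r) => r.dirtyTime !== undefined && r.dirtyTime >= lastCommitTime
  );
  await db.transaction(async (trx) => {
    for (const r of dirty) {
      await trx('records').where({ id: r.id }).update(r.fields);
    }
  });
  lastCommitTime = commitStarted;
}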
(And, P.S.: no, I've seen even "weirder" things than this, in all my crazy years in this crazy business...)

CouchDB - trigger code when creating or updating document

I have a page which stores data in CouchDB. The page accesses the database directly via JavaScript, so not much of the logic is hidden from the browser. When creating a new document, there is some logic that extracts elements of the data into separate fields so that they can be searched on.
Is it possible to do this logic on the server when creating or updating the documents, or am I stuck doing it before hitting the database?
You have a couple of options.
First, see this question about CouchDB update functions. Update functions receive the request from the browser and can modify the document in any way before it is finally stored in CouchDB. For example, some people use them to automatically add a timestamp. Also see the wiki page on CouchDB document update handlers.
Another option is to receive CouchDB change notifications. In this case, a separate program (either your own browser, or even better, a standalone program that you run) can query CouchDB for _changes. CouchDB will notify this program after the document is saved. Next, the program can fetch the document and then store any new revisions that are necessary.
To me, it sounds like you should try the _update function first.
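A minimal sketch of such an update handler (it lives, as a string, under updates in a design document and is invoked via PUT /db/_design/app/_update/stamp/<docid>; the derived field here is illustrative):

function (doc, req) {
  var fields = JSON.parse(req.body);
  if (!doc) {
    // No existing document: create one from the request body.
    doc = fields;
    doc._id = req.id || req.uuid;
  } else {
    // Merge the request body into the existing document.
    for (var key in fields) doc[key] = fields[key];
  }
  // Example of server-side logic: stamp the document before it is stored.
  doc.updatedAt = new Date().toISOString();
  return [doc, 'stored ' + doc._id]; // [document to save, HTTP response body]
}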
