I am looking for a way to transfer about 1 million records from the users node in my Firebase database to a new users collection in MongoDB that I've created on my own server. I want to bring in only some of the data, such as age, username, and name. I would use Mongo for fast retrieval when displaying profile pages (a profile page requires the user data before any of the page's other data can load, so I'm looking to optimize this part as much as possible).
Enough context. The real question: how can I get 1 million records out of Firebase? I tried fetching records with limitToFirst(500000) and then limitToLast(500000) (to then loop through the records and store them in Mongo), but each half is still way too big to fetch in one request.
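One possible approach (a rough sketch, assuming the Firebase Admin SDK and the official MongoDB Node driver; the database URL, connection string, BATCH_SIZE, and database/collection names are placeholders) is to page through the users node by key with orderByKey() plus limitToFirst(), copying each batch into Mongo as it arrives:
const admin = require('firebase-admin');
const { MongoClient } = require('mongodb');

const BATCH_SIZE = 5000; // tune so one fetch stays comfortably small

async function migrateUsers() {
  // assumes credentials via GOOGLE_APPLICATION_CREDENTIALS; the URL below is a placeholder
  admin.initializeApp({ databaseURL: 'https://<your-project>.firebaseio.com' });
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const users = mongo.db('mydb').collection('users');

  let lastKey = null;
  while (true) {
    let query = admin.database().ref('users').orderByKey().limitToFirst(BATCH_SIZE + 1);
    if (lastKey) query = query.startAt(lastKey); // resume at the last key already copied
    const snap = await query.once('value');

    const docs = [];
    snap.forEach(child => {
      if (child.key === lastKey) return; // skip the record copied in the previous batch
      const { age, username, name } = child.val() || {};
      docs.push({ _id: child.key, age, username, name });
      lastKey = child.key;
    });

    if (docs.length === 0) break; // nothing new left to copy
    await users.insertMany(docs, { ordered: false });
  }
  await mongo.close();
}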
Related
I've developed a multi-user app where I can log in and upload a CSV file. The data is extracted from the CSV into objects (about 10,000) and then inserted into MongoDB. I've tried different MongoDB insert options. The fastest is the bulk insert option, which inserts about 200 objects per second. Here is my bulk insert code:
const bulk = db.collection('mycollection').initializeUnorderedBulkOp();
csvData.forEach(doc => bulk.insert(doc)); // queue one insert per parsed CSV row
await bulk.execute();
But the issue with this bulk insert is that while the data is being inserted into MongoDB, I cannot view other pages (the pages that read data from the same db). Also, other users cannot log in to the website. The website has just one db with different collections.
I also tried the normal insert option. Here is the code:
await db.collection("mycollection").insert(csvData);
With this option, other users can log in and I can view other pages, but the insert speed is about 9 times slower (about 24 inserts per second).
With the bulk operation it seems that while MongoDB is busy with one query, it does not respond to other queries. Please suggest a way to increase the insert speed while keeping the db accessible to other users.
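One possible compromise (a sketch, not a tested fix; the chunk size of 1000 is just an illustrative value and csvData is assumed to be the parsed array of row objects) is to split the import into smaller insertMany batches and yield to the event loop between them, so other requests can still be served while the import runs:
async function importCsvData(db, csvData, chunkSize = 1000) {
  for (let i = 0; i < csvData.length; i += chunkSize) {
    const chunk = csvData.slice(i, i + chunkSize);
    await db.collection('mycollection').insertMany(chunk, { ordered: false });
    await new Promise(resolve => setImmediate(resolve)); // let other requests run between batches
  }
}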
I have a search function on my site that needs to search through ~2000 items (this table never changes, it will always have the same number of items) stored in MySQL. All it needs to do is search by name or 'LIKE' the name and return the id. I have thought of two approaches to this:
Query the database using AJAX on keyup(); this seems like it would be expensive with many people searching.
Send all the data to the client as JSON when they load the page and search through it with JavaScript, to reduce the load on the database.
Which approach is better? Or if you have a better approach I am open to suggestions.
It depends on your purpose. Do you want to provide search-as-you-type functionality while the user writes the search term, or return results only when the user clicks a search button? Either way, you are talking about 2000 records, which is not much. You can store all of that information in an in-memory cache and return your results from memory (see the sketch below). SQL is also quite optimized for queries and caches results, so for 2000 records it shouldn't be a problem to query the DB directly either.
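A minimal sketch of that in-memory approach, assuming a Node backend with a mysql2/promise connection (the table and column names are made up):
const items = []; // [{ id, name }, ...] loaded once at startup

async function loadItems(db) {
  const [rows] = await db.query('SELECT id, name FROM items'); // the ~2000 static rows
  items.length = 0;
  items.push(...rows);
}

function searchByName(term) {
  const needle = term.toLowerCase();
  return items
    .filter(item => item.name.toLowerCase().includes(needle)) // behaves like LIKE '%term%'
    .map(item => item.id);
}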
This may be a "stupid" question to ask, but I am working with a "a lot" of data for the first time.
What I want to do: Querying the World Bank API
Problem: The API is very inflexible when it comes to searching/filtering... I could query every country/indicator by itself, but that would generate a lot of calls. So I wanted to download all the information about a country or indicator at once and then sort it on my machine.
My Question: Where/how should I store the data? Can I simply put it into an array, and do I have to worry about size? Should I write it to a temporary JSON file? Or do you have another idea?
Thanks for your time!
Example:
20 Countries, 15 Indicators
If I queried every country by itself I would generate 20*15 = 300 API calls; if I requested ALL countries for one indicator at a time it would take only 15 API calls, but I would get a lot of "junk" data :/
You can keep the data in RAM in an appropriate data structure (array or object) if the following are true:
The data is only needed temporarily (during one particular operation) or can easily be retrieved again if your server restarts.
You have enough available RAM in your node.js process to store the data. In a typical server environment there might be more than a GB of RAM available; I wouldn't recommend using all of that, but you could easily use 100MB of it for data storage.
Keeping it in RAM will likely make it faster and easier to interact with than storing it on disk. The data will, obviously, not be persistent across server restarts if it is in RAM.
If the data is needed long term, if you want to fetch it once and then have access to it over and over again even if your server restarts, if the data is more than hundreds of MBs, or if your server environment does not have a lot of RAM, then you will want to write the data to an appropriate database where it will persist and can be queried as needed.
If you don't know how large your data will be, you can write code to temporarily put it in an array/object and observe the memory usage of your node.js process after the data has been loaded.
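For example, a minimal sketch of that check using process.memoryUsage() (the heap delta is only a rough estimate, since garbage collection can skew it):
const before = process.memoryUsage().heapUsed;

const allResults = []; // push the parsed World Bank responses in here
// ... fetch the country/indicator data and allResults.push(...) it ...

const after = process.memoryUsage().heapUsed;
console.log(`data set is using roughly ${((after - before) / 1024 / 1024).toFixed(1)} MB of heap`);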
I would suggest storing it in a nosql database, since you'll be working with JSON, and querying from there.
mongodb is very 'node friendly' - there's the native driver - https://github.com/mongodb/node-mongodb-native
or mongoose
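A minimal sketch with the native driver (the connection string and the database/collection names are placeholders):
const { MongoClient } = require('mongodb');

async function saveIndicators(records) {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // placeholder URI
  try {
    const collection = client.db('worldbank').collection('indicators');
    await collection.insertMany(records); // records: the parsed JSON rows from the API
    // later you can query, e.g. collection.find({ countryCode: 'DE' }).toArray()
  } finally {
    await client.close();
  }
}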
Storing data from an external source you don't control brings with it the complexity of keeping the data in sync if the data happens to change. Without knowing your use case or the API it's hard to make recommendations. For example, are you sure you need the entire data set? Is there a way to filter down the data based on information you already have (user input, etc)?
I need to fetch a large amount of data (maybe some 10K records) from the DB and show it as a report (I use DataTable), with data filter/search and pagination.
Question: which of the options below is the best/recommended way?
1. Fetch all the records at once and store them on the front end (as an object); if a filter is applied, I filter within that object and display the result. Likewise I won't contact the DB for pagination, since I already have all the records.
2. Contact the DB every time I apply a filter/search, and likewise for pagination. For example, if I select page 5, I send a query to the DB to fetch only that page's data and display it. Note: the number of records per page is also selectable.
If there is any other better way, please guide me.
Thanks,
I am not familiar with DataTable, but it appears to be similar to jqGrid, which I'm familiar with.
I prefer your proposed solution #2. You are better off fetching only what you need. If you're only displaying, say, 100 rows at a time, it's wasteful (both in terms of bandwidth and local memory usage) to fetch all 10k rows at once.
Use LIMIT on the MySQL side to fetch only the records you need. If you want, say, records 201 through 300 for page 3, you'd add LIMIT 200, 100 to the end of your query (the first parameter to LIMIT says "start at offset 200" and the second says "fetch 100 rows"). If DataTable works like jqGrid, you should be able to re-query the database and repopulate your table when the user changes pages, and this fetch will be done in the background with AJAX, which conserves bandwidth. Your query will be identical except for the range specified by the LIMIT at the end.
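Put concretely, a sketch of the per-page query using a parameterized mysql2-style call (the table and column names are made up):
async function fetchPage(db, page, pageSize) {
  const offset = (page - 1) * pageSize; // page 3 with 100 rows per page -> offset 200
  const [rows] = await db.query(
    'SELECT id, name, created_at FROM reports ORDER BY id LIMIT ?, ?',
    [offset, pageSize]
  );
  return rows; // exactly one page of data, ready to hand to DataTable
}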
Think of it this way: say you use GMail and you never archive your messages, so your inbox contains 20,000 emails, but only shows 100 per page. Do you think Google has designed the GMail front-end so that all 20k subject and from lines are fetched at once and stored locally, or is the server queried again when the user changes pages? (It's the latter.)
I am pulling JSON data from Salesforce. I can have roughly 10,000 records, but never more. To avoid API limits and having to hit Salesforce on every request, I thought I could query the data every hour and then store it in memory. Obviously this will be much, much quicker and much less error prone.
A JSON object would have about 10 properties and maybe one other nested JSON object with two or three properties.
I am using methods similar to below to query the records.
// _ is underscore (or a lodash version that still has pluck)
getUniqueProperty: function (data, property) {
    // returns a sorted, de-duplicated list of one property's values
    return _.chain(data)
        .sortBy(function (item) { return item[property]; })
        .pluck(property) // map each record to record[property]
        .uniq()
        .value();
}
My questions are
What would the ramifications be of storing the data in memory and working with it there? I obviously don't want to block the server by running heavy filtering on the data.
I have never used redis before, but would something like a caching db help?
Would it be best to query the data every hour and store the JSON response in something like Mongo? I would then do all my querying against Mongo instead of in memory. Every hour when I query Salesforce, I would just flush the database and reinsert the data.
Storing your data in memory has a couple of disadvantages:
non-scalable: when you decide to use more processes, each process will need to make the same API request;
fragile: if your process crashes, you will lose the data.
Also, working with a large amount of data can block the process for longer than you would like.
Solution:
- use external storage! It can be Redis, MongoDB, or an RDBMS;
- update the data in a separate process, triggered by cron;
- don't drop the whole database before reinserting: there is a chance that someone will make a request right after the drop (if your storage doesn't support transactions, of course); update the existing records instead, as sketched below.
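For that last point, a sketch of a refresh that upserts by the Salesforce record Id instead of dropping the collection (assumes the MongoDB Node driver; the sfId field name is illustrative):
async function refreshCache(collection, salesforceRecords) {
  const ops = salesforceRecords.map(record => ({
    updateOne: {
      filter: { sfId: record.Id }, // match on the Salesforce record Id
      update: { $set: record },
      upsert: true // insert records that are not cached yet
    }
  }));
  await collection.bulkWrite(ops, { ordered: false });
  // optionally prune records that no longer exist in Salesforce:
  // await collection.deleteMany({ sfId: { $nin: salesforceRecords.map(r => r.Id) } });
}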