I've developed a multi-user app where I can log in and upload a CSV file. The data is extracted from the CSV into about 10,000 objects, which are then inserted into MongoDB. I've tried different MongoDB insert options; the fastest is the bulk insert, which manages about 200 objects per second. Here is my bulk insert code:
const bulk = db.collection('mycollection').initializeUnorderedBulkOp();
csvData.forEach(doc => bulk.insert(doc)); // queue one insert per parsed object
await bulk.execute();                     // send the whole batch to the server
The issue with this bulk insert is that while the data is being inserted into MongoDB, I cannot view other pages (the pages that read data from the same database), and other users cannot log in to the website. The website has just one database with different collections.
I also tried the normal insert option. Here is the code:
await db.collection("mycollection").insert(csvData);
With this option, other users can log in and I can view other pages, but the insert speed is about nine times slower (around 24 inserts per second).
With the bulk operation it seems that while MongoDB is busy with one query, it does not respond to other queries. Please suggest a way to increase the insert speed while keeping the database accessible to other users.
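To make the kind of solution I'm after concrete, here is a rough, untested sketch of what I'm considering: splitting the parsed rows into smaller insertMany batches (the chunk size of 1000 is an arbitrary guess) so that other requests get a chance to run between chunks:

// Rough sketch (untested): insert the parsed rows in chunks so that other
// queries and logins can interleave between chunks. 'csvData' is assumed to
// be an array of plain objects.
async function insertInChunks(db, docs, chunkSize = 1000) {
  for (let i = 0; i < docs.length; i += chunkSize) {
    const chunk = docs.slice(i, i + chunkSize);
    await db.collection('mycollection').insertMany(chunk, { ordered: false });
    // give the Node.js event loop a chance to serve pending requests
    await new Promise(resolve => setImmediate(resolve));
  }
}

// usage: await insertInChunks(db, csvData);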
Related
I have been assigned a task to handle a large amount of data and show it on a web page in tabular form. I'm using HTML/JSP and JS for the frontend and Java for the backend.
The business logic is to query the database (Oracle, in my case) and get the data.
The query looks something like:
SELECT field1, field2, ... FROM table WHERE field1 = 'SearchString' LIMIT 30
The search string is given by the user.
So each time the query executes I get 30 rows and store them in a bean.
Then, with the field2 value from iteration 1, I execute the query again, which gives another 30 rows; I append those to the bean, and the loop continues until there are no matching records left. After that I need to display the bean data in the UI in tabular form.
The problem arises when the data is huge. For example, the loop may run 1,000 times and produce 30k records; the code is then stuck in this loop for a long time and the UI just shows a loading screen.
Is there a better approach to my situation?
Note: I can't change the query in any way, because that's forbidden.
Also, the query above is a pseudo query, not the actual one. If the search string has 30k matching rows, I still need to take 30 of them in each iteration.
I agree with the comments that this is not the best practice when you are trying to present thousands and thousands of rows to the UI...
It really sounds like you should implement pagination on your UI. This is done by using queries... I don't know what DB system you are using but here is a guide on pagination for SQL Server.
You can explain to the business that using pagination is better for the user. Use the example of how Google search gives you pages of search results instead of showing you millions of websites of cat pictures all on one page.
I'm using @google-cloud/bigquery and trying to import data that I have as JSON into a table.
I see the table.createWriteStream() method, but I believe that since it streams data it costs money, whereas a bq load from the console is free.
So my two questions are:
1: Is using table.import() the equivalent free way to load data into a table?
2: How can I import data that I have in a variable without having to save it to a .json file first?
If you want to avoid streaming inserts, you should know that load jobs have a daily limit: 1,000 load jobs per table per day. Streaming inserts don't have this limit.
Streaming inserts are extremely cheap: $0.05 per GB, which is $50 for 1 TB. I'm not sure how much volume you have, but people usually don't build around avoiding streaming inserts, because they are well suited to this kind of import.
Streaming insert is the recommended way to import data: it's scalable, and it gives a nice per-row error message, so you can retry individual rows rather than the full file.
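To address the second question concretely, rows held in a variable can be streamed with table.insert(), so no intermediate .json file is needed. A minimal sketch, assuming a recent version of @google-cloud/bigquery; the dataset, table, and row values are placeholders:

// Minimal sketch: stream rows held in a variable with table.insert().
// 'my_dataset', 'my_table' and the row contents are placeholders.
const { BigQuery } = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

const rows = [
  { name: 'alice', age: 30 },
  { name: 'bob', age: 25 },
];

bigquery
  .dataset('my_dataset')
  .table('my_table')
  .insert(rows)
  .then(() => console.log(`Inserted ${rows.length} rows`))
  .catch(err => {
    // when only some rows fail, the per-row details are in err.errors
    console.error(err);
  });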
I am looking for a way to transfer about 1 million records from the users table in my Firebase database to a new users collection in MongoDB that I've created on my own server. I want to bring in only some of the data, such as the age, username, and name. I would use Mongo for "fast retrieval" to display profile pages (because profile pages require me to fetch the user data before loading any of the page's other data, I am looking to optimize this part as much as possible).
Enough about context. The real question: how can I get 1M records out of Firebase? I tried fetching records with limitToFirst(500000) and then limitToLast(500000) (to then loop through the records and store them in Mongo), but those chunks are still way too big.
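For reference, the direction I'm currently leaning toward is paging through the users node by key instead of pulling huge chunks. A rough sketch, assuming the Firebase Admin SDK and a flat /users node; the page size of 1000 is an arbitrary guess:

// Rough sketch: page through /users by key with the Firebase Admin SDK and
// copy each page into MongoDB. The page size of 1000 is an arbitrary choice.
const admin = require('firebase-admin');
admin.initializeApp(); // assumes credentials are configured elsewhere

async function migrateUsers(mongoDb, pageSize = 1000) {
  const usersRef = admin.database().ref('users');
  let lastKey = null;

  while (true) {
    // startAt() is inclusive, so ask for one extra record and drop the overlap
    const query = lastKey
      ? usersRef.orderByKey().startAt(lastKey).limitToFirst(pageSize + 1)
      : usersRef.orderByKey().limitToFirst(pageSize);

    const snapshot = await query.once('value');
    const page = [];
    snapshot.forEach(child => { page.push({ key: child.key, ...child.val() }); });
    if (lastKey && page.length) page.shift();
    if (page.length === 0) break;

    // keep only the fields needed for profile pages
    const docs = page.map(u => ({ _id: u.key, age: u.age, username: u.username, name: u.name }));
    await mongoDb.collection('users').insertMany(docs, { ordered: false });

    lastKey = page[page.length - 1].key;
  }
}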
Here's my setup:
I'm running a Node.js Web App in Azure, which is using Azure Table Storage (Non-SQL). To work with table storage I'm using the azure-storage npm module.
What I'm trying to do:
So I have a system that's tracking events for devices. In storage I'm setting my PartitionKey to be the deviceId and I'm setting the RowKey to be the eventId.
Adding events is straight forward; add them one at a time as they occur.
Retrieving them is easy using the query structure.
However, deleting large quantities of entries seems to be a pain. It appears you can only delete one entity at a time; there doesn't seem to be a query-based way to do it.
There is the option to use batches to build up a large set of delete operations, but I've just found that there is a cap of 100 operations per batch.
So I'm trying to delete all events for a single device; in my current case that's about 5,000 events. To achieve this I first have to query all my events with a GET request (paging through them with continuation tokens), then split them into batches of 100, and then send 50 batch requests in order to delete all the entries...
The same thing in SQL would be DELETE * WHERE deviceId='xxxxxxxx'
Surely there must be a better way than this!
Sadly, there isn't :). You must fetch the entities based on your requirement and then delete them (either in batches or individually).
You can, however, optimize the fetching process by fetching only PartitionKey and RowKey from your table instead of all attributes, since those are the only two attributes you need to delete an entity.
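For completeness, here is a rough sketch of that fetch-then-batch-delete flow with the azure-storage module (its callback API wrapped with util.promisify); the table name is whatever you use, and the 100-operation batch size reflects the cap mentioned above:

// Rough sketch: delete every entity in one partition (deviceId) using the
// azure-storage module. Only the keys are fetched, since that's all a delete needs.
const azure = require('azure-storage');
const { promisify } = require('util');

const tableService = azure.createTableService(); // uses AZURE_STORAGE_CONNECTION_STRING
const queryEntities = promisify(tableService.queryEntities).bind(tableService);
const executeBatch = promisify(tableService.executeBatch).bind(tableService);

async function deleteDeviceEvents(tableName, deviceId) {
  const query = new azure.TableQuery()
    .select(['PartitionKey', 'RowKey'])
    .where('PartitionKey eq ?', deviceId);

  let continuationToken = null;
  do {
    const result = await queryEntities(tableName, query, continuationToken);
    continuationToken = result.continuationToken;

    // batches are capped at 100 operations and must share one PartitionKey
    for (let i = 0; i < result.entries.length; i += 100) {
      const batch = new azure.TableBatch();
      result.entries.slice(i, i + 100).forEach(entity => batch.deleteEntity(entity));
      await executeBatch(tableName, batch);
    }
  } while (continuationToken);
}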
I need to fetch a large amount of data (maybe some 10K records) from the DB and show it as a report (I use DataTable), with data filter/search and pagination.
Question: which of the options below is the best/recommended way?
Option 1: Fetch all the records at once and store them in the front end (as an object); if a filter is applied, I filter within that object and display the result. Likewise, I won't interact with the DB for pagination, since I already have all the records locally.
Option 2: Contact the DB every time a filter/search is applied, and likewise for pagination. For example, if I select page 5, I send a query to the DB to fetch only that page's data and display it. Note: the number of records per page is also selectable.
If there is any other better way, please guide me.
Thanks,
I am not familiar with DataTable, but it appears to be similar to jqGrid, which I'm familiar with.
I prefer your proposed solution #2. You are better off fetching only what you need: if you're only displaying, say, 100 rows, it's wasteful (both in terms of bandwidth and local memory usage) to fetch 10k rows at once.
Use LIMIT on the MySQL side to fetch only the records you need. If you want, say, records 201 through 300 for page 3, you'd add LIMIT 200, 100 to the end of your query (the first parameter to LIMIT says "skip the first 200 rows" and the second says "fetch 100 rows"). If DataTable works like jqGrid, you should be able to re-query the database and repopulate your table when the user changes pages, and this fetch will be done in the background with AJAX, which conserves bandwidth. Your query will be identical except for the range specified by the LIMIT at the end.
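To illustrate the server side of that approach, here is a minimal sketch of a paginated, filterable endpoint, assuming an Express app with the mysql2 package; the table and column names and the connection settings are placeholders:

// Minimal sketch: a paginated, filterable endpoint using Express and mysql2.
// Table/column names and the connection settings are placeholders.
const express = require('express');
const mysql = require('mysql2/promise');

const app = express();
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'reports' });

app.get('/records', async (req, res) => {
  const page = Math.max(parseInt(req.query.page, 10) || 1, 1);
  const pageSize = Math.min(parseInt(req.query.pageSize, 10) || 100, 500);
  const offset = (page - 1) * pageSize;
  const search = `%${req.query.search || ''}%`;

  // LIMIT ?, ? -> skip `offset` rows, return `pageSize` rows
  const [rows] = await pool.query(
    'SELECT field1, field2 FROM records WHERE field1 LIKE ? LIMIT ?, ?',
    [search, offset, pageSize]
  );
  res.json(rows);
});

app.listen(3000);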
Think of it this way: say you use GMail and you never archive your messages, so your inbox contains 20,000 emails, but only shows 100 per page. Do you think Google has designed the GMail front-end so that all 20k subject and from lines are fetched at once and stored locally, or is the server queried again when the user changes pages? (It's the latter.)