So imagine I want to retrieve all orders for an array of customers.
The arrayList in the example below holds an array of customer IDs.
This array is passed into the get method below and processed asynchronously, retrieving orders for each customer ID in the array.
Here's where I get lost: how can you paginate the database result set and pull only a small set of records at a time, without having to pull all the records across the network?
What confuses me is the asynchronous nature, plus the fact that we won't know how many orders there are per customer. So how can you efficiently return a set page size at a time?
service.js
function callToAnotherService(id) {
  return new Promise((resolve, reject) => {
    // calls the service, passing id, and resolves with that customer's orders
  });
}

exports.get = arrayList => Promise.all(arrayList.map(callToAnotherService));
In MySQL there is more than one way to achieve this.
The method you choose depends on many variables, such as your actual pagination style (whether you just want "previous" and "next" buttons, or want to offer a range from 1...n, where n is the total number of matching records divided by your per-page record count). It also depends on the database design, planned growth, partitioning and/or sharding, current and predicted database load, and possible hard query limits. For example, if you have years' worth of records, you might require the end user to choose a reasonable time range for the query (last month, last 3 months, last year, and so on), so they don't overload the database with unrestricted, overly broad queries.
To paginate:

- Using simple previous and next buttons, you can use the simple LIMIT [START_AT_RECORD,] NUMBER_OF_RECORDS method, as Rick James proposed above.
- Using (all) page numbers, you need to know the number of matching records, so based on your page size you'd know how many total pages there'd be.
- Using a mix of the two methods above. For example, you could present a few clickable page numbers (previous/next 5, for example), as well as first and last links/buttons.
If you choose one of the last two options, you'd definitely need to know the total number of matching records.
As I said above, there is more than one way to achieve the same goal; the choice must be made depending on the circumstances. Below I describe a couple of simpler ideas:
FIRST:
If your solution is session-based, and you can persist the session, then you can use a temporary table into which you select only order_id (assuming it's the primary key in the orders table). Optionally, if you want to get the counts (or otherwise filter) per customer, you can also add customer_id as a second column next to order_id. Once you have populated the temporary table with this minimal data, you can easily count its rows and build your pagination based on that number.
Now, as you start displaying pages, you only select a subset of these rows (using the LIMIT method above) and join the corresponding records (the rest of the columns) from orders on the temporary table's order_id.
This has two benefits: 1) browsing records page by page is fast, as it no longer queries the (presumably) large orders table; 2) you're not running aggregate queries against the orders table, which, depending on the number of records and the design, could perform badly and impact other concurrent users.
Just bear in mind that the initial temporary-table creation is a somewhat slower query, though it would be slower still if you didn't restrict the temporary table to only the essential columns. Even so, it's advisable to set a reasonable hard limit (a maximum number of temporary-table records, or a time range) on that initial query.
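The flow above could be sketched as parameterized SQL strings like the following. The table and column names (orders, order_id, customer_id) come from the question; the temp table name tmp_order_ids is made up for illustration.

```javascript
// Build the temp table from only the essential columns, restricted to the
// customers we care about (one ? placeholder per customer ID).
const createTempSql = customerIds =>
  `CREATE TEMPORARY TABLE tmp_order_ids AS
   SELECT order_id, customer_id FROM orders
   WHERE customer_id IN (${customerIds.map(() => '?').join(', ')})`;

// Cheap count for building the pagination links.
const countSql = 'SELECT COUNT(*) AS total FROM tmp_order_ids';

// Pages are 1-based; MySQL's LIMIT offset, count does the slicing, and the
// join pulls the remaining columns only for the rows on the current page.
const pageSql = (page, pageSize) =>
  `SELECT o.* FROM tmp_order_ids t
   JOIN orders o ON o.order_id = t.order_id
   ORDER BY t.order_id
   LIMIT ${(page - 1) * pageSize}, ${pageSize}`;
```

The LIMIT values are interpolated as numbers rather than bound as placeholders here purely to keep the sketch short; in real code they should be validated integers.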
SECOND:
This is my favourite, as with this method I've been able to solve customers' huge-database (or specific-query) performance issues on more than one occasion. And we're talking about going from 50-55 second query times down to milliseconds. This method is especially immune to scalability-related slowdowns.
The main idea is that you can pre-calculate all kinds of aggregates (be that cumulative sum of products, or number of orders per customer, etc...). For that you can create an additional table to hold the aggregates (count of orders per customer in your example).
And now comes the most important part:
You must use custom database triggers: in your case, ON INSERT and ON DELETE triggers that update the aggregates table, increasing or decreasing the order count for the specific customer depending on whether an order was added or deleted. Triggers can fire either before or after the triggering table change, depending on how you set them up.
Triggers have virtually no overhead on the database, as they fire quickly, once per inserted/deleted record (unless you do something unwise inside them, such as running a COUNT(...) query against some big table, which would defeat the purpose anyway). I usually go even more granular, keeping counts/sums per customer per month, and so on.
When done properly, it's virtually impossible for the aggregate counts to go out of sync with the actual records. If your application allows an order's customer_id to change, you might also need an ON UPDATE trigger, so that a change of customer is automatically reflected in the aggregates table.
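The bookkeeping a trigger does here is tiny; it's simulated below in plain JavaScript just to show the idea (the real implementation would be MySQL CREATE TRIGGER ... AFTER INSERT / AFTER DELETE / AFTER UPDATE ON orders statements updating the aggregates table).

```javascript
// Stands in for the aggregates table: customer_id -> order count.
const orderCounts = new Map();

// AFTER INSERT: one more order for this customer.
function onOrderInsert(customerId) {
  orderCounts.set(customerId, (orderCounts.get(customerId) || 0) + 1);
}

// AFTER DELETE: one fewer order for this customer.
function onOrderDelete(customerId) {
  orderCounts.set(customerId, (orderCounts.get(customerId) || 0) - 1);
}

// AFTER UPDATE: covers an order being reassigned to another customer.
function onOrderUpdate(oldCustomerId, newCustomerId) {
  if (oldCustomerId !== newCustomerId) {
    onOrderDelete(oldCustomerId);
    onOrderInsert(newCustomerId);
  }
}
```

Because each operation adjusts the aggregate by exactly one, the counts stay in step with the orders table as long as every insert/delete/update path fires the trigger.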
Of course there are many more directions you could take this, but the two above have proven to work well. It all depends on the circumstances...
I'm hoping that my somewhat abstract answer can lead you on the right path, as I could only answer based on the little information your question presented...
In MySQL, use ORDER BY ... LIMIT 30, 10 to skip 30 rows and grab 10.
Better yet, remember where you left off (let's say $left_off), then do
WHERE id > $left_off
ORDER BY id
LIMIT 10
The last row you grab is the new 'left_off'.
Even better is the same query but with LIMIT 11. Then you can show 10 rows, but also discover whether there are more (by the existence of an 11th row in the SELECT's result).
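The "remember where you left off" pattern with the LIMIT n+1 trick could be sketched as two helpers like these; the function that actually runs the query is assumed, not shown.

```javascript
// Ask for one extra row so we can tell whether another page exists
// without a separate COUNT query.
function buildPageQuery(leftOff, pageSize) {
  return {
    sql: 'SELECT * FROM orders WHERE id > ? ORDER BY id LIMIT ?',
    params: [leftOff, pageSize + 1],
  };
}

// Split the fetched rows into the page to display, a has-more flag, and the
// cursor to remember for the next request.
function splitPage(rows, pageSize) {
  const hasMore = rows.length > pageSize;
  const page = hasMore ? rows.slice(0, pageSize) : rows;
  const nextLeftOff = page.length ? page[page.length - 1].id : null;
  return { page, hasMore, nextLeftOff };
}
```

Unlike LIMIT offset, count, this stays fast on late pages because the WHERE id > ? seek never scans the skipped rows.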
I have a collection of orders, and by default I want to create a field called priority that, at the time of creation, is the size of the collection. That's the only way I can think of to get an auto-incrementing number in Firestore.
My only issue is that the orders get added via a batch write. So even if I fire off an onCreate cloud function, the possibility of querying the collection size and getting back the same value for several documents is, I think, high.
My reason for wanting the priority this way is that changing the priority is done by dragging and dropping rows on a table on the front end. I just swap the priorities and write the new priorities to Firestore. That way, when the table is loaded up again, I can sortBy priority and deliver a sensible experience.
How can I get a unique, auto-incrementing number for this use case?
I know this is subjective, but I've spent way too much time thinking about this as it is.
Just looking to see if there's an elegant solution to this problem:
Is there a way to loop through the results of a psql query and return a specific result based on the SQL query?
For example, let's say I wanted to SELECT amount_available FROM lenders ORDER BY interest_rate, and I wanted to loop through the column looking for available amounts, adding those amounts to a variable, and then exit once that total reached a certain figure.
More verbose example:
Let's say I have someone who wants to borrow $400. I want to go through my lenders table, and look for any lender that has available funds to lend. Additionally, I want to start looking at lenders that are offering the lowest interest rate. How could I query the database and find the results that satisfy the $400 loan at the lowest interest rate, and stop once I've reached my goal, instead of searching the whole db? And can I do that inside a JavaScript function, returning those records that meet that criteria?
Maybe I'm trying to do something that's not possible, but just curious.
Thanks!
You translate your requirement into the SQL language. After all, SQL is a descriptive language. The database engine then figures out how to process the request.
Your example sounds like:
SELECT name
FROM lenders
WHERE amount_available >= 400
ORDER BY interest_rate
FETCH FIRST ROW ONLY;
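That query finds a single lender who can cover the whole amount. If you really do want the "accumulate until the target is met" loop from the question, that part is easy in JavaScript once the rows arrive sorted by interest_rate; the field names below follow the question's example.

```javascript
// Walk the lenders (assumed already ORDER BY interest_rate) and collect
// available amounts until the target is covered.
function pickLenders(lenders, target) {
  const picked = [];
  let total = 0;
  for (const lender of lenders) {
    if (total >= target) break; // goal reached, stop early
    if (lender.amount_available > 0) {
      picked.push(lender);
      total += lender.amount_available;
    }
  }
  return { picked, total, satisfied: total >= target };
}
```

In PostgreSQL you could also push the cutoff into SQL with a running SUM(amount_available) OVER (ORDER BY interest_rate) window and filter on it, so only the needed rows come back at all.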
I'm looking for suggestions on how to handle the following use case with the Python Django framework; I'm also open to using JavaScript libraries/AJAX.
I'm working with a pre-existing table/model called revenue_code with over 600 million rows of data.
The user needs to search three fields within one search (code, description, room) and be able to select multiple search results, similar to Kendo's multi-select control. I first started off by combining the codes in django-filters as shown below, but my application became unresponsive; after waiting 10-15 minutes I was able to view the search results but couldn't select anything.
https://simpleisbetterthancomplex.com/tutorial/2016/11/28/how-to-filter-querysets-dynamically.html
I've also tried Kendo controls, select2, and Chosen, because the user needs to be able to select as many rev codes as they need (upwards of 10-20), but all gave the same unresponsive page when they attempted to load the data into the control/multi-select.
Essentially what I'm looking for is something like the link below, which allows the user to make multiple selections and can handle a massive amount of data without becoming unresponsive. Ideally I'd like to be able to run my search without displaying all the data.
https://petercuret.com/add-ajax-to-django-without-writing-javascript/
Is the Django framework meant to handle this type of volume? Would it be better to export this data into a file and read the file? I'm not looking for code, just some pointers on how to handle this use case.
What is the basic mechanism of searching 600 million rows? The way a database does it is to build an index before search time, general enough for different types of queries; at search time you then search the index, which is much smaller (so it fits in memory) and faster. But no matter what, if the 600 million records cannot fit in memory at the same time, parts of them must be swapped in and out repeatedly, and the more parts, the slower the operation. These mechanics are hidden behind the algorithms in databases like MySQL.
There are very compact representations, like the bitmap index, which let you search data such as male/female very fast, or any data where one bit per piece of information suffices.
So whether you use Django or not does not really matter. What matters is the tuning of the database, the design of the tables to facilitate the queries (the types of indices), and the total amount of memory at the server end to keep the data in memory.
Check this out:
https://dba.stackexchange.com/questions/20335/can-mysql-reasonably-perform-queries-on-billions-of-rows
https://serverfault.com/questions/168247/mysql-working-with-192-trillion-records-yes-192-trillion
How many rows are 'too many' for a MySQL table?
You can't load all the data into your page at once. 600 million records is too many.
Since you mentioned select2, have a look at their example with pagination.
The trick is to limit your SQL results to maybe 100 or so at a time. When the user scrolls to the bottom of the list, it can automatically load in more.
Send the search query to the server, and do the filtering in SQL (or NoSQL or whatever you use). Database engines are built for that. Don't try filtering/sorting in JS with that many records.
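The paged-query idea above might look like the sketch below: filter in SQL across the three fields, one page at a time, fetching one extra row to compute the "more" flag an incremental loader needs. The table and column names (revenue_code, code, description, room) come from the question; the helper shape is illustrative, not tied to a specific framework.

```javascript
const PAGE_SIZE = 100;

// Build one page of the three-field search; pages are 1-based.
function buildSearchQuery(term, page) {
  const like = `%${term}%`;
  return {
    sql: `SELECT code, description, room FROM revenue_code
          WHERE code LIKE ? OR description LIKE ? OR room LIKE ?
          LIMIT ? OFFSET ?`,
    params: [like, like, like, PAGE_SIZE + 1, (page - 1) * PAGE_SIZE],
  };
}
```

One caveat: a leading-wildcard LIKE cannot use an ordinary B-tree index, so at 600 million rows you would likely want a FULLTEXT index or an external search engine behind this query.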
I have a Project Task list using JavaScript and the jQuery Sortable library to allow easy drag-and-drop sorting of the Task list items.
When the sort order changes, the order is saved to a MySQL database. Currently, if there are 100 task list items and item number 2 is moved to position 3, records 2-100 all get updated in the database via a sort_order column on the task table.
This is obviously not the most efficient solution since it can cause huge amounts of DB records to be updated at once.
Another issue is that this causes the date_modified column on my task records to be updated even when nothing on the task has changed except a sort order number, which may simply be the result of other records shifting position. There are ways around this, and I currently have a large, complex SQL query that makes sure not to update the date_modified column unless other columns on the table were modified.
So the reason of this post is to explore other alternative methods of storing my sort order on large volume of records.
One idea I have is to create a new DB table, task_sort_order, with the columns task_id and sort_order. This table would hold a record for every single Task record. The benefit of this method is that my SQL would be simplified, since I would not have to worry about a Task record's date fields being updated when only the sort order has changed. It would, however, still require the same number of records to be updated at once; in my original example it would still mass-update 99 records out of 100 in one query. It does seem to have benefits over my current method, though.
Are there any other ways to improve upon storing large numbers of sort orders to a database?
I have seen some people save each sort order number in increments of 10 (or some other step), so that if, for example, there are 5 records (1=10, 2=20, 3=30, 4=40, 5=50) and record number 2 moves to position 4, only record number 2 gets a new sort_order (somewhere between 40 and 50) saved to the database.
This seems rather complex once many records start getting moved around: you end up with all sorts of odd numbers, and you have to calculate the correct value so a newly moved item doesn't conflict with a previously moved one.
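The calculation worry above is smaller than it looks: a new position can simply take the midpoint between its neighbours, and only when a gap is exhausted do you renumber the whole list in one bulk update. A minimal sketch, with helper names made up for illustration:

```javascript
const STEP = 10;

// prev/next are the sort_order values of the surrounding rows (null at the
// ends of the list). Returns the new sort_order, or null when there is no
// room left and a renumber is needed.
function orderBetween(prev, next) {
  if (prev == null && next == null) return STEP;
  if (prev == null) return Math.floor(next / 2); // moved to the front
  if (next == null) return prev + STEP;          // moved to the end
  const mid = Math.floor((prev + next) / 2);
  return mid === prev ? null : mid;
}

// Renumber the whole list back to clean multiples of STEP.
function renumber(ids) {
  return ids.map((id, i) => ({ id, sort_order: (i + 1) * STEP }));
}
```

On average only one row is written per drag, and the occasional renumber is the same full-list update you are doing on every drag today, so this is strictly cheaper.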
As you can see, I am open to hearing ideas for other methods, or ways to improve my current one.
I think your problem lies in the shortcomings of how you store the order in the DB. Two ideas:
You can try to implement the concept of a linked list in your relational DB schema, i.e. per item you store a "reference" to the next item. In this way you will only need to update a few records per rearrangement.
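Here is the linked-list idea in miniature: each task stores which task follows it, so a move touches at most three "rows". An in-memory Map (with a 'HEAD' sentinel entry) stands in for the DB table in this sketch.

```javascript
// next maps taskId -> the id of the task after it (null at the tail);
// the 'HEAD' sentinel points at the first task.
function moveAfter(next, itemId, targetId) {
  // unlink itemId: whatever pointed at it now points past it
  for (const [k, v] of next) {
    if (v === itemId) {
      next.set(k, next.get(itemId));
      break;
    }
  }
  // relink itemId right after targetId
  next.set(itemId, next.get(targetId));
  next.set(targetId, itemId);
}
```

The trade-off is read-side complexity: rendering the list in order means following the chain (or using a recursive CTE in SQL), whereas a sort_order column is a single ORDER BY.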
You can use graph DBs such as Neo4j or OrientDB where a linked list would be just one of possible data entry linkages that the DBs support natively.
I am running MySQL 5.6. I have a number of various "name" columns in the database (in various tables). These get imported every year by each customer as a CSV data dump. There are a number of places that these names are displayed throughout this website. The issue is, the names have almost no formatting (and to this point, no sanitization existed upon importation):
Phil Eaton, PHIL EATON, Phil EATON, etc.
Thus, the website sometimes looks like a mess when these names are involved. There are a number of ways I can think of to address this, but none is particularly appealing.
First, I can have a filter in Javascript. However, as I said, these names exist in a number of places throughout this (large) site. I may end up missing a page. The names do not exist already within nice "name"-classed divs/spans, etc.
Second, I could filter in PHP (the backend). This seems about as effective as doing it in JavaScript. I could do it in the API, but there is still no central method for pulling names from the database, so I could still miss an API call anyway.
Finally, the obvious "best" way is to sanitize the existing data in place for each name column, and at the same time immediately start sanitizing all names that get imported whenever we add a customer. The issue with the first part is that there are hundreds of millions of rows of names in the database. Updating them could take a long time and be disruptive to the clients' daily routines.
So, the most appealing way to correct this in the short-term is to invoke a function every time a column is selected. In this way I could "decorate" every name column with a formatting function so the names will appear uniform on the frontend. So ultimately, my question is: is it possible to invoke a specific function in SQL to format each row every time a specific column is selected? In other words, maybe can I call a stored procedure every time a column is selected? (Point being, I'm trying to keep the formatting in SQL to avoid the propagation of usage.)
In MySQL you can't trigger something on SELECT, but I have an idea (it's only an idea, now I don't have time to try it, sorry).
You probably can create a VIEW on this table, with the same structure, but with the stored procedure applied to the names fields, and select from this view in your PHP.
But it has two drawbacks:

- You have to modify all the SELECT statements in your PHP.
- The server will call that procedure on every select. Maybe you can store the formatted values and then check for them (cache them).
On the other hand, I agree with HLGEM; I also suggest formatting the data on import, because it's very bad practice to import something you haven't checked into a DB (SQL injection?). The batch task is also a good idea for cleaning up the mess.
I presume names are queried frequently, so invoking a sanitization function every time they are selected could severely slow down your system. Further, you can't just flip a setting to get this; you would have to change every bit of SQL code that runs that includes names.
Personally how I would handle it is to fix the imports so they put in a sanitized version for new names. It is a bad idea to directly put any data into a database without some sort of staging and clean up.
Then I would tackle the old names and fix them in batches in a nightly run scheduled for when the fewest people are using the system. You would have to do some testing on dev to determine how big a batch you can run without interfering with other things the database is doing. The larger the batch, the sooner you get through all the names; even though this will take time, it is the surest method of getting the data cleaned up, and over time the data will look better to the users. If the design of your database lets you identify the more active names (such as an is_active flag for a customer, or an order in the last year), I would prioritize the update by that. Alternatively, you could clean up one client at a time, starting with whichever one has noticed the problem and is driving this change.
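The import-time clean-up could start from a minimal title-casing rule like the one below. Real name formatting has edge cases (McDonald, O'Brien, van der Berg) that a rule this simple will get wrong, so treat it as a starting point, not a final answer.

```javascript
// Collapse whitespace and capitalize the first letter of each word.
function formatName(raw) {
  return raw
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .map(word => word.charAt(0).toUpperCase() + word.slice(1))
    .join(' ');
}
```

The same rule can be expressed in SQL for the nightly batches (note that MySQL has no built-in title-case function, so it would need a small stored function there).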
The other answers give some possible solutions. But the short answer to the specific option you are asking about is: no. There is no such thing as a "SELECT statement trigger", let alone for a single column. Triggers come close to this kind of expectation, but only for INSERT, UPDATE and DELETE operations.