Efficiently implementing pagination in postgres while avoiding duplicates?

Let's say you're making a site like Reddit and you want infinitely scrolling posts (25 at a time, with 25 more loaded when you reach the bottom).
The naive solution uses LIMIT and OFFSET, but this is not entirely desirable, both for performance reasons (the database still has to read and discard all the skipped rows) and because if a new post is added while you're on a page, everything shifts down by one, so when you load the next page the last post you already saw appears again as a duplicate.
So the usual recommendation is to use a WHERE clause and sort by some discrete, unique value instead. When loading more pages you pass the last value you received, and the database knows where to continue from there.
I would use that, except that in my situation these values are not unique: one of the sorting options for posts, for example, sorts on non-unique values. So that solution wouldn't work, because pages could contain duplicates (or skip posts).
One idea I had is to return 1,000 or so post IDs on the initial page load. The client would then send the first 25 post IDs to the server to retrieve those posts. If the user scrolled to the bottom, it would send the next 25 to get the post data for those, and so on.
The only issue is that this is not perfectly efficient either: if the user doesn't scroll at all, sending those 1,000 post IDs was wasted.
Is there a proper solution to this? How does one efficiently handle pagination that eliminates duplicates when the sorting option involves non-unique values?

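Just add a unique column as the last column to sort on. If the user wants to sort his books by "author, title", just change it to "author, title, book_id". The unique column breaks ties, so every row has a distinct position in the sort order and the WHERE clause can resume exactly after the last row shown.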
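A minimal sketch of this tie-breaker idea (keyset, or "seek", pagination), assuming a hypothetical table posts(id, score) where score is the non-unique sort value and id is the unique primary key. The SQL relies on PostgreSQL row-value comparison, which matches ORDER BY score DESC, id DESC because both keys sort in the same direction; an index on (score, id) lets Postgres seek straight to the cursor. The JavaScript functions only mirror the queries in memory to show why no post repeats.

```javascript
// Hypothetical schema: posts(id bigint PRIMARY KEY, score int NOT NULL).
// First page: no cursor yet.
const FIRST_PAGE_SQL = `
  SELECT id, score FROM posts
  ORDER BY score DESC, id DESC
  LIMIT $1`;

// Later pages: $1/$2 are the score and id of the last post already shown.
// (score, id) < ($1, $2) means "strictly after the cursor" in
// score DESC, id DESC order; id breaks ties between equal scores.
const NEXT_PAGE_SQL = `
  SELECT id, score FROM posts
  WHERE (score, id) < ($1, $2)
  ORDER BY score DESC, id DESC
  LIMIT $3`;

// In-memory mirror of the ORDER BY: score DESC, then id DESC.
const byScoreThenId = (a, b) => b.score - a.score || b.id - a.id;

// In-memory mirror of NEXT_PAGE_SQL (FIRST_PAGE_SQL when cursor is null).
function keysetPage(posts, cursor, pageSize) {
  return posts
    .filter((p) => cursor === null ||
      p.score < cursor.score ||
      (p.score === cursor.score && p.id < cursor.id))
    .sort(byScoreThenId)
    .slice(0, pageSize);
}

// For comparison: LIMIT/OFFSET paging, which shifts when rows are inserted.
function offsetPage(posts, page, pageSize) {
  return [...posts]
    .sort(byScoreThenId)
    .slice(page * pageSize, (page + 1) * pageSize);
}
```

If a post with a higher score arrives after page 1 has loaded, the next keyset page still continues right after the last post shown, whereas the OFFSET version shifts by one and shows that last post again. With node-postgres, the next page could be fetched with something like pool.query(NEXT_PAGE_SQL, [last.score, last.id, 25]), where last is the final post of the previous page.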

Related

Paginating Firestore data when using Vuex and appending new data to the state

I implemented the following to display a paginated query (this was suggested by Tony O'Hagan in this post: How to get the last document from a VueFire query):
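// (Assumes firestoreAction from vuexfire, a Firebase app instance named
// Firebase, and the `serialize` function from the linked answer.)
bindUsers: firestoreAction(({ bindFirestoreRef }) => {
  // First page of 8 users.
  return bindFirestoreRef('users',
    Firebase.firestore().collection('users').limit(8), { serialize })
}),
bindMoreUsers: firestoreAction(context => {
  // Rebinds 'users' to the next 8 docs, starting after the last bound doc.
  const lastUser = context.state.users[context.state.users.length - 1]
  return context.bindFirestoreRef('users',
    Firebase.firestore().collection('users').startAfter(lastUser._doc).limit(8),
    { serialize })
})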
When the user scrolls to the end of the page, I call bindMoreUsers, which updates state.users to the next set of 8 documents. I need to append to state.users instead of overwriting the original set of 8 documents. How can I do this?

Confession: I've not yet implemented pagination in my current app, but here's how I'd approach it.
In my previous answer I explained how to keep references to the Firestore doc objects inside each element of the state array that is bound by Vuexfire or VueFire. In Solution #1 below we use these doc objects to implement Firestore's recommended cursor-based pagination of query result sets, using the startAfter(doc) query condition instead of the slower, more expensive offset clause.
Keep in mind that since we're using Vuexfire/VueFire, we're saying that we wish to subscribe to live changes to our query, so our bound query defines precisely what ends up in our bound array.
Solution #1. Paging forward/backward loads and displays a horizontal slice of the full dataset (our bound array keeps a constant size equal to the page size). This is not what you requested, but it might be the preferred solution given the cons of the other solutions.
Pros: Server: For large datasets, this pagination query executes with the least cost and delay.
Pros: Client: Maintains a small in-memory footprint and renders fastest.
Cons: Pagination will likely not feel like scrolling. The UI will likely just have buttons to go forward/backward.
Page Forward: Get the doc object from the last element of our state array and apply a startAfter(doc) condition to our updated view query that binds our array to the next page.
Page Backward: A bit harder! Get the doc object from the first element of our bound state array. Run our page query with startAfter(doc), limit(1), offset(pageSize-1) and reverse sort order. The result is the starting doc (pageDoc) of the previous page. Now use startAt(pageDoc), forward sort order and limit(pageSize) to rebind the state array (the same query as Page Forward, but with startAt(pageDoc) instead of startAfter(doc), so that pageDoc itself is included).
NOTE: In the general case, I'd argue that we can't just keep the pageDoc values from previous pages (to avoid the reverse query). Since we're treating this as a live-updated, filtered list, the number of items remaining on previous pages could have changed radically since we scrolled down. Your specific application might not expect this rate of change, in which case keeping past pageDoc values would be smarter.
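The cursor arithmetic of Page Forward and Page Backward can be checked without Firestore. This is a minimal in-memory sketch, not the Firestore API: docs is a hypothetical array already sorted by a unique key, and pageFrom() stands in for a forward query with startAfter()/startAt() and limit().

```javascript
// `docs` is sorted ascending by a unique `key` (stand-in for the query order).
// Forward query: startAfter(cursor) by default, startAt(cursor) if inclusive,
// then limit(pageSize). A null cursor means "from the beginning".
function pageFrom(docs, cursor, pageSize, inclusive = false) {
  let start = 0;
  if (cursor !== null) {
    start = docs.findIndex((d) => (inclusive ? d.key >= cursor.key : d.key > cursor.key));
    if (start === -1) return [];
  }
  return docs.slice(start, start + pageSize);
}

// Page Forward: startAfter(last doc of the current, non-empty page).
function pageForward(docs, page, pageSize) {
  return pageFrom(docs, page[page.length - 1], pageSize);
}

// Page Backward: reverse order + startAfter(first doc) + offset(pageSize - 1)
// + limit(1) yields pageDoc, the first doc of the previous page. The forward
// query must then start AT pageDoc, otherwise pageDoc itself is skipped.
function pageBackward(docs, page, pageSize) {
  const before = docs.filter((d) => d.key < page[0].key).reverse();
  const pageDoc = before[pageSize - 1];
  // Fewer than pageSize docs remain before us: fall back to the first page.
  if (pageDoc === undefined) return pageFrom(docs, null, pageSize);
  return pageFrom(docs, pageDoc, pageSize, true);
}
```

Paging back from any page lands exactly on the page that was shown before it, because the forward query is anchored on pageDoc inclusively.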
Solution #2. Paging forward extends the size of both the query result and the bound array.
Pros: UX feels like normal scrolling since our array grows.
Pros: No need for the serializer trick, since we're not using startAfter() or endBefore().
Cons: Server: Every time you rebind to a new page, you reload the entire array up to that page from Firestore and then receive live updates for an ever-growing array. All those doc reads could get pricey!
Cons: Client: Rendering may get slower as you page forward, though Vue's virtual DOM diffing may help here. The UI might flicker as you reload each time, so more UI magic tricks are needed (e.g. delay rendering until the array is fully updated).
Pros: Might work well if we're using an infinite scrolling feature. I'd have to test it.
Page Forward: Add pageSize to our query limit and rebind - which will re-query Firestore and reload everything.
Page Backward: Subtract pageSize from our query limit and rebind/reload (or not!). May also need to update our scroll position.
Solution #3. A hybrid of Solutions #1 and #2. We could choose to use live Vuexfire/VueFire binding for just a slice of our query/collection (as in Solution #1) and use a computed function to concat it with an array containing the pages of data we've already loaded.
Pros: Reduces the Firestore query cost and delay, but now with a smooth scrolling look and feel, so we can use an infinite scrolling UI. Hand me a Koolaid!
Cons: We'll have to keep track of which part of our array is displayed and keep that part bound, and therefore live-updated.
Page Forward/Backward: Same deal as Solution #1 for binding the current page of data, except that we now have to copy the previous page of data into our non-live array, code a small computed function to concat() the two arrays, and bind the UI list to this computed array.
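A minimal sketch of the Solution #3 concat idea, written as plain Vuex-style state, getter and mutation objects. The names livePage (the array bound with bindFirestoreRef) and frozenPages (copies of pages already scrolled past) are hypothetical.

```javascript
// Hypothetical state: `livePage` is bound by Vuexfire, `frozenPages` is not.
const state = () => ({ frozenPages: [], livePage: [] });

const getters = {
  // The list the UI renders: earlier non-live pages, then the live page.
  allUsers: (state) => [].concat(...state.frozenPages, state.livePage),
};

const mutations = {
  // Call before rebinding `livePage` to the next page: keep a copy of the
  // current page on screen (it stops receiving live updates from here on).
  freezeCurrentPage(state) {
    state.frozenPages.push(state.livePage.slice());
  },
};
```

These objects could be registered in a Vuex store next to the Vuexfire binding; Page Forward would commit freezeCurrentPage and then rebind livePage to the next page as in Solution #1.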
Solution #3a. We can cheat and not actually keep the invisible earlier pages of data. Instead, we replace each page with a div (or similar) of the same height ;) so it looks as if we've scrolled down the same distance. As we scroll back, we'll need to remove our sneaky placeholder div and replace it with the newly bound data. If you're using infinite scrolling, then to make the scrolling UX nice and smooth you will need to preload an additional page ahead or behind, so it's already loaded well before you scroll to the page break. Some infinite scroll APIs don't support this.
Solutions #1 and #3 probably need a Cookbook PR to VueFire or a nice MIT-licensed npm library. Any takers?

API with paginated data, seeking advice

So I'm looking for answers to a couple of questions. I am using an API which returns a list of products (15,000+); however, it uses pagination, so it only returns 20 per page.
I would like to show all of this data in my shop so users can search through it, etc. However, the issue is that it takes A LONG time to loop through it all (15,000+ products at 20 per page is 750+ requests).
Is there a good method for this? Should I just loop through all the data and add it to an array once loaded? Is there something "special" we can do with the pagination?
I am new to this, and just seeking advice on the above.
Kind Regards,
Josh

There are a few thoughts that strike me straight away so let's cover those first:
From a pure UX perspective, it's very VERY unlikely that any user will ever need to click through 15k+ rows of anything. So loading them all doesn't serve your users, even if you could figure out how to do it efficiently.
Instead, look at what serves your users, which in this case is likely some sort of filtering or search options. I would check whether your API supports things like categories (ideally a set small enough to fetch in one request), which are much easier to display and let the user cut the data set down a lot. Then also check whether it offers some sort of query or search filter, perhaps on the names of whatever is being displayed. This further lets your users zoom in on a manageable dataset (roughly 100 items max). From there, at 20 items per page, that's just 5 pages. Still, you should really only load 1 page at a time and focus on offering better SORTING: if the user can find what they need on the first page, you don't need to load the other 4 pages. Hope that gives you some ideas of what to look for in your API, or what to add if you can add it yourself.
If not, it might be worth loading the data into your own database and setting up a background/nightly task that fetches any updates from the API and stores them. Then you build your own API around your own database, with functionality for filtering/searching.
A final option is indeed to simply request the first page and display it while you wait for the 2nd page to load. But this risks making an awful lot of wasted requests, which not only wastes your users' bandwidth but also puts pressure on the API for what is likely wasted work. There are a few other UX ideas around this as well, like infinite scrolling: load the first 1 or 2 pages, then stop until the user scrolls past the first page and a half, then request page 3, and so on. This way you only load pages as the user scrolls, and it's a bit more fluid than numbered pagination. Still, you'd likely want to offer some way to sort this set, so that the user is more likely to find what they need in the first few "pages".
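A minimal sketch of that "load as you scroll" idea, assuming a hypothetical fetchPage(n) that wraps one request to the paginated API and resolves to that page's items. The loader fetches the next page only once fewer than half a page of items remains below the last visible item (with 2 pages of 20 loaded, that is after item 30, i.e. a page and a half), and reuses an in-flight request so repeated scroll events don't fire duplicate requests.

```javascript
// `fetchPage(n)` is hypothetical: it returns a Promise of the items on page n.
class PagedLoader {
  constructor(fetchPage, pageSize) {
    this.fetchPage = fetchPage;
    this.pageSize = pageSize;
    this.items = [];
    this.nextPage = 1;
    this.done = false;
    this.loading = null; // in-flight request shared by concurrent callers
  }

  // Call from a scroll handler with the index of the last visible item.
  async ensureLoaded(lastVisibleIndex) {
    if (this.done) return;
    // Enough items already loaded below the viewport: nothing to do yet.
    if (this.items.length - lastVisibleIndex > this.pageSize / 2) return;
    if (this.loading === null) {
      this.loading = this.fetchPage(this.nextPage)
        .then((page) => {
          this.items.push(...page);
          this.nextPage += 1;
          // A short page means the API has no more data.
          if (page.length < this.pageSize) this.done = true;
        })
        .finally(() => {
          this.loading = null;
        });
    }
    await this.loading;
  }
}
```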

Speed comparison in React: Paginated table vs Scrollable table for column sort

Suppose we have two tables, one paginated and the other scrollable. Both allow sorting the records by clicking on any column header.
Suppose the table has 5,000 records with 6 columns. When the user clicks on any column to sort, my understanding is that all 5,000 records get sorted and my table state gets updated.
With pagination, since I am only rendering 10 records per page, rendering will be fast.
With the scrollable table, since I am rendering all 5,000 records, rendering will be slow.
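The sort itself costs the same for both tables; only the number of rendered rows differs. A minimal sketch of sort-then-slice, with hypothetical names: rows are plain objects and column is the key of the clicked header. In React you would typically wrap the sort in useMemo so it reruns only when the column or direction changes.

```javascript
// Sort all records once per header click. Copy first so the original
// array (e.g. React state) is not mutated.
function sortRows(rows, column, direction) {
  const sign = direction === "asc" ? 1 : -1;
  return [...rows].sort((a, b) => {
    if (a[column] < b[column]) return -sign;
    if (a[column] > b[column]) return sign;
    return 0;
  });
}

// Paginated table: render only one page of the sorted rows (pages start at 1).
function pageOf(sortedRows, page, pageSize) {
  const start = (page - 1) * pageSize;
  return sortedRows.slice(start, start + pageSize);
}
```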
I have an upcoming project that may involve a huge number of records, and column sorting is a mandatory feature. I want to validate whether my understanding of rendering speed for this use case is right.
What kinds of optimizations are available in either case?
Follow-up:
Do I actually need react-window or react-virtualized if I am anyway going for Pagination of table?

You are correct in thinking that a paginated table will render faster than an entire table rendered with 5,000 rows. A table with 5,000 rows is also likely to slow down your browser because of the large number of elements in the UI.
However, there is very little performance difference if you use concepts like virtualization or windowing, wherein you render only the rows that are currently in view. This is much faster and more optimized.
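A minimal sketch of the windowing calculation, assuming fixed-height rows (libraries like react-window also handle variable heights and the DOM bookkeeping). Only rows start..end are rendered, each positioned at index * rowHeight inside a container of total height rowCount * rowHeight, so the scrollbar still reflects all rows. The overscan parameter, a few extra rows above and below, is an illustrative choice.

```javascript
// Returns the inclusive range of row indices to render for the current scroll
// position, padded by `overscan` rows on each side to hide blank flashes.
function visibleRange(scrollTop, viewportHeight, rowHeight, rowCount, overscan = 3) {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount - 1, last + overscan),
  };
}
```

With 40px rows and a 400px viewport this renders about 16 rows instead of 5,000, regardless of where the user has scrolled.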
Now, from a UX point of view, a user is likely to find a paginated table with column sorting more efficient than a scrollable table.
There are three main reasons why I would go with a paginated table with sorting on columns:
It's easier for users to jump between pages when they want to revisit older data, instead of scrolling all the way down to it. This is the strongest reason for me.
When the user changes the sorting order, maintaining the scroll position (if you decide to) can get tricky in a scrollable table, whereas pagination goes along with it smoothly: either you stay on the same page or you move to the first one. Either way it's easy to implement.
If your data grows, keeping all the data on the client side may become an overhead. In such cases it's better to depend on an API to get the data. Virtualization combined with fetching data on the go can be tricky and needs a lot of careful attention to prefetching.
In short, for a large table it's better to go with pagination, for both UX and implementation reasons.

I think optimization is not a problem here; both approaches can be done with equal performance.
You mentioned react-virtualized: it's a common solution for performant scrollable tables, since it lets you render only the rows that are actually needed.

Huge Data in Listbox(Table)

I want to show a Listbox (table) with nearly 20 million rows.
How can I do this with low memory usage, without my server dying (stopping responding) while doing so?
Even if you only have a theoretical idea, please share it (I will try to implement it).
I need a solution very urgently.
I know I cannot load all the rows at once. I need to request new rows from the server every time I scroll. I have tried this, but my scrolling is not smooth enough.
Thanks & Regards,
Aman

Why not just retrieve the first 100 entries, and then, once the client scrolls to the bottom, append another 100 entries, and so on?

Maybe you could wait for ZK's new feature.
Reference
http://books.zkoss.org/wiki/Small_Talks/2012/March/Handling_a_Trillion_Data_Using_ZK

You could use http://books.zkoss.org/wiki/ZK Developer's Reference/MVC/View/Renderer/Listbox Renderer.
public void render(Listitem listitem, Object data, int index)
To start, you can implement render so that it fetches the element to render from your data source by the index passed to the render method. You can use a standard cache (if Hibernate is in place) or a custom-written one otherwise (also look at EhCache).
Erik's solution is really fast to implement. In addition, you could add a button, so that the user is aware that loading more records takes some time and considers whether they really need to load more. Scrolling can make your Listbox hang for a moment.
And always put an upper limit on the number of records you show at one time; don't pollute your server's memory.
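A minimal sketch of fetching rows by index through a bounded cache, written in JavaScript only to illustrate the idea (in ZK the same logic would sit behind the Java render method). loadBlock(offset, limit) is a hypothetical loader, for example one database query per block; the cache holds at most maxBlocks blocks, which enforces the upper limit on rows in memory, and evicts the least recently used block.

```javascript
// `loadBlock(offset, limit)` is hypothetical and returns an array of rows.
class BlockCache {
  constructor(loadBlock, blockSize = 100, maxBlocks = 10) {
    this.loadBlock = loadBlock;
    this.blockSize = blockSize;
    this.maxBlocks = maxBlocks; // upper bound on blocks held in memory
    this.blocks = new Map();    // block number -> rows; Map keeps insertion order
  }

  // Return the row at `index`, loading its whole block on a cache miss.
  get(index) {
    const blockNo = Math.floor(index / this.blockSize);
    let rows = this.blocks.get(blockNo);
    if (rows === undefined) {
      rows = this.loadBlock(blockNo * this.blockSize, this.blockSize);
      if (this.blocks.size >= this.maxBlocks) {
        // Evict the least recently used block: the first key in the Map.
        this.blocks.delete(this.blocks.keys().next().value);
      }
    } else {
      this.blocks.delete(blockNo); // re-inserted below as most recently used
    }
    this.blocks.set(blockNo, rows);
    return rows[index % this.blockSize];
  }
}
```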

Page the table values and retrieve a specific number of records on demand.
You can use the DataTables plugin to paginate the data records.
Note that this library can retrieve data either synchronously or asynchronously.

Reordering the results of several asynchronous requests

I need some conceptual help:
I am trying to display a page that contains a single table with a lot of data (moderately big number of rows, very big number of columns), and I want that page to be as fast and smooth as possible from the user's point of view. What I am doing is the following:
Retrieve a list containing the database primary keys of the elements to be displayed in the table.
Iterate through the list, asynchronously request each element given its primary key, and, every time an element is retrieved, add it to the table.
Each of these retrieval operations is implemented as a Web service call.
Now my questions are the following:
How can I reorder the elements if they arrive in a different order than they were requested? (It is absolutely essential for me that these elements be inserted in the table in the same positions as their respective primary keys were in the original list.)
Can this strategy be made compatible with any of the main JavaScript grid controls available out there? (Without me having to modify or understand how these controls internally work, of course.)

I think you can look into the jQuery DataTables plugin. It is quite a powerful tool to display data in a tabular format.
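Independently of the grid control, the reordering problem goes away if each response is written into the slot at its key's position rather than appended in arrival order. A minimal sketch, assuming a hypothetical fetchRow(key) that wraps the web-service call and returns a Promise, and a hypothetical onRowReady(index, row) callback that updates that row in whatever grid is used:

```javascript
// Request every row concurrently, but store each result at the position of
// its key in the original list, so arrival order does not matter.
async function loadRowsInOrder(keys, fetchRow, onRowReady) {
  const rows = new Array(keys.length).fill(null); // one placeholder per key
  await Promise.all(
    keys.map(async (key, index) => {
      const row = await fetchRow(key);
      rows[index] = row;      // position comes from the original key list
      onRowReady(index, row); // e.g. fill row `index` in the grid
    })
  );
  return rows; // same order as `keys`
}
```

Promise.all also resolves with its results in input order, so if incremental display isn't needed, awaiting Promise.all(keys.map(fetchRow)) alone preserves the order. Promise.all rejects as soon as any request fails; Promise.allSettled is an alternative if partial results should still be shown.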
