JavaScript data grid for millions of rows [closed]

JavaScript data grid for millions of rows [closed] - javascript

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I need to present a large number of rows of data (ie. millions of rows) to the user in a grid using JavaScript.
The user shouldn't see pages or view only finite amounts of data at a time.
Rather, it should appear that all of the data are available.
Instead of downloading the data all at once, small chunks are downloaded as the user comes to them (ie. by scrolling through the grid).
The rows will not be edited through this front end, so read-only grids are acceptable.
What data grids, written in JavaScript, exist for this kind of seamless paging?

(Disclaimer: I am the author of SlickGrid)
UPDATE
This has now been implemented in SlickGrid.
Please see http://github.com/mleibman/SlickGrid/issues#issue/22 for an ongoing discussion on making SlickGrid work with larger numbers of rows.
The problem is that SlickGrid does not virtualize the scrollbar itself - the scrollable area's height is set to the total height of all the rows. The rows are still being added and removed as the user is scrolling, but the scrolling itself is done by the browser. That allows it to be very fast yet smooth (onscroll events are notoriously slow). The caveat is that there are bugs/limits in the browsers' CSS engines that limit the potential height of an element. For IE, that happens to be 0x123456 or 1193046 pixels. For other browsers it is higher.
There is an experimental workaround in the "largenum-fix" branch that raises that limit significantly by populating the scrollable area with "pages" set to 1M pixels height and then using relative positioning within those pages. Since the height limit in the CSS engine seems to be different and significantly lower than in the actual layout engine, this gives us a much higher upper limit.
I am still looking for a way to get to unlimited number of rows without giving up the performance edge that SlickGrid currently holds over other implementations.
Rudiger, can you elaborate on how you solved this?

https://github.com/mleibman/SlickGrid/wiki
"SlickGrid utilizes virtual rendering to enable you to easily work with hundreds of thousands of items without any drop in performance. In fact, there is no difference in performance between working with a grid with 10 rows versus a 100’000 rows."
Some highlights:
Adaptive virtual scrolling (handle hundreds of thousands of rows)
Extremely fast rendering speed
Background post-rendering for richer cells
Configurable & customizable
Full keyboard navigation
Column resize/reorder/show/hide
Column autosizing & force-fit
Pluggable cell formatters & editors
Support for editing and creating new rows."
by mleibman
It's free (MIT license).
It uses jQuery.

The best Grids in my opinion are below:
Flexigrid: http://flexigrid.info/
jQuery Grid: http://www.trirand.com/blog/
jqGridView: http://plugins.jquery.com/project/jqGridView
jqxGrid: https://www.jqwidgets.com/
Ingrid: http://reconstrukt.com/ingrid/
SlickGrid http://github.com/mleibman/SlickGrid
DataTables http://www.datatables.net/index
ShieldUI http://demos.shieldui.com/web/grid-virtualization/performance-1mil-rows
Smart.Grid https://www.htmlelements.com/demos/grid/overview/
My best 3 options are jqGrid, jqxGrid and DataTables. They can work with thousands of rows and support virtualization.

I don't mean to start a flame war, but assuming your researchers are human, you don't know them as well as you think. Just because they have petabytes of data doesn't make them capable of viewing even millions of records in any meaningful way. They might say they want to see millions of records, but that's just silly. Have your smartest researchers do some basic math: Assume they spend 1 second viewing each record. At that rate, it will take 1000000 seconds, which works out to more than six weeks (of 40 hour work-weeks with no breaks for food or lavatory).
Do they (or you) seriously think one person (the one looking at the grid) can muster that kind of concentration? Are they really getting much done in that 1 second, or are they (more likely) filtering out the stuff the don't want? I suspect that after viewing a "reasonably-sized" subset, they could describe a filter to you that would automatically filter out those records.
As paxdiablo and Sleeper Smith and Lasse V Karlsen also implied, you (and they) have not thought through the requirements. On the up side, now that you've found SlickGrid, I'm sure the need for those filters became immediately obvious.

I can say with pretty good certainty that you seriously do not need to show millions of rows of data to the user.
There is no user in the world that will be able to comprehend or manage that data set so even if you technically manage to pull it off, you won't solve any known problem for that user.
Instead I would focus on why the user wants to see the data. The user does not want to see the data just to see the data, there is usually a question being asked. If you focus on answering those questions instead, then you would be much closer to something that solves an actual problem.

I recommend the Ext JS Grid with the Buffered View feature.
http://www.extjs.com/deploy/dev/examples/grid/buffer.html

(Disclaimer: I am the author of w2ui)
I have recently written an article on how to implement JavaScript grid with 1 million records (http://w2ui.com/web/blog/7/JavaScript-Grid-with-One-Million-Records). I discovered that ultimately there are 3 restrictions that prevent from taking it highter:
Height of the div has a limit (can be overcome by virtual scrolling)
Operations such as sort and search start being slow after 1 million records or so
RAM is limited because data is stored in JavaScript array
I have tested the grid with 1 million records (except IE) and it performs well. See article for demos and examples.

dojox.grid.DataGrid offers a JS abstraction for data so you can hook it up to various backends with provided dojo.data stores or write your own. You'll obviously need one that supports random access for this many records. DataGrid also provides full accessibility.
Edit so here's a link to Matthew Russell's article that should provide the example you need, viewing millions of records with dojox.grid. Note that it uses the old version of the grid, but the concepts are the same, there were just some incompatible API improvements.
Oh, and it's totally free open source.

I used jQuery Grid Plugin, it was nice.
Demos

Here are a couple of optimizations you can apply you speed up things. Just thinking out loud.
Since the number of rows can be in the millions, you will want a caching system just for the JSON data from the server. I can't imagine anybody wanting to download all X million items, but if they did, it would be a problem. This little test on Chrome for an array on 20M+ integers crashes on my machine constantly.
var data = [];
for(var i = 0; i < 20000000; i++) {
data.push(i);
}
console.log(data.length);
You could use LRU or some other caching algorithm and have an upper bound on how much data you're willing to cache.
For the table cells themselves, I think constructing/destroying DOM nodes can be expensive. Instead, you could just pre-define X number of cells, and whenever the user scrolls to a new position, inject the JSON data into these cells. The scrollbar would virtually have no direct relationship to how much space (height) is required to represent the entire dataset. You could arbitrarily set the table container's height, say 5000px, and map that to the total number of rows. For example, if the containers height is 5000px and there are a total of 10M rows, then the starting row ≈ (scroll.top/5000) * 10M where scroll.top represents the scroll distance from the top of the container. Small demo here.
To detect when to request more data, ideally an object should act as a mediator that listens to scroll events. This object keeps track of how fast the user is scrolling, and when it looks like the user is slowing down or has completely stopped, makes a data request for the corresponding rows. Retrieving data in this fashion means your data is going to be fragmented, so the cache should be designed with that in mind.
Also the browser limits on maximum outgoing connections can play an important part. A user may scroll to a certain position which will fire an AJAX request, but before that finishes the user can scroll to some other portion. If the server is not responsive enough the requests would get queued up and the application will look unresponsive. You could use a request manager through which all requests are routed, and it can cancel pending requests to make space.

I know it's an old question but still.. There is also dhtmlxGrid that can handle millions of rows. There is a demo with 50,000 rows but the number of rows that can be loaded/processed in grid is unlimited.
Disclaimer: I'm from DHTMLX team.

I suggest you read this
http://www.sitepen.com/blog/2008/11/21/effective-use-of-jsonreststore-referencing-lazy-loading-and-more/

Disclaimer: i heavily use YUI DataTable without no headache for a long time. It is powerful and stable. For your needs, you can use a ScrollingDataTable wich suports
x-scrolling
y-scrolling
xy-scrolling
A powerful Event mechanism
For what you need, i think you want is a tableScrollEvent. Its API says
Fired when a fixed scrolling DataTable has a scroll.
As each DataTable uses a DataSource, you can monitoring its data through tableScrollEvent along with render loop size in order to populate your ScrollingDataTable according to your needs.
Render loop size says
In cases where your DataTable needs to display the entirety of a very large set of data, the renderLoopSize config can help manage browser DOM rendering so that the UI thread does not get locked up on very large tables. Any value greater than 0 will cause the DOM rendering to be executed in setTimeout() chains that render the specified number of rows in each loop. The ideal value should be determined per implementation since there are no hard and fast rules, only general guidelines:
By default renderLoopSize is 0, so all rows are rendered in a single loop. A renderLoopSize > 0 adds overhead so use thoughtfully.
If your set of data is large enough (number of rows X number of Columns X formatting complexity) that users experience latency in the visual rendering and/or it causes the script to hang, consider setting a renderLoopSize.
A renderLoopSize under 50 probably isn't worth it. A renderLoopSize > 100 is probably better.
A data set is probably not considered large enough unless it has hundreds and hundreds of rows.
Having a renderLoopSize > 0 and < total rows does cause the table to be rendered in one loop (same as renderLoopSize = 0) but it also triggers functionality such as post-render row striping to be handled from a separate setTimeout thread.
For instance
// Render 100 rows per loop
var dt = new YAHOO.widget.DataTable(<WHICH_DIV_WILL_STORE_YOUR_DATATABLE>, <HOW YOUR_TABLE_IS STRUCTURED>, <WHERE_DOES_THE_DATA_COME_FROM>, {
renderLoopSize:100
});
<WHERE_DOES_THE_DATA_COME_FROM> is just a single DataSource. It can be a JSON, JSFunction, XML and even a single HTML element
Here you can see a Simple tutorial, provided by me. Be aware no other DATA_TABLE pluglin supports single and dual click at the same time. YUI DataTable allows you. And more, you can use it even with JQuery without no headache
Some examples, you can see
List item
Feel free to question about anything else you want about YUI DataTable.
regards,

I kind of fail to see the point, for jqGrid you can use the virtual scrolling functionality:
http://www.trirand.net/aspnetmvc/grid/performancevirtualscrolling
but then again, millions of rows with filtering can be done:
http://www.trirand.net/aspnetmvc/grid/performancelinq
I really fail to see the point of "as if there are no pages" though, I mean... there is no way to display 1,000,000 rows at once in the browser - this is 10MB of HTML raw, I kind of fail to see why users would not want to see the pages.
Anyway...

best approach i could think of is by loading the chunk of data in json format for every scroll or some limit before the scrolling ends. json can be easily converted to objects and hence table rows can be constructed easily unobtrusively

I would highly recommend Open rico.
It is difficult to implement in the the beginning, but once you grab it you will never look back.

I know this question is a few years old, but jqgrid now supports virtual scrolling:
http://www.trirand.com/blog/phpjqgrid/examples/paging/scrollbar/default.php
but with pagination disabled

I suggest sigma grid, sigma grid has embed paging features which could support millions of rows. And also, you may need a remote paging to do it.
see the demo
http://www.sigmawidgets.com/products/sigma_grid2/demos/example_master_details.html

Take a look at dGrid:
https://dgrid.io/
I agree that users will NEVER, EVER need to view millions of rows of data all at once, but dGrid can display them quickly (a screenful at a time).
Don't boil the ocean to make a cup of tea.

Related

Speed comparison in React: Paginated table vs Scrollable table for column sort

Suppose we have two tables, one is Paginated and other is Scrollable. Both have them allow sorting of records by clicking on any column header.
Let's suppose the table has 5000 records of 6 columns. When the user clicks on any of the column to sort, my understanding is that the whole 5000 records will get sorted and my table state will get updated.
In case of Pagination, since I am only rendering 10 records/ page, the rendering will be fast.
In case of Scrollable table, since I am rendering the whole 5000 records, the rendering will be slow.
I have a project to make ahead and it may involve a huge records of data and column sorting is a mandatory feature. I want to validate whether my understanding of rendering speed for this use case is right or not?
What kind of optimizations are available in either cases for me to know?
Follow up:-
Do I actually need react-window or react-virtualized if I am anyway going for Pagination of table?

You are correct in thinking that Paginated table will be faster with rendering than an enitre table rendered with 5000rows. Also an table with 5000rows is likely to cause your browser to slow down due to a large set of elements in the UI
However there is very little performance difference if you use concepts like virtualization or windowing wherin you render only that amount of data as is coming in a view. This is much faster and optimized.
Now if you come to UX point of view. A user is likely to find a paginated table with column sorting much efficient as compared to a scrollable table.
There are three main reasons because of which I would go with a paginated table with sorting on columns
Its easier for users to jump pages when they want to visit an old data instead of scrolling all the way down to it. This is one of the most strong reason for me
When you use Pagination and the user decides to change the sorting order, it might get trickier to maintain scroll if you decide too. However pagination goes along smoothly with it. Either you stay on the same page or you move the first one. In either case it easy to implement
If your data grows, keeping all the data on client side may become an overhead. In such cases its better to depend on a API to get the data. Now virtualization with fetching data on the go can sometimes become tricky and need lot of special attention and details on prefetching
In short its better to go with Pagination both because of UX and Implementation reason for a large table

I think the optimization here here is not a problem, both of the ways could be done with equal performance.
You mentioned react-virtualized - it's common to use it as a solution for scrollable tables with good performance, it gives you ability to render only these fields that are actually required.

virtual scrolling, drag and drop in angular 7

I searched the internet for these new features of angular 7, but didn't fully understand it.
I went through drag and drop and virtual scrolling
Could someone please shed some light on these?

now consider a case where you are to display huge chunk of data, now either you will do pagination which will include an api call per page (if data changes frequently) or if you load everything at once which will slow down or kill UI process.
Virtual Scroll is about loading huge chunk of data in DOM without hampering the performance.
it's key features are:
data is displayed as per size of viewport i.e. if you container div is 500 px it will show around 10-15 rows at a time.
Ad you scroll these rows are changed but number of elements in the DOM will remain consistent.
This is handy when you have to show huge chunk of data without implementing pagination.
Thus it improves the UI performance.
I implemented Virtual list displaying multiple column and array length was 1 million which is a huge amount of data to display at a time.
Virtual list is implemented over virtual scroll and it supports multiple columns.
check out detailed explanation and code here:
https://www.codeproject.com/Articles/5260356/Virtual-List-in-Angular
please have a look in the image:

Limitation of Angular and browsers with regards to data loaded

I would like to know what is the maximum data that angular framework can handle. Say, I am displaying a chart using angular and some charting framework like chartjs. I'd like to know up to how many data can the browser display properly, with slowness, or up to when it crashes.

Your question has no simple answer, but I will try to flatten it and give a simple answer, or at least simple things to consider...
Angular (at runtime), like many other frameworks is simply JavaScript,
So let us reduce the question to "Limitation of JavaScript and browsers with regards to data loaded",
JavaScript has no upper limit of memory or storage it can handle,
I've seen JavaScript applications that require more than 15GB of RAM,
and they performed well too.
So assume data size itself is not an issue (unless your application is poorly implemented, leaking memory or just not very efficient, of course).
The main challenge as I see it, is displaying and manipulating the information
without causing unnecessary delay or unresponsiveness.
Displaying the information - let's say you have a list (or a table) containing 1,000,000 possible gifts which you then want to display for the user to select.
Adding the list items to the document one by one is tempting, but will require the browser to repaint after each addition (causing a delay or full unresponsiveness until finished), another way is adding the elements to some DOM element (denoted by N) still being kept in memory, then adding all elements corresponding the list items to the element N (still, just an in memory operation), finally adding N to the document containing the entire list - the will be a much better solution for displaying the large amount of data.
Manipulating the information - displaying is indeed not enough. you would like to move, drag, sort and filter the data being displayed. And as mentioned before, it is a bad idea removing many elements directly from DOM. You should instead remove container from the document's DOM to memory, manipulate the data in it, and then add the container right back to the document. Angular does this kind of magic for you.
(Toggling the 'display:none\block' css attribute of many elements has a similar blocking effect as I recall).
A good practice is implementing an application/page showing only the amount of data that can be processed by a human at a single glance. The rest of it should be considered in the application data-layer, in memory, and should be loaded to display given the appropriate need or request.
To conclude, you can deal with huge amounts of data as long as you provide a mechanism that efficiently filter the displayed information.
I hope it helps...
for further reading:
Slow and fast ways of adding elements to DOM
A question emphasizes the lack of memory limit used by JS
CSS display attribute performance
A good discussion about the reasons for slow DOM
About using HTML5 correctly - old but still true
Once the DOM creation procedure is understood - it much easier to display data without affecting performance / user experience

React-like programming without React

I grew up using JQuery and have been following a programming pattern which one could say is "React-like", but not using React. I would like to know how my graphics performance is doing so well, nonetheless.
As an example, in my front-end, I have a table that displays some "state" (in React terms). However, this "state" for me is just kept in global variables. I have an update_table() function which is the central place where updates to the table happen. It takes the "state" and renders the table with it. The first thing it does is call $("#table").empty() to get a clean start and then fills in the rows with the "state" information.
I have some dynamically changing data (the "state") every 2-3 seconds on the server side which I poll using Ajax and once I get the data/"state", I just call update_table().
This is the perfect problem for solving with React, I know. However, after implementing this simple solution with JQuery, I see that it works just fine (I'm not populating a huge table here; I have a max of 20 rows and 5 columns).
I expected to see flickering because of the $("#table").empty() call followed by adding rows one-by-one inside the update_table() function. However, the browser (chrome/safari) somehow seems to be doing a very good job of updating only that elements that have actually changed (Almost as if the browser has an implementation of Virtual DOM/diffing, like React!)

I guess your question is why you can have such a good graphics performance without React.
What you see as a "good graphics performance" really is a matter of definition or, worse, opinion.
The classic Netscape processing cycle (which all modern browsers inherit) has basically four main stages. Here is the full-blown Gecko engine description.
As long as you manipulate the DOM, you're in the "DOM update" stage and no rendering is performed AT ALL. Only when your code yields, the next stage starts. Because of the DOM changes the sizes or positions of some elements may have changed, too. So this stage recomputes the layout. After this stage, the next is rendering, where the pixels are redrawn.
This means that if your code changes a very large number elements in the DOM, they are all still rendered together, and not in an incremental fashion. So, the empty() call does not render if you repopulate the table immediately after.
Now, when you see the pixels of an element like "13872", the rendering stage may render those at the exact same position with the exact same colors. You don't have any change in pixel color, and thus there is no flickering you could see.
That said, your graphics performance is excellent -- yes. But how did you measure it? You just looked at it and decided that it's perfect. Now, visually it really may be very very good. Because all you need is avoid the layout stage from sizing/positioning something differently.
But actual performance is not measured with the lazy eyes of us humans (there are many usability studies in that field, let's say that one frame at 60 Hz takes 16.6 ms, so it is enough to render in less than that). It is measured with an actual metric (updates per second or whatever). Consider that on older machines with older browsers and slower graphics cards your "excellent" performance may look shameful. How do you know it is still good on an old Toshiba tablet with 64 MB graphics memory?
And what about scaling? If you have 100x the elements you have now, are you sure it will scale well? What if some data takes more (or less) space and changes the whole layout? All of these edge conditions may not be covered by your simple approach.
A library like React takes into account those cases you may not have encountered yet, and offers a uniform pattern to approach them.
So if you are happy with your solution you don't need React. I often avoid jQuery because ES5/ES6 is already pretty good these days and I can just jot down 3-4 lines of code using document.getElementById() and such. But I realize that on larger projects or complex cases jQuery is the perfect tool.
Look at React like that: a tool that is useful when you realize you need it, and cumbersome when you think you can do without. It's all up to you :)

When you have something like this:
$("#table").empty()
.html("... new content of the table ... ");
then the following happens:
.empty() removes content and marks rendering tree / layout as invalid.
.html() adds new content and marks rendering tree / layout as invalid.
mark as invalid among other things calls InvalidateRect() (on Windows) that causes the window to receive WM_PAINT event at some point in future.
By handling WM_PAINT the browser will calculate layout and render all the result.
Therefore multiple change requests will be collapsed into single window painting operation.

I have a couple thousand javascript objects which I need to display and scroll through. What are my options?

I'm working off of designs which show a scrollable box containing a list of a user's "contacts".
Users may have up to 10,000 contacts.
For now assume that all contacts are already in memory, and I'm simply trying to draw them. If you want to comment on the question of how wise it is to load 10k items of data in a browser, please do it here.
There are two techniques I've seen for managing a huge list like this inside a scrollable box.
Just Load Them All
This seems to be how gmail approaches displaying contacts. I currently have 2k contacts in gmail. If I click "all contacts", I get a short delay, then the scrollable box at the right begins to fill with contacts. It looks like they're breaking the task into chunks, probably separating the DOM additions into smaller steps and putting those steps into timeouts in order to not freeze the entire interface while the process completes.
pros:
Simple to implement
Uses native UI elements the way they were designed to be used
Google does it, it must be ok
cons
Not totally snappy -- there is some delay involved, even on my development machine running Firefox. There will probably be quite a lot of delay for a user running a slower machine running IE6
I don't know what sort of limits there are in how large I can allow the DOM to grow, but it seems to me there must be some limit to how many nodes I can add to it. How will an older browser on an older machine react if I ask it to hold 10k nodes in the DOM?
Draw As Needed
This seems to be how Yahoo deals with displaying contact lists. The create a scrollable box, put a super-tall, empty placeholder inside it, and draw contacts only when the user scrolls to reveal them.
pros:
DOM nodes are drawn only as needed, so there's very little delay while loading, and much less risk of overloading the browser with too many DOM nodes
cons:
Trickier to implement, and more opportunity for bugs. For example, if I scroll quickly in the yahoo mail contact manager as soon as the page loads, I'm able to get contacts to load on top of one another. Of course, bugs can be worked out, but obviously this approach will introduce more bugs.
There's still the potential to add a huge number of DOM nodes. If the user scrolls slowly through the entire list, every item will get drawn, and we'll still end up with an enormous DOM
Are there other approaches in common use for displaying a huge list? Any more pros or cons with each approach to add? Any experience/problems/success using either of these approaches?

I would chunk up the DOM-writing into handle-able amounts (say, 25 or 50), then draw the chunks on demand. I wouldn't worry about removing the old DOM elements until the amount drawn gets quite large.
I would divide the contacts into chunks, and keep a sort of view buffer alive that changes which chunks are written to the DOM as the user scrolls through the list. That way the total number of dom elements never rises above a certain threshold. Could be fairly tricky to implement, however.
Using this method you can also dynamically modify the size of chunks and the size of the buffer, depending on the browser's performance (dynamic performance optimization), which could help out quite a bit.
It's definitely non-trivial to implement, however.
The bug you see in Yahoo may be due to absolutely positioned elements: if you keep your CSS simple and avoid absolutely/relatively positioning your contact entries, it shouldn't happen.

Develop Reference

JavaScript is the programming language of the Web.