Aggregating tabular data in JavaScript

I have a lot of date-based data with irregular reporting intervals. I was hoping to have my PHP backend send the data to the frontend (JS/jQuery) for display, but to report effectively I need to be able to aggregate by "week", "month", and so on. I could do the SQL on the backend and build a different data series for each interval, but that would be CPU-intensive there, and I'd like the frontend to be dynamic enough to move between intervals ("show me monthly, no wait, show me weekly").
What I'm looking for, I think, is a JS/jQuery library that will help me take tabular data and group/aggregate it based on date conditions. If need be, I could add a column to the tabular structure that specifies the week number (and thereby simplifies the date math on the frontend). In any case, I'm flexible at the moment and just hoping to hear of good resources or approaches that have been tried before for this kind of problem.
Note: I am using jqGrid for tabular views on the frontend and Highcharts for graphical presentation. This is potentially not important, but it might also open up some creative alternatives.
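For what it's worth, the grouping itself may not need a library. Here is a minimal sketch of client-side date bucketing, assuming rows shaped like { date: "...", value: ... } (the field names and the sum aggregate are illustrative assumptions):

    // Bucket rows by month or ISO week entirely on the client.
    function bucketKey(dateStr, interval) {
      var d = new Date(dateStr);
      if (interval === "month") {
        return d.getFullYear() + "-" + ("0" + (d.getMonth() + 1)).slice(-2);
      }
      // ISO week: shift to the Thursday of this week, then count weeks from Jan 1.
      var t = new Date(Date.UTC(d.getFullYear(), d.getMonth(), d.getDate()));
      t.setUTCDate(t.getUTCDate() + 4 - (t.getUTCDay() || 7));
      var yearStart = new Date(Date.UTC(t.getUTCFullYear(), 0, 1));
      var week = Math.ceil(((t - yearStart) / 86400000 + 1) / 7);
      return t.getUTCFullYear() + "-W" + ("0" + week).slice(-2);
    }

    function aggregate(rows, interval) {
      var buckets = {};
      rows.forEach(function (row) {
        var key = bucketKey(row.date, interval);
        (buckets[key] = buckets[key] || []).push(row.value);
      });
      // Sum each bucket; swap in an average or a count as needed.
      return Object.keys(buckets).sort().map(function (key) {
        return {
          period: key,
          total: buckets[key].reduce(function (a, b) { return a + b; }, 0)
        };
      });
    }

Switching from monthly to weekly is then just a second call to aggregate(rows, "week") over the same in-memory rows, which is exactly the dynamic behaviour described above.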

Related

How to handle an extremely big table in a search?

I'm looking for suggestions on how to handle the following use case with the Python Django framework; I'm also open to using JavaScript libraries/AJAX.
I'm working with a pre-existing table/model called revenue_code with over 600 million rows of data.
The user needs to search three fields within one search (code, description, room) and be able to select multiple search results, similar to the Kendo UI multi-select control. I started off by combining the codes with django-filter as shown below, but my application became unresponsive; after waiting 10-15 minutes I was able to view the search results but couldn't select anything.
https://simpleisbetterthancomplex.com/tutorial/2016/11/28/how-to-filter-querysets-dynamically.html
I've also tried Kendo controls, Select2, and Chosen, because I need the user to be able to select as many rev codes as they need, upwards of 10-20, but all gave the same unresponsive page when they attempted to load the data into the control/multi-select.
Essentially what I'm looking for is something like the link below, which lets the user make multiple selections and will handle a massive amount of data without becoming unresponsive. Ideally I'd like to be able to run my search without displaying all the data.
https://petercuret.com/add-ajax-to-django-without-writing-javascript/
Is the Django framework meant to handle this type of volume? Would it be better to export this data into a file and read the file? I'm not looking for code, just some pointers on how to handle this use case.
What is the basic mechanism of "searching 600 million rows"? What a database does is build an index ahead of search time, general enough to serve different types of query; at search time you search the index instead, which is much smaller (so it fits in memory) and faster. But no matter what, "searching" by its nature has no built-in "pagination" concept, and if 600 million records cannot fit in memory at once, parts of them must be repeatedly swapped in and out - the more parts, the slower the operation. All of this is hidden behind the algorithms in databases like MySQL.
There are very compact representations, like bitmap indexes, which let you search data such as male/female very quickly, or any data where you can use one bit per piece of information.
So whether you use Django or not does not really matter. What matters is the tuning of the database, the design of the tables to facilitate the queries (types of indices), and the total amount of memory at the server end to keep the data in memory.
Check this out:
https://dba.stackexchange.com/questions/20335/can-mysql-reasonably-perform-queries-on-billions-of-rows
https://serverfault.com/questions/168247/mysql-working-with-192-trillion-records-yes-192-trillion
How many rows are 'too many' for a MySQL table?
You can't load all the data into your page at once. 600 million records is too many.
Since you mentioned select2, have a look at their example with pagination.
The trick is to limit your SQL results to maybe 100 or so at a time. When the user scrolls to the bottom of the list, it can automatically load in more.
Send the search query to the server, and do the filtering in SQL (or NoSQL or whatever you use). Database engines are built for that. Don't try filtering/sorting in JS with that many records.
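For illustration, a sketch of the client side of that pattern using Select2 v4's ajax option (the /api/revcodes endpoint and its {items, hasMore} response shape are assumptions; the server must apply the LIMIT itself):

    // #rev-codes is a <select multiple> element.
    $('#rev-codes').select2({
      ajax: {
        url: '/api/revcodes',   // hypothetical endpoint returning ~100 rows per page
        dataType: 'json',
        delay: 250,             // debounce keystrokes so you don't hammer the server
        data: function (params) {
          return { q: params.term, page: params.page || 1 };
        },
        processResults: function (data, params) {
          params.page = params.page || 1;
          return {
            results: data.items,                // [{ id: ..., text: ... }, ...]
            pagination: { more: data.hasMore }  // keeps infinite scroll going
          };
        }
      }
    });

Each keystroke and each scroll-to-bottom then fetches only one small page, so the browser never holds more than a few hundred of the 600 million rows.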

mongodb: How to sample large data-set in a controlled way

I am trying to generate a latency-vs-time dataset for graphing.
The dataset has thousands to a couple of million entries that look like this:
[{"time_stamp": ..., "latency_axis": ...}, ...]
Querying and generating graphs performs poorly with the full dataset, for obvious reasons. An analogy is loading a large progressive JPEG: first render a rough image quickly, then fill in the full detail.
How can this be achieved efficiently with a MongoDB query, while still giving a rough but accurate representation of the dataset with respect to time? My idea is to organize documents into time buckets, each covering a constant range of time, and return the 1st, 2nd, 3rd result of each bucket iteratively until all results finish loading. However, there is no trivial or intuitive way to implement this prototype efficiently.
Can anyone provide guide/solution/suggestion regarding this situation?
Thanks
Notes for zamnuts:
A quick sample with very small dataset of what we are trying to achieve is: http://baker221.eecg.utoronto.ca:3001/pages/latencyOverTimeGraph.html
We'd like to dynamically focus on specific regions. Our goal is a good representation of the data without loading the entire dataset, while keeping performance reasonable.
DB: MongoDB. Server: Node.js. Client: Angular + the nvd3 directive (proof of concept; we will likely switch to d3.js for more control).
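As a rough illustration of the bucketing idea, here is a sketch using the Node.js MongoDB driver's aggregation pipeline; it assumes time_stamp is stored as epoch milliseconds, and the database/collection names are made up:

    var MongoClient = require('mongodb').MongoClient;

    // Down-sample: group points into fixed-width time buckets and keep one
    // representative value per bucket (bucketMs = resolution the chart needs).
    function sample(uri, bucketMs, callback) {
      MongoClient.connect(uri, function (err, client) {
        if (err) return callback(err);
        client.db('metrics').collection('latencies').aggregate([
          // Truncate each timestamp down to its bucket boundary.
          { $group: {
              _id: { $subtract: ['$time_stamp', { $mod: ['$time_stamp', bucketMs] }] },
              latency: { $avg: '$latency_axis' }  // or $first for a raw sample
          } },
          { $sort: { _id: 1 } }
        ]).toArray(function (err, points) {
          client.close();
          callback(err, points);
        });
      });
    }

Re-running the same pipeline with a smaller bucketMs over a narrower time range gives the progressive-refinement effect described above: coarse first, detail on demand.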

Saving and updating the state of an AJAX-based graph in Rails

I have a Google Charts ColumnChart in a Rails project. It is generated and populated in JavaScript by calling a Rails controller action which renders JSON.
The chart displays a month's worth of information for a customer.
Above the chart I have next and previous arrows which allow a customer to change the month displayed on the chart. These don't have any functionality as it stands.
My question is: what is the best way to save the state of the chart, in terms of its current month, for a customer viewing the chart?
Here is how I was thinking the workflow would go:
1. One of the arrows is selected.
2. This event is captured in JavaScript.
3. Another request to the Rails action rendering JSON is performed, with an additional GET parameter passed based on a data attribute of the arrow button (either + or -).
4. The chart is re-rendered using the new JSON response.
Would the logic around incrementing or decrementing the graph's current date be performed on the server side, with the chart's date being stored in the session and defaulting to the current date on first load?
On the other hand, would it make sense to save the chart state on the client side, within the JavaScript code or in a cookie, and manipulate the date before it's sent to the Rails controller?
I've been developing with Rails for about 6 months and feel comfortable with it, but have only recently started developing with JavaScript and AJAX. My experience tying JS code together with Rails is somewhat limited at this point, so I'm looking for some advice/best practices on how to approach this.
Any advice is much appreciated.
I'm going to go through a couple of options, some good, some bad.
First, what you definitely don't want to do is maintain any notion of what month you are in in cookies or any other form of persistent server-side storage. Server state is sometimes necessary, but it shouldn't be used when there are easy alternatives. Part of REST (which Rails is largely built around) is representing data in pure attributes rather than letting its state be spread around like that.
From here, most solutions are probably acceptable, and opinion plays a greater role. One thing you could do is calculate the new month from the +/- sign and the current month in JavaScript, and send that to the server, which returns the information for the month requested.
I'm not a huge fan of this, though, as you have to write JavaScript capable of creating valid date ranges, and most of this functionality will probably be on the server already. Just passing a +/- and the current month to the server will work as well; you'll just have to do a bit of additional routing and logic to resolve the sign to a different month on the server.
While either of these would work, my preferred solution would instead have the initial request for the month also generate valid representations of the neighbouring months and return them to the client. Then, when you update the graph with the requested data, you also replace the forward/backward links on the graph with the ones provided by the server. This fuses the benefits of the prior two solutions: no additional routing on the server, and no substantive addition to the client-side code. You also gain the ability to grey out transitions to months where no data exists (i.e. before they became a customer, or in the future). Without this, you'd have to create separate logic to handle client requests for information that doesn't exist, which is extra work for you and more confusion for the customer.
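A sketch of what that could look like on the wire and in the client (the JSON field names and the drawChart helper are illustrative, not from the original answer):

    // Hypothetical response from the Rails action:
    // { month: "2014-06", rows: [...],
    //   prev: "/chart.json?month=2014-05", next: null }
    // next is null when no later data exists, so that arrow greys out.
    function loadMonth(url) {
      $.getJSON(url, function (data) {
        drawChart(data.rows);  // your existing Google Charts render call
        $('#prev').toggleClass('disabled', !data.prev).data('url', data.prev);
        $('#next').toggleClass('disabled', !data.next).data('url', data.next);
      });
    }

    $('#prev, #next').on('click', function () {
      var url = $(this).data('url');
      if (url) loadMonth(url);  // ignore clicks on greyed-out arrows
    });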

Handling large grid datasets in JavaScript

What are some of the better solutions to handling large datasets (100K rows) on the client with JavaScript? In particular, if you have multi-column sort and search capabilities, how do you handle fetching (and pre-fetching) the data, client-side model binding (for display), and caching the data?
I would imagine a good solution involves some thoughtful work in the background. For instance, if the table initially displays N items, it might fetch 2N items, return the data for the user, and then fetch the next 2N items in the background (even if the user hasn't requested them). As the user makes search/sort changes, it would throw out the prefetched data (or perhaps cache the initial, unfiltered case) and repeat the same approach; a rough sketch follows.
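For concreteness, a rough sketch of that 2N prefetch idea, assuming a hypothetical /rows endpoint that accepts offset/limit parameters:

    var cache = {};  // pageIndex -> array of rows

    // Fetch double the visible page: show the first half now, keep the rest.
    function getPage(page, pageSize, render) {
      if (cache[page]) {
        render(cache[page]);
        prefetch(page + 1, pageSize);  // keep staying one page ahead
        return;
      }
      $.getJSON('/rows', { offset: page * pageSize, limit: pageSize * 2 },
        function (rows) {
          cache[page] = rows.slice(0, pageSize);
          cache[page + 1] = rows.slice(pageSize);
          render(cache[page]);
        });
    }

    function prefetch(page, pageSize) {
      if (cache[page]) return;
      $.getJSON('/rows', { offset: page * pageSize, limit: pageSize },
        function (rows) { cache[page] = rows; });
    }

    // On a sort/search change, invalidate everything and start over:
    // cache = {}; getPage(0, 50, renderRows);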
Can you share the best solutions you have seen?
Thanks
Use a jQuery table plugin like DataTables: http://datatables.net/
It supports server-side processing for sorting, filtering, and paging. And it includes pipelining support to prefetch the next x pages of records: http://www.datatables.net/examples/server_side/pipeline.html
Actually, the DataTables plugin works in 4 different ways:
1. With an HTML table, so you could send down a bunch of HTML and then have all the sorting, filtering, and paging work client-side.
2. With a JavaScript array, so you could send down a 2D array and let it create the table from there.
3. With an AJAX source - probably not applicable to you.
4. Server-side, where you send data in JSON format to an empty table and let DataTables take it from there.
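As a sketch of option 4 using the current DataTables API (older 1.x releases spell these options bServerSide/sAjaxSource; the /grid-data endpoint and column names are assumptions):

    $('#grid').DataTable({
      serverSide: true,    // sorting, filtering, paging all happen on the server
      processing: true,    // show the built-in "Processing..." indicator
      ajax: '/grid-data',  // must implement DataTables' draw/start/length protocol
      pageLength: 25,
      columns: [
        { data: 'name' },
        { data: 'category' },
        { data: 'updated_at' }
      ]
    });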
SlickGrid does exactly what you're looking for. (Demo)
Using the AJAX data store, SlickGrid can handle millions of rows without flinching.
Since you tagged this with Ext JS, I'll point you to Ext.ux.LiveGrid if you haven't already seen it. The source is available, so you might have a look and see how they've addressed this issue. This is a popular and widely-used extension in the Ext JS world.
With that said, I personally think loading that much data (even virtually) makes for a poor user experience. Manually dragging a scrollbar around (jumping hundreds of records per pixel) is far inferior to simply typing what you want. I'd much prefer some robust filtering/searching to presenting that much data to the user.
What if you went to Google and instead of a search box, it just loaded the entire internet into one long virtual list that you had to scroll through to find your site... :)
It depends on how the data will be used.
For a large dataset, where the browser's Find function was adequate, just returning a straight HTML table was effective. It takes a while to load, but the display is responsive on older, slower clients, and you never have to worry about it breaking.
When the client did the sorting and searching, and we weren't showing the entire table at once, I had the server send tab-delimited tables through XMLHttpRequest, parsed them in the browser with responseText.split('\n'), and updated the display with repeated calls to $('node').innerHTML = 'blah'. The JS engine can store long strings pretty efficiently, and this ran a lot faster on the client than showing, hiding, and rearranging DOM nodes; creating and destroying DOM nodes on the fly turned out to be really slow. Splitting each line into fields on demand seems to work; I haven't experimented with that degree of freedom.
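A condensed sketch of that technique (the endpoint and element id are made up):

    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/data.tsv');
    xhr.onload = function () {
      var rows = xhr.responseText.split('\n');
      var html = '';
      for (var i = 0; i < rows.length; i++) {
        // Split each line into fields on demand and emit markup as a string.
        html += '<tr><td>' + rows[i].split('\t').join('</td><td>') + '</td></tr>';
      }
      // One bulk innerHTML assignment beats creating DOM nodes one at a time.
      document.getElementById('grid-body').innerHTML = html;
    };
    xhr.send();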
I've never tried the obvious pre-fetch & background trick, because these other methods worked well enough.
Check out this comprehensive list of data grids and spreadsheets.
For filtering/sorting/pagination purposes you may be interested in the excellent Handsontable, or in DataTables as a free alternative.
If you simply need to display a huge list without any additional features, Clusterize.js should be sufficient.

Dynamic filtering, am I doing it wrong?

So I have an Umbraco site with a number of content-managed products in it, and I need to search/filter this dataset on the front end based on 5 criteria.
I'd estimate I will have 300 products. I need to filter this data very fast and hide/show options that are no longer relevant based on the previous selections.
I'm currently building a web service and a jQuery implementation using AJAX.
Is the best way to do this to load it into a JavaScript data structure and operate on it there, or will AJAX calls be fast enough? Obviously the client-side route would mean duplicating the functionality on the server side for non-JavaScript users.
If you need to filter the data "very fast" then I imagine the best way is to preload all the data and then manipulate it client-side. If you're waiting for an AJAX response every time the user changes a filter, it's not going to be as fast as filtering on the client (assuming they haven't got an ancient computer running IE6).
It would depend on the complexity of your filtering. If all you're doing is showing results where, for example, the product's price is greater than $10, then client-side will definitely be much faster. If you're going to be doing complex searches, then it's possible that processing server-side could be faster. The other question is how much data is saved for each product - preloading a few hundred products with a lot of data per product may take some time.
As always, the only way you'll truly be able to answer this question is by profiling the two solutions.
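For scale, a sketch of what the client-side route looks like with ~300 preloaded products (the criteria names are hypothetical):

    var products = [];  // populated once, e.g. from an initial AJAX call

    function filterProducts(criteria) {
      return products.filter(function (p) {
        return (!criteria.colour   || p.colour === criteria.colour) &&
               (!criteria.size     || p.size === criteria.size) &&
               (!criteria.maxPrice || p.price <= criteria.maxPrice);
      });
    }

    // Example: everything blue under $10 - at 300 items this is effectively
    // instantaneous, so hiding now-irrelevant options on each change is cheap.
    var visible = filterProducts({ colour: 'blue', maxPrice: 10 });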
