dc.js and crossfilter with large datasets - javascript

I have been working with dc.js and crossfilter.js, and I currently have a large dataset: a 60 MB CSV with 550,000 rows. I am running into a lot of issues with it, such as browser crashes.
So I'm trying to understand how dc.js and crossfilter deal with large datasets.
http://dc-js.github.io/dc.js/
The example on their main site runs very smoothly; watching Timeline -> Memory in the dev tools, it peaks at about 34 MB and slowly decreases over time.
My project takes up memory in the range of 300-500 MB per dropdown selection, when it loads a JSON file and renders the entire visualization.
So, two questions:
What is the backend for the dc site example? Is it possible to find out the exact backend file?
How can I reduce the data overload on my RAM from my application, which is running very slowly and eventually crashing?

Hi, you can try loading the data and filtering it on the server. I faced a similar problem when my dataset was too big for the browser to handle.
I posted a question a few weeks back about implementing the same thing: Using dc.js on the clientside with crossfilter on the server
Here is an overview of how to go about it.
On the client side, you'd create fake dimensions and fake groups that have the basic functionality dc.js expects (https://github.com/dc-js/dc.js/wiki/FAQ#filter-the-data-before-its-charted). You create your dc.js charts on the client side and plug in the fake dimensions and groups wherever required.
On the server side you have crossfilter running (https://www.npmjs.org/package/crossfilter). You create your actual dimensions and groups there.
The fake dimensions have a .filter() function that sends an ajax request to the server to perform the actual filtering; the filter criteria can be encoded as a query string. You'd also need an .all() function on your fake group to return the results of the filtering.
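Something along these lines worked for me. Below is only a rough sketch, not a ready-made solution: the /filter endpoint, its query string, and its [{key, value}] response shape are placeholders you'd replace with your own server API.

```js
// Rough sketch of fake dimensions/groups that delegate filtering to the server.
// Assumptions: a /filter endpoint that runs crossfilter server-side and
// responds with the regrouped data as [{key: ..., value: ...}, ...].
var cachedGroups = {};

function fakeDimension(field) {
  return {
    filter: function (value) {
      // Ask the server to filter instead of filtering locally.
      var qs = 'field=' + encodeURIComponent(field) +
               '&value=' + encodeURIComponent(value == null ? '' : value);
      fetch('/filter?' + qs)
        .then(function (res) { return res.json(); })
        .then(function (groups) {
          cachedGroups[field] = groups;   // stash results for the fake group
          dc.redrawAll();                 // redraw charts with the new groups
        });
    },
    filterAll: function () { this.filter(null); }
  };
}

function fakeGroup(field) {
  return {
    all: function () { return cachedGroups[field] || []; }
  };
}

// Plug the fakes into a chart exactly like real crossfilter objects.
var monthChart = dc.pieChart('#month-chart')
  .dimension(fakeDimension('month'))
  .group(fakeGroup('month'));

// Prime the chart with the unfiltered groups, then render.
fakeDimension('month').filter(null);
dc.renderAll();
```

The point is that the browser only ever holds the (small) grouped results, while the 550,000 raw rows stay on the server.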

Related

vega visualization tool data limitations

How much data (how many records) can Vega handle without any noticeable delay in response to signals created for interactivity?
I couldn't use a URL because of cross-origin problems, so I created JSON for 60,000 records and pasted it into my Vega specification.
I have 4 signals in total: 3 for single-value filtering on mouse click and 1 for range filtering using click-and-drag select. The dashboard responds to each signal trigger only after nearly 30 seconds.
So I wanted to know the maximum amount of data that can be used in Vega, and also about any alternatives, like interfacing Vega with something else to speed up the process. Any help will be appreciated.

Datatables - Local Server Side Processing

One of the queries I currently run to populate a DataTable is pretty resource intensive, so I am trying to reduce the load on my database by reducing the number of ajax calls for pagination, sorting, searching, etc.
I currently do a single ajax call to get a JSON array of the entire dataset (received by the browser in a couple of seconds), plug that into DataTables, and have it render the whole set.
A problem occurs when there are tens of thousands of records: the browser hangs for close to 20 seconds before rendering the data.
Is there some sort of middle ground, where DataTables doesn't render the whole set from the JSON array, and instead uses the JSON array as a sort of local server-side source for the data? In practice it would retrieve the first 10 rows from the JSON array, render them, and when the next page is clicked or a search is initiated, go back to the JSON array for the data instead of the server.
This sounds like a pretty simple solution, but I have not managed to find a function for it in the documentation. Is there a way to accomplish this with DataTables, natively or not?
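For what it's worth, a setup like the following rough sketch (the endpoint and column names are made up) is one way this is often handled: the data is fetched once, DataTables pages/sorts/searches the in-memory array, and deferRender keeps it from building DOM rows for anything but the visible page.

```js
// Rough sketch: one ajax call, then purely local paging/sorting/searching.
// The /api/records endpoint and the column names are assumptions.
$.getJSON('/api/records', function (rows) {
  $('#example').DataTable({
    data: rows,            // entire dataset kept in memory, no further ajax
    deferRender: true,     // only build <tr> elements for the rows on screen
    pageLength: 10,
    columns: [
      { data: 'name',   title: 'Name' },
      { data: 'date',   title: 'Date' },
      { data: 'amount', title: 'Amount' }
    ]
  });
});
```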

HTML/Javascript + CSVs - Using filters vs. Loading many CSVs

I'm thinking about the best way to feed a chart with filtered data. I have year and month filters, among others, and I want the chart to be quick to show up and to switch data on filter change.
Is it better to prepare already-filtered data as individual CSVs and load them from the server as needed, or should I use JavaScript to filter the data on the client side from one big CSV?
Individual files:
- Load quickly
- No client-side computations
Big CSV:
- Avoids repeated network requests (loaded once)
If I choose individual files I would have A LOT of them, as the filters create many combinations. I don't know if there is any drawback in that case. I think individual files are the option with the highest performance.
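For comparison, the "one big CSV" option would look roughly like the sketch below. This assumes d3 v5+ (where d3.csv returns a promise), columns named year and month, and a drawChart function standing in for whatever chart code is already in place.

```js
// Rough sketch: load the big CSV once, then filter in memory on each change.
d3.csv('data/all.csv').then(function (rows) {
  var all = rows;                         // loaded once, kept in memory

  function update(year, month) {
    var filtered = all.filter(function (d) {
      return (!year  || d.year  === year) &&
             (!month || d.month === month);
    });
    drawChart(filtered);                  // placeholder for the existing chart code
  }

  d3.select('#year').on('change', function () {
    update(this.value, d3.select('#month').property('value'));
  });
  d3.select('#month').on('change', function () {
    update(d3.select('#year').property('value'), this.value);
  });
});
```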

Sorting, lazy loading considerations

I am creating a grid that provides sorting functionality. It uses D3.js. Currently the dataset is around 2,000 records, but in the future this could be a large number, i.e. 10,000 to 1 million.
Should client-side JS be used for sorting, or should it be done on the server, assuming we have a large record set?
Also, at what point should I start considering lazy loading of data for the table?
Thanks

Processing a large (12K+ rows) array in JavaScript

The project requirements are odd for this one, but I'm looking to get some insight...
I have a CSV file with about 12,000 rows of data and approximately 12-15 columns. I'm converting that to a JSON array and loading it via JSONP (it has to run client-side). It takes many seconds to do any kind of querying on the dataset to return a smaller, filtered dataset. I'm currently using JLINQ to do the filtering, but I'm essentially just looping through the array and returning a smaller set based on conditions.
Would WebDB or IndexedDB allow me to do this filtering significantly faster? Are there any tutorials/articles out there that you know of that tackle this particular type of issue?
http://square.github.com/crossfilter/ (no longer maintained, see https://github.com/crossfilter/crossfilter for a newer fork.)
Crossfilter is a JavaScript library for exploring large multivariate
datasets in the browser. Crossfilter supports extremely fast (<30ms)
interaction with coordinated views, even with datasets containing a
million or more records...
This reminds me of an article John Resig wrote about dictionary lookups (a real dictionary, not a programming construct).
http://ejohn.org/blog/dictionary-lookups-in-javascript/
He starts with server-side implementations, and then works up to a client-side solution. It should give you some ideas for ways to improve what you are doing right now:
Caching
Local Storage
Memory Considerations
If you need to load an entire data object into memory before applying a transform to it, I would leave IndexedDB and WebSQL out of the mix, as they typically both add complexity and reduce the performance of apps.
For this type of filtering, a library like Crossfilter will go a long way.
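As a rough sketch of what that looks like (the field names here are made up), a couple of dimensions and filters replace the hand-rolled loop:

```js
// Rough sketch: filtering with Crossfilter instead of looping over the array.
var cf    = crossfilter(records);                  // records: the ~12K-row array
var byCat = cf.dimension(function (d) { return d.category; });
var byAmt = cf.dimension(function (d) { return +d.amount; });

byCat.filter('books');                             // exact-match filter
byAmt.filter([10, 100]);                           // range filter: 10 <= amount < 100

var filtered = byAmt.top(Infinity);                // rows passing all active filters
```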
Where IndexedDB and WebSQL can come into play in terms of filtering is when you don't need to load, or don't want to load, an entire dataset into memory. These databases are best utilized for their ability to index rows (WebSQL) and attributes (IndexedDB).
With in-browser databases, you can stream data into the database one record at a time and then cursor through it, again one record at a time. The benefit for filtering is that this means you can leave your data on "disk" (a .leveldb in Chrome and a .sqlite database in Firefox) and filter out unnecessary records either as a pre-filter step or as the filter itself.
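As a rough sketch of that cursor approach (the database and store names are made up, and it assumes the records were already streamed into the store), filtering without holding the full dataset in memory looks something like this:

```js
// Rough sketch: cursor through an IndexedDB store and keep only matching rows.
var open = indexedDB.open('records-db', 1);

open.onupgradeneeded = function () {
  open.result.createObjectStore('records', { autoIncrement: true });
};

open.onsuccess = function () {
  var db = open.result;
  var matches = [];
  var cursorReq = db.transaction('records', 'readonly')
                    .objectStore('records')
                    .openCursor();

  cursorReq.onsuccess = function (e) {
    var cursor = e.target.result;
    if (cursor) {
      if (+cursor.value.amount > 100) {   // example predicate; adjust as needed
        matches.push(cursor.value);
      }
      cursor.continue();                  // advance to the next record
    } else {
      console.log('filtered rows:', matches.length);
    }
  };
};
```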
