Processing a large (12K+ rows) array in JavaScript

The project requirements are odd for this one, but I'm looking to get some insight...
I have a CSV file with about 12,000 rows of data and approximately 12-15 columns. I'm converting that to a JSON array and loading it via JSONP (it has to run client-side). It takes many seconds to do any kind of querying on the data set to return a smaller, filtered data set. I'm currently using JLINQ to do the filtering, but I'm essentially just looping through the array and returning a smaller set based on conditions.
Would WebDB (Web SQL) or IndexedDB allow me to do this filtering significantly faster? Any tutorials/articles out there that you know of that tackle this particular type of issue?

http://square.github.com/crossfilter/ (no longer maintained; see https://github.com/crossfilter/crossfilter for a newer fork).
Crossfilter is a JavaScript library for exploring large multivariate
datasets in the browser. Crossfilter supports extremely fast (<30ms)
interaction with coordinated views, even with datasets containing a
million or more records...
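A minimal sketch of how Crossfilter might replace the JLINQ loop, assuming the 12K rows are already parsed into an array; the price and region fields are invented for illustration:

var rows = [                 // stands in for the ~12,000 parsed CSV rows
  { price: 25, region: 'EU' },
  { price: 250, region: 'US' }
];

var cf = crossfilter(rows);

// One dimension per column you want to filter on.
var byPrice  = cf.dimension(function (d) { return d.price; });
var byRegion = cf.dimension(function (d) { return d.region; });

byPrice.filter([10, 100]);   // keep rows with price in [10, 100)
byRegion.filter('EU');       // keep rows whose region is exactly 'EU'

// All records satisfying every active filter, sorted by price descending.
var filtered = byPrice.top(Infinity);

Because Crossfilter keeps a sorted index per dimension and applies filter changes incrementally, repeated filtering is far cheaper than re-looping over the whole array, which is where the quoted sub-30ms interaction comes from.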

This reminds me of an article John Resig wrote about dictionary lookups (a real dictionary, not a programming construct).
http://ejohn.org/blog/dictionary-lookups-in-javascript/
He starts with server-side implementations, then works toward a client-side solution. It should give you some ideas for improving what you are doing right now (a rough caching sketch follows this list):
Caching
Local Storage
Memory Considerations
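To make the caching/local-storage idea concrete, here is a hedged sketch; the storage key and the loadViaJsonp() helper are hypothetical stand-ins for however you currently fetch the data:

// Cache the parsed dataset in localStorage so repeat visits skip the JSONP round-trip.
function getData(callback) {
  var cached = localStorage.getItem('dataset-v1');   // 'dataset-v1' is an arbitrary key
  if (cached) {
    callback(JSON.parse(cached));
    return;
  }
  loadViaJsonp(function (rows) {                      // hypothetical fetch helper
    try {
      localStorage.setItem('dataset-v1', JSON.stringify(rows));
    } catch (e) {
      // Quota exceeded: fall through and just use the in-memory copy.
    }
    callback(rows);
  });
}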

If you require loading an entire data object into memory before you apply some transform to it, I would leave IndexedDB and WebSQL out of the mix, as they typically both add complexity and reduce the performance of apps.
For this type of filtering, a library like Crossfilter will go a long way.
Where IndexedDB and WebSQL can come into play in terms of filtering is when you don't need to load, or don't want to load, an entire dataset into memory. These databases are best utilized for their ability to index rows (WebSQL) and attributes (IndexedDB).
With in-browser databases, you can stream data into a database one record at a time and then cursor through it, one record at a time. The benefit for filtering is that this means you can leave your data on "disk" (a .leveldb in Chrome and a .sqlite database in Firefox) and filter out unnecessary records either as a pre-filtering step or as the filter itself.
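A minimal sketch of that cursoring pattern, assuming the rows were previously written into an IndexedDB object store; the database/store names and the region test are made up:

var open = indexedDB.open('csv-cache', 1);             // hypothetical database name
open.onupgradeneeded = function () {
  open.result.createObjectStore('rows', { autoIncrement: true });
};
open.onsuccess = function () {
  var matches = [];
  var request = open.result.transaction('rows').objectStore('rows').openCursor();
  request.onsuccess = function (e) {
    var cursor = e.target.result;
    if (!cursor) {                                     // finished walking the store
      console.log(matches.length + ' rows matched');
      return;
    }
    if (cursor.value.region === 'EU') {                // your filter condition here
      matches.push(cursor.value);
    }
    cursor.continue();                                 // pull the next record off "disk"
  };
};

Only the matching rows ever accumulate in memory; everything else stays in the store.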

Related

Storing in MySQL vs JavaScript Object

I have a set of data associating zipcodes to GPS coordinates (namely latitude and longitude). The very nature of the data makes it immutable, so it has no need to be updated.
What are the pros and cons of storing them in a SQL database vs. directly as a JavaScript hashmap? The table resides on the server (it's Node.js), so this is not a server-vs-browser question.
When retrieving data, one is sync and the other async, but there are fewer than 10k elements, so I'm not sure whether storing these in MySQL and querying them justifies the overhead.
As there is no complex querying need, are there some points to consider that would justify having the dataset in a database?
* querying speed and CPU used for retrieving a pair,
* RAM used for a big dataset that would need to fit into working memory.
I guess that for a much bigger dataset (like 100k, 1M or more), it would be too costly in memory and a better fit for the database.
Also, JavaScript objects use hash tables internally, so we can infer they perform well even with non-trivial datasets.
Still, would a database be more efficient at retrieving a value from an indexed key than a simple hashmap?
Anything else I'm not thinking about?
You're basically asking a scalability question: "At what point do I swap from storing things in a program to storing things in a database?"
Concurrency, persistence, maintainability, security, etc.... are all factors.
If the data is open knowledge, only used by one instance of one program, and will never change, then just hard code it or store it in a flat file.
When you have many applications with different permissions calling a set of data and making changes, a database really shines.
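For the hard-coded/flat-file route, a minimal Node.js sketch might look like the following; the file name zipcodes.json and the record shape are assumptions:

// zipcodes.json is assumed to look like: { "90210": { "lat": 34.09, "lng": -118.41 }, ... }
const fs = require('fs');

// Loaded once at startup; lookups afterwards are synchronous, average O(1).
const table = JSON.parse(fs.readFileSync('./zipcodes.json', 'utf8'));

function coordsFor(zip) {
  return table[zip];   // undefined if the zip code is unknown
}

console.log(coordsFor('90210'));

For ~10k entries of this shape, the in-memory table typically costs a few megabytes of heap at most, which is usually negligible for a server process.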
Most basically, an SQL database will [probably ...] be "server side," while your JavaScript hash-table will be "client side." Does the data need to be persisted from one request to the next, and between separate invocations of the JavaScript program? If so, it must be stored ... somewhere.
The decision of whether to use "a hash table" is also up to you: hash tables are great when you are looking for explicit keys. But they're not the only data-structure available to you in JavaScript.
I'd say: carefully work out all the particulars of your exact situation, and use these to inform your decision. "An online web forum like this one" really can't step into your shoes on this. "You're the engineer ..."

How does redux and immutable data structures handle large data sets?

I am building an app where I constantly have to do queries that return unique data sets that can range from 5,000 - 50,000 csv elements. It seems rather inefficient to keep all these queries in memory. The data sets are used for data visualization. Does anyone have suggestions in how to approach this? Or should I just ditch redux / immutable.js?
I have tested Immutable with a large JSON file (22 MB) and you're correct, it was very inefficient. The particular pain point was Internet Explorer; reducing and filtering that set of data took almost 30s. I don't consider this a fault of Immutable.js, however; its goal isn't to process 250,000 items of JSON in one nuclear data explosion (which is what I was testing in this case, to find the limits of the browser's computation, with regard to filtering in particular). But it's important to note it's not a library designed primarily for speed.
Native objects, however, brought that time down to at most 6s on IE12 (ironic that 11 was faster, but there you go). The average was between 1-2s.
So in short, I wouldn't use immutable for large data sets, I would use native objects and then perhaps use Immutable for a subset of that large dataset if you have a particular design reason for this.
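A sketch of that split, assuming a Redux-ish setup; the rawRows sample and the year field are invented for illustration:

import { fromJS } from 'immutable';

const rawRows = [                       // stands in for the 5,000-50,000 query results
  { id: 1, year: 2016, value: 10 },
  { id: 2, year: 2015, value: 20 }
];

// Filter with plain arrays first -- this is the hot path.
const visible = rawRows.filter(row => row.year === 2016);

// Only the much smaller subset that actually feeds the visualization
// gets wrapped, so you keep Immutable's guarantees where they matter.
const visibleImmutable = fromJS(visible);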

Best practise of using localstorage to store a large amount of objects

Currently I'm experimenting with localStorage to store a large amount of objects of same type, and I am getting a bit confused.
One way of thinking is to store all the objects in an array. But then for each read/write of a single object I need to deserialise/serialise the whole array.
The other way is to directly store each object under its own key in localStorage. This will make accessing each object much easier, but I'm worried about the number of objects that will be stored (tens of thousands). Also, getting all the objects will require iterating over the whole localStorage!
I'm wondering which way will be better in your experience? Also, would it be worthwhile to try on more sophisticated client side database like PouchDB?
If you want something simple for storing a large amount of key/values, and you don't want to have to worry about the types, then I recommend LocalForage. You can store strings, numbers, arrays, objects, Blobs, whatever you want. It uses IndexedDB and WebSQL where available, so the storage limits are much higher than LocalStorage.
PouchDB works too, but the API is more complex, and it's better-suited for when you want to sync data with CouchDB on the server.
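A minimal localForage sketch (the 'rows' key and the sample data are placeholders); it transparently uses IndexedDB or WebSQL where available and falls back to localStorage otherwise:

var bigArrayOfObjects = [{ id: 1 }, { id: 2 }];     // stands in for your tens of thousands of objects

localforage.setItem('rows', bigArrayOfObjects)      // values keep their types; no manual JSON.stringify
  .then(function () {
    return localforage.getItem('rows');
  })
  .then(function (rows) {
    console.log('got back', rows.length, 'rows');
  })
  .catch(function (err) {
    console.error(err);
  });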
If you do not want to have a lot of keys, you can:
* concatenate row JSONs with \n and store them as a single key;
* build and update one or more indexes, stored under separate keys, each linking some key with a particular row number.
In this case parsing rows is just .split('\n'), which is roughly two orders of magnitude faster than JSON.parse (a sketch follows below).
Please note that you may need special effort to synchronize simultaneously open tabs. That can be a challenge in complex cases.
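A hedged sketch of the single-key approach described above; it assumes no serialized row contains a literal newline, and the 'rows' key is arbitrary:

function saveRows(rows) {
  var lines = rows.map(function (row) { return JSON.stringify(row); });
  localStorage.setItem('rows', lines.join('\n'));    // one key, one big string
}

function loadRowLines() {
  var blob = localStorage.getItem('rows');
  return blob ? blob.split('\n') : [];               // cheap; JSON.parse each line lazily, on demand
}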
localStorage has both good and bad parts.
Good parts:
* synchronous;
* extremely fast; both read and write are essentially a memcpy, giving 100+ MB/s throughput even on weak devices (for comparison, JSON.stringify is in general 5-20 times slower than localStorage.setItem);
* thoroughly tested and reliable.
Bad parts:
* no transactions, so you need an engineering effort to keep tabs in sync;
* assume you have no more than 2 MB, because systems with that limit exist;
* 2 MB of storage actually means about 1M characters you can save, since strings are stored as UTF-16 (two bytes per character).
These points mark the limits of localStorage's applicability as a database. It is good for tasks where you need synchronous access and speed, and where you can trim your data to fit into the quota.
So localStorage is good for caches and logs, not much more.
I haven't personally used localStorage to manage so many elements.
However, the pattern I usually use to manage data is to load the complete dataset into a JavaScript object, work with it in memory while the process runs, and save it back to localStorage when the process is finished (a sketch follows below).
Of course, this pattern may not be a good fit for your needs, depending on your project's specifications.
If you need to save data constantly, data access could become a problem, and some kind of small client-side database is probably the better option.
If your data volume is exceptionally high, keeping it all in memory could also become a problem; however, depending on the data model, you may be able to organize it into structures that let you load and save data only when it's needed.
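A rough sketch of that load-once / work-in-memory / save-at-the-end pattern; the 'appData' key and record shape are placeholders:

var db = JSON.parse(localStorage.getItem('appData') || '{}');     // load once at startup

function update(id, changes) {
  db[id] = Object.assign({}, db[id], changes);   // touch only the in-memory copy
}

function persist() {                             // call when the work is finished
  localStorage.setItem('appData', JSON.stringify(db));
}

window.addEventListener('beforeunload', persist); // or persist on an explicit "save" action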

In node.js, should a common set of data be in the database or a class?

In my database I have a table with data of cities. It includes the city name, translation of the name (it's a bi-lingual website), and latitude/longitude. This data will not change and every user will need to load this data. There are about 300 rows.
I'm trying to keep the pressure put on the server as low as possible (at least to a reasonable extent), but I'd also prefer to keep these in the database. Would it be best to have this data inside a class that is loaded in my main app.js file? It should be kept in memory and global to all users, correct? Or would it be better on the server to keep it in the database and select the data whenever a user needs it? The sign in screen has the listing of cities, so it would be loaded often.
I've just seen that unlike PHP, many of the Node.js servers don't have tons of memory, even the ones that aren't exactly cheap, so I'm worried about putting unnecessary things into memory.
I decided to give this a try. Using an example dataset of 300 rows (each containing 24 string characters, two doubles, and the property names), a small Node.js script indicated an additional memory usage of 80 to 100 KB.
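A rough version of such a measurement might look like this; the record shape mirrors the description above, and the exact numbers will vary by Node/V8 version:

// Run with: node --expose-gc measure.js  (so global.gc is available)
function heapKB() {
  if (global.gc) global.gc();                     // stabilize the numbers a little
  return Math.round(process.memoryUsage().heapUsed / 1024);
}

const before = heapKB();

const cities = [];
for (let i = 0; i < 300; i++) {
  cities.push({
    name: 'x'.repeat(12),                         // ~24 string characters across both names
    nameTranslated: 'y'.repeat(12),
    lat: Math.random() * 180 - 90,
    lng: Math.random() * 360 - 180
  });
}

console.log('approx. additional heap for', cities.length, 'records:', heapKB() - before, 'KB');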
You should ask yourself:
How often will the data be used? How much of the data does a user need?
If the whole dataset will be used on a regular basis (let's say multiple times a second), you should consider keeping the data in memory. If, however, your users only need part of the data, and only from time to time, you might consider loading the data from a database.
Can I guarantee efficient loading from the database?
An important fact is that loading parts of the data from a database might even require more memory, because the V8 garbage collector might delay the collection of the loaded data, so the same data (or multiple parts of the data) might be in memory at the same time. There is also a guaranteed overhead due to database / connection data and so on.
Is my approach sustainable?
Finally, consider the possibility of data growth. If you expect the dataset to grow by a non-trivial amount, think about the above points again and decide whether the growth is likely enough to justify using a database.

Handling large grid datasets in JavaScript

What are some of the better solutions to handling large datasets (100K rows) on the client with JavaScript? In particular, if you have multi-column sort and search capabilities, how do you handle fetching (and pre-fetching) the data, client-side model binding (for display), and caching the data?
I would imagine a good solution would do some thoughtful work in the background. For instance, if the table initially displays N items, it might fetch 2N items, return the data to the user, and then go fetch the next 2N items in the background (even if the user hasn't requested them). As the user makes search/sort changes, it would throw out the cached pages (or maybe keep the initial base case cached) and repeat the same approach.
Can you share the best solutions you have seen?
Thanks
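A hedged sketch of the "show N, prefetch the next page" idea from the question; the /rows endpoint, its query parameters, and the render() stub are all invented:

var PAGE_SIZE = 100;
var cache = {};                            // pageIndex -> array of rows

function fetchPage(page) {
  if (cache[page]) return Promise.resolve(cache[page]);
  return fetch('/rows?offset=' + page * PAGE_SIZE + '&limit=' + PAGE_SIZE)
    .then(function (res) { return res.json(); })
    .then(function (rows) { cache[page] = rows; return rows; });
}

function render(rows) { /* update the table/grid with these rows */ }

function showPage(page) {
  fetchPage(page).then(render);            // display what the user asked for
  fetchPage(page + 1);                     // warm the cache in the background
}

// On a new search/sort, the cache would simply be reset: cache = {};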
Use a jQuery table plugin like DataTables: http://datatables.net/
It supports server-side processing for sorting, filtering, and paging. And it includes pipelining support to prefetch the next x pages of records: http://www.datatables.net/examples/server_side/pipeline.html
Actually the DataTables plugin works 4 different ways:
1. With an HTML table, so you could send down a bunch of HTML and then have all the sorting, filtering, and paging work client-side.
2. With a JavaScript array, so you could send down a 2D array and let it create the table from there.
3. Ajax source - which is not really applicable to you.
4. Server-side, where you send data in JSON format to an empty table and let DataTables take it from there.
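For the server-side mode (option 4), a minimal DataTables initialisation might look like this; the option names follow the DataTables documentation, and the /api/rows endpoint is a placeholder:

$('#grid').DataTable({
  serverSide: true,        // delegate sorting, filtering, and paging to the server
  processing: true,        // show the built-in "Processing..." indicator
  ajax: '/api/rows',       // endpoint implementing the DataTables request/response protocol
  pageLength: 25
});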
SlickGrid does exactly what you're looking for. (Demo)
Using the AJAX data store, SlickGrid can handle millions of rows without flinching.
Since you tagged this with Ext JS, I'll point you to Ext.ux.LiveGrid if you haven't already seen it. The source is available, so you might have a look and see how they've addressed this issue. This is a popular and widely-used extension in the Ext JS world.
With that said, I personally think (virtually) loading that much data is useless as a user experience. Manually pulling a scrollbar around (jumping hundreds of records per pixel) is a far inferior experience to simply typing what you want. I'd much prefer some robust filtering/searching instead of presenting that much data to the user.
What if you went to Google and instead of a search box, it just loaded the entire internet into one long virtual list that you had to scroll through to find your site... :)
It depends on how the data will be used.
For a large dataset, where the browser's Find function was adequate, just returning a straight HTML table was effective. It takes a while to load, but the display is responsive on older, slower clients, and you never have to worry about it breaking.
When the client did the sorting and searching, and you're not showing the entire table at once, I had the server send tab-delimited tables through XMLHttpRequest, parsed them in the browser with rows = responseText.split('\n'), and updated the display with repeated calls to $('node').innerHTML = 'blah'. The JS engine can store long strings pretty efficiently. That ran a lot faster on the client than showing, hiding, and rearranging DOM nodes; creating and destroying DOM nodes on the fly turned out to be really slow. Splitting each line into fields on demand also seems to work; I haven't experimented much with that degree of freedom.
I've never tried the obvious pre-fetch & background trick, because these other methods worked well enough.
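A rough sketch of that tab-delimited approach, assuming a /data.tsv endpoint and a <tbody id="grid-body"> element (both invented), and that the data contains no markup that needs escaping:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/data.tsv');
xhr.onload = function () {
  var rows = xhr.responseText.split('\n');               // much cheaper than JSON.parse
  var html = '';
  for (var i = 0; i < rows.length && i < 200; i++) {     // render only the visible slice
    html += '<tr><td>' + rows[i].split('\t').join('</td><td>') + '</td></tr>';
  }
  document.getElementById('grid-body').innerHTML = html; // one bulk write, no per-node DOM churn
};
xhr.send();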
Check out this comprehensive list of data grids and spreadsheets.
For filtering/sorting/pagination purposes you may be interested in the excellent Handsontable, or DataTables as a free alternative.
If you simply need to display a huge list without any additional features, Clusterize.js should be sufficient.
