I am building an app where I constantly have to run queries that return unique data sets ranging from 5,000 to 50,000 CSV elements. It seems rather inefficient to keep all these queries in memory. The data sets are used for data visualization. Does anyone have suggestions on how to approach this? Or should I just ditch redux / immutable.js?
I have tested Immutable with a large JSON file (22 MB) and you're correct, it was very inefficient. The particular pain point was Internet Explorer; reducing and filtering that set of data took almost 30s. I don't consider this a fault of Immutable.js, however; its goal isn't to process 250,000 items of JSON in one go (which is what I was testing in this case, to find the browser's computational limits with regard to filtering in particular). But it's important to note it's not a library that's designed primarily for speed.
Native objects, however, brought that time down to at most 6s on IE12 (ironic that 11 was faster, but there you go). The average was between 1-2s.
So in short, I wouldn't use Immutable for large data sets. I would use native objects, and perhaps use Immutable for a subset of that large dataset if you have a particular design reason for it.
Related
So I have an app that needs to JSON.stringify its data to put into localStorage, but as the data gets larger, this operation gets outrageously expensive.
So I tried moving this onto a web worker so it's off the main thread, but I'm now learning that posting an object to a web worker is even more expensive than stringifying it.
So I guess I'm asking, is there any way whatsoever to get JSON.stringify off the main thread, or at least make it less expensive?
I'm familiar with fast-json-stringify, but I don't think I can feasibly provide a complete schema every time...
You have correctly observed that passing an object to a web worker costs as much as serializing it. This is because web workers also need to receive serialized data rather than native JS objects, since object instances are bound to the JS thread they were created in.
The generic solution applies to many programming problems: choose the right data structures when working with large datasets. When data gets larger, it's better to sacrifice simplicity of access for performance. Thus, do either of the following:
Store data in indexedDB
If your large object contains lists of the same kind of entry, use IndexedDB for reading and writing, and you don't need to worry about serialization at all. This will require a refactor of your code, but it is the correct solution for large datasets.
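As a rough sketch of what that refactor can look like (the `app-db` database name, the `entries` store, and the `id` key path are assumptions, not anything your code has to use):

```js
// Rough sketch: open a database with an "entries" store and write rows in one
// transaction. The database/store names and the "id" key path are assumptions.
function openDb() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('app-db', 1);
    req.onupgradeneeded = () => {
      req.result.createObjectStore('entries', { keyPath: 'id' });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function saveEntries(rows) {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const tx = db.transaction('entries', 'readwrite');
    const store = tx.objectStore('entries');
    rows.forEach(row => store.put(row));   // one write transaction for the whole batch
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```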
Store data in ArrayBuffer
If your data is mostly fixed-size values, use an ArrayBuffer. An ArrayBuffer can be copied or moved to a web worker practically instantly, and if your entries are all the same size, serialization can be done in parallel. For access, you can write simple wrapper classes that translate your binary data into something more readable.
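A minimal sketch of that idea, assuming a hypothetical `worker.js` and a made-up three-number record layout:

```js
// main.js – pack fixed-size records into a typed array and move (not copy) the
// underlying buffer to a worker. "worker.js" and the 3-value layout are assumptions.
const RECORD_SIZE = 3;                    // e.g. [id, x, y] per entry
const entryCount = 50000;                 // hypothetical dataset size
const data = new Float64Array(entryCount * RECORD_SIZE);
// ... fill data ...

const worker = new Worker('worker.js');
// Listing the buffer as a transferable detaches it here and hands ownership to
// the worker, so no serialization or copy happens.
worker.postMessage(data.buffer, [data.buffer]);

// worker.js
self.onmessage = (e) => {
  const view = new Float64Array(e.data);  // wrap the transferred buffer
  // process view off the main thread...
};
```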
I'm a game developer and I have been trying different structures to see what would give me the best results, but it seems that JavaScript is mostly unaffected by data locality, meaning that processing times are within the margin of error and memory usage is mostly as expected.
Does data locality matter at all in JavaScript or am I just wasting my time trying to improve certain structures?
Is it due to the sandboxed nature of the execution environment (i.e. it would matter outside the browser)?
This article is an interesting read. It says, in summary, that low-level locality optimizations are a lost cause, but that big data structures, and big arrays of data structures even more so, should be accessed in linear order.
Coming from a C background, I tend to access a grid held in a 1D array in a Y-outer/X-inner pattern anyway, but when I have done otherwise, I can tell you from experience that it starts to lag on large grids.
So, to attempt to answer your question, and in part theorize: to the degree that this holds true, the classic structure-of-arrays rather than array-of-structures mentality may well be more performant for sufficiently large arrays of large structures with limited variable access in a given case. But I would definitely test both macro-structures if I were implementing a critical feature :)
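To make the array-of-structures vs structure-of-arrays distinction concrete, here is a small sketch with hypothetical particle fields; if you benchmark, compare both shapes on your actual access pattern:

```js
// Array of structures: one object per particle, references scattered on the heap.
const COUNT = 100000;
const aos = Array.from({ length: COUNT }, () => ({ x: 0, y: 0, vx: 1, vy: 1 }));
for (const p of aos) { p.x += p.vx; p.y += p.vy; }

// Structure of arrays: one typed array per field, walked in linear order.
const x = new Float32Array(COUNT), y = new Float32Array(COUNT);
const vx = new Float32Array(COUNT).fill(1), vy = new Float32Array(COUNT).fill(1);
for (let i = 0; i < COUNT; i++) { x[i] += vx[i]; y[i] += vy[i]; }
```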
I am relatively new to ReactJS and the web scene, so I am turning to you, SO.
Our app uses Firebase to store data. Recently I created a little stress test for our app which loads about 14k entries from the database and saves all of them in an array. To be exact, we are downloading some data that we will use to represent different kinds of charts and display some statistics about them. The 14k entry count is just to test the speed of everything; we are not actually expecting that many entries per customer (at least not yet). Anyway, I decided to think about the future and how I should handle this kind of problem.
I did some rough calculations about loading times:
4-6s from Firebase into array
3-4s generating dates for the chart, checking for matching dates (fetched data against generated dates), and calculating data
Everything, from saving the results from the database onwards, is done using array.map. I have also tried using while / for loops, since I read some articles saying .map is not very good when it comes to performance. The result was the same.
So what I am asking here is: what kind of "system" should I use if I want to be prepared for a "larger" amount of data in arrays? Can .map handle that much data? What are the best practices?
Currently I'm experimenting with localStorage to store a large number of objects of the same type, and I am getting a bit confused.
One way of thinking is to store all the objects in an array. But then for each read/write of a single object I need to deserialise/serialise the whole array.
The other way is to store each object directly under its own key in localStorage. This will make accessing each object much easier, but I'm worried about the number of objects that will be stored (tens of thousands). Also, getting all the objects will require iterating over the whole localStorage!
I'm wondering which way is better in your experience? Also, would it be worthwhile to try a more sophisticated client-side database like PouchDB?
If you want something simple for storing a large amount of key/values, and you don't want to have to worry about the types, then I recommend LocalForage. You can store strings, numbers, arrays, objects, Blobs, whatever you want. It uses IndexedDB and WebSQL where available, so the storage limits are much higher than LocalStorage.
PouchDB works too, but the API is more complex, and it's better-suited for when you want to sync data with CouchDB on the server.
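For what it's worth, the basic localForage calls look roughly like this (the instance name and keys here are placeholders, and the snippet assumes localForage is already loaded):

```js
// Hypothetical usage – the instance name and keys are placeholders.
const store = localforage.createInstance({ name: 'my-app' });

async function demo() {
  // Values are stored as-is (objects, arrays, Blobs) – no JSON.stringify needed.
  await store.setItem('user:42', { id: 42, scores: [1, 2, 3] });
  const user = await store.getItem('user:42');

  // Walk every stored entry.
  await store.iterate((value, key) => {
    console.log(key, value);
  });
}
```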
If you do not want to have a lot of keys, you can:
concatenate row JSONs with \n and store them under a single key
build and update an index (or indexes) stored under separate keys, each linking some key with a particular row number.
In this case, parsing rows is just .split('\n'), which is roughly two orders of magnitude faster than JSON.parse.
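A sketch of that layout (the `rows` key name is just a placeholder):

```js
// Hypothetical "rows" key; each line is one JSON document.
const rows = [{ id: 1, v: 'a' }, { id: 2, v: 'b' }];

// Write: the whole list lives under a single key.
localStorage.setItem('rows', rows.map(r => JSON.stringify(r)).join('\n'));

// Read: one cheap split; parse individual lines only when you actually need them.
const lines = localStorage.getItem('rows').split('\n');
const second = JSON.parse(lines[1]);      // e.g. a row number found via your index key

// Append without re-serializing everything already stored.
localStorage.setItem('rows',
  localStorage.getItem('rows') + '\n' + JSON.stringify({ id: 3, v: 'c' }));
```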
Please note that you may need special effort to synchronize simultaneously opened tabs. That can be a challenge in complex cases.
localStorage has both good and bad parts.
Good parts:
synchronous;
extremely fast: both read and write are essentially a memcpy – 100+ MB/s of throughput even on weak devices (for comparison, JSON.stringify is generally 5-20 times slower than localStorage.setItem);
thoroughly tested and reliable.
Bad news:
no transactions, so you need engineering effort to keep tabs in sync;
assume you have no more than 2 MB (because there are systems with that limit);
2 MB of storage actually means about 1M characters you can save, since strings are stored as UTF-16 (two bytes per character).
These points mark the borders of localStorage's applicability as a DB. LS is good for tasks where you need synchronicity and speed, and where you can trim your DB to fit into the quota.
So localStorage is good for caches and logs. Not more.
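If you want to keep an eye on how close a cache is getting to that quota, a rough check like this works (it assumes the usual two bytes per stored character):

```js
// Rough quota check, assuming two bytes per stored UTF-16 character.
function localStorageBytes() {
  let chars = 0;
  for (let i = 0; i < localStorage.length; i++) {
    const key = localStorage.key(i);
    chars += key.length + localStorage.getItem(key).length;
  }
  return chars * 2;
}
console.log((localStorageBytes() / (1024 * 1024)).toFixed(2) + ' MB used');
```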
I haven't personally used localStorage to manage so many elements.
However, the pattern I usually use to manage data is to load the complete database into a JavaScript object, manage it in memory during the process, and save it to localStorage again when the process is finished.
Of course, this pattern may not be a good approach for your needs, depending on your project specifications.
If you need to save data constantly, data access could become a problem, so using some kind of small database is probably a better option.
If your data volume is exceptionally high, it could also be a problem to manage it in memory; however, depending on the data model, you may be able to build efficient structures that allow you to load and save data only when it's needed.
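A bare-bones sketch of that load-once / work-in-memory / save-at-the-end pattern (the storage key and record shapes are placeholders):

```js
// The storage key and record shape are placeholders.
const KEY = 'app-data';

// Load the whole database into a plain object once, at startup.
const db = JSON.parse(localStorage.getItem(KEY) || '{}');

// Work on it in memory while the process runs.
db['item-17'] = { name: 'foo', updatedAt: Date.now() };
delete db['item-3'];

// Persist once, when the process is finished (or periodically / before unload).
window.addEventListener('beforeunload', () => {
  localStorage.setItem(KEY, JSON.stringify(db));
});
```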
The project requirements are odd for this one, but I'm looking to get some insight...
I have a CSV file with about 12,000 rows of data, approximately 12-15 columns. I'm converting that to a JSON array and loading it via JSONP (it has to run client-side). It takes many seconds to do any kind of querying on the data set to return a smaller, filtered data set. I'm currently using JLINQ to do the filtering, but I'm essentially just looping through the array and returning a smaller set based on conditions.
Would WebDB or IndexedDB allow me to do this filtering significantly faster? Any tutorials/articles out there that you know of that tackle this particular type of issue?
http://square.github.com/crossfilter/ (no longer maintained, see https://github.com/crossfilter/crossfilter for a newer fork.)
Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records...
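The core API is small; a sketch of how it could apply to your data (`rows` stands in for your existing JSON array, and the column names are placeholders for whatever the CSV actually contains):

```js
// "rows" is the parsed JSON array from the question; the column names are placeholders.
const cf = crossfilter(rows);

const byAmount = cf.dimension(d => d.amount);
const byCategory = cf.dimension(d => d.category);

byAmount.filter([100, 500]);              // range filter: 100 <= amount < 500
byCategory.filter(c => c !== 'n/a');      // arbitrary predicate filter

// All rows that pass every active filter, sorted by amount (descending).
const filtered = byAmount.top(Infinity);
```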
This reminds me of an article John Resig wrote about dictionary lookups (a real dictionary, not a programming construct):
http://ejohn.org/blog/dictionary-lookups-in-javascript/
He starts with server-side implementations and then works toward a client-side solution. It should give you some ideas for ways to improve what you are doing right now:
Caching
Local Storage
Memory Considerations
If you require loading an entire data object into memory before you apply some transform to it, I would leave IndexedDB and WebSQL out of the mix, as they typically both add complexity and reduce the performance of apps.
For this type of filtering, a library like Crossfilter will go a long way.
Where IndexedDB and WebSQL can come into play in terms of filtering is when you don't need to load, or don't want to load, an entire dataset into memory. These databases are best utilized for their ability to index rows (WebSQL) and attributes (IndexedDB).
With in-browser databases, you can stream data into a database one record at a time and then cursor through it, one record at a time. The benefit for filtering is that this means you can leave your data on "disk" (a .leveldb in Chrome and a .sqlite database in Firefox) and filter out unnecessary records either as a pre-filtering step or as the filter itself.
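For example, a cursor-based filter over a hypothetical `rows` object store might look like this sketch (the store name and record shape are assumptions):

```js
// Hypothetical "rows" store; "db" is an already-open IDBDatabase handle.
function filterFromDb(db, predicate) {
  return new Promise((resolve, reject) => {
    const matches = [];
    const tx = db.transaction('rows', 'readonly');
    tx.objectStore('rows').openCursor().onsuccess = (e) => {
      const cursor = e.target.result;
      if (!cursor) return resolve(matches); // finished iterating
      if (predicate(cursor.value)) matches.push(cursor.value);
      cursor.continue();                    // pull the next record off "disk"
    };
    tx.onerror = () => reject(tx.error);
  });
}

// Usage sketch: keep only matching rows without ever holding the full dataset in memory.
// filterFromDb(db, row => row.amount > 100).then(rows => render(rows));
```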