I am relatively new to ReactJS and web scene so I am turning to you, SO.
Our app is working with Firebase to store data. Recently I created a little stress test for our app which is loading about 14k entries from database and saving all of it in array. To be exact, we are downloading some data that we will use to represent different kind of charts and display some statistics about them. 14k, number of entries, is here just to test the speed of everything, we are actually not expecting that many entries per customer (at least not yet). Anyway I decided to think about future and how I should handle this kind of problems.
I did some rough calculations about loading times:
4-6s from Firebase into array
3-4s generating dates for chart and checking for matching dates (fetched data with generated dates) and calculating data
Everything, from saving results from database on is done using array.map. I have also tried using while / for, since I read some articles that .map is not very good when it comes to performance. The result was the same.
So what am I asking here is what kind of "system" should I use if I want to be prepared for "larger" amount of data in arrays? Can .map handle that much data? What are the best practices?
Related
On my web-app, the user can request different data lines. Those data lines have an unique "statusID" each, say "18133". When the user requests to load the data, it is either loaded from the server or from indexedDB(that part im trying to figure out). In order to make it as fast as possible, I want the index to be the timestamp of the data, as I will request ranges which are smaller than the actual data in the indexedDB. However, I am trying to figure out how to create the stores and store the data properly. I tried to dynamically create stores everytime data with a new Id is requested, but creating stores is only possible in "onupradeneeded". I also thought about storing everything in the same store, but I fear that the performance will make that bad. I do not really now how to approach this thing.
What I do know: If you index a value, it means that the data is sorted, which is exactly what I want. I dont know if the following is possible but this would solve my issue too: store everthing in the same store, index by "statusID" and index by "timestamp". This way, it would be fast too i guess.
Note that I am talking about many many datapoints, possible in the millions.
You can index by multiple values, allowing you to get all by statusID and restricting to a range for your timestamp. So I'd go with the one datastore solution. Performance should not be an issue.
This earlier post may be helpful: Javascript: Searching indexeddb using multiple indexes
I am building an app where I constantly have to do queries that return unique data sets that can range from 5,000 - 50,000 csv elements. It seems rather inefficient to keep all these queries in memory. The data sets are used for data visualization. Does anyone have suggestions in how to approach this? Or should I just ditch redux / immutable.js?
I have tested immutable with a large JSON file (22mb) and you're correct it was very inefficient. The particular pain point was internet explorer; reducing and filtering that set of data took almost 30s. I don't consider this a fault of Immutable.js however, it's goal isn't to process 250,000 items of JSON in one nuclear data explosion (which is what I was testing in this case, to find the limits of the browsers computational limits with regards to filtering in particular). But it's important to note it's not a library that's designed primarily for speed.
Native objects however brought that time down to at most 6s on IE12 (ironic that 11 was faster, but their you go). But the avg was between 1-2s.
So in short, I wouldn't use immutable for large data sets, I would use native objects and then perhaps use Immutable for a subset of that large dataset if you have a particular design reason for this.
Currently I'm experimenting with localStorage to store a large amount of objects of same type, and I am getting a bit confused.
One way of thinking is to store all the object in an array. But then for each read/write of a single object I need to deserialise/serialise the whole array.
The other way is to directly store each object with its key in the localStorage. This will make accessing each object much easier but I'm worried of the amount of objects that will be stored (tens of thousands). Also, getting all the objects will require iterating the whole localStorage!
I'm wondering which way will be better in your experience? Also, would it be worthwhile to try on more sophisticated client side database like PouchDB?
If you want something simple for storing a large amount of key/values, and you don't want to have to worry about the types, then I recommend LocalForage. You can store strings, numbers, arrays, objects, Blobs, whatever you want. It uses IndexedDB and WebSQL where available, so the storage limits are much higher than LocalStorage.
PouchDB works too, but the API is more complex, and it's better-suited for when you want to sync data with CouchDB on the server.
If you do not want to have a lot of keys, you can:
concat row JSONs with \n and store them as a single key
build and update an index(es) stored under separate keys, each linking some key with a particular row number.
In this case parsing rows is just .split('\n') that is ~2 orders of magnitude faster, then JSON.parse.
Please, notice, that you possibly need special effort to syncronize simultaneously opened tabs. It can be a challenge in complex cases.
localStorage has both good and bad parts.
Good parts:
syncronous;
extremely fast, both read and write are just memcpy – it‘s 100+Mb/s throughput even on weak devices (for example JSON.stringify is in general 5-20 times slower than localStorage.setItem);
thoroughly tested and reliable.
Bad news:
no transactions, so you need an engineering effort to sync tabs;
think you have not more than 2Mb (cause there exist systems with this limit);
2Mb of storage actually mean 1M chars you can save.
These points show borders of localStorage applicability as a DB. LS is good for tasks, where you need syncronicity and speed, and where you can trim you DB to fit into quota.
So localStorage is good for caches and logs. Not more.
I hadn't personally used localStorage to manage so many elements.
However, the pattern I usually use to manage data is to load the complete info database into a javascript object, manage it on memory during the proccess and saving it again to localStorage when the proccess is finished.
Of course, this pattern may not be a good approach to your needings, depending on your project specifications.
If you need to save data constantly, data access could become a problem, and thus probably using some type of small database access is a better option.
If your data volume is exceptionally high it also could be a problem to manage it on memory, however, depending on data model, you'd be able to build it to efficient structures that would allow you to load and save data just when it's needed.
I am trying to generate the dataset for latency vs time dataset for graphing.
Dataset has thousands to couple million entries that looks like this:
[{"time_stamp":,"latency_axis":},...}
Querying and generating graphs has poor performance with full dataset for obvious reasons. An analogy is like loading large JPEG, first sample a rough fast image, then render the full image.
How can this be achieved from mongodb query efficiently while providing a rough but accurate representation of dataset with respect to time. My ideas is to organize documents into time buckets each covering a constant range of time, return the 1st, 2nd, 3rd results iteratively until all results finish loading. However, there is a no trivial or intuitive way to implement this prototype efficiently.
Can anyone provide guide/solution/suggestion regarding this situation?
Thanks
Notes for zamnuts:
A quick sample with very small dataset of what we are trying to achieve is: http://baker221.eecg.utoronto.ca:3001/pages/latencyOverTimeGraph.html
We like to dynamically focus on specific regions. Our goal is to get good representation but avoid loading all of the dataset, at the same time have reasonable performance.
DB: mongodb Server: node.js Client: angular + nvd3 directive (proof of concept, will likely switch to d3.js for more control).
The project requirements are odd for this one, but I'm looking to get some insight...
I have a CSV file with about 12,000 rows of data, approximately 12-15 columns. I'm converting that to a JSON array and loading it via JSONP (has to run client-side). It takes many seconds to do any kind of querying on the data set to returned a smaller, filtered data set. I'm currently using JLINQ to do the filtering, but I'm essentially just looping through the array and returning a smaller set based on conditions.
Would webdb or indexeddb allow me to do this filtering significantly faster? Any tutorials/articles out there that you know of that tackles this particular type of issue?
http://square.github.com/crossfilter/ (no longer maintained, see https://github.com/crossfilter/crossfilter for a newer fork.)
Crossfilter is a JavaScript library for exploring large multivariate
datasets in the browser. Crossfilter supports extremely fast (<30ms)
interaction with coordinated views, even with datasets containing a
million or more records...
This reminds me of an article John Resig wrote about dictionary lookups (a real dictionary, not a programming construct).
http://ejohn.org/blog/dictionary-lookups-in-javascript/
He starts with server side implementations, and then works on a client side solution. It should give you some ideas for ways to improve what you are doing right now:
Caching
Local Storage
Memory Considerations
If you require loading an entire data object into memory before you apply some transform on it, I would leave IndexedDB and WebSQL out of the mix as they typically both add to complexity and reduce the performance of apps.
For this type of filtering, a library like Crossfilter will go a long way.
Where IndexedDB and WebSQL can come into play in terms of filtering is when you don't need to load, or don't want to load, an entire dataset into memory. These databases are best utilized for their ability to index rows (WebSQL) and attributes (IndexedDB).
With in browser databases, you can stream data into a database one record at a time and then cursor through it, one record at a time. The benefit here for filtering is that this you means can leave your data on "disk" (a .leveldb in Chrome and .sqlite database for FF) and filter out unnecessary records either as a pre-filter step or filter in itself.