JavaScript and JSON efficiency? Should we make a smaller array? - javascript

I'm building (with a partner) a small web app. It is an information app and pulls the info from a JSON file; we'll end up with about 150 items in the JSON, each with 130 properties.
We build the buttons in the app by querying the JSON, which is stored in localStorage, and getting things like item[i].name and item[i].cssClass to construct buttons.
Question is - would it be more efficient and worth it to create 2 arrays in localStorage that hold the name and cssClass for the purposes of constructing these buttons, or is this a waste of time and should we just pull the name and cssClass straight from the JSON?
I should clarify - we need the items to be sortable by name, cssClass, etc.; users can sort the data into lists, and the buttons can be constructed as alphabetic lists (which take you to the details of the items) or as category buttons (which take you to lists of items in that category).
The issue is - does sorting the JSON carry a significant overhead compared to just sorting the array of names?

Is this a waste of time and should we just pull the name and cssClass straight from the JSON?
Yes, it will be fine.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - http://c2.com/cgi/wiki?PrematureOptimization

Retrieving the i-th element of an array, like getting a property of an object, has time complexity O(1). JavaScript is quite a high-level language; as long as you aren't working with millions of items, you shouldn't worry about the implementation details of the interpreter.
I'd guess the browser will spend far more time working with the DOM and rendering the page than on any operations performed on your data structures.
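To put that in perspective, here is a rough sketch of sorting the parsed items and building the buttons directly from them (the storage key 'items' is made up; name and cssClass are the properties from the question):

const items = JSON.parse(localStorage.getItem('items'));

// Sorting ~150 objects by a property is effectively instant.
const byName = [...items].sort((a, b) => a.name.localeCompare(b.name));

for (const item of byName) {
  const button = document.createElement('button');
  button.textContent = item.name;
  button.className = item.cssClass;
  document.body.appendChild(button);
}

The DOM work in that loop will dwarf the sort every time.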

Related

Managing Memory of JavaScript Arrays. Does Splicing arrays remove the forgotten data from memory?

I'm creating a relatively simple but very useful web application that goes through a lot of data (small JSON/JS objects). I couldn't find any locally stored database service to my liking, so I came up with this solution.
Context:
The client simply gets the data from an API that I'm creating and analyzes it for the user; all the data is meant to be stored on the server, of course.
An instance of the class I'll describe below collects about 100k data objects a day on average.
Until further development, the client will only be allowed access to the last 24 hours' worth of real-time data.
The class I created collects the data in an array and posts the data to the API, but because of the sheer amount of data, I'm concerned about memory over the long haul. My solution is for the class to save the data, once a day, to a timestamped JSON file in a folder designated for that instance of the class. For example, if the instance is tracking ETHUSDT data, each JSON file would be named something like 'ETHUSDT_m100k_10/15/2021-10/16/2021.json'. Then I delete from the array any data older than a day, so that the array doesn't take up too much memory. This way the class can still post a day's worth of data to the API without losing data after a day, while keeping the array at a manageable size.
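Roughly, the rotation looks like this (class and property names are simplified for the example, not my exact code):

const fs = require('fs');

class Collector {
  constructor(symbol) {
    this.symbol = symbol;
    this.data = []; // entries like { time: <ms timestamp>, ...payload }, oldest first
  }

  add(entry) {
    this.data.push(entry);
  }

  // Runs once a day: archive everything collected, then drop entries older than 24 h.
  rotate() {
    const stamp = new Date().toISOString().slice(0, 10);
    fs.writeFileSync(`${this.symbol}_${stamp}.json`, JSON.stringify(this.data));

    const cutoff = Date.now() - 24 * 60 * 60 * 1000;
    const firstRecent = this.data.findIndex(d => d.time >= cutoff);
    this.data.splice(0, firstRecent === -1 ? this.data.length : firstRecent);
  }
}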
My concern is that the data spliced from the array will still be left in memory without a reference. Is this the case? I only began learning to code this February and am self-taught, so I understand there will be gaps in my knowledge. I know the JS garbage collector is pretty good, but I wrote a similar program in Python using the same tactic and the memory still continued to grow.
Sorry for the long explanation for a simple question. In other words:
Is data removed from an array left in memory?

How to order a huge (GB-sized) CSV file?

Background
I have a huge CSV file that has several million rows. Each row has a timestamp I can use to order it.
Naive approach
So my first approach was obviously to read the entire thing into memory and then sort it. That didn't work out so well, as you may guess....
Naive approach v2
My second try was to follow a bit the idea behind MapReduce.
So I would slice this huge file into several parts and sort each one. Then I would combine all the parts into the final file.
The issue here is that part B may have a message that should be in part A. So in the end, even though each part is ordered, I cannot guarantee the order of the final file....
Objective
My objective is to create a function that when given this huge unordered CSV file, can create an ordered CSV file with the same information.
Question
What are the popular solutions/algorithm to order data sets this big?
What are the popular solutions/algorithm to order data sets this big?
Since you've already concluded that the data is too large to sort/manipulate in the memory you have available, the popular solution is a database which will build disk-based structures for managing and sorting more data than can be in memory.
You can either build your own disk-based scheme or you can grab one that is already fully developed, optimized and maintained (e.g. a popular database). The "popular" solution that you asked about would be to use a database for managing/sorting large data sets. That's exactly what they're built for.
Database
You could set up a table that was indexed by your sort key, insert all the records into the database, then create a cursor sorted by your key and iterate the cursor, writing the now sorted records to your new file one at a time. Then, delete the database when done.
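For illustration, one possible shape of that in Node.js, using the better-sqlite3 module (the module choice is an assumption; any disk-backed store with an index on the timestamp column would do):

const fs = require('fs');
const readline = require('readline');
const Database = require('better-sqlite3');

async function sortWithSqlite(inputPath, outputPath) {
  const db = new Database('sort.tmp.db');
  db.exec('CREATE TABLE rows (ts TEXT, line TEXT)');
  db.exec('CREATE INDEX idx_ts ON rows (ts)');

  const insert = db.prepare('INSERT INTO rows (ts, line) VALUES (?, ?)');
  const insertBatch = db.transaction(batch => batch.forEach(r => insert.run(r.ts, r.line)));

  // Stream the CSV in, batching inserts; assumes the timestamp is the first column.
  let batch = [];
  const rl = readline.createInterface({ input: fs.createReadStream(inputPath) });
  for await (const line of rl) {
    batch.push({ ts: line.split(',')[0], line });
    if (batch.length === 10000) { insertBatch(batch); batch = []; }
  }
  if (batch.length) insertBatch(batch);

  // Cursor through the index in order, writing the now-sorted records one at a time.
  const out = fs.createWriteStream(outputPath);
  for (const row of db.prepare('SELECT line FROM rows ORDER BY ts').iterate()) {
    out.write(row.line + '\n');
  }
  out.end();
  db.close();
  fs.unlinkSync('sort.tmp.db');
}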
Chunked Memory Sort, Manual Merge
Alternatively, you could do a chunked sort: break the data into smaller pieces that fit in memory, sort each piece, and write each sorted block to disk. Then merge all the blocks: read the next record from each block into memory, find the lowest one across all the blocks, write it out to your final output file, read the next record from that block, and repeat. With this scheme the merge only ever needs N records in memory at a time, where N is the number of sorted chunks you have (far fewer than in the original chunked block sort, probably).
As juvian mentioned, here's an overview of how an "external sort" like this could work: https://en.wikipedia.org/wiki/External_sorting.
One key aspect of the chunked memory sort is determining how big to make the chunks. There are a number of strategies. The simplest may be to just decide how many records you can reliably fit and sort in memory, based on a few simple tests or even just a guess that you're sure is safe (picking a smaller number to process at a time just means you will split the data across more files). Then read that many records into memory, sort them, and write them out to a known filename. Repeat that process until you have read all the records and they all sit in temp files with known filenames on disk.
Then open each file, read the first record from each one, find the lowest of the records you read in, write it out to your final file, read the next record from that file, and repeat the process. When you get to the end of a file, just remove it from the list of data you're comparing, since it's now done. When there is no more data, you're done.
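A minimal Node.js sketch of that chunk-and-merge scheme, assuming one record per line with a sortable (e.g. ISO-8601) timestamp at the start of each line; the chunk size and file names are arbitrary:

const fs = require('fs');
const readline = require('readline');

const CHUNK_LINES = 100000; // tune to what reliably fits and sorts in memory

async function splitIntoSortedChunks(inputPath) {
  const rl = readline.createInterface({ input: fs.createReadStream(inputPath) });
  const chunkFiles = [];
  let lines = [];
  const flush = () => {
    lines.sort(); // plain string sort works when each line starts with an ISO timestamp
    const name = `chunk_${chunkFiles.length}.tmp`;
    fs.writeFileSync(name, lines.join('\n') + '\n');
    chunkFiles.push(name);
    lines = [];
  };
  for await (const line of rl) {
    lines.push(line);
    if (lines.length >= CHUNK_LINES) flush();
  }
  if (lines.length) flush();
  return chunkFiles;
}

async function mergeChunks(chunkFiles, outputPath) {
  const out = fs.createWriteStream(outputPath);
  // One async line iterator per chunk; only the current line of each chunk is in memory.
  const iters = chunkFiles.map(f =>
    readline.createInterface({ input: fs.createReadStream(f) })[Symbol.asyncIterator]());
  const heads = await Promise.all(iters.map(it => it.next()));
  while (heads.some(h => !h.done)) {
    let min = -1;
    heads.forEach((h, i) => {
      if (!h.done && (min === -1 || h.value < heads[min].value)) min = i;
    });
    out.write(heads[min].value + '\n');
    heads[min] = await iters[min].next();
  }
  out.end();
}

async function externalSort(inputPath, outputPath) {
  const chunks = await splitIntoSortedChunks(inputPath);
  await mergeChunks(chunks, outputPath);
  chunks.forEach(f => fs.unlinkSync(f));
}

During the merge, only one line per chunk is held in memory at any moment.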
Sort Keys only in Memory
If all the sort keys themselves would fit in memory, but not the associated data, then you could make and sort your own index. There are many different ways to do that, but here's one scheme.
Read through the entire original data, capturing two things in memory for every record: the sort key and the file offset in the original file where that record is stored. Then, once you have all the sort keys in memory, sort them. Finally, iterate through the sorted keys one by one, seeking to the right spot in the file, reading that record, writing it to the output file, and advancing to the next key, repeating until the data for every key has been written in order.
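A sketch of that key-index scheme, under some simplifying assumptions (one record per line, '\n' line endings, single-byte encoding so character counts equal byte counts, timestamp in the first CSV column; the names are illustrative):

const fs = require('fs');
const readline = require('readline');

async function sortViaKeyIndex(inputPath, outputPath) {
  // Pass 1: keep only (key, byte offset, byte length) per record in memory.
  const index = [];
  let offset = 0;
  const rl = readline.createInterface({ input: fs.createReadStream(inputPath) });
  for await (const line of rl) {
    index.push({ key: line.split(',')[0], offset, length: line.length });
    offset += line.length + 1; // +1 for the '\n' that readline stripped
  }
  index.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

  // Pass 2: seek to each record in key order and copy it to the output file.
  const fd = fs.openSync(inputPath, 'r');
  const out = fs.createWriteStream(outputPath);
  for (const entry of index) {
    const buf = Buffer.alloc(entry.length);
    fs.readSync(fd, buf, 0, entry.length, entry.offset);
    out.write(buf.toString('ascii') + '\n');
  }
  out.end();
  fs.closeSync(fd);
}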
BTree Key Sort
If all the sort keys won't fit in memory, then you can get a disk-based BTree library that will let you sort things larger than can be in memory. You'd use the same scheme as above, but you'd be putting the sort key and file offset into a BTree.
Of course, it's only one step further to put the actual data itself from the file into the BTree and then you have a database.
I would read the entire file row by row and output each line into a temporary folder, grouping lines into files by a reasonable time interval (whether the interval should be a year, a day, an hour, etc. is for you to decide based on your data). The temporary folder would then contain one file per interval (for example, for a per-day split: 2018-05-20.tmp, 2018-05-21.tmp, 2018-05-22.tmp, etc.). Now we can read the files in order, sort each one in memory, and append it to the target sorted file.
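A rough sketch of that bucketing idea with per-day buckets (it assumes each row starts with a YYYY-MM-DD timestamp; the names are illustrative):

const fs = require('fs');
const readline = require('readline');

async function bucketSortByDay(inputPath, outputPath) {
  // Pass 1: append every row to the bucket file for its day.
  const buckets = new Map(); // 'YYYY-MM-DD' -> write stream
  const rl = readline.createInterface({ input: fs.createReadStream(inputPath) });
  for await (const line of rl) {
    const day = line.slice(0, 10); // assumes each row starts with YYYY-MM-DD
    if (!buckets.has(day)) buckets.set(day, fs.createWriteStream(`${day}.tmp`));
    buckets.get(day).write(line + '\n');
  }
  // Make sure every bucket is flushed to disk before reading it back.
  await Promise.all([...buckets.values()].map(s => new Promise(res => s.end(res))));

  // Pass 2: each day's bucket fits in memory, so sort it and append it to the output.
  const out = fs.createWriteStream(outputPath);
  for (const day of [...buckets.keys()].sort()) {
    const rows = fs.readFileSync(`${day}.tmp`, 'utf8').split('\n').filter(Boolean);
    rows.sort();
    out.write(rows.join('\n') + '\n');
    fs.unlinkSync(`${day}.tmp`);
  }
  out.end();
}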

Best practise of using localstorage to store a large amount of objects

Currently I'm experimenting with localStorage to store a large number of objects of the same type, and I am getting a bit confused.
One way of thinking is to store all the objects in an array. But then for each read/write of a single object I need to deserialise/serialise the whole array.
The other way is to store each object directly under its own key in localStorage. This makes accessing each object much easier, but I'm worried about the number of objects that will be stored (tens of thousands). Also, getting all the objects will require iterating over the whole localStorage!
I'm wondering which way is better in your experience? Also, would it be worthwhile to try a more sophisticated client-side database like PouchDB?
If you want something simple for storing a large number of key/value pairs, and you don't want to have to worry about the types, then I recommend LocalForage. You can store strings, numbers, arrays, objects, Blobs, whatever you want. It uses IndexedDB and WebSQL where available, so the storage limits are much higher than localStorage.
PouchDB works too, but the API is more complex, and it's better-suited for when you want to sync data with CouchDB on the server.
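For a feel of the API, a small hedged example (the keys and data are made up; run it inside an ES module or other async context):

import localforage from 'localforage';

localforage.config({ name: 'myApp' }); // backed by IndexedDB or WebSQL where available

// Values keep their types: objects, arrays, Blobs... no manual JSON round-trip.
await localforage.setItem('user:42', { name: 'Ada', scores: [1, 2, 3] });
const user = await localforage.getItem('user:42');
console.log(user.scores.length); // 3

// Walk every stored entry without pulling them all into one big array first.
await localforage.iterate((value, key) => {
  console.log(key, value);
});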
If you do not want to have a lot of keys, you can:
concatenate row JSONs with \n and store them under a single key
build and update one or more indexes, stored under separate keys, each mapping some lookup key to a particular row number.
In this case splitting rows is just .split('\n'), which is roughly two orders of magnitude faster than JSON.parse.
Please note that you may need special effort to synchronize simultaneously open tabs. That can be a challenge in complex cases.
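A tiny sketch of that newline-concatenation idea (the key names are arbitrary):

// Store each row as one JSON line under a single key.
const rows = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }];
localStorage.setItem('rows', rows.map(r => JSON.stringify(r)).join('\n'));

// Reading back: splitting is cheap; parse only the rows you actually need.
const lines = localStorage.getItem('rows').split('\n');
const second = JSON.parse(lines[1]);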
localStorage has both good and bad parts.
Good parts:
synchronous;
extremely fast, both read and write are just memcpy – it's 100+ MB/s throughput even on weak devices (for example, JSON.stringify is in general 5-20 times slower than localStorage.setItem);
thoroughly tested and reliable.
Bad news:
no transactions, so you need an engineering effort to sync tabs;
assume you have no more than 2 MB (because there exist systems with this limit);
2 MB of storage actually means about 1M characters you can save (strings are stored as UTF-16, two bytes per character).
These points mark the limits of localStorage's applicability as a DB. LS is good for tasks where you need synchronous access and speed, and where you can trim your DB to fit into the quota.
So localStorage is good for caches and logs, but not much more.
I haven't personally used localStorage to manage that many elements.
However, the pattern I usually use to manage data is to load the complete database into a JavaScript object, work with it in memory during the process, and save it back to localStorage when the process is finished.
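A minimal sketch of that pattern (the storage key and field are made up):

// Load once, work in memory, save when done.
const db = JSON.parse(localStorage.getItem('appData') || '{}');

db.lastVisit = Date.now(); // ...any amount of in-memory work...

localStorage.setItem('appData', JSON.stringify(db));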
Of course, this pattern may not be a good fit for your needs, depending on your project's specifications.
If you need to save data constantly, data access could become a problem, and some kind of small database is probably a better option.
If your data volume is exceptionally high, it could also be a problem to manage it in memory; however, depending on your data model, you may be able to build efficient structures that load and save data only when it's needed.

In node.js, should a common set of data be in the database or a class?

In my database I have a table with data of cities. It includes the city name, translation of the name (it's a bi-lingual website), and latitude/longitude. This data will not change and every user will need to load this data. There are about 300 rows.
I'm trying to keep the load on the server as low as possible (at least to a reasonable extent), but I'd also prefer to keep this data in the database. Would it be best to have this data inside a class that is loaded in my main app.js file? It would be kept in memory and shared by all users, correct? Or would it be better to keep it in the database on the server and select the data whenever a user needs it? The sign-in screen shows the list of cities, so it would be loaded often.
I've just seen that unlike PHP, many of the Node.js servers don't have tons of memory, even the ones that aren't exactly cheap, so I'm worried about putting unnecessary things into memory.
I decided to give this a try. Using an example data set of 300 rows (each containing 24 string characters, two doubles, and the property names), a small Node.js script indicated an additional memory usage of 80 to 100 KB.
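A measurement along those lines can be reproduced roughly like this (a hedged reconstruction, not the exact script; the field sizes are illustrative):

const before = process.memoryUsage().heapUsed;

const cities = [];
for (let i = 0; i < 300; i++) {
  cities.push({
    name: 'x'.repeat(12),        // city name
    translation: 'y'.repeat(12), // translated name
    lat: Math.random() * 180 - 90,
    lng: Math.random() * 360 - 180,
  });
}

if (global.gc) global.gc(); // run with node --expose-gc for a steadier reading
const after = process.memoryUsage().heapUsed;
console.log(`~${((after - before) / 1024).toFixed(0)} KB for ${cities.length} rows`);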
You should ask yourself:
How often will the data be used? How much of the data does a user need?
If the whole dataset will be used on a regular basis (let's say multiple times a second), you should consider keeping the data in memory. If, however, your users only need part of the data, and only from time to time, you might consider loading the data from a database.
Can I guarantee efficient loading from the database?
An important fact is that loading parts of the data from a database might even require more memory, because the V8 garbage collector might delay the collection of the loaded data, so the same data (or multiple parts of the data) might be in memory at the same time. There is also a guaranteed overhead due to database / connection data and so on.
Is my approach sustainable?
Finally, consider the possibility of data growth. If you expect the dataset to grow by a non-trivial amount, think about the above points again and decide whether that growth is likely enough to justify using the database.
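If you do decide to keep the table in memory, one common shape is to cache it at module level and share it across requests; here is an illustrative Express-style sketch (the query function and data are stand-ins for whatever DB access you use):

const express = require('express');
const app = express();

// Stand-in for the real database query (the name and data are hypothetical).
async function queryCities() {
  return [{ name: 'Montreal', nameTranslated: 'Montréal', lat: 45.5, lng: -73.57 }];
}

let citiesCache = null; // loaded once, shared by every request

app.get('/cities', async (req, res) => {
  if (!citiesCache) citiesCache = await queryCities();
  res.json(citiesCache);
});

app.listen(3000);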

Processing a large (12K+ rows) array in JavaScript

The project requirements are odd for this one, but I'm looking to get some insight...
I have a CSV file with about 12,000 rows of data, approximately 12-15 columns. I'm converting that to a JSON array and loading it via JSONP (it has to run client-side). It takes many seconds to do any kind of querying on the data set to return a smaller, filtered data set. I'm currently using JLINQ to do the filtering, but I'm essentially just looping through the array and returning a smaller set based on conditions.
Would webdb or indexeddb allow me to do this filtering significantly faster? Any tutorials/articles out there that you know of that tackles this particular type of issue?
http://square.github.com/crossfilter/ (no longer maintained, see https://github.com/crossfilter/crossfilter for a newer fork.)
Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records...
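A hedged example of the kind of filtering Crossfilter handles well (the rows and field names are made up; the maintained fork is published on npm as crossfilter2):

const crossfilter = require('crossfilter2');

const rows = [
  { city: 'Lisbon', year: 2012, total: 130 },
  { city: 'Porto',  year: 2013, total:  90 },
  // ... ~12,000 more rows
];

const cf = crossfilter(rows);
const byYear = cf.dimension(d => d.year);
const byCity = cf.dimension(d => d.city);

byYear.filterRange([2012, 2014]); // keep 2012-2013
byCity.filter('Lisbon');

console.log(byYear.top(Infinity)); // records matching every active filter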
This reminds me of an article John Resig wrote about dictionary lookups (a real dictionary, not a programming construct).
http://ejohn.org/blog/dictionary-lookups-in-javascript/
He starts with server-side implementations and then works toward a client-side solution. It should give you some ideas for ways to improve what you are doing right now:
Caching
Local Storage
Memory Considerations
If you require loading an entire data object into memory before you apply some transform on it, I would leave IndexedDB and WebSQL out of the mix as they typically both add to complexity and reduce the performance of apps.
For this type of filtering, a library like Crossfilter will go a long way.
Where IndexedDB and WebSQL can come into play in terms of filtering is when you don't need to load, or don't want to load, an entire dataset into memory. These databases are best utilized for their ability to index rows (WebSQL) and attributes (IndexedDB).
With in-browser databases, you can stream data into a database one record at a time and then cursor through it, one record at a time. The benefit for filtering is that this means you can leave your data on "disk" (a .leveldb in Chrome, a .sqlite database in Firefox) and filter out unnecessary records either as a pre-filtering step or as the filter itself.
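For illustration, a hedged sketch of cursoring through IndexedDB one record at a time with a pre-filter (the database and store names are made up):

const request = indexedDB.open('itemsDB', 1);

request.onupgradeneeded = () => {
  request.result.createObjectStore('items', { keyPath: 'id' });
};

request.onsuccess = () => {
  const db = request.result;
  const matches = [];
  const cursorReq = db.transaction('items').objectStore('items').openCursor();
  cursorReq.onsuccess = () => {
    const cursor = cursorReq.result;
    if (!cursor) {
      console.log('filtered rows:', matches.length); // cursor exhausted: done
      return;
    }
    if (cursor.value.total > 100) matches.push(cursor.value); // the pre-filter step
    cursor.continue();
  };
};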
