JavaScript - temporary storage of large (> 10 MB) object data

I'm looking for a solution that allows me to cache a large object over a page reload. More specifically I have a large crossfilter.js object that I'd like to retain, since creating it takes a while. I couldn't find any native way to persist a crossfilter.js instance.
I know about the following generic options:
Local storage / session storage. Problem: apparently no objects bigger than 5 or 10 MB can be stored, and the ones I'm trying to cache are upwards of 10 MB.
Abusing window.name for a serialized version of my object (a minimal sketch of this trick follows after this list). Problem: while it works for classes I've written myself, trying to serialize and de-serialize the crossfilter.js instance or its groups/dimensions leads to exceptions, i.e. the internal state of the crossfilter.js instance is not maintained. I'm using https://github.com/hunterloftis/cryo for serialization.
IndexedDB. Problem: same as with window.name - I'd have to serialize my data, which I haven't found a feasible approach for yet. Also a bit of overkill for my needs, I guess.
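For reference, the generic window.name trick mentioned above amounts to the sketch below. Plain JSON is shown for illustration (cryo would replace it); as noted, this only works for data that survives a serialization round trip, which the crossfilter.js instance does not.

// window.name survives same-tab reloads/navigations, so a serialized
// object can be parked there across a page reload.
function cacheInWindowName(obj) {
  window.name = JSON.stringify(obj);
}

function restoreFromWindowName() {
  try {
    return window.name ? JSON.parse(window.name) : null;
  } catch (e) {
    return null; // window.name held something that wasn't ours
  }
}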
Summarized: I want to keep a particular complex object available after a page reload. A possible solution would let me:
- store complex objects/class instances (with methods) in local/session storage with increased or no size limits,
- use a hack like dumping the object into window.name, but one that accepts a complex object without the serialization that window.name requires, or
- use native crossfilter.js functionality to dump/cache one of its instances.
Any tips? Although a browser-independent solution is preferred, a Chrome-specific one will be accepted as well.
Thanks!

Related

Is there a way to get the address of a variable or object in javascript? [duplicate]

Is it possible to find the memory address of a JavaScript variable? The JavaScript code is part of (embedded into) a normal application where JavaScript is used as a front end to C++ and does not run on the browser. The JavaScript implementation used is SpiderMonkey.
If it were possible at all, it would be very dependent on the JavaScript engine. Modern JavaScript engines compile their code with a just-in-time compiler, and messing with their internal variables would be bad for performance, stability, or both.
If the engine allows it, why not expose a function-call interface to some native code to exchange the variables' values?
It's more or less impossible - JavaScript's evaluation strategy is to always use call by value, but in the case of Objects (including arrays) the value passed is a reference to the Object, which is not copied or cloned. If you reassign the Object itself in the function, the original won't be changed, but if you reassign one of the Object's properties, that will affect the original Object.
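A small illustration of that distinction:

function reassign(obj) {
  // Rebinds only the local parameter; the caller's variable is untouched.
  obj = { replaced: true };
}

function mutate(obj) {
  // Writes through the shared reference; the caller sees the change.
  obj.changed = true;
}

const original = { changed: false };
reassign(original);
console.log(original); // { changed: false }
mutate(original);
console.log(original); // { changed: true }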
That said, what are you trying to accomplish? If it's just passing complex data between C++ and Javascript, you could use a JSON library to communicate. Send a JSON object to C++ for processing, and get a JSON object to replace the old one.
You can, using a side channel, but you can't do anything useful with it other than attacking browser security!
The closest thing to virtual addresses are ArrayBuffers. If one virtual address within an ArrayBuffer is identified, the remaining addresses are also known, because both the memory addresses and the array indices are linear.
Although virtual addresses are not themselves physical memory addresses, there are ways to translate a virtual address into a physical memory address.
Browser engines always allocate ArrayBuffers page-aligned. The first byte of the ArrayBuffer is therefore at the beginning of a new physical page and has the 12 least significant bits set to '0'.
If a large chunk of memory is allocated, browser engines typically use mmap to allocate this memory, which is optimized to allocate 2 MB transparent huge pages (THP) instead of 4 KB pages. As these physical pages are mapped on demand, i.e. as soon as the first access to the page occurs, iterating over the array indices results in page faults at the beginning of each new page. The time to resolve a page fault is significantly higher than that of a normal memory access. Thus, you can learn the index at which a new 2 MB page starts. At this array index, the underlying physical page has the 21 least significant bits set to '0'.
This answer does not provide a proof of concept because I don't have time for that, but I may be able to add one in the future. It is simply an attempt to point the person asking the question in the right direction.
Sources:
http://www.misc0110.net/files/jszero.pdf
https://download.vusec.net/papers/anc_ndss17.pdf
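Although the answer above stops short of a proof of concept, the timing idea it describes looks roughly like the sketch below. The buffer size, stride, and threshold are illustrative assumptions, and browsers deliberately coarsen performance.now(), so a real attack needs a much more precise timer.

// Allocate a large ArrayBuffer and look for indices where the *first*
// access is noticeably slower: a page fault on a freshly mapped page.
const SIZE = 64 * 1024 * 1024;              // 64 MB, large enough for THP
const view = new Uint8Array(new ArrayBuffer(SIZE));

const slowIndices = [];
for (let i = 0; i < SIZE; i += 4096) {      // step one 4 KB page at a time
  const start = performance.now();
  view[i] = 1;                              // first touch may fault the page
  if (performance.now() - start > 0.01) {   // arbitrary threshold in ms
    slowIndices.push(i);
  }
}
// Indices that recur every 2 MB suggest the start of a transparent huge page,
// i.e. a physical address whose 21 least significant bits are zero.
console.log(slowIndices.slice(0, 10));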
I think it's possible, but you'd have to:
download the Node.js source code,
add your function manually (e.g. one returning the memory address of a pointer),
compile it and use it as your node executable.

Use web worker to stringify

So I have an app that needs to JSON.stringify its data to put into localStorage, but as the data gets larger, this operation gets outrageously expensive.
So I tried moving this onto a web worker so it's off the main thread, but I'm now learning that posting an object to a web worker is even more expensive than stringifying it.
So I guess I'm asking, is there any way whatsoever to get JSON.stringify off the main thread, or at least make it less expensive?
I'm familiar with fast-json-stringify, but I don't think I can feasibly provide a complete schema every time...
You have correctly observed that passing an object to a web worker costs as much as serializing it. Web workers also need to receive serialized data rather than native JS objects, because object instances are bound to the JS thread they were created in.
The generic advice applies to many programming problems: choose the right data structures when working with large datasets. When data gets larger, it's better to sacrifice simplicity of access for performance. So do either of the following:
Store data in IndexedDB
If your large object contains lists of the same kind of entry, use IndexedDB for reading and writing and you won't need to worry about serialization at all. This will require refactoring your code, but it is the correct solution for large datasets.
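A minimal sketch of that approach, assuming a hypothetical database named appCache with an object store rows keyed by id:

function openDb() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('appCache', 1);
    req.onupgradeneeded = () => req.result.createObjectStore('rows', { keyPath: 'id' });
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function saveRows(rows) {
  const db = await openDb();
  const tx = db.transaction('rows', 'readwrite');
  rows.forEach(row => tx.objectStore('rows').put(row)); // structured clone, no manual JSON
  await new Promise((resolve, reject) => {
    tx.oncomplete = resolve;
    tx.onerror = () => reject(tx.error);
  });
}

async function loadRow(id) {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const req = db.transaction('rows').objectStore('rows').get(id);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}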
Store data in ArrayBuffer
If your data is mostly fixed-size values, use an ArrayBuffer. An ArrayBuffer can be copied or moved to a web worker pretty much instantly, and if your entries are all the same size, serialization can be done in parallel. For access, you may write simple wrapper classes that translate your binary data into something more readable.
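A sketch of the transfer, assuming a placeholder worker script worker.js and entries that fit in 64-bit floats:

const worker = new Worker('worker.js');

// Pack fixed-size entries into a typed array backed by one ArrayBuffer.
const entries = new Float64Array(1000000);
entries[0] = 42; // ...fill with your data...

// Passing the buffer in the transfer list *moves* it instead of copying it.
worker.postMessage({ entries: entries.buffer }, [entries.buffer]);
console.log(entries.buffer.byteLength); // 0 - ownership now belongs to the worker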

How to share variables from main thread and web workers in javascript?

I have a web app, and it has some variables in the main program. These variables are objects with lots of strings or arrays in them.
It then spawns 4 web workers and sends the giant object as a message to each of them, which basically clones the object 4 times.
I want to use the new SharedArrayBuffer datatype (http://lucasfcosta.com/2017/04/30/JavaScript-From-Workers-to-Shared-Memory.html) so that the web workers can access the object from the parent, making things more memory-efficient.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer
How can I do this? The example seems to initialize it based on the number of bytes needed, and I don't understand how I would calculate that.
Does anyone know how to do this?
Thanks
You cannot really use that for the data structure you describe. Theoretically you could use getters and setters that refer to byte offsets in the underlying shared byte array, but that would be a really complicated project, most likely not worth the effort unless you're doing it for fun.
You will need to re-think your data structure if you want to use shared memory. You need to ensure that it's constant size and you need to simplify it.
Try using a profiler to see which part of your object is biggest, and start by sharing only that part.
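A sketch of what the re-thought structure could look like, if the data can be reduced to fixed-size numeric records. The field layout, counts, and the worker.js file name are illustrative assumptions, and note that SharedArrayBuffer requires the page to be cross-origin isolated in current browsers.

const ENTRY_COUNT = 100000;
const FIELDS_PER_ENTRY = 4; // e.g. x, y, z, weight
const bytes = ENTRY_COUNT * FIELDS_PER_ENTRY * Float64Array.BYTES_PER_ELEMENT;

const shared = new SharedArrayBuffer(bytes);
const view = new Float64Array(shared);

// Fill the buffer on the main thread...
view[0] = 1.5;

// ...then hand the same memory to each worker; unlike a plain ArrayBuffer,
// posting a SharedArrayBuffer shares it instead of copying or moving it.
for (let i = 0; i < 4; i++) {
  const worker = new Worker('worker.js');
  worker.postMessage({ shared, entryCount: ENTRY_COUNT });
}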

Best practice for using localStorage to store a large number of objects

Currently I'm experimenting with localStorage to store a large number of objects of the same type, and I am getting a bit confused.
One approach is to store all the objects in an array. But then for each read/write of a single object I need to deserialize/serialize the whole array.
The other approach is to store each object directly under its own key in localStorage. This makes accessing each object much easier, but I'm worried about the number of objects that will be stored (tens of thousands). Also, getting all the objects requires iterating over the whole localStorage!
I'm wondering which way is better in your experience? Also, would it be worthwhile to try a more sophisticated client-side database like PouchDB?
If you want something simple for storing a large amount of key/values, and you don't want to have to worry about the types, then I recommend LocalForage. You can store strings, numbers, arrays, objects, Blobs, whatever you want. It uses IndexedDB and WebSQL where available, so the storage limits are much higher than LocalStorage.
PouchDB works too, but the API is more complex, and it's better-suited for when you want to sync data with CouchDB on the server.
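A minimal LocalForage sketch; values go in and come out as real objects, with no manual JSON.stringify. This assumes localforage is already loaded, e.g. via a script tag or bundler import.

async function demo() {
  await localforage.setItem('items', [{ id: 1, name: 'a' }, { id: 2, name: 'b' }]);
  const items = await localforage.getItem('items'); // comes back as an array of objects
  console.log(items.length);
}
demo();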
If you do not want to have a lot of keys, you can:
concatenate row JSONs with \n and store them under a single key
build and update one or more indexes, stored under separate keys, each mapping some key to a particular row number.
In this case splitting rows is just .split('\n'), which is roughly two orders of magnitude faster than JSON.parse.
Note that you may need special effort to synchronize simultaneously opened tabs; that can be a challenge in complex cases.
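A rough sketch of the newline-concatenation idea (without the extra index keys or the tab-synchronization handling):

function saveRows(key, rows) {
  // JSON.stringify never emits raw newlines (they are escaped), so '\n' is a safe separator.
  localStorage.setItem(key, rows.map(r => JSON.stringify(r)).join('\n'));
}

function loadRow(key, rowNumber) {
  const lines = localStorage.getItem(key).split('\n'); // cheap compared to parsing everything
  return JSON.parse(lines[rowNumber]);                 // parse only the row you need
}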
localStorage has both good and bad parts.
Good parts:
synchronous;
extremely fast; both read and write are essentially a memcpy, giving 100+ MB/s throughput even on weak devices (JSON.stringify, for example, is generally 5-20 times slower than localStorage.setItem);
thoroughly tested and reliable.
Bad parts:
no transactions, so you need engineering effort to sync tabs;
assume you have no more than 2 MB available (some systems impose that limit);
2 MB of storage actually means about 1M characters you can save, since strings are stored as UTF-16.
These points mark the limits of localStorage's applicability as a DB. It is good for tasks where you need synchronicity and speed, and where you can trim your DB to fit the quota.
So localStorage is good for caches and logs, but not much more.
I haven't personally used localStorage to manage so many elements.
However, the pattern I usually use is to load the complete database into a JavaScript object, manage it in memory while the process runs, and save it back to localStorage when the process is finished.
Of course, this pattern may not be a good fit for your needs, depending on your project's specifications.
If you need to save data constantly, data access could become a problem, so some kind of small client-side database is probably a better option.
If your data volume is exceptionally high, keeping it in memory could also be a problem; however, depending on the data model, you may be able to build efficient structures that let you load and save data only when it's needed.

Using a JSON object vs. localStorage/sessionStorage/IndexedDB/WebSQL/etc.?

I've got a web app which fetches a couple dozen items at boot. All these items are JSON and are smaller than 1 kB.
Now there are a number of storage options, as seen in the question title.
I was thinking of just storing these objects inside a variable in the browser JS. I don't really see why I would want to use any of these browser storage options.
So what would be reasons to use any of the browser-based storage instead of a variable inside JS?
It could be that above a certain data size it is preferable to use browser storage, e.g. from 100 kB onwards it's better not to use a JS variable.
var myModel = {}
NOTE
Every time the user enters the app he will get fresh content from the server. The content is too realtime for caching.
localStorage, globalStorage and sessionStorage:
These features are available in browsers that have implemented "Web Storage"; they all refer to a kind of HashMap, a map between string keys and string values, but their lifetimes differ: once the active page is closed, sessionStorage is cleared, while localStorage is permanent. (MDN DOM Storage guide)
One note about globalStorage: it has been obsolete since Gecko 1.9.1 (Firefox 3.5) and unsupported since Gecko 13 (Firefox 13); since then we should use localStorage. The difference between the two was just the HTML5 scope support (scheme + hostname + non-standard port).
These could be useful for you to:
- share your objects between the different pages of your site,
- program for offline use,
- cache large objects,
- or whenever you need local persistent storage.
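The API is the same string key/value interface for both; only the lifetime differs:

// Survives closing the tab and the browser.
localStorage.setItem('settings', JSON.stringify({ theme: 'dark' }));

// Cleared as soon as the tab (session) is closed.
sessionStorage.setItem('draft', JSON.stringify({ text: 'unsaved changes' }));

const settings = JSON.parse(localStorage.getItem('settings'));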
IndexedDB:
IndexedDB is useful for applications that store a large amount of data (for example, a catalog of DVDs in a lending library) and applications that don't need persistent internet connectivity to work (for example, mail clients, to-do lists, and notepads)
Based on this quote from MDN you can easily work out your answer regarding IndexedDB. If you don't know whether IndexedDB is useful for you or not, just answer these questions:
Do you store a large amount of data on the client? If yes, consider using it.
Does your app need to work offline? If yes, consider using IndexedDB.
Does your app need persistent internet connectivity? If yes, it still remains an option, depending on the other factors.
So, other than working offline, which as far as I can tell you don't need, because as you said:
The content is too realtime for caching.
These offer features like sharing objects and managing large amounts of data; whether you need them is for you to decide.
localStorage and sessionStorage are solving a caching problem; think of them as cookies. You've said you don't want caching, so you can ignore them.
JavaScript objects behave basically like O(1) lookup tables (see How is a JavaScript hash map implemented?, and make sure you read both of the top two answers, as both have something useful to say), and there is no maximum memory limit that I am aware of, nor a point where another solution becomes a better choice.
The only reason I can think of that you should bother with the extra step of inserting the data in an IndexedDB is if you need O(1) lookups on a field that is not the object key you are using.
