So I have an app that needs to JSON.stringify its data to put into localStorage, but as the data gets larger, this operation gets outrageously expensive.
So, I tried moving this onto a webWorker so it's off the main thread, but I'm now learning posting an object to a webWorker is even more expensive than stringifying it.
So I guess I'm asking, is there any way whatsoever to get JSON.stringify off the main thread, or at least make it less expensive?
I'm familiar with fast-json-stringify, but I don't think I can feasibly provide a complete schema every time...
You have correctly observed that passing object to web worker costs as much as serializing it. This is because web workers also need to receive serialized data, not native JS objects, because the instance objects are bound to the JS thread they were created in.
The generic solution is applicable to many programming problems: chose the right data structures when working with large datasets. When data gets larger it's better sacrifice simplicity of access for performance. Thus do any of:
Store data in indexedDB
If your large object contains lists of the same kind of entry, use indexed DB for reading and writing and you don't need to worry about serialization at all. This will require refactor of your code, but this is the correct solution for large datasets.
Store data in ArrayBuffer
If your data is mostly fixed-size values, use an ArrayBuffer. ArrayBuffer can be copied or moved to web worker pretty much instantly and if your entries are all same size, serialization can be done in parallel. For access, you may write simple wrappers classes that will translate your binary data into something more readable.
Related
I've been reading up on JSON and serialization, from what I understand JSON is a format often used for transferring data over a network e.g. from/to a web server or storing the data to disk.
The data could be strings, numbers, objects etc. I haven't found a clear explanation for why the serialization is needed, for instance when sending a string to a web server or saving it to disk, isn't the string already stored as a series of bits and bytes by the computer, isn't this the most basic form for the data? so why can't these be sent/stored as they are?
Why does it need to be stringified into JSON i.e. serialised, which turns it into a string?
To be clear, I'm asking why it's needed and a simple clear explanation for that.
Thanks
Broadly speaking serialization does two important, mostly independent jobs:
collects all the information into a single "chunk" (stream) of data that's self-contained and
turns all the information into an agreed-on format (usually optimized for either compactness or ease of parsing)
#1 is important because a single object with many properties and sub-object can be spread all over the memory of a running program.
For example a JavaScript runtime could have a dedicated memory pool for strings constants. Then an object that uses some constant as a key would just reference into that pool from its data structure. That means that the object is no longer in a single self-contained block in memory: it's spread out over multiple areas. This kind of spreading-out is actually the norm: objects don't usually contain complex data directly and depending on the language even "primitive" values such as number could be stored as references to another place in memory.
#2 is important mostly because the format used to quickly access data in-memory might not be suitable to transfer (because it might contain unnecessary redundancy or memory pointers that don't make any sense when transferred to another computer, which partially ties to reason #1).
An example of that would be a map (or dictionary): the in-memory representation will usually involve multiple buckets that hold hashed-values and some kind of collision-handling structure inside those buckets (a linked list or a tree, for example). That structure helps with efficient access to separate keys, but transferring that structure directly over the wire is pointless: it's very easy to re-build and there's no guarantee that the receiving end uses the exact same way to represent a map. So instead we just send each key and the associated value and let the receiving end deal with re-constructing any data structures it needs for efficient access.
The simple reason is that data can be stored differently in memory on different computers, or even by programs on the same computer written in different programming languages.
Serialization formats like JSON provide a defined way for exchanging data between computers or programs.
Not everyone knows how to parse or interpret those series of bits. Sometimes you need some general structure, some format, that can be passed around so that other people understand what it is you're trying to tell them.
I'm building a nodejs app that needs access to some data. I am not sure what is the best way to store the data. If it is json or mongodb or a sql database considering the performance of the read operation.
The app will never update/ insert/ delete any of the data. That's why I wrote it is static. And the amount of data could be a total of at most a few hundreds small objects.
What is your opinion on that? Really considering the max performance of the read operation.
Since it is 'static' data and that too only a few hundreds small objects, I'd recommend that you go ahead with JSON. SQL should be preferred when operations such as data manipulation, concurrent sessions etc. are involved.
This is not opinion based.
The answer is a flat file.
Reasoning: When leveraging a database, there are defined use cases. triggers, inserts, deletes, updates, etc. All of this is managed by a database language of your choosing.
If you are not leveraging any key aspects of a database, then why do you need the overhead of it.
The best way to approach this situation would be to consolidate the access to a class you create called: StaticService or whatever fits your fancy. In this class you will read in the data and store it in memory as a property. Then have various methods in that service which will get you the data you request.
Even with a Database, you would still implement this kind of service worker, but you dont have this overhead. You can also optimize it as you see fit, but it sounds like you may be looking to display lists, or specific values which are generally o(1) access if the json is designed correctly.
I have a web app, and it has some variables in the main program. These variables are objects with lots of strings or arrays in them.
But then it makes 4 web workers. But it then sends the giant object as messages to each web worker. This basically clones the object 4 times.
I want to use the new sharedarraybuffer datatype http://lucasfcosta.com/2017/04/30/JavaScript-From-Workers-to-Shared-Memory.html to be able to have the web workers be able to access the object from the parent, so it can be more memory-efficient.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer
How can I do this? The example seems to initialize it by making it based on number of bytes needed. I don't understand how I would calculate that.
Does anyone know how to do this?
Thanks
You cannot really use that for the data structure you describe. I mean theoretically you could use getters and setters that would refer to byte offsets in the underlying shared byte array, but that would be really complicated project, most likely not worth the effort unless you're doing it for fun.
You will need to re-think your data structure if you want to use shared memory. You need to ensure that it's constant size and you need to simplify it.
Try to use profiler to see which part of your object is biggest and try start by only sharing that part.
Currently I'm experimenting with localStorage to store a large amount of objects of same type, and I am getting a bit confused.
One way of thinking is to store all the object in an array. But then for each read/write of a single object I need to deserialise/serialise the whole array.
The other way is to directly store each object with its key in the localStorage. This will make accessing each object much easier but I'm worried of the amount of objects that will be stored (tens of thousands). Also, getting all the objects will require iterating the whole localStorage!
I'm wondering which way will be better in your experience? Also, would it be worthwhile to try on more sophisticated client side database like PouchDB?
If you want something simple for storing a large amount of key/values, and you don't want to have to worry about the types, then I recommend LocalForage. You can store strings, numbers, arrays, objects, Blobs, whatever you want. It uses IndexedDB and WebSQL where available, so the storage limits are much higher than LocalStorage.
PouchDB works too, but the API is more complex, and it's better-suited for when you want to sync data with CouchDB on the server.
If you do not want to have a lot of keys, you can:
concat row JSONs with \n and store them as a single key
build and update an index(es) stored under separate keys, each linking some key with a particular row number.
In this case parsing rows is just .split('\n') that is ~2 orders of magnitude faster, then JSON.parse.
Please, notice, that you possibly need special effort to syncronize simultaneously opened tabs. It can be a challenge in complex cases.
localStorage has both good and bad parts.
Good parts:
syncronous;
extremely fast, both read and write are just memcpy – it‘s 100+Mb/s throughput even on weak devices (for example JSON.stringify is in general 5-20 times slower than localStorage.setItem);
thoroughly tested and reliable.
Bad news:
no transactions, so you need an engineering effort to sync tabs;
think you have not more than 2Mb (cause there exist systems with this limit);
2Mb of storage actually mean 1M chars you can save.
These points show borders of localStorage applicability as a DB. LS is good for tasks, where you need syncronicity and speed, and where you can trim you DB to fit into quota.
So localStorage is good for caches and logs. Not more.
I hadn't personally used localStorage to manage so many elements.
However, the pattern I usually use to manage data is to load the complete info database into a javascript object, manage it on memory during the proccess and saving it again to localStorage when the proccess is finished.
Of course, this pattern may not be a good approach to your needings, depending on your project specifications.
If you need to save data constantly, data access could become a problem, and thus probably using some type of small database access is a better option.
If your data volume is exceptionally high it also could be a problem to manage it on memory, however, depending on data model, you'd be able to build it to efficient structures that would allow you to load and save data just when it's needed.
I'm a fairly well versed programmer, so learning new technologies shouldn't be that big of an issue. That being said I'm currently attempting to make a card game in HTML5 using canvas/javascript etc.
The current question that I have is what to use to store instances of the cards. I was thinking about using XML to store the card data, but I'd like to limit the amount of work the browser has to do so the game runs more smoothly, I've heard JSON is a good alternative, but I'm just looking for suggestions. Thanks!
JSON is better in my opinion.
You can serialize objects to JSON at server side and send JSON string to client (browser), then your client will be able to parse JSON string into regular JavaScript object using JSON.parse.
In this way you'll not need to walk through XML to find particular nodes, but will just work with data in more convenient way using native JavaScript objects/arrays.
Also in most cases JSON will be more compact than XML so this can save bandwidth and speed-up data loading.
Also the data types stuff may be important here - JSON represents datatypes correctly (integers, booleans, floats, strings) and XML is storing them as strings so you'll need some additional attributes to set datatype during serialization and determine it during deserialization.
I am not sure how to do this without a framework, but what I would do is use Backbone.JS and create a model of what an instance would look like. Eg:{CardNumber:'2', CardColor: 'red', CardClass: 'hearts'}. Now I would create a collection to hold all these models, see backbone collections.
So I would store all this data client side, and possibly provide the user with an option to save the game, to persist this data to a database. This stores it as JSON and then when you persist it to the database, you can serialize it to get the individual components.
If you dont want to save to the db and do not want to use a framework. Try stack/queue implementations in Javascript. See:How do you implement a Stack and a Queue in JavaScript?
I hope that answers your question.
Stick to JSON because JSON is just a string representation of plain JS objects, and browsers are very comfortable with it. JS have no good XML handling and that will be too expensive.
Use HTML5 localStorage for keeping data until you really need to sync with the server. Frequent server operations will cause your game to suffer. Use bulk data transfers instead of many small server connections (for example at the start and the end).
Consider using a game library if the canvas graphics are intense. I have used http://jawsjs.com sometime back, but there should be better libs available out there. Selectively render only the dynamic objects, not everything on canvas.
JSON in conjunction with localStorage is a great way to go.
There are libraries available to serialize and deserialize Javascript objects and allow you tp store and retrieve it from localStorage. Simple Github search is a good way to start