I've been reading up on JSON and serialization. From what I understand, JSON is a format often used for transferring data over a network (e.g. to or from a web server) or for storing data on disk.
The data could be strings, numbers, objects, etc. I haven't found a clear explanation of why serialization is needed. For instance, when sending a string to a web server or saving it to disk, isn't the string already stored as a series of bits and bytes by the computer? Isn't this the most basic form of the data? So why can't these be sent or stored as they are?
Why does it need to be stringified into JSON, i.e. serialized, which turns it into a string?
To be clear, I'm asking why it's needed and a simple clear explanation for that.
Thanks
Broadly speaking, serialization does two important, mostly independent jobs:
1. It collects all the information into a single self-contained "chunk" (stream) of data, and
2. It turns all the information into an agreed-on format (usually optimized for either compactness or ease of parsing).
#1 is important because a single object with many properties and sub-objects can be spread all over the memory of a running program.
For example, a JavaScript runtime could have a dedicated memory pool for string constants. An object that uses some constant as a key would then just reference into that pool from its data structure. That means the object is no longer in a single self-contained block in memory: it's spread out over multiple areas. This kind of spreading-out is actually the norm: objects don't usually contain complex data directly and, depending on the language, even "primitive" values such as numbers could be stored as references to another place in memory.
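A small sketch of what "collecting into one chunk" means in practice. The object names here are made up for illustration; the point is only that two objects referencing the same nested object end up as one self-contained string, and the receiver builds fresh, independent objects from it.

```javascript
// In memory, `address` exists once and is referenced from two places.
const address = { city: "Oslo" };
const alice = { name: "Alice", address };
const bob = { name: "Bob", address };

// Serializing follows the references and copies everything into one string.
const chunk = JSON.stringify([alice, bob]);

// The receiver rebuilds its own objects from that string.
const [alice2, bob2] = JSON.parse(chunk);

// The original objects share one address object; the rebuilt ones do not.
const sharedBefore = alice.address === bob.address;
const sharedAfter = alice2.address === bob2.address;
console.log(chunk, sharedBefore, sharedAfter);
```

Note that the string contains the city twice: the in-memory sharing is a detail of the running program, not part of the data being sent.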
#2 is important mostly because the format used to quickly access data in-memory might not be suitable to transfer (because it might contain unnecessary redundancy or memory pointers that don't make any sense when transferred to another computer, which partially ties to reason #1).
An example of that would be a map (or dictionary): the in-memory representation will usually involve multiple buckets that hold hashed-values and some kind of collision-handling structure inside those buckets (a linked list or a tree, for example). That structure helps with efficient access to separate keys, but transferring that structure directly over the wire is pointless: it's very easy to re-build and there's no guarantee that the receiving end uses the exact same way to represent a map. So instead we just send each key and the associated value and let the receiving end deal with re-constructing any data structures it needs for efficient access.
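As a rough illustration of the map case in JavaScript (the variable names are just for this sketch): `JSON.stringify` does not know how to serialize a `Map`, so one common approach is to send only the key/value pairs and let the receiver build whatever structure it likes.

```javascript
const inventory = new Map([
  ["apples", 3],
  ["pears", 5],
]);

// A Map has no own enumerable properties, so stringifying it directly loses the data.
const naive = JSON.stringify(inventory);

// Send just the entries (key/value pairs), not the hash-table internals.
const mapWire = JSON.stringify([...inventory]);

// The receiving end re-creates its own map; buckets and hashing are rebuilt locally.
const rebuilt = new Map(JSON.parse(mapWire));
console.log(naive, mapWire, rebuilt.get("pears"));
```

A receiver written in another language could parse the same string into its own dictionary type.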
The simple reason is that data can be stored differently in memory on different computers, or even by programs on the same computer written in different programming languages.
Serialization formats like JSON provide a defined way for exchanging data between computers or programs.
Not everyone knows how to parse or interpret those series of bits. Sometimes you need some general structure, some format, that can be passed around so that other people understand what it is you're trying to tell them.
So I have an app that needs to JSON.stringify its data to put into localStorage, but as the data gets larger, this operation gets outrageously expensive.
So I tried moving this onto a Web Worker so it's off the main thread, but I'm now learning that posting an object to a Web Worker is even more expensive than stringifying it.
So I guess I'm asking, is there any way whatsoever to get JSON.stringify off the main thread, or at least make it less expensive?
I'm familiar with fast-json-stringify, but I don't think I can feasibly provide a complete schema every time...
You have correctly observed that passing an object to a web worker costs as much as serializing it. This is because web workers also need to receive serialized data, not native JS objects: object instances are bound to the JS thread they were created in.
The generic solution applies to many programming problems: choose the right data structures when working with large datasets. When data gets larger, it's better to sacrifice simplicity of access for performance. So do any of the following:
Store data in IndexedDB
If your large object contains lists of the same kind of entry, use IndexedDB for reading and writing and you don't need to worry about serialization at all. This will require a refactor of your code, but it is the correct solution for large datasets.
Store data in ArrayBuffer
If your data is mostly fixed-size values, use an ArrayBuffer. An ArrayBuffer can be copied or moved to a web worker pretty much instantly, and if your entries are all the same size, serialization can be done in parallel. For access, you can write simple wrapper classes that translate your binary data into something more readable.
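Here is a minimal sketch of such a wrapper class. The record layout (a 4-byte id followed by an 8-byte score) and the `RecordStore` name are assumptions made up for this example, not something from your app.

```javascript
// Hypothetical layout: bytes 0-3 = id (uint32), bytes 4-11 = score (float64).
const RECORD_SIZE = 12;

class RecordStore {
  constructor(capacity) {
    this.buffer = new ArrayBuffer(capacity * RECORD_SIZE);
    this.view = new DataView(this.buffer);
  }

  // Write one record at a fixed offset; `true` selects little-endian byte order.
  set(index, { id, score }) {
    const offset = index * RECORD_SIZE;
    this.view.setUint32(offset, id, true);
    this.view.setFloat64(offset + 4, score, true);
  }

  // Read the record back as a plain, readable object.
  get(index) {
    const offset = index * RECORD_SIZE;
    return {
      id: this.view.getUint32(offset, true),
      score: this.view.getFloat64(offset + 4, true),
    };
  }
}

const store = new RecordStore(2);
store.set(0, { id: 7, score: 1.5 });
store.set(1, { id: 8, score: -2.25 });
console.log(store.get(1), store.buffer.byteLength);
```

In a browser, the underlying buffer could then be handed to a worker with `worker.postMessage(store.buffer, [store.buffer])`, which transfers it instead of copying it. The buffer becomes unusable on the sending side after that, so the wrapper would have to be re-created from whatever the worker sends back.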
I recently found out that JSON values can only store string, number, object, array, true, false or null. But from my understanding, JSON is how Javascript represents its objects internally. I don't understand how it is possible to store Javascript objects as JSON if most objects have methods, which are functions? Aren't functions objects? What the heck are functions in my Javascript interpreter's (Node.js) opinion and how does it represent them? Thanks!
JSON is a string interchange format. It stands for JavaScript Object Notation. It was invented long after Javascript and has nothing at all to do with how Javascript stores data internally.
JSON is typically used as an interchange format or storage format. One would take some Javascript data, serialize it to the JSON format and take the resulting string and send it to another process or computer or save it in some sort of storage.
The recipient of the JSON can then parse it back into whatever their local data is. JSON is even used to send data from a Javascript program to a program written in another language (Python, Ruby, C++, etc...).
Functions have no connection at all to JSON. They are not stored in JSON. Their internal storage format inside the JS interpreter is specific to the interpreter implementation and is not accessible to the outside world or governed by any standard. It's an implementation detail of each Javascript engine, they can do it however they want, and each interpreter likely has its own implementation or variation. I don't know of any reason why it would matter to your Javascript code.
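You can see this for yourself with a quick experiment (the `card` object is just an example I made up): functions are objects as far as the language is concerned, but `JSON.stringify` simply leaves them out.

```javascript
const card = {
  rank: "Q",
  describe() {
    return "Queen";
  },
};

// Functions are callable objects inside the language...
const kind = typeof card.describe;
const isObject = card.describe instanceof Object;

// ...but JSON has no function type, so methods are skipped when serializing.
const asJson = JSON.stringify(card);

// Inside an array, a function is replaced with null to keep the positions intact.
const inArray = JSON.stringify([card.describe]);

console.log(kind, isObject, asJson, inArray);
```

So a round trip through JSON gives you back the data only; any methods have to come from the code on the receiving side.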
I recently found out that JSON values can only store string, number, object, array, true, false or null. But from my understanding, JSON is how Javascript represents its objects internally.
That is not correct. JSON is not something that the Javascript interpreter uses for its objects internally. Internal object formats are specific to a particular Javascript interpreter and are not accessible to Javascript code, nor really relevant when writing code.
I don't understand how it is possible to store Javascript objects as JSON if most objects have methods, which are functions?
Javascript does not use JSON for internal storage so it has nothing at all to do with the internal implementation of Javascript data types.
Aren't functions objects?
Yes, but they have nothing to do with JSON.
What the heck are functions in my Javascript interpreter's (Node.js) opinion and how does it represent them?
Each JS interpreter has its own internal implementation/storage for functions. It is not governed by any standard and is largely irrelevant to how you write code in Javascript.
If you had some reason to want to know how a specific Javascript implementation stores its variables internally, you would have to look into the source code. The V8 implementation from Google (used in Chrome and node.js) and the Firefox implementation from Mozilla are both open source and you could dive into that code (it would be mostly C++ code).
This can get pretty complicated because some data types such as Arrays are stored in a variety of different formats depending upon the structure of the array. I believe V8 has at least three storage formats for arrays depending upon whether the array is compacted or sparse and based on its overall size. This is to optimize for both memory consumption and run-time performance.
Likewise properties on objects may be arranged in highly optimized storage formats if the interpreter has advance information from the code about what is being used and what is not, compared to arbitrary programmatically generated properties.
FYI, you can find the Google repository here: https://chromium.googlesource.com/v8/v8.git and the Mozilla code here: https://hg.mozilla.org/.
I'm just wondering why everyone uses ArrayBuffer instead of just a normal array, string or stringified JSON for sending messages from the server to the client. Is it more efficient?
Also, just wondering what Uint8Array is, how it is different, where to use the two etc.
I am currently using Node.js with Socket.io, but I am happy to change to pure WebSockets if it is a better approach.
An ArrayBuffer is more than just a simple array. It contains raw binary data. This is very useful for direct memory manipulation and conserving space.
When you create a normal array, you won't get a proper set of contiguous memory in many cases since arrays can contain any combination of different kinds of objects. This is useful when you have dynamic data and multiple types together (frequently happens in JS) but is not so useful when you know the exact layout of memory that you need.
This also allows you to view the data at the byte level. For example, it's pretty common in binary data formats to have an n-byte identifier number, an m-byte field telling you how many bytes of data follow, and then that many bytes making up the actual data field.
[ identifier ][ bytes of data ][ data ]
With an ArrayBuffer, you have the option of moving through that data on the byte level by using various Views. A regular array doesn't allow you to move through the data with that level of granularity because no guarantees are made about the memory layout.
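To make that concrete, here is a small sketch that walks such a message with a `DataView`. The field sizes (a 2-byte identifier, a 1-byte length, then the data) are assumptions chosen for the example, not any particular real format.

```javascript
// Sample message: identifier 0x0102, length 3, then three data bytes.
const message = new Uint8Array([0x01, 0x02, 3, 10, 20, 30]).buffer;

const reader = new DataView(message);

// DataView reads multi-byte values as big-endian unless told otherwise.
const identifier = reader.getUint16(0);

// The next byte says how many data bytes follow.
const dataLength = reader.getUint8(2);

// A typed-array view over just the data bytes; nothing is copied.
const payload = new Uint8Array(message, 3, dataLength);

console.log(identifier, dataLength, Array.from(payload));
```

The same buffer is looked at three different ways here (16-bit number, 8-bit number, run of bytes) without ever being copied or converted.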
Finally, because you are telling the compiler/interpreter exactly how much space you're going to use and exactly how you're going to view it, it can do much more advanced optimizations when working with that data. When iterating through that data, it doesn't have to make calculated leaps through memory. Instead, it knows exactly how far to move ahead in memory to find the next data point.
As for what Uint8Array is, it's a typed array. Essentially, it tells the compiler/interpreter that you will be accessing this data exclusively as 8-bit uints which, again, allows it to make better optimizations. Then you can use standard array indexing on it (arr[0], arr[1], etc.) and you'll be getting back the equivalent uint values out of the array.
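A short sketch of what "exclusively 8-bit uints" means in practice: every element is forced into the range 0 to 255 on write, and several views can sit on top of the same ArrayBuffer.

```javascript
const bytes = new Uint8Array(4); // four bytes, initially all zero

bytes[0] = 255; // fits as-is
bytes[1] = 256; // wraps around to 0
bytes[2] = -1;  // wraps around to 255
bytes[3] = 3.7; // fractional part is dropped, stored as 3

// A second view over the last two bytes of the same underlying buffer.
const tail = new Uint8Array(bytes.buffer, 2, 2);
tail[0] = 9; // writes through to bytes[2]

console.log(Array.from(bytes), bytes.byteLength);
```

A regular array would happily have stored 256, -1 and 3.7 unchanged; the typed array trades that flexibility for a fixed, predictable layout.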
TL;DR: They take less space when the exact data format is known, allow you to move more precisely through your data, and give the compiler/interpreter greater options for optimization.
Currently I'm experimenting with localStorage to store a large amount of objects of same type, and I am getting a bit confused.
One option is to store all the objects in an array. But then for each read/write of a single object I need to deserialise/serialise the whole array.
The other option is to store each object directly under its own key in localStorage. This will make accessing each object much easier, but I'm worried about the number of objects that will be stored (tens of thousands). Also, getting all the objects will require iterating over the whole of localStorage!
I'm wondering which way is better in your experience? Also, would it be worthwhile to try a more sophisticated client-side database like PouchDB?
If you want something simple for storing a large amount of key/values, and you don't want to have to worry about the types, then I recommend LocalForage. You can store strings, numbers, arrays, objects, Blobs, whatever you want. It uses IndexedDB and WebSQL where available, so the storage limits are much higher than LocalStorage.
PouchDB works too, but the API is more complex, and it's better-suited for when you want to sync data with CouchDB on the server.
If you do not want to have a lot of keys, you can:
concatenate the rows' JSON strings with \n and store them under a single key
build and update one or more indexes stored under separate keys, each linking some key to a particular row number.
In this case splitting out the rows is just .split('\n'), which is ~2 orders of magnitude faster than JSON.parse.
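A rough sketch of that layout. To keep it runnable outside a browser, `storage` below is an in-memory stand-in with the same `getItem`/`setItem` shape as `localStorage`; the key names (`rows`, `rows:index`) and function names are made up for the example.

```javascript
// In-memory stand-in for window.localStorage (an assumption for this sketch).
const storage = {
  data: new Map(),
  getItem(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  },
  setItem(key, value) {
    this.data.set(key, String(value));
  },
};

function saveRows(rows) {
  // JSON.stringify escapes newlines inside strings, so "\n" is a safe row separator.
  storage.setItem("rows", rows.map((row) => JSON.stringify(row)).join("\n"));

  // Index stored under a separate key: row id -> row number.
  const index = {};
  rows.forEach((row, i) => {
    index[row.id] = i;
  });
  storage.setItem("rows:index", JSON.stringify(index));
}

function loadRow(id) {
  const index = JSON.parse(storage.getItem("rows:index") ?? "{}");
  const lines = (storage.getItem("rows") ?? "").split("\n");
  const line = lines[index[id]];
  // Only the one requested row is parsed, not the whole collection.
  return line === undefined ? undefined : JSON.parse(line);
}

saveRows([
  { id: "a", text: "first\nsecond" },
  { id: "b", text: "other" },
]);
console.log(loadRow("a"), loadRow("missing"));
```

In a real page you would swap the stand-in for `localStorage` itself; the trade-off is that every write still rewrites the whole `rows` value.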
Please note that you may need special effort to synchronize tabs that are open simultaneously. This can be a challenge in complex cases.
localStorage has both good and bad parts.
Good parts:
synchronous;
extremely fast: both read and write are essentially a memcpy, with 100+ MB/s throughput even on weak devices (for example, JSON.stringify is in general 5-20 times slower than localStorage.setItem);
thoroughly tested and reliable.
Bad parts:
no transactions, so you need an engineering effort to sync tabs;
assume you have no more than 2 MB (because there are systems with this limit);
2 MB of storage actually means about 1M characters you can save (strings are typically stored as 2-byte UTF-16 code units).
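The arithmetic behind that last point, as a tiny sketch. The 2 bytes per character figure is an assumption about how the browser stores strings (UTF-16 code units), and the helper names are made up for the example.

```javascript
// Assumption: each character (UTF-16 code unit) costs 2 bytes of quota.
const BYTES_PER_CHAR = 2;

// How many characters fit into a quota given in bytes.
function maxChars(quotaBytes) {
  return Math.floor(quotaBytes / BYTES_PER_CHAR);
}

// Approximate cost of one entry: both the key and the value count.
function entrySizeBytes(key, value) {
  return (key.length + value.length) * BYTES_PER_CHAR;
}

const quota = 2 * 1024 * 1024; // 2 MB
const capacity = maxChars(quota);
const size = entrySizeBytes("user", JSON.stringify({ name: "Ann" }));
console.log(capacity, size);
```

Treat the numbers as an estimate only; actual quotas and accounting differ between browsers.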
These points show the limits of localStorage's applicability as a DB. localStorage is good for tasks where you need synchronicity and speed, and where you can trim your DB to fit into the quota.
So localStorage is good for caches and logs. Not more.
I haven't personally used localStorage to manage so many elements.
However, the pattern I usually use to manage data is to load the complete database into a JavaScript object, manage it in memory during the process, and save it back to localStorage when the process is finished.
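Roughly, that pattern looks like the sketch below. `fakeStorage` is an in-memory stand-in for `localStorage` so the snippet runs anywhere, and the `notes` key and helper names are just placeholders.

```javascript
// In-memory stand-in for window.localStorage (an assumption for this sketch).
const fakeStorage = {
  data: new Map(),
  getItem(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  },
  setItem(key, value) {
    this.data.set(key, String(value));
  },
};

// Load the whole database once; start empty if nothing was saved yet.
function loadDb(key) {
  const raw = fakeStorage.getItem(key);
  return raw === null ? {} : JSON.parse(raw);
}

// Write the whole database back in one go.
function saveDb(key, db) {
  fakeStorage.setItem(key, JSON.stringify(db));
}

const db = loadDb("notes");

// All reads and writes during the process happen on the plain object.
db.n1 = { title: "first" };
db.n2 = { title: "second" };
delete db.n1;

saveDb("notes", db);
console.log(loadDb("notes"));
```

The cost is one full parse at the start and one full stringify at the end, which is why it stops being attractive once you need to save constantly.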
Of course, this pattern may not suit your needs, depending on your project specifications.
If you need to save data constantly, data access could become a problem, and thus probably using some type of small database access is a better option.
If your data volume is exceptionally high it also could be a problem to manage it on memory, however, depending on data model, you'd be able to build it to efficient structures that would allow you to load and save data just when it's needed.
I'm a fairly well versed programmer, so learning new technologies shouldn't be that big of an issue. That being said I'm currently attempting to make a card game in HTML5 using canvas/javascript etc.
My current question is what to use to store instances of the cards. I was thinking about using XML to store the card data, but I'd like to limit the amount of work the browser has to do so the game runs more smoothly. I've heard JSON is a good alternative, but I'm just looking for suggestions. Thanks!
JSON is better in my opinion.
You can serialize objects to JSON on the server side and send the JSON string to the client (browser); the client can then parse the JSON string into a regular JavaScript object using JSON.parse.
This way you won't need to walk through XML to find particular nodes; you'll just work with the data more conveniently using native JavaScript objects/arrays.
Also, in most cases JSON will be more compact than XML, which can save bandwidth and speed up data loading.
Data types may also matter here: JSON preserves basic data types (numbers, booleans, strings, null), whereas XML stores everything as text, so you'd need additional attributes to record the data type during serialization and determine it during deserialization.
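A quick sketch of what that looks like on the client. The card fields are invented for the example, and `received` stands in for whatever string the server would actually send.

```javascript
// Pretend this string arrived from the server.
const received =
  '{"rank":12,"suit":"hearts","faceUp":true,"weight":0.5,"owner":null}';

// One call turns it into a regular JavaScript object.
const cardData = JSON.parse(received);

// The values come back with their types intact; no extra type attributes needed.
const types = {
  rank: typeof cardData.rank,
  suit: typeof cardData.suit,
  faceUp: typeof cardData.faceUp,
  weight: typeof cardData.weight,
};

console.log(types, cardData.owner === null, cardData.rank + 1);
```

With XML you would get `"12"` and `"true"` as text and would have to convert them yourself.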
I am not sure how to do this without a framework, but what I would do is use Backbone.js and create a model of what an instance would look like, e.g. {CardNumber: '2', CardColor: 'red', CardClass: 'hearts'}. Then I would create a collection to hold all these models (see Backbone collections).
So I would store all this data client side, and possibly provide the user with an option to save the game, to persist this data to a database. This stores it as JSON and then when you persist it to the database, you can serialize it to get the individual components.
If you don't want to save to the DB and don't want to use a framework, try stack/queue implementations in Javascript. See: How do you implement a Stack and a Queue in JavaScript?
I hope that answers your question.
Stick to JSON, because JSON is just a string representation of plain JS objects and browsers are very comfortable with it. JS has no comparably good XML handling, and XML would be too expensive.
Use HTML5 localStorage for keeping data until you really need to sync with the server. Frequent server operations will cause your game to suffer. Use bulk data transfers instead of many small server connections (for example at the start and the end).
Consider using a game library if the canvas graphics are intense. I used http://jawsjs.com some time back, but there should be better libraries available out there. Selectively render only the dynamic objects, not everything on the canvas.
JSON in conjunction with localStorage is a great way to go.
There are libraries available to serialize and deserialize Javascript objects and allow you to store and retrieve them from localStorage. A simple GitHub search is a good way to start.