Choosing an appropriate compression scheme for data transfer over JSON

After some comments by David, I've decided to revise my question. The original question can be found below as well as the newly revised question. I'm leaving the original question simply to have a history as to why this question was started.
Original Question (Setting LZMA properties for jslzma)
I've got some large json files I need to transfer with ajax. I'm currently using jQuery and $.getJSON(). I'd like to use the jslzma library to decompress the files upon receiving them. Currently, I'm using django with the pylzma library to compress the files.
The only problem is that there's a lack of documentation for the jslzma library. There is some, but not enough. So I have two questions about how to use the library.
It gives this as an example:
LZMA.decompress(properties, inStream, outStream, outSize);
I know how to set the inStream and outStream variables, but not the properties or the outSize. So can anyone give an example(s) on how to set the properties variable (ie. what's expected) and how to calculate the outSize...
Thanks.
Edit #1 (Revised Question)
I'm looking for a compression scheme that lends itself to highly repetitive data using python (django) and javascript.
The data being transferred contains elevation measurements. Each file has 1200x1200 data points, which equates to about 2.75MB in its raw, uncompressed binary form. JSON balloons it to between 5 and 6MB. I've also looked into base64 (just to cover all the bases), which would reduce the size, but I haven't had any success reading it in js. I think the data lends itself to easy compression just because of the highly repetitive data values. For example, one file only has 83 unique elevation values to describe 1440000 data points.
I just haven't had much luck, mainly because I'm just starting to learn JavaScript.
So can anyone suggest a compression scheme for this type of data? The goal is to minimize the transfer time by reducing the size for the data.
Thanks.

For what it's worth, LZMA is typically very slow to compress as well as to decompress, so it is more common to use somewhat faster compression schemes. Standard gzip (deflate) strikes a reasonably good balance: its compression ratio is acceptable, and its compression speed is MUCH better than that of LZMA or bzip2.
Also: most web servers and clients support automatic handling of gzip compression, which makes it even more convenient to use.

Decompression on the client side with JavaScript can take significantly longer and depends heavily on the resources of the client's machine. Why not just implement a less effective but faster and easier-to-write scheme such as RLE, delta, or Golomb coding? Or maybe look into compressed JSON?
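To make the RLE and delta suggestions concrete, here is a minimal sketch of both for an array of integer elevation values. The function names and the sample row are made up for illustration; this is not a tuned or complete implementation.

```javascript
// Run-length encoding: store [value, count, value, count, ...].
function rleEncode(values) {
  const out = [];
  let i = 0;
  while (i < values.length) {
    let run = 1;
    while (i + run < values.length && values[i + run] === values[i]) run++;
    out.push(values[i], run);
    i += run;
  }
  return out;
}

function rleDecode(pairs) {
  const out = [];
  for (let i = 0; i < pairs.length; i += 2) {
    for (let n = 0; n < pairs[i + 1]; n++) out.push(pairs[i]);
  }
  return out;
}

// Delta encoding: keep the first value, then store differences
// between neighbours (small numbers for smooth terrain).
function deltaEncode(values) {
  return values.map((v, i) => (i === 0 ? v : v - values[i - 1]));
}

function deltaDecode(deltas) {
  const out = [];
  let acc = 0;
  for (let i = 0; i < deltas.length; i++) {
    acc = i === 0 ? deltas[0] : acc + deltas[i];
    out.push(acc);
  }
  return out;
}

// Hypothetical sample row of elevations.
const row = [120, 120, 120, 120, 121, 121, 122, 122, 122, 122];
const encoded = rleEncode(row);
console.log(encoded);          // value/count pairs
console.log(deltaEncode(row)); // first value followed by differences
```

Both decoders are a single loop, so the client-side cost stays low. The output is still plain JSON, which means gzip on the web server can be applied on top.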

Related

Reading JPEG file to retrieve orientation information

I've been researching ways to retrieve orientation information from a JPEG file in pure JavaScript.
An excellent way to get this information is outlined in this SO answer. Essentially one reads the entire file using readAsArrayBuffer and then processes it for the required information.
However, is it really necessary to read the whole file to retrieve EXIF information? Is there an optimization whereby one can read a subset of bytes when doing this?
For instance, this SO answer seems to suggest the first 20 bytes are good enough for the job. However, the former answer's writer himself asserts that he removed the slice statement because sometimes the tag came in after the limit (he had originally set it to 64KB, i.e. reader.readAsArrayBuffer(file.slice(0, 64 * 1024));)
So what's a rule of thumb one can use when programming this sort of a thing? Or does one not exist at all? I want to write code where performance doesn't get heavily affected by the size (in bytes) of file uploaded by a user. That is my goal.
Note: I've tried Googling this information as well, however haven't found anything meaningful.

Till a more seasoned expert chimes in, I've settled for reader.readAsArrayBuffer(file.slice(0, 128 * 1024));.
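For reference, a sketch of what the parsing step has to do once the bytes are in an ArrayBuffer. JPEG segment lengths are 16-bit, so a single segment cannot exceed roughly 64KB, but other segments (for example a JFIF APP0 header or another APP1 block) may come before the EXIF one, which is why a fixed prefix is a heuristic rather than a guarantee. The function names are made up for illustration.

```javascript
// Returns the EXIF orientation (1-8), or -1 if it is not found in `buffer`
// (an ArrayBuffer holding at least the start of a JPEG file).
function readOrientation(buffer) {
  const view = new DataView(buffer);
  if (view.byteLength < 4 || view.getUint16(0) !== 0xffd8) return -1; // no SOI marker
  let offset = 2;
  while (offset + 4 <= view.byteLength) {
    const marker = view.getUint16(offset);
    const size = view.getUint16(offset + 2); // segment length, includes these 2 bytes
    if ((marker & 0xff00) !== 0xff00 || marker === 0xffda) break; // not a marker, or image data starts
    if (marker === 0xffe1) { // APP1: may hold EXIF (or something else, e.g. XMP)
      const found = parseExif(view, offset + 4, offset + 2 + size);
      if (found !== -1) return found;
    }
    offset += 2 + size;
  }
  return -1;
}

function parseExif(view, start, end) {
  end = Math.min(end, view.byteLength);
  // Need "Exif\0\0" (6 bytes) plus the 8-byte TIFF header.
  if (start + 14 > end || view.getUint32(start) !== 0x45786966) return -1; // "Exif"
  const tiff = start + 6;
  const little = view.getUint16(tiff) === 0x4949; // "II" means little-endian
  const ifd = tiff + view.getUint32(tiff + 4, little);
  if (ifd + 2 > end) return -1;
  const count = view.getUint16(ifd, little);
  for (let i = 0; i < count; i++) {
    const entry = ifd + 2 + i * 12; // each IFD entry is 12 bytes
    if (entry + 12 > end) return -1; // tag lies beyond the bytes we have
    if (view.getUint16(entry, little) === 0x0112) { // Orientation tag
      return view.getUint16(entry + 8, little);
    }
  }
  return -1;
}

// In a browser, assuming `file` is a File from an <input>:
//   file.slice(0, 128 * 1024).arrayBuffer().then(readOrientation)
```

A return value of -1 after reading only a prefix can mean either "no orientation tag" or "the tag lies beyond the slice", so a fallback that reads a larger slice on -1 is one way to keep the common case cheap.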

Is it worth compressing medium-sized JavaScript arrays before sending them to the client through a socket?

I'm just wondering if it's worth it, I'm using nodejs with socket.io and I need to send medium sized arrays to clients which contains small strings and numbers.
Would it be worth it to zip them or something, or would the time spent compressing them defeat the purpose of being faster? The arrays I'm trying to compress are less than 1 MB.
As of now I see no latency but who knows, someone might have slow internet or old devices.

It depends entirely upon how large the arrays are and how much they would benefit from compression - neither of which you have disclosed.
For example, if they were 50k and could be compressed to 40k, that difference would be unlikely to be perceived.
If they were 1MB and could be compressed to 300k, that difference could be meaningful.
You will need to measure how large they typically are and then, if those are in a range where it might make a meaningful difference to compress them, then do some tests on how much they compress.
FYI, you can also look at how exactly the data is being sent over the wire because socket.io's default of JSON is not always the most compact way to format things either. For example, sending a large array of objects is going to repeat property names over and over in the JSON which might benefit a lot from compression, but might benefit even more from using a custom data format that's more compact.
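A quick way to take that measurement in Node is sketched below. The sample data is invented; substitute a representative array from your own application. It compares the JSON size of an array of objects with the same data laid out as column arrays, before and after gzip.

```javascript
const zlib = require('zlib');

// Hypothetical sample data: many small objects with repeated property names.
const rows = [];
for (let i = 0; i < 5000; i++) {
  rows.push({ id: i, name: 'item' + (i % 50), score: i % 100 });
}

// The same data as parallel arrays, so each property name appears only once.
const columns = {
  id: rows.map(r => r.id),
  name: rows.map(r => r.name),
  score: rows.map(r => r.score),
};

// Size in bytes of the JSON text, raw and gzipped.
function sizes(value) {
  const json = Buffer.from(JSON.stringify(value));
  return { raw: json.length, gzipped: zlib.gzipSync(json).length };
}

const asObjects = sizes(rows);
const asColumns = sizes(columns);
console.log('array of objects:', asObjects);
console.log('column arrays:   ', asColumns);
```

Timing `zlib.gzipSync` on the same payload (for example with `console.time`) gives the other half of the trade-off: how long compression takes against how many bytes it saves.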

What is the most efficient way in JavaScript to parse huge amounts of data from a file

What is the most efficient way in JavaScript to parse huge amounts of data from a file?
Currently I use JSON.parse to deserialize an uncompressed 250MB file, which is really slow. Is there a simple and fast way to read a lot of data in JavaScript from a file without looping through every character? The data stored in the file are only a few floating point arrays.
UPDATE:
The file contains a 3D mesh, 6 buffers (vert, uv etc). Also, the buffers need to be presented as typed arrays. Streaming is not an option because the file has to be fully loaded before the graphics engine can continue. Maybe a better question is how to transfer huge typed arrays from a file to JavaScript in the most efficient way.

I would recommend a SAX-based parser or a stream parser for this kind of data.
DOM-style parsing would load the whole thing into memory, which is not the way to go for large files like the one you mention.
For JavaScript-based SAX parsing (of XML), you might refer to
https://code.google.com/p/jssaxparser/
For JSON, you might write your own; the following link demonstrates how to write a basic SAX-based parser in JavaScript:
http://ajaxian.com/archives/javascript-sax-based-parser

Have you tried encoding it as binary and transferring it as a blob?
https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest/Sending_and_Receiving_Binary_Data
http://www.htmlgoodies.com/html5/tutorials/working-with-binary-files-using-the-javascript-filereader-.html#fbid=LLhCrL0KEb6
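A minimal sketch of what such a binary layout could look like for a handful of float buffers. The layout (a count, then one length per array, then the raw floats) and the function names are invented for illustration, and the code assumes writer and reader share the same byte order, which holds on the little-endian machines that are nearly universal in practice.

```javascript
// Layout: [count: uint32] [length of each array: uint32 each] [float32 data ...]
function packFloatArrays(arrays) {
  const headerBytes = 4 * (1 + arrays.length);
  const total = arrays.reduce((n, a) => n + a.length, 0);
  const buffer = new ArrayBuffer(headerBytes + 4 * total);
  const header = new Uint32Array(buffer, 0, 1 + arrays.length);
  header[0] = arrays.length;
  let offset = headerBytes;
  arrays.forEach((a, i) => {
    header[1 + i] = a.length;
    new Float32Array(buffer, offset, a.length).set(a); // copy the floats in
    offset += 4 * a.length;
  });
  return buffer;
}

// Returns Float32Array views onto `buffer`; no parsing and no copying.
function unpackFloatArrays(buffer) {
  const count = new Uint32Array(buffer, 0, 1)[0];
  const lengths = new Uint32Array(buffer, 4, count);
  const result = [];
  let offset = 4 * (1 + count);
  for (let i = 0; i < count; i++) {
    result.push(new Float32Array(buffer, offset, lengths[i]));
    offset += 4 * lengths[i];
  }
  return result;
}

// Hypothetical mesh data: two tiny buffers standing in for vertices and UVs.
const packed = packFloatArrays([
  new Float32Array([1.5, 2.5, -3]),
  new Float32Array([0.25]),
]);
const [verts, uvs] = unpackFloatArrays(packed);
console.log(verts, uvs);
```

In a browser the ArrayBuffer can come from `fetch(url).then(r => r.arrayBuffer())` or from an XMLHttpRequest with `responseType = 'arraybuffer'`. Because the result is a set of typed-array views over the downloaded bytes, there is no per-character parsing step at all.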

There isn't a really good way of doing that, because the whole file is going to be loaded into memory and we all know that all of them have big memory leaks. Can you not instead add some paging for viewing the contents of that file?
Check if there are any plugins that allow you to read the file as a stream, that will improve this greatly.
UPDATE
http://www.html5rocks.com/en/tutorials/file/dndfiles/
You might want to read about the new HTML5 APIs for reading local files. You will still have the issue of downloading 250MB of data, though.

I can think of 1 solution and 1 hack
SOLUTION:
Extending the idea of splitting the data into chunks: it boils down to the HTTP protocol. REST starts from the notion that HTTP has enough "language" for most client-server scenarios.
You can set up a request header (Content-len) on the client to establish how much data you need per request
Then on the backend have some options http://httpstatus.es
Reply a 413 if the server is simply unable to get that much data from the db
417 if the server is able to reply but not under the requested header (Content-len)
206 with the provided chunk, letting the client know "there is more from where that came from"
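The status codes above describe one possible convention. For comparison, HTTP's built-in mechanism for partial transfers is the Range request header, answered with 206 Partial Content plus a Content-Range header, or 416 when the requested range lies outside the data. A minimal sketch of the server-side decision, assuming the data is already in a Node Buffer; `serveRange` is a hypothetical helper, not part of any library, and it only handles the simple `bytes=start-end` form.

```javascript
// Decide how to answer a request for `payload` given its Range header value.
function serveRange(payload, rangeHeader) {
  const full = { status: 200, headers: {}, body: payload };
  const match = /^bytes=(\d+)-(\d*)$/.exec(rangeHeader || '');
  if (!match) return full; // no (or unsupported) Range header: send everything

  const start = Number(match[1]);
  const last = match[2] === '' ? Infinity : Number(match[2]);
  if (last < start) return full; // malformed range: ignore the header

  if (start >= payload.length) {
    return {
      status: 416, // Range Not Satisfiable
      headers: { 'Content-Range': `bytes */${payload.length}` },
      body: Buffer.alloc(0),
    };
  }

  const end = Math.min(last, payload.length - 1);
  return {
    status: 206, // Partial Content
    headers: { 'Content-Range': `bytes ${start}-${end}/${payload.length}` },
    body: payload.subarray(start, end + 1),
  };
}

// Hypothetical payload and request.
const payload = Buffer.from('abcdefghij');
const reply = serveRange(payload, 'bytes=2-5');
console.log(reply.status, reply.headers, reply.body.toString());
```

The client can read the total size from Content-Range and keep requesting the next range until it has everything.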
HACK:
Use Websocket and get the binary file. Then use the html5 FileAPI to load it into memory.
This is likely to fail, though, because it's not the download causing the problem but the parsing of an almost-endless JS object

You're out of luck on the browser. Not only do you have to download the file, but you'll have to parse the json regardless. Parse it on the server, break it into smaller chunks, store that data into the db, and query for what you need.

Json compression for transfer

I was wondering what the current state of javascript based json compression is. Are there any libraries currently available that allow compressing json, either by replacing long names with single characters, or some other method?

Someone has implemented HPack in Javascript, which could really improve JSON data set sizes, assuming your data set is homogeneous.
Since your emphasis is on transfer rather than storage, don't forget to use things like gzip and to minify your JSON. Those should be the first steps before adding yet more compression overhead.
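The general idea behind packing a homogeneous collection can be sketched in a few lines: send the property names once, then one array of values per object. This illustrates the principle only; it is not the HPack library's actual format or API, and the function names are made up. It assumes every object has the same keys as the first one.

```javascript
// Turn [{a: 1, b: 2}, {a: 3, b: 4}] into {keys: ['a', 'b'], rows: [[1, 2], [3, 4]]}.
function packRows(objects) {
  const keys = objects.length ? Object.keys(objects[0]) : [];
  return { keys, rows: objects.map(o => keys.map(k => o[k])) };
}

// Rebuild the original array of objects on the receiving side.
function unpackRows(packed) {
  return packed.rows.map(row => {
    const o = {};
    packed.keys.forEach((k, i) => { o[k] = row[i]; });
    return o;
  });
}

// Hypothetical data set.
const points = [
  { latitude: 1, longitude: 2 },
  { latitude: 3, longitude: 4 },
  { latitude: 5, longitude: 6 },
];
const packedPoints = packRows(points);
console.log(JSON.stringify(points).length, JSON.stringify(packedPoints).length);
```

The saving grows with the number of objects and the length of the property names. Gzip removes much of the same redundancy, so it is worth measuring both together before committing to a custom format.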

C.S. Basics: Understanding Data Packets, Protocols, Wireshark

The Quest
I'm trying to talk to a SRCDS Server from node.js via the RCON Protocol.
The RCON Protocol seems to be explained well enough, and implementations in every major programming language can be found at the bottom of the site. Using those is simple enough, but understanding the protocol and developing a JS library is what I set out to do.
Background
Being a self taught programmer, I skipped a lot of Computer Science Basics - learned only what I needed, to accomplish what I wanted. I started coding with PHP, eventually wrapped my head around OO, talked to databases etc. I'm currently programming with JavaScript, more specifically doing web stuff with node.js ..
Binary Data?!?!
I've read and understood the absolute binary basics. But when it comes to the packet data I'm totally lost. I'd like to read and understand the wireshark output, but I can't make any sense of it. My biggest problem is probably that I don't understand what the binary representation of the various INT and STRING (char ..) types from JS looks like and how I convert the data I got from the server into something usable in the program.
Help
So I'd be more than grateful if someone can point me to a tutorial on these topics. Tutorial as in "explanation that mere mortals can understand, preferably not written by a C.S. professor". :)
When I'm looking at the PHP reference implementation I see (too much) magic happening there which I can't translate to JS. Sending and reading data from a socket is no problem, but I need to know how PHP's unpack function works and how I can do the same in JS with node.js.
So I hope you can see what I'm trying to accomplish here. First and foremost is understanding the whole theory needed to make implementing the protocol a breeze. But because I'm only good with scripting languages it would be incredibly helpful if someone could guide me a bit in the HOWTO part in PHP/JS..
Thank you so much for your time!

I applaud the low level protocol pursuit.
I'll tell you the path I took. My approach was to use the client and server that already spoke the protocol and use libpcap to do analysis. I created a library that was able to unpack the custom protocol I was analyzing during this phase.
It's super helpful to start with diagrams like the TCP header diagram from the wiki on TCP. It's an incredibly useful way to visualize the structure of the binary data. It's tightly packed, so slicing it apart requires attention to detail.
Buffers and Binary
I read up on Buffer. It's the way you deal with binary in node. http://nodejs.org/docs/v0.4.8/api/buffers.html -- the first thing to realize here is that buffers can be accessed byte by byte via array syntax, i.e. buffer[0] and such.
Visualization
It's helpful to be able to dump your binary data into a hex representation. I used https://github.com/a2800276/hexy.js to achieve this.
node_pcap
I grabbed https://github.com/mranney/node_pcap -- this is the equivalent to wireshark, but you can programmatically poke at all outgoing and incoming traffic. I added udp payload support: https://github.com/jmoyers/node_pcap/commit/2852a8123486339aa495ede524427f6e5302326d
I read through all mranney's "unpack" code https://github.com/mranney/node_pcap/blob/master/pcap.js#L116-171
I found https://github.com/rmustacc/node-ctype
I read through all their "unpack" code https://github.com/rmustacc/node-ctype/blob/master/ctio.js
Now, things to remember when you're looking through this stuff. Most of the time they're taking a binary Buffer representation and converting to a native javascript type, like say Number or String. They'll use advanced techniques to do so -- bitwise operations like shifts and such. You don't necessarily need to understand all that.
The key things are:
1) endianness -- the ordering of bytes (network and host byte order can be the reverse of each other), as this pertains to how things are unpacked
2) Javascript Number representation is quirky -- node-ctype goes into detail in the comments about how they convert the various number types in javascript's Number. Integer, float, double etc are all Number in javascript land.
In the end, it's likely fine if you just USE these unpackers for your adventures. I ended up having to unpack things that weren't covered in these libraries, like GUIDs and such, and it was tremendously helpful to study the source.
Isolate the traffic you're looking at
Filter, filter, filter. Target one host. Target one direction. Target one message type. Focus on stripping off data that has a known fixed length first -- often times the header in a protocol is a good place to start. Once you get the header unpacking into a nice json structure from binary, you are well on your way.
After that, it's one field at a time, top to bottom, one message at a time. You can use Buffer#slice and the unpack functions from node-ctype to grab each piece of data at a time.
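As a concrete example of "one field at a time", here is a sketch using the read and write methods built into current versions of Node's Buffer. The packet layout is an assumption based on my understanding of the Source RCON protocol (a little-endian 32-bit size, id, and type, followed by an ASCII body and two null bytes); check it against the protocol documentation before relying on it. The function names and sample values are made up.

```javascript
// Build a packet: [size][id][type][body][0x00][0x00], integers little-endian.
function encodePacket(id, type, body) {
  const bodyBytes = Buffer.from(body, 'ascii');
  const size = 4 + 4 + bodyBytes.length + 2; // id + type + body + two null bytes
  const packet = Buffer.alloc(4 + size);     // zero-filled, so the nulls are already in place
  packet.writeInt32LE(size, 0);
  packet.writeInt32LE(id, 4);
  packet.writeInt32LE(type, 8);
  bodyBytes.copy(packet, 12);
  return packet;
}

// Unpack the fixed-length header first, then slice out the body.
function decodePacket(packet) {
  const size = packet.readInt32LE(0);
  return {
    size,
    id: packet.readInt32LE(4),
    type: packet.readInt32LE(8),
    body: packet.toString('ascii', 12, 4 + size - 2), // drop the two trailing nulls
  };
}

// Hypothetical request: id 7, type 3, body "secret".
const packet = encodePacket(7, 3, 'secret');
console.log(packet.toString('hex')); // hex dump, comparable with Wireshark output
console.log(decodePacket(packet));
```

Reading the hex dump byte by byte next to the field list is a good way to see endianness in action: the least significant byte of each 32-bit integer comes first.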
