I'm working on a JavaScript compression library. I've already created a relatively fast DEFLATE compressor and decompressor, but they require the data to be fully loaded in memory prior to use. I don't think adding streaming support for the compression should be too difficult; I can just compress whatever data is available into a set of full blocks, append the result to the output stream, and avoid setting the BFINAL marker until the final chunk is passed. However, an issue arises with decompression streams.
Since my code currently needs to read a full chunk to generate an output and does not preserve state, the best I can do is guess how long a chunk is and hope that I reach the end, because if I don't, I've wasted multiple CPU cycles reading the headers, Huffman codes, and length/literal and distance codes of the unfinished block, and I'll have to do that all over again. This seems to be an awful way of solving the problem, and I'm wondering if there's any other way to do this that doesn't involve rewriting my code to preserve state.
Current decompression code
Nope, you'll need to preserve state. Or be prepared to read the entire stream into memory. There is nothing that prevents a deflate stream from consisting of a single long deflate block.
Related
I'm writing a client-side upload script that pre-chunks the input data before sending.
I was just wondering if this could potentially lead to a huge memory spike, calling the slice method on the potentially very big File (e.g., slicing a 2GB file into 1000x2MB chunks) before starting the upload, as opposed to slicing it just before each chunk upload starts.
I figured the browser is not actually handling the data with the slice method, it's just returning a new instance of Blob with a different start and end range, but I just want to be sure.
I know it's easily testable but I'm not sure if I can trust a test result on Chrome would apply to all browsers.
What is the most efficient way in JavaScript to parse huge amounts of data from a file?
Currently I use JSON parse to serialize an uncompressed 250MB file, which is really slow. Is there a simple and fast way to read a lot of data in JavaScript from a file without looping through every character? The data stored in the file are only a few floating point arrays?
UPDATE:
The file contains a 3d mesh, 6 buffers (vert, uv etc). Also the buffers need to be presented as typed arrays. streaming is not a option because the file has to be fully loaded before the graphics engine can continue. Maybe a better question is how to transfer huge typed arrays from a file to javascript in the most efficient way.
I would recommend a SAX based parser for these kind of JavaScript or a stream parser.
DOM parsing would load the whole thing in memory and this is not the way to go by for large files like you mentioned.
For Javascript based SAX Parsing (in XML) you might refer to
https://code.google.com/p/jssaxparser/
and
for JSON you might write your own, the following link demonstrates how to write a basic SAX based parser in Javascript
http://ajaxian.com/archives/javascript-sax-based-parser
Have you tried encoding it to a binary and transferring it as a blob?
https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest/Sending_and_Receiving_Binary_Data
http://www.htmlgoodies.com/html5/tutorials/working-with-binary-files-using-the-javascript-filereader-.html#fbid=LLhCrL0KEb6
There isn't a really good way of doing that, because the whole file is going to be loaded into memory and we all know that all of them have big memory leaks. Can you not instead add some paging for viewing the contents of that file?
Check if there are any plugins that allow you to read the file as a stream, that will improve this greatly.
UPDATE
http://www.html5rocks.com/en/tutorials/file/dndfiles/
You might want to read about the new HTML5 API's to read local files. You will have the issue with downloading 250mb of data still tho.
I can think of 1 solution and 1 hack
SOLUTION:
Extending the split the data in chunks: it boils down to http protocol. REST parts on the notion that http has enough "language" for most client-server scenarios.
You can setup on the client a request header Content-len to establish how much data you need per request
Then on the backend have some options http://httpstatus.es
Reply a 413 if the server is simply unable to get that much data from the db
417 if the server is able to reply but not under the requested header (Content-len)
206 with the provided chunk, letting know the client "there is more from where that came from"
HACK:
Use Websocket and get the binary file. Then use the html5 FileAPI to load it into memory.
This is likely to fail though because its not the download causing the problem, but the parsing of an almost-endless JS object
You're out of luck on the browser. Not only do you have to download the file, but you'll have to parse the json regardless. Parse it on the server, break it into smaller chunks, store that data into the db, and query for what you need.
For small size image what's (if any) the benefit in loading time using base64 encoded image in a javascript file (or in a plain HTML file)?
$(document).ready(function(){
var imgsrc = "../images/icon.png";
var img64 = "P/iaVYUy94mcZxqpf9cfCwtPdXVmBfD49NHxwMraWV/iJErLmNwAGT3//w3NB";
$('img.icon').attr('src', imgsrc); // Type 1
$('img.icon').attr('src', 'data:image/png;base64,' + img64); // Type 2 base64
});
The benefit is that you have to make one less HTTP request, since the image is "included" in a file you have made a request for anyway. Quantifying that depends on a whole lot of parameters such as caching, image size, network speed, and latency, so the only way is to measure (and the actual measurement would certainly not apply to everyone everywhere).
I should mention that another common approach to minimizing the number of HTTP requests is by using CSS sprites to put many images into one file. This would arguably be an even more efficient approach, since it also results in less bytes being transferred over (base64 bloats the byte size by a factor of about 1.33).
Of course, you do end up paying a price for this: decreased convenience of modifying your graphics assets.
You need to make multiple server requests, lets say you download a contrived bit of HTML such as:
<img src="bar.jpg" />
You already needed to make a request to get that. A TCP/IP socket was created, negotiated, downloaded that HTML, and closed. This happens for every file you download.
So off your browser goes to create a new connection and download that jpg, P/iaVYUy94mcZxqpf9cfCwtPdXVmBfD49NHxwMraWV/iJErLmNwAGT3//w3NB
The time to transfer that tiny bit of text was massive, not because of the file download, but simply because of the negotiation to get to the download part.
That's a lot of work for one image, so you can in-line the image with base64 encoding. This doesn't work with legacy browsers mind you, only modern ones.
The same idea behind base64 inline data is why we've done things like closure compiler (optimizes speed of download against execution time), and CSS Spirtes (get as much data from one request as we can, without being too slow).
There's other uses for base64 inline data, but your question was about performance.
Be careful not to think that the HTTP overhead is so massive and you should only make one request-- that's just silly. You don't want to go overboard and inline all the things, just really trivial bits. It's not something you should be using in a lot of places. Seperation of concerns is good, don't start abusing this because you think your pages will be faster (they'll actually be slower because the download for a single file is massive, and your page won't start pre-rendering till it's done).
It saves you a request to the server.
When you reference an image through the src-property, it'll load the page, and then do the additional request to fetch the image.
When you use the base64 encoded image, it'll save you that delay.
After some comments by David, I've decided to revise my question. The original question can be found below as well as the newly revised question. I'm leaving the original question simply to have a history as to why this question was started.
Original Question (Setting LZMA properties for jslzma)
I've got some large json files I need to transfer with ajax. I'm currently using jQuery and $.getJSON(). I'd like to use the jslzma library to decompress the files upon receiving them. Currently, I'm using django with the pylzma library to compress the files.
The only problem is that there's a lack of documentation for the jslzma library. There is some, but not enough. So I have two questions about how to use the library.
It gives this as an example:
LZMA.decompress(properties, inStream, outStream, outSize);
I know how to set the inStream and outStream variables, but not the properties or the outSize. So can anyone give an example(s) on how to set the properties variable (ie. what's expected) and how to calculate the outSize...
Thanks.
Edit #1 (Revised Question)
I'm looking for a compression scheme that lends itself to highly repeatable data using python (django) and javascript.
The data being transferred contains elevation measurements. Each file has 1200x1200 data points, which equates to about 2.75MB in it's raw binary form uncompressed. JSON balloons it to between 5-6MB. I've also looked into base64 (just to cover all the bases), which would reduce the size but I haven't had any success reading it in js. I think the data lends itself to easy compression just because of the highly repeatable data values. For example, one file only has 83 unique elevation values to describe 1440000 data points.
I just haven't had much luck, mainly because I'm just starting to learn JavaScript.
So can anyone suggest a compression scheme for this type of data? The goal is to minimize the transfer time by reducing the size for the data.
Thanks.
For what it's worth LZMA is typically very slow to compress as well as decompress; and thus it is more common to use bit faster compression schemes. Standard GZIP (deflate) has reasonably good balance: its compression ratio is acceptable, and its compression speed is MUCH better than that of LZMA or bzip2.
Also: most web servers and clients support automatic handling of gzip compression, which makes it even more convenient to use.
Decompression on the client side with Javacscript can take a significant longer time and highly depends on the available bandwidth of the client's box. Why not just implement a lesser but faster and easier to write decompression like rle, delta or golomb code? Or maybe you want to look into compressed Jsons?
I was wondering, do whitespaces and comments slow down JavaScript? I'm doing a brute force attack which takes some time (30 seconds). Removing whitespaces does not show a significant growth in speed, but I think the browser just does have to parse more.
So, is it of any use to remove unnecessary whitespaces and comments to speed the whole up?
People usually use minimizers to reduce the SIZE of the script, to improve download speed, rather than to make any difference in speed of parsing the script.
Whitespace and comments will have little effect in how long it takes a browser to execute, as the parser needs to check if it is whitespace, or a comment, but in reality this will be so minute with current computing power, it would be impossible to notice any impact.
SIZE however is still important even with the large bandwidth available in our broadband world.
Whitespaces and comments increase the size of the JavaScript file, which slows down the actual downloading of the file from the server - minification is the process of stripping unnecessary characters from a JavaScript file to make it smaller and easier to download.
However, since you mention a brute force attack, the bottleneck is probably not the download. Try using a profiler to find what slows you down.
There is always a point in minifying, combining and gzipping your assets, to ease server load.
Minifying is the act you refer to, of stripping away unnecessary whitespace and comments, to make the download speed smaller.
Combining will most likely show an even greater increase in page rendering speed; it is the act of merging all your javascript files into one, and all your css files into one (it can also be done for most images, but that taks requires some more work). This is done to reduce the amount of requests the browser has to make towards your server, to be able to display the page.
GZipping is the act of further compressing the data, in a zipped format, to the browsers that indicate that they'll accept such data. This further reduces size, but adds some extra work load at both ends. You're likely to see a net gain from it.
Depending on what environment you're working in, there are different components that'll help you with this, that usually covers all of the above in one go.
The time your code takes to download from the server has a direct effect on how long the page takes to render. JavaScript is blocking, meaning that a JS block will prevent any furhter rendering, until the block has executed entirely. As such, where you put your javascript files (i.e. in which point in the rendering process they'll be requested), how many requests it takes for it to be completely downloaded, and how much data there is to download, will have an impact on your page load, as it appears to the user.
Once the browser has parsed your code, be it javascript, css or html, it'll have created internal representations of the part it needs to keep remembering, and the actual formatting will no longer affect it.
I don't think whitespace in js-code slows down the execution of it. As far as I understand a javascript interpreter strips all comments and redundant whitespace before processing. It can influence download time en thus loading time of a web page however.
Take a look here for a bit of extra information.
It has little to no impact on actual processing speed, however...
Smaller size => less bandwith => less costs => ??? => profit!