Base64 file encoding that is large

Is there any way to encode a file of, for example, 2 GB without "chopping" it into chunks? Files larger than 2 GB throw an error saying the file is too large for fs, and splitting the file into smaller chunks doesn't work either because of encoding/decoding problems. Thanks for any help :)

Base64 isn't a good solution for large file transfer.
It's simple and easy to work with, but it will increase your file size. See MDN's article about this. I would recommend looking into best practices for data transfer in JS. MDN has another article that breaks down the DataTransfer API.
Encoded size increase
Each Base64 digit represents exactly 6 bits of data. So, three 8-bit bytes of the input string/binary file (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
This means that the Base64 version of a string or file will be at least 133% the size of its source (a ~33% increase). The increase may be larger if the encoded data is small. For example, the string "a" with length === 1 gets encoded to "YQ==" with length === 4, a 300% increase.
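To put numbers on the size increase quoted above, here is a minimal sketch. The function name `base64EncodedLength` is made up for illustration, and it assumes standard padded Base64 without line breaks.

```javascript
// Hypothetical helper: length in characters of the padded Base64 encoding
// of `byteLength` input bytes. Every group of up to 3 input bytes becomes
// exactly 4 output characters.
function base64EncodedLength(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

console.log(base64EncodedLength(1));           // 4, e.g. "a" -> "YQ=="
console.log(base64EncodedLength(3));           // 4
console.log(base64EncodedLength(2 * 2 ** 30)); // 2863311532, roughly 2.67 GiB for a 2 GiB file
```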
Additionally
Could you share what you're trying to do and add an MRE (minimal reproducible example)? There are so many different ways to tackle this problem that it's hard to narrow it down without knowing any of the requirements.
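On the chunking problem mentioned in the question: one likely cause is that each chunk gets its own "=" padding, so the concatenated output is no longer valid Base64. A possible way around this, sketched below for Node.js, is to encode only whole 3-byte groups per step and carry the leftover bytes forward. The generator name `base64Pieces` is hypothetical, and the demo uses in-memory data; the same carry logic should apply to chunks read from a stream, which is assumed here rather than shown.

```javascript
// Hypothetical helper: encode a sequence of Buffers to Base64 piece by piece.
// Only whole 3-byte groups are encoded per step; the 0-2 leftover bytes are
// carried into the next step, so "=" padding can only appear at the very end.
function* base64Pieces(chunks) {
  let carry = Buffer.alloc(0);
  for (const chunk of chunks) {
    const data = Buffer.concat([carry, chunk]);
    const usable = data.length - (data.length % 3);
    if (usable > 0) {
      yield data.subarray(0, usable).toString('base64');
    }
    carry = data.subarray(usable);
  }
  if (carry.length > 0) {
    yield carry.toString('base64'); // final piece, may end in "=" padding
  }
}

// Demo: split some data at awkward boundaries and compare with one-shot encoding.
const original = Buffer.from('Hello, chunked Base64 world!');
const chunks = [original.subarray(0, 5), original.subarray(5, 12), original.subarray(12)];
const joined = [...base64Pieces(chunks)].join('');
console.log(joined === original.toString('base64')); // true
```

Because every piece except the last covers a multiple of 3 bytes, the pieces can be written out or sent one after another and decoded as a single Base64 string.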

Related

float Array compression in javascript

I have seen quite a few compression methods for JS, but in most cases the compressed data was a string containing text. I need to compress an array of fewer than 10^7 floats in the range 0-1.
As precision is not really important, I can eventually save it as a string containing only the digits 0-9 (keeping only the first 2 digits after the decimal point of each float). What method would be best for data like this? I'd like the smallest possible output, but it also shouldn't take more than ~10 sec to decompress this string; it is up to about 10 000 000 characters when saving 2 digits per float.
The data contains recordings of a sound waveform for visualization in archaic browsers that don't support the Web Audio API. The waveform is recorded at 20 fps in the user's Chrome client, compressed, and stored in the server db. It is then sent back to IE or FF on request to draw the visualization, so I need lossy compression. It can be really lossy, to reach a size that can be sent with the song metadata request. I hope compression on the level of wav -> mp3 64k would be possible (like 200:1 or something); no one will notice that the waveform in the visualization is not perfect.
I thought about saving these floats as 0-9a-z, which gives 36 steps instead of 100 but reduces the record of one frequency to 1 character. But what next? What compression should I use on this string of 0-z characters to achieve the best compression? Would LZMA be suitable for a string like this? Compression / decompression would run in a web worker, so it doesn't need to be really instant: decompression around 10 sec, and compression time doesn't matter much, preferably less than the length of one song, so about 2 min.

Taking a shot in the dark: if you truly can rely on only the first two digits after the decimal point (i.e. there are no 0.00045s in the array), and two digits are all you need, the easiest thing to do would be to multiply by 256 and take the integer part as a byte (clamped to 255 so that an input of exactly 1.0 still fits):
encoded = Math.min(255, Math.floor(floatValue * 256))
decoded = encoded / 256.0
This comes out to a 4:1 compression ratio, assuming the values would otherwise be stored as 32-bit floats. However, if you know more about your data, you can squeeze more entropy out of your values.
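The multiply-by-256 idea could be applied to a whole array roughly as follows. This is only a sketch: the function names `quantize` and `dequantize` are made up, and it assumes every input lies in the range 0-1.

```javascript
// Hypothetical helper: quantize floats in [0, 1] to one byte each (lossy, 256 levels).
function quantize(floats) {
  const bytes = new Uint8Array(floats.length);
  for (let i = 0; i < floats.length; i++) {
    // Clamp so that exactly 1.0 maps to 255 instead of overflowing to 256.
    bytes[i] = Math.min(255, Math.max(0, Math.floor(floats[i] * 256)));
  }
  return bytes;
}

// Hypothetical helper: restore approximate values; the error is at most 1/256.
function dequantize(bytes) {
  return Float32Array.from(bytes, (b) => b / 256);
}

const packed = quantize([0, 0.5, 0.999, 1]);
console.log(Array.from(packed));             // [0, 128, 255, 255]
console.log(Array.from(dequantize(packed))); // [0, 0.5, 0.99609375, 0.99609375]
```

The resulting `Uint8Array` can then be handed to a general-purpose compressor, which may or may not shrink it further depending on how repetitive the waveform is.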

Compressing Guid based string to write on QR Code

I am developing a PhoneGap application in HTML5/JavaScript. I have a string of around 1000 characters consisting of GUIDs in the format below:
1=0a0a8907-40b9-4e81-8c4d-d01af26efb78;2=cd4713339;3=Cjdnd;4=19120581-21e5-42b9-b85f-3b8c5b1206d9;5=hdhsfsdfsd;6=30a21580-48f3-40e8-87a3-fa6e39e6412f; ...............
I have to write this string into a QR code. Is there any working technique to compress this string and store it in a QR code? The QR code generated from this string is too complex and is not easily read by the QR scanners of mobile phones. Please suggest an approach to reduce the size of the string to around 200-250 characters, which can be read easily.
Any help is appreciated.

In your question you have the following sample data:
1=0a0a8907-40b9-4e81-8c4d-d01af26efb78;2=cd4713339;3=Cjdnd;4=19120581-21e5-42b9-b85f-3b8c5b1206d9;5=hdhsfsdfsd;6=30a21580-48f3-40e8-87a3-fa6e39e6412f; ..............
Where 1, 4 & 6 look like version 4 UUIDs as described here. I suspect that 2, 3 and 5 might actually also be UUIDs?!
The binary representation of a UUID is 128 bits long, and it should be fairly simple to convert to this representation by reading the hex digits of the UUID and converting them to binary. This gives 16 bytes per UUID.
However, as the UUIDs are version 4, they are based on random data, which in effect defeats further compression (apart from the few bits representing the UUID version). So apart from getting rid of the counters (1=, 2=) and the separator (;), no further compression seems to be possible.
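The hex-to-binary conversion described above might look something like this. It is a rough sketch; the names `uuidToBytes` and `bytesToUuid` are invented for illustration, and the input is assumed to be in the usual 8-4-4-4-12 textual form.

```javascript
// Hypothetical helper: convert a textual UUID (36 characters with dashes)
// to its 16 raw bytes.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error('not a UUID: ' + uuid);
  }
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Hypothetical helper: convert 16 raw bytes back to the 8-4-4-4-12 textual form.
function bytesToUuid(bytes) {
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join('-');
}

const raw = uuidToBytes('0a0a8907-40b9-4e81-8c4d-d01af26efb78');
console.log(raw.length);       // 16
console.log(bytesToUuid(raw)); // 0a0a8907-40b9-4e81-8c4d-d01af26efb78
```

Going from 36 text characters to 16 bytes only pays off if the QR code is written in a byte-oriented mode and the reader side can reverse the conversion.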

QR codes encode data using different character sets depending on the range of characters being used. In other words, if you use just ASCII digits, it will use an encoding that doesn't spend 8 bits per digit. See the Wikipedia page on QR codes.
Because of the characters in your example, e.g., lower case, you'll be using a binary encoding which is way overkill for your actual information content.
Presuming you have control over the decoder, you could use any compression library to take your ASCII data and compress it before encoding, encode/decode the binary result, and then decompress it in the decoder. There is a world of techniques for trying to get the most out of the compression. You can also start with a non-ASCII encoding and eliminate redundant information like the #= parts.
Couldn't say, though, how much this will buy you.

If you have access to a database already, can you create a table to support this? If so, archive the value and use an ID for QR.
1) Simple schema: ID = bigint with Identity (1000,1) and set as primary key, Value = NVARCHAR(MAX). Yes this is a bit overkill, so modify to taste.
2) Create a function to add your string value to the table and get the ID back as a string for the QR code.
3) Create another function to return the string value when passed a valid ID number.
This stays below the 200-character limit for a very long time.

You don't need the whole guid; that could eliminate all but one record out of 2^128 records (enough to address every bit of digital information on earth many times over).
How many records do you need to eliminate? Probably a lot less than 4 billion right? That's 2^32, so just take the first 1/4 of the guid and there's your 1000 characters to 250.

How to measure the size of base 64 image?

I have an image already encoded in Base64. How can I determine how big this object is? I want to tell the user of an application whether this file will take very long to transfer.

If you have encoded to a Base64 string, then the size that you're transferring across the wire is simply the count of the characters. Since Base64 fits within the character space of ASCII, your bytes across the wire are simply:
var bytes = encodedString.length
If size is a major concern, you probably want to consider not Base64 encoding your images. Base64 translates every 6 bits into an 8-bit character representation. This means that the original size is multiplied by 4/3 when you Base64 encode.
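If you also want to show the size of the original image rather than the transfer size, the 4/3 ratio can be inverted. A small sketch, with the made-up name `decodedByteLength`, assuming standard Base64 with "=" padding and no line breaks or `data:` URL prefix in the string:

```javascript
// Hypothetical helper: number of bytes a padded Base64 string decodes to.
// Every 4 characters carry 3 bytes; each trailing "=" stands for one missing byte.
function decodedByteLength(base64) {
  let padding = 0;
  if (base64.endsWith('==')) padding = 2;
  else if (base64.endsWith('=')) padding = 1;
  return (base64.length / 4) * 3 - padding;
}

console.log(decodedByteLength('YQ==')); // 1
console.log(decodedByteLength('TWFu')); // 3
```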

reading binary file containing integers and chars

I have a small issue. I'm trying to read a binary file with a header made of multiple characters and a data part containing multiple 16-bit integers. My problem is reading the integers. If I read the file in PHP or Java, I can see that the higher byte goes first, followed by the lower byte. Let's say the number 300: read as bytes (8-bit integers), it is written as 1 followed by 44.
But I realized that when using an ArrayBuffer in JavaScript (HTML5) and creating an unsigned 16-bit integer view (Uint16Array) of the ArrayBuffer containing my file, it is read the other way around: instead of 300 I get 11,265.
So the first byte is treated as the lower byte, followed by the higher byte.
It is not a big issue since I can always stick to reading 8-bit integers. I'm just curious: is there any rule for how to read or write binary files, or is it just a design choice?
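For what it's worth, the behaviour described above can be reproduced with the two bytes from the example. Typed arrays such as Uint16Array use the byte order of the platform (little-endian on most current hardware), while DataView lets you state the byte order explicitly. A minimal sketch; the variable names are just for illustration:

```javascript
// The two bytes from the question: 300 stored high byte first (big-endian).
const buffer = new Uint8Array([1, 44]).buffer;
const view = new DataView(buffer);

// The second argument of getUint16 selects little-endian; it defaults to big-endian.
const bigEndian = view.getUint16(0, false);   // 1 * 256 + 44 = 300
const littleEndian = view.getUint16(0, true); // 44 * 256 + 1 = 11265
console.log(bigEndian, littleEndian);
```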

Is a js text file's size the amount of characters within it plus one as a rule?

I want to determine how large a file would be based on some text input but without having to save it to file.
From tests it appears a file with 4 characters in it will be 5 bytes.
Does this hold true in general, charcount + 1?
It's a bunch of javascript that I am looking to save.
Many thanks for any advice.

Well, it all breaks down when somebody puts in a comment in their native language using non-ASCII characters, which vary in encoded size (then one character != one byte). Other than that, there are also differences depending on the filesystem the file is stored on: usually there is a smallest unit that can be allocated on a hard disk drive, and the space a file occupies on disk will always be a multiple of this number.

No.
An ASCII text file is exactly one byte per character long. But line breaks are also (one or two) characters, that is probably where your extra byte comes from.
For non-ASCII text, every character can take up more than one byte; in UTF-8 encoding a character takes between one and four bytes (most commonly one to three).
In addition to that, the file may take up some extra space on disk, because depending on the file system being used it may need to be rounded up to a minimum block size, for example 8K.
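To get the byte count without saving anything, you could encode the text in memory and measure the result. A small sketch, assuming the file would be saved as UTF-8; the function name `utf8ByteLength` is made up for illustration:

```javascript
// Hypothetical helper: size in bytes of a string once encoded as UTF-8,
// without writing a file. TextEncoder always produces UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength('abcd'));      // 4
console.log(utf8ByteLength('abcd\n'));    // 5, the newline is one more byte
console.log(utf8ByteLength('\u00e9'));    // 2 (e with acute accent)
console.log(utf8ByteLength('\u20ac'));    // 3 (euro sign)
console.log(utf8ByteLength('\u{1F600}')); // 4 (an emoji outside the basic plane)
```

This gives the size of the file contents only; the space occupied on disk can still be larger because of the block-size rounding mentioned above.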
