I have a small issue. I'm trying to read a binary file with a header made of multiple characters and a data part containing multiple 16-bit integers. My problem is reading the integers. If I read the file in PHP or Java, I can see that the higher byte comes first, followed by the lower byte. Take the number 300: read as bytes (8-bit integers), it appears as 1 followed by 44.
But I realized that with an ArrayBuffer in JavaScript (HTML5), when I create an unsigned 16-bit integer view of the ArrayBuffer containing my file, the Uint16Array reads it the other way around - instead of 300 I get 11,265.
So the first byte is treated as the lower byte, followed by the higher byte.
It is not a big issue, since I can always stick to reading 8-bit integers. I'm just curious: is there any rule for how to read or write binary files, or is it just a design choice?
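For what it's worth, a small sketch of the behaviour described above: the same two bytes (1, 44) read differently depending on byte order. DataView lets you pick the order explicitly, while Uint16Array always uses the platform's order (usually little-endian):

// The bytes 1, 44 are the number 300 written big-endian (high byte first)
const buffer = new Uint8Array([1, 44]).buffer;
const view = new DataView(buffer);

console.log(view.getUint16(0, false));   // 300   - read as big-endian
console.log(view.getUint16(0, true));    // 11265 - read as little-endian
console.log(new Uint16Array(buffer)[0]); // 11265 on little-endian machines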
Is there any way to encode a file that is, for example, 2GB without "chopping" it into chunks? Files larger than 2GB throw an error that the file is too large for fs, and making smaller chunks doesn't work either because of encoding/decoding problems. Thanks for any help :)
Base64 isn't a good solution for large file transfer.
It's simple and easy to work with, but it will increase your file size. See MDN's article about this. I would recommend looking into best practices for data transfer in JS; MDN has another article on this that breaks down the DataTransfer API.
Encoded size increase
Each Base64 digit represents exactly 6 bits of data. So, three 8-bit bytes of the input string/binary file (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
This means that the Base64 version of a string or file will be at least 133% the size of its source (a ~33% increase). The increase may be larger if the encoded data is small. For example, the string "a" with length === 1 gets encoded to "YQ==" with length === 4, a 300% increase.
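A quick way to see the increase in the browser (a small sketch using btoa on ASCII input):

btoa('abc') // 'YWJj' - 3 bytes in, 4 characters out (~33% larger)
btoa('a')   // 'YQ==' - 1 byte in, 4 characters out (a 300% increase)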
Additionally
Could you share what you're trying to do, and add an MRE? There are so many different ways to tackle this problem that it's hard to narrow it down without knowing any of the requirements.
In a Firefox addon I am caching lengthy strings to disk. I would like to be able to give users some idea of how much disk space in bytes these strings are taking up.
I understand that Javascript stores strings as UTF-16. If a UTF-8 string is saved in a variable, it is converted to UTF-16. So UTF-8 methods of determining string size will not do here.
From this reference:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length#Description
It states that the value of string.length is actually the number of UTF-16 code units, and not the number of characters.
From this I infer that the disk space in bytes would simply be string.length * 2. I am looking for confirmation as to whether my assumption is correct.
EDIT:
(Several edits made to the title and original text. Also, the following:)
It was suggested that this is a duplicate of How many bytes in a JavaScript string?. However, that does not address my question: it covers methods of getting the size of UTF-8 strings, whereas JavaScript converts UTF-8 strings to UTF-16 when it stores them. For example, a UTF-8 character that takes up 3 bytes may only use 2 bytes (1 UTF-16 code unit) when converted to UTF-16.
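For illustration, a small sketch of the difference; the '€' character here is just an example of a character that takes 3 bytes in UTF-8 but only 1 UTF-16 code unit (2 bytes):

const str = '€';
const utf16Bytes = str.length * 2;                      // 2 - size as UTF-16 code units
const utf8Bytes = new TextEncoder().encode(str).length; // 3 - size if written out as UTF-8
console.log(utf16Bytes, utf8Bytes);                     // 2 3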
I perform an AJAX call to generate an ID. This ID is sent back to the client in the response and shown in an input field. I was made aware that the ID displayed in the browser is not the one generated - the last digit differs. On the server side I serialize data to pass it back to the client using Adobe ColdFusion's own serializeJSON() function. It recognizes the sequence of digits and serializes it as a number. I logged the contents of my variables at different places in my code and it looked fine all the way; only the browser does not do what I want/expect.
I boiled it down to this simple sample:
var stru = {"MYID":2761602017000540006};
console.dir(stru);
The console logs 2761602017000540000 instead of 2761602017000540006
Why is that? Is this number too large to be stored in JavaScript?
Is the number too large to be stored in JavaScript?
Yes, the max safe integer is 9,007,199,254,740,991 and the number you're attempting to send is 2,761,602,017,000,540,006 (roughly 300 times larger).
This is because the JavaScript number type follows the IEEE 754 64-bit floating point format, which doesn't allow for integers as large as a 64-bit integer type normally would. You can see the definition of the number type value in the ECMAScript spec, section 4.3.20.
I suggest you send the ID over as a String.
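For illustration, a minimal sketch of the workaround, assuming the server serializes the ID as a string (BigInt is only needed if arithmetic is ever required, and only where the target browsers support it):

// Server sends the ID as a string, e.g. {"MYID":"2761602017000540006"}
var stru = JSON.parse('{"MYID":"2761602017000540006"}');
console.log(stru.MYID);            // "2761602017000540006" - all digits preserved

// Optional: BigInt keeps full integer precision where supported
var id = BigInt(stru.MYID);
console.log((id + 1n).toString()); // "2761602017000540007"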
In JavaScript you have at most 53 bits of integer precision, so you cannot put integers larger than 53 bits into JavaScript number variables. The other way is to use strings for storing this long ID. I hope this helps.
As Arash said, your number is too long (more than 53 bits).
You can find more information on this topic: Javascript long integer
The only solution seems to be using strings instead of numbers.
I have seen quite a few compression methods for JS, but in most cases the compressed data was a string, and it contained text. I need to compress an array of fewer than 10^7 floats in the range 0-1.
As precision is not really important, I could eventually save it as a string containing only the digits 0-9 (keeping only the first 2 digits after the decimal point of each float). What method would be best for data like this? I'd like the smallest possible output, but decompressing the string shouldn't take more than ~10 seconds; it is up to about 10,000,000 characters when saving 2 digits per float.
The data contains records of a sound waveform for visualization in archaic browsers that do not support the Web Audio API. The waveform is recorded at 20 fps on the Chrome client, compressed, and stored in the server DB; it is then sent back to IE or Firefox on request to draw the visualization. So I need lossy compression - and it can be really lossy - to reach a size that can be sent along with the song metadata request. I hope compression on the level of wav -> 64k mp3 is possible (something like 200:1); no one will notice that the waveform is not perfect in the visualization. I also thought about saving these floats as 0-9a-Z, which gives 36 steps instead of 100 but reduces the record of one frequency to 1 character. But what next - what compression should I use on this string of 0-Z characters to achieve the best result? Would LZMA be suitable for a string like this? Compression/decompression would run in a web worker, so it doesn't need to be instant: decompression around 10 seconds, and compression doesn't matter much - ideally less than one song, so about 2 minutes.
Taking a shot in the dark: if you truly can rely on only the first two digits after the decimal (i.e. there are no 0.00045s in the array), and you need two digits, the easiest thing to do would be to multiply by 256 and take the integer part as a byte:
encoded = Math.floor(floatValue * 256)
decoded = encoded / 256.0
This comes out to a 4:1 compression ratio. If you know more about your data, you can squeeze even more entropy out of your values.
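A minimal sketch of that byte quantization, assuming the input is (or can be put in) a Float32Array and every value is in [0, 1]:

// Encode: one byte per float (clamped so a value of exactly 1.0 doesn't overflow to 256)
const floats = new Float32Array([0.12, 0.5, 0.99]); // sample data
const encoded = new Uint8Array(floats.length);
for (let i = 0; i < floats.length; i++) {
  encoded[i] = Math.min(255, Math.floor(floats[i] * 256));
}

// Decode: restore approximate floats (each within 1/256 of the original)
const decoded = new Float32Array(encoded.length);
for (let i = 0; i < encoded.length; i++) {
  decoded[i] = encoded[i] / 256.0;
}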
I am developing a PhoneGap application in HTML5/JavaScript. I have a string of around 1000 characters comprising GUIDs in the format below:
1=0a0a8907-40b9-4e81-8c4d-d01af26efb78;2=cd4713339;3=Cjdnd;4=19120581-21e5-42b9-b85f-3b8c5b1206d9;5=hdhsfsdfsd;6=30a21580-48f3-40e8-87a3-fa6e39e6412f; ...............
I have to write this particular string into a QR code. Is there any working technique to compress this string and store it in the QR code? The QR code generated from this string is too complex and is not easily read by the QR scanners of mobile phones. Please suggest an approach to reduce the size of the string to around 200-250 characters, which can be easily read.
Any help is appreciated.
In your question you have the following sample data:
1=0a0a8907-40b9-4e81-8c4d-d01af26efb78;2=cd4713339;3=Cjdnd;
4=19120581-21e5-42b9-b85f-3b8c5b1206d9;5=hdhsfsdfsd;6=30a21
580-48f3-40e8-87a3-fa6e39e6412f; ..............
Here 1, 4 & 6 look like version 4 UUIDs as described here. I suspect that 2, 3 and 5 might actually also be UUIDs?!
The binary representation of a UUID is 128 bits long, and it should be fairly simple to convert to this representation by just reading the hex digits of the UUID and converting them to binary. This gives 16 bytes per UUID.
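A small sketch of that conversion (just reading the hex digits, as described above):

// Pack one UUID string into 16 raw bytes
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, '');                      // 32 hex digits without the dashes
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);  // two hex digits per byte
  }
  return bytes;
}

uuidToBytes('0a0a8907-40b9-4e81-8c4d-d01af26efb78'); // Uint8Array(16) [10, 10, 137, 7, ...]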
However, as the UUIDs are version 4, they are based on random data, which in effect counters further compression (apart from the few bits representing the UUID version). So apart from getting rid of the counters (1=, 2=) and the separator ;, no further compression seems to be possible.
QR codes encode data using different character sets depending on the range of characters being used. In other words, if you use just ASCII digits it will use an encoding that doesn't need 8 bits per digit. See the Wikipedia page on QR codes.
Because of the characters in your example, e.g., lower case, you'll be using a binary encoding which is way overkill for your actual information content.
Presuming you have control over the decoder, you could use any compression library to take your ASCII data and compress it before encoding, encode/decode the binary result, and then decompress it in the decoder. There is a world of techniques for trying to get the most out of the compression. You can also start with a non-ASCII encoding and eliminate redundant information like the #= parts.
Couldn't say, though, how much this will buy you.
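One possible shape of that, as a hedged sketch; the pako library used here is just one example of a deflate implementation, not something required by the question:

// Compress the ASCII payload before QR encoding; inflate it again in the decoder
const payload = '1=0a0a8907-40b9-4e81-8c4d-d01af26efb78;2=cd4713339;3=Cjdnd;';
const compressed = pako.deflate(new TextEncoder().encode(payload)); // Uint8Array
// ...store `compressed` in the QR code using binary (byte) mode...
const restored = pako.inflate(compressed, { to: 'string' });
console.log(restored === payload); // true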
If you have access to a database already, can you create a table to support this? If so, archive the value and use an ID for QR.
1) Simple schema: ID = bigint with Identity (1000,1) and set as primary key, Value = NVARCHAR(MAX). Yes, this is a bit overkill, so modify to taste.
2) Create a function to add your string value to the table and get the ID back as a string for the QR code.
3) Create another function to return the string value when passed a valid ID number.
Stays below the 200 character limit for a very long time.
You don't need the whole GUID; a GUID can distinguish one record out of 2^128 (enough to address every bit of digital information on earth many times over).
How many records do you actually need to distinguish? Probably a lot fewer than 4 billion, right? That's 2^32, so just take the first 1/4 of each GUID, and there's your 1000 characters down to 250.
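For example, a hypothetical sketch of that truncation (keeping the first 8 hex digits, i.e. 32 bits, of each GUID):

const full = '0a0a8907-40b9-4e81-8c4d-d01af26efb78';
const short = full.replace(/-/g, '').slice(0, 8); // '0a0a8907' - 36 characters down to 8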