I have a class project where we're converting a bmp file to a buffer, manipulating the buffer, and writing it to a new file.
https://en.wikipedia.org/wiki/BMP_file_format
The first two bytes return the characters "BM" when converted to utf-8 using
buffer.toString('utf-8', 0, 2);
And sure enough the octets "42 4d" return "BM" when converted.
In class we were told that:
buffer.readUInt32LE(2) would return the size of the file (according to the file header).
In this case it's 4 octets:
"66 75 00 00"
Which using the readUInt32LE returns the number "30054" and sure enough when I checked the number of bytes, this was the case.
What I'm trying to figure out is the series of operations I need to perform for "66 75 00 00" to become "30054".
Any clarification would be helpful!
EDIT: I needed to read the hex values in the other direction: "75 66" and not "66 75". Thanks Guido :)
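For anyone who wants to see the arithmetic spelled out, here is a minimal sketch of the little-endian calculation next to the built-in call. The variable names are just for illustration; the four byte values are the ones from the question.

```javascript
// Bytes 2..5 of the BMP header, in file order (least significant byte first).
const sizeBytes = [0x66, 0x75, 0x00, 0x00];

// Little-endian: the byte at position i contributes byte * 256^i.
const manual = sizeBytes.reduce((sum, byte, i) => sum + byte * 256 ** i, 0);

// The same result from Node's built-in reader.
const viaBuffer = Buffer.from(sizeBytes).readUInt32LE(0);

// Reversing the byte order gives the hex number 0x00007566.
console.log(manual, viaBuffer, 0x00007566);
```

All three values should print as 30054, since 0x66 + 0x75 * 256 = 102 + 29952.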
I am trying to understand why, when I use the readFileSync method with different encodings (for example utf-8, hex, ascii), I get the same output on the console, and why, when I don't pass any encoding, I receive the output in utf-8 format.
I mean, shouldn't I receive the contents in the file's own format (in this case .sol) if I don't specify an encoding, and in utf-8 format if I specify utf-8?
I think there is something I am not understanding about how encoding works.
const path = require('path');
const fs = require('fs');
const solc = require('solc')
const inboxPath = path.resolve(__dirname, 'contracts', 'Inbox.sol');
const source = fs.readFileSync(inboxPath, 'utf8');
console.log(solc.compile(source, 1));
When you call it like this:
const source = fs.readFileSync(inboxPath, 'utf8');
You are passing the utf8 encoding to the function. It will read the data from the file and apply the encoding to the data to convert it to a string.
If you call it like this with no encoding:
const source = fs.readFileSync(inboxPath);
It will give you the raw binary data in a Buffer object and you will get something that (if you console.log(source)) will look like this:
<Buffer 54 65 73 74 69 6e 67 20 4e 6f 64 65 2e 6a 73 20 72 65 61 64 46 69 6c 65 28 29>
That display of the Buffer data shows one hexadecimal value for each 8 bits (one byte) of the binary data, for our viewing convenience. The 54 at the start of the buffer corresponds to the letter T, so if you converted that buffer to a string with either the utf8 or ascii encoding, you would get a string that starts with a T.
If your data is made up entirely of characters with character codes less than 128, then interpreting it with the utf8 and ascii encodings gives identical results, because for those characters utf8 uses the character code directly as a single byte. Only for codes of 128 and above does utf8 use more than one byte per character (2 to 4 bytes, depending on the code). Unicode has 1,112,064 valid code points, and a single byte can represent only 256 unique values, so representing all of them takes more than one byte. That is why utf8 is a variable-length encoding.
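A small sketch that illustrates both points, assuming Node.js. The hex values are copied from the Buffer display above; the sample characters are my own picks to show 1-, 2-, 3- and 4-byte utf8 sequences.

```javascript
// The bytes from the <Buffer ...> display above.
const hex = '54 65 73 74 69 6e 67 20 4e 6f 64 65 2e 6a 73 20 72 65 61 64 46 69 6c 65 28 29';
const buf = Buffer.from(hex.split(' ').map((h) => parseInt(h, 16)));

// Every byte is below 128, so both encodings decode to the same string.
const asUtf8 = buf.toString('utf8');
const asAscii = buf.toString('ascii');
console.log(asUtf8, asUtf8 === asAscii);

// Characters with codes of 128 and above need more than one byte in utf8.
const samples = ['T', '\u00e9', '\u20ac', '\u{1F600}'];
const byteLengths = samples.map((ch) => Buffer.byteLength(ch, 'utf8'));
console.log(byteLengths);
```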
Your function call here:
console.log(solc.compile(source, 1));
is apparently expecting a string in the source argument, so you HAVE to give it a string. If you don't pass an encoding, like this:
const source = fs.readFileSync(inboxPath);
then source is a Buffer object (not a string), and the solc.compile(source, 1) function does not like that and gives you an error. You apparently need to pass that function a string. So, the code you show in your question:
const inboxPath = path.resolve(__dirname, 'contracts', 'Inbox.sol');
const source = fs.readFileSync(inboxPath, 'utf8');
console.log(solc.compile(source, 1));
is properly getting a string from fs.readFileSync() and then passing that string to solc.compile().
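To see the difference without needing Inbox.sol or solc, here is a rough sketch that writes a scratch file and reads it back both ways. The temporary directory and file name are made up for the demo.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, so the sketch does not depend on Inbox.sol.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'encoding-demo-'));
const demoPath = path.join(dir, 'demo.txt');
fs.writeFileSync(demoPath, 'Testing Node.js readFile()');

const raw = fs.readFileSync(demoPath);          // no encoding: a Buffer
const text = fs.readFileSync(demoPath, 'utf8'); // encoding given: a string

console.log(Buffer.isBuffer(raw), typeof text);
// Decoding the Buffer yourself gives the same string.
console.log(raw.toString('utf8') === text);

// Clean up the scratch directory.
fs.rmSync(dir, { recursive: true });
```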
I have an img field whose value is base64 encoded. I checked the value in a decoder and it is a bmp. When I put the value in like it is, the image is not showing. I saw an example where the encoded string is trimmed with (base64 encoded string).substr(104), and then the image starts rendering. I could not find a proper explanation of why it is trimmed. Please tell me the exact reason. Thanks.
If it works after trimming the first 104 characters, and 1 character is 6 bits of information, then 104 characters == 624 bits == 78 bytes. So those first 78 bytes are not part of the image; they are probably some other information, e.g. a header.
It would be a lot harder if you had to remove a number of bytes not evenly divisible by 3. As 78 is evenly divisible by 3, it corresponds exactly to 104 characters.
I am working with the Northwind service in SAP Web IDE. Images in this service are stored in base64 string format: FRwvAAIAAAAN.....
I found out that I can't use these images in my app directly with the given base64 string value, because the Northwind DB is old and was made in MS Access, and there are 78 redundant bytes which represent an OLE header. So I would like to remove these 78 bytes from the base64 string.
Can you please help me do this in JavaScript (I am new to this language)? Here is what I have done:
I created function:
photo : function (value) {
var str = "";
for (var p in value) {
if (value.hasOwnProperty(p)) {
str += value[p];
}
}
..........
This function takes the base64 string as its input parameter. I converted that parameter from an object to a string.
So what should I do next? Create an Array or something else? How can I remove 78 BYTES from the string?
In base64 each character contains six bits of information, so four characters contain 24 bits of information, which is three bytes.
You are in luck. As 78 happens to be evenly divisible by three, the first 78 bytes correspond exactly to the first 104 characters (78 bytes = 624 bits = 104 characters).
So, to remove the first 78 bytes of a base64 string, you remove the first 104 characters:
s = s.substr(104);
(If you hadn't been so lucky, you would have had to decode the entire string into bytes, remove the first 78 bytes, then encode the bytes into a string again.)
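A sketch of both cases in one helper, assuming Node.js (Buffer is used for the decode/re-encode path). The function name dropLeadingBytes and the stand-in data are hypothetical; the real Northwind strings are not reproduced here.

```javascript
// Drop the first `byteCount` bytes from base64-encoded data.
function dropLeadingBytes(base64, byteCount) {
  if (byteCount % 3 === 0) {
    // Whole 3-byte groups map to whole 4-character groups.
    return base64.slice((byteCount / 3) * 4);
  }
  // Otherwise decode, cut, and re-encode.
  return Buffer.from(base64, 'base64').subarray(byteCount).toString('base64');
}

// Stand-in data: 78 filler bytes followed by a payload that starts with "BM".
const header = Buffer.alloc(78, 0xff);
const payload = Buffer.from('BM plus the rest of the bitmap');
const whole = Buffer.concat([header, payload]);
const encoded = whole.toString('base64');

const trimmed = dropLeadingBytes(encoded, 78);
console.log(trimmed === encoded.slice(104));
console.log(Buffer.from(trimmed, 'base64').toString('latin1', 0, 2));
```

The fast path only works because the cut falls on a 3-byte boundary; for any other byte count the helper falls back to decoding.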
I have a string exactly 53 characters long that contains a limited set of possible characters.
[A-Za-z0-9\.\-~_+]{53}
I need to reduce this to length 50 without loss of information and using the same set of characters.
I think it should be possible to compress most strings down to 50 length, but is it possible for all possible length 53 strings? We know that in the worst case 14 characters from the possible set will be unused. Can we use this information at all?
Thanks for reading.
If, as you stated, your output strings have to use the same set of characters as the input string, and if you don't know anything special about the requirements of the input string, then no, it's not possible to compress every possible 53-character string down to 50 characters. This is a simple application of the pigeonhole principle.
Your input strings can be represented as a 53-digit number in base 67, i.e., an integer from 0 to 67^53 - 1 ≈ 6 × 10^96.
You want to map those numbers to an integer from 0 to 67^50 - 1 ≈ 2 × 10^91.
So by the pigeonhole principle, an average of 67^3 = 300,763 different inputs would have to map to each possible output, which means that, when you go to decompress, you have no way to know which of those originals you're supposed to map back to.
To make this work, you have to change your requirements. You could use a larger set of characters to encode the output (you could get it down to 50 characters if each one had 87 possible values, instead of the 67 in the input). Or you could identify redundancy in the input -- perhaps the first character can only be a '3' or a '5', the nineteenth and twentieth are a state abbreviation that can only have 62 different possible values, that sort of thing.
If you can't do either of those things, you'll have to use a compression algorithm, like Huffman coding, and accept the fact that some strings will be compressible (and get shorter) and others will not (and will get longer).
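The counts in this answer can be checked with BigInt arithmetic. This is only a sanity check of the numbers, not a compression scheme; the variable names are mine.

```javascript
// 26 + 26 + 10 letters and digits, plus the five symbols . - ~ _ +
const alphabetSize = 67n;
const inputs = alphabetSize ** 53n;  // number of 53-character strings
const outputs = alphabetSize ** 50n; // number of 50-character strings

// Average number of inputs that would have to share each output.
const inputsPerOutput = inputs / outputs;
console.log(inputsPerOutput);

// Smallest alphabet size whose 50-character strings cover every input.
let needed = alphabetSize;
while (needed ** 50n < inputs) needed += 1n;
console.log(needed);
```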
What you ask is not possible in the most general case, which can be proven very simply.
Say it was possible to encode an arbitrary 53 character string to 50 chars in the same set. Do that, then add three random characters to the encoded string. Then you have another arbitrary, 53 character string. How do you compress that?
So what you want cannot be guaranteed to work for any possible data. However, it is possible that all your real data has low enough entropy that you can devise a scheme that will work.
In that case, you will probably want to do some variant of Huffman coding, which basically allocates variable-bit-length encodings for the characters in your set, using the shortest encodings for the most commonly used characters. You can analyze all your data to come up with a set of encodings. After Huffman coding, your string will be a (hopefully shorter) bitstream, which you encode to your character set at 6 bits per character. It may be short enough for all your real data.
A library-based encoding like Smaz (referenced in another answer) may work as well. Again, it is impossible to guarantee that it will work for all possible data.
One byte (character) can encode 256 values (0-255) but your set of valid characters uses only 67 values, which can be represented in 7 bits (alas, 6 bits gets you only 64) and none of your characters uses the high bit of the byte.
Given that, you can throw away the high bit and store only 7 bits per character, running the initial bits of the next character into the "spare" space of the previous one. This would require only 47 bytes of storage (53 x 7 = 371 bits; 371 / 8 = 46.375, which rounds up to 47 bytes).
This is not really considered compression, but rather a change in encoding.
For example "ABC" is 0x41 0x42 0x43
0x41 0x42 0x43 // hex values
0100 0001 0100 0010 0100 0011 // binary
100 0001 100 0010 100 0011 // drop high bit
// run it all together
100000110000101000011
// split as 8 bits (and pad to 8)
10000011 00001010 00011[000]
0x83 0x0A 0x18
As an example these 3 characters won't save any space, but your 53 characters will always come out as 47 bytes, guaranteed.
Note, however, that the output will not be in your original character set, if that is important to you.
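Here is one way the packing could be sketched in Node.js. The function names pack7 and unpack7 are made up for this example, and it goes through a string of '0'/'1' characters for readability rather than speed. Note that decoding needs the original character count so the padding bits can be ignored.

```javascript
// Pack 7-bit characters into bytes, most significant bit first.
function pack7(text) {
  let bits = '';
  for (const ch of text) {
    const code = ch.charCodeAt(0);
    if (code > 127) throw new RangeError('not a 7-bit character: ' + ch);
    bits += code.toString(2).padStart(7, '0'); // drop the high bit
  }
  // Pad with zero bits up to a whole number of bytes.
  bits = bits.padEnd(Math.ceil(bits.length / 8) * 8, '0');
  const out = [];
  for (let i = 0; i < bits.length; i += 8) {
    out.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return Buffer.from(out);
}

// Reverse of pack7; charCount tells us where the padding starts.
function unpack7(bytes, charCount) {
  let bits = '';
  for (const b of bytes) bits += b.toString(2).padStart(8, '0');
  let text = '';
  for (let i = 0; i < charCount; i++) {
    text += String.fromCharCode(parseInt(bits.slice(i * 7, i * 7 + 7), 2));
  }
  return text;
}

const packedAbc = pack7('ABC');
console.log(packedAbc); // the three bytes from the worked example

const sample53 = 'Az09.-~_+'.repeat(6).slice(0, 53);
const packed53 = pack7(sample53);
console.log(packed53.length, unpack7(packed53, 53) === sample53);
```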
The process becomes:
original-text --> encode --> store output-text (in database?)
retrieve --> decode --> original-text restored
If I remember correctly, Huffman coding is going to be the most compact way to store the data. It has been too long since I used it for me to write the algorithm quickly, but, if I remember correctly, the general idea is:
get the count for each character that is used
prioritize them based on how frequently they occurred
build a tree based off the prioritization
get the compressed bit representation of each character by traversing the tree (start at the root, left = 0 right = 1)
replace each character with the bits from the tree
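The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: it re-sorts an array instead of using a priority queue, and it keeps the output as a string of '0'/'1' characters. All function names are my own.

```javascript
// Build a Huffman code table { char: bitString } for the given text.
function huffmanCodes(text) {
  // Step 1: count each character.
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);

  // Steps 2-3: repeatedly merge the two least frequent nodes into a tree.
  const nodes = [...counts].map(([ch, weight]) => ({ ch, weight }));
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ left, right, weight: left.weight + right.weight });
  }

  // Step 4: walk the tree from the root, left = 0 and right = 1.
  const codes = {};
  const walk = (node, prefix) => {
    if (node.ch !== undefined) {
      codes[node.ch] = prefix || '0'; // single-symbol input still needs one bit
      return;
    }
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  };
  if (nodes.length) walk(nodes[0], '');
  return codes;
}

// Step 5: replace each character with its bits.
const encode = (text, codes) => [...text].map((ch) => codes[ch]).join('');

// Decoding works because no code is a prefix of another code.
function decode(bits, codes) {
  const byBits = Object.fromEntries(Object.entries(codes).map(([ch, b]) => [b, ch]));
  let text = '';
  let current = '';
  for (const bit of bits) {
    current += bit;
    if (byBits[current] !== undefined) {
      text += byBits[current];
      current = '';
    }
  }
  return text;
}

const sample = 'aaaabbc';
const codes = huffmanCodes(sample);
const bits = encode(sample, codes);
console.log(codes, bits.length, decode(bits, codes) === sample);
```

For this sample the most frequent character gets a 1-bit code and the other two get 2-bit codes, so the 7 characters take 10 bits.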
Smaz is a simple compression library suitable for compressing very short strings.
I'm working with an API that sends data in a series of base64 strings that I'm converting into an array of bytes. I've been able to parse the time values sent in the data (year, day, hour, etc.; the API lists their datatype as unsigned char). I'm using parseInt(..., 2) in JavaScript.
The difficulty I'm having is converting signed int32 and unsigned int16 into their decimal values. For example, these are the bit strings for voltage and power:
Voltage (unsigned int16 ) 01101010 00001001 - Should be around 120.0
Power (signed int32) 10101010 00010110 00000000 00000000 - Should be 0-10 kWh
Does anyone know how I can convert these values? Also, I wrote a simple function to convert base64 to an array of bytes that I'm pretty sure is correct, but since the above values don't make any sense, maybe it isn't. If that's the case, does anyone know of a plugin that converts base64 to binary?
Thanks,
Tristan
I can't see how 0110101000001001 converts into 120... It's either 27145 or 2410 depending on endianness.
Your voltage as an unsigned int is 27145 (reading the bytes as big-endian). Is that what you're getting from your conversion? If so, it is the correct value. Your power is -1441398784 as a signed int.
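A quick way to compare both byte orders is a DataView, which works in browsers and in Node.js. This is only a sketch: the toBytes helper is made up here to turn the bit strings from the question into bytes, and which byte order the API actually uses is something its documentation would have to confirm.

```javascript
// Parse a space-separated bit string into a Uint8Array.
const toBytes = (bitString) =>
  Uint8Array.from(bitString.trim().split(/\s+/), (b) => parseInt(b, 2));

const voltage = new DataView(toBytes('01101010 00001001').buffer);
const power = new DataView(toBytes('10101010 00010110 00000000 00000000').buffer);

// Second argument: true = little-endian, false = big-endian.
const voltageBE = voltage.getUint16(0, false);
const voltageLE = voltage.getUint16(0, true);
const powerBE = power.getInt32(0, false);
const powerLE = power.getInt32(0, true);

console.log(voltageBE, voltageLE); // 27145 2410
console.log(powerBE, powerLE);     // -1441398784 5802
```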