File API - HEX conversion - Javascript

I am trying to read a local text file with the help of the File API, convert it to hex using a function similar to "bin2hex()" (built on the charCodeAt() function), and then process the hex numbers to obtain my results, all in JavaScript.
To convert my file to a hex array, I loop over each character of the file and use the bin2hex() function to obtain its hex value. I would expect a result between 0x00 and 0xFF for whatever character I convert, but sometimes I get 0xfffd or 0x00 for no apparent reason. Is there a limitation on which characters can be processed by the charCodeAt() function or read with the File API? Or is there an easier way to do this (PHP, Ajax)?
Many thanks,
Jerome
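A quick way to see where a value like 0xfffd can come from: when bytes that are not valid UTF-8 are decoded as text (FileReader.readAsText uses UTF-8 unless told otherwise), each invalid sequence becomes U+FFFD, the replacement character. The sketch below reproduces this with TextDecoder, used here as an assumed stand-in for readAsText so that it also runs outside the browser; decodeAsText is a hypothetical helper name.

```javascript
// Decode raw bytes the way a text reader would (UTF-8 by default).
function decodeAsText(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// 0x68 is "h"; 0xFF can never appear in valid UTF-8.
const text = decodeAsText(new Uint8Array([0x68, 0xff]));

// charCodeAt now reports the replacement character instead of the original byte.
const codes = Array.from(text, (ch) => ch.charCodeAt(0).toString(16));
console.log(codes); // [ '68', 'fffd' ]
```

The original byte value 0xFF is lost at decode time, so no charCodeAt() trick afterwards can recover it.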

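Go straight to bytes rather than via a String. readAsText decodes the file as text (UTF-8 by default), so any byte sequence that is invalid in that encoding comes back as U+FFFD, the replacement character, which is where the 0xfffd values come from. readAsArrayBuffer gives you the raw bytes instead: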
var file = new Blob(['hello world']); // your file
var fr = new FileReader();
fr.addEventListener('load', function () {
var u = new Uint8Array(this.result),
a = new Array(u.length),
i = u.length;
while (i--) // map to hex
a[i] = (u[i] < 16 ? '0' : '') + u[i].toString(16);
u = null; // free memory
console.log(a); // work with this
});
fr.readAsArrayBuffer(file);
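The same byte-to-hex mapping can be written more compactly with padStart. A minimal sketch, assuming a TextEncoder-produced buffer stands in for the ArrayBuffer that readAsArrayBuffer delivers (so it runs outside the browser); bytesToHex is a hypothetical helper name:

```javascript
// Convert raw bytes to two-digit lowercase hex strings, one per byte.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0"));
}

// Stand-in for the ArrayBuffer that readAsArrayBuffer delivers in the `load` handler.
const buffer = new TextEncoder().encode("hi\n").buffer;

const hex = bytesToHex(new Uint8Array(buffer));
console.log(hex); // [ '68', '69', '0a' ]
```

In the browser you would call bytesToHex(new Uint8Array(this.result)) inside the load handler.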

Related

Extract a buffer having different types of data with it

Can a buffer have both string and image associated with it? If so, how to extract them separately.
An example case would be a buffer with image data and also file name data.
I have worked with sharedArrayBuffers/arrayBuffers before.
If you are storing image pixel data, it will typically be 32 bits per pixel, with four 8-bit segments for R, G, B and A (for example a Uint32Array view, or the Uint8ClampedArray that canvas ImageData uses). Yes, you CAN tack string data onto the front as a "header" if you encode it to, and decode it from, integer values. But I have a hard time seeing why that would be desirable, because working with raw pixel data that is ONLY pixel data is simpler. (I usually just store it as a property of an object, alongside whatever other data I want to keep.)
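If you do want a file name and pixel data in one buffer, a length-prefixed header is one common layout. The sketch below assumes a hypothetical layout of [uint16 name length][UTF-8 name bytes][pixel bytes]; packImage and unpackImage are illustrative names, not a standard API.

```javascript
// Hypothetical layout: [uint16 name length][UTF-8 name bytes][pixel bytes]
function packImage(name, pixels) {
  const nameBytes = new TextEncoder().encode(name);
  const buffer = new ArrayBuffer(2 + nameBytes.length + pixels.length);
  new DataView(buffer).setUint16(0, nameBytes.length); // big-endian length prefix
  const bytes = new Uint8Array(buffer);
  bytes.set(nameBytes, 2);
  bytes.set(pixels, 2 + nameBytes.length);
  return buffer;
}

function unpackImage(buffer) {
  const nameLength = new DataView(buffer).getUint16(0);
  const name = new TextDecoder().decode(new Uint8Array(buffer, 2, nameLength));
  // A view (not a copy) over the remaining bytes, i.e. the RGBA pixel data
  const pixels = new Uint8ClampedArray(buffer, 2 + nameLength);
  return { name, pixels };
}

const packed = packImage("cat.png", new Uint8ClampedArray([255, 0, 0, 255]));
const { name, pixels } = unpackImage(packed);
console.log(name, Array.from(pixels)); // cat.png [ 255, 0, 0, 255 ]
```

The length prefix is what makes the split unambiguous: without it, the reader could not tell where the name ends and the pixels begin.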
Data Buffers
Typed arrays
You can use an ArrayBuffer to create a buffer to hold the data. You then create a view of it using a typed array, e.g. Uint8Array for unsigned 8-bit values (unsigned characters). Integer types can be 8, 16, 32 or 64 bit, signed or unsigned (the 64-bit ones, BigInt64Array and BigUint64Array, hold BigInt values), and floating-point types can be float or double (32 or 64 bit).
One buffer can have many views. You can read and write through a view just like any JS array. Values are automatically converted to the view's type when you write to a buffer, and converted to Number (BigInt for the 64-bit integer views) when you read from a view.
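A small sketch of two views sharing one buffer, showing the automatic conversion on write and on read:

```javascript
// One 4-byte buffer, two views over the same memory
const buffer = new ArrayBuffer(4);
const unsigned = new Uint8Array(buffer);
const signed = new Int8Array(buffer);

unsigned[0] = 255; // stored as the byte 0xFF
unsigned[1] = 300; // out of range for 8 bits: wraps modulo 256 and is stored as 44

console.log(signed[0]); // -1 (the same 0xFF byte read as a signed 8-bit integer)
console.log(unsigned[1]); // 44
console.log(typeof signed[0]); // number
```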
Example
Using buffer and views to read different data types
For example, say you have file data with a 4-character header, followed by a 16-bit unsigned integer chunk length, then two signed 16-bit integer coordinates, and more data:
const fileBuffer = new ArrayBuffer(fileSizeInBytes);
// Create a view of the buffer so we can fill it with file data
const dataRaw = new Uint8Array(fileBuffer);
// load the data into dataRaw
// To get a string from the data we can create a util function
function readBufferString(buffer, start, length) {
// create a view at the position of the string in the buffer
const bytes = new Uint8Array(buffer, start, length);
// read each byte converting to JS unicode string
var str = "", idx = 0;
while (idx < length) { str += String.fromCharCode(bytes[idx++]) }
return str;
}
// get 4 char chunk header at start of buffer
const header = readBufferString(fileBuffer, 0, 4);
if (header === "HEAD") {
// Create views for 16 bit signed and unsigned integers
const ints = new Int16Array(fileBuffer);
const uints = new Uint16Array(fileBuffer);
const length = uints[2]; // get the length as unsigned int16 (byte offset 4)
const x = ints[3]; // get the x coord as signed int16 (byte offset 6)
const y = ints[4]; // get the y coord as signed int16 (byte offset 8)
}
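To try the header-parsing approach without a real file, the sketch below builds a hypothetical 10-byte record ("HEAD", uint16 length 12, int16 x = -5, int16 y = 7) and parses it back with the same views. Because it writes and reads through the same typed-array views, it round-trips on any platform regardless of endianness.

```javascript
// Same helper as in the example: read `length` single-byte characters from `start`.
function readBufferString(buffer, start, length) {
  const bytes = new Uint8Array(buffer, start, length);
  let str = "";
  for (let idx = 0; idx < length; idx++) str += String.fromCharCode(bytes[idx]);
  return str;
}

// Build a 10-byte test record: "HEAD", uint16 length, int16 x, int16 y.
const fileBuffer = new ArrayBuffer(10);
new Uint8Array(fileBuffer).set(Array.from("HEAD", (c) => c.charCodeAt(0)));
new Uint16Array(fileBuffer)[2] = 12; // chunk length at byte offset 4
new Int16Array(fileBuffer, 6, 2).set([-5, 7]); // x and y at byte offsets 6 and 8

// Parse it back the same way as the example.
const record = {};
if (readBufferString(fileBuffer, 0, 4) === "HEAD") {
  const ints = new Int16Array(fileBuffer);
  const uints = new Uint16Array(fileBuffer);
  record.length = uints[2];
  record.x = ints[3];
  record.y = ints[4];
}
console.log(record); // { length: 12, x: -5, y: 7 }
```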
A DataView
The above example is one way of extracting different types of data from a single buffer. However, there can be a problem with older files and some data sources regarding the order of the bytes that make up multi-byte types (e.g. 32-bit integers). This is called endianness, and typed-array views always use the platform's native byte order.
To help with using the correct endianness and to simplify access to all the different data types in a buffer you can use a DataView
The DataView lets you read from the buffer by type and endianness. For example, to read an unsigned 64-bit integer from a buffer:
// fileBuffer is a array buffer with the data
// Create a view
const dataView = new DataView(fileBuffer);
// read the 64 bit uint starting at the first byte in the buffer
// Note the returned value is a BigInt not a Number
const bInt = dataView.getBigUint64(0);
// If the int was in little-endian order you would use
const bIntLE = dataView.getBigUint64(0, true); // true for little-endian
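A short sketch of how the littleEndian flag changes what the same bytes mean. DataView defaults to big-endian when the flag is omitted; the 64-bit getters require BigInt support.

```javascript
const buffer = new ArrayBuffer(8);
const view = new DataView(buffer);

// Write 0x0102 in big-endian order (DataView's default): bytes 01 02
view.setUint16(0, 0x0102);
console.log(view.getUint8(0), view.getUint8(1)); // 1 2
console.log(view.getUint16(0)); // 258 (0x0102 read as big-endian)
console.log(view.getUint16(0, true)); // 513 (the same bytes read as little-endian, 0x0201)

// 64-bit reads return BigInt
view.setBigUint64(0, 1n, true); // little-endian: bytes 01 00 00 00 00 00 00 00
console.log(view.getBigUint64(0, true)); // 1n
console.log(view.getBigUint64(0)); // 72057594037927936n (2 ** 56)
```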
Notes
Buffers are not dynamic. A plain ArrayBuffer cannot grow or shrink, so you must know how large the buffer needs to be when you create it (newer engines also support resizable buffers via new ArrayBuffer(length, { maxByteLength })).
Buffers can be a little slower than JavaScript's standard arrays, as there is a lot of type coercion when reading from or writing to them.
Buffers can be transferred across threads without copying (zero-copy transfer), making them ideal for distributing large data structures between Web Workers. There is also SharedArrayBuffer, which lets you create true parallel processing solutions in JS.
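The ownership semantics of a transfer can be seen without starting a worker. This sketch uses structuredClone with the transfer option (available in modern browsers and Node 17+), which uses the same transfer mechanism as worker.postMessage(message, [buffer]); whether the engine avoids a copy internally is an implementation detail.

```javascript
// Move a buffer instead of copying it.
const original = new Uint8Array([1, 2, 3]).buffer;
const moved = structuredClone(original, { transfer: [original] });

console.log(moved.byteLength); // 3: the new owner has the data
console.log(original.byteLength); // 0: the old ArrayBuffer is now detached
```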

How to generate a Shift_JIS(SJIS) percent encoded string in JavaScript

I'm new to both JavaScript and Google Apps Script, and I'm having trouble converting text written in a cell to Shift-JIS (SJIS) encoded characters.
For example, the Japanese string "あいう" should be encoded as "%82%A0%82%A2%82%A4" not as "%E3%81%82%E3%81%84%E3%81%86" which is UTF-8 encoded.
I tried EncodingJS and the built-in urlencode() function, but both return the UTF-8 encoded version.
Would any one tell me how to get the SJIS-encoded letters properly in GAS? Thank you.
You want to URL-encode あいう as %82%A0%82%A2%82%A4, using Shift-JIS as the character set.
%E3%81%82%E3%81%84%E3%81%86 is the result when it is encoded as UTF-8.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Points of this answer:
To use the Shift-JIS character set in Google Apps Script, the data has to be handled as binary data, because when a Shift-JIS value is retrieved as a string in Google Apps Script, the character set is automatically changed to UTF-8. Please be careful about this.
Sample script 1:
In order to convert from あいう to %82%A0%82%A2%82%A4, how about the following script? In this case, this script can be used for HIRAGANA characters.
function muFunction() {
var str = "あいう";
var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
var res = bytes.map(function(byte) {return "%" + ("0" + (byte & 0xFF).toString(16)).slice(-2)}).join("").toUpperCase();
Logger.log(res)
}
Result:
You can see the following result at the log.
%82%A0%82%A2%82%A4
Sample script 2:
If you want to convert the values including the KANJI characters, how about the following script? In this case, 本日は晴天なり is converted to %96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8.
function muFunction() {
var str = "本日は晴天なり";
var conv = Utilities.newBlob("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz*-.#_").getBytes().map(function(e) {return ("0" + (e & 0xFF).toString(16)).slice(-2)});
var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
var res = bytes.map(function(byte) {
var n = ("0" + (byte & 0xFF).toString(16)).slice(-2);
return conv.indexOf(n) != -1 ? String.fromCharCode(parseInt(n[0], 16).toString(2).length == 4 ? parseInt(n, 16) - 256 : parseInt(n, 16)) : ("%" + n).toUpperCase();
}).join("");
Logger.log(res)
}
Result:
You can see the following result at the log.
%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8
When 本日は晴天なり is converted with sample script 1, it becomes %96%7B%93%FA%82%CD%90%B0%93%56%82%C8%82%E8. This can also be decoded, but the form produced by sample script 2 appears to be the one generally used.
Flow:
The flow of this script is as follows.
Create a new, empty blob.
Put the text value あいう into the blob. At that time, the text is stored using Shift-JIS as the character set.
In this case, even if blob.getDataAsString("Shift_JIS") is used, the result becomes UTF-8. So the blob has to be used as binary data, without converting it to string data. This is the important point of this answer.
Convert the blob to a byte array.
Convert the byte array of signed values to unsigned hexadecimal.
In Google Apps Script, the byte array holds signed values (-128 to 127), so they have to be converted to unsigned values (0 to 255).
When the text contains KANJI characters, the second byte of a 2-byte character can fall in the ASCII range; such bytes should be output as the corresponding ASCII character instead of a %XX escape. "Sample script 2" handles this situation.
In the sample above, 天 becomes %93V.
Add % in front of each remaining byte.
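The signed-to-unsigned and percent-encoding steps can be sketched in plain JavaScript. Since Utilities.newBlob is Apps Script only, the Shift-JIS bytes below are hardcoded as the signed values getBytes() would return, derived from the results logged above. The set of bytes kept as literal ASCII (letters, digits and *-._) is an assumption for illustration; percentEncodeBytes is a hypothetical name.

```javascript
// Assumption: bytes arrive as signed values (-128..127), as getBytes() returns them.
// Characters kept unencoded when keepSafe is true (an illustrative choice).
const SAFE = /[0-9A-Za-z*\-._]/;

function percentEncodeBytes(signedBytes, keepSafe) {
  return signedBytes
    .map((byte) => {
      const unsigned = byte & 0xff; // e.g. -126 -> 130 (0x82)
      const ch = String.fromCharCode(unsigned);
      if (keepSafe && SAFE.test(ch)) return ch; // e.g. 0x56 stays as "V"
      return "%" + unsigned.toString(16).toUpperCase().padStart(2, "0");
    })
    .join("");
}

// Shift-JIS bytes of あいう and 本日は晴天なり, as signed values
const aiu = [-126, -96, -126, -94, -126, -92];
const honjitsu = [-106, 123, -109, -6, -126, -51, -112, -80, -109, 86, -126, -56, -126, -24];

console.log(percentEncodeBytes(aiu, false)); // %82%A0%82%A2%82%A4
console.log(percentEncodeBytes(honjitsu, true)); // %96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8
```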
References:
newBlob(data)
setDataFromString(string, charset)
getBytes()
map()
If I misunderstood your question and this was not the direction you want, I apologize.
Let libraries do the hard work! EncodingJS, which you mentioned, can produce URL-encoded Shift-JIS strings from ordinary String objects.
Loading the library in Apps Script is a bit tricky, but nonetheless possible as demonstrated in this answer:
/**
* Specific to Apps Script. See:
* https://stackoverflow.com/a/33315754/13301046
*
* You can instead use <script>, import or require()
* depending on the environment the code runs in.
*/
eval(UrlFetchApp.fetch('https://cdnjs.cloudflare.com/ajax/libs/encoding-japanese/2.0.0/encoding.js').getContentText());
URL encoding is achieved as follows:
function muFunction() {
const utfString = '本日は晴天なり';
const sjisArray = Encoding.convert(utfString, {
to: 'SJIS',
from: 'UNICODE'
})
const sjisUrlEncoded = Encoding.urlEncode(sjisArray)
Logger.log(sjisUrlEncoded)
}
This emits a URL-encoded Shift-JIS string to the log:
'%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8'

Can't fit file encoding when working with Chrome File System API

I need to read a file that contains a group of symbols shifted by 65 positions in the ASCII table. This means that for each symbol I am meant to do:
String.fromCharCode('¢'.charCodeAt(0)-65) // returns 'a'
But it is not working at all. I have asked friends to run the same test in Python on the same file, and they got the correct result.
When I try the same thing with the Chrome File System API, it does not work at all.
I can't get back the expected symbols. I think it is a problem with my encoding/charset platform, but I can't figure out what it is or how to fix it.
I have tried opening the file with other encoding:
var reader=new FileReader();
reader.readAsText(file, 'windows-1252'); // no success
reader.readAsText(file, 'ISO-8859-2'); // no success
Appreciate any help
The problem is that your shifted text is no longer text by readAsText's criteria. Trying to read it with any standard code page is not going to work.
You should read the file as binary with readAsArrayBuffer(), interpret it as unsigned 8-bit int array, shift the bytes, and then convert the result to string.
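var reader = new FileReader();
reader.onload = function () { // readAsArrayBuffer is asynchronous; the bytes arrive here
var buf = new Uint8Array(reader.result).map((byte) => byte - 65); // shift every byte back by 65
var string = new TextDecoder("ascii").decode(buf); // work with the decoded text
};
reader.readAsArrayBuffer(file);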
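The byte arithmetic can be checked separately from FileReader. This sketch hardcodes bytes that stand in for new Uint8Array(reader.result), assuming "¢£" was stored as the single bytes 0xA2 0xA3 (as in a Latin-1 or windows-1252 file). It builds the string with String.fromCharCode, one character per byte, instead of TextDecoder; unshift is a hypothetical helper name.

```javascript
// Undo a fixed shift on raw bytes and build a string, one character per byte.
function unshift(bytes, offset) {
  // Uint8Array.map keeps the result in a Uint8Array, so values wrap modulo 256
  const shifted = bytes.map((byte) => byte - offset);
  let text = "";
  for (const code of shifted) text += String.fromCharCode(code);
  return text;
}

// Stand-in for new Uint8Array(reader.result): "¢£" stored as bytes 0xA2 0xA3
const fileBytes = new Uint8Array([0xa2, 0xa3]);
console.log(unshift(fileBytes, 65)); // ab
```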

javascript hex codes with TCP/IP communication

I'm using the Node module 'net' to create a client application that sends data through a TCP socket. The server-side application accepts a message only if it starts and ends with the correct hex code; for example, the data packet would start with the hex value "0F" and end with the hex value "0F1C". How would I create these hex codes with JavaScript? I found the code below, which converts a UTF-8 string into a hex code, but I'm not sure it's what I need, as I don't have much experience with TCP/IP socket connections. Does anyone have experience with TCP/IP transfers and/or JavaScript hex codes?
function toHex(str,hex){
try{
hex = unescape(encodeURIComponent(str))
.split('').map(function(v){
return v.charCodeAt(0).toString(16)
}).join('')
}
catch(e){
hex = str
console.log('invalid text input: ' + str)
}
return hex
}
First of all, you do not need to convert your data string into hex values, in order to send it over TCP. Every string in node.js is converted to bytes when sent over the network.
Normally, you'd send over a string like so:
var data = "ABC";
socket.write(data); // will send bytes 65 66 67, or in hex: 41 42 43
Node.JS also allows you to pass Buffer objects to functions like .write().
So probably the easiest way to achieve what you want is to create an appropriate buffer to hold your data.
var data = "ABC";
var prefix = 0x0F; // JavaScript allows hex numbers.
var suffix = 0x0F1C; // the end marker from the question
var dataSize = Buffer.byteLength(data);
// compute the required buffer length
var bufferSize = 1 + dataSize + 2;
var buffer = Buffer.alloc(bufferSize); // new Buffer(size) is deprecated
// store first byte on index 0;
buffer.writeUInt8(prefix, 0);
// store string starting at index 1;
buffer.write(data, 1, dataSize);
// store the last two bytes in big-endian (network) byte order
buffer.writeUInt16BE(suffix, bufferSize - 2);
socket.write(buffer);
Explanation:
The prefix hex value 0F requires 1 byte of space. The suffix hex value 0F1C requires two bytes (a 16-bit integer).
When computing the number of bytes a string requires, str.length is not reliable: JavaScript strings are UTF-16 encoded, so length counts UTF-16 code units rather than the bytes actually sent (UTF-8 by default), and the two differ whenever the string contains non-ASCII characters. The proper way to get the byte size of a string is Buffer.byteLength().
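A two-line check of the difference (Node.js, where Buffer is a global):

```javascript
const s = "héllo";
console.log(s.length); // 5 UTF-16 code units
console.log(Buffer.byteLength(s)); // 6 bytes in UTF-8 ("é" takes two)
```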
Buffers in Node.js have a fixed size, meaning you can't resize one after creating it. Hence you need to compute the size of the buffer (in bytes) before creating it. For our data, that is 1 (for our prefix) + Buffer.byteLength(data) (for our data) + 2 (for our suffix).
After that, thinking of the buffer as an array of bytes (8-bit values), we populate it like so:
write the first byte (the prefix) using writeUInt8(byte, offset) with offset 0 in our buffer.
write the data string, using .write(string[, offset[, length]][, encoding]), starting at offset 1 in our buffer, and length dataSize.
write the last two bytes using .writeUInt16BE(value, offset) with offset bufferSize - 2. writeUInt16BE writes the 16-bit value in big-endian order, the conventional network byte order; check that this matches what your server expects.
Once we've filled our buffer with the correct data, we can send it over the network, using socket.write(buffer);
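The same steps can be wrapped in a reusable function. A minimal sketch for Node.js: frame is a hypothetical helper, the defaults 0x0F and 0x0F1C are the markers from the question, and `socket` is assumed to be a connected net.Socket. It copies an encoded payload into the frame with Buffer.copy instead of buffer.write, which is equivalent here.

```javascript
// Wrap a payload between a 1-byte prefix and a 2-byte big-endian suffix.
function frame(data, prefix = 0x0f, suffix = 0x0f1c) {
  const payload = Buffer.from(data, "utf8");
  const buffer = Buffer.alloc(1 + payload.length + 2);
  buffer.writeUInt8(prefix, 0);
  payload.copy(buffer, 1); // payload bytes start at offset 1
  buffer.writeUInt16BE(suffix, buffer.length - 2);
  return buffer;
}

const packet = frame("ABC");
console.log(packet.toString("hex")); // 0f4142430f1c
// socket.write(packet); // assuming `socket` is a connected net.Socket
```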
Additional tip:
If you really want to convert a large string to bytes, (e.g. to later print as hex), then Buffer is also great:
var buf = Buffer.from('a very large string');
// now you have a byte representation of the string.
Since bytes are all 0-255 decimal values, you can easily print them as hex values in the console, like so:
for (let i = 0; i < buf.length; i++) {
const byte = buf[i];
const hexChar = byte.toString(16).padStart(2, '0'); // convert the decimal `byte` to a two-digit hex string
// do something with hexChar, e.g. console.log(hexChar);
}
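Buffer can also do the whole conversion in one call with its "hex" encoding, which already pads each byte to two digits:

```javascript
const buf = Buffer.from("ABC");
console.log(buf.toString("hex")); // 414243
console.log(Array.from(buf, (b) => b.toString(16).padStart(2, "0"))); // [ '41', '42', '43' ]
```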

What is the best way to convert Matlab multidimensional cell array to Javascript array?

I have a very large 4D Matlab matrix (31x31x86x127) that I wish to convert into a Javascript 4D array. What is the best way to do this?
Currently my tentative approach will be to either:
1) Write the Matlab matrix into a binary file, and then read that in and build the Javascript.
2) Use JSONlab (http://www.mathworks.com/matlabcentral/fileexchange/33381-jsonlab--a-toolbox-to-encode-decode-json-files-in-matlab-octave) to convert the Matlab matrix into a JSON string and then write a custom decoder to turn that JSON string into a Javascript Array. The issue is that the JSON text file is 1.98 GB...
3) This may be the best way.
fileID = fopen('test.bin', 'w');
fwrite(fileID,value,'double');
Test.bin is then around 82 MB, which is actually what I expect: 31*31*86*127 values * 8 bytes/double = 83,968,336 bytes, i.e. about 84 MB or 82,000 KB. However, how do I then read this binary file (in the browser) into a 4D Javascript array? Thanks!
Thoughts?
Thanks for your help!
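For option 3, a minimal sketch of reading the fwrite(...,'double') output in JavaScript. Assumptions: the file is little-endian (fwrite's default on x86 machines), the dimensions are known in advance, and the ArrayBuffer comes from something like await (await fetch("test.bin")).arrayBuffer() in the browser. MATLAB stores arrays column-major, so the first index varies fastest; indices here are 0-based. readMatlab4D and toNested are hypothetical names.

```javascript
// Return an accessor get(i, j, k, l) over column-major float64 data.
function readMatlab4D(arrayBuffer, [d1, d2, d3, d4]) {
  const view = new DataView(arrayBuffer);
  if (arrayBuffer.byteLength !== d1 * d2 * d3 * d4 * 8) throw new Error("size mismatch");
  // Column-major: linear index = i + d1 * (j + d2 * (k + d3 * l))
  return (i, j, k, l) => view.getFloat64(8 * (i + d1 * (j + d2 * (k + d3 * l))), true);
}

// Optional: materialize nested arrays so that nested[i][j][k][l] === get(i, j, k, l).
function toNested(get, [d1, d2, d3, d4]) {
  return Array.from({ length: d1 }, (_, i) =>
    Array.from({ length: d2 }, (_, j) =>
      Array.from({ length: d3 }, (_, k) =>
        Array.from({ length: d4 }, (_, l) => get(i, j, k, l)))));
}

// Small demo: a 2x3x1x2 array whose stored values are 0..11 in file order.
const dims = [2, 3, 1, 2];
const demo = new ArrayBuffer(12 * 8);
const dv = new DataView(demo);
for (let n = 0; n < 12; n++) dv.setFloat64(8 * n, n, true);

const get = readMatlab4D(demo, dims);
const nested = toNested(get, dims);
console.log(get(1, 2, 0, 1)); // 11 (MATLAB A(2,3,1,2))
```

For the full 31x31x86x127 array, keeping the accessor (or a single Float64Array) avoids allocating millions of small nested arrays; a Float64Array view is faster but uses the platform's byte order instead of an explicit one.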
save is not the right function for writing a text file. Use savejson or saveubjson and pass the filename to the function; do not use the return value of these functions. Doing so, I get a UBJSON file under 100 MB and a JSON file under 150 MB.
My original answer, based on insufficient knowledge about the used code:
Instead of writing your own binary format, use one of the existing binary formats. Try writing it as Universal Binary JSON (UBJSON), which JSONlab supports. You should end up with reasonably sized data without losing the advantages of a standardized file exchange format.
I think the best way is to
1) Write the matrix out as a string or text file (a binary file is not necessary). You will need n-1 delimiters, where n = 4 is the number of dimensions in your case. See this Saturn Fiddle as an example for a 2D matrix; the code is below.
2) Read the text file into a JavaScript string. How you do this depends on whether you're running JavaScript on a server or in a web browser.
3) Parse the string into a JavaScript array. You will have to use the split function on the delimiters from (1) and then enter the values into an array like this example.
Code part (1):
% Print each row of A: values separated by spaces, rows terminated by ";"
A = [ 1 2 3 ; 4 5 6 ; 7 8 9 ];
for ii = 1:size(A, 1)
for jj = 1:size(A, 2)
fprintf(' %d ', A(ii,jj));
end
fprintf(';');
end
Code part (3):
function make(dim, lvl, arr) {
if (lvl === 1) return [];
if (!lvl) lvl = dim;
if (!arr) arr = [];
for (var i = 0, l = dim; i < l; i += 1) {
arr[i] = make(dim, lvl - 1, arr[i]);
}
return arr;
}
var myMultiArray = make(4);
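For step (3) on the 2D example, a minimal parsing sketch: it assumes the text has the form produced by the printing loop above (space-separated values, rows terminated by ";"). parseMatrix is a hypothetical name; a 4D version would need one more split level per extra delimiter.

```javascript
// Parse rows separated by ";" and values separated by whitespace into a 2-D array.
function parseMatrix(text) {
  return text
    .split(";")
    .map((row) => row.trim())
    .filter((row) => row.length > 0) // drop the empty piece after the final ";"
    .map((row) => row.split(/\s+/).map(Number));
}

// Output of the printing loop for A = [1 2 3; 4 5 6; 7 8 9]
const A = parseMatrix(" 1  2  3 ; 4  5  6 ; 7  8  9 ;");
console.log(A); // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
```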
