How to generate a Shift_JIS (SJIS) percent-encoded string in JavaScript

I'm new to both JavaScript and Google Apps Script, and I'm having trouble converting the text in a cell to its Shift-JIS (SJIS) percent-encoded form.
For example, the Japanese string "あいう" should be encoded as "%82%A0%82%A2%82%A4", not as "%E3%81%82%E3%81%84%E3%81%86", which is the UTF-8 encoding.
I tried EncodingJS and the built-in urlencode() function, but both return the UTF-8 encoded result.
Would anyone tell me how to get the SJIS-encoded letters properly in GAS? Thank you.

You want to URL-encode あいう to %82%A0%82%A2%82%A4 using the Shift-JIS character set.
%E3%81%82%E3%81%84%E3%81%86 is the result of converting with UTF-8.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Points of this answer:
In order to use the Shift-JIS character set in Google Apps Script, the value has to be handled as binary data, because when a Shift-JIS value is retrieved as a string by Google Apps Script, the character set is automatically converted to UTF-8. Please be careful about this.
Sample script 1:
To convert あいう to %82%A0%82%A2%82%A4, how about the following script? This version is sufficient for hiragana characters.
function muFunction() {
  var str = "あいう";
  // Put the string into a blob as Shift_JIS, then read it back as raw bytes.
  var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
  // Each byte is signed (-128..127); "& 0xFF" makes it unsigned before converting to hex.
  var res = bytes.map(function(byte) {
    return "%" + ("0" + (byte & 0xFF).toString(16)).slice(-2);
  }).join("").toUpperCase();
  Logger.log(res);
}
Result:
You can see the following result at the log.
%82%A0%82%A2%82%A4
Sample script 2:
If you want to convert values that include kanji characters, how about the following script? In this case, 本日は晴天なり is converted to %96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8.
function muFunction() {
  var str = "本日は晴天なり";
  // Hex codes of the ASCII characters that are left unescaped in the output.
  var conv = Utilities.newBlob("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz*-.#_")
    .getBytes()
    .map(function(e) {return ("0" + (e & 0xFF).toString(16)).slice(-2)});
  var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
  var res = bytes.map(function(byte) {
    var n = ("0" + (byte & 0xFF).toString(16)).slice(-2);
    // Bytes matching unescaped ASCII are emitted as the character itself; everything else becomes %XX.
    return conv.indexOf(n) != -1
      ? String.fromCharCode(parseInt(n[0], 16).toString(2).length == 4 ? parseInt(n, 16) - 256 : parseInt(n, 16))
      : ("%" + n).toUpperCase();
  }).join("");
  Logger.log(res);
}
Result:
You can see the following result at the log.
%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8
When 本日は晴天なり is converted with sample script 1, it becomes %96%7B%93%FA%82%CD%90%B0%93%56%82%C8%82%E8. This can also be decoded, but the form produced by sample script 2 (with the ASCII-safe bytes left unescaped) is the one generally used.
Flow:
The flow of this script is as follows.
Create a new blob as empty data.
Put the text value あいう into the blob. At that time, the text is stored with the Shift-JIS character set.
In this case, even when blob.getDataAsString("Shift_JIS") is used, the result comes back as UTF-8. So the blob has to be used as binary data, without converting it to string data. This is the important point of this answer.
Convert the blob to a byte array.
Convert each signed byte value to an unsigned hexadecimal value.
In Google Apps Script, byte arrays hold signed values, so they have to be converted to unsigned values first (see the sketch after this list).
When the value contains kanji characters and a byte falls in the printable-ASCII range, that byte can be written as the ASCII character itself; the script of "Sample script 2" handles this situation.
In the sample above, 天 becomes %93V.
Prepend % to the two hex digits of each escaped byte.
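As a minimal sketch of the signed-to-unsigned step (not part of the original answer; the values shown are simply あ's Shift-JIS bytes 0x82 0xA0):
function signedToUnsignedExample() {
  var bytes = Utilities.newBlob("").setDataFromString("あ", "Shift_JIS").getBytes();
  Logger.log(bytes); // [-126, -96] : Apps Script byte arrays hold signed values (-128..127)
  var hex = bytes.map(function(b) {
    return ("0" + (b & 0xFF).toString(16)).slice(-2); // "& 0xFF" yields the unsigned byte value
  });
  Logger.log(hex); // [82, a0]
}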
References:
newBlob(data)
setDataFromString(string, charset)
getBytes()
map()
If I misunderstood your question and this was not the direction you want, I apologize.

Let libraries do the hard work! EncodingJS, which you mentioned, can produce URL-encoded Shift-JIS strings from ordinary String objects.
Loading the library in Apps Script is a bit tricky, but nonetheless possible as demonstrated in this answer:
/**
* Specific to Apps Script. See:
* https://stackoverflow.com/a/33315754/13301046
*
* You can instead use <script>, import or require()
* depending on the environment the code runs in.
*/
eval(UrlFetchApp.fetch('https://cdnjs.cloudflare.com/ajax/libs/encoding-japanese/2.0.0/encoding.js').getContentText());
URL encoding is achieved as follows:
function muFunction() {
  const utfString = '本日は晴天なり';
  const sjisArray = Encoding.convert(utfString, {
    to: 'SJIS',
    from: 'UNICODE'
  });
  const sjisUrlEncoded = Encoding.urlEncode(sjisArray);
  Logger.log(sjisUrlEncoded);
}
This emits a URL-encoded Shift-JIS string to the log:
'%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8'
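Going the other way is similar, assuming the library's Encoding.urlDecode and Encoding.codeToString helpers are available (both ship with recent encoding-japanese releases, but verify against the version you load):
function decodeBack() {
  const encoded = '%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8';
  // Percent-encoded string -> array of Shift_JIS byte values.
  const sjisArray = Encoding.urlDecode(encoded);
  // Shift_JIS bytes -> Unicode code values -> ordinary string.
  const unicodeArray = Encoding.convert(sjisArray, { to: 'UNICODE', from: 'SJIS' });
  Logger.log(Encoding.codeToString(unicodeArray)); // 本日は晴天なり
}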

Related

How can I convert from a Javascript Buffer to a String only containing 0 and 1

So I am currently trying to implement the Huffman algorithm, and it works fine for encoding and decoding. However, I store the encoded data as follows.
The result of the encoding function is a list containing many strings made up of 0s and 1s, all of varying length.
If I saved them in a normal txt file, it would take up more space. If I stored them as-is in a binary file, an 'e' whose code is 101 would, for example, be stored in a full 8 bits, looking like '00000101', which is wasteful and won't take up less storage than the original txt file. So I took all the strings in the list, concatenated them into one string, and split it into equal parts of length 8 to store them more efficiently.
However, if I want to read the data now, instead of 0s and 1s I get UTF-8 chars, even some escape characters.
I'm reading the file with fs.readFileSync("./encoded.bin", "binary"); but JavaScript then thinks it's a buffer already, converts it to a string, and it gets all weird... Any solutions or ideas to convert it back to 0s and 1s?
I also tried switching the "binary" in fs.readFileSync("./encoded.bin", "binary"); to "utf-8", which helped with not crashing my terminal, but the output is still "#��C��Ʃ��Ԧ�y�Kf�g��<�e�t".
To clarify, my goal in the end is to read out the massive string of binary data, which would look like "00011001000101001010", and actually get it into a string...
You can convert a String of 1s and 0s to the numerical representation of a byte using Number.parseInt(str, 2) and to convert it back, you can use nr.toString(2).
The entire process will look something like this:
const original = '0000010100000111';
// Split the string in 8 char long substrings
const stringBytes = original.match(/.{8}/g);
// Convert the 8 char long strings to numerical byte representations
const numBytes = stringBytes.map((s) => Number.parseInt(s, 2));
// Convert the numbers to a Uint8Array (a typed byte array)
const buffer = Uint8Array.from(numBytes);
// Write to file
// Read from file and reverse the process
const decoded = [...buffer].map((b) => b.toString(2).padStart(8, '0')).join('');
console.log('original', original, 'decoded', decoded, 'same', original === decoded);
const fs = require("fs"); // needed for this snippet
var binary = fs.readFileSync("./binary.bin"); // no encoding argument, so a raw Buffer is returned
binary = [...binary].map((b) => b.toString(2).padStart(8, "0")).join("");
console.log(binary);
//Output will be like 010000111011010
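For a complete round trip (write, then read back), something like the following Node.js sketch should work; the file name encoded.bin is simply taken from the question:
const fs = require("fs");

// Pack the bit string into real bytes and write them to disk.
const original = "0000010100000111";
const bytes = Uint8Array.from(original.match(/.{8}/g).map((s) => parseInt(s, 2)));
fs.writeFileSync("./encoded.bin", bytes);

// Read the raw Buffer back (no encoding argument) and rebuild the bit string.
const buffer = fs.readFileSync("./encoded.bin");
const restored = [...buffer].map((b) => b.toString(2).padStart(8, "0")).join("");
console.log(restored === original); // true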

Javascript: Convert Unicode Character to hex string [duplicate]

I'm using a barcode scanner to read a barcode on my website (the website is made in OpenUI5).
The scanner works like a keyboard that types the characters it reads. At the end and the beginning of the typing it uses a special character. These characters are different for every type of scanner.
Some possible characters are:
█
▄
–
—
In my code I use if (oModelScanner.oData.scanning && oEvent.key == "\u2584") to check if the input from the scanner is ▄.
Is there any way to get the code from that character in the \uHHHH style? (with the HHHH being the hexadecimal code for the character)
I tried the charCodeAt but this returns the decimal code.
The codePointAt examples I found turn the character into a decimal code, so I need the reverse of that.
JavaScript strings have a method codePointAt which gives you the integer representing the Unicode code point. You need the base-16 (hexadecimal) representation of that number if you wish to format it as a four-hexadecimal-digit sequence (as in the response of Nikolay Spasov).
var hex = "▄".codePointAt(0).toString(16);
var result = "\\u" + "0000".substring(0, 4 - hex.length) + hex;
However, it would probably be easier for you to check directly whether your key's code point matches the expected code point:
oEvent.key.codePointAt(0) === '▄'.codePointAt(0);
Note that "symbol equality" can actually be trickier: some symbols are defined by surrogate pairs (you can see it as the combination of two halves defined as four hexadecimal digits sequence).
For this reason I would recommend to use a specialized library.
you'll find more details in the very relevant article by Mathias Bynens
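To see the surrogate-pair issue concretely (an illustrative snippet, not part of the original answer):
const emoji = "😀"; // U+1F600, outside the Basic Multilingual Plane
console.log(emoji.length);                      // 2 (two UTF-16 code units)
console.log(emoji.charCodeAt(0).toString(16));  // "d83d" (high surrogate only)
console.log(emoji.codePointAt(0).toString(16)); // "1f600" (the full code point)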
var hex = "▄".charCodeAt(0).toString(16);
var result = "\\u" + "0000".substring(0, 4 - hex.length) + hex;
If you want to print the multiple code points of a character, e.g., an emoji, you can do this:
const facepalm = "🤦🏼‍♂️";
const codePoints = Array.from(facepalm)
.map((v) => v.codePointAt(0).toString(16))
.map((hex) => "\\u{" + hex + "}");
console.log(codePoints);
["\u{1f926}", "\u{1f3fc}", "\u{200d}", "\u{2642}", "\u{fe0f}"]
If you are wondering about the components and the length of 🤦🏼‍♂️, check out this article.

javascript hex codes with TCP/IP communication

I'm using the node module 'net' to create a client application that sends data through a TCP socket. The server-side application accepts the message if it starts and ends with the correct hex codes; for example, the data packet would start with hex "0F" and end with hex "0F1C". How would I create these hex codes with JavaScript? I found this code to convert a UTF-8 string into hex, but I'm not sure if this is what I need, as I don't have much experience with TCP/IP socket connections. Here's some JavaScript I've used to convert UTF-8 to hex. Does anyone have experience with TCP/IP transfers and/or JavaScript hex codes?
function toHex(str, hex) {
  try {
    hex = unescape(encodeURIComponent(str))
      .split('')
      .map(function(v) {
        return v.charCodeAt(0).toString(16);
      })
      .join('');
  } catch (e) {
    hex = str;
    console.log('invalid text input: ' + str);
  }
  return hex;
}
First of all, you do not need to convert your data string into hex values, in order to send it over TCP. Every string in node.js is converted to bytes when sent over the network.
Normally, you'd send over a string like so:
var data = "ABC";
socket.write(data); // will send bytes 65 66 67, or in hex: 41 42 43
Node.JS also allows you to pass Buffer objects to functions like .write().
So, probably the easiest way to achieve what you wish, is to create an appropriate buffer to hold your data.
var data = "ABC";
var prefix = 0x0F; // JavaScript allows hex numbers.
var suffix = 0x0FC1;
var dataSize = Buffer.byteLength(data);
// compute the required buffer length
var bufferSize = 1 + dataSize + 2;
var buffer = new Buffer(bufferSize);
// store first byte on index 0;
buffer.writeUInt8(prefix, 0);
// store string starting at index 1;
buffer.write(data, 1, dataSize);
// stores last two bytes, in big endian format for TCP/IP.
buffer.writeUInt16BE(suffix, bufferSize - 2);
socket.write(buffer);
Explanation:
The prefix hex value 0F requires 1 byte of space. The suffix hex value 0F1C actually requires two bytes (a 16-bit integer).
When computing the number of required bytes for a string (JavaScript strings are UTF-16 encoded!), str.length is not accurate most of the time, especially when your string has non-ASCII characters in it. For this reason, the proper way of getting the byte size of a string is to use Buffer.byteLength().
Buffers in node.js have static allocations, meaning you can't resize them after you created them. Hence, you'll need to compute the size of the buffer -in bytes- before creating it. Looking at our data, that is 1 (for our prefix) + Buffer.byteLength(data) (for our data) + 2 (for our suffix).
After that -imagine buffers as arrays of bytes (8-bit values)-, we'll populate the buffer, like so:
write the first byte (the prefix) using writeUInt8(byte, offset) with offset 0 in our buffer.
write the data string, using .write(string[, offset[, length]][, encoding]), starting at offset 1 in our buffer, and length dataSize.
write the last two bytes, using .writeUInt16BE(value, offset) with offset bufferSize - 2. We're using writeUInt16BE to write the 16-bit value in big-endian encoding, which is what you'd need for TCP/IP.
Once we've filled our buffer with the correct data, we can send it over the network, using socket.write(buffer);
Additional tip:
If you really want to convert a large string to bytes (e.g. to later print as hex), then Buffer is also great:
var buf = Buffer.from('a very large string');
// now you have a byte representation of the string.
Since bytes are all 0-255 decimal values, you can easily print them as hex values in console, like so:
for (i = 0; i < buf.length; i++) {
const byte = buf[i];
const hexChar = byte.toString(16); // convert the decimal `byte` to hex string;
// do something with hexChar, e.g. console.log(hexChar);
}
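As a shorthand for the loop above, Node's Buffer can also produce the whole hex dump in one call:
console.log(buf.toString('hex')); // the same hex digits as the loop, concatenated into one string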

File API - HEX conversion - Javascript

I am trying to read a local text file with the help of the File API, convert it to HEX using a function similar to bin2hex() (built on charCodeAt()), and then process the HEX numbers to obtain my results. All of this in JavaScript.
To convert my file to a HEX array, I scan each character of the file in a for loop and then use the bin2hex()-style function to obtain the HEX value. I would expect a result between 0x00 and 0xFF corresponding to whatever character I am trying to convert, but sometimes I obtain 0xFFFD or 0x00 for no apparent reason. Is there a limitation in terms of which characters you can process through charCodeAt() or read with the File API? Or is there maybe an easier way to do it (PHP, Ajax)?
Many thanks,
Jerome
Go straight to bytes rather than via a string. (Reading the file as text decodes it as UTF-8, so any byte sequence that isn't valid UTF-8 comes back as the replacement character U+FFFD, which is why you sometimes see 0xFFFD.)
var file = new Blob(['hello world']); // your file
var fr = new FileReader();
fr.addEventListener('load', function () {
  var u = new Uint8Array(this.result),
      a = new Array(u.length),
      i = u.length;
  while (i--) // map to hex
    a[i] = (u[i] < 16 ? '0' : '') + u[i].toString(16);
  u = null; // free memory
  console.log(a); // work with this
});
fr.readAsArrayBuffer(file);
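In current browsers the same thing can be done without FileReader, assuming Blob.prototype.arrayBuffer() is available (a briefer variant of the snippet above):
const file = new Blob(['hello world']); // your file
file.arrayBuffer().then((buf) => {
  const hex = [...new Uint8Array(buf)].map((b) => b.toString(16).padStart(2, '0'));
  console.log(hex); // the same per-byte hex strings as above
});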

Javascript encoding breaking & combining multibyte characters?

I'm planning to use a client-side AES encryption for my web-app.
Right now, I've been looking for ways to break multibyte characters into one-byte 'non-characters', encrypt them (so that the encrypted text keeps the same length),
decrypt them back, and convert those one-byte 'non-characters' back into multibyte characters.
I've seen the wiki for UTF-8 (the supposedly default encoding for JS?) and UTF-16, but I can't figure out how to detect "fragmented" multibyte characters and how to combine them back.
Thanks : )
JavaScript strings are UTF-16 stored in 16-bit "characters". For Unicode characters ("code points") that require more than 16 bits (some code points require 32 bits in UTF-16), each JavaScript "character" is actually only half of the code point.
So to "break" a JavaScript character into bytes, you just take the character code and split off the high byte and the low byte:
var code = str.charCodeAt(0); // The first character, obviously you'll have a loop
var lowbyte = code & 0xFF;
var highbyte = (code & 0xFF00) >> 8;
(Even though JavaScript's numbers are floating point, the bitwise operators work in terms of 32-bit integers, and of course in our case only 16 of those bits are relevant.)
You'll never have an odd number of bytes, because again this is UTF-16.
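Since the question also asks how to combine the halves back, here is the reverse of the split above (a small self-contained illustration, not from the original answer):
var str = "あ";
var code = str.charCodeAt(0);          // 0x3042
var lowbyte = code & 0xFF;             // 0x42
var highbyte = (code & 0xFF00) >> 8;   // 0x30
// Reassemble the 16-bit code unit and turn it back into the original character.
var restored = String.fromCharCode((highbyte << 8) | lowbyte);
console.log(restored === str); // true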
You could simply convert to UTF8... For example by using this trick
function encode_utf8(s) {
return unescape(encodeURIComponent(s));
}
function decode_utf8(s) {
return decodeURIComponent(escape(s));
}
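For instance, encode_utf8 turns あ into three single-byte "characters" whose char codes are the UTF-8 bytes (a quick check of the trick above):
var bytes = encode_utf8("あ");                 // three one-byte "characters"
console.log(bytes.length);                     // 3
console.log(bytes.charCodeAt(0).toString(16)); // "e3" (あ is E3 81 82 in UTF-8)
console.log(decode_utf8(bytes) === "あ");      // true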
Considering you are using crypto-js, you can use its methods to convert to utf8 and return to string. See here:
var words = CryptoJS.enc.Utf8.parse('𤭢');
var utf8 = CryptoJS.enc.Utf8.stringify(words);
The 𤭢 is probably a botched example of a UTF-8 character.
Looking at the other examples (see the Latin1 example), I'd say that with parse you convert a string to UTF-8 (technically, you convert it to UTF-8 and put it in a special array used by crypto-js, of type WordArray), and the result can be passed to the AES encoding algorithm; with stringify you convert a WordArray (for example, one obtained from the decoding algorithm) back to a UTF-8 string.
JsFiddle example: http://jsfiddle.net/UpJRm/
