How to determine file size from a string in ES? - javascript

Is there a way to get the size of a file based on a string?
Is it:
var size = value.length / 1024;
console.log("File is " + size + " megabytes");
I'm posting to a server and sometimes the request doesn't go through, so I'm guessing there may be a post size limit. I'd like to get the size of the string before posting and show a message if it's too large.

You're looking for the equivalent of Buffer.byteLength in ActionScript:
Returns the actual byte length of a string. This is not the same as String.prototype.length since that returns the number of characters in a string.
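In browser JavaScript (or Node.js) a hedged sketch of checking the byte size before posting could look like this; both Blob and Buffer.byteLength count UTF-8 bytes, which is what actually goes over the wire:
var value = "the payload string you are about to post";
var bytes = new Blob([value]).size;              // browser
// var bytes = Buffer.byteLength(value, "utf8"); // Node.js equivalent
console.log("Payload is " + (bytes / 1024).toFixed(2) + " KB");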

Related

How to generate a Shift_JIS(SJIS) percent encoded string in JavaScript

I'm new to both JavaScript and Google Apps Script, and I'm having trouble converting text written in a cell to Shift-JIS (SJIS) encoded characters.
For example, the Japanese string "あいう" should be encoded as "%82%A0%82%A2%82%A4" not as "%E3%81%82%E3%81%84%E3%81%86" which is UTF-8 encoded.
I tried EncodingJS and the built-in urlencode() function, but both return the UTF-8 encoded result.
Would anyone tell me how to get the SJIS-encoded characters properly in GAS? Thank you.
You want to URL-encode あいう to %82%A0%82%A2%82%A4 using the Shift-JIS character set.
%E3%81%82%E3%81%84%E3%81%86 is the result when the string is converted as UTF-8.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Points of this answer:
In order to use the Shift-JIS character set in Google Apps Script, the value has to be handled as binary data, because when a Shift-JIS value is retrieved as a string by Google Apps Script, the character set is automatically converted to UTF-8. Please be careful about this.
Sample script 1:
In order to convert from あいう to %82%A0%82%A2%82%A4, how about the following script? This script can be used for HIRAGANA characters.
function muFunction() {
  var str = "あいう";
  // Write the string into an empty blob as Shift_JIS so the bytes keep that encoding.
  var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
  // Convert each signed byte to an unsigned two-digit hex value and prefix it with "%".
  var res = bytes.map(function(byte) {return "%" + ("0" + (byte & 0xFF).toString(16)).slice(-2)}).join("").toUpperCase();
  Logger.log(res);
}
Result:
You can see the following result at the log.
%82%A0%82%A2%82%A4
Sample script 2:
If you want to convert values that include KANJI characters, how about the following script? In this case, 本日は晴天なり is converted to %96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8.
function muFunction() {
  var str = "本日は晴天なり";
  // Hex codes of the ASCII characters that are left unescaped in the output.
  var conv = Utilities.newBlob("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz*-.#_").getBytes().map(function(e) {return ("0" + (e & 0xFF).toString(16)).slice(-2)});
  var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
  var res = bytes.map(function(byte) {
    var n = ("0" + (byte & 0xFF).toString(16)).slice(-2);
    // Bytes that match an unescaped ASCII character are emitted as that character; everything else becomes "%XX".
    return conv.indexOf(n) != -1 ? String.fromCharCode(parseInt(n[0], 16).toString(2).length == 4 ? parseInt(n, 16) - 256 : parseInt(n, 16)) : ("%" + n).toUpperCase();
  }).join("");
  Logger.log(res);
}
Result:
You can see the following result at the log.
%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8
When 本日は晴天なり is converted with sample script 1, it becomes %96%7B%93%FA%82%CD%90%B0%93%56%82%C8%82%E8. This can also be decoded, but the form produced by sample script 2 is what is generally used.
Flow:
The flow of this script is as follows.
Create a new blob as empty data.
Put the text value あいう into the blob, using Shift-JIS as the character set.
Even when blob.getDataAsString("Shift_JIS") is used, the result becomes UTF-8, so the blob has to be used as binary data without converting it to string data. This is the important point of this answer.
Convert the blob to a byte array.
Convert the signed bytes to unsigned hexadecimal values.
In Google Apps Script, byte arrays hold signed values, so they have to be converted to unsigned values before formatting them as hex (see the short example after this list).
For KANJI characters, when a byte maps to an unescaped ASCII character, that character is used literally instead of a %XX escape. The script in "Sample script 2" can be used for this situation.
In the sample above, 天 becomes %93V.
Add % in front of each escaped byte.
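Here is a short standalone illustration of that signed-to-unsigned step (a hedged, made-up example, not one of the sample scripts above):
function signedByteExample() {
  var signedByte = -126;                                            // how Apps Script represents the byte 0x82
  var unsigned = signedByte & 0xFF;                                 // 130
  var hex = ("0" + unsigned.toString(16)).slice(-2).toUpperCase();  // "82"
  Logger.log("%" + hex);                                            // %82
}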
References:
newBlob(data)
setDataFromString(string, charset)
getBytes()
map()
If I misunderstood your question and this was not the direction you want, I apologize.
Let libraries do the hard work! EncodingJS, which you mentioned, can produce URL-encoded Shift-JIS strings from ordinary String objects.
Loading the library in Apps Script is a bit tricky, but nonetheless possible as demonstrated in this answer:
/**
* Specific to Apps Script. See:
* https://stackoverflow.com/a/33315754/13301046
*
* You can instead use <script>, import or require()
* depending on the environment the code runs in.
*/
eval(UrlFetchApp.fetch('https://cdnjs.cloudflare.com/ajax/libs/encoding-japanese/2.0.0/encoding.js').getContentText());
URL encoding is then achieved as follows:
function muFunction() {
  const utfString = '本日は晴天なり';
  const sjisArray = Encoding.convert(utfString, {
    to: 'SJIS',
    from: 'UNICODE'
  });
  const sjisUrlEncoded = Encoding.urlEncode(sjisArray);
  Logger.log(sjisUrlEncoded);
}
This emits a URL-encoded Shift-JIS string to the log:
'%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8'

How do I determine how many kilobytes there are in a string of code using TypeScript?

Let's say I have the following TypeScript code (represented as a string):
function greet(name: string): void {
  console.log(`Hello ${name}!`);
}
How would I programmatically determine how many kilobytes there are in this string?
I'm currently using the following equation:
// NOTE: "string.length" represents the number of bytes in the string
const KB: number = (string.length / 1024).toFixed(2);
The problem is that the number often appears to be far too big or far too small to be correct.
When I put the string in an empty file and save it, my file manager's properties show a completely different size; sometimes it's off by 2-20 KB.
What am I doing wrong? Should I be using 1000 bytes to represent a kilobyte instead of 1024?
JavaScript strings are stored as sequences of UTF-16 code units, and each code unit occupies 2 bytes. String.prototype.length counts code units, not bytes, so to estimate how many kilobytes a string occupies in memory, multiply its length by 2 and divide by 1024. (A file saved from that string is usually written as UTF-8, where ASCII characters take only 1 byte each, which is why your file manager can report a noticeably different size.)
const string = "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc";
const b = string.length * 2;
const kb = (b / 1024).toFixed(2);
console.log(`${kb}KB`);
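If what you actually want is the size of the saved file, a hedged alternative (assuming the file is saved as UTF-8, which is what editors and file managers typically report) is to count UTF-8 bytes with TextEncoder:
const code = 'function greet(name) { console.log("Hello " + name + "!"); }';
const utf8Bytes = new TextEncoder().encode(code).length; // byte count when saved as UTF-8
console.log((utf8Bytes / 1024).toFixed(2) + " KB");
For ASCII-only source code, the UTF-8 byte count equals string.length, so the original length / 1024 formula should match the file on disk closely; the length * 2 figure above is the in-memory UTF-16 estimate instead.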

Image file size from data URI in JavaScript

I am not sure it's even possible but - can I get the image file size from data URI?
For example, let's say there is an IMG element where src goes:
src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
Based on the src, can I get the image file size by using plain JavaScript? (without server request)
If you want file size, simply decode your base64 string and check the length.
var src ="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP/// yH5BAEAAAAALAAAAAABAAEAAAIBRAA7";
var base64str = src.substr(22);
var decoded = atob(base64str);
console.log("FileSize: " + decoded.length);
If you're okay with a (very good) estimate, the file size is 75% of the size of the base64 string. The true size is no larger than this estimate, and, at most, two bytes smaller.
If you want to write one line and be done with it, use atob() and check the length, as in the other answers.
If you want an exact answer with maximum performance (in the case of gigantic files or millions of files or both), use the estimate but account for the padding to get the exact size:
let base64Length = src.length - (src.indexOf(',') + 1);
let padding = (src.charAt(src.length - 2) === '=') ? 2 : ((src.charAt(src.length - 1) === '=') ? 1 : 0);
let fileSize = base64Length * 0.75 - padding;
This avoids parsing the entire string, and is entirely overkill unless you're hunting for microoptimizations or are short on memory.
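As a quick sanity check of the formula (a hedged, made-up example: "Hello" is 5 bytes and Base64-encodes to "SGVsbG8=" with one padding character):
var uri = "data:text/plain;base64,SGVsbG8=";
var base64Length = uri.length - (uri.indexOf(',') + 1);             // 8
var padding = uri.endsWith('==') ? 2 : (uri.endsWith('=') ? 1 : 0); // 1
console.log(base64Length * 0.75 - padding);                         // 5
console.log(atob(uri.split('base64,')[1]).length);                  // 5, the same answer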
Your best option is to calculate the length of the base64 string itself.
What is a base64 length in bytes?
You have to convert the base64 string to a normal string using atob() and then check its length; it will return a value that is close to the actual size of the image. Also, you don't need the data:image/jpeg;base64, part of the data URI to check the size.
This is a universal solution for all types of base64 strings based on Daniel Trans's code.
var src ="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP/// yH5BAEAAAAALAAAAAABAAEAAAIBRAA7";
var base64str = src.split('base64,')[1];
var decoded = atob(base64str);
console.log("FileSize: " + decoded.length);
The other solutions make use of atob, which Node.js now flags as a legacy API (it is still fine in browsers). Here is an example that uses Buffer instead (Node.js only).
const src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...";
const base64str = src.split('base64,')[1]; //remove the image type metadata.
const imageFile = Buffer.from(base64str, 'base64'); // decode the base64 string into bytes
console.log('FileSize: ' + imageFile.length);

javascript hex codes with TCP/IP communication

I'm using the Node module 'net' to create a client application that sends data through a TCP socket. The server-side application accepts the message if it starts and ends with the correct hex codes; just for example, the data packet would start with hex "0F" and end with hex "0F1C". How would I create these hex codes with JavaScript? I found this code to convert a UTF-8 string into hex, but I'm not sure it's what I need, as I don't have much experience with TCP/IP socket connections. Here's some JavaScript I've used to convert UTF-8 to hex. Does anyone have experience with TCP/IP transfers and/or JavaScript hex codes?
function toHex(str, hex) {
  try {
    hex = unescape(encodeURIComponent(str))
      .split('').map(function(v) {
        return v.charCodeAt(0).toString(16)
      }).join('')
  }
  catch (e) {
    hex = str
    console.log('invalid text input: ' + str)
  }
  return hex
}
First of all, you do not need to convert your data string into hex values, in order to send it over TCP. Every string in node.js is converted to bytes when sent over the network.
Normally, you'd send over a string like so:
var data = "ABC";
socket.write(data); // will send bytes 65 66 67, or in hex: 41 42 43
Node.JS also allows you to pass Buffer objects to functions like .write().
So, probably the easiest way to achieve what you wish, is to create an appropriate buffer to hold your data.
var data = "ABC";
var prefix = 0x0F; // JavaScript allows hex numbers.
var suffix = 0x0FC1;
var dataSize = Buffer.byteLength(data);
// compute the required buffer length
var bufferSize = 1 + dataSize + 2;
var buffer = new Buffer(bufferSize);
// store first byte on index 0;
buffer.writeUInt8(prefix, 0);
// store string starting at index 1;
buffer.write(data, 1, dataSize);
// stores last two bytes, in big endian format for TCP/IP.
buffer.writeUInt16BE(suffix, bufferSize - 2);
socket.write(buffer);
Explanation:
The prefix hex value 0F requires 1 byte of space. The suffix hex value 0F1C actually requires two bytes (a 16-bit integer).
When computing the number of required bytes for a string (JavaScript strings are UTF-16 encoded!), str.length is not accurate most of the time, especially when your string has non-ASCII characters in it. The proper way of getting the byte size of a string is to use Buffer.byteLength().
Buffers in node.js have static allocations, meaning you can't resize them after you created them. Hence, you'll need to compute the size of the buffer -in bytes- before creating it. Looking at our data, that is 1 (for our prefix) + Buffer.byteLength(data) (for our data) + 2 (for our suffix).
After that -imagine buffers as arrays of bytes (8-bit values)-, we'll populate the buffer, like so:
write the first byte (the prefix) using writeUInt8(byte, offset) with offset 0 in our buffer.
write the data string, using .write(string[, offset[, length]][, encoding]), starting at offset 1 in our buffer, and length dataSize.
write the last two bytes, using .writeUInt16BE(value, offset) with offset bufferSize - 2. We're using writeUInt16BE to write the 16-bit value in big-endian encoding, which is what you'd need for TCP/IP.
Once we've filled our buffer with the correct data, we can send it over the network, using socket.write(buffer);
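For reference, here is a small self-contained check of what ends up on the wire (a hedged sketch that rebuilds the buffer above with data "ABC", prefix 0x0F and suffix 0x0F1C):
const buf = Buffer.alloc(6); // 1 prefix byte + 3 data bytes + 2 suffix bytes
buf.writeUInt8(0x0F, 0);
buf.write("ABC", 1, 3);
buf.writeUInt16BE(0x0F1C, 4);
console.log(buf); // <Buffer 0f 41 42 43 0f 1c>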
Additional tip:
If you really want to convert a large string to bytes (e.g. to later print it as hex), then Buffer is also great:
var buf = Buffer.from('a very large string');
// now you have a byte representation of the string.
Since bytes are all 0-255 decimal values, you can easily print them as hex values in console, like so:
for (let i = 0; i < buf.length; i++) {
  const byte = buf[i];
  const hexChar = byte.toString(16); // convert the decimal `byte` to a hex string
  // do something with hexChar, e.g. console.log(hexChar);
}
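If all you want is a hex dump of the whole buffer, Buffer also provides a built-in shortcut (a hedged one-liner alongside the loop above):
var buf = Buffer.from('a very large string');
console.log(buf.toString('hex')); // the entire buffer as one hex string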

How to extract pixel information from PNG using javascript (getImageData alternatives)

I am trying to get pixel data from PNG images for processing. The current way is by using canvas.drawImage followed by canvas.getImageData (example here). I am looking for alternatives.
The problem with the current approach is that browsers modify pixel values influenced by alpha, as discussed here and here.
This question has been asked before, but no satisfactory answers are available.
The only way to do this without using canvas and getImageData() is to load the PNG file as a binary typed array and parse the file in code "manually".
Prerequisites:
For this you need the PNG specification which you can find here.
You need to know how to use typed arrays (for this a DataView is the most suitable view).
PNG files are chunk based, and you will need to know how to parse chunks.
A typical chunk-based file has a four-byte header called a FourCC identifier, followed by a size and misc. data, depending on the file format definition.
Chunks are placed right after this, each usually containing a FourCC (four-character code) and then the size of the chunk without the chunk header. In principle:
MAGIC FOURCC
SIZE/MISC - depending on definition
...
CHK1 - Chunk FourCC
SIZE - unsigned long
.... data
CHK2
SIZE
.... data
This format principle originally came from the Commodore Amiga platform and EA/IFF (Interchange File Format) back in the mid '80s.
But in modern days some vendors have extended or varied the chunk format, so PNG chunks actually look like this:
Header (always 8 bytes and the same byte values):
‰PNG (first byte is 0x89, see specs for reason)
CR + LF 0x0D0A
EOF (SUB) + LF 0x1A0A
Chunks:
SIZE (4 bytes, may be 0 (f.ex. IEND). Excl. chunk header and crc)
FOURCC (4 bytes, ie. "IHDR", "IDAT")
[...data] (length: SIZE x bytes)
CRC32 (4 bytes representing the CRC-32 checksum of the data)
(see the referenced specification link above for details).
And the byte order (endianness) for PNG is always big-endian ("network" order).
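For illustration (a hedged snippet, not part of the original answer): DataView reads are big-endian unless you pass true as the littleEndian flag, which is why the parser below can call getUint32 with no extra argument.
var view = new DataView(new Uint8Array([0x00, 0x00, 0x00, 0x0D]).buffer);
console.log(view.getUint32(0));       // 13 (big-endian, PNG's byte order)
console.log(view.getUint32(0, true)); // 218103808 (little-endian, not what PNG uses)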
This chunk layout makes it easy to parse through the file while supporting only some (or all) chunks. For PNG you would need to support at least (source):
IHDR must be the first chunk; it contains (in this order) the image's width, height, bit depth and color type.
IDAT contains the image, which may be split between multiple IDAT chunks. Such splitting increases the file size slightly, but makes it easier to stream the PNG. The IDAT chunk contains the actual image data, which is the output stream of the compression algorithm.
IEND marks the file end.
If you intend to support palette (color indexed) files you would also need to support the PLTE chunk. When you parse the IHDR chunk you will be able to see what color format is used (type 2 for RGB data, or 6 for RGBA and so on).
Parsing itself is easy, so your biggest challenge would be supporting things like ICC profiles (when present in the iCCP chunk) to adjust the image color data. A typical chunk is the gamma chunk (gAMA), which contains a single gamma value you can apply to convert the data to a linear format so that it displays correctly when display gamma is applied (there are also other special chunks related to colors).
The second biggest challenge would be the decompression, which uses INFLATE. You can use a project such as the pako zlib port to do this job for you; that port has performance close to native zlib. In addition, if you want to do error checking on the data (recommended), CRC-32 checking should also be supported.
For security reasons you should always check that fields contain the data they're supposed to, and that reserved space is initialized with either 0 or the defined data.
Hope this helps!
Example chunk parser: (note: won't run in IE).
function pngParser(buffer) {
  var view = new DataView(buffer),
      len = buffer.byteLength,
      magic1, magic2,
      chunks = [],
      size, fourCC, crc, offset,
      pos = 0; // current offset in buffer ("file")

  // check header
  magic1 = view.getUint32(pos); pos += 4;
  magic2 = view.getUint32(pos); pos += 4;

  if (magic1 === 0x89504E47 && magic2 === 0x0D0A1A0A) {
    // parse chunks
    while (pos < len) {
      // chunk header
      size = view.getUint32(pos);
      fourCC = getFourCC(view.getUint32(pos + 4));

      // data offset
      offset = pos + 8;
      pos = offset + size;

      // crc
      crc = view.getUint32(pos);
      pos += 4;

      // store chunk
      chunks.push({
        fourCC: fourCC,
        size: size,
        offset: offset,
        crc: crc
      })
    }
    return {chunks: chunks}
  }
  else {
    return {error: "Not a PNG file."}
  }

  function getFourCC(int) {
    var c = String.fromCharCode;
    return c(int >>> 24) + c(int >>> 16 & 0xff) + c(int >>> 8 & 0xff) + c(int & 0xff);
  }
}
// USAGE: ------------------------------------------------
fetch("//i.imgur.com/GP6Q3v8.png")
  .then(function(resp) {return resp.arrayBuffer()}).then(function(buffer) {
    var info = pngParser(buffer);

    // parse each chunk here...
    for (var i = 0, chunks = info.chunks, chunk; chunk = chunks[i++];) {
      out("CHUNK : " + chunk.fourCC);
      out("SIZE  : " + chunk.size + " bytes");
      out("OFFSET: " + chunk.offset + " bytes");
      out("CRC   : 0x" + (chunk.crc>>>0).toString(16).toUpperCase());
      out("-------------------------------");
    }

    function out(txt) {document.getElementById("out").innerHTML += txt + "<br>"}
  });
body {font: 14px monospace}
<pre id="out"></pre>
From here you can extract the IHDR to find the image size and color type, then inflate the IDAT chunk(s) (PNG uses per-scanline filters, which complicate things a bit, as well as an interlace mode, see the specs) and you're almost done ;)
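As a follow-up, here is a minimal sketch (a hypothetical helper, not part of the answer above) of how the IHDR fields could be read from the chunk list that pngParser returns; the field order follows the PNG specification (width, height, bit depth, color type):
function parseIHDR(view, chunk) {
  // IHDR data layout: width (4 bytes), height (4), bit depth (1), color type (1), ...
  return {
    width: view.getUint32(chunk.offset),
    height: view.getUint32(chunk.offset + 4),
    bitDepth: view.getUint8(chunk.offset + 8),
    colorType: view.getUint8(chunk.offset + 9) // 2 = RGB, 3 = indexed, 6 = RGBA
  };
}
// Hypothetical usage: var ihdr = parseIHDR(new DataView(buffer), info.chunks[0]);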
