Why doesn't atob() decode base64 the way Buffer does? - javascript

I have data that I compress using the lz-string package.
I also convert the result to base64 and use the atob() function to decode it.
The problem is that atob() doesn't work as expected, while Buffer.from(b64, 'base64').toString(); does.
Why? How do I fix that? I need to use atob() on the client side (Buffer does not exist in the browser).
StackBlitz example

Use decodeURIComponent and escape to decode the UTF-8 bytes that atob gives you.
const non64 = decodeURIComponent(escape(window.atob( b64 )));
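For illustration, here is the trick end to end (a minimal sketch; the base64 value is a made-up example encoding the UTF-8 bytes of "é"):
// 'w6k=' is the base64 of the UTF-8 bytes 0xC3 0xA9 ("é")
const b64 = 'w6k=';
const binary = window.atob(b64);                 // "\u00C3\u00A9", one char per byte
const utf8 = decodeURIComponent(escape(binary)); // "é"
atob only undoes the base64; escape then percent-encodes each byte, and decodeURIComponent interprets those percent escapes as UTF-8.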

The more effective (see below) option would be, if your LZ library supports it, to not interpret the base64-decoded buffer as a string at all, but to pass it to the library as a Uint8Array directly. You can do that with
const buffer = Uint8Array.from(atob(b64), c => c.charCodeAt(0))
And then if you really need a string, you can use a TextDecoder, which is a bit less hacky than Shlomi's admittedly very nice solution:
const text = new TextDecoder().decode(buffer)
There are a couple of reasons why using a TypedArray is more effective, and an implementation of LZ should really work on typed arrays rather than strings (and probably use WebAssembly). Obviously you skip the UTF-8 decoding, but the more significant reason is that in JavaScript, strings are represented in memory as UTF-16, so each character takes at least 2 bytes (exactly 2 bytes in the case of a binary string), whereas a Uint8Array, as the name suggests, uses only one byte per item.
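Putting the pieces together, a sketch of the typed-array route (assuming the lz-string library, whose recent versions expose decompressFromUint8Array):
// base64 → bytes → decompressed text, no binary-string detour after atob
const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));
const original = LZString.decompressFromUint8Array(bytes);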

Related

CryptoJS.enc.Base64.parse vs Base64.decodeBase64, what's the difference?

I want to understand how these two are different, or whether they are the same.
var key2 = CryptoJS.enc.Base64.parse(apiKey);
&
byte[] decodedBase64APIKeyByteArray = Base64.decodeBase64(apiKey);
I have gone through the APIs of both, and it seems like both are doing conversions, but my question is: would the conversion be the same for the same input?
Would the output of both be the same?
Both decode normal base64 with the default base64 alphabet including possible padding characters at the end.
There are a few differences however.
Documentation: The commons-codec one is at least somewhat documented.
The input: commons-codec accepts base64 and strips line endings and such (required for e.g. MIME decoding). A quick look at the CryptoJS code shows that it requires base64 without whitespace. So the Java-based decoder allows more forms of input.
The implementation: The CryptoJS parsing brings tears to my eyes, and not of joy. It has terrible performance, if only in how it handles the base64 without streaming. It is even stupid enough to use an indexOf to look up possible padding characters up front, which is both woefully bad and non-performant. Apache's implementation is only slightly better. Both should only be used for relatively small amounts of data.
The output: CryptoJS returns a WordArray while commons-codec returns a byte array. For keys this doesn't matter much, as Java usually expects a byte array for SecretKeySpec, while CryptoJS directly uses a WordArray as key.
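To check the equivalence yourself, you can flatten the WordArray into bytes and compare (a small sketch; wordArrayToBytes is a helper written here, not part of CryptoJS):
// A WordArray stores big-endian 32-bit words plus a sigBytes count
function wordArrayToBytes(wordArray) {
  const bytes = new Uint8Array(wordArray.sigBytes);
  for (let i = 0; i < wordArray.sigBytes; i++) {
    bytes[i] = (wordArray.words[i >>> 2] >>> (24 - (i % 4) * 8)) & 0xff;
  }
  return bytes;
}
const key2 = CryptoJS.enc.Base64.parse(apiKey);
console.log(wordArrayToBytes(key2)); // the same bytes commons-codec's decodeBase64 returns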

javascript string internal representation

As far as I know, Java uses UTF-16 to represent chars and strings internally,
so if we load a text file, it is automatically decoded from its original encoding to UTF-16.
Now the same can be said for JavaScript:
it also uses UTF-16 as its internal string representation.
Suppose we load a string x encoded in UTF-8 using Ajax;
a conversion takes place in order for JavaScript to be able to represent that string internally in UTF-16.
Please tell me if any of what I stated is correct or not,
because the real question is yet to come...
Now suppose the browser is rendering a page using UTF-8 encoding,
and using JavaScript we want the browser to also render the Ajax string x (as you normally do).
Would, in this case, a further conversion be needed, from UTF-16 to UTF-8?
Thanks in advance.
According to this article, it is UCS-2 or UTF-16.
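You can observe the UTF-16 (as opposed to UCS-2) behaviour directly, since characters outside the Basic Multilingual Plane occupy two 16-bit code units:
'😀'.length;          // 2 (a surrogate pair)
'😀'.charCodeAt(0);   // 0xD83D (high surrogate)
'😀'.charCodeAt(1);   // 0xDE00 (low surrogate)
'😀'.codePointAt(0);  // 0x1F600, the actual code point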

How to do a regex replace on a JavaScript ArrayBuffer?

How do I do a regular expression replacement on an ArrayBuffer in JavaScript?
From what I can tell .replace in JavaScript expects a String as the input and doesn't support ArrayBuffer: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
My thought would be to convert the ArrayBuffer into a String and then do the replace, then convert it back to an ArrayBuffer - will there be any data loss if this is done?
Given that the buffer contains arbitrary data, the answer is: yes, there is a possibility of data loss.
A JavaScript string is a sequence of 16-bit code units, so this simplified example
("asd\u0000asd").length
returns 7 despite having 8 bytes all over (it could even return 3 in browsers other than Firefox that truncate the string at the embedded NUL byte). You can work around that with a bit of care, of course, but I would deem it safer (and probably easier) to do the replacing earlier if possible, or to do it by hand if the regex is not too complicated.
I think this is one of the places where the old saying
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
-- Jamie Zawinski
holds a tiny bit of truth ;-)
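If you do go the string route, the round trip can be made lossless by mapping each byte to one code unit in the 0-255 range (a sketch under that assumption; replaceInBuffer is a name made up here):
// Treat bytes as Latin-1 characters so buffer → string → buffer loses nothing,
// provided the replacement itself only introduces characters up to 0xFF.
function replaceInBuffer(buffer, regex, replacement) {
  const bytes = new Uint8Array(buffer);
  let str = '';
  for (let i = 0; i < bytes.length; i += 0x8000) {
    // chunked to stay under the engine's argument-count limit for apply()
    str += String.fromCharCode.apply(null, bytes.subarray(i, i + 0x8000));
  }
  return Uint8Array.from(str.replace(regex, replacement), c => c.charCodeAt(0)).buffer;
}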

Javascript string compression for URL hash parameter

I'm looking to store a lot of data in a URL hash parameter without exceeding URL character limits.
Are there any conventional ways of compressing string length which could be then decoded on another page load?
I've seen LZW encoding used for similar solutions; however, would the special characters it can produce be valid for this use?
LZW encoding technically works; you'll just need to convert the LZW-encoded binary into URL-safe base64, so that the output doesn't contain special characters. Here's an MDN article on base64 in JavaScript; the URL-safe variant of base64 just replaces + with - and / with _. Of course, you're not likely to reduce the size of your string by much by doing this, unless the data you want to store is extremely compressible.
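A sketch of that conversion (toUrlSafeBase64 and fromUrlSafeBase64 are names invented here; btoa/atob are assumed available, i.e. a browser context):
// binary → URL-safe base64: swap '+' for '-', '/' for '_', drop '=' padding
function toUrlSafeBase64(bytes) {
  let bin = '';
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
function fromUrlSafeBase64(s) {
  // restore '+', '/' and the stripped '=' padding before decoding
  const b64 = s.replace(/-/g, '+').replace(/_/g, '/') + '='.repeat((4 - s.length % 4) % 4);
  return Uint8Array.from(atob(b64), c => c.charCodeAt(0));
}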
You can look at smaz or shoco, which are designed for the compression of short strings. Most compression methods don't really get rolling until well after your URL length limit, so you need a specialized compressor for this case if you expect to get any gain. You can then encode the binary result using a scheme like Base 64 or a more efficient coding that uses all of the URI-safe characters.

using regexp on raw binary data

I'm embedding JavaScript in my C++ app (via V8) and I get some raw binary data which I want to pass to JavaScript. Now, in the JavaScript, I plan to do some regular expressions on the data.
When using just the standard JavaScript String object for my data, everything is quite straightforward. However, as far as I understand it, it uses a UTF-16 representation and expects the data to be valid Unicode. But I have arbitrary data (it might contain '\0' and other raw bytes, although it is just text for the most part).
How should I handle this? I searched a bit around and maybe ArrayBuffer or something like it is the object I need to store my raw data. However, I didn't find how to do the usual regular expression methods on that object. (Basically I need RegExp.test and RegExp.exec.)
I just checked out the Node.js code and it seems as if they support binary data and just put it into a string via v8::String::NewFromOneByte. See here and here. So that would answer my question (i.e., I can just use String), wouldn't it? Any downsides?
(I still don't see why my question is bad. Please explain the downvote.)
From all my current tests, it seems like it works just as expected with normal String.
You can even specify that in JavaScript directly, e.g.
var s = "\x00\x01\x02\x03"
and regular expressions on that string work like expected.
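For instance, matching and replacing raw bytes works as usual:
var s = "\x00\x01\x02\x03";
/\x01\x02/.test(s);           // true
s.replace(/\x00/g, "\xff");   // "\xff\x01\x02\x03"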
On the C++ side, if you want to get your binary data into a JS String object:
v8::Local<v8::String> jsBinary(const uint8_t* data, uint32_t len) {
  // NewFromOneByte takes a signed length, so guard against overflow.
  assert(int(len) >= 0);
  // One byte per character: each byte becomes one Latin-1 code unit.
  return v8::String::NewFromOneByte(v8::Isolate::GetCurrent(), data,
                                    v8::String::kNormalString, len);
}
