CryptoJS.enc.Base64.parse vs Base64.decodeBase64, what's the difference? - javascript

I want to understand how these two are different, or whether they are the same?
var key2 = CryptoJS.enc.Base64.parse(apiKey);
&
byte[] decodedBase64APIKeyByteArray = Base64.decodeBase64(apiKey);
I have gone through the APIs of both, and it seems both perform conversions, but my question is: would the conversion be the same for the same input?
Would the output of both be the same?

Both decode normal base64 with the default base64 alphabet including possible padding characters at the end.
There are a few differences however.
Documentation: The commons-codec one is at least somewhat documented.
The input: The commons-codec decoder accepts base64 containing line endings and similar whitespace, which it strips out (required for e.g. MIME decoding). A quick look at the CryptoJS code shows that it requires base64 without whitespace. So the Java-based decoder allows more forms of input.
The implementation: The CryptoJS parsing brings tears to my eyes, and not of joy. It has terrible performance, not least in how it handles the base64 without streaming. It is even stupid enough to use an indexOf to look up possible padding characters up front, which is both woefully bad and non-performant. Apache's implementation is only slightly better. Both should only be used for relatively small amounts of data.
The output: CryptoJS returns a word array while commons-codec returns a byte array. For keys this doesn't matter much, as Java usually expects a byte array for SecretKeySpec while CryptoJS directly uses a word array as the key.
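To make the output difference concrete, here is a small sketch of converting between the two representations without either library. It assumes the CryptoJS layout of four bytes packed big-endian into each 32-bit word, with a sigBytes count for the trailing partial word; the function names are my own, not part of either API.

```javascript
// Sketch: converting between a byte array and a CryptoJS-style word array.
// CryptoJS packs bytes big-endian into 32-bit words; this mirrors that
// layout in plain JavaScript (hypothetical helper names).
function bytesToWords(bytes) {
  const words = new Array(Math.ceil(bytes.length / 4)).fill(0);
  bytes.forEach((b, i) => {
    // Place byte i into the correct position of word i/4.
    words[i >>> 2] |= b << (24 - (i % 4) * 8);
  });
  return { words, sigBytes: bytes.length };
}

function wordsToBytes({ words, sigBytes }) {
  const bytes = [];
  for (let i = 0; i < sigBytes; i++) {
    // Extract byte i back out of word i/4.
    bytes.push((words[i >>> 2] >>> (24 - (i % 4) * 8)) & 0xff);
  }
  return bytes;
}
```

So even when both decoders are given identical base64, the decoded bytes are the same; only the container differs, and a conversion like the above bridges the two.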

Related

How atob doesn't convert from Buffer with base64?

I have data that I encrypt using lz-string package.
I also convert the result to base64 and using atob() function to convert from base64.
The problem is that atob() doesn't work as expected, but Buffer.from(b64, 'base64').toString() does.
Why? How do I fix that? I need to use atob on the client side (Buffer does not exist in the browser).
StackBlitz example
Use decodeURIComponent and escape to convert to UTF-8.
const non64 = decodeURIComponent(escape(window.atob( b64 )));
The more effective (see below) option would be, if your LZ library supports it, to not interpret the base64-encoded buffer as a string and pass it to the library as a Uint8Array directly. You can do that with
const buffer = Uint8Array.from(atob(b64), c => c.charCodeAt(0))
And then if you really need a string, you can use a TextDecoder, which is a bit less hacky than Shlomi's admittedly very nice solution:
const text = new TextDecoder().decode(buffer)
There are a couple of reasons why using a TypedArray is more effective, and an implementation of LZ should really work on typed arrays rather than strings (and probably use WebAssembly). Obviously you skip the UTF-8 decoding, but the more significant reason is that in JavaScript, strings are represented in memory as UTF-16, so each character takes at least 2 bytes (exactly 2 bytes in the case of a binary string), whereas the Uint8Array, as the name suggests, only uses one byte per item.
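Putting the pieces above together, here is a sketch of the full round trip, assuming an environment that provides atob/btoa and TextEncoder/TextDecoder (modern browsers, Node.js 16+):

```javascript
// Round trip: UTF-8 text -> bytes -> base64 -> bytes -> text.
const original = 'héllo wörld';

// Encode: text to UTF-8 bytes, then bytes to a binary string for btoa.
const utf8Bytes = new TextEncoder().encode(original);
const b64 = btoa(String.fromCharCode(...utf8Bytes));

// Decode: base64 straight to bytes, skipping the binary-string detour
// as far as possible, then decode the bytes as UTF-8.
const buffer = Uint8Array.from(atob(b64), c => c.charCodeAt(0));
const text = new TextDecoder().decode(buffer);
```

The decode half is exactly the two lines from the answer above; the encode half shows why it works, since the base64 was built from UTF-8 bytes in the first place.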

Javascript - Alternative to lzw compression for Database entry

I have strings (about 1-5Kb) of the form:
FF,A3V,X7Y,aA4,....
lzw compresses these really nicely, but includes Turkish characters. These are then submitted to a MySQL database.
Sometimes MySQL can 'play-up' and not submit these properly, putting question marks '?' in place of the Turkish characters. They can do this even when you have your text areas properly defined. Exporting and reimporting the table can sort this out. This is fine for my test database, but not something I am happy with when this goes live.
Consequently I am looking for an alternative to lzw, which will compress but only using normal letters/numbers etc.
Does anyone know of a PUBLIC DOMAIN compression method that avoids Turkish characters (and any other non-standard characters)? Can anyone point me to some code in JavaScript (or C++ or C#, which I can convert)?
To expand a bit on what's been said in the comments... Storing strings of bytes, such as the output a compression algorithm typically produces, in a VARCHAR or CHAR or TEXT column is not valid usage.
These column types are not for byte strings; they are for strings of valid characters only. Not every string of bytes is a valid string of characters in any given character set (and for some character sets, the correlation between "character" and "byte" isn't 1:1), and MySQL isn't going to allow invalid characters.
In the good ol' days™, the two were interchangeable, but this is not the case any more (and hasn't been, to one degree or another, for a while).
If your column type, instead, were BINARY or VARBINARY or BLOB, the issue should disappear, because those data types are for binary data.
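If changing the column type is not an option, a common workaround is to encode the binary output into plain ASCII before storing it, at the cost of some size overhead. A sketch using Node's Buffer (the `compressed` array here is just a stand-in for whatever your compressor produces):

```javascript
// Sketch: make binary data safe for a TEXT column by base64-encoding it.
// `compressed` stands in for the compressor's binary output (assumption).
const compressed = Uint8Array.from([0x00, 0xff, 0x10, 0x9c]);

// Base64: roughly 33% size overhead, but pure ASCII, so it survives
// any character set the column might use.
const forStorage = Buffer.from(compressed).toString('base64');

// And back, after reading the column.
const restored = new Uint8Array(Buffer.from(forStorage, 'base64'));
```

That said, a BINARY/VARBINARY/BLOB column is the cleaner fix, since it stores the bytes directly with no overhead.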

Javascript string compression for URL hash parameter

I'm looking to store a lot of data in a URL hash parameter without exceeding URL character limits.
Are there any conventional ways of compressing string length which could be then decoded on another page load?
I've seen LZW encoding used for similar solutions, however would special characters be valid for this use?
LZW encoding technically works; you'll just need to convert the LZW-encoded binary into URL-safe base64, so that the output doesn't contain special characters. Here's an MDN article on base64 in JavaScript; the URL-safe variant of base64 just replaces + with - and / with _. Of course, you're not likely to reduce the size of your string by much by doing this, unless the data you want to store is extremely compressible.
You can look at smaz or shoco, which are designed for the compression of short strings. Most compression methods don't really get rolling until well after your URL length limit, so you need a specialized compressor for this case if you expect to get any gain. You can then encode the binary result using a scheme like Base 64 or a more efficient coding that uses all of the URI-safe characters.
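The URL-safe base64 conversion both answers mention is mechanical: swap the two characters that are special in URLs and drop the padding, which can be recomputed from the length on decode. A minimal sketch (hypothetical helper names):

```javascript
// Sketch: standard base64 <-> URL-safe base64.
// '+' -> '-', '/' -> '_', and trailing '=' padding is dropped.
function toUrlSafe(b64) {
  return b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function fromUrlSafe(urlB64) {
  const b64 = urlB64.replace(/-/g, '+').replace(/_/g, '/');
  // Re-add padding: base64 length must be a multiple of 4.
  return b64 + '='.repeat((4 - (b64.length % 4)) % 4);
}
```

The result can be placed in a URL hash parameter without percent-encoding any of its characters.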

Get the browser's highlighted text into a UTF8 encoded javascript string

I'm new to javascript and do not have a good grasp of its unicode handling. If I understand correctly it's kind of like C/C++ where a string contains a binary sequence without any encoding info.
When I use something like var str=window.getSelection().toString() to get the highlighted text, will the resulting string have the same encoding as the web-page? If so, what's the best way of finding out that encoding and converting it to a unicode one (e.g. UTF8)?
Strings in Javascript are not like "strings" in C or PHP, which are actually byte arrays and have encoding semantics. Strings in Javascript are quite different from that and are like strings in Java/C# or Python's unicode type.
They are strings of abstract characters, at least as long as you don't use non-BMP characters. In practice you don't have to worry about that; I am just mentioning it for completeness.
As per above, var str=window.getSelection().toString() does not have any encoding semantics, it's just a string of the characters that are selected. You don't state any actual problem in your question, but if you are wondering if "special" characters will just work in Javascript, well, they do just work.
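A short illustration of the two points above: strings carry no encoding, "special" characters just work, and only non-BMP characters show the UTF-16 seam (they occupy two code units, a surrogate pair):

```javascript
// JavaScript strings are sequences of UTF-16 code units, no encoding attached.
const bmp = 'é';      // U+00E9, inside the Basic Multilingual Plane
const nonBmp = '𝄞';   // U+1D11E (musical G clef), outside the BMP

console.log(bmp.length);             // 1
console.log(nonBmp.length);          // 2 (surrogate pair)
console.log(nonBmp.codePointAt(0));  // 119070, the real code point
```

Encodings like UTF-8 only enter the picture at the boundaries, e.g. when the page is fetched or when you serialize the string to bytes yourself with a TextEncoder.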

Insert EBCDIC character into javascript string

I need to create an EBCDIC string within my javascript and save it into an EBCDIC database. A process on the EBCDIC system then uses the data. I haven't had any problems until I came across the character '¬'. In EBCDIC it is hex value of 5F. All of the usual letters and symbols seem to automagically convert with no problem. Any idea how I can create the EBCDIC value for '¬' within javascript so I can store it properly in the EBCDIC db?
Thanks!
If "all of the usual letters and symbols seem to automagically convert", then I very strongly suspect that you do not have to create an EBCDIC string in Javascript. The character codes for Latin letters and digits are completely different in EBCDIC than they are in Unicode, so something in your server code is already converting the strings.
Thus what you need to determine is how that process works, and specifically you need to find out how the translation maps character codes from Unicode source into the EBCDIC equivalents. Once you know that, you'll know what Unicode character to use in your Javascript code.
As a further note: every single time I've been told by an IT organization that their mainframe software requires that data be supplied in EBCDIC, that advice has been dead wrong. The fact that there's some external interface means that something in the pile of iron that makes up the mainframe and its tentacles, something the IT people have forgotten about and probably couldn't find if they needed to, is already mapping "real world" character encodings like Unicode into EBCDIC. How does it work? Well, it may be impossible to figure out.
You might try whether this works: var notSign = "\u00AC";
edit: also: here's a good reference for HTML entities and Unicode glyphs: http://www.elizabethcastro.com/html/extras/entities.html The HTML/XML syntax uses decimal numbers for the character codes. For Javascript, you have to convert those to hex, and the notation in Javascript strings is "\u" followed by a 4-digit hex constant. (That reference isn't complete, but it's pretty easy to read and it's got lots of useful symbols.)
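To illustrate the decimal-to-hex point: the HTML entity for the not sign is &#172; (decimal 172), which in JavaScript's \u notation becomes the 4-digit hex constant 00AC:

```javascript
// The not sign as a JavaScript string literal: decimal 172 == hex 0x00AC.
const notSign = '\u00AC';

console.log(notSign);                         // ¬
console.log(notSign.charCodeAt(0) === 0xAC);  // true
console.log((172).toString(16));              // "ac"
```

This is the Unicode character to put in your JavaScript; whatever gateway already converts your letters and digits to EBCDIC should then map it to 0x5F on the mainframe side.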
