Postman CryptoJS alternative to crypto.createHash('md5').update(input).digest(); - javascript

I'm trying to generate a UUID from a string; I basically want to recreate what UUID.nameUUIDFromBytes does in Java. I found an answer that works perfectly outside of Postman (Java's UUID.nameUUIDFromBytes to written in JavaScript? - Stack Overflow),
but crypto is not available in Postman. I've also tried CryptoJS in Postman and have gotten really close: the generated UUID is off by a single character…
function javaHash(test) {
    var md5Bytes = CryptoJS.MD5(test);
    console.log(md5Bytes);
    md5Bytes[6] &= 0x0f; /* clear version */
    md5Bytes[6] |= 0x30; /* set to version 3 */
    md5Bytes[8] &= 0x3f; /* clear variant */
    md5Bytes[8] |= 0x80; /* set to IETF variant */
    console.log(md5Bytes);
    return md5Bytes.toString(CryptoJS.enc.Hex).replace(/-/g, "").replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
}
console.log(javaHash('123456789'));
From looking at the value in the console, it doesn't look like the value is being changed by whatever magic (setting the version and variant) is supposed to happen in the middle of the method.
I’ve also tried importing crypto from here: https://cdnjs.com/libraries/crypto-js, with this method: Adding External Libraries in Postman | Postman Blog
function javaHash(test) {
    eval(pm.collectionVariables.get("crypto_library"));
    let md5Bytes = this.crypto.createHash('md5').update(test).digest();
    console.log(md5Bytes);
    md5Bytes[6] &= 0x0f; /* clear version */
    md5Bytes[6] |= 0x30; /* set to version 3 */
    md5Bytes[8] &= 0x3f; /* clear variant */
    md5Bytes[8] |= 0x80; /* set to IETF variant */
    console.log(md5Bytes);
    return md5Bytes.toString(CryptoJS.enc.Hex).replace(/-/g, "").replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
}
but I get an error “There was an error in evaluating the Pre-request Script:TypeError: Cannot read properties of undefined (reading ‘lib’)”
Any ideas?

The problem in your CryptoJS code is that md5Bytes is not an array of bytes, so the subsequent byte manipulations have no effect.
CryptoJS internally uses the WordArray type, i.e. an array consisting of 4-byte words (see here). This type is also returned by CryptoJS.MD5(). Because the MD5 output is 16 bytes, the WordArray consists of 4 words of 4 bytes each.
Since the UUID algorithm modifies individual bytes, it is necessary to access and modify the bytes within the words. This can be done e.g. by converting the WordArray to a Uint8Array and back:
var hashWA = CryptoJS.MD5('12345');
// Conversion WordArray -> Uint8Array
var binString = hashWA.toString(CryptoJS.enc.Latin1);
var md5Bytes = Uint8Array.from(binString, x => x.charCodeAt(0))
// Byte manipulations
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
// Conversion Uint8Array -> WordArray
var uuidWA = CryptoJS.lib.WordArray.create(md5Bytes);
var uuidHex = uuidWA.toString();
var uuidFormatted = uuidHex.replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
console.log(uuidFormatted)
<script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js"></script>
More efficient is an MD5 implementation that returns the hash as an array of bytes (Array, ArrayBuffer, Uint8Array) so that the bytes can be accessed directly, e.g. js-md5:
var md5Bytes = md5.array('12345');
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
var uuidHex = Array.prototype.map.call(new Uint8Array(md5Bytes), x => ('00' + x.toString(16)).slice(-2)).join('');
var uuidFormatted = uuidHex.replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
console.log(uuidFormatted)
<script src="https://cdn.jsdelivr.net/npm/js-md5@0.7.3/src/md5.min.js"></script>
However, I am not sure whether js-md5 is an option in Postman or whether there are similar libraries.
Comparison with Java: UUID.nameUUIDFromBytes("12345".getBytes(StandardCharsets.UTF_8)).toString() returns the same output: 827ccb0e-ea8a-306c-8c34-a16891f84e7b.
Note that the hex encoded MD5 hash for 12345 is 827ccb0e-ea8a-706c-4c34-a16891f84e7b (with equivalent formatting). The differences due to the byte manipulations are found at indexes 6 and 8 (0x30 instead of 0x70 and 0x8c instead of 0x4c).
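For Postman specifically, the two conversions above can be combined into one helper. The following is a minimal sketch that relies only on the CryptoJS object the question already uses (and assumes, as in the first snippet, that WordArray.create accepts a Uint8Array):
function javaHash(input) {
    var hashWA = CryptoJS.MD5(input);
    // WordArray -> Uint8Array so individual bytes can be modified
    var binString = hashWA.toString(CryptoJS.enc.Latin1);
    var md5Bytes = Uint8Array.from(binString, x => x.charCodeAt(0));
    md5Bytes[6] = (md5Bytes[6] & 0x0f) | 0x30; // version 3
    md5Bytes[8] = (md5Bytes[8] & 0x3f) | 0x80; // IETF variant
    // Uint8Array -> WordArray -> hex, then insert the dashes
    var hex = CryptoJS.lib.WordArray.create(md5Bytes).toString();
    return hex.replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
}
console.log(javaHash('12345')); // 827ccb0e-ea8a-306c-8c34-a16891f84e7b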

Related

BigQuery + Javascript UDF - Not able to manipulate byte array from input

I'm noticing a discrepancy between a javascript function run in Node and a javascript function in a UDF in BigQuery.
I am running the following in BigQuery:
CREATE TEMP FUNCTION testHash(md5Bytes BYTES)
RETURNS BYTES
LANGUAGE js AS """
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
return md5Bytes
""";
SELECT TO_HEX(testHash(MD5("test_phrase")));
and the output ends up being cb5012e39277d48ef0b5c88bded48591. (This is incorrect)
Running the same code in Node gets cb5012e39277348eb0b5c88bded48591 (which is the expected value) - notice how 2 of the characters are different.
I've narrowed down the issue to the fact that BigQuery doesn't actually apply the bitwise operators, since running the code in Node without these bitwise operators produces the same incorrect output as BQ:
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
Any ideas why the bitwise operators are not being applied to the md5Bytes input to the UDF?
The bitwise operations in a BigQuery JavaScript UDF can only be applied to the most significant 32 bits, as mentioned in the limitations of JavaScript UDFs in this documentation. MD5 is a hash algorithm that takes an input and converts it into a fixed-length 16-byte (128-bit) digest. Since the JavaScript UDF bitwise operations can only be applied to 32 bits, you get the unexpected output.
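For comparison, here is the same transformation in plain Node.js (a sketch outside BigQuery; it reproduces the expected value quoted in the question):
const crypto = require('crypto');
const md5Bytes = crypto.createHash('md5').update('test_phrase').digest(); // Buffer of 16 bytes
md5Bytes[6] = (md5Bytes[6] & 0x0f) | 0x30;
md5Bytes[8] = (md5Bytes[8] & 0x3f) | 0x80;
console.log(md5Bytes.toString('hex')); // cb5012e39277348eb0b5c88bded48591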

Swift and Javascript different Bitwise calculation results

I am trying to port over a Javascript hashing function to Swift:
In Javascript:
68207947269 ^ 51 = -511529418
where as the same calculation in Swift 5
68207947269 ^ 51 = 68207947318
How, why are they different?
Edit, added the JS hashing function
function hash(str) {
    var hash = 5381,
        i = str.length;
    while (i) {
        hash = (hash * 33) ^ str.charCodeAt(--i);
    }
    /* JavaScript does bitwise operations (like XOR, above) on 32-bit signed
     * integers. Since we want the results to be always positive, convert the
     * signed int to an unsigned by doing an unsigned bitshift. */
    return hash >>> 0;
}
Edit 2, added current broken Swift:
extension String {
    subscript(i: Int) -> String {
        return String(self[index(startIndex, offsetBy: i)])
    }
}
extension Character {
    func unicodeScalarCodePoint() -> UInt32 {
        let characterString = String(self)
        let scalars = characterString.unicodeScalars
        return scalars[scalars.startIndex].value
    }
}
func hashString(_ string: String) -> UInt32 {
    var hash: Int32 = 5381
    var i = string.count - 1
    while (i >= 0) {
        let char = Int32(Character(string[i]).unicodeScalarCodePoint())
        // Crash here with error: `Swift runtime failure: arithmetic overflow`
        let hashMultiply = Int32(truncating: NSNumber(value: hash * 33))
        hash = hashMultiply ^ char
        i -= 1
    }
    return UInt32(bitPattern: hash)
}
hashString("0x8f1083db77b5F556E46Ac46A29DE86e01031Bb14")
According to here, bitwise operators in JavaScript use 32-bit operands. 68207947269 is too large to be represented in 32 bits, so it gets truncated first automatically, then the bitwise operation is carried out.
Swift integer literals are of type Int by default, and the size of Int is platform-dependent. It is most likely 64-bits for you, which is why the different result is produced.
To produce the same result as the JavaScript code, convert it to Int32 by truncating first:
Int32(truncatingIfNeeded: 68207947269) ^ 51
Note that you get a Int32 as a result. You might need to do more type conversions later on.
About your Swift translation, I see two main problems.
Firstly, hash * 33 will overflow and cause a crash. The JavaScript version doesn't need to worry about this because the result will simply "wrap around" in JavaScript. Fortunately, there is an operator that also "wraps around" if the result overflows (rather than crashing) in Swift. So you can do:
hash &* 33
Secondly, you are handling strings differently from the JavaScript version. In JavaScript, charCodeAt returns a UTF-16 code unit, but your Swift code gets the unicode scalar instead.
To get the same behaviour, you should do:
extension String.UTF16View {
    subscript(i: Int) -> Element {
        return self[index(startIndex, offsetBy: i)]
    }
}
...
Int32(string.utf16[i])
Note that in JavaScript:
The operands are converted to 32-bit integers and expressed by a series of bits (zeroes and ones). Numbers with more than 32 bits get their most significant bits discarded.
Integer literals in Swift are of type Int by default, which is your architecture's word width. If you tried this on any recent Apple device, it'll be 64-bit.
So to mimic JavaScript behavior, you would need to cast to an Int32:
Int32(truncatingIfNeeded: 68207947269) ^ Int32(51)
which gives the desired result -511529418.
Note that your JavaScript code discards bits of the integer 68207947269 as it cannot be represented as a 32-bit integer.
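For illustration, the implicit ToInt32 conversion can also be observed directly in JavaScript (a small sketch, not part of either code sample):
console.log(68207947269 | 0);          // -511529467, the 32-bit value actually used
console.log(68207947269 ^ 51);         // -511529418
console.log((68207947269 ^ 51) >>> 0); // 3783437878, the unsigned view that `hash >>> 0` produces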

Convert hex value to unicode character

I'm trying to convert the hex value 1f600, which is the smiley emoji to its character representation by:
String.fromCharCode(parseInt("1f600", 16));
but this just generates a square symbol.
Most emojis require two code units, including that one. fromCharCode works in code units (JavaScript's "characters" are UTF-16 code units, except that invalid surrogate pairs are tolerated), not code points (actual Unicode characters).
In modern environments, you'd use String.fromCodePoint or just a Unicode codepoint escape sequence (\u{XXXXX} rather than \uXXXX, which is for code units). There's also no need for parseInt:
console.log(String.fromCodePoint(0x1f600));
console.log("\u{1f600}");
In older environments, you have to supply the surrogate pair, which in that case is 0xD83D 0xDE00:
console.log("\uD83D\uDE00");
...or use a polyfill for fromCodePoint.
If for some reason you don't want to use a polyfill in older environments, and your starting point is a code point, you have to figure out the code units. You can see how to do that in MDN's polyfill linked above, or here's how the Unicode UTF-16 FAQ says to do it:
Using the following type definitions
typedef unsigned int16 UTF16;
typedef unsigned int32 UTF32;
the first snippet calculates the high (or leading) surrogate from a character code C.
const UTF16 HI_SURROGATE_START = 0xD800
UTF16 X = (UTF16) C;
UTF32 U = (C >> 16) & ((1 << 5) - 1);
UTF16 W = (UTF16) U - 1;
UTF16 HiSurrogate = HI_SURROGATE_START | (W << 6) | X >> 10;
where X, U and W correspond to the labels used in Table 3-5 UTF-16 Bit Distribution. The next snippet does the same for the low surrogate.
const UTF16 LO_SURROGATE_START = 0xDC00
UTF16 X = (UTF16) C;
UTF16 LoSurrogate = (UTF16) (LO_SURROGATE_START | X & ((1 << 10) - 1));
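For reference, here is the same arithmetic translated to JavaScript (a sketch based on the FAQ snippets above; it is only valid for code points at or above 0x10000):
function toSurrogatePair(C) {
    var X = C & 0xFFFF;             // the X field from Table 3-5
    var W = ((C >> 16) & 0x1F) - 1; // U - 1
    var hi = 0xD800 | (W << 6) | (X >> 10);
    var lo = 0xDC00 | (X & 0x3FF);
    return String.fromCharCode(hi, lo);
}
console.log(toSurrogatePair(0x1f600)); // "\uD83D\uDE00", i.e. 😀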
JavaScript uses UTF-16, so instead of U+1F600 you need to get U+D83D U+DE00 - that is, String.fromCharCode(0xd83d, 0xde00)
Note that you can use 0x#### instead of parseInt("####",16).
To convert a code point outside the BMP to its UTF-16 surrogate pair, here are the steps:
var input = 0x1f600;
var code = input - 0x10000;
var high = (code >> 10) + 0xD800;
var low = (code & 0x3FF) + 0xDC00;
var output = String.fromCharCode(high, low);
Use the fromCodePoint function instead of fromCharCode:
String.fromCodePoint(0x1f600)

Is there an equivalent to C's *(unsigned int*)(char) = 123 in Javascript?

I'm dealing with some C source code I'm trying to convert over to JavaScript, and I've hit a snag at these lines:
char ddata[512];
*(unsigned int*)(ddata+0)=123;
*(unsigned int*)(ddata+4)=add-8;
memset(ddata+8,0,add-8);
I'm not sure exactly what is happening here, I understand they're casting the char to an unsigned int, but what is the ddata+0 and stuff doing here? Thanks.
You can't say.
That's because the behaviour on casting a char* to an unsigned* is undefined unless the pointer started off as an unsigned*, which, in your case it didn't.
ddata + 0 is equivalent to ddata.
ddata + 4 is equivalent to &ddata[4], i.e. the address of the 5th element of the array.
For what it's worth, it looks like the C programmer is attempting to serialise a couple of unsigned literals into a byte array. But the code is a mess; aside from what I've already said they appear to be assuming that an unsigned occupies 4 bytes, which is not necessarily the case.
The code fragment is storing a record id (123) as 4 byte integer in the first 4 bytes of a char buffer ddata. It then stores a length (add-8) in the following 4 bytes and finally initializes the following add-8 bytes to 0.
Translating this to JavaScript can be done in different ways, but probably not by constructing a string with the same contents. The reason is that strings are not byte buffers in JavaScript; they contain Unicode code points, so writing the string to storage might perform some unwanted conversions.
The best solution depends on your actual target platform, where byte arrays may be available to more closely match the intended semantics of your C code.
Note that the above code is not portable and has undefined behavior for various reasons, notably because ddata might not be properly aligned to be used as the address to store an unsigned int via the cast *(unsigned int*)ddata = 123;, because it assumes int to be 4 bytes and because it relies on unspecified byte ordering.
On the Redhat linux box it probably works as expected, and the same C code would probably perform correctly on MacOS, that uses the same Intel architecture with little endian ordering. How best to translate this to Javascript requires more context and specifications.
In the mean time, the code would best be rewritten this way:
unsigned char ddata[512];
if (add <= 512) {
    ddata[0] = 123;
    ddata[1] = 0;
    ddata[2] = 0;
    ddata[3] = 0;
    ddata[4] = ((add-8) >> 0) & 255;
    ddata[5] = ((add-8) >> 8) & 255;
    ddata[6] = ((add-8) >> 16) & 255;
    ddata[7] = ((add-8) >> 24) & 255;
    memset(ddata + 8, 0, add - 8);
}
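On the JavaScript side, the closest match for this kind of byte-level serialisation is a typed array with a DataView rather than a string. A rough sketch, assuming the 4-byte little-endian unsigned int layout of the original Intel target (the add value here is just an illustrative placeholder):
const add = 16;                     // example value; in the real code this comes from elsewhere
const ddata = new ArrayBuffer(512); // zero-initialised buffer
const view = new DataView(ddata);
view.setUint32(0, 123, true);       // *(unsigned int*)(ddata+0) = 123, little-endian
view.setUint32(4, add - 8, true);   // *(unsigned int*)(ddata+4) = add - 8
// bytes 8 .. add-1 are already zero, so memset(ddata+8, 0, add-8) needs no separate equivalent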

How many bytes in a JavaScript string?

I have a javascript string which is about 500K when being sent from the server in UTF-8. How can I tell its size in JavaScript?
I know that JavaScript uses UCS-2, so does that mean 2 bytes per character. However, does it depend on the JavaScript implementation? Or on the page encoding or maybe content-type?
You can use the Blob to get the string size in bytes.
Examples:
console.info(
    new Blob(['😂']).size, // 4
    new Blob(['👍']).size, // 4
    new Blob(['😂👍']).size, // 8
    new Blob(['👍😂']).size, // 8
    new Blob(['I\'m a string']).size, // 12
    // from Premasagar correction of Lauri's answer for
    // strings containing lone characters in the surrogate pair range:
    // https://stackoverflow.com/a/39488643/6225838
    new Blob([String.fromCharCode(55555)]).size, // 3
    new Blob([String.fromCharCode(55555, 57000)]).size // 4 (not 6)
);
This function will return the byte size of any UTF-8 string you pass to it.
function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}
Source
JavaScript engines are free to use UCS-2 or UTF-16 internally. Most engines that I know of use UTF-16, but whatever choice they made, it’s just an implementation detail that won’t affect the language’s characteristics.
The ECMAScript/JavaScript language itself, however, exposes characters according to UCS-2, not UTF-16.
Source
If you're using Node.js, there is a simpler solution using buffers:
function getBinarySize(string) {
    return Buffer.byteLength(string, 'utf8');
}
There is an npm lib for that: https://www.npmjs.org/package/utf8-binary-cutter (from yours faithfully)
String values are not implementation dependent; according to the ECMA-262 3rd Edition Specification, each character represents a single 16-bit unit of UTF-16 text:
4.3.16 String Value
A string value is a member of the type String and is a finite ordered sequence of zero or more 16-bit unsigned integer values.
NOTE Although each value usually represents a single 16-bit unit of UTF-16 text, the language does not place any restrictions or requirements on the values except that they be 16-bit unsigned integers.
These are 3 ways I use:
TextEncoder
new TextEncoder().encode("myString").length
Blob
new Blob(["myString"]).size
Buffer
Buffer.byteLength("myString", 'utf8')
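All three agree on a simple multi-byte example (Buffer being Node.js only); a quick check:
const s = "\u2620"; // ☠, 3 bytes in UTF-8
console.log(new TextEncoder().encode(s).length); // 3
console.log(new Blob([s]).size);                 // 3
// In Node.js: Buffer.byteLength(s, 'utf8')      // 3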
Try this combination using the unescape JS function:
const byteAmount = unescape(encodeURIComponent(yourString)).length
Full encoding process example:
const s = "1 a ф № # ®"; // length is 11
const s2 = encodeURIComponent(s); // length is 41
const s3 = unescape(s2); // length is 15 [1-1,a-1,ф-2,№-3,#-1,®-2]
const s4 = escape(s3); // length is 39
const s5 = decodeURIComponent(s4); // length is 11
Note that if you're targeting node.js you can use Buffer.from(string).length:
var str = "\u2620"; // => "☠"
str.length; // => 1 (character)
Buffer.from(str).length // => 3 (bytes)
The size of a JavaScript string is:
Pre-ES6: 2 bytes per character
ES6 and later: 2 bytes per character, or 5 or more bytes per character
Pre-ES6
Always 2 bytes per character. UTF-16 is not allowed because the spec says "values must be 16-bit unsigned integers". Since UTF-16 strings can use 3 or 4 byte characters, it would violate 2 byte requirement. Crucially, while UTF-16 cannot be fully supported, the standard does require that the two byte characters used are valid UTF-16 characters. In other words, Pre-ES6 JavaScript strings support a subset of UTF-16 characters.
ES6 and later
2 bytes per character, or 5 or more bytes per character. The additional sizes come into play because ES6 (ECMAScript 6) adds support for Unicode code point escapes. Using a unicode escape looks like this: \u{1D306}
Practical notes
This doesn't relate to the internal implementation of a particular engine. For example, some engines use data structures and libraries with full UTF-16 support, but what they provide externally doesn't have to be full UTF-16 support. Also, an engine may provide external UTF-16 support but is not mandated to do so.
For ES6, practically speaking, characters will never be more than 5 bytes long (2 bytes for the escape point + 3 bytes for the Unicode code point) because the latest version of Unicode only has 136,755 possible characters, which fits easily into 3 bytes. However, this is technically not limited by the standard, so in principle a single character could use, say, 4 bytes for the code point and 6 bytes total.
Most of the code examples here for calculating byte size don't seem to take into account ES6 Unicode code point escapes, so the results could be incorrect in some cases.
UTF-8 encodes characters using 1 to 4 bytes per code point. As CMS pointed out in the accepted answer, JavaScript will store each character internally using 16 bits (2 bytes).
If you parse each character in the string via a loop and count the number of bytes used per code point, and then multiply the total count by 2, you should have JavaScript's memory usage in bytes for that UTF-8 encoded string. Perhaps something like this:
getStringMemorySize = function( _string ) {
    "use strict";
    var codePoint, accum = 0;
    for( var stringIndex = 0, endOfString = _string.length; stringIndex < endOfString; stringIndex++ ) {
        codePoint = _string.charCodeAt( stringIndex );
        if( codePoint < 0x100 ) {
            accum += 1;
            continue;
        }
        if( codePoint < 0x10000 ) {
            accum += 2;
            continue;
        }
        if( codePoint < 0x1000000 ) {
            accum += 3;
        } else {
            accum += 4;
        }
    }
    return accum * 2;
}
Examples:
getStringMemorySize( 'I' ); // 2
getStringMemorySize( '❤' ); // 4
getStringMemorySize( '𠀰' ); // 8
getStringMemorySize( 'I❤𠀰' ); // 14
The answer from Lauri Oherd works well for most strings seen in the wild, but will fail if the string contains lone characters in the surrogate pair range, 0xD800 to 0xDFFF. E.g.
byteCount(String.fromCharCode(55555))
// URIError: URI malformed
This longer function should handle all strings:
function bytes (str) {
    var bytes = 0, len = str.length, codePoint, next, i;
    for (i = 0; i < len; i++) {
        codePoint = str.charCodeAt(i);
        // Lone surrogates cannot be passed to encodeURI
        if (codePoint >= 0xD800 && codePoint < 0xE000) {
            if (codePoint < 0xDC00 && i + 1 < len) {
                next = str.charCodeAt(i + 1);
                if (next >= 0xDC00 && next < 0xE000) {
                    bytes += 4;
                    i++;
                    continue;
                }
            }
        }
        bytes += (codePoint < 0x80 ? 1 : (codePoint < 0x800 ? 2 : 3));
    }
    return bytes;
}
E.g.
bytes(String.fromCharCode(55555))
// 3
It will correctly calculate the size for strings containing surrogate pairs:
bytes(String.fromCharCode(55555, 57000))
// 4 (not 6)
The results can be compared with Node's built-in function Buffer.byteLength:
Buffer.byteLength(String.fromCharCode(55555), 'utf8')
// 3
Buffer.byteLength(String.fromCharCode(55555, 57000), 'utf8')
// 4 (not 6)
A single element in a JavaScript String is considered to be a single UTF-16 code unit. That is to say, a String's characters are stored as 16-bit values (1 code unit), and 16 bits equal 2 bytes (8 bits = 1 byte).
The charCodeAt() method can be used to return an integer between 0 and 65535 representing the UTF-16 code unit at the given index.
The codePointAt() method can be used to return the entire code point value for Unicode characters, e.g. UTF-32.
When a UTF-16 character can't be represented in a single 16-bit code unit, it will have a surrogate pair and therefore use two code units (2 x 16 bits = 4 bytes).
See Unicode encodings for different encodings and their code ranges.
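A short illustration of the difference, using an emoji outside the BMP:
const s = "\u{1F600}";         // 😀
console.log(s.length);         // 2 UTF-16 code units, i.e. 4 bytes
console.log(s.charCodeAt(0));  // 55357 (0xD83D, the high surrogate only)
console.log(s.codePointAt(0)); // 128512 (0x1F600, the full code point)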
The Blob interface's size property returns the size of the Blob or File in bytes.
const getStringSize = (s) => new Blob([s]).size;
I'm working with an embedded version of the V8 engine.
I've tested a single string, growing it by 1000 characters per step, in UTF-8.
The first test used the single-byte (8-bit, ANSI) character "A" (hex: 41), the second a two-byte (16-bit) character "Ω" (hex: CE A9), and the third a three-byte (24-bit) character "☺" (hex: E2 98 BA).
In all three cases the device reports out of memory at 888,000 characters, using ca. 26,348 KB of RAM.
Result: the characters are not stored with a dynamic size, and not with only 16 bits either. OK, perhaps that only holds for my case (embedded 128 MB RAM device, V8 engine, C++/Qt). The character encoding has nothing to do with the size in RAM used by the JavaScript engine; e.g. encodeURI etc. is only useful for high-level data transmission and storage.
Embedded or not, the fact is that the characters are not stored in only 16 bits.
Unfortunately I have no 100% answer as to what JavaScript does at the low level.
Btw. I've tested the same (the first test above) with an array of the character "A", pushing 1000 items every step (exactly the same test, just with the string replaced by an array). The system ran out of memory (as intended) after using 10,416 KB, at an array length of 1,337,000.
So the JavaScript engine is not simply restricted; it's a bit more complex.
You can try this:
var b = str.match(/[^\x00-\xff]/g);
return (str.length + (!b ? 0: b.length));
It worked for me.
