I'm noticing a discrepancy between a JavaScript function run in Node and the same function run as a UDF in BigQuery.
I am running the following in BigQuery:
CREATE TEMP FUNCTION testHash(md5Bytes BYTES)
RETURNS BYTES
LANGUAGE js AS """
  md5Bytes[6] &= 0x0f;
  md5Bytes[6] |= 0x30;
  md5Bytes[8] &= 0x3f;
  md5Bytes[8] |= 0x80;
  return md5Bytes;
""";
SELECT TO_HEX(testHash(MD5("test_phrase")));
and the output ends up being cb5012e39277d48ef0b5c88bded48591 (this is incorrect).
Running the same code in Node gives cb5012e39277348eb0b5c88bded48591, which is the expected value; notice how two of the characters differ.
I've narrowed the issue down to the fact that BigQuery doesn't actually apply the bitwise operators, since skipping these bitwise operators in Node produces the same incorrect output that BigQuery returns:
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
Any ideas why the bitwise operators are not being applied to the md5Bytes input to the UDF?
The bitwise operations in a BigQuery JavaScript UDF can only be applied to the most significant 32 bits, as mentioned in the limitations of JavaScript UDFs in this documentation. MD5 is a hash algorithm that converts an input into a fixed-length digest of 16 bytes, which is equivalent to 128 bits. Since the JavaScript UDF bitwise operations can only be applied to 32 bits, you get the unexpected output.
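As a quick illustration of that documented 32-bit coercion (plain Node, not BigQuery-specific): JavaScript's bitwise operators coerce their numeric operands to 32-bit integers before operating, so any higher bits are silently dropped.
// Bitwise operators coerce numeric operands to 32-bit integers,
// so bits above bit 31 are discarded before the operation runs.
const wide = 2 ** 40 + 5;    // 1099511627781
console.log(wide & 0xff);    // 5 -- only the low 32 bits take part
console.log((2 ** 40) | 0);  // 0 -- the 2^40 bit is gone entirely
console.log(wide | 0);       // 5 -- what remains after the 32-bit coercion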
Related
I'm trying to generate a UUID from a string; I basically want to recreate what UUID.nameUUIDFromBytes does in Java. I found this article that works perfectly outside of Postman, Java's UUID.nameUUIDFromBytes to written in JavaScript? - Stack Overflow,
but crypto is not available in Postman. I've also tried CryptoJS in Postman and have gotten really close; the generated UUID is off by a single character…
function javaHash(test) {
    var md5Bytes = CryptoJS.MD5(test);
    console.log(md5Bytes);
    md5Bytes[6] &= 0x0f; /* clear version */
    md5Bytes[6] |= 0x30; /* set to version 3 */
    md5Bytes[8] &= 0x3f; /* clear variant */
    md5Bytes[8] |= 0x80; /* set to IETF variant */
    console.log(md5Bytes);
    return md5Bytes.toString(CryptoJS.enc.Hex).replace(/-/g, "").replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
}
console.log(javaHash('123456789'));
From looking at the value in the console, it doesn't look like the value is being changed by whatever magic (setting the version and variant) is supposed to happen in the middle of the method.
I’ve also tried importing crypto from here: https://cdnjs.com/libraries/crypto-js, with this method: Adding External Libraries in Postman | Postman Blog
function javaHash(test) {
    eval(pm.collectionVariables.get("crypto_library"));
    let md5Bytes = this.crypto.createHash('md5').update(test).digest();
    console.log(md5Bytes);
    md5Bytes[6] &= 0x0f; /* clear version */
    md5Bytes[6] |= 0x30; /* set to version 3 */
    md5Bytes[8] &= 0x3f; /* clear variant */
    md5Bytes[8] |= 0x80; /* set to IETF variant */
    console.log(md5Bytes);
    return md5Bytes.toString(CryptoJS.enc.Hex).replace(/-/g, "").replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
}
but I get an error “There was an error in evaluating the Pre-request Script:TypeError: Cannot read properties of undefined (reading ‘lib’)”
Any ideas?
The problem in your CryptoJS code is that md5Bytes is not an array of bytes, so the subsequent byte manipulations have no effect.
CryptoJS internally uses the WordArray type, i.e. an array consisting of 4-byte words (see here). This type is also returned by CryptoJS.MD5(). Because the MD5 output is 16 bytes, the WordArray consists of 4 words of 4 bytes each.
Since the UUID algorithm modifies individual bytes, it is necessary to access and modify the bytes within the words. This is feasible e.g. by converting the WordArray to a Uint8Array and vice versa:
var hashWA = CryptoJS.MD5('12345');
// Conversion WordArray -> Uint8Array
var binString = hashWA.toString(CryptoJS.enc.Latin1);
var md5Bytes = Uint8Array.from(binString, x => x.charCodeAt(0))
// Byte manipulations
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
// Conversion Uint8Array -> WordArray
var uuidWA = CryptoJS.lib.WordArray.create(md5Bytes);
var uuidHex = uuidWA.toString();
var uuidFormatted = uuidHex.replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
console.log(uuidFormatted)
<script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js"></script>
More efficient is an MD5 implementation that returns the hash as an array of bytes (Array, ArrayBuffer, Uint8Array) so that the bytes can be accessed directly, e.g. js-md5:
var md5Bytes = md5.array('12345');
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
var uuidHex = Array.prototype.map.call(new Uint8Array(md5Bytes), x => ('00' + x.toString(16)).slice(-2)).join('');
var uuidFormatted = uuidHex.replace(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, "$1-$2-$3-$4-$5");
console.log(uuidFormatted)
<script src="https://cdn.jsdelivr.net/npm/js-md5@0.7.3/src/md5.min.js"></script>
However, I am not sure if with Postman js-md5 is an option or if there are similar libraries.
Comparison with Java: UUID.nameUUIDFromBytes("12345".getBytes(StandardCharsets.UTF_8)).toString() returns the same output: 827ccb0e-ea8a-306c-8c34-a16891f84e7b.
Note that the hex encoded MD5 hash for 12345 is 827ccb0e-ea8a-706c-4c34-a16891f84e7b (with equivalent formatting). The differences due to the byte manipulations are found at indexes 6 and 8 (0x30 instead of 0x70 and 0x8c instead of 0x4c).
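To see the two changed bytes from that comparison worked out directly (plain JavaScript arithmetic on the raw hash bytes 0x70 and 0x4c noted above):
// Version byte (index 6): 0x70 in the raw MD5 becomes 0x30.
console.log(((0x70 & 0x0f) | 0x30).toString(16)); // "30"
// Variant byte (index 8): 0x4c in the raw MD5 becomes 0x8c.
console.log(((0x4c & 0x3f) | 0x80).toString(16)); // "8c"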
In Javascript:
255 << 24 = -16777216
In dart:
255 << 24 = 4278190080
Is there any way to get the same answer in Dart as in JS?
To get precisely the same result in Dart as in JavaScript (whether on the web or not), do:
var jsValue = (255 << 24).toSigned(32);
JavaScript converts the operands of all bitwise operations to 32-bit integers, and treats the result as signed for all operators except >>>.
So, do .toSigned(32) on the result to do precisely what JavaScript does.
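For reference, this is the JavaScript behaviour being mimicked, including the >>> exception mentioned above:
console.log(255 << 24);          // -16777216   -- operands coerced to 32 bits, result read as signed
console.log((255 << 24) >>> 0);  // 4278190080  -- >>> is the one operator that yields an unsigned result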
I am trying to port over a Javascript hashing function to Swift:
In Javascript:
68207947269 ^ 51 = -511529418
whereas the same calculation in Swift 5
68207947269 ^ 51 = 68207947318
How and why are they different?
Edit, added the JS hashing function
function hash(str) {
    var hash = 5381,
        i = str.length;
    while (i) {
        hash = (hash * 33) ^ str.charCodeAt(--i);
    }
    /* JavaScript does bitwise operations (like XOR, above) on 32-bit signed
     * integers. Since we want the results to be always positive, convert the
     * signed int to an unsigned by doing an unsigned bitshift. */
    return hash >>> 0;
}
Edit 2, added current broken Swift:
extension String {
    subscript(i: Int) -> String {
        return String(self[index(startIndex, offsetBy: i)])
    }
}

extension Character {
    func unicodeScalarCodePoint() -> UInt32 {
        let characterString = String(self)
        let scalars = characterString.unicodeScalars
        return scalars[scalars.startIndex].value
    }
}

func hashString(_ string: String) -> UInt32 {
    var hash: Int32 = 5381
    var i = string.count - 1
    while i >= 0 {
        let char = Int32(Character(string[i]).unicodeScalarCodePoint())
        // Crash here with error: `Swift runtime failure: arithmetic overflow`
        let hashMultiply = Int32(truncating: NSNumber(value: hash * 33))
        hash = hashMultiply ^ char
        i -= 1
    }
    return UInt32(bitPattern: hash)
}
hashString("0x8f1083db77b5F556E46Ac46A29DE86e01031Bb14")
According to here, bitwise operators in JavaScript use 32-bit operands. 68207947269 is too large to be represented in 32 bits, so it gets truncated first automatically, then the bitwise operation is carried out.
Swift integer literals are of type Int by default, and the size of Int is platform-dependent. It is most likely 64-bits for you, which is why the different result is produced.
To produce the same result as the JavaScript code, convert it to Int32 by truncating first:
Int32(truncatingIfNeeded: 68207947269) ^ 51
Note that you get an Int32 as a result. You might need to do more type conversions later on.
About your Swift translation, I see two main problems.
Firstly, hash * 33 will overflow and cause a crash. The JavaScript version doesn't need to worry about this because the result will simply "wrap around" in JavaScript. Fortunately, there is an operator that also "wraps around" if the result overflows (rather than crashing) in Swift. So you can do:
hash &* 33
Secondly, you are handling strings differently from the JavaScript version. In JavaScript, charCodeAt returns a UTF-16 code unit, but your Swift code gets the unicode scalar instead.
To get the same behaviour, you should do:
extension String.UTF16View {
    subscript(i: Int) -> Element {
        return self[index(startIndex, offsetBy: i)]
    }
}
...
Int32(string.utf16[i])
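To illustrate the first point from the JavaScript side (using -511529418, a value from the question, purely as an example intermediate hash): the product momentarily grows wider than 32 bits without any error, and it is the XOR in the loop that folds it back to 32 bits, which is the wrap-around that &* reproduces.
const h = -511529418;       // example intermediate hash value
console.log(h * 33);        // -16880470794 -- wider than 32 bits, no overflow error in JS
console.log((h * 33) | 0);  // 299398390    -- the 32-bit value the next XOR actually works with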
Note that in JavaScript:
The operands are converted to 32-bit integers and expressed by a series of bits (zeroes and ones). Numbers with more than 32 bits get their most significant bits discarded.
Integer literals in Swift are of type Int by default, which is your architecture's word width. If you tried this on any recent Apple device, it'll be 64-bit.
So to mimic JavaScript behavior, you would need to cast to an Int32:
Int32(truncatingIfNeeded: 68207947269) ^ Int32(51)
which gives the desired result -511529418.
Note that your JavaScript code discards bits of the integer 68207947269 as it cannot be represented as a 32-bit integer.
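Seen from the JavaScript side, the discarding happens before the XOR:
console.log(68207947269 | 0);   // -511529467 -- only the low 32 bits of the operand survive
console.log(-511529467 ^ 51);   // -511529418 -- the value the question reports
console.log(68207947269 ^ 51);  // -511529418 -- same thing in one step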
I have converted JavaScript code that uses bitwise operators to Python, but there is one problem. When I do this in JavaScript and Python
412287 << 10
then I get 422181888, the same result in both languages. But when I do this in both
424970184 << 10
then I get different results in the two languages: 1377771520 in JavaScript and 435169468416 in Python.
Can anybody help me with this?
Any help would be appreciated.
If you want the JavaScript-equivalent value, then what you can do is:
import ctypes
print(ctypes.c_int(424970184 << 10 ^ 0).value)
Output:
1377771520
As stated in this SO answer, in JavaScript the bitwise operators and shift operators operate on 32-bit ints, and your second example overflows the 32-bit capacity, so the Python equivalent would be:
(424970184 << 10) & 0x7FFFFFFF
(you get a "modulo"/"masked" value with the signed 32-bit integer mask, not the actual value)
In Python there's no limit in capacity for integers, so you get the actual value.
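If it helps, the relationship between the two numbers can also be seen from the JavaScript side: BigInt shifts are arbitrary-precision like Python's integers, while Number shifts are coerced to 32 bits (a small sketch):
console.log(424970184n << 10n);  // 435169468416n -- arbitrary precision, matches Python
console.log(424970184 << 10);    // 1377771520    -- Number operands are coerced to 32 bits first
// Taking the low 32 bits of the wide result explicitly gives the same value:
console.log(Number(BigInt.asIntN(32, 424970184n << 10n)));  // 1377771520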
Why does Javascript incorrectly evaluate the following?
0xAABBCCDD & 0xFF00FF00
In Javascript:
console.log((0xAABBCCDD & 0xFF00FF00).toString(16)) // -55ff3400
console.log((0xAABBCCDD & 0xFF00FF00) === 0xAA00CC00) // false
In C++:
cout << hex << (0xAABBCCDD & 0xFF00FF00) << endl; // 0xAA00CC00
As Pointy pointed out in his answer, JavaScript uses signed 32-bit values. You can use >>> 0 to reinterpret the result as unsigned.
console.log(((0xAABBCCDD & 0xFF00FF00) >>> 0).toString(16)) // Prints aa00cc00
JavaScript bitwise operations involve a coercion to 32-bit values. Your values are being truncated.
Edit: sorry; as the comment pointed out, it's the sign bit that's the problem.
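A complementary check to the >>> 0 approach above: the AND already produces the expected bit pattern, so you can instead compare against the signed 32-bit view of the constant:
console.log(0xAA00CC00 | 0);                                  // -1442788352, i.e. -0x55ff3400
console.log((0xAABBCCDD & 0xFF00FF00) === (0xAA00CC00 | 0));  // true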