Working with string (array?) of bits of an unspecified length - javascript

I'm a javascript code monkey, so this is virgin territory for me.
I have two "strings" that are just zeros and ones:
var first = "00110101011101010010101110100101010101010101010";
var second = "11001010100010101101010001011010101010101010101";
I want to perform a bitwise & (which I've never before worked with) to determine if there's any index where 1 appears in both strings.
These could potentially be VERY long strings (in the thousands of characters). I thought about adding them together as numbers, then converting to strings and checking for a 2, but javascript can't hold precision in large intervals and I get back numbers as strings like "1.1111111118215729e+95", which doesn't really do me much good.
Can I take two strings of unspecified length (they may not be the same length either) and somehow use a bitwise & to compare them?
I've already built the loop-through-each-character solution, but 1001^0110 would strike me as a major performance upgrade. Please do not give the javascript looping solution as an answer, this question is about using bitwise operators.

As you already noticed yourself, javascript has limited capabilities if it's about integer values. You'll have to chop your strings into "edible" portions and work your way through them. Since the parseInt() function accepts a base, you could convert 64 characters to an 8 byte int (or 32 to a 4 byte int) and use an and-operator to test for set bits (if (a & b != 0))

var first = "00110101011101010010101110100101010101010101010010001001010001010100011111",
second = "10110101011101010010101110100101010101010101010010001001010001010100011100",
firstInt = parseInt(first, 2),
secondInt = parseInt(second, 2),
xorResult = firstInt ^ secondInt, //524288
xorString = xorResult.toString(2); //"10000000000000000000"

Related

How to generate a GUID with a custom alphabet, that behaves similar to an MD5 hash (in JavaScript)?

I am wondering how to generate a GUID given an input string, such that the same input string results in the same GUID (sort of like an MD5 hash). The problem with MD5 hashes is they just guarantee low collision rate, rather than uniqueness. Instead I would like something like this:
guid('v1.0.0') == 1231231231123123123112312312311231231231
guid('v1.0.1') == 6154716581615471658161547165816154716581
guid('v1.0.2') == 1883939319188393931918839393191883939319
How would you go about implementing this sort of thing (ideally in JavaScript)? Is it even possible to do? I am not sure where to start. Things like the uuid module don't take a seed string, and they don't let you use a custom format/alphabet.
I am not looking for the canonical UUID format, but rather a GUID, ideally one made up of just integers.
What you would need is define a one-to-one mapping of text strings (such as "v1.0.0") onto 40 digit long strings (such as "123123..."). This is also known as a bijection, although in your case an injection (a simple one-to-one mapping from inputs to outputs, not necessarily onto) may be enough. As you note, hash functions don't necessarily ensure this mapping, but there are other possibilities, such as full-period linear congruential generators (if they take a seed that you can map one-to-one onto input string values), or other reversible functions.
However, if the set of possible input strings is larger than the set of possible output strings, then you can't map all input strings one-to-one with all output strings (without creating duplicates), due to the pigeonhole principle.
For example, you can't generally map all 120-character strings one-to-one with all 40-digit strings unless you restrict the format of the 120-character strings in some way. However, your problem of creating 40-digit output strings can be solved if you can accept limiting input strings to no more than 1040 values (about 132 bits), or if you can otherwise exploit redundancy in the input strings so that they are guaranteed to compress losslessly to 40 decimal digits (about 132 bits) or less, which may or may not be possible. See also this question.
The algorithm involves two steps:
First, transform the string to a BigInt by building up the string's charCodeAt() values similarly to the stringToInt method given in another answer. Throw an error if any charCodeAt() is 0x80 or greater, or if the resulting BigInt is equal to or greater than BigInt(alphabet_length)**BigInt(output_length).
Then, transform the integer to another string by taking the mod of the BigInt and the output alphabet's size and replacing each remainder with the corresponding character in the output alphabet, until the BigInt reaches 0.
One approach would be to use the method from that answer:
/*
* uuid-timestamp (emitter)
* UUID v4 based on timestamp
*
* Created by tarkh
* tarkh.com (C) 2020
* https://stackoverflow.com/a/63344366/1261825
*/
const uuidEmit = () => {
// Get now time
const n = Date.now();
// Generate random
const r = Math.random(); // <- swap this
// Stringify now time and generate additional random number
const s = String(n) + String(~~(r*9e4)+1e4);
// Form UUID and return it
return `${s.slice(0,8)}-${s.slice(8,12)}-4${s.slice(12,15)}-${[8,9,'a','b'][~~(r*3)]}${s.slice(15,18)}-${s.slice(s.length-12)}`;
};
// Generate 5 UUIDs
console.log(`${uuidEmit()}
${uuidEmit()}
${uuidEmit()}
${uuidEmit()}
${uuidEmit()}`);
And simply swap out the Math.random() call to a different random function which can take your seed value. (There are numerous algorithms out there for creating a seedable random method, so I won't try prescribing a particular one).
Most random seeds expect numeric, so you could convert a seed string to an integer by just adding up the character values (multiplying each by 10^position so you'll always get a unique number):
const stringToInt = str =>
Array.prototype.slice.call(str).reduce((result, char, index) => result += char.charCodeAt(0) * (10**(str.length - index)), 0);
console.log(stringToInt("v1.0.0"));
console.log(stringToInt("v1.0.1"));
console.log(stringToInt("v1.0.2"));
If you want to generate the same extract string every time, you can take a similar approach to tarkh's uuidEmit() method but get rid of the bits that change:
const strToInt = str =>
Array.prototype.slice.call(str).reduce((result, char, index) => result += char.charCodeAt(0) * (10**(str.length - index)), 0);
const strToId = (str, len = 40) => {
// Generate random
const r = strToInt(str);
// Multiply the number by some things to get it to the right number of digits
const rLen = `${r}`.length; // length of r as a string
// If you want to avoid any chance of collision, you can't provide too long of a string
// If a small chance of collision is okay, you can instead just truncate the string to
// your desired length
if (rLen > len) throw new Error('String too long');
// our string length is n * (r+m) + e = len, so we'll do some math to get n and m
const mMax = 9; // maximum for the exponent, too much longer and it might be represented as an exponent. If you discover "e" showing up in your string, lower this value
let m = Math.floor(Math.min(mMax, len / rLen)); // exponent
let n = Math.floor(len / (m + rLen)); // number of times we repeat r and m
let e = len - (n * (rLen + m)); // extra to pad us to the right length
return (new Array(n)).fill(0).map((_, i) => String(r * (i * 10**m))).join('')
+ String(10**e);
};
console.log(strToId("v1.0.0"));
console.log(strToId("v1.0.1"));
console.log(strToId("v1.0.2"));
console.log(strToId("v1.0.0") === strToId("v1.0.0")); // check they are the same
console.log(strToId("v1.0.0") === strToId("v1.0.1")); // check they are different
Note, this will only work with smaller strings, (probably about 10 characters top) but it should be able to avoid all collisions. You could tweak it to handle larger strings (remove the multiplying bit from stringToInt) but then you risk collisions.
I suggest using MD5...
Following the classic birthday problem, all things being equal, the odds of 2 people sharing a birthday out of a group of 23 people is ( see https://en.wikipedia.org/wiki/Birthday_problem )...
For estimating MD5 collisions, I'm going to simplify the birthday problem formula, erring in the favor of predicting a higher chance of a collision...
Note though that whereas in the birthday problem, a collision is a positive result, in the MD5 problem, a collision is a negative result, and therefore providing higher than expected collision odds provides a conservative estimate of the chance of a MD5 collision. Plus this higher predicted chance can in some way be considered a fudge factor for any uneven distribution in the MD5 output, although I do not believe there is anyway to quantify this without a God computer...
An MD5 hash is 16 bytes long, resulting in a range of 256^16 possible values. Assuming that the MD5 algorithm is generally uniform in its results, lets suppose we create one quadrillion (ie, a million billion or 10^15) unique strings to run through the hash algorithm. Then using the modified formula (to ease the collision calculations and to add a conservative fudge factor), the odds of a collision are...
So, after 10^15 or one quadrillion unique input strings, the estimated odds of a hash collision are on par with the odds of winning the Powerball or the Mega Millions Jackpot (which are on order of 1 in ~300,000,000 per https://www.engineeringbigdata.com/odds-winning-powerball-grand-prize-r/ ).
Note too that 256^16 is 340282366920938463463374607431768211456, which is 39 digits, falling within the desired range of 40 digits.
So, suggest using the MD5 hash ( converting to BigInt ), and if you do run into a collision, I will be more than glad to spot you a lottery ticket, just to have a chance to tap into your luck and split the proceeds...
( Note: I used https://keisan.casio.com/calculator for the calculations. )
While UUID v4 is just used for random ID generation, UUID v5 is more like a hash for a given input string and namespace. It's perfect for what you describe.
As you already mentioned, You can use this npm package:
npm install uuid
And it's pretty easy to use.
import {v5 as uuidv5} from 'uuid';
// use a UUIDV4 as a unique namespace for your application.
// you can generate one here: https://www.uuidgenerator.net/version4
const UUIDV5_NAMESPACE = '...';
// Finally, provide the input and namespace to get your unique id.
const uniqueId = uuidv5(input, namespace);

Reassembling negative Python marshal int's into Javascript numbers

I'm writing a client-side Python bytecode interpreter in Javascript (specifically Typescript) for a class project. Parsing the bytecode was going fine until I tried out a negative number.
In Python, marshal.dumps(2) gives 'i\x02\x00\x00\x00' and marshal.dumps(-2) gives 'i\xfe\xff\xff\xff'. This makes sense as Python represents integers using two's complement with at least 32 bits of precision.
In my Typescript code, I use the equivalent of Node.js's Buffer class (via a library called BrowserFS, instead of ArrayBuffers and etc.) to read the data. When I see the character 'i' (i.e. buffer.readUInt8(offset) == 105, signalling that the next thing is an int), I then call readInt32LE on the next offset to read a little-endian signed long (4 bytes). This works fine for positive numbers but not for negative numbers: for 1 I get '1', but for '-1' I get something like '-272777233'.
I guess that Javascript represents numbers in 64-bit (floating point?). So, it seems like the following should work:
var longval = buffer.readInt32LE(offset); // reads a 4-byte long, gives -272777233
var low32Bits = longval & 0xffff0000; //take the little endian 'most significant' 32 bits
var newval = ~low32Bits + 1; //invert the bits and add 1 to negate the original value
//but now newval = 272826368 instead of -2
I've tried a lot of different things and I've been stuck on this for days. I can't figure out how to recover the original value of the Python integer from the binary marshal string using Javascript/Typescript. Also I think I deeply misunderstand how bits work. Any thoughts would be appreciated here.
Some more specific questions might be:
Why would buffer.readInt32LE work for positive ints but not negative?
Am I using the correct method to get the 'most significant' or 'lowest' 32 bits (i.e. does & 0xffff0000 work how I think it does?)
Separate but related: in an actual 'long' number (i.e. longer than '-2'), I think there is a sign bit and a magnitude, and I think this information is stored in the 'highest' 2 bits of the number (i.e. at number & 0x000000ff?) -- is this the correct way of thinking about this?
The sequence ef bf bd is the UTF-8 sequence for the "Unicode replacement character", which Unicode encoders use to represent invalid encodings.
It sounds like whatever method you're using to download the data is getting accidentally run through a UTF-8 decoder and corrupting the raw datastream. Be sure you're using blob instead of text, or whatever the equivalent is for the way you're downloading the bytecode.
This got messed up only for negative values because positive values are within the normal mapping space of UTF-8 and thus get translated 1:1 from the original byte stream.

Is the smallest usable memory size 2 bytes, In javascript?

To represent a single character in JS, we use " or ', i.e use a string of length 1.
But since JavaScript uses the UTF-16 encoding, the character would be 16 bit as well.
Would it be safe to thus say, 2 bytes is the smallest re presentable size, that is possible in JS.
Or would there be some data structure, which can represent a single byte (8 bit) which I am missing.
EDIT:
I understand using the datatype with word size (usually 4 bytes) is always the most efficient, but I just wondered if the creators of JS would have cared to include something in.
Actionscript 3 derives from ECMAScript as well & it includes a variable size data type i.e an integer in AS3 is 1 to 4 bytes of storage.
Note : I'm not really looking for a hack to fit in multiple bytes into a larger datatype.
Guess the answer is most probably no.
Well, not in ECMAScript but in node.js you have Buffer and in browsers you have Uint8Array
var b = new UInt8Array(16); //Holds 16 bytes
Technically, you can have an array of 8 booleans - but there might be significant overhead in storing the array itself.
Taking a page from c-style unions, sure, you can create your own data structure which does just that. Using a large int as a bitmap.
var memory = 0;
function setbyte(addr, value) {
memory = memory | (value << addr);
}
function getbyte(addr) {
return 0xFF & (memory>>addr)
}

Bitwise check in Javascript

I play a game, and in the database we set 100663296 to be a GM Leader but also this field in the database gets written to for different things, so it changes that number to 100794368
i was told to possible use a bit-wise check to check whether the first number is the same as the second number, and I have googled on using bit-wise checks but got confused as to what to use for my check.
Here are some other numbers that change, including the one from above.
predefined number new changed number/ever changing number.
100663296 = 100794368
67108864 = 67239936
117440512 = 2231767040
so how should i go about checking these numbers?
And here is part of my code that i was using before i noticed the change in the numbers.
if (playerData[i].nameflags == 67108864)
{
playerRows += '<img src ="icons/GM-Icon.png" alt="GM" title="GM"></img>';
}
thx to Bergi, for the answer.
if (playerData[i].nameflags & 0x400000 /* === 0x400000 */)
this seams to work great.
also thx to vr1911428
and every one else for the help on this.
So let's convert those numbers to binary representation (unsigned integer):
> 100663296
00000110000000000000000000000000
> 100794368
00000110000000100000000000000000
> 67108864
00000100000000000000000000000000
> 67239936
00000100000000100000000000000000
> 117440512
00000111000000000000000000000000
> 2231767040
10000101000001100001000000000000
Notice that the last number is out of the scope of JavaScripts bitwise arithmetic, which only works with 32-bit signed integers - you won't be able to use the leftmost bit.
So which bits do you want to compare now? There are lots of possibilities, the above scheme doesn't make it clear, yet it looks like you are looking for the 27th bit from the right (226 = 67108864). To match against it, you can apply a binary AND bitmask:
x & Math.pow(2, 26)
which should evaluate to 226 again or zero - so you can just check for truthiness. Btw, instead of using pow you could use hexadecimal notation: 0x4000000. With that, your condition will look like this:
if (playerData[i].nameflags & 0x400000 /* === 0x400000 */)
If you need to check for full bitwise equality of two integers, all you need is just '==' operator, but to use it, you should guarantee that both operands are integers:
left = 12323;
right = 12323;
if (left == right)
alert("Operands are binary equal; I'll guarantee that. :-)");
Be very careful though; if at least one of operands is string representing number, not a number, both operands will be considered strings and you can get confusing results:
left = "012323";
right = 12323;
if (left != right)
alert("Operands are not equal, even though they represent 'equal values', as they are compared as strings.");
In general, these days, the attempt to operate with strings representing data instead of data itself is a real curse of the beginners; and it's hard to explain to them. It is especially difficult to explain in JavaScript, with its loose-type typing concept, which is itself very complex and hard to understand, behind the illusory simplicity.
Finally, if you need to compare separate bits (and, from your question, I don't see this need), you can use binary operators:
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Operators/Bitwise_Operators
That's it, basically.
...................

Dealing With Binary / Bitshifts in JavaScript

I am trying to perform some bitshift operations and dealing with binary numbers in JavaScript.
Here's what I'm trying to do. A user inputs a value and I do the following with it:
// Square Input and mod with 65536 to keep it below that value
var squaredInput = (inputVal * inputVal) % 65536;
// Figure out how many bits is the squared input number
var bits = Math.floor(Math.log(squaredInput) / Math.log(2)) + 1;
// Convert that number to a 16-bit number using bitshift.
var squaredShifted = squaredInput >>> (16 - bits);
As long as the number is larger than 46, it works. Once it is less than 46, it does not work.
I know the problem is the in bitshift. Now coming from a C background, I know this would be done differently, since all numbers will be stored in 32-bit format (given it is an int). Does JavaScript do the same (since it vars are not typed)?
If so, is it possible to store a 16-bit number? If not, can I treat it as 32-bits and do the required calculations to assume it is 16-bits?
Note: I am trying to extract the middle 4-bits of the 16-bit value in squaredInput.
Another note: When printing out the var, it just prints out the value without the padding so I couldn't figure it out. Tried using parseInt and toString.
Thanks
Are you looking for this?
function get16bitnumber( inputVal ){
return ("0000000000000000"+(inputVal * inputVal).toString(2)).substr(-16);
}
This function returns last 16 bits of (inputVal*inputVal) value.By having binary string you could work with any range of bits.
Don't use bitshifting in JS if you don't absolutely have to. The specs mention at least four number formats
IEEE 754
Int32
UInt32
UInt16
It's really confusing to know which is used when.
For example, ~ applies a bitwise inversion while converting to Int32. UInt16 seems to be used only in String.fromCharCode. Using bitshift operators converts the operands to either UInt32 or to Int32.
In your case, the right shift operator >>> forces conversion to UInt32.
When you type
a >>> b
this is what you get:
ToUInt32(a) >>> (ToUInt32(b) & 0x1f)

Categories

Resources