What is the fastest method to calculate substring - javascript

I have a huge "binary" string, like: 1110 0010 1000 1111 0000 1100 1010 0111....
Its length is 0 modulo 4, and may reach 500,000.
I have also a corresponding array: {14, 2, 8, 15, 0, 12, 10, 7, ...}
(every number in the array corresponds to 4 bits in the string)
Given this string, this array, and a number N, I need to calculate the following substring string.substr(4*N, 4), i.e.:
for N=0 the result should be 1110
for N=1 the result should be 0010
I need to perform this task many, many times, and my question is: what would be the fastest method to calculate this substring?
One method is to calculate the substring directly: string.substr(4*N, 4). I'm afraid this one is not efficient for such huge strings.
Another method is to use array[N].toString(2) and then pad the result with leading zeros if needed. I'm not sure how fast this is.
Maybe you have other ideas?

Where does the string come from? Why not represent the string not as binary, but as hex, and then you can store each four-binary-digit section as a single character? (You could obviously pack it twice that densely if you wanted, or actually now that I think of it, 4 times, since Javascript strings are 16-bit Unicode). Then finding a single group would be a single call to "charAt()", and you'd just have to expand to the binary form via a lookup table.
edit — oh well duhh, you already have an array. In that case don't do the substring work at all; it's crazy. Just grab the array element and translate it through a lookup array into the 4-binary-digit string.
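The lookup-array idea can be sketched as follows. This is a minimal sketch, not a benchmarked solution; the names `NIBBLES` and `nibbleAt` and the sample data are assumptions made for illustration.

```javascript
// Precompute the 16 possible 4-bit strings once ("0000" .. "1111").
const NIBBLES = [];
for (let i = 0; i < 16; i++) {
  NIBBLES.push(i.toString(2).padStart(4, "0"));
}

// Hypothetical sample data matching the question's example.
const array = [14, 2, 8, 15, 0, 12, 10, 7];

// Return the 4-character binary string for group n (no substring work needed).
function nibbleAt(arr, n) {
  return NIBBLES[arr[n]];
}

console.log(nibbleAt(array, 0)); // "1110"
console.log(nibbleAt(array, 1)); // "0010"
```

Each call is then two array index operations, independent of the length of the original string.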

You could consider representing your huge string as a Rope data structure. A rope is basically a binary tree whose leaves are arrays of characters. A node in the tree has a left child and a right child, the left child being the first part of the string, while the right child the final part.
By using a rope, substring operations become logarithmic in complexity, rather than linear, as they are for regular strings.

If you want it padded, you could do this:
var elem = array[N];
var str = "" + ((elem>>3)&1) + ((elem>>2)&1) + ((elem>>1)&1) + (elem&1);

The array already has exactly what you need, does it not, save that you need to print it in binary format. Fortunately, sprintf for javascript is available.
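If pulling in a sprintf library is not desirable, the built-in `String.prototype.padStart` can do the zero padding. A minimal sketch, assuming every element is an integer from 0 to 15; `groupToBinary` is a hypothetical helper name:

```javascript
// Convert one array element (0..15) to its zero-padded 4-digit binary string.
function groupToBinary(value) {
  return value.toString(2).padStart(4, "0");
}

console.log(groupToBinary(2));  // "0010"
console.log(groupToBinary(14)); // "1110"
```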

Related

Implementing extendible hash table in javascript: how to use binary number as index

I'm studying data structures and trying to implement extendible hashing from scratch in Javascript and I'm confused. Here is an example I'm using as reference hash table with binary labels
Example: to store "john":35 in a table of size: 8 indexes / depth 3 (last 3 digits of binary hash)
"john" gets converted to a hash, example: 13,
13 is converted to a binary: 1101
find which index of the table 1101 belongs to, by looking at the last 3 digits "101"
This is where I'm stuck. Am I supposed to convert 101 back to decimal form (which would be 5), to then access the index by doing array[5]? Is there a way to label the array indexes in binary format like array[101] (but then wouldn't it be better to use an object?)? This seems like a lot of unnecessary extra steps to avoid just using modulo (13%8), am I missing something? Is this implementation useful in languages other than JavaScript?
First post - thanks in advance!
Internally, all data in the computer is stored in binary, so you can't "convert" from decimal to binary since everything is already binary (it's just shown to us as decimal). If you want to print out a number as binary for debugging purposes, you can do:
console.log((5).toString(2)); // will print "101"
The .toString(2) method converts the number to a string with the binary representation of the number.
You can also write numbers in binary by starting them with 0b:
let x = 0b1101; // == 13
If you want to get the last few binary digits of a number, take the number modulo 2 to the power of the number of digits you want:
(0b1101 % (2**3)).toString(2) // "101"
With the table selected, you probably want to use the rest of the number that you haven't used already as the index in the table. We can use the bitshift operator, >>, to do this:
(0b1101 >> 3).toString(2) // "1", right three bits cut off
With a longer number:
// Note that underscores don't mean anything, they are just used for spacing
(0b1101_1101 >> 3).toString(2) // "11011" you can see that the right three bits have been cut off
Keep in mind that you probably shouldn't be using .toString(2) to actually store anything in the table; it should only be used for debugging.
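Applied to the extendible-hashing example from the question, the index is just an ordinary number, so no string conversion is needed to look up the bucket. A minimal sketch, assuming a non-negative integer hash and a depth below 31; `bucketIndex` is a hypothetical name:

```javascript
// Keep only the lowest `depth` bits of the hash.
// (1 << depth) - 1 is a mask of `depth` ones, e.g. depth 3 -> 0b111.
function bucketIndex(hash, depth) {
  return hash & ((1 << depth) - 1);
}

const table = new Array(8).fill(null); // 8 slots, depth 3
const index = bucketIndex(13, 3);      // 0b1101 -> 0b101
table[index] = { key: "john", value: 35 };

console.log(index);             // 5
console.log(index.toString(2)); // "101" (for debugging only)
```

For non-negative hashes this gives the same result as `13 % 8`; the mask form simply makes the "last 3 bits" intent explicit.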

Getting specific digits in a binary number in javascript

I need to be able to take the first 8 digits in a binary number, and save that value to a variable, then save the next 8, and so on. I read this https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators on bitwise operations, but didn't see anything about getting a specific digit or set of digits. I suppose I could just AND the number in question with another number that is all zeros except for the digits in question, which would be ones. For instance if the number in question was 10110011010111, and I wanted the first 5 digits, I could do 0010110011010111 & 0000000000011111 which would return 0000000000010111, which would be fine, but if there's a better or more direct way to do this, I would prefer that.
Edit: I'm doing this to be able to store a number as a number in base 256, so I can use color to encode information. I don't need to know the actual ones and zeros in those locations, but what number they would be taken in groups of 8, and saving that number.
You could use splice:
var str = '10110011010111';
var arr = str.split('');
console.log(arr.splice(arr.length - 5, 5).join('')); // prints 10111
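Since the edit to the question asks for the numeric value of each 8-bit group rather than the digit characters, a shift-and-mask loop may be more direct than working on a string. A minimal sketch, assuming the input is a non-negative integer below 2^32; `toBase256Digits` is a hypothetical name:

```javascript
// Split a number into base-256 digits, least significant group first.
function toBase256Digits(n) {
  const digits = [];
  do {
    digits.push(n & 0xff); // mask: keep the lowest 8 bits
    n = n >>> 8;           // unsigned shift: drop those 8 bits
  } while (n !== 0);
  return digits;
}

console.log(toBase256Digits(0b10110011010111)); // [215, 44]
console.log(0b10110011010111 & 0b11111);        // 23, i.e. 0b10111
```

The mask `0xff` plays the same role as the `0000000000011111` in the question, just eight ones wide instead of five.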

Grab multiple numbers 1-10 from string

I am parsing a string of multiple numbers between 1 and 10 with the eventual goal of adding them to a set.
There will be multiple concatenated numbers after a text identifier such as {text}12345678910.
I am currently using match(/\d/g) to grab the numbers but it separates 1 and 0 in 10. I then look for 0 in my String Array, see if there's a 1 in the element before it, turn it into a 10 and delete the other entry. Not very elegant.
How can I clean up my matching code? I definitely don't need to use regex for this, but it makes grabbing the numbers fairly easy.
You could just match with this regex:
/10|\d/g
(instead of the one you use currently, not additionally)
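The regex engine scans the string left to right, and at each position it tries the alternatives in the order they are written, so 10 is attempted before a single digit. That is why the order matters: /\d|10/g or even /\d|(10)/g won't work, because \d matches the 1 first and the 10 alternative is never reached.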
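The effect of the alternation order can be checked directly on the sample string from the question. A minimal sketch; the variable names are made up for illustration:

```javascript
const input = "{text}12345678910";

// "10" is tried before a single digit at every position.
const good = input.match(/10|\d/g);
// A single digit is tried first, so "10" is split into "1" and "0".
const bad = input.match(/\d|10/g);

console.log(good); // ["1","2","3","4","5","6","7","8","9","10"]
console.log(bad);  // ["1","2","3","4","5","6","7","8","9","1","0"]

// The eventual goal in the question: collect the numbers into a set.
const numbers = new Set(good.map(Number));
console.log(numbers.has(10)); // true
```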

Working with string (array?) of bits of an unspecified length

I'm a javascript code monkey, so this is virgin territory for me.
I have two "strings" that are just zeros and ones:
var first = "00110101011101010010101110100101010101010101010";
var second = "11001010100010101101010001011010101010101010101";
I want to perform a bitwise & (which I've never before worked with) to determine if there's any index where 1 appears in both strings.
These could potentially be VERY long strings (in the thousands of characters). I thought about adding them together as numbers, then converting to strings and checking for a 2, but javascript can't hold precision in large intervals and I get back numbers as strings like "1.1111111118215729e+95", which doesn't really do me much good.
Can I take two strings of unspecified length (they may not be the same length either) and somehow use a bitwise & to compare them?
I've already built the loop-through-each-character solution, but 1001^0110 would strike me as a major performance upgrade. Please do not give the javascript looping solution as an answer, this question is about using bitwise operators.
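As you already noticed yourself, JavaScript has limited capabilities when it comes to integer values. You'll have to chop your strings into "edible" portions and work your way through them. Since the parseInt() function accepts a base, you could convert up to 32 characters at a time to a 4-byte int and use an and-operator to test for set bits (if ((a & b) !== 0)). Two details matter here: JavaScript's bitwise operators work on 32-bit integers, so 64-character chunks would not survive intact, and the inner parentheses are required because != binds more tightly than &.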
// Caution: these 74-character inputs exceed what parseInt and ^ can represent
// exactly, so the result below is not a true bit-for-bit XOR of the two strings.
var first = "00110101011101010010101110100101010101010101010010001001010001010100011111",
    second = "10110101011101010010101110100101010101010101010010001001010001010100011100",
    firstInt = parseInt(first, 2),
    secondInt = parseInt(second, 2),
    xorResult = firstInt ^ secondInt, //524288
    xorString = xorResult.toString(2); //"10000000000000000000"
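The chunking idea described in this answer could look roughly like the following. It is a minimal sketch, assuming the strings are compared index by index from the left and that positions beyond the shorter string count as 0; `haveCommonSetBit` and `CHUNK` are hypothetical names. It still loops, but over 32-bit chunks rather than over single characters.

```javascript
const CHUNK = 32; // bitwise operators in JavaScript work on 32-bit integers

// True if some index holds "1" in both strings.
function haveCommonSetBit(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i += CHUNK) {
    // Pad on the right so that equal indexes line up on equal bit positions.
    const x = parseInt(a.slice(i, i + CHUNK).padEnd(CHUNK, "0"), 2);
    const y = parseInt(b.slice(i, i + CHUNK).padEnd(CHUNK, "0"), 2);
    if ((x & y) !== 0) return true; // parentheses needed: !== binds tighter than &
  }
  return false;
}

console.log(haveCommonSetBit("1001", "0110")); // false
console.log(haveCommonSetBit("1001", "0101")); // true
```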

compressing numbers in javascript to binary format

I need to convert the following to a binary format (and later recoup) in the smallest amount of data possible.
my_arr = [
[128,32 ,22,23],
[104,53 ,21,25],
[150,55 ,79,23],
[104,101,23,8 ],
[57 ,117,13,21],
[37 ,135,21,20],
[81 ,132,23,6 ],
[81 ,138,7 ,8 ],
[97 ,138,7 ,8 ]...
the numbers don't exceed 399
If I use a 0 for each digit (8 0's in a row = 8) and a 1 as separator, the first line looks like this:
010010000000011000100110010011001000
This is really long for numbers like 99
If I pad each number to three digits and convert each in turn to actual binary the first line looks like this:
000100101000000000110010000000100010000000100011
This works out as 12 chars per number.
As the first char won't ever be a 4 or above I can save two digits by treating 0 as 00, 1 as 01, 2 as 10 and 3 as 11. Hence 10 chars per number.
On the whole this reduces the size down to about 90% of the first option (on average) but is there a shorter way?
edit: yes as a string of 1's and 0's... and it doesn’t need to be shorter than the original integers... just the shortest possible way of writing it using only 2 symbols
If the values are evenly distributed between 0 and 399, then a pretty good encoding would be to take three values and encode them as a base 400 three-digit integer. I.e. val1 + 400*val2 + 400*400*val3. Then that integer will fit nicely in 26 bits. Four successive 26-bit values will fit in 13 bytes. Then you get an average of 13/12 bytes per value.
That's about as good as you're going to be able to do, unless the distribution of values is biased or if there is repetition or correlation, in which case you would be able to compress them more.
To deal with the details, you can use the number of bytes in the encoded sequence to determine the number of values, which may not be a multiple of three. If it is not a multiple of three, then there will be one or two values on the end, coded simply as nine bits each. Since it takes eight bits to go from 18 to 26 bits to add a value, there is no ambiguity in the count.
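The base-400 packing of three values into 26 bits can be sketched like this. It is only an illustration of the arithmetic, written as a string of 1's and 0's to match the question; `packTriple` and `unpackTriple` are hypothetical names, and the handling of one or two leftover values is not shown.

```javascript
// Pack three values (each 0..399) into one 26-character binary string.
// The largest packed number is 399 + 400*399 + 160000*399 = 63999999,
// which is below 2^26 = 67108864.
function packTriple(v1, v2, v3) {
  const n = v1 + 400 * v2 + 400 * 400 * v3;
  return n.toString(2).padStart(26, "0");
}

// Reverse the packing by peeling off base-400 digits.
function unpackTriple(bits) {
  let n = parseInt(bits, 2);
  const v1 = n % 400;
  n = Math.floor(n / 400);
  const v2 = n % 400;
  const v3 = Math.floor(n / 400);
  return [v1, v2, v3];
}

const packed = packTriple(128, 32, 22);
console.log(packed.length);        // 26
console.log(unpackTriple(packed)); // [128, 32, 22]
```

That is 26 characters for three values, compared with 27 for three fixed 9-bit blocks.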
A good starting point would be to create constant-length blocks of ones and zeroes, which gives you easy to decode strings.
400 in binary is 110010000, which is 9 digits, so every number up to 399 can be encoded as its binary representation zero-padded to a constant length of 9 characters.
encoding the first row:
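// Left-pad a binary string with zeros to exactly 9 characters.
var padTo9 = function (bin) {
  while (bin.length < 9) { bin = "0" + bin; }
  return bin;
}; // this semicolon is required: the next line starts with "["
[128, 32, 22, 23].map(function (i) { return padTo9(i.toString(2)); }).join('');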
/* result:
"010000000000100000000010110000010111"
*/
decoding
"010000000000100000000010110000010111".match(/[0-1]{9}/g).map( function(i){ return parseInt( i, 2 ) });
/* result:
[128, 32, 22, 23]
*/
I think the only way to get a shorter string is to use a variable block length, which would require adding some control symbols to tell the decoder that the following numbers are encoded in a specific number of characters. But these symbols would have to be values above 399 and still 9 characters long, so I think it wouldn't help given a random distribution of data.
max 399:
2**9 = 512 is the smallest power of two that is at least 400 (the number of distinct values 0..399), so each number can be stored as 9 bits;
convert each to binary, and concat
