Reading File, Segregating and Tesseract.js Performance - javascript

I'm trying to read a file whose content is a bunch of multiple-choice questions jotted down together, like below:
SCHOOL NAME NAME OF EXAMINATION sto-vit TIME ;2 HOURS MAXIMUM MARKS 50
LFill in the blanks Marks -15 1 =0 2 (D157 39k IL True of false Marks
-5 1. Integers are closed under subtraction 2. Difference of two negative integers cannot be a positive integer. IIL. Evaluate using
distributivity property:- Marks - 10 1. -39x99 2. 8543+ 43x-15
3.53%9--109x53 4. 6817+ 683 IV. Solve the problems:- Marks - 10 1. A vehicle covers a distance of 43.2 km in 2.4 litre of petrol. How much
distance will it cover in 1 litre of petrol? V. Evaluate:- Marks - 10
11005105 2. 10101 %001
I have got this by reading an image (a question paper) with the help of Tesseract.js.
First of all, some of the mathematical digits including decimal points are omitted. Can I improve the performance?
Is there a way to identify the questions with their options separately so that it can be stored in a database for people to answer?
The object can be of this format:
[
  {
    q: 'Which website is this?',
    options: ['Github', 'Stackoverflow', 'Google']
  },
  {
    ...
  }
]

Related

Why does this code run into an infinite loop in JS?

I am trying out Advent of Code 2021, which is basically a series of coding challenges.
On Day 3, Part 2, I ran into a problem with recursion and loop-based code.
Here is the JS code:
let input = `00100
11110
10110
10111
10101
01111
00111
11100
10000
11001
00010
01010`
let numbers = input.split("\n")
let onBits = []
let offBits = []
for (let i = 0; i < numbers[0].length; i++) {
  if (numbers.length == 1) {
    console.log(numbers[0])
    break;
  }
  for (let j = 0; j < numbers.length; j++) {
    let bit = parseInt(numbers[j].split("")[i])
    if (bit == 1) {
      onBits.push(numbers[j])
    } else if (bit == 0) {
      offBits.push(numbers[j])
    }
  }
  if (onBits.length > offBits.length) {
    numbers = onBits
    console.log(`${onBits.length} > ${offBits.length}`)
  } else if (onBits.length == offBits.length) {
    numbers = onBits
    console.log(`${onBits.length} == ${offBits.length} so OnBits.`)
  } else if (onBits.length < offBits.length) {
    numbers = offBits
    console.log(`${onBits.length} < ${offBits.length}`)
  }
}
I split the input into lines, then find the most common first bit and keep every number that has that bit in the first position.
So, for example,
if the most common first bit is 1, the numbers array becomes an array of every number whose first bit is 1.
I then repeat this on the resulting array, moving one bit to the right each time, until only one number is left.
But for some reason this goes into an infinite loop.
The offBits array apparently grows rapidly, for no reason I can see, and the loop never ends.
Any help is appreciated.
In this case, I am trying to find the oxygen generator rating.
Sorry for making this complicated.
Full problem (part 1, part 2)
--- Day 3: Binary Diagnostic ---
The submarine has been making some odd creaking noises, so you ask it to produce a diagnostic report just in case.
The diagnostic report (your puzzle input) consists of a list of binary numbers which, when decoded properly, can tell you many useful things about the conditions of the submarine. The first parameter to check is the power consumption.
You need to use the binary numbers in the diagnostic report to generate two new binary numbers (called the gamma rate and the epsilon rate). The power consumption can then be found by multiplying the gamma rate by the epsilon rate.
Each bit in the gamma rate can be determined by finding the most common bit in the corresponding position of all numbers in the diagnostic report. For example, given the following diagnostic report:
00100
11110
10110
10111
10101
01111
00111
11100
10000
11001
00010
01010
Considering only the first bit of each number, there are five 0 bits and seven 1 bits. Since the most common bit is 1, the first bit of the gamma rate is 1.
The most common second bit of the numbers in the diagnostic report is 0, so the second bit of the gamma rate is 0.
The most common value of the third, fourth, and fifth bits are 1, 1, and 0, respectively, and so the final three bits of the gamma rate are 110.
So, the gamma rate is the binary number 10110, or 22 in decimal.
The epsilon rate is calculated in a similar way; rather than use the most common bit, the least common bit from each position is used. So, the epsilon rate is 01001, or 9 in decimal. Multiplying the gamma rate (22) by the epsilon rate (9) produces the power consumption, 198.
Use the binary numbers in your diagnostic report to calculate the gamma rate and epsilon rate, then multiply them together. What is the power consumption of the submarine? (Be sure to represent your answer in decimal, not binary.)
--- Part Two ---
Next, you should verify the life support rating, which can be determined by multiplying the oxygen generator rating by the CO2 scrubber rating.
Both the oxygen generator rating and the CO2 scrubber rating are values that can be found in your diagnostic report - finding them is the tricky part. Both values are located using a similar process that involves filtering out values until only one remains. Before searching for either rating value, start with the full list of binary numbers from your diagnostic report and consider just the first bit of those numbers. Then:
Keep only numbers selected by the bit criteria for the type of rating value for which you are searching. Discard numbers which do not match the bit criteria.
If you only have one number left, stop; this is the rating value for which you are searching.
Otherwise, repeat the process, considering the next bit to the right.
The bit criteria depends on which type of rating value you want to find:
To find oxygen generator rating, determine the most common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 1 in the position being considered.
To find CO2 scrubber rating, determine the least common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 0 in the position being considered.
For example, to determine the oxygen generator rating value using the same example diagnostic report from above:
Start with all 12 numbers and consider only the first bit of each number. There are more 1 bits (7) than 0 bits (5), so keep only the 7 numbers with a 1 in the first position: 11110, 10110, 10111, 10101, 11100, 10000, and 11001.
Then, consider the second bit of the 7 remaining numbers: there are more 0 bits (4) than 1 bits (3), so keep only the 4 numbers with a 0 in the second position: 10110, 10111, 10101, and 10000.
In the third position, three of the four numbers have a 1, so keep those three: 10110, 10111, and 10101.
In the fourth position, two of the three numbers have a 1, so keep those two: 10110 and 10111.
In the fifth position, there are an equal number of 0 bits and 1 bits (one each). So, to find the oxygen generator rating, keep the number with a 1 in that position: 10111.
As there is only one number left, stop; the oxygen generator rating is 10111, or 23 in decimal.
Then, to determine the CO2 scrubber rating value from the same example above:
Start again with all 12 numbers and consider only the first bit of each number. There are fewer 0 bits (5) than 1 bits (7), so keep only the 5 numbers with a 0 in the first position: 00100, 01111, 00111, 00010, and 01010.
Then, consider the second bit of the 5 remaining numbers: there are fewer 1 bits (2) than 0 bits (3), so keep only the 2 numbers with a 1 in the second position: 01111 and 01010.
In the third position, there are an equal number of 0 bits and 1 bits (one each). So, to find the CO2 scrubber rating, keep the number with a 0 in that position: 01010.
As there is only one number left, stop; the CO2 scrubber rating is 01010, or 10 in decimal.
Finally, to find the life support rating, multiply the oxygen generator rating (23) by the CO2 scrubber rating (10) to get 230.
Use the binary numbers in your diagnostic report to calculate the oxygen generator rating and CO2 scrubber rating, then multiply them together. What is the life support rating of the submarine? (Be sure to represent your answer in decimal, not binary.)
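One likely cause of the runaway loop in the code above: onBits and offBits are declared once, outside the outer loop, and are never emptied. After the first pass, `numbers = onBits` makes both names refer to the same array, so every `onBits.push(...)` in the next pass also lengthens the array the inner loop is walking. A minimal sketch of a fix, assuming fresh buckets for every bit position; `findRating` and its `keepMostCommon` flag are hypothetical names, not from the original post:

```javascript
const report = `00100
11110
10110
10111
10101
01111
00111
11100
10000
11001
00010
01010`.split("\n");

// Hypothetical helper: narrows a list of equal-length bit strings to one value.
// keepMostCommon = true  -> oxygen generator criteria (ties keep the 1s)
// keepMostCommon = false -> CO2 scrubber criteria (ties keep the 0s)
function findRating(lines, keepMostCommon) {
  let remaining = lines.slice(); // work on a copy, never on the caller's array
  for (let i = 0; i < remaining[0].length && remaining.length > 1; i++) {
    // New arrays on every pass, so nothing carries over from earlier bits.
    const onBits = remaining.filter((line) => line[i] === "1");
    const offBits = remaining.filter((line) => line[i] === "0");
    const onWins = onBits.length >= offBits.length; // a tie counts as "1 most common"
    const chosen = keepMostCommon === onWins ? onBits : offBits;
    // Guard (an assumption): if every line shares this bit, keep them all.
    if (chosen.length > 0) remaining = chosen;
  }
  return remaining[0];
}

const oxygen = findRating(report, true);
const co2 = findRating(report, false);
console.log(oxygen, co2, parseInt(oxygen, 2) * parseInt(co2, 2));
```

On the example report this should reproduce the values worked out in the puzzle text (10111 and 01010, giving a life support rating of 230).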

Can one encrypt an n-digit number, returning a unique n-digit number?

StackOverflow warns me that I may be down-voted for this question, but I'd appreciate your not doing so, as I post this simply to try to understand a programming exercise I've been posed with, and over which I've been puzzling a while now.
I'm doing some javascript coding exercises and one of the assignments was to devise an "encryption function", encipher, which would encrypt a 4-digit number by multiplying it by a number sufficiently low such that none of its digits exceeds 9, so that a 4-digit number is returned. Thus
encipher(0204)
might yield
0408
where the multiplier would have been 2. -- This is very basic material, simply to practice the Javascript. -- But as far as I can see, the numbers returned can never be deciphered (which is the next part of the exercise). Even if you store a dictionary internal to encipher, along the lines of
{'0408':'2'}, etc
so that you could do a lookup on 0408 and return 0204, these entries could not be assured to be unique. If one for example were to get the number 9999 to be deciphered, one would never know whether the original number was 9999 (multiplied by 1), 3333 (multiplied by 3) or 1111 (multiplied by 9). Is that correct? I realise this is a fairly silly and artificial problem, but I'm trying to understand if the instructions to the exercise are not quite right, or if I'm missing something. Here is the original problem:
Now, let's add one more level of security. After changing the position of the digits, we will multiply each member by a number whose multiplication does not exceed 10. (If it is higher than 10, we will get a two-digit multiplication and the code will no longer be 4 values). Now, implement in another function the decrypter (), which will receive as an argument an encrypted code (and correspondingly multiplied in the section above and return the decrypted code.
Leaving the exercise behind, I'm just curious whether there exists any way to "encrypt" (when I say "encrypt", I mean at a moderate javascript level, as I'm not a cryptography expert) an n-digit number and return a unique n-digit number?
Thanks for any insights. --
encrypt a 4-digit number by multiplying it by a number sufficiently low such that none of its digits exceeds 9, so that a 4-digit number is returned
If your input is 9999, there is no integer other than 1 or 0 that you can multiply it by and still get a non-negative number with at most 4 digits. Therefore, there is no solution that involves only integer multiplication. However, integer multiplication can be used as part of an algorithm such as rotating digits (see below).
If instead you're looking for some sort of bijective algorithm (one that uniquely maps A to B and B to A), you can look at something like rotating the digits left or right, reversing the order of the digits, or using a unique mapping of each individual digit to another. Those can also be mixed.
Examples
Rotate
1234 -> 2341
Reverse
1234 -> 4321
Remap digits e.g. 2 mapped to 8, 3 mapped to 1
2323 -> 8181
Note that none of these are cryptographically sound methods to encrypt information, but they do seem to more-or-less meet the objectives of the exercise.
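The rotate and remap ideas above can be combined into a reversible pair. A minimal sketch, assuming codes are handled as strings (a numeric literal such as 0204 is read as legacy octal in sloppy mode and rejected in strict mode, and leading zeros would be lost anyway); the SUBSTITUTION table and the function names are hypothetical, chosen only so that 2 maps to 8 and 3 maps to 1 as in the example:

```javascript
// Hypothetical substitution table: digit d is replaced by SUBSTITUTION[d].
// It must be a permutation of 0-9, otherwise the mapping is not reversible.
const SUBSTITUTION = "5981407263";

function rotateLeft(s) {
  return s.slice(1) + s[0];
}

function rotateRight(s) {
  return s[s.length - 1] + s.slice(0, -1);
}

// Rotate the digits one place left, then remap each digit.
function encipher(code) {
  return rotateLeft(code)
    .split("")
    .map((d) => SUBSTITUTION[Number(d)])
    .join("");
}

// Undo the steps in reverse order: inverse remap, then rotate right.
function decipher(code) {
  const unmapped = code
    .split("")
    .map((d) => String(SUBSTITUTION.indexOf(d)))
    .join("");
  return rotateRight(unmapped);
}

const secret = encipher("0204");
console.log(secret, decipher(secret));
```

Because each step is a permutation of the n-digit strings, every input gets its own output and the result always has the same number of digits. As the answer says, this is obfuscation rather than real encryption.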

Splitting a Long String Into Multiple Lines using javascript [duplicate]

This question already has answers here:
split string into array of n words per index
(5 answers)
Closed 6 years ago.
I have the following string:
The water content is considered acceptable for this voltage class. Dielectric Breakdown Voltage is unacceptable for transformers > 288 KV. Power factors, Interfacial Tension and Neutralization Number are acceptable for continued use in-service.
I want to split the string into lines so that each line contains at most 5 words.
I also want to control the number of words per line dynamically, so that later I can split the string into lines of at most N words each.
var string="The water content is considered acceptable for this voltage class. Dielectric Breakdown Voltage is unacceptable for transformers > 288 KV. Power factors, Interfacial Tension and Neutralization Number are acceptable for continued use in-service.";
var yourSplit = function (N, string) {
  var app = string.split(' '),
      arrayApp = [],
      stringApp = "";
  app.forEach(function (sentence, index) {
    stringApp += sentence + ' ';
    if ((index + 1) % N === 0) {
      arrayApp.push(stringApp);
      stringApp = '';
    } else if (app.length === index + 1 && stringApp !== '') {
      arrayApp.push(stringApp);
      stringApp = '';
    }
  });
  return arrayApp;
};
console.log(yourSplit(5, string));
console.log(yourSplit(3, string));
console.log(yourSplit(8, string));

What is the meaning of "<<" in javascript? [duplicate]

This question already has answers here:
What are bitwise shift (bit-shift) operators and how do they work?
(10 answers)
Closed 6 years ago.
A user named ZPiDER answered a question about generating random colour strings in JS.
Random color generator in JavaScript
This is the code:
"#"+((1<<24)*Math.random()|0).toString(16)
I am trying to parse it to understand how it works, but I really don't get it. Could someone please explain what the << means?
I tried Google, but I suspect that search engines interpret the characters as special somehow.
It is the left shift operator just like in many other languages like C or Java.
1<<24 means, the 1 left shifted by 24 bits, so you get 0x1000000. Multiplied by a random value (that is from 0 inclusive to 1 exclusive) you get something between 0x000000 and 0xFFFFFF. This is exactly what you want to do for a random color.
But keep in mind that the author of this code does not take into account that this random function does not generate perfectly uniformly distributed values. So you may not get a truly random color, but something very close to it.
This is a bit shift: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators
The number 1 (2^0) is being shifted left 24 bits to become 16777216 (2^24). From the documentation:
<< (Left shift)
This operator shifts the first operand the specified number of bits to the left. Excess bits shifted off to the left are discarded. Zero bits are shifted in from the right.
For example, 9 << 2 yields 36.
<< and >> are bit operations (bit shifts). Each takes two operands, for example
x << y. Say x is 1 1 0 0 0 1 1 1. Shifting it right by 3 bits (x >> 3, which copies the sign bit in from the left when x is read as a signed 8-bit value) gives 1 1 1 1 1 0 0 0.
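The shifts described in these answers can be tried directly. A small sketch, assuming a hypothetical `randomColor` wrapper around the one-liner from the question; the injectable `rand` parameter and the `padStart` call are additions for illustration (without padding, `toString(16)` drops leading zeros, so small values would give fewer than six hex digits):

```javascript
// 1 << 24 moves the single set bit 24 places to the left: 2^24 = 0x1000000.
const COLOR_COUNT = 1 << 24;

// Hypothetical wrapper; `rand` is injectable only so the function can be
// exercised with fixed values instead of Math.random.
function randomColor(rand = Math.random) {
  // `| 0` truncates the product to an integer in [0, 0xFFFFFF].
  const n = (COLOR_COUNT * rand()) | 0;
  // padStart is an addition: it restores the leading zeros toString(16) drops.
  return "#" + n.toString(16).padStart(6, "0");
}

console.log(COLOR_COUNT === 0x1000000); // true
console.log(9 << 2);                    // 36
console.log(-57 >> 3);                  // -8: 11000111 -> 11111000 in 8-bit terms
console.log(randomColor());
```

JavaScript performs these shifts on 32-bit integers, so the 8-bit pattern 1 1 0 0 0 1 1 1 corresponds to -57 here and the sign-propagating `>>` fills the left side with 1s.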

Hash 32bit int to 16bit int?

What are some simple ways to hash a 32-bit integer (e.g. IP address, e.g. Unix time_t, etc.) down to a 16-bit integer?
E.g. hash_32b_to_16b(0x12345678) might return 0xABCD.
Let's start with this as a horrible but functional example solution:
function hash_32b_to_16b(val32b) {
  return val32b % 0xffff;
}
Question is specifically about JavaScript, but feel free to add any language-neutral solutions, preferably without using library functions.
The context for this question is generating unique IDs (e.g. a 64-bit ID might be composed of several 16-bit hashes of various 32-bit values). Avoiding collisions is important.
Simple = good. Wacky+obfuscated = amusing.
The key to maximizing the preservation of entropy of some original 32-bit 'signal' is to ensure that each of the 32 input bits has an independent and equal ability to alter the value of the 16-bit output word.
Since the OP is requesting a bit-size which is exactly half of the original, the simplest way to satisfy this criterion is to xor the upper and lower halves, as others have mentioned. Using xor is optimal because, as is obvious from the definition of xor, independently flipping any one of the 32 input bits is guaranteed to change the value of the 16-bit output.
The problem becomes more interesting when you need further reduction beyond just half-the-size, say from a 32-bit input to, let's say, a 2-bit output. Remember, the goal is to preserve as much entropy from the source as possible, so solutions which involve naively masking off the two lowest bits with (i & 3) are generally heading in the wrong direction; doing that guarantees that there's no way for any bits except the unmasked bits to affect the result, and that generally means there's an arbitrary, possibly valuable part of the runtime signal which is being summarily discarded without principle.
Following from the earlier paragraph, you could of course iterate with xor three additional times to produce a 2-bit output with the desired property of being equally-influenced by each/any of the input bits. That solution is still optimally correct of course, but involves looping or multiple unrolled operations which, as it turns out, aren't necessary!
Fortunately, there is a nice technique of only two operations which gives the same optimal result for this situation. As with xor, it not only ensures that, for any given 32-bit value, twiddling any input bit will result in a change to the 2-bit output, but also that, given a uniform distribution of input values, the distribution of 2-bit output values will also be perfectly uniform. In the current example, the method divides the 4,294,967,296 possible input values into exactly 1,073,741,824 each of the four possible 2-bit hash results { 0, 1, 2, 3 }.
The method I mention here uses specific magic values that I discovered via exhaustive search, and which don't seem to be discussed very much elsewhere on the internet, at least for the particular use under discussion here (i.e., ensuring a uniform hash distribution that's maximally entropy-preserving). Curiously, according to this same exhaustive search, the magic values are in fact unique, meaning that for each of target bit-widths { 16, 8, 4, 2 }, the magic value I show below is the only value that, when used as I show here, satisfies the perfect hashing criteria outlined above.
Without further ado, the unique and mathematically optimal procedure for hashing 32-bits to n = { 16, 8, 4, 2 } is to multiply by the magic value corresponding to n (unsigned, discarding overflow), and then take the n highest bits of the result. To isolate those result bits as a hash value in the range [0 ... (2ⁿ - 1)], simply right-shift (unsigned!) the multiplication result by 32 - n bits.
The "magic" values, and C-like expression syntax are as follows:
Method
Maximum-entropy-preserving hash for reducing 32 bits to...
Target Bits   Multiplier   Right Shift   Expression [1, 2]
-----------   ----------   -----------   ----------------------
16            0x80008001   16            (i * 0x80008001) >> 16
8             0x80808081   24            (i * 0x80808081) >> 24
4             0x88888889   28            (i * 0x88888889) >> 28
2             0xAAAAAAAB   30            (i * 0xAAAAAAAB) >> 30
Maximum-entropy-preserving hash for reducing 64 bits to...
Target Bits   Multiplier           Right Shift   Expression [1, 2]
-----------   ------------------   -----------   ------------------------------
32            0x8000000080000001   32            (i * 0x8000000080000001) >> 32
16            0x8000800080008001   48            (i * 0x8000800080008001) >> 48
8             0x8080808080808081   56            (i * 0x8080808080808081) >> 56
4             0x8888888888888889   60            (i * 0x8888888888888889) >> 60
2             0xAAAAAAAAAAAAAAAB   62            (i * 0xAAAAAAAAAAAAAAAB) >> 62
Notes:
[1] Use unsigned multiply and discard any overflow (64-bit multiply is not needed).
[2] If isolating the result using right-shift (as shown), be sure to use an unsigned shift operation.
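Since the question is about JavaScript, note that the C-like expressions in the table cannot be used verbatim there: `*` operates on doubles, so the product can exceed 2^53 and lose its low bits, and `>>` is a signed shift. A minimal sketch of a port, assuming `Math.imul` for the wrapping 32-bit multiply and `>>>` for the unsigned shift; the names `MAGIC` and `hash32` are hypothetical:

```javascript
// Multipliers copied from the 32-bit table, keyed by target bit width.
const MAGIC = { 16: 0x80008001, 8: 0x80808081, 4: 0x88888889, 2: 0xaaaaaaab };

// Hypothetical helper: reduces a 32-bit integer to `bits` bits.
function hash32(value, bits) {
  const multiplier = MAGIC[bits];
  if (multiplier === undefined) {
    throw new RangeError("bits must be 16, 8, 4 or 2");
  }
  // Math.imul keeps only the low 32 bits of the product (overflow discarded).
  // >>> is the unsigned shift, so the top `bits` bits come out non-negative.
  return Math.imul(value, multiplier) >>> (32 - bits);
}

console.log(hash32(0x12345678, 16).toString(16));
console.log(hash32(0x12345678, 2));
```

For the 16-bit case the call replaces `val32b % 0xffff` in the question's example function; the result is always in the range 0 to 0xFFFF.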
Further discussion
I find all this quite cool. In practical terms, the key information-theoretical requirement is the guarantee that, for any m-bit input value and its corresponding n-bit hash value result, flipping any one of the m source bits always causes some change in the n-bit result value. Now although there are 2ⁿ possible result values in total, one of them is already "in-use" (by the result itself) since "switching" to that one from any other result would be no change at all. This leaves 2ⁿ - 1 result values that are eligible to be used by the entire set of m input values flipped by a single bit.
Let's consider an example; in fact, to show how this technique might seem to border on spooky or downright magical, we'll consider the more extreme case where m = 64 and n = 2. With 2 output bits there are four possible result values, { 0, 1, 2, 3 }. Assuming an arbitrary 64-bit input value 0x7521d9318fbdf523, we obtain its 2-bit hash value of 1:
(0x7521d9318fbdf523 * 0xAAAAAAAAAAAAAAAB) >> 62 // result --> '1'
So the result is 1 and the claim is that no value in the set of 64 values where a single-bit of 0x7521d9318fbdf523 is toggled may have that same result value. That is, none of those 64 other results can use value 1 and all must instead use either 0, 2, or 3. So in this example it seems like every one of the 2⁶⁴ input values—to the exclusion of 64 other input values—will selfishly hog one-quarter of the output space for itself. When you consider the sheer magnitude of these interacting constraints, can a simultaneously satisfying solution overall even exist?
Well sure enough, to show that (exactly?) one does, here are the hash result values, listed in order, for the inputs obtained by flipping a single bit of 0x7521d9318fbdf523 (one at a time), from MSB (position 63) down to LSB (0).
3 2 0 3 3 3 3 3 3 0 0 0 3 0 3 3 0 3 3 3 0 0 3 3 3 0 0 3 3 0 3 3 // continued…
0 0 3 0 0 3 0 3 0 0 0 3 0 3 3 3 0 3 0 3 3 3 3 3 3 0 0 0 3 0 0 3 // notice: no '1' values
As you can see, there are no 1 values, which entails that every bit in the source "as-is" must be contributing to influence the result (or, if you prefer, the de facto state of each-and-every bit in 0x7521d9318fbdf523 is essential to keeping the entire overall result from being "not-1"). Because no matter what single-bit change you make to the 64-bit input, the 2-bit result value will no longer be 1.
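The 64-bit example above can be re-run in JavaScript with BigInt, which has no fixed width, so the "discard overflow" step has to be written out as a mask. A small sketch under that assumption; `MASK64`, `MAGIC_2BIT` and `hash64to2` are hypothetical names:

```javascript
const MASK64 = (1n << 64n) - 1n;         // keeps the low 64 bits of a BigInt
const MAGIC_2BIT = 0xAAAAAAAAAAAAAAABn;  // multiplier from the 64-bit table

// Hypothetical helper: multiply, discard overflow beyond 64 bits,
// then take the two highest bits of what is left.
function hash64to2(value) {
  return ((value * MAGIC_2BIT) & MASK64) >> 62n;
}

const input = 0x7521d9318fbdf523n;

// Hash of every single-bit-flip neighbour, from bit 63 (MSB) down to bit 0.
const flipped = [];
for (let bit = 63n; bit >= 0n; bit--) {
  flipped.push(hash64to2(input ^ (1n << bit)));
}

console.log(hash64to2(input));
console.log(flipped.join(" "));
```

The first line printed is the hash of the unmodified input, and the second is the list of 64 neighbour hashes in the same MSB-to-LSB order as the listing above.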
Keep in mind that the "missing-value" table shown above was dumped from the analysis of just the one randomly-chosen example value 0x7521d9318fbdf523; every other possible input value has a similar table of its own, each one eerily missing its owner's actual result value while yet somehow being globally consistent across its set-membership. This property essentially corresponds to maximally preserving the available entropy during the (inherently lossy) bit-width reduction task.
So we see that every one of the 2⁶⁴ possible source values independently imposes, on exactly 64 other source values, the constraint of excluding one of the possible result values. What defies my intuition about this is that there are untold quadrillions of these 64-member sets, each of whose members also belongs to 63 other, seemingly unrelated bit-twiddling sets. Yet somehow despite this most confounding puzzle of interwoven constraints, it is nevertheless trivial to exploit the one (I surmise) resolution which simultaneously satisfies them all exactly.
All this seems related to something you may have noticed in the tables above: namely, I don't see any obvious way to extend the technique to the case of compressing down to a 1-bit result. In this case, there are only two possible result values { 0, 1 }, so if any/every given (e.g.) 64-bit input value still summarily excludes its own result from being the result for all 64 of its single-bit-flip neighbors, then that now essentially imposes the other, only remaining value on those 64. The math breakdown we see in the table seems to be signalling that a simultaneous result under such conditions is a bridge too far.
In other words, the special 'information-preserving' characteristic of xor (that is, its luxuriously reliable guarantee that, as opposed to and, or, etc., it c̲a̲n̲ and w̲i̲l̲l̲ always change a bit) not surprisingly exacts a certain cost, namely, a fiercely non-negotiable demand for a certain amount of elbow room—at least 2 bits—to work with.
I think this is the best you're going to get. You could compress the code to a single line but the var's are there for now as documentation:
function hash_32b_to_16b(val32b) {
  var rightBits = val32b & 0xffff;     // Right-most (low) 16 bits
  var leftBits = val32b & 0xffff0000;  // Left-most (high) 16 bits, still in place
  leftBits = leftBits >>> 16;          // Shift the left-most 16 bits down to a 16-bit value
  return rightBits ^ leftBits;         // XOR the left-most and right-most bits
}
Given the parameters of the problem, the best solution would have each 16-bit hash correspond to exactly 2^16 32-bit numbers. It would also IMO hash sequential 32-bit numbers differently. Unless I'm missing something, I believe this solution does those two things.
I would argue that security cannot be a consideration in this problem, as the hashed value is just too few bits. I believe that the solution I gave provides even distribution of 32-bit numbers to 16-bit hashes
This depends on the nature of the integers.
If they can contain some bit-masks, or can differ by powers of two, then simple XORs will have high probability of collisions.
You can try something like (i>>16) ^ ((i&0xffff) * p) with p being a prime number.
Security-hashes like MD5 are all good, but they are obviously an overkill here. Anything more complex than CRC16 is overkill.
I would say just apply a standard hash like sha1 or md5 and then grab the last 16 bits of that.
Assuming that you expect the least significant bits to 'vary' the most, I think you're probably going to get a good enough distribution by just using the lower 16-bits of the value as a hash.
If the numbers you're going to hash won't have that kind of distribution, then the additional step of xor-ing in the upper 16 bits might be helpful.
Of course this suggestion is if you're intending to use the hash merely for some sort of lookup/storage scheme and aren't looking for the crypto-related properties of non-guessability and non-reversibility (which the xor-ing suggestions don't really buy you either).
Something simple like this....
function hash_32b_to_16b(val32b) {
  var h = hmac(secretKey, sha512);
  var v = val32b;
  for (var i = 0; i < 4096; ++i)
    v = h(v);
  return v % 0xffff;
}
