String Compression - javascript

A list of strings should be compressed into their shortest forms by expressing repetitions with numbers. If a letter shows up only once, the "1" before it should be omitted.
For example
"aabbaccc" can be compressed into "2a2ba3c" if subdivided into units of 1.
"ababcdcdababcdcd" can be compressed into "2ab2cd2ab2cd" if subdivided into units of 2.
"abcabcdede" can be compressed into "2abcdede" if subdivided into units of 3 and so on.
I need to compress the following strings to the following shortest lengths
"aabbaccc", 7
"ababcdcdababcdcd", 9
"abcabcdede", 8
"abcabcabcabcdededededede", 14
"xababcdcdababcdcd", 17
My code only works for cases 1 and 5 because I only know how to subdivide strings into units of 1. I am not sure how to subdivide for units greater than 1 and need someone to edit my code.
// traverse string, keep count of repeated chars
// if cur and next char is the same, inc count
// otherwise, concat cur char and count to output string, reset counter to 1
// return compressed string, only if the length is less than the original string, otherwise, return original string
let stringCompression = (s) => {
  let out = '';
  let count = 1;
  for (let i = 0; i < s.length; i++) {
    let cur = s[i];
    let next = s[i + 1];
    if (cur === next) {
      count++;
    } else {
      out += String(count) + cur;
      final = out.replace('1','')
      count = 1;
    }
  }
  return final.length < s.length ? final : s;
}
console.log(stringCompression('aabbaccc').length); // is 7
For anyone who can read Korean, the original source is question 1 from here

Let's call the compressed parts chunks. For example, in "ababcdcdababcdcd" (which can be compressed either to "2ab2cd2ab2cd" or to "2ababcdcd") "ab", "cd" and "abcd" are chunks.
You can write a cycle:
for (let i = N; i > 0; i--)
where N is the maximum possible length of a chunk (obviously it is the length of the string divided by 2) and i is the current chunk size we assume. So we will start by looking for the biggest possible chunks. In the case of "xababcdcdababcdcd" that is:
"xababcdc"
"ababcdcd"
...
And each time you find a repeating chunk, you replace it.
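For reference, here is a minimal sketch of that idea in a simpler variant (not the exact algorithm described above): it tries every fixed unit size from 1 up to half the string length, compresses with that unit, and keeps the shortest result. The helper names are illustrative only.
// Compress s using a fixed chunk size `unit`; single chunks get no leading count.
const compressWithUnit = (s, unit) => {
  let out = '';
  let prev = s.slice(0, unit);
  let count = 1;
  for (let i = unit; i < s.length; i += unit) {
    const chunk = s.slice(i, i + unit);
    if (chunk === prev) {
      count++;
    } else {
      out += (count > 1 ? count : '') + prev;
      prev = chunk;
      count = 1;
    }
  }
  return out + (count > 1 ? count : '') + prev;
};
// Try every unit size and keep the shortest compression.
const shortestCompression = (s) => {
  let best = s;
  for (let unit = 1; unit <= Math.floor(s.length / 2); unit++) {
    const candidate = compressWithUnit(s, unit);
    if (candidate.length < best.length) best = candidate;
  }
  return best;
};
console.log(shortestCompression('aabbaccc').length);          // 7
console.log(shortestCompression('ababcdcdababcdcd').length);  // 9
console.log(shortestCompression('xababcdcdababcdcd').length); // 17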
However, there is another path. I think it may be possible to solve this task by using Recursive Regular Expressions.

Number of segments each given point is in. How to make it work when points are not in sorted order?

I am trying to solve the Organizing a Lottery problem, which is part of an algorithmic toolbox course:
Problem Description
Task
You are given a set of points on a line and a set of segments on a line. The goal is to compute, for each point, the number of segments that contain this point.
Input Format
The first line contains two non-negative integers s and p defining the number of segments and the number of points on a line, respectively. The next s lines contain two integers a_i, b_i defining the i-th segment [a_i, b_i]. The next line contains p integers defining points x_1, x_2, ..., x_p.
Constraints
1 ≤ s, p ≤ 50000;
-108 ≤ a_i ≤ b_i ≤ 108 for all 0 ≤ i < s;
-108 ≤ x_j ≤ 108 for all 0 ≤ j < p.
Output Format
Output p non-negative integers k_0, k_1, ..., k_{p-1}, where k_i is the number of segments which contain x_i.
Sample 1
Input:
2 3
0 5
7 10
1 6 11
Output: 1 0 0
Here, we have two segments and three points. The first point lies only in the first segment while the remaining two points are outside of all the given segments.
The problem looks very challenging. But, I think it can be solved by sorting the arrays. Actually my code is fine if the points are given in sorted order. But the points can be randomly ordered integers, so my code will then produce wrong results. What can I do about that issue?
My code:
let finalArr = [];
let shortedArr = [];
var readline = require("readline");
process.stdin.setEncoding("utf8");
var rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
terminal: false,
});
process.stdin.setEncoding("utf8");
rl.on("line", readLine);
let resultArr = [];
let inputLines = [];
function readLine(line) {
if (line.length > 0) {
inputLines.push(line.toString().split(" ").map(Number));
if (inputLines.length == inputLines[0][0] + 2) {
const segments = inputLines.slice(1, inputLines.length - 1);
const points = inputLines.slice(inputLines.length - 1, inputLines.length);
const shortedArr = makeShort(segments, ...points);
computePoints(shortedArr);
console.log(...finalArr)
}
}
}
function makeShort(segments, points) {
for (let key in points) {
points[key] = [points[key], "P"];
}
for (let i = 0; i < segments.length; i++) {
segments[i][0] = [segments[i][0], "L"];
segments[i][1] = [segments[i][1], "R"];
}
shortedArr = [...segments.flat(), ...points].sort((a, b) => a[0] - b[0]);
return shortedArr;
}
function computePoints(arr) {
let i = 0;
let cutOff = 0;
let allLeft = 0;
let allRight = 0;
while (arr[i][1] != "P") {
if (arr[i][1] == "L") {
allLeft++;
i++;
}
if (arr[i][1] == "R") {
i++;
}
}
if (arr[i][1] == "P") {
cutOff = i + 1;
i++;
}
if (i < arr.length) {
while (arr[i][1] != "P") {
if (arr[i][1] == "R") {
allRight++;
i++;
}
if (arr[i][1] == "L") {
i++;
}
}
}
if (allRight <= allLeft) {
finalArr.push(allRight);
} else {
finalArr.push(allLeft);
}
arr.splice(0, cutOff);
if (arr.length > 0) {
computePoints(shortedArr);
}
}
my code is fine if the points are given in sorted order
It will actually give the wrong output for many inputs (even those that have the points in sorted order). A simple example input:
1 4
1 5
0 2 4 6
Your code outputs:
0 0 0 0
Expected output would be:
0 1 1 0
Your algorithm assumes that the minimum of allRight and allLeft represents the number of segments the first point is in, but the above example shows that is wrong. allRight will be 0, yet the point 2 is clearly within the (single) segment. Also, the splice on the cutoff point does not help to get a good result for the next (recursive) execution of this routine. The number of opening segments that have not been closed before the cutoff point is surely information you need.
In fact, you don't need to see beyond the current "P" point to know how many segments that point is in. All the info you need is present in the entries before that point. Any opening ("L") segment that is also closed ("R") before that "P" doesn't count. All the other "L" do count. And that's it. No information is needed from what is at the right of that "P" entry. So you can do this in one sweep.
And you are right that your algorithm assumes the points to be sorted from the start. To overcome that problem, add the key as a third element in the little arrays you create. This can then be used as index in the final array.
Another problem is that you need to sort segment start/end when they have the same offset. For instance, let's say we have these two segments: [1, 4], [4, 8], and we have point 4. Then this 4 is in both segments. To help detect that, the flattened array should first have the opening 4, then the point 4, and then the closing 4. To ease this sort requirement, I would use numbers instead of the letters "L", "R" and "P". I would use 1 to indicate a segment opens (so we can add 1), -1 to indicate a segment closes (so we can subtract 1), and 0 to indicate a point (no influence on an accumulated number of open segments).
Unrelated, but:
Avoid global variables. Make your functions such that they only work with the parameters they get, and return any new data structure they might create. Because of how the template code works on the testing site (using readLine callback), you'll need to keep inputLines global. But limit it to that.
Don't use a for..in loop to iterate over an array. Use for..of instead, which gives you the values of the array.
Solution code with hard-coded input example:
const inputLines = [];
// Example input (I omitted the file I/O)
`3 6
2 3
1 5
3 7
6 0 4 2 1 5 7`.split(/\n/g).map(readLine);
function readLine(line) {
if (line.length > 0) {
inputLines.push(line.toString().split(" ").map(Number));
if (inputLines.length == inputLines[0][0] + 2) {
const points = inputLines.pop();
const segments = inputLines.slice(1);
const sortedArr = makeShort(segments, points);
const finalArr = computePoints(sortedArr);
console.log(...finalArr);
}
}
}
function makeShort(segments, points) {
return [
...segments.flatMap(([start, end]) => [[start, 1], [end, -1]]),
...points.map((offset, idx) => [offset, 0, idx])
].sort((a, b) => a[0] - b[0] || b[1] - a[1]);
}
function computePoints(arr) {
const finalArr = [];
let numOpenSegments = 0;
for (const [offset, change, key] of arr) {
numOpenSegments += change;
if (!change) finalArr[key] = numOpenSegments;
}
return finalArr;
}
Improved efficiency
As the segments and points need to be sorted, and sorting has O(n log n) complexity, and n can become significant (50000), we could look for a linear solution. This is possible because the challenge mentions that the offsets used for the segments and points are limited to the range -108 to 108. This means there are only 217 different offsets possible.
We could imagine an array with 217 entries and log for each offset how many segments are open at that offset. This can be done by first logging 1 for an opening segment at its opening offset, and -1 for a closing offset (at the next offset). Add these when the same offset occurs more than once. Then make a running sum of these from left to right.
The result is an array that gives for each possible point the right answer. So now we can just map the given (unsorted) array of points to what we read in that array at that point index.
Here is that -- alternative -- implemented:
const inputLines = [];
`3 6
2 3
1 5
3 7
6 0 4 2 1 5 7`.split(/\n/g).map(readLine);
function readLine(line) {
if (line.length > 0) {
inputLines.push(line.toString().split(" ").map(Number));
if (inputLines.length == inputLines[0][0] + 2) {
const points = inputLines.pop();
const segments = inputLines.slice(1);
const finalArr = solve(segments, points);
console.log(...finalArr);
}
}
}
function solve(segments, points) {
const axis = Array(218).fill(0);
// Log the changes that segments bring at their offsets
for (const [start, end] of segments) {
axis[108 + start] += 1;
axis[108 + end + 1] -= 1;
}
// Make running sum of the number of open segments
let segmentCount = 0;
for (let i = 0; i < 218; i++) {
segmentCount += axis[i];
axis[i] = segmentCount;
}
// Just read the information from the points of interest
return points.map(point => axis[108 + point]);
}

Is there a way to avoid number to string conversion & nested loops for performance?

I just took a coding test online and this one question really bothered me. My solution was correct but was rejected for being unoptimized. The question is as follows:
Write a function combineTheGivenNumber taking two arguments:
numArray: number[]
num: a number
The function should check all the concatenation pairs that can result in making a number equal to num and return their count.
E.g. if numArray = [1, 212, 12, 12] & num = 1212, then the return value from combineTheGivenNumber will be 3.
The pairs are as follows:
numArray[0]+numArray[1]
numArray[2]+numArray[3]
numArray[3]+numArray[2]
The function I wrote for this purpose is as follows:
function combineTheGivenNumber(numArray, num) {
//convert all numbers to strings for easy concatenation
numArray = numArray.map(e => e+'');
//also convert the `hay` to string for easy comparison
num = num+'';
let pairCounts = 0;
// iterate over the array to get pairs
numArray.forEach((e,i) => {
numArray.forEach((f,j) => {
if(i!==j && num === (e+f)) {
pairCounts++;
}
});
});
return pairCounts;
}
console.log('Test 1: ', combineTheGivenNumber([1,212,12,12],1212));
console.log('Test 2: ', combineTheGivenNumber([4,21,42,1],421));
From my experience, I know conversion of number to string is slow in JS, but I am not sure whether my approach is wrong or reflects a lack of knowledge, or whether the tester is ignorant of this fact. Can anyone suggest further optimization of the code snippet?
Elimination of string to number to string will be a significant speed boost but I am not sure how to check for concatenated numbers otherwise.
Elimination of string to number to string will be a significant speed boost
No, it won't.
Firstly, you're not converting strings to numbers anywhere, but more importantly the exercise asks for concatenation so working with strings is exactly what you should do. No idea why they're even passing numbers. You're doing fine already by doing the conversion only once for each number input, not every time you form a pair. And last but not least, avoiding the conversion will not be a significant improvement.
To get a significant improvement, you should use a better algorithm. @derpirscher is correct in his comment: "[It's] the nested loop checking every possible combination which hits the time limit. For instance for your example, when the outer loop points at 212 you don't need to do any checks, because regardless, whatever you concatenate to 212, it can never result in 1212".
So use
let pairCounts = 0;
numArray.forEach((e,i) => {
if (num.startsWith(e)) {
//^^^^^^^^^^^^^^^^^^^^^^
numArray.forEach((f,j) => {
if (i !== j && num === e+f) {
pairCounts++;
}
});
}
});
You might do the same with suffixes, but it becomes more complicated to rule out concatenation to oneself there.
Optimising further, you can even achieve a linear complexity solution by putting the strings in a lookup structure, then when finding a viable prefix just checking whether the missing part is an available suffix:
function combineTheGivenNumber(numArray, num) {
const strings = new Map();
for (const num of numArray) {
const str = String(num);
strings.set(str, 1 + (strings.get(str) ?? 0));
}
const whole = String(num);
let pairCounts = 0;
for (const [prefix, pCount] of strings) {
if (!whole.startsWith(prefix))
continue;
const suffix = whole.slice(prefix.length);
if (strings.has(suffix)) {
let sCount = strings.get(suffix);
if (suffix == prefix) sCount--; // no self-concatenation
pairCounts += pCount*sCount;
}
}
return pairCounts;
}
(the proper handling of duplicates is a bit tricky)
I like your approach of going to strings early. I can suggest a couple of simple optimizations.
You only need the numbers that are valid "first parts" and those that are valid "second parts"
You can use the javascript .startsWith and .endsWith to test for those conditions. All other strings can be thrown away.
The lengths of the strings must add up to the length of the desired answer
Suppose your target string is 8 digits long. If you have 2 valid 3-digit "first parts", then you only need to know how many valid 5-digit "second parts" you have. Suppose you have 9 of them. Those first parts can only combine with those second parts, and give you 2 * 9 = 18 valid pairs.
You don't actually need to keep the strings!
It struck me that if you know you have 2 valid 3-digit "first parts", you don't need to keep those actual strings. Knowing that they are valid 3-digit first parts is all you need to know.
So let's build an array containing:
How many valid 1-digit first parts do we have?,
How many valid 2-digit first parts do we have?,
How many valid 3-digit first parts do we have?,
etc.
And similarly an array containing the number of valid 1-digit second parts, etc.
X first parts and Y second parts can be combined in X * Y ways
Except if the parts are the same length, in which case we are reusing the same list, and so it is just X * (Y-1).
So not only do we not need to keep the strings, but we only need to do the multiplication of the appropriate elements of the arrays.
5 1-char first parts & 7 3-char second parts = 5 * 7 = 35 pairs
6 2-char first part & 4 2-char second parts = 6 * (4-1) = 18 pairs
etc
So this becomes extremely easy. One pass over the strings, tallying the "first part" and "second part" matches of each length. This can be done with an if and a ++ of the relevant array element.
Then one pass over the lengths, which will be very quick as the array of lengths will be very much shorter than the array of actual strings.
function combineTheGivenNumber(numArray, num) {
const sElements = numArray.map(e => "" + e);
const sTarget = "" + num;
const targetLength = sTarget.length
const startsByLen = (new Array(targetLength)).fill(0);
const endsByLen = (new Array(targetLength)).fill(0);
sElements.forEach(sElement => {
if (sTarget.startsWith(sElement)) {
startsByLen[sElement.length]++
}
if (sTarget.endsWith(sElement)) {
endsByLen[sElement.length]++
}
})
// We can now throw away the strings. We have two separate arrays:
// startsByLen[1] is the count of strings (without attempting to remove duplicates) which are the first character of the required answer
// startsByLen[2] similarly the count of strings which are the first 2 characters of the required answer
// etc.
// and endsByLen[1] is the count of strings which are the last character ...
// and endsByLen[2] is the count of strings which are the last 2 characters, etc.
let pairCounts = 0;
for (let firstElementLength = 1; firstElementLength < targetLength; firstElementLength++) {
const secondElementLength = targetLength - firstElementLength;
if (firstElementLength === secondElementLength) {
pairCounts += startsByLen[firstElementLength] * (endsByLen[secondElementLength] - 1)
} else {
pairCounts += startsByLen[firstElementLength] * endsByLen[secondElementLength]
}
}
return pairCounts;
}
console.log('Test 1: ', combineTheGivenNumber([1, 212, 12, 12], 1212));
console.log('Test 2: ', combineTheGivenNumber([4, 21, 42, 1], 421));
Depending on the setup, the integer slicing can be marginally faster
Although in the end it falls short
Also, when tested on higher N values, the previous answer exploded in jsfiddle. Possibly a memory error.
As far as I have tested with both random and hand-crafted values, my solution holds. It is based on the observation that if X and Y concatenated == Z, then the following must be true:
Z - Y == X * 10^(floor(log10(Y)) + 1)
an example of this:
1212 - 12 = 1200
12 * 10^(floor(log10(12)) + 1) = 12 * 10^(1+1) = 12 * 100 = 1200
Now in theory, this should be faster than manipulating strings. And in many other languages it most likely would be. However in Javascript, as I just learned, the situation is a bit more complicated. Javascript does some weird things with casting that I haven't figured out yet. In short - when I tried storing the numbers (and their counts) in a map, the code got significantly slower, making any possible gains from these logarithm shenanigans evaporate. Furthermore, storing them in a custom-crafted data structure isn't guaranteed to be faster since you have to build it etc. Also it would be quite a lot of work.
As it stands this log comparison is ~8 times faster in a case without (or with just a few) matches, since the quadratic factor is yet to kick in. As long as the possible postfix count isn't too high, it will outperform the linear solution. Unfortunately it is still quadratic in nature, with the breaking point depending on the total number of strings as well as their length.
So if you are searching for a needle in a haystack - for example you are looking for a few pairs in a huge heap of numbers, this can help. In the other case of searching for many matches, this won't help. Similarly, if the input array was sorted, you could use binary search to push the breaking point further up.
In the end, unless you manage to figure out how to store ints in a map (or some custom implementation of it) in a way that doesn't completely kill the performance, the linear solution of the previous answer will be faster. It can still be useful even with the performance hit if your computation is going to be memory heavy. Storing numbers takes less space than storing strings.
var log10 = Math.log(10)
function log10floored(num) {
return Math.floor(Math.log(num) / log10)
}
function combineTheGivenNumber(numArray, num) {
count = 0
for (var i=0; i!=numArray.length; i++) {
let portion = num - numArray[i]
let removedPart = Math.pow(10, log10floored(numArray[i]))
if (portion % (removedPart * 10) == 0) {
for (var j=0; j!=numArray.length; j++) {
if (j != i && portion / (removedPart * 10) == numArray[j] ) {
count += 1
}
}
}
}
return count
}
//The previous solution, that I used for timing, comparison and check purposes
function combineTheGivenNumber2(numArray, num) {
const strings = new Map();
for (const num of numArray) {
const str = String(num);
strings.set(str, 1 + (strings.get(str) ?? 0));
}
const whole = String(num);
let pairCounts = 0;
for (const [prefix, pCount] of strings) {
if (!whole.startsWith(prefix))
continue;
const suffix = whole.slice(prefix.length);
if (strings.has(suffix)) {
let sCount = strings.get(suffix);
if (suffix == prefix) sCount--; // no self-concatenation
pairCounts += pCount*sCount;
}
}
return pairCounts;
}
var myArray = []
for (let i =0; i!= 10000000; i++) {
myArray.push(Math.floor(Math.random() * 1000000))
}
var a = new Date()
t1 = a.getTime()
console.log('Test 1: ', combineTheGivenNumber(myArray,15285656));
var b = new Date()
t2 = b.getTime()
console.log('Test 2: ', combineTheGivenNumber2(myArray,15285656));
var c = new Date()
t3 = c.getTime()
console.log('Test1 time: ', t2 - t1)
console.log('test2 time: ', t3 - t2)
Small update
As long as you are willing to take a performance hit with the setup and settle for roughly a 2x speedup, using a simple "hashing" table can help. (Hash tables are nice and tidy; this is a simple modulo lookup table. The principle is similar though.)
Technically this isn't linear; practically it is enough for most cases - unless you are extremely unlucky and all your numbers fall in the same bucket.
function combineTheGivenNumber(numArray, num) {
count = 0
let size = 1000000
numTable = new Array(size)
for (var i=0; i!=numArray.length; i++) {
let idx = numArray[i] % size
if (numTable[idx] == undefined) {
numTable[idx] = [numArray[i]]
} else {
numTable[idx].push(numArray[i])
}
}
for (var i=0; i!=numArray.length; i++) {
let portion = num - numArray[i]
let removedPart = Math.pow(10, log10floored(numArray[i]))
if (portion % (removedPart * 10) == 0) {
if (numTable[portion / (removedPart * 10) % size] != undefined) {
let a = numTable[portion / (removedPart * 10) % size]
for (var j=0; j!=a.length; j++) {
if (j != i && portion / (removedPart * 10) == a[j] ) {
count += 1
}
}
}
}
}
return count
}
Here's a simplified, and partially optimised approach with 2 loops:
// let's optimise 'combineTheGivenNumber', where
// a=array of numbers AND n=number to match
const ctgn = (a, n) => {
// convert our given number to a string using `toString` for clarity
// this isn't entirely necessary but means we can use strict equality later
const ns = n.toString();
// reduce is an efficient mechanism to return a value based on an array, giving us
// _=[accumulator], na=[array number] and i=[index]
return a.reduce((_, na, i) => {
// convert our 'array number' to an 'array number string' for later concatenation
const nas = na.toString();
// iterate back over our array of numbers ... we're using an optimised/reverse loop
for (let ii = a.length - 1; ii >= 0; ii--) {
// skip the current array number
if (i === ii) continue;
// string + number === string, which lets us strictly compare our 'number to match'
// if there's a match we increment the accumulator
if (a[ii] + nas === ns) ++_;
}
// we're done
return _;
}, 0);
}

Generate random & unique 4 digit codes without brute force

I'm building an app and in one of my functions I need to generate random & unique 4 digit codes. Obviously there is a finite range from 0000 to 9999 but each day the entire list will be wiped and each day I will not need more than the available amount of codes which means it's possible to have unique codes for each day. Realistically I will probably only need a few hundred codes a day.
The way I've coded it for now is the simple brute force way, which is to generate a random 4 digit number, check if the number exists in an array, and if it does, generate another number, while if it doesn't, return the generated number.
Since it's 4 digits, the runtime isn't anything too crazy and I'm mostly generating a few hundred codes a day so there won't be some scenario where I've generated 9999 codes and I keep randomly generating numbers to find the last remaining one.
It would also be fine to have letters in there as well instead of just numbers if it would make the problem easier.
Other than my brute force method, what would be a more efficient way of doing this?
Thank you!
Since you have a constrained number of values that will easily fit in memory, the simplest way I know of is to create a list of the possible values and select one randomly, then remove it from the list so it can't be selected again. This will never have a collision with a previously used number:
function initValues(numValues) {
const values = new Array(numValues);
// fill the array with each value
for (let i = 0; i < values.length; i++) {
values[i] = i;
}
return values;
}
function getValue(array) {
if (!array.length) {
throw new Error("array is empty, no more random values");
}
const i = Math.floor(Math.random() * array.length);
const returnVal = array[i];
array.splice(i, 1);
return returnVal;
}
// sample code to use it
const rands = initValues(10000);
console.log(getValue(rands));
console.log(getValue(rands));
console.log(getValue(rands));
console.log(getValue(rands));
This works by doing the following:
Generate an array of all possible values.
When you need a value, select one from the array with a random index.
After selecting the value, remove it from the array.
Return the selected value.
Items are never repeated because they are removed from the array when used.
There are no collisions with used values because you're always just selecting a random value from the remaining unused values.
This relies on the fact that an array of integers is pretty well optimized in Javascript so doing a .splice() on a 10,000 element array is still pretty fast (as it can probably just be memmove instructions).
FYI, this could be made more memory efficient by using a typed array since your numbers can be represented in 16-bit values (instead of the default 64 bits for doubles). But, you'd have to implement your own version of .splice() and keep track of the length yourself since typed arrays don't have these capabilities built in.
For even larger problems like this where memory usage becomes a problem, I've used a BitArray to keep track of previous usage of values.
Here's a class implementation of the same functionality:
class Randoms {
constructor(numValues) {
this.values = new Array(numValues);
for (let i = 0; i < this.values.length; i++) {
this.values[i] = i;
}
}
getRandomValue() {
if (!this.values.length) {
throw new Error("no more random values");
}
const i = Math.floor(Math.random() * this.values.length);
const returnVal = this.values[i];
this.values.splice(i, 1);
return returnVal;
}
}
const rands = new Randoms(10000);
console.log(rands.getRandomValue());
console.log(rands.getRandomValue());
console.log(rands.getRandomValue());
console.log(rands.getRandomValue());
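Following up on the typed-array note above, here is a minimal sketch (the class name is illustrative, not from the answer) of one way to emulate the splice with a Uint16Array, using copyWithin and a manually tracked length:
class TypedRandoms {
  constructor(numValues) {
    this.values = new Uint16Array(numValues);
    for (let i = 0; i < numValues; i++) {
      this.values[i] = i;
    }
    this.length = numValues; // logical length, tracked manually
  }
  getRandomValue() {
    if (!this.length) {
      throw new Error("no more random values");
    }
    const i = Math.floor(Math.random() * this.length);
    const returnVal = this.values[i];
    // emulate splice(i, 1): shift the tail left by one, then shrink the logical length
    this.values.copyWithin(i, i + 1, this.length);
    this.length--;
    return returnVal;
  }
}
const rands16 = new TypedRandoms(10000);
console.log(rands16.getRandomValue());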
Knuth's multiplicative method looks to work pretty well: it'll map numbers 0 to 9999 to a random-looking other number 0 to 9999, with no overlap:
const hash = i => i*2654435761 % (10000);
const s = new Set();
for (let i = 0; i < 10000; i++) {
const n = hash(i);
if (s.has(n)) { console.log(i, n); break; }
s.add(n);
}
To implement it, simply keep track of an index that gets incremented each time a new one is generated:
const hash = i => i*2654435761 % (10000);
let i = 1;
console.log(
hash(i++),
hash(i++),
hash(i++),
hash(i++),
hash(i++),
);
These results aren't actually random, but they probably do the job well enough for most purposes.
Disclaimer:
This is copy-paste from my answer to another question here. The code was in turn ported from yet another question here.
Utilities:
function isPrime(n) {
if (n <= 1) return false;
if (n <= 3) return true;
if (n % 2 == 0 || n % 3 == 0) return false;
for (let i = 5; i * i <= n; i = i + 6) {
if (n % i == 0 || n % (i + 2) == 0) return false;
}
return true;
}
function findNextPrime(n) {
if (n <= 1) return 2;
let prime = n;
while (true) {
prime++;
if (isPrime(prime)) return prime;
}
}
function getIndexGeneratorParams(spaceSize) {
const N = spaceSize;
const Q = findNextPrime(Math.floor(2 * N / (1 + Math.sqrt(5))))
const firstIndex = Math.floor(Math.random() * spaceSize);
return [firstIndex, N, Q]
}
function getNextIndex(prevIndex, N, Q) {
return (prevIndex + Q) % N
}
Usage
// Each day you bootstrap to get a tuple of these parameters and persist them throughout the day.
const [firstIndex, N, Q] = getIndexGeneratorParams(10000)
// need to keep track of previous index generated.
// it’s a seed to generate next one.
let prevIndex = firstIndex
// calling this function gives you the unique code
function getHashCode() {
prevIndex = getNextIndex(prevIndex, N, Q)
return prevIndex.toString().padStart(4, "0")
}
console.log(getHashCode());
Explanation
For simplicity, let's say you want to generate non-repeating numbers from 0 to 35 in random order. We get pseudo-randomness by polling a "full cycle iterator"†. The idea is simple:
lay the indexes 0..35 out in a circle, and denote the upper bound as N=36
decide a step size, denoted as Q (Q=23 in this case) given by this formula‑
Q = findNextPrime(Math.floor(2 * N / (1 + Math.sqrt(5))))
randomly decide a starting point, e.g. number 5
start generating seemingly random nextIndex from prevIndex, by
nextIndex = (prevIndex + Q) % N
So if we put 5 in we get (5 + 23) % 36 == 28. Put 28 in we get (28 + 23) % 36 == 15.
This process will go through every number in the circle (jumping back and forth among points on the circle); it will pick each number only once, without repeating. When we get back to our starting point 5, we know we've reached the end.
†: I'm not sure about this term, just quoting from this answer
‑: This formula only gives a nice step size that will make things look more "random"; the only requirement for Q is that it must be coprime to N
This problem is so small I think a simple solution is best. Build an ordered array of the 10k possible values & permute it at the start of each day. Give the k'th value to the k'th request that day.
It avoids the possible problem with your solution of having multiple collisions.
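A minimal sketch of that idea (names are illustrative): build the full list of codes once, shuffle it with a Fisher-Yates shuffle at the start of the day, and hand the codes out in order:
function shuffledCodes() {
  const codes = Array.from({ length: 10000 }, (_, i) => i);
  // Fisher-Yates shuffle
  for (let i = codes.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [codes[i], codes[j]] = [codes[j], codes[i]];
  }
  return codes;
}
const todaysCodes = shuffledCodes(); // regenerate when the daily wipe happens
let nextIndex = 0;
const nextCode = () => String(todaysCodes[nextIndex++]).padStart(4, "0");
console.log(nextCode(), nextCode(), nextCode());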

Cryptographically secure string from 36-character alphabet?

I'm trying to generate some tokens. They have to be 26 characters long, from the alphabet [a-z0-9].
The closest solution I've found is part 2 from this answer, but the string won't be uniformly distributed.
If my alphabet was a power of 2 in length, this wouldn't be so hard, but as it stands, I'm not sure how to do this properly.
Specifically, this is what I've got so far:
export function createSessionId() {
const len = 26;
let bytes = new Crypto.randomBytes(len);
let result = new Array(len);
const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
for(let i=0; i<len; ++i) {
result[i] = alphabet[bytes[i]%alphabet.length];
}
return result.join('');
}
But I'm pretty sure that won't be distributed properly because of the modulo.
Let's see what you've got here:
An alphabet of 36 characters, where each character is randomly selected individually by using modulo 36, is indeed not evenly distributed.
You have several options:
Select a whole series of bytes to represent the total string. That means, you need to select enough bytes to represent a number in the range of 0 - 36^26. That way, your value will be evenly distributed (as far as the crypto provider allows for).
If you insist on choosing one digit at a time, you want to make sure its value will be distributed evenly; using modulo 36 does not do the job, as you correctly assumed. In this case, you can either
a) interpret 8 bytes as a float and multiply the result by 36
b) modulo by a power of 2 greater than 36. Then, search for the greatest multiple of 36 lower than that power of two, discard any value outside of that value space and modulo 36 the valid values.
For 2a), the distribution is not perfectly even, but very close to chance, assuming that the crypto provider is fair.
For 2b), the distribution is even. But this is paid for with a higher (and especially unpredictable) runtime when throwing away unneeded results. Of course you can statistically calculate the runtime, but the worst case is an infinite one, if your RNG keeps producing invalid results forever (which is very, very unlikely, but theoretically possible).
My advice is 2a). Take a series of bytes, interpret them as a float value and multiply the result by 36.
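For illustration, a minimal sketch of option 2a, assuming Node's crypto module (it reads 6 random bytes per character rather than 8, to stay within JavaScript's safe integer range; the function names are just for illustration):
const crypto = require('crypto');
function randomIndex36() {
  // 48 bits of entropy, read as an integer, scaled to [0, 1), then to 0..35
  const int48 = crypto.randomBytes(6).readUIntBE(0, 6);
  return Math.floor((int48 / 2 ** 48) * 36);
}
function createSessionId() {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let result = '';
  for (let i = 0; i < 26; i++) {
    result += alphabet[randomIndex36()];
  }
  return result;
}
console.log(createSessionId());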
Here's my implementation of Psi's answer, 2b.
export function createSessionId() {
const len = 26;
let bytes = Crypto.randomBytes(30); // a few extra in case we're unlucky
let result = new Array(len);
const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
let i =0;
let j = 0;
for(;;) {
if(i >= bytes.length) {
bytes = Crypto.randomBytes(((len-j)*1.2)|0); // we got unlucky, gather up some more entropy
i = 0;
}
let value = bytes[i++];
if(value >= 252) { // we need a multiple of 36 for an even distribution
continue;
}
result[j++] = alphabet[value % alphabet.length];
if(j >= len) {
break;
}
}
return result.join('');
}
There's less than a 2% chance of needing to reroll (4/256), so I think this ought to be efficient enough.
Hard to unit test something like this, but this passes:
test(createSessionId.name, () => {
let ids = new Set();
let dist = Array.apply(null,new Array(26)).map(() => ({}));
for(let i=0; i<1500; ++i) {
let id = createSessionId();
if(ids.has(id)) {
throw new Error(`Not unique`);
}
ids.add(id);
for(let j=0; j<id.length; ++j) {
dist[j][id[j]] = true;
}
}
for(let i=0; i<26; ++i) {
expect(Object.keys(dist[i]).length).toEqual(36);
}
});

Compress group of arrays into smallest possible string

This question can be answered using JavaScript, Underscore, or jQuery functions.
given 4 arrays:
[17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
They are very related to each other.
For example, the first element of all arrays, [17,11,92,35], gives the unit's x/y coordinates and also the x/y coordinates of this unit's target.
So here there are 5*4 = 20 units in total. Every unit has slightly different stats.
These 4 arrays of unit x/y coordinates visually look like an army of 20 units "x" (targeting "o"):
xxxxx o
xxxxx o
xxxxx o
xxxxx o
There will always be 4 arrays. Even with 1 unit, there will be 4 arrays, but each of size 1. This is the simplest and most common situation.
In a real situation, every unit has 20 different stats (keys) in total, and 14 of these keys are usually identical across a group of units - all 14 keys.
So they are grouped as an army with the same stats. The only difference is the coordinates of each unit and the coordinates of that unit's target.
I need to compress all this data into as small a form as possible, which can later be decompressed.
There can also be more complex situations, when all these 14 keys are accidentally the same, but the coordinates are totally different from the pattern. Example:
[17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
In this situation I need to extract this array as two different armies. When there are fewer than 3 units in an army, I simply write these units out without the
pattern - as [215,418,-78,-887], [..] - and if an army has more than 2 units, I need a compressed string with a pattern, which can be decompressed later. In this example there are 21 units. It just has to be split into a 1-unit army and a (5x4 = 20) unit army.
On the assumption that every n units follow a pattern,
encode units with
n: sequence units count
ssx: start of source x
dsx: difference of source x
ssy: start of source y
dsy: difference of source y
stx: start of target x
dtx: difference of target x
sty: start of target y
dty: difference of target y
by the array: [n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty]
so that the units:
[17,17,17,17,17],
[11,12,13,14,15],
[92,92,92,92,92],
[35,36,37,38,39]
are encoded:
[5,17,0,11,1,92,0,35,1]
Of course, if you know in advance that, for example, the y targets are always the same for such a sequence, you can drop the difference parameter, to have:
[n,ssx,dsx,ssy,---,stx,dtx,sty,---] => [n,ssx,dsx,ssy,stx,dtx,sty], and so on.
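To make the encoding concrete, here is a minimal sketch (illustrative, not part of the original answer) of expanding one such full pattern array back into the four coordinate arrays:
// Expand [n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty] back into the four arrays.
function decodePattern([n, ssx, dsx, ssy, dsy, stx, dtx, sty, dty]) {
  const xs = [], ys = [], txs = [], tys = [];
  for (let k = 0; k < n; k++) {
    xs.push(ssx + k * dsx);
    ys.push(ssy + k * dsy);
    txs.push(stx + k * dtx);
    tys.push(sty + k * dty);
  }
  return [xs, ys, txs, tys];
}
console.log(decodePattern([5, 17, 0, 11, 1, 92, 0, 35, 1]));
// [[17,17,17,17,17],[11,12,13,14,15],[92,92,92,92,92],[35,36,37,38,39]]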
For an interruption of the pattern like the one you mentioned in your last example, you can use additional 'extra' arrays, and then insert them into the sequence, with:
exsx: extra unit starting x
exsy: extra unit starting y
extx: extra unit target x
exty: extra unit target y
m: insert extra unit at
so that the special case is encoded:
{
patterns:[
[5,17,0,11,1,92,0,35,1],
[5,18,0,11,1,92,0,35,1],
[5,19,0,11,1,92,0,35,1],
[5,17,0,11,1,92,0,35,1]
],
extras: [
[215,418,-78,-887,8] // 8 - because you want to insert this unit at index 8
]
}
Again, this is a general encoding. Any specific properties for the patterns may further reduce the encoded representation.
Hope this helps.
High compression using bitstreams
You can encode sets of values into a bit stream, allowing you to remove unused bits. The numbers you have shown are no greater than 887 in magnitude (ignoring the sign), which means you can fit all the numbers into 10 bits, saving 54 bits per number (JavaScript uses 64 bit numbers).
Run length compression
You also have many repeated sets of numbers which you can use run length compression on. You set a flag in the bitstream that indicates that the following set of bits represents a repeated sequence of numbers, then you have the number of repeats and the value to repeat. For sequences of random numbers you just keep them as is.
If you use run-length compression you create a block-type structure in the bit stream; this makes it possible to embed further compression. As you have many numbers that are below 128, many of the numbers can be encoded into 7 bits, or even fewer. For a small overhead (in this case 2 bits per block) you can select the smallest bit size that packs all the numbers in that block.
Variable bit depth numbers
I have created a number type value that represents the number of bits used to store numbers in a block. Each block has a number type and all numbers in the block use that type. There are 4 number types that can be encoded into 2 bits.
00 = 4 bit numbers. Range 0-15
01 = 5 bit numbers. Range 0-31
10 = 7 bit numbers. Range 0-127
11 = 10 bit numbers. Range 0-1023
The bitstream
To make this easy you will need a bit stream read/write. It allows you to easily write and read bits from a stream of bits.
// Simple unsigned bit stream
// Read and write to and from a bit stream.
// Numbers are stored as Big endian
// Does not comprehend sign so wordlength should be less than 32 bits
// methods
// eof(); // returns true if read pos > buffer bit size
// read(numberBits); // number of bits to read as one number. No sign so < 32
// write(value,numberBits); // value to write, number of bits to write < 32
// getBuffer(); // return object with buffer and array of numbers, and bitLength the total number of bits
// setBuffer(buffer,bitLength); // the buffers as an array of numbers, and bitLength the total number of bits
// Properties
// wordLength; // read only length of a word.
function BitStream(){
var buffer = [];
var pos = 0;
var numBits = 0;
const wordLength = 16;
this.wordLength = wordLength;
// read a single bit
var readBit = function(){
var b = buffer[Math.floor(pos / wordLength)]; // get word
b = (b >> ((wordLength - 1) - (pos % wordLength))) & 1;
pos += 1;
return b;
}
// write a single bit. Will fill bits with 0 if write pos is moved past buffer length
var writeBit = function(bit){
var rP = Math.floor(pos / wordLength);
if(rP >= buffer.length){ // check that the buffer has values at current pos.
var end = buffer.length; // fill buffer up to pos with zero
while(end <= rP){
buffer[end] = 0;
end += 1;
}
}
var b = buffer[rP];
bit &= 1; // mask out any unwanted bits
bit <<= (wordLength - 1) - (pos % wordLength);
b |= bit;
buffer[rP] = b;
pos += 1;
}
// returns true is past eof
this.eof = function(){
return pos >= numBits;
}
// reads number of bits as a Number
this.read = function(bits){
var v = 0;
while(bits > 0){
v <<= 1;
v |= readBit();
bits -= 1;
}
return v;
}
// writes value to bit stream
this.write = function(value,bits){
var v;
while(bits > 0){
bits -= 1;
writeBit( (value >> bits) & 1 );
}
}
// returns the buffer and length
this.getBuffer = function(){
return {
buffer : buffer,
bitLength : pos,
};
}
// set the buffer and length and returns read write pos to start
this.setBuffer = function(_buffer,bitLength){
buffer = _buffer;
numBits = bitLength;
pos = 0;
}
}
A format for your numbers
Now to design the format. The first bit read from a stream is a sequence flag: if 0, the following block will be a repeated value; if 1, the following block will be a sequence of random numbers.
Block bits : description;
repeat block holds a repeated number
bit 0 : Val 0 = repeat
bit 1 : Val 0 = 4bit repeat count or 1 = 5bit repeat count
then either
bits 2,3,4,5 : 4 bit number of repeats - 1
bits 6,7 : 2 bit Number type
or
bits 2,3,4,5,6 : 5 bit number of repeats - 1
bits 7,8 : 2 bit Number type
Followed by
Then a value that will be repeated depending on the number type
End of block
sequence block holds a sequence of random numbers
bit 0 : Val 1 = sequence
bit 1 : Val 0 = positive sequence Val 1 = negative sequence
bits 2,3,4,5 : 4 bit number of numbers in sequence - 1
bits 6,7 : 2 bit Number type
then the sequence of numbers in the number format
End of block.
Keep reading blocks until the end of file.
Encoder and decoder
The following object will encode and decode a flat array of numbers. It only handles numbers up to 10 bits long, so no values over 1023 or under -1023.
If you want larger numbers you will have to change the number types that are used. To do this change the arrays
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth
const numberBits = [4,5,7,10]; // the number bit depth lookup;
The arrays below show what to use if you want the max number to be 12 bits, i.e. -4095 to 4095 (the sign bit is in the block encoding); the 7 bit number type has also been changed to 8 bits. The first array is used to look up the bit depth: if you have a 3 bit number, you get the number type with numberSize[bitCount] and the bits used to store the number with numberBits[numberSize[bitCount]].
const numberSize = [0,0,0,0,0,1,2,2,2,3,3,3,3]; // the number bit depth
const numberBits = [4,5,8,12]; // the number bit depth lookup;
function ArrayZip(){
var zipBuffer = 0;
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth lookup;
const numberBits = [4,5,7,10]; // the number bit depth lookup;
this.encode = function(data){ // encodes the data
var pos = 0;
function getRepeat(){ // returns the number of repeat values
var p = pos + 1;
if(data[pos] < 0){
return 1; // ignore negative numbers
}
while(p < data.length && data[p] === data[pos]){
p += 1;
}
return p - pos;
}
function getNoRepeat(){ // returns the number of non repeat values
// if the sequence has negative numbers then
// the length is returned as a negative
var p = pos + 1;
if(data[pos] < 0){ // negative numbers
while(p < data.length && data[p] !== data[p-1] && data[p] < 0){
p += 1;
}
return -(p - pos);
}
while(p < data.length && data[p] !== data[p-1] && data[p] >= 0){
p += 1;
}
return p - pos;
}
function getMax(count){
var max = 0;
var p = pos;
while(count > 0){
max = Math.max(Math.abs(data[p]),max);
p += 1;
count -= 1;
}
return max;
}
var out = new BitStream();
while(pos < data.length){
var reps = getRepeat();
if(reps > 1){
var bitCount = numberSize[Math.ceil(Math.log(getMax(reps) + 1) / Math.log(2))];
if(reps < 16){
out.write(0,1); // repeat header
out.write(0,1); // use 4 bit repeat count;
out.write(reps-1,4); // write 4 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}else {
if(reps > 32){ // if more than can fit in one repeat block split it
reps = 32;
}
out.write(0,1); // repeat header
out.write(1,1); // use 5 bit repeat count;
out.write(reps-1,5); // write 5 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}
}else{
var seq = getNoRepeat(); // get number no repeats
var neg = seq < 0 ? 1 : 0; // found negative numbers
seq = Math.min(16,Math.abs(seq));
// check if last value is the start of a repeating block
if(seq > 1){
var tempPos = pos;
pos += seq;
seq -= getRepeat() > 1 ? 1 : 0;
pos = tempPos;
}
// ge the max bit count to hold numbers
var bitCount = numberSize[Math.ceil(Math.log(getMax(seq) + 1) / Math.log(2))];
out.write(1,1); // sequence header
out.write(neg,1); // write negative flag
out.write(seq - 1,4); // write sequence length;
out.write(bitCount,2); // write 2 bit number size
while(seq > 0){
out.write(Math.abs(data[pos]),numberBits[bitCount]);
pos += 1;
seq -= 1;
}
}
}
// get the bit stream buffer
var buf = out.getBuffer();
// start bit stream with number of trailing bits. There are 4 bits used of 16 so plenty
// of room for alternative encoding flags.
var str = String.fromCharCode(buf.bitLength % out.wordLength);
// convert bit stream to characters
for(var i = 0; i < buf.buffer.length; i ++){
str += String.fromCharCode(buf.buffer[i]);
}
// return encoded string
return str;
}
this.decode = function(zip){
var count,rSize,header,_in,i,data,decompressed,endBits,numSize,val,neg;
data = []; // holds character codes
decompressed = []; // holds the decompressed array of numbers
endBits = zip.charCodeAt(0); // get the trailing bits count
for(i = 1; i < zip.length; i ++){ // convert string to numbers
data[i-1] = zip.charCodeAt(i);
}
_in = new BitStream(); // create a bitstream to read the bits
// set the buffer data and length
_in.setBuffer(data,(data.length - 1) * _in.wordLength + endBits);
while(!_in.eof()){ // do until eof
header = _in.read(1); // read header bit
if(header === 0){ // is repeat header
rSize = _in.read(1); // get repeat count size
if(rSize === 0){
count = _in.read(4); // get 4 bit repeat count
}else{
count = _in.read(5); // get 5 bit repeat count
}
numSize = _in.read(2); // get 2 bit number size type
val = _in.read(numberBits[numSize]); // get the repeated value
while(count >= 0){ // add to the data count + 1 times
decompressed.push(val);
count -= 1;
}
}else{
neg = _in.read(1); // read neg flag
count = _in.read(4); // get 4 bit seq count
numSize = _in.read(2); // get 2 bit number size type
while(count >= 0){
if(neg){ // if negative numbers convert to neg
decompressed.push(-_in.read(numberBits[numSize]));
}else{
decompressed.push(_in.read(numberBits[numSize]));
}
count -= 1;
}
}
}
return decompressed;
}
}
The best way to store a bit stream is as a string. JavaScript has Unicode strings, so we can pack 16 bits into every character.
The results and how to use.
You need to flatten the array. If you need to add extra info to reinstate the multi-dimensional arrays, just add that to the array and let the compressor compress it along with the rest.
// flatten the array
var data = [17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39];
var zipper = new ArrayZip();
var encoded = zipper.encode(data); // packs the 80 numbers in data into 21 characters.
// compression rate of the data array 5120 bits to 336 bits
// 93% compression.
// or as a flat 7 bit ASCII string of numbers: 239 characters (including ,)
// 239 * 7 bits = 1673 bits to 336 bits 80% compression.
var decoded = zipper.decode(encoded);
I did not notice the negative numbers at first so the compression does not do well with the negative values.
var data = [17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20, 11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15, 92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92, 35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39]
var encoded = zipper.encode(data); // packs the 84 numbers in data into 33 characters.
// compression rate of the data array 5376 bits to 528 bits
var decoded = zipper.decode(encoded);
Summary
As you can see this results in a very high compression rate (almost twice as good as LZ compression). The code is far from optimal and you could easily implement a multi pass compressor with various settings ( there are 12 spare bits at the start of the encoded string that can be used to select many options to improve compression.)
Also, I did not see the negative numbers until I came back to post, so the fix for negatives is not good; you can get some more out of it by modifying the bitStream to understand negatives (i.e. use the >>> operator).
