Trying to convert C++ code to JavaScript - javascript

Well, I'm trying to convert this code in C++ (a custom LZSS decompressor):
// Modified LZSS routine
// - starting position at 0 rather than 0xFEE
// - optionally, additional byte for repetition count
// - dictionary writing in two passes
static bstr custom_lzss_decompress(
const bstr &input, size_t output_size, const size_t dict_capacity)
{
std::vector<u8> dict(dict_capacity);
size_t dict_size = 0;
size_t dict_pos = 0;
bstr output(output_size);
auto output_ptr = output.get<u8>();
auto output_end = output.end<const u8>();
auto input_ptr = input.get<const u8>();
auto input_end = input.end<const u8>();
u16 control = 0;
while (output_ptr < output_end)
{
control >>= 1;
if (!(control & 0x100))
control = *input_ptr++ | 0xFF00;
if (control & 1)
{
dict[dict_pos++] = *output_ptr++ = *input_ptr++;
dict_pos %= dict_capacity;
if (dict_size < dict_capacity)
dict_size++;
}
else
{
auto tmp = *reinterpret_cast<const u16*>(input_ptr);
input_ptr += 2;
auto look_behind_pos = tmp >> 4;
auto repetitions = tmp & 0xF;
if (repetitions == 0xF)
repetitions += *input_ptr++;
repetitions += 3;
auto i = repetitions;
while (i-- && output_ptr < output_end)
{
*output_ptr++ = dict[look_behind_pos++];
look_behind_pos %= dict_size;
}
auto source = &output_ptr[-repetitions];
while (source < output_ptr)
{
dict[dict_pos++] = *source++;
dict_pos %= dict_capacity;
if (dict_size < dict_capacity)
dict_size++;
}
}
}
return output;
}
into a JavaScript version. This is what I have so far:
function custom_lzss_decompress(input,osize,dictcap){
var dict = new Uint8Array(dictcap);
var output = new Uint8Array(osize);
var data = str2ab(input);
var dict_size = 0;
var dict_pos = 0;
var control = 0;
var iptr = 0;
var optr = 0;
while (iptr < osize){
control >>= 1;
if(!(control & 0x100)){
control = data[iptr] | 0xFF00;
iptr++;
}
if(control & 1){
dict[dict_pos] = output[optr] = data[iptr];
dict_pos++;
iptr++;
optr++;
dict_pos %= dictcap;
if(dict_size < dictcap)
dict_size++;
}
else{
var tmp = new Uint16Array(1);
tmp[0] = (data[iptr] << 8) + data[iptr+1];
iptr += 2;
var loop_behind_pos = new Uint16Array(1);
loop_behind_pos[0] = tmp >> 4;
var repetitions = new Uint16Array(1);
repetitions[0] = tmp & 0xF;
if(repetitions[0] == 0xF){
repetitions[0] += data[iptr];
}
iptr++;
repetitions[0] += 3;
var ai = new Uint16Array(1);
ai[0] = repetitions[0];
while(ai[0] && (optr < osize)){
ai[0]--;
output[optr] = dict[loop_behind_pos];
optr++;
loop_behind_pos++;
loop_behind_pos %= dict_size;
}
var source = new Uint16Array(1);
source[0] = output[-repetitions[0]];
while(source[0] < optr){
dict[dict_pos] = output[source];
dict_pos++;
source++;
dict_pos %= dictcap;
if(dict_size < dictcap)
dict_size++;
}
}
}
return output;
}
It runs, but it doesn't work, and I can't figure out what is going on. Can anyone spot an error that I'm missing?

I see one problem and one potential problem. The problem is in the repetitions[0] == 0xF handler: you increment iptr unconditionally, but the original code only does so in the body of the if.
The potential problem is the 16-bit word read, tmp[0] = (data[iptr] << 8) + data[iptr+1];. This works if the word is big-endian, but it will not give the right value if the source is little-endian.
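Putting both points together, a minimal sketch of a corrected port might look like the following. It is not a tested drop-in replacement: it assumes the input is already a Uint8Array (the question's str2ab helper is not shown), that the 16-bit word is stored little-endian (as it would be read by the C++ reinterpret_cast on x86), and it also makes the outer loop test the output position, as the C++ does. The names customLzssDecompress and pushToDict are illustrative.

```javascript
// Sketch of a port of custom_lzss_decompress.
// Assumptions: `input` is a Uint8Array; the back-reference word is little-endian.
function customLzssDecompress(input, outputSize, dictCapacity) {
  const dict = new Uint8Array(dictCapacity);
  const output = new Uint8Array(outputSize);
  let dictSize = 0, dictPos = 0, control = 0, iptr = 0, optr = 0;

  // Append one byte to the circular dictionary.
  const pushToDict = (byte) => {
    dict[dictPos] = byte;
    dictPos = (dictPos + 1) % dictCapacity;
    if (dictSize < dictCapacity) dictSize++;
  };

  while (optr < outputSize) {            // loop on the OUTPUT position
    control >>= 1;
    if (!(control & 0x100)) control = input[iptr++] | 0xFF00;
    if (control & 1) {
      // Literal byte: copy to the output and to the dictionary.
      const byte = input[iptr++];
      output[optr++] = byte;
      pushToDict(byte);
    } else {
      // Back-reference: 12-bit position, 4-bit count (plain numbers, no typed arrays).
      const tmp = input[iptr] | (input[iptr + 1] << 8);
      iptr += 2;
      let pos = tmp >> 4;
      let repetitions = tmp & 0xF;
      if (repetitions === 0xF) repetitions += input[iptr++]; // extra byte only here
      repetitions += 3;
      const start = optr;
      for (let i = 0; i < repetitions && optr < outputSize; i++) {
        output[optr++] = dict[pos++];
        pos %= dictSize;
      }
      // Second pass: add the bytes just written to the dictionary.
      // (Unlike the C++, this copies only bytes actually written if output was cut short.)
      for (let s = start; s < optr; s++) pushToDict(output[s]);
    }
  }
  return output;
}
```

For example, the control byte 0x07 followed by the literals "abc" and a zero back-reference word should expand to "abcabc" with an output size of 6.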

Related

i receive data type Uint8Array from port serial how can i transfer to decimal value [ web serial port ] [duplicate]

I have some UTF-8 encoded data living in a range of Uint8Array elements in JavaScript. Is there an efficient way to decode these out to a regular JavaScript string (I believe JavaScript uses 16-bit Unicode)? I don't want to add one character at a time, as the string concatenation would become too CPU-intensive.
TextEncoder and TextDecoder from the Encoding standard, which is polyfilled by the stringencoding library, convert between strings and ArrayBuffers:
var uint8array = new TextEncoder().encode("someString");
var string = new TextDecoder().decode(uint8array);
This should work:
// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt
/* utf.js - UTF-8 <=> UTF-16 convertion
*
* Copyright (C) 1999 Masanao Izumo <iz@onicos.co.jp>
* Version: 1.0
* LastModified: Dec 25 1999
* This library is free. You can redistribute it and/or modify it.
*/
function Utf8ArrayToStr(array) {
var out, i, len, c;
var char2, char3;
out = "";
len = array.length;
i = 0;
while(i < len) {
c = array[i++];
switch(c >> 4)
{
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
// 0xxxxxxx
out += String.fromCharCode(c);
break;
case 12: case 13:
// 110x xxxx 10xx xxxx
char2 = array[i++];
out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
break;
case 14:
// 1110 xxxx 10xx xxxx 10xx xxxx
char2 = array[i++];
char3 = array[i++];
out += String.fromCharCode(((c & 0x0F) << 12) |
((char2 & 0x3F) << 6) |
((char3 & 0x3F) << 0));
break;
}
}
return out;
}
It's somewhat cleaner than the other solutions because it doesn't use any hacks and doesn't depend on browser JS functions, so it also works in other JS environments.
Here's what I use:
var str = String.fromCharCode.apply(null, uint8Arr);
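Two caveats are worth keeping in mind with this one-liner: every byte is passed as a separate argument, so very large arrays can exceed the engine's argument limit, and each byte is treated as one character code, so multi-byte UTF-8 sequences are not decoded. A minimal sketch that works around the first caveat by converting in chunks (the helper name bytesToBinaryString and the chunk size are assumptions, not part of the answer above):

```javascript
// Hypothetical helper: converts bytes to a string one chunk at a time so the
// argument list passed to String.fromCharCode stays small.
// Each byte becomes one character code; this does NOT decode UTF-8.
function bytesToBinaryString(bytes, chunkSize = 0x8000) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray creates a view, not a copy, so this stays cheap.
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize)));
  }
  return parts.join("");
}

console.log(bytesToBinaryString(new Uint8Array([72, 105, 33]))); // "Hi!"
```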
In Node "Buffer instances are also Uint8Array instances", so buf.toString() works in this case.
In NodeJS, we have Buffers available, and string conversion with them is really easy. Better, it's easy to convert a Uint8Array to a Buffer. Try this code, it's worked for me in Node for basically any conversion involving Uint8Arrays:
let str = Buffer.from(uint8arr.buffer).toString();
We're just extracting the ArrayBuffer from the Uint8Array and then converting that to a proper NodeJS Buffer. Then we convert the Buffer to a string (you can throw in a hex or base64 encoding if you want).
If we want to convert back to a Uint8Array from a string, then we'd do this:
let uint8arr = new Uint8Array(Buffer.from(str));
Be aware that if you specified an encoding such as base64 when converting to a string, you have to pass the same encoding when converting back, for example Buffer.from(str, "base64").
This will not work in the browser without a module! NodeJS Buffers just don't exist in the browser, so this method won't work unless you add Buffer functionality to the browser. That's actually pretty easy to do though, just use a module like this, which is both small and fast!
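One detail to watch with uint8arr.buffer: it is the whole underlying ArrayBuffer, so if the Uint8Array is only a view onto part of a larger buffer, the conversion picks up bytes outside the view. A small Node-only sketch, assuming a hypothetical helper named uint8ToString, that passes the view's offset and length explicitly:

```javascript
// Node.js only. Wraps exactly the bytes the view covers, without copying.
function uint8ToString(u8, encoding = "utf8") {
  return Buffer.from(u8.buffer, u8.byteOffset, u8.byteLength).toString(encoding);
}

// An 11-byte array and a view that covers only its last five bytes.
const backing = Uint8Array.from("hello world", (ch) => ch.charCodeAt(0));
const view = backing.subarray(6);

const whole = Buffer.from(view.buffer).toString(); // ignores the view's bounds
const exact = uint8ToString(view);                 // respects the view's bounds

console.log(whole); // "hello world"
console.log(exact); // "world"
```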
Found in one of the Chrome sample applications, although this is meant for larger blocks of data where you're okay with an asynchronous conversion.
/**
* Converts an array buffer to a string
*
* @private
* @param {ArrayBuffer} buf The buffer to convert
* @param {Function} callback The function to call when conversion is complete
*/
function _arrayBufferToString(buf, callback) {
var bb = new Blob([new Uint8Array(buf)]);
var f = new FileReader();
f.onload = function(e) {
callback(e.target.result);
};
f.readAsText(bb);
}
The solution given by Albert works well as long as the provided function is invoked infrequently and is only used for arrays of modest size, otherwise it is egregiously inefficient. Here is an enhanced vanilla JavaScript solution that works for both Node and browsers and has the following advantages:
• Works efficiently for all octet array sizes
• Generates no intermediate throw-away strings
• Supports 4-byte characters on modern JS engines (otherwise "?" is substituted)
var utf8ArrayToStr = (function () {
var charCache = new Array(128); // Preallocate the cache for the common single byte chars
var charFromCodePt = String.fromCodePoint || String.fromCharCode;
var result = [];
return function (array) {
var codePt, byte1;
var buffLen = array.length;
result.length = 0;
for (var i = 0; i < buffLen;) {
byte1 = array[i++];
if (byte1 <= 0x7F) {
codePt = byte1;
} else if (byte1 <= 0xDF) {
codePt = ((byte1 & 0x1F) << 6) | (array[i++] & 0x3F);
} else if (byte1 <= 0xEF) {
codePt = ((byte1 & 0x0F) << 12) | ((array[i++] & 0x3F) << 6) | (array[i++] & 0x3F);
} else if (String.fromCodePoint) {
codePt = ((byte1 & 0x07) << 18) | ((array[i++] & 0x3F) << 12) | ((array[i++] & 0x3F) << 6) | (array[i++] & 0x3F);
} else {
codePt = 63; // Cannot convert four byte code points, so use "?" instead
i += 3;
}
result.push(charCache[codePt] || (charCache[codePt] = charFromCodePt(codePt)));
}
return result.join('');
};
})();
Uint8Array to String
let str = Buffer.from(key.secretKey).toString('base64');
String to Uint8Array
let uint8arr = new Uint8Array(Buffer.from(data,'base64'));
I was frustrated to see that people were not showing how to go both ways, or showing that things work on non-trivial UTF-8 strings. I found a post on codereview.stackexchange.com with some code that works well. I used it to turn ancient runes into bytes, test some crypto on the bytes, then convert the result back into a string. The working code is on GitHub. I renamed the methods for clarity:
// https://codereview.stackexchange.com/a/3589/75693
function bytesToString(bytes) {
var chars = [];
for(var i = 0, n = bytes.length; i < n;) {
chars.push(((bytes[i++] & 0xff) << 8) | (bytes[i++] & 0xff));
}
return String.fromCharCode.apply(null, chars);
}
// https://codereview.stackexchange.com/a/3589/75693
function stringToBytes(str) {
var bytes = [];
for(var i = 0, n = str.length; i < n; i++) {
var char = str.charCodeAt(i);
bytes.push(char >>> 8, char & 0xFF);
}
return bytes;
}
The unit test uses this UTF-8 string:
// http://kermitproject.org/utf8.html
// From the Anglo-Saxon Rune Poem (Rune version)
const secretUtf8 = `ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬`;
Note that the string length is only 117 characters but the byte length, when encoded, is 234.
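The factor of exactly two comes from the functions above storing each 16-bit UTF-16 code unit as two bytes (big-endian); that is a UTF-16 byte layout rather than UTF-8. As a rough illustration, using a three-rune sample chosen here for brevity, the same text is longer still when encoded as real UTF-8 with TextEncoder:

```javascript
const sample = "ᚠᛇᚻ"; // three runes, each a single UTF-16 code unit

// Two bytes per UTF-16 code unit, big-endian, mirroring stringToBytes above.
const utf16Bytes = [];
for (let i = 0; i < sample.length; i++) {
  const unit = sample.charCodeAt(i);
  utf16Bytes.push(unit >>> 8, unit & 0xff);
}

// Actual UTF-8: these runes take three bytes each.
const utf8Bytes = new TextEncoder().encode(sample);
const roundTripped = new TextDecoder().decode(utf8Bytes);

console.log(sample.length, utf16Bytes.length, utf8Bytes.length); // 3 6 9
```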
If I uncomment the console.log lines I can see that the string that is decoded is the same string that was encoded (with the bytes passed through Shamir's secret sharing algorithm!):
Do what @Sudhir said, and then, to get a string out of the comma-separated list of numbers, use:
var myString = "";
for (var i=0; i<unitArr.byteLength; i++) {
myString += String.fromCharCode(unitArr[i]);
}
This will give you the string you want, if it's still relevant.
If you can't use the TextDecoder API because it is not supported on IE:
You can use the FastestSmallestTextEncoderDecoder polyfill recommended by the Mozilla Developer Network website;
You can use this function also provided at the MDN website:
function utf8ArrayToString(aBytes) {
var sView = "";
for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
nPart = aBytes[nIdx];
sView += String.fromCharCode(
nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
/* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
(nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
(nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
(nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
(nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
(nPart - 192 << 6) + aBytes[++nIdx] - 128
: /* nPart < 127 ? */ /* one byte */
nPart
);
}
return sView;
}
let str = utf8ArrayToString([50,72,226,130,130,32,43,32,79,226,130,130,32,226,135,140,32,50,72,226,130,130,79]);
// Must show 2H₂ + O₂ ⇌ 2H₂O
console.log(str);
Try these functions,
var JsonToArray = function(json)
{
var str = JSON.stringify(json, null, 0);
var ret = new Uint8Array(str.length);
for (var i = 0; i < str.length; i++) {
ret[i] = str.charCodeAt(i);
}
return ret
};
var binArrayToJson = function(binArray)
{
var str = "";
for (var i = 0; i < binArray.length; i++) {
str += String.fromCharCode(parseInt(binArray[i]));
}
return JSON.parse(str)
}
source: https://gist.github.com/tomfa/706d10fed78c497731ac, kudos to Tomfa
I'm using this function, which works for me:
function uint8ArrayToBase64(data) {
return btoa(Array.from(data).map((c) => String.fromCharCode(c)).join(''));
}
For ES6 and UTF8 string
decodeURIComponent(escape(String.fromCharCode(...uint8arrData)))
By far the easiest way that has worked for me is:
//1. Create or fetch the Uint8Array to use in the example
const bufferArray = new Uint8Array([10, 10, 10])
//2. Turn the Uint8Array into a regular array
const array = Array.from(bufferArray);
//3. Stringify it (option A)
JSON.stringify(array);
//3. Stringify it (option B: uses @serdarsenay's code snippet to decode each item in array)
let binArrayToString = function(binArray) {
let str = "";
for (let i = 0; i < binArray.length; i++) {
str += String.fromCharCode(parseInt(binArray[i]));
}
return str;
}
binArrayToString(array);
class UTF8{
static encode(str:string){return new UTF8().encode(str)}
static decode(data:Uint8Array){return new UTF8().decode(data)}
private EOF_byte:number = -1;
private EOF_code_point:number = -1;
private encoderError(code_point) {
console.error("UTF8 encoderError",code_point)
}
private decoderError(fatal, opt_code_point?):number {
if (fatal) console.error("UTF8 decoderError",opt_code_point)
return opt_code_point || 0xFFFD;
}
private inRange(a:number, min:number, max:number) {
return min <= a && a <= max;
}
private div(n:number, d:number) {
return Math.floor(n / d);
}
private stringToCodePoints(string:string) {
/** @type {Array.<number>} */
let cps = [];
// Based on http://www.w3.org/TR/WebIDL/#idl-DOMString
let i = 0, n = string.length;
while (i < string.length) {
let c = string.charCodeAt(i);
if (!this.inRange(c, 0xD800, 0xDFFF)) {
cps.push(c);
} else if (this.inRange(c, 0xDC00, 0xDFFF)) {
cps.push(0xFFFD);
} else { // (inRange(c, 0xD800, 0xDBFF))
if (i == n - 1) {
cps.push(0xFFFD);
} else {
let d = string.charCodeAt(i + 1);
if (this.inRange(d, 0xDC00, 0xDFFF)) {
let a = c & 0x3FF;
let b = d & 0x3FF;
i += 1;
cps.push(0x10000 + (a << 10) + b);
} else {
cps.push(0xFFFD);
}
}
}
i += 1;
}
return cps;
}
private encode(str:string):Uint8Array {
let pos:number = 0;
let codePoints = this.stringToCodePoints(str);
let outputBytes = [];
while (codePoints.length > pos) {
let code_point:number = codePoints[pos++];
if (this.inRange(code_point, 0xD800, 0xDFFF)) {
this.encoderError(code_point);
}
else if (this.inRange(code_point, 0x0000, 0x007f)) {
outputBytes.push(code_point);
} else {
let count = 0, offset = 0;
if (this.inRange(code_point, 0x0080, 0x07FF)) {
count = 1;
offset = 0xC0;
} else if (this.inRange(code_point, 0x0800, 0xFFFF)) {
count = 2;
offset = 0xE0;
} else if (this.inRange(code_point, 0x10000, 0x10FFFF)) {
count = 3;
offset = 0xF0;
}
outputBytes.push(this.div(code_point, Math.pow(64, count)) + offset);
while (count > 0) {
let temp = this.div(code_point, Math.pow(64, count - 1));
outputBytes.push(0x80 + (temp % 64));
count -= 1;
}
}
}
return new Uint8Array(outputBytes);
}
private decode(data:Uint8Array):string {
let fatal:boolean = false;
let pos:number = 0;
let result:string = "";
let code_point:number;
let utf8_code_point = 0;
let utf8_bytes_needed = 0;
let utf8_bytes_seen = 0;
let utf8_lower_boundary = 0;
while (data.length > pos) {
let _byte = data[pos++];
if (_byte == this.EOF_byte) {
if (utf8_bytes_needed != 0) {
code_point = this.decoderError(fatal);
} else {
code_point = this.EOF_code_point;
}
} else {
if (utf8_bytes_needed == 0) {
if (this.inRange(_byte, 0x00, 0x7F)) {
code_point = _byte;
} else {
if (this.inRange(_byte, 0xC2, 0xDF)) {
utf8_bytes_needed = 1;
utf8_lower_boundary = 0x80;
utf8_code_point = _byte - 0xC0;
} else if (this.inRange(_byte, 0xE0, 0xEF)) {
utf8_bytes_needed = 2;
utf8_lower_boundary = 0x800;
utf8_code_point = _byte - 0xE0;
} else if (this.inRange(_byte, 0xF0, 0xF4)) {
utf8_bytes_needed = 3;
utf8_lower_boundary = 0x10000;
utf8_code_point = _byte - 0xF0;
} else {
this.decoderError(fatal);
}
utf8_code_point = utf8_code_point * Math.pow(64, utf8_bytes_needed);
code_point = null;
}
} else if (!this.inRange(_byte, 0x80, 0xBF)) {
utf8_code_point = 0;
utf8_bytes_needed = 0;
utf8_bytes_seen = 0;
utf8_lower_boundary = 0;
pos--;
code_point = this.decoderError(fatal, _byte);
} else {
utf8_bytes_seen += 1;
utf8_code_point = utf8_code_point + (_byte - 0x80) * Math.pow(64, utf8_bytes_needed - utf8_bytes_seen);
if (utf8_bytes_seen !== utf8_bytes_needed) {
code_point = null;
} else {
let cp = utf8_code_point;
let lower_boundary = utf8_lower_boundary;
utf8_code_point = 0;
utf8_bytes_needed = 0;
utf8_bytes_seen = 0;
utf8_lower_boundary = 0;
if (this.inRange(cp, lower_boundary, 0x10FFFF) && !this.inRange(cp, 0xD800, 0xDFFF)) {
code_point = cp;
} else {
code_point = this.decoderError(fatal, _byte);
}
}
}
}
//Decode string
if (code_point !== null && code_point !== this.EOF_code_point) {
if (code_point <= 0xFFFF) {
if (code_point > 0)result += String.fromCharCode(code_point);
} else {
code_point -= 0x10000;
result += String.fromCharCode(0xD800 + ((code_point >> 10) & 0x3ff));
result += String.fromCharCode(0xDC00 + (code_point & 0x3ff));
}
}
}
return result;
}
}
Using base64 as the encoding format works quite well. This is how it was implemented for passing secrets via urls in Firefox Send. You will need the base64-js package. These are the functions from the Send source code:
const b64 = require("base64-js")
function arrayToB64(array) {
return b64.fromByteArray(array).replace(/\+/g, "-").replace(/\//g, "_").replace(/=/g, "")
}
function b64ToArray(str) {
return b64.toByteArray(str + "===".slice((str.length + 3) % 4))
}
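The padding expression "===".slice((str.length + 3) % 4) restores the "=" characters that were stripped: it adds nothing when the length is a multiple of 4, "==" when the remainder is 2, and "=" when it is 3. A dependency-free sketch of the same idea for Node, using the built-in Buffer in place of base64-js (the function names toBase64Url and fromBase64Url are made up for this illustration and are not from the Send source):

```javascript
// Node.js sketch; Buffer stands in for the base64-js package used above.
function toBase64Url(bytes) {
  return Buffer.from(bytes).toString("base64")
    .replace(/\+/g, "-")   // URL-safe alphabet
    .replace(/\//g, "_")
    .replace(/=+$/, "");   // strip trailing padding
}

function fromBase64Url(str) {
  // Re-add the padding that toBase64Url removed, then undo the alphabet swap.
  const padded = str + "===".slice((str.length + 3) % 4);
  const standard = padded.replace(/-/g, "+").replace(/_/g, "/");
  return new Uint8Array(Buffer.from(standard, "base64"));
}

const encodedSecret = toBase64Url(new Uint8Array([251, 255, 190]));
console.log(encodedSecret);                // "-_--"
console.log(fromBase64Url(encodedSecret)); // bytes 251, 255, 190 again
```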
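In the browser, without any library, the built-in base64 functions worked for me when recording from the microphone (I had to implement an audio-sending function for a chat). Note that btoa(ui8a) first converts the array to its comma-separated string form, which is why the result is split on commas after atob.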
const ui8a = new Uint8Array(e.target.result);
const string = btoa(ui8a);
const ui8a_2 = atob(string).split(',');
Full code now. Thanks to Bryan Jennings and breakspirit@py4u.net for the code.
https://medium.com/@bryanjenningz/how-to-record-and-play-audio-in-javascript-faa1b2b3e49b
https://www.py4u.net/discuss/282499
index.html
<html>
<head>
<title>Record Audio Test</title>
<meta name="encoding" charset="utf-8" />
</head>
<body>
<h1>Audio Recording Test</h1>
<script src="index.js"></script>
<button id="action" onclick="start()">Start</button>
<button id="stop" onclick="stop()">Stop</button>
<button id="play" onclick="play()">Listen</button>
</body>
</html>
index.js:
const recordAudio = () =>
new Promise(async resolve => {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream);
const audioChunks = [];
mediaRecorder.addEventListener("dataavailable", event => {
audioChunks.push(event.data);
});
const start = () => mediaRecorder.start();
const stop = () =>
new Promise(resolve => {
mediaRecorder.addEventListener("stop", () => {
const audioBlob = new Blob(audioChunks);
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
const play = () => audio.play();
resolve({ audioBlob, audioUrl, play });
});
mediaRecorder.stop();
});
resolve({ start, stop });
});
let recorder = null;
let audio = null;
const sleep = time => new Promise(resolve => setTimeout(resolve, time));
const start = async () => {
recorder = await recordAudio();
recorder.start();
}
const stop = async () => {
audio = await recorder.stop();
read(audio.audioUrl);
}
const play = ()=> {
audio.play();
}
const read = (blobUrl)=> {
var xhr = new XMLHttpRequest;
xhr.responseType = 'blob';
xhr.onload = function() {
var recoveredBlob = xhr.response;
const reader = new FileReader();
// This fires after the blob has been read/loaded.
reader.addEventListener('loadend', (e) => {
const ui8a = new Uint8Array(e.target.result);
const string = btoa(ui8a);
const ui8a_2 = atob(string).split(',');
playByteArray(ui8a_2);
});
// Start reading the blob as an ArrayBuffer.
reader.readAsArrayBuffer(recoveredBlob);
};
// get the blob through blob url
xhr.open('GET', blobUrl);
xhr.send();
}
window.onload = init;
var context; // Audio context
var buf; // Audio buffer
function init() {
if (!window.AudioContext) {
if (!window.webkitAudioContext) {
alert("Your browser does not support any AudioContext and cannot play back this audio.");
return;
}
window.AudioContext = window.webkitAudioContext;
}
context = new AudioContext();
}
function playByteArray(byteArray) {
var arrayBuffer = new ArrayBuffer(byteArray.length);
var bufferView = new Uint8Array(arrayBuffer);
for (var i = 0; i < byteArray.length; i++) {
bufferView[i] = byteArray[i];
}
context.decodeAudioData(arrayBuffer, function(buffer) {
buf = buffer;
play2();
});
}
// Play the loaded file
function play2() {
// Create a source node from the buffer
var source = context.createBufferSource();
source.buffer = buf;
// Connect to the final output node (the speakers)
source.connect(context.destination);
// Play immediately
source.start(0);
}
var decodedString = decodeURIComponent(escape(String.fromCharCode(...new Uint8Array(err))));
var obj = JSON.parse(decodedString);
I am using this Typescript snippet:
function UInt8ArrayToString(uInt8Array: Uint8Array): string
{
var s: string = "[";
for(var i: number = 0; i < uInt8Array.byteLength; i++)
{
if( i > 0 )
s += ", ";
s += uInt8Array[i];
}
s += "]";
return s;
}
Remove the type annotations if you need the JavaScript version.
Hope this helps!

Python and Javascript Pseudo Random Number Generator PRNG

I am looking for a way to generate the same sequence of pseudo random integer numbers from both python and javascript.
When I seed in python like this I get the below results:
random.seed(3909461935)
random.randint(0, 2147483647) = 162048056
random.randint(0, 2147483647) = 489743869
random.randint(0, 2147483647) = 1561110296
I need the same sequence in javascript.
Note: I used 2147483647 as the upper bound in the randint method because I am assuming JavaScript can only handle 32-bit integers.
Are there any libraries on both sides I can use to generate the same set of pseudo random numbers given the same seed?
I have found two implementations of Mersenne Twister that generate the same 32 bit integer values given the same seed.
This way you can generate a server side sequence in Python, and have the browser independently generate the same sequence in javascript.
Python:
from mt_random import *
r = mersenne_rng(seed = 12345)
r.get_random_number() # Prints 3992670690
r.get_random_number() # Prints 3823185381
r.get_random_number() # Prints 1358822685
Javascript:
r = new MersenneTwister();
r.init_genrand(12345);
r.genrand_int32(); // Returns 3992670690
r.genrand_int32(); // Returns 3823185381
r.genrand_int32(); // Returns 1358822685
The JS is here:
/*
* Pseudo-random number generator (port)
*
* Mersenne Twister with improved initialization (2002)
* http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/mt.html
* http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/mt19937ar.html
*/
// = License of the original source =========================================
// ======================================================================
/*
A C-program for MT19937, with initialization improved 2002/2/10.
Coded by Takuji Nishimura and Makoto Matsumoto.
This is a faster version by taking Shawn Cokus's optimization,
Matthe Bellew's simplification, Isaku Wada's real version.
Before using, initialize the state by using init_genrand(seed)
or init_by_array(init_key, key_length).
Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The names of its contributors may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Any feedback is very welcome.
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
email: m-mat @ math.sci.hiroshima-u.ac.jp (remove space)
*/
// ======================================================================
function MersenneTwister() {
// Class for handling integers
function Int32(value) {
var bits = new Array(0, 0, 0, 0);
var i;
var v = value;
if (v != 0) {
for (i = 0; i < 4; ++i) {
bits[i] = v & 0xff;
v = v >> 8;
}
}
this.getValue = function () {
return (bits[0] | (bits[1] << 8) | (bits[2] << 16)) + ((bits[3] << 16) * 0x100);
};
this.getBits = function (i) { return bits[i & 3]; };
this.setBits = function (i, val) { return (bits[i & 3] = val & 0xff); };
this.add = function (another) {
var tmp = new Int32(0);
var i, fl = 0, b;
for (i = 0; i < 4; ++i) {
b = bits[i] + another.getBits(i) + fl;
tmp.setBits(i, b);
fl = b >> 8;
}
return tmp;
};
this.sub = function (another) {
var tmp = new Int32(0);
var bb = new Array(0, 0, 0, 0);
var i;
for (i = 0; i < 4; ++i) {
bb[i] = bits[i] - another.getBits(i);
if ((i > 0) && (bb[i - 1] < 0)) {
--bb[i];
}
}
for (i = 0; i < 4; ++i) {
tmp.setBits(i, bb[i]);
}
return tmp;
};
this.mul = function (another) {
var tmp = new Int32(0);
var bb = new Array(0, 0, 0, 0, 0);
var i, j;
for (i = 0; i < 4; ++i) {
for (j = 0; i + j < 4; ++j) {
bb[i + j] += bits[i] * another.getBits(j);
}
tmp.setBits(i, bb[i]);
bb[i + 1] += bb[i] >> 8;
}
return tmp;
};
this.and = function (another) {
var tmp = new Int32(0);
var i;
for (i = 0; i < 4; ++i) {
tmp.setBits(i, bits[i] & another.getBits(i));
}
return tmp;
};
this.or = function (another) {
var tmp = new Int32(0);
var i;
for (i = 0; i < 4; ++i) {
tmp.setBits(i, bits[i] | another.getBits(i));
}
return tmp;
};
this.xor = function (another) {
var tmp = new Int32(0);
var i;
for (i = 0; i < 4; ++i) {
tmp.setBits(i, bits[i] ^ another.getBits(i));
}
return tmp;
};
this.rshifta = function (s) {
var tmp = new Int32(0);
var bb = new Array(0, 0, 0, 0, 0);
var p = s >> 3;
var i, sg = 0;
if ((bits[3] & 0x80) > 0) {
bb[4] = sg = 0xff;
}
for (i = 0; i + p < 4; ++i) {
bb[i] = bits[i + p];
}
for (; i < 4; ++i) {
bb[i] = sg;
}
p = s & 0x7;
for (i = 0; i < 4; ++i) {
tmp.setBits(i, ((bb[i] | (bb[i + 1] << 8)) >> p) & 0xff);
}
return tmp;
};
this.rshiftl = function (s) {
var tmp = new Int32(0);
var bb = new Array(0, 0, 0, 0, 0);
var p = s >> 3;
var i;
for (i = 0; i + p < 4; ++i) {
bb[i] = bits[i + p];
}
p = s & 0x7;
for (i = 0; i < 4; ++i) {
tmp.setBits(i, ((bb[i] | (bb[i + 1] << 8)) >> p) & 0xff);
}
return tmp;
};
this.lshift = function (s) {
var tmp = new Int32(0);
var bb = new Array(0, 0, 0, 0, 0);
var p = s >> 3;
var i;
for (i = 0; i + p < 4; ++i) {
bb[i + p + 1] = bits[i];
}
p = s & 0x7;
for (i = 0; i < 4; ++i) {
tmp.setBits(i, (((bb[i] | (bb[i + 1] << 8)) << p) >> 8) & 0xff);
}
return tmp;
};
this.equals = function (another) {
var i;
for (i = 0; i < 4; ++i) {
if (bits[i] != another.getBits(i)) {
return false;
}
}
return true;
};
this.compare = function (another) {
var i;
for (i = 3; i >= 0; --i) {
if (bits[i] > another.getBits(i)) {
return 1;
} else if (bits[i] < another.getBits(i)) {
return -1;
}
}
return 0;
};
}
// End of Int32
/* Period parameters */
var N = 624;
var M = 397;
var MATRIX_A = new Int32(0x9908b0df); /* constant vector a */
var UMASK = new Int32(0x80000000); /* most significant w-r bits */
var LMASK = new Int32(0x7fffffff); /* least significant r bits */
var INT32_ZERO = new Int32(0);
var INT32_ONE = new Int32(1);
var MIXBITS = function (u, v) {
return (u.and(UMASK)).or(v.and(LMASK));
};
var TWIST = function (u, v) {
return ((MIXBITS(u, v).rshiftl(1)).xor((v.and(INT32_ONE)).equals(INT32_ZERO) ? INT32_ZERO : MATRIX_A));
};
var state = new Array(); /* the array for the state vector */
var left = 1;
var initf = 0;
var next = 0;
var i;
for (i = 0; i < N; ++i) {
state[i] = INT32_ZERO;
}
/* initializes state[N] with a seed */
var _init_genrand = function (s) {
var lt1812433253 = new Int32(1812433253);
var j;
state[0]= new Int32(s);
for (j = 1; j < N; ++j) {
state[j] = ((lt1812433253.mul(state[j - 1].xor(state[j - 1].rshiftl(30)))).add(new Int32(j)));
/* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
/* In the previous versions, MSBs of the seed affect */
/* only MSBs of the array state[]. */
/* 2002/01/09 modified by Makoto Matsumoto */
//state[j] &= 0xffffffff; /* for >32 bit machines */
}
left = 1; initf = 1;
};
this.init_genrand = _init_genrand;
/* initialize by an array with array-length */
/* init_key is the array for initializing keys */
/* key_length is its length */
/* slight change for C++, 2004/2/26 */
this.init_by_array = function (init_key, key_length) {
var lt1664525 = new Int32(1664525);
var lt1566083941 = new Int32(1566083941);
var i, j, k;
_init_genrand(19650218);
i = 1; j = 0;
k = (N > key_length ? N : key_length);
for (; k; --k) {
state[i] = ((state[i].xor((state[i - 1].xor(state[i - 1].rshiftl(30))).mul(lt1664525))).add(
new Int32(init_key[j]))).add(new Int32(j)); /* non linear */
//state[i] &= 0xffffffff; /* for WORDSIZE > 32 machines */
i++; j++;
if (i >= N) {
state[0] = state[N - 1];
i = 1;
}
if (j >= key_length) {
j = 0;
}
}
for (k = N - 1; k; --k) {
state[i] = (state[i].xor((state[i-1].xor(state[i - 1].rshiftl(30))).mul(lt1566083941))).sub(
new Int32(i)); /* non linear */
//state[i] &= 0xffffffff; /* for WORDSIZE > 32 machines */
i++;
if (i >= N) {
state[0] = state[N - 1];
i = 1;
}
}
state[0] = new Int32(0x80000000); /* MSB is 1; assuring non-zero initial array */
left = 1; initf = 1;
};
var next_state = function () {
var p = 0;
var j;
/* if init_genrand() has not been called, */
/* a default initial seed is used */
if (initf == 0) {
_init_genrand(5489);
}
left = N;
next = 0;
for (j = N - M + 1; --j; ++p) {
state[p] = state[p + M].xor(TWIST(state[p], state[p + 1]));
}
for (j = M; --j; ++p) {
state[p] = state[p + M - N].xor(TWIST(state[p], state[p + 1]));
}
state[p] = state[p + M - N].xor(TWIST(state[p], state[0]));
};
var lt0x9d2c5680 = new Int32(0x9d2c5680);
var lt0xefc60000 = new Int32(0xefc60000);
/* generates a random number on [0,0xffffffff]-interval */
var _genrand_int32 = function () {
var y;
if (--left == 0) {
next_state();
}
y = state[next];
++next;
/* Tempering */
y = y.xor(y.rshiftl(11));
y = y.xor((y.lshift(7)).and(lt0x9d2c5680));
y = y.xor((y.lshift(15)).and(lt0xefc60000));
y = y.xor(y.rshiftl(18));
return y.getValue();
};
this.genrand_int32 = _genrand_int32;
/* generates a random number on [0,0x7fffffff]-interval */
this.genrand_int31 = function () {
var y;
if (--left == 0) {
next_state();
}
y = state[next];
++next;
/* Tempering */
y = y.xor(y.rshiftl(11));
y = y.xor((y.lshift(7)).and(lt0x9d2c5680));
y = y.xor((y.lshift(15)).and(lt0xefc60000));
y = y.xor(y.rshiftl(18));
return (y.rshiftl(1)).getValue();
};
/* generates a random number on [0,1]-real-interval */
this.genrand_real1 = function () {
var y;
if (--left == 0) {
next_state();
}
y = state[next];
++next;
/* Tempering */
y = y.xor(y.rshiftl(11));
y = y.xor((y.lshift(7)).and(lt0x9d2c5680));
y = y.xor((y.lshift(15)).and(lt0xefc60000));
y = y.xor(y.rshiftl(18));
return y.getValue() * (1.0/4294967295.0);
/* divided by 2^32-1 */
};
/* generates a random number on [0,1)-real-interval */
this.genrand_real2 = function () {
var y;
if (--left == 0) {
next_state();
}
y = state[next];
++next;
/* Tempering */
y = y.xor(y.rshiftl(11));
y = y.xor((y.lshift(7)).and(lt0x9d2c5680));
y = y.xor((y.lshift(15)).and(lt0xefc60000));
y = y.xor(y.rshiftl(18));
return y.getValue() * (1.0 / 4294967296.0);
/* divided by 2^32 */
};
/* generates a random number on (0,1)-real-interval */
this.genrand_real3 = function () {
var y;
if (--left == 0) {
next_state();
}
y = state[next];
++next;
/* Tempering */
y = y.xor(y.rshiftl(11));
y = y.xor((y.lshift(7)).and(lt0x9d2c5680));
y = y.xor((y.lshift(15)).and(lt0xefc60000));
y = y.xor(y.rshiftl(18));
return (y.getValue() + 0.5) * (1.0 / 4294967296.0);
/* divided by 2^32 */
};
/* generates a random number on [0,1) with 53-bit resolution*/
this.genrand_res53 = function () {
var a = ((new Int32(_genrand_int32())).rshiftl(5)).getValue();
var b = ((new Int32(_genrand_int32())).rshiftl(6)).getValue();
return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0);
};
/* These real versions are due to Isaku Wada, 2002/01/09 added */
}
The corresponding Python implementation is here:
class mersenne_rng(object):
    def __init__(self, seed = 5489):
        self.state = [0]*624
        self.f = 1812433253
        self.m = 397
        self.u = 11
        self.s = 7
        self.b = 0x9D2C5680
        self.t = 15
        self.c = 0xEFC60000
        self.l = 18
        self.index = 624
        self.lower_mask = (1<<31)-1
        self.upper_mask = 1<<31
        # update state
        self.state[0] = seed
        for i in range(1,624):
            self.state[i] = self.int_32(self.f*(self.state[i-1]^(self.state[i-1]>>30)) + i)
    def twist(self):
        for i in range(624):
            temp = self.int_32((self.state[i]&self.upper_mask)+(self.state[(i+1)%624]&self.lower_mask))
            temp_shift = temp>>1
            if temp%2 != 0:
                temp_shift = temp_shift^0x9908b0df
            self.state[i] = self.state[(i+self.m)%624]^temp_shift
        self.index = 0
    def get_random_number(self):
        if self.index >= 624:
            self.twist()
        y = self.state[self.index]
        y = y^(y>>self.u)
        y = y^((y<<self.s)&self.b)
        y = y^((y<<self.t)&self.c)
        y = y^(y>>self.l)
        self.index+=1
        return self.int_32(y)
    def int_32(self, number):
        return int(0xFFFFFFFF & number)
if __name__ == "__main__":
    rng = mersenne_rng(1131464071)
    for i in range(10):
        print(rng.get_random_number())
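For comparison with the two listings above, here is a minimal sketch of the same generator (MT19937) written with JavaScript's built-in 32-bit integer operators rather than an Int32 wrapper class. The function name createMt19937 is made up for this example, and the sketch assumes an ES2015+ engine (Math.imul and Uint32Array); it is not the code from either listing.

```javascript
// Hypothetical helper: returns a function producing unsigned 32-bit MT19937 outputs.
function createMt19937(seed) {
  var N = 624, M = 397;
  var state = new Uint32Array(N); // stores wrap to 32 bits automatically
  var index = N;                  // forces a twist before the first output

  state[0] = seed >>> 0;
  for (var i = 1; i < N; i++) {
    var prev = state[i - 1] ^ (state[i - 1] >>> 30);
    // Math.imul does an exact 32-bit multiply; a plain * would lose low bits
    state[i] = (Math.imul(1812433253, prev) + i) >>> 0;
  }

  function twist() {
    for (var i = 0; i < N; i++) {
      // upper bit of state[i] combined with lower 31 bits of the next word
      var y = (state[i] & 0x80000000) | (state[(i + 1) % N] & 0x7fffffff);
      var v = y >>> 1;
      if (y & 1) v ^= 0x9908b0df;
      state[i] = (state[(i + M) % N] ^ v) >>> 0;
    }
    index = 0;
  }

  return function next() {
    if (index >= N) twist();
    var y = state[index++];
    // tempering, same constants as in the listings above
    y ^= y >>> 11;
    y ^= (y << 7) & 0x9d2c5680;
    y ^= (y << 15) & 0xefc60000;
    y ^= y >>> 18;
    return y >>> 0; // convert the signed result back to unsigned
  };
}

var nextRandom = createMt19937(5489);
console.log(nextRandom()); // 3499211612, the well-known first output for the default seed
```

The final `>>> 0` matters: JavaScript's bitwise operators return signed 32-bit values, so without it roughly half of the outputs would be printed as negative numbers.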

Javascript - String matching wrong output

I have coded the Boyer-Moore-Horspool string matching algorithm in Node.js. The program runs, but it always outputs -1, which is what it should output only if the pattern string is not in the specified text.
I am unable to figure out for the life of me what isn't working, and I would be most appreciative of a hint for what I need to fix.
My code
var horsPool = function(sText,sPattern)
{
var m = sPattern.length;
var n = sText.length;
var i = m - 1;
while(i<=n-1)
{
var k = 0;
while ((k <= m) && (sPattern[m - 1 - k]) == sText[i - k])
{
k++;
}
if(k==m)
{
return (i - m + 1);
}
else
{
i += t[sText[i]];
}
}
return -1;
}
var shiftTable = function (sPat)
{
var i;
var j;
var m;
m = sPat.length;
for(i=0; i < MAX; i++)
{
t[i] = m;
}
for (j = 0; j<m-2; j++)
{
t[sPat[j]] = m-1 -j;
}
}
var program = function()
{
var text = 'lklkababcabab';
var pattern = 'ka';
shiftTable(pattern);
var pos = horsPool(text,pattern);
if(pos >= 0)
console.log('Pattern found in %d',pos);
else
console.log('Pattern not found');
}
var MAX = new Array(256);
var t = [MAX];
program();
Any help would be greatly appreciated. Thank You!
Let's start from the bottom:
var MAX = new Array(256);
var t = [MAX];
does not work at all. The first line creates an array with 256 empty entries; the second line creates an array with one element: the array built in the line above. That's not what you wanted to do, I presume. So
var MAX = 256;
var t = new Array(MAX);
does what you want.
The lines with t[sPat[j]] and t[sText[i]] will not work as expected, because sText[i] and sPat[j] return a character instead of a number. You might give t[sPat.charCodeAt(j)] and t[sText.charCodeAt(i)] a try.
To give you a start without helping too much, here is a straightforward implementation of the algorithm given on Wikipedia:
var horsPool = function (haystack, needle)
{
var nl = needle.length;
var hl = haystack.length;
var skip = 0;
while (hl - skip >= nl)
{
var i = nl - 1;
while (haystack[skip + i] == needle[i])
{
if (i == 0) {
return skip;
}
i--;
}
skip = skip + t[haystack.charCodeAt(skip + nl - 1)];
}
return - 1;
}
var shiftTable = function (pattern)
{
for (var i = 0; i < MAX; i++) {
t[i] = pattern.length;
}
for (var i = 0; i < pattern.length - 1; i++) {
t[pattern.charCodeAt(i)] = pattern.length - 1 - i;
}
}
var program = function ()
{
var text = 'lklkababcabab';
var pattern = 'kab';
shiftTable(pattern);
var pos = horsPool(text, pattern);
if (pos >= 0)
console.log('Pattern found in %d', pos);
else
console.log('Pattern not found');
}
var MAX = 256;
var t = new Array(256);
program();
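One caveat with a fixed 256-entry table: charCodeAt can return values up to 65535, so for a character outside that range t[...] is undefined, the skip becomes NaN, and the search ends with -1. The following is a minimal sketch of one way around that, assuming a Map keyed by character code is acceptable instead of an array; the name horspoolSearch is made up for this example.

```javascript
// Hypothetical variant: Horspool search with a Map-based shift table,
// so any UTF-16 code unit can be looked up safely.
function horspoolSearch(haystack, needle) {
  var n = haystack.length;
  var m = needle.length;
  if (m === 0) return 0; // an empty pattern matches at position 0

  // Shift for every character of the pattern except the last one.
  var shift = new Map();
  for (var i = 0; i < m - 1; i++) {
    shift.set(needle.charCodeAt(i), m - 1 - i);
  }

  var skip = 0;
  while (n - skip >= m) {
    // Compare the current window right to left.
    var j = m - 1;
    while (j >= 0 && haystack.charCodeAt(skip + j) === needle.charCodeAt(j)) {
      j--;
    }
    if (j < 0) return skip; // whole pattern matched

    // Characters that are not in the pattern shift by the full pattern length.
    var code = haystack.charCodeAt(skip + m - 1);
    skip += shift.has(code) ? shift.get(code) : m;
  }
  return -1;
}

console.log(horspoolSearch('lklkababcabab', 'kab')); // 3
console.log(horspoolSearch('caf\u00e9 au lait', 'lait')); // 8
```

Because the table is built inside the function, there is also no global t or MAX that has to be initialised before the first call.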

Create Javascript array with Javascript for loop

I have a series of information that I am looking to cut down to size by looping the information. Here is the original code that is working:
$('#M1s1').css({'visibility': M1s1v});
$('#M1s2').css({'visibility': M1s2v});
$('#M1s3').css({'visibility': M1s3v});
$('#M1s4').css({'visibility': M1s4v});
$('#M1s5').css({'visibility': M1s5v});
$('#M1s6').css({'visibility': M1s6v});
$('#M1s7').css({'visibility': M1s7v});
$('#M2s1').css({'visibility': M2s1v});
$('#M2s2').css({'visibility': M2s2v});
$('#M2s3').css({'visibility': M2s3v});
$('#M2s4').css({'visibility': M2s4v});
$('#M2s5').css({'visibility': M2s5v});
$('#M2s6').css({'visibility': M2s6v});
$('#M2s7').css({'visibility': M2s7v});
$('#M3s1').css({'visibility': M3s1v});
$('#M3s2').css({'visibility': M3s2v});
$('#M3s3').css({'visibility': M3s3v});
$('#M3s4').css({'visibility': M3s4v});
$('#M3s5').css({'visibility': M3s5v});
$('#M3s6').css({'visibility': M3s6v});
$('#M3s7').css({'visibility': M3s7v});
$('#M4s1').css({'visibility': M4s1v});
$('#M4s2').css({'visibility': M4s2v});
$('#M4s3').css({'visibility': M4s3v});
$('#M4s4').css({'visibility': M4s4v});
$('#M4s5').css({'visibility': M4s5v});
$('#M4s6').css({'visibility': M4s6v});
$('#M4s7').css({'visibility': M4s7v});
$('#M5s1').css({'visibility': M5s1v});
$('#M5s2').css({'visibility': M5s2v});
$('#M5s3').css({'visibility': M5s3v});
$('#M5s4').css({'visibility': M5s4v});
$('#M5s5').css({'visibility': M5s5v});
$('#M5s6').css({'visibility': M5s6v});
$('#M5s7').css({'visibility': M5s7v});
And here are the for loops that I created to try to cut down the length of the code and the possibility of typing errors:
// set smc array(#M1s1, #M1s2, #M1s3, etc.)
var smc = [];
for (m = 1; m < 6; m++) {
for (s = 1; s < 8; s++) {
var smc[] = '#M' + m + 's' + s;
}
}
// set smcv array(#M1s1v, #M1s2v, #M1s3v, etc.)
var smcv = [];
for (mv = 1; mv < 6; mv++) {
for (sv = 1; sv < 8; sv++) {
var smcv[] = '#M' + mv + 's' + sv + 'v';
}
}
// loop to set visibility of small circles
for (i = 0; i < 35; i++) {
$(smc[i]).css({'visibility': smcv[i]});
}
I am really new to javascript loops and feel like I may be overlooking something basic or even a syntax error of some kind but can't put a finger on what the problem is. Any direction or assistance would be greatly appreciated!
UPDATE
Here is the final solution to my problem:
//set smc array(#M1s1, #M1s2, #M1s3, etc.)
var smc = [];
for (m = 1; m < 6; m++) {
for (s = 1; s < 8; s++) {
smc.push('#M' + m + 's' + s);
}
}
//set smcv array(M1s1v, M1s2v, M1s3v, etc.)
var smcv = [];
for (mv = 1; mv < 6; mv++) {
for (sv = 1; sv < 8; sv++) {
smcv.push('M' + mv + 's' + sv + 'v');
}
}
//loop to set visibility of small circles
for (i = 0; i < 35; i++) {
$(smc[i]).css({'visibility': window[smcv[i]]});
}
You can't append a value to an array with var smc[] = 'something'; that is a syntax error in JavaScript.
Use smc.push('something') instead.
Let's say the M1s1v, M1s2v, ... values come from an object, something like this:
var x = {
M1s1v : "hidden",
M1s2v : "visible",
...
}
then you can cut-short the code to something like this:
for (m = 1; m < 6; m++) {
for (s = 1; s < 8; s++) {
$('#M' + m + 's' + s).css({'visibility':x['M'+m+'s'+s+'v']});
}
}
Hope it helps.
Say you have a two-dimensional array, 5 x 7 for M and s, holding something that will evaluate to true/false (boolean, 0, 1, empty string...).
var data = []; // to be filled with 5 rows (M) of 7 values (s) each
...
for (var M=0; M < data.length; M++) {
for (var s=0; s < data[M].length; s++) {
$('#M' + (M+1) + 's' + (s+1)).css({'visibility': data[M][s] ? 'visible' : 'hidden'});
}
}
You could "optimize" by using hard-coded numbers instead of the lengths if you were certain of the dimensions.
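As a rough, jQuery-free sketch of the two-dimensional idea, the loop can first be written so that it only computes the selector/visibility pairs; applying them is then a single call per pair. The helper name visibilityPairs and the sample data are invented for this example.

```javascript
// Hypothetical helper: turn a 2-D array of truthy/falsy flags into
// { selector, visibility } pairs such as { selector: '#M1s1', visibility: 'visible' }.
function visibilityPairs(data) {
  var pairs = [];
  for (var m = 0; m < data.length; m++) {
    for (var s = 0; s < data[m].length; s++) {
      pairs.push({
        selector: '#M' + (m + 1) + 's' + (s + 1),
        visibility: data[m][s] ? 'visible' : 'hidden'
      });
    }
  }
  return pairs;
}

// Sample data (assumption): 5 rows of 7 flags, only the first circle in each row visible.
var data = [];
for (var row = 0; row < 5; row++) {
  data.push([true, false, false, false, false, false, false]);
}

var pairs = visibilityPairs(data);
console.log(pairs.length);      // 35
console.log(pairs[0].selector); // #M1s1

// With jQuery loaded, the pairs could then be applied like this:
// pairs.forEach(function (p) { $(p.selector).css({ visibility: p.visibility }); });
```

Keeping the flags in one array also removes the need for 35 separate global variables and the window[...] lookup.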

LogLog and HyperLogLog algorithms for counting of large cardinalities

Where can I find a valid implementation of the LogLog algorithm? I have tried to implement it myself, but my draft implementation yields strange results.
Here it is:
function LogLog(max_error, max_count)
{
function log2(x)
{
return Math.log(x) / Math.LN2;
}
var m = 1.30 / max_error;
var k = Math.ceil(log2(m * m));
m = Math.pow(2, k);
var k_comp = 32 - k;
var l = log2(log2(max_count / m));
if (isNaN(l)) l = 1; else l = Math.ceil(l);
var l_mask = ((1 << l) - 1) >>> 0;
var M = [];
for (var i = 0; i < m; ++i) M[i] = 0;
function count(hash)
{
if (hash !== undefined)
{
var j = hash >>> k_comp;
var rank = 0;
for (var i = 0; i < k_comp; ++i)
{
if ((hash >>> i) & 1)
{
rank = i + 1;
break;
}
}
M[j] = Math.max(M[j], rank & l_mask);
}
else
{
var c = 0;
for (var i = 0; i < m; ++i) c += M[i];
return 0.79402 * m * Math.pow(2, c / m);
}
}
return {count: count};
}
function fnv1a(text)
{
var hash = 2166136261;
for (var i = 0; i < text.length; ++i)
{
hash ^= text.charCodeAt(i);
hash += (hash << 1) + (hash << 4) + (hash << 7) +
(hash << 8) + (hash << 24);
}
return hash >>> 0;
}
var words = ['aardvark', 'abyssinian', ... ,'zoology']; // about 2 300 words
var log_log = LogLog(0.01, 100000);
for (var i = 0; i < words.length; ++i) log_log.count(fnv1a(words[i]));
alert(log_log.count());
For some unknown reason, the implementation is very sensitive to the max_error parameter; it is the main factor that determines the magnitude of the result. I'm sure there is some stupid mistake :)
UPDATE: This problem is solved in the newer version of the algorithm. I will post its implementation later.
Here is the updated version of the algorithm, based on the newer paper:
var pow_2_32 = 0xFFFFFFFF + 1;
function HyperLogLog(std_error)
{
function log2(x)
{
return Math.log(x) / Math.LN2;
}
function rank(hash, max)
{
var r = 1;
while ((hash & 1) == 0 && r <= max) { ++r; hash >>>= 1; }
return r;
}
var m = 1.04 / std_error;
var k = Math.ceil(log2(m * m)), k_comp = 32 - k;
m = Math.pow(2, k);
var alpha_m = m == 16 ? 0.673
: m == 32 ? 0.697
: m == 64 ? 0.709
: 0.7213 / (1 + 1.079 / m);
var M = []; for (var i = 0; i < m; ++i) M[i] = 0;
function count(hash)
{
if (hash !== undefined)
{
var j = hash >>> k_comp;
M[j] = Math.max(M[j], rank(hash, k_comp));
}
else
{
var c = 0.0;
for (var i = 0; i < m; ++i) c += 1 / Math.pow(2, M[i]);
var E = alpha_m * m * m / c;
// -- make corrections
if (E <= 5/2 * m)
{
var V = 0;
for (var i = 0; i < m; ++i) if (M[i] == 0) ++V;
if (V > 0) E = m * Math.log(m / V);
}
else if (E > 1/30 * pow_2_32)
E = -pow_2_32 * Math.log(1 - E / pow_2_32);
// --
return E;
}
}
return {count: count};
}
function fnv1a(text)
{
var hash = 2166136261;
for (var i = 0; i < text.length; ++i)
{
hash ^= text.charCodeAt(i);
hash += (hash << 1) + (hash << 4) + (hash << 7) +
(hash << 8) + (hash << 24);
}
return hash >>> 0;
}
var words = ['aardvark', 'abyssinian', ..., 'zoology']; // 2336 words
var seed = Math.floor(Math.random() * pow_2_32); // make more fun
var log_log = HyperLogLog(0.065);
for (var i = 0; i < words.length; ++i) log_log.count(fnv1a(words[i]) ^ seed);
var count = log_log.count();
alert(count + ', error ' +
(count - words.length) / (words.length / 100.0) + '%');
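The way std_error determines the amount of memory in the code above can be isolated in a few lines. The sketch below repeats the same arithmetic (m = 1.04 / std_error, squared and rounded up to a power of two); the function name registerCount and the returned field names are assumptions made for this example.

```javascript
// Hypothetical helper mirroring the sizing logic of HyperLogLog(std_error) above.
function registerCount(stdError) {
  var m = 1.04 / stdError;                       // sqrt of the required register count
  var k = Math.ceil(Math.log(m * m) / Math.LN2); // index bits, rounded up
  var registers = Math.pow(2, k);
  return {
    indexBits: k,                                // top k bits of the hash select a register
    registers: registers,                        // length of the array M
    actualError: 1.04 / Math.sqrt(registers)     // error after rounding up to a power of two
  };
}

console.log(registerCount(0.01).registers); // 16384
console.log(registerCount(0.05).registers); // 512
```

Since the register count is rounded up to a power of two, the error actually achieved is at most the requested one; halving the error roughly quadruples the number of registers.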
Here is a slightly modified version which adds a merge operation.
Merge lets you combine the registers from several HyperLogLog instances
and estimate the number of unique items across all of them.
For example, if you have unique visitors collected on Monday, Tuesday and Wednesday,
then you can merge the buckets together and count the number of unique visitors
over the three-day span:
var pow_2_32 = 0xFFFFFFFF + 1;
function HyperLogLog(std_error)
{
function log2(x)
{
return Math.log(x) / Math.LN2;
}
function rank(hash, max)
{
var r = 1;
while ((hash & 1) == 0 && r <= max) { ++r; hash >>>= 1; }
return r;
}
var m = 1.04 / std_error;
var k = Math.ceil(log2(m * m)), k_comp = 32 - k;
m = Math.pow(2, k);
var alpha_m = m == 16 ? 0.673
: m == 32 ? 0.697
: m == 64 ? 0.709
: 0.7213 / (1 + 1.079 / m);
var M = []; for (var i = 0; i < m; ++i) M[i] = 0;
function merge(other)
{
for (var i = 0; i < m; i++)
M[i] = Math.max(M[i], other.buckets[i]);
}
function count(hash)
{
if (hash !== undefined)
{
var j = hash >>> k_comp;
M[j] = Math.max(M[j], rank(hash, k_comp));
}
else
{
var c = 0.0;
for (var i = 0; i < m; ++i) c += 1 / Math.pow(2, M[i]);
var E = alpha_m * m * m / c;
// -- make corrections
if (E <= 5/2 * m)
{
var V = 0;
for (var i = 0; i < m; ++i) if (M[i] == 0) ++V;
if (V > 0) E = m * Math.log(m / V);
}
else if (E > 1/30 * pow_2_32)
E = -pow_2_32 * Math.log(1 - E / pow_2_32);
// --
return E;
}
}
return {count: count, merge: merge, buckets: M};
}
function fnv1a(text)
{
var hash = 2166136261;
for (var i = 0; i < text.length; ++i)
{
hash ^= text.charCodeAt(i);
hash += (hash << 1) + (hash << 4) + (hash << 7) +
(hash << 8) + (hash << 24);
}
return hash >>> 0;
}
Then you can do something like this:
// initialize one counter per day
var ll_monday = HyperLogLog(0.01);
var ll_tuesday = HyperLogLog(0.01);
var ll_wednesday = HyperLogLog(0.01);
// add 5000 unique values in each day
for(var i=0; i<5000; i++) ll_monday.count(fnv1a('' + Math.random()));
for(var i=0; i<5000; i++) ll_tuesday.count(fnv1a('' + Math.random()));
for(var i=0; i<5000; i++) ll_wednesday.count(fnv1a('' + Math.random()));
// add 5000 values which appear every day
for(var i=0; i<5000; i++) {ll_monday.count(fnv1a(''+i)); ll_tuesday.count(fnv1a('' + i)); ll_wednesday.count(fnv1a('' + i));}
// merge three days together
var together = HyperLogLog(0.01);
together.merge(ll_monday);
together.merge(ll_tuesday);
together.merge(ll_wednesday);
// report
console.log('unique per day: ' + Math.round(ll_monday.count()) + ' ' + Math.round(ll_tuesday.count()) + ' ' + Math.round(ll_wednesday.count()));
console.log('unique numbers overall: ' + Math.round(together.count()));
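Merging only makes sense when both counters were created with the same std_error, because the register arrays must have the same length; otherwise Math.max(M[i], undefined) yields NaN and the estimate is lost. A minimal sketch of a stand-alone merge with that check follows; mergeRegisters is a made-up name, and it works on plain arrays such as the buckets property exposed above.

```javascript
// Hypothetical helper: element-wise maximum of two register arrays,
// refusing arrays of different length instead of silently producing NaN.
function mergeRegisters(a, b) {
  if (a.length !== b.length) {
    throw new Error('register arrays must have the same length');
  }
  var merged = new Array(a.length);
  for (var i = 0; i < a.length; i++) {
    merged[i] = Math.max(a[i], b[i]);
  }
  return merged;
}

console.log(mergeRegisters([0, 3, 1], [2, 1, 1])); // [ 2, 3, 1 ]
```

Taking the maximum is commutative and idempotent, so the days can be merged in any order, and merging the same day twice does not change the result.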
We've open sourced a project called Stream-Lib that has a LogLog implementation. The work was based on this paper.
Using the JS version @actual provided, I tried to implement the same in C#, which seems close enough. I just changed the fnv1a function a little and renamed it to getHashCode. (Credit goes to the Jenkins hash function, http://en.wikipedia.org/wiki/Jenkins_hash_function)
public class HyperLogLog
{
private double mapSize, alpha_m, k;
private int kComplement;
private Dictionary<int, int> Lookup = new Dictionary<int, int>();
private const double pow_2_32 = 4294967296; // 2^32
public HyperLogLog(double stdError)
{
mapSize = (double)1.04 / stdError;
k = (long)Math.Ceiling(log2(mapSize * mapSize));
kComplement = 32 - (int)k;
mapSize = (long)Math.Pow(2, k);
alpha_m = mapSize == 16 ? (double)0.673
: mapSize == 32 ? (double)0.697
: mapSize == 64 ? (double)0.709
: (double)0.7213 / (double)(1 + 1.079 / mapSize);
for (int i = 0; i < mapSize; i++)
Lookup[i] = 0;
}
private static double log2(double x)
{
return Math.Log(x) / 0.69314718055994530941723212145818;//Ln2
}
private static int getRank(uint hash, int max)
{
int r = 1;
uint one = 1;
while ((hash & one) == 0 && r <= max)
{
++r;
hash >>= 1;
}
return r;
}
public static uint getHashCode(string text)
{
uint hash = 0;
for (int i = 0, l = text.Length; i < l; i++)
{
hash += (uint)text[i];
hash += hash << 10;
hash ^= hash >> 6;
}
hash += hash << 3;
hash ^= hash >> 6;
hash += hash << 16;
return hash;
}
public int Count()
{
double c = 0, E;
for (var i = 0; i < mapSize; i++)
c += 1d / Math.Pow(2, (double)Lookup[i]);
E = alpha_m * mapSize * mapSize / c;
// Make corrections & smoothen things.
if (E <= 2.5 * mapSize) // (5 / 2) is integer division in C# and evaluates to 2
{
double V = 0;
for (var i = 0; i < mapSize; i++)
if (Lookup[i] == 0) V++;
if (V > 0)
E = mapSize * Math.Log(mapSize / V);
}
else
if (E > pow_2_32 / 30.0) // (1 / 30) is integer division in C# and evaluates to 0
E = -pow_2_32 * Math.Log(1 - E / pow_2_32);
// Made corrections & smoothen things, or not.
return (int)E;
}
public void Add(object val)
{
uint hashCode = getHashCode(val.ToString());
int j = (int)(hashCode >> kComplement);
Lookup[j] = Math.Max(Lookup[j], getRank(hashCode, kComplement));
}
}
I know this is an old post, but the @buryat implementation has moved, is in any case incomplete, and is a bit on the slow side (sorry o_o ).
I've taken the implementation used by the new Redis release which can be found here and ported it to PHP. The repo is here https://github.com/joegreen0991/HyperLogLog
<?php
class HyperLogLog {
private $HLL_P_MASK;
private $HLL_REGISTERS;
private $ALPHA;
private $registers;
public function __construct($HLL_P = 14)
{
$this->HLL_REGISTERS = (1 << $HLL_P); /* With P=14, 16384 registers. */
$this->HLL_P_MASK = ($this->HLL_REGISTERS - 1); /* Mask to index register. */
$this->ALPHA = 0.7213 / (1 + 1.079 / $this->HLL_REGISTERS);
$this->registers = new SplFixedArray($this->HLL_REGISTERS);
for ($i = 0; $i < $this->HLL_REGISTERS; $i++) {
$this->registers[$i] = 0;
}
}
public function add($v)
{
$h = crc32(md5($v));
$h |= 1 << 63; /* Make sure the loop terminates. */
$bit = $this->HLL_REGISTERS; /* First bit not used to address the register. */
$count = 1; /* Initialized to 1 since we count the "00000...1" pattern. */
while(($h & $bit) == 0) {
$count++;
$bit <<= 1;
}
/* Update the register if this element produced a longer run of zeroes. */
$index = $h & $this->HLL_P_MASK; /* Index a register inside registers. */
if ($this->registers[$index] < $count) {
$this->registers[$index] = $count;
}
}
public function export()
{
$str = '';
for ($i = 0; $i < $this->HLL_REGISTERS; $i++) {
$str .= chr($this->registers[$i]);
}
return $str;
}
public function import($str)
{
for ($i = 0; $i < $this->HLL_REGISTERS; $i++) {
$this->registers[$i] = isset($str[$i]) ? ord($str[$i]) : 0;
}
}
public function merge($str)
{
for ($i = 0; $i < $this->HLL_REGISTERS; $i++) {
if(isset($str[$i]))
{
$ord = ord($str[$i]);
if ($this->registers[$i] < $ord) {
$this->registers[$i] = $ord;
}
}
}
}
/**
* @static
* @param $arr
* @return int Number of unique items in $arr
*/
public function count() {
$E = 0;
$ez = 0;
for ($i = 0; $i < $this->HLL_REGISTERS; $i++) {
if ($this->registers[$i] !== 0) {
$E += (1.0 / pow(2, $this->registers[$i]));
} else {
$ez++;
$E += 1.0;
}
}
$E = (1 / $E) * $this->ALPHA * $this->HLL_REGISTERS * $this->HLL_REGISTERS;
/* Use the LINEARCOUNTING algorithm for small cardinalities.
* For larger values but up to 72000 HyperLogLog raw approximation is
* used since linear counting error starts to increase. However HyperLogLog
* shows a strong bias in the range 2.5*16384 - 72000, so we try to
* compensate for it. */
if ($E < $this->HLL_REGISTERS * 2.5 && $ez != 0) {
$E = $this->HLL_REGISTERS * log($this->HLL_REGISTERS / $ez);
}
else if ($this->HLL_REGISTERS == 16384 && $E < 72000) {
// We did polynomial regression of the bias for this range, this
// way we can compute the bias for a given cardinality and correct
// according to it. Only apply the correction for P=14 that's what
// we use and the value the correction was verified with.
$bias = 5.9119 * 1.0e-18 * ($E*$E*$E*$E)
-1.4253 * 1.0e-12 * ($E*$E*$E)+
1.2940 * 1.0e-7 * ($E*$E)
-5.2921 * 1.0e-3 * $E+
83.3216;
$E -= $E * ($bias/100);
}
return floor($E);
}
}
I implemented LogLog and HyperLogLog in JS and PHP, with well-commented code: https://github.com/buryat/loglog
