Weird comparison performance on Javascript object key lookup - javascript

Presentation:
I am working on a piece of code that compares two JavaScript objects by looping over the first one (called A) and performing a key lookup in the second one (called B), where I store each value as a key and its number of occurrences as the value.
But when I measure the performance of the key lookups for the subkeys of Object A (10 runs per data size, with the data size changing between batches of 10 runs: 100 per row, 200, ...), I get the highest timings for the smallest amount of data (so potentially fewer keys in dict B).
Objects layout:
Object A looks like below:
{
SQL_TABLE_1:
{
column_1:
[
'2jmj7l5rSfeb/vlWAYkK/YBwIDk=',
'3MaRDFGBKvsLLhrLUdplz3wUiOI=',
'PEvUFHDR4HbOYXcj7danOvyRJqs=',
'XHvERAKZ4AqU+iWlx2scZXdem80=',
'nSG0lvwlkIe5YxZGTo5binr3pAw=',
'Swuc/7YCU9Ptfrff+KHaJJ1+b7U=',
'N28qqdfezfZbPmK7CaGmj7m7IOQ=',
'ffc7skeffegT1ZytAqjco3EpwUE=',
'2XldayBefzBxsUuu6tMFYHVZEEY=',
'5rC2y6IzadQ1aEy7CvNyr30JJ2k='
]
},
SQL_TABLE_2:
{
column_1:[......]
}
}
The fields of Object B have various sizes, but these sizes never change in our tests.
Object B looks like:
{
field_1:
{
'2jmj7l5rSfeb/vlWAYkK/YBwIDk=': 1,
'3MaRDFGBKvsLLhrLUdplz3wUiOI=': 1,
'PEvUFHDR4HbOYXcj7danOvyRJqs=': 1,
'XHvERAKZ4AqU+iWlx2scZXdem80=': 4,
'nSG0lvwlkIe5YxZGTo5binr3pAw=': 1,
'Swuc/7YCU9Ptfrff+KHaJJ1+b7U=': 1,
'N28qqdfezfZbPmK7CaGmj7m7IOQ=': 27,
'ffc7skeffegT1ZytAqjco3EpwUE=': 1,
'2XldayBefzBxsUuu6tMFYHVZEEY=': 18,
'5rC2y6IzadQ1aEy7CvNyr30JJ2k=': 1 },
field_2:{......}
}
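For illustration, a lookup table shaped like one field of Object B (value as key, occurrence count as value) could be built as in the sketch below. The function name countOccurrences and the short sample strings are assumptions made for this example, not part of the original program.

```javascript
// Build a "value -> number of occurrences" table from a list of values.
// Object.create(null) avoids collisions with inherited keys such as "constructor".
function countOccurrences(values) {
  const counts = Object.create(null);
  for (const v of values) {
    counts[v] = (counts[v] || 0) + 1;
  }
  return counts;
}

// Hypothetical sample values standing in for the hashes above.
const field_1 = countOccurrences(["a=", "b=", "a=", "a="]);

console.log(field_1["a="]); // number of times "a=" occurs
console.log(field_1["zz"]); // undefined means the value was not found
```

A missing key yields undefined, which is what the lookup code in the question relies on to detect values that were not found.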
Timing measurement in the code is structured like this:
sql_field_1:
{
mongo_field_1: 0.003269665241241455, mongo_field_2: 0.0015446391105651855, mongo_field_3: 0.0009834918975830079, mongo_field_4: 0.0004488091468811035,
},
sql_field_2:
{
....
}
Goal
The goal is to perform, for each sub-subkey of Object A, a key lookup in the subkeys of Object B.
Code
Here's the code that causes this behavior:
Object A is called sql_dict_array
Object B is called hash_mongo_dict
for(var field_name in hash_mongo_dict)
{
performance_by_mongo_field[field_name] = {};
result_by_mongo_field[field_name] = {};
// LOOP ON OBJECT A KEYS
for(var table_name in sql_dict_array)
{
// Start of time measurement
var start_time = performance.now();
// there is only one column in our test data
for(var column_name in sql_dict_array[table_name])
{
found_count = 0;
for(var value of sql_dict_array[table_name][column_name])
{
// **SUBKEY LOOKUP HERE WITH VALUE**
var results = hash_mongo_dict[field_name][value];
// IF NOT UNDEFINED THEN IT HAS BEEN FOUND
// THIS CHECK IS NOT THE BOTTLENECK
if(results != undefined)
{
found_count+=results;
}
}
if(found_count > limit_parameter)
{
console.log("error: too many matches between hashes")
process.exit(0)
}
// PERFORMANCE CALCULATION
performance_by_mongo_field[field_name][table_name] = (performance.now() - start_time)/1000;
result_by_mongo_field[field_name][table_name+'.'+column_name] = (found_count/verif_hash_count*100);
}
}
return some results...
}
Testing:
With this code, I expect almost constant time whatever the size of Object B (number of subkeys), but in practice I get higher times when there are only 10 subkeys in the nested Object A, and the timing becomes stable once it reaches 100 keys or more (tested with 6000 keys).
Here are 10 runs of the key lookup code for one key of Object A containing 10 subkeys, against 300,000+ entries from Object B:
0.2824700818061829 0.2532380700111389 0.2455208191871643 0.2610406551361084 0.2840422649383545 0.2344329071044922 0.2375670108795166 0.23545906591415405 0.23111085414886476 0.2363566837310791
Here's the same comparison but with 4000+ subkeys:
0.0027927708625793456 0.0018292622566223144 0.015235211849212647 0.0036304402351379395 0.002919149875640869 0.004972007751464844 0.014655702114105225 0.003572652339935303 0.0032280778884887697 0.003232938766479492
I would appreciate any advice you can provide.

Related

Get "leaderboard" of list of numbers

I am trying to get a kind of "leaderboard" from a list of numbers. I was thinking of making an array with all the numbers like this
var array = [];
for (a = 0; a < Object.keys(wallets.data).length; a++) { //var wallets = a JSON (parsed) response code from an API.
if (wallets.data[a].balance.amount > 0) {
array.push(wallets.data[a].balance.amount)
}
}
//Add some magic code here that sorts the array into descending numbers
This is a great option, however I need some other values to come with the numbers (one string). That's why I figured JSON would be a better option than an array.
I just have no idea how I would implement this.
I would like to get a json like this:
[
[
"ETH":
{
"balance":315
}
],
[
"BTC":
{
"balance":654
}
],
[
"LTC":
{
"balance":20
}
]
]
And then afterwards being able to call them sorted descending by balance something like this:
var jsonarray[0].balance = Highest number (654)
var jsonarray[1].balance = Second highest number (315)
var jsonarray[2].balance = Third highest number (20)
If any of you could help me out or point me in the right direction I would appreciate it greatly.
PS: I need this to happen in RAW JS without any html or libraries.
You should sort the objects before serializing them to JSON. You can write your own comparison function or use a lambda. See https://stackoverflow.com/questions/1129216/sort-array-of-objects-by-string-property-value
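As a minimal sketch of that suggestion, the snippet below builds an array of objects and sorts it in descending order of balance with a comparison lambda. The sample wallets data is hypothetical; only the balance.amount and balance.currency property names are taken from the code in this thread.

```javascript
// Hypothetical stand-in for the parsed API response.
const wallets = {
  data: [
    { balance: { currency: "ETH", amount: 315 } },
    { balance: { currency: "BTC", amount: 654 } },
    { balance: { currency: "LTC", amount: 20 } },
    { balance: { currency: "XRP", amount: 0 } },
  ],
};

const leaderboard = wallets.data
  .filter((w) => w.balance.amount > 0) // skip empty wallets
  .map((w) => ({ currency: w.balance.currency, balance: w.balance.amount }))
  .sort((a, b) => b.balance - a.balance); // descending by balance

console.log(leaderboard[0].currency, leaderboard[0].balance);
console.log(JSON.stringify(leaderboard));
```

Each entry keeps its currency string next to the number, so leaderboard[0].balance is the highest balance and leaderboard[0].currency tells which coin it belongs to.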
Since you are dealing with cryptocurrency you can use the currency-code as a unique identifier.
Instead of an array, you can define an object with the currency as properties like this:
const coins = {
ETH: [300, 200, 500],
BTC: [20000, 15000, 17000]
}
Then you can access each one and use Math.max or Math.min to grab the highest / lowest value of that array. Note that these functions take separate numeric arguments, so the array has to be spread, e.g. Math.max(...coins.BTC)
And if you need to iterate over the coins you have Object.keys:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/keys
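A short sketch of this approach, using the coins object from above; the highest object is a name introduced only for this example. Math.max expects individual numbers, so each array is spread into arguments (passing the array itself yields NaN).

```javascript
const coins = {
  ETH: [300, 200, 500],
  BTC: [20000, 15000, 17000],
};

// Collect the highest value per currency code.
const highest = {};
for (const code of Object.keys(coins)) {
  highest[code] = Math.max(...coins[code]);
}

console.log(highest);
console.log(Math.max(coins.BTC)); // NaN: the array is not spread
```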
Thank you all for your answers. I ended up using something like:
leaderboard = []
for (a = 0; a < Object.keys(wallets.data).length; a++) {
if (wallets.data[a].balance.amount > 0) {
leaderboard.push({"currency":wallets.data[a].balance.currency, "price":accprice}) //accprice = variable which contains the value of the userhold coins of the current coin in EUR
}
}
console.log(leaderboard.sort(sort_by('price', true, parseInt)));

Pass arbitrary Javascript data object to Node.js C++ addon

I have a Node.js addon written in C++ using Nan. Works fantastically. However, I've not been able to figure out how to have my Node Javascript code pass an arbitrary data object (ex. {attr1:42, attr2:'hi', attr3:[5,4,3,2,1]}) to the C++ addon.
Until now, I've got around this by calling JSON.stringify() on my data object and then parsing the stringified JSON on the C++ side.
Ideally, I'd like to avoid copying data and just get a reference to the data object that I can access, or at least to copy it natively and avoid stringifying/parsing...
Any help would be appreciated!
You can allow your Node.js C++ addons to take arbitrarily typed arguments, but you must check and handle the types explicitly. Here is a simple example function that shows how to do this:
void args(const Nan::FunctionCallbackInfo<v8::Value>& info) {
int i = 0;
while (i < info.Length()) {
if (info[i]->IsBoolean()) {
printf("boolean = %s", info[i]->BooleanValue() ? "true" : "false");
} else if (info[i]->IsInt32()) {
printf("int32 = %ld", info[i]->IntegerValue());
} else if (info[i]->IsNumber()) {
printf("number = %f", info[i]->NumberValue());
} else if (info[i]->IsString()) {
printf("string = %s", *v8::String::Utf8Value(info[i]->ToString()));
} else if (info[i]->IsObject()) {
printf("[object]");
v8::Local<v8::Object> obj = info[i]->ToObject();
v8::Local<v8::Array> props = obj->GetPropertyNames();
for (unsigned int j = 0; j < props->Length(); j++) {
printf("%s: %s",
*v8::String::Utf8Value(props->Get(j)->ToString()),
*v8::String::Utf8Value(obj->Get(props->Get(j))->ToString())
);
}
} else if (info[i]->IsUndefined()) {
printf("[undefined]");
} else if (info[i]->IsNull()) {
printf("[null]");
}
i += 1;
}
}
To actually solve the problem of handling arbitrary arguments that may contain objects with arbitrary data, I would recommend writing a function that parses an actual object similar to how I parsed function arguments in this example. Keep in mind that you may need to do this recursively if you want to be able to handle nested objects within the object.
You don't have to stringify your object to pass it to C++ addons. There are methods to accept these
arbitrary objects. But they are not truly arbitrary: you have to write different code to parse each kind of object in C++.
Think of it as the schema of a database. You cannot save data in different formats in a single collection/table.
You will need another table/collection with the specific schema.
Let's see this example:
We will pass an object {x: 10, y: 5} to the addon, and the C++ addon will return another object with the sum and product of the
properties, like this: {sum: 15, prod: 50}
In cpp code :
NAN_METHOD(func1) {
if (info.Length() > 0) {
Local<Object> obj = info[0]->ToObject();
Local<String> x = Nan::New<String>("x").ToLocalChecked();
Local<String> y = Nan::New<String>("y").ToLocalChecked();
Local<String> sum = Nan::New<String>("sum").ToLocalChecked();
Local<String> prod = Nan::New<String>("prod").ToLocalChecked();
Local<Object> ret = Nan::New<Object>();
double x1 = Nan::Get(obj, x).ToLocalChecked()->NumberValue();
double y1 = Nan::Get(obj, y).ToLocalChecked()->NumberValue();
Nan::Set(ret, sum, Nan::New<Number>(x1 + y1));
Nan::Set(ret, prod, Nan::New<Number>(x1 * y1));
info.GetReturnValue().Set(ret);
}
}
In javascript:
const addon = require('./build/Release/addon.node');
var obj = addon.func1({ 'x': 5, 'y': 10 });
console.log(obj); // { sum: 15, prod: 50 }
Here you can only send objects of the form {x: (Number), y: (Number)} to the addon. Otherwise it will not be able to parse or
retrieve the data.
Arrays are handled like this:
In cpp:
NAN_METHOD(func2) {
Local<Array> array = Local<Array>::Cast(info[0]);
Local<String> ss_prop = Nan::New<String>("sum_of_squares").ToLocalChecked();
Local<Array> squares = Nan::New<v8::Array>(array->Length());
double ss = 0;
for (unsigned int i = 0; i < array->Length(); i++ ) {
if (Nan::Has(array, i).FromJust()) {
// get data from a particular index
double value = Nan::Get(array, i).ToLocalChecked()->NumberValue();
// set a particular index - note the array parameter
// is mutable
Nan::Set(array, i, Nan::New<Number>(value + 1));
Nan::Set(squares, i, Nan::New<Number>(value * value));
ss += value*value;
}
}
// set a non index property on the returned array.
Nan::Set(squares, ss_prop, Nan::New<Number>(ss));
info.GetReturnValue().Set(squares);
}
In javascript:
const addon = require('./build/Release/addon.node');
var arr = [1, 2, 3];
console.log(addon.func2(arr)); //[ 1, 4, 9, sum_of_squares: 14 ]
In this way you can handle the different data types. If you want complex objects or operations, you just
have to combine these methods in one function and parse the data.

2 dimensional JavaScript Array generated by for loop is being overwritten by last loop result

I'm trying to make a pseudo-random sequence generator that works just like a Linear Feedback Shift Register.
I'm doing it in JavaScript because it's the only language that I know, and I'm using HTML to create the GUI.
The user should type in an initial value and get a schematic diagram and the pseudo-random sequence itself.
Here is my JavaScript code:
var UserInput = document.getElementById('ulaz');
var Output = document.getElementById('izlaz');
// variable `data` is an array of objects which I used to store pictures of circuits
// and taps necessary for shift registers to give max possible length output
// before going into loop which is 2^n-1, where n (`bit` in my code) is number of
// register blocks and number of digits in input value.
function pss (){
var data = [
{
slika:"pic/2bit.png",
tap:[0,1]
},
{
slika:"pic/3bit.png",
tap:[0,2]
},
{
slika:"pic/4bit.png",
tap:[0,3]
},
{
slika:"pic/5bit.png",
tap:[1,4]
},
{
slika:"pic/6bit.png",
tap:[0,5]
},
{
slika:"pic/7bit.png",
tap:[0,6]
},
{
slika:"pic/8bit.png",
tap:[1,2,3,7]
},
{
slika:"pic/9bit.png",
tap:[3,8]
},
{
slika:"pic/10bit.png",
tap:[2,9]
},
{
slika:"pic/11bit.png",
tap:[1,10]
},
{
slika:"pic/12bit.png",
tap:[0,3,5,11]
},
{
slika:"pic/13bit.png",
tap:[0,2,3,12]
},
{
slika:"pic/14bit.png",
tap:[0,2,4,13]
},
{
slika:"pic/15bit.png",
tap:[0,14]
},
{
slika:"pic/16bit.png",
tap:[1,2,4,15]
},
{
slika:"pic/17bit.png",
tap:[2,16]
},
{
slika:"pic/18bit.png",
tap:[6,17]
},
{
slika:"pic/19bit.png",
tap:[0,1,4,18]
},
{
slika:"pic/20bit.png",
tap:[2,19]
},
{
slika:"pic/21bit.png",
tap:[1,20]
},
{
slika:"pic/22bit.png",
tap:[0,21]
},
{
slika:"pic/23bit.png",
tap:[4,22]
},
{
slika:"pic/24bit.png",
tap:[0,2,3,23]
},
{
slika:"pic/25bit.png",
tap:[2,24]
},
{
slika:"pic/26bit.png",
tap:[0,1,5,25]
},
{
slika:"pic/27bit.png",
tap:[0,1,4,26]
},
{
slika:"pic/28bit.png",
tap:[2,27]
},
{
slika:"pic/29bit.png",
tap:[0,28]
},
{
slika:"pic/30bit.png",
tap:[0,3,5,29]
},
{
slika:"pic/31bit.png",
tap:[2,30]
},
{
slika:"pic/32bit.png",
tap:[1,5,6,31]
}
];
var first = UserInput.value.split("");
for (k=0;k<first.length;k++) first[k] = +first[k];
// first is just UserInput split into one-char strings, then parsed to integers
var bit = first.length - 2;
// I subtracted 2 here so I can access objects from data
var matrix = [first];
var t = 0;
var between;
var z;
for (i=1; i<Math.pow(2, bit+2)-1; i++){ //here is that 2^n-1. +2 is because i had -2 before. For loop is starting from 1 and ending with <2^n-1 because i already have first array of matrix
for (j=0; j<data[bit].tap.length; j++){
z = data[bit].tap[j];
t = t ^ matrix[i-1][z];
} // this for makes "t" which is all taps XOR-ed. If user input was 101, tap would be [0,2] and t would be 1xor1=0
between = matrix[i-1];
console.log(between);
between.unshift(t);
between.pop();
matrix[i] = between;
t=0; // here Im "shifting registers" or just placing t in front of last generated row and removing its last digit, thus generating new row
}
console.log(matrix);
}
So the problem is: console.log(between), which logs the previously generated row, is all correct (except, of course, that the final row is missing, because it logs the row generated before the current shift), but console.log(matrix), which should log the complete matrix, shows all rows overwritten by the last one.
So for user input 101, matrix should be
101
010
001
100
110
111
011
but is just
011
011
011 ...
I can't figure out what is wrong with it if the part before console.log(between); is all fine...
P.S. The code is not finished: it won't display the solution in HTML, and it still needs the part of the function that makes an array from the last column of the matrix, which is the pseudo-random sequence.
I realized that between refers to the same array as matrix[i-1], rather than to a new, independent array.
between = matrix[i-1];
So, if you want to store only the values of matrix[i-1] instead of a reference to it, you can do it like this:
between = JSON.parse(JSON.stringify(matrix[i-1]));
In JavaScript, assigning an array to another variable copies the reference, not the array itself. There are many ways to avoid this.
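The aliasing can be seen in isolation, and a shallow copy with slice() is one common alternative to the JSON round trip for a flat array of numbers. The sketch below is an assumption-laden rework rather than the asker's code: lfsrRows is a hypothetical helper that takes the initial bits and the tap positions as parameters instead of reading them from the page.

```javascript
// Assignment shares the array; slice() makes an independent shallow copy.
const a = [1, 0, 1];
const alias = a;        // same array object as a
const copy = a.slice(); // new array with the same elements
alias.unshift(0);
alias.pop();            // a is changed too, copy is not

// Hypothetical helper: generate all 2^n - 1 register states.
function lfsrRows(first, taps) {
  const rows = [first.slice()];
  const total = Math.pow(2, first.length) - 1;
  for (let i = 1; i < total; i++) {
    const prev = rows[i - 1];
    let t = 0;
    for (const z of taps) t ^= prev[z]; // XOR of all tapped bits
    const next = prev.slice();          // copy, so earlier rows stay intact
    next.unshift(t);                    // shift the new bit in at the front
    next.pop();                         // drop the last bit
    rows.push(next);
  }
  return rows;
}

const rows = lfsrRows([1, 0, 1], [0, 2]);
console.log(rows.map((r) => r.join("")).join(" "));
// prints "101 010 001 100 110 111 011"
```

Because every row is its own array, later shifts no longer overwrite the rows already stored, and the elements stay numbers rather than strings.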
I do not know why but I've come to a solution (will investigate more when I get free time).
for (i=1; i<Math.pow(2, bit+2)-1; i++){ //here is that 2^n-1. +2 is because i had -2 before. For loop is starting from 1 and ending with <2^n-1 because i already have first array of matrix
for (j=0; j<data[bit].tap.length; j++){
z = data[bit].tap[j];
t = t ^ matrix[i-1][z];
} // this for makes "t" which is all taps XOR-ed. If user input was 101, tap would be [0,2] and t would be 1xor1=0
between = matrix[i-1];
console.log(between);
between.unshift(t);
between.pop();
// MODIFICATION
var between_string = between;
matrix[i] = between_string.join(); // Turn it to a string
matrix[i] = matrix[i].split(','); // Turn it back to array to keep it working on the for loop above.
// END MODIFICATION
t=0; // here Im "shifting registers" or just placing t in front of last generated row and removing its last digit, thus generating new row
}
Now, when you print it out in the console it shows a two-dimensional array, although it's weird: sometimes (in my console) it shows integers and sometimes integers mixed with numeric strings (respecting the original value of between).
Edit: I only tried "101" as the input.
Second edit: Okay, I feel ashamed. The reason why it returns [1, "0", "0"] (for example) is the split(',') on "1,0,0" (only two numbers were preceded by commas). Haha. Sorry.

Fastest datastructure for filtering schema-less collections

Let's say I have a collection
var data = [
{ fieldA: 5 },
{ fieldA: 142, fieldB: 'string' },
{ fieldA: 1324, fieldC: 'string' },
{ fieldB: 'string', fieldD: 111, fieldZ: 'somestring' },
...
];
Let's assume the fields are not uniform across elements, but I know the number of unique fields in advance, and the collection is not dynamic.
I want to filter it with something like _.findWhere. This is simple enough, but what if I want to prioritize speed over ease? Is there a better data structure that will always minimize the number of elements that will be checked? Perhaps some kind of tree?
Yes, there is something faster if your queries are of the type "give me all records with fieldX=valueY". However, it does have an overhead.
For each field, build an inverted index that lists all the record-ids ( = row positions in the original data) that have each value:
var indexForEachField = {
fieldA: { "5": [0], "142": [1], "1324": [2]},
...
}
When someone asks for "records where fieldX=valueY", you return
indexForEachField["fieldX"]["valueY"]; // an array with all results
Lookup time is therefore constant (and requires only 2 lookups in tables), but you do need to keep your index up to date.
This is a generalization of the strategy used by search engines to look up webpages with certain terms; in that scenario, it is called an inverted index.
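A minimal sketch of building such an index for the sample collection follows. The names buildIndex and findWhereIndexed are hypothetical, and values are converted to strings for use as keys, which is an assumption: it means 5 and "5" would share an entry.

```javascript
const data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: "string" },
  { fieldA: 1324, fieldC: "string" },
  { fieldB: "string", fieldD: 111, fieldZ: "somestring" },
];

// Build field -> value -> [row positions]. Positions are pushed in
// increasing order, so every list is already sorted.
function buildIndex(records) {
  const index = Object.create(null);
  records.forEach((record, id) => {
    for (const field of Object.keys(record)) {
      const byValue = index[field] || (index[field] = Object.create(null));
      const key = String(record[field]);
      (byValue[key] || (byValue[key] = [])).push(id);
    }
  });
  return index;
}

// Two table lookups, then map row positions back to records.
function findWhereIndexed(records, index, field, value) {
  const ids = (index[field] && index[field][String(value)]) || [];
  return ids.map((id) => records[id]);
}

const indexForEachField = buildIndex(data);
console.log(indexForEachField.fieldB["string"]); // row positions 1 and 3
console.log(findWhereIndexed(data, indexForEachField, "fieldA", 142));
```

Since the collection is static, the index only has to be built once, which is where the overhead mentioned above is paid.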
Edit: what if you want to find all records with fieldX=valueX and fieldY=valueY?
You would use the following code, which requires all input arrays
to be sorted:
var a = indexForEachField["fieldX"]["valueX"];
var b = indexForEachField["fieldY"]["valueY"];
var c = []; // result array: all elements in a AND in b
for (var i=0, j=0; i<a.length && j<b.length; /**/) {
if (a[i] < b[j]) {
i++;
} else if (a[i] > b[j]) {
j++;
} else {
c.push(a[i]);
i++; j++;
}
}
You can see that, in the worst case, the loop runs a.length + b.length iterations; when the arrays overlap heavily it takes fewer, because each match advances both indexes at once. You can use something very similar to implement OR.
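The OR case can be sketched as a merge of the two sorted lists that keeps every id once. The function name unionSorted is an assumption for this example; like the AND loop, it requires both inputs to be sorted in increasing order.

```javascript
// Union of two sorted arrays of row positions, without duplicates.
function unionSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) {
      out.push(a[i++]);
    } else if (a[i] > b[j]) {
      out.push(b[j++]);
    } else {
      out.push(a[i]); // present in both: emit once, advance both
      i++;
      j++;
    }
  }
  // One list is exhausted; append whatever is left of the other.
  while (i < a.length) out.push(a[i++]);
  while (j < b.length) out.push(b[j++]);
  return out;
}

console.log(unionSorted([0, 2, 5], [1, 2, 7])); // [ 0, 1, 2, 5, 7 ]
```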

Couchbase, reduction too large error

At work I am using Couchbase and I have a problem. Data comes into Couchbase from some devices, and afterwards I call an aggregate view. This view must aggregate values by two keys: timestamp and deviceId.
Everything was fine until I tried to aggregate more than 10k values. In that case I get a reduction error.
Map function:
function(doc, meta)
{
if (doc.type == "PeopleCountingIn"&& doc.undefined!=true)
{
emit(doc.id+"#"+doc.time, [doc.in, doc.out, doc.id, doc.time, meta.id]);
}
}
Reduce function:
function(key, values, rereduce)
{
var result =
{
"id":0,
"time":0,
"in" : 0,
"out" : 0,
"docs":[]
};
if (rereduce)
{
result.id=values[0].id;
result.time = values[0].time;
for (i = 0; i<values.length; i++)
{
result.in = result.in + values[i].in;
result.out = result.out + values[i].out;
for (j = 0; j < values[i].docs.length; j++)
{
result.docs.push(values[i].docs[j]);
}
}
}
else
{
result.id = values[0][2];
result.time = values[0][3];
for(i = 0; i < values.length; i++)
{
result.docs.push(values[i][4]);
result.in = result.in + values[i][0];
result.out = result.out + values[i][1];
}
}
return result;
}
Document sample:
{
"id": "12292228#0",
"time": 1401431340,
"in": 0,
"out": 0,
"type": "PeopleCountingIn"
}
UPDATE
Output document:
{"rows":[
{"key":"12201774#0#1401144240","value":{"id":"12201774#0","time":1401144240,"in":0,"out":0,"docs":["12231774#0#1401546080#1792560127"]}},
{"key":"12201774#0#1401202080","value":{"id":"12201774#0","time":1401202080,"in":0,"out":0,"docs":["12201774#0#1401202080#1792560840"]}}
]
}
The error occurs when the "docs" array length is more than 100. I think the rereduce branch is running in those cases. Is there some way to fix this error other than making this array smaller?
There are a number of limits on the output of map & reduce functions, to prevent indexes taking too long and/or growing too large.
These are in the process of being added to the official documentation, but in the meantime quoting from the issue (MB-11668) tracking the documentation update:
1) indexer_max_doc_size - documents larger then this value are skipped by the
indexer. A message is logged (with document ID, its size, bucket name, view name, etc)
when such a document is encountered. A value of 0 means no limit (like what it used to
be before). Current default value is 1048576 bytes (1Mb).
2) max_kv_size_per_doc - maximum total size (bytes) of KV pairs that can be emitted for
a single document for a single view. When such limit is passed, message is logged (with
document ID, its size, bucket name, view name, etc). A value of 0 means no limit (like what
it used to be before). Current default value is 1048576 bytes (1Mb)
Edit: Additionally, there is a limit of 64kB for the size of a single reduction (the output of reduce()). I suggest you re-work your reduce function to return data within this limit. See MB-7952 for a technical discussion on why this is the case.
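One possible rework, sketched under the assumption that the list of document ids is not needed in the reduced result and can instead be fetched with a separate non-reduced query (reduce=false): the reduce output keeps only fixed-size totals, so it no longer grows with the number of documents. The name reduceCounts and the count field are introduced for this example; in the view definition the function would be anonymous, as in the question.

```javascript
// Reduce that returns fixed-size totals instead of an ever-growing "docs" array.
// Input values have the shape emitted by the map function in the question:
// [doc.in, doc.out, doc.id, doc.time, meta.id]
function reduceCounts(key, values, rereduce) {
  var result = { id: 0, time: 0, in: 0, out: 0, count: 0 };
  if (values.length === 0) {
    return result;
  }
  var i;
  if (rereduce) {
    // values are results of earlier reduceCounts calls
    result.id = values[0].id;
    result.time = values[0].time;
    for (i = 0; i < values.length; i++) {
      result.in += values[i].in;
      result.out += values[i].out;
      result.count += values[i].count;
    }
  } else {
    result.id = values[0][2];
    result.time = values[0][3];
    for (i = 0; i < values.length; i++) {
      result.in += values[i][0];
      result.out += values[i][1];
    }
    result.count = values.length; // number of documents, not their ids
  }
  return result;
}

// Local check with hypothetical emitted values.
var partial = reduceCounts(null, [[1, 2, "dev#0", 100, "a"], [3, 4, "dev#0", 100, "b"]], false);
var total = reduceCounts(null, [partial, partial], true);
console.log(partial, total);
```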
