I've got an array with about 250 entries in it, each their own array of values. Each entry is a point on a map, and each array holds info for:
name,
another array for points this point can connect to,
latitude,
longitude,
short form of name,
a boolean,
and another boolean
The array has been written by another developer in my team, and he has written it as such:
names[0]=new Array;
names[0][0]="Campus Ice Centre";
names[0][1]= new Array(0,1,2);
names[0][2]=43.95081811364498;
names[0][3]=-78.89848709106445;
names[0][4]="CIC";
names[0][5]=false;
names[0][6]=false;
names[1]=new Array;
names[1][0]="Shagwell's";
names[1][1]= new Array(0,1);
names[1][2]=43.95090307839151;
names[1][3]=-78.89815986156464;
names[1][4]="shg";
names[1][5]=false;
names[1][6]=false;
Where I would probably have personally written it like this:
var names = [];
names[0] = new Array("Campus Ice Centre", new Array(0,1,2), 43.95081811364498, -78.89848709106445, "CIC", false, false);
names[1] = new Array("Shagwell's", new Array(0,1), 43.95090307839151, -78.89815986156464, "shg", false, false);
They both work perfectly fine of course, but what I'm wondering is:
1) does one take longer than the other to actually process?
2) am I incorrect in assuming there is a benefit to the compactness of my version of the same thing?
I'm just a little worried about his 3000 lines of code versus my 300-400 to get the same result.
Thanks in advance for any guidance.
What you really want to do here is define a custom data type which represents your data more accurately. I'm not sure what language you are using, so here is some pseudocode:
class Location
{
double latitude;
double longitude;
String Name;
String Abbreviation;
bool flag1;//you should use a better name
bool flag2;
}
Then you can just create an array to hold all the Location objects and it would be much more readable and maintainable.
Locations = new Array;
Locations[0] = new Location("Shagwell's",...);
....
===EDIT===
Since you said you are using JavaScript, the best practice would probably be to store your data in a JSON text file. This has the benefit of moving the data out of the code file and giving you a very easily editable data source if you want to make changes.
Your JSON file would look like this:
[{"lat":"23.2323", "long":"-72.3", "name":"Shagwell's" ...},
{"lat":"26.2323", "long":"-77.3", "name":"loc2" ...},
...]
You could then store the JSON text in an accessible place on your web server, say "data.json". Then, if you are using jQuery, you can load it by doing something like this:
$.getJSON("data.json", function(data) { //do something with the data});
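If the data already exists in the positional array form from the question, a small conversion step could produce named objects and the JSON text. This is only a sketch: the property names (connections, shortName, flagA, flagB) and the toLocation helper are assumptions, since the original arrays only define positions, not names.

```javascript
// Hypothetical helper: turns one positional entry into an object with named fields.
// The field names are assumptions; pick names that describe your real data.
function toLocation(entry) {
  const [name, connections, latitude, longitude, shortName, flagA, flagB] = entry;
  return { name, connections, latitude, longitude, shortName, flagA, flagB };
}

// The two sample entries from the question, in positional form.
const names = [
  ["Campus Ice Centre", [0, 1, 2], 43.95081811364498, -78.89848709106445, "CIC", false, false],
  ["Shagwell's", [0, 1], 43.95090307839151, -78.89815986156464, "shg", false, false],
];

// Array of objects, then the text you would save as data.json.
const locations = names.map(toLocation);
const json = JSON.stringify(locations, null, 2);

console.log(locations[1].shortName); // prints shg
```

Code that reads locations[1].latitude says what it means, whereas names[1][2] needs a comment or a lookup table to be understood.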
With structured data, like your example, both you and your co-worker are relatively "wrong". From the looks of things, you should have implemented an array of structures, assuming of course that the data you are presenting is truly unordered, which I would be willing to guess it probably isn't. Arrays are used too often, because they are amongst the first data structures we learn, but very often aren't the best choice.
As to performance, that more often comes down to the data access code than the data type itself. Frankly too, unless you are dealing with gigantic datasets or literally real time applications, performance should be a non issue.
As to the two examples you have posted, after the compiler is done with them, they will be virtually identical.
I personally find the former much more readable. From a performance perspective, the difference is probably minimal.
Leaving the other answers aside (although I agree with the others that you need structs here), your co-worker's way seems better to me. Like Serapth says, the compiler will optimize away the differences and the original code has better readability.
Related
Sample object:
const myArray = {'attributes': {'fullName': 'Foo Bar'}};
During code review, I found that one key(string type) was being used to access the object in multiple functions.
Now my question is, should we access the object directly by using the string literal as key, e.g. myArray['attributes']['fullName']
or use a constant instead, like :
const ATTRIBUTES = 'attributes';
const FULLNAME = 'fullName';
someVar = myArray[ATTRIBUTES][FULLNAME];
According to my knowledge, the latter approach is better because it reserves only one memory block.
But my friend had a different opinion: he said that if we use the string literal as the key, it won't have any impact on memory.
Now I am confused and don't know which approach is better.
Could anyone help me understand which one is better, with an explanation?
Something like this will make no performance difference. In my opinion the second option is less readable too.
I would take
myArray.attributes.fullName
because it shows the compiler that the key is a constant, whereas bracket notation is aimed at dynamic access to the structure.
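As a quick illustration that the three spellings reach the same value, here is a minimal sketch using the sample object from the question. The variable names viaDot, viaLiteral and viaConstant are made up for this example, and the sketch says nothing about memory use.

```javascript
const myArray = { attributes: { fullName: 'Foo Bar' } };

const ATTRIBUTES = 'attributes';
const FULLNAME = 'fullName';

// Three ways to read the same property.
const viaDot = myArray.attributes.fullName;
const viaLiteral = myArray['attributes']['fullName'];
const viaConstant = myArray[ATTRIBUTES][FULLNAME];

console.log(viaDot === viaLiteral && viaLiteral === viaConstant); // prints true
```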
I don't know about the memory. I use the first option myself, but if the key ever changes I'll have to modify it in several places rather than one, so it's not so good for maintenance.
To preface, I have found a solution that works for me in the situations I have tried it, but I am fairly new to javascript and RXJS and don't fully understand what the solution is doing so I can't be sure if it will work in every instance.
I am using reactive forms and have arrays of checkboxes, what I would like to do is get an array of all of the keys used to generate the checkboxes, but without any of the unchecked boxes. Ideally, I could use this to return additional values as well such as a user-readable string, but that is far from important as I can do that in a separate step fairly easily. I have come up with a few methods of my own, but they don't seem to be very robust or performant.
I would have replied on this thread, but I lack the reputation.
This is the best solution I have found, which I would be totally happy to use, but I don't have much experience with RXJS or maps in javascript so I am not totally sure what it is doing:
this.controlNames = Object.keys(this.checkboxGroup.controls).map(_=>_); //This seems to just produce an object of control keys as properties and their value
this.selectedNames = this.checkboxGroup.valueChanges.pipe(map(v => Object.keys(v).filter(k => v[k]))); //Some sort of magic happens and an array is generated and contains only the keys whose values are 'true'
I have tried breaking that snippet apart and using console.log to test what it is doing in each step, but it really didn't give me much useful information. Any advice or better ideas would be thoroughly appreciated. There seem to be a lot of conventions in JavaScript that people adhere to, and it can be hard to sort through what is a convention and what is actually doing something.
I think I found a way to break it down and get a grip on it and want to post my explanation for anyone who comes looking.
In this part, it is just building an array of the key names of the 'checkboxGroup.controls' object. The .map(_=>_) call maps each key to itself, so it only returns a copy of that array. This could have been used to loop over in the template and make all of the checkboxes. Since my form structure is already generated from arrays of objects with known properties, I don't need this. The underscore isn't doing anything special here; it is just a parameter name, and people like to use underscores for private or throwaway variables.
this.controlNames = Object.keys(this.checkboxGroup.controls).map(_=>_);
For those who are new to arrow functions or some of the conventions of javascript, the code above is not quite, but essentially shorthand for this:
this.controlNames = [];
// An arrow function is used so that 'this' still refers to the component.
Object.keys(this.checkboxGroup.controls).forEach((key) => {
this.controlNames.push(key);
});
I have changed the short variables to longer ones to make them easier to understand in this second part. This maps each value emitted by the valueChanges observable (named 'changesObj' here), retrieves its keys, and keeps only the keys that have a truthy value. The code filter(key => changesObj[key]) keeps the key if its value is not null, undefined, false, or any other falsy value.
this.selectedNames = this.checkboxGroup.valueChanges.pipe(map(changesObj => Object.keys(changesObj).filter(key => changesObj[key])));
This is essentially just shorthand for this:
function propNotFalse (changes, prop) {
return changes[prop] == true;
}
// Applied once to the form's current value here, rather than to every value the observable emits.
this.selectedNames = this.alternateFilter = Object.keys(this.checkboxGroup.value).filter(propNotFalse.bind(null, this.checkboxGroup.value));
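The filtering step can be tried outside Angular and RxJS with a plain object. This is a minimal sketch: selectedKeys and sampleValue are hypothetical names, and sampleValue only stands in for the kind of object a checkbox form group would hold.

```javascript
// Hypothetical helper: returns the keys whose values are truthy.
function selectedKeys(changesObj) {
  return Object.keys(changesObj).filter((key) => changesObj[key]);
}

// Stand-in for a checkbox group's value (checked = true, unchecked = false or null).
const sampleValue = { apples: true, pears: false, plums: null, cherries: true };

console.log(selectedKeys(sampleValue)); // prints [ 'apples', 'cherries' ]
```

Inside pipe(map(...)), the same function would simply run again each time the form emits a new value.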
EDIT
Did a JSPerf. Ran it against Chrome as Chrome uses v8.
http://jsperf.com/passing-large-objects
It looks like passing a large object doesn't matter; the difference is negligible. However, lookup on an object at some point gets a lot slower.
INTRODUCTION:
I’m writing a 2D JavaScript game engine while following component based and data oriented (via typed arrays) design principles. It’s designed for use by a simulation based multiplayer netcode.
My performance concerns are for the master simulation that will be running on the server; I believe that client browsers will be more than fast enough. As of now, the server is NodeJS, so it would involve the V8 interpreter. However, I’m not ruling out a switch to other technologies like Vert.x, which I believe uses the Rhino interpreter.
THE QUESTION:
When does JavaScript access objects in memory?
More specifically, let’s say I have an object like so.
var data = {
a1 : new Float64Array(123456),
a2 : new Float64Array(123456),
…
a9001: new Float64Array(123456)
};
And now let’s say I pass it to this function like so.
var update = function(obj) {
for(var property in obj) {
if(obj.hasOwnProperty(property)) {
obj[property][0]++;
}
}
};
update(data);
At what point are the Float64 arrays accessed? Does it access it the moment I pass data into update, attempting to load all 9001 arrays into the memory cache and page faulting like crazy? Does it wait to load the arrays until the hasOwnProperty? Or obj[property]? Or obj[property][0]?
WHY I ASK:
I’m trying to follow the data oriented design principles of keeping stuff in contiguous blocks of memory. Depending on how JavaScript works with memory, I will have to change the interface and structure of the engine.
For example, if all the arrays in data are accessed the moment I pass it into update, then I have to make special data objects with as few arrays as possible to reduce page faulting. If however the arrays are only accessed at say obj[property], then I can pass a large data object with more arrays without any performance penalties, which simplifies a lot of things.
A big reason why I’m not sure of the answer is because JavaScript objects aren’t objects like in other languages. From some random reading here or there, I’ve heard stuff like JavaScript objects have their own internal class. I’ve also heard of things like JavaScript objects being hash tables so you incur a lookup time with every property that you access.
Then I’ve heard that the interpreters treat objects differently based on how large the object is; smaller ones are treated one way and larger ones another. So jsperf stuff may not be an accurate measure.
FURTHER:
Taking the example further, there’s the question of how JavaScript handles nested objects. For example:
var properties = {
a1 : {
a1 : {
…
a1 : {
}
}
},
a2 : {
a2 : {
…
a2 : {
}
}
},
…
a9001 : {
a9001 : {
…
a9001 : {
}
}
}
};
var doSomething = function() {
};
doSomething(properties);
If passing in properties to doSomething causes every sub object and their sub objects to get accessed, then that’s a performance hit. If however it just passes a reference to the properties object and only accesses the sub objects when the code calls them, then it’s not bad at all.
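At the language level, a function call hands over a reference to the same object and copies nothing, which can be checked directly. The sketch below uses three small arrays instead of 9001 large ones, and receivedObject is a hypothetical variable added only to make the check possible. When the engine actually touches the underlying memory is an implementation detail that this code cannot observe.

```javascript
const data = {
  a1: new Float64Array(4),
  a2: new Float64Array(4),
  a3: new Float64Array(4),
};

// Hypothetical variable used only to compare identities after the call.
let receivedObject = null;

function update(obj) {
  receivedObject = obj; // same reference the caller holds, not a copy
  for (const property in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, property)) {
      obj[property][0]++; // the typed array is only reached here, via the property lookup
    }
  }
}

update(data);

console.log(receivedObject === data); // prints true
console.log(data.a1[0], data.a2[0], data.a3[0]); // prints 1 1 1
```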
If I had access to Vectors, I’d make an entity system framework in a heartbeat and this wouldn’t really be a problem. If I had access to pointers, which I believe only accesses the object when the code converts the pointer, then I could try other things. But only having typed arrays at my disposal limits my options, so I end up agonizing over questions like this.
Thanks for any insight you can provide. I really appreciate it.
I have been experimenting with PostgreSQL and PL/V8, which embeds the V8 JavaScript engine into PostgreSQL. Using this, I can query into JSON data inside the database, which is rather awesome.
The basic approach is as follows:
CREATE or REPLACE FUNCTION
json_string(data json, key text) RETURNS TEXT AS $$
var data = JSON.parse(data);
return data[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;
SELECT id, data FROM things WHERE json_string(data,'name') LIKE 'Z%';
Using V8, I can parse JSON data into JS, then return a field, and I can use this as a regular pg query expression.
BUT
On large datasets, performance can be an issue, as for every row I need to parse the data.
The parser is fast, but it is definitely the slowest part of the process and it has to happen every time.
What I am trying to work out (to finally get to an actual question) is if there is a way to cache or pre-process the JSON ... even storing a binary representation of the JSON in the table that could be used by V8 automatically as a JS object might be a win. I've had a look at using an alternative format such as messagepack or protobuf, but I don't think they will necessarily be as fast as the native JSON parser in any case.
THOUGHT
PG has blobs and binary types, so the data could be stored in binary, then we just need a way to marshall this into V8.
Postgres supports indexes on arbitrary function calls. The following index should do the trick :
CREATE INDEX json_idx ON things (json_string(field,'name'));
The short version appears to be that with Pg's new json support, so far there's no way to store json directly in any form other than serialised json text. (This looks likely to change in 9.4)
You seem to want to store a pre-parsed form that's a serialised representation of how v8 represents the json in memory, and that's not currently supported. It's not even clear that v8 offers any kind of binary serialisation/deserialisation of json structures. If it doesn't do so natively, code would need to be added to Pg to produce such a representation and to turn it back into v8 json data structures.
It also wouldn't necessarily be faster:
If json was stored in a v8 specific binary form, queries that returned the normal json representation to clients would have to format it each time it was returned, incurring CPU cost.
A binary serialised version of json isn't the same thing as storing the v8 json data structures directly in memory. You can't write a data structure that involves any kind of graph of pointers out to disk directly, it has to be serialised. This serialisation and deserialisation has a cost, and it might not even be much faster than parsing the json text representation. It depends a lot on how v8 represents JavaScript objects in memory.
The binary serialised representation could easily be bigger, since most json is text and small numbers, where you don't gain any compactness from a binary representation. Since storage size directly affects the speed of table scans, value fetches from TOAST, decompression time required for TOASTed values, index sizes, etc, you could easily land up with slower queries and bigger tables.
I'd be interested to see whether an optimisation like what you describe is possible, and whether it'd turn out to be an optimisation at all.
To gain the benefits you want when doing table scans, I guess what you really need is a format that can be traversed without having to parse it and turn it into what's probably a malloc()'d graph of javascript objects. You want to be able to give a path expression for a field and grab it out directly from the serialised form where it's been read into a Pg read buffer or into shared_buffers. That'd be a really interesting design project, but I'd be surprised if anything like it existed in v8.
What you really need to do is research how the existing json-based object databases do fast searches for arbitrary json paths and what their on-disk representations are, then report back on pgsql-hackers. Maybe there's something to be learned from people who've already solved this - presuming, of course, that they have.
In the mean time, what I'd want to focus on is what the other answers here are doing: Working around the slow point and finding other ways to do what you need. You could also look into helping to optimise the json parser, but depending on whether the v8 one or some other one is in use that might already be far past the point of diminishing returns.
I guess this is one of the areas where there's a trade-off between speed and flexible data representation.
Perhaps, instead of making the retrieval phase responsible for parsing the data, a better approach would be to create a new data type which could pre-process the JSON data on input?
http://www.postgresql.org/docs/9.2/static/sql-createtype.html
I don't have any experience with this, but it got me curious so I did some reading.
JSON only
What about something like the following (untested, BTW)? It doesn't address your question about storing a binary representation of the JSON, it's an attempt to parse all of the JSON at once for all of the rows you're checking, in the hope that it will yield higher performance by reducing the processing overhead of doing it individually for each row. If it succeeds at that, I'm thinking it may result in higher memory consumption though.
The CREATE TYPE...set_of_records() stuff is adapted from the example on the wiki where it mentions that "You can also return records with an array of JSON." I guess it really means "an array of objects".
Is the id value from the DB record embedded in the JSON?
Version #1
CREATE TYPE rec AS (id integer, data text, name text);
CREATE FUNCTION set_of_records() RETURNS SETOF rec AS
$$
var records = plv8.execute( "SELECT id, data FROM things" );
var data = [];
// Use for loop instead if better performance
records.forEach( function ( rec, i, arr ) {
data.push( rec.data );
} );
data = "[" + data.join( "," ) + "]";
data = JSON.parse( data );
records.forEach( function ( rec, i, arr ) {
rec.name = data[ i ].name;
} );
return records;
$$
LANGUAGE plv8;
SELECT id, data FROM set_of_records() WHERE name LIKE 'Z%'
Version #2
This one gets Postgres to aggregate / concatenate some values to cut down on the processing done in JS.
CREATE TYPE rec AS (id integer, data text, name text);
CREATE FUNCTION set_of_records() RETURNS SETOF rec AS
$$
var cols = plv8.execute(
// Trailing spaces keep the concatenated SQL keywords from running together.
"SELECT " +
"array_agg( id ORDER BY id ) AS id, " +
"string_agg( data, ',' ORDER BY id ) AS data " +
"FROM things"
)[0];
cols.data = JSON.parse( "[" + cols.data + "]" );
var records = cols.id;
// Use for loop if better performance
records.forEach( function ( id, i, arr ) {
arr[ i ] = {
id : id,
data : cols.data[ i ],
name : cols.data[ i ].name
};
} );
return records;
$$
LANGUAGE plv8;
SELECT id, data FROM set_of_records() WHERE name LIKE 'Z%'
hstore
How would the performance of this compare?: duplicate the JSON data into an hstore column at write time (or if the performance somehow managed to be good enough, convert the JSON to hstore at select time) and use the hstore in your WHERE, e.g.:
SELECT id, data FROM things WHERE hstore_data -> 'name' LIKE 'Z%'
I heard about hstore from here: http://lwn.net/Articles/497069/
The article mentions some other interesting things:
PL/v8 lets you...create expression indexes on specific JSON elements and save them, giving you stored search indexes much like CouchDB's "views".
It doesn't elaborate on that and I don't really know what it's referring to.
There's a comment attributed as "jberkus" that says:
We discussed having a binary JSON type as well, but without a protocol to transmit binary values (BSON isn't at all a standard, and has some serious glitches), there didn't seem to be any point.
If you're interested in working on binary JSON support for PostgreSQL, we'd be interested in having you help out ...
I don't know if it would be useful here, but I came across this: pg-to-json-serializer. It mentions functionality for:
parsing JSON strings and filling postgreSQL records/arrays from it
I don't know if it would offer any performance benefit over what you've been doing so far though, and I don't really even understand their examples.
Just thought it was worth mentioning.
I have a list with 10,000 entries.
for example
myList = {};
myList['hashjh5j4h5j4h5j4'] = true;
myList['hashs54s5d4s5d4sd'] = true;
myList['hash5as465d45ad4d'] = true;
....
I don't use an array (0,1,2,3) because this way I can check
very quickly whether a hash exists or not.
if(typeof myList['hashjh5j4h5j4h5j4'] == 'undefined')
{
alert('it is new');
}
else
{
alert('old stuff');
}
But I am not sure: is this a good solution?
Is it maybe a problem to handle an object with 10,000 entries?
EDIT:
I am trying to build an RSS feed reader which shows only new feeds. So I calculate a hash from the link (every news item has a unique link) and store it in the object (MongoDB). BTW: 10,000 entries is not the normal case (but it is possible).
My advice:
Use as small of a hash as possible for the task at hand. If you are dealing with hundreds of hashable strings, compared to billions, then your hash length can be relatively small.
Store the hash as an integer, not a string, to avoid making it take more room than needed.
Don't store as objects, just store them in a simple binary tree log2(keySize) deep.
Further thoughts:
Can you come at this with a hybrid approach? Use hashes for recent feeds less than a month old, and don't bother showing items more than a month old. Store the hash and date together, and clean out old hashes each day?
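The hybrid idea could be sketched roughly as follows. Note that this swaps in a Map from hash to first-seen timestamp, which is not what the question uses, and markSeen, pruneOld and the 30-day cutoff are assumptions made for the example.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const MAX_AGE_MS = 30 * ONE_DAY_MS; // assumed cutoff: roughly one month

// Hypothetical helper: records a hash with its timestamp, returns true if it was new.
function markSeen(seen, hash, now) {
  const isNew = !seen.has(hash);
  if (isNew) {
    seen.set(hash, now);
  }
  return isNew;
}

// Hypothetical helper: drops every hash first seen more than MAX_AGE_MS ago.
function pruneOld(seen, now) {
  for (const [hash, firstSeen] of seen) {
    if (now - firstSeen > MAX_AGE_MS) {
      seen.delete(hash);
    }
  }
}

const seen = new Map();
const start = Date.UTC(2024, 0, 1);

console.log(markSeen(seen, 'hashA', start)); // prints true
console.log(markSeen(seen, 'hashA', start + ONE_DAY_MS)); // prints false
markSeen(seen, 'hashB', start + 20 * ONE_DAY_MS);

pruneOld(seen, start + 40 * ONE_DAY_MS);
console.log(seen.has('hashA'), seen.has('hashB')); // prints false true
```

Running pruneOld once a day keeps the collection bounded by the number of items published in the last month, rather than growing forever.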
You can use the in operator:
if ('hashjh5j4h5j4h5j4' in myList) { .. }
However, this will also return true for members that are in the object's prototype chain:
Object.prototype.foo = function () {};
if ("foo" in myList) { /* will be true */ };
To fix this, you could use hasOwnProperty instead:
if (myList.hasOwnProperty('hashjh5j4h5j4h5j4')) { .. }
Whilst you yourself may not have added methods to Object.prototype, you cannot guarantee that other 3rd party libraries you use haven't. Incidentally, extending Object.prototype is frowned upon, so you shouldn't really do it. Why? Because you shouldn't modify things you don't own.
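The difference between the two checks shows up even without any library touching Object.prototype, because every plain object already inherits built-in members such as toString. In this minimal sketch, isKnown is a hypothetical helper; calling hasOwnProperty through Object.prototype also keeps the check working if a stored key happens to be named "hasOwnProperty".

```javascript
const myList = { hashjh5j4h5j4h5j4: true };

// Hypothetical helper: true only for keys stored directly on the object.
function isKnown(list, hash) {
  return Object.prototype.hasOwnProperty.call(list, hash);
}

console.log('toString' in myList); // prints true (inherited from Object.prototype)
console.log(isKnown(myList, 'toString')); // prints false
console.log(isKnown(myList, 'hashjh5j4h5j4h5j4')); // prints true
```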
10,000 is quite a lot. You may consider storing the hashes in a database and querying it using Ajax. It may take a bit longer to query one hash, but your page will load much faster.
It is not a problem in modern browsers on modern computers in any way.
10k entries that take up 50 bytes each would still only take up about 500 KB of RAM.
As long as the js is served gzipped then bandwidth is no problem - but do try to serve the data as late as possible so they don't block perceived pageload performance.
All in all, unless you wish to cater to cellphones then your solution is fine.