Is there a way to save space on keys of JS objects? - javascript

I was just thinking about the following situation:
Let's say we have a line definition like the one shown below, where start and end are both points.
let line = {
  start: { x: 0, y: 0 },
  end: { x: 0, y: 0 },
  orientation: 'vertical'
}
Now imagine we have a very large array of lines: how do we save space? I know you can replace the orientation value 'vertical' with an enum. But can you save space on the key names without reducing readability? E.g., you can replace orientation with o, but then it is no longer clear what the key stands for.
Let me know!

If you mean memory usage, JavaScript engines are very clever these days; things like internal lookup tables for keys, string de-duplication, etc. mean that short or long key names make very little difference.
For example, using your data structure above I pushed 1 million records into an array, once with very long key names and once with one-character key names. In both cases the memory usage per item worked out to about 147 bytes. Even using a const for the 'vertical' string had little effect.
But as you can see, 147 bytes does seem high, so if you wanted to reduce this you would need to use TypedArrays. Unfortunately these can be a little more tricky to reason about, but if memory is a concern the effort might be worth it.
If you did use TypedArrays you could use getters and setters to make this much easier, and doing so you could get down to maybe 33 bytes per record: 4 doubles and 1 byte.
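To make that concrete, here is a minimal sketch of the getter/setter idea. This is an illustration only, not the code used for the measurements above; the layout (a Float64Array holding 4 doubles per line plus a Uint8Array holding 1 orientation byte) and all names are assumptions.

const VERTICAL = 0, HORIZONTAL = 1; // orientation enum, 1 byte per line

function createLines(count) {
  const coords = new Float64Array(count * 4); // startX, startY, endX, endY
  const orientations = new Uint8Array(count);
  // Returns a lightweight accessor for line i; no per-line object is stored.
  const get = (i) => ({
    get startX() { return coords[i * 4]; },
    set startX(v) { coords[i * 4] = v; },
    get startY() { return coords[i * 4 + 1]; },
    set startY(v) { coords[i * 4 + 1] = v; },
    get endX() { return coords[i * 4 + 2]; },
    set endX(v) { coords[i * 4 + 2] = v; },
    get endY() { return coords[i * 4 + 3]; },
    set endY(v) { coords[i * 4 + 3] = v; },
    get orientation() { return orientations[i]; },
    set orientation(v) { orientations[i] = v; },
  });
  return { get };
}

const lines = createLines(1000000);
const first = lines.get(0);
first.endY = 2;
first.orientation = VERTICAL;
console.log(first.endY, first.orientation); // 2 0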
But before doing anything, I would first make sure this sort of optimisation is even necessary. Prematurely optimising JavaScript is a fool's errand.
PS: I did the memory tests using Node.js, which uses V8, the same JavaScript engine as Chrome.
Also, if you want to play with how memory is affected and you have Node.js installed, here is some example code to get you started.
const oused = process.memoryUsage().heapUsed;
const values = [];
for (let l = 0; l < 1000000; l += 1) {
  values.push({
    start: { x: 0, y: 0 },
    end: { x: 0, y: 0 },
    orientation: "vertical",
  });
}
console.log(process.memoryUsage().heapUsed - oused);
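For comparison, here is a variant of the same loop with one-character keys (an illustrative addition, not part of the original test code); on V8 the bytes-per-item figure should come out essentially the same:

const before = process.memoryUsage().heapUsed;
const shortKeyed = [];
for (let l = 0; l < 1000000; l += 1) {
  shortKeyed.push({ s: { x: 0, y: 0 }, e: { x: 0, y: 0 }, o: "vertical" });
}
console.log((process.memoryUsage().heapUsed - before) / 1000000, "bytes per item");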

Here's a representation of a line that strikes a balance between usability and memory efficiency. It uses no string identifiers because the positions of the data are encoded into the type. It also uses standard arrays because you didn't specify the size of the Cartesian coordinate plane (so I can't suggest a bounded container like a TypedArray):
type Point = [x: number, y: number];
type Line = [a: Point, b: Point];
const line: Line = [[0, 0], [0, 2]];
// More memory efficient: 2 fewer arrays per line:
type CompactLine = [...a: Point, ...b: Point];
const compactLine: CompactLine = [0, 0, 0, 2];
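To keep the compact tuple readable without reintroducing per-object string keys, small accessor helpers can restore named access. A quick sketch (the helper names here are my own, not part of the answer above):

const start = (line) => [line[0], line[1]];
const end = (line) => [line[2], line[3]];
console.log(start(compactLine)); // [0, 0]
console.log(end(compactLine));   // [0, 2]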

I suggest you create a map of all the shortened versions of the keys and use the map in parallel with your object. The map can have the short object keys as the map keys and the full forms as the values.
const map = new Map()
const object = {
  o: 4
} // o means orientation
map.set("o", "orientation")

const mapKeys = Array.from(map.keys())
const readableObj = {}
mapKeys.forEach(key => {
  const readableKey = map.get(key)
  readableObj[readableKey] = object[key]
})

console.log(object)
console.log(readableObj)
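The reverse direction, shrinking a readable object back to its compact form with the same map, is symmetric; a sketch along the same lines:

const compactObj = {}
map.forEach((fullKey, shortKey) => {
  if (fullKey in readableObj) compactObj[shortKey] = readableObj[fullKey]
})
console.log(compactObj) // { o: 4 }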

Related

How do I convert from a byte array to different number types starting at arbitrary byte starts in javascript

I'm working with a file buffer and started processing it as a byte array:
//data is a buffer from reading my file
let view = new Uint8Array(data);
I want some function that will basically act like the DataView class, but that uses byte offsets, rather than offsets based on the converted type.
For example, let's say my byte array was [0 1 2 3 4 5 6] (sorry, not sure how to initialize this in code)
I'd like code something like this:
let int16_value = view.getUint16(1);
With the result being 513, or 1 + 256*2
I thought this might work, but the problem is it starts at the 2nd uint16 value, not at the 2nd byte
let view2 = new DataView(data);
let temp = view2.getUint16(1);
I'd like solutions for all the different conversions (uint16, uint32, int16, single, double, etc.)
My impression is that this is not possible without just manually writing the conversions, but I'm pretty new to JavaScript and wondering if there is some clever array slicing/copying solution where you could get something like this:
let temp = typecast(view,1,'uint16');
Where typecast takes in the byte array, the start (in bytes), and the data type to extract.
Bonus points if there is a nice solution that adds on number of elements to read with the default being 1 (as in above)
the problem is it starts at the 2nd uint16 value.
No, the offset is in bytes.
You can declare an offset (also in bytes) at the construction of your DataView
new DataView(buffer, offset);
but calling getUint16(0) on this one will do the same as calling getUint16(1) on yours.
But from your expected result (513) it seems you expected little-endian to be the default. The default is actually big-endian, and you need to pass true as the second argument of DataView.getUint16(offset, littleEndian).
// They all return the same value
const data = new Uint8Array([0, 1, 2, 3, 4, 5, 6]).buffer;
const fullView = new DataView(data);
console.log("offset # get", fullView.getUint16(1, true));
const offsetView = new DataView(data, 1);
console.log("offset # DataView", offsetView.getUint16(0, true));
// offset is made at the buffer level
// (just to show it's the same, don't actually do that)
const noOffsetData = new Uint8Array([1, 2, 3, 4, 5, 6]).buffer;
const noOffsetView = new DataView(noOffsetData);
console.log("offset # buffer", noOffsetView.getUint16(0, true));
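And if you still want the typecast(view, start, type) helper sketched in the question, it can be layered on DataView. The following is a rough sketch; the function name, the type strings, the element-count parameter, and the little-endian default are assumptions taken from the question, not an existing API:

function typecast(bytes, byteOffset, type, count = 1, littleEndian = true) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const readers = {
    int8:   [1, view.getInt8.bind(view)],
    uint8:  [1, view.getUint8.bind(view)],
    int16:  [2, view.getInt16.bind(view)],
    uint16: [2, view.getUint16.bind(view)],
    int32:  [4, view.getInt32.bind(view)],
    uint32: [4, view.getUint32.bind(view)],
    single: [4, view.getFloat32.bind(view)],
    double: [8, view.getFloat64.bind(view)],
  };
  const [size, read] = readers[type];
  const out = [];
  for (let i = 0; i < count; i++) {
    // The 8-bit getters simply ignore the endian flag.
    out.push(read(byteOffset + i * size, littleEndian));
  }
  return count === 1 ? out[0] : out;
}

const bytes = new Uint8Array([0, 1, 2, 3, 4, 5, 6]);
console.log(typecast(bytes, 1, "uint16"));    // 513
console.log(typecast(bytes, 1, "uint16", 2)); // [513, 1027]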

Efficient way to compute the median of an array of canvas in JavaScript

I have an array of N HTMLCanvasElements that come from N frames of a video, and I want to compute the "median canvas" in the sense that every component (r, g, b, opacity) of every pixel is the median of the corresponding component in all the canvases.
The video frames are 1280x720, so the pixel data for every canvas (obtained with canvas.getContext('2d').getImageData(0, 0, canvas.width, canvas.height).data) is a Uint8ClampedArray of length 3,686,400.
The naive way to compute the median is to:
prepare a result Uint8ClampedArray of length 3,686,400
prepare a temporary Uint8ClampedArray of length N
loop from 0 to 3,686,399
a) loop over the N canvases to fill the array
b) compute the median of the array
c) store the median to the result array
But it's very slow, even for 4 canvases.
Is there an efficient way (or existing code) to do that? My question is very similar to Find median of list of images, but I need to do this in JavaScript, not Python.
Note: for b), I use d3.median(), which doesn't work on typed arrays as far as I understand, so it implies converting to numbers, then converting back to Uint8Clamped.
Note 2: I don't know much about GLSL shaders, but maybe using the GPU would be a way to get faster results. It would require passing data from the CPU to the GPU though, which takes time if done repeatedly.
Note 3: the naive solution is here: https://observablehq.com/#severo/compute-the-approximate-median-image-of-a-video
You wrote
I use d3.median() which doesn't work on typed arrays…
Although that is not exactly true, it points in the right direction. Internally d3.median() uses the d3.quantile() method, which starts off like this:
export default function quantile(values, p, valueof) {
  values = Float64Array.from(numbers(values, valueof));
  // ...
As you can see, this does in fact make use of typed arrays; it is just not your Uint8ClampedArray but a Float64Array instead. Because floating-point arithmetic is much more computation-intensive than its integer counterpart (including the conversion itself), this has a dramatic effect on the performance of your code. Doing this some 3 million times in a tight loop kills the efficiency of your solution.
Since you are retrieving all your pixel values from a Uint8ClampedArray you can be sure that you are always dealing with integers, though. That said, it is fairly easy to build a custom function median(values) derived from d3.median() and d3.quantile():
function median(values) {
  // No conversion to floating point values needed.
  var n;
  if (!(n = values.length)) return;
  if (n < 2) return d3.min(values);
  var i = (n - 1) * 0.5,
      i0 = Math.floor(i),
      value0 = d3.max(d3.quickselect(values, i0).subarray(0, i0 + 1)),
      value1 = d3.min(values.subarray(i0 + 1));
  return value0 + (value1 - value0) * (i - i0);
}
On top of getting rid of the problematic conversion on the first line, this implementation applies some more micro-optimizations, because in your case you are always looking for the 2-quantile (i.e. the median). That might not seem like much at first, but done multiple million times in a loop it does make a difference.
With minimal changes to your own code you can call it like this:
// medianImageData.data[i] = d3.median(arr); // instead of this, use the line below:
medianImageData.data[i] = median(arr);
Have a look at my working fork of your Observable notebook.
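As a quick sanity check of the custom median (an added example, assuming d3 with d3-array's quickselect is loaded; note that quickselect partially reorders the input in place):

const samples = Uint8ClampedArray.from([12, 250, 3, 77, 77, 130]);
console.log(median(samples)); // 77
const odd = Uint8ClampedArray.from([9, 1, 5]);
console.log(median(odd)); // 5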

moving from for() to map() - can't get my head around it

Wondering if someone can help - I want to use Array.map and Array.filter, but I'm so stuck in my for-loop thinking that despite reading tutorials etc. I can't seem to get my head around this.
In this code, I have an Array of objects, I want to:
compare each item in the array with the other items, and ensure that obj[i] != obj[j]
perform operations on the current item: check if item.target is null, compare the distance between the item and another item, and if that distance is smaller than the distance between the item and item.target, then I want to replace item.target with that closer item.
code:
for (var i = 0; i < 111; i++) {
  var itm = { x: Math.random() * w, y: Math.random() * h, tgt: null };
  dotArr.push(itm);
}

function findTarget(itemA, itemB) {
  var x1 = itemA.x;
  var y1 = itemA.y;
  var x2 = itemB.x;
  var y2 = itemB.y;
  var distance = Math.sqrt((x2 -= x1) * x2 + (y2 -= y1) * y2);
  return distance;
}
for (var i = 0; i < dotArr.length; i++) {
  let itm = dotArr[i];
  for (var j = 0; j < dotArr.length; j++) {
    if (itm != dotArr[j]) {
      let itm2 = dotArr[j];
      if (itm.tgt == null) {
        itm.tgt = itm2;
      } else {
        let newDist = findTarget(itm, itm2);
        let curDist = findTarget(itm, itm.tgt);
        if (newDist < curDist) {
          itm.tgt = itm2;
        }
      }
    }
  }
}
All the 'multiply each value by 2' examples in the tutorials I read make sense, but I can't extrapolate that into an approach I can use here.
Expected results: I have a bunch of particles looping through a requestAnimationFrame() loop, checking the distances on each loop. Each particle finds the closest particle and sets it to 'tgt' (and then moves toward it in other code), updating every loop.
Summary
const distance = (a, b) =>
  Math.sqrt(Math.pow(b.x - a.x, 2) + Math.pow(b.y - a.y, 2))

const findClosest = (test, particles) => particles.reduce(
  ({val, dist}, particle) => {
    const d = distance(test, particle)
    return d < dist && d != 0 ? {val: particle, dist: d} : {val, dist}
  },
  {val: null, dist: Infinity}
).val

const addTargets = particles => particles.map(particle => {
  particle.tgt = findClosest(particle, particles)
  return particle
})
(This is hard to do in a snippet because of the cyclic nature of your data structure. JSON stringification doesn't work well with cycles.)
Change style for the right reason
You say you want to change from for-loops to map, filter, et al., but you don't say why. Make sure you're doing this for appropriate reasons. I am a strong advocate of functional programming, and I generally push junior developers I'm responsible for to make such changes. But I explain the reasons.
Here is the sort of explanation I make:
"When you're doing a loop, you're doing it for a reason. If you are looking to transform a list of values one-by-one into another list of values, then there is a built-in called map which makes your code clearer and simpler. When you're trying to check for those which should be kept, then you have filter, which makes your code clearer and simpler. When you want to find the first item in a list with a certain property, you have find, which, again, is clearer and simpler. And if you are trying to combine the elements until you've reduced them to a single value, you can use reduce, which, surprise, surprise, is cleaner and simpler.
"The reason to use these is to better express the intent of your code. Your intent is pretty well never going to be 'to continually increment the value of some counter starting with some value and ending when some condition is met, performing some routine on each iteration.' If you can use tools that better express your goals, then your code is easier to understand. So look for where map, filter, find, and reduce make sense in your code.
"Not every for-loop fits one of these patterns, but a large subset of them will. Replacing those that do fit will make for more understandable, and therefore more maintainable, code."
I will go on from there to explain the advantages of never worrying about fencepost errors and how some of these functions can work with more generic types, making it easier to reuse such code. But this is the basic gist I use with my teams.
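For concreteness, a minimal illustration of those four patterns:

const nums = [1, 2, 3, 4, 5]
const doubled = nums.map(n => n * 2)              // [2, 4, 6, 8, 10]
const evens = nums.filter(n => n % 2 === 0)       // [2, 4]
const firstBig = nums.find(n => n > 3)            // 4
const total = nums.reduce((sum, n) => sum + n, 0) // 15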
You need to decide why you're changing, and if it makes sense in your case. There is a real possibility, given your requirements, that it doesn't.
The functions map, find, and filter work only on individual items in your list. reduce works on one item and the currently accumulated value. It looks as though your requirement is to work pair-wise across all the values. That might mean that none of these functions is a good fit.
Or perhaps they do. Read on for how I would solve this.
Names are important
You include a function called findTarget. I would assume that such a function somehow or other finds a target. In fact, all it does is calculate the distance between two items.
Imagine coming to someone else's code and reading through the code that uses findTarget. Until you read that function, you will have no idea that it's simply calculating a distance. The code will seem strange. It will be much harder to understand than if you just named it distance.
Also, using item or the shortened version itm does not tell the reader anything about what these are. (Update: a change to the post points out that these are 'particles', so I will use that rather than itm in the code.)
Avoid trickiness
That findTarget/distance function does something strange, and somewhat difficult to follow. It modifies computation variables in the middle of the computation: (x2-=x1)*x2 and (y2-=y1)*y2. While I can see that this works out the same, it's easy to write a very clear distance function without this trickiness:
const distance = (a, b) =>
  Math.sqrt((b.x - a.x) * (b.x - a.x) + (b.y - a.y) * (b.y - a.y))
There are many variants of this that are just as clear.
const distance = (a, b) =>
  Math.sqrt(Math.pow(b.x - a.x, 2) + Math.pow(b.y - a.y, 2))
And with the exponentiation operator, standard since ES2016, we can do
const distance = (a, b) => Math.sqrt((b.x - a.x) ** 2 + (b.y - a.y) ** 2)
Any of these would make for much clearer code. You could also use intermediate variables such as dx/dy or deltaX/deltaY if that made it clearer to you.
Look carefully at your requirements
It took me far too long looking at your code to determine what precisely you were trying to do.
If you can break apart the pieces you need into named functions, it's often significantly easier to write, and it's generally much easier for someone else to understand (or even for yourself a few weeks later).
So, if I understand the problem correctly now, you have a list of positioned objects, and for each one of them you want to update them with a target, that being the object closest to them. That sounds very much like map.
Given that, I think the code should look something like:
const addTargets = particles => particles.map(item => ({
  x: item.x,
  y: item.y,
  tgt: findClosest(item, particles)
}))
Now I don't know how findClosest will work yet, but I expect that this matches the goal if only I could write that.
Note that this version takes seriously my belief in the functional programming concept of immutability. But it won't quite do what you want, because a particle's target will be the one from the old list and not one from its own list. I personally might look at altering the data structure to fix this. But instead, let's ease that restriction and rather than returning new items, we can update items in place.
const addTargets = particles => particles.map(particle => {
  particle.tgt = findClosest(particle, particles)
  return particle
})
So notice what we're doing here: we're turning a list of items without targets (or with null ones) into a list of items with them. But we break this into two parts: one converts the elements without the targets to ones with them; the second finds the appropriate target for a given element. This more clearly captures the requirements.
We still have to figure out how to find the appropriate target for an element. In the abstract, what we're doing is taking a list of elements and turning it into a single value. That's reduce. (This is not a find operation, since it has to check everything in the list.)
Let's write that, then:
const findClosest = (test, particles) => particles.reduce(
  ({val, dist}, particle) => {
    const d = distance(test, particle)
    return d < dist && d != 0 ? {val: particle, dist: d} : {val, dist}
  },
  {val: null, dist: Infinity}
).val
We use the distance for dual purposes here. First, of course, we're looking at how far apart two particles are. But second, we assume that another particle in the same exact location is the same particle. If that is not accurate, you'll have to alter this a bit.
At each iteration, we have a new object with val and dist properties, always representing the closest particle we've found so far and its distance from our current particle. At the end, we just return the val property. (The reason for the Infinity seed is that every particle will be closer than that, so we don't need specific logic to test the first one.)
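A quick usage example (added here for illustration), with the particle shape from the question:

const particles = [
  { x: 0, y: 0, tgt: null },
  { x: 3, y: 4, tgt: null },
  { x: 1, y: 1, tgt: null },
]
addTargets(particles)
console.log(particles[0].tgt === particles[2]) // true: (1, 1) is closest to (0, 0)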
Conclusion
In the end we were able to use map and reduce. Note that in this example we have two reusable helper functions, but each is used just once. If you don't need to reuse them, you could fold them into the functions that call them. But I would not recommend it. This code is fairly readable. Folded in, these would be less expressive.
dotArr.map(itemI => {
  const closestTarget = dotArr.reduce((currentMax, itemJ) => {
    if (currentMax === null) {
      const targetDistance = findTarget(itemI, itemJ);
      if (targetDistance !== 0) { // distance 0 means itemJ is itemI itself
        return { item: itemJ, distance: targetDistance };
      }
      return null;
    }
    const newDistance = findTarget(itemI, itemJ);
    // Skip itemJ when the distance is 0, because that is itemI itself
    if (newDistance !== 0 && newDistance < currentMax.distance) {
      return { item: itemJ, distance: newDistance };
    }
    return currentMax;
  }, null);
  itemI.tgt = closestTarget.item;
  return itemI;
});
After constructing this example, I found that you are using a very complex example to figure out how map works.
Array.map is typically used for one value at a time, so we can use it for [i]. Then we need to iterate over all the other values in the array using [j], but we can't do that with map, because we only care about the closest [j]. For that we can use Array.reduce, which is also an accumulator like Array.map, but its end result is whatever you want it to be, while the end result of Array.map is always an array of the same length.
What my reduce function does is iterate through the entire list, similar to [j]. I initialize currentMax as null, so when j == 0 then currentMax === null; then I figure out what state [j] is in compared to [i]. The return statement is what currentMax will be equal to at [j+1].
When I have finally found the closest target, I can just assign it to itemI.tgt, and I have to return it so that the new map knows what the item looks like at the current index.
Without looking at the Array.map source, this is how I imagine it is implemented:
function myMap(inputArray, callback) {
  const newArray = [];
  for (let i = 0; i < inputArray.length; i++) {
    newArray.push(callback(inputArray[i], i, inputArray));
  }
  return newArray;
}
So this is why you always need to write return.
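For example, the hand-rolled version behaves just like the built-in:

console.log(myMap([1, 2, 3], n => n * 2)); // [2, 4, 6]
console.log([1, 2, 3].map(n => n * 2));    // [2, 4, 6]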
I think in this instance you want to use reduce and NOT map.
reduce can allow you to "Reduce" an array of items to a single item.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce
example
let largestThing = arrayOfThings.reduce(function (largest, nextItem) {
  if (largest == null) {
    return nextItem;
  }
  if (largest.prop > nextItem.prop) {
    return largest;
  }
  return nextItem;
}, null);
The null passed as the second argument to reduce is the starting "largest" value seen by the callback.
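A concrete run showing the seed in action (sample data assumed):

const things = [{ prop: 3 }, { prop: 9 }, { prop: 5 }];
const largest = things.reduce(function (largest, nextItem) {
  if (largest == null) return nextItem;
  return largest.prop > nextItem.prop ? largest : nextItem;
}, null);
console.log(largest); // { prop: 9 }
// With an empty array the callback never runs, and the seed comes back as-is:
console.log([].reduce(() => 1, null)); // null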

More efficient way to copy repeating sequence into TypedArray?

I have a source Float32Array that I create a secondary Float32Array from. I have a sequence of values, model, that I want to copy as a repeating sequence into the secondary Float32Array. I am currently doing this using a reverse while loop.
sequence = [1, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 0];
n = 3179520; // divisible by sequence length
modelBuffs = new Float32Array(n);

var v = modelBuffs.length;
while (v) { // decrement at the end so the top chunk (n-12 .. n-1) is filled too
  // XTransform
  modelBuffs[v - 12] = sequence[0];
  modelBuffs[v - 11] = sequence[1];
  modelBuffs[v - 10] = sequence[2];
  modelBuffs[v - 9] = sequence[3];
  // YTransform
  modelBuffs[v - 8] = sequence[4];
  modelBuffs[v - 7] = sequence[5];
  modelBuffs[v - 6] = sequence[6];
  modelBuffs[v - 5] = sequence[7];
  // ZTransform
  modelBuffs[v - 4] = sequence[8];
  modelBuffs[v - 3] = sequence[9];
  modelBuffs[v - 2] = sequence[10];
  modelBuffs[v - 1] = sequence[11];
  v -= 12;
}
Unfortunately, n can be unknown. I may have to do a significant refactor if there is no alternative solution. I am hoping that I can set the sequence once and then use a copy-in-place / repeating-fill / bitwise operation to repeat the initial byte sequence.
Edit: simplified the example input.
A fast method to fill an array with a repeated sequence is to double the length of the filled region on each iteration using the copyWithin() method of the typed array. You could use set() as well by creating a different view of the same underlying ArrayBuffer, but it's simpler to use the former for this purpose.
Using for example 1234 as the source, the initial iteration fills 1:1, or 4 indices in this case:
1234
From there we use the destination buffer as the source for the remaining fill, so the second iteration fills 8 indices:
12341234
Third iteration fills 16 indices:
1234123412341234
Fourth iteration fills 32 indices:
12341234123412341234123412341234
and so forth.
If the last segment length doesn't land on a power of 2, you can simply take the difference between the current position and the remaining length of the buffer and use that for the last iteration.
var srcBuffer = new Uint8Array([1, 2, 3, 4]), // any view type will do
    dstBuffer = new Uint8Array(1 << 14),      // 16 kb
    len = dstBuffer.length, // important: use indices length, not byte-length
    sLen = srcBuffer.length,
    p = sLen;               // set initial position = source sequence length

var startTime = performance.now();

// step 1: copy source sequence to the beginning of dest. array
// todo: dest. buffer might be smaller than source. Check for this here.
dstBuffer.set(srcBuffer);

// step 2: copy existing data, doubling segment length per iteration
while (p < len) {
  if (p + sLen > len) sLen = len - p; // if not power of 2, truncate last segment
  dstBuffer.copyWithin(p, 0, sLen);   // internal copy
  p += sLen;                          // add current length to offset
  sLen <<= 1;                         // double length for next segment
}

var time = performance.now() - startTime;
console.log("done", time + "ms");
console.log(dstBuffer);
If the array is very long it will obviously take some time regardless. In those cases you could consider using a Web Worker with the new SharedArrayBuffer, so that the copying happens in a different process and you don't have to copy or transfer the data back and forth. The gain is merely that the main thread is not blocked, since the internals of copyWithin() are already close to optimal for this purpose. The cons are the async aspect combined with the overhead of the event system (i.e. whether it's worthwhile depends on the case).
A different approach is to use WebAssembly, where you write the buffer-fill code in C/C++, compile it, and expose methods that take source and destination buffers, then call those from JavaScript. I don't have an example for this case.
In both of these latter cases you will run into compatibility issues with somewhat older browsers.

Performance between multidimensional array or arrays of objects in JavaScript

I have to load a good chunk of data from my API, and I can choose the format in which I get the data. My question is about performance: I want to choose the format that is fastest to load from a query and fastest to read in JavaScript.
I can have a two-dimensional array:
[0][0] = true;
[0][1] = false;
[1][2] = true;
[...]
etc etc..
Or I can have an array of objects:
[
{ x: 0, y: 0, data: true},
{ x: 0, y: 1, data: false},
{ x: 1, y: 2, data: true},
[...]
etc etc..
]
I couldn't find any benchmark comparing these for a GET request with a huge amount of data. If there is anything anywhere, I would love to read it!
The second part of the question is to read the data. I will have a loop that will need to get the value for each coordinate.
I assume looking up a coordinate directly in a two-dimensional array would be faster than searching through the objects on each loop iteration. Or maybe I am wrong?
Which of the two formats would be the fastest to load and read?
Thanks.
For the first part of your question regarding the GET request, I imagine the array would be slightly quicker to load, but depending on your data it could very well be negligible. I'm basing that on the fact that, with whitespace removed, the example data you have for each member of the array is 12 bytes, while the example data for the similar object is 20 bytes. If that were true of your actual data, there would theoretically be only 3/5 as much data to transfer, but unless you're getting a lot of data it's probably not going to make a noticeable difference.
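One way to sanity-check the transfer-size difference for your own data is to serialize a sample of each shape and compare lengths (an illustrative check; real sizes depend on your actual values):

const asArrays = JSON.stringify([[true, false], [false, true]]);
const asObjects = JSON.stringify([
  { x: 0, y: 0, data: true },
  { x: 0, y: 1, data: false },
  { x: 1, y: 0, data: false },
  { x: 1, y: 1, data: true },
]);
console.log(asArrays.length, asObjects.length); // compare payload sizes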
To answer the second part of your question: the performance of any code is going to depend significantly on the details of your specific use case. For most situations, I think the most important point is:
Objects are significantly more readable and user-friendly
That said, when performance/speed is an issue and/or a high priority, which it sounds like could be the case for you, there are definitely things to consider. While it relates to writing data instead of reading it, I found a good comparison of the performance of arrays vs objects that brought up some interesting points. In running those tests multiple times using Chrome 45.0.2454.101 32-bit on Windows 7 64-bit, I found these points to generally be true:
Arrays will always be close to the fastest, if not the fastest
If the length of the object is known/can be hard-coded, it's possible to make their performance close to, and sometimes better than, that of arrays
In the test linked above, this code using objects ran at 225 ops/sec in one of my tests:
var sum = 0;
for (var x in obj) {
  sum += obj[x].payload;
}
Compared to this code using arrays that ran at 13,620 ops/sec in the same test:
var sum = 0;
for (var x = 0; x < arr.length; ++x) {
  sum += arr[x].payload;
}
Important to note, however, is that this code using objects with a hard coded length ran at 14,698 ops/sec in the same test, beating each of the above:
var sum = 0;
for (var x = 0; x < 10000; ++x) {
  sum += obj[x].payload;
}
All of that said, which option performs best probably depends on your specific use case, but hopefully this gives you some things to consider.
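If you want to test the read path yourself, here is a minimal sketch: direct 2D-array indexing is O(1), while finding a coordinate in the object list is a linear scan unless you build an index first.

const grid = [[true, false], [false, true]];
const objects = [
  { x: 0, y: 0, data: true },
  { x: 0, y: 1, data: false },
  { x: 1, y: 0, data: false },
  { x: 1, y: 1, data: true },
];

// O(1): index straight into the nested array.
const fromGrid = grid[1][1];

// O(n): scan the list for matching coordinates.
const fromObjects = objects.find(o => o.x === 1 && o.y === 1).data;

console.log(fromGrid, fromObjects); // true true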
