Let's say I have a C API as follows:
void get_result_buffer(context* ctx, void** result, size_t* result_size);
Where context is some arbitrary opaque context type holding state. The intended way to call this is
context* ctx = ...;
do_something_with_context(ctx, ...);
void* result_buffer = 0;
size_t result_buffer_size = 0;
get_result_buffer(ctx, &result_buffer, &result_buffer_size);
/* Now result_buffer and result_buffer_size are meaningful and populated with the results of having called `do_something_with_context`. */
The result_buffer is owned by the context object, so the caller doesn't need to free it. Now I'd like to be able to call get_result_buffer from Emscripten. Setting up cwrap for this is easy enough; it looks something like:
wrap_get_result_buffer = something.cwrap(
'get_result_buffer',
null,
['number', 'number', 'number']
)
But I'm unclear how I can set things up so that the out parameters "work" in JS. Ideally, at the end, I'd have something that looks like a byte buffer containing a copy of the data pointed to by the result out parameter, with a length as described by the result_size out parameter.
It seems that the values that I pass in need to be allocated somehow, and then I would pass the resulting allocation handle in as the number type parameters, but I have no idea how to do that in the JS/Emscripten layer. Similarly, after the call, I'd expect that those values have now been updated by the transpiled C code, but I'm unclear on how to extract the now populated data into some sort of JS byte array.
Any guidance on how to do this or pointers to example code?
OK. I figured this out. For future emscripteners, you want to do something like the following.
var out_data_ptr = Module._malloc(8)
var out_data_array = new Uint32Array(Module.HEAPU32.buffer, out_data_ptr, 2)
wrap_get_result_buffer(context, out_data_ptr, out_data_ptr + 4)
var response_uint8_array = new Uint8Array(Module.HEAPU8.buffer, out_data_array[0], out_data_array[1])
Module._free(out_data_ptr)
The theory of operation here is that we allocate 8 bytes on the Emscripten heap to hold the two 'slots' (a 32-bit pointer and a 32-bit size) that get_result_buffer fills in, and then construct a Uint32Array view over them so that both can be read as numbers. We then pass the addresses of those slots to our get_result_buffer function as lifted above with cwrap. After the call, the heap memory that the context refers to is reachable from those slots, which can then be used to construct a Uint8Array that provides JS-level access to the bytes in the result. Note that this Uint8Array is a view onto the Emscripten heap, not a copy.
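If a copy of the bytes is wanted rather than a view onto the heap (which is what the question asks for), a sketch along the following lines may help. The helper name copyResultBuffer and its Module and getResultBuffer parameters are hypothetical. It assumes a 32-bit build where a pointer and a size_t each occupy 4 bytes, and a build that exposes HEAPU8, HEAPU32, _malloc and _free on the module object.

```javascript
// Hypothetical helper, not part of Emscripten.
// Module: the Emscripten module object; getResultBuffer: the cwrap'd function.
// Assumes 4-byte pointers and 4-byte size_t (a 32-bit wasm/asm.js build).
function copyResultBuffer(Module, getResultBuffer, ctx) {
  const outPtr = Module._malloc(8); // slot 0: void*, slot 1: size_t
  try {
    getResultBuffer(ctx, outPtr, outPtr + 4);
    // Index the heap views only after the call, in case the heap was replaced.
    const dataPtr = Module.HEAPU32[outPtr >> 2];
    const size = Module.HEAPU32[(outPtr >> 2) + 1];
    // TypedArray.prototype.slice copies the bytes out of the heap.
    return Module.HEAPU8.slice(dataPtr, dataPtr + size);
  } finally {
    Module._free(outPtr); // frees only the two slots, not the context's buffer
  }
}
```

Because the returned Uint8Array owns its own storage, it should stay valid even after the context reuses or releases its result buffer.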
I would compare what I am doing to what JavaScript runtimes already do, except that I'm doing it in JavaScript and Wasm. JavaScript implementations store JavaScript objects and values in actual computer heap memory, yet operations such as reading or writing out of bounds don't actually touch that memory (e.g., on a typed array an out-of-bounds read returns undefined and an out-of-bounds write is a no-op).
I'll give an example of my specific situation:
Let's say that I have an array buffer of 1000 bytes, we'll name the variable memory.
I want to split apart the buffer specifically into Int32Arrays of size 4. Each partition from the ArrayBuffer must do two things:
a) Refer to the original buffer (so that, when the original data is manipulated, the partition will update its values automatically)
b) Not expose the original buffer (as the partition could then be used to corrupt the other partitions)
I have a function that determines which section is available for usage, we'll call it findPartition. It returns an integer acting as a pointer to a set of available bytes. (like C's malloc)
Each partition is expected to always remain the same type, that is, they will always be Int32Arrays if they start as an Int32Array, and their size will always be constant.
The script operating on the partition may both write to and read from its partitioned array.
Originally, I was thinking that I could just call the Int32Array constructor on my array buffer, simply like so:
const createPartition = () => new Int32Array( memory, findPartition(), 4 );
The problem is that the buffer is exposed, so I thought I could just delete the buffer property.
But... the buffer property is readonly, so delete fails when used on the array.
I then thought that I could make a class to do this:
class Partition {
#source = new Int32Array( memory, findPartition(), 4 );
get 0() { return this.#source[0]; }
set 0(x) { this.#source[0] = x; }
get 1() { return this.#source[1]; }
set 1(x) { this.#source[1] = x; }
get 2() { return this.#source[2]; }
set 2(x) { this.#source[2] = x; }
get 3() { return this.#source[3]; }
set 3(x) { this.#source[3] = x; }
get length() { return 4; }
};
Well, that works, but it's much more verbose and thus harder to maintain later. Also, because the partitions have to go through getters and setters instead of accessing the indexed values directly, I feel that performance could be lost.
Ideally, the Int32Array.prototype is also on the object, so I would have to wrap everything, which would be annoying and unmaintainable. If the spec updates the methods of the prototype, then I would have to update the wrappers too.
Does anyone have a better way to segment the array buffer, while maintaining safety between the segments?
The simplest way is to extend the chosen typed array like this:
// Seal and freeze hidden Object (TypedArray.prototype) that has methods
// that can leak original buffer if attacked using defineProperty tricks
// Since we can't directly access hidden Object on which `subarray`
// and many other methods defined we use this workaround
// Paranoid: More checks required to make sure that: `subarray` method; 'byteOffset',
// 'byteLength', 'buffer' getters; are not modified beforehand
Object.seal(Int8Array.__proto__.prototype);
Object.freeze(Int8Array.__proto__.prototype);
class customUint32Array extends Uint32Array {
get buffer(){
// copy! viewed array buffer segment
// test if `super.` is faster/slower than `this.` access
return super.buffer.slice(this.byteOffset, this.byteOffset + this.byteLength);
// return super.buffer.slice(super.byteOffset, super.byteOffset + super.byteLength);
}
}
// var customUint32ArrayOverWholeBufferCached = new customUint32Array(memory);
function Partition(){
// test performance of `new` vs `customUint32ArrayOverWholeBufferCached.subarray`
// for fastest array buffer view creation
return new customUint32Array(memory, findPartitionByteOffset(), 4);
// subarray takes (begin, end), so the end index is begin + 4:
// var i = findPartitionIndex(); return customUint32ArrayOverWholeBufferCached.subarray(i, i + 4);
}
By the way, 'private' class properties in most JS environments are exposed like any other property and will leak the original buffer.
Prototype chains forged manually instead of class X extends Y are welcome in comments.
If one passes my original-buffer leak tests, I'll include it here.
The current instance's prototype chain looks something like: customUint32Array.Uint32Array.TypedArray.prototype.Object
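One leak test that might be worth running against a class like the one above is sketched here. It checks what the overridden buffer getter returns, and then what the getter inherited from TypedArray.prototype returns when called directly on the instance. The class name CopyOnBufferArray and the byte offset 16 are made up for this sketch; the check suggests that overriding buffer on the subclass may not be enough on its own.

```javascript
const memory = new ArrayBuffer(1000);

// Same idea as the subclass above: hand out a copy instead of the real buffer.
class CopyOnBufferArray extends Uint32Array {
  get buffer() {
    return super.buffer.slice(this.byteOffset, this.byteOffset + this.byteLength);
  }
}

const partition = new CopyOnBufferArray(memory, 16, 4);

// Ordinary property access goes through the override and yields a 16-byte copy.
const copied = partition.buffer;

// The original getter still lives on the hidden TypedArray.prototype object
// and can be invoked on any typed array instance, including the subclass.
const typedArrayProto = Object.getPrototypeOf(Uint32Array.prototype);
const inheritedGetter = Object.getOwnPropertyDescriptor(typedArrayProto, "buffer").get;
const leaked = inheritedGetter.call(partition);

console.log(copied === memory); // false
console.log(leaked === memory); // true
```

Freezing the prototype prevents the getter from being redefined, but it does not stop it from being read and called this way.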
I'm still learning JS. In some other languages, you can pass variables byref and then modify them elsewhere in code.
In an attempt to avoid having lots of duplicate code, I have structured a series of callbacks and parsing like so:
class MarketData {
constructor() {
//Arrays
this.OneMinuteData = [];
this.ThreeMinuteData = [];
this.initializeData();
}
initializeData() {
var client = new Client();
this._initializeData(60, client, this.OneMinuteData);
this._initializeData(180, client, this.ThreeMinuteData);
}
_initializeData(granularity, client, dataStore) {
client.GetRates({ granularity: granularity }, function(err, msg, data) {
var items = data.map(item => ({
///data mapped here
}));
dataStore = dataStore.concat(items);
});
}
}
So essentially I have this 'private' _initializeData function with the hopes of passing in an array and having it add to the array, but since JS passes byval, I cannot achieve the desired effect (e.g. this.OneMinuteData array is not modified).
Because of this, the only way I currently know how to work around this problem is to essentially have the same function copy-pasted for each individual array, which I find incredibly sloppy. Is there a better way of doing this?
but since JS passes byval, I cannot achieve the desired effect (e.g. this.OneMinuteData array is not modified).
While JavaScript does pass by value, that value when dealing with an object (including any array) is a reference.
See the documentation for concat:
The concat() method is used to merge two or more arrays. This method does not change the existing arrays, but instead returns a new array.
So when you say dataStore = dataStore.concat(items);, you assign a new array to the local dataStore variable and discard the old one.
Outside the function, the original array is unchanged.
The reason the array assigned to OneMinuteData is not modified is that you never modify any array.
Push the values of items into dataStore instead.
dataStore.push.apply(dataStore, items);
NB: GetRates has the signature of an asynchronous function, so make sure you don't try to inspect the modifications to OneMinuteData before they are made.
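The difference between reassigning the parameter and mutating the array it refers to can be seen in a small synchronous sketch. The function names appendWithConcat and appendWithPush are made up for illustration, and the spread form of push is used in place of push.apply.

```javascript
// Rebinds the local parameter to a new array; the caller's array is untouched.
function appendWithConcat(dataStore, items) {
  dataStore = dataStore.concat(items);
}

// Mutates the array that both the caller and the parameter refer to.
function appendWithPush(dataStore, items) {
  dataStore.push(...items);
}

const oneMinuteData = [1];

appendWithConcat(oneMinuteData, [2, 3]);
const afterConcat = oneMinuteData.slice(); // snapshot: still [1]

appendWithPush(oneMinuteData, [2, 3]);
const afterPush = oneMinuteData.slice(); // snapshot: [1, 2, 3]

console.log(afterConcat, afterPush);
```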
I have read that transferable objects can be transferred really fast using a web worker's postMessage. According to this, transferable objects are either ArrayBuffer or MessagePort.
The question is, how do I convert, say, an arbitrary large object (30 MB) to a transferable object and pass it as an argument to postMessage? From what I understand, I can convert my array to a JSON string, convert the JSON string to raw byte data, and store that inside an array object. However, this seems to defeat the purpose of fast transferring.
Could someone enlighten me on how to pass an object as a transferable object, or whether it's even possible?
Thanks in advance!
This misconception is quite recurring here. You're imagining that it's possible to write some fast javascript code to convert your large object into transferable. But indeed, any conversion code you write defeats the purpose, just as you said. And the more complex data, the more speed you lose.
Objects are normally (when not transferring) converted by the native structured clone algorithm (which uses an implementation-defined format and is surely optimal). Any JavaScript code you write will most likely be slower than structured clone while achieving the same goal: transferring data as binary.
The purpose of transferable objects is to allow transfer for binary data, such as images (from canvas), audio or video. These kinds of data can be transferred without being processed by structured clone algorithm, which is why transferable interface was added. And the effect is insignificant even for these - see an answer about transferable speed.
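What "transfer" means for an ArrayBuffer can be observed without a worker, since structuredClone accepts the same kind of transfer list as postMessage. This is only a sketch and assumes a runtime that provides the global structuredClone (recent browsers, or Node 17 and later).

```javascript
const original = new ArrayBuffer(8);
new Uint8Array(original)[0] = 42;

// Listing the buffer in `transfer` moves its storage instead of copying it.
const moved = structuredClone(original, { transfer: [original] });

// The source buffer is detached after the transfer and reports zero length.
const lengthAfterTransfer = original.byteLength;
const firstByte = new Uint8Array(moved)[0];

console.log(lengthAfterTransfer, moved.byteLength, firstByte); // 0 8 42
```

The sender loses access to the data, which is why transferring only pays off for data that already lives in an ArrayBuffer.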
As a last note, I wrote a prototype-based library that converts JavaScript objects to ArrayBuffer and back. It's slower, especially for JSON-like data. Its advantages (and the advantages of any similar code you write) are:
You can define custom object conversions
You can use inheritance (eg. sending your own types, like Foo)
Code to transfer JSON like object
If your data is like JSON, just stick to structured clone and do not transfer. If you don't trust me, test it with this code. You will see it's slower than normal postMessage.
var object = {dd:"ddd", sub:{xx:"dd"}, num:666};
var string = JSON.stringify(object);
var uint8_array = new TextEncoder().encode(string); // TextEncoder always produces UTF-8
var array_buffer = uint8_array.buffer;
// now transfer array buffer
worker.postMessage(array_buffer, [array_buffer])
The opposite conversion, considering you have some ArrayBuffer:
// Let me just generate some array buffer for the simulation
var array_buffer = new Uint8Array([123,34,100,100,34,58,34,100,100,100,34,44,34,115,117,98,34,58,123,34,120,120,34,58,34,100,100,34,125,44,34,110,117,109,34,58,54,54,54,125]).buffer;
// Now to the decoding
var decoder = new TextDecoder("utf-8");
var view = new DataView(array_buffer, 0, array_buffer.byteLength);
var string = decoder.decode(view);
var object = JSON.parse(string);
Should have looked up Tomas's answer earlier.
Proof, although not specifically the way Tomas suggested.
Version A
Version B
I manually converted a stringified JSON object to a Uint8Array like so:
function stringToUintArray(message) {
var encoded = self.btoa(message);
var uintArray = Array.prototype.slice.call(encoded).map(ch => ch.charCodeAt(0));
var uarray = new Uint8Array(uintArray);
return uarray;
}
and transferred it like so from the web worker to main thread:
console.time('generate');
var result = generate(params.low, params.high, params.interval, params.size);
var uarr = stringToUintArray(JSON.stringify(result));
console.timeEnd('generate');
self.postMessage(uarr.buffer, [uarr.buffer]);
and on the main thread I did something like this:
var uarr = new Uint8Array(e.data);
var json = UintArrayToString(uarr);
var result = JSON.parse(json);
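UintArrayToString is not shown in the post. A plausible inverse of stringToUintArray, written here purely as an assumption about what it did, would turn the bytes back into the base64 text and decode it with atob. Both functions are repeated so the sketch is self-contained; note that btoa throws for characters outside Latin-1, so this round trip only suits JSON that stays within that range.

```javascript
// Base64-encode the message and store one character code per byte.
function stringToUintArray(message) {
  const encoded = btoa(message);
  return Uint8Array.from(encoded, ch => ch.charCodeAt(0));
}

// Hypothetical reconstruction of the missing helper: bytes -> base64 -> text.
function UintArrayToString(uarr) {
  let encoded = "";
  for (const code of uarr) {
    encoded += String.fromCharCode(code);
  }
  return atob(encoded);
}

const payload = { low: 1, high: 5, label: "demo" };
const bytes = stringToUintArray(JSON.stringify(payload));
const roundTripped = JSON.parse(UintArrayToString(bytes));

console.log(roundTripped.label); // "demo"
```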
In FORTRAN and C++, the address of a specific array element can be passed into a function. For example, in the main routine, WA1 is a work array of size 25 and offset is an integer variable that indicates the offset from the 0-index. Say offset is presently 6.
The declaration of the sub-routine might look like the following:
void Array_Manip1(double* WorkArray1){
. . .
When the sub-routine is called in the main program, the call might look like this:
Array_Manip1(&WA1[offset]);
By doing this, I can index WorkArray1 within the sub-routine starting at the 0-index, but knowing it is actually WA1[6].
e.g. -
for (int i = 0; i < 19; ++i)
WorkArray1[i] = whatever computation is required.
To do this in Javascript, I suppose the full array could be passed in to the sub-routine, plus one more variable to hold the offset. And then within the sub-routine, the offset would have to be added to the array index value.
e.g. -
for (let i = 0; i < 19; ++i) {
WorkArray1[offset + i] = 0; // whatever computation is required
}
But now I am passing one more variable into the sub-routine, and have to add the offset to the array index each time through the loop.
Is there a better way to do this in Javascript?
Is there a way to imitate C++'s ability to pass the address of a specific array element into a function?
The cleanest way would be to slice the array and pass in a subarray from the current index on. That way you still pass a single argument and everything stays clean, but note that slice returns a copy, so writes made inside the function do not reach the original array.
But no, arrays in most higher-level languages do not allow you to reference a single element and then get back to the array. It is dangerous for a number of reasons in those kinds of languages, where the underlying data may not even be stored contiguously. JavaScript is no exception: you can pass in an array and an index, or a subarray, but you can't pass in a reference to an element in the array and get back to the array after passing it in.
Tim's answer is correct. I just want to add something about the C-like typed arrays: they can be created as a view into an ArrayBuffer, in which case you could create a new view of the same buffer as the larger array but starting at an offset, and pass that, without duplicating the underlying data. Closest you can get to your pointers.
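As a sketch of the typed-array approach, using the names from the question (WA1, offset, and a camel-cased arrayManip1) and a made-up computation: subarray returns a new view over the same ArrayBuffer starting at the given element, so the function can index from 0 while writing into the original storage.

```javascript
const WA1 = new Float64Array(25); // work array, zero-filled
const offset = 6;

// Indexes its argument from 0, like the C++ routine taking double*.
function arrayManip1(workArray1) {
  for (let i = 0; i < workArray1.length; ++i) {
    workArray1[i] = i + 1; // placeholder computation
  }
}

// A view of elements 6..24 of WA1; no data is copied.
const view = WA1.subarray(offset);
arrayManip1(view);

console.log(view.length, WA1[5], WA1[6], WA1[24]); // 19 0 1 19
```

This only works for numeric data that fits a typed array; an ordinary Array has no equivalent of a shared view.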
You can sort of do what you want. And in fact, sometimes javascript can only do what you want. It all depends on what data the array contains.
In JavaScript, the content of a variable may be either a value or a reference (a pointer, but without pointer arithmetic). But you have no choice in the matter. Numbers and strings are always values (there are exceptions, but none of them apply when passing function arguments) and everything else is always a reference.
So to get the behavior you want, simply use an object or array as your value holder instead of a string or number:
var ref_array = [ {value:1}, {value:2}, {value:3} ];
function increment (v_obj) {
v_obj.value ++;
}
var ref = ref_array[1];
increment(ref);
// ref_array will now contain: [{value:1},{value:3},{value:3}]
It's not that simple though. While the object appears to be passed by reference, the reference is however copied when the function is called. What this means is that ref and ref_array[1] and v_obj are three separate variables that point to the same thing.
For example, this wouldn't work:
function replace (obj1, obj2) {
obj1 = obj2;
}
replace(ref_array[1], {value:9});
// ref_array is still: [{value:1},{value:3},{value:3}]
That's because, while obj1 in the function above points to the same object as ref_array[1], it is not really a pointer to ref_array[1] but a separate variable. In C, this would be something like obj1 = &ref_array[1]. So passing an argument passes a copy of the pointer but not the pointer itself.
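If the goal is to replace the element itself, one workaround is to pass the container and the index, so the assignment targets the array slot rather than a local variable. The function name replaceAt is made up for this sketch.

```javascript
const refArray = [{ value: 1 }, { value: 2 }, { value: 3 }];

// Assigning to container[index] changes the slot the caller can see.
function replaceAt(container, index, newObj) {
  container[index] = newObj;
}

replaceAt(refArray, 1, { value: 9 });

console.log(refArray.map(o => o.value)); // [1, 9, 3]
```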
I have C++ code like this:
extern "C" {
void MyCoolFunction (int** values)
{
int howManyValuesNeeded = 5;
*values = new int[howManyValuesNeeded];
for (int i = 0; i < howManyValuesNeeded; i++) {
(*values)[i] = i;
}
}
}
From C++ it can be used like this:
int *values = NULL;
MyCoolFunction (&values);
// do something with the values
delete[] values;
Of course the real code is much more complicated, but the point is that the function allocates an int array inside, and it decides what the array size will be.
I translated this code with Emscripten, but I don't know how I could access the array allocated inside the function from JavaScript. (I already know how to use exported functions and pointer parameters with Emscripten-generated code, but I don't know how to solve this problem.)
Any ideas?
In Emscripten, memory is stored as a giant array of integers, and pointers are just indexes into that array. Thus, you can pass pointers back and forth between C++ and Javascript just like you do integers. (It sounds like you know how to pass values around, but if not, go here.)
Okay. Now if you create a pointer on the C++ side (as in your code above) and pass it over to Javascript, Emscripten comes with a handful of helper functions to allow you to access that memory. Specifically setValue and getValue.
Thus, if you passed your values variable into JS and you wanted to access index 3, you would be able to do so with something like:
var value3 = getValue(values+(3*4), 'i32');
Where you have to add the index times the number of bytes per element (3*4) to the pointer, and indicate the type (in this case 32-bit ints) of the array.
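For the int** out parameter in the question, the JavaScript side also has to provide a slot for the pointer that MyCoolFunction writes. A rough sketch follows; the helper name readIntsFromOutParam and its parameters are hypothetical. It assumes a 32-bit build that exposes HEAP32, HEAPU32, _malloc and _free on the module object, that myCoolFunction is the exported function taking one pointer argument, and that the caller already knows how many ints were written (5 in the question), since the C++ function does not report it.

```javascript
// Hypothetical helper, not part of Emscripten.
function readIntsFromOutParam(Module, myCoolFunction, count) {
  const outPtr = Module._malloc(4); // room for one 32-bit pointer (the int*)
  try {
    myCoolFunction(outPtr); // C++ side does: *values = new int[...]
    const valuesPtr = Module.HEAPU32[outPtr >> 2];
    const values = [];
    for (let i = 0; i < count; i++) {
      // Same address arithmetic as getValue(valuesPtr + i * 4, 'i32').
      values.push(Module.HEAP32[(valuesPtr + i * 4) >> 2]);
    }
    // valuesPtr is returned so the array can later be released on the C++ side.
    return { valuesPtr, values };
  } finally {
    Module._free(outPtr); // frees only the pointer slot
  }
}
```

The array behind valuesPtr remains allocated after this call and still has to be released through an exported function.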
You can call delete[] from JavaScript by wrapping it inside another exported function.
extern "C" { ...
void MyCoolFunction (int** values);
void finish_with_result(int*);
}
void finish_with_result(int *values) {
delete[] values;
}
Alternatively you may also directly do this on JavaScript side: Module._free(Module.HEAPU32[values_offset/4]) (or something like that; code not tested).