Struct operations in JavaScript through Emscripten

I am having quite a lot of problems getting Emscripten to interoperate between C and JavaScript.
More specifically, I am having trouble accessing, from JavaScript, a struct created in C, where the pointer to the struct is passed to a JavaScript function defined in an external library.
Take a look at the following code:
C:
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
struct test_st;
extern void read_struct(struct test_st *mys, int siz);
struct test_st{
    uint32_t my_number;
    uint8_t my_char_array[32];
};
int main(){
    struct test_st *teststr = malloc(sizeof(struct test_st));
    teststr->my_number = 500;
    for(int i = 0; i < 32; i++){
        teststr->my_char_array[i] = 120 + i;
    }
    for(int i = 0; i < 32; i++){
        printf("%d\n", teststr->my_char_array[i]);
    }
    read_struct(teststr, sizeof(struct test_st));
    return 0;
}
JavaScript:
mergeInto(LibraryManager.library, {
    read_struct: function(mys, siz){
        var read_ptr = 0;
        console.log("my_number: " + getValue(mys + read_ptr, 'i32'));
        read_ptr += 4;
        for(var i = 0; i < 32; i++){
            console.log("my_char[" + i + "]: " + getValue(mys + read_ptr, 'i8'));
            read_ptr += 1;
        }
    },
});
This is then compiled using emcc cfile.c --js-library jsfile.js.
The issue here is that you can't read a struct directly in JavaScript; you have to read memory at the appropriate offsets according to the size of each struct field (so 4 bytes for the uint32_t and 1 byte for each uint8_t). That by itself wouldn't be a problem, except that getValue also requires an LLVM IR type, and there are no unsigned types: in the case of the array, the values reach 127 and then wrap around to -128, whereas the intended behaviour is to keep counting up, since the field is unsigned.
I have looked everywhere for an answer, but apparently this specific requirement is not common. Changing the struct is not possible in the program I am applying this to (as opposed to the sample above).

One way is to use the HEAP* typed arrays exposed by Emscripten, which do have unsigned views:
mergeInto(LibraryManager.library, {
    read_struct: function(myStructPointer, size) {
        // Assumes the struct starts on a 4-byte boundary
        var myNumber = HEAPU32[myStructPointer/4];
        console.log(myNumber);
        // Assumes my_char_array is immediately after my_number with no padding
        var myCharArray = HEAPU8.subarray(myStructPointer+4, myStructPointer+4+32);
        console.log(myCharArray);
    }
});
This works in my test, running Emscripten 1.29.0-64bit, but as noted it makes assumptions about alignment and padding. In the cases I tested, a struct always started on a 4-byte boundary, and 32-bit unsigned integers inside a struct were likewise aligned on a 4-byte boundary, so they were accessible through HEAPU32.
However, it's beyond my knowledge whether you can depend on this behaviour in Emscripten; my understanding is that you can't in the usual C/C++ world.
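If you would rather keep using getValue, a possible alternative (a sketch of my own, not from the original answer, and it still assumes the same offsets as above) is to mask the signed results back into their unsigned ranges:
mergeInto(LibraryManager.library, {
    read_struct: function(mys, siz) {
        // getValue returns signed values, so convert them back to unsigned
        var myNumber = getValue(mys, 'i32') >>> 0;          // 0 .. 2^32-1
        console.log("my_number: " + myNumber);
        for (var i = 0; i < 32; i++) {
            var b = getValue(mys + 4 + i, 'i8') & 0xFF;     // 0 .. 255
            console.log("my_char[" + i + "]: " + b);
        }
    },
});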

Related

How to pass array of strings between javascript and C/C++ code with webassembly/emscripten?

I am trying to write a web application that will do a sort of word processing (say spell check, grammar check, word analysis) using back-end C/C++ code. (I have got the C/C++ code working in another desktop app... I want to bring it to the web.)
I want a minimal example doing this: pass an array of strings from JavaScript to the C/C++ code... the C/C++ code will do the word operations... I have this code... and the resulting array of strings will be sent back to JavaScript, where it will be processed further. (Passing the arrays back and forth is the important part.)
Please point me to any such code/tutorial from which I can make a start.
I searched GitHub. I found several projects using Emscripten but could not find this anywhere. (The only place I could get some clue was Hunspell built with Emscripten... however, I could not build it successfully.)
Please let me know. Thanks in advance.
First prepare the C++ side to receive a string (character array):
#include <stdlib.h>
#include <emscripten/emscripten.h>

extern "C" {

static char *string_buffer = NULL;
static size_t string_length = 0;

void EMSCRIPTEN_KEEPALIVE string_start_js(void) {}
void EMSCRIPTEN_KEEPALIVE string_final_js(void) {}

char * EMSCRIPTEN_KEEPALIVE string_ensure(size_t length)
{
    // ensure that the buffer is long enough
    if (length <= string_length) return string_buffer;
    // grow the buffer
    char *new_buffer = (char *)realloc(string_buffer, length + 1);
    // handle the out of memory
    if (new_buffer == NULL) return NULL;
    // remember
    string_buffer = new_buffer;
    string_length = length;
    // done
    return string_buffer;
}

void EMSCRIPTEN_KEEPALIVE string_handle(size_t length)
{
    // sanity
    if (string_buffer == NULL || length > string_length) abort();
    // terminate
    string_buffer[length] = 0;
    // work with the string characters, store/process it
}

void EMSCRIPTEN_KEEPALIVE string_clear(void)
{
    // friendly
    if (string_buffer == NULL) return;
    // free
    free(string_buffer);
    // remember
    string_buffer = NULL;
    string_length = 0;
}

} // extern "C"
From the JavaScript side send one string to the C++ side:
let strings = ["abc", "defg", "1"];
// inform the C++ side that some strings are going to be transferred
exports['string_start_js']();
// send all strings
for (var i = 0; i < strings.length; i++)
{
// single string to transport
let string = strings[i];
// convert to a byte array
let string_bytes = new TextEncoder().encode(string);
// ensure enough memory in the C++ side
let string_offset = exports["string_ensure"](string_bytes.byteLength);
// handle the out of memory
if (string_offset == 0) throw "ops...";
// have view of the instance memory
let view = new Uint8Array(memory.buffer, string_offset, string_bytes.byteLength);
// copy the string bytes to the memory
view.set(string_bytes);
// handle
exports['string_handle'](string_bytes.byteLength);
}
// inform the C++ side that all strings were transferred
exports['string_final_js']();
// clear the used buffer
exports['string_clear']();
The other direction, from C++ to JavaScript, can be simpler:
have a character array (pointer) and its length
call an import function to hand the array pointer and its length to JavaScript
make a view of the memory in JavaScript
read the characters from the view
Something like this on the C++ side:
#include <cstddef>
#include <cstring>

extern "C" {
    extern void string_start_cpp(void);
    extern void string_final_cpp(void);
    extern void string_fetch(const char *pointer, size_t length);
}

void foo(void)
{
    // inform the JavaScript side that strings are about to be sent
    string_start_cpp();
    // runtime string
    const char *demo = "abc";
    // send to JavaScript
    string_fetch(demo, strlen(demo));
    // inform the JavaScript side that all strings were sent
    string_final_cpp();
}
And in JavaScript supply the functions during the instance creation:
string_start_cpp: function()
{
    console.log("{");
},
string_final_cpp: function()
{
    console.log("}");
},
string_fetch: function(offset, length)
{
    // view the bytes
    let view = new Uint8Array(memory.buffer, offset, length);
    // convert the UTF-8 bytes to a string
    let string = new TextDecoder().decode(view);
    // use
    console.log(string);
}
I did not test the code, so there may be some syntax errors. The code can be improved in many places, but the idea is what counts.
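For completeness, here is a minimal sketch of where the exports and memory objects used above could come from. This is my own addition, not part of the original answer, and it assumes a standalone .wasm file whose imports live under the "env" module name and which exports its own linear memory; adjust the names to your actual build:
let memory, exports;

const imports = {
    env: {  // import module name assumed to be "env"
        string_start_cpp: function() { console.log("{"); },
        string_final_cpp: function() { console.log("}"); },
        string_fetch: function(offset, length) {
            let view = new Uint8Array(memory.buffer, offset, length);
            console.log(new TextDecoder().decode(view));
        }
    }
};

WebAssembly.instantiateStreaming(fetch("module.wasm"), imports).then(function(result) {
    exports = result.instance.exports;
    memory = exports.memory;  // assumes the module exports its linear memory
    // ...the string-sending code above can now use exports and memory
});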

unresolved symbol: llvm_trap from Emscripten

When I tried to compile the following snippet into WebAssembly binary, I kept hitting the unresolved symbol: llvm_trap warning, which makes the wasm code not consumable from JS.
emcc test.c -s WASM=1 -s ONLY_MY_CODE=1 -s "EXPORTED_FUNCTIONS=['_test']" -O2 -g -o test.js
test.c (This is test code to reproduce the issue without doing any meaningful work.)
int test(int *buf) {
    int C = 1;
    // Assuming WebAssembly.Memory buffer has been preloaded with data.
    // *T represents the preloaded data here. And we know *T and *buf
    // won't overlap in memory.
    int *T = 0;
    int index = C ^ buf[5];
    int right = T[index];
    int left = (unsigned)C >> 8;
    // warning disappears if this is commented out. But why?
    C = left ^ right;
    return C;
}
I didn't write any llvm_trap related code. Does anyone have an idea what it means?
The variable T must be initialised. If it represents an array that 'maps' to the WebAssembly linear memory, you can define it as a global as follows:
int T[1000];

int test(int *buf) {
    int C = 1;
    int index = C ^ buf[5];
    int right = T[index];
    int left = (unsigned)C >> 8;
    // warning disappears if this is commented out. But why?
    C = left ^ right;
    return C;
}
This compiles without the llvm_trap warnings.
For more detail on how to pass data to a WASM function using linear memory, see the following question:
How to access WebAssembly linear memory from C/C++

Reading variable length bits from a binary string

I'm new to JavaScript and Node.js. I have a base64-encoded string of data from which I need to parse several values of various bit lengths.
I figured I would start by using the Buffer object to read the base64 string, but from there I am completely lost.
The data are a series of unsigned integers. The format is something akin to this:
Header:
8 bits - uint
3 bits - uint
2 bits - uint
3 bits - unused padding
6 bits - uint
After that there are recurring sections of either 23-bit or 13-bit length, each with a couple of fields I need to extract.
An example of a 23 bit section:
3 bit - uint
10 bit - uint
10 bit - uint
My question is this: what is the best way to take an arbitrary number of bits and put the resulting value into a separate uint? Note that some of the values are multi-byte (> 8 bits), so I can't step byte by byte.
I apologize if my explanation is kind of vague but hopefully it will suffice.
One simple way to read any number of bits is, for example:
function bufferBitReader(buffer) {
    var bitPos = 0;

    function readOneBit() {
        var offset = Math.floor(bitPos / 8),
            shift = 7 - bitPos % 8;
        bitPos += 1;
        return (buffer[offset] >> shift) & 1;
    }

    function readBits(n) {
        var i, value = 0;
        for (i = 0; i < n; i += 1) {
            value = value << 1 | readOneBit();
        }
        return value;
    }

    function isEnd() {
        return Math.floor(bitPos / 8) >= buffer.length;
    }

    return {
        readOneBit: readOneBit,
        readBits: readBits,
        isEnd: isEnd
    };
}
You just take your buffer and initialize the reader by
var bitReader = bufferBitReader(buffer);
Then you can read any number of bits by calling
bitReader.readBits(8);
bitReader.readBits(3);
bitReader.readBits(2);
...
You can test whether you have already read all the bits by
bitReader.isEnd()
One thing to make sure of is the actual bit order that is expected: some 'bit streams' deliver bits from the least significant to the most significant. This code expects the opposite, i.e. the first bit you read is the most significant bit of the first byte.
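Applied to the header layout from the question, a usage sketch could look like this (the field names are made up for illustration, and it assumes the data arrives as the base64 string b64 and that the bit order matches the reader above):
var buffer = Buffer.from(b64, 'base64');  // use new Buffer(b64, 'base64') on very old Node versions
var reader = bufferBitReader(buffer);

var header = {};
header.fieldA = reader.readBits(8);   // 8-bit uint
header.fieldB = reader.readBits(3);   // 3-bit uint
header.fieldC = reader.readBits(2);   // 2-bit uint
reader.readBits(3);                   // 3 bits of unused padding
header.fieldD = reader.readBits(6);   // 6-bit uint

console.log(header);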

Bitwise XOR in Javascript compared to C++

I am porting a simple C++ function to JavaScript, but it seems I'm running into problems with the way JavaScript handles bitwise operators.
In C++:
AnsiString MyClass::Obfuscate(AnsiString source)
{
    int sourcelength = source.Length();
    for(int i = 1; i <= sourcelength; i++)
    {
        source[i] = source[i] ^ 0xFFF;
    }
    return source;
}
Obfuscate("test") yields temporary intvalues
-117, -102, -116, -117
Obfuscate ("test") yields stringvalue
‹šŒ‹
In JavaScript:
function obfuscate(str)
{
    var obfuscated = "";
    for (var i = 0; i < str.length; i++) {
        var a = str.charCodeAt(i);
        var b = a ^ 0xFFF;
        obfuscated = obfuscated + String.fromCharCode(b);
    }
    return obfuscated;
}
obfuscate("test") yields temporary intvalues
3979 , 3994 , 3980 , 3979
obfuscate("test") yields stringvalue
ྋྚྌྋ
Now, I realize that there are a ton of threads pointing out that JavaScript treats all numbers as floats and that bitwise operations involve a temporary cast to a 32-bit int.
It really wouldn't be a problem, except that I'm obfuscating in JavaScript and reversing in C++, and the different results don't match.
How do I transform the JavaScript result into the C++ result? Is there some simple shift available?
Judging from the result that XORing 116 with 0xFFF gives -117, we have to emulate 2's complement 8-bit integers in JavaScript:
function obfuscate(str)
{
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        bytes.push( ( ( ( str.charCodeAt(i) ^ 0xFFF ) & 0xFF ) ^ 0x80 ) - 0x80 );
    }
    return bytes;
}
These bytes are then interpreted as Windows CP-1252; negative values are mapped back into the 0–255 range by adding 256:
var ascii = [
0x0000,0x0001,0x0002,0x0003,0x0004,0x0005,0x0006,0x0007,0x0008,0x0009,0x000A,0x000B,0x000C,0x000D,0x000E,0x000F
,0x0010,0x0011,0x0012,0x0013,0x0014,0x0015,0x0016,0x0017,0x0018,0x0019,0x001A,0x001B,0x001C,0x001D,0x001E,0x001F
,0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,0x0027,0x0028,0x0029,0x002A,0x002B,0x002C,0x002D,0x002E,0x002F
,0x0030,0x0031,0x0032,0x0033,0x0034,0x0035,0x0036,0x0037,0x0038,0x0039,0x003A,0x003B,0x003C,0x003D,0x003E,0x003F
,0x0040,0x0041,0x0042,0x0043,0x0044,0x0045,0x0046,0x0047,0x0048,0x0049,0x004A,0x004B,0x004C,0x004D,0x004E,0x004F
,0x0050,0x0051,0x0052,0x0053,0x0054,0x0055,0x0056,0x0057,0x0058,0x0059,0x005A,0x005B,0x005C,0x005D,0x005E,0x005F
,0x0060,0x0061,0x0062,0x0063,0x0064,0x0065,0x0066,0x0067,0x0068,0x0069,0x006A,0x006B,0x006C,0x006D,0x006E,0x006F
,0x0070,0x0071,0x0072,0x0073,0x0074,0x0075,0x0076,0x0077,0x0078,0x0079,0x007A,0x007B,0x007C,0x007D,0x007E,0x007F
];
var cp1252 = ascii.concat([
0x20AC,0xFFFD,0x201A,0x0192,0x201E,0x2026,0x2020,0x2021,0x02C6,0x2030,0x0160,0x2039,0x0152,0xFFFD,0x017D,0xFFFD
,0xFFFD,0x2018,0x2019,0x201C,0x201D,0x2022,0x2013,0x2014,0x02DC,0x2122,0x0161,0x203A,0x0153,0xFFFD,0x017E,0x0178
,0x00A0,0x00A1,0x00A2,0x00A3,0x00A4,0x00A5,0x00A6,0x00A7,0x00A8,0x00A9,0x00AA,0x00AB,0x00AC,0x00AD,0x00AE,0x00AF
,0x00B0,0x00B1,0x00B2,0x00B3,0x00B4,0x00B5,0x00B6,0x00B7,0x00B8,0x00B9,0x00BA,0x00BB,0x00BC,0x00BD,0x00BE,0x00BF
,0x00C0,0x00C1,0x00C2,0x00C3,0x00C4,0x00C5,0x00C6,0x00C7,0x00C8,0x00C9,0x00CA,0x00CB,0x00CC,0x00CD,0x00CE,0x00CF
,0x00D0,0x00D1,0x00D2,0x00D3,0x00D4,0x00D5,0x00D6,0x00D7,0x00D8,0x00D9,0x00DA,0x00DB,0x00DC,0x00DD,0x00DE,0x00DF
,0x00E0,0x00E1,0x00E2,0x00E3,0x00E4,0x00E5,0x00E6,0x00E7,0x00E8,0x00E9,0x00EA,0x00EB,0x00EC,0x00ED,0x00EE,0x00EF
,0x00F0,0x00F1,0x00F2,0x00F3,0x00F4,0x00F5,0x00F6,0x00F7,0x00F8,0x00F9,0x00FA,0x00FB,0x00FC,0x00FD,0x00FE,0x00FF
]);
function toStringCp1252(bytes){
    var byte, codePoint, codePoints = [];
    for( var i = 0; i < bytes.length; ++i ) {
        byte = bytes[i];
        if( byte < 0 ) {
            byte = 256 + byte;
        }
        codePoint = cp1252[byte];
        codePoints.push( codePoint );
    }
    return String.fromCharCode.apply( String, codePoints );
}
Result
toStringCp1252(obfuscate("test"))
//"‹šŒ‹"
I'm guessing that AnsiString contains 8-bit characters (since the ANSI character set is 8 bits). When you assign the result of the XOR back to the string, it is truncated to 8 bits, so the resulting value is in the range [-128...127].
(On some platforms it could be [0...255], and on others the range could be wider, since it's not specified whether char is signed or unsigned, or whether it's 8 bits or larger.)
JavaScript strings contain Unicode characters, which can hold a much wider range of values, so the result is not truncated to 8 bits. The result of the XOR has a range of at least 12 bits, [0...4095], hence the large numbers you see there.
Assuming the original string contains only 8-bit characters, changing the operation to a ^ 0xff should give the same results in both languages.
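For illustration, a sketch of that suggestion on the JavaScript side (my own code, not from the original answer); the C++ side would likewise XOR with 0xFF:
function obfuscate(str)
{
    var obfuscated = "";
    for (var i = 0; i < str.length; i++) {
        // XOR with 0xFF stays within 8 bits, so C++ and JavaScript produce the same byte values
        obfuscated = obfuscated + String.fromCharCode(str.charCodeAt(i) ^ 0xFF);
    }
    return obfuscated;
}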
I assume that AnsiString is, in some form, an array of chars. And this is the problem: in C, a char can typically hold only 8 bits, so when you XOR with 0xfff and store the result in a char, it is the same as XORing with 0xff.
This is not the case with JavaScript. JavaScript uses Unicode. This is demonstrated by looking at the integer values:
-117 == 0x8b and 3979 == 0xf8b
I would recommend XORing with 0xff, as this will work in both languages. Or you can switch your C++ code to use Unicode.
First, convert your AnsiString to wchar_t*. Only then obfuscate its individual characters:
AnsiString MyClass::Obfuscate(AnsiString source)
{
    /// allocate string
    int num_wchars = source.WideCharBufSize();
    wchar_t* UnicodeString = new wchar_t[num_wchars];
    source.WideChar(UnicodeString, source.WideCharBufSize());

    /// obfuscate individual characters
    int sourcelength = source.Length();
    for(int i = 0; i < num_wchars; i++)
    {
        UnicodeString[i] = UnicodeString[i] ^ 0xFFF;
    }

    /// create obfuscated AnsiString
    AnsiString result = AnsiString(UnicodeString);

    /// delete tmp string
    delete [] UnicodeString;
    return result;
}
Sorry, I'm not an expert on C++ Builder, but my point is simple: in JavaScript you have UCS-2 (or UTF-16) characters, so you have to convert the AnsiString to wide chars first.
Try using WideString instead of AnsiString
I don't know AnsiString at all, but my guess is that this relates to the width of its characters. Specifically, I suspect they're less than 32 bits wide, and of course in bitwise operations the width of what you're operating on matters, particularly when dealing with 2's complement numbers.
In JavaScript, your "t" in "test" is character code 116, which is b00000000000000000000000001110100. 0xFFF (4095) is b00000000000000000000111111111111, and the result you're getting (3979) is b00000000000000000000111110001011. We can readily see that you're getting the right result for the XOR:
116 = 00000000000000000000000001110100
4095 = 00000000000000000000111111111111
3979 = 00000000000000000000111110001011
So I'm thinking you're getting some truncation or similar in your C++ code, not least because -117 is b10001011 in eight-bit 2's complement... which is exactly what we see as the last eight bits of 3979 above.

Node.JS Big-Endian UCS-2

I'm working with Node.js. Node's buffers support little-endian UCS-2, but not big-endian, which is what I need. How would I do this?
According to Wikipedia, UCS-2 should always be big-endian, so it's odd that Node only supports little-endian. You might consider filing a bug. That said, switching endianness is fairly straightforward, since it's just a matter of byte order. Just swap the bytes around to go back and forth between little and big endian, like so:
function swapBytes(buffer) {
    var l = buffer.length;
    if (l & 0x01) {
        throw new Error('Buffer length must be even');
    }
    for (var i = 0; i < l; i += 2) {
        var a = buffer[i];
        buffer[i] = buffer[i+1];
        buffer[i+1] = a;
    }
    return buffer;
}
