Using regexp on raw binary data - JavaScript

I'm embedding JavaScript in my C++ app (via V8) and I get some raw binary data which I want to pass to JavaScript. Now, in the JavaScript, I plan to do some regular expressions on the data.
When using just the standard JavaScript String object for my data, everything is quite straightforward. However, as far as I understand it, String uses a UTF-16 representation and expects the data to be valid Unicode. But I have arbitrary data (it might contain '\0' and other raw bytes, although it is just text for the most part).
How should I handle this? I searched around a bit, and maybe ArrayBuffer or something like it is the object I need to store my raw data. However, I couldn't find how to use the usual regular expression methods on that object. (Basically I need RegExp.test and RegExp.exec.)
I just checked out the Node.js code and it seems as if they support binary data and just put it into a string via v8::String::NewFromOneByte. See here and here. So that would answer my question (i.e., I can just use String), wouldn't it? Any downsides?
(I still don't see why my question is bad. Please explain the downvote.)

From all my tests so far, it seems to work just as expected with a normal String.
You can even write such a string directly in JavaScript, e.g.
var s = "\x00\x01\x02\x03"
and regular expressions on that string work as expected.
On the C++ side, if you want to get your binary data into a JS String object:
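#include <cassert>
#include <v8.h>
// Copy raw bytes into a JS string; each byte becomes one code unit (0x00-0xFF).
// Newer V8 releases take v8::NewStringType::kNormal and return a v8::MaybeLocal.
v8::Local<v8::String> jsBinary(const uint8_t* data, uint32_t len) {
  assert(int(len) >= 0);  // the length parameter is a signed int
  return v8::String::NewFromOneByte(v8::Isolate::GetCurrent(), data, v8::String::kNormalString, len);
}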
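A minimal JavaScript-side sketch of the same idea, assuming the bytes reach JavaScript as a Uint8Array. The helper name bytesToBinaryString is hypothetical, not a V8 or Node.js API. Each byte becomes one code unit in the range 0x00-0xFF, so \xNN escapes in a pattern match individual bytes:

```javascript
// Hypothetical helper: map each byte to one code unit (0x00-0xFF).
function bytesToBinaryString(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b);
  }
  return s;
}

// Sample data containing NUL bytes and a non-ASCII byte.
const data = bytesToBinaryString(new Uint8Array([0x00, 0x01, 0x41, 0x42, 0xff, 0x00]));

// RegExp.test and RegExp.exec work on the string as usual.
const found = /\x00\x01/.test(data);
const match = /\x01([A-Z]+)\xff/.exec(data);
console.log(found, match.index, match[1]);
```

One thing to watch for with binary input: . does not match the bytes 0x0A and 0x0D unless the s flag is set, so [\s\S] is the safer "any byte" pattern.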

Related

Creating Binary Data Structures in Javascript

I'm building a web app, and need to be able to encode user-generated data as concisely as possible before transmitting it to my server. In the past I've used Flash, and it had a very neat system where for any class that you want to serialize, you could write a pair of functions that would describe exactly how to serialize the data. For example:
out.writeShort(session);
out.writeUnsignedInt(itemID);
out.writeObject(arbitraryData);
out.writeShort(score);
You would have to write an equivalent function to read bytes from the serialized data and build the class from it.
Once data is serialized it could be encoded into a Base64 string for safe network transmission to the server.
I can't figure out how to do this in Javascript? JSON is nice and easy but it's incredibly wasteful, sending all object key/value pairs, and unless I'm mistaken everything is encoded as a string? So the value false is encoded as the string "false"?
Any advice on how to implement this in Javascript would be greatly appreciated! Use of libraries is fine so long as they work both on Node and in browser.
Look at this answer. You can use the BSON format (Binary JSON), which doesn't have the JSON drawbacks you mentioned.
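For comparison, here is a rough sketch of the writeShort/writeUnsignedInt style from the question using the standard DataView, which is available in both Node and browsers. This is a different technique from BSON, and the 8-byte record layout and the function names serialize and deserialize are assumptions made up for illustration:

```javascript
// Hypothetical fixed layout: session (int16), itemID (uint32), score (int16) = 8 bytes.
function serialize(rec) {
  const view = new DataView(new ArrayBuffer(8));
  view.setInt16(0, rec.session); // DataView is big-endian by default
  view.setUint32(2, rec.itemID);
  view.setInt16(6, rec.score);
  return new Uint8Array(view.buffer);
}

// Read the fields back in the same order and at the same offsets.
function deserialize(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    session: view.getInt16(0),
    itemID: view.getUint32(2),
    score: view.getInt16(6),
  };
}

const packed = serialize({ session: 7, itemID: 123456, score: 42 });
const restored = deserialize(packed);
console.log(packed.length, restored);
```

Variable-length fields such as arbitraryData would need a length prefix or a self-describing format, which is where a library like BSON helps.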

Error parsing JSON with escaped quotes inside single quotes

I have a variable var jsonData = '{"Key":"query","Value":"dept=\"Human Resources*\"","ValueType":"Edm.String"}';
I'm trying to parse the variable with JSON.parse(jsonData), however, I'm getting an error "Unexpected token H in JSON at position 30." I can't change how the variable is returned, so here's what I think I understand about the problem:
The JSON.parse(jsonData) errors out because it's not recognizing the escaped double quotes as escaped since it is fully enclosed in single quotes
jsonData.replace(/\\"/g, "\\\\"") or other combinations that I've tried aren't finding the \" because javascript treats \" as just "
QUESTION How can I parse this properly, by either replacing the escaped quotes with something JSON.parse() can handle or using something else to parse this correctly? I'd like to stick with JSON.parse() on account of its simplicity, but I'm open to other options.
EDIT: Unfortunately I can't change the variable at this stage; it is just a small example of a larger JSON response. This is a temporary solution until the app is granted access to the API, but I need it in the interim until that happens (the IT dept can be slow). What I'm doing now is getting a large JSON response back by hitting the API address directly, and the browser uses the cookies from the user's OAuth session for authentication. I then copy and paste the JSON response into my application so I can work with the data. The response is riddled with escaped quotes, manually editing the text would be laborious, and I'm trying to avoid copying it into a text processor before copying it into the variable.
You should escape the backslash character in your code by prefixing it with another backslash. So the code becomes:
var jsonData = '{"Key":"query","Value":"dept=\\"Human Resources*\\"","ValueType":"Edm.String"}';
The first backslash is so that JS puts the second backslash in the string, which must be in the string so that the json parser knows that it should ignore the quote character.
The unfortunate thing about this situation is that in the JavaScript code there is no difference between
var jsonData = '{"Key":"query","Value":"dept=\"Human Resources*\"","ValueType":"Edm.String"}'
and
var jsonData = '{"Key":"query","Value":"dept="Human Resources*"","ValueType":"Edm.String"}'.
You could hardcode information you have about the JSON into the way you program it. For example, you could replace occurrences of the regex ([\[\{,:]\s+)\" by $1\", but this would fail to work if the string Human Resources* could also end in a :, { or ,. This would also potentially cause security issues.
In my opinion, the best way to solve your problem would be to put the json response in a json file somewhere so that it can be read into a string by the javascript code that needs to use it.
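The points above can be checked with a short sketch. The variable names are illustrative. The last part uses String.raw, which keeps backslashes in a template literal; whether that fits the copy-and-paste workflow is an assumption, and it breaks if the pasted text contains a backtick or ${:

```javascript
// Inside a single-quoted literal, \" is just ", so these two strings are identical.
const unescaped = '{"Key":"query","Value":"dept=\"Human Resources*\"","ValueType":"Edm.String"}';
const plain = '{"Key":"query","Value":"dept="Human Resources*"","ValueType":"Edm.String"}';
const sameString = unescaped === plain;

// Doubling the backslash keeps one backslash in the string, so JSON.parse sees \".
const escaped = '{"Key":"query","Value":"dept=\\"Human Resources*\\"","ValueType":"Edm.String"}';
const parsed = JSON.parse(escaped);

// String.raw leaves the pasted backslashes untouched.
const raw = String.raw`{"Key":"query","Value":"dept=\"Human Resources*\"","ValueType":"Edm.String"}`;
const parsedRaw = JSON.parse(raw);

console.log(sameString, parsed.Value, parsedRaw.Value);
```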
I think you can also dispense with the initial String to represent the JSON object:
Use a standard JSON object.
Make whatever changes you need on that object.
Call JSON.stringify(YOUR_OBJECT) for a String representation.
Then, JSON.parse(…) when you need an object again.
That should satisfy your initial request, keep your current (escaped) String values, and give you room to make a lot of changes.
To escape your current String value:
obj["Value"] = 'dept=\"Human Resources*\"'
Alternatively, you can nest attributes:
obj["Value"]["dept"] = "Human Resources*"
Which may be helpful for other reasons.
I've found that I've rarely worked with JSON in an enterprise or production environment where the above sequence wasn't used (I've never used a purely string representation in a production environment) simply due to the ease of modifying attributes, generating dynamic data/modifying the JSON object, and actually using the JSON programmatically.
Using string representations for what are really attribute key-value pairings often causes headaches later on (for example, when you want to read the Human Resources* value programmatically and use it).
I hope you find that approach helpful!
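A small sketch of that sequence, using the field names from the question (the variable names are illustrative). JSON.stringify adds the backslash escapes itself, so they never have to be typed by hand:

```javascript
// Start from a plain object instead of a hand-written JSON string.
const obj = { Key: "query", Value: 'dept="Human Resources*"', ValueType: "Edm.String" };

// Make whatever changes are needed on the object.
obj.ValueType = "Edm.String";

// Serialize: the inner quotes come out as \" in the text.
const text = JSON.stringify(obj);

// Parse again when an object is needed.
const roundTrip = JSON.parse(text);
console.log(text, roundTrip.Value);
```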

How to produce an `ArrayBuffer` from `bytes` using `js_of_ocaml`

I am building a JavaScript library that is implemented in OCaml and compiled to JavaScript using js_of_ocaml.
One of my OCaml functions returns a string with binary data. How can I expose that as an ArrayBuffer using js_of_ocaml?
When you compile to JavaScript, manipulating binary data in strings is extremely bug-prone!
The underlying reason is a questionable design choice in js_of_ocaml:
Because JavaScript strings are encoded in UTF-16 whereas OCaml ones are (implicitly) encoded in UTF-8, js_of_ocaml tries to translate between the two. Therefore, when it encounters a "character" whose code is > 127, js_of_ocaml converts it, which is a disaster if it is, in fact, raw binary data!
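The general effect can be illustrated in plain JavaScript. This is not js_of_ocaml's actual conversion code, only a sketch of what happens when bytes above 127 are treated as text and re-encoded, using the standard TextEncoder:

```javascript
// Three "binary" characters, one per byte: 0xFF, 0x00, 0x80.
const binary = "\xff\x00\x80";

// Treating the data as text and encoding it as UTF-8 changes the bytes:
// every character above 127 becomes a two-byte sequence.
const asUtf8 = new TextEncoder().encode(binary);

// Copying the code units one by one keeps the bytes intact.
const asBytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));

console.log(asUtf8.length, asBytes.length);
```

Here asBytes.buffer is an ArrayBuffer holding the original three bytes, while asUtf8 holds five.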
The solution is to manipulate bigstrings instead of strings.
Bigstrings are (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t in raw OCaml, but more and more libraries alias them.
In particular, they are Typed_array.Bigstring.t in js_of_ocaml (where you can see functions to convert to and from ArrayBuffer).
If your function happens to work on strings once compiled to JavaScript, there are translation functions between bigstrings and strings in several places.
For example, the bigstring library: http://c-cube.github.io/ocaml-bigstring/ but these functions are also available in Lwt_bytes from lwt.
You can see another question on the same subject (including ways to manipulate OCaml strings in JavaScript without touching them at all, using gen_js_api) at
https://discuss.ocaml.org/t/handling-binary-data-in-ocaml-and-javascript/1519

Difference between CryptoJS.enc.Base64.stringify() and normal Base64 encryption

I'm trying to encrypt the following hash to base64:
6bb984727b8c8c8017207e54b63976dc42ea9d24ad33bd5feeaa66869b650096
It's needed to access the API of a website. The website shows an example script in JavaScript using the CryptoJS.enc.Base64.stringify() method to encrypt the hash.
The result with this method is
a7mEcnuMjIAXIH5Utjl23ELqnSStM71f7qpmhptlAJY=
However, every online base64 encryption tool I tried gives me the following result:
NmJiOTg0NzI3YjhjOGM4MDE3MjA3ZTU0YjYzOTc2ZGM0MmVhOWQyNGFkMzNiZDVmZWVhYTY2ODY5YjY1MDA5Ng==
I need to create the encoded string in C++. I've also already tried 4 different base64encode implementations (OpenSSL and custom codes), but also there I get the above result and the API always answers my string is not correctly encoded.
So where is the difference, and does somebody know a C++ implementation for CryptoJS.enc.Base64.stringify()?
Let's call
a = "6bb984727b8c8c8017207e54b63976dc42ea9d24ad33bd5feeaa66869b650096";
b = "a7mEcnuMjIAXIH5Utjl23ELqnSStM71f7qpmhptlAJY=";
c = "NmJiOTg0NzI3YjhjOGM4MDE3MjA3ZTU0YjYzOTc2ZGM0MmVhOWQyNGFkMzNiZDVmZWVhYTY2ODY5YjY1MDA5Ng==";
Both conversions are correct, but depend on what you actually want.
For example the following two equations hold
toBase64FromBytes(toBytesFromUtf8(a)) == c
toBase64FromBytes(toBytesFromHex(a)) == b
It's a bad idea to trust some kind of online calculator, because they usually don't disclose how they encode stuff, so you will get arbitrary results. If you program it yourself, you get the expected results if you follow the documentation.
I suspect you got a by printing a hash or encryption result to the console like this:
console.log(result.toString()); // a
Most result objects in CryptoJS are WordArray types. When you call the toString() function on such an object, you get a Hex-encoded string of that binary object.
If you print result.toString(CryptoJS.enc.Base64) then you get the Base64-encoded string of the binary result.
If you take a and directly encode it to Base64, then it is probably assumed that a is already a string (e.g. UTF-8 encoded). An online calculator doesn't know that it is Hex-encoded.
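The two equations can be reproduced with a short sketch, assuming Node.js and its Buffer class (in a browser you would need a different hex decoder and btoa). The variable names are illustrative:

```javascript
const a = "6bb984727b8c8c8017207e54b63976dc42ea9d24ad33bd5feeaa66869b650096";

// Decode the hex digits into 32 raw bytes first, then Base64-encode: gives b.
const fromHex = Buffer.from(a, "hex").toString("base64");

// Encode the 64 characters of the text itself: gives c.
const fromUtf8 = Buffer.from(a, "utf8").toString("base64");

console.log(fromHex);
console.log(fromUtf8);
```

The same applies in C++: convert the hex string to its 32 raw bytes before handing it to whichever Base64 encoder you already have.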

Flash Twitter API with JSON

I have read a lot about parsing JSON with Actionscript. Originally it was said to use this library. http://code.google.com/p/as3corelib/ but it seems Flash Player 11 has native support for it now.
My problem is that I cannot find examples or help that takes you from beginning to end of the process. Everything I have read seems to start in the middle. I have no real experience with JSON so this is a problem. I don't even know how to point ActionScript to the JSON file it needs to read.
I have a project with a tight deadline that requires me to read twitter through JSON. I need to get the three most recent tweets, along with the user who posted it, their twitter name and the time those tweets were posted.
The back end for this has, I believe, already been set up by the development team here. Therefore I just need to point to my JSON or XML files and then display the values in the interface text boxes I have already designed and created.
Any help will be greatly appreciated. I know there are a lot of threads on here, but I do not understand them, as they all assume some understanding of the topic to begin with.
You need to:
Load the data, whatever it is.
Parse the data from a particular format.
For this you would normally:
Use the URLLoader class to load any data. (Just go to the language reference and look at the example of how to use this class.)
Use whatever parser fits the particular format you need. The JSON API reference, which also shows usage examples, is at http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/JSON.html. I'm not aware of this API being in the production version of the player, and there may still be quite a few FP 10.x players out there, so I'd have a fallback JSON parser. For that I would recommend this library: http://www.blooddy.by/en/crypto/ over as3corelib because it is faster. The built-in API is no different from the one you would find in a browser, so if you look up JavaScript JSON articles, the usage should be broadly the same in Flash.
After you parse the JSON, you will end up with a number of objects of the following types: Object, Array, Boolean, Number, String. JSON also has a literal for null. Basically, you will be working with data structures native to Flash. You only need to take extra care because they are constructed dynamically, meaning you must not make assumptions about which parts of the data exist: always check that a value is available before using it.
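Since the built-in API mirrors the browser's, the parse-then-check pattern can be sketched in JavaScript. The response shape and the field names (text, created_at, user.name, user.screen_name) are assumptions for illustration, as is the helper name summarize; the real feed may differ:

```javascript
// Hypothetical response text; the second entry deliberately lacks most fields.
const responseText = JSON.stringify([
  { text: "Hello", created_at: "2012-09-24", user: { name: "Alice", screen_name: "alice" } },
  { text: "No user here" },
]);

// Return a string field if it exists, otherwise an empty string.
function str(obj, key) {
  return typeof obj[key] === "string" ? obj[key] : "";
}

// Parse the text and keep at most `limit` entries, checking every field.
function summarize(jsonText, limit) {
  const parsed = JSON.parse(jsonText);
  const items = Array.isArray(parsed) ? parsed : [];
  return items.slice(0, limit).map((item) => {
    const t = item !== null && typeof item === "object" ? item : {};
    const user = t.user !== null && typeof t.user === "object" ? t.user : {};
    return {
      text: str(t, "text"),
      createdAt: str(t, "created_at"),
      name: str(user, "name"),
      screenName: str(user, "screen_name"),
    };
  });
}

const tweets = summarize(responseText, 3);
console.log(tweets);
```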
wvxvw's answer is good, but I think it skips over an explanation of what JSON itself is. JSON (JavaScript Object Notation) is plain text. When you read it on screen it looks something like this:
http://www.json.org/example.html
There you can see JSON and XML (both plain-text formats) side by side. Essentially, JSON is a bunch of name/value pairs.
When you call JSON.parse("your JSON string goes here"), it converts the text to AS3 "dynamic objects", which are just plain objects (whose properties can be assigned without previously being defined, hence dynamic). To make a long story short, take the example in the link above, copy and paste the JSON into a String variable in AS3, and call JSON.parse on it:
var str:String = '{"glossary": {"title": "example glossary","GlossDiv": {"title": "S","GlossList": {"GlossEntry": {"ID": "SGML","SortAs": "SGML","GlossTerm": "Standard Generalized Markup Language","Acronym": "SGML","Abbrev": "ISO 8879:1986","GlossDef": {"para": "A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso": ["GML", "XML"]},"GlossSee": "markup"}}}}}';
var test:Object = JSON.parse(str);
Then use the debugger to see what the resulting object is. As far as I know there's really nothing else to JSON; it's simply a format for storing data. You can't use E4X on it since it's not XML-based. Because it has no closing tags it's slightly more concise than XML, though in my opinion slightly less readable, and it is valid JavaScript. For a nice breakdown of the performance gains and losses between AMF, JSON and XML, check out this page: http://www.jamesward.com/census2/ Many times you don't have a choice about the message format or protocol if you're not building the service yourself, but it's still good to understand what their performance costs are.
