Which one is preferable: Buffer.from() or TextEncoder.encode()? - javascript

From my understanding and the API docs, in Node the following are equivalent and both return a Uint8Array:
Buffer.from(someString, 'utf-8')
(new TextEncoder()).encode(someString)
Is either of those on the way to becoming deprecated? Does anyone know of any considerations that make either Buffer or TextEncoder/TextDecoder preferable over the other, if all that's needed is converting UTF-8 strings to and from Uint8Arrays?

From my understanding, Buffer is Node's original implementation of binary blobs, from before an equivalent feature made its way into browser JS runtimes.
After browsers settled on a different API, the Node runtime incorporated it as well (which makes sense from a code-portability standpoint) and preserved the original Buffer support.
As a result, in Node there are multiple ways of achieving roughly similar results when it comes to binary blobs, where some ways also work in the browser while others don't. Buffer.from()/TextEncoder.encode() might be one such pair.
I'm not sure if there's any performance gain to be had by choosing the "Node classic" Buffer API over the browser-compatible TextEncoder.
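For what it's worth, here is a minimal round-trip sketch of both approaches (the decoding half uses Buffer#toString and TextDecoder, which the question doesn't mention but which are the standard counterparts):
// Node-only: Buffer is a Uint8Array subclass, so it satisfies any API expecting one.
const bufEncoded = Buffer.from('héllo', 'utf-8');
const bufDecoded = bufEncoded.toString('utf-8');
// Portable (browsers and Node): TextEncoder/TextDecoder default to UTF-8.
const teEncoded = new TextEncoder().encode('héllo'); // Uint8Array
const teDecoded = new TextDecoder().decode(teEncoded); // back to a string
console.log(bufDecoded === teDecoded); // true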

Related

Convert between Blobs and ArrayBuffers without copying?

I can't find any performance info in the documentation about converting between ArrayBuffers and Blobs in JavaScript. I believe the standard methods are:
const blob = new Blob([arrayBuffer])
and
const resp = new Response(blob)
const ab = await resp.arrayBuffer()
Note that I'm only concerned with Blobs that are in-memory, not File-based (so no I/O time is involved.) I did some timings in a codepen: https://codepen.io/oberbrunner/pen/ExjWLWY
which shows that both of those operations scale in time according to the size of the ArrayBuffer, so they are likely copying the data.
In fact, just creating a new Response(blob) takes longer with a bigger blob, if my timing code is accurate. The newer Blob.arrayBuffer() method also appears to be O(n), although it's faster than the Response route; it was not yet well supported by browsers in 2020. (A sketch of all three routes follows.)
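For reference, a minimal sketch of the three conversion routes being timed (only the calls, with a hypothetical 1 MiB buffer; actual performance will vary by engine):
async function roundTrip() {
  const arrayBuffer = new Uint8Array(1024 * 1024).buffer; // 1 MiB of zeroes
  const blob = new Blob([arrayBuffer]);                   // ArrayBuffer -> Blob
  const viaResponse = await new Response(blob).arrayBuffer(); // Blob -> ArrayBuffer, via Response
  const viaMethod = await blob.arrayBuffer();                 // Blob -> ArrayBuffer, newer method
  console.log(viaResponse.byteLength, viaMethod.byteLength);  // 1048576 1048576
}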
So my question is: is there authoritative documentation on the expected performance of these operations? Am I just using them wrong? Are there faster (constant-time, zero-copy) ways?
At the risk of answering an older question, for posterity -- the authoritative documentation you are looking for would be the [relevant sections of the] File API and ECMAScript 2021 specifications, for Blob and ArrayBuffer respectively.
As far as I have been able to determine, neither specification mandates a particular allocation scheme for either class of object. The ECMAScript specification of ArrayBuffer may appear more suggestive of a particular allocation mechanism, but only within the scope of the specification itself -- I cannot find anything like "you must allocate the data on the heap". The specification of Blob is purposefully even more abstract and vague about where and how the data comprising the "blob" is actually allocated, if at all.
Most user agents, in part guided by these specifications, fall somewhere in between: an ArrayBuffer is typically allocated naively on the heap at its full length, while a constructed Blob may be efficiently backed by a memory-mapped file (in the case of File blobs, for instance), which on some operating systems effectively means it is backed by the page file -- no RAM is reserved at all until some operation on the blob requires it (like converting it to an ArrayBuffer).
So a conversion from a Blob to an ArrayBuffer, while technically not specified to be O(n), will be O(n) in most user agents, since the latter class facilitates immediate read and write access to the data (by element index, etc.), while a Blob does not allow any immediate data access itself.
Now, I said "most user agents" because technically you could design a very elaborate data-access mechanism with copy-on-write semantics, where no data is allocated when obtaining an ArrayBuffer from a Blob (through any of the methods you described in your question, or more modern APIs like Blob's arrayBuffer method) until the corresponding data actually needs to be read and/or modified.
In other words, all your benchmarks only pertain to concrete implementations (read: the user agents you have tried), which are free to implement either class and its related operations however they see fit, as long as they do not violate the corresponding specifications.
However, if enough people start to employ aggressive conversion of blobs back and forth, there isn't anything stopping vendors like Mozilla or Google from optimizing their implementations into something like the copy-on-write scheme described above, which may or may not turn these into, say, O(log(n)) operations. After all, JavaScript was for a very long time called an "interpreted" language -- but today that would be a bit of a misnomer, what with Chrome's V8 and Firefox's SpiderMonkey compiling it to native code in the name of optimization. And, again, they are free to do so -- no applicable specification of the language or a related environment mandates that it be "interpreted". Same with blobs and array buffers.
Unfortunately, in practice, this means we have to live with whatever actually runs our programs, and that normally incurs a real O(n) cost whenever you need to do something useful with a blob.

Listing Canvas and WebAudio contexts methods and properties under Node.js

I am working on a compression tool for the demoscene, initiated for js1k and targeted at prods in the 1k-4k categories.
The current difficulty I am facing is getting it to work and produce the exact same results in both browser and Node.js environments.
One of its features requires knowing all the methods and properties of the 2D, GL and Audio contexts. It also needs the values of the GL constants.
No method is ever invoked though, so the actual implementation is not needed.
EDIT - an example to give a better understanding of what is going on
The original uncompressed code given to the packer looks like this (after stripping lines not relevant here, such as those adding colors to n):
c=a.getContext("2d");
e=c.getImageData(0,0,150,150);
c.fillStyle=n=c.createRadialGradient(225,75,25,225,75,60);
c.fillRect(150,0,150,150);
The packer computes the best hash, in this case i[0]+i[6] (the first and seventh characters of each property name). It then replaces the methods in the code and prepends a loop that performs the hashing (the output is standalone, so it contains the decompression routine). Otherwise, at runtime the JS interpreter would have no way to know that c.cR() is actually context.createRadialGradient(). Here is the resulting code:
for(i in c=a.getContext("2d"))c[i[0]+i[6]]=c[i];
e=c.gg(0,0,150,150);
c.fillStyle=n=c.cR(225,75,25,225,75,60);
c.fc(150,0,150,150);
In case of a collision (several method names hashing to the same string), the replacement is not performed; a sketch of the scheme follows.
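For illustration, a minimal sketch of the hashing and collision check (hashKey and buildHashes are hypothetical names, not from the actual packer):
// Hash a property name with the i[0]+i[6] scheme used above.
function hashKey(name) {
  return name[0] + name[6]; // note: name[6] is undefined for short names
}
// Collect hashes over a context's enumerable properties, flagging collisions.
function buildHashes(context) {
  const seen = new Map();       // hash -> property name
  const collisions = new Set(); // hashes shared by several names
  for (const name in context) {
    const h = hashKey(name);
    if (seen.has(h)) collisions.add(h); // ambiguous: leave these unreplaced
    else seen.set(h, name);
  }
  return { seen, collisions };
}
// e.g. hashKey('createRadialGradient') === 'cR'
//      hashKey('getImageData')         === 'gg'
//      hashKey('fillRect')             === 'fc'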
Inside the browser, one can simply create an instance of the appropriate context and iterate over its methods/properties. However, Node.js does not provide this possibility, so I need another way to obtain that information.
The answers to similar questions (about 2D canvas or WebAudio) suggested using the Canvas module or the Node WebAudio API. However, these modules are not a perfect mirror of their browser counterparts, having either additional methods or only a subset thereof. In some cases this will cause the hashing algorithm to produce a different output.
Unfortunately, this rules out that solution, as the same result is needed in both environments. What other options are possible? Thanks in advance.

CPU vs Memory usage (theory)

I found some interesting posts about memory usage and CPU usage here on Stack Overflow; however, none of them had a direct approach to this apparently simple question:
As a generic strategy in a JavaScript app, is it better in terms of performance to use memory (storing data) or CPU (recalculating data each time)?
I refer to JavaScript usage in common browser environments (FF, Chrome, IE>8).
Does anybody have a more direct and documented answer to this?
--- EDIT ---
Ok, I understand the question is very generic. I'll try to reduce the scope.
Reading your answers, I realized that the real question is: "how do I understand the memory limit under which my JavaScript code still performs well?"
Environment: common browser environments (FF, Chrome, IE>8)
The functions I use are not very complex math functions, but they can produce quite a large amount of data (300-400 KB), and I wanted to understand whether it was better to recalculate it every time or just store the results in variables.
Vaguely related -- JS in browsers is extremely memory-hungry when you start using large objects/arrays. If you think about the binary data produced by canvas elements, or other rich media APIs, then clearly you do not want to be storing this data in traditional ways -- disregarding performance issues, which are also important.
From the MDN article talking about JS Typed Arrays:
As web applications become more and more powerful, adding features such as audio and video manipulation, access to raw data using WebSockets, and so forth, it has become clear that there are times when it would be helpful for JavaScript code to be able to quickly and easily manipulate raw binary data.
Here's a JS Perf comparison of arrays, and another looking at canvas in particular, so you can get some direct examples on how they work. Hope this is useful.
It's just another variation on the classic space/time tradeoff: storing values increases memory use, recalculating them costs CPU time.
In some cases, the calculation is complex, and the memory usage is small. This is particularly true for maths functions.
In other cases, the memory needed would be huge, and calculations are simple. This is particularly true when the output would be a large data structure, and you can calculate an element in the structure easily.
Another factor you will need to take into account is what resources are available. If you have very limited memory then you may have no choice, and if it is a background process then using lots of memory is perhaps not desirable. If the calculation needs to be done very often you are more likely to store the value than if it's done once a month...
There are a lot of factors in the tradeoff, so there is no "generic" answer, only a set of guidelines you can follow as each case arises.
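As an illustration of the store-vs-recompute choice, here is a minimal memoization sketch (expensiveTransform is a hypothetical stand-in for whatever produces the large results in the question):
const cache = new Map();
// Hypothetical stand-in for a function producing a large result.
function expensiveTransform(key) {
  const out = new Array(100000);
  for (let i = 0; i < out.length; i++) out[i] = key * i;
  return out;
}
function memoized(key) {
  if (!cache.has(key)) cache.set(key, expensiveTransform(key)); // pay in memory once
  return cache.get(key); // skip the CPU cost on every later call
}
// If memory pressure is the bigger concern, call expensiveTransform(key)
// directly each time and let the results be garbage-collected.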

C.S. Basics: Understanding Data Packets, Protocols, Wireshark

The Quest
I'm trying to talk to an SRCDS server from node.js via the RCON protocol.
The RCON protocol seems to be explained well enough, and implementations can be found at the bottom of that page in every major programming language. Using those is simple enough, but understanding the protocol and developing a JS library myself is what I set out to do.
Background
Being a self-taught programmer, I skipped a lot of computer science basics -- I learned only what I needed to accomplish what I wanted. I started coding with PHP, eventually wrapped my head around OO, talked to databases, etc. I'm currently programming with JavaScript, more specifically doing web stuff with node.js.
Binary Data?!?!
I've read and understood the absolute binary basics. But when it comes to the packet data I'm totally lost. I'd like to read and understand the Wireshark output, but I can't make any sense of it. My biggest problem is probably that I don't understand what the binary representations of the various INTs and STRINGs (char ..) look like from JS, and how I convert the data I get from the server into something usable in the program.
Help
So I'd be more than grateful if someone can point me to a tutorial on these topics. Tutorial as in "explanation that mere mortals can understand, preferably not written by a C.S. professor". :)
When I'm looking at the PHP reference implementation, I see (too much) magic happening there that I can't translate to JS. Sending and reading data from a socket is no problem, but I need to know how PHP's unpack function works and, correspondingly, how I can do the same in JS with node.js.
So I hope you can see what I'm trying to accomplish here. First and foremost is understanding the theory needed to make implementing the protocol a breeze. But because I'm only good with scripting languages, it would be incredibly helpful if someone could guide me a bit on the HOWTO part in PHP/JS.
Thank you so much for your time!
I applaud the low-level protocol pursuit.
I'll tell you the path I took. My approach was to use a client and server that already spoke the protocol and use libpcap to do the analysis. During this phase I created a library that was able to unpack the custom protocol I was analyzing.
It's super helpful to start with packet-layout diagrams like the TCP header diagram from the Wikipedia article on TCP. It's an incredibly useful way to visualize the structure of the binary data. It's tightly packed, so slicing it apart requires attention to detail.
Buffers and Binary
I read up on Buffer. It's the way you deal with binary data in Node: http://nodejs.org/docs/v0.4.8/api/buffers.html -- the first thing to realize here is that buffers can be accessed byte by byte via array syntax, i.e. buffer[0] and such.
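A minimal sketch of that byte-level access (using the modern Buffer.from and read* methods rather than the 0.4.x API linked above):
// Four bytes: 0xDE 0xAD 0xBE 0xEF
const buf = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
console.log(buf[0]);              // 222 -- one byte, as a plain Number
console.log(buf.readUInt8(1));    // 173 -- the same thing via the explicit API
console.log(buf.readUInt32BE(0)); // 3735928559, i.e. 0xDEADBEEF big-endian
console.log(buf.toString('hex')); // 'deadbeef'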
Visualization
It's helpful to be able to dump your binary data into a hex representation. I used https://github.com/a2800276/hexy.js to achieve this.
node_pcap
I grabbed https://github.com/mranney/node_pcap -- this is the equivalent of Wireshark, but you can programmatically poke at all outgoing and incoming traffic. I added UDP payload support: https://github.com/jmoyers/node_pcap/commit/2852a8123486339aa495ede524427f6e5302326d
I read through all of mranney's "unpack" code: https://github.com/mranney/node_pcap/blob/master/pcap.js#L116-171
I found https://github.com/rmustacc/node-ctype
I read through all their "unpack" code https://github.com/rmustacc/node-ctype/blob/master/ctio.js
Now, things to remember when you're looking through this stuff: most of the time they're taking a binary Buffer representation and converting it to a native JavaScript type, like, say, Number or String. They'll use advanced techniques to do so -- bitwise operations like shifts and such. You don't necessarily need to understand all of that.
The key things are:
1) endianness -- the ordering of bytes (network and host byte order can be the reverse of each other), as this determines how things are unpacked
2) JavaScript Number representation is quirky -- node-ctype goes into detail in its comments about how the various number types are converted to JavaScript's Number. Integer, float, double, etc. are all Number in JavaScript land. (A tiny endianness sketch follows this list.)
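Here is that sketch -- the same four bytes read with both byte orders (using Node's built-in Buffer readers, which postdate node-ctype but make the point):
const bytes = Buffer.from([0x01, 0x00, 0x00, 0x00]);
console.log(bytes.readInt32LE(0)); // 1        -- little-endian (typical host order)
console.log(bytes.readInt32BE(0)); // 16777216 -- big-endian (network order)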
In the end, it's likely fine if you just USE these unpackers for your adventures. I ended up having to unpack things that weren't covered in these libraries, like GUIDs and such, and it was tremendously helpful to study the source.
Isolate the traffic you're looking at
Filter, filter, filter. Target one host. Target one direction. Target one message type. Focus first on stripping off data that has a known fixed length -- often the header of a protocol is a good place to start. Once you get the header unpacking from binary into a nice JSON structure, you are well on your way.
After that, it's one field at a time, top to bottom, one message at a time. You can use Buffer#slice and the unpack functions from node-ctype to grab each piece of data in turn.
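To make that concrete, here is a hedged sketch of unpacking one Source RCON packet with plain modern Buffer methods (the layout -- little-endian int32 size, id, and type, followed by a null-terminated body -- is from the published RCON spec; treat this as an illustration, not a drop-in parser):
// Unpack a single Source RCON packet from a Buffer.
function unpackRconPacket(buf) {
  const size = buf.readInt32LE(0); // remaining packet length, per the spec
  const id = buf.readInt32LE(4);   // request/response id for matching replies
  const type = buf.readInt32LE(8); // packet type (auth, exec command, ...)
  const bodyEnd = buf.indexOf(0, 12); // the body is a null-terminated string
  const body = buf.toString('ascii', 12, bodyEnd);
  return { size, id, type, body };
}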

Performing .replace() on Buffer (Node.js) contents?

This is quite a newb question, but I have not found any reliable answers through Google/SO/Etc.
If you have content in a Buffer, what is the best pattern for running a .replace() on that content?
Do you simply pull out the content with .toString(), run replace(), then put it back in the Buffer? Or is there a better way?
Thanks
Depends on what you want to replace. Buffers don't reallocate themselves; the Buffer object you have in JavaScript is merely a "pointer" into an external memory region (I'm speaking specifically about Node.js 3.x here; the old "SlowBuffers" in 2.x work in a different way).
So there are two possible scenarios:
Your replacement value's length differs from that of the value being replaced. In this case there's not much you can do: you need to use toString(), which allocates a new String (hint: slow), and then create a new Buffer based on the size of that string.
You're just swapping bytes ([] on a buffer is a byte index, not a character index). Here, on 2.x, it would be way faster to use a plain loop and perform the replacement yourself, since there is nearly no allocation overhead (Node just allocates a new int with the same value as the one that was written), but on 3.x toString() is fine 99% of the time.
But what you really want to watch out for is not writing gigantic strings to sockets, because that's really slow under 2.x.
Since V8 can move strings in memory at any time, Node 2.x needs to copy them out before passing their pointer to the OS. This has been fixed with some hacks on V8 in 3.x.
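A minimal sketch of both scenarios with today's Buffer API (the method names here are the modern ones, not the 2.x/3.x API this answer discusses):
const original = Buffer.from('hello world', 'utf-8');
// Case 1: lengths differ -- round-trip through a string.
const replaced = Buffer.from(original.toString('utf-8').replace('world', 'node'));
// Case 2: same-length byte swap -- mutate in place, no reallocation.
const inPlace = Buffer.from(original); // copy, so `original` stays intact
for (let i = 0; i < inPlace.length; i++) {
  if (inPlace[i] === 0x6f) inPlace[i] = 0x30; // swap 'o' (0x6f) for '0' (0x30)
}
console.log(replaced.toString()); // 'hello node'
console.log(inPlace.toString());  // 'hell0 w0rld'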
