Performing .replace() on Buffer (Node.js) contents?

This is quite a newb question, but I have not found any reliable answers through Google/SO/Etc.
If you have content in a Buffer, what is the best pattern for running a .replace() on that content?
Do you simply pull out the content with .toString(), run replace(), then put it back in the Buffer? Or is there a better way?
Thanks

It depends on what you want to replace. Buffers don't reallocate themselves; the Buffer object you have in JavaScript is merely a "pointer" into an external memory region (I'm speaking specifically about Node.js 3.x here; the old "SlowBuffers" in 2.x work in a different way).
So there are two possible scenarios:
The replacement value's length differs from the length of the value being replaced. In this case there's not much you can do: you need to use toString(), which allocates a new String (hint: slow), and then create a new Buffer based on the size of the resulting string.
You're just swapping bytes (note that [] on a Buffer indexes bytes, not characters). Here it would be way faster on 2.x to use a plain loop and perform the replacement yourself, since there is nearly no allocation overhead (Node allocates a new int with the same value as the one that was written), but on 3.x toString() is fine 99% of the time.
What you really want to watch out for is writing gigantic strings to sockets, because that's really slow under 2.x.
Because V8 can move strings in memory at any time, Node 2.x needs to copy them out before passing their pointer to the OS. This has been fixed with some hacks on V8 in 3.x.
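The two scenarios above can be sketched roughly as follows. This is only an illustration: the helper names replaceInBuffer and replaceByte are made up for this example, and it assumes a current Node.js where Buffer.from() is available (the answer itself predates that API).

```javascript
// Scenario 1: the replacement changes the length, so go through a string
// and build a new Buffer from the result (the original Buffer is untouched).
// replaceInBuffer is a hypothetical helper name, not a Node.js API.
function replaceInBuffer(buf, search, replacement, encoding = 'utf8') {
  const text = buf.toString(encoding);
  return Buffer.from(text.replace(search, replacement), encoding);
}

// Scenario 2: swapping single bytes of equal size, done in place with a
// plain loop. buf[i] is a byte value (0-255), not a character.
// replaceByte is likewise a hypothetical helper name.
function replaceByte(buf, fromByte, toByte) {
  for (let i = 0; i < buf.length; i++) {
    if (buf[i] === fromByte) buf[i] = toByte;
  }
  return buf;
}

const greeting = Buffer.from('hello world');
const longer = replaceInBuffer(greeting, 'world', 'Node.js');
console.log(longer.toString()); // hello Node.js

const dashed = replaceByte(Buffer.from('a b c'), 0x20, 0x2d); // space -> '-'
console.log(dashed.toString()); // a-b-c
```

Note that the in-place loop is only safe when the bytes being swapped cannot be part of a multi-byte character in the Buffer's encoding.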

Related

Which one is preferable: Buffer.from() or TextEncoder.encode()?

From my understanding and the API docs, in Node the following are equivalent and return a Uint8Array:
Buffer.from(someString, 'utf-8')
(new TextEncoder()).encode(someString)
Is either of those on the way of becoming deprecated? Does someone know of any considerations that make either Buffer or TextEncoder/TextDecoder preferable over the other, if all that’s needed is converting UTF-8 strings to and from Uint8Arrays?

From my understanding, Buffer is Node's original implementation of binary blobs, from before an equivalent feature made its way into browser JS runtimes.
After browsers went with a different API, the Node runtime incorporated that as well (which makes sense from a code portability standpoint), and preserved the original Buffer support.
As a result, in Node there are multiple ways of achieving roughly similar results when it comes to binary blobs, where some ways will also work in browser while others won’t. Buffer.from()/TextEncoder.encode() might be one of them.
I’m not sure if there’s any performance gain to be had by choosing “Node classic” Buffer API over browser-compatible TextEncoder.
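As a rough check of the "roughly similar results" point, the snippet below encodes the same string both ways and compares the bytes. It assumes a Node.js version where TextEncoder and TextDecoder are globals; the variable names are only for illustration, and it says nothing about performance.

```javascript
const someString = 'héllo'; // 'é' takes two bytes in UTF-8

const viaBuffer = Buffer.from(someString, 'utf-8');
const viaEncoder = new TextEncoder().encode(someString);

// Compare the two results byte by byte.
const sameBytes =
  viaBuffer.length === viaEncoder.length &&
  viaBuffer.every((byte, i) => byte === viaEncoder[i]);

console.log(sameBytes);                       // true
console.log(viaBuffer instanceof Uint8Array); // true: Buffer subclasses Uint8Array
console.log(viaEncoder instanceof Buffer);    // false: a plain Uint8Array

// Decoding works in either direction.
const decodedWithDecoder = new TextDecoder('utf-8').decode(viaBuffer);
const decodedWithBuffer = Buffer.from(viaEncoder).toString('utf-8');
console.log(decodedWithDecoder === someString && decodedWithBuffer === someString); // true
```

So the practical difference is mostly the type you get back: a Buffer carries the extra Node-specific methods, while the TextEncoder result is a plain Uint8Array that also works in browsers.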

Is there a way to get the address of a variable or object in javascript? [duplicate]

Is it possible to find the memory address of a JavaScript variable? The JavaScript code is part of (embedded into) a normal application where JavaScript is used as a front end to C++ and does not run on the browser. The JavaScript implementation used is SpiderMonkey.

If it were possible at all, it would be very dependent on the JavaScript engine. More modern JavaScript engines compile their code using a just-in-time compiler, and messing with their internal variables would be bad for either performance or stability.
If the engine allows it, why not make a function call interface to some native code to exchange the variable's values?

It's more or less impossible - Javascript's evaluation strategy is to always use call by value, but in the case of Objects (including arrays) the value passed is a reference to the Object, which is not copied or cloned. If you reassign the Object itself in the function, the original won't be changed, but if you reassign one of the Object's properties, that will affect the original Object.
That said, what are you trying to accomplish? If it's just passing complex data between C++ and Javascript, you could use a JSON library to communicate. Send a JSON object to C++ for processing, and get a JSON object to replace the old one.
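A small sketch of the behaviour described above, followed by the JSON round trip as a stand-in for the C++ exchange. The function and variable names are made up for illustration, and the actual hand-off to native code is not shown because it depends on the embedding.

```javascript
// Reassigning the parameter only rebinds the local name.
function reassign(obj) {
  obj = { name: 'replaced' };
  return obj;
}

// Changing a property goes through the shared reference.
function mutate(obj) {
  obj.name = 'mutated';
}

const settings = { name: 'original' };

reassign(settings);
const afterReassign = settings.name;
console.log(afterReassign); // original

mutate(settings);
const afterMutate = settings.name;
console.log(afterMutate); // mutated

// Exchanging data as JSON text: the receiver gets a copy, never the object itself.
const payload = JSON.stringify(settings); // what would be sent to native code
const roundTripped = JSON.parse(payload); // what would come back
console.log(roundTripped.name, roundTripped === settings); // mutated false
```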

You can, using a side channel, but you can't do anything useful with it other than attacking browser security!
The closest things to virtual addresses are ArrayBuffers. If one virtual address within an ArrayBuffer is identified, the remaining addresses are also known, as both the memory addresses and the array indices are linear.
Although virtual addresses are not themselves physical memory addresses, there are ways to translate a virtual address into a physical memory address.
Browser engines always allocate ArrayBuffers page-aligned. The first byte of the ArrayBuffer is therefore at the beginning of a new physical page and has its 12 least significant bits set to '0'.
If a large chunk of memory is allocated, browser engines typically use mmap to allocate this memory, which is optimized to allocate 2 MB transparent huge pages (THP) instead of 4 KB pages.
As these physical pages are mapped on demand, i.e., as soon as the first access to the page occurs, iterating over the array indices results in page faults at the beginning of a new page. The time to resolve a page fault is significantly higher than a normal memory access. Thus, you can learn the index at which a new 2 MB page starts. At this array index, the underlying physical page has its 21 least significant bits set to '0'.
This answer does not try to provide a proof of concept because I don't have time for that, though I may be able to do so in the future. It is an attempt to point the person asking the question in the right direction.
Sources:
http://www.misc0110.net/files/jszero.pdf
https://download.vusec.net/papers/anc_ndss17.pdf
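The alignment arithmetic in the answer above (12 zero bits for a 4 KB page, 21 zero bits for a 2 MB page, addresses linear in the array index) can be written out as below. This is only arithmetic on a made-up number: hypotheticalBase is an invented value, and JavaScript itself offers no way to read a real address.

```javascript
const PAGE_4K = 1n << 12n; // 4 KB page: 12 offset bits
const PAGE_2M = 1n << 21n; // 2 MB huge page: 21 offset bits

// Invented, 2 MB-aligned address used purely for illustration.
const hypotheticalBase = 0x7f3a00200000n;

// Addresses are linear in the array index: byte i sits at base + i.
function addressOfIndex(base, index) {
  return base + BigInt(index);
}

// The offset within a page is the low bits of the address.
function pageOffset(address, pageSize) {
  return address & (pageSize - 1n);
}

console.log(pageOffset(hypotheticalBase, PAGE_2M)); // 0n: low 21 bits are zero
console.log(pageOffset(hypotheticalBase, PAGE_4K)); // 0n: low 12 bits are zero

const atIndex4096 = addressOfIndex(hypotheticalBase, 4096);
console.log(pageOffset(atIndex4096, PAGE_4K)); // 0n: start of the next 4 KB page
console.log(pageOffset(atIndex4096, PAGE_2M)); // 4096n: still inside the same 2 MB page
```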

I think it's possible, but you'd have to:
download the node.js source code.
add in your function manually (like returning the memory address of a pointer, etc.)
compile it and use it as your node executable.

Why aren't strings mutable? [duplicate]

Several languages have chosen this design, such as C#, Java, and Python. If it is intended to save memory or gain efficiency for operations like comparison, what effect does it have on concatenation and other modifying operations?

Immutable types are a good thing generally:
They work better for concurrency (you don't need to lock something that can't change!)
They reduce errors: mutable objects are vulnerable to being changed when you don't expect it which can introduce all kinds of strange bugs ("action at a distance")
They can be safely shared (i.e. multiple references to the same object) which can reduce memory consumption and improve cache utilisation.
Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you have to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around....)
As a result, it's a pretty reasonable language design choice to make strings immutable.
Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. This enlightening video is very much worth a look if you are interested in the benefits of immutability.
There are a couple of minor downsides for immutable types:
Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.
Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.
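The answer above talks about Java, C#, and similar languages, but JavaScript strings are immutable too, so the trade-offs can be sketched there. The function name demoImmutableStrings is made up for this example, and joining an array is shown only as a common stand-in for a StringBuilder, not as a performance claim.

```javascript
function demoImmutableStrings() {
  'use strict'; // in strict mode, writing to a string index throws

  const original = 'hello';
  const shared = original; // sharing is cheap and safe: nothing can change the value

  let writeRejected = false;
  try {
    original[0] = 'j'; // attempt an in-place edit
  } catch (err) {
    writeRejected = err instanceof TypeError;
  }

  // "Modifying" operations build a new string and leave the old one alone.
  const changed = 'j' + original.slice(1);

  // Builder-style concatenation: collect the pieces, join once at the end.
  const parts = [];
  for (let i = 0; i < 3; i++) {
    parts.push('line ' + i);
  }
  const joined = parts.join('\n');

  return { original, shared, writeRejected, changed, joined };
}

const result = demoImmutableStrings();
console.log(result.writeRejected);            // true
console.log(result.original, result.changed); // hello jello
```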

It's mainly intended to prevent programming errors. For example, Strings are frequently used as keys in hashtables. If they could change, the hashtable would become corrupted. And that's just one example where having a piece of data change while you're using it causes problems. Security is another: if you are checking whether a user is allowed to access a file at a given path before executing the operation they requested, the string containing the path had better not be mutable...
It becomes even more important when you're doing multithreading. Immutable data can be safely passed around between threads while mutable data causes endless headaches.
Basically, immutable data makes the code that works on it easier to reason about. Which is why purely functional languages try to keep everything immutable.

In Java, not only String but all primitive wrapper classes (Integer, Double, Character, etc.) are immutable. I am not sure of the exact reason, but I think these are the basic data types on which all programming schemes work. If they could change, things could go wild. To be more specific, here is an example: say you have opened a socket connection to a remote host. The host name would be a String and the port would be an Integer. What if these values were modified after the connection was established?
As far as performance is concerned, Java keeps string literals in a dedicated string pool (the literal pool; in current JVMs it lives on the heap). The pool is indexed, so if you use the literal "String" twice, both uses point to the same pooled object.

Having immutable strings also makes creating new string references cheap, since identical strings are readily available from the pool of previously created strings. This reduces the cost of new object creation.

Comparing large strings in JavaScript with a hash

I have a form with a textarea that can contain large amounts of content (say, articles for a blog) edited using one of a number of third party rich text editors. I'm trying to implement something like an autosave feature, which should submit the content through ajax if it's changed. However, I have to work around the fact that some of the editors I have as options don't support an "isdirty" flag, or an "onchange" event which I can use to see if the content has changed since the last save.
So, as a workaround, what I'd like to do is keep a copy of the content in a variable (let's call it lastSaveContent), as of the last save, and compare it with the current text when the "autosave" function fires (on a timer) to see if it's different. However, I'm worried about how much memory that could take up with very large documents.
Would it be more efficient to store some sort of hash in the lastSaveContent variable, instead of the entire string, and then compare the hash values? If so, can you recommend a good javascript library/jquery plugin that implements an appropriate hash for this requirement?

In short, you're better off just storing and comparing the two strings.
Computing a proper hash is not cheap. For example, check out the pseudo code or an actual JavaScript implementation for computing the MD5 hash of a string. Furthermore, all proper hash implementations will require enumerating the characters of the string anyway.
Furthermore, in the context of modern computing, a string has to be really, really long before comparing it against another string is slow. What you're doing here is effectively a micro-optimization. Memory won't be an issue, nor will the CPU cycles to compare the two strings.
As with all cases of optimizing: check that this is actually a problem before you solve it. In a quick test I did, computing and comparing 2 MD5 sums took 382ms. Comparing the two strings directly took 0ms. This was using a string that was 10000 words long. See http://jsfiddle.net/DjM8S.
If you really see this as an issue, I would also strongly consider using a poor man's comparison and just comparing the lengths of the two strings to see whether they have changed, rather than doing an actual string comparison. Bear in mind that an edit which leaves the length unchanged would be missed.
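A minimal sketch of the "just store and compare the two strings" approach follows. The names createAutosaver, getContent, and save are hypothetical; in a real page getContent would read from whichever editor is in use and save would issue the ajax request.

```javascript
// Returns a function to call on each timer tick; it saves only when the
// content differs from what was last saved.
function createAutosaver(getContent, save) {
  let lastSaveContent = getContent();

  return function autosaveTick() {
    const current = getContent();
    if (current === lastSaveContent) {
      return false; // nothing changed since the last save
    }
    save(current);
    lastSaveContent = current;
    return true;
  };
}

// Usage with a stand-in editor and a stand-in save function.
const fakeEditor = { text: 'first draft' };
const saved = [];
const tick = createAutosaver(
  () => fakeEditor.text,
  (content) => saved.push(content)
);

const firstTick = tick();        // false: unchanged since setup
fakeEditor.text = 'second draft';
const secondTick = tick();       // true: content changed, save called
const thirdTick = tick();        // false: already saved
console.log(saved);              // [ 'second draft' ]
```

In the browser the tick function would typically be passed to setInterval.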

An MD5 hash is often used to verify the integrity of a file or document; it should work for your purposes. Here's a good article on generating an MD5 hash in Javascript.

I made a JSperf rev that might be useful here for performance measuring. Please add different revisions and different types of checks to the ones I made!
http://jsperf.com/long-string-comparison/2
I found two major results
When the strings are identical, performance is murdered: from ~9,000,000 ops/sec to ~250 ops/sec (Chrome)
The 64bit version of IE9 is much slower on my PC, results from the same tests:
+------------+------------+
| IE9 64bit | IE9 32bit |
+------------+------------+
| 4,270,414 | 8,667,472 |
| 2,270,234 | 8,682,461 |
+------------+------------+
Sadly, jsperf logged both results as simply "IE 9".
Even a cursory look at JS MD5 performance tells me that it is very, very slow (at least for large strings, see http://jsperf.com/md5-shootout/18, which peaks at 70 ops/sec). I would even go as far as trying to AJAX the hash calculation or the comparison to the backend, but I don't have time to test, sorry!
