In the WebCrypto/SubtleCrypto API, you can generate keys and so on. However, there appears to be a distinct lack of a .destroyKey() method or anything of the sort.
Are keys cleaned up once their reference count reaches zero, or something along those lines? Is there no way to explicitly destroy or remove a key from memory?
Note that my concern isn't security, as I know this wouldn't give much of a security benefit; I'm worried about resource leaks and the like. It feels weird not being able to clean up after oneself explicitly.
The Web Cryptography Specification writes:
Authors should be aware that this specification places no normative requirements on implementations as to how the underlying cryptographic key material is stored. The only requirement is that key material is not exposed to script, except through the use of the exportKey and wrapKey operations.
This specification places no normative requirements on how implementations handle key material once all references to it go away. That is, conforming user agents are not required to zeroize key material, and it may still be accessible on device storage or device memory, even after all references to the CryptoKey have gone away.
That is, a user agent may choose to discard the key data as soon as its CryptoKey becomes eligible for garbage collection, but may also choose to keep the data around longer, for instance until the entire browsing context is discarded upon navigating to a different page or closing the browser tab.
In practice, the difference is unlikely to matter: You can fit thousands if not millions of keys in the memory of any web-capable device, so exhausting memory due to delayed collection of key material is exceedingly unlikely. And since browser implementers have an incentive to keep memory usage low, most will choose to free key material upon collection of the CryptoKey.
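If you want to nudge things along anyway, the closest you can get is dropping every reference to the key so it becomes eligible for collection. A minimal sketch (inside an async function or module, with algorithm parameters chosen purely for illustration):
let key = await crypto.subtle.generateKey(
  { name: "AES-GCM", length: 256 }, // illustrative algorithm parameters
  false,                            // not extractable
  ["encrypt", "decrypt"]
);
// ... use the key ...
key = null; // no destroyKey() exists; dropping the last reference is all you can do.
            // When (or whether) the key material is actually freed is up to the user agent.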
I'm reading a blog post about encapsulation. After reading it I think I get what encapsulation is and why we need to use it, but at the end the author talks about the advantages of encapsulation and lists "Decoupling Implementation Details" as one of the benefits of using it.
I'm quoting it here:
Since Encapsulation enables operating on data using a public interface, it is easier to update the implementation because the tight coupling is eliminated. This is an excellent way to always code against an interface. This also means that with Encapsulation, the object is scalable to accommodate future changes without breaking compatibility.
Coupling means the extent to which two modules are dependent on each other. Decoupling refers to eliminating such dependencies.
I'm trying to work this out in my head, and thought that maybe a SO member could help me understand what he is talking about. Can someone explain this to me?
Thanks in advance!
Encapsulation is a conscious decision to give your code - your implementation of some functionality - a particular interface, such that consumers of your implementation assume only what they have to.
I'll start with a non-JS example, but will show one at the end.
Hostnames are an interface. IP addresses are implementation details.
When you go to Stack Overflow, you go to stackoverflow.com, you don't go to 151.101.193.69. And if you did go to 151.101.193.69, you'd notice that it's a Fastly CDN address, not a Stack Overflow address.
It's likely that when Stack Overflow just started, it implemented its web access using its own server, on a different IP address, say, for example, 198.51.100.253.
If everyone bookmarked 198.51.100.253, then when Stack Overflow started using Fastly CDN, suddenly everyone who bookmarked it - millions of people - would have to adjust.
That is a case of broken compatibility, because those millions of people would have been coupled to the IP address 198.51.100.253.
By encapsulating the IP address 198.51.100.253 - the actual detail of how web access is implemented - behind the only thing users need to know - the public interface, the website's name, stackoverflow.com - Stack Overflow was able to migrate to Fastly CDN, and all those millions of users were none the wiser.
This was possible because all these users were not coupled to the IP address 198.51.100.253, so when it changed to 151.101.193.69, nobody was affected.
This principle applies in many areas. Here are some examples:
Energy: You pay for electricity. The supplier can provide it using coal, gas, diesel, nuclear, or hydro, and can switch from one to the other without you being any the wiser. You're not coupled to hydro, because your interface is the electric socket, not a generator.
Business: When an office building hires a cleaning company to keep the building clean, it only has a contract with the company. The cleaners get hired and fired, their salaries change, but all of that is encapsulated by the cleaning company and does not affect the building.
Money: You don't need money, you need food and shelter and clothes. But those are implementation details. The interface you export to your employer is money, so they don't have to pay you in food, and if you change your diet or style, they don't have to adjust what food or clothes they buy you.
Engineering: When an office building gets HVAC and it breaks, the owner just calls the HVAC company; they don't try to fix it themselves. If they did, they would void the warranty, because the HVAC company can't guarantee a good product if someone else touches the system. The public interface is the maintenance contract and the user-facing HVAC controls - you're not allowed to access the implementation directly.
And of course, software: Let's say you have a distributed key-value store which has the following client API:
client = kv.connect("endpoint.my.db");       // connect to the cluster
bucket = crc(myKey);                         // the caller hashes the key itself
nodeId = bucket % client.nodeCount();        // the caller picks the node itself
myValue = client.get(nodeId, bucket, myKey); // and passes both to the store
This interface:
allows the caller to directly and easily find the node which will store the key.
allows the caller to cache bucket information to further save calls.
allows the caller to avoid extra calls to map a key to a bucket.
However, it leaks a ton of implementation details into the interface:
the existence of buckets
the usage of CRC to map keys to buckets
the bucket distribution and rebalancing strategy - the usage of bucket % nodeCount as the logic to map buckets to nodes
the fact that buckets are owned by individual nodes
And now the caller is coupled with all these implementation details. If the maintainer of the DB wants to make certain changes, they will break all existing users. Examples:
Use CRC32 instead of CRC, presumably because it's faster. This would cause existing code to use the wrong bucket and/or node, failing the queries.
Instead of round-robin buckets, allocate buckets based on storage nodes' free space, free CPU, free memory, etc. - that breaks bucket % client.nodeCount() - likewise leads to wrong bucket/node and fails queries.
Allow multiple nodes to own a bucket - requests will still go to a single node.
Change the rebalancing strategy - if a node goes down, then nodeCount goes from e.g. 3 to 2, so all the buckets have to be rebalanced such that bucket % client.nodeCount() finds the right node for each bucket.
Allow reading from any node instead of the bucket owner - requests will still go to a single node.
To decouple the caller from the implementation, you don't allow them to cache anything, save calls, or assume anything:
client = kv.connect("endpoint.my.db");
myValue = client.get(myKey);
The caller doesn't know about CRC, buckets, or even nodes.
Here, the client library has to do the extra work of figuring out which node to send the request to - perhaps with Zookeeper or a gossip protocol.
With this interface:
Hashing logic (e.g. CRC) isn't hard-coded in the caller; it's on the server side, and changing it won't break the caller.
Any bucket distribution strategy is likewise only on the server side.
Any rebalancing logic is likewise not in the client.
Even other changes are possible by just upgrading the client library, without changing any code in the caller:
Allow multiple nodes to own a bucket
Read from any node (e.g. choosing the one with the lowest latency).
Switch from a Zookeeper-based node-discovery infrastructure to a gossip-based one.
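If it helps to see the same idea in code, here is a hypothetical sketch (KvClient and everything inside it are made-up names, not a real library) of how such a client could hide the routing details behind a single get() call:
// Hypothetical sketch - not a real library. The caller is coupled only to
// the constructor and get(); everything else can change freely.
class KvClient {
  constructor(endpoint) {
    this.endpoint = endpoint;
  }

  async get(key) {
    // All the details the old API leaked live here (or on the server):
    // which hash is used, how buckets map to nodes, how rebalancing works.
    const node = await this.#routeFor(key);
    return node.fetch(key);
  }

  async #routeFor(key) {
    // Placeholder routing; in reality this might ask Zookeeper, use a
    // gossip protocol, or simply let the server route the request.
    return { fetch: async (k) => `value-for-${k}` };
  }
}

const client = new KvClient("endpoint.my.db");
const myValue = await client.get("myKey"); // top-level await, so run as a module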
Let's take the Map class as an example. It has an API for adding key/value entries to the map, getting the value for a key, etc. You don't know the details of how the Map stores its key/value pairs; you just know that if you use set with a key and later call get with the same key, you'll get back the value. The details of the implementation are hidden from you, and largely don't matter to your code.¹ That's an example of how encapsulation decouples the API from the implementation details.
¹ You do know a little bit about the implementation in this specific example: the specification says "Maps must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection." So you know that a Map isn't implemented as (say) an array of key/value pairs, because that implementation wouldn't offer sublinear access times on the number of elements. But you don't know if it's a hash table or B-tree or what. Those are implementation details.
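To make that contract concrete, here is a tiny illustration of the Map example; nothing in it depends on how Map stores its entries internally:
const cache = new Map();
cache.set("answer", 42);           // the only contract: set() then get() with
console.log(cache.get("answer"));  // the same key returns the value -> 42
console.log(cache.has("missing")); // false; how entries are stored is hidden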
I can't find any performance info in the documentation about converting between ArrayBuffers and Blobs in Javascript. I believe the standard methods are:
const blob = new Blob([arrayBuffer])
and
const resp = new Response(blob)
const ab = await resp.arrayBuffer()
Note that I'm only concerned with Blobs that are in-memory, not File-based (so no I/O time is involved.) I did some timings in a codepen: https://codepen.io/oberbrunner/pen/ExjWLWY
which shows that both of those operations scale in time according to the size of the ArrayBuffer, so they are likely copying the data.
In fact, just creating a new Response(blob) takes longer with a bigger blob, if my timing code is accurate. The newer Blob.arrayBuffer() method also appears to be O(n), although it's faster than the Response approach; it's just not yet well supported by browsers in 2020.
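For reference, that instance method is used like this (same blob as above):
const ab2 = await blob.arrayBuffer() // Blob.prototype.arrayBuffer() returns a Promise<ArrayBuffer>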
So my question is: is there authoritative documentation on the expected performance of these operations? Am I just using them wrong? Are there faster (constant-time, zero-copy) ways?
At the risk of answering an older question, for posterity -- the authoritative documentation you are looking for would be the [relevant sections of] File API and ECMAScript-2021 specifications for Blob and ArrayBuffer respectively.
As far as I have been able to determine, neither specification mandates a particular allocation scheme for either class of object. While the ECMAScript specification of ArrayBuffer may appear more suggestive of a particular data allocation mechanism, that is only within the scope of the specification itself - I cannot find anything like "you must allocate the data on the heap" - while the specification of Blob is purposefully even more abstract and vague about where and how the data comprising the "blob" is actually allocated, if at all.
Most user agents, in part guided by these specifications, will naively allocate data of the corresponding length on the heap for an ArrayBuffer, while constructed Blob objects may be efficiently backed by memory-mapped files (in the case of File blobs, for instance), which basically makes them backed by the page file on some operating systems - meaning no RAM is reserved at all until it is required (for some operation on the blob, like converting it to an ArrayBuffer).
So a conversion from a Blob to an ArrayBuffer, while technically not specified to be O(n), will be in most user agents, since the latter class facilitates actual immediate read and write access to the data (by element index, etc.), while a Blob does not allow any immediate data access itself.
Now, I said "most user agents", because technically you could design a very elaborate data access mechanism with so copy-on-write semantics where no data is allocated when obtaining an ArrayBuffer from a Blob (through any of the methods you described in your question or the more modern APIs like Blob's arrayBuffer method) before the corresponding data needs to be read and/or modified.
In other words, all your benchmarks only pertain to concrete implementations (read: the user agents you have tried), which are free to implement either class and its related operations however they see fit, as long as they do not violate their corresponding specifications.
However, if enough people start aggressively converting blobs back and forth, there isn't anything stopping vendors like Mozilla or Google from optimizing their implementations into something like the above (copy-on-write), which may or may not turn these into, say, O(log n) operations. After all, JavaScript was for a very long time called an "interpreted" language - but today that would be a bit of a misnomer, what with Chrome's V8 and Firefox's SpiderMonkey compiling it to native code in the name of optimization. And, again, they are free to do so - no applicable specification of the language or a related environment mandates that it be "interpreted". The same goes for blobs and array buffers.
Unfortunately, in practice, this means we have to live with whatever actually runs our programs that use blobs and array buffers - which normally incurs a real O(n) cost whenever you need to do something useful with a blob.
Before my re-entry into JavaScript (and related technologies) I did a lot of ActionScript 3, which had a Dictionary object with weak keys just like the upcoming WeakMap; but the AS3 version was still enumerable like a regular generic object, while the WeakMap specifically has no .keys() or .values().
The AS3 version allowed us to rig some really interesting and useful constructs, but I feel the JS version is somewhat limited. Why is that?
If the Flash VM could do it, then what is keeping browsers from doing the same? I read how it would be 'non-deterministic', but that is sort of the point, right?
Finally found the real answer: http://tc39wiki.calculist.org/es6/weak-map/
A key property of Weak Maps is the inability to enumerate their keys. This is necessary to prevent attackers observing the internal behavior of other systems in the environment which share weakly-mapped objects. Should the number or names of items in the collection be discoverable from the API, even if the values aren't, WeakMap instances might create a side channel where one was previously not available.
It's a tradeoff. If you introduce object <-> object dictionaries that support enumerability, you have two options with relation to garbage collection:
Consider the key entry a strong reference that prevents garbage collection of the object that's being used as a key.
Make it a weak reference that allows its keys to be garbage collected whenever every other reference is gone.
If you go with #1, you make it extremely easy to shoot yourself in the foot by leaking large objects into memory all over the place. On the other hand, if you go with option #2, your key dictionary becomes dependent on the state of garbage collection in the application, which will inevitably lead to impossible-to-track-down bugs.
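To make the tradeoff concrete, here is a small comparison; the object literal is just a placeholder:
const strong = new Map();
const weak = new WeakMap();

let user = { name: "Ada" };       // placeholder key object
strong.set(user, "profile data"); // option #1: the Map keeps `user` alive
weak.set(user, "profile data");   // option #2: the WeakMap does not

console.log([...strong.keys()]);  // [ { name: 'Ada' } ] - enumerable
// `weak` has no keys(), values(), entries(), or size - by design
user = null; // once the last reference is gone, the WeakMap entry can be collected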
I have significant garbage collection pauses. I'd like to pinpoint the objects most responsible for this collection before I try to fix the problem. I've looked at the heap snapshot on Chrome, but (correct me if I am wrong) I cannot seem to find any indicator of what is being collected, only what is taking up the most memory. Is there a way to answer this empirically, or am I limited to educated guesses?
In Chrome's Profiles panel, take two heap snapshots: one before the action you want to check and one after.
Now click on the second snapshot.
On the bottom bar you will see a select box with the option "Summary". Change it to "Comparison".
Then, in the select box next to it, select the snapshot you want to compare against (it should automatically select Snapshot 1).
As a result you will get a table with the data you need, i.e. "New" and "Deleted" objects.
With newer Chrome releases there is a new tool available that is handy for this kind of task:
The "Record Heap Allocations" profiling type. The regular "Heap SnapShot" comparison tool (as explained in Rafał Łużyński answers) cannot give you that kind of information because each time you take a heap snapshot, a GC run is performed, so GCed objects are never part of the snapshots.
However with the "Record Heap Allocations" tool constantly all allocations are being recorded (that's why it may slow down your application a lot when it is recording). If you are experiencing frequent GC runs, this tool can help you identify places in your code where lots of memory is allocated.
In conjunction with the Heap Snapshot comparison, you will see that most of the time a lot more memory is allocated between two snapshots than the comparison shows. In extreme cases the comparison will yield no difference at all, whereas the allocation tool will show you lots and lots of allocated memory (which obviously had to be garbage collected in the meantime).
Unfortunately the current version of the tool does not show you where the allocation took place, but it will show you what has been allocated and how it was retained at the time of the allocation. From that data (and possibly the constructors) you will be able to identify your objects, and thus the place where they are being allocated.
If you're trying to choose between a few likely culprits, you could modify the object definitions so that instances attach themselves to the global scope (as a list under document or something).
That will stop them from being collected, which may make the program faster (they're not being reclaimed) or slower (because they build up and get checked by the mark-and-sweep every time). So if you see a change in performance, you may have found the problem.
One alternative is to look at how many objects of each type are being created (set up a counter in the constructor). If they're getting collected a lot, they're also being created just as frequently.
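A minimal sketch of that counter idea (the Particle class is made up; adapt it to your own constructors):
let particleCount = 0; // how many Particle objects have ever been constructed

class Particle {
  constructor(x, y) {
    this.x = x;
    this.y = y;
    particleCount++; // bump the counter on every allocation
  }
}

// Log the counter periodically; if it climbs quickly while overall memory
// stays flat, these objects are being created and collected at a high rate.
setInterval(() => console.log("particles created:", particleCount), 5000);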
Take a look at https://developers.google.com/chrome-developer-tools/docs/heap-profiling, especially the Containment view:
The Containment view is essentially a "bird's eye view" of your application's object structure. It allows you to peek inside function closures, to observe VM internal objects that together make up your JavaScript objects, and to understand how much memory your application uses at a very low level.

The view provides several entry points:

DOMWindow objects — these are objects considered as "global" objects for JavaScript code;
GC roots — actual GC roots used by the VM's garbage collector;
Native objects — browser objects that are "pushed" inside the JavaScript virtual machine to allow automation, e.g. DOM nodes and CSS rules (see the next section for more details).
I am building a webapp to edit some information from a database. The database is being displayed as a table with editing capabilities.
When editing a value, generally I have to validate and do some other tasks, depending on the value that's being edited.
Should I keep a copy as an array of objects in memory and use their methods, or should I store all the information I need (type of value, id, etc.) somewhere in the HTML table (as attributes or hidden inputs) and get it using several functions?
Which would be best practice?
Is it risky to have many objects stored in memory (taking into account memory usage of the browser)?
Storing a moderate or large amount of data in memory as objects won't affect performance on modern systems. The main factor you should consider is CPU-intensive DOM iteration and recursive operations; these take up much of a browser's memory.
I have preferred storing objects in memory rather than in HTML hidden fields in many applications. It works well and I didn't find any performance bottlenecks.
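As a rough sketch of that approach (all names here are invented for illustration): keep the records in a Map keyed by id, and let each table cell carry only the id needed to look its record up.
// Hypothetical in-memory model keyed by record id.
const records = new Map();
records.set(1, { id: 1, type: "number", value: 42 });
records.set(2, { id: 2, type: "text", value: "hello" });

// A table cell only needs to expose the record id, e.g. <td data-id="1">.
function onCellEdit(cellElement, newValue) {
  const record = records.get(Number(cellElement.dataset.id));
  if (record.type === "number" && Number.isNaN(Number(newValue))) {
    return false; // validation uses the model, not hidden inputs in the DOM
  }
  record.value = record.type === "number" ? Number(newValue) : newValue;
  return true;
}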
I think you're describing MVC, and it is considered best practice. However, the memory model of the view would typically be held on the server for security purposes.
It may not matter in your case (and I may be jumping to conclusions), but I would caution against trusting the client with all of the data and validation. You can modify everything in a page in real time with Firebug, so if that puts your app at risk, consider moving your memory model to the server.
Whether you will run into memory trouble on the client depends on how much data you are holding at a time. Consider limiting the information returned to a certain number of records and paging through it; you can then limit the amount of data held in memory or on the page.
I would expect that holding information in memory will give a better user experience than requiring constant calls back to a server, or into the DOM. It is probably easier from a programming perspective, too.
Just do whatever is simplest from a programming perspective. I wouldn't worry too much about memory usage for something like this, unless you're absolutely sure that it's causing problems.
You can address the memory usage of your application later, if and when it becomes an issue.
Most database editing tools, e.g. phpPgAdmin and phpMyAdmin, paginate results and only allow editing one row at a time. You can extend that to several rows without much fuss. As mentioned before, remember to paginate.