Why aren't strings mutable? [duplicate]


This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why can't strings be mutable in Java and .NET?
Why .NET String is immutable?
Several languages have chosen this design, such as C#, Java, and Python. If it is intended to save memory or gain efficiency for operations like comparison, what effect does it have on concatenation and other modifying operations?

Immutable types are a good thing generally:
They work better for concurrency (you don't need to lock something that can't change!)
They reduce errors: mutable objects are vulnerable to being changed when you don't expect it which can introduce all kinds of strange bugs ("action at a distance")
They can be safely shared (i.e. multiple references to the same object) which can reduce memory consumption and improve cache utilisation.
Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you have to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around....)
As a result, it's a pretty reasonable language design choice to make strings immutable.
Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. This enlightening video is very much worth a look if you are interested in the benefits of immutability.
There are a couple of minor downsides for immutable types:
Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.
Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.
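As a rough illustration in JavaScript (where strings are also immutable), the sketch below shows that a "modifying" operation returns a new string and leaves the original alone, and that collecting pieces in a mutable array and joining once plays a role similar to Java's StringBuilder. The helper name joinPieces is made up for this example, and whether it beats repeated + in practice depends on the engine.

```javascript
// Strings are immutable: operations return new strings, the original is untouched.
const original = "immutable";
const shouted = original.toUpperCase();

// Hypothetical helper: gather pieces in a mutable array, then build the string once.
function joinPieces(count) {
  const pieces = [];
  for (let i = 0; i < count; i++) {
    pieces.push("item" + i);
  }
  return pieces.join(",");
}

const joined = joinPieces(3);
console.log(original, shouted, joined);
```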

It's mainly intended to prevent programming errors. For example, Strings are frequently used as keys in hashtables. If they could change, the hashtable would become corrupted. And that's just one example where having a piece of data change while you're using it causes problems. Security is another: if you are checking whether a user is allowed to access a file at a given path before executing the operation they requested, the string containing the path had better not be mutable...
It becomes even more important when you're doing multithreading. Immutable data can be safely passed around between threads while mutable data causes endless headaches.
Basically, immutable data makes the code that works on it easier to reason about. Which is why purely functional languages try to keep everything immutable.
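The path-check scenario can be sketched in JavaScript. This is only an illustration: isAllowed and the "public" rule are invented for the example, and an array of path segments stands in for a hypothetical mutable string.

```javascript
// Hypothetical permission check: only paths under "public" are allowed.
function isAllowed(segments) {
  return segments[0] === "public";
}

// Mutable representation: an array of path segments shared by reference.
const mutablePath = ["public", "readme.txt"];
const sameArray = mutablePath;             // a second reference to the same array
const checkedOk = isAllowed(mutablePath);  // the check passes
sameArray[0] = "secret";                   // changed elsewhere after the check
const pathActuallyUsed = mutablePath.join("/");

// Immutable representation: a string value cannot be altered through another reference.
const immutablePath = "public/readme.txt";
let otherRef = immutablePath;
otherRef = otherRef.replace("public", "secret"); // creates a new string

console.log(checkedOk, pathActuallyUsed, immutablePath, otherRef);
```

With the array, the value that was checked is not the value that ends up being used. With the string, the checked value cannot change underneath the code that holds it.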

In Java, not only String but all primitive wrapper classes (Integer, Double, Character etc.) are immutable. I am not sure of the exact reason, but I think these are the basic data types on which all programming schemes rely. If they could change, things could go wild. To be more specific, here is an example: say you have opened a socket connection to a remote host. The host name would be a String and the port would be an Integer. What if these values were modified after the connection was established?
As far as performance is concerned, Java keeps string literals in a shared string pool (part of the heap in modern JVMs): if you use the literal "String" twice, both uses refer to the same pooled object. Some wrapper classes cache commonly used values in a similar way, for example Integer.valueOf for small integers.

Making strings immutable also makes new string references cheap, because identical strings that were created earlier can be reused from the string pool. This reduces the cost of creating new objects.

Related

How are arrays implemented in JavaScript? What happened to the good old lists?

JavaScript provides a variety of data structures to be used ranging from simple objects over arrays, sets, maps, the weak variants as well as ArrayBuffers.
Over the past half year I found myself having to recreate some of the more common structures, such as deques, count maps and, mostly, different variants of trees.
While looking at the Ecma specification I could not find a description of how arrays are implemented at the memory level. Supposedly this is up to the underlying engine?
Contrary to languages I am used to, arrays in JavaScript have a variable length, similar to a list. Does that mean that elements are not necessarily stored next to each other in memory? Do splice, push and pop actually result in a new allocation once a certain threshold is reached, similar to, for example, ArrayLists in Java? I am wondering whether arrays are the way to go for queues and stacks, or whether actual list implementations with references to the next element might be better suited in JavaScript in some cases (e.g. regarding overhead compared to the native implementation of arrays).
If someone has some more in-depth literature, please feel encouraged to link it here.

While looking at the Ecma specification I could not find a description of how arrays are implemented at the memory level. Supposedly this is up to the underlying engine?
The ECMAScript specification does not specify or require a specific implementation. That is up to the engine that implements the array to decide how best to store the data.
Arrays in the V8 engine have multiple forms based on how the array is being used. A sequential array with no holes that contains only one data type is highly optimized into something similar to an array in C++. But, if it contains mixed types or if it contains holes (blocks of the array with no value - often called a sparse array), it would have an entirely different implementation structure. And, as you can imagine it may be dynamically changed from one implementation type to another if the data in the array changes to make it incompatible with its current optimized form.
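The difference between a dense array and one with holes is visible at the language level, even though the engine's internal representation is not observable from plain JavaScript. A small sketch (the variable names are just for illustration):

```javascript
const packed = [1, 2, 3];       // dense, a single kind of element
const mixed = [1, "two", 3.5];  // dense, but mixed element types

const holey = [1, 2, 3];
holey[9] = 10;                  // indices 3..8 are now holes

console.log(holey.length);      // length counts the holes too
console.log(5 in holey);        // a hole is not an own property of the array
console.log(holey[5]);          // reading a hole yields undefined
```

Which internal form an engine picks for each of these is an implementation detail; the code only shows the shapes of data the answer is talking about.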
Since arrays have indexed, random access, they are not implemented as linked lists internally which don't have an efficient way to do random, indexed access.
Growing an array may require reallocating a larger block of memory and copying the existing array into it. Calling something like .splice() to remove items will have to copy portions of the array down to the lower position.
Whether or not it makes more sense to use your own linked list implementation for a queue instead of an array depends upon a bunch of things. If the queue gets very large, then it may be faster to deal with the individual allocations of a list to avoid having to copy large portions of the queue around in order to manipulate it. If the queue never gets very large, then the overhead of moving data in an array is small, and the extra complication of a linked list and the extra allocations involved in it may not be worth it.
As an extreme example, a very large FIFO queue would not be particularly optimal as an array: you'd be adding items at one end and removing items from the other, which requires copying the entire array down each time an item is inserted or removed at the bottom end, and if the length changed regularly, the engine would probably have to reallocate regularly too. Whether that copying overhead is relevant in your app would need to be measured with an actual performance test to see if it is worth doing something about.
But if your queue always holds entirely the same data type and never has any holes in it, then V8 can optimize it to a C++-style block of memory, and calling .splice() on that to remove an item can be highly optimized (using CPU block move instructions), which can be very, very fast. So you'd really have to test to decide whether it is worth trying to optimize beyond an array.
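For comparison, here is a minimal sketch of a linked-list FIFO queue next to the plain-array version. LinkedQueue is a hypothetical class written for this example, not something built into the language, and which variant is faster for a given workload would still have to be measured.

```javascript
// Hypothetical singly linked FIFO queue: enqueue and dequeue never move other elements.
class LinkedQueue {
  constructor() {
    this.head = null;
    this.tail = null;
    this.size = 0;
  }
  enqueue(value) {
    const node = { value, next: null };
    if (this.tail === null) {
      this.head = node;      // first element
    } else {
      this.tail.next = node; // append behind the current tail
    }
    this.tail = node;
    this.size++;
  }
  dequeue() {
    if (this.head === null) return undefined;
    const node = this.head;
    this.head = node.next;
    if (this.head === null) this.tail = null; // queue became empty
    this.size--;
    return node.value;
  }
}

const linked = new LinkedQueue();
linked.enqueue("a");
linked.enqueue("b");
linked.enqueue("c");
const firstFromLinked = linked.dequeue();

// The array version: push at the end, shift from the front.
const arrayQueue = ["a", "b", "c"];
const firstFromArray = arrayQueue.shift();
```

Each enqueue on the linked version allocates one small node object, which is the extra allocation cost mentioned above.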
Here's a very good talk on how V8 stores and optimizes arrays:
Elements Kinds in V8
Here are some other reference articles on the topic:
How do JavaScript arrays work under the hood
V8 array source code
Performance tips in V8
How does V8 optimize large arrays

JS Optimization/Performance: Comparing objects using JSON.stringify

I'm currently building a small application in vanilla JS (without any dependencies like lodash/jquery) and I need to compare two objects to check for equality in keys and values. I was just wondering how to optimize this.
The keys of both objects are in the same order as they are derived from the same method. According to this answer, the fastest and most efficient way to do this is using JSON.stringify(object1) === JSON.stringify(object2).
However, in my app, if the two objects are not equal, then I loop through the two of them and perform some operations. The problem is that these operations are pretty performance heavy and run occasionally. I need to optimize my solution.
Therefore, I was wondering whether JSON.stringify runs some sort of for loop internally as well. In my application, the two objects are more likely to be unequal. So if JSON.stringify also loops internally, I could remove the check and run the operations I need right away (they only make a difference when the two objects are unequal), saving time. If I keep the check, I am effectively running two loops for the same purpose when the objects are unequal, and one loop when they are equal. If JSON.stringify is some sort of for loop internally, then I can just run one for loop regardless of whether the objects are equal. Am I making sense here? Please let me know if you don't understand something. Is this check useless, and should I remove it to optimize my code?

Your question touches 4 different areas:
The implementation (and thus performance) of JSON.stringify
The implementation (and thus performance) of object iteration
The quality and performance of the JIT compiler
The speed of memory allocation (JSON.stringify is a memory hog for big objects)
So it is quite clear that there is no universal answer for all JS engines and OSes.
I recommend you do the checks in your code. Why?
While right now the order of attributes might be constant, future maintenance to your codebase might change that and introduce a hard to track down bug.
It is good practice to create an isEqual method for all object types you use
It is more readable.
Of course there are also disadvantages:
Your code will become bigger (though this may be the price of better readability)
Anything else I might have forgotten.
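A minimal sketch of such an isEqual function is shown below. The function is hypothetical and only handles plain objects, arrays and primitives (no Dates, Maps, cyclic references, and NaN is not equal to NaN here). It stops at the first difference, and unlike the JSON.stringify comparison it does not depend on key order.

```javascript
// Key order changes the serialized string, so this comparison is order-sensitive.
const sameDataDifferentOrder =
  JSON.stringify({ a: 1, b: 2 }) === JSON.stringify({ b: 2, a: 1 });

// Hypothetical isEqual for plain objects, arrays and primitives.
function isEqual(x, y) {
  if (x === y) return true;
  if (typeof x !== "object" || typeof y !== "object" || x === null || y === null) {
    return false;
  }
  if (Array.isArray(x) !== Array.isArray(y)) return false;
  const keysX = Object.keys(x);
  const keysY = Object.keys(y);
  if (keysX.length !== keysY.length) return false;
  for (const key of keysX) {
    if (!Object.prototype.hasOwnProperty.call(y, key)) return false;
    if (!isEqual(x[key], y[key])) return false; // stop at the first difference
  }
  return true;
}

const equalDespiteOrder = isEqual({ a: 1, b: { c: [1, 2] } }, { b: { c: [1, 2] }, a: 1 });
const differentValues = isEqual({ a: 1 }, { a: 2 });
```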

Javascript Object Big-O

Coming from Java, Javascript object reminds me of HashMap in Java.
Javascript:
var myObject = {
  firstName: "Foo",
  lastName: "Bar",
  email: "foo@bar.com"
};
Java:
HashMap<String, String> myHashMap = new HashMap<String, String>();
myHashMap.put("firstName", "Foo");
myHashMap.put("lastName", "Bar");
myHashMap.put("email", "foo@bar.com");
In Java, HashMap uses the hashCode() method of the key to determine the bucket location (entries) for storage and retrieval. The majority of the time, basic operations such as put() and get() run in constant time, until hash collisions occur, at which point these operations can degrade to O(n) because the collided entries are stored in a linked list.
My question is:
How does JavaScript store objects?
What is the performance of operations?
Will there ever be collisions or other scenarios that degrade performance, as in Java?
Thanks!

Javascript looks like it stores things in a map, but that's typically not the case. You can access most properties of an object as if they were an index in a map, and assign new properties at runtime, but the backing code is much faster and more complicated than just using a map.
There's nothing requiring VMs not to use a map, but most try to detect the structure of the object and create an efficient in-memory representation for that structure. This can lead to a lot of optimizations (and deopts) while the program is running, and is a very complicated situation.
This blog post, linked in the question comments by @Zirak, has a quite good discussion of the common structures and when VMs may switch from a struct to a map. It can often seem unpredictable, but is largely based on a set of heuristics within the VM and how many different objects it believes it has seen. That is largely related to the properties (and their types) of return values, and tends to be centered around each function (especially constructor functions).
There are a few questions and articles that dig into the details (but are hopefully still understandable without a ton of background):
slow function call in V8 when using the same key for the functions in different objects
Why is getting a member faster than calling hasOwnProperty?
http://mrale.ph/blog/2013/08/14/hidden-classes-vs-jsperf.html (and the rest of this blog)
The performance varies greatly, based on the above. Worst case should be a map access, best case is a direct memory access (perhaps even a deref).
There are a large number of scenarios that can have performance impacts, especially given how the JITter and VM will create and destroy hidden classes at runtime, as they see new variations on an object. Suddenly encountering a new variant of an object that was presumed to be monomorphic before can cause the VM to switch back to a less-optimal representation and stop treating the object as an in-memory struct, but the logic around that is pretty complicated and well-covered in this blog post.
You can help by making sure objects created from the same constructor tend to have very similar structures, and making things as predictable as possible (good for you, maintenance, and the VM). Having known properties for each object, set types for those properties, and creating objects from constructors when you can should let you hit most of the available optimizations and have some awfully quick code.
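A small sketch of what "similar structures" can look like in practice. The User class is invented for the example; the engine's hidden classes cannot be inspected from plain JavaScript, so the code only shows the property layout that the advice aims for.

```javascript
// Every instance gets the same properties, assigned in the same order.
class User {
  constructor(firstName, lastName, email) {
    this.firstName = firstName;
    this.lastName = lastName;
    this.email = email; // always assigned, even when the value is null
  }
}

const u1 = new User("Foo", "Bar", "foo@example.com");
const u2 = new User("Baz", "Qux", null);

// Less predictable: properties added ad hoc, in a different order per object.
const adHocA = {};
adHocA.firstName = "Foo";
adHocA.email = "foo@example.com";

const adHocB = {};
adHocB.email = "baz@example.com";
adHocB.firstName = "Baz";

console.log(Object.keys(u1), Object.keys(u2));
console.log(Object.keys(adHocA), Object.keys(adHocB));
```

The two User instances expose identical key lists, while the ad hoc objects end up with the same keys in different orders, which is the kind of variation the answer suggests avoiding.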

What is the best way to compile JavaScript-like structures to static, fast C++?

While developing a compiler from a language very similar to JavaScript to C++, I need a way to represent data structures. JavaScript's main data structures are Arrays and Hash-Tables. Arrays are more straightforward: I can use a vector of untyped pointers. It needs to be a vector because JS arrays are dynamic, and of pointers because JS arrays can hold any kind of object, for example:
var array = [1,2,[3,4],"test"];
I can't see a way to represent this other than that (is there?). For the hashes, I could use something similar, except including the string hashing step on access.
The problem is: JavaScript hashes are JIT-compiled into actual C++ objects which probably are much faster than hashes. This way, I'm afraid my attempt to generate C++ like that will actually result in slower code than the JavaScript version!
Does that make sense?
What would be the best approach to my compiler?

If this is an AOT compiler you can only process the hash keys that you see at compile-time, obviously. In this case you can change hash accesses to known keys to array accesses, giving each known key a small integer as index.
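The idea can be sketched as follows, written in JavaScript for consistency with the rest of the page rather than as generated C++. assignSlots and the storage array are hypothetical names; in real generated code the slot index would be emitted as a literal constant instead of being looked up at run time.

```javascript
// Hypothetical compile-time step: give each statically known key a small integer slot.
function assignSlots(knownKeys) {
  const slots = new Map();
  for (const key of knownKeys) {
    if (!slots.has(key)) slots.set(key, slots.size);
  }
  return slots;
}

const slots = assignSlots(["firstName", "lastName", "email"]);

// What `obj.lastName = "Bar"` could be lowered to: a plain indexed store.
const storage = new Array(slots.size);
storage[slots.get("lastName")] = "Bar";

// And `obj.lastName` becomes an indexed load.
const lastName = storage[slots.get("lastName")];
```

Keys that are only known at run time would still need a fallback hash lookup.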

Does assigning a new string value create garbage that needs collecting?

Consider this javascript code:
var s = "Some string";
s = "More string";
Will the garbage collector (GC) have work to do after this sort of operation?
(I'm wondering whether I should worry about assigning string literals when trying to minimize GC pauses.)
Edit: I'm slightly amused that, although I stated explicitly in my question that I needed to minimize GC, everyone assumed I'm wrong about that. If one really must know the particular details: I've got a game in javascript -- it runs fine in Chrome, but in Firefox it has semi-frequent pauses that seem to be due to GC. (I've even checked with the MemChaser extension for Firefox, and the pauses coincide exactly with garbage collection.)

Yes, strings need to be garbage-collected, just like any other type of dynamically allocated object. And yes, this is a valid concern as careless allocation of objects inside busy loops can definitely cause performance issues.
However, string values are immutable (non-changeable), and most modern JavaScript implementations use "string interning", that is, they store only one instance of each unique string value. This means that if you have something like this...
var s1 = "abc",
s2 = "abc";
...only one instance of "abc" will be allocated. This only applies to string values, not String objects.
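The distinction between string values and String objects can be seen directly. Note that === on string values compares their contents, so the first comparison does not by itself prove interning; that remains an engine detail that is not visible from script.

```javascript
// String values: compared by content.
const v1 = "abc";
const v2 = "abc";
const valuesEqual = v1 === v2;

// String objects: each `new String(...)` is a distinct object.
const o1 = new String("abc");
const o2 = new String("abc");
const objectsIdentical = o1 === o2;
const wrappedValuesEqual = o1.valueOf() === o2.valueOf();

console.log(valuesEqual, objectsIdentical, wrappedValuesEqual);
```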
A couple of things to keep in mind:
Functions like substring, slice, etc. will allocate a new object for each function call (if called with different parameters).
Even though both variables point to the same data in memory, there are still two variables to process when the GC cycle runs. Having too many local variables can also hurt you as each of them will need to be processed by the GC, adding overhead.
Some further reading on writing high-performance JavaScript:
https://developer.mozilla.org/en-US/docs/JavaScript/Memory_Management
https://www.scirra.com/blog/76/how-to-write-low-garbage-real-time-javascript
http://jonraasch.com/blog/10-javascript-performance-boosting-tips-from-nicholas-zakas

Yes, but unless you are doing this in a loop millions of times it won't likely be a factor for you to worry about.

As you already noticed, JavaScript is not JavaScript. It runs on different platforms and thus will have different performance characteristics.
So the definite answer to the question "Will the GC have work to do after this sort of operation?" is: maybe. If the script is as short as you've shown it, then a JIT-Compiler might well drop the first string completely. But there's no rule in the language definition that says it has to be that way or the other way. So in the end it's like it is all too often in JavaScript: you have to try it.
The more interesting question might also be: how can you avoid garbage collection. And that is try to minimize the allocation of new objects. Games typically have a pretty constant amount of objects and often there won't be new objects until an old one gets unused. For strings this might be harder as they are immutable in JS. So try to replace strings with other (mutable) representations where possible.
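One common way to keep the number of allocations roughly constant is an object pool. The sketch below is a simplified illustration; Vec2Pool and its methods are hypothetical names, and whether pooling actually reduces GC pauses in a given game has to be measured.

```javascript
// Hypothetical pool of reusable {x, y} objects.
class Vec2Pool {
  constructor() {
    this.free = [];
  }
  acquire(x, y) {
    // Reuse a released object if one is available, otherwise allocate a new one.
    const v = this.free.length > 0 ? this.free.pop() : { x: 0, y: 0 };
    v.x = x;
    v.y = y;
    return v;
  }
  release(v) {
    this.free.push(v);
  }
}

const pool = new Vec2Pool();
const first = pool.acquire(1, 2);
pool.release(first);               // hand the object back instead of dropping it
const second = pool.acquire(3, 4); // gets the same object, re-initialised
const reused = first === second;
```

The caller must not keep using an object after releasing it, which is the usual price of this technique.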

Yes, the garbage collector will have a string object containing "Some string" to get rid of. And, in answer to your question, that string assignment will make work for the GC.
Because strings are immutable and are used a lot, the JS engine has a pretty efficient way of dealing with them. You should not notice any pauses from garbage collecting a few strings. The garbage collector has work to do all the time in the normal course of javascript programming. That's how it's supposed to work.
If you are observing pauses from GC, I rather doubt it's from a few strings. There is more likely a much bigger issue going on. Either you have thousands of objects needing GC or some very complicated task for the GC. We couldn't really speculate on that without study of the overall code.
This should not be a concern unless you were doing some enormous loop and dealing with tens of thousands of objects. In that case, one might want to program a little more carefully to minimize the number of intermediate objects that are created. But, absent that level of objects, you should first write clear, reliable code and then optimize for performance only when something has shown you that there is a performance issue to worry about.

To answer your question "I'm wondering whether I should worry about assigning string literals when trying to minimize GC pauses": No.
You really don't need to worry about this sort of thing with regard to garbage collection.
GC is only a concern when creating & destroying huge numbers of Javascript objects, or large numbers of DOM elements.
