In the book Professional JavaScript for Web Developers I read that primitive wrappers are used internally by JavaScript when trying to access properties and methods of primitive values. Does that mean that each time I try to access the length property on a string primitive, the value is recalculated? My gut tells me that since strings are immutable, their length value is stored somewhere and only accessed by the wrapper, but I'd rather be sure.
By the ES5 specification, yes (§11.2.1, §8.7.1, §9.9, §15.5.5).
Still, that does not mean an actual implementation will create string objects in memory; this is almost certainly optimized away.
I think that's true: primitive wrappers are created on the fly when you try to access properties of primitive values, like this:
"foo".length; // behaves as new String('foo').length
Not only is the length calculated at the moment you access the property, but a whole new object is created too (that object is what actually holds the property). The wrapper is then discarded immediately.
If you're worried about performance, don't be. There's rarely a case where you must use a primitive wrapper object, and wrapper objects seem to be orders of magnitude slower than plain primitive values (see test). Let the interpreter take care of optimization.
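To make the wrapper behavior concrete, here is a small sketch for Node.js or a browser console. It compares a string primitive with an explicit new String(...) object, and shows that a property assigned to a primitive does not stick, because no lasting wrapper exists to hold it. The helper name storeOnPrimitive is made up for the example, and the outcome of the assignment depends on whether the code runs in strict mode.

```javascript
// Compare a string primitive with an explicit String wrapper object.
const prim = "foo";
const boxed = new String("foo");

console.log(typeof prim);                // string
console.log(typeof boxed);               // object
console.log(prim.length === boxed.length); // true: both report 3
console.log(boxed.valueOf() === prim);   // true: the wrapper holds the primitive

// Hypothetical helper: try to attach a property to a primitive.
function storeOnPrimitive() {
  const s = "foo";
  try {
    s.extra = 42; // sloppy mode: silently ignored
  } catch (e) {
    // strict mode: assigning a property to a primitive throws
    return e instanceof TypeError ? "TypeError" : "other";
  }
  return s.extra; // undefined: no persistent object holds "extra"
}

console.log(storeOnPrimitive()); // undefined (sloppy) or TypeError (strict)
```

Either way the property is gone on the next access, which matches the description of a wrapper that is created and then discarded.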
Related
I'm reading Professional JavaScript for Web Developers 3rd ed. and in the summary of chapter 4 one can read:
Two types of values can be stored in JavaScript variables: primitive values and reference values.
Primitive values have one of the five primitive data types: Undefined, Null, Boolean, Number, and String. Primitive and reference values have the following characteristics:
Primitive values are of a fixed size and so are stored in memory on the stack.
But I can have different strings, say:
var a = "ABC";
// or
var b = "Some very irritatingly long string...";
They clearly differ in size, so how can they be allocated on the stack?
I believe the same question can be asked about numbers...
So I am surely missing something important here.
Can someone explain why strings/numbers are of fixed size and how they can be stored on stack?
Strings (and usually numbers) are not of fixed size, and are not stored in their entirety on the stack, but within the language they behave as if they could be stored on the stack.
It's up to the one implementing the language to decide how to store the data internally. Often the data is stored in different ways depending on the value.
Although numbers in JavaScript always behave as double-precision floating-point numbers, they are usually stored differently when they happen to be integer values. Some JavaScript engines use unused double bit patterns (NaN payloads) to hold integer values, while others store integers in the value itself and double values on the heap.
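The first technique is often called NaN-boxing. The sketch below is only an illustration of the idea, not how any particular engine lays out its values: the tag INT_TAG and the helpers boxInt32, boxDouble and unbox are invented names, and a BigInt stands in for a raw 64-bit machine word (so it needs a runtime with BigInt and DataView.prototype.getBigUint64). The second technique, keeping small integers directly in the value and putting doubles on the heap, is roughly what V8 does with its "Smi" values.

```javascript
// Hypothetical NaN-boxing sketch: every value is a 64-bit pattern (a BigInt here).
// Doubles keep their raw IEEE 754 bits; 32-bit integers are hidden in the
// payload of a NaN bit pattern that normal arithmetic would not produce.
const view = new DataView(new ArrayBuffer(8));
const INT_TAG = 0x7ff90000n; // assumed tag occupying the upper 32 bits

function doubleToBits(x) {
  view.setFloat64(0, x);
  return view.getBigUint64(0);
}

function bitsToDouble(bits) {
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

function boxDouble(x) {
  return doubleToBits(x); // assumes a real NaN never encodes with INT_TAG
}

function boxInt32(n) {
  // Upper 32 bits: the tag; lower 32 bits: n as two's complement.
  return (INT_TAG << 32n) | BigInt(n >>> 0);
}

function isBoxedInt(bits) {
  return bits >> 32n === INT_TAG;
}

function unbox(bits) {
  // Ints: sign-extend the low 32 bits. Anything else: reinterpret as a double.
  return isBoxedInt(bits) ? Number(BigInt.asIntN(32, bits)) : bitsToDouble(bits);
}

console.log(unbox(boxInt32(-5)));                     // -5
console.log(unbox(boxDouble(2.5)));                   // 2.5
console.log(Number.isNaN(bitsToDouble(boxInt32(7)))); // true: read as a double, it is a NaN
```

Every value occupies the same 64 bits regardless of whether it is an integer or a double, which is what makes such a representation "fixed size".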
For strings, some of the data can be stored in an item on the stack, for example the length and a reference to the string content stored on the heap. For short strings, the characters could fit in the stack value in place of the reference, and thus need no extra data on the heap.
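A conceptual model of that layout, under the assumption that every slot has the same three fields no matter how long the string is: the length, an inline copy for short strings, and otherwise an index into a separate "heap". INLINE_LIMIT, makeStringSlot, readString and the heap array are hypothetical names for illustration; real engines use raw memory, not JavaScript objects.

```javascript
// Hypothetical model of a fixed-layout "stack slot" for strings.
const INLINE_LIMIT = 7; // assumed capacity of the inline buffer
const heap = [];        // stands in for heap-allocated string contents

function makeStringSlot(str) {
  if (str.length <= INLINE_LIMIT) {
    // Short string: the characters live in the slot itself.
    return { length: str.length, inline: str, heapIndex: -1 };
  }
  // Long string: the slot only keeps the length and a reference.
  heap.push(str);
  return { length: str.length, inline: null, heapIndex: heap.length - 1 };
}

function readString(slot) {
  return slot.inline !== null ? slot.inline : heap[slot.heapIndex];
}

const shortSlot = makeStringSlot("ABC");
const longSlot = makeStringSlot("Some very irritatingly long string...");
console.log(shortSlot.inline, longSlot.heapIndex); // ABC 0
console.log(readString(longSlot));                 // Some very irritatingly long string...
```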
Primitive values are of a fixed size and so are stored in memory on the stack.
This seems wrong on several levels.
First, as you point out, they are not of a fixed size.
Second, even if they were, that is not necessarily a reason for storing them on the "stack".
Third, I don't even know what the "stack" is. Generally, "stack" is a term used in the context of compiled languages, most often referring to a list of invocation frames containing local variables. How JS engines store information is a matter of their internal implementation. They may use stack-like constructs, or not, or use them for some things and not others, or use one heap, or many heaps, or stacks containing things that point into a heap. Moreover, the traditional notion of a "stack" does not fully apply, insofar as JS supports lexical closures that require keeping variable bindings alive after a function has finished executing.
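A minimal closure example shows why: the local variable count below must survive after makeCounter has returned, so it cannot simply live in a stack frame that is popped on return. The names makeCounter and next are just illustrative.

```javascript
// makeCounter's local variable "count" must outlive the call that created it,
// because the returned function still refers to it.
function makeCounter() {
  let count = 0;
  return function increment() {
    count += 1;
    return count;
  };
}

const next = makeCounter(); // makeCounter has already returned here...
next();
console.log(next()); // 2: ...yet "count" is still alive and updated
```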
In any case, for the JS programmer, worrying about stacks and heaps is somewhere between meaningless and distracting. It's more important to understand the behavior of various types of values.
I was reading through code submitted by someone on a coding forum. I came across something like 100[0] in their code. I expected JavaScript to throw an error, but to my surprise it evaluates to undefined. I knew bracket notation is used with objects, arrays and strings, but it is quite strange that it doesn't throw when used with a number.
Is this similar to invoking a method on a string literal like "FOO".toLowerCase() where the string is first coerced to an object and then toLowerCase is invoked? Can someone explain to me what is going on here?
While numbers like 12 are primitives in JavaScript, you can still access properties on them. Take a look at the Number API Reference for what kinds of properties a number has.
For example, (2).toString() will yield '2'. So will 2['toString'](). This should hopefully help explain why such syntax is still valid.
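Putting the original 100[0] next to the other forms makes the pattern visible: every access is a normal property lookup, and a number simply has no property named "0", while a string does. The results object is only a container for the example.

```javascript
// Each access below looks the property up on a temporary wrapper object.
const results = {
  indexOnNumber: 100[0],       // undefined: Number.prototype has no "0" property
  indexOnString: "100"[0],     // "1": strings do have index properties
  viaDot: (2).toString(),      // "2"
  viaBracket: 2["toString"](), // "2": bracket notation reaches the same method
};
console.log(results);
```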
What makes this possible is a technique called "boxing". Most of the time when working with numbers, we aren't accessing properties or methods on the number, so the runtime will just use primitive values to keep things quick. But, the moment we try to access a property like toString() on the number, then a temporary instance will automatically be created for that number so that you can access the property.
This is mostly an invisible process, but artifacts of it are noticeable in the language. For example, comparing two primitives, 2 === 2, gives true. But if we force boxing with the Number constructor, new Number(2) === new Number(2), we are comparing two Number objects, and these objects follow the same rules as any other objects, so this evaluates to false. (Note that we can only observe this kind of behavior by forcing the creation of Number instances; automatic boxing happens when a property is accessed, not when values are compared, so it never affects equality checks.)
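The comparison described above, written out so it can be run directly:

```javascript
const x = 2;
const y = 2;
console.log(x === y); // true: primitives are compared by value

// Forcing boxing with the Number constructor produces two distinct objects.
const boxedX = new Number(2);
const boxedY = new Number(2);
console.log(typeof boxedX);                         // object
console.log(boxedX === boxedY);                     // false: different references
console.log(boxedX.valueOf() === boxedY.valueOf()); // true: the wrapped values match
console.log(boxedX == 2);                           // true: == converts the object to a primitive
```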
Why is it that in JavaScript, methods that relate to numbers are stored in the Math object? For example, to round you would need to do Math.round(5.2)
On the other hand, for strings I can just do "hello".toUpperCase()
Why not String.toUpperCase("hello")?
Like, why can't we do (2.5).round() the way we do "hello".toUpperCase()?
Is there a reason why things are organized this way?
We can't do something like:
2.toString();
directly, but we can:
console.log((2).toString());
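The reason 2.toString() fails is purely syntactic: the tokenizer reads "2." as a complete number literal, and a number literal may not be immediately followed by an identifier. The sketch below checks a few spellings by compiling them with new Function, which parses the code without running it; the helper name parses is hypothetical.

```javascript
// Returns true if the source text parses, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source); // compiles the code without running it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("2.toString()"));   // false
console.log(parses("(2).toString()")); // true
console.log(parses("2..toString()"));  // true: "2." is the number, ".toString" the access
console.log(parses("2 .toString()"));  // true: the space ends the number literal
console.log(2..toString());            // 2
```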
This seems suspicious to me, so I think JavaScript converts literals to a specific generic type, something like String, and then performs the needed operation using the given String method.
In your example you said that the Math object stores methods for convenient number transformations, like parsing to an int, parsing to a float, etc.
But this is not exactly true, because we also have functions like:
window.parseInt();
window.parseFloat();
that serve exactly that purpose.
And we also have the generic Number class, which lets us create a generic number like Java's Integer, which is different from Java's primitive int, so this clarifies a bit what is going on here.
I think JavaScript has some methods for string and String, and also for number and Number, but they are not the same: one is generic and the other is primitive. But if this were true, primitive objects should never have methods, so we can consider three possibilities:
When a method is called on a literal, JavaScript converts the primitive (if it is a primitive) to a generic object and uses that object's method, or uses a static method of the generic class.
Primitive objects in JavaScript aren't primitive; they are generic.
Primitive objects are constructed with some methods attached to them.
I won't claim which of these is correct, because I don't really know whether any of them is right, but in my opinion the first one is the most accurate description of how JavaScript stores and uses methods on literals.
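A quick check that is consistent with the first possibility: a number primitive stays a primitive, but its methods are found on Number.prototype, and asking for its prototype first wraps it. It also shows that parseInt and Math.round are ordinary functions rather than methods of the number itself (since ES2015, Number.parseInt is the very same function object as the global parseInt).

```javascript
const n = 2;
console.log(typeof n);                                      // number
console.log(n.toFixed === Number.prototype.toFixed);        // true: found via the prototype
console.log(Object.getPrototypeOf(n) === Number.prototype); // true: n is wrapped for the lookup

// Standalone functions rather than methods on the number.
console.log(Number.parseInt === parseInt); // true: one function, two names
console.log(typeof Math.round);            // function
console.log(Math.round(5.2));              // 5
```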
The object1 == object2 operation checks whether the references are the same.
A lot of the time we want to check whether the objects' structure (properties, values and even methods) is the same. We have to implement an isEqual() function ourselves or use an external library.
Why isn't it just added to the javascript ECMA standard, like JSON.stringify() was?
Is there a specific reason?
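For reference, here is the reference comparison next to the common JSON.stringify shortcut. The shortcut is sensitive to key order and silently drops values JSON cannot represent, such as functions, which hints at why a single built-in answer is hard to define.

```javascript
const obj1 = { name: "value", age: 30 };
const obj2 = { name: "value", age: 30 };
const obj3 = obj1;

console.log(obj1 == obj2);  // false: different objects, same contents
console.log(obj1 === obj3); // true: same reference

// Comparing serialized forms depends on key order and ignores functions.
const reordered = { age: 30, name: "value" };
console.log(JSON.stringify(obj1) === JSON.stringify(obj2));      // true
console.log(JSON.stringify(obj1) === JSON.stringify(reordered)); // false
console.log(JSON.stringify({ f() {} }) === JSON.stringify({}));  // true: the method is dropped
```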
From what I can gather, this hasn't been implemented because objects can have very different structures, and isEqual(obj1,obj2) would only be useful for very simple objects consisting of name:value pairs, like obj = {name:"value",age:"anotherValue"}.
I think it should be implemented nevertheless.
Most probably because there is no obvious way to determine what exactly makes two objects equal.
For example, you could check that they have the same property names with the same values. However,
These values can be objects, should their equality be loosely checked recursively? Then, what should be done with cyclic references?
Should only enumerable properties be checked, or all of them?
Should only string properties be checked, or also symbols?
Should only properties be checked, or also internal slots?
If not all internal slots are checked, which ones? For example, should the [[Prototype]] values in ordinary objects be compared, or maybe call the [[GetPrototypeOf]] method? Should all function internal slots be compared, or otherwise how would you determine function equality?
Should only the property values be compared, or also the configurability, writability and enumerability?
What about accessor properties? Should getters be called and compare the returned values, or compare the getters and setters themselves?
What about proxy objects, which may return a different set of properties each time you ask them?
There is no best answer to these questions. For each person, object equality might mean different things. So they can write a function which checks exactly what they want.
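As an example of such a function, here is a hypothetical isEqual that picks one answer to each question above and says so in its comments. The choices (own enumerable string keys only, recursion into nested objects, Object.is for primitives, cycles treated as equal) are assumptions made for this sketch; a different project could reasonably choose otherwise.

```javascript
// Hypothetical isEqual with explicit choices:
// - only own enumerable string keys are compared (Object.keys), so symbols,
//   prototypes, internal slots (e.g. a Date's time value) and property
//   attributes are ignored, and getters are simply invoked;
// - nested objects and arrays are compared recursively;
// - a pair of objects already under comparison is assumed equal, so cycles end;
// - primitives and functions use Object.is (NaN equals NaN, 0 and -0 differ).
function isEqual(a, b, seen = new Map()) {
  if (Object.is(a, b)) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }

  // Remember which objects each "a" is being compared with, to stop at cycles.
  let partners = seen.get(a);
  if (partners === undefined) {
    partners = new Set();
    seen.set(a, partners);
  }
  if (partners.has(b)) return true;
  partners.add(b);

  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) =>
      Object.prototype.propertyIsEnumerable.call(b, key) &&
      isEqual(a[key], b[key], seen)
  );
}

console.log(isEqual({ a: 1, b: [1, 2] }, { b: [1, 2], a: 1 })); // true
console.log(isEqual({ a: 1 }, { a: 2 }));                       // false
```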
I know that in Python I can use lists for fast sorting and dictionaries to search things faster (because immutable objects can be hashed). Is that the same in JavaScript? I haven't found anything about the performance of data types in JavaScript after much searching.
Yes. "Object vs Arrays in Javascript is as Python's Dictionaries vs Lists".
Performance pros and cons are also the same: lists are more efficient when numeric indexes suit the task, and dictionaries are more efficient for long collections that must be accessed by a string key.
var dict = {};
dict['apple'] = "a sweet edible fruit";
dict['boy'] = "a young male human";
var list = [];
list.push("apples");
list.push("oranges");
list.push("pears");
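A self-contained variant of the snippet above, contrasting the two kinds of lookup. How the engine finds an object property internally (a hash table or a shape offset) is implementation-specific; the point is that it does not scan other entries, whereas indexOf walks the array element by element.

```javascript
const dict = { apple: "a sweet edible fruit", boy: "a young male human" };
const list = ["apples", "oranges", "pears"];

// Lookup by key in an object.
console.log(dict["boy"]);           // a young male human
console.log("apple" in dict);       // true

// Searching an array for a value scans it; access by numeric index is direct.
console.log(list.indexOf("pears")); // 2
console.log(list[2]);               // pears

// Object keys are strings (or symbols), so a numeric key is converted.
dict[1] = "one";
console.log(dict["1"]);             // one
```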
I have been looking for references and other sources that could answer this question. I know this isn't the best answer, but let me try one that involves some concepts that help us discuss this topic.
JavaScript and inheritance
Although the names might suggest that arrays and objects in JavaScript are like lists and dictionaries, they are different, because each language is designed in a different way, with different underlying philosophies, concepts and purposes.
In the case of JavaScript, it seems that both Arrays and Objects are more like hash tables. Contrary to intuition, Arrays are just another type of built-in object in JavaScript. In fact, as the ECMAScript Specification says in section 6.1.7:
The Object Type
An Object is logically a collection of properties. Each property is
either a data property, or an accessor property:
A data property associates a key value with an ECMAScript language
value and a set of Boolean attributes. An accessor property associates
a key value with one or two accessor functions, and a set of Boolean
attributes. The accessor functions are used to store or retrieve an
ECMAScript language value that is associated with the property.
Properties are identified using key values. A property key value is
either an ECMAScript String value or a Symbol value. All String and
Symbol values, including the empty string, are valid as property keys.
A property name is a property key that is a String value.
An integer index is a String-valued property key that is a canonical
numeric String (see 7.1.16) and whose numeric value is either +0 or a
positive integer ≤ 2⁵³ - 1. An array index is an integer index whose
numeric value i is in the range +0 ≤ i < 2³² - 1.
Property keys are used to access properties and their values. There
are two kinds of access for properties: get and set, corresponding to
value retrieval and assignment, respectively. The properties
accessible via get and set access includes both own properties that
are a direct part of an object and inherited properties which are
provided by another associated object via a property inheritance
relationship. Inherited properties may be either own or inherited
properties of the associated object. Each own property of an object
must each have a key value that is distinct from the key values of the
other own properties of that object.
And about arrays, it specifies:
22.1 Array Objects
Array objects are exotic objects that give special treatment to a certain class of property names.
Following the logic above, and as the specification says, the language was designed so that all object types in JavaScript build on the basic Object type and then add new methods and properties to obtain different behaviors.
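A few lines that make the quoted definitions visible from inside the language: array indexes are string-valued property keys, Array.prototype inherits from Object.prototype, and the "exotic" part is that writing an array index keeps the special length property up to date.

```javascript
const arr = ["a", "b"];
console.log(typeof arr);          // object
console.log(Object.keys(arr));    // [ '0', '1' ]: indexes are string keys
console.log(arr["1"] === arr[1]); // true: the numeric index is converted to "1"
console.log(Object.getPrototypeOf(Array.prototype) === Object.prototype); // true

// The exotic behavior: writing an array index updates length automatically.
arr[5] = "f";
console.log(arr.length);          // 6
console.log(Object.keys(arr));    // [ '0', '1', '5' ]
```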
Memory Management
There is a gap between the language specification and how it is implemented in an actual runtime environment. Although each implementation has its own logic, most of them seem to share similarities.
As this article explains:
Most JavaScript interpreters use dictionary-like structures (hash
function based) to store the location of object property values in the
memory. This structure makes retrieving the value of a property in
JavaScript more computationally expensive than it would be in a
non-dynamic programming language like Java or C#. In Java, all of the
object properties are determined by a fixed object layout before
compilation and cannot be dynamically added or removed at runtime
(well, C# has the dynamic type which is another topic). As a result,
the values of properties (or pointers to those properties) can be
stored as a continuous buffer in the memory with a fixed-offset
between each. The length of an offset can easily be determined based
on the property type, whereas this is not possible in JavaScript where
a property type can change during runtime.
As this makes JavaScript somewhat inefficient, engine developers had to come up with some clever workarounds to solve the problem. From this other article:
If you access a property, e.g. object.y, the JavaScript engine looks
in the JSObject for the key 'y', then loads the corresponding property
attributes, and finally returns the [[Value]].
But where are these property attributes stored in memory? Should we
store them as part of the JSObject? If we assume that we’ll be seeing
more objects with this shape later, then it’s wasteful to store the
full dictionary containing the property names and attributes on the
JSObject itself, as the property names are repeated for all objects
with the same shape. That’s a lot of duplication and unnecessary
memory usage. As an optimization, engines store the Shape of the
object separately.
This Shape contains all the property names and the attributes, except
for their [[Value]]s. Instead the Shape contains the offset of the
values inside of the JSObject, so that the JavaScript engine knows
where to find the values. Every JSObject with this same shape points
to exactly this Shape instance. Now every JSObject only has to store
the values that are unique to this object.
The benefit becomes clear when we have multiple objects. No matter how
many objects there are, as long as they have the same shape, we only
have to store the shape and property information once!
All JavaScript engines use shapes as an optimization, but they don’t
all call them shapes:
Academic papers call them Hidden Classes (confusing w.r.t. JavaScript classes)
V8 calls them Maps (confusing w.r.t. JavaScript Maps)
Chakra calls them Types (confusing w.r.t. JavaScript’s dynamic types and typeof)
JavaScriptCore calls them Structures
SpiderMonkey calls them Shapes
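The quoted idea can be sketched in a few dozen lines. This is a toy model under stated assumptions, not any engine's code: Shape, ShapedObject and ROOT_SHAPE are hypothetical names, and the transitions map (so that objects built by adding the same properties in the same order end up sharing one Shape) is a design choice of this sketch. Each object stores only its shape and an array of values; the shape maps property names to offsets in that array.

```javascript
// A Shape maps property names to offsets and is shared between objects.
class Shape {
  constructor(offsets = new Map()) {
    this.offsets = offsets;       // property name -> index into values
    this.transitions = new Map(); // property name -> next Shape
  }
  withProperty(name) {
    // Reuse an existing transition so objects built the same way share shapes.
    let next = this.transitions.get(name);
    if (next === undefined) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size); // new property goes in the next slot
      next = new Shape(offsets);
      this.transitions.set(name, next);
    }
    return next;
  }
}

const ROOT_SHAPE = new Shape();

// An object stores only a pointer to its Shape plus the property values.
class ShapedObject {
  constructor() {
    this.shape = ROOT_SHAPE;
    this.values = [];
  }
  set(name, value) {
    const offset = this.shape.offsets.get(name);
    if (offset !== undefined) {
      this.values[offset] = value; // existing property: overwrite in place
    } else {
      this.shape = this.shape.withProperty(name);
      this.values.push(value);     // its offset equals the old property count
    }
  }
  get(name) {
    const offset = this.shape.offsets.get(name);
    return offset === undefined ? undefined : this.values[offset];
  }
}

const p1 = new ShapedObject();
p1.set("x", 1);
p1.set("y", 2);
const p2 = new ShapedObject();
p2.set("x", 3);
p2.set("y", 4);
console.log(p1.shape === p2.shape); // true: one Shape describes both objects
console.log(p2.get("y"));           // 4
```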
Python
Arrays
Python uses a different approach for the implementation of lists: they are more like dynamic arrays than the fixed-size arrays you would find in C, but they are still designed to save time and space at runtime. As this entry from the Python documentation FAQ says:
Python’s list objects are really variable-length arrays, not
Lisp-style linked lists. The implementation uses a contiguous array of
references to other objects, and keeps a pointer to this array and the
array’s length in a list head structure.
This makes indexing a list (L[i]) an operation whose cost is
independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is
resized. Some cleverness is applied to improve the performance of
appending items repeatedly; when the array must be grown, some extra
space is allocated so the next few times don’t require an actual
resize.
Like JavaScript arrays, Python lists are not required to be homogeneous, so they are not an implementation of "strongly typed" data structures that may only contain elements of a single type, such as integers or strings.
As with JavaScript, the language specification and the actual implementation are two separate things. Depending on whether you are using CPython, Jython, IronPython, etc., the memory management and the functions that run behind the scenes will do different things while executing Python code.
I know this isn't the best source, but here is what I found discussed on Quora:
Contrary to what their name implies, Python lists are actually arrays(...).
Specifically, they are dynamic arrays with exponential
over-allocation, which allows code like the following to have linear
complexity:
lst = []
for i in xrange(0, 100000):
    lst.append(i)
Alternative implementations like Jython and IronPython seem to use
whatever native dynamic array class their underlying language
(respectively Java and C#) provides, so they have the same performance
characteristics (the precise underlying classes seem to be ArrayList
for Jython and C# List for IronPython).
(...)arrays technically store pointers rather than the objects
themselves, which allows the array to contain only elements of a
specific size. Having pointers all over the place in the underlying
implementation is a common feature of dynamically typed languages, and
in fact of any language that tries to pretend it doesn't have
pointers.
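To illustrate the over-allocation described in both quotes, here is a hypothetical dynamic array written in JavaScript (to match the other examples). DynamicArray is an invented name, and the growth policy is an assumption: this sketch doubles the capacity when it runs out, while CPython uses a more modest growth factor. Either way, a resize copies all elements but happens rarely enough that appends stay cheap on average, which is why the quoted loop is linear overall.

```javascript
// Hypothetical dynamic array with over-allocation.
class DynamicArray {
  constructor() {
    this.capacity = 4;
    this.length = 0;
    this.slots = new Array(this.capacity); // stands in for a contiguous buffer
    this.resizes = 0;
  }
  append(value) {
    if (this.length === this.capacity) {
      // Out of room: allocate a bigger buffer and copy the existing elements.
      this.capacity *= 2;
      const bigger = new Array(this.capacity);
      for (let i = 0; i < this.length; i++) bigger[i] = this.slots[i];
      this.slots = bigger;
      this.resizes += 1;
    }
    this.slots[this.length++] = value;
  }
  get(i) {
    if (i < 0 || i >= this.length) throw new RangeError("index out of range");
    return this.slots[i]; // constant time, independent of the index
  }
}

const lst = new DynamicArray();
for (let i = 0; i < 100000; i++) lst.append(i);
console.log(lst.get(99999)); // 99999
console.log(lst.resizes);    // 15: only 15 of the 100000 appends had to copy
```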
Dictionaries
As the official docs put it in their "Design and History FAQ":
CPython’s dictionaries are implemented as resizable hash tables.
Compared to B-trees, this gives better performance for lookup (the
most common operation by far) under most circumstances, and the
implementation is simpler.
Dictionaries work by computing a hash code for each key stored in the
dictionary using the hash() built-in function. The hash code varies
widely depending on the key; for example, “Python” hashes to
-539294296 while “python”, a string that differs by a single bit, hashes to 1142331976. The hash code is then used to calculate a
location in an internal array where the value will be stored. Assuming
that you’re storing keys that all have different hash values, this
means that dictionaries take constant time – O(1), in computer science
notation – to retrieve a key. It also means that no sorted order of
the keys is maintained, and traversing the array as the .keys() and
.items() do will output the dictionary’s content in some arbitrary
jumbled order.
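The steps in the quote (hash the key, turn the hash into a location in an internal array, store the value there) can be sketched as a small hash table, again in JavaScript. This is a simplified stand-in, not CPython's design: it resolves collisions by chaining entries in per-bucket lists, whereas CPython's dictionaries use open addressing, and the FNV-1a string hash, the 0.75 load factor and the class name HashTable are choices made for this sketch.

```javascript
// Hypothetical hash table with separate chaining.
class HashTable {
  constructor(bucketCount = 8) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
    this.size = 0;
  }
  static hash(key) {
    // 32-bit FNV-1a over UTF-16 code units; any reasonable string hash works.
    let h = 0x811c9dc5;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }
  bucketFor(key) {
    // The hash picks a location in the internal array of buckets.
    return this.buckets[HashTable.hash(key) % this.buckets.length];
  }
  set(key, value) {
    const bucket = this.bucketFor(key);
    const entry = bucket.find((e) => e.key === key);
    if (entry) {
      entry.value = value; // existing key: update in place
      return;
    }
    bucket.push({ key, value });
    this.size += 1;
    if (this.size > this.buckets.length * 0.75) this.resize();
  }
  get(key) {
    const entry = this.bucketFor(key).find((e) => e.key === key);
    return entry ? entry.value : undefined;
  }
  resize() {
    // Rehash every entry into twice as many buckets to keep chains short.
    const old = this.buckets;
    this.buckets = Array.from({ length: old.length * 2 }, () => []);
    for (const bucket of old) {
      for (const { key, value } of bucket) this.bucketFor(key).push({ key, value });
    }
  }
}

const table = new HashTable();
table.set("Python", 1);
table.set("python", 2);
console.log(table.get("python")); // 2
console.log(table.get("Java"));   // undefined
```

Because entries are placed by hash rather than by insertion, iterating over the buckets would visit keys in an order unrelated to how they were added, which matches the quote's remark about arbitrary order.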
In Conclusion
There are two separate aspects of a language: one is how it should work, with its syntax, semantics, logic and philosophy; the other is the actual implementation of that language in a specific runtime, interpreter or compiler.
So, although (in theory) there is one Python and one JavaScript, for Python you can have CPython, IronPython, Jython, etc., and for JavaScript you have SpiderMonkey, V8, etc.
But regarding how each runtime implements Arrays/Lists and Objects/Dictionaries, and how analogous they are, it seems that JavaScript has chosen a prototype-based inheritance model that makes everything a kind of object, so both Objects and Arrays are more like hash tables than actual arrays.
On the other hand, Python has more flavors of data structures, both in its libraries and in how the interpreters deal with them: it uses dynamic arrays to implement Python's lists and hash tables for dictionaries, which makes dictionaries more similar to objects in JavaScript.