I am new to JavaScript and lately, I found out that arrays in JavaScript are like lists in Java and that they can contain different types of variables.
My question is: are arrays in JavaScript made of pointers? How is it possible to have different types in the same array, given that we must define the array size before we assign the variables?
I have tried to find some information on Google, but all I have found are examples on arrays ):
You do not have to define the array size before you assign the variables. You can go like:
let array = [];
array.push(12);
array.push("asd");
array.push({data:5});
array.forEach(element => {
console.log(element);
});
Also I think you should not think about pointers with such a high level language. The better way is to look at variables like 'primitives' and 'objects'. Here is a good read about it:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures
High-level languages, and in particular scripting languages, tend to reference most things with pointers, and they make pointer access transparent. JavaScript does this too. Most values are objects, and even primitives like numbers and strings are wrapped in temporary objects when you access a property on them. Objects in JavaScript have properties that store things. Those properties are essentially pointers, in that they are references to other objects. Arrays are implemented the same way, and are in fact objects with numeric properties (plus a few extras a standard object doesn't have, such as the .length property and the .push() and .map() methods). Arrays don't have a fixed size any more than objects do. So everything in JavaScript is stored in these object "buckets" that can store anything in their properties (although you can seal or freeze objects so that they don't accidentally change; primitives like numbers and strings are already immutable).
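A minimal sketch of the "arrays are objects with numeric properties" point. The variable names are only illustrative.

```javascript
// An array is an object whose indexes are property keys.
const mixed = [12, "asd", { data: 5 }];

const arrayIsObject = typeof mixed === "object"; // true
const keys = Object.keys(mixed);                 // ["0", "1", "2"] (keys are strings)

// No size was declared up front; the array grows on assignment.
mixed[5] = true;
const lengthAfterGrow = mixed.length;            // 6, with empty slots at 3 and 4

// A primitive string still answers method calls.
const upper = "asd".toUpperCase();               // "ASD"

console.log(arrayIsObject, keys, lengthAfterGrow, upper);
```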
Languages with fixed data types (C like languages for instance) implement things with fixed data structures, and the exact size is easily calculable and known. When you declare a variable, the compiler uses the type of that variable to reserve some space in memory. Javascript handles all that for you and doesn't assume anything is a fixed size, because it can't. The size of javascript objects can change at any time.
In C-Like languages, when you ask for an array, you are asking for a block of a specific size. The compiler needs to know how big that is so that it can determine where in memory to put everything, and it can use the type of objects in the array to easily calculate that. Interpreted languages use pointers behind the scenes to keep track of where everything is stored, because they can't assume it will always be in the same place, like a compiled program can. (This is somewhat of a simplification and there are caveats to this of course).
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array
JavaScript is a loosely typed language, so there is nothing stopping you from having different types in a JavaScript array. However, I would strongly avoid structuring your data that way without static type checking (TypeScript).
const test = ['test', {test:'test'}, 1, true]
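If you do end up with a mixed array in plain JavaScript, the element types can only be checked at runtime. A small sketch, where describeElement is a hypothetical helper and not a built-in:

```javascript
// Hypothetical helper: report the runtime type of a value.
function describeElement(value) {
  if (value === null) return "null";          // typeof null is "object", so check first
  if (Array.isArray(value)) return "array";   // typeof [] is also "object"
  return typeof value;
}

const sample = ["test", { test: "test" }, 1, true];
const kinds = sample.map((value) => describeElement(value));

console.log(kinds); // ["string", "object", "number", "boolean"]
```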
Related
I am trying to write documentation on a piece of Javascript code but I am having trouble describing the objects made by the code in a concise and understandable way. It is especially difficult because the objects have nested objects (often multiple layers).
Is there any mathematics that involves things with keys and attached values?
If not, how best can I describe an object with multiple nested objects in a concise manner?
Note: Just showing an example of an object is not enough as the structure changes often. Also, there are mathematical relationships between the keys and the values (coupon dates as keys and coupon payments as values).
I would say that Javascript objects are functions or mappings, in that they map keys to values.
Beyond that, it is hard to compare... the domain can encompass numbers, and a subset of all strings. As simple as that is to say, I'm not sure what mathematical field (etc) the domain would be equivalent to!
The range would, of course, be worse, as values in the range can be numbers, strings, booleans, undefined, further objects, or functions. However, I think the concept of an object being a mapping is fairly intuitive.
This doesn't include the prototype style inheritance, but I'm not sure how deep you want to go...
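One way to document the mapping without pinning down concrete values is to list each key path together with the type of value it maps to. This is only a sketch: describeShape and the bond data are made up for illustration.

```javascript
// Hypothetical helper: flatten a nested object into "path: type" lines.
function describeShape(obj, prefix = "") {
  const lines = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const isPlainObject =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isPlainObject) {
      lines.push(...describeShape(value, path)); // recurse into nested mappings
    } else {
      lines.push(`${path}: ${Array.isArray(value) ? "array" : typeof value}`);
    }
  }
  return lines;
}

// Made-up data: coupon dates as keys, coupon payments as values.
const bond = {
  issuer: "ACME",
  coupons: { "2024-06-30": 2.5, "2024-12-31": 2.5 },
};

const shape = describeShape(bond);
console.log(shape.join("\n"));
```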
I saw a comment on it earlier, JavaScript objects pretty much follow the associative array abstract data type, which is a mathematical concept by virtue since computer science is basically a subset of applied mathematics, but if you need a true mathematical representation there's relational algebra which was created for relational databases (close enough) and is essentially an extension of set theory... just remember math doesn't necessarily mean it's clear and concise – Patrick Barr yesterday
I know in python I can use lists in order to make fast sortings and dictionaries in order to search things faster (because immutable objects can be hashed). Is that the same for javascript too? I haven't seen anything about the performance of datatypes in javascript after much search.
Yes. "Object vs Arrays in Javascript is as Python's Dictionaries vs Lists".
Performance pros and cons are also the same: lists are more efficient if numeric indexes are appropriate to the task, and dictionaries are more efficient for long lists that must be accessed by a string.
var dict = {};
dict['apple'] = "a sweet edible fruit";
dict['boy'] = "a young male human";
var list = [];
list.push("apples");
list.push("oranges");
list.push("pears");
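To make the difference in access patterns concrete, here is a short sketch. The Map at the end is my addition: it is JavaScript's built-in dictionary type and is often a closer match to a Python dict than a plain object.

```javascript
const glossary = { apple: "a sweet edible fruit", boy: "a young male human" };
const fruits = ["apples", "oranges", "pears"];

// Keyed lookup: goes straight to the entry for "apple".
const meaning = glossary["apple"];

// Indexed access into an array.
const second = fruits[1];

// Searching an array by value is a linear scan from the front.
const position = fruits.indexOf("pears");

// Map accepts keys of any type, not only strings and symbols.
const stock = new Map([["apples", 3], ["pears", 7]]);
const pearCount = stock.get("pears");

console.log(meaning, second, position, pearCount);
```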
I have been looking for some bibliography and other sources that could answer this question. I know that this isn't the best answer, but let me try an answer that involves some concepts that let us discuss this topic.
Javascript and inheritance
Although it might seem that arrays and objects in JavaScript are like lists and dictionaries, they are different, because each language is written in a different way, with different underlying philosophies, concepts and purposes.
In the case of JavaScript, it seems that both Arrays and Objects are more like hash tables. Contrary to intuition, Arrays are just another type of built-in object in JavaScript. In fact, as the ECMAScript Specification says in section 6.1.7:
The Object Type
An Object is logically a collection of properties. Each property is
either a data property, or an accessor property:
A data property associates a key value with an ECMAScript language
value and a set of Boolean attributes. An accessor property associates
a key value with one or two accessor functions, and a set of Boolean
attributes. The accessor functions are used to store or retrieve an
ECMAScript language value that is associated with the property.
Properties are identified using key values. A property key value is
either an ECMAScript String value or a Symbol value. All String and
Symbol values, including the empty string, are valid as property keys.
A property name is a property key that is a String value.
An integer index is a String-valued property key that is a canonical
numeric String (see 7.1.16) and whose numeric value is either +0 or a
positive integer ≤ 2^53 - 1. An array index is an integer index whose
numeric value i is in the range +0 ≤ i < 2^32 - 1.
Property keys are used to access properties and their values. There
are two kinds of access for properties: get and set, corresponding to
value retrieval and assignment, respectively. The properties
accessible via get and set access includes both own properties that
are a direct part of an object and inherited properties which are
provided by another associated object via a property inheritance
relationship. Inherited properties may be either own or inherited
properties of the associated object. Each own property of an object
must each have a key value that is distinct from the key values of the
other own properties of that object.
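The terms in this passage can be observed from ordinary code. A small sketch, with made-up object names:

```javascript
const record = { 1: "one" };

// Numeric keys are stored as strings, so both lookups hit the same property.
const sameProperty = record[1] === record["1"];
const keyTypes = Object.keys(record).map((k) => typeof k); // ["string"]

// A Symbol is the other kind of property key; Object.keys does not list it.
const tag = Symbol("tag");
record[tag] = "symbol-keyed value";
const visibleKeys = Object.keys(record); // still ["1"]

// An accessor property runs a function on get instead of storing a value.
const temperature = {
  celsius: 20,
  get fahrenheit() {
    return (this.celsius * 9) / 5 + 32;
  },
};

// Own versus inherited: toString is reachable but comes from Object.prototype.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const ownCelsius = hasOwn(temperature, "celsius");   // true
const ownToString = hasOwn(temperature, "toString"); // false
const canReachToString = "toString" in temperature;  // true
```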
And, about arrays, it specifies:
22.1 Array Objects
Array objects are exotic objects that give special treatment to a certain class of property names.
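The "special treatment" is mostly about how array-index keys interact with length. A brief sketch of the observable behavior:

```javascript
const items = ["a", "b", "c"];

// Writing to an array index past the end updates length automatically.
items[4] = "e";
const grown = items.length; // 5

// Shrinking length deletes the elements beyond it.
items.length = 2;
const remaining = items.slice(); // ["a", "b"]

// 2^32 - 1 is outside the array index range, so it is stored as an
// ordinary property and leaves length untouched.
const big = [];
big[2 ** 32 - 1] = "not an element";
const bigLength = big.length; // 0

console.log(grown, remaining, bigLength);
```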
Following the logic above, and as the specification says, the language was designed so that all object types in JavaScript extend a base object, and new methods and properties are then added to give them different behaviors.
Memory Management
There is a gap between the language specification and how it is implemented in an actual runtime environment. Although each implementation has its own logic, it seems that most of them have similarities.
As This Article Explains:
Most JavaScript interpreters use dictionary-like structures (hash
function based) to store the location of object property values in the
memory. This structure makes retrieving the value of a property in
JavaScript more computationally expensive than it would be in a
non-dynamic programming language like Java or C#. In Java, all of the
object properties are determined by a fixed object layout before
compilation and cannot be dynamically added or removed at runtime
(well, C# has the dynamic type which is another topic). As a result,
the values of properties (or pointers to those properties) can be
stored as a continuous buffer in the memory with a fixed-offset
between each. The length of an offset can easily be determined based
on the property type, whereas this is not possible in JavaScript where
a property type can change during runtime.
As this makes JavaScript somewhat inefficient, engineers had to come up with some clever workarounds to solve this problem. Following this other article:
If you access a property, e.g. object.y, the JavaScript engine looks
in the JSObject for the key 'y', then loads the corresponding property
attributes, and finally returns the [[Value]].
But where are these property attributes stored in memory? Should we
store them as part of the JSObject? If we assume that we’ll be seeing
more objects with this shape later, then it’s wasteful to store the
full dictionary containing the property names and attributes on the
JSObject itself, as the property names are repeated for all objects
with the same shape. That’s a lot of duplication and unnecessarily
memory usage. As an optimization, engines store the Shape of the
object separately.
This Shape contains all the property names and the attributes, except
for their [[Value]]s. Instead the Shape contains the offset of the
values inside of the JSObject, so that the JavaScript engine knows
where to find the values. Every JSObject with this same shape points
to exactly this Shape instance. Now every JSObject only has to store
the values that are unique to this object.
The benefit becomes clear when we have multiple objects. No matter how
many objects there are, as long as they have the same shape, we only
have to store the shape and property information once!
All JavaScript engines use shapes as an optimization, but they don’t
all call them shapes:
Academic papers call them Hidden Classes (confusing w.r.t. JavaScript classes)
V8 calls them Maps (confusing w.r.t. JavaScript Maps)
Chakra calls them Types (confusing w.r.t. JavaScript’s dynamic types and typeof)
JavaScriptCore calls them Structures
SpiderMonkey calls them Shapes
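The idea can be modeled in a few lines of JavaScript. This is a toy model under my own naming (Shape, ToyObject); real engines do this in native code, with transitions, inline caches and other details that are left out here.

```javascript
// Toy model only: not how any particular engine is written.
class Shape {
  constructor(propertyNames) {
    // Maps each property name to the offset of its value in the values array.
    this.offsets = new Map(propertyNames.map((name, i) => [name, i]));
  }
}

class ToyObject {
  constructor(shape, values) {
    this.shape = shape;   // shared by all objects with the same layout
    this.values = values; // the only part that is unique per object
  }
  get(name) {
    const offset = this.shape.offsets.get(name);
    return offset === undefined ? undefined : this.values[offset];
  }
}

// Two objects with properties x and y share one Shape instance.
const pointShape = new Shape(["x", "y"]);
const p1 = new ToyObject(pointShape, [1, 2]);
const p2 = new ToyObject(pointShape, [30, 40]);

console.log(p1.get("y"), p2.get("x"), p1.shape === p2.shape);
```

However many points are created, the property names and offsets are stored once in pointShape.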
Python
Arrays
Python uses a different approach for the implementation of lists. It seems that lists are more like dynamic arrays than the fixed-size arrays you would find in C, but they are still focused on saving time and space at runtime. As this FAQ, cited from the Python docs, says:
Python’s list objects are really variable-length arrays, not
Lisp-style linked lists. The implementation uses a contiguous array of
references to other objects, and keeps a pointer to this array and the
array’s length in a list head structure.
This makes indexing a list (L[i]) an operation whose cost is
independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is
resized. Some cleverness is applied to improve the performance of
appending items repeatedly; when the array must be grown, some extra
space is allocated so the next few times don’t require an actual
resize.
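The over-allocation trick can be sketched in JavaScript, to stay in the language of the question. The DynamicArray class and the doubling factor are assumptions for illustration, not CPython's actual growth rule.

```javascript
class DynamicArray {
  constructor() {
    this.capacity = 4;
    this.length = 0;
    this.slots = new Array(this.capacity); // stands in for a raw block of references
    this.resizes = 0;
  }
  append(item) {
    if (this.length === this.capacity) {
      // Over-allocate: double the capacity and copy the references once.
      this.capacity *= 2;
      const bigger = new Array(this.capacity);
      for (let i = 0; i < this.length; i++) bigger[i] = this.slots[i];
      this.slots = bigger;
      this.resizes++;
    }
    this.slots[this.length++] = item;
  }
  get(index) {
    return this.slots[index]; // cost does not depend on length or index
  }
}

const numbers = new DynamicArray();
for (let i = 0; i < 1000; i++) numbers.append(i);

console.log(numbers.length, numbers.capacity, numbers.resizes); // 1000 1024 8
```

A thousand appends trigger only eight copies, which is why repeated appending stays cheap on average.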
Like JavaScript arrays, Python's lists are not required to be homogeneous, so they are not like "strongly typed" data structures that must contain only entities of the same type, such as integers or strings.
As with JavaScript, the specification of the language and the actual implementation are two separate things. Depending on whether you are using CPython, Jython, IronPython, etc., the memory management and the actual functions that run behind the scenes will do different things in the process of interpreting Python code.
I know that this isn't the best source, but as I found discussed on Quora:
Contrary to what their name implies, Python lists are actually arrays(...).
Specifically, they are dynamic arrays with exponential
over-allocation, which allows code like the following to have linear
complexity:
lst = []
for i in xrange(0, 100000):
    lst.append(i)
Alternative implementations like Jython and IronPython seem to use
whatever native dynamic array class their underlying language
(respectively Java and C#) provides, so they have the same performance
characteristics (the precise underlying classes seem to be ArrayList
for Jython and C# List for IronPython).
(...)arrays technically store pointers rather than the objects
themselves, which allows the array to contain only elements of a
specific size. Having pointers all over the place in the underlying
implementation is a common feature of dynamically typed languages, and
in fact of any language that tries to pretend it doesn't have
pointers.
Dictionaries
As the official docs put it in their "History and Design FAQ":
CPython’s dictionaries are implemented as resizable hash tables.
Compared to B-trees, this gives better performance for lookup (the
most common operation by far) under most circumstances, and the
implementation is simpler.
Dictionaries work by computing a hash code for each key stored in the
dictionary using the hash() built-in function. The hash code varies
widely depending on the key; for example, “Python” hashes to
-539294296 while “python”, a string that differs by a single bit, hashes to 1142331976. The hash code is then used to calculate a
location in an internal array where the value will be stored. Assuming
that you’re storing keys that all have different hash values, this
means that dictionaries take constant time – O(1), in computer science
notation – to retrieve a key. It also means that no sorted order of
the keys is maintained, and traversing the array as the .keys() and
.items() do will output the dictionary’s content in some arbitrary
jumbled order.
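The hash-then-index step can be sketched in JavaScript. Everything here is a toy under my own names: toyHash is a simple made-up hash, not Python's hash(), and the table has a fixed size with no resizing.

```javascript
// Hypothetical string hash; real implementations use stronger functions.
function toyHash(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return h;
}

class ToyHashTable {
  constructor(size = 8) {
    // Each bucket holds [key, value] pairs whose hashes land on the same index.
    this.buckets = Array.from({ length: size }, () => []);
  }
  indexFor(key) {
    return toyHash(key) % this.buckets.length;
  }
  set(key, value) {
    const bucket = this.buckets[this.indexFor(key)];
    const entry = bucket.find((e) => e[0] === key);
    if (entry) entry[1] = value;
    else bucket.push([key, value]);
  }
  get(key) {
    const entry = this.buckets[this.indexFor(key)].find((e) => e[0] === key);
    return entry ? entry[1] : undefined;
  }
}

const table = new ToyHashTable();
table.set("Python", 1);
table.set("python", 2);
table.set("Python", 3); // overwrites the first entry

console.log(table.get("Python"), table.get("python"), table.get("missing"));
```

Lookups only inspect one bucket, so as long as keys spread evenly the cost stays roughly constant regardless of how many entries are stored.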
In Conclusion
There are two separate things about a language. One involves how it should work, with its syntax, semantics, logic and philosophy. The other is the actual implementation of that language in a specific runtime, interpreter or compiler.
This way, although (in theory) you have one Python or one JavaScript, you could have CPython, IronPython, Jython, etc.; and on the other hand, you have SpiderMonkey, V8, etc.
But regarding how each runtime implements the language features of Arrays/Lists and Objects/Dictionaries and how analogous they are, it seems that JavaScript has chosen an inheritance model based on prototypes that makes everything a kind of object, so both Arrays and Objects are more like a hash table than an actual array.
On the other hand, Python has more variety with respect to data structures, both in its libraries and in how the interpreters deal with them, making use of arrays or dynamic arrays to bring Python's lists to life, and using hash tables for the dictionaries, making them more similar to the objects in JavaScript.
I've seen a lot of questions about the fastest way to access object properties (like using . vs []), but can't seem to find whether it's faster to retrieve object properties that are declared higher than others in object literal syntax.
I'm working with an object that could contain up to 40,000 properties, each of which is an Array of length 2. I'm using it as a lookup by value.
I know that maybe 5% of the properties will be the ones I need to retrieve most often. Is either of the following worth doing for increased performance (decreased lookup time)?
1. Set the most commonly needed properties at the top of the object literal syntax?
2. If #1 has no effect, should I create two separate objects, one with the most common 5% of properties, search that one first, then if the property isn't found there, then look through the object with all the less-common properties?
Or, is there a better way?
I did a js perf here: http://jsperf.com/object-lookup-perf
I basically injected 40000 props with random keys into an object, saved the "first" and "last" keys and looked them up in different tests. I was surprised by the result, because accessing the first was 35% slower than accessing the last entry.
Also, having an object of 5 or 40000 entries didn’t make any noticeable difference.
The test case can most likely be improved and I probably missed something, but there is a start for you.
Note: I only tested chrome
Yes, something like "indexOf" searches front to back, so placing common items earlier in the list will return them faster. Most "basic" search algorithms are simple linear scans from the top down. At least for arrays.
If you have so many properties, they must be computed, no? So you can replace the key computation (most probably a string) with an integer hash computation, then use this hash as the index into a regular array.
You might even use one single array by putting the two values in slots 2*i and 2*i+1.
If you can use a typed array here, do it, and you could not go faster.
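A sketch of the single-array layout, assuming the keys have already been turned into integer indexes and both values fit in 32-bit integers (neither is stated in the question). PAIR_COUNT, setPair and getPair are illustrative names.

```javascript
const PAIR_COUNT = 40000;

// One flat, fixed-length, zero-filled array: pair i lives at 2*i and 2*i+1.
const pairs = new Int32Array(PAIR_COUNT * 2);

function setPair(i, first, second) {
  pairs[2 * i] = first;
  pairs[2 * i + 1] = second;
}

function getPair(i) {
  return [pairs[2 * i], pairs[2 * i + 1]];
}

setPair(123, 7, 9);
console.log(getPair(123)); // [7, 9]
```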
Set the most commonly needed properties at the top of the object literal syntax?
No. Choose readability over performance. If you've got few enough properties that you use a literal in the code, it won't matter anyway; and you should order the properties in a logical sequence.
Property lookup in objects is usually based on hash maps, and position should not make a substantial difference. Depending on the implementation of the hash, some lookups might be negligibly slower, but I'd guess this is quite random and depends heavily on the applied optimisations. It should not matter.
If #1 has no effect, should I create two separate objects, one with the most common 5% of properties, search that one first, then if the property isn't found there, then look through the object with all the less-common properties?
Yes. If you've got really huge objects (with thousands of properties), this is a good idea. Depending on the used data structure, the size of the object might influence the lookup time, so if you've got a smaller object for the more frequent properties it should be faster. It's possible that different structures are chosen for the two objects, which could perform better than the single one - especially if you know beforehand in which object to look. However you will need to test this hypothesis with your actual data, and you should beware of premature [micro-]optimisation.
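The two-tier idea could look like the sketch below. I am using Map instead of plain objects here, which is my substitution, and the keys and values are made up. Whether it actually beats a single lookup has to be measured on the real data.

```javascript
// Made-up data: a small table for the frequent keys and one for the rest.
const frequent = new Map([["k1", [1, 2]]]);
const rare = new Map([["k2", [3, 4]], ["k3", [5, 6]]]);

function lookup(key) {
  // Try the small table first, then fall back to the large one.
  const hit = frequent.get(key);
  return hit !== undefined ? hit : rare.get(key);
}

console.log(lookup("k1"), lookup("k3"), lookup("nope"));
```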
While developing a compiler from a language very similar to JavaScript to C++, I need a way to represent data structures. JavaScript's main data structures are Arrays and Hash-Tables. Arrays are more straightforward: I can use a vector of untyped pointers. It needs to be a vector because JS arrays are dynamic, and of pointers because JS arrays can hold any kind of object, for example:
var array = [1,2,[3,4],"test"];
I can't see a way to represent this other than that (is there?). For the hashes, I could use something similar, except including the string hashing step on access.
The problem is: JavaScript hashes are JIT-compiled into actual C++ objects which probably are much faster than hashes. This way, I'm afraid my attempt to generate C++ like that will actually result in slower code than the JavaScript version!
Does that make sense?
What would be the best approach to my compiler?
If this is an AOT compiler you can only process the hash keys that you see at compile-time, obviously. In this case you can change hash accesses to known keys to array accesses, giving each known key a small integer as index.
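The key-to-index rewrite can be illustrated in JavaScript itself, even though the real target would be C++. KEY_INDEX and slotFor are hypothetical names, and the source snippet in the comment is invented.

```javascript
// Source program (hypothetical):  var point = { x: 1, y: 2 };  point.y
// Keys seen at compile time get small integer slots.
const KEY_INDEX = { x: 0, y: 1 };

// Returns the slot for a known key, or -1 to signal "fall back to hashing".
function slotFor(key) {
  return Object.prototype.hasOwnProperty.call(KEY_INDEX, key)
    ? KEY_INDEX[key]
    : -1;
}

// The object's storage becomes a flat array, and point.y becomes an index read.
const pointSlots = [1, 2];
const yValue = pointSlots[slotFor("y")];

console.log(yValue, slotFor("z")); // 2 -1
```

Keys that are only known at run time would still need a real hash table next to the slots.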
I am using node.js as my server platform and I need to process a non sparse array of 65,000 items.
Javascript arrays are not true arrays, but actually hashes. Index access is accompanied by conversion of the index to a string, followed by a hash lookup (see the Arrays section in http://www.crockford.com/javascript/survey.html).
So, my question is this. Does node.js implement a real array? One that does cost us to resize or delete items, but offers true random access without any index-to-string-then-hash-lookup?
Thanks.
EDIT
I may be asking for too much, but my array stores Javascript objects. Not numbers. And I cannot break it into many typed arrays, each holding number primitives or strings, because the objects have nested subobjects. Trying to use typed arrays will result in an unmaintainable code.
EDIT2
I must be missing something. Why does it have to be all or nothing? Either true Javascript with no true arrays or a C style extension with no Javascript benefits. Does having a true array of Javascript (untyped) objects contradict the nature of Javascript in any way? Java and C# have List<Object> which is essentially what I am looking for. C# even closer with List<DynamicObject>.
Node.js has the JavaScript typed arrays: Int8Array, Uint8Array, Uint8ClampedArray, Int16Array, Uint16Array, Int32Array, Uint32Array, Float32Array, Float64Array.
I think they are what you are asking for.
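A short sketch of how typed arrays differ from ordinary arrays. Note that they hold numbers only, so they do not directly cover an array of objects.

```javascript
const bytes = new Uint8Array(4); // fixed length, zero-filled

bytes[0] = 200;
bytes[1] = 257; // values wrap modulo 256, so 1 is stored

// There is no push(): the length is fixed at creation.
const canPush = typeof bytes.push === "function"; // false

const samples = Float64Array.from([0.5, 1.5, 2.5]);
const total = samples.reduce((sum, x) => sum + x, 0);

console.log(bytes.length, bytes[1], canPush, total); // 4 1 false 4.5
```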
Node.js does offer a Buffer class that is probably what you're looking for:
A Buffer is similar to an array of integers but corresponds to a raw memory allocation outside the V8 heap. A Buffer cannot be resized.
Not intrinsically, no.
However depending on your level of expertise, you could write a "true" array extension using Node's C/C++ extension facility. See http://nodejs.org/api/addons.html
You want to use Low Level JavaScript (LLJS) to manipulate everything directly in C-style.
http://mbebenita.github.com/LLJS/
Notice that according to the link above, an LLJS array is more like the array you are looking for (true C-like array), rather than a Javascript array.
There is an implementation of LLJS for Node.js available, so maybe you do not have to write your own node.js C extension. Perhaps this implementation will do the trick: https://github.com/mbebenita/LLJS