Replacing `String.prototype.valueOf` in JavaScript

Replacing `String.prototype.valueOf` in JavaScript - javascript

I'm building a DSL which would benefit from being able to hack some JS internals. I understand this is a very bad idea in general JS usage, but for my purposes it's okay. The following code works fine:
var str = new String("blah");
str.valueOf = function() { return 10 }
console.log(str * 10); // outputs 100
But this doesn't:
var str = "blah";
str.valueOf = function() { return 10 }
console.log(str * 10); // outputs NaN (because str === "blah")
Can someone who understands the internals a bit explain what's happening here? What's the underlying difference between these two examples?
And now what if I want to change the String prototype itself, so I can set the valueOf method of all strings, no matter when/where/how they are created? Is this possible? Unfortunately this doesn't seem to work:
String.prototype.valueOf = function() { return 10 }
console.log("blah" * 10); // NaN
Though this does:
String.prototype.valueOf = function() { return 10 }
console.log("blah".valueOf() * 10); // 100
And so does this:
String.prototype.valueOf = function() { return 10 }
console.log(new String("blah") * 10); // 100
Why does the JS engine treat "blah" and new String("blah") differently? Thanks!
By the way, here is a good article that sort of led me to explore this stuff.

When you do
var str = "blah";
you're creating a string primitive, but when you do
var str = new String("blah");
you're invoking the constructor and creating a String object.
When you have an object, javascript internally calls valueOf when trying to use that object where a primitive should be inserted.
For primitives it's the opposite, to be able to chain on methods, javascript needs an object, and internally primitives are wrapped with new String when calling object methods on the primitive.
In other words, when you have a string primitive, and you try to call str.valueOf, javascript will internally do new String(str) before it calls valueOf and returns the value.
However when you try to use the string direcly you still just have the primitive, and valueOf isn't called, the primitive value is inserted direcly.
From MDN
Note that JavaScript distinguishes between String objects and
primitive string values. (The same is true of Boolean and Numbers.)
String literals (denoted by double or single quotes) and strings
returned from String calls in a non-constructor context (i.e., without
using the new keyword) are primitive strings.
JavaScript automatically
converts primitives to String objects, so that it's possible to use
String object methods for primitive strings.
In contexts where a
method is to be invoked on a primitive string or a property lookup
occurs, JavaScript will automatically wrap the string primitive and
call the method or perform the property lookup.

Related

Can we say that String is an object in Javascript?

I'm always confused when I hear strings are primitives in JS, because everybody knows that string has different methods like: length, indexOf, search etc.
let string = "Please locate where 'locate' occurs!";
let pos = str.lastIndexOf("locate");
let position = str.search("locate");

It's true that everything in JavaScript is just like object because we can call methods on it. When we use new keyword with string it becomes an object otherwise it's primitive type.
console.log(typeof new String('str')); //object
console.log(typeof 'str'); //string
Now whenever we try to access any property of the string it box the the primitive value with new String()
'str'.indexOf('s')
is equivalent to
(new String(str)).indexOf('s').
The above process is called as "Boxing". "Boxing" is wrapping an object around a primitive value.

Strings are not objects, they are native types, like numbers, but if you want to access the method on it they are boxed with String object. The same happen with numbers you can call (10).toString() but in fact you're calling toString on Number instance that wraps 10, when you call the method.

Not certainly in that way.
If you try to use a class method for a primitive, the primitive will be temporarily wrapped in an object of the corresponding class ("boxing") and the operation will be performed on this object.
For example,
1..a = 2;
It's equal to:
num = new Object(1.);
num.a = 2;
delete num;
So, after executing the operation, the wrapper will be destroyed.
However, the primitive itself will remain unchanged:
console.log( 1..a ) // undefined

Every data type is a object in JavaScript.
String, array, null, undefined . Everything is a object in JavaScript

Number vs new Number internal implementation

I understand that writing
var x = Number("7"); // makes a number, a primitive
var y = new Number("7"); // makes a Number object
and I'm aware of the usual cautions against option 2, but what is going on behind the scenes? I was under the impression if a function is a constructor, it should not return a value via a return statement: it just sets up its implicit object, this, and returns it.
So how come Number, String, and Boolean constructors are able to return either a primitive or an object? Does the JavaScript engine parse those expressions differently as a special feature in the language? Or can a programmer also "overload" a constructor function to either return a primitive or an object, depending on whether the constructor is called with/without "new"?

Using the constructor the Number object will be an object, using the function Number instead will return the conversion of an object to its representation as a numeric value.
So, basically within the Number object, the function is validating how was called. This is possible by checking the object this.
Something interesting is coercing the Number object to a numeric value as follow:
var x = Number("7"); // makes a number, a primitive
var y = new Number("7");
console.log(x === +y)
Go and read about the specification
https://www.ecma-international.org/ecma-262/5.1/#sec-15.7

It has nothing to do with syntax and there's nothing special about those constructors. The Number() and other constructors simply test to see whether this is bound before proceeding.
You can do it too:
function MyConstructor() {
if (!this) return new MyConstructor();
// stuff ...
}
Now calling MyConstructor() will behave exactly like new MyConstructor().
Also, a constructor can return something. If it returns an object, then that's used instead of the implicitly-constructed object that new creates. Thus you could also implement a "new-is-optional" constructor another way:
function MyConstructor() {
let object = this || Object.create(MyConstructor.prototype);
// stuff ...
return object;
}
So the Number() constructor takes a different tack. In most runtimes it may or may not actually be implemented as JavaScript, but if it were it might look something like this:
function Number(n) {
if (this) {
// invoked with "new"
this.fantasyValueSetter(n); // cannot really do this
return this;
}
return +n; // plain number primitive if not invoked with "new"
}

What is the difference between String and new String? [duplicate]

Taken from MDN
String literals (denoted by double or single quotes) and strings
returned from String calls in a non-constructor context (i.e., without
using the new keyword) are primitive strings. JavaScript automatically
converts primitives to String objects, so that it's possible to use
String object methods for primitive strings. In contexts where a
method is to be invoked on a primitive string or a property lookup
occurs, JavaScript will automatically wrap the string primitive and
call the method or perform the property lookup.
So, I thought (logically) operations (method calls) on string primitives should be slower than operations on string Objects because any string primitive is converted to string Object (extra work) before the method being applied on the string.
But in this test case, the result is opposite. The code block-1 runs faster than the code block-2, both code blocks are given below:
code block-1 :
var s = '0123456789';
for (var i = 0; i < s.length; i++) {
s.charAt(i);
}
code block-2 :
var s = new String('0123456789');
for (var i = 0; i < s.length; i++) {
s.charAt(i);
}
The results varies in browsers but the code block-1 is always faster. Can anyone please explain this, why the code block-1 is faster than code block-2.

JavaScript has two main type categories, primitives and objects.
var s = 'test';
var ss = new String('test');
The single quote/double quote patterns are identical in terms of functionality. That aside, the behaviour you are trying to name is called auto-boxing. So what actually happens is that a primitive is converted to its wrapper type when a method of the wrapper type is invoked. Put simple:
var s = 'test';
Is a primitive data type. It has no methods, it is nothing more than a pointer to a raw data memory reference, which explains the much faster random access speed.
So what happens when you do s.charAt(i) for instance?
Since s is not an instance of String, JavaScript will auto-box s, which has typeof string to its wrapper type, String, with typeof object or more precisely s.valueOf(s).prototype.toString.call = [object String].
The auto-boxing behaviour casts s back and forth to its wrapper type as needed, but the standard operations are incredibly fast since you are dealing with a simpler data type. However auto-boxing and Object.prototype.valueOf have different effects.
If you want to force the auto-boxing or to cast a primitive to its wrapper type, you can use Object.prototype.valueOf, but the behaviour is different. Based on a wide variety of test scenarios auto-boxing only applies the 'required' methods, without altering the primitive nature of the variable. Which is why you get better speed.

This is rather implementation-dependent, but I'll take a shot. I'll exemplify with V8 but I assume other engines use similar approaches.
A string primitive is parsed to a v8::String object. Hence, methods can be invoked directly on it as mentioned by jfriend00.
A String object, in the other hand, is parsed to a v8::StringObject which extends Object and, apart from being a full fledged object, serves as a wrapper for v8::String.
Now it is only logical, a call to new String('').method() has to unbox this v8::StringObject's v8::String before executing the method, hence it is slower.
In many other languages, primitive values do not have methods.
The way MDN puts it seems to be the simplest way to explain how primitives' auto-boxing works (as also mentioned in flav's answer), that is, how JavaScript's primitive-y values can invoke methods.
However, a smart engine will not convert a string primitive-y to String object every time you need to call a method. This is also informatively mentioned in the Annotated ES5 spec. with regard to resolving properties (and "methods"¹) of primitive values:
NOTE The object that may be created in step 1 is not accessible outside of the above method. An implementation might choose to avoid the actual creation of the object. [...]
At very low level, Strings are most often implemented as immutable scalar values. Example wrapper structure:
StringObject > String (> ...) > char[]
The more far you're from the primitive, the longer it will take to get to it. In practice, String primitives are much more frequent than StringObjects, hence it is not a surprise for engines to add methods to the String primitives' corresponding (interpreted) objects' Class instead of converting back and forth between String and StringObject as MDN's explanation suggests.
¹ In JavaScript, "method" is just a naming convention for a property which resolves to a value of type function.

In case of string literal we cannot assign properties
var x = "hello" ;
x.y = "world";
console.log(x.y); // this will print undefined
Whereas in case of String Object we can assign properties
var x = new String("hello");
x.y = "world";
console.log(x.y); // this will print world

String Literal:
String literals are immutable, which means, once they are created, their state can't be changed, which also makes them thread safe.
var a = 's';
var b = 's';
a==b result will be 'true' both string refer's same object.
String Object:
Here, two different objects are created, and they have different references:
var a = new String("s");
var b = new String("s");
a==b result will be false, because they have different references.

If you use new, you're explicitly stating that you want to create an instance of an Object. Therefore, new String is producing an Object wrapping the String primitive, which means any action on it involves an extra layer of work.
typeof new String(); // "object"
typeof ''; // "string"
As they are of different types, your JavaScript interpreter may also optimise them differently, as mentioned in comments.

When you declare:
var s = '0123456789';
you create a string primitive. That string primitive has methods that let you call methods on it without converting the primitive to a first class object. So your supposition that this would be slower because the string has to be converted to an object is not correct. It does not have to be converted to an object. The primitive itself can invoke the methods.
Converting it to an full-blown object (which allows you to add new properties to it) is an extra step and does not make the string oeprations faster (in fact your test shows that it makes them slower).

I can see that this question has been resolved long ago, there is another subtle distinction between string literals and string objects, as nobody seems to have touched on it, I thought I'd just write it for completeness.
Basically another distinction between the two is when using eval. eval('1 + 1') gives 2, whereas eval(new String('1 + 1')) gives '1 + 1', so if certain block of code can be executed both 'normally' or with eval, it could lead to weird results

The existence of an object has little to do with the actual behaviour of a String in ECMAScript/JavaScript engines as the root scope will simply contain function objects for this. So the charAt(int) function in case of a string literal will be searched and executed.
With a real object you add one more layer where the charAt(int) method also are searched on the object itself before the standard behaviour kicks in (same as above). Apparently there is a surprisingly large amount of work done in this case.
BTW I don't think that primitives are actually converted into Objects but the script engine will simply mark this variable as string type and therefore it can find all provided functions for it so it looks like you invoke an object. Don't forget this is a script runtime which works on different principles than an OO runtime.

The biggest difference between a string primitive and a string object is that objects must follow this rule for the == operator:
An expression comparing Objects is only true if the operands reference
the same Object.
So, whereas string primitives have a convenient == that compares the value, you're out of luck when it comes to making any other immutable object type (including a string object) behave like a value type.
"hello" == "hello"
-> true
new String("hello") == new String("hello") // beware!
-> false
(Others have noted that a string object is technically mutable because you can add properties to it. But it's not clear what that's useful for; the string value itself is not mutable.)

The code is optimized before running by the javascript engine.
In general, micro benchmarks can be misleading because compilers and interpreters rearrange, modify, remove and perform other tricks on parts of your code to make it run faster.
In other words, the written code tells what is the goal but the compiler and/or runtime will decide how to achieve that goal.
Block 1 is faster mainly because of:
var s = '0123456789'; is always faster than
var s = new String('0123456789');
because of the overhead of object creation.
The loop portion is not the one causing the slowdown because the chartAt() can be inlined by the interpreter.
Try removing the loop and rerun the test, you will see the speed ratio will be the same as if the loop were not removed. In other words, for these tests, the loop blocks at execution time have exactly the same bytecode/machine code.
For these types of micro benchmarks, looking at the bytecode or machine code wil provide a clearer picture.

we can define String in 3-ways
var a = "first way";
var b = String("second way");
var c = new String("third way");
// also we can create using
4. var d = a + '';
Check the type of the strings created using typeof operator
typeof a // "string"
typeof b // "string"
typeof c // "object"
when you compare a and b var
a==b ( // yes)
when you compare String object
var StringObj = new String("third way")
var StringObj2 = new String("third way")
StringObj == StringObj2 // no result will be false, because they have different references

In Javascript, primitive data types such is string is a non-composite building block. This means that they are just values, nothing more:
let a = "string value";
By default there is no built-in methods like toUpperCase, toLowerCase etc...
But, if you try to write:
console.log( a.toUpperCase() ); or console.log( a.toLowerCase() );
This will not throw any error, instead they will work as they should.
What happened ?
Well, when you try to access a property of a string a Javascript coerces string to an object by new String(a); known as wrapper object.
This process is linked to concept called function constructors in Javascript, where functions are used to create new objects.
When you type new String('String value'); here String is function constructor, which takes an argument and creates an empty object inside the function scope, this empty object is assigned to this and in this case, String supplies all those known built in functions we mentioned before. and as soon as operation is completed, for example do uppercase operation, wrapper object is discarded.
To prove that, let's do this:
let justString = 'Hello From String Value';
justString.addNewProperty = 'Added New Property';
console.log( justString );
Here output will be undefined. Why ?
In this case Javascript creates wrapper String object, sets new property addNewProperty and discards the wrapper object immediately. this is why you get undefined. Pseudo code would be look like this:
let justString = 'Hello From String Value';
let wrapperObject = new String( justString );
wrapperObject.addNewProperty = 'Added New Property'; //Do operation and discard

adding properties to primitive data types other than Array

I'm not supposed to add elements to an array like this:
var b = [];
b.val_1 = "a";
b.val_2 = "b";
b.val_3 = "c";
I can't use native array methods and why not just an object. I'm just adding properties to the array, not elements. I suppose this makes them parallel to the length property. Though trying to reset length (b.length = "a string") gets Uncaught RangeError: Invalid array length.
In any case, I can still see the properties I've set like this:
console.log(b); //[val_1: "a", val_2: "b", val_3: "c"]
I can access it using the dot syntax:
console.log(b.val_1); //a
If an array is just an object in the same way a string or a number is an object, why can't (not that I'd want to) I attach properties to them with this syntax:
var num = 1;
num.prop_1 = "a string";
console.log(num); //1
I cannot access its properties using dot syntax
console.log(num.prp); //undefined
Why can this be done with array and not with other datatypes. For all cases, I should use {} and would only ever need to use {}, so why have arrays got this ability?
JSBIN

Because arrays are treated as Objects by the language, you can see this by typing the following line of code:
console.log(typeof []) // object
But other types like number literals, string literals NaN ... etc are primitive types and are only wrapped in their object reprsentation in certain contexts defined by the language.
If you want to add properties or methods to a number like that, then you can use the Number constructor like this:
var num = new Number(1);
num.prop_1 = "fdadsf";
console.log(num.prop_1);
Using the Number constructor returns a number object which you can see by typing the following line:
console.log(typeof num); // object
While in the first case:
var num = 1;
console.log(typeof num) // number
EDIT 2: When you invoke a method on a number literal or string literal for instance, then that primitive is wrapped into its object representation automatically by the language for the method call to take place, for example:
var num = 3;
console.log(num.toFixed(3)); // 3.000
Here num is a primitive variable, but when you call the toFixed() metohd on it, it gets wrapped to a Number object so the method call can take place.
EDIT: In the first case, you created a string like this first var str = new String(), but then you changed it to str = "asdf" and then assigned the property str.var_1 = "1234".
Of course, this won't work, because when you assigned str = "asdf", str became a primitive type and the Object instance that was originally created is now gone, and you can't add properties to primitives.
In the second, it didn't output undefined like you said, I tested it in Firebug and everything worked correctly.
EDIT 3:
String literals (denoted by double or single quotes) and strings returned from String calls in a non-constructor context (i.e., without using the new keyword) are primitive strings.
This is taken from MDN Documentation, when you use string like that var p = String(3) it becomes a conversion function and not a constructor and it returns a primitive string as you can see from the quote above.
Regarding your second comment, I didn't understand how my comment has been defied, because if you try to console.log(p.pr) you'll get undefined which proves p is a primitive type and not an object.

If an array is just an object in the same way a string or a number is an object,
An array is different than strings, numbers, booleans, null and undefined. An array IS an object, while the others in the list are primitive values. You can add properties to the array just like you would with any other object, anything different being just what makes arrays special (the length property you mentioned for example). You cannot add properties or call methods on primitive values.
In this previous answer i talked about the use of Object wrappers over primitive values. Feel free to read it to see more about Object wrappers. The following will be a very short example:
console.log('TEST'.toLowerCase()) // 'test'
While it may seem that the toLowerCase method is called on the string 'TEST', in fact the string is converted automatically to a String object and the toLowerCase method is called on the String object.
Each time a property of the string, number or boolean is called, a new Object wrapper of the apropriate type is created to get or set that value (setting properties might be optimized away entirely by the browser, i am not sure about that).
As per your example:
var num = 1; // primitive value
num.prop_1 = "a string"; // num is converted to a Number, the prop_1 property is set on the Object which is discarded immediately afterwards
console.log(num); //1 // primitive value
console.log(num.prp); // num is converted to a Number, which doesn't have the prp property
My example:
Number.prototype.myProp = "works!";
String.prototype.myFunc = function() { return 'Test ' + this.valueOf() };
Boolean.prototype.myTest = "Done";
console.log(true.myTest); // 'Done'
console.log('really works!'.myFunc()); // 'Test really works!'
var x = 3;
console.log(x.myProp.myFunc()); // 'Test works!'
console.log(3['myProp']); // 'works!'
On the other hand:
console.log(3.myProp); // SyntaxError: Unexpected token ILLEGAL
The number isn't necessarily treated differently, that syntax just confuses the parser. The following example should work:
console.log(3.0.myProp); // 'works!'

String object versus literal - modifying the prototype?

I'm wondering why it seems that adding a method to the prototype of a string literal seems to work, but adding a property does not? I was playing with ideas in relation to this question, and have the following code:
String.prototype._str_index1 = 0;
String.prototype._str_reset = function() {
this._str_index1 = 0;
};
String.prototype._str_substr = function(len) {
var ret = this.substr(this._str_index1, len);
this._str_index1 = this._str_index1 + len;
return ret;
};
var testString = new String('Loremipsumdolorsitamet,consectetur');
log(testString._str_substr(5));
log(testString._str_substr(4));

This works fine. If however I change the third-last line to:
var testString = 'Loremipsumdolorsitamet,consectetur';
...it seems that although the method _str_substr exists and is callable on the string literal, the value of the property _str_index1 is always 0.
What's up?

The string primitive is converted to a transient String object every time you try to invoke a method of the String object (the JavaScript engine internally converts a string primitive to a String object when necessary). After this function returns, the String object is (unobtrusively) converted back to a string primitive (under the hood) and this new primitive is returned (and most of the time assigned to a variable); every time a method of the String object is invoked.
So, after each invocation of testString._str_substr, _str_index1 is thrown away with the object and a new object (with a reset _str_index1) is created when _str_substr is called again.
See also MDC:
Because JavaScript automatically converts between string primitives and String objects, you can call any of the methods of the String object on a string primitive. JavaScript automatically converts the string primitive to a temporary String object, calls the method, then discards the temporary String object.

This happens because the object is created and immediately thrown away when the assignment is made, because it's a string literal.
So with the first version, an object is created and kept, so testString is an object, not a string literal. In the second case, an object is created and thrown away, so all properties get lost...
Now try replacing that line with this:
var testString = 'Loremipsumdolorsitamet,consectetur'._str_substr();
Interesting, right? It still returns a string primitive, but that could be fixed...
String.prototype._str_substr = function(len) {
var ret = this.substr(this._str_index1, len);
this._str_index1 = this._str_index1 + len;
return new String(ret);
};
Of course these are just suggestions designed to help explain why literals act differently than objects, not real-world recommendations...

Develop Reference

JavaScript is the programming language of the Web.