Boolean vs Int in Javascript - javascript

I always assumed that booleans were more efficient than ints at storing an on/off value - considering that's their reason for existence. I recently decided to check if this is true with the help of jsperf, and it came up with some contrary results!
http://jsperf.com/bool-vs-int
Here is the first test I tried. Toggling the value of the on/off switch. On Chrome it's significantly faster to do this using 1/0, but on firefox it's slightly faster to do this using bool. Interesting.
http://jsperf.com/bool-vs-int-2
And here's the second test I tried. Using them in a conditional. This appears to have significant advantage for ints as opposed to bools, up to 70% faster to use 1/0 instead of booleans - on both firefox and chrome. Wtf?
I guess my question is, am I doing something wrong? Why are ints so much better at boolean's job? Is the only value of using bools clarity, or am I missing something important?

Disclaimer, I can only speak for Firefox, but I guess Chrome is similar.
First example (http://jsperf.com/bool-vs-int):
The Not operation
JägerMonkey (Spidmonkey's JavaScript methodjit) inlines the check for boolean first and then just xors, which is really fast (We don't know the type of a/b, so we need to check the type).
The second check is for int, so if a/b would be a int this would be a little bit slower.
Code
The Subtract operation.
We again don't know the type of c/d. And again you are lucky we are going to assume ints and inline that first. But because in JavaScript number operations are specified to be IEEE 754 doubles, we need to check for overflow. So the only difference is "sub" and a "conditional jump" on overflow vs. plain xor in case 1.
Code
Second example:
(I am not 100% sure about these, because I never really looked at this code before)
and 3. The If.
We inline a check for boolean, all other cases end up calling a function converting the value to a boolean.
Code
The Compare and If.
This one is a really complex case from the implementation point of view, because it was really important to optimize equality operations. So I think I found the right code, that seems to suggest we first check for double and then for integers.
And because we know that the result of a compare is always a boolean, we can optimize the if statement.
Code
Followup I dumped the generated machine code, so if you are still interested, here you go.
Overall this is just a piece in a bigger picture. If we knew what kind of type the variables had and knew that the subtraction won't overflow then we could make all these cases about equally fast.
These efforts are being made with IonMonkey or v8's Crankshaft. This means you should avoid optimizing based of this information, because:
it's already pretty fast
the engine developers take care of optimizing it for you
it will be even faster in the future.

your test was a bit off due to the definition of "function" and "var" and the call for the function. The cost to define function and variables and calling them will differ from engine to engine. I modified your tests, try to re-run with your browsers (note that IE was off because the first run was weird but consecutive runs were as expected where bool is fastest): http://jsperf.com/bool-vs-int-2/4

I don't know but in the second test it does
if(a) bluh();
vs
if(c == 1) bluh();
maybe c==1 is faster because you're comparing a value with one with the same type
but if you do if(a) then js need to check if the value evaluates to true, not just if it is true...
That could be the reason...
Maybe we need to test
if(c==1)
vs
if(a===true) with three =

For me the choice would be based on API usage. Always return that which is most useful. If I use secondary code, I'd favor methods that return booleans. This probably makes the code ready to be chained. The alternative is to provide overloaded methods.

Diggin' deep here. Regarding performance, I'm still unsure (this is why I found this thread) if booleans vs 0/1 is faster when computing and it still seems heavily browser-dependent. But take into account, that in extremely huge datasets the data has to be downloaded by the user first anyway:
"true" and "false" obv take up 4 or 5 characters respectively, whereas 0 and 1 are only 1 character. So it might save you a little bit of bandwidth at least, so less time to load and only after that is it up to the client's browser and hardware how to deal with those types, which seems pretty much negligible.
As a little bonus and to actually contribute something, since (I think?) no one mentioned it here, if you are going with the 0 and 1 approach, instead of using if-statements you can use bitwise operations to toggle between them, which should be pretty fast:
x=0;
x^=1; // 1
x^=1; // 0
This is the equivalence to using this toggle for booleans:
x=false;
x=!x; // true
x=!x; // false

Related

How to test an MD5 implementation?

I am considering using a JS MD5 implementation.
But I noticed that there are only a few tests. Is there a good way of verifying that implementation is correct?
I know I can try it with a few different values and see if it works, but that only means it is correct for some inputs. I would like to see if it is correct for all inputs.
The corresponding RFC has a good description of the algorithm, an example implementation in C, and a handful of test values at the end. All three together let you make a good guess about the quality of the examined implementation and that's all you can get: a good guess.
Testing an applications with an infinite or at least a very large input set as a black box is hard, impossible even in most cases. So you have to check if the code implements the algorithm correctly. The algorithm is described in RFC-3121 (linked to above). This description is sufficient for an implementation. The algorithm itself is well known (in the scientific sense, i.e.: many papers have been written about it and many flaws have been found) and simple enough to skip the formal part, just inspect the implementation.
Problems to expect with MD5 in JavaScript: input of one or more zero bytes (you can check the one and two bytes long inputs thoroughly), endianess (should be no problem but easy to check) and the problem of the unsigned integer used for bit-manipulation in JavaScript (">>" vs. ">>>" but also easy to check for). I would also test with a handful of data with all bits set.
The algorithm needs padding, too, you can check it with all possible input of length shorter than the limit.
Oh, and for all of you dismissing the MD5-hash: it still has its uses as a fast non-cryptographic hash with a low collision-rate and a good mixing (some call the effect of the mixing "avalanche", one bit change in the input changes many bits in the output). I still use it for larger, non-cryptographic Bloom-filters. Yes, one should use a special hash fitting the expected input but constructing such a hash function is a pain in the part of the body Nature gave us to sit on.

can my code determine if it needs a big number?

I am writing a method that must return a numeric value which is a result of an arithmetic operation applied to two input numbers.
If the operation results in an overflow then I need to use an existing big number implementation (specifically, https://github.com/MikeMcl/decimal.js/), if not then I need to return a built in Javascript Number.
Is it possible, in code, to determine that I have an overflow and I need a big number?
I think you will find it simpler to use the Big Number library from the start and then once you have the sum in the Big Number format, you can test it to see if it is small enough to fit in a regular Javascript Number and, if so, convert it to that and return that.
While this approach is slightly inefficient (involves extra conversions in some cases), it prevents you from having to predict whether the result of a math operation you haven't yet done is too big which can be kind of difficult to do and get right. Just using Big Number for the math operation guarantees that the math operation is correct and then lets you just see how big the result is and act accordingly.
If you were only doing addition or subtraction of two values, you could probably write a predictive function, but I'd be surprised if the extra effort to get this right was actually worth whatever savings there really was. Once you're doing a more complex math operation (multiple operands or multiplication or division), then you're going to need to re-implement part of the math operation in order to predict the size of the result.
For reference, my hierarchy of priorities in writing software is:
Correctness
Robustness (ability to deal with edge cases, unexpected input and any error cases)
Clarity and Maintainability of the code
Extensibility and Reusability
Performance (only when the performance is actually relevant)
Compactness
I will sacrifice some aspects of 3, 4 to improve performance, but only when I've proven that improving the performance of this particular piece of code is important to the goal of the project and only after measuring that the performance of this particular piece of code is actually the issue worth spending time on. I will never sacrifice 1 or 2 to improve performance. In your particular case, I'd look long and hard at whether the performance impact of using Big Number to do the math operation is really a problem worth sacrificing a number of other priorities for.
You could do it like this:
function multiply(a, b) {
var res = a * b;
if (isFinite(res)) {
return res;
}
return new Decimal(a).times(b);
}
multiply(2, 3); // returns number 6
multiply(2e200, 3e200); // returns BigNumber 6e+400
However, I think it may be a better idea to always return the same type of output (i.e. BigNumber), independent of the input. Right now when you use this function you always have to check what the returned result is (number or BigNumber), and act accordingly.

Method vs Basic JS? Should I use toString? parseInt? jQuery?

This is a question I've been wondering about ever since I found the toString() function, but have never bothered to ask. Should I use basic JS or the function that does the same thing?
Now, don't get me wrong, I realize toString has its redeeming qualities, like converting a function to a string.
var message = function() {
// multi
// line
// string
}.toString();
But admit it: we mainly use toString for converting numbers to strings. Couldn't we just do this instead?
var myNumber = 1234;
var message = ''+myNumber;
Not only is this shorter, but according to JSPerf the toString method is 97% slower! (Proof: http://jsperf.com/tostring-vs-basic-js ) And as I said, I know toString is useful, but when people raise question about types of Javascript variables, toString() usually comes up. And this is, like, basic Javascript. Every browser can do quotes.
Same goes for parseInt. Now, before I discovered parseInt, I discovered that multiplying a string by one would convert it to a number. That's because you cannot multiply a string, naturally, forcing Javascript to treat it as a number.
var message = "4321";
var myNumber = message*1;
Now, interestingly, this is slower than parseInt, but not by much. I also noticed that an empty string, or one without numbers, will return 0, whereas parseInt will return NaN because there are no numbers in the string. Once again, I realize parseInt is faster and can convert to different bases. However, multiplying is shorter, will work in any browser, and parseInt, remember, will only return integers. So why does it always come up as the answer to questions, asking how to convert to numbers/what NaN is?
Now this might be going a little bit off topic, but I actually wonder a similar thing about jQuery. Once again, jQuery is something I've never really understood the use for. Javascript code is clean and jQuery is in and of itself a JS file, so it cannot do anything Javascript can't do. It may simplify certain functions and stuff, but why not just copy those functions to your page then and leave out the remaining functions you don't use? It seems overkill to include jQuery merely to complete one simple task. And animation isn't excused either here - because that too can be done with native Javascript. So why jQuery?
Ultimately what I'm asking is, why do we need these things for these purposes when there are better methods? Or are they better methods? Is using functions a better practive just in general?
Not only is this shorter, but according to JSPerf the toString method is 97% slower!
Unless you're calling .toString() on hundreds of millions of numbers every second and you've found that this is a bottleneck in your application through profiling, this should not be a factor at all.
But admit it: we mainly use toString for converting numbers to strings
As you've seen, this can be done implicitly by just adding a string and a number together, so I fail to see any benefit of using '' + n in place of n.toString(). The latter is more readable when you're not actually concatenating n with a string.
However, multiplying is shorter, will work in any browser, and parseInt, remember, will only return integers.
Are you saying that parseInt doesn't work in every browser? If you want to parse something as an integer, use parseInt. If you want to parse something as a float (JavaScript doesn't actually have a special type for either, all numbers are floats), use parseFloat.
The more common pattern is using +'123', which has the exact same behavior as 1 * '123'. parseInt handles empty strings properly, but for whatever reason does not validate strings as you'd expect. The unary plus returns NaN in case of an error, but treats whitespace and empty strings incorrectly. It's one of JavaScript's shortcomings, so there's really no concrete choice between the two if you're working in base 10.
So why does it always come up as the answer to questions, asking how to convert to numbers/what NaN is?
Because the spec included these functions to convert strings into numbers and converting strings into numbers using binary operators like you're doing is a side effect, not the primary purpose. Also you can parse integers in different bases using parseInt, which isn't possible with type coercion.
It may simplify certain functions and stuff, but why not just copy those functions to your page then and leave out the remaining functions you don't use?
If you load jQuery from a CDN, then there's a really good chance that a user's browser has already downloaded it and has it cached, making download times and bloat almost nonexistent. If you make a "custom build", I'd bet that it'll make the site slower on first load.
And animation isn't excused either here - because that too can be done with native Javascript.
So can everything. There's no point in reinventing the wheel every time you write something.

tunable diff algorithm

I'm interested in finding a more-sophisticated-than-typical algorithm for finding differences between strings, that can be "tuned" via some parameters, to balance between such things as "maximize count of identical characters" vs. "maximize the length of spans" vs. "try to keep whole words intact".
Ultimately, I want to be able to make the results as human readable as possible. For instance, if a long sentence has been replaced with an entirely new sentence, where the only things it has in common with the original are the words "the" "and" and "a" in that order, I might want it treated as if the whole sentence is changed, rather than just that 4 particular spans are changed --- just like how a reasonable person would see it.
Does such a thing exist? Although I'm working in javascript/node.js, an algorithm in any language would be helpful.
I'm actually ok with something that uses Monte Carlo methods or the like, if its results are better. Computation time is not an issue (within reason), nor is determinism.
Note: although this is beyond the scope of what I'm asking, I'll throw one more thing out there just in case: It would also be great if it could recognize changes that are out of order....for instance if someone changes the order of two paragraphs while leaving them otherwise identical, it would be awesome if it recognized it as a simple move, rather than as one subtraction and and one unrelated addition.
I've had good luck with diff_match_patch. There are some good options for tuning it for readability.
Try http://prettydiff.com/ Its code is already formatted for compatibility with CommonJS, which is the framework Node uses.

what's more efficient? checking == or just mutating the variable?

Imagine I had a variable called X.
Let's say every 5 seconds I wanted to make X = true. (it could be either true or false in between these 5 seconds, but gets reset to true when the 5 seconds are up).
Would it be more efficient to check if the value is already true, then if not, reassign it to true? Or just have X = true?
in other words, which would run faster?
if(x==false){
x = true;
}
vs
x = true;
On one hand, the first program won't mutate the variable if it doesn't have to. On the other hand, the second program doesn't need to check what X is equal to; it dives straight in.
It nearly always doesn't matter. Write the code that is easiest to understand and maintain. Only optimize it if necessary.
The best way to be sure is to test it. Profile your code.
Which is faster might depend on the browser.
Which is faster depends on whether the variable is usually true or usually false.
Having said that, I'd guess in most scenarios setting a variable without testing it will be faster.
Really depends on your data :)
If x == false 90% of the time, then a straight assignment to x would be faster.
This is one of those places where you probably don't want to worry about efficiency, and if you really do, profile it ..
Disclaimer/Warning:
This is a micro-optimization, and will never affect the efficiency of your program in a way that is measurable by users. If you turn off all compiler optimizations, and run an excellent profiler, you may be able to quantify the effects - but no user will ever notice.
This is especially true for your situation, where the code in question is only run every few seconds. The time spent profiling would probably be better spent improving other parts of your application.
Also, in these situations readability should always prevail over non-bottleneck micro-optimizations (although my answer below takes only runtime efficiency into account, as requested). Therefore my recommended code for you to use in this situation is x=true, since it's the easiest to read and understand.
Finally, if adding the check will improve speed, the compiler probably already knows that and will do it for you, so you can't go wrong with x=true (that's why you should turn off optimizations before running the profiler).
Answer:
The only true way to figure this out is by profiling. You may find that the 0 test (x==false) basically takes no time at all, and therefore it is worth including due to the time it saves when x turns out to be true. Or you may find that the test takes long enough that it wastes too much time when x turns out to be false.
My guess is that the test is unecessary. That's because 0-testing and other bitwise operations (and, or, etc) are all so fast that I usually treat them as taking the same elementary amount of time. And if 0-testing takes the same amount of time as an OR operation (setting to true), then the 0-test is a redundant waste of time. Profiling could prove me wrong of course, and my guess is based on loose assumptions about bitwise operations, so if you choose to run a profiler and figure this out I'd definitely be interested in the results.
The efficiency your are trying to attain by this is minute compared the efficiency attained by the quality of your overall design.

Categories

Resources