In a team project we need to remove the first element of an array, thus I called Array.prototype.shift(). Now a guy saw Why is pop faster than shift?, thus suggested to first reverse the array, pop, and reverse again, with Array.prototype.pop() and Array.prototype.reverse().
Intiutively this will be slower (?), since my approach takes O(n) I think, while the other needs O(n), plus O(n). Of course in asymptotic notation this will be the same. However notice the verb I used, think!
Of course I could write some, use jsPerf and benchmark, but this takes time (in contrast with deciding via the time complexity signs (e.g. a O(n3) vs O(n) algorithm).
However, convincing someone when using my opinion is much harder than pointing him to the Standard (if it refered to complexity).
So how to find the Time Complexity of these methods?
For example in C++ std::reverse() clearly states:
Complexity
Linear in half the distance between first and last: Swaps elements.
how to find the Time Complexity of these methods?
You cannot find them in the Standard.
ECMAScript is a Standard for scripting languages. JavaScript is such a language.
The ECMA specification does not specify a bounding complexity. Every JavaScript engine is free to implement its own functionality, as long as it is compatible with the Standard.
As a result, you have to benchmark with jsPerf, or even look at the souce code of a specific JavaScript Engine, if you would like.
Or, as robertklep's comment mentioned:
"The Standard doesn't mandate how these methods should be implemented. Also, JS is an interpreted language, so things like JIT and GC may start playing a role depending on array sizes and how often the code is called. In other words: benchmarking is probably your only option to get an idea on how different JS engines perform".
There are further evidence for this claim ([0], [1], [2]).
An undisputable method is to do a comparative benchmark (provided you do it correctly and the other party is not bad faith). Don't care about theoretical complexities.
Otherwise you will spend a hard time convincing someone who doesn't see the obvious: a shift moves every element once, while two reversals move it twice.
By the way, a shift is optimal, as every element has to move at least once. And if properly implemented as a memmov, it is very fast (while a single reversal cannot be as fast).
Related
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
It seems that the phrase "Premature Optimization" is the buzz-word of the day. For some reason, iphone programmers in particular seem to think of avoiding premature optimization as a pro-active goal, rather than the natural result of simply avoiding distraction. The problem is, the term is beginning to be applied more and more to cases that are completely inappropriate.
For example, I've seen a growing number of people say not to worry about the complexity of an algorithm, because that's premature optimization (eg Help sorting an NSArray across two properties (with NSSortDescriptor?)). Frankly, I think this is just laziness, and appalling to disciplined computer science.
But it has occurred to me that maybe considering the complexity and performance of algorithms is going the way of assembly loop unrolling, and other optimization techniques that are now considered unnecessary.
What do you think? Are we at the point now where deciding between an O(n^n) and O(n!) complexity algorithm is irrelevant? What about O(n) vs O(n*n)?
What do you consider "premature optimization"? What practical rules do you use to consciously or unconsciously avoid it?
EDIT
I know my description is a bit general, but I'm interested in specific, practical rules or best practices people use to avoid "pre-mature optimization", particularly on the iphone platform.
Answering this requires you to first answer the question of "what is pre-mature optimization?". Since that definition clearly varies so greatly, any meaningful answer requires the author to define the term. That's why I don't really think this is a CW question. Again, if people disagree, I'll change it.
What is premature optimization?
Premature optimization is the process of optimizing your code (usually for performance) before you know whether or not it is worthwhile to do so. An example of premature optimization is optimizing the code before you have profiled it to find out where the performance bottleneck is. An even more extreme example of premature optimization is optimizing before you have run your program and established that it is running too slowly.
Are we at the point now where deciding between an O(n^n) and O(n!) complexity algorithm is irrelevant? What about O(n) vs O(n*n)?
It depends on the size of n and how often your code will get called.
If n is always less than 5 then the asymptotic performance is irrelevant. In this case the size of the constants will matter more. A simple O(n * n) algorithm could beat a more complicated O(n log n) algorithm for small n. Or the measurable difference could be so small that it doesn't matter.
I still think that there are too many people that spend time optimizing the 90% of code that doesn't matter instead of the 10% that does. No-one cares if some code takes 10ms instead of 1ms if that code is hardly ever called. There are times when just doing something simple that works and moving on is a good choice, even though you know that the algorithmic complexity is not optimal.
Every hour you spend optimizing rarely called code is one hour less that you can spend on adding features people actually want.
My vote goes for most people optimize what they think is the weak point, but they don't profile.
So regardless of how well you know algorithms and regardless of how well you've written your code, you don't know what else is happening outside your module. What do the APIs you've called do behind the scenes? Can you always gaurantee that the particular order of ops is the fastest?
This is what is meant by Premature Optimization. Anything that you think is an optimization that has not been rigorously tested by way of a profiler or other definitive tool (clock cycles by ops is not a bad thing, but it only tells you performance characteristics ~ actual calls is more important than timing, usually), is a premature optimization.
#k_b says it well above me, and it's what I say too. Make it right, make it simple, then profile, then tweak. Repeat as necessary.
Order of priority: 1. It has to work
2. It has to be maintainable
3. It has to be machine-efficient
That was from the first week of my first programming course. In 1982.
"Premature optimization" is any time Priority 3 has been considered before Priority 1 or 2.
Note that modern programming techniques (abstractions and interfaces) are designed to make this prioritization easier.
The one gray area: during the initial design, you do have to check that your solutions are not inherently cripplingly slow. Otherwise, don't worry about performance until you at least have some working code.
For some people, optimization is part of the fun of writing code, premature or not. I like to optimize, and restrain myself for the sake of legibility. The advice not to optimize so much is for the people that like to optimize.
iphone programmers in particular seem
to think of avoiding premature
optimization as a pro-active goal
The majority of iPhone code is UI related. There is not much need to optimize. There is a need not to choose a poor design that will result in bad performance, but once you start coding up a good design there is little need for optimization. So in that context, avoiding optimization is a reasonable goal.
What do you consider "premature
optimization"? What practical rules do
you use to consciously or
unconsciously avoid it?
Using the Agile approach (rapid iterations with refinement of requirements through interactions with users) is helpful as the awareness that the current interface is probably going to change drastically after the next session with the users makes it easier to focus on developing the essential features of the application rather than the performance.
If not, a few iterations where you spent a lot of time optimizing a feature that was entirely discarded after the session with the user should give you the message.
Algorithm complexity, and even choice, is an issue that should be hidden behind an interface. For example, a List is an abstraction that can be implemented various ways with different efficiencies for different situations.
Sometimes avoiding premature optimization can aid design, because if you design with the idea that you will need to optimize later, then you are more inclined to develop at the abstract level (e.g. list) rather than the iimplementation (e.g. Array or linked list) level.
This can result in simpler, and more readable code, in addition to avoiding distraction. If programmed to the interface, different implementations can be swapped in later to optmize. Prematurely optimizing leads to the risk that implementation details may be prematurely exposed and coupled with other software components that should not see these details.
What practical rules do you use to
consciously or unconsciously avoid it?
One way to avoid unnecessary optimization is to consider the relative cost benefit:
A) Cost for programmer to optimize code + cost to test said optimization + cost of maintaining more complex code resulting from said optimization
vs.
B) Cost to upgrade server on which software runs or simply buy another one (if scalable)
If A >> B consider whether it's the right thing to do. [Ignoring for the moment the environmental cost of B which may or may not be a concern to your organization]
This applies more generally than just to premature optimization but it can help instill in your developers a sense that spending their time doing optimization work is a cost and should not be undertaken unless there is a real measurable difference in something that actually matters like: number of servers required or customer satisfaction through improved response times.
If management can't see the benefit in reduced $ and customers can't see the benefit in better response times, ask yourself why you are doing it.
I think this is a question of common sense. There's a need to understand the big picture, or even what's happening under the hood, to be able to consider when a so-called "premature" move is justified.
I've worked with solutions where web service calls were necessary to calculate new values based on the contents of a local list. The way this was implemented was by making a web request per value. It wouldn't be premature optimization to send several values at a time. The same thing goes for the use of database transactions for multiple operations, i.e. multiple inserts.
When it comes to algorithms, initially the most important thing is to get it right and as simple as possible. Worrying about stack vs heap issues at that point would be madness.
I recently learned about the rolling hash data structure, and basically one of its prime uses to searching for a substring within a string. Here are some advantages that I noticed:
Comparing two strings can be expensive so this should be avoided if possible
Hashing the strings and comparing the hashes is generally much faster than comparing strings, however rehashing the new substring each time traditionally takes linear time
A rolling hash is able to rehash the new substring in constant time, making it much quicker and more efficient for this task
I went ahead and implemented a rolling hash in JavaScript and began to analyze the speed between a rolling hash, traditional rehashing, and just comparing the substrings against each other.
In my findings, the larger the substring, the longer it took for the traditional rehashing approach to run (as expected) where the rolling hash ran incredibly fast (as expected). However, comparing the substrings together ran much faster than the rolling hash. How could this be?
For the sake of perspective, let's say the running times for the functions searching through a ~2.4 million character string for a 100 character substring were the following:
Rolling Hash - 0.809 seconds
Traditional Rehashing - 71.009 seconds
Just comparing the strings (no hashing) 0.089 seconds
How could the string comparing be so much faster than the rolling hash? Could it just have something to do with JavaScript in particular? Strings are a primitive type in JavaScript; would this cause string comparisons to run in constant time?
My main confusion is as to how/why string comparisons are so fast in JavaScript, when I was under the impression that they were supposed to be relatively slow.
Note:
By string comparisons I'm referring to something like stringA === stringB
Note:
I asked this question over on the Computer Science Community and was informed that I should ask it here as well because this is most likely JavaScript specific.
After some testing and analysis, I've come to the conclusion that there were a few reasons as to why my rolling hash approach was running slightly slower than simply comparing the two strings.
If the rolling hash claims to run in constant time, how can it be slower than comparing strings?
Functions are relatively slow - calling a function is slightly slower than simply executing code inline. In my particular case, a
function had to be called on my object every time the rolling hash
rehashes its internal window, therefore taking slightly longer to run
compared to the string comparison, since that code was simply inline. Especially since my benchmark has the rolling hash "shift" over 2 million iterations, this function slow down can be seen more clearly.
But why is the string comparison so fast?
Strings are primitive - Basically, because strings are a primitive type in JavaScript, the attempting to compare two strings will most
likely invoke some routine that is coded directly within the
interpreter. This low level evaluation can be done as fast as the
architecture possibly can (similar to comparing numbers).
In Conclusion
Comparing strings in JavaScript will end up being faster than a rolling hash in this scenario because the strings are primitive, therefore allowing the interpreter to work with these elements very quickly, and because simply calling functions will create a slight overhead and slow down the process on a very small scale.
When using the new Array(size) ctor, if size is not a constant, JS seems to create a sparse array in some browsers (at least in Chrome), causing access to be much slower than when using the default ctor, as shown here.
That is exactly the opposite of what I want: I pre-allocate an array of given size to avoid dynamic re-allocation and thereby improving performance. Is there any way to achieve that goal?
Please note that this question is not about the ambiguity of the new Array(size) ctor. I posted a recommendation on that here.
100000 is 1 past the pre-allocation threshold, 99999 still pre-allocates and as you can see it's much faster
http://jsperf.com/big-array-initialize/5
Pre-allocated vs. dynamically grown is only part of the story. For
preallocated arrays, 100,000 just so happens to be the threshold where
V8 decides to give you a slow (a.k.a. "dictionary mode") array.
Also, growing arrays on demand doesn't allocate a new array every time
an element is added. Instead, the backing store is grown in chunks
(currently it's grown by roughly 50% each time it needs to be grown,
but that's a heuristic that might change over time).
You can find more information ..here.Thanks ..:)
Enumerating the keys of javascript objects replays the keys in the order of insertion:
> for (key in {'z':1,'a':1,'b'}) { console.log(key); }
z
a
b
This is not part of the standard, but is widely implemented (as discussed here):
ECMA-262 does not specify enumeration order. The de facto standard is to match
insertion order, which V8 also does, but with one exception:
V8 gives no guarantees on the enumeration order for array indices (i.e., a property
name that can be parsed as a 32-bit unsigned integer).
Is it acceptable practice to rely on this behavior when constructing Node.js libraries?
Absolutely not! It's not a matter of style so much as a matter of correctness.
If you depend on this "de facto" standard your code might fail on an ECMA-262 5th Ed. compliant interpreter because that spec does not specify the enumeration order. Moreover, the V8 engine might change its behavior in the future, say in the interest of performance, e.g.
Definitely do not rely on the order of the keys. If the standard doesn't specify an order, then implementations are free to do as they please. Hash tables often underlie objects like these, and you have no way of knowing when one might be used. Javascript has many implementations, and they are all competing to be the fastest. Key order will vary between implementations, if not now, then in the future.
No. Rely on the ECMAScript standard, or you'll have to argue with the developers about whether a "de facto standard" exists like the people on that bug.
It's not advised to rely on it naively.
You should also do your best to stick to the spec/standard.
However there are often cases where the spec or standard limits what you can do. I'm not sure in programming I've encountered many implementations that deviate or extend the specification often for reasons such as the specification doesn't cater to everything.
Sometime people using specifics of an implementation might have test cases for that, though it's hard to make a reliable test case for beys being in order. It most succeed by accident or rather it's difficult behavior to reliably produce.
If you do rely on an implementation specific then you must document that. If your project requires portability (code to run on other people's setups out of your control and you want maximum compatibility) then in this case it's not a good idea to rely on an implementation specific such as key order.
Where you do have full control of the implementation being used then it's entirely up to you which implementation specifics you use while keeping in mind you may be forced to cater to portability due to the common need or desire to upgrade implementation.
The best form of documentation for cases like this is inline, in the code itself, often with the intention of at least making it easy to identify areas to be changed should you switch from an implementation guaranteeing order to one not doing so.
You can make up the format you like but it can be something like...
/** #portability: insertion_ordered_keys */
for(let key in object) console.log();
You might even wrap such cases up in code:
forEachKeyInOrderOfInsertion(object, console.log)
Again, likely something less overly verbose but enough to identify cases dependent on that.
For where your implementation guarantees key order you're just trans late that to the same as the original for.
You can use a JS function for that with platform detection, templating like CPP, transpiling, etc. You might also want to wrap the object creation and to be very careful about things crossing boundaries. If something loses order before reaching you (like JSON decode of input from a client over the network) then you'll likely not have a solution to that solely withing your library, this can even be just if someone else is calling your library.
Though you'll likely not need those, just make cases where you do something that might break later as a minimum and document that potential exists.
An obvious exception to that is if the implementation guarantees consistency. In that case you will probably be wasting your time decorating everything if it's not really a variability and is already documented via the implementation. The implementation often is a spec or has its own, you can choose to stick to that rather than a more generalised spec.
Ultimately in each case you'll need to make a judgement call, you may also choose to take a chance. As long as you're fully aware of the potential problems including the potential of wasting time avoiding problems you wont necessarily actually have, that is you know all the stakes and have considered your circumstances, then it's up to you what to do. There's no "should" or "shouldn't", it's case specific.
If you're making node.js public libraries or libraries to be widely distributed beyond the scope of your control then I'd say it's not good to rely on implementation specifics. Instead at least have a disclaimer with the release notes that the library is only catering to your stack and that if people want to use it for others then can fix and put in a pull request. Otherwise if not documented, it should be fixed.
I'm interested in finding a more-sophisticated-than-typical algorithm for finding differences between strings, that can be "tuned" via some parameters, to balance between such things as "maximize count of identical characters" vs. "maximize the length of spans" vs. "try to keep whole words intact".
Ultimately, I want to be able to make the results as human readable as possible. For instance, if a long sentence has been replaced with an entirely new sentence, where the only things it has in common with the original are the words "the" "and" and "a" in that order, I might want it treated as if the whole sentence is changed, rather than just that 4 particular spans are changed --- just like how a reasonable person would see it.
Does such a thing exist? Although I'm working in javascript/node.js, an algorithm in any language would be helpful.
I'm actually ok with something that uses Monte Carlo methods or the like, if its results are better. Computation time is not an issue (within reason), nor is determinism.
Note: although this is beyond the scope of what I'm asking, I'll throw one more thing out there just in case: It would also be great if it could recognize changes that are out of order....for instance if someone changes the order of two paragraphs while leaving them otherwise identical, it would be awesome if it recognized it as a simple move, rather than as one subtraction and and one unrelated addition.
I've had good luck with diff_match_patch. There are some good options for tuning it for readability.
Try http://prettydiff.com/ Its code is already formatted for compatibility with CommonJS, which is the framework Node uses.