"Premature optimization is the root of all evil"
It's an oft-repeated mantra among people in the industry – everybody rightly presumes that slow working code is better than fast broken code.
However this is based on the dubious assumption that a given programmers code is bound to be less ridden with bugs if he codes a slow approach. And the further weak premise that code is ever fully correct.
Let's look at another industry, the automotive one – Any serious vehicle is designed with a certain performance metric from the start, and unless it meets that, it is a failure. In the software world, fast code is often seen as complex code, but that is a very naive view of optimization. A true optimization, in theory, reduces the amount of work being done, and often the amount of code. Any complexity induced is more of a syntactic nature, depending on the expressive skill of the programmer.
Compilers are damn smart these days, but then again things haven't really changed that much – Take a look at the following code that copies C style NUL terminated strings.
int j = 0;
while(src[j])
{
dst[j] = src[j];
++j;
}
Then this :
while(src[j])
{
dst[j] = src[j];
++j;
}
Then this :
char *pSrc = src, *pDst = dst;
while(*pDst++ = *pSrc++);
The second version is full 6% faster measured over a million iterations for 64KB strings, compiled with MSVC 2010, all optimizations on.
One would assume that the compiler is capable of generating the same code for both, but in the first case it reloads the value for src[j] in order to check it for NUL in the while loop. In the second version, --which generates the exact same code as an inline call to strcpy() --, one memory access is avoided.
6% is not an insignificant fraction, for a trivial optimization like this.
Of course, this is a contrived example, but the same principles hold in a number of scenarios.
Some would argue that the second version is less readable. I would counter that it indicates a lack of fluency in the language involved.
Any non-trivial piece of code requires a certain amount of programming skill, and in most cases, someone who is qualified enough to write a correct implementation is capable of writing a correct and fast version too.
If someone makes errors during optimization, that is essentially carelessness that would have happened even while writing a vanilla implementation. Sloppy coding won't be cured by using naive approaches.
Starting out with the intent that code must be fast as well as correct, one must gather good habits and make it almost reflex like to perform small optimizations and diligently work to understand fast ways of doing common tasks. You don't become a boxer by first concentrating on accuracy, then strength, then speed – You do all three at once. An awareness of what is happening behind the scenes in the generated machine code is very important; not only for speed optimization, but also for figuring out what is wrong when things go south. This applies to even non-compiled languages – Understanding how memory is organized and how data structures end up on the metal leads to good coding approaches.
Dumbing down the code initially will most probably lock it down in such a fashion that reworking it to be faster at a higher level will become impractically difficult, and doing low level optimizations later, may not be of much value.
It's the difference between a super car built ground up for performance versus a clunker with bolt on supercharging.
There is a lot of running code out there, almost all of it is buggy, but very little is efficient – Bloat abounds, since hardware cost goes down and performance goes up regularly. However, the brick wall is near, since improving serial code execution speeds has become more and more difficult. The only way out is parallelism, and that is being marketed aggressively; the only problem is that no one got the memo on the inapplicability of parallelism for most code.
You can have a zillion cores and parallelize all the possible parts of your program, but the serial path will remain, and the only choice is to go back to school and read some Michael Abrash
TANSTAFC - There ain't no such thing as the fastest code
while(*pDst++ = *pSrc++);
The second version is full 6% faster measured over a million iterations for 64KB strings, compiled with MSVC 2010, all optimizations on.
One would assume that the compiler is capable of generating the same code for both, but in the first case it reloads the value for src[j] in order to check it for NUL in the while loop. In the second version, --which generates the exact same code as an inline call to strcpy() --, one memory access is avoided.
6% is not an insignificant fraction, for a trivial optimization like this.
Of course, this is a contrived example, but the same principles hold in a number of scenarios.
Some would argue that the second version is less readable. I would counter that it indicates a lack of fluency in the language involved.
Any non-trivial piece of code requires a certain amount of programming skill, and in most cases, someone who is qualified enough to write a correct implementation is capable of writing a correct and fast version too.
If someone makes errors during optimization, that is essentially carelessness that would have happened even while writing a vanilla implementation. Sloppy coding won't be cured by using naive approaches.
Starting out with the intent that code must be fast as well as correct, one must gather good habits and make it almost reflex like to perform small optimizations and diligently work to understand fast ways of doing common tasks. You don't become a boxer by first concentrating on accuracy, then strength, then speed – You do all three at once. An awareness of what is happening behind the scenes in the generated machine code is very important; not only for speed optimization, but also for figuring out what is wrong when things go south. This applies to even non-compiled languages – Understanding how memory is organized and how data structures end up on the metal leads to good coding approaches.
Dumbing down the code initially will most probably lock it down in such a fashion that reworking it to be faster at a higher level will become impractically difficult, and doing low level optimizations later, may not be of much value.
It's the difference between a super car built ground up for performance versus a clunker with bolt on supercharging.
There is a lot of running code out there, almost all of it is buggy, but very little is efficient – Bloat abounds, since hardware cost goes down and performance goes up regularly. However, the brick wall is near, since improving serial code execution speeds has become more and more difficult. The only way out is parallelism, and that is being marketed aggressively; the only problem is that no one got the memo on the inapplicability of parallelism for most code.
You can have a zillion cores and parallelize all the possible parts of your program, but the serial path will remain, and the only choice is to go back to school and read some Michael Abrash
TANSTAFC - There ain't no such thing as the fastest code
No comments:
Post a Comment