Modern C++
I've been primarily a C++ programmer, even though I have dabbled with Assembler, C, Object Pascal, Java, C#, and Javascript (in that order, chronologically). Over the years, I learned to adapt my style as much as possible from the "C with classes" style to "C++ with STL" to "Modern C++".
There's a lot of awesome material online from
Andrei Alexandrescu, Herb Sutter, Bjarne Stroustrup and Scott Meyers, that mostly popular since the C++0x standard came into the light and after looking at those, my understanding of the C++ toolbox improved quite a bit.
One of the oft repeated tenets in the modern C++ community is that well written generic code using type-rich templates will produce faster code than C style if used properly. I always took this on faith, since the people who claim this (like the above 4 gentlemen) have forgotten more C++ than I ever learned.
Now that I started work on
BABEL, I decided to test this out and see with my own eyes, if modern C++ with abstract,type-rich idioms can beat plain old C style code, while remaining extremely high level and readable.
A numerical sequence class
I often deal with problems that involve the use of intervals or ranges... I wanted to create an abstraction that lets me deal with these in a simple manner.
Thus I want a type that holds a range like
[begin, end) and define operator overloads and other member functions for slicing, disjunction and so on. One useful feature would be to have it support an iterator interface, so that it plugs into STL numeric algorithms easily.
Here's the basic public interface I ended up with :
template<typename t> struct TNumSeq
{
public:
friend inline bool operator<(const TNumSeq &range, const T &val);
friend inline bool operator>(const TNumSeq &range, const T &val);
friend inline TNumSeq operator&(const TNumSeq &seq1, const TNumSeq &seq2);
friend inline TNumSeq operator&(const TNumSeq &seq, T val);
friend inline TNumSeq operator&(T val, const TNumSeq &seq);
inline TNumSeq();
inline TNumSeq(const T &b, const T &e);
inline explicit TNumSeq(const T &b);
inline TNumSeq &operator=(const TNumSeq &that);
inline TNumSeq& operator&=(const TNumSeq &that);
inline TNumSeq operator&(const TNumSeq &that);
inline bool has(const T& val) const;
inline operator bool() const;
class const_iterator : public std::iterator<std::forward_iterator_tag, T>
{
public:
const_iterator(const TNumSeq &seq, const T &inc, const T &val);
inline T operator*() const;
inline const_iterator operator++(int);
inline const_iterator &operator++();
inline bool operator==(const const_iterator &that) const;
inline bool operator!=(const const_iterator &that) const;
};
const_iterator begin(T inc = 1) const;
const_iterator end(T inc = 1) const;
};
It's quite simple, you can define a numeric range, use the
< and
> operators to check if a value or another range is outside the range on either side, you can use
& to get the part of a range that overlaps with another.
Empty ranges work fine, implicit bool conversion
(Danger! Will Robinson) checks for emptiness. The
has() member gives you membership test. And there is a forward
const_iterator that lets this class masquerade as if the sequence were a real set of stored integers. The
begin() and
end() functions take a stepping increment for iteration by any "jump".
So to test this out... I tested the following code :
int i = 0, j = 1 << 30, sum = 0;
for(int n = i; n < j; n ++)
{
sum += n;
}
versus
int i = 0, j = 1 << 30;
TNumSeq<int> s(i, j);
int sum = std::accumulate(s.begin(), s.end(), 0);
Ignore the fact that the default int will overflow the summation, for the range
0 to
2^30, that's immaterial for this test. One (and certainly, yours truly) would imagine that by no stretch of imagination, the generated code for the second example would be faster that the bare bones loop in the first - Look at all that is happening in the
accumulate() call :
- Two method invocations to get the iterators, which are at least 12 byte objects, have a constructor, and passed by value to accumulate, thus copying them twice each (apparently).
- A loop inside, which calls the comparison,increment and dereference operators on the iterator object, both of whose implementations contain member access.
Seems like an awfully complicated code path for a very trivial task - but when I timed this I found that the two versions run at the same speed. Go figure!
Here is the exact assembly code generated by the compiler for the two cases :
009A18E0 add ecx,eax
009A18E2 inc eax
009A18E3 cmp eax,40000000h
009A18E8 jl testNumSeq+60h (9A18E0h)
versus
009A1BE0 add ecx,eax
009A1BE2 add eax,esi
009A1BE4 cmp eax,40000000h
009A1BE9 jne testNumSeq+360h (9A1BE0h)
And for increment by an arbitrary number :
00BA18E4 add ecx,eax
00BA18E6 add eax,2
00BA18E9 cmp eax,40000000h
00BA18EE jl testNumSeq+64h (0BA18E4h)
versus
00BA1BF4 add ecx,eax
00BA1BF6 add eax,edx
00BA1BF8 cmp eax,40000000h
00BA1BFD jne testNumSeq+374h (0BA1BF4h)
No difference except the fact that the for loop uses a constant as the increment, whereas the TNumSeq uses a variable on a register as the increment. How the hell does this happen? It seems like magic, the way that the compiler took away all that 50 odd lines of code in my class and came up with the succinct loop above. The answer is that it is due to being able to express intention more clearly to the compiler. There are many little things that happen, I will detail a few...
- RVO and move semantics - The iterator pass by value copying is elided by RVO (return value optimization). With C++11, the move semantics feature takes care of most cases of this at the language level, and whenever an object is moved, it actually reuses the storage and state, rather than copy to a new place and destroy the old one. That means, you can treat any old object, even if it is large as if it were like an int and pass it around by value with impunity, knowing that the language/compiler is taking care of efficiency concerns. No overhead induced or change to existing code!
- const correctness - since the parent sequence of the iterator is stored within the iterator as a const reference, and the iterator dereference operator is const, The compiler knows that the state of the iterator cannot change durin operator* and that the sequence cannot change during the that as well as operator++. This means it can keep those values in registers.
- template inlining - templates are not code, they are actually code generators with zero overhead. All the above code becomes a single chunk of some meta-C++ which, unlike the C style code, has information alongside about what the code means. The compiler is able to use this extra information to generate the best possible machine code. Trust me, it's way more smarter than you or me, when it comes to this. As an assembly programmer I often observe compiler generated machine code to see if there is any scope of improvement and 9 times out of 10, there is none, unless I radically change the algorithm or use vectorization instructions...
So, in conclusion, we can see that writing idiomatic C++ code in a modern way sacrifices nothing, in performance while still providing high level abstractions. This is the future! To quote Sutter from
GoingNative2012 "Performance has always been the king, since the entire history of programming, except the last decade,but now it's 'The Return of the king'
More on this as i progress with BABEL