- A rectangle is chosen (initially the entire image), and for both vertical and horizontal directions, at every possible location to split was tested.
- At each split point, the pixel counts for the two sub rectangles were taken for each of the three input images.
- The split point was assigned a score based on how uniform the two sub rectangles were in terms of the labeled pixels.
- The best split point was chosen and we now had two rectangles.
- The process repeated recursively for both these rectangles, until they reached a certain score, or a certain minimum size.
After I got the initial implementation working, it was horrendously slow, because for every rectangle, at every horizontal or vertical split position, the number of “on” pixels was counted by iterating entirely over the two split rectangles The first optimization that came to mind was to initially do the count for the whole rectangle and maintain the pixel count at every split line too. The pixel counts for both sides could then be incrementally calculated, as the split point moved across the rectangle. However, this process would have to be repeated for the sub rectangles, so it would still be slow. I scratched my head for about a minute and then suddenly I had it – If we could calculate and cache the counts for every possible rectangle starting at (0, 0) then we could get the count for any arbitrary rectangle with a handful of arithmetic ops – thusly :
To get the count for the green rectangle K, we take the sum of the entire image up to the bottom right corner of K and from that we subtract the top region and the left region. We ended up subtracting the purple top-left region twice, so we add it back once. All these rectangles start at (0, 0) can be read directly from our precomputed cache.So in essence for any rectangle in the image, we can get the pixel count by doing 4 memory accesses and three subtractions.
Blazing fast! So fast in fact, that this becomes the fastest part of the whole algorithm.
After a few days, I find out that this technique is a known one called by the fancy name “integral images”, and can be used in a number of applications where arrays of any number of dimensions have to be summed up.
I'm thinking of implementing a 2D LZ77 like compression technique based on this that would be suitable for screen-shot like images. The idea would be to quickly identify two equal sized rectangular regions that sum up to the same value. Two patches that had the same sum would be a potential match. Differing sums would mean the patches could never match exactly, so we avoid a lot of pixel-wise comparison. That means we can test every possible rectangle against every other one to find matches, and do it quite fast.
What was a bit funny to me was that this technique seems so obvious (as I said before, it took me all of a minute to come up with this and realize it would work), but has a pretty fancy name and was one of the key “innovations” of the famous Viola-Jones face detection paper, which convinces me further of how academic literature tends to make quite simple stuff seem complex and path-breaking.
Moral of the story: