Antics with semantics: Reinventing the wheel for fun and profit

A few months back, I had been involved in an image processing project, which dealt with image binarization for OCR. One of the steps during that process was to segment an image into regions labeled red, white and black. They represented black on white text, white on black text and blank regions respectively. The input consisted of three monochrome images with non regular regions filled with white. The segmentation was done as follows:

A rectangle is chosen (initially the entire image), and for both vertical and horizontal directions, at every possible location to split was tested.

At each split point, the pixel counts for the two sub rectangles were taken for each of the three input images.

The split point was assigned a score based on how uniform the two sub rectangles were in terms of the labeled pixels.

The best split point was chosen and we now had two rectangles.

The process repeated recursively for both these rectangles, until they reached a certain score, or a certain minimum size.

Suffice to say that most of the work involved counting the number of “on” pixels in an arbitrary rectangular region of an image. Here is a visualization of how the splitting ends up for a web page image – The regions that are predominantly white on black are green, those that are black on white are marked red and the split lines are blue.

After I got the initial implementation working, it was horrendously slow, because for every rectangle, at every horizontal or vertical split position, the number of “on” pixels was counted by iterating entirely over the two split rectangles The first optimization that came to mind was to initially do the count for the whole rectangle and maintain the pixel count at every split line too. The pixel counts for both sides could then be incrementally calculated, as the split point moved across the rectangle. However, this process would have to be repeated for the sub rectangles, so it would still be slow. I scratched my head for about a minute and then suddenly I had it – If we could calculate and cache the counts for every possible rectangle starting at (0, 0) then we could get the count for any arbitrary rectangle with a handful of arithmetic ops – thusly :

To get the count for the green rectangle K, we take the sum of the entire image up to the bottom right corner of K and from that we subtract the top region and the left region. We ended up subtracting the purple top-left region twice, so we add it back once. All these rectangles start at (0, 0) can be read directly from our precomputed cache.So in essence for any rectangle in the image, we can get the pixel count by doing 4 memory accesses and three subtractions.

Blazing fast! So fast in fact, that this becomes the fastest part of the whole algorithm.

After a few days, I find out that this technique is a known one called by the fancy name “integral images”, and can be used in a number of applications where arrays of any number of dimensions have to be summed up.

I'm thinking of implementing a 2D LZ77 like compression technique based on this that would be suitable for screen-shot like images. The idea would be to quickly identify two equal sized rectangular regions that sum up to the same value. Two patches that had the same sum would be a potential match. Differing sums would mean the patches could never match exactly, so we avoid a lot of pixel-wise comparison. That means we can test every possible rectangle against every other one to find matches, and do it quite fast.

What was a bit funny to me was that this technique seems so obvious (as I said before, it took me all of a minute to come up with this and realize it would work), but has a pretty fancy name and was one of the key “innovations” of the famous Viola-Jones face detection paper, which convinces me further of how academic literature tends to make quite simple stuff seem complex and path-breaking.

Moral of the story:

Invent as many wheels as you can - many of them will be re-inventions, but the process of invention is more fun and beneficial than that of using a pre-invented wheel.

Antics with semantics

Monday, December 5, 2011

Reinventing the wheel for fun and profit

No comments:

Post a Comment