Recently I worked on an interesting project, which involved creating rudimentary 3d rendering on a 2d HTML5 canvas using javascript.

Why not WebGL or CSS 3d transforms?

Because this was meant to run on some smart phones that did not support those in their browsers. The client required a way to display an image rotated in 3d space, with perspective projection. Now, while the canvas object supports 2d transformations like rotation, scaling and shearing, it is not possible to do perspective projection using these alone, so a from scratch implementation was needed.

Considering that it had to be coded in javascript, the implementation had to be as fast as possible so that it could be used for real time animation effects.

**How it's done**

The first thing to do was to create a representation of a 3d co-ordinate, that could be rotated, scaled and translated in any dimension.

Now I'm not a big fan of plugging in formulas into code directly, so I usually rework the math from scratch every time. I opted not to use the so called simple approach of homogenous transformation matrices, since (at least to me), the real meaning of what happens to a 3d point when transformed gets obscured in the matrix. Instead I chose a more direct approach. When we consider a 3d rotation, we see that it can be represented by rotation along two axes.

I opted to use the intuitive representation of yaw and pitch - Imagine a sphere like the earth with a point on it - Yaw makes the point orbit along its latitude, while pitch moves it along its longitude.

After some scribbled diagrams, a little sine here and a little cosine there, I got the rotation math right. Translation and scaling are very simply addition and multiplication of the co ordinate values by some factor.

While coding this, I realized one long forgotten trick - About two decades ago, I remember a book I read titled **Electron Graphics and Sound** which first taught me the fundamentals of computer graphics.

Now the Electron microcomputer, and its more advanced cousin the BBC micro were 8 bit machines from the early 80s and ran at 1 Mhz with 32KB of RAM. In the book, the author detailed three methods of drawing circles, each faster than the previous.

*Both these approaches were very slow, because square-roots and trigonometric operations were very expensive on these machines.*

- Solve Y for every X, given R and the Pythagoras theorem.
- Use X = R cos(theta) and Y = R sin(theta)

The third method calculated a sine and cosine only once outside the loop, and had nothing but arithmetic inside the loop. It was blazing fast. I never fully understood how that worked back then, but knew that there was something to it. For years afterwards I tried to remember what the code was meant to do, but I could not, and I didn't have the book anymore.

Now suddenly, while I was implementing the rotation code for this 3d stuff, it dawned on me, and I rediscovered the third algorithm finally. I'll detail that in the next post.

So now that I had the 3d co-ordinates representation worked out, I created an abstraction for an image floating around in 3d space, with each of its corners represented as a 3d point. The next step is...

**2d Projection**

We need to map those image edge points onto the 2d screen, correcting for perspective. This is quite a simple operation...

In most 3d scenes represented on a 2d view, there is a vanishing point and as things get further away, they get closer to the vanishing point. At infinite distance, all 3d points map onto the vanishing point. So it's a simple matter of moving the X and Y co-ordinates towards the vanishing point based on the Z.

You need to choose some projection parameters such as the view angle/scale, how many pixels represents one unit of space along the Z axis, and where along the Z axis the observers eye is located. If we call the unit zunit, the scale factor zscale and the eye position zeye then : *The object appears zscale times smaller for every zunit away from zeye ...*

That sounds very German when spoken out loud but it's English! Now we end up with a trapezoid that represents how the image edges would look in 3d space, when drawn on the 2d screen. Next we move on to...

**Texture mapping**

We need to map the image texture from its original rectangular shape onto the projected trapezoid. The naive way to do this would be :

- Loop over every image co-ordinate
- Transform the point according to the current pitch/yaw/scale
- Project the point onto 2d
- Draw that pixel with the image pixel

This would be embarrassingly slow and moreover it would also be wrong! First of all, doing several trigonometric ops per pixel will defenestrate performance completely and secondly this assumes that every single source pixel on the image rectangle maps to a single pixel on the destination - which is wrong.

An image pixel can actually map to any number of pixels on the trapezoid, since a region of the trapezoid can be bigger or smaller than the corresponding region on the image. Thus, we need to use the well established technique of doing mappings in reverse - the logic is as follows :

*OK that deals with one of the problems, above but what about performance? First of all, how do we iterate over all the pixels on the quad? The fast way to do this is to always is scan-line by scan-line, which ensures that memory access is as linear as possible when we access the pixel data. So the first step is to find out the vertical span of the trapezoid - which is nothing but the minimum and maximum Y of all of the four corners. Now we iterate over the Y span and figure out the X span at each Y thusly:*

- We need to draw every pixel on the destination trapezoid
- We need to draw each pixel once and only once!
- Thus we should iterate over the destination pixels, find the corresponding source point on the image, and then paint that on

Now that we have a Y and a range of X to draw on the destination, we need to find which point on the image maps here.

Say we are on point P on the red scan-line... we are a fraction PR/RS along the line RS. R is a fraction AR/AB along line AB and S is a fraction AS/AC along line AC. We could find out where P would end up on the source image using the same relation, interpolating linearly, and using the image vertices for A, B and C. This ought to be the pixel that's drawn at P on the trapezoid...

Except that things are not so linear - The results looked good on photographic images, but when I used a grid image as a test, the output looked extremely warped when the perspective was high. There was some curvature induced on horizontal lines like this :

Looks somewhat like relativistic contraction effects! But we're supposed to be stationary! This approach was super-fast - It was down to about 13ms for rendering a 800x600 image on Chrome.

There were only about 8 arithmetic operations in the inner loop. The warping is quite undetectable on photographic images, but still, it's wrong!

**The right way**

So I decided to rework the math to solve the following problem :

Once we had f1 and f2, we could look up the same point on the image rectangle easily. I started out on paper, wrote the equations for f1 and f2, and filled several sheets, getting nowhere.

I started again, this time using a text editor, so I could avoid mistakes when substituting expressions. I also wrote a C program to assert the validity of each step of the equation as I progressed, using actual values, and this helped me catch some errors I had made. This seems like a good way to do algebra, since later on, you can simply copy and paste the code into your program!

I won't bore you with the algebra, since it's quite involved... but finally I ended up with a quadratic equation, that could be solved for f1, and f2 could be derived easily from f1.

Now, I knew this was going to be quite slow! Solving a quadratic for every pixel is heavy, since there is a square root involved, and besides the co-efficients were complex expressions involving many of the input co-ordinates and distances. More variables accessed in the inner loop always means more code-sloth. I tested it out and the rendering was perfect, but the performance abysmal. That needed fixing.

Then I recalled my client had mentioned something about implementing Quake style optimizations, since it had been mentioned in a web page that demonstrated perspective texture mapping in C. After looking it up, I realized quite soon what the idea was (vaguely remember reading about this in **Michael Abrash's Black Book Of Graphics Programming). **

In a nutshell, instead of doing the mapping math on every pixel, do it every N pixels and in between use linear interpolation for the texture co-ordinates. Over a small region such as 16x16, the error is not visible, so it's a great compromise between speed and accuracy. The funny thing is that if we increase the step from N to say, the entire X span width, we end up with exactly the original incorrect warped output!

Then I did a few vanilla optimizations - loop hoisting, making the texture image's width padded to a power of two (a simple trick to make i = y * w + x into i = y << n + x while accessing the canvas data buffer), and reducing variable access in the inner loop. After implementing all this, the performance ended up at about 60ms per frame to render a 800x600 image texture. In a pinch, the step N can be increased to about 64 to double performance and it still looks good.

Nice! Any sample output or code snippets?

ReplyDeleteCan't put the code since I've sold it, but I could make a demo Java applet to demonstrate the technique.

ReplyDeleteI am the client who commissioned the work. If Naren leaves some contact info I could get in touch to discuss how it will be used and if he/she wishes to be involved in the development process and get access to the code.

ReplyDeleteThis work was done to go into a library which I'll release as open source, but the library is not ready yet. If any developers wish to work with the code and assist me in releasing it as open source as a single component I'd be interested. I want some more work to be done on the API, documentation and demos before releasing it as open source (MIT or LGPL or other permissive license), probably to GitHub.