A Camera Class

For those who, like me, are interested in game development as a hobby, one of the main challenges in building a 3D game is understanding the OpenGL framework.

Keeping the different coordinate systems apart is essential if one doesn’t want to get lost in coordinate transformations. The world coordinate system is the one in which the scene is most naturally described; it arises from the game constraints and the scene itself. The eye coordinate system is the system OpenGL understands, with the z coordinate sticking out of the screen and x and y running along the screen edges.

We also need a camera in our scene: the point of view from which the scene is rendered.

It is very useful to split an OpenGL program into what our virtual objects do in the scene and how the camera moves through it. To achieve that split of responsibilities, we need a Camera class that encapsulates the state of the camera at all times. The camera should know how to return its view transformation matrix, which sets the point of view from which the objects will be rendered. This camera (or view) transformation is the matrix that transforms the vertices from world coordinates to eye coordinates.

For the impatient: the Camera class implementation in Objective-C is on GitHub at https://github.com/martinberoiz/CameraClass. It can be easily adapted to C++.

This article will try to explain how to set up a view transformation for a Camera class.

One way to implement a camera is to set its position and orientation in world coordinates. From these we can work out the transformation matrix to be applied to the vertices in the scene. The following assumes that we have a vector that points to where the camera is looking (lookAt), an approximate vector for what is considered “up” for the camera (upVector), and the coordinates of the camera (pos) in the scene, all three in world coordinates. The upVector doesn’t have to be the true upward vector; it only needs a non-zero component along the true “up” direction.

In what follows, {i, j, k} are the basis vectors of the world coordinate system and {i’, j’, k’} are the basis vectors of the eye coordinate system. In the eye coordinate system (the only one that OpenGL cares about) the eye always points in the -z direction. Since the camera looks along -k’, the k’ direction points from lookAt back toward pos, and we can find it in world coordinates from those two vectors.

k’ = normalize(pos - lookAt)

“Up” is the y direction in the eye coordinate system. The upVector would be j’ if it weren’t for the fact that it’s not necessarily exactly upwards. Nevertheless, the cross product of upVector and k’ points along i’, as long as upVector has a component in the true j’ direction (that is, as long as it is not parallel to k’).

i’ = normalize(upVector x k’)

Finally the cross product of k’ and i’ will give us the true j’.

j’ = k’ x i’

Now that we have the orientation of the camera {i’, j’, k’} relative to the world coordinates {i, j, k}, we can figure out the transformation (it’s a rotation) between the two frames. This is easily done by noting that i’ = (i’.i) i + (i’.j) j + (i’.k) k, and similarly for j’ and k’, which can be written formally (it’s not a true matrix-vector multiplication) as

\left[ \begin{array}{c}    i' \\    j' \\    k'    \end{array} \right]    = \begin{bmatrix} i'.i & i'.j & i'.k\\    j'.i & j'.j & j'.k \\    k'.i & k'.j & k'.k    \end{bmatrix}    \left[ \begin{array}{c} i \\    j \\    k \end{array} \right] =    \left[ \begin{array}{c} i' \rightarrow \\    j' \rightarrow \\    k' \rightarrow \end{array} \right]    \left[ \begin{array}{c} i \\    j \\    k \end{array} \right] \equiv R \left[ \begin{array}{c} i \\    j \\    k \end{array} \right]

where i' \rightarrow means write the components of i’ in the {i,j,k} basis in that row. The components of any vector transform with the same matrix: if v has components (x,y,z) in world coordinates and (x’,y’,z’) in eye coordinates, this can be seen by writing

\begin{array}{l} v = x' i' + y' j' + z'k' = (x',y',z') (i', j', k')^t = (x',y',z') R (i, j, k)^{t} = (x,y,z).(i, j, k)^t \\    \Rightarrow (x',y',z') R = (x,y,z) \\    \Rightarrow (x',y',z') = (x,y,z) R^t \\    \Rightarrow (x',y',z')^t = R.(x,y,z)^t    \end{array}

(R^t=R^{-1} since it is an orthonormal matrix).
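As a small illustration of the last line of the derivation, here is a sketch that applies R and R^t to the components of a direction (no translation involved). The function names are invented for this example, and prime is assumed to hold i’, j’, k’ as rows, written in world coordinates.

```c
/* eye = R . world: each eye component is the dot product of a primed
   basis vector (a row of prime) with the world-space vector. */
void world_to_eye(float prime[3][3], const float world[3], float eye[3]) {
    for (int r = 0; r < 3; r++) {
        eye[r] = prime[r][0] * world[0]
               + prime[r][1] * world[1]
               + prime[r][2] * world[2];
    }
}

/* world = R^t . eye: the transpose undoes the rotation. */
void eye_to_world(float prime[3][3], const float eye[3], float world[3]) {
    for (int c = 0; c < 3; c++) {
        world[c] = prime[0][c] * eye[0]
                 + prime[1][c] * eye[1]
                 + prime[2][c] * eye[2];
    }
}
```

Feeding k’ itself into world_to_eye should give (0, 0, 1), which is a quick sanity check on the row ordering.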

R is the rotation that takes a vector from world coordinates to eye coordinates. Its transpose takes a vector from eye to world coordinates. To put the camera in the world we performed two operations: first we rotated the camera at the origin by the matrix R^t, and then we translated it by the vector pos.

Mcam = T(pos) . R^t

The matrix we have to apply to the vertices to simulate the camera motion is the inverse of Mcam:

Mv = R . T(-pos)
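For completeness, the inverse can be checked in one line, using only R R^t = I and the fact that translating by -pos undoes translating by pos (M_v and M_{cam} are just Mv and Mcam written in LaTeX):

```latex
\begin{array}{l}
M_v \, M_{cam} = R \, T(-pos) \, T(pos) \, R^t = R \, R^t = I \\
\Rightarrow M_v = M_{cam}^{-1} = (R^t)^{-1} \, T(pos)^{-1} = R \, T(-pos)
\end{array}
```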

Implementation in OpenGL|ES:

OpenGL|ES stores matrices in column-major format. (Equivalently, we can think of it as storing the transpose of the matrix in row-major format.) To construct the matrix R in the array R[16], we fill each row with the components of one of the primed unit vectors.

\left[ \begin{array}{c}    i' \rightarrow 0 \\    j' \rightarrow 0 \\    k' \rightarrow 0 \\    0 \ldots 1    \end{array}    \right] =    \begin{bmatrix} R_0 & R_4 & R_8 & R_{12} \\    R_1 & R_5 & R_9 & R_{13}    \\ R_2 & R_6 & R_{10} & R_{14}    \\ R_3 & R_7 & R_{11} & R_{15}    \end{bmatrix}

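 /* Rows of R are the primed unit vectors: prime[0] = i', prime[1] = j', prime[2] = k'. */
 for (int i = 0; i < 3; i++) {       /* i: column */
   for (int j = 0; j < 3; j++) {     /* j: row */
     viewMatrix[i*4 + j] = prime[j][i];
   }
   viewMatrix[i*4 + 3] = 0.;         /* bottom row under the rotation block */
 }
 for (int i = 12; i < 15; i++) viewMatrix[i] = 0.;  /* no translation yet */
 viewMatrix[15] = 1.;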
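To make the column-major layout concrete, here is a minimal sketch of how such an array multiplies a column vector. The function name mat4_mul_vec4 is made up for this example; the point is only that element (row, col) lives at index col*4 + row.

```c
/* out = m . v for a column-major 4x4 matrix m and a column vector v.
   out must not alias v. */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4]) {
    for (int row = 0; row < 4; row++) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; col++) {
            out[row] += m[col * 4 + row] * v[col];
        }
    }
}
```

With this layout, a pure translation by (1, 2, 3) has those values at indices 12, 13 and 14, and applying it to the point (0, 0, 0, 1) gives (1, 2, 3, 1).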

The translation as an array T[16] is the usual one and has the form:

\begin{bmatrix} 1 & 0 & 0 & t_x    \\ 0 & 1 & 0 & t_y    \\ 0 & 0 & 1 & t_z    \\ 0 & 0 & 0 & 1    \end{bmatrix} =    \begin{bmatrix} T_0 & T_4 & T_8 & T_{12} \\    T_1 & T_5 & T_9 & T_{13} \\    T_2 & T_6 & T_{10} & T_{14} \\    T_3 & T_7 & T_{11} & T_{15}    \end{bmatrix}

But since T is mostly empty, we can carry out the multiplication R . T without storing the array T. The rotation block remains untouched, and R_{12}, R_{13}, R_{14} are replaced by linear combinations of the rotation terms and the translation terms, where (t_x, t_y, t_z) = -pos for the view matrix:

\begin{array}{l} R_{12} \leftarrow t_x R_0 + t_y R_4 + t_z R_8 \\    R_{13} \leftarrow t_x R_1 + t_y R_5 + t_z R_9 \\    R_{14} \leftarrow t_x R_2 + t_y R_6 + t_z R_{10}    \end{array}

In code:

 /* Fourth column: the rotation applied to t = -position. */
 viewMatrix[12] = - position[0] * viewMatrix[0]
                  - position[1] * viewMatrix[4]
                  - position[2] * viewMatrix[8];

 viewMatrix[13] = - position[0] * viewMatrix[1]
                  - position[1] * viewMatrix[5]
                  - position[2] * viewMatrix[9];

 viewMatrix[14] = - position[0] * viewMatrix[2]
                  - position[1] * viewMatrix[6]
                  - position[2] * viewMatrix[10];
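Putting the two snippets together, a self-contained version might look like the sketch below. The function name fill_view_matrix is hypothetical (it is not taken from the repository), and it assumes prime already holds i’, j’, k’ as rows and position holds pos.

```c
/* Hypothetical helper: fills the column-major view matrix Mv = R . T(-pos).
   The rows of prime are i', j', k' in world coordinates. */
void fill_view_matrix(float prime[3][3], const float position[3],
                      float viewMatrix[16]) {
    /* Rotation block: row j of R is prime[j];
       element (row, col) lives at index col*4 + row. */
    for (int col = 0; col < 3; col++) {
        for (int row = 0; row < 3; row++) {
            viewMatrix[col * 4 + row] = prime[row][col];
        }
        viewMatrix[col * 4 + 3] = 0.0f;
    }
    /* Fourth column: R applied to t = -position. */
    for (int row = 0; row < 3; row++) {
        viewMatrix[12 + row] = -(position[0] * viewMatrix[row]
                               + position[1] * viewMatrix[4 + row]
                               + position[2] * viewMatrix[8 + row]);
    }
    viewMatrix[15] = 1.0f;
}
```

As a sanity check, for a camera at (5, 0, 0) looking at the origin with “up” along y, the fourth column comes out as (0, 0, -5): the origin ends up five units in front of the camera along -z, as expected.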

The whole project (WIP) can be found on this link: https://github.com/martinberoiz/CameraClass

Note: I will update this post with more details.
