Homography in changing camera frames (1): Context

This post models the image transformation between two camera frames with a homography. Most posts and forum threads I found online present the homography in a general way. In a specific context, such as a moving camera frame, a more detailed study reveals some interesting properties.


Consider two camera frames F_c and F_d, where F_c is the current frame and F_d is the desired frame. The transformation between the two camera frames can be described by a rotation R and a translation t.

Let’s consider a planar surface made up of n points (planarity is a strong assumption, but homography estimation relies on it). A point P on the planar surface has the Cartesian coordinates:

P = (x, y, z) in F_c and P^* = (x^*, y^*, z^*) in F_d

The transformation is relative: moving the camera frame is equivalent to moving the points expressed in that frame. We can therefore model the transformation of the 3D points with a homogeneous transformation matrix T:

T =\begin{bmatrix}R  & t \\0 & 1\end{bmatrix}

As a reminder, the homogeneous coordinates of a 3D point (x, y, z) are (kx, ky, kz, k), where k is a nonzero scale factor. Note that T maps coordinates in frame F_d to coordinates in frame F_c. Figure 1 [1] visualizes the configuration.

Figure 1: context of the moving camera frames
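To make the matrix T concrete, here is a minimal NumPy sketch. The helper name make_transform and the numeric values of R and t are assumptions chosen for illustration only; they do not come from the figure.

```python
import numpy as np

def make_transform(R, t):
    """Stack a 3x3 rotation R and a 3-vector t into T = [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative values (assumptions): a 90-degree rotation about the z axis
# and a small shift along x.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.5, 0.0, 0.0])
T = make_transform(R, t)

P_star = np.array([1.0, 2.0, 4.0])   # point expressed in F_d
P_star_h = np.append(P_star, 1.0)    # homogeneous coordinates with k = 1
P_h = T @ P_star_h                   # same point expressed in F_c
P = P_h[:3] / P_h[3]                 # divide by k to get Cartesian coordinates
```

Applying T to the homogeneous point gives the same result as computing R P^* + t directly, which is why the 4x4 form is convenient when several transformations are chained.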

We then assume that one image is taken at each position. The surface is therefore projected onto a plane parallel to the XoY plane of the camera frame (imagine that the camera looks along its z axis).

In other words, all projected points share the same z coordinate. Dividing each point by its own z coordinate normalizes that coordinate to 1 and gives the projection of the point P. In Figure 1, m_i^{*} is the projection of P_i in frame F_d and m_i is its projection in frame F_c. The following equation holds:

m_i^{*} = (x^*/z^*, y^*/z^*, 1) and m_i = (x/z, y/z, 1)
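The normalization of the z coordinate to 1 can be sketched as follows. The function name normalize and the sample point are assumptions for illustration; the check that z is positive reflects the assumption that the point lies in front of the camera.

```python
import numpy as np

def normalize(P):
    """Project a 3D point (x, y, z) onto the plane z = 1 by dividing by z."""
    P = np.asarray(P, dtype=float)
    if P[2] <= 0:
        raise ValueError("the point must lie in front of the camera (z > 0)")
    return P / P[2]

# Illustrative point expressed in F_c (an assumption, not taken from the figure).
m = normalize([-1.5, 1.0, 4.0])
```

The third component of the result is always 1, so only the first two components carry information about where the point lands on the normalized image plane.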

Finally, how these normalized coordinates appear in the actual image depends on the camera itself. We model this effect with the intrinsic parameters of the camera, denoted K. In general, K is an upper triangular matrix of the following form:

K =\begin{bmatrix}f_x & s & x_0 \\0 & f_y & y_0\\0 & 0 & 1\end{bmatrix}

where f_x and f_y are the focal lengths of the camera, x_0 and y_0 are the principal point offsets, and s is the axis skew. More details can be found here. K is also called the camera calibration matrix.

With K, we can recover the homogeneous image coordinates, denoted p, using the following transformation:

p = K m
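As a small numeric sketch of this mapping, the snippet below builds K and applies it to a normalized point. The helper intrinsic_matrix and all parameter values (focal lengths of 800 pixels, principal point at (320, 240), zero skew) are assumptions for illustration, not calibration results.

```python
import numpy as np

def intrinsic_matrix(f_x, f_y, x_0, y_0, s=0.0):
    """Build the upper triangular camera calibration matrix K."""
    return np.array([[f_x, s,   x_0],
                     [0.0, f_y, y_0],
                     [0.0, 0.0, 1.0]])

# Illustrative intrinsics (assumptions).
K = intrinsic_matrix(f_x=800.0, f_y=800.0, x_0=320.0, y_0=240.0)

m = np.array([-0.375, 0.25, 1.0])   # normalized point with z = 1
p = K @ m                           # homogeneous image (pixel) coordinates

# K is invertible when f_x and f_y are nonzero, so m can be recovered from p.
m_back = np.linalg.solve(K, p)
```

Because the last row of K is (0, 0, 1), the third component of p stays 1, so p is already in the form (u, v, 1).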

For the later calculation of the homography, we define d^* and d as the distances from the camera frames to the plane (dashed lines in Figure 1), and n^* and n as the normals of the plane surface. Consistent with the notation above, the starred quantities are expressed in F_d and the unstarred ones in F_c.
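One common way to tie these quantities together is the plane equation. It is written here as an assumption about the convention, namely that n and n^* are unit normals and that their sign is chosen so that the distances are positive:

```latex
n^{\top} P = d \quad \text{in } F_c,
\qquad
{n^{*}}^{\top} P^{*} = d^{*} \quad \text{in } F_d
```

Under this convention, every point of the planar surface satisfies the equation of its frame, which is what makes the planar assumption usable in the algebra.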

That concludes the first part of homography estimation in a moving camera configuration. Setting up the configuration properly takes a while, but I think it is important, since “a good beginning is half done”.

With a well-defined system, we will explore some interesting properties of homography in this specific context. Please see the next post!



