This post models the image transformation between two camera frames as a homography. Most of the posts and forum threads I found online present the homography in a general way. In a specific context, however, such as a moving camera frame, a detailed study reveals some interesting properties.
Context:
Given two camera frames $\mathcal{F}$ and $\mathcal{F}^*$, where $\mathcal{F}$ is the current frame and $\mathcal{F}^*$ is the desired frame, the transformation between the two camera frames can be described by a rotation $R$ and a translation $t$.
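Let’s consider a planar surface made up of points (planarity is a strong assumption, but homography estimation relies on it). A point $P$ on the planar surface has Cartesian coordinates $\mathbf{X} = [X, Y, Z]^\top$ in the current frame $\mathcal{F}$ and $\mathbf{X}^* = [X^*, Y^*, Z^*]^\top$ in the desired frame $\mathcal{F}^*$.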
The transformation is relative: moving the camera frame is equivalent to moving the points expressed in that frame. We can therefore model the homogeneous transformation of the 3D points as follows:
$$\begin{bmatrix} \mathbf{X} \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} \mathbf{X}^* \\ 1 \end{bmatrix}$$
As a reminder, the homogeneous coordinates of a 3D point $[X, Y, Z]^\top$ are $[wX, wY, wZ, w]^\top$, where $w \neq 0$ is a scale factor. Note that the transformation is from the desired frame $\mathcal{F}^*$ to the current frame $\mathcal{F}$. Figure 1 [1] visualizes the context to help better understand the configuration.
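The rigid transformation of a 3D point can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper names (`rotation_z`, `homogeneous_transform`, `apply_transform`), the sample rotation and translation, and the convention that the matrix maps coordinates expressed in the desired frame to coordinates in the current frame are all chosen here for the example.

```python
import math

def rotation_z(theta):
    """Return a 3x3 rotation about the Z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def homogeneous_transform(R, t):
    """Stack R (3x3) and t (length 3) into the 4x4 matrix [[R, t], [0, 1]]."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply_transform(T, X):
    """Map a 3D point through T using homogeneous coordinates."""
    Xh = list(X) + [1.0]                       # lift to homogeneous form (w = 1)
    Yh = [sum(T[i][j] * Xh[j] for j in range(4)) for i in range(4)]
    return [Yh[i] / Yh[3] for i in range(3)]   # divide by the scale factor

# Hypothetical motion: 90 degree rotation about Z plus a small translation.
R = rotation_z(math.pi / 2)
t = [0.1, 0.0, 0.5]
T = homogeneous_transform(R, t)

X_star = [1.0, 0.0, 2.0]        # point expressed in the desired frame
X = apply_transform(T, X_star)  # same point expressed in the current frame
print(X)
```

Dividing by the last homogeneous component is a no-op here because the scale factor stays at 1 for rigid motions, but it keeps the helper valid for any nonzero scale.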

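We then assume that one image is taken at each position. The surface is therefore projected onto a plane parallel to the $XY$ plane of the camera frame (imagine that the camera looks along the $Z$ axis).
In other words, all projected points of the surface share the same $Z$ coordinate. Normalizing this $Z$ coordinate to $1$ gives the projection of the point $P$. In Figure 1, $\mathbf{m}$ is the projection of $P$ in the current frame $\mathcal{F}$ and $\mathbf{m}^*$ is that in the desired frame $\mathcal{F}^*$. With $Z$ and $Z^*$ the depths of $P$ in the two frames, the following equations hold:
$$\mathbf{m} = \frac{1}{Z}\,[X, Y, Z]^\top = \left[\frac{X}{Z}, \frac{Y}{Z}, 1\right]^\top$$
and
$$\mathbf{m}^* = \frac{1}{Z^*}\,[X^*, Y^*, Z^*]^\top = \left[\frac{X^*}{Z^*}, \frac{Y^*}{Z^*}, 1\right]^\top$$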
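The normalization step is simple enough to write out directly. The following is a minimal sketch; the function name `normalize` and the sample point are made up for the example, and the check that the point lies in front of the camera is an added assumption.

```python
def normalize(X):
    """Project a 3D point onto the plane Z = 1 in front of the camera."""
    x, y, z = X
    if z <= 0:
        # A point at or behind the camera centre has no valid projection.
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return [x / z, y / z, 1.0]

# Hypothetical 3D point expressed in a camera frame.
m = normalize([0.1, 1.0, 2.5])
print(m)
```

The same function applies to both frames: feed it the coordinates of the point in the current frame or in the desired frame to get the corresponding normalized projection.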
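Finally, how these normalized coordinates map to pixels depends on the camera itself, namely its settings and build quality. We model this effect by the intrinsic parameters of the camera, denoted as $K$. In general, $K$ is an upper triangular matrix containing the camera intrinsic parameters as follows:
$$K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$
where $f_x$ and $f_y$ are the focal lengths of the camera, $x_0$ and $y_0$ are the principal point offsets, and $s$ is the axis skew. More details can be found in [2]. $K$ is also called the camera calibration matrix.
We can then recover the homogeneous image coordinates, denoted as $\mathbf{p}$ and $\mathbf{p}^*$, from the normalized projections $\mathbf{m}$ and $\mathbf{m}^*$ using the following transformation:
$$\mathbf{p} = K\,\mathbf{m}, \qquad \mathbf{p}^* = K\,\mathbf{m}^*$$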
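To make the role of the calibration matrix concrete, here is a small sketch that builds it and maps a normalized projection to pixel coordinates. The function names and the numeric intrinsics (focal lengths of 800 pixels, principal point at (320, 240), zero skew) are hypothetical values picked for illustration, not taken from a real camera.

```python
def intrinsic_matrix(fx, fy, x0, y0, s=0.0):
    """Build the upper triangular 3x3 camera calibration matrix."""
    return [[fx, s, x0],
            [0.0, fy, y0],
            [0.0, 0.0, 1.0]]

def to_pixels(K, m):
    """Map a normalized projection [x, y, 1] to homogeneous pixel coordinates."""
    return [sum(K[i][j] * m[j] for j in range(3)) for i in range(3)]

# Hypothetical intrinsics, for illustration only.
K = intrinsic_matrix(fx=800.0, fy=800.0, x0=320.0, y0=240.0)

m = [0.04, 0.4, 1.0]   # normalized projection of a point
p = to_pixels(K, m)    # homogeneous pixel coordinates
print(p)
```

Because the last row of the matrix is [0, 0, 1], the third component of the result stays at 1, so the first two components can be read directly as pixel coordinates.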
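For the homography calculation that follows, we define the distance from the camera frame to the plane as $d$ and $d^*$ (dashed line in Figure 1), and the normal of the plane surface as $\mathbf{n}$ and $\mathbf{n}^*$, expressed in the current frame $\mathcal{F}$ and the desired frame $\mathcal{F}^*$ respectively.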
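As a short worked relation, the plane parameters in the two frames can be linked to each other. The derivation below is a sketch under two assumptions that are not fixed by the text above: the plane is written as $\mathbf{n}^{*\top}\mathbf{X}^* = d^*$ in the desired frame, and points map between frames as $\mathbf{X} = R\,\mathbf{X}^* + t$. Sign and direction conventions differ between references, so the signs should be checked against whichever convention is actually in use.

```latex
\begin{aligned}
\mathbf{n}^{*\top}\mathbf{X}^* &= d^*
  && \text{(plane in the desired frame)} \\
\mathbf{X}^* &= R^\top(\mathbf{X} - t)
  && \text{(inverse of } \mathbf{X} = R\,\mathbf{X}^* + t\text{)} \\
(R\,\mathbf{n}^*)^\top \mathbf{X} &= d^* + (R\,\mathbf{n}^*)^\top t
  && \text{(substitute and rearrange)} \\
\mathbf{n} = R\,\mathbf{n}^*, \quad d &= d^* + \mathbf{n}^\top t
  && \text{(plane in the current frame)}
\end{aligned}
```

Under these assumptions the normal simply rotates with the frame, while the distance changes by the component of the translation along the normal.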
That concludes the first part of homography estimation in the moving-camera configuration. Setting up the configuration properly takes a while, but I think it is important, since “A good beginning is half done”.
With a well-defined system, we will explore some interesting properties of the homography in this specific context. Please see the next post!
References:
[1] https://hal.inria.fr/inria-00174036v3/document
[2] http://ksimek.github.io/2013/08/13/intrinsic/