This post models the image transformation between two camera frames with a homography. Most of the posts and forum threads I found online present the homography in a general way. In a specific context, however, such as a moving camera, a more detailed study reveals some interesting properties.
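Consider two camera frames $\mathcal{F}$ and $\mathcal{F}^*$, where $\mathcal{F}$ is the current frame and $\mathcal{F}^*$ is the desired frame. The transformation between the two camera frames can be described by a rotation $R$ (a $3 \times 3$ matrix) and a translation $t$ (a 3-vector).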
Let’s consider a planar surface made up of points (planarity is a strong assumption for homography estimation). For a point on the planar surface, we write its Cartesian coordinates as:
$P = (X, Y, Z)^T$ in the current frame $\mathcal{F}$ and $P^* = (X^*, Y^*, Z^*)^T$ in the desired frame $\mathcal{F}^*$.
Since the transformation is relative, moving the camera frame is equivalent to moving the points expressed in the camera frames. We can therefore model the homogeneous transformation of the 3D points as follows:

$$\begin{bmatrix} P \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} P^* \\ 1 \end{bmatrix}$$

As a reminder, the homogeneous coordinates of a 3D point $(X, Y, Z)^T$ are $(wX, wY, wZ, w)^T$, where $w$ is a scale factor (here $w = 1$). Note that the transformation is from the desired frame $\mathcal{F}^*$ to the current frame $\mathcal{F}$, that is, $P = R P^* + t$. Figure 1 visualizes the configuration.
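To make the homogeneous transformation concrete, here is a minimal NumPy sketch. The rotation angle, the translation, and the point are made-up values, and the names `make_transform`, `P_src`, and `P_dst` are only for illustration. The sketch is direction-agnostic: it maps coordinates from a "source" frame to a "destination" frame, whichever of the two camera frames plays each role.

```python
import numpy as np

def make_transform(R, t):
    """Stack a 3x3 rotation R and a 3-vector t into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical motion: 10 degree rotation about the Y axis plus a small translation.
theta = np.deg2rad(10.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.2, 0.0, 0.1])
T = make_transform(R, t)

# A 3D point expressed in the source frame, lifted to homogeneous coordinates (w = 1).
P_src = np.array([0.5, -0.3, 4.0])
P_src_h = np.append(P_src, 1.0)

# Apply the transformation, then divide by the scale factor to get back to 3D.
P_dst_h = T @ P_src_h
P_dst = P_dst_h[:3] / P_dst_h[3]
```

The 4x4 product gives the same result as computing `R @ P_src + t` directly; the homogeneous form is mainly convenient when several transformations have to be chained.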
We then assume that one image is taken at each position. That means the surface is projected onto a plane that is parallel to the $XY$ plane of the camera frame (imagine that the camera looks along the $Z$ axis).
In other words, all projected points of the surface have the same $Z$ coordinate. We can normalize this $Z$ coordinate to be $1$, which gives the projection of the point. In Figure 1, $m$ is the projection of the point in the current frame $\mathcal{F}$ and $m^*$ is its projection in the desired frame $\mathcal{F}^*$. The following equations hold:

$$m = \frac{1}{Z} P = \left(\frac{X}{Z}, \frac{Y}{Z}, 1\right)^T, \qquad m^* = \frac{1}{Z^*} P^* = \left(\frac{X^*}{Z^*}, \frac{Y^*}{Z^*}, 1\right)^T$$
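The normalization step is short enough to sketch directly. The function name `normalize` and the sample point are assumptions for illustration; the only idea taken from the text is dividing a 3D point by its depth so that its third coordinate becomes 1.

```python
import numpy as np

def normalize(P):
    """Project a 3D point onto the plane Z = 1 by dividing by its depth."""
    P = np.asarray(P, dtype=float)
    if P[2] <= 0:
        # A point at or behind the camera centre has no valid projection here.
        raise ValueError("point must have positive depth Z")
    return P / P[2]

# Hypothetical point in a camera frame, 4 units in front of the camera.
P = np.array([0.5, -0.3, 4.0])
m = normalize(P)

# Any point on the same ray through the camera centre has the same projection.
m_scaled = normalize(2.0 * P)
```

Note that the depth is lost in this step: `m` and `m_scaled` are identical, which is why a single image cannot tell how far away the point is.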
Next, the camera's own settings and build quality also affect the resulting image. We model this effect by the intrinsic parameters of the camera, denoted as $K$. In general, $K$ is an upper triangular matrix containing the camera intrinsic parameters as follows:

$$K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths of the camera, $x_0$ and $y_0$ are the principal point offsets, and $s$ is the axis skew. More details can be found here. $K$ is also called the camera calibration matrix.
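In the end, we can recover the homogeneous image coordinates, denoted as $p$ in the current frame and $p^*$ in the desired frame, from the normalized projections $m$ and $m^*$ and the intrinsic matrix $K$ using the following transformation:

$$p = K m, \qquad p^* = K m^*$$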
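As a quick numerical check of this last step, the sketch below builds an intrinsic matrix from made-up values (800 pixel focal lengths, a principal point at (320, 240), zero skew) and maps a normalized point to homogeneous pixel coordinates. None of these numbers come from a real camera.

```python
import numpy as np

# Hypothetical intrinsic parameters, in pixels.
fx, fy = 800.0, 800.0   # focal lengths
x0, y0 = 320.0, 240.0   # principal point offsets
s = 0.0                 # axis skew

K = np.array([[fx,  s,   x0],
              [0.0, fy,  y0],
              [0.0, 0.0, 1.0]])

# A normalized projection (third coordinate equal to 1).
m = np.array([0.125, -0.075, 1.0])

# Homogeneous image (pixel) coordinates.
p = K @ m

# K is invertible, so the normalized point can be recovered from the pixel.
m_back = np.linalg.solve(K, p)
```

With these values the point lands at pixel (420, 180). Going the other way, from pixels back to normalized coordinates, is what requires a calibrated camera.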
For the further calculation of the homography, we define the distance from the camera frame to the plane as $d$ in the current frame and $d^*$ in the desired frame (dashed lines in Figure 1), and the normal of the plane surface as $n$ and $n^*$ in the two frames, respectively.
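To illustrate what the plane normal and distance mean numerically, here is a small sketch under stated assumptions: the plane is defined by three made-up points, the normal is taken as a unit vector oriented so that the distance is positive, and the rigid motion is the generic relation `P_b = R @ P_a + t` between two frames labelled A and B. Under that convention one can check that the normal simply rotates, while the distance changes by the component of the translation along the normal.

```python
import numpy as np

def plane_from_points(P0, P1, P2):
    """Return the unit normal n and distance d of the plane through three points,
    so that n @ P == d for every point P on the plane, with d >= 0."""
    n = np.cross(P1 - P0, P2 - P0)
    n = n / np.linalg.norm(n)
    d = n @ P0
    if d < 0:
        n, d = -n, -d
    return n, d

# Three hypothetical points of the planar surface, expressed in frame A.
pts_a = np.array([[0.0, 0.0, 4.0],
                  [1.0, 0.0, 4.5],
                  [0.0, 1.0, 4.0]])
n_a, d_a = plane_from_points(*pts_a)

# Hypothetical rigid motion from frame A to frame B: P_b = R @ P_a + t.
theta = np.deg2rad(10.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.2, 0.0, 0.1])

# Same three points expressed in frame B, and the plane recomputed there.
pts_b = pts_a @ R.T + t
n_b, d_b = plane_from_points(*pts_b)

# Closed-form prediction: the normal rotates, the distance shifts by n_b . t
n_b_pred = R @ n_a
d_b_pred = d_a + n_b_pred @ t
```

The recomputed `n_b` and `d_b` agree with the predicted values, so only one pair of normal and distance is independent once the rotation and translation are known.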
That is the first part of homography estimation in the moving camera configuration. It takes a while to set up the configuration properly, but I think it is important, since “a good beginning is half done”.
With a well-defined system in place, we will explore some interesting properties of the homography in this specific context. Please see the next post!