In the last post, we set up the context of the problem. Figure 1 might help you visualize it and recall the forms that we defined.
Now that we have all the ingredients we need, let’s focus on the image transformation. Recall that the goal of this post is to estimate a homography between images taken from two different camera frames.
What’s a homography?
Under the pinhole camera model, two images of the same planar object are related by a homography. More generally, a homography is a synonym for projective transformation: it maps lines to lines, so collinear points stay collinear.
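In homogeneous coordinates on the image plane, a typical homography can be expressed by a $3 \times 3$ matrix as follows:
$$
s \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
where $(x, y)$ and $(x', y')$ are corresponding points in the two images and $s$ is a non-zero scale factor.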
The document gives this property without further explanation, which confused me for a long time, so I will try to explain it step by step.
First of all, we need to distinguish between an image-level homography and a projective homography. The former can be estimated numerically from corresponding coplanar points in two images. That means solving a system with 8 unknowns (since homogeneous coordinates are only defined up to a scale factor, there are 8 unknowns instead of 9). In contrast, the latter needs the camera properties to be taken into account. That’s why I have emphasized from the very beginning that we are talking about homography between camera frames.
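To make the "8 unknowns" statement concrete, here is a minimal sketch of the direct linear transform (DLT) in NumPy. Each correspondence gives 2 linear equations, so at least 4 correspondences are needed. The function name `estimate_homography` and the test values are my own choices for illustration, and the sketch assumes exact correspondences and a non-zero bottom-right entry of the matrix.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H (up to scale) with dst ~ H @ src.

    src, dst: arrays of shape (N, 2), N >= 4 corresponding points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Remove the scale ambiguity (assumes H[2, 2] is not zero).
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H and divide by the last coordinate."""
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical ground-truth homography, used only to generate test data.
H_true = np.array([[1.2, 0.1, 0.5],
                   [-0.05, 0.9, -0.3],
                   [0.1, 0.2, 1.0]])
src_pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.25]])
dst_pts = apply_homography(H_true, src_pts)

H_est = estimate_homography(src_pts, dst_pts)
print(np.round(H_est, 4))
```

With noisy real-world matches, a library routine such as OpenCV's `cv2.findHomography` is usually a better choice, since it adds outlier rejection on top of this kind of linear solve.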
An image-level homography can also be estimated from the camera frame configuration. Furthermore, the calculation might be more efficient thanks to its matrix form. The homography can be computed as:
$$
H = R + \frac{t\,n^T}{d}
$$
where $R$ and $t$ are the rotation matrix and translation vector from the desired camera frame to the current camera frame, respectively. $n$ is the unit normal of the planar surface and $d$ is the distance to the surface, both measured in the desired camera frame.
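To explain this equation, let the plane equation of the planar surface in the desired camera frame be $n^T X^* = d$, where $X^*$ is a 3D point on the plane expressed in that frame. Note that $X^*$ can be seen as a vector from the origin pointing to the point, so $n^T X^*$ is the projection of that vector onto the normal, which equals $d$. It follows that $\frac{n^T X^*}{d} = 1$, so we can write the translation as $t = t\,\frac{n^T X^*}{d}$. Given the rigid transformation $X = R X^* + t$, where $X$ is the same point expressed in the current camera frame, we get $X = \left(R + \frac{t\,n^T}{d}\right) X^*$, which is the equation above.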
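The derivation can be checked numerically. The sketch below builds $R + \frac{t\,n^T}{d}$ for a plane $n^T X^* = d$ and compares it with the rigid transformation $R X^* + t$ on points that lie on the plane. All numeric values (rotation angle, translation, plane) are arbitrary assumptions chosen for the test, and $n$ is assumed to be a unit vector.

```python
import numpy as np

# Arbitrary camera motion: rotation about the y axis plus a translation.
angle = 0.1
c, s = np.cos(angle), np.sin(angle)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
t = np.array([0.3, -0.1, 0.2])

# Plane n^T X = d in the desired camera frame (n has unit length).
n = np.array([0.0, 0.6, 0.8])
d = 2.0

# Homography built from the camera frame configuration.
H = R + np.outer(t, n) / d

# Sample points on the plane: pick x, y and solve the plane equation for z.
rng = np.random.default_rng(0)
xy = rng.uniform(-0.5, 0.5, size=(5, 2))
z = (d - xy @ n[:2]) / n[2]
X_des = np.column_stack([xy, z])

X_rigid = X_des @ R.T + t   # R X* + t for every point
X_homog = X_des @ H.T       # H X* for every point
print(np.abs(X_rigid - X_homog).max())

# A point off the plane does not satisfy the relation.
X_off = np.array([0.0, 0.0, 5.0])
off_plane_matches = np.allclose(R @ X_off + t, H @ X_off)
print(off_plane_matches)
```

The two results agree only for points on the plane, which is exactly where the substitution $\frac{n^T X^*}{d} = 1$ is valid.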
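However, we cannot directly deduce $p = H p^*$ for the image points, since $p$ is, in fact, the projection of $X$ using the camera calibration matrix $K$. The same holds for $p^*$ and $X^*$. So to formally establish the relationship between $p$ and $p^*$, we have:
$$
Z\,p = K X = K H X^* = Z^*\, K H K^{-1} p^*
$$
where $Z$ and $Z^*$ are the z coordinates of the points. Since we assume that the homogeneous image points are normalized (their last coordinate is 1), the ratio $Z^*/Z$ is only a scale factor, and we can further simplify the equation to be:
$$
p \sim K H K^{-1} p^*
$$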
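As a sanity check, the pixel-level relation through $K H K^{-1}$ can be verified on a synthetic point. This is only a sketch under assumed values: the calibration matrix `K`, the camera motion and the plane below are made up for the test, and the symbol names ($K$ for calibration, $H$ for the frame-level homography, starred quantities for the desired frame) follow the usual convention.

```python
import numpy as np

# Hypothetical pinhole calibration matrix (focal length 800 px, center 320x240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Arbitrary motion from the desired frame to the current frame.
angle = 0.1
c, s = np.cos(angle), np.sin(angle)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
t = np.array([0.3, -0.1, 0.2])

# Plane n^T X = d in the desired frame (n has unit length).
n = np.array([0.0, 0.6, 0.8])
d = 2.0

H = R + np.outer(t, n) / d      # homography between camera frames
G = K @ H @ np.linalg.inv(K)    # homography between pixel coordinates

def project(K, X):
    """Pinhole projection to normalized homogeneous pixel coordinates."""
    p = K @ X
    return p / p[2]

# One 3D point on the plane, seen from both frames.
x, y = 0.4, -0.2
X_des = np.array([x, y, (d - n[0] * x - n[1] * y) / n[2]])
X_cur = R @ X_des + t

p_des = project(K, X_des)
p_cur = project(K, X_cur)

# Map the desired-frame pixel through G and renormalize.
q = G @ p_des
p_pred = q / q[2]
print(p_cur, p_pred)
```

Dividing by the last coordinate is what absorbs the depth ratio, so the depths themselves never need to be known once the pixel-level matrix is available.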
Voilà, that is how we can derive the homography between moving camera frames. Please don’t hesitate to tell me if some points are not clear enough. 🙂
Thank you for reading!