Homography in changing camera frames (2): Properties

In the last post, we set up the context of the problem. Figure 1 may help you visualize it and recall the notation we defined.

Figure 1: context of a moving camera frame

Now that we have all the ingredients we need, let’s focus on the image transformation. Recall that the goal of this post is to estimate a homography between images taken from two different camera frames.

What’s a homography?

Under the pinhole camera model, two images of the same planar object are related by a homography. More generally, a homography is another name for a projective transformation: an invertible mapping that sends lines to lines, so it preserves the collinearity of points.

Given the homogeneous coordinates p = (u, v, 1) and p^* = (u^*, v^*, 1) of a point in the two image planes, a homography can be written as a 3 \times 3 matrix:

H =\begin{bmatrix}h_{11} & h_{12} & h_{13} \\h_{21} & h_{22} & h_{23} \\h_{31} & h_{32} & h_{33} \end{bmatrix}
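To make the matrix form concrete, here is a minimal NumPy sketch that applies a homography to pixel coordinates and checks that collinear points stay collinear. The matrix `H_example` and the helper name `apply_homography` are made up for illustration; they do not come from the references.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of pixel coordinates through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (u, v) -> (u, v, 1)
    mapped = homog @ H.T                               # row-wise H @ p
    return mapped[:, :2] / mapped[:, 2:3]              # rescale so the last coordinate is 1

# Hypothetical homography, chosen only for illustration.
H_example = np.array([[1.0, 0.2, 5.0],
                      [0.1, 0.9, -3.0],
                      [0.001, 0.002, 1.0]])

# Three collinear points on the line v = 2u + 1.
line_pts = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
a, b, c = apply_homography(H_example, line_pts)

# The 2D cross product of (b - a) and (c - a) vanishes when a, b, c are collinear.
collinearity_residual = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
```

The division by the third coordinate is what makes the mapping projective rather than affine: multiplying `H_example` by any nonzero constant gives exactly the same output points.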

Properties:

The document [2] states this property without further explanation, which confused me for a long time, so I will try to explain it step by step.

First of all, we need to distinguish an image-level homography from a projective homography. The former can be estimated numerically from corresponding coplanar points in two images. That means solving a system with 8 unknowns (the matrix has 9 entries, but homogeneous coordinates are defined only up to a scale factor, which removes one). Each correspondence gives two equations, so at least four point pairs are needed. In contrast, the latter needs the camera properties to be taken into account. That’s why I have emphasized from the very beginning that we are talking about homography with camera frames.
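The numerical estimation from point correspondences can be sketched with the direct linear transform (DLT). This is a generic textbook approach, not necessarily the method used in [2]. The function name `estimate_homography` and all the test values are assumptions for illustration, and the sketch skips the coordinate normalization that a robust implementation would add.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H such that dst ~ H @ src (up to scale) from N >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (u, v), (x, y) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale; assumes h_33 is not zero

# Hypothetical ground truth, used only to generate consistent correspondences.
H_true = np.array([[1.0, 0.2, 5.0],
                   [0.1, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
src_h = np.hstack([src, np.ones((4, 1))])
dst_h = src_h @ H_true.T
dst = dst_h[:, :2] / dst_h[:, 2:3]

H_est = estimate_homography(src, dst)
```

Four pairs give 8 equations for the 8 free parameters, which is why the last step has to pin down the scale by hand.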

An image-level homography can also be computed from the configuration of the camera frames. Furthermore, the calculation can be more efficient thanks to its closed matrix form. The homography is given by:

H = R + \frac{tn^{*\top}}{d^*}

where R and t are the rotation matrix and translation vector from F_c to F_d, respectively, n^{*} is the unit normal of the planar surface, and d^* is the distance from the camera center to the surface, both measured in the desired camera frame.

To explain this equation, write the plane in the desired camera frame as n^{*\top}P_i^* = d^*. Here P_i^* can be seen as the vector from the origin to the point P_i^*, so n^{*\top}P_i^* is the projection of the vector OP_i^* onto the normal, which equals d^* for every point on the plane. We can therefore write the translation as t = t \cdot 1 = t\frac{n^{*\top}P_i^*}{d^*}. Substituting this into the rigid transformation RP_i^* + t gives RP_i^* + t\frac{n^{*\top}P_i^*}{d^*} = (R + \frac{tn^{*\top}}{d^*})P_i^* = HP_i^*, which is the equation above.
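The identity HP_i^* = RP_i^* + t is easy to check numerically. The sketch below assumes the sign convention in which points on the plane satisfy n^{*\top}P^* = d^* with d^* > 0, which is the convention that makes t n^{*\top}P^*/d^* equal to t. The pose, the plane and the points are all hypothetical values chosen for illustration.

```python
import numpy as np

def plane_homography(R, t, n_star, d_star):
    """H = R + t n*^T / d* for the plane n*^T P* = d* in the desired frame."""
    return R + np.outer(t, n_star) / d_star

# Hypothetical pose: 10 degree rotation about the y axis plus a small translation.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.1, -0.05, 0.2])

# Hypothetical plane facing the desired camera, 2 units away along its z axis.
n_star = np.array([0.0, 0.0, 1.0])
d_star = 2.0

H = plane_homography(R, t, n_star, d_star)

# Points on the plane (z = d_star), expressed in the desired frame, one per row.
P_star = np.array([[0.3, -0.2, 2.0],
                   [-0.5, 0.4, 2.0],
                   [0.0, 0.0, 2.0]])
via_H = P_star @ H.T          # H P* for each row
via_rigid = P_star @ R.T + t  # R P* + t for each row

# A point off the plane: here H and the rigid motion no longer agree.
P_off = np.array([0.3, -0.2, 3.0])
off_plane_gap = np.linalg.norm(H @ P_off - (R @ P_off + t))
```

The off-plane point is a reminder that H reproduces the rigid motion only for points that lie on the plane; everywhere else the term t n^{*\top}P^*/d^* is no longer equal to t.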

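However, we cannot directly apply H to the pixel coordinates, since p is in fact the projection of the normalized point m through the camera calibration matrix K, that is, p = Km. The same holds for p^*, with p^* = K^*m^*. Let P_i be the same 3D point expressed in the current frame, so that P_i = RP_i^* + t = HP_i^*, and write P_i = z_i m_i and P_i^* = z_i^* m_i^*. The relationship between p and p^* is then:

p_i = \frac{z_i^*}{z_i}KHK^{*-1}p_i^*

where z_i and z_i^* are the z coordinates of the point in the current and desired frames. Since homogeneous coordinates are defined only up to scale, the depth ratio can be absorbed into the normalization, and the equation simplifies to:

p_i \propto KHK^{*-1}p_i^*

Going the other way, from p_i to p_i^*, uses the inverse matrix K^*H^{-1}K^{-1}.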
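As a sanity check of the pixel-level relation, the sketch below projects one plane point into both cameras and compares the result with the homography. It assumes the convention HP_i^* = RP_i^* + t, where the right-hand side is the point expressed in the current frame, and a plane satisfying n^{*\top}P^* = d^*. The calibration matrices, pose and point are hypothetical numbers.

```python
import numpy as np

def project(K, P):
    """Pinhole projection of a 3D point P (camera frame) to homogeneous pixels (u, v, 1)."""
    p = K @ P
    return p / p[2]

# Hypothetical calibration matrices for the current and desired cameras.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
K_star = np.array([[750.0, 0.0, 300.0],
                   [0.0, 750.0, 250.0],
                   [0.0, 0.0, 1.0]])

# Hypothetical pose and plane (same convention as H P* = R P* + t).
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.1, -0.05, 0.2])
n_star = np.array([0.0, 0.0, 1.0])
d_star = 2.0
H = R + np.outer(t, n_star) / d_star

P_star = np.array([0.3, -0.2, 2.0])  # point on the plane, desired frame
P = R @ P_star + t                    # same point, current frame

p_star = project(K_star, P_star)
p = project(K, P)

# Pixel-level homography taking desired-image pixels to current-image pixels.
G = K @ H @ np.linalg.inv(K_star)
q = G @ p_star
scale = q[2]          # equals the depth ratio z / z*
p_mapped = q / scale  # normalize the homogeneous coordinates
```

Under these assumptions the third coordinate of `G @ p_star` is exactly the depth ratio, which is why it disappears once the homogeneous coordinates are normalized.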

Voilà, that is how we can derive the homography between moving camera frames. Please don’t hesitate to tell me if some points are not clear enough. 🙂

Thank you for reading!

References:

[1] https://en.wikipedia.org/wiki/Homography_(computer_vision)

[2] https://hal.inria.fr/inria-00174036v3/document
