# Homography (computer vision)


Geometrical setup for homography: stereo cameras O1 and O2 both pointed at X in epipolar geometry. Drawing from Neue Constructionen der Perspective und Photogrametrie by Hermann Guido Hauck (1845–1905)

In the field of computer vision, any two images of the same planar surface in space are related by a homography (assuming a pinhole camera model). This has many practical applications, such as image rectification, image registration, or computation of camera motion (rotation and translation) between two images. Once camera rotation and translation have been extracted from an estimated homography matrix, this information may be used for navigation, or to insert models of 3D objects into an image or video so that they are rendered with the correct perspective and appear to have been part of the original scene (see Augmented reality).

## 3D plane to plane equation

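Consider two cameras a and b looking at points ${\displaystyle P_{i}}$ that lie on a plane. The projection ${\displaystyle {}^{b}p_{i}}$ of ${\displaystyle P_{i}}$ in camera b is mapped to the corresponding projection ${\displaystyle {}^{a}p_{i}}$ in camera a (up to a nonzero scale factor, since the points are expressed in homogeneous coordinates) by: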

${\displaystyle {}^{a}p_{i}=K_{a}\cdot H_{ba}\cdot K_{b}^{-1}\cdot {}^{b}p_{i}}$

where the homography matrix ${\displaystyle H_{ba}}$ is

${\displaystyle H_{ba}=R-{\frac {tn^{T}}{d}}.}$

${\displaystyle R}$ is the rotation matrix by which b is rotated in relation to a; ${\displaystyle t}$ is the translation vector from a to b; ${\displaystyle n}$ and ${\displaystyle d}$ are the unit normal vector of the plane and the distance from camera b to the plane, respectively. ${\displaystyle K_{a}}$ and ${\displaystyle K_{b}}$ are the cameras' intrinsic parameter matrices.
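The mapping above can be sketched numerically. The following is a minimal plain-Python sketch (lists of rows instead of a matrix library, to stay dependency-free). The helper names, the shared pinhole intrinsics with square pixels and no skew, and all numeric values are hypothetical choices for illustration. It builds ${\displaystyle H_{ba}}$ from ${\displaystyle R}$, ${\displaystyle t}$, ${\displaystyle n}$ and ${\displaystyle d}$, maps a pixel of camera b into camera a, and divides out the homogeneous scale at the end.

```python
# Minimal sketch (hypothetical helper names): map a pixel from camera b to camera a
# through the plane-induced homography H_ba = R - t n^T / d, using a_p ~ K_a H_ba K_b^-1 b_p.


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (flat list)."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def plane_homography(R, t, n, d):
    """H = R - t n^T / d for a 3x3 rotation R, 3-vectors t and n, and distance d."""
    return [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]


def intrinsics(f, cx, cy):
    """Assumed pinhole intrinsics: focal length f in pixels, square pixels, no skew."""
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]


def intrinsics_inverse(f, cx, cy):
    """Closed-form inverse of intrinsics(f, cx, cy)."""
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]


def map_pixel_b_to_a(K_a, H_ba, K_b_inv, pixel_b):
    """Compute K_a H_ba K_b^-1 (u, v, 1) and divide by the third coordinate."""
    x, y, w = matvec(matmul(matmul(K_a, H_ba), K_b_inv), [pixel_b[0], pixel_b[1], 1.0])
    return x / w, y / w


if __name__ == "__main__":
    # Hypothetical setup: no rotation, t = (0.1, 0, 0), and a fronto-parallel plane
    # 2 units in front of camera b (n^T P + d = 0 with n = (0, 0, -1), d = 2).
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    H = plane_homography(R, [0.1, 0.0, 0.0], [0.0, 0.0, -1.0], 2.0)
    K = intrinsics(500.0, 320.0, 240.0)
    K_inv = intrinsics_inverse(500.0, 320.0, 240.0)
    print(map_pixel_b_to_a(K, H, K_inv, (320.0, 240.0)))
```

With these example values the principal point of camera b should land at approximately (345, 240) in camera a: the translation of 0.1 at depth 2 shifts the normalized image coordinate by 0.05, which is 25 pixels at a focal length of 500 pixels.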

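The figure shows camera b looking at the plane at distance d. The plane is modeled as ${\displaystyle n^{T}P_{i}+d=0}$, with ${\displaystyle P_{i}}$ expressed in the coordinate frame of camera b. Since ${\displaystyle n^{T}P_{i}}$ is the projection of ${\displaystyle P_{i}}$ onto the unit normal ${\displaystyle n}$, it equals ${\displaystyle -d}$, so ${\displaystyle -{\frac {n^{T}P_{i}}{d}}=1}$ and therefore ${\displaystyle t=t\left(-{\frac {n^{T}P_{i}}{d}}\right)}$. Substituting this into the rigid transformation of ${\displaystyle P_{i}}$ from camera b to camera a gives ${\displaystyle RP_{i}+t=RP_{i}-{\frac {tn^{T}}{d}}P_{i}=\left(R-{\frac {tn^{T}}{d}}\right)P_{i}=H_{ba}P_{i}.}$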
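This identity can be checked numerically. The plain-Python sketch below evaluates both sides of ${\displaystyle H_{ba}P_{i}=RP_{i}+t}$. The helper names are hypothetical, and the rotation, translation, unit normal, distance and sample points are arbitrary illustrative values. The equality should hold for a point on the plane ${\displaystyle n^{T}P_{i}+d=0}$ and fail for a point off it.

```python
import math

# Minimal sketch (hypothetical names): H_ba P = R P + t holds only when n^T P + d = 0.


def rotation_y(theta):
    """Rotation by theta radians about the y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matvec(A, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def plane_homography(R, t, n, d):
    """H = R - t n^T / d."""
    return [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]


def homography_matches_rigid_motion(R, t, n, d, P, tol=1e-9):
    """Compare H_ba P with R P + t component by component."""
    lhs = matvec(plane_homography(R, t, n, d), P)
    rhs = [rp + ti for rp, ti in zip(matvec(R, P), t)]
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))


if __name__ == "__main__":
    R = rotation_y(0.1)
    t = [0.2, -0.1, 0.05]
    n, d = [0.6, 0.0, -0.8], 2.0  # unit normal: 0.36 + 0.64 = 1
    on_plane = [1.0, 0.5, 3.25]   # n^T P = 0.6 - 2.6 = -2 = -d
    off_plane = [1.0, 0.5, 4.0]   # n^T P = 0.6 - 3.2 = -2.6, not -d
    print(homography_matches_rigid_motion(R, t, n, d, on_plane))   # True
    print(homography_matches_rigid_motion(R, t, n, d, off_plane))  # False
```

For the off-plane point the two sides differ by ${\displaystyle -t\,(n^{T}P_{i}+d)/d}$, which is exactly the term that vanishes on the plane.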

This formula assumes that camera b defines the world coordinate frame, i.e. that camera b has no rotation and no translation. In the general case, where ${\displaystyle R_{a},R_{b}}$ and ${\displaystyle t_{a},t_{b}}$ are the respective rotations and translations of cameras a and b, ${\displaystyle R=R_{a}R_{b}^{T}}$ and the homography matrix ${\displaystyle H_{ba}}$ becomes

${\displaystyle H_{ba}=R_{a}R_{b}^{T}-R_{a}{\frac {(t_{b}-t_{a})n^{T}}{d}}R_{b}^{T}=R_{a}\left(I-{\frac {(t_{b}-t_{a})n^{T}}{d}}\right)R_{b}^{T}.}$

where d is the distance from camera b to the plane.
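As a consistency check on the general formula, the plain-Python sketch below evaluates both the expanded and the factored form of ${\displaystyle H_{ba}}$ and confirms that they agree. It also shows that when ${\displaystyle t_{a}=t_{b}}$ the translation term vanishes and ${\displaystyle H_{ba}}$ reduces to ${\displaystyle R=R_{a}R_{b}^{T}}$. The helper names and the example poses, normal and distance are hypothetical values chosen for illustration.

```python
import math

# Minimal sketch (hypothetical names): both sides of the general H_ba formula agree.


def matmul(A, B):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def translation_term(t_a, t_b, n, d):
    """(t_b - t_a) n^T / d as a 3x3 matrix."""
    return [[(t_b[i] - t_a[i]) * n[j] / d for j in range(3)] for i in range(3)]


def homography_expanded(R_a, R_b, t_a, t_b, n, d):
    """R_a R_b^T - R_a [(t_b - t_a) n^T / d] R_b^T."""
    rot = matmul(R_a, transpose(R_b))
    trans = matmul(matmul(R_a, translation_term(t_a, t_b, n, d)), transpose(R_b))
    return [[rot[i][j] - trans[i][j] for j in range(3)] for i in range(3)]


def homography_factored(R_a, R_b, t_a, t_b, n, d):
    """R_a (I - (t_b - t_a) n^T / d) R_b^T."""
    X = translation_term(t_a, t_b, n, d)
    middle = [[(1.0 if i == j else 0.0) - X[i][j] for j in range(3)] for i in range(3)]
    return matmul(matmul(R_a, middle), transpose(R_b))


def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


if __name__ == "__main__":
    R_a, R_b = rotation_z(0.3), rotation_x(-0.2)
    t_a, t_b = [0.1, 0.0, -0.2], [0.4, 0.1, 0.0]
    n, d = [0.0, 0.6, -0.8], 1.5
    H1 = homography_expanded(R_a, R_b, t_a, t_b, n, d)
    H2 = homography_factored(R_a, R_b, t_a, t_b, n, d)
    print(max_abs_diff(H1, H2))  # on the order of floating-point round-off
```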

## Mathematical definition

In higher dimensions, homogeneous coordinates are used to represent projective transformations as matrix multiplications. In Cartesian coordinates, matrix multiplication cannot perform the division required for perspective projection; in other words, a perspective projection is a non-linear transformation of Cartesian coordinates.

Given:

${\displaystyle p_{a}={\begin{bmatrix}x_{a}\\y_{a}\\1\end{bmatrix}},p_{b}^{\prime }={\begin{bmatrix}w^{\prime }x_{b}\\w^{\prime }y_{b}\\w^{\prime }\end{bmatrix}},\mathbf {H} _{ab}={\begin{bmatrix}h_{11}&h_{12}&h_{13}\\h_{21}&h_{22}&h_{23}\\h_{31}&h_{32}&h_{33}\end{bmatrix}}}$

Then:

${\displaystyle p_{b}^{\prime }=\mathbf {H} _{ab}p_{a},}$ and the inverse mapping, from image b back to image a, is given by ${\displaystyle {\mathbf {H} }_{ba}={\mathbf {H} }_{ab}^{-1}.}$

Also:

${\displaystyle p_{b}=p_{b}^{\prime }/w^{\prime }={\begin{bmatrix}x_{b}\\y_{b}\\1\end{bmatrix}}}$
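In code, applying ${\displaystyle \mathbf {H} _{ab}}$ to a Cartesian point means lifting it to homogeneous coordinates, multiplying, and dividing by ${\displaystyle w^{\prime }}$. A minimal plain-Python sketch follows; the function names and the example matrix entries are hypothetical. It also uses ${\displaystyle \mathbf {H} _{ba}=\mathbf {H} _{ab}^{-1}}$ to map the point back, and shows that scaling the whole matrix by a nonzero constant leaves the result unchanged, since the constant cancels in the division by ${\displaystyle w^{\prime }}$.

```python
# Minimal sketch (hypothetical names): apply a 3x3 homography to Cartesian points.


def apply_homography(H, x, y):
    """Lift (x, y) to (x, y, 1), multiply by H, then divide by w' to return to Cartesian."""
    p = (x, y, 1.0)
    wx, wy, w = (sum(H[r][k] * p[k] for k in range(3)) for r in range(3))
    if w == 0:
        raise ValueError("point is mapped to infinity (w' = 0)")
    return wx / w, wy / w


def inverse3(M):
    """Inverse of a 3x3 matrix via the adjugate; used here to get H_ba = H_ab^-1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[entry / det for entry in row] for row in adj]


if __name__ == "__main__":
    H_ab = [[1.0, 0.2, 5.0], [0.1, 0.9, -3.0], [0.001, 0.002, 1.0]]
    x_b, y_b = apply_homography(H_ab, 10.0, 20.0)   # w' = 1.05 for this point
    print(x_b, y_b)
    print(apply_homography(inverse3(H_ab), x_b, y_b))  # approximately (10.0, 20.0)
    scaled = [[3.0 * v for v in row] for row in H_ab]
    print(apply_homography(scaled, 10.0, 20.0))        # same point, up to round-off
```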

## Affine homography

When the image region in which the homography is computed is small, or the image has been acquired with a large focal length, an affine homography is a more appropriate model of image displacements. An affine homography is a special case of the general homography whose last row is fixed to

${\displaystyle h_{31}=h_{32}=0,\;h_{33}=1.}$
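With that last row, ${\displaystyle w^{\prime }=1}$ for every point, so no perspective division is needed and the mapping reduces to a linear map plus a translation. One consequence, illustrated in the plain-Python sketch below, is that an affine homography maps the midpoint of a segment to the midpoint of the mapped segment, which a general homography does not do in general. The function names and matrix entries are hypothetical example values.

```python
# Minimal sketch (hypothetical names): affine vs. general homography on a segment midpoint.


def apply_homography(H, x, y):
    """General case: multiply (x, y, 1) by H and divide by w'."""
    p = (x, y, 1.0)
    wx, wy, w = (sum(H[r][k] * p[k] for k in range(3)) for r in range(3))
    return wx / w, wy / w


def is_affine(H):
    """True when h31 = h32 = 0 and h33 = 1."""
    return H[2][0] == 0 and H[2][1] == 0 and H[2][2] == 1


def preserves_midpoint(H, p, q, tol=1e-9):
    """Check whether H maps the midpoint of segment pq to the midpoint of its image."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    hp, hq, hm = apply_homography(H, *p), apply_homography(H, *q), apply_homography(H, *mid)
    return abs(hm[0] - (hp[0] + hq[0]) / 2) < tol and abs(hm[1] - (hp[1] + hq[1]) / 2) < tol


if __name__ == "__main__":
    affine = [[1.0, 0.2, 5.0], [0.1, 0.9, -3.0], [0.0, 0.0, 1.0]]
    general = [[1.0, 0.2, 5.0], [0.1, 0.9, -3.0], [0.001, 0.0, 1.0]]
    p, q = (0.0, 0.0), (100.0, 0.0)
    print(is_affine(affine), preserves_midpoint(affine, p, q))    # True True
    print(is_affine(general), preserves_midpoint(general, p, q))  # False False
```

In the general example, the nonzero ${\displaystyle h_{31}}$ makes ${\displaystyle w^{\prime }}$ vary along the segment, which is what breaks the midpoint property.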