1. Technical Field
The present invention relates to a method for registering at least one part of a first and second image using a collineation warping function. Moreover, the present invention relates to a computer program product comprising software code sections for implementing the method.
2. Background Information
Such a method is often required in computer vision applications such as augmented reality (AR) applications. For example, given a first image T as a template image, many computer vision applications need to spatially register at least part of a second image I, which is a current camera image, to it. Examples include panorama stitching and camera pose estimation, for example for a video-see-through augmented reality application. Assuming a perfect pinhole camera without any lens distortion, the transformation of any part of the template corresponding to a planar surface to its corresponding position in a current camera image is composed of a 3D translation followed by a 3D rotation and finally a perspective projection onto the image plane. This transformation can be fully described by a collineation W representing a perspective transformation between the 2D points on the template and the corresponding 2D points in the camera image.
The collineation is often represented by an invertible (3×3) matrix. The matrix is defined up to scale, has eight degrees of freedom, and can be written as:
\[
W = \begin{bmatrix} p1 & p2 & p3 \\ p4 & p5 & p6 \\ p7 & p8 & 1 \end{bmatrix}
\]
The collineation defines a one-to-one and onto warping. The collineation warping function transforms a point x=[u,v] from a first image into a point x′=[u′,v′] in a second image as follows:
\[
u' = \frac{p1\,u + p2\,v + p3}{p7\,u + p8\,v + 1}, \qquad
v' = \frac{p4\,u + p5\,v + p6}{p7\,u + p8\,v + 1}
\]
Such warping preserves collinearity, concurrency, order of contact and cross ratio. There are two divisions per warped point, which makes the warping computationally expensive.
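The warping function above can be illustrated with a minimal sketch. The function name `warp_collineation` and the parameter values are hypothetical and chosen only for illustration; the parameters are passed as the list [p1, ..., p8] in the order used in the matrix W.

```python
from typing import Sequence, Tuple


def warp_collineation(p: Sequence[float], u: float, v: float) -> Tuple[float, float]:
    """Warp the point (u, v) with collineation parameters p = [p1, ..., p8]."""
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    w = p7 * u + p8 * v + 1.0  # denominator shared by both coordinates
    if w == 0.0:
        raise ZeroDivisionError("point is mapped to infinity")
    # Two divisions per warped point, as noted in the text.
    return (p1 * u + p2 * v + p3) / w, (p4 * u + p5 * v + p6) / w


# Hypothetical parameters: a translation by (2, 3) plus a perspective term p7.
params = [1.0, 0.0, 2.0, 0.0, 1.0, 3.0, 0.5, 0.0]
warped = warp_collineation(params, 2.0, 4.0)
print(warped)  # (2.0, 3.5)
```

For the point [2, 4] the denominator is 0.5·2 + 1 = 2, so the warped point is [(2 + 2)/2, (4 + 3)/2] = [2, 3.5].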
The collineation warping is called affine warping when the entries p7 and p8 of the corresponding matrix are equal to zero. The collineation can then be represented with a matrix
\[
A = \begin{bmatrix} p1 & p2 & p3 \\ p4 & p5 & p6 \\ 0 & 0 & 1 \end{bmatrix}
\]
Therefore, the affine warping function transforms a point [u,v] from the first image into a point [u′,v′] in a second image as follows:
\[
u' = p1\,u + p2\,v + p3, \qquad
v' = p4\,u + p5\,v + p6
\]
Note that in this case the number of operations for an affine warping is lower than for a standard collineation warping function. In particular, since there is no division in the affine warping function, it is much faster on devices with limited computational power.
Additionally, and among others, the affine warping preserves parallelism and ratio of distances.
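As a minimal sketch of the affine case, the following example warps points without any division and checks one consequence of the preserved ratio of distances: the midpoint of a segment is mapped to the midpoint of the warped segment. The function name `warp_affine` and the parameter values are hypothetical.

```python
from typing import Sequence, Tuple


def warp_affine(p: Sequence[float], u: float, v: float) -> Tuple[float, float]:
    """Warp the point (u, v) with affine parameters p = [p1, ..., p6]."""
    p1, p2, p3, p4, p5, p6 = p
    # Only multiplications and additions; no division is needed.
    return p1 * u + p2 * v + p3, p4 * u + p5 * v + p6


# Hypothetical affine parameters.
affine_params = [2.0, 1.0, 3.0, -1.0, 2.0, 0.5]

start = warp_affine(affine_params, 0.0, 0.0)
end = warp_affine(affine_params, 4.0, 2.0)
middle = warp_affine(affine_params, 2.0, 1.0)  # midpoint of the original segment

# The warped midpoint equals the midpoint of the warped end points.
expected_middle = ((start[0] + end[0]) / 2.0, (start[1] + end[1]) / 2.0)
print(middle, expected_middle)  # (8.0, 0.5) (8.0, 0.5)
```

A general collineation with non-zero p7 or p8 would not pass this midpoint check in general.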
When the first and the second image are acquired such that the image plane is parallel to a certain planar surface, then the collineation warping corresponding to that planar surface is an affine warping.
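This statement can be checked numerically with a short sketch. It relies on assumptions that are not part of the text above: both images share the same pinhole intrinsics K (focal length f, principal point (cx, cy)), the planar surface has normal n = [0, 0, 1] at distance d in the first camera frame, the second camera is related to the first by X2 = R X1 + t with R a rotation about the optical axis only, and the plane-induced collineation is formed as K (R + t nᵀ/d) K⁻¹. All names and values are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def parallel_plane_collineation(f: float, cx: float, cy: float, angle: float,
                                t: Sequence[float], d: float) -> Matrix:
    """Collineation K (R + t n^T / d) K^-1 for n = [0, 0, 1] (assumed model)."""
    c, s = math.cos(angle), math.sin(angle)
    k = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    k_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    r = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]  # rotation about optical axis
    # With n = [0, 0, 1], the term t n^T / d only changes the third column.
    m = [[r[i][j] + (t[i] / d if j == 2 else 0.0) for j in range(3)]
         for i in range(3)]
    h = matmul(matmul(k, m), k_inv)
    scale = h[2][2]  # normalize so that the lower-right entry is 1
    return [[value / scale for value in row] for row in h]


W = parallel_plane_collineation(f=500.0, cx=320.0, cy=240.0,
                                angle=math.radians(30.0),
                                t=(0.1, -0.2, 0.5), d=2.0)
print(W[2])  # last row: p7 = p8 = 0, so the warping is affine
```

Under these assumptions the last row of the normalized matrix is [0, 0, 1], so p7 and p8 vanish for any in-plane rotation angle and any translation that keeps the image plane parallel to the surface.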
In the art, the following references have been published in this field:
[1] C. Steger. Occlusion, clutter, and illumination invariant object recognition. Int. Arc. Photo. Remote Sensing, XXXIV, part 3A:345-350, 2002.
[2] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981.
[3] Wonwoo Lee et al. Point-and-Shoot for Ubiquitous Tagging on Mobile Phones. Proc. International Symposium on Mixed and Augmented Reality 2010.
[4] Myung Hwangbo, Jun-Sik Kim, and Takeo Kanade. Inertial-Aided KLT Feature Tracking for a Moving Camera. The 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[5] Lobo, J. and Dias, J. Vision and Inertial Sensor Cooperation Using Gravity as a Vertical Reference. IEEE Transactions on Pattern Analysis and Machine Intelligence. Volume: 25 Issue: 12, pages 1597-1608.
[6] Simon Baker and Iain Matthews. Equivalence and Efficiency of Image Alignment Algorithms. Proceedings of the 2001 IEEE Conference on Computer Vision and Pattern Recognition.
[7] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab. Learning Real-Time Perspective Patch Rectification. International Journal of Computer Vision, 2010.
A standard approach for finding a transformation as mentioned above is displayed in FIG. 1.
In a first step S1, one or multiple template or reference images (first image) are captured by a camera or loaded from a source. Then a current image (second image) is either captured or loaded (step S2). In the next step S3 the actual estimation takes place. A collineation warping function has to be found that registers at least part of the current image with the corresponding position in the template image. Among other techniques, this can be done in an iterative minimization process in which a first set of pixels in the template image is compared with a computed set of pixels in the current camera image, and the computed set of pixels in the camera image used for the comparison varies at each iteration, see for example [2].
Mathematically, all approaches for registering at least part of a camera image with at least part of the template image carry out an eight-dimensional non-linear optimization. The goal is to find a vector of warping parameters that results in an extremum of a similarity measure between the template and the warped current image over all pixels corresponding to the template image. This is usually a computationally very expensive task.
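One possible similarity measure is the sum of squared differences (SSD), whose extremum is a minimum. The following sketch evaluates it for a given parameter vector [p1, ..., p8]; it is not the method of any cited reference. The choice of SSD, the nearest-neighbor sampling, the skipping of pixels that fall outside the current image, and all names and values are simplifying assumptions made for illustration.

```python
from typing import List, Sequence

Image = List[List[float]]  # row-major grayscale image, indexed image[v][u]


def ssd(template: Image, current: Image, p: Sequence[float]) -> float:
    """SSD between the template and the current image sampled through warp p."""
    height, width = len(current), len(current[0])
    total = 0.0
    for v, row in enumerate(template):
        for u, template_value in enumerate(row):
            w = p[6] * u + p[7] * v + 1.0
            # Warp the template pixel into the current image (nearest neighbor).
            x = round((p[0] * u + p[1] * v + p[2]) / w)
            y = round((p[3] * u + p[4] * v + p[5]) / w)
            if 0 <= x < width and 0 <= y < height:  # simplification: skip outside
                total += (current[y][x] - template_value) ** 2
    return total


template_image = [[10.0, 20.0],
                  [30.0, 40.0]]
current_image = [[0.0, 0.0, 0.0, 0.0],
                 [0.0, 10.0, 20.0, 0.0],
                 [0.0, 30.0, 40.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0]]

identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shifted = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # translation by (1, 1)

print(ssd(template_image, current_image, identity))  # 2300.0
print(ssd(template_image, current_image, shifted))   # 0.0
```

An optimizer would vary all eight parameters to drive this value toward its minimum, which requires warping every template pixel at every evaluation.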
Finally in step S4 the found collineation warping function W is used in an application.
The standard approaches have the following limitations:
The eight-dimensional non-linear optimization needed to find a collineation warping function that registers at least part of the current image with a template image is expensive and makes applications relying on it particularly challenging on mobile devices with limited processing power. In iterative approaches, such as in Lucas-Kanade [2], expensive nonlinear warping of pixels in the current image has to be computed in every iteration in order to compute the similarity with the template.
Besides computational complexity, the memory consumption can be tremendous in approaches that transform the template in many possible ways in an offline step. The number of pre-computed transformations and therefore memory consumption increases (exponentially) with the degrees of freedom (DoF). For an arbitrary rigid transformation there are 6 DoF (3D translation and 3D rotation). For each current image such approaches try to find the pre-computed transformation closest to the current one, see e.g. [1]. The enormous amount of pre-computed data needed makes them not feasible on memory limited devices such as mobile phones.
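The growth described above can be made concrete with a small arithmetic sketch. It assumes, purely for illustration, that each degree of freedom is sampled at the same number of discrete values and that each pre-computed template is stored as a fixed-size patch; the sampling density and patch size are hypothetical and not taken from any cited reference.

```python
def num_precomputed_templates(samples_per_dof: int, dof: int) -> int:
    """Number of transformations when each DoF is sampled independently."""
    return samples_per_dof ** dof


def storage_bytes(samples_per_dof: int, dof: int, patch_bytes: int) -> int:
    """Memory needed to store one patch per pre-computed transformation."""
    return num_precomputed_templates(samples_per_dof, dof) * patch_bytes


# Hypothetical setting: 10 samples per DoF, 32x32 patches with 1 byte per pixel.
PATCH_BYTES = 32 * 32
for dof in (2, 4, 6):
    count = num_precomputed_templates(10, dof)
    print(dof, count, storage_bytes(10, dof, PATCH_BYTES))
```

With these assumed numbers, 6 DoF already require 10^6 stored patches, roughly 1 GB, compared with 100 patches for 2 DoF.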
Solutions have already been proposed in which weak perspective projection approximates perspective projection by a scaled orthographic projection (i.e., a linear transformation). While this approximation allows for linear warping, which is in general faster to compute than non-linear warping, it can only be used for image registration if the template is located close to the optical axis and far away from the camera in the current image.
Affine transformations only support translation, in-plane rotation and scale. This again results in fast linear warping, but since the plane on which all points lie must always be parallel to the image plane, the range of applications is very limited.
For example, the authors in [3] use the orientation of a mobile capturing device measured by accelerometers to rectify images they use as a template image. During alignment of current camera images with the template image, however, they do not consider the orientation of the device at all.
The authors in [4] use a gyroscope attached to the camera to predict the position and orientation of features they track from a current image to the next current image in a KLT tracker. This is particularly useful for fast camera movements.
The authors in [5] use inertial sensors attached to a stereo camera to determine which features lie on the ground plane and which do not. Also they are able to detect vertical features originating from the ground plane such as the corners of a room.
The approaches described above that combine inertial sensors with computer vision do not aim at registering at least part of a camera image with a template image, while affine transformations and weak perspective projections deliver only approximations to the problem and work only in very specific cases.
It would therefore be desirable to have a technique for obtaining the collineation warping function needed to register at least part of a current camera image with at least part of a template image for arbitrary camera positions and orientations, one that delivers results similar to those of standard approaches at lower computational cost.