The fast, robust, and accurate localization of a given 2D object template in images is the natural prerequisite for numerous computer vision and particularly machine vision applications. For example, for pick and place applications, an object recognition method must determine the location of the object that is imaged. Given its location in conjunction with the known geometry of the imaging device, a pose of the object can be calculated by methods that are well known in the art. Given this pose, a robot can grasp the object from, e.g., a conveyor belt. In the visual tracking of objects, an object recognition method must determine the location of an object in a sequence of images. For example, in image-based visual servo control, the object location can be used to control the robot motion. In traffic monitoring, an object recognition method must detect traffic participants such as cars or pedestrians in a video stream or image sequences.
Several methods have been proposed in the art to determine the position of an object in an image. Most of these methods compute a similarity measure between each of a set of possible matching model poses and the image. Positions at which this similarity measure exceeds a threshold and is a local maximum are chosen as the location of the object.
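The selection rule described above can be illustrated with a minimal sketch. It assumes that the similarity values have already been computed and stored in a 2D grid of scores over translations only; the function name `find_matches`, the 8-neighbourhood, and the example values are hypothetical choices made for illustration and are not taken from any of the cited methods.

```python
def find_matches(scores, threshold):
    """Return (row, col, score) for every entry that exceeds the threshold
    and is a local maximum within its 8-neighbourhood."""
    rows, cols = len(scores), len(scores[0])
    matches = []
    for r in range(rows):
        for c in range(cols):
            s = scores[r][c]
            if s <= threshold:
                continue
            # Collect the neighbours that lie inside the score grid.
            neighbours = [
                scores[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            # Ties count as maxima, so a plateau yields several matches.
            if all(s >= n for n in neighbours):
                matches.append((r, c, s))
    return matches


# Hypothetical similarity scores for a 4x4 grid of candidate positions.
scores = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.8],
    [0.0, 0.1, 0.4, 0.7],
]
matches = find_matches(scores, threshold=0.5)
```

In this example the entry 0.7 exceeds the threshold but is suppressed because its neighbour 0.8 is larger, which leaves two reported positions.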
Depending on the similarity measure that is used, a certain invariance against adverse imaging conditions is achieved. For instance, with normalized correlation as the similarity measure, invariance against linear gray value changes between the model image and the search image is achieved.
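The invariance of normalized correlation can be checked numerically with a short sketch. The function name `ncc` and the gray values below are assumptions made for illustration; the images are represented as flat lists of gray values of equal length.

```python
import math


def ncc(a, b):
    """Normalized correlation of two equally sized lists of gray values."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A constant image has zero variance; return 0.0 instead of dividing by zero.
    return numerator / denominator if denominator > 0 else 0.0


model = [10.0, 50.0, 90.0, 30.0, 70.0, 20.0]
# Linear gray value change: gain 2 and offset 15 (hypothetical values).
search = [2.0 * v + 15.0 for v in model]
score = ncc(model, search)
```

Subtracting the mean removes the offset and dividing by the standard deviations removes the gain, so the score stays at its maximum for any positive gain. A negative gain, i.e., a contrast reversal, flips the sign of the score.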
Many methods in the art represent the 2D object template by a matching model that consists of a plurality of model points, e.g., approaches that are based on Chamfer-Matching (Borgefors, 1988), approaches that are based on the Hausdorff Distance (Rucklidge, 1997; Kwon et al., 2001), or approaches that are based on geometric hashing (Lamdan and Schwartz, 1990). The model points may either be a sparse set of feature points such as edges or corner points, or a dense set of points that cover the 2D object template. The similarity between a possible matching model pose and the image is determined through the similarity of a subset of the matching model points and the image. Other methods use model points and directions for matching a model to an image. Directions can be represented, for example, by direction vectors or angles. Common examples of object recognition methods in the art that use directions are approaches that are based on the generalized Hough transform (Ballard, 1981; Ulrich et al., 2003) or approaches that are based on modifications of the Hausdorff Distance (Olson and Huttenlocher, 1997). Furthermore, there are approaches that use the dot product of the normalized directions of image and model features as the similarity measure, which is invariant against partial occlusion, clutter, and nonlinear contrast changes (EP 1193642, EP 2081133, EP 1394727, EP 2048599). Other approaches use angles for matching, where the model points may further include other attributes, such as an individual per-point weight (e.g., U.S. Pat. Nos. 6,850,646, 7,190,834, 7,016,539).
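A similarity measure based on the dot product of normalized directions can be sketched as follows. This is a simplified illustration and not the formulation of the cited patents: it assumes that the image direction at each transformed model point has already been extracted, and the names `normalize` and `direction_similarity` as well as all direction values are hypothetical.

```python
import math


def normalize(v):
    """Scale a 2D direction vector to unit length; a zero vector stays zero."""
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length) if length > 0 else (0.0, 0.0)


def direction_similarity(model_dirs, image_dirs):
    """Mean dot product of the normalized model and image directions."""
    total = 0.0
    for m, g in zip(model_dirs, image_dirs):
        mn, gn = normalize(m), normalize(g)
        total += mn[0] * gn[0] + mn[1] * gn[1]
    return total / len(model_dirs)


model_dirs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Same directions, but each with a different magnitude (contrast change).
rescaled_dirs = [(5.0, 0.0), (0.0, 0.2), (30.0, 30.0)]
# Second point has no image direction, e.g., because it is occluded.
occluded_dirs = [(5.0, 0.0), (0.0, 0.0), (30.0, 30.0)]

full_score = direction_similarity(model_dirs, rescaled_dirs)
occluded_score = direction_similarity(model_dirs, occluded_dirs)
```

Because every vector is normalized before the dot product is taken, rescaling the magnitude at individual points leaves the score unchanged, and a point without a direction only lowers the score by its own share.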
Typically, an exhaustive search over all pose parameters is computationally very expensive and prohibitive for most real-time applications. Most of the prior art methods overcome this speed limitation by building an image pyramid from both the model and the search image (see e.g., Tanimoto (1981) or Brown (1992)). Then, the similarity measure is evaluated for the full search range only at the highest pyramid level. At lower levels, only promising match candidates are tracked until the lowest pyramid level is reached.
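The coarse-to-fine strategy can be sketched under simplifying assumptions: translation is the only pose parameter, the sum of absolute differences stands in for the similarity measure, only the single best candidate is tracked, and the image and template dimensions are divisible by two at every level. All function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    return [
        [(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
          + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
         for c in range(len(img[0]) // 2)]
        for r in range(len(img) // 2)
    ]


def build_pyramid(img, levels):
    """Level 0 is the original image; each further level is half the size."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def sad(img, tpl, r0, c0):
    """Sum of absolute differences at offset (r0, c0); lower is better."""
    return sum(abs(img[r0 + r][c0 + c] - tpl[r][c])
               for r in range(len(tpl)) for c in range(len(tpl[0])))


def best_position(img, tpl, candidates):
    """Pick the candidate offset with the lowest SAD that fits in the image."""
    h, w = len(tpl), len(tpl[0])
    valid = [(r, c) for r, c in candidates
             if 0 <= r <= len(img) - h and 0 <= c <= len(img[0]) - w]
    return min(valid, key=lambda p: sad(img, tpl, p[0], p[1]))


def coarse_to_fine(image, template, levels=2):
    img_pyr = build_pyramid(image, levels)
    tpl_pyr = build_pyramid(template, levels)
    top_img, top_tpl = img_pyr[-1], tpl_pyr[-1]
    # Exhaustive search over the full range, only on the highest level.
    full_range = [(r, c)
                  for r in range(len(top_img) - len(top_tpl) + 1)
                  for c in range(len(top_img[0]) - len(top_tpl[0]) + 1)]
    r, c = best_position(top_img, top_tpl, full_range)
    # On each lower level, refine within one pixel of the propagated match.
    for level in range(levels - 2, -1, -1):
        local = [(2 * r + dr, 2 * c + dc)
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
        r, c = best_position(img_pyr[level], tpl_pyr[level], local)
    return r, c


# Synthetic example: a 4x4 template pasted into an empty 16x16 image.
image = [[0.0] * 16 for _ in range(16)]
template = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
for r in range(4):
    for c in range(4):
        image[6 + r][10 + c] = template[r][c]
position = coarse_to_fine(image, template, levels=2)
```

With two levels, the exhaustive search visits 49 positions on the 8x8 top level instead of 169 positions on the original image, followed by 9 refinement positions on the lowest level.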
A further way to speed up the search is to restrict the possible transformations the model may undergo in the search image. An affine transformation that maps input points $(x, y)^T$ to output points $(x', y')^T$ can be described in the geometrically meaningful parameterization

$$
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}
\begin{pmatrix} 1 & -\sin\theta \\ 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+
\begin{pmatrix} t_x \\ t_y \end{pmatrix}.
$$
The parameters describe a scaling of the original x and y axes by the different scaling factors $s_x$ and $s_y$, a skew transformation of the y axis with respect to the x axis, i.e., a rotation of the y axis by an angle θ while the x axis is kept fixed, a rotation of both axes by an angle φ, and finally a translation by a vector $(t_x, t_y)^T$. Typically, an object recognition system evaluates these parameters only for a reduced subset, e.g., only translation and rotation. Furthermore, the parameters are restricted to a certain fixed range, e.g., a reduced rotation range. This reduces the space of possible poses that an object recognition system must check on the highest pyramid level and hence speeds up the search. In some situations, the object that must be found is transformed according to a more general transformation than an affine transformation or a subset thereof. Two examples of such transformations are a perspective transformation and a non-linear deformation.
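The affine parameterization described above can be composed in code as a product of a rotation, a skew, and a scaling matrix, followed by a translation. The sketch below is for illustration only; the function names and the default values, which correspond to the identity transformation, are assumptions.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0],
         a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0],
         a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def affine_matrix(phi=0.0, theta=0.0, sx=1.0, sy=1.0):
    """Linear part of the transformation: rotation * skew * scaling."""
    rotation = [[math.cos(phi), -math.sin(phi)],
                [math.sin(phi), math.cos(phi)]]
    skew = [[1.0, -math.sin(theta)],
            [0.0, math.cos(theta)]]
    scaling = [[sx, 0.0],
               [0.0, sy]]
    return matmul2(matmul2(rotation, skew), scaling)


def transform_point(x, y, phi=0.0, theta=0.0, sx=1.0, sy=1.0, tx=0.0, ty=0.0):
    """Apply the linear part to (x, y) and then add the translation."""
    a = affine_matrix(phi, theta, sx, sy)
    return (a[0][0] * x + a[0][1] * y + tx,
            a[1][0] * x + a[1][1] * y + ty)


# Reduced subset "translation and rotation": rotate by 90 degrees, then shift.
rotated = transform_point(1.0, 0.0, phi=math.pi / 2, tx=2.0, ty=3.0)
# Skew only: the y axis is rotated by 30 degrees, the x axis stays fixed.
skewed = transform_point(0.0, 1.0, theta=math.pi / 6)
```

Restricting the search to a reduced subset of parameters corresponds to leaving the remaining arguments at their defaults, so that only the chosen parameters need to be sampled.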
The state-of-the-art methods for object recognition assume a constant model for object localization. Although some approaches allow a certain variation of the model points (see, e.g., Cootes (1995)), the model is not adapted after the object localization in the search image. This has several shortcomings when localizing an object that may change its appearance. Two common computer vision and machine vision applications in which such changes occur are visual tracking, where an object is to be localized throughout a sequence of images, and the localization of deformable objects. An adaptation of the model after the object is successfully localized improves the localization quality in future images.
In visual tracking scenarios, most objects are 3D, and their movement and rotation throughout the 3D world is generally not an affine transformation, which most of the 2D object recognition approaches assume. It is an open question how to adapt a point-based matching model to a localized object that has undergone a more general transformation than an affine transformation or a subset thereof.
The aim of the present invention is to provide a general approach to adapt a point-based matching model to ensure a robust recognition even if the object changes its appearance.