1. Field of the Invention
The present invention relates to a method and a device for object recognition in at least one digital image.
2. Description of the Related Art
Generally, the aim of a shape recognition method is to recognize an object, or a type of object, that has been photographed when the relative position of the object and of the real or simulated photographing device is unknown, or when the object may have been distorted. The object itself can be graphic and non-physical (for example a digital logo or the result of a simulation). For simplicity, the photographing device (or device for simulating photographing) will hereinafter be called the “camera”, but the invention relates to any acquisition of images, and to any distortions or geometric deformations of the view of an object caused by a change in the position of the camera relative to the object, or by the particular characteristics of the image acquisition or simulation device. Moreover, the objects photographed or simulated need not be identical; it is sufficient that they be similar, a common situation for objects resulting from industrial or graphical production. One or more images of the object to be recognized are available: these are the “query” images. The image or images in which the object is sought do not necessarily contain it. The purpose is to find reliable signs indicating whether the object is present in the images analysed, and to give its position in the image.
The first simplification made by all methods dealing with the recognition problem is to assume that the object has a sufficiently regular relief for the local deformations in the target images to be interpreted as planar affine deformations of the query image. Most physical objects of interest are in fact volumes whose surfaces consist of plane or slightly curved faces. Exceptions are rare; examples include a tree without leaves, whose appearance can change dramatically when the viewing angle changes, or the ripples on a liquid. Now, any regular deformation, in the mathematical sense of a differentiable one, is locally close to an affine deformation. This is the case in particular for the apparent deformation of the optical image of a fairly regular object, whether this apparent deformation is caused by the movement of the camera, by the optical distortions of the camera, by the movement of the object, or even by a gradual deformation of the object itself. For example, in the case of a flat object, the deformation of its image caused by a change in the position of the camera observing it is a plane homography, which at every point is tangent to an affine map. If, moreover, the camera is far from the object observed, this deformation of the image approaches a global affine transformation ever more closely. Conversely, any affine transform of the image plane with positive determinant can be interpreted as a deformation of the image due to the movement in space of a camera observing the image from far away (virtually at infinity). It should be recalled that an affine deformation of the (x,y) coordinate plane is written in the form

$$x' = ax + by + e, \qquad y' = cx + dy + f,$$

where the parameters a, b, c, d form a matrix with two rows and two columns, which we shall designate A:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

The affine deformation of an image u(x,y) is therefore the image

$$(x, y) \mapsto u(x', y') = u\big(A(x,y) + (e,f)\big).$$
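The claim that a plane homography is tangent at every point to an affine map can be checked numerically. The following is a minimal sketch, assuming the homography is given as a 3x3 row-major matrix of nested lists; the function names and the sample matrix are hypothetical illustrations, not part of the method described here. It derives the parameters a, b, c, d, e, f of the tangent affine map at a point from the Jacobian of the homography and compares the two maps nearby.

```python
# Hypothetical sketch: local affine approximation of a plane homography.
# H is a 3x3 homography given as nested lists (row-major); this representation
# and all function names are illustrative assumptions.


def apply_homography(H, x, y):
    """Map the point (x, y) through the homography H."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return xp, yp


def tangent_affine(H, x0, y0):
    """Return (a, b, c, d, e, f) of the affine map tangent to H at (x0, y0)."""
    w = H[2][0] * x0 + H[2][1] * y0 + H[2][2]
    xp, yp = apply_homography(H, x0, y0)
    # Jacobian of (x', y') by the quotient rule, e.g. dx'/dx = (H00 - x' * H20) / w.
    a = (H[0][0] - xp * H[2][0]) / w
    b = (H[0][1] - xp * H[2][1]) / w
    c = (H[1][0] - yp * H[2][0]) / w
    d = (H[1][1] - yp * H[2][1]) / w
    # Translation chosen so that the affine map agrees with H at (x0, y0).
    e = xp - (a * x0 + b * y0)
    f = yp - (c * x0 + d * y0)
    return a, b, c, d, e, f


def apply_affine(params, x, y):
    """Apply x' = ax + by + e, y' = cx + dy + f."""
    a, b, c, d, e, f = params
    return a * x + b * y + e, c * x + d * y + f


if __name__ == "__main__":
    # Illustrative homography with a mild perspective component (last row).
    H = [[1.0, 0.2, 5.0],
         [0.1, 0.9, -3.0],
         [0.001, 0.002, 1.0]]
    x0, y0 = 10.0, 20.0
    affine = tangent_affine(H, x0, y0)
    for h in (1.0, 0.1, 0.01):
        exact = apply_homography(H, x0 + h, y0 + h)
        approx = apply_affine(affine, x0 + h, y0 + h)
        err = max(abs(exact[0] - approx[0]), abs(exact[1] - approx[1]))
        print(f"offset {h}: approximation error {err:.3e}")
```

The approximation error shrinks roughly quadratically with the distance to the tangent point, which is what "locally close to an affine deformation" means in practice; when the last row of H is (0, 0, 1), the homography is itself affine and the tangent map coincides with it everywhere.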
For the reasons given above, the problem of shape recognition can be reduced to finding local image characteristics that are invariant under affine transformations. Such characteristics are then robust to the apparent local deformations caused by the relative movements of the object and of the camera, to the distortions caused by the acquisition device (for example the optical distortion of a lens), and finally to the distortions due to deformations of the object itself.
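To make "invariant under affine transformations" concrete, here is a toy sketch; it is not the characteristic used by the invention, and all names and sample values are hypothetical. The ratio of the areas of two triangles is preserved by any invertible affine map (both areas are multiplied by the determinant ad - bc), whereas the ratio of two side lengths, which a similarity preserves, is in general changed by an affine map.

```python
# Toy illustration (not the invention's characteristics): an affine-invariant
# quantity versus a quantity that is only similarity-invariant.
import math


def affine_map(params, pt):
    """Apply x' = ax + by + e, y' = cx + dy + f to a point."""
    a, b, c, d, e, f = params
    x, y = pt
    return (a * x + b * y + e, c * x + d * y + f)


def signed_area(p, q, r):
    """Signed area of the triangle pqr."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def area_ratio(t1, t2):
    """Ratio of the areas of two triangles: invariant under any invertible affine map."""
    return signed_area(*t1) / signed_area(*t2)


def side_ratio(tri):
    """Ratio of two side lengths: invariant under similarities, not under affine maps."""
    p, q, r = tri
    return math.dist(p, q) / math.dist(p, r)


if __name__ == "__main__":
    params = (1.5, 0.4, -0.3, 0.8, 7.0, -2.0)  # arbitrary map, determinant 1.32
    t1 = ((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
    t2 = ((1.0, 1.0), (2.0, 3.0), (5.0, 2.0))
    m1 = tuple(affine_map(params, p) for p in t1)
    m2 = tuple(affine_map(params, p) for p in t2)
    print("area ratio:", area_ratio(t1, t2), "->", area_ratio(m1, m2))
    print("side ratio:", side_ratio(t1), "->", side_ratio(m1))
```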
Hereinafter, the terms “tilt” and “digital” are used in the sense commonly understood by a person skilled in the art. The abbreviations SIF and SIFT, also known to a person skilled in the art, stand respectively for “scale invariant feature” and “scale invariant feature transform”.
Document U.S. Pat. No. 6,711,293 (Lowe) describes a method, called SIFT (“scale invariant feature transform”), for recognizing objects in an image taken from the front by a camera. That document considers that exploring the entire affine space would be prohibitive and inefficient. Lowe finally comments that the lack of full affine invariance of his SIFT method could be compensated for by taking real views of 3D objects spaced 30 degrees apart.
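A 30-degree spacing of views can be related to the notion of tilt. The sketch below assumes an orthographic ("camera at infinity") model, in which a planar surface seen at latitude angle theta from its normal appears compressed by cos(theta) in one direction, so that the corresponding tilt is t = 1/cos(theta); this relation is a standard geometric assumption added here for illustration and is not taken from the Lowe document. The function name is hypothetical.

```python
import math


def tilt_from_latitude(theta_deg):
    """Tilt t = 1 / cos(theta) for a distant camera viewing a plane at latitude theta.

    Assumes an orthographic ("camera at infinity") model: the image of the plane is
    compressed by cos(theta) in one direction and unchanged in the other.
    """
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("latitude must lie in [0, 90) degrees")
    return 1.0 / math.cos(math.radians(theta_deg))


if __name__ == "__main__":
    # Views spaced 30 degrees apart, starting from the frontal view.
    for theta in (0.0, 30.0, 60.0):
        print(f"latitude {theta:4.1f} deg -> tilt {tilt_from_latitude(theta):.3f}")
```

Under this model, views at 0, 30 and 60 degrees correspond to tilts of 1, 2/sqrt(3) (about 1.155) and 2.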
The document “Cloth Motion Capture” by D. Pritchard and W. Heidrich, Eurographics 2003, Volume 22, Number 3, describes a method for determining SIFT characteristics in which four simulated images with a tilt equal to two are produced from an initial image taken from the front. The first simulated image is obtained with the tilt applied along the horizontal axis, the second along the vertical axis, and the third and fourth along the two 45-degree axes. This method therefore provides four simulated images in order to improve recognition.
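The four tilt-2 deformations described above can be written as 2x2 linear maps. The sketch below is a hypothetical illustration, not the implementation of Pritchard and Heidrich: it assumes the convention that a tilt t along the axis at angle phi is the matrix R(phi) diag(1/t, 1) R(-phi), which shrinks that axis by a factor t and leaves the perpendicular direction unchanged. All names are assumptions. A full simulation would also resample the image with these matrices (usually low-pass filtering before compression to avoid aliasing); that step is omitted here.

```python
import math


def rotation(phi_deg):
    """2x2 rotation matrix by phi_deg degrees."""
    phi = math.radians(phi_deg)
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def directional_tilt(t, phi_deg):
    """Matrix compressing the plane by a factor t along the axis at angle phi_deg.

    Convention (an assumption): R(phi) @ diag(1/t, 1) @ R(-phi), so the axis
    direction is shrunk by t and the perpendicular direction is unchanged.
    """
    squeeze = [[1.0 / t, 0.0], [0.0, 1.0]]
    return matmul(matmul(rotation(phi_deg), squeeze), rotation(-phi_deg))


def apply(m, v):
    """Apply a 2x2 matrix to a vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


# Tilt 2 along the horizontal axis, the vertical axis and the two 45-degree axes.
TILT = 2.0
AXES_DEG = (0.0, 90.0, 45.0, -45.0)
SIMULATED_VIEWS = {phi: directional_tilt(TILT, phi) for phi in AXES_DEG}

if __name__ == "__main__":
    for phi, m in SIMULATED_VIEWS.items():
        print(f"axis {phi:+5.1f} deg: {[[round(x, 3) for x in row] for row in m]}")
```

Each matrix has determinant 1/t = 0.5; the four differ only in the direction along which the image is compressed.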