One of the most difficult tasks in machine vision is the recognition of objects within a scene (i.e., in an image frame captured by an image acquisition device). A human can recognize such an object quite easily in a variety of environmental conditions: when the object is partially obscured, has variations or imperfections, or is in front of or behind other objects; when it is viewed from different perspectives and at different scales; and under lighting and other conditions that are difficult for machines to emulate.
To determine whether a particular object is present in a scene, the system must be able to distinguish objects within the image. A human can easily and naturally perform this task in both 2D and 3D images of scenes. For a single-camera machine vision system, however, the data arrives as a 2D image in which individual objects are not identified. The machine vision system may use a number of known techniques to detect an object, such as edge detection, feature extraction and the like.
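As an illustration of one such known technique, the following is a minimal sketch of edge detection using the Sobel operator on a grayscale image held as a list of rows. The function names, the sample scene and the threshold value are assumptions made for this example only and are not taken from any particular system.

```python
def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    Border pixels are left at 0.0 because the 3x3 kernel does not fit there.
    """
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    p = image[r + dr - 1][c + dc - 1]
                    gx += kx[dr][dc] * p
                    gy += ky[dr][dc] * p
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


def edge_mask(image, threshold):
    """Mark pixels whose gradient magnitude exceeds a fixed threshold."""
    return [[1 if m > threshold else 0 for m in row]
            for row in sobel_magnitude(image)]


# Hypothetical 5x5 scene: dark background on the left, bright object on the right.
scene = [[0, 0, 100, 100, 100] for _ in range(5)]
mask = edge_mask(scene, threshold=200)
```

In this sketch the mask marks the interior pixels on either side of the dark-to-bright step. A practical system would typically add smoothing and edge thinning before attempting to segment objects.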
Once an object is detected in the image, it must then be recognized. A typical machine vision system compares a detected object to a reference model stored in a database. If the object is rotated or skewed, or is viewed from a different perspective, object rotation and positioning algorithms may be applied to normalize the detected object for recognition. The detected object may also be scaled up or down to improve the chances of matching a reference model of the object.
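The comparison step described above can be sketched as follows. This is a simplified example under stated assumptions: the detected object is scaled to each reference model's size by nearest-neighbor sampling and scored by normalized cross-correlation. The function names, the two reference models and the `min_score` threshold are hypothetical, and rotation or perspective normalization is omitted.

```python
def resize_nearest(patch, rows, cols):
    """Scale a 2D patch to rows x cols using nearest-neighbor sampling."""
    src_rows, src_cols = len(patch), len(patch[0])
    return [[patch[r * src_rows // rows][c * src_cols // cols]
             for c in range(cols)]
            for r in range(rows)]


def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2D patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    da = [v - mean_a for v in fa]
    db = [v - mean_b for v in fb]
    denom = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    if denom == 0:
        return 0.0  # a flat patch carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def recognize(patch, references, min_score=0.8):
    """Return (name, score) of the best reference, or (None, score) if too weak."""
    best_name, best_score = None, -1.0
    for name, ref in references.items():
        candidate = resize_nearest(patch, len(ref), len(ref[0]))
        score = ncc(candidate, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


# Hypothetical reference database with two 3x3 models.
references = {
    "cross": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    "bar":   [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
# Detected object: the cross at twice the size and fifty times the brightness.
detected = [[references["cross"][r // 2][c // 2] * 50 for c in range(6)]
            for r in range(6)]
name, score = recognize(detected, references)
```

Because the correlation subtracts the mean and divides by the overall contrast, a uniform change in brightness does not affect the score in this sketch. It offers no such protection against lighting that varies across the object.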
Current systems do not perform well in varying lighting and environmental conditions. Changes in incident light angle, reduced brightness, very bright lighting, and the like affect the ability of the system even to extract the features or edges of an object needed for object recognition. In current systems, object detection and recognition are linked problems that have a meaningful impact on each other. Poor object detection leads to a reduced likelihood of accurate object recognition. Furthermore, when the system relies on a database of multiple reference images of multiple objects, recognition becomes harder and confusion may occur. An object under some light conditions and animation transformation may suddenly look like another object in the database and lead to a false match.
This problem can be made worse depending on the object to be recognized. Simple, planar, geometrical objects are easier to recognize, but limit the system to such objects. A non-planar object is more sensitive to light than a planar object, as its curves cast shadows on the object itself. When the scene has more than one light source and/or non-homogeneous light, this problem becomes even worse. The information that the recognition or tracking system is looking for may change or even disappear from the scene. Current systems try to solve the problem of lighting variation by applying extra processing to the original image, such as smoothing the image, blurring its features, working in gray scale, manipulating the color density of the image so that it better represents the object in real situations, etc. However, these solutions are problematic as they rely on the assumption that the effect of the light over the object is homogeneous. That is, the distribution of light over the whole surface of the object is mistakenly assumed to be exactly the same. Furthermore, solutions built on that assumption cannot deal with different light sources at different angles.
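The limitation of the homogeneity assumption can be illustrated with a small numerical sketch. Here a single global gain correction (dividing every pixel by the image mean) stands in for the class of whole-image processing methods described above. The pixel values, the per-pixel light gains and the function names are hypothetical values chosen only for illustration.

```python
def normalize_global(pixels):
    """Apply one gain to the whole image so that its mean becomes 1.0."""
    mean = sum(pixels) / len(pixels)
    return [p / mean for p in pixels]


def max_abs_diff(a, b):
    """Largest per-pixel difference between two equally long pixel rows."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical reference appearance of one row of an object's surface.
reference = [10.0, 50.0, 10.0, 50.0, 10.0, 50.0]

# Homogeneous change: the whole object is lit at half the brightness.
uniform_dim = [p * 0.5 for p in reference]

# Non-homogeneous change: light falls off from left to right, as with a
# single source at an angle or self-shadowing on a curved surface.
gains = [1.0, 0.9, 0.8, 0.6, 0.4, 0.2]
side_lit = [p * g for p, g in zip(reference, gains)]

ref_norm = normalize_global(reference)
err_uniform = max_abs_diff(normalize_global(uniform_dim), ref_norm)
err_side = max_abs_diff(normalize_global(side_lit), ref_norm)
```

In this sketch `err_uniform` is essentially zero, since one gain undoes a homogeneous change, while `err_side` stays large: no single whole-image correction can restore a pattern that was attenuated by a different amount at each point.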