Machine vision (or computer vision) refers to technology that allows a computer to use visual information, e.g., to extract information from an image, to solve some task, or perhaps to "understand" the scene in either a broad or limited sense. In general, machine vision is concerned with the extraction of information from image data. The image data can take many forms, such as single images, video sequences, views from multiple cameras, or higher-dimensional data (e.g., three-dimensional images from a medical scanner).
Machine vision has numerous applications, ranging from relatively simple tasks, such as industrial systems used to count objects passing by on a production line, to more complicated tasks, such as facial recognition, and perceptual tasks (e.g., allowing robots to navigate complex environments). A non-limiting list of examples of machine vision applications includes systems for controlling processes (e.g., an industrial robot or an autonomous vehicle), detecting events (e.g., for visual surveillance or people counting), organizing information (e.g., for indexing databases of images and image sequences), modeling objects or environments (e.g., industrial inspection, medical image analysis, or topographical modeling), and interaction (e.g., as the input to a device for computer-human interaction).
In many applications, machine vision involves highly computationally expensive tasks. A single color digital image may be composed of millions of pixels or more, each pixel having an associated value, such as a multi-bit (e.g., 8-bit or 24-bit) value defining the coordinates of the pixel in a color space (e.g., the familiar RGB color space, the YCbCr space, the HSV space, etc.). Video streams may include sequences of such images at frame rates of, e.g., dozens of frames per second, corresponding to bit rates of hundreds of megabits per second or more. Many machine vision applications require quick processing of such images or video streams (e.g., to track and react to the motion of an object, to identify or classify an object as it moves along an assembly line, to allow a robot to react in real time to its environment, etc.).
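As a rough check on these figures, the uncompressed bit rate of a video stream is simply width × height × bits per pixel × frame rate. The minimal Python sketch below computes this quantity; the frame sizes (640×480 and 1920×1080), the 24 bits per pixel, and the 30 frames per second are illustrative values chosen here, not parameters of any particular system, and the function name `raw_bitrate_bps` is hypothetical.

```python
def raw_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Return the uncompressed video bit rate in bits per second.

    Hypothetical helper for illustration: ignores compression, padding,
    and any container or transport overhead.
    """
    bits_per_frame = width * height * bits_per_pixel  # raw bits in one frame
    return bits_per_frame * fps                       # frames per second -> bits per second


if __name__ == "__main__":
    # Illustrative frame sizes at 24 bits per pixel and 30 frames per second.
    for w, h in [(640, 480), (1920, 1080)]:
        rate = raw_bitrate_bps(w, h, 24, 30)
        print(f"{w}x{h} @ 30 fps, 24 bpp: {rate / 1e6:.1f} Mbit/s")
```

With these assumed values, a 640×480 stream already comes to roughly 221 Mbit/s, and a 1920×1080 stream to roughly 1.49 Gbit/s, consistent with rates of hundreds of megabits per second or more.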
Processing such a large volume of data under such time constraints can be extremely challenging. Accordingly, it would be desirable to find techniques for processing image data that reduce the raw amount of information while retaining (or even accentuating) the features of the image data that are salient for the machine vision task at hand. This pre-processed image data, rather than the raw data, could then be input to a machine vision system, reducing the processing burden on the system, allowing a sufficiently fast response, and potentially improving performance.
It has been recognized that the retina of the vertebrate eye provides image processing of just this nature, taking in a visual stimulus and converting the stimulus into a form that can be understood by the brain. This system (developed over the course of millions of years of evolution) is remarkably efficient and effective, as evidenced by the high level of complex visual perception in mammals (particularly monkeys and humans).
Several approaches have been proposed for developing image data pre-processing schemes for machine vision based on abstract models of the operations of the retina. However, these models have been based on rough approximations to the actual performance of the retina.