Micro-robots, unmanned aerial vehicles (UAVs), imaging sensor networks, wireless phones, and other embedded vision systems all require low-cost, high-speed implementations of synthetic vision systems capable of recognizing and categorizing objects in a scene.
Computer vision is the task of extracting high-level information from raw images. One goal of generic, or general-purpose, synthetic vision systems is to build a model that maps high-dimensional data (images, videos) into a low-dimensional decision space, where arbitrary information can be retrieved easily, e.g. with simple linear classifiers or nearest-neighbor techniques. The exploration of such models has been an active field of research for the past decade, ranging from fully trainable models, such as convolutional networks, to hand-tuned models, such as HMAX-type architectures, as well as systems based on dense SIFT (Scale-Invariant Feature Transform) or HOG (Histograms of Oriented Gradients).
Many successful object recognition systems use dense features extracted on regularly spaced patches over the input image. Most feature extraction systems share a common structure: a filter bank (generally based on oriented edge detectors or 2D Gabor functions), a non-linear operation (quantization, winner-take-all, sparsification, normalization, and/or point-wise saturation), and finally a pooling operation (max, average, or histogramming). For example, the scale-invariant feature transform (SIFT) operator applies oriented edge filters to a small patch and determines the dominant orientation through a winner-take-all operation. The resulting sparse vectors are then added (pooled) over a larger patch to form local orientation histograms. Some recognition systems use a single stage of feature extractors. Other models, such as HMAX-type models and convolutional networks, use two or more layers of successive feature extractors.
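The three-stage structure described above (filter bank, non-linearity, pooling) can be illustrated with a minimal pure-Python sketch. It is a simplified illustration and not an implementation of SIFT or of any particular system: the four 3x3 oriented edge kernels, the function names, and the non-overlapping pooling cells are all assumptions chosen for brevity.

```python
from typing import List

Image = List[List[float]]

# Four 3x3 oriented edge detectors (an illustrative choice, not from the source).
FILTER_BANK = [
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],    # responds to vertical edges
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],    # responds to horizontal edges
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],    # responds to one diagonal
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],    # responds to the other diagonal
]


def correlate_valid(image: Image, kernel: Image) -> Image:
    """Stage 1: slide one kernel over the image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def winner_take_all(responses: List[Image]) -> List[Image]:
    """Stage 2: at each pixel keep only the strongest orientation magnitude."""
    h, w = len(responses[0]), len(responses[0][0])
    sparse = [[[0.0] * w for _ in range(h)] for _ in responses]
    for r in range(h):
        for c in range(w):
            mags = [abs(resp[r][c]) for resp in responses]
            best = max(range(len(mags)), key=mags.__getitem__)
            sparse[best][r][c] = mags[best]
    return sparse


def pool_histograms(sparse: List[Image], cell: int) -> List[List[List[float]]]:
    """Stage 3: sum each sparse map over non-overlapping cell x cell blocks."""
    h, w = len(sparse[0]), len(sparse[0][0])
    return [
        [
            [sum(m[r][c]
                 for r in range(r0, r0 + cell)
                 for c in range(c0, c0 + cell))
             for m in sparse]
            for c0 in range(0, w - cell + 1, cell)
        ]
        for r0 in range(0, h - cell + 1, cell)
    ]


def extract_features(image: Image, cell: int = 2) -> List[List[List[float]]]:
    """Filter bank -> winner-take-all -> pooled orientation histograms."""
    responses = [correlate_valid(image, k) for k in FILTER_BANK]
    return pool_histograms(winner_take_all(responses), cell)
```

Calling `extract_features` on an image that contains a single vertical step edge yields one orientation histogram per pooling cell, with all of the energy in the first (vertical-edge) bin. A multi-stage model would feed such outputs into a further filter bank, non-linearity, and pooling stage.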
Graphics Processing Units (GPUs) are specialized chips designed to process image data. GPUs are becoming a common alternative to custom hardware in vision applications. Their advantages over custom hardware are numerous: they are inexpensive, available in most recent computers, and easily programmable with standard development kits. Development of custom hardware solutions nevertheless continues, driven by performance and power consumption considerations: custom architectures can improve on both relative to general-purpose CPUs and GPUs.