Object recognition technology detects and identifies objects (e.g., targets) in an image or a video sequence. Various automatic target recognition (ATR) systems have been designed to obtain accurate classification from features of a target/object extracted from imagery obtained by one or more sensors. Such systems generally attempt to recognize a target and classify its type against a set of target templates (e.g., stored in a database), which are created from models generated from features extracted from one sensor's data and/or from fused features (e.g., features from multiple sensors and/or multiple scenes).
In image processing, feature extraction and Regions of Interest (ROI) extraction are two forms of processing employed to automatically recognize objects. The feature set includes the information about an object that is relevant to its classification.
Feature extraction has been widely used in image processing and in object detection and recognition, which employ different algorithms. Object recognition is the task of classifying an object in an image or a video sequence. In a typical feature-based approach, a search first finds the ROIs, and features are then extracted from the ROIs. The vector of extracted features is then compared with the models created for each object type during the training of the classifier. The object type whose model the extracted features most closely match is declared as the type of the object. The more accurate the extracted features are, the more accurate the trained models and the resulting object classifications are. However, extracting accurate features from an image requires complex processing, and if feature extraction and classification are to be performed in real time, the systems, especially the hardware, are limited in their ability to perform accurate, real-time feature extraction and classification.
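The classification step described above can be sketched as a nearest-template match: the extracted feature vector is compared against a model (here, a mean feature vector) per object type, and the closest model wins. The class names, feature dimensionality, and template values below are purely illustrative assumptions, not part of any particular ATR system.

```python
import math

# Hypothetical per-class feature templates (e.g., mean feature vectors
# learned during classifier training); values are illustrative only.
TEMPLATES = {
    "vehicle":  [0.8, 0.1, 0.3],
    "aircraft": [0.2, 0.9, 0.5],
    "building": [0.5, 0.4, 0.9],
}

def classify(features):
    """Declare the object type whose template the extracted feature
    vector most closely matches (minimum Euclidean distance)."""
    def dist(template):
        return math.sqrt(sum((f - t) ** 2 for f, t in zip(features, template)))
    return min(TEMPLATES, key=lambda name: dist(TEMPLATES[name]))

print(classify([0.75, 0.15, 0.35]))  # closest to the "vehicle" template
```

As the passage notes, the quality of this decision is bounded by the quality of the extracted features: noisy features move the vector away from its true template.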
The features are extracted from a digitized image or video sequence. The important features related to objects of interest are contained in only a small subset of the digitized data; as a result, most of the digitized data is discarded. To digitize images/video sequences intelligently, and thus speed up the feature extraction/object recognition processes, some have recently developed and used compressive sensing (CS) approaches. Reconstructing the images in their entirety from the intelligently sampled data requires applying inverse compressive sensing (ICS) techniques. CS and ICS are signal processing techniques that efficiently/intelligently acquire and reconstruct a signal by finding solutions to underdetermined linear systems of equations. These techniques use the sparsity property of a signal to recover it, by optimization, from far fewer samples than required by the Shannon-Nyquist sampling theorem (Nyquist rate). With no a priori knowledge or assumptions about the signal, it is possible to reconstruct the signal from a series of CS measurements. Compressive sensing takes advantage of the fact that a signal can be sparsely represented in a transformed domain (e.g., when a sinusoidal or cosine signal is transformed to the Fourier domain by applying the Fourier transform, it can be represented by just two coefficients). Many signals can be sparsely represented in a transformed domain (e.g., Fourier or wavelet) and thus have many coefficients in that domain close or equal to zero. The approach typically starts by taking weighted linear combinations of samples (compressive measurements) using a set of basis functions that is different from the set of basis functions in which the signal is known to be sparse.
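The cosine example in the passage can be verified directly: a pure cosine is dense in the time domain (every sample is nonzero), yet its discrete Fourier transform concentrates all the energy in exactly two bins. This is a minimal numerical sketch using NumPy's FFT; the signal length and frequency are arbitrary choices.

```python
import numpy as np

# A pure cosine is dense in the time domain but sparse in the Fourier
# domain: its two-sided DFT has exactly two significant coefficients.
n = 64
t = np.arange(n)
x = np.cos(2 * np.pi * 5 * t / n)   # 5 whole cycles across the window

X = np.fft.fft(x)
significant = int(np.sum(np.abs(X) > 1e-6 * n))
print(significant)  # 2 (bins at +5 and -5 cycles); all others are ~0
```

Because the cosine completes an integer number of cycles over the window, there is no spectral leakage and the remaining 62 coefficients are zero to machine precision, which is the sparsity CS exploits.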
A hardware-based single-pixel camera (SPC) was developed based on the CS mathematics and was used to sample images at rates much lower than the Nyquist rate. The SPC directly acquires random samples of a scene without first digitizing pixels/voxels. The camera architecture employs a digital micromirror device (DMD) array to optically apply linear projections of pseudorandom binary patterns onto a scene. These pseudorandom binary patterns turn the DMD mirrors “on” (1) or “off” (0). The light reflected from all the “on” mirrors is collected by a single photodetector, which converts the light into a voltage. Each voltage value is digitized by an analog-to-digital converter associated with the photodetector, and each digitized value corresponds to a sample, or CS measurement. In this fashion, an SPC measures or samples a scene many fewer times than the number of pixels/voxels in the scene (i.e., it under-samples the scene image). Because the SPC relies on a single photodetector, it can be adapted to capture images at wavelengths that conventional charge-coupled device (CCD) and CMOS imagers are not capable of capturing.
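The SPC measurement process described above amounts to an inner product: each binary DMD pattern selects a subset of scene pixels, the photodetector sums their light, and the digitized sum is one CS measurement. The following simulation is a sketch under assumed toy dimensions (an 8×8 scene, 16 measurements); in the real camera the scene is optical and is never digitized as pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8x8 "scene" (in a real SPC the scene exists only optically).
scene = rng.random((8, 8))

# One pseudorandom binary DMD pattern: 1 = mirror "on", 0 = "off".
pattern = rng.integers(0, 2, size=(8, 8))

# The photodetector collects light from all "on" mirrors; one A/D
# conversion of that sum is a single CS measurement.
measurement = float(np.sum(scene * pattern))

# Repeating with m different patterns yields m measurements, with
# m far smaller than the 64 pixels of the scene (under-sampling).
m = 16
patterns = rng.integers(0, 2, size=(m, 64))
measurements = patterns @ scene.ravel()
print(measurements.shape)  # (16,)
```

Stacking the patterns as rows gives the underdetermined linear system y = Φx (16 equations, 64 unknowns) that the ICS reconstruction must solve.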
Some recent approaches utilize an SPC with a spatial light modulator (a DMD array) to capture the entire scene. These approaches project the scene onto an array of DMDs and control the array with random patterns. The light reflected from the DMD mirrors is then projected onto a detector. The DMDs are controlled with one pattern at a time, of the same size as the scene, and the detector values are digitized and treated as samples or measurements. Several of these measurements are collected and used to reconstruct the scene using an ICS reconstruction method. However, capturing large scenes with an SPC is difficult, and in some cases impractical, due to its computational complexity. Moreover, matching the size of the sampling patterns to the size of the image being sampled is difficult because the size of a scene is not always known a priori.
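The passage refers to ICS reconstruction generically; one commonly used sparse-recovery algorithm that fits this role is orthogonal matching pursuit (OMP), sketched below. The problem sizes, the Gaussian sensing matrix, and the sparsity level are assumptions for illustration; the passage does not specify which reconstruction method is used, and a production system would use a vetted solver rather than this sketch.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily estimate a k-sparse x
    from underdetermined measurements y = A @ x."""
    m, n = A.shape
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit of y over the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
n, m, k = 64, 24, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
x = np.zeros(n)
x[[5, 20, 41]] = [1.0, -2.0, 1.5]              # k-sparse ground truth
y = A @ x                                       # m << n measurements

x_hat = omp(A, y, k)
```

With enough incoherent measurements relative to the sparsity k, OMP typically recovers the true support; recovery is not guaranteed in general, which is part of why reconstructing large scenes this way is computationally demanding.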
To improve these approaches, the scene may be divided into a number of blocks of equal, fixed size, for example, 32×32-pixel blocks. This block-based CS approach generalizes the SPC approach to sense large scenes in smaller blocks using a large programmable DMD array. The DMD array is divided into blocks of the same size, and each block is controlled using a pattern of that size. Light reflected from each block of the DMD is projected onto a corresponding detector in a detector array, for example a focal plane array. Pixel values at each detector are collected and digitized, and each of these pixel values is considered a CS measurement. Each block of DMDs is controlled by the same pattern, and the pixel values of each block are measured. The measurements from all blocks of the DMD contain the spatial information corresponding to one frame of an image. A different pattern is used per frame, and many frames of images are collected. Using the measurements across the different frames, the scene is reconstructed using an inverse CS reconstruction method.
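The block-based sampling above can be sketched as follows: the same binary pattern is applied to every block of the DMD, each block contributes one detector value per frame, and the pattern changes from frame to frame. The 4×4 block size and 8×8 toy scene are assumptions chosen to keep the sketch small (the passage's example uses 32×32-pixel blocks).

```python
import numpy as np

rng = np.random.default_rng(2)

B = 4                       # block size (the text's example is 32x32)
scene = rng.random((8, 8))  # toy scene: a 2x2 grid of 4x4 blocks

def frame_measurements(scene, pattern):
    """Apply the SAME binary pattern to every block of the DMD and
    read one detector value (one CS measurement) per block."""
    h, w = scene.shape
    out = np.empty((h // B, w // B))
    for bi in range(h // B):
        for bj in range(w // B):
            block = scene[bi*B:(bi+1)*B, bj*B:(bj+1)*B]
            out[bi, bj] = np.sum(block * pattern)
    return out

# A different pattern per frame; the per-block measurement sequences
# across frames feed the inverse-CS reconstruction of each block.
frames = [frame_measurements(scene, rng.integers(0, 2, (B, B)))
          for _ in range(10)]
print(np.stack(frames).shape)  # (10, 2, 2): 10 frames, one value per block
```

Because each block is reconstructed from its own measurement sequence, the per-block problem size stays fixed regardless of how large the overall scene is, which is the point of the block-based generalization.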
However, this approach, like the SPC approach, uses only a single-channel imaging system and thus cannot simultaneously capture different types of information about an image.