Representing an image is a fundamental problem in many image/video analysis and synthesis applications, such as 3D modeling, motion tracking, correspondence matching, image recognition/categorization/retrieval and other applications in computer vision. Image representations can be categorized as global methods and local methods. For example, an image (as a whole) can be globally represented by a global intensity histogram. However, such histograms are often not distinctive enough to characterize the appearance of the image. An example of a local method is image representation through sparse local features, which decomposes an image into multiple parts or patches so that the image is described as a constellation of these local features.
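By way of illustration only, the limited distinctiveness of a global intensity histogram can be sketched as follows. The function name `intensity_histogram`, the four-level intensity range and the two small example images are assumptions introduced for this sketch and are not taken from any particular system.

```python
from collections import Counter


def intensity_histogram(image, levels=4):
    """Count how many pixels fall on each intensity level 0..levels-1."""
    counts = Counter(pixel for row in image for pixel in row)
    return [counts.get(level, 0) for level in range(levels)]


# Two hypothetical 2x4 images holding the same pixel values
# in different spatial arrangements.
left_bright = [[3, 3, 0, 0],
               [3, 3, 0, 0]]
checker = [[3, 0, 3, 0],
           [0, 3, 0, 3]]

hist_a = intensity_histogram(left_bright)
hist_b = intensity_histogram(checker)

# The images differ, yet their global histograms are identical.
print(hist_a, hist_b, left_bright != checker)
```

Because the histogram discards all spatial arrangement, the two visibly different images receive the same global representation.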
In image processing and analysis, a feature generally is a piece of information that is relevant for the particular processing or analysis task. A local feature typically has two components, a detector and a descriptor. The detector identifies features for further processing and analysis. Normally, the detector selects only a small subset of highly distinctive pixels from the whole image. The descriptor characterizes the local image content of patches centered at the detected points using a feature vector. Thus, the detector attempts to select stable and reliable image locations that are informative about image content, and the descriptor describes each local patch in a distinctive way with a feature vector (usually of much lower dimension than the original patch). The overall usefulness of the local feature is affected by the reliability and accuracy of the detection (localization) and the distinctiveness of the description.
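The division of labor between detector and descriptor described above can be sketched, purely as a toy example, as follows. The contrast score used by `detect` and the (mean, minimum, maximum) summary used by `describe` are simplifying assumptions chosen for brevity; they do not correspond to any specific known detector or descriptor.

```python
def detect(image, max_points=2):
    """Toy detector: rank interior pixels by contrast with their 4 neighbors."""
    scored = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            center = image[y][x]
            neighbors = (image[y - 1][x], image[y + 1][x],
                         image[y][x - 1], image[y][x + 1])
            score = sum(abs(center - n) for n in neighbors)
            scored.append((score, y, x))
    scored.sort(key=lambda item: -item[0])
    # Keep only a small subset of the highest-contrast locations.
    return [(y, x) for score, y, x in scored[:max_points] if score > 0]


def describe(image, point):
    """Toy descriptor: summarize the 3x3 patch around a point as 3 numbers."""
    y, x = point
    patch = [image[j][i]
             for j in range(y - 1, y + 2)
             for i in range(x - 1, x + 2)]
    return (sum(patch) / len(patch), min(patch), max(patch))


# Hypothetical 5x5 image with a single bright pixel in the center.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 9

keypoints = detect(image, max_points=1)
features = {point: describe(image, point) for point in keypoints}
print(features)
```

Here the nine-pixel patch is reduced to a three-element vector, mirroring the point that a descriptor usually has much lower dimension than the patch it describes.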
To achieve reliable operation of high-level applications, it is desirable to design local features such that they are invariant (robust) to various geometric and/or photometric changes. For example, some local features are designed to be invariant to camera rotation, meaning that even if the camera is rotated, the detected feature points together with their descriptions remain unchanged or undergo only slight changes.
While many known local features are designed to be invariant to many geometric changes, few of them are designed to be invariant to complex photometric changes (pixel brightness changes). However, complex (e.g., nonlinear and spatially varying) brightness changes occur in many real scenarios. For example, image pixel intensities are affected by the locations of lighting sources, the reflectance properties of object surfaces, object surface normals and camera capture parameters. Variation in any of these may lead to complex pixel intensity changes in the corresponding patches between frames. Also, changes in the relative position of an object with respect to a lighting source, or even the deformation of the object itself, may result in corresponding image intensity changes.
Pixel intensity changes can cause problems in high-level vision applications that assume constant intensity. Many known motion estimation algorithms require the pixel intensities to remain unchanged or to go through only simple changes across frames. However, in reality, changes in capture parameters can cause dramatic changes in the pixel values. Some known local feature methods claim to be invariant to linear brightness changes. However, brightness changes typically are more complicated, and a linear model is not sufficient in many applications. For instance, in multi-frame computational photography applications, multiple image frames may be captured with significantly varying capture parameters.
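The insufficiency of a linear brightness model can be illustrated with a minimal sketch. It assumes that linear invariance is obtained by subtracting the patch mean and scaling to unit norm, which is one common normalization and not necessarily the one used by any particular known method. The patch values, the gain and offset, and the square-root (gamma-like) mapping are likewise hypothetical.

```python
import math


def normalize(patch):
    """Remove offset (mean) and gain (norm) from a patch of intensities."""
    mean = sum(patch) / len(patch)
    centered = [p - mean for p in patch]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


patch = [0.1, 0.2, 0.4, 0.8]                    # hypothetical intensities
linear = [2.0 * p + 0.05 for p in patch]        # gain and offset change
nonlinear = [p ** 0.5 for p in patch]           # gamma-like change

d_linear = distance(normalize(patch), normalize(linear))
d_nonlinear = distance(normalize(patch), normalize(nonlinear))
print(d_linear, d_nonlinear)
```

Under these assumptions the normalized description is essentially unchanged by the gain and offset change, whereas the nonlinear change leaves a clearly nonzero residual distance.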
For these and other reasons, there is a need for the present invention.