Representing an image is a fundamental challenge in many image/video analysis and synthesis applications, such as three-dimensional modeling, motion tracking, correspondence matching, image recognition/categorization/retrieval and other applications in computer vision. Image representations can be categorized as global methods or local methods. For example, an image (as a whole) can be represented globally by an intensity histogram computed over all of its pixels. However, such histograms are often not distinctive enough to characterize the appearance of the image, because they discard the spatial arrangement of the pixels. An example of a local method is image representation through sparse local features, which decomposes an image into multiple parts or patches and describes the image as a constellation of these local features.
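The limited distinctiveness of a global intensity histogram can be illustrated with a minimal sketch. The function name `intensity_histogram`, the choice of four bins, and the two toy images are assumptions made only for illustration. They are not taken from any particular implementation.

```python
def intensity_histogram(image, bins=4, max_value=255):
    """Count pixels per intensity bin over the whole image (a global representation)."""
    hist = [0] * bins
    bin_width = (max_value + 1) / bins
    for row in image:
        for pixel in row:
            # Clamp to the last bin so that max_value itself is counted.
            hist[min(int(pixel / bin_width), bins - 1)] += 1
    return hist


# Two hypothetical 2x4 grayscale images: same pixel values, different layout.
left_bright = [[200, 200, 10, 10],
               [200, 200, 10, 10]]
checkerboard = [[200, 10, 200, 10],
                [10, 200, 10, 200]]

hist_a = intensity_histogram(left_bright)
hist_b = intensity_histogram(checkerboard)

# The images look different, yet their global histograms are identical.
print(hist_a == hist_b)
```

Because the histogram only counts intensities, any rearrangement of the same pixels yields the same representation, which is one reason such global descriptions may fail to separate visually distinct images.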
In image processing and analysis, a feature generally is a piece of information that is relevant for the particular processing or analysis task. A local feature typically has two components: a detector and a descriptor. The detector identifies features for further processing and analysis. Normally, the detector selects a small subset of highly distinctive pixels from the whole image. The descriptor characterizes the local image content of patches centered at the detected points using a feature vector. Thus, the feature detector attempts to select stable and reliable image locations that are informative about image content, and the descriptor describes the local patch in a distinctive way with a feature vector (usually of much lower dimension than the original patch). The overall usefulness of the local feature is affected by the reliability and accuracy of the detection (localization) and the distinctiveness of the description.
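The division of labor between detector and descriptor can be sketched as follows. This is a toy example under stated assumptions, not the detector or descriptor of any particular system: the names `detect_keypoints` and `describe_patch`, the crude corner test, the threshold, and the histogram-based descriptor are all hypothetical choices made for illustration.

```python
def detect_keypoints(image, threshold=50):
    """Toy detector (assumption): keep interior pixels whose intensity changes
    strongly in both the horizontal and the vertical direction."""
    keypoints = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            dx = abs(image[y][x + 1] - image[y][x - 1])
            dy = abs(image[y + 1][x] - image[y - 1][x])
            if min(dx, dy) > threshold:
                keypoints.append((x, y))
    return keypoints


def describe_patch(image, x, y, radius=2, bins=4, max_value=255):
    """Toy descriptor (assumption): normalized intensity histogram of the
    patch centered at (x, y), clipped at the image border."""
    hist = [0] * bins
    bin_width = (max_value + 1) / bins
    count = 0
    for py in range(max(0, y - radius), min(len(image), y + radius + 1)):
        for px in range(max(0, x - radius), min(len(image[0]), x + radius + 1)):
            hist[min(int(image[py][px] / bin_width), bins - 1)] += 1
            count += 1
    return [h / count for h in hist]


# Hypothetical 6x6 image: dark background with a bright square in one corner.
image = [[200 if (x >= 3 and y >= 3) else 0 for x in range(6)] for y in range(6)]

keypoints = detect_keypoints(image)  # a small subset of the 36 pixels
descriptors = [describe_patch(image, x, y) for (x, y) in keypoints]
```

In this sketch the detector keeps only the corner of the bright square, and the descriptor summarizes the surrounding 5x5 patch (25 values) with a 4-element feature vector, mirroring the lower-dimensional description noted above.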
Local feature based image representations are described herein.