Many problems in image processing and computer vision relate to creating good representations of the information in images of objects in scenes. A great variety of computer vision tasks, for example those involving image understanding, image synthesis, or image compression, rely on improved representations of structure in images. There is a need to improve such computer vision tasks not only in functional ability but also in processing time, robustness, and the ability to learn the representations or models automatically.
For example, one problem is the immense variability of object appearance due to factors that are confounded in image data, such as illumination and viewpoint. Shape and reflectance are intrinsic properties of an object, but an image of an object is a function of several other factors. Some previous approaches to computer vision tasks have attempted to infer from images information about objects that is relatively invariant to these sources of image variation. For example, template-based representations or feature-based representations have been used to extract information from images such that the intensity values of the original image are completely removed from the final representation. Other previous approaches to computer vision tasks have instead used appearance-based representations, for example stored images of objects rather than 3D shape models. An example of an appearance-based approach is to use correlation to match image data to previously stored images of objects in order to carry out object recognition.
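By way of illustration only, the correlation-based matching mentioned above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any particular prior system: images are assumed to be small grayscale arrays given as lists of rows, the score is assumed to be a zero-mean normalized correlation, and the names `normalized_correlation`, `recognize`, and `stored` are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-sized grayscale images.

    Each image is a list of rows of numbers. The result lies in [-1, 1];
    0.0 is returned when either image has no intensity variation.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den if den > 0 else 0.0


def recognize(query, stored):
    """Return the label of the stored image that correlates best with query."""
    return max(stored, key=lambda label: normalized_correlation(query, stored[label]))


# Hypothetical stored appearance models (3x3 images, illustrative data only).
stored = {
    "vertical_bar": [[0, 9, 0], [0, 9, 0], [0, 9, 0]],
    "horizontal_bar": [[0, 0, 0], [9, 9, 9], [0, 0, 0]],
}

# A query with a brighter background and a dimmer bar than the stored image.
query = [[1, 8, 1], [1, 8, 1], [2, 7, 1]]
best = recognize(query, stored)
```

Because the mean is subtracted and the score is normalized, such a measure tolerates a uniform change in brightness or contrast, but it offers no tolerance to changes in viewpoint, which is one aspect of the variability discussed above.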
The choices for appearance-based models range from histogram-based representations, which discard spatial information, to complete template-based representations, which try to capture the entire spatial layout of the objects. In the middle of this spectrum lie patch-based models, which aim to find the right balance between the two extremes. These models aim to find representations that can be used to describe patches of pixels in images that contain repeated structure. However, patch sizes and shapes have previously been hand-selected, for example as rectangles of a given size. This is disadvantageous because it is not simple to select an appropriate patch size and shape. In addition, there is a need to improve the performance of such patch-based models.
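The spectrum described above may be illustrated with a toy example. The following is a sketch under stated assumptions rather than a description of any particular prior model: images are assumed to be small binary arrays given as lists of rows, the patch-based representation is assumed to be a simple multiset of square patches of a hand-selected size, and the names `histogram`, `template`, and `patches` are hypothetical.

```python
from collections import Counter


def histogram(image):
    """Histogram-based representation: pixel value counts, no spatial layout."""
    return Counter(v for row in image for v in row)


def template(image):
    """Template-based representation: the complete spatial layout."""
    return tuple(tuple(row) for row in image)


def patches(image, size):
    """Patch-based representation: multiset of all size x size square patches.

    The patch size and the square shape are hand-selected by the caller.
    """
    height, width = len(image), len(image[0])
    return Counter(
        tuple(tuple(image[r + dr][c:c + size]) for dr in range(size))
        for r in range(height - size + 1)
        for c in range(width - size + 1)
    )


# Illustrative 4x4 images: vertical stripes, the same stripes shifted by one
# pixel, and a checkerboard. All three contain eight 0s and eight 1s.
stripes = [[0, 1, 0, 1]] * 4
shifted = [[1, 0, 1, 0]] * 4
checker = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]

same_histogram = histogram(stripes) == histogram(shifted) == histogram(checker)
same_template = template(stripes) == template(shifted)
shared_patches = set(patches(stripes, 2)) == set(patches(shifted, 2))
```

In this toy example the histograms cannot tell the three images apart, and the templates treat the shifted stripes as a different image. The sets of 2x2 patches are identical for the two striped images and share no patch with the checkerboard. The outcome depends on the hand-selected patch size of 2, which is the difficulty noted above.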