1. Technical Field
A “Transform Invariant Low-Rank Texture” (TILT) Extractor, as described herein, accurately and robustly extracts both textural and geometric information defining regions of low-rank planar patterns from 2D images of a scene, thereby enabling a large range of image processing applications.
2. Background
One of the basic problems in computer vision is to identify certain feature points or salient regions in images. These points and regions are the basic building blocks for almost all high-level vision tasks, such as 3D reconstruction, object recognition, and scene understanding. Over the years, a large number of methods have been proposed in the computer vision literature for extracting various types of feature points or salient regions. The detected points or regions typically represent parts of the image that have distinctive geometric or statistical properties; examples include Canny edges, Harris corners, and textons.
Since distinctive points or regions in images are often used to establish correspondence or to measure similarity across different images, they should ideally have properties that are stable, or even invariant, under the transformations induced by changes in viewpoint or illumination. In the past decade, numerous so-called “invariant” features and descriptors have been proposed, studied, compared, and tuned in the literature. A representative and widely used feature is the “scale invariant feature transform” (SIFT), which is, to a large extent, invariant to changes in rotation and scale (i.e., similarity transforms) and to changes in illumination.
Unfortunately, if the images undergo significant affine deformations, SIFT-based techniques often fail to establish reliable correspondences, and an affine-invariant variant of SIFT becomes a more appropriate choice. However, while the deformation of a small, distant patch can be well approximated by an affine transform, the deformation of a large region viewed by a perspective camera must instead be described by a projective transform. Consequently, it is believed that the current state of the art does not provide features or descriptors that are truly (or even approximately) invariant under projective transforms (i.e., homographies). Despite tremendous effort over the past few decades to find better and richer classes of invariant image features, none of the existing methods appears to have fully resolved this problem. Therefore, the numerous “invariant” image features proposed in the current vision literature (including those mentioned above) are, at best, approximately invariant, and often only to a limited extent.
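To make the affine/projective distinction concrete, the following minimal Python sketch (an illustration only, not part of the TILT Extractor) maps a unit square through a 3×3 matrix in homogeneous coordinates. When the last row of the matrix is (0, 0, 1), the map is affine and opposite edges of the square stay parallel; with a nonzero perspective row, as in a general homography, parallelism is lost. The matrices `A` and `H` and the helper names `apply_h`, `warp`, and `edges_parallel` are hypothetical, chosen purely for illustration.

```python
# Illustrative sketch only: A and H are arbitrary example matrices.

def apply_h(M, pt):
    """Map a 2D point through a 3x3 matrix using homogeneous coordinates."""
    x, y = pt
    u = M[0][0] * x + M[0][1] * y + M[0][2]
    v = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return (u / w, v / w)  # w == 1 for affine maps; varies with (x, y) for homographies

def warp(M, pts):
    """Apply the same 3x3 transform to every point of a polygon."""
    return [apply_h(M, p) for p in pts]

def cross(a, b):
    """z-component of the 2D cross product; zero iff a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def edges_parallel(quad, tol=1e-9):
    """True if both pairs of opposite edges of the quadrilateral are parallel."""
    p0, p1, p2, p3 = quad
    return (abs(cross(sub(p1, p0), sub(p2, p3))) < tol and
            abs(cross(sub(p2, p1), sub(p3, p0))) < tol)

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Affine transform: last row is (0, 0, 1).
A = [[1.2, 0.4, 3.0],
     [-0.3, 0.9, 1.0],
     [0.0, 0.0, 1.0]]

# Projective transform (homography): nonzero perspective terms in the last row.
H = [[1.0, 0.2, 0.0],
     [0.1, 1.0, 0.0],
     [0.3, 0.1, 1.0]]

print(edges_parallel(warp(A, square)))  # True
print(edges_parallel(warp(H, square)))  # False
```

Because an affine map keeps w = 1 everywhere, it preserves parallel lines; the perspective row of `H` makes w depend on position, which is why parallel structures on a large planar surface appear to converge in a perspective image.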
Indeed, for the typical classes of transformations induced on the image domain by changes in camera viewpoint, and on image intensity by changes in contrast or illumination, the invariants of a 2D image are, in a strict mathematical sense, extremely sparse: essentially only the topology of the extrema of the image function remains invariant. This structure has been referred to as the “attributed Reeb tree” (ART).
On the other hand, a 3D scene is typically rich with regular structures that are full of invariants (with respect to 3D Euclidean transforms). For instance, urban environments are typically full of man-made objects with parallel lines, right angles, regular shapes, symmetric structures, repeated patterns, etc. All of these geometric structures have properties that are invariant under various subgroups of the 3D Euclidean group. As a result, 2D (affine or perspective) images of such 3D scenes typically encode extremely rich 3D information about the objects in the scene. Unfortunately, existing feature extraction techniques generally fail to exploit this information well, especially in the case of projective transforms.