In many applications, one is concerned with the efficient representation, search, compression, and retrieval of signals or signal patches. For example, in example-based image (video) search and retrieval, one considers whether a given image (video) or various modified forms of it are present in a library of images (video sequences). To tackle such a task, one typically derives a signature from the example image or video and compares it to signatures of the images and video sequences in the library. These signatures are obtained through linear and nonlinear representations of images and video sequences. In compression, one represents images/video using typically linear representations that are expected to provide compact descriptions. The parameters of the image/video under these representations are then transmitted to a destination, where a faithful reconstruction of the image is produced. Similarly, in image/video analysis, recognition, and related applications, one considers image parameters under given representations and formulates the problem in terms of these parameters.
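As an illustrative sketch only, the signature-and-compare workflow described above can be made concrete with a simple average-hash signature. This is an assumption for illustration, not the method contemplated here: the functions `signature` and `distance` are hypothetical helpers, and the block-average-and-threshold scheme is just one of many possible linear/nonlinear representations.

```python
import numpy as np

def signature(image, size=8):
    """Illustrative average-hash signature (an assumption, not the
    method of this disclosure): downsample by block-averaging, then
    threshold each block against the mean to get a compact bit string."""
    h, w = image.shape
    blocks = image[: h - h % size, : w - w % size]
    # Reshape so axes 1 and 3 index the pixels inside each block,
    # then average them away to obtain a size-by-size grid of means.
    blocks = blocks.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).ravel()

def distance(sig_a, sig_b):
    """Hamming distance between two binary signatures."""
    return int(np.count_nonzero(sig_a != sig_b))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
brighter = np.clip(img + 0.1, 0.0, 1.0)   # a mildly distorted copy
other = rng.random((64, 64))              # an unrelated image

d_same = distance(signature(img), signature(brighter))
d_other = distance(signature(img), signature(other))
```

A distorted copy of the example yields a signature close (in Hamming distance) to the original's, while an unrelated image yields a distant one; library search then reduces to nearest-signature lookup.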
Existing techniques tackling representation, search, and compression problems are formulated in terms of representations that are constructed to perform reasonably over a model set, Ψ, of images and video sequences. The per-image or per-video-sequence adaptivity of these techniques is thus very limited, which becomes an important problem when the target image or video sequence does not conform to the model set Ψ. For example, in compression, established techniques use fixed linear representations such as the Discrete Cosine Transform (DCT), the wavelet transform, etc. The model set for such a linear representation is the set of images/video sequences that can be closely approximated with few coefficients under the given representation. For the wavelet transform, for example, this set is well known to be the set of signals that are smooth apart from point singularities.
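The fixed-transform compression described above can be sketched in a few lines: transform the signal, keep only the largest coefficients, and invert. This is a minimal sketch for illustration, assuming an orthonormal DCT-II; the helpers `dct_matrix` and `compress` are hypothetical names, not part of any established codec. It also illustrates the model-set point: a smooth signal is reconstructed accurately from few coefficients, while a signal with a discontinuity, which lies outside the DCT's model set, fares worse under the same coefficient budget.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (hypothetical helper for illustration)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] *= 1 / np.sqrt(2)          # scale the DC row for orthonormality
    return m * np.sqrt(2 / n)

def compress(signal, keep):
    """Keep only the `keep` largest-magnitude DCT coefficients."""
    D = dct_matrix(len(signal))
    coeffs = D @ signal
    small = np.argsort(np.abs(coeffs))[:-keep]
    coeffs[small] = 0.0
    return D.T @ coeffs  # inverse of an orthonormal transform is its transpose

t = np.linspace(0, 1, 256)

# A smooth signal lies inside the DCT's model set: 10 of 256 coefficients
# already give a near-perfect reconstruction.
smooth = np.cos(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)
err_smooth = np.linalg.norm(smooth - compress(smooth, 10)) / np.linalg.norm(smooth)

# A step signal does not conform to the model set: the same coefficient
# budget leaves a much larger reconstruction error.
step = np.where(t < 0.5, 1.0, -1.0)
err_step = np.linalg.norm(step - compress(step, 10)) / np.linalg.norm(step)
```

The contrast between `err_smooth` and `err_step` is the non-adaptivity problem in miniature: the transform is fixed in advance, so performance degrades sharply on signals outside Ψ.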
Existing representations perform very well when target images or video sequences are in their model set but perform very inadequately when they are not. For example, on images and video sequences depicting textures, such as those shown in FIG. 1, the performance of techniques based on such representations tends to be completely inadequate, leading to very inefficient descriptions. Similarly, in image/video search, analysis, and recognition applications, established techniques use predefined features that are expected to perform reasonably well over the model set under simple distortions/deformations. These techniques become completely inadequate when the application requires operation under different types of distortions, and they generate non-compact descriptions that increase computational complexity and sometimes completely disrupt performance. As they are not adaptive, and as they are typically designed for worst-case performance, these techniques are neither optimal nor adequate over a given narrow set of signals and a given set of distortions.
Existing methods are also not generalizable to other types of signals, as they are based on ad hoc observations on images made by humans: corners are deemed important, lines are deemed important, and so on. Since existing methods are based on human observations, they provide no clear guidelines as to what the features/signatures/representations should be for different types of signals, such as audio data, seismic data, medical signals such as cardiograms, time-series data, higher-dimensional medical images depicting volumes, etc.
Another important shortcoming of established work is the lack of any guarantees of successful compression, search, recovery, etc. Nor can existing techniques guarantee a certain level of success for a given amount of computational complexity. For example, if unlimited complexity is allowed, then clearly one can design very accurate search/retrieval strategies. When complexity is limited, however, one needs to utilize the available resources effectively by making the optimal tradeoffs. Established techniques by no means make the optimal tradeoff when the allowed complexity is limited. In fact, complexity issues are not even directly addressed, since features are designed in an ad hoc manner with the hope that they will result in good performance.