In various fields of signal and data processing, e.g. in multimedia asset management, compact descriptors are calculated for multimedia items in order to compare two items or to search a database for items similar to a given item.
For instance, images in a database—e.g. personal photographs or images from a video—may have associated descriptors to ease database organization into groups of similar images and retrieval of images similar to a given one.
A difficulty with descriptors is that they should reflect the similarity of two items as well as possible while remaining small.
One known and commonly used type of descriptor is based on a frequency decomposition of the signal of the multimedia item. To this end, a bank of filters is used, each filter generating a filtered signal corresponding to one frequency band. The power of the filtered signal in each band is then often calculated, and the set of all power values forms the descriptor. The use of filter banks is common, for example, in audio processing. For images, too, filter banks such as wavelet or Gabor filter banks are widely used in analysis and retrieval.
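The construction described above can be illustrated by a minimal sketch for a one-dimensional signal. It assumes NumPy and uses idealised, non-overlapping rectangular bands obtained by masking the FFT, which is only one possible filter bank; the function name `band_power_descriptor` and the parameter `n_bands` are hypothetical and do not come from the text.

```python
import numpy as np

def band_power_descriptor(signal, n_bands=8):
    """Mean power of `signal` in `n_bands` equal-width frequency bands.

    Illustrative only: the bands are ideal rectangular band-pass filters
    realised by masking the one-sided FFT of the signal.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    # Band edges expressed as indices into the one-sided spectrum.
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    descriptor = np.empty(n_bands)
    for k in range(n_bands):
        band = np.zeros_like(spectrum)
        band[edges[k]:edges[k + 1]] = spectrum[edges[k]:edges[k + 1]]
        # Filtered signal for this band, then its mean power.
        filtered = np.fft.irfft(band, n=signal.size)
        descriptor[k] = np.mean(filtered ** 2)
    return descriptor

# Example: a pure tone puts almost all of its power into a single band.
t = np.arange(256)
tone = np.sin(2 * np.pi * 40 * t / 256)
print(band_power_descriptor(tone))
```

The descriptor has one value per filter, so its size is fixed by the number of bands and not by the length of the signal.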
In order to enhance the capacity of a descriptor to reflect the characteristics of images and the similarity of images, one of the following measures is commonly applied:
1. The number of filters is increased;
2. The distribution and type of filters are optimised;
3. The precision of each filter is increased.
The first measure can be realised, for example, by using 12 filters instead of 8. The signal's frequency spectrum is thereby described more finely.
The second measure can be realised, in the case of images, by replacing wavelet filters with Gabor filters. While wavelet filters cover the 2-dimensional frequency spectrum by considering horizontal, vertical and diagonal frequencies, Gabor filters are more flexible and can describe frequencies in more directions. In this way, the images, and notably the texture in images, can be better described.
The third measure addresses the implementation of filters, notably digital filters, and can be realised by increasing the number of samples used to represent the filter kernel. For example, a Gabor filter can be enhanced by replacing a 16×16 kernel with a 32×32 kernel.
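One possible reading of the third measure is sketched below, assuming NumPy: the same Gabor function is sampled on a 16×16 and on a 32×32 grid with unchanged pixel spacing, so that the larger kernel truncates less of the Gaussian envelope. The functions `gabor_kernel` and `envelope_coverage` and the values chosen for `frequency`, `theta` and `sigma` are hypothetical illustrations, not parameters taken from the text.

```python
import numpy as np

def gabor_kernel(size, frequency=0.125, theta=0.0, sigma=6.0):
    """Complex Gabor kernel sampled on a size x size grid centred on zero.

    frequency is in cycles per pixel, theta in radians, sigma in pixels
    (all values are illustrative assumptions).
    """
    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    phase = 2.0 * np.pi * frequency * (x * np.cos(theta) + y * np.sin(theta))
    return envelope * np.exp(1j * phase)

def envelope_coverage(size, sigma=6.0):
    """Approximate fraction of the Gaussian envelope's mass inside the grid."""
    envelope = np.abs(gabor_kernel(size, sigma=sigma))
    # 2*pi*sigma^2 is the integral of the untruncated 2-D Gaussian.
    return envelope.sum() / (2.0 * np.pi * sigma ** 2)

for size in (16, 32):
    print(size, gabor_kernel(size).shape, envelope_coverage(size))
```

Under these assumptions the 32×32 kernel captures a clearly larger share of the envelope than the 16×16 kernel, at the cost of four times as many coefficients per filter.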
A frequent problem with filter banks is that the spectra of the filters overlap, so that the frequency bands are not cleanly separated. For example, Gabor filters have Gaussian-shaped spectra, which inherently overlap. This overlap lowers the performance of image retrieval, notably when one or several filters include a considerable contribution from frequency zero.
Consider, as an example, two images showing stripes. The direction and frequency of the stripes are identical in both images; the only difference is a spatially constant offset between them. A descriptor based on the power of the Gabor subbands is calculated for each image. Although both images show the same type of texture, the descriptors differ more the larger the offset is.
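The stripes example can be reproduced numerically with a minimal sketch, assuming NumPy. The filters are modelled directly by Gaussian-shaped frequency responses, as a stand-in for a Gabor filter bank and not as a specific implementation of one; the function `gaussian_band_powers` and the parameters `freqs`, `n_orient` and `rel_bw` are hypothetical.

```python
import numpy as np

def gaussian_band_powers(image, freqs=(0.0625, 0.125, 0.25),
                         n_orient=4, rel_bw=0.5):
    """Power per subband for filters with Gaussian-shaped spectra.

    freqs are centre frequencies in cycles per pixel; rel_bw is the ratio of
    the Gaussian's standard deviation to the centre frequency (assumption).
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    spectrum = np.fft.fft2(image)
    powers = []
    for f0 in freqs:
        sigma = rel_bw * f0
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            cx, cy = f0 * np.cos(theta), f0 * np.sin(theta)
            # Gaussian response centred on (cx, cy); its tail reaches frequency zero.
            response = np.exp(-((fx - cx) ** 2 + (fy - cy) ** 2) / (2 * sigma ** 2))
            # Parseval: mean squared magnitude of the filtered image.
            powers.append(np.sum(np.abs(spectrum * response) ** 2) / (h * w) ** 2)
    return np.array(powers)

# Vertical stripes with zero mean, then the same stripes with constant offsets.
x = np.arange(64)
stripes = np.tile(np.cos(2 * np.pi * 0.125 * x), (64, 1))
reference = gaussian_band_powers(stripes)
for offset in (0.0, 10.0, 20.0):
    shifted = gaussian_band_powers(stripes + offset)
    print(offset, np.linalg.norm(shifted - reference))
```

In this sketch every filter has a non-zero response at frequency zero, so the constant offset adds power to every subband and the distance between the descriptors grows with the offset although the texture is unchanged.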
As another example, consider two images showing the same scene at different times of day. The more the illumination differs, the more the descriptors differ. Suppose, for example, that a database is searched for images of cars using a query image showing a car in daylight. Images showing cars at lower light levels, such as in the evening, may then not be found.
This effect degrades the performance of retrieval in databases, notably when semantically similar items are searched for. For example, a search may look for audio clips having a rhythm similar to that of a given clip. When audio clips have different signal offsets for technical reasons, some audio clips with the same rhythm but a different offset may not be found.
A negative effect can also occur when descriptors based on filter banks are used to classify multimedia items. In this case, the descriptor is fed into a classifier that assigns one or several labels to the item. For example, a classifier for outdoor scenes in images can detect an outdoor scene in a given image and generate the label “outdoor” for this image. A classifier is usually trained on a set of typical images. When this set includes only daylight images, the classifier may fail to detect outdoor scenes at lower light levels, for example in the morning.