Many technical applications, for example in the field of audio processing, video processing or signal processing, involve obtaining similarity information on the basis of one or more input signals. For example, it is sometimes desirable to compare two time-shifted sections of a single input signal in order to obtain information about a periodicity of the input signal. Such a concept may be used to prepare audio processing (audio manipulation) operations or to determine the characteristics of an audio signal. For example, a fundamental frequency may be extracted from an audio signal using this concept. Also, the information about the similarity between different portions of the same audio signal can be used in situations in which a temporal extension or a temporal shortening of the audio signal is desired.
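As an illustration of comparing time-shifted sections of a single signal, the following minimal sketch estimates a fundamental frequency from the lag that maximizes the autocorrelation. Python with NumPy is an assumption (the text names no implementation), and the function name `estimate_f0_autocorr` and the search range `f_min`/`f_max` are hypothetical. The unnormalized (biased) sum slightly favors shorter lags, which helps to avoid picking a multiple of the true period.

```python
import numpy as np

def estimate_f0_autocorr(x, fs, f_min=50.0, f_max=500.0):
    """Hypothetical helper: estimate the fundamental frequency (Hz) of x from
    the lag that maximizes its autocorrelation within [fs/f_max, fs/f_min]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove DC so it does not dominate
    n = len(x)
    lag_min = int(np.floor(fs / f_max))   # shortest period considered
    lag_max = min(int(np.ceil(fs / f_min)), n - 1)  # longest period considered
    # Similarity between the signal and a copy of itself shifted by `lag`
    # samples (biased sum: fewer overlapping terms for longer lags).
    scores = [np.dot(x[:n - lag], x[lag:]) for lag in range(lag_min, lag_max + 1)]
    best_lag = lag_min + int(np.argmax(scores))
    return fs / best_lag
```

For a 200 Hz sine sampled at 8 kHz, the best lag is one period of 40 samples, so the estimate is 8000 / 40 = 200 Hz.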
On the other hand, it may also be desirable to compare two different input signals and to obtain information about their similarity. For example, similarity information may be obtained without applying a time shift to either of the input signals, for a single time shift between the input signals, or for multiple values of the time shift between the input signals. By comparing two input signals, which may, for example, be audio signals, it may be possible to classify at least one of the audio signals. Alternatively, it may be possible to find an appropriate time for performing an overlap-and-add of the audio signals.
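For two different input signals, the similarity can be evaluated for a single time shift or for a range of shifts. The sketch below (Python/NumPy, hypothetical function `similarity_over_lags`) computes a normalized cross-correlation for lags 0 to `max_lag`; lag 0 corresponds to a comparison without time shift. The normalization to the range [-1, 1] is a design choice assumed here so that the score does not depend on the signal level, which is convenient when, for example, searching for a suitable overlap-and-add position.

```python
import numpy as np

def similarity_over_lags(a, b, max_lag):
    """Hypothetical helper: normalized cross-correlation between a[k] and
    b[k + lag] for lag = 0..max_lag; each value lies in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    if len(b) < n + max_lag:
        raise ValueError("b must hold at least len(a) + max_lag samples")
    norm_a = np.linalg.norm(a)
    scores = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        seg = b[lag:lag + n]                      # section of b shifted by `lag`
        denom = norm_a * np.linalg.norm(seg)
        scores[lag] = np.dot(a, seg) / denom if denom > 0 else 0.0
    return scores
```

The lag with the highest score indicates the time shift at which the two signals are most similar.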
In general, many different applications in the field of audio processing or, more generally, signal processing are possible on the basis of similarity information describing a similarity between two different input signals (audio signals) or a similarity between different, time-shifted portions of a single input signal (audio signal).
In embedded systems, such as digital signal processors (DSPs), only limited memory and processor cycles are available by nature. To be able to compute the desired algorithms in real time, it may be desirable to optimize them for the respective platform. These optimizations may roughly be divided into two categories. The first category includes optimizations that take advantage of the specific processor architecture, for example, approximations of trigonometric functions, optimized fast Fourier transform (FFT) implementations or so-called single-instruction-multiple-data (SIMD) operations.
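To illustrate the FFT-type optimization, the following sketch computes a full cross-correlation via the cross-correlation theorem and checks it against direct summation with `np.correlate`. NumPy's FFT stands in here for a platform-specific FFT library, and `xcorr_fft` is a hypothetical name. Direct evaluation needs on the order of N·M multiply-accumulates for signals of lengths N and M, whereas the FFT route needs on the order of L log L operations for a transform size L of at least N + M - 1.

```python
import numpy as np

def xcorr_fft(a, b):
    """Hypothetical helper: full cross-correlation c[k] = sum_m a[m + k] * b[m]
    for lags k = -(M-1) .. N-1, in the same order as np.correlate(a, b, "full")."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    size = 1 << (n + m - 2).bit_length()   # smallest power of two >= n + m - 1
    # Cross-correlation theorem: correlation <-> A(f) * conj(B(f)).
    spec = np.fft.rfft(a, size) * np.conj(np.fft.rfft(b, size))
    circ = np.fft.irfft(spec, size)
    # Zero padding prevents wrap-around overlap; negative lags sit at the end
    # of the circular result, so move them to the front.
    return np.concatenate((circ[size - (m - 1):], circ[:n]))
```

The result matches the direct computation up to floating-point rounding, for example `xcorr_fft([1, 2, 3], [0, 1, 0.5])` and `np.correlate([1, 2, 3], [0, 1, 0.5], "full")` both give approximately `[0.5, 2, 3.5, 3, 0]`.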
The second category concerns the optimization of the algorithms themselves. It has been found that if, for example, a cross-correlation has to be computed to determine a time offset between two audio signals, both the available processor cycles and the available storage space limit the maximum detectable latency.
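The dependence of the resource usage on the maximum detectable latency can be made concrete with a rough count. The sketch assumes a direct time-domain evaluation in which a reference window of `window` samples is compared with the delayed signal at every lag from 0 to the maximum lag; the function name and this cost model are assumptions for illustration, and real implementations may buffer differently.

```python
def direct_xcorr_cost(window, max_latency_s, fs):
    """Hypothetical cost model for a direct cross-correlation search.
    Returns (max_lag, samples_buffered, multiply_accumulates)."""
    max_lag = int(round(max_latency_s * fs))   # latency expressed in samples
    buffered = window + (window + max_lag)      # reference + delayed signal
    macs = window * (max_lag + 1)               # one dot product per lag
    return max_lag, buffered, macs
```

With a 4096-sample window at 48 kHz and a maximum latency of 1 s, this gives 48000 lags, 56192 buffered samples and 196612096 multiply-accumulates per evaluation; both the buffer size and the operation count grow linearly with the latency to be covered.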
In the following, some conventional concepts will be described. It has been found that downsampling is frequently used to reduce memory and computational load. With downsampling by a factor of 4, for example, ¾ (i.e., 75%) of the involved memory would be saved or, alternatively, the detectable latency would be increased by a factor of four. It has also been found that these savings come with drawbacks. For example, the accuracy is reduced: results that were previously sample-accurate can now only be obtained with an accuracy of n samples, where n denotes the downsampling factor.
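The trade-off stated above can be written out as arithmetic. In this sketch (hypothetical helper names), a downsampling factor n reduces the number of stored samples to 1/n, which either saves a fraction 1 - 1/n of the memory or extends the covered latency by a factor of n, while the detectable lags form a grid with a spacing of n full-rate samples. Rounding a true lag to the nearest grid point, as assumed here, gives an error of up to n/2 samples.

```python
def downsampling_tradeoff(n):
    """Hypothetical helper: (memory fraction saved, latency gain, lag grid spacing)."""
    memory_saved = 1 - 1 / n   # fraction of memory saved for the same latency
    latency_gain = n           # or: factor by which the covered latency grows
    lag_step = n               # detectable lags are multiples of n samples
    return memory_saved, latency_gain, lag_step

def nearest_detectable_lag(true_lag, n):
    """Full-rate lag closest to true_lag that lies on the downsampled lag grid."""
    return n * round(true_lag / n)
```

For n = 4 this yields a memory saving of 0.75, and a true offset of 7 samples can only be reported as 8 samples.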
Furthermore, the robustness decreases with an increasing downsampling factor. Interferences that may occur during an audio transmission deteriorate the result considerably. Such interferences include noise, dynamic range compression, audio encoding, limiting and filtering (for example, by an equalizer).
It has been found that downsampling may also be understood as follows: an audio sample is taken from the audio stream at equidistant intervals and serves, so to speak, as a representative of its surrounding samples. The number of samples represented in this way may also be referred to as the block size. In the example above, the block size n would equal 4: every fourth sample from the audio stream would serve as the representative of its block. For this explanation of downsampling, it is assumed that an upstream downsampling filter reduces the highest occurring frequency by a factor of n, so that the Nyquist criterion is satisfied at the reduced sampling rate.
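The two-step view described above (band-limit the signal, then keep every n-th sample as the representative of its block) can be sketched as follows. The windowed-sinc FIR design, its 63 taps and the Hamming window are assumptions chosen for brevity, and the function names are hypothetical; a practical decimator would typically place the filter's transition band entirely below the new Nyquist frequency.

```python
import numpy as np

def lowpass_fir(n, taps=63):
    """Hypothetical helper: windowed-sinc low-pass FIR with its cutoff at
    1/(2n) cycles per sample, i.e. the Nyquist frequency after decimation."""
    k = np.arange(taps) - (taps - 1) / 2   # symmetric tap positions
    cutoff = 0.5 / n                        # cutoff in cycles per sample
    h = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(taps)
    return h / h.sum()                      # unity gain at DC

def downsample(x, n, taps=63):
    """Filter, then keep every n-th sample as the representative of its block."""
    h = lowpass_fir(n, taps)
    y = np.convolve(np.asarray(x, dtype=float), h, mode="same")  # no delay for odd taps
    return y[::n]
```

A tone above the new Nyquist frequency is strongly attenuated by `downsample`, whereas naively picking every n-th sample of the same tone keeps its full amplitude as an aliased, lower-frequency component.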
Moreover, it has been found that conventional downsampling entails significant disadvantages, for example in terms of robustness.