1. Field of the Invention
The present invention relates to signal analysis and particularly to signal analysis for the purpose of identification of signal content.
2. Description of the Related Art
Archiving the ever-increasing stock of audio and video material, establishing databases that are easy to search, and distributing the material via various distribution channels require automatic information recognition systems that help to identify audio and video material or, more generally, information material unambiguously based on its content.
One application for this is the so-called “broadcast monitoring”. Such an audio-video monitoring system is intended, for example, to ensure that only legal content is distributed or that the respective royalties for the rights holders of the audio and video material are paid correctly.
A further application is, for example, the recognition of audio material that is to be exchanged between partners via peer-to-peer networks.
A further application is monitoring for the advertising industry, i.e. checking whether a television or radio station has really broadcast the booked advertising times, whether only parts of the booked advertising share have been broadcast, or whether parts of the commercials have been disturbed during transmission, which may, for example, be the responsibility of the television or radio station. It is to be noted that the costs for television commercials in popular programs at good broadcasting times are so high that the advertising industry has a vital interest in a monitoring possibility, so that it does not merely have to trust the word of the broadcasting stations. Currently, monitoring is based on paid “test hearers” or “test viewers”, who continuously watch a certain television program and record, for example, the exact times at which a commercial is transmitted, and who further check whether the whole commercial has been transmitted correctly, i.e. whether there has been no disturbance, picture distortion, etc. during the transmission.
The disadvantages of this concept are evident. On the one hand, the costs are significant and, on the other hand, the reliability or evidential value of statements of test hearers and/or test viewers is problematic, particularly if considerable repayment demands are made whose provability depends solely on these statements.
Various known systems may be used for automated broadcast monitoring. For example, WO 02/11123 A2 or the specialist publication: “Invited Talk: An Industrial-Strength Audio Search Algorithm”, Avery Wang, ISMIR 2003, Baltimore, October 2003, disclose systems and methods for recognizing audio and music signals in an environment of strong noise and high distortions. In a first step, it is examined whether there is a match between hash values of a reference audio object and the currently determined hash value of the still unidentified audio object. If this is the case, the time offset of the hash value in the still unidentified audio object and the time offset of the hash value in the reference audio object are stored as a pair under the respective identification of the reference audio object, a time offset being the relative distance from the beginning of the audio object. When all input hash values have been processed, a so-called scanning phase starts. During this phase, it is examined how many time offset pairs per reference audio object are continuous in time. If a certain number is detected, an identification of the corresponding reference audio object is assumed. The time offset pairs are considered to be continuous in time, i.e. temporally associated with each other, when they form a straight line in a two-dimensional scatter plot with one time offset as the x-coordinate and the other one as the y-coordinate.
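The matching and scanning phases described above may be illustrated by the following minimal Python sketch. It is not taken from the cited documents. The names collect_offset_pairs, scan, reference_index and min_aligned are hypothetical, and the sketch assumes that the unidentified audio object is played at the original speed, so that the straight line in the scatter plot has a slope of one and the difference of the two time offsets is constant along it.

```python
from collections import Counter, defaultdict


def collect_offset_pairs(reference_index, query_hashes):
    """Matching phase: store (query offset, reference offset) pairs per reference.

    reference_index maps a hash value to a list of (reference_id, offset).
    query_hashes is a list of (hash value, offset) for the unidentified object.
    """
    pairs = defaultdict(list)
    for hash_value, query_offset in query_hashes:
        for reference_id, reference_offset in reference_index.get(hash_value, []):
            pairs[reference_id].append((query_offset, reference_offset))
    return pairs


def scan(pairs, min_aligned):
    """Scanning phase: find the reference with the most temporally aligned pairs.

    Assumption: pairs lie on a line of slope one, so aligned pairs share the
    same difference (reference offset - query offset).
    """
    best = None
    for reference_id, offsets in pairs.items():
        differences = Counter(r - q for q, r in offsets)
        aligned = max(differences.values())
        if aligned >= min_aligned and (best is None or aligned > best[1]):
            best = (reference_id, aligned)
    return best


# Illustrative data only; hash values and offsets are made up.
reference_index = {
    0xA1: [("song_a", 10), ("song_b", 3)],
    0xB2: [("song_a", 11)],
    0xC3: [("song_a", 12), ("song_b", 40)],
}
query_hashes = [(0xA1, 0), (0xB2, 1), (0xC3, 2)]

pairs = collect_offset_pairs(reference_index, query_hashes)
result = scan(pairs, min_aligned=3)
```

In this example, the three pairs stored for "song_a" share the offset difference 10 and are therefore regarded as continuous in time, whereas the two pairs stored for "song_b" are not.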
In the specialist publication “Robust Audio Hashing for Content Identification” by J. Haitsma, T. Kalker, J. Oostveen, in Proceedings of the Content-Based Multimedia Indexing, 2001, url:citeseer.ist.psu.edu/haitsma01robust.html, a system for robust audio hashing for content identification is presented. For content-based music recognition, a hash function is used that associates a bit sequence with a portion of an audio signal such that audio signals that are acoustically similar for human sound perception also generate similar bit sequences. For the calculation of a hash value, the audio signal is first windowed and subjected to a transform, and the transform result is then divided into frequency bands with logarithmic bandwidth. For these frequency bands, the signs of the differences in the time and frequency directions are determined. The bit sequence resulting from the signs constitutes the hash value. One hash value is always calculated for an audio signal length of 3 seconds. If the Hamming distance between a reference hash value and a test hash value to be examined for such a portion is below a threshold s, a match is assumed and the test portion is associated with the reference element.
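The sign-based hash and the threshold comparison described above may be sketched as follows in Python. The sketch is a simplified illustration and not the implementation of the cited publication. It assumes that the band energies per frame have already been computed by windowing and transforming the signal, and the names hash_bits, hamming_distance and is_match as well as all numeric values are hypothetical.

```python
def hash_bits(band_energies):
    """Derive a bit sequence from the signs of energy differences.

    band_energies is a list of frames, each a list of energies per band.
    A bit is 1 if the difference between neighboring bands (frequency
    direction) increases from the previous frame to the current frame
    (time direction), and 0 otherwise.
    """
    bits = []
    for n in range(1, len(band_energies)):
        current, previous = band_energies[n], band_energies[n - 1]
        for m in range(len(current) - 1):
            difference = (current[m] - current[m + 1]) - (previous[m] - previous[m + 1])
            bits.append(1 if difference > 0 else 0)
    return bits


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equally long bit sequences differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit sequences must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


def is_match(reference_bits, test_bits, threshold):
    """Assume a match if the Hamming distance is below the threshold s."""
    return hamming_distance(reference_bits, test_bits) < threshold


# Illustrative band energies only: 3 frames with 3 bands each.
reference = [[1.0, 2.0, 4.0], [2.0, 1.0, 3.5], [1.0, 3.0, 2.0]]
distorted = [[1.1, 2.0, 4.0], [2.0, 1.1, 3.5], [1.0, 3.0, 2.1]]
unrelated = [[4.0, 2.0, 1.0], [1.0, 3.0, 1.0], [3.0, 1.0, 2.5]]

reference_hash = hash_bits(reference)
same_content = is_match(reference_hash, hash_bits(distorted), threshold=2)
other_content = is_match(reference_hash, hash_bits(unrelated), threshold=2)
```

Because only the signs of the differences enter the hash value, the slightly distorted energies yield the same bit sequence as the reference in this example, whereas the unrelated energies yield a bit sequence that differs in every position.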
In order to perform a recognition of audio material, the audio signal is typically split into small units of length Δt. These individual units are each analyzed individually to have at least a certain time resolution.
This causes several problems.
The recognition results of the small analyzed time periods of the audio signal have to be put together so that an unambiguous correct statement on the recognized audio signal can be made for a longer time period.
For the analysis of a continuous audio data stream, transitions from one audio element to another, i.e. a transition from a piece of music A to a piece of music B, should be detected correctly.
There is further the situation in which there are several versions of a piece of music, which, for example, have the same beginning and only start to differ after a certain time, such as short versions or maxi versions of a song. Alternatively, there are also situations in which pieces of music that are based on the same song differ at the beginning, have an identical middle part and again differ from each other towards the end of at least one of the two pieces of music. For the payment of royalties to copyright holders, it may be important whether, for example, the maxi version of a song may be played for a higher charge, whether only a normal version may be played for a medium charge, or whether only the short version of a song may be played for a low charge. In this case, it should be possible to reliably distinguish several versions of a song.
The above prior art is unsatisfactory in that it results in detection errors when the results of the individual recognitions are simply put together. In particular, no information is given as to whether and how a continuous audio data stream made up of several different audio objects may be analyzed, and how corresponding transitions between various audio objects may be detected. In addition, although the ambiguity of reference hash values is mentioned, particularly in the latter prior art, no explicit solution for the problem of the determination of an unambiguous candidate is given. If an audio object is considered to be identified for a hash value, it is only examined for the directly subsequent hash value whether it fits the identified audio object. If this is not the case, there is a new search including all reference audio objects.
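The sequential procedure criticized above, and the kind of error it may produce for different versions of a song, may be illustrated by the following hedged Python sketch. The function name label_stream, the reference identifiers and the rule of picking the alphabetically first candidate are assumptions made for illustration only, since the prior art gives no rule for choosing among ambiguous candidates.

```python
def label_stream(unit_hashes, reference_index):
    """Label each analyzed unit of a stream with one reference audio object.

    unit_hashes holds one hash value per unit of length delta t.
    reference_index maps a hash value to the set of references containing it.
    """
    labels = []
    current = None
    for hash_value in unit_hashes:
        candidates = reference_index.get(hash_value, set())
        if current is not None and current in candidates:
            # The subsequent hash value fits the identified audio object.
            labels.append(current)
            continue
        # Otherwise a new search including all reference audio objects is made.
        # Assumption: ambiguity is resolved by an arbitrary but fixed choice.
        current = sorted(candidates)[0] if candidates else None
        labels.append(current)
    return labels


# Illustrative data only: a short version and a maxi version share their
# beginning (hash values 1 and 2); hash value 3 occurs only in the maxi version.
reference_index = {
    1: {"song_edit", "song_maxi"},
    2: {"song_edit", "song_maxi"},
    3: {"song_maxi"},
}

# The stream actually contains the maxi version from start to end.
labels = label_stream([1, 2, 3], reference_index)
```

In this example, the first two units are attributed to the short version and only the third unit to the maxi version, so that simply putting the individual results together suggests a transition between two audio objects that did not take place.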
In particular, no solution for distinguishing different versions of one and the same song is known in the prior art.