1. Field of the Invention
The present invention relates to the processing of information signals, and in particular to the extraction of features from information signals, for example in order to characterize the information signals or to identify them and search for them in a database.
2. Description of the Related Art
Concepts by which time signals with harmonic content, such as audio data, can be identified and referenced are useful for many users. In particular, when an audio signal is present whose title and author are unknown, it is often desirable to find out who the corresponding song is by. Such a need exists, for example, when someone wishes to acquire a CD of the performer concerned. If the audio signal at hand includes only the time signal content, but no information on the performer, the music publisher, etc., it is not possible to identify the origin of the audio signal or who a song is by. The only hope is then to hear the audio piece once again together with reference data concerning the author or the source from which the audio signal may be acquired, so as to be able to obtain the desired title.
On the Internet, it is not possible to search for audio data using conventional search engines, because these search engines can only deal with textual data. Audio signals, or, more generally speaking, time signals having a harmonic content, cannot be processed by such search engines unless they include textual search indications.
A realistic inventory of stored audio files ranges from several thousand up to hundreds of thousands of audio files. Music database information may be stored on a central Internet server, and potential search queries could take place via the Internet. Alternatively, with today's hard disk capacities, central music databases are also possible on users' local hard disk systems. It is desirable to be able to search such music databases to obtain reference data on an audio file of which only the file itself, but no reference data, is known.
In addition, it is likewise desirable to be able to search music databases using predetermined criteria, for example with the aim of finding similar pieces. Similar pieces are, for example, pieces with a similar melody, a similar set of instruments, or simply with similar sounds, such as the roaring of the sea, the twittering of birds, male voices, female voices, etc.
U.S. Pat. No. 5,918,223 discloses a method and an apparatus for content-based analysis, storage, retrieval and segmentation of audio information. This method is based on extracting several acoustic features from an audio signal. Volume, bass, pitch, brightness and mel-frequency cepstral coefficients are measured in a time window of a given length at periodic intervals. Each measurement data set consists of a series of measured feature vectors. Each audio file is specified by the complete set of the feature series calculated per feature. Furthermore, the first derivatives of each series of feature vectors are calculated. Then, statistical values, such as the mean and the standard deviation, are calculated. This set of values is stored in an N vector, i.e. a vector with N elements. This procedure is applied to a multiplicity of audio files to derive an N vector for each audio file. In this way, a database of a multiplicity of N vectors is gradually built up. Using the same procedure, a search N vector is then extracted from an unknown audio file. In a search query, the distance between the search N vector and each of the N vectors stored in the database is then calculated. Finally, the N vector having the minimum distance to the search N vector is output. Data on the author, the title, the acquisition source, etc. are associated with the output N vector, so that an audio file may be identified with regard to its origin.
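The summarize-and-search scheme described above may be illustrated by the following minimal Python sketch. It is not taken from the cited patent: the function names `n_vector` and `nearest`, the choice of Euclidean distance, the population standard deviation, and the two-feature toy data are all assumptions made purely for illustration, and the per-window feature measurement itself is assumed to have been carried out already.

```python
import math
from statistics import mean, pstdev

def n_vector(feature_series):
    """Reduce a series of per-window feature vectors to one fixed-length vector.

    feature_series: list of feature vectors (equal-length lists of floats),
    one per time window. For each feature, the mean and standard deviation of
    the values and of their first differences are concatenated.
    """
    summary = []
    for values in zip(*feature_series):  # one tuple of values per feature
        deltas = [b - a for a, b in zip(values, values[1:])]
        summary += [mean(values), pstdev(values), mean(deltas), pstdev(deltas)]
    return summary

def nearest(query, database):
    """Return the key of the stored vector with minimum distance to the query.

    Euclidean distance is an assumption of this sketch.
    """
    return min(database, key=lambda key: math.dist(query, database[key]))

# Hypothetical toy data: two features per window (say, volume and brightness).
database = {
    "piece A": n_vector([[0.1, 0.5], [0.2, 0.6], [0.3, 0.7], [0.4, 0.8]]),
    "piece B": n_vector([[0.9, 0.1], [0.8, 0.3], [0.9, 0.1], [0.8, 0.3]]),
}

# A slightly perturbed version of piece A serves as the unknown audio file.
query = n_vector([[0.12, 0.5], [0.21, 0.61], [0.3, 0.72], [0.41, 0.8]])
print(nearest(query, database))
```

In this sketch every audio file is represented by four statistics per feature, regardless of its length, which is what makes the stored vectors directly comparable by a single distance calculation.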
This method has the disadvantage that several features are calculated and arbitrary heuristics are introduced for calculating the characteristic quantities. Because the mean and the standard deviation are calculated across all feature vectors of an entire audio file, the information contained in the temporal progression of the feature vectors is reduced to a few feature quantities. This leads to a high loss of information.
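The loss of temporal information may be illustrated by a small Python sketch. The two series and the helper `summarize` are hypothetical and chosen only for illustration: the series contain the same values, and the same first differences, in a different temporal order, so that the mean and the standard deviation of both the values and the differences coincide.

```python
from statistics import mean, pstdev

def summarize(values):
    """Mean and standard deviation of a series and of its first differences."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return (mean(values), pstdev(values), mean(deltas), pstdev(deltas))

# Two hypothetical progressions of a single feature over five time windows.
series_a = [0.0, 1.0, 0.0, 2.0, 0.0]  # small peak first, large peak second
series_b = [0.0, 2.0, 0.0, 1.0, 0.0]  # large peak first, small peak second

print(series_a == series_b)                        # the progressions differ
print(summarize(series_a) == summarize(series_b))  # the summaries do not
```

Under these assumptions the two progressions cannot be told apart from their statistical summaries, although their courses over time are clearly different.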
Basically, all so-called features employed for the identification of information signals have to fulfill two opposing requirements. The first requirement is to provide a characterization of an information signal that is as good as possible. The second requirement is that the feature must not require a particularly large amount of storage space, i.e. must contain as little information as possible. With regard to storage space, smaller features lead directly to smaller information signal databases and also to faster database searches when a qualitative, or even a quantitative, statement is to be made on an information signal to be tested.
A further important requirement for the feature to be extracted from the information signal is that the feature should be robust against changes. Such changes include system-inherent noise and distortion, e.g. due to a lossy encoding method. Other signal changes, taking an audio signal as an example, are changes in volume as well as distortions due to playing an audio signal via a loudspeaker and re-recording the audio signal via a microphone, etc.