Concepts by means of which time signals having a harmonic portion, such as audio data, can be identified and referenced are useful for many users. Especially where there is an audio signal whose title and author are unknown, it is often desirable to find out who the song originates from. Such a need exists, for example, if there is a desire to acquire a CD by the performer in question. If the audio signal at hand includes only the time-signal content but no information about the performer, the music publisher, etc., it is not possible to identify the origin of the audio signal or the person or institution the song originates from. The only hope so far has been to hear the piece once again together with reference data about the author or about the source from which the audio signal may be purchased, so as to be able to procure the desired song.
It is not possible to search audio data using conventional search engines on the Internet, since such search engines can only deal with textual data. Audio signals or, more generally speaking, time signals having a harmonic portion cannot be processed by such search engines unless they include textual search information.
A realistic stock of audio files comprises anywhere from several thousand to hundreds of thousands of stored audio files. Music database information may be stored on a central Internet server, and search enquiries may be made via the Internet. Alternatively, with today's hard disc capacities, it would also be feasible to keep these central music databases on users' local hard disc systems. It is desirable to be able to browse such music databases to obtain reference data about an audio file of which only the file itself, but no reference data, is known.
In addition, it is equally desirable to be able to browse music databases using specified criteria, for example in order to find similar pieces. Similar pieces are, for example, pieces which have a similar tune, a similar set of instruments or simply similar sounds, such as the sound of the sea, bird sounds, male voices, female voices, etc.
U.S. Pat. No. 5,918,223 discloses a method and an apparatus for content-based analysis, storage, retrieval and segmentation of audio information. This method is based on extracting several acoustic features from an audio signal. Volume, bass, pitch, brightness and Mel-frequency-based cepstral coefficients are measured at periodic intervals in a time window of a specific length. Each set of measurement data consists of a series of measured feature vectors. Each audio file is specified by the complete set of the feature sequences calculated for each feature. In addition, the first derivatives are calculated for each sequence of feature vectors. Then statistical values such as the mean value and the standard deviation are calculated. This set of values is stored in an N vector, i.e. a vector with N elements. This procedure is applied to a plurality of audio files to derive an N vector for each audio file. In doing so, a database is gradually built from a plurality of N vectors. A search N vector is then extracted from an unknown audio file using the same procedure. In a search enquiry, the distance between the search N vector and each of the N vectors stored in the database is then calculated. Finally, the N vector which is at the minimum distance from the search N vector is output. The N vector output has data about the author, the title, the supply source, etc. associated with it, so that an audio file may be identified with regard to its origin.
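The procedure described above may be illustrated by the following minimal Python sketch. It is not taken from the cited patent: the function names `summarize` and `nearest`, the two example features, the use of simple frame-to-frame differences as the first derivative, and the unweighted Euclidean distance are all assumptions made purely for illustration.

```python
import math
import statistics


def summarize(feature_sequences):
    """Collapse per-frame feature sequences into one fixed-length vector.

    feature_sequences maps a feature name to its list of per-frame values
    (at least two frames each). For every feature, and for its first
    difference, the mean and the standard deviation are appended, so the
    result has 4 values per feature.
    """
    vector = []
    for name in sorted(feature_sequences):  # fixed order keeps vectors comparable
        values = feature_sequences[name]
        # First difference as a stand-in for the first derivative (assumption).
        deltas = [b - a for a, b in zip(values, values[1:])]
        for seq in (values, deltas):
            vector.append(statistics.mean(seq))
            vector.append(statistics.pstdev(seq))
    return vector


def nearest(query_vector, database):
    """Return the (metadata, vector) entry closest to query_vector.

    Euclidean distance is an assumption; the source only speaks of a distance.
    """
    return min(database, key=lambda entry: math.dist(query_vector, entry[1]))


# Hypothetical database of two files, each with made-up feature values.
database = [
    ({"title": "Track A"},
     summarize({"volume": [0.2, 0.4, 0.3, 0.5],
                "pitch": [220.0, 222.0, 221.0, 223.0]})),
    ({"title": "Track B"},
     summarize({"volume": [0.8, 0.7, 0.9, 0.8],
                "pitch": [440.0, 438.0, 441.0, 439.0]})),
]

# Search vector extracted from an "unknown" file using the same procedure.
query = summarize({"volume": [0.21, 0.41, 0.29, 0.5],
                   "pitch": [220.5, 221.5, 221.0, 223.0]})
match = nearest(query, database)
print(match[0]["title"])  # prints "Track A"
```

In this sketch the pitch values dominate the distance simply because of their larger numerical range; a practical system would presumably normalize or weight the individual features, which is not shown here.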
The disadvantage of this method is that several features are calculated, and arbitrary heuristics may be introduced for calculating the characteristic quantities. Because the mean value and the standard deviation are calculated across all feature vectors of a whole audio file, the information contained in the temporal progression of the feature vectors is reduced to a few feature quantities. This leads to a high loss of information.
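The loss of temporal information may be illustrated with a small, purely hypothetical Python example. The two value sequences below are invented for this purpose and the helper `summary` is an assumption, not part of the cited method; it computes the mean and the standard deviation of a sequence and of its frame-to-frame differences.

```python
import statistics


def summary(values):
    """Mean and standard deviation of a sequence and of its first difference."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return (statistics.mean(values), statistics.pstdev(values),
            statistics.mean(deltas), statistics.pstdev(deltas))


# Two invented feature contours: the large peak comes late in one, early in the other.
contour_a = [0, 1, 0, 2, 0]
contour_b = [0, 2, 0, 1, 0]

print(contour_a == contour_b)                    # prints False
print(summary(contour_a) == summary(contour_b))  # prints True
```

Both contours contain the same values and the same differences, only in a different order, so their statistical summaries coincide although their temporal progression differs.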
Prior art methods for sound signal analysis are, therefore, disadvantageous in that they all rely on a certain kind of time/frequency transform or on some kind of time or frequency pattern recognition, etc. All these algorithms either completely ignore the fact that the receiver of the sound signal is a human being or take this fact into account only to a small degree in the sound analysis procedure. Although it is known from audio-signal compression techniques based on a psycho-acoustic model that sound signals include a huge amount of irrelevant portions, i.e., sound signal information which is not used by the human being for audio recognition, the prior art methods for sound signal analysis ignore this. Although one might consider performing a music analysis on signals from which irrelevant portions have been removed, for example by means of a quantization procedure based on a perceptual model, such concepts are also problematic in that they are not consistently driven by the fact that, in the final analysis, the sole intended receiver of music is a human being rather than a computer or a sound signal database, etc.