The present invention relates to the field of music information retrieval systems, in particular to a system that can be applied to retrieve information about a played, sung or hummed melody stored e.g. in a database.
Traditional ways of querying music databases, where a user has to type in the title of a song, the name of a performer or any other information referring to a specific song, are limited by the growing number of songs stored in said music databases, which makes it difficult for the user to find the song he/she wishes to hear.
An example of a content-based retrieval method is query-by-humming (QbH). QbH systems aim at finding a desired piece of music by accepting queries in the form of sung, hummed or whistled tunes, e.g. when a user wants to find a song in a music library but has forgotten its title or composer.
One of the first QbH systems was developed and described in 1995 by A. Ghias, J. Logan, D. Chamberlin, and B. C. Smith in their article “Query by Humming, Musical Information Retrieval in an Audio Database” (Proc. of ACM Multimedia Conf., pp. 231-236, 1995). The QbH system makes it possible to find a song even though the user only knows its melody. It thereby provides a very fast and effective query method when looking for a particular song in a large music database.
As depicted in FIGS. 1a-c, 2a and 2b, a QbH system basically takes a hummed melody as input data and compares it with the songs stored in an integrated database. The output data of the QbH system is usually a list of songs ranked in order of similarity. The first song listed should therefore be the song being searched for. Since the comparison has to be done between two media of the same type, the hummed melody as well as the files of the database have to be transformed into a format that allows the comparison to be made. For this reason, the hummed melody is first transcribed into musical notation from which the relevant information is then extracted. The extraction of note information from a stave is also known as a “description”. As depicted in FIG. 2b, the files stored in said database, which contain the scores of stored songs, go through the same description procedure. Thereby, musical key characteristics (descriptors) are extracted from said files, and a new database of files is created which are in the same format as the transformed hummed melody.
Recent works on QbH are mainly focused on melody representations, similarity measures and matching processing. In some works, only pitch contours (which means the intervals and interval directions of a melody) are used to represent a song. A three-state QbH system, a so-called “UDS system”, is based on the assumption that a typical person does not hum correctly. This is the case for two reasons: first, people make mistakes in remembering the song they wish to hum, and second, people make mistakes in actually humming the song correctly. Based on this assumption, scientists have created a UDS system which tolerates these kinds of errors.
A UDS system transcribes a hummed tune into musical notation, describes that notation as a string of the letters U, D and S, and compares this string to the UDS strings derived from the songs stored in a database. The description is based on the intervals between the recognized notes of the hummed tune. As illustrated in FIGS. 3a-c, an ascending interval is coded by the letter U (“up”), a descending interval is coded by the letter D (“down”), and a “null interval” (a perfect prime) is coded by the letter S (“same”). Finally, the same description is applied to the melodies of various songs stored in the database, and a comparison is made between the UDS string derived from the hummed tune and the UDS string of each stored melody.
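The UDS description and comparison described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the described system: notes are represented as MIDI note numbers, the similarity measure is a plain Levenshtein (edit) distance between whole UDS strings, and the names `to_uds`, `edit_distance`, `rank_songs`, `song_a` and `song_b` are hypothetical. A practical system would also have to match a hummed fragment against part of a song, which this sketch does not attempt.

```python
def to_uds(pitches):
    """Encode a pitch sequence as a string of interval directions (U, D, S)."""
    letters = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            letters.append("U")   # ascending interval
        elif cur < prev:
            letters.append("D")   # descending interval
        else:
            letters.append("S")   # null interval (perfect prime)
    return "".join(letters)


def edit_distance(a, b):
    """Levenshtein distance; an assumed similarity measure, not taken from the text."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def rank_songs(hummed_pitches, database):
    """Return song titles ordered from most to least similar to the hummed tune."""
    query = to_uds(hummed_pitches)
    scored = [(edit_distance(query, to_uds(pitches)), title)
              for title, pitches in database.items()]
    return [title for _, title in sorted(scored)]


# Hypothetical two-song database, pitches given as MIDI note numbers.
database = {
    "song_a": [60, 60, 67, 67, 69, 69, 67],
    "song_b": [64, 62, 60, 62, 64, 64, 64],
}
# A hummed query: the opening of song_a, sung a whole tone higher.
hummed = [62, 62, 69, 69, 71, 71, 69]
ranking = rank_songs(hummed, database)
```

Here `ranking` places `song_a` first, because the hummed tune and `song_a` yield the same UDS string even though every note differs.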
As this method deals with interval directions and not with the particular notes of a hummed song's melody, the system works independently of the key of the hummed melody and tolerates wrong notes as long as the interval directions of the hummed tune are correct. The QbH system thus gives a lot of freedom to the hummer, who just needs to be able to distinguish between ascending intervals (U), descending intervals (D) and so-called “null intervals” (S), which means perfect primes.
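The key independence and the tolerance of wrong notes can be checked with a short sketch. It again assumes, purely for illustration, that notes are given as MIDI note numbers; the function name `to_uds` and the example melodies are hypothetical.

```python
def to_uds(pitches):
    """Encode a pitch sequence as a string of interval directions (U, D, S)."""
    return "".join(
        "U" if cur > prev else "D" if cur < prev else "S"
        for prev, cur in zip(pitches, pitches[1:])
    )


original = [60, 62, 64, 60]                  # reference melody
transposed = [p + 5 for p in original]       # same melody in another key
off_pitch = [60, 63, 65, 58]                 # wrong notes, correct directions
wrong_direction = [60, 62, 61, 60]           # third note falls instead of rising

same_in_other_key = to_uds(transposed) == to_uds(original)         # True
same_with_wrong_notes = to_uds(off_pitch) == to_uds(original)      # True
same_with_wrong_turn = to_uds(wrong_direction) == to_uds(original) # False
```

Transposition and inaccurate interval sizes leave the UDS string unchanged, whereas a single interval hummed in the wrong direction changes it, which is the kind of error a tolerant comparison step would have to absorb.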