1. Field of the Invention
The present invention relates to multimedia asset management systems, and in particular to the location and retrieval of multimedia files based on a graphically entered music search.
2. Background Information
Audio searching of large multimedia databases has many important applications. Large multimedia databases or collections can contain both audio and video files. Conventional systems store and retrieve specific information from a database using, for example, descriptive information regarding the file, such as file creation date, file name, file extension and the like. This form of data search and retrieval is not significantly different from that used for any other digital information.
By relying on the file information, only cursory information can be obtained about the file and nothing at all specifically related to the audio content of the file. For example, an audio file could have a name that has no relation to the features or content of the file; a file containing samples of barking dogs could have the file name "cats". Other systems can provide additional information based on the content of the file. However, this is usually done by keyword annotation, which is a laborious task.
Multimedia databases containing music files can have a variety of formats. However, the Musical Instrument Digital Interface (MIDI) format, which has been used since 1983, is the most prevalent. The MIDI format has many advantages for representing music in a digital form. One of the most relevant features of the MIDI format for musical searching is the standardization of the musical scale into a range of integers, from 0 to 127. For example, middle C is assigned an integer value of 60, and the notes above and below middle C are represented by corresponding integers (e.g., the C# above middle C is MIDI note 61). Additionally, the MIDI format allows for multiple tracks containing musical notes, percussion, timing, and the like, which provides a rich environment for digitally describing a musical piece. Therefore, the MIDI format is used in the following description. However, those skilled in the art will appreciate that the invention can be practiced on any file format that can be stored in a searchable format. Further, those skilled in the art will appreciate that the music files can be stored in related databases, where a searchable data set (e.g., MIDI files) is linked to a data set containing music files that are not easily searchable (e.g., raw audio files).
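The integer note numbering described above can be illustrated with a minimal sketch. The function name `note_to_midi` and the octave labeling (middle C written as C4) are assumptions made for illustration only; the MIDI format fixes the integers 0 to 127, not the octave labels.

```python
# Semitone offsets of the twelve pitch classes within one octave.
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def note_to_midi(name: str, octave: int) -> int:
    """Map a note name and octave to a MIDI note number (0 to 127).

    Assumption: middle C is labeled C4, so that C4 maps to 60.
    """
    number = 12 * (octave + 1) + NOTE_OFFSETS[name]
    if not 0 <= number <= 127:
        raise ValueError(f"{name}{octave} is outside the MIDI range 0-127")
    return number


if __name__ == "__main__":
    print(note_to_midi("C", 4))   # middle C
    print(note_to_midi("C#", 4))  # the C# above middle C
```

Under this labeling, middle C evaluates to 60 and the C# above it to 61, in agreement with the values given above.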
The amount of multimedia information available today, due to the evolution of the internet, low-cost devices for generating multimedia content (e.g., digital video cameras, digital cameras, video capture cards, MIDI devices, audio cards, digital audio, and the like), and low-cost storage (e.g., hard disks, CDs, DVDs, flash memory, and the like), increases the need to search and retrieve relevant multimedia data efficiently. Unlike text-based retrieval, where keywords are successfully used to index into documents, multimedia data retrieval has no easily accessed indexing feature.
One approach to searching audio portions of multimedia collections is to hum a portion of the audio as the search criteria. A query by humming system is described in Ghias et al., "Query by Humming: Musical Information Retrieval in an Audio Database", ACM Multimedia 95 Proceedings, 1995, pages 231-236, which is hereby incorporated by reference in its entirety. Query by humming requires a user to hum a portion of the audio file, which is then converted into a musical contour (i.e., a relative pitch stream of audio symbols). The musical contour can be represented as a simple string of characters, such as "U, D, S", where U represents that the current note is higher than the previous note, D represents that the current note is lower than the previous note, and S represents that the current note is the same pitch as the previous note. Files in a multimedia database that are being searched can be converted to this nonstandard string representation, so these files can be compared to the hummed query.
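The contour representation described above can be sketched as follows. This is an illustrative simplification, not the implementation of Ghias et al.: the function names `contour` and `matches` are hypothetical, pitches are assumed to be given as MIDI note numbers, and exact substring comparison stands in for whatever tolerant matching a practical system would use against imperfect humming.

```python
def contour(pitches):
    """Convert a sequence of pitches into a U/D/S contour string.

    Each symbol compares a note with the note before it:
    U = higher, D = lower, S = same pitch.
    """
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("U")
        elif current < previous:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def matches(query_pitches, candidate_pitches):
    """Return True if the query contour occurs inside the candidate contour.

    Assumption: exact substring matching, used here only for illustration.
    """
    return contour(query_pitches) in contour(candidate_pitches)


if __name__ == "__main__":
    melody = [60, 62, 62, 59]            # stored piece, as MIDI note numbers
    print(contour(melody))               # contour string of the stored piece
    print(matches([64, 65, 65], melody)) # same shape at a different pitch
```

Because only the direction of each pitch change is kept, a query hummed in a different key produces the same contour as the stored melody. The same property discards absolute pitch and note duration.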
Although the query by humming system allows the user to search a database containing audio files based on the content of the audio file, it is limited to only the melody of the music. Important features of the audio file are not searched, such as the rhythm section, bass line and percussion of the musical piece. Additionally, the string representation does not correspond to traditional musical timing, which eliminates another valuable part of the musical composition, i.e., the duration of the notes (eighth, quarter, whole, etc.) is not used to search the audio files. The query by humming system relies on the user's ability to accurately reproduce a portion of the desired audio file by humming. Therefore, the user's ability to hum is a key and uncontrollable variable in determining the performance of the system.
Another system is disclosed in an article by Bainbridge et al., "Towards a Digital Library of Popular Music", Proceedings of the Fourth ACM Conference on Digital Libraries, 1999, pages 161-169, which is hereby incorporated by reference in its entirety. The system provides two modes of searching music databases. First, a text-based query allows a user to search for keywords that have been associated with the music file. For example, a user can search on composers, artists, lyrics, and the like. However, these keywords are only descriptive and are only as reliable as the association (i.e., a music file can have incorrect keywords associated). Therefore, the keyword searching does not provide any search capability of the actual audio portion of the file. A second search mode provides melody-based searching. The melody-based searching converts an audio input to a sequence of notes. The sequence of notes is then used to search a database of musical files. The second search mode allows a user to search the actual content of the musical files to retrieve appropriate matches. However, the audio input is based on a user who sings, hums, whistles, etc. the melody. Therefore, the skill of the user (e.g., the ability to successfully hum or whistle a tune) remains a problem for use by the general population.
Alternatively, the search can be entered by a MIDI keyboard, thereby avoiding potential conversion errors from raw audio input to notes. However, musical knowledge and playing skill are still required of the user to successfully input the melody searches. Combined searches using both the text and melody-based searching modes are also disclosed by Bainbridge et al. These combined searches allow users to narrow the scope of the melody-based search to, e.g., certain artists, etc. Although the combined search can narrow the scope of melody-based searching, it does not overcome the previously mentioned problems of each system. For example, an incorrect label in the artist field may exclude the desired music file, and a poorly hummed input will not be helped by restricting the scope of the database that is searched.
Still another musical search system is described in an article by Lemström, K. and Perttu, S., "SEMEX - An Efficient Music Retrieval Prototype", First International Symposium on Music Information Retrieval (ISMIR 2000), 2000, which is hereby incorporated by reference in its entirety. The SEMEX (Search Engine for Melodic Excerpts) system relies on the pitch level of notes, similar to the prior discussed systems. The pitch levels are represented as integers, ranging from 0 to r that correspond to various representations of the musical files (e.g., r=2 for musical contouring such as used by the Ghias et al. system, r=10 for QPI classification, and r=127 for MIDI files). The input is a sequence of sets of integers that correspond to individual notes or chords and is a subset of the pitch levels. The query is in the form of integers that can be a subset of the input sequence, if the input sequence is long. The music query is generated by a pitch estimator that receives a digital audio input and converts it into a symbolic form (i.e., a string of integers that is compatible with the music files in the database). In addition to melody searching (i.e., monophonic), the SEMEX system allows for searching of chords (i.e., polyphonic).
A non-profit collaborative project of the Center for Computer Assisted Research in the Humanities (CCARH) at Stanford University and the Cognitive and Systematic Musicology Laboratory at the Ohio State University, entitled Themefinder, is directed to a music search engine for music themes. The search engine is a nongraphical system which conducts searches based on specialized musicological features.
Current systems for creating queries for searching musical databases are based on the user's musical talent to recreate (i.e., hum, whistle, sing, etc.) a portion of the musical file that the user desires to retrieve. Therefore, the performance of these systems varies wildly from user to user. It is desired to have a method and system for graphically creating musical queries that provide audio feedback to improve the accuracy of the query generation. Additionally, it is desired to have an interactive system that allows a user to adjust the musical query based on the audio feedback.
The present invention is directed to methods and systems for querying and retrieving multimedia data files. An exemplary method comprises: graphically generating a musical segment that represents a portion of a desired piece of music; providing audio feedback to a user by playing the musical segment; and generating a musical query based on the musical segment.
An exemplary system for creating a musical query comprises logic that graphically generates a musical segment that represents a portion of a desired piece of music; logic that provides audio feedback to a user by playing the musical segment; and logic that generates a musical query based on the musical segment.