The present application is directed to a method for recognizing and distributing music, and more particularly to a method for recognizing a musical composition from a specimen that is provided by a customer (as by humming, singing, or otherwise vocalizing the specimen or by picking it out on a simulated piano or other tone generator), and for permitting a customer to preview a musical composition before distributing the composition to the customer over the internet.
The internet (and particularly the World Wide Web) is becoming an important vehicle for distributing music, usually in encoded form. Web sites currently (1999) exist that distribute music in an encoded format known as “MP3.” So-called “jukebox” programs are also available which permit MP3 files that have been downloaded over the internet to be stored and played on audio systems. Some authorities speculate that distribution of music over the internet will eventually replace conventional record shops.
Some customers who desire to purchase a recording at a record shop may be familiar with the music itself, but may not be sure of the singer or group that produced the music, or possibly the title of the relevant song or album. In a music shop, such a customer is able to question a shopkeeper, and possibly hum a few bars of the musical composition for the shopkeeper to attempt to identify. Alternatively, music stores frequently permit patrons to sample recordings before buying them, so a customer who is not sure which recording he or she would like to purchase may select a few possible recordings and listen to them until the desired recording is located. There is no harm in permitting a customer to listen to as much of a recording as the customer would like, since the customer cannot legally take a recording from the shop without paying for it.
Speech recognition technology is highly developed. Typically, features are extracted from spoken words and then normalized to provide patterns that are compared to patterns in a pattern library. When a pattern derived from a spoken word matches a pattern in the library sufficiently closely, a phoneme of the spoken word has been found. The features that are extracted from the spoken words may identify the frequencies that are present during extremely brief slices of time, together with the power at each of those frequencies. Sophisticated mathematical operations are then performed on the extracted features in order to generate the patterns for pattern matching.
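The extract, normalize, and match sequence described above can be illustrated with a minimal sketch. It is a toy illustration only, not a description of any particular speech recognizer: pure tones stand in for phonemes, a naive discrete Fourier transform stands in for the "sophisticated mathematical operations," and cosine similarity with a fixed threshold stands in for a "sufficient" match. The function names, the 8000 Hz sample rate, the 64-sample slice, and the 0.9 threshold are all assumptions chosen for the example.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each frequency bin of one brief time slice (naive DFT)."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        powers.append(abs(coeff) ** 2)
    return powers


def normalize(features):
    """Scale a feature vector to unit length so loudness does not matter."""
    length = math.sqrt(sum(v * v for v in features))
    return [v / length for v in features] if length > 0 else list(features)


def extract_pattern(frame):
    """Extract features from a time slice and normalize them into a pattern."""
    return normalize(power_spectrum(frame))


def match_pattern(pattern, library, threshold=0.9):
    """Return the label of the best library match, or None if none is close enough."""
    best_label, best_score = None, threshold
    for label, reference in library.items():
        # Both vectors have unit length, so the dot product is the cosine similarity.
        score = sum(a * b for a, b in zip(pattern, reference))
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


def tone(freq_hz, amplitude=1.0, sample_rate=8000, n=64):
    """Hypothetical test signal: a pure sine tone standing in for a spoken sound."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


# A two-entry pattern library built from reference signals.
library = {
    "low": extract_pattern(tone(500)),
    "high": extract_pattern(tone(2000)),
}

# A quieter 500 Hz tone still matches "low" because patterns are normalized.
print(match_pattern(extract_pattern(tone(500, amplitude=0.3)), library))
# A 1250 Hz tone resembles neither library pattern, so no match is reported.
print(match_pattern(extract_pattern(tone(1250)), library))
```

Normalizing before comparison is what lets the same sound match at different volumes. The threshold determines how close a pattern must be to a library entry before a match is declared.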