The invention pertains to analysis of digital music.
As music files proliferate on the Internet and end-user access to them increases, efficient techniques to provide end-users with music summaries that are representative of larger music files are increasingly desired. Unfortunately, conventional techniques to generate music summaries often result in a musical abstract with music transitions uncharacteristic of the song being summarized. For example, suppose a song is one-hundred and twenty (120) seconds long. A conventional music summary may include the first ten (10) seconds of the song and the last 10 seconds of the song appended to the first 10 seconds, skipping the middle 100 seconds of the song. Although this is only an example, and other song portions could have been appended to one another to generate the summary, it emphasizes that the song portions used to generate a conventional music summary are typically not contiguous in time with respect to one another, but are rather an aggregation of multiple disparate portions of a song. Such non-contiguous music pieces, when appended to one another, often present undesired acoustic discontinuities and unpleasant listening experiences to an end-user seeking to hear a representative portion of the song without listening to the entire song.
In view of this, systems and methods to generate music summaries with representative musical transitions are greatly desired.
Systems and methods for extracting a music snippet from a music stream are described. In one aspect, the music stream is divided into multiple frames of fixed length. The most-salient frame of the multiple frames is then identified. One or more music sentences are then extracted from the music stream as a function of peaks and valleys of acoustic energy across sequential music stream portions. The music snippet is the sentence that includes the most-salient frame.
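The steps summarized above can be illustrated with a minimal Python sketch. It is not the described implementation: the summary does not define how frame saliency is measured or how peaks and valleys are detected, so the sketch assumes that the most-salient frame is simply the frame with the highest mean-square energy, and that a sentence boundary falls at any local energy minimum lying below a fraction of the peak energy. The function names and the `valley_ratio` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Divide the stream into fixed-length frames; return each frame's mean-square energy."""
    energies = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def sentence_boundaries(energies: List[float], valley_ratio: float = 0.3) -> List[int]:
    """Return frame indices bounding sentences: 0, each deep valley, and len(energies).

    Assumption: a valley is a local energy minimum below valley_ratio * peak energy.
    """
    if not energies:
        return [0, 0]
    threshold = valley_ratio * max(energies)
    bounds = [0]
    for i in range(1, len(energies) - 1):
        is_local_min = energies[i] <= energies[i - 1] and energies[i] < energies[i + 1]
        if is_local_min and energies[i] < threshold:
            bounds.append(i)
    bounds.append(len(energies))
    return bounds


def extract_snippet(samples: List[float], frame_len: int,
                    valley_ratio: float = 0.3) -> Tuple[int, int]:
    """Return (start, end) sample offsets of the sentence containing the most-salient frame."""
    energies = frame_energies(samples, frame_len)
    if not energies:
        raise ValueError("stream is shorter than one frame")
    # Assumption: saliency is approximated by frame energy.
    salient = max(range(len(energies)), key=energies.__getitem__)
    bounds = sentence_boundaries(energies, valley_ratio)
    for lo, hi in zip(bounds, bounds[1:]):
        if lo <= salient < hi:
            return lo * frame_len, hi * frame_len
    raise AssertionError("salient frame not covered by any sentence")
```

Because the returned range is a single contiguous span of the stream, the resulting snippet avoids the splice points that arise when disparate song portions are appended to one another.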