Computer technology is continually advancing, giving computers ever-increasing capabilities. One such capability is audio information retrieval, which refers to the retrieval of information from an audio signal. This information can be the underlying content of the audio signal (e.g., the words being spoken), or information inherent in the audio signal (e.g., the point at which the audio changes from a spoken introduction to music).
One fundamental aspect of audio information retrieval is classification. Classification refers to placing the audio signal (or portions of the audio signal) into particular categories. A broad range of categories would be beneficial in audio information retrieval, including speech, music, environment sound, and silence. Current techniques classify audio signals as speech or music, and either do not allow for classification of audio signals as environment sound or silence, or perform such classifications poorly (e.g., with a high degree of inaccuracy).
Additionally, when the audio signal represents speech, separating the audio signal into different segments corresponding to different speakers could be beneficial in audio information retrieval. For example, a separate notification (such as a visual notification) could be given to a user to inform the user that the speaker has changed. Current classification techniques either do not allow for identifying speaker changes or identify speaker changes poorly (e.g., with a high degree of inaccuracy).
The audio segmentation and classification described below addresses these disadvantages, providing improved segmentation and classification of audio signals.