1. Field of the Invention
This invention generally relates to the field of speech recognition, and more particularly relates to a system and method for segmenting audio signals into different classes that separate segments of voice activity from silence and tones in order to more accurately transcribe speech.
2. Description of Related Art
The process of automatic voice recognition and transcription has gained tremendous popularity and importance in recent years. Today, voice recognition techniques are used in numerous applications such as closed captioning, speech dictation, and surveillance.
In automated speech recognition, the ability to separate segments of voice activity from other audio has become increasingly important as the desire to apply automatic voice processing to real-world audio signals grows. Such audio signals often consist of voice segments interspersed with segments of silence and other sounds such as tones or music. Anomalies within an audio signal, such as a random burst of noise, silence, or music, cause errors when the speech segments are processed or transcribed. Therefore, before the voice segments can be processed automatically, they must first be separated from the other audio.
Hidden Markov models (HMMs) are commonly used to model random processes such as speech production. Others have tried segmenting speech and music with a single HMM using minimum duration constraints. However, these methods require the durations of the different segments to be known beforehand, and they do not allow segments shorter than the predetermined minimum duration.
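To illustrate why a minimum duration constraint rules out short segments, the following is a minimal sketch of the prior-art approach, not the method of the present invention. It assumes that each class (e.g., speech and music) is expanded into a left-to-right chain of `min_dur` states inside one HMM, and that per-frame, per-class log-likelihoods are already available from some acoustic model. The function name `min_duration_viterbi` and the parameters `frame_loglik`, `min_dur`, and `p_stay` are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def min_duration_viterbi(frame_loglik: Sequence[Sequence[float]],
                         min_dur: int, p_stay: float = 0.9) -> List[int]:
    """Label each frame with a class index using a single HMM in which every
    class is a left-to-right chain of `min_dur` states (hypothetical sketch)."""
    n_frames, n_classes = len(frame_loglik), len(frame_loglik[0])
    if n_classes < 2 or n_frames < min_dur:
        raise ValueError("need >= 2 classes and >= min_dur frames")
    log_stay = math.log(p_stay)
    # Leaving a class: probability split evenly among the other classes.
    log_switch = math.log((1.0 - p_stay) / (n_classes - 1))
    n_states = n_classes * min_dur

    # delta[s]: best log score of any path ending in state s at frame t.
    # Paths may only start in the first state of a class chain.
    delta = [NEG_INF] * n_states
    for c in range(n_classes):
        delta[c * min_dur] = -math.log(n_classes) + frame_loglik[0][c]

    backptrs = []
    for t in range(1, n_frames):
        new, bp = [NEG_INF] * n_states, [0] * n_states
        for s in range(n_states):
            c, k = divmod(s, min_dur)
            if k > 0:
                # Inside a chain: the only predecessor is the previous state.
                cands = [(delta[s - 1], s - 1)]
            else:
                # Chain entry: reached only from the last state of another class.
                cands = [(delta[o * min_dur + min_dur - 1] + log_switch,
                          o * min_dur + min_dur - 1)
                         for o in range(n_classes) if o != c]
            if k == min_dur - 1:
                # Last state of a chain may loop, extending the segment.
                cands.append((delta[s] + log_stay, s))
            score, prev = max(cands)
            new[s] = score + frame_loglik[t][c]
            bp[s] = prev
        delta = new
        backptrs.append(bp)

    # The final segment must also reach the end of its chain (>= min_dur).
    state = max((c * min_dur + min_dur - 1 for c in range(n_classes)),
                key=lambda s: delta[s])
    path = [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    path.reverse()
    return [s // min_dur for s in path]


if __name__ == "__main__":
    # Class 0 = speech, class 1 = music; a two-frame music burst at frames 4-5.
    frames = [[0.0, -5.0]] * 4 + [[-5.0, 0.0]] * 2 + [[0.0, -4.0]] + [[0.0, -5.0]] * 3
    print(min_duration_viterbi(frames, min_dur=1))
    print(min_duration_viterbi(frames, min_dur=3))
```

With `min_dur=1` the two-frame burst is labeled as its own segment, but with `min_dur=3` it is stretched to three frames, because no path through the chained states can produce a shorter segment. This reflects the limitation of minimum duration constraints noted above.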
Therefore, a need exists to overcome the problems with the prior art discussed above, and particularly for a system and method for segmenting audio into different classes in order to more accurately transcribe speech.