1. Field of the Invention
The present invention relates generally to voice recognition and, more particularly, to speaker identification and segmentation.
2. Description of the Background Art
Speaker segmentation is the automatic detection and tracking of the beginning and end of a speaker's speech, such that each detected speech segment corresponds to only that speaker. A number of applications for speaker segmentation in conversational speech exist, such as automatic speech-to-text translation, online speaker adaptation for automatic speech recognition, and automatic information retrieval and extraction. The conversational speech may be received from a number of different sources, including over the telephone. These applications demand speaker segmentation systems that can automatically detect, track and segment multiple speakers' speech in a conversation.
Traditional approaches to identifying speakers in a conversational speech sample typically rely on prior training, and therefore require some degree of prior knowledge of the speaker's speech characteristics. Many approaches are available for speaker segmentation, such as the Bayesian Information Criterion (“BIC”) and the Generalized Likelihood Ratio (“GLR”). A simple approach is based on the difference between a speaker model and a feature vector. This approach, however, performs poorly, especially when the speech segments are short. The model-based approach performs well in speaker segmentation, but it requires a long, manually labeled speech segment to train the speaker model.
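By way of illustration only, a BIC-style criterion of the kind mentioned above may be sketched as follows. The sketch scores a single candidate change point in a sequence of feature vectors and assumes that each segment is modeled by one full-covariance Gaussian, which is a common textbook formulation and not necessarily the method of any particular system. The function names `delta_bic` and `_logdet_cov`, the `penalty_weight` parameter, and the use of NumPy are assumptions made for this example.

```python
import numpy as np


def _logdet_cov(frames):
    # Log-determinant of the maximum-likelihood (biased) covariance
    # of a block of feature vectors, one vector per row.
    cov = np.cov(frames, rowvar=False, bias=True)
    _, logdet = np.linalg.slogdet(np.atleast_2d(cov))
    return logdet


def delta_bic(frames, split, penalty_weight=1.0):
    """Score a candidate speaker change at row index `split`.

    Compares one Gaussian over all frames against two Gaussians,
    one on each side of the split. A positive value favors the
    two-speaker hypothesis (hypothetical helper, illustration only).
    """
    n, d = frames.shape
    n1, n2 = split, n - split
    if n1 <= d or n2 <= d:
        raise ValueError("each side of the split needs more than d frames")

    # Likelihood gain obtained by modeling the two sides separately.
    gain = 0.5 * (
        n * _logdet_cov(frames)
        - n1 * _logdet_cov(frames[:split])
        - n2 * _logdet_cov(frames[split:])
    )
    # Penalty for the extra mean and covariance parameters.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty
```

In such a sketch, `frames` would be an array of per-frame feature vectors and `split` a candidate boundary; sliding `split` across a window and taking the largest positive score is one way a change point might be located. Because both sides of the split must contain enough frames to estimate a covariance, the estimate tends to be unreliable on short segments, consistent with the limitation noted above.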
Accordingly, what is desired is a robust speaker segmentation methodology for the detection and tracking of individual speakers in conversational speech.