Speaker recognition technology operates by generating a plurality of speaker models associated with respective users. For example, the speaker recognition technology can generate a plurality of Gaussian Mixture Models (GMMs) using the Expectation-Maximization (EM) technique. In a common training technique, the speaker recognition technology produces the speaker models from a training data set of representative audio segments; in one scenario, for example, each user is asked to provide, in a controlled environment with minimal noise, an audio segment that characterizes his or her voice. In a real-time phase of operation, the speaker recognition technology detects speech produced by a particular user in question and matches the detected speech with one of the generated speaker models. The matching speaker model can then be mapped to the identity of the particular user.
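By way of illustration only, the train-then-match flow described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular system: each audio segment is assumed to have already been reduced to a list of scalar feature values (real systems typically use multi-dimensional spectral features), the features are synthetic, and the user names `alice` and `bob` and all function names are hypothetical. One GMM is fitted per user with EM, and detected speech is assigned to the user whose model gives the highest average log-likelihood.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_mixture(x, model):
    """Return (log p(x), per-component log joint terms) for a GMM.

    `model` is a list of (weight, mean, variance) tuples.
    """
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in model]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms)), terms


def fit_gmm(samples, k=2, iters=40, seed=0):
    """Fit a k-component 1-D GMM with Expectation-Maximization."""
    rng = random.Random(seed)
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    # Initialise means from random samples, with equal weights.
    model = [(1.0 / k, m, var) for m in rng.sample(samples, k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            total, terms = log_mixture(x, model)
            resp.append([math.exp(t - total) for t in terms])
        # M-step: re-estimate weight, mean, and variance per component.
        new_model = []
        for j, old in enumerate(model):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:  # component has effectively no data; keep it as-is
                new_model.append(old)
                continue
            mean = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            v = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, samples)) / nj
            new_model.append((nj / n, mean, max(v, 1e-6)))
        model = new_model
    return model


def identify(models, samples):
    """Map detected speech features to the best-scoring user's identity."""
    scores = {
        user: sum(log_mixture(x, m)[0] for x in samples) / len(samples)
        for user, m in models.items()
    }
    return max(scores, key=scores.get)


def synth_voice(rng, centers, n=200, spread=0.5):
    """Synthetic stand-in for features extracted from an audio segment."""
    return [rng.gauss(rng.choice(centers), spread) for _ in range(n)]


# Training phase: one model per (hypothetical) user.
rng = random.Random(42)
profiles = {"alice": [-4.0, -1.0], "bob": [2.0, 5.0]}
models = {user: fit_gmm(synth_voice(rng, c)) for user, c in profiles.items()}

# Real-time phase: score newly detected speech against every model.
detected = synth_voice(rng, profiles["bob"], n=50)
print(identify(models, detected))
```

The sketch selects the maximum-likelihood model unconditionally; it does not include a rejection threshold for speakers who match none of the trained models, and it assumes training and detection features are drawn under similar conditions.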
While generally effective, known speaker recognition technology is not optimally suited for some environments and use scenarios.