Speaker recognition systems are typically trained to recognize and identify the voices of a number of users who are enrolled in the system. As will be appreciated, therefore, a speaker as used herein refers to a person. During the enrollment process, a user will generally utter a few words into a microphone, which captures the audio for use in building a speaker model. In most practical enrollment scenarios, the user is in close proximity to the microphone. During subsequent operation of the recognition system, however, the speaker may be located farther away from the microphone. For example, speakers may be seated around a large conference table with a central microphone, or a user may be speaking from a distance to a “smart home” controller device that responds to audio commands. Because training typically occurs in the near-field of the microphone, while later operational usage (e.g., authentication) may occur in the far-field of the microphone, recognition performance may be degraded, and the system may be unusable in some circumstances, depending on the audio environment. This is due, at least in part, to the fact that sound reflects off walls and other objects, so that the microphone captures delayed, attenuated copies of the speech along with the direct sound, which distorts the signal when it is captured from a distance.
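By way of illustration only, the near-field/far-field mismatch described above can be approximated by convolving a close-microphone signal with a room impulse response. The following is a minimal sketch and not part of the disclosed embodiments. The function names, the synthetic impulse response (a direct path followed by an exponentially decaying noise tail), the 0.5 second reverberation time, and the noise-based stand-in for speech are all assumptions chosen for the example.

```python
import numpy as np

def synthetic_rir(sample_rate=16000, rt60=0.5, length_s=0.6, seed=0):
    """Hypothetical room impulse response: direct sound plus a decaying noise tail."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * length_s)
    t = np.arange(n) / sample_rate
    # Amplitude envelope that falls by 60 dB after rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = 0.3 * rng.standard_normal(n) * decay  # assumed reflection strength
    rir[0] = 1.0  # direct path from speaker to microphone
    return rir

def simulate_far_field(near_field, rir):
    """Apply the impulse response and trim to the original length."""
    return np.convolve(near_field, rir)[: len(near_field)]

def normalized_correlation(a, b):
    """Cosine similarity between two equal-length signals."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(1)
# Stand-in for a one-second enrollment utterance (not real speech):
# broadband noise with a slow, syllable-like amplitude modulation.
near = rng.standard_normal(sample_rate) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

far = simulate_far_field(near, synthetic_rir(sample_rate))
similarity = normalized_correlation(near, far)
print(f"near-field vs. simulated far-field similarity: {similarity:.3f}")
```

In this sketch, a similarity well below 1 indicates that the reflected energy has altered the waveform that a model trained on near-field audio would otherwise expect. A measured impulse response from an actual room could be substituted for the synthetic one.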
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent in light of this disclosure.