Existing Automatic Speech Recognition (ASR) solutions can either use pre-determined speaker-independent models or be trained on a particular individual's voice to produce speaker-dependent models. Speaker-dependent training can use a separate training mode in which the user “actively enrolls” their voice into the system. This active enrollment/training typically requires the user to utter pre-defined and/or user-defined keywords or phrases to be associated with specific commands. The collected utterances are processed to generate a template or model for the speaker's voice. This model is subsequently used in operation of, for example, a mobile device.
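For illustration only, the active enrollment flow described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular ASR system: each utterance is assumed to have already been reduced to a fixed-length feature vector, the template is assumed to be a simple element-wise average, and matching is assumed to use cosine similarity against a threshold. The names `enroll`, `cosine_similarity`, `recognize`, and the threshold value are all hypothetical.

```python
import math


def enroll(utterance_features):
    """Build a template by averaging per-utterance feature vectors.

    Assumption: every utterance is already a fixed-length list of floats.
    """
    if not utterance_features:
        raise ValueError("at least one enrollment utterance is required")
    dim = len(utterance_features[0])
    if any(len(vec) != dim for vec in utterance_features):
        raise ValueError("all feature vectors must have the same length")
    count = len(utterance_features)
    return [sum(vec[i] for vec in utterance_features) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(templates, features, threshold=0.9):
    """Return the command whose template best matches, or None.

    `templates` maps a command name to its enrolled template.
    The threshold is a hypothetical value chosen for illustration.
    """
    best_command, best_score = None, threshold
    for command, template in templates.items():
        score = cosine_similarity(template, features)
        if score >= best_score:
            best_command, best_score = command, score
    return best_command
```

In this sketch, a keyword is associated with a command by storing `enroll(...)` of its collected utterances under the command name, and `recognize` is then called on each new utterance during operation.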
However, speaker-dependent training requires users to spend time actively enrolling their voice and does not allow a system to be used “out of the box.” Additionally, existing solutions typically “lock” the training data after a user has uttered a keyword a few times. The resulting model is built from limited training data, so training quality is poor.