Recently, voice-based digital assistants, such as Apple's Siri, have been introduced into the marketplace to handle various tasks such as web searching and navigation. Currently, voice-based digital assistant systems utilize either speaker-independent models or speaker-dependent models to generate speech-to-text (STT) input to the digital assistant. A speaker-dependent model generates the STT input more accurately than a speaker-independent model and therefore enables the digital assistant to provide better results to the user. However, a speaker-dependent model requires a significant amount of training data before it achieves this increased accuracy. Collecting that data by having the user recite many lines of predefined text has several drawbacks. First, many users would prefer not to expend the time and effort needed to provide training data for a model. Second, a user's speech when reading aloud is markedly different from the user's speech in ordinary conversation, so a speech model trained with data obtained from a user reading is less accurate than one trained with data obtained from the user's ordinary conversation. Finally, a user's speech changes over time and with the environment, so a model trained on a single recitation may not remain accurate.