Intelligent automated assistants (or virtual assistants) provide an intuitive interface between users and electronic devices. These assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can access the services of an electronic device by providing a spoken user input in natural language form to a virtual assistant associated with the electronic device. The virtual assistant can perform natural language processing on the spoken user input to infer the user's intent and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more functions of the electronic device, and a relevant output can be returned to the user in natural language form.
Some natural language processing systems can perform speaker identification to verify the identity of a user. These systems typically require the user to perform an enrollment process during which the user speaks a series of predetermined words or phrases to allow the natural language processing system to model the user's voice. While this process can be used to effectively model the user's voice, it can produce unreliable results if the user speaks in an unnatural manner during the enrollment process and/or if the user performs the enrollment process in an acoustic environment that differs from the environments in which speaker identification is later performed. Thus, improved processes for modeling a user's voice are desired.