Recently, the use of mobile devices such as smartphones and tablet computers has become widespread. These devices typically provide voice and/or data communication functionalities over wireless networks. In addition, such mobile devices typically include other features that provide a variety of functions designed to enhance user convenience.
One increasingly used feature of mobile devices is a speech recognition function. Such a function allows a mobile device to perform various functions when it recognizes a voice command (e.g., a keyword) from a user. For example, the mobile device may activate a voice assistant application, play an audio file, or take a picture in response to the user's voice command.
Manufacturers or carriers often equip conventional mobile devices with sound models that may be used to detect associated keywords. However, such devices generally include only a limited number of sound models and keywords. Accordingly, users may be limited to the keywords and sound models originally provided in the devices. In some devices, users may generate a sound model for detecting a new keyword by training the sound model on a number of utterances of the keyword. Sound models generated in this way from user input may not detect the new keyword very accurately, for example because the keyword was insufficiently sampled.