In recent years, electronic devices such as smartphones, tablet computers, wearable electronic devices, smart TVs, and the like have become increasingly popular among consumers. These devices typically provide voice and/or data communication functionalities over wireless or wired networks. In addition, such electronic devices often include other features that provide a variety of functions designed to enhance user convenience.
Conventional electronic devices often include a speech recognition function for receiving voice commands from a user. This feature allows an electronic device to perform a function associated with a voice command (e.g., a keyword) when the device receives and recognizes the voice command from the user. For example, the electronic device may activate a voice assistant application, play an audio file, or take a picture in response to the voice command from the user.
In electronic devices having a speech recognition feature, manufacturers or carriers often equip the devices with predetermined keywords and associated sound models, which may be used to detect the keywords in an input sound. Some electronic devices may also allow a user to designate a keyword as a voice command. For example, such an electronic device may receive several utterances of a keyword from a user and generate a keyword model for the designated keyword from those utterances.
In general, the detection performance of a keyword model is related to the number of utterances from which the keyword model is generated. That is, the detection performance of a keyword model may improve as the number of utterances increases. For example, a manufacturer may equip an electronic device with a keyword model that has been generated from thousands of utterances or more.
In conventional electronic devices, however, the number of utterances of a keyword received from a user is typically small (e.g., five). Thus, a keyword model generated from such a limited number of utterances may not provide adequate detection performance. On the other hand, requiring a user to provide the large number of utterances needed for sufficient detection performance may be time-consuming and inconvenient for the user.