The present invention relates generally to receiving and processing speech signals, and more specifically, to techniques for identifying keywords in speech, for example, for use in command and control of an electronic device.
The presence of electronic devices in personal and professional settings has increased to the extent that manual operation may no longer be sufficient or suitable to take advantage of their full capabilities. Electronic devices configured with voice-activated command and control capabilities offer a practical and convenient method of operation, in addition to direct physical input. However, voice operation requires rapid and accurate processing of speech signals, using reliable models that account for user variability, distortions, and detection errors.
Traditionally, isolated word recognition systems were constructed using models for entire words. Although this approach is practical for a limited vocabulary size, demand for larger vocabularies necessitated the use of sub-word units to enable the sharing of training examples across contexts and to permit the modeling of out-of-vocabulary words. Current state-of-the-art spoken term detection systems typically employ large vocabulary databases and rely on automatic speech recognition (ASR) methods that use lattice searching. Although these systems have demonstrated good recognition performance, generating comprehensive sub-word lattices and spotting keywords in a large lattice is a slow process that incurs high computational overhead.
Thus far, systems with limited processing and storage capacities, such as portable electronic devices, have had to rely upon network connectivity, remote data storage, and powerful processing servers to perform computationally intensive tasks and to access larger vocabulary databases.
Consequently, there exists a need in the art for a system and method that can achieve fast and reliable recognition of audio input without the explicit need for remote servers, communications with a network, or any other remote data storage and processing units. There also exists a need in the art for a system and method that can be implemented on an electronic device with limited processing resources. Furthermore, there exists a need in the art for a system and method that can construct and implement efficient keyword search models in the absence of substantial amounts of training data. In addition, there exists a need in the art for a system and method that can handle out-of-vocabulary keywords.