1. Field of the Invention
The present invention relates generally to speech recognition and, more particularly, to a method, article, and system for noise-resilient spotting of spoken keywords from continuous streams of speech data in mismatched environments.
2. Description of the Related Art
Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words into machine-readable input (for example, into binary codes representing a string of characters). The term “voice recognition” is sometimes also used to refer to speech recognition, but more precisely refers to speaker recognition, which attempts to identify who is speaking rather than what is being said. Speech recognition applications include voice dialing (e.g., “Call home”), call routing (e.g., “I would like to make a collect call”), appliance control, content-based spoken audio search (e.g., finding a podcast in which particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., for word processors or emails), and voice control in aircraft cockpits (usually termed Direct Voice Input).
Speech pattern matching involves comparing characteristic parameters extracted from an incoming test speech signal against those of a collection of pre-recorded reference speech templates. Keyword spotting, speech recognition, and speaker detection are typical tasks that employ speech pattern matching techniques for recognition or detection purposes. In keyword spotting and speech recognition tasks, the test speech samples and reference speech templates are uttered words, whereas speaker detection typically uses several seconds of an individual's speech.
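To make the template-matching idea concrete, the following is a minimal sketch of one classic way to compare a test utterance against pre-recorded reference templates: dynamic time warping (DTW) over frame-level feature vectors. DTW is swapped in here purely as a generic, well-known illustration and is not presented as the matching method of the present invention. The function names (`dtw_distance`, `best_match`), the Euclidean frame distance, and the toy one-dimensional "features" are hypothetical choices made for illustration only.

```python
import math
from typing import Dict, Sequence, Tuple

# Illustrative sketch only: DTW is an assumed, generic matcher, not the
# invention's method. A feature sequence holds one feature vector per frame
# (for example, cepstral coefficients; the features here are hypothetical).
FeatureSeq = Sequence[Sequence[float]]


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two frame-level feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test: FeatureSeq, ref: FeatureSeq) -> float:
    """Minimal accumulated cost of aligning `test` with `ref` via DTW.

    Allowing one sequence to advance while the other stays put lets
    utterances spoken at different rates still align frame by frame.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost aligning test[:i] with ref[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance test only
                cost[i][j - 1],      # advance reference only
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def best_match(test: FeatureSeq, templates: Dict[str, FeatureSeq]) -> Tuple[str, float]:
    """Return (label, score) of the reference template closest to `test`."""
    scores = {label: dtw_distance(test, ref) for label, ref in templates.items()}
    label = min(scores, key=lambda k: scores[k])
    return label, scores[label]


if __name__ == "__main__":
    # Toy templates: a slowed-down "yes" still aligns with zero cost.
    templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [4.0]]}
    print(best_match([[0.0], [0.0], [1.0], [2.0], [2.0]], templates))
```

In this toy run the test sequence is a time-stretched copy of the "yes" template, so DTW aligns it with zero cost, which illustrates why warping-based matching tolerates differences in speaking rate. It does nothing, however, to address the noise and environment mismatch that the present invention targets.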