Voice-activation technology is a rapidly evolving field. Fascinating applications appear almost daily. Prior art in this field is primarily directed toward the interpretation of free-form speech such as dictation and general questions. Most of the emerging applications, however, involve relatively simple devices that perform just a few specific operations. Desirable products that could be fully operated with a few predetermined commands include consumer devices (games, hobby devices, counters and timers, kitchen gadgets, home automation, exercise and sporting applications, toys, learning aids, products for the disabled), industrial systems (hands-free system interfaces, security monitoring, semi-autonomous machining and assembly, devices for rapid counting/sorting/stamping, electronic test and measurement), as well as devices for office, retail, and scientific applications, among many others. Unfortunately, the prior art serves these applications poorly: general-purpose speech interpretation requires processing power, memory, and software that such simple devices cannot economically support. What is needed is a simple method to recognize a small number of spoken commands, preferably involving minimal software and very low-cost parts.
U.S. Pat. No. 7,532,038 to Ariav attempts to resolve this problem by separating the command sounds into two frequency bands using high-pass and low-pass filters, and thereby identifies three commands: “Yes”, “No”, and “Stop”. Unfortunately, many commands that consumers expect to use cannot be discriminated on frequency alone, and many command words include brief phonemes for which frequency analysis is ambiguous at best. Consumers are sensitive about their user interface, and do not readily tolerate awkward commands or arbitrary limitations.
Nearly all prior sound-processing methods operate in the frequency domain, yet sound arrives as a signal in the time domain. Converting the sound signal to a frequency spectrum is slower and more expensive than processing the sound as it arrives, in real time. For example, the dual-bandpass system of Ariav requires multiple gain stages with multiple filter components, and the low- and high-frequency channels must then be digitized separately, all of which increases board cost and software complexity. One could instead digitize an unfiltered signal and use Fourier analysis to separate the two frequency components, but this would require a greatly expanded processor and memory, negating any savings. Moreover, key features of the sound wave are lost when, as is conventional, only the magnitude of the FFT is retained, because the phase information is discarded. The frequency domain is a complete representation of sound only if the full complex-valued transform, both magnitude and phase, is retained and processed, requiring even larger processors and memories, with costs that more than offset any other savings. A vastly simpler and more versatile approach would be to analyze commands in the time domain by recognizing sound intervals of different types as they occur.
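As a rough illustration of what recognizing sound intervals of different types as they occur might look like, the following Python sketch labels fixed-length frames using two time-domain measures, RMS level and zero-crossing rate, and emits runs of identical labels as intervals while the samples stream in. This is a generic, well-known heuristic shown for illustration only, not the method of the present disclosure. The function names `classify_frame` and `stream_intervals`, the three labels, the 160-sample frame (20 ms at an assumed 8 kHz sample rate), and both thresholds are hypothetical choices.

```python
import math
from typing import Iterable, Iterator, List, Optional, Tuple


def classify_frame(frame: List[float], silence_rms: float, fricative_zcr: float) -> str:
    """Label one frame from two time-domain measures: RMS level and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms < silence_rms:
        return "silence"
    # Count sign changes between adjacent samples; hiss-like sounds cross zero often.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return "fricative" if zcr > fricative_zcr else "voiced"


def stream_intervals(samples: Iterable[float], frame_len: int = 160,
                     silence_rms: float = 0.02, fricative_zcr: float = 0.3
                     ) -> Iterator[Tuple[str, int]]:
    """Consume samples as they arrive and yield (label, frame_count) intervals.

    Hypothetical defaults: 160-sample frames (20 ms at 8 kHz) and illustrative
    thresholds. Only one frame is buffered; an interval is yielded as soon as
    the label changes. A trailing partial frame is ignored.
    """
    frame: List[float] = []
    current: Optional[str] = None
    count = 0
    for x in samples:
        frame.append(x)
        if len(frame) < frame_len:
            continue
        label = classify_frame(frame, silence_rms, fricative_zcr)
        frame = []
        if label == current:
            count += 1
        else:
            if current is not None:
                yield current, count
            current, count = label, 1
    if current is not None:
        yield current, count


if __name__ == "__main__":
    rate = 8000  # assumed sample rate
    signal = [0.0] * 800                                                           # 0.1 s of silence
    signal += [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]  # 0.2 s, 200 Hz tone
    signal += [0.2 if n % 2 == 0 else -0.2 for n in range(800)]                    # 0.1 s, hiss-like
    print(list(stream_intervals(signal)))
    # → [('silence', 5), ('voiced', 10), ('fricative', 5)]
```

No transform or spectrum is involved: each sample costs a multiply, an add, and a sign comparison, plus one square root per frame, and only one frame is held in memory. A practical device would tune the thresholds to its own microphone and gain stage.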
Findings in cognitive psychology provide useful guidance for voice-command processing. Humans have an amazing ability to focus on one conversation while ignoring other background conversations of equal or greater loudness. This is called selective attention, or informally, the Cocktail-Party effect. Selective attention is basically a signal-processing strategy, not unlike the challenge of picking out a valid voice command from among background noises and non-command speech. Another interesting phenomenon, called attention breakthrough, is the involuntary reaction that occurs when someone calls your name unexpectedly. Your attention is irresistibly diverted by this one particular signal, even while you are focusing on some other conversation. Possibly these strategies, which have been honed by human evolution, can guide command identification.
What is needed is voice-activation means for controlling a device that recognizes a few predetermined commands and performs a few corresponding actions. Preferably the new technology would include simple, compact algorithms to discriminate command sounds, would rapidly recognize a valid command, and would ignore all other sounds. Possibly the new technology would exploit advanced signal-processing techniques analogous to those used instinctively by the human brain. Robust, low-cost identification of valid commands would then enable a host of valuable new consumer and industrial applications.