The invention relates to the detection of spoken commands, and particularly to means for setting a detection threshold based on the observed noise and command sound levels.
Smartphones, tablet computers, smart watches, and a host of other emerging products are essentially mobile, reprogrammable, voice-activated devices. They accept user-installed software modules ("apps") from a wide range of developers who are generally unrelated to the device maker. The ability to combine a voice-activation app from one vendor with a sound-detecting device from another vendor has created a novel problem: how can a command-detection threshold be set before the first command is spoken? When a new voice-activated application is started on a new device for the first time, nothing is known about the signal levels associated with command sounds on that device. An empirical threshold cannot be set from command signals that have not yet been detected. Prior voice-activated devices did not have this problem, because their detection settings were calibrated by design and installed during manufacture. The situation has changed now that voice-activated apps run on thousands of different types of devices with widely varying electronic properties. In general, the signal level of a spoken command on a given device is unknown to the software developer. Yet the software must set a command-detection threshold quickly, because users expect the application to detect the very first command reliably. As application developers well know, users are notoriously intolerant of spoken-command failures. A user may repeat a command once, but if the application misses the first two commands because of threshold uncertainty, most users will abandon the app. It is therefore essential that the application establish an effective command-detection threshold as soon as it starts.
This problem of setting the initial command-detection threshold is likely to grow in the coming years. Embedded and wearable devices are becoming more widely available, and implanted bio-devices are expected to proliferate in the near future; all of these can be programmed independently at the user's will. It is reasonable to assume that, before this patent expires, voice-activated devices will penetrate personal, home-automation, industrial-robotic, medical, and security applications. In each of these applications, and wherever a user can combine software and hardware from different sources, setting the initial threshold for spoken-command detection will be problematic.
Prior-art voice-activity detection (VAD) offers little help with this problem, because prior VAD methods assume that the command signal levels are already known. The present problem is the opposite: to detect the first command when the electronic and acoustical properties of the device are unknown. There is therefore a growing need for means of setting thresholds for detecting spoken commands when an installed software module starts up for the first time and the signal levels of spoken commands are unknown.