The present invention relates generally to the practice of word recognition in a speech recognition system and, more particularly, to the recognition of words in the presence of high noise.
A long-standing problem in speech recognition systems has been the difficulty of achieving acceptable performance in high noise environments. Speech recognition systems often operate in an environment where background noise severely degrades the recognition process.
Earlier recognition systems developed for high noise environments have typically utilized head-mounted or hand-held microphones. Such systems have attempted to overcome the problem of background noise by requiring that the microphone be positioned close to the mouth. Although this may be a somewhat effective solution for this type of system, a head-mounted or hand-held microphone arrangement is neither practical nor acceptable for many systems.
The most desirable types of recognition systems are those which operate hands-free. A hands-free system is extremely practical in cases where the operator is required to manually handle controls, such as while driving. Due to high background noise in such an environment, the speech recognition system must be able to accurately distinguish words from the background noise, as well as free the operator from manual control of a microphone. A system of this kind offers substantial improvement to the operator's productivity and concentration.
There have, of course, been previous attempts to effect accurate word recognition in high noise environments. Some approaches subtract an estimate of the background noise from the speech using spectral subtraction and then match the speech to word templates held in template memory. Typically, the template memory is segmented into frames of equal time intervals. Likewise, the incoming speech is split into frames before the matching process begins. Each frame from the incoming speech is then compared to frames from the template memory. A match is indicated by a sequence of frames from the incoming speech corresponding to the frames of a template in memory. Regardless of the particular word template matching technique used, spectral subtraction usually requires that an estimate of the background noise be subtracted from the incoming speech before matching to the template.
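The prior-art approach described above, spectral subtraction followed by frame-by-frame template matching, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from any particular system: each frame is represented as a list of spectral channel energies, the noise estimate is a simple average over noise-only frames, the frame distance is a sum of absolute channel differences, and input frames are aligned one-to-one with template frames (practical systems typically use some form of time alignment instead). All function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed representation: one frame = energies in a few spectral channels.
Frame = List[float]


def estimate_noise(noise_frames: Sequence[Frame]) -> Frame:
    """Average the channel energies over frames assumed to hold only noise."""
    count = len(noise_frames)
    return [sum(channel) / count for channel in zip(*noise_frames)]


def spectral_subtract(frame: Frame, noise: Frame, floor: float = 0.0) -> Frame:
    """Subtract the noise estimate channel by channel, clamping at a floor."""
    return [max(x - n, floor) for x, n in zip(frame, noise)]


def frame_distance(a: Frame, b: Frame) -> float:
    """Sum of absolute channel differences (an assumed distance measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def template_score(speech: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Accumulate frame distances under a one-to-one frame alignment."""
    if len(speech) != len(template):
        raise ValueError("this sketch assumes equal frame counts")
    return sum(frame_distance(s, t) for s, t in zip(speech, template))


def recognize(
    speech: Sequence[Frame],
    noise_frames: Sequence[Frame],
    templates: Dict[str, Sequence[Frame]],
) -> str:
    """Clean the input by spectral subtraction, then pick the closest template."""
    noise = estimate_noise(noise_frames)
    cleaned = [spectral_subtract(frame, noise) for frame in speech]
    return min(templates, key=lambda word: template_score(cleaned, templates[word]))


if __name__ == "__main__":
    templates = {
        "yes": [[5.0, 1.0], [1.0, 5.0]],
        "no": [[1.0, 5.0], [5.0, 1.0]],
    }
    noise_frames = [[2.0, 2.0], [2.0, 2.0]]
    speech = [[7.0, 3.0], [3.0, 7.0]]  # the "yes" template plus the noise level
    print(recognize(speech, noise_frames, templates))
```

Note that in this sketch the subtraction is a separate pass over the input performed before any frame comparison takes place, which is the ordering the passage above describes.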
Some of the more successful recognition systems incorporate consideration of the background noise directly into a specific spectral matching technique. However, these systems often require a complex method of comparing frames within word templates to the input frames representing the speech. Adding such complexity has resulted in either a substantially slower recognition process or a restriction to a very specialized, high speed system architecture.
What is needed is a simple method for comparing a word template frame to an input frame which compensates for the presence of background noise. Such a method should be computationally fast and should not require specialized hardware architecture.