The present invention relates generally to automatic speech recognition systems. More particularly, the invention relates to techniques for adapting the recognizer to perform better in the presence of noise.
Current automatic speech recognition systems perform reasonably well in laboratory conditions, but degrade rapidly when used in real-world applications. One of the important factors influencing recognizer performance in real-world applications is the presence of environmental noise that corrupts the speech signal. A number of methods, such as spectral subtraction or parallel model combination, have been developed to address the noise problem. However, these solutions are either too limited or too computationally expensive.
Recently, a Jacobian adaptation method has been proposed to deal with additive noise, where the noise changes from noise A to noise B. For example, U.S. Pat. No. 6,026,359 to Yamaguchi describes such a scheme for model adaptation in pattern recognition, based on storing Jacobian matrices of a Taylor expansion that expresses model parameters. However, for this method to perform well, noise A and noise B must be close to one another in terms of character and level. For example, the Jacobian adaptation technique is likely to work well where noise A is measured within the passenger compartment of a given vehicle travelling on a smooth road at 30 miles per hour, and where noise B is of a similar character, such as the noise measured inside the same vehicle on the same road travelling at 45 miles per hour.
The known Jacobian adaptation technique begins to fail where noise A and B lie farther apart from one another, such as where noise A is measured inside the vehicle described above at 30 miles per hour and noise B is measured in the vehicle with windows down or at 60 miles per hour.
This shortcoming of the proposed Jacobian noise adaptation method limits its usefulness in many practical applications because it is often difficult to anticipate at training time the noise that may be present at testing time (when the system is in use). In addition, improvements to Jacobian noise adaptation techniques are of limited use in many applications because the computational expense they require (processing time and/or memory) makes them impractical.
The present invention addresses the foregoing shortcoming. Instead of using Jacobian matrices, the invention uses transformed matrices, which resemble Jacobian matrices in form but comprise different values. The transformed matrices compensate for the fact that the respective noises at training time and at recognition time may be far apart. The presently preferred embodiment of the inventive method effects a linear or non-linear transformation of the Jacobian matrices using an α-adaptation parameter to develop the transformed matrices. The transformation process can alternatively be effected through other linear or non-linear transformation means, such as using a neural network or other artificial intelligence mechanism. To speed computation, the resulting transformed matrices may be reduced through a dimensionality reduction technique such as principal component analysis.
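By way of illustration only, the following sketch shows one way a transformed matrix of this kind might be computed. This summary does not state the transformation formula, so the sketch rests on assumptions: features are cepstral coefficients obtained by applying a discrete cosine transform F to log spectral energies, the ordinary Jacobian is F · diag(N/(S+N)) · F⁻¹ for speech energies S and training-noise energies N, and the α-adaptation parameter simply scales the noise term, giving F · diag(αN/(S+αN)) · F⁻¹. The function names transformed_jacobian and adapt_mean are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def transformed_jacobian(speech, noise, alpha=1.0):
    """Assumed form: F * diag(alpha*N / (S + alpha*N)) * F^-1.

    speech, noise: linear spectral energies per frequency band.
    alpha = 1.0 gives the ordinary Jacobian; other values are an
    assumed reading of the alpha-adaptation described in the text.
    """
    n = len(speech)
    f = dct_matrix(n)
    weights = [alpha * nz / (s + alpha * nz) for s, nz in zip(speech, noise)]
    diag = [[weights[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]
    return matmul(matmul(f, diag), transpose(f))

def adapt_mean(noisy_cep_a, noise_cep_a, noise_cep_b, jac):
    """First-order update: C(S+Nb) ~ C(S+Na) + J * (C(Nb) - C(Na))."""
    delta = [b - a for a, b in zip(noise_cep_a, noise_cep_b)]
    shift = [sum(row[j] * delta[j] for j in range(len(delta))) for row in jac]
    return [c + s for c, s in zip(noisy_cep_a, shift)]

# Usage with made-up energies for four frequency bands.
S = [4.0, 2.0, 1.0, 0.5]
N = [1.0, 1.0, 1.0, 1.0]
J_plain = transformed_jacobian(S, N, alpha=1.0)
J_alpha = transformed_jacobian(S, N, alpha=2.0)
```

Under these assumptions, a value of α greater than one makes each matrix behave as if the training noise had been stronger, which is one plausible way to compensate when the noise at recognition time differs substantially from the noise at training time. The dimensionality reduction step mentioned above is not shown.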
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.