The present invention relates to pattern recognition apparatus and methods, and more particularly, to apparatus and methods that provide for automatic pattern recognition using category-dependent feature selection.
Machine listening systems often employ rudimentary simulations of the human auditory system to mimic human perception and cognition of sound. For example, in the case of speech recognition, the well-known Linear Predictive Coding (LPC) model spectrum is built on an all-pole model of resonances of the vocal tract, while the well-known Mel-Frequency Cepstral Coefficients (MFCC) are based on an approximation of critical bands. Most such front-end processing methods, however, are based on only crude approximations of the peripheral auditory system, with little or no consideration for later stages along the auditory cortex where signal representations may undergo further transformations.
R. Lippmann disclosed in “Speech recognition by machines and humans,” Speech Communication, vol. 22, no. 1, pp. 1-15, March 1997, that automatic speech recognition systems perform far worse than human listeners under noisy conditions. Hence, while much research is aimed at developing functional approximations to human capabilities, there is also intense interest in building computational models that accurately and extensively mimic human physiology. Studying such physiological models may also lead to a better understanding of human audition, thereby offering the possibility of improved functional models.
Relatively recent developments, discussed for example by K. Wang and S. A. Shamma in “Spectral shape analysis in the central auditory system,” IEEE Trans. Speech and Audio Processing, vol. 3, no. 5, pp. 382-395, September 1995, include simulations of the neural encoding of the primary auditory cortex (A1) in the central auditory system. These simulations extend the peripheral auditory model developed by X. Yang, K. Wang, and S. A. Shamma in “Auditory representations of acoustic signals,” IEEE Trans. Information Theory, vol. 38, no. 2, pp. 824-839, March 1992. K. Wang et al. disclose that the one-dimensional auditory spectrum produced by the peripheral model is transformed into a three-dimensional, data-redundant response in A1, which may encode auditory features that are relevant to perception and cognition in a more explicit, place-coded manner.
It would be desirable to have apparatus and methods that use such new, elaborate auditory models to improve upon conventional approaches and provide for automatic pattern recognition using category-dependent feature selection.