The development of acoustic features that improve speech analysis performance has dominated the overall speech analysis effort since the first attempts at automatic speech recognition and speaker verification in the 1950s. Through this effort, the short-term amplitude spectrum, tracked as a function of time, has become the universally accepted measure on which virtually all successful speech recognizers are currently based. Given this basic measure, usually expressed as a spectral amplitude vector sampled uniformly in time, there have been many efforts to apply linear and non-linear transformations to it in order to obtain forms that yield better speech analysis performance.
Many current systems apply a linear transformation to the spectral vector that is explicitly designed to yield uncorrelated features. These features are then scaled so that each exhibits equal variance under model conditions; the resulting transformed features are called orthonormal. Used with a Euclidean distance measure, these orthonormal features yield the maximum likelihood recognition answer under typical multivariate Gaussian models (that is, models in which all classes share a common covariance matrix and are equally likely a priori). Even so, systems built on this analysis still make a number of speech analysis errors. These errors most likely result from inaccuracies in applying a multivariate Gaussian model to speech measurements.
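The claim that orthonormal features combined with a Euclidean distance give a maximum likelihood answer can be checked numerically. The Python sketch below is illustrative only and rests on assumptions that are not in the text: synthetic three-band "spectral" vectors stand in for real speech measurements, all classes share one covariance matrix and have equal prior probability, and a Cholesky-based whitening transform is used in place of the eigenvector (principal-component) transform plus per-feature scaling that such systems typically use. Both approaches produce features with identity covariance. All function names (`whiten`, `classify_euclidean`, `classify_ml`, and so on) are hypothetical.

```python
import math
import random

# ASSUMPTIONS (illustration only): synthetic 3-band vectors, a covariance matrix
# shared by all classes, equal class priors, and Cholesky whitening as a stand-in
# for an eigenvector transform followed by per-feature scaling.

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def covariance(vectors):
    """Unbiased sample covariance matrix, as a list of lists."""
    mu = mean(vectors)
    n, d = len(vectors), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def whiten(x, L):
    """Return y = L^{-1} x by forward substitution: uncorrelated, unit-variance features."""
    y = []
    for i in range(len(x)):
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((x[i] - s) / L[i][i])
    return y

def sq_euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def gaussian_log_likelihood(x, mu, L):
    """log N(x; mu, Sigma) with Sigma = L L^T, computed from the whitened residual."""
    r = whiten([xi - mi for xi, mi in zip(x, mu)], L)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))
    return -0.5 * (sum(ri * ri for ri in r) + log_det + len(x) * math.log(2 * math.pi))

def make_class(rng, center, n):
    """Hypothetical correlated 3-band vectors scattered around a class center."""
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # A fixed mixing matrix correlates adjacent bands (same for every class).
        x = [z[0], 0.8 * z[0] + 0.6 * z[1], 0.5 * z[1] + 0.5 * z[2]]
        out.append([c + xi for c, xi in zip(center, x)])
    return out

rng = random.Random(0)
centers = {"A": [0.0, 0.0, 0.0], "B": [2.0, 1.0, 0.5], "C": [-1.0, 2.0, 1.0]}
train = {k: make_class(rng, c, 200) for k, c in centers.items()}
means = {k: mean(vs) for k, vs in train.items()}

# Pooled within-class covariance, estimated from class-centered residuals.
residuals = [[v[j] - means[k][j] for j in range(3)] for k, vs in train.items() for v in vs]
L = cholesky(covariance(residuals))
white_means = {k: whiten(m, L) for k, m in means.items()}

def classify_euclidean(x):
    """Nearest class mean by Euclidean distance on orthonormal (whitened) features."""
    wx = whiten(x, L)
    return min(white_means, key=lambda k: sq_euclidean(wx, white_means[k]))

def classify_ml(x):
    """Maximum likelihood class under the shared-covariance Gaussian model."""
    return max(means, key=lambda k: gaussian_log_likelihood(x, means[k], L))

print(classify_euclidean([2.0, 1.0, 0.5]))
```

Because the whitening map is linear, the squared Euclidean distance between whitened vectors equals the Mahalanobis distance (x − μ)ᵀΣ⁻¹(x − μ). With a shared covariance and equal priors, the Gaussian log-likelihood differs from −½ times that distance only by a class-independent constant, so the nearest-mean and maximum likelihood decisions coincide. If the classes had different covariance matrices, this equivalence would no longer hold.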
There is therefore a need for a speech analysis method and apparatus that provides more accurate measurements for speech analysis.