1. Field of the Invention
The present invention pertains to a method of processing speech signals for use in speech recognition applications. More particularly, the present invention relates to a technique for calculating, from a speech signal, an intermediate set of features for use in speech recognition applications and in speech pitch estimation.
2. Description of the Related Art
Various signal processing techniques have been developed for analyzing and digitizing speech signals, which can then be used for various control functions, e.g., computer operation. Some such known techniques employ short-time Fourier spectra, or "sonograms," of a speech signal, which are computed using windowed Fourier transforms, as explained more fully in Rabiner et al., Fundamentals of Speech Recognition (1993). The resulting sonograms are then further processed to determine, for example, cepstra, fundamental frequencies, etc. A drawback of such known techniques is that they yield non-robust results.
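By way of illustration only, the windowed Fourier transform computation referred to above can be sketched in Python as follows. The Hann window, the frame length, the hop size, and the names hann, dft_magnitude and sonogram are assumptions made for this example and do not come from the cited reference; the transform is evaluated directly rather than with a fast algorithm to keep the sketch self-contained.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..N/2, computed directly in O(N^2) time."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def sonogram(signal, frame_len=64, hop=32):
    """Short-time magnitude spectra: one list of bin magnitudes per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the window to the current frame before transforming it.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitude(frame))
    return frames


# Hypothetical test input: a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = sonogram(tone)

# Strongest bin of the first frame, converted to Hz (bin spacing = sample_rate / frame_len).
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of spec corresponds to one windowed frame of the signal, and each column to one frequency bin; further processing, such as the computation of cepstra, would operate on these rows.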
Another problem in speech analysis is that of automated pitch determination. Knowledge of the pitch contour of a speech signal is essential for various speech applications such as coding, speech recognition and speech synthesis. Most known pitch determination techniques are classified as either time domain based or frequency domain based. Time domain techniques rely on the detection of the fundamental period of oscillation in the speech signal, also known as the peak-to-peak measurement in the amplitude of the speech signal. A drawback of such time domain techniques is that, in the presence of noise, the peaks may be missing or disguised.
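A time domain peak-to-peak measurement of the kind described above might be sketched in Python as follows. This is a simplified example under stated assumptions, not a description of any particular known technique: the relative amplitude threshold, the local-maximum test, and the names peak_positions and pitch_from_peaks are all hypothetical.

```python
import math


def peak_positions(signal, rel_threshold=0.5):
    """Indices of local maxima that exceed a fraction of the largest sample."""
    limit = rel_threshold * max(signal)
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > limit and signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def pitch_from_peaks(signal, sample_rate, rel_threshold=0.5):
    """Estimate pitch in Hz from the mean spacing between successive peaks."""
    peaks = peak_positions(signal, rel_threshold)
    if len(peaks) < 2:
        return None  # not enough peaks to measure a period
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    period = sum(gaps) / len(gaps)  # mean period in samples
    return sample_rate / period


# Hypothetical clean input: a 200 Hz sinusoid sampled at 8000 Hz.
sample_rate = 8000
clean = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(400)]
estimate = pitch_from_peaks(clean, sample_rate)
```

On a clean periodic input the peak spacing equals the fundamental period. With additive noise, spurious local maxima may appear and true peaks may fall below the threshold, which is the drawback noted above.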
Frequency domain techniques, by contrast, detect a stack of equally spaced lines in the spectrum of a speech signal. The spacing between the lines is a measure of pitch. Noise also presents a problem for such frequency domain techniques.
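The line-spacing measurement described above might likewise be sketched in Python. This is an illustrative example only: the directly evaluated transform, the relative threshold used to pick spectral lines, and the names dft_magnitude, spectral_lines and pitch_from_spacing are assumptions made for the sketch.

```python
import cmath
import math


def dft_magnitude(signal):
    """Magnitudes of DFT bins 0..N/2, computed directly in O(N^2) time."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_lines(mags, rel_threshold=0.3):
    """Bins that are local maxima and exceed a fraction of the largest magnitude."""
    limit = rel_threshold * max(mags)
    return [
        k
        for k in range(1, len(mags) - 1)
        if mags[k] > limit and mags[k - 1] < mags[k] >= mags[k + 1]
    ]


def pitch_from_spacing(signal, sample_rate, rel_threshold=0.3):
    """Estimate pitch in Hz from the mean spacing between spectral lines."""
    lines = spectral_lines(dft_magnitude(signal), rel_threshold)
    if len(lines) < 2:
        return None  # a single line gives no spacing to measure
    gaps = [b - a for a, b in zip(lines, lines[1:])]
    bin_hz = sample_rate / len(signal)  # frequency resolution of one bin
    return (sum(gaps) / len(gaps)) * bin_hz


# Hypothetical input: harmonics at 200, 400 and 600 Hz, sampled at 8000 Hz.
sample_rate = 8000
harmonic = [
    1.0 * math.sin(2 * math.pi * 200 * t / sample_rate)
    + 0.6 * math.sin(2 * math.pi * 400 * t / sample_rate)
    + 0.4 * math.sin(2 * math.pi * 600 * t / sample_rate)
    for t in range(400)
]
lines = spectral_lines(dft_magnitude(harmonic))
estimate = pitch_from_spacing(harmonic, sample_rate)
```

In this example the harmonics fall exactly on transform bins, so the lines are cleanly separated. With noise, additional maxima may exceed the threshold or weak harmonics may be lost, which disturbs the measured spacing.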