1. Field of the Invention
The present invention relates to a method and apparatus for distinguishing between voiced and unvoiced speech elements and more particularly to a method and apparatus wherein a measure of the location of the spectrum of the speech element is determined.
2. Description of the Prior Art
Speech analysis, whether for speech recognition, speaker recognition, speech synthesis, or reduction of the redundancy of a data stream representing speech, involves the step of extracting the essential features of the speech, which are then, for example, compared with known patterns. Such speech parameters include vocal tract parameters, beginnings and endings of words, pauses, spectra, stress patterns, loudness, general pitch, talking speed, intonation, and, not least, the discrimination between voiced and unvoiced sounds.
The first step involved in speech analysis is, as a rule, the separation of the speech-data stream to be analyzed into speech elements each having a duration of about 10 to 30 ms. These speech elements, commonly called "frames", are so short that even short sounds are divided into several speech elements, which is a prerequisite for a reliable analysis.
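The framing step described above can be sketched as follows. This is a minimal illustration only, not the arrangement of the invention: the function name `split_into_frames`, the default frame duration of 20 ms, the use of non-overlapping frames, and the discarding of a trailing partial frame are all assumptions made for the example.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20.0):
    """Split a sampled speech signal into consecutive, non-overlapping frames.

    frame_ms is an assumed default; the text only states that frames
    last about 10 to 30 ms.
    """
    # Number of samples per frame, e.g. 160 samples for 20 ms at 8000 Hz.
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")

    frames = []
    # Step through the signal one frame at a time; a trailing partial
    # frame is dropped (an assumption of this sketch).
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frames.append(list(samples[start:start + frame_len]))
    return frames
```

With a sampling rate of 8000 Hz and 20 ms frames, each frame holds 160 samples, so a sound lasting 100 ms is divided into five frames.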
An important feature in many, if not all, languages is the occurrence of voiced and unvoiced sounds. Voiced sounds are characterized by a spectrum which contains mainly the lower frequencies of the human voice. Unvoiced, crackling, sibilant, fricative sounds are characterized by a spectrum which contains mainly the higher frequencies of the human voice. This fact is generally used to distinguish between voiced and unvoiced sounds or elements thereof. A simple arrangement for this purpose is given in S. G. Knorr, "Reliable Voiced/Unvoiced Decision", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 3, June 1979, pp. 263-267.
It is also known, however, that the location of the spectrum alone, characterized, for example, by the location of the spectral centroid, does not suffice to distinguish between voiced and unvoiced sounds because, in practice, the boundaries are fluid. From U.S. Pat. No. 4,589,131, corresponding to EP-B1-0 076 233, it is known to use additional, different criteria for this decision.
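As a rough illustration of what a measure of the location of the spectrum can look like, the sketch below computes a spectral centroid for a single frame. It is not taken from the cited references: the function name `spectral_centroid_hz`, the power weighting, and the plain discrete Fourier transform without windowing are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def spectral_centroid_hz(frame, sample_rate_hz):
    """Return the power-weighted mean frequency of one frame, in Hz.

    Assumption of this sketch: the centroid is weighted by spectral
    power (squared magnitude); other weightings are possible.
    """
    n = len(frame)
    if n == 0:
        return 0.0

    weighted_sum = 0.0
    total_power = 0.0
    # Evaluate only the non-negative frequency bins 0 .. n/2.
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k (O(n) per bin, fine for short frames).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        power = abs(coeff) ** 2
        weighted_sum += (k * sample_rate_hz / n) * power
        total_power += power

    # A silent frame has no defined centroid; return 0.0 by convention here.
    return weighted_sum / total_power if total_power > 0.0 else 0.0
```

For a 160-sample frame at 8000 Hz, a pure 200 Hz tone gives a centroid near 200 Hz and a pure 3000 Hz tone gives one near 3000 Hz. Real speech frames fall between such extremes, which is consistent with the observation above that the centroid alone does not separate voiced from unvoiced sounds reliably.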