Any sound can be decomposed into a set of simple oscillations. These simple oscillations have a frequency spectrum and a time distribution pattern.
The most commonly used method of wave analysis is the Fourier Time-Frequency Transformation (FTT). However, FTT has limitations when used for harmonious sound analysis and pitch detection.
Harmonious sound is key to human sound perception. It includes the vowels of human speech, human singing, birdcalls, most animal roars, and most music. Harmonious sound is not only pleasant to hear but also carries rich information.
FIG. 11 shows, as a time-energy curve, an example of a piece of harmonious sound, taken from a man's pronunciation of the vowel “u”.
Another way of analyzing and describing a piece of sound, as opposed to the time-energy curve shown in FIG. 11, is to use its frequency-energy spectrum, obtained from the time-energy curve using FTT. The frequency spectrum of a harmonious sound is characterized by a number of narrow peaks. This means that a very large percentage of the total energy of the harmonious sound is concentrated at the frequencies corresponding to these peaks. Moreover, the peak pattern of the spectrum of a harmonious sound is relatively stable over a short period of time. In other words, its main frequency components remain stable in both frequency and energy. If the peak pattern of the spectrum of a sound changes rapidly, the spectrum corresponds not to a harmonious sound but to a noise or plosive.
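The energy concentration described above can be illustrated with a small sketch. It is not part of the described method: a naive discrete Fourier transform stands in for FTT, and the signal length, harmonic amplitudes, number of peaks and the helper names `power_spectrum` and `peak_energy_fraction` are all assumptions chosen for illustration. The sketch only measures how much of the total energy falls in the strongest spectral bins; it does not test the stability of the peak pattern over time.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Power in each non-negative frequency bin of a naive DFT (hypothetical helper)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def peak_energy_fraction(samples, n_peaks):
    """Fraction of total spectral energy held by the n_peaks strongest bins."""
    spectrum = power_spectrum(samples)
    strongest = sorted(spectrum, reverse=True)[:n_peaks]
    return sum(strongest) / sum(spectrum)


N = 256  # window length in samples (assumed)

# Synthetic "harmonious" signal: a fundamental plus two overtones (assumed amplitudes).
harmonic = [
    math.sin(2 * math.pi * 8 * t / N)
    + 0.5 * math.sin(2 * math.pi * 16 * t / N)
    + 0.25 * math.sin(2 * math.pi * 24 * t / N)
    for t in range(N)
]

# Synthetic noise for comparison (fixed seed so the run is repeatable).
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]

harmonic_fraction = peak_energy_fraction(harmonic, 3)
noise_fraction = peak_energy_fraction(noise, 3)
print(harmonic_fraction, noise_fraction)
```

For the synthetic harmonic signal, nearly all of the energy lies in three bins, whereas the noise spreads its energy over the whole spectrum.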
Since the frequency spectrum of a harmonious sound needs to be obtained from a piece of sound (for example, from an FTT window), it represents a global feature of that piece. It is therefore difficult to examine finer details of the piece from its frequency spectrum, and the ability to detect and measure a rapidly changing sound, such as a plosive, is limited.
The time-energy curve (wave) of a harmonious sound has the following features:
1) First, a harmonious sound can be divided into sections that are nearly equal to one another, as shown in FIG. 12. Here, “nearly” means not exactly equal, so a harmonious sound is said to have “pseudo” periodicity. The shortest of these sections is called a “pitch”, which is the basic tone of the harmonious sound. A harmonious sound is therefore also called a “pitched sound”. If the pitches in a piece of sound are exactly equal to one another (that is, in the frequency spectrum, all the energy of the sound is at the peak frequencies and all the peaks have zero width), the sound becomes non-euphonious, dull and unclear. This shows that the “pseudo periodicity”, the seemingly random small changes among pitches, is not meaningless. Rather, these changes are important for hearing perception, because they make a harmonious sound such as a vowel of human speech stand out more from its background sound and noise.
2) The pitch frequency of normal human speech lies within a limited range, between a minimum pitch frequency and a maximum pitch frequency.
3) A harmonious sound should have sufficient duration. For example, a vowel of human speech should last at least five of its pitches.
4) A harmonious sound in human speech should have an energy sufficiently higher than that of its surrounding sound. For example, the sound energy of a vowel of human speech is higher than that of its neighboring consonant (fricative, plosive, nasal, etc.).
Some of these features are also used in the harmonious sound detection and pitch detection method of the present invention.
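The four time-domain features listed above can be expressed as simple checks on a sequence of measured pitch widths. The following is a minimal sketch under stated assumptions, not the detection method of the invention: the function name `looks_like_vowel`, the frequency limits, the tolerance between neighboring pitches and the energy ratio are hypothetical values chosen for illustration. Only the minimum of five pitches comes from the example in the text.

```python
def looks_like_vowel(periods_s, energy, neighbor_energy,
                     f_min=60.0, f_max=500.0,   # assumed pitch frequency range in Hz
                     min_pitches=5,             # "at least five of its pitches"
                     max_jitter=0.15,           # assumed tolerance between neighbors
                     energy_ratio=2.0):         # assumed margin over surrounding sound
    """Check a candidate segment against features 1) to 4) (hypothetical helper).

    periods_s: measured widths of consecutive pitches, in seconds.
    energy / neighbor_energy: energy of the segment and of its surroundings.
    """
    # Feature 3: enough duration, counted in pitches.
    if len(periods_s) < min_pitches:
        return False
    # Feature 2: every pitch frequency lies between the minimum and the maximum.
    for period in periods_s:
        if period <= 0 or not (f_min <= 1.0 / period <= f_max):
            return False
    # Feature 1: neighboring pitches are nearly, but not necessarily exactly, equal.
    for a, b in zip(periods_s, periods_s[1:]):
        if abs(a - b) / max(a, b) > max_jitter:
            return False
    # Feature 4: energy sufficiently higher than the surrounding sound.
    return energy >= energy_ratio * neighbor_energy


vowel_like = looks_like_vowel([0.0100, 0.0102, 0.0099, 0.0101, 0.0100], 4.0, 1.0)
too_short = looks_like_vowel([0.0100, 0.0102, 0.0099], 4.0, 1.0)
print(vowel_like, too_short)
```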
Detection of pitches in human voice is of great importance in speech recognition.
For harmonious sound detection and pitch detection, the inventors of the present invention tested a wave section comparison method, as described below.
Wave Section Comparison (WSC) Method
The WSC method uses the original wave stream as input data. First, it splits the wave stream into small sections at, for example, zero-crossing points. Then, it compares the current section to a neighboring section that has nearly the same width as the current section, as shown in FIGS. 13(a) and (b). On the basis of such comparisons, harmonious sound is detected using likelihood scoring, and the section having the highest likelihood score is chosen as the pitch.
The section comparison is performed by calculating the dot-by-dot difference between the two sections.
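The comparison step can be sketched as follows. This is a simplified illustration under stated assumptions, not the inventors' implementation: candidate sections are taken between rising zero crossings, the dot-by-dot difference is normalized by the section amplitudes, the likelihood score is simply one minus that normalized difference, and near-ties are resolved in favor of the shorter section. The function names are hypothetical.

```python
import math


def rising_zero_crossings(x):
    """Indices where the wave goes from negative to non-negative."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]


def section_difference(a, b):
    """Dot-by-dot absolute difference of two sections, normalized to 0..1."""
    n = min(len(a), len(b))
    diff = sum(abs(a[i] - b[i]) for i in range(n))
    scale = sum(abs(a[i]) + abs(b[i]) for i in range(n))
    return diff / scale if scale > 0 else 0.0


def estimate_pitch_width(x, start):
    """Pick the section width whose right neighbor matches best (assumed scoring)."""
    best_width, best_score = None, -1.0
    for end in rising_zero_crossings(x):      # candidates in order of growing width
        width = end - start
        if width <= 0 or start + 2 * width > len(x):
            continue
        current = x[start:start + width]
        neighbor = x[start + width:start + 2 * width]
        score = 1.0 - section_difference(current, neighbor)
        if score > best_score + 1e-9:         # keep the shorter section on near-ties
            best_width, best_score = width, score
    return best_width, best_score


# Synthetic test wave: one 50-sample period with an overtone, repeated six times.
one_period = [
    math.sin(2 * math.pi * t / 50) + 0.8 * math.sin(2 * math.pi * 3 * t / 50 + 0.7)
    for t in range(50)
]
wave = one_period * 6
start = rising_zero_crossings(wave)[0]
width, score = estimate_pitch_width(wave, start)
print(width, score)
```

On this exactly periodic wave, a section of twice the true width matches its neighbor equally well, which is why the sketch needs an explicit rule to prefer the shorter section.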
The WSC method, however, has its problems, which affect the detection of pitch from a piece of sound signal. The problems include:
1) Lower Frequency Disturbing
When a vowel sound is coupled with a relatively strong oscillation of lower frequency, the result of the section comparison will be seriously affected, as shown by example in FIGS. 14(a)-14(c). From the example of FIGS. 14(a)-(c), it can be seen that the WSC method fails to detect the pitch because the section having a width W0 differs too much from its right neighbor section having width W1. Obviously, this big difference is caused by the lower frequency oscillation that is added to the original sound.
In practice, the AC power source often causes such a problem by adding its 50 Hz low frequency oscillation to the sound detected or recorded.
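The effect of such an added oscillation on the zero-crossing sections can be reproduced with synthetic data. In the sketch below, only the 50 Hz figure comes from the text. The sample rate, the 200 Hz pitch, the hum amplitude and the use of a pure sine as a stand-in for a vowel are assumptions.

```python
import math

SAMPLE_RATE = 8000   # Hz (assumed)
PITCH_HZ = 200       # stand-in "vowel" pitch frequency (assumed)
HUM_HZ = 50          # AC power source frequency from the text
HUM_AMPLITUDE = 1.5  # relatively strong low-frequency oscillation (assumed)


def rising_crossing_widths(x):
    """Widths, in samples, of the sections between successive rising zero crossings."""
    crossings = [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]
    return [b - a for a, b in zip(crossings, crossings[1:])]


n = 1600  # 0.2 s of signal
clean = [math.sin(2 * math.pi * PITCH_HZ * t / SAMPLE_RATE + 0.3) for t in range(n)]
hummed = [
    clean[t] + HUM_AMPLITUDE * math.sin(2 * math.pi * HUM_HZ * t / SAMPLE_RATE)
    for t in range(n)
]

clean_widths = rising_crossing_widths(clean)
hum_widths = rising_crossing_widths(hummed)
print(sorted(set(clean_widths)), sorted(set(hum_widths)))
```

In the clean signal, every section is one pitch wide (40 samples at these settings). With the hum added, the zero crossings move and some disappear, so the sections no longer line up with the pitch.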
2) Double Pitch Width Error
Sometimes, two pitch sections are detected as one pitch, so that the pitch width detected is doubled. Sometimes, the pitch width is even tripled.
The example shown in FIG. 14(c) also exhibits the double pitch width error, as shown in FIG. 15.
3) High and Narrow Small Section Shift Error
When a vowel sound is composed of some narrow but high small sections, and the position of such a narrow and high section shifts in a neighboring pitch section, the result of the comparison will be seriously affected, as shown in the example of FIG. 16. This is because the difference between the curves of the two sections near the peaks, shown as Pi and Pj in FIG. 16, is large owing to the rapid change of the signal levels. The narrower the peaks are, the greater the error is.
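The dependence of this error on peak width can be checked numerically. The sketch below is an illustration under stated assumptions: each peak is modeled as a Gaussian pulse, the shift between neighboring pitch sections is fixed at two samples, and the dot-by-dot difference is normalized by the total amplitude. None of these choices comes from the source.

```python
import math


def pulse(n, center, sigma):
    """A Gaussian pulse standing in for one high, narrow section (assumed shape)."""
    return [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(n)]


def normalized_difference(a, b):
    """Dot-by-dot absolute difference, normalized to the range 0..1."""
    diff = sum(abs(p - q) for p, q in zip(a, b))
    scale = sum(abs(p) + abs(q) for p, q in zip(a, b))
    return diff / scale if scale > 0 else 0.0


N = 200    # section length in samples (assumed)
SHIFT = 2  # peak position shift between neighboring sections (assumed)

# Same shift, two different peak widths.
wide_error = normalized_difference(pulse(N, 100, 8.0), pulse(N, 100 + SHIFT, 8.0))
narrow_error = normalized_difference(pulse(N, 100, 1.0), pulse(N, 100 + SHIFT, 1.0))
print(wide_error, narrow_error)
```

For the same two-sample shift, the narrow pulse produces a much larger comparison error than the wide one, in line with the observation above.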