1. Field of the Invention
The present invention relates generally to a method and apparatus for extracting pitch information from an audio signal, and in particular, to a method and apparatus for extracting pitch information from an audio signal using morphology to improve the accuracy of the pitch extraction.
2. Description of the Related Art
In general, an audio signal, which includes voice signals and other sound signals, is quasi-periodic: according to its statistical characteristics in the time domain and the frequency domain, it is divided into a periodic (harmonic) component and a non-periodic (random) component, that is, a voiced part and an unvoiced part. The two parts are distinguished by the presence or absence of pitch information: the periodic component carries pitch information and is identified as voiced sound, whereas the non-periodic component carries none and is identified as unvoiced sound. The period of the voiced part is called the pitch. The periodic component carries the most information in the audio signal and significantly affects sound quality. Pitch information is therefore the most important information in any system that uses the audio signal, and pitch error is the factor that most significantly affects overall system performance and sound quality.
Thus, the accuracy with which pitch information is detected is an important factor in improving sound quality. Conventional methods of extracting pitch information are based on linear prediction analysis, in which a later part of a signal is predicted from an earlier part. In addition, a method that represents a voice signal with a sinusoidal representation and calculates a maximum likelihood ratio using the harmonicity of the voice signal has been widely used because of its excellent performance.
Linear prediction analysis is widely used for voice signal analysis, but its performance depends on the order of the linear prediction. Increasing the order to improve performance increases the amount of calculation, and the performance nevertheless stops improving beyond a certain level. The method also works only under the assumption that the signal is stationary over a short time. Thus, in a transition region of a voice signal, the prediction cannot follow the rapidly changing signal and fails.
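The dependence on prediction order can be illustrated with a minimal sketch of the autocorrelation method of linear prediction, solved with the Levinson-Durbin recursion. This is an illustration of the general technique, not the implementation of any particular conventional system. The test frame, its length, the noise level, the orders compared, and all function names are assumptions chosen for the example, and no analysis window is applied, for brevity.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of x (unnormalized sum of products)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err): a[0] == 1 and the predictor is
    x[n] ~= -(a[1]*x[n-1] + ... + a[order]*x[n-order]);
    err is the residual (prediction error) energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # error can only shrink or stay equal
    return a, err

# Hypothetical stationary test frame: two sinusoids plus a little noise.
rng = random.Random(0)
frame = [
    math.sin(2 * math.pi * 0.05 * n)
    + 0.5 * math.sin(2 * math.pi * 0.12 * n)
    + 0.1 * rng.uniform(-1.0, 1.0)
    for n in range(240)
]

r = autocorrelation(frame, 8)
residual_energy = {p: levinson_durbin(r, p)[1] for p in (2, 4, 8)}
for p, e in residual_energy.items():
    print(f"order {p}: residual energy {e:.4f}")
```

In this recursion the residual energy never increases with the order, while each additional order adds work, which is the trade-off described above. The sketch assumes a stationary frame; it does not model the transition regions where the prediction fails.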
In addition, the linear prediction analysis method uses data windowing. When a window is selected, it is difficult to detect the spectral envelope unless the resolutions of the time axis and the frequency axis are kept in balance. For example, for a voice having a very high pitch, the gaps between the harmonics are wide, so the prediction follows the individual harmonics rather than the spectral envelope. Performance therefore tends to decrease for speakers such as women and children. Despite these problems, the linear prediction analysis method remains a widely used spectrum prediction method because of its resolution in the frequency domain and its ease of application to voice compression.
However, the conventional methods of extracting pitch information are prone to pitch doubling and pitch halving. Specifically, to extract accurate pitch information from a frame, only the length of the periodic component that carries the pitch information in the frame must be found. In pitch doubling, two (2) times the length of the periodic component is wrongly found; in pitch halving, one half (½) of that length is wrongly found. Because the conventional methods suffer from pitch doubling and pitch halving, consideration must be given to the pitch error, which affects overall system performance and sound quality.
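One way to see why pitch doubling arises is to compare how well a frame matches itself at a lag of one period and at a lag of two periods. The following sketch uses a normalized autocorrelation for that comparison; this is a generic illustration, not a method taken from the conventional systems discussed here. The synthetic frame, the period of 50 samples, the tolerance, and the function names are all assumptions made for the example.

```python
import math

def normalized_autocorr(x, lag):
    """Normalized correlation between x[n] and x[n + lag] over their overlap."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    e0 = sum(x[i] ** 2 for i in range(n))
    e1 = sum(x[i + lag] ** 2 for i in range(n))
    return num / math.sqrt(e0 * e1)

def classify_period_error(estimated, true_period, tol=0.1):
    """Label an estimated period relative to the true one (tol is an assumed tolerance)."""
    ratio = estimated / true_period
    if abs(ratio - 2.0) <= 2.0 * tol:
        return "doubling"
    if abs(ratio - 0.5) <= 0.5 * tol:
        return "halving"
    if abs(ratio - 1.0) <= tol:
        return "correct"
    return "other"

# Hypothetical voiced frame: fundamental plus a strong second harmonic.
TRUE_PERIOD = 50  # samples
frame = [
    math.sin(2 * math.pi * n / TRUE_PERIOD)
    + 0.8 * math.sin(4 * math.pi * n / TRUE_PERIOD)
    for n in range(400)
]

scores = {lag: normalized_autocorr(frame, lag) for lag in (25, 50, 100)}
for lag, s in scores.items():
    label = classify_period_error(lag, TRUE_PERIOD)
    print(f"lag {lag:3d}: score {s:+.3f} ({label})")
```

For a periodic frame, a lag of two periods matches essentially as well as a lag of one period, so a peak picker that relies on the score alone has no basis for preferring the true period. In real, noisy signals a small perturbation can tip the choice toward the doubled value.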
When a pitch error occurs, an algorithm selects the frequency considered to be the best candidate. The pitch error is classified into a fine error ratio, which results from the performance limit of the algorithm, and a gross error ratio, which is the ratio of the number of frames with large errors to the total number of frames. For example, when errors occur in 5 of 100 frames, the fine error ratio is the difference between the actual pitch information in the remaining 95 frames and the pitch information obtained after a checking process. The error range tends to increase as noise increases. The gross error ratio is obtained from unrecoverable errors of around one period in pitch doubling and around a half period in pitch halving.
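The two error measures can be sketched as follows, using the 5-in-100 example above. The source does not state the threshold that separates a gross error from a fine error, so the 20% relative threshold below is an assumption, as are the function name and the synthetic reference and estimated periods.

```python
def pitch_error_stats(reference, estimated, gross_threshold=0.2):
    """Return (gross_error_ratio, mean_fine_error).

    A frame counts as a gross error when its relative deviation exceeds
    gross_threshold (an assumed value). The fine error is the mean absolute
    deviation over the remaining frames.
    """
    if not reference or len(reference) != len(estimated):
        raise ValueError("reference and estimated must be non-empty and equal in length")
    gross_frames = 0
    fine_deviations = []
    for ref, est in zip(reference, estimated):
        deviation = abs(est - ref)
        if deviation / ref > gross_threshold:
            gross_frames += 1
        else:
            fine_deviations.append(deviation)
    gross_ratio = gross_frames / len(reference)
    mean_fine = sum(fine_deviations) / len(fine_deviations) if fine_deviations else 0.0
    return gross_ratio, mean_fine

# Hypothetical data: 100 frames with a true period of 50 samples.
# 95 frames are off by one sample; 5 frames are doubled to 100 samples.
reference = [50.0] * 100
estimated = [51.0] * 95 + [100.0] * 5

gross_ratio, mean_fine = pitch_error_stats(reference, estimated)
print(f"gross error ratio: {gross_ratio:.2f}, mean fine error: {mean_fine:.2f} samples")
```

Here the five doubled frames deviate by a full period and are counted as gross errors, while the fine error is computed only over the other 95 frames.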
As described above, because of pitch doubling and pitch halving, the conventional methods of extracting pitch information tend to perform poorly with respect to the pitch error, which most significantly affects overall system performance and sound quality.