1. Field of the Invention
The invention relates in general to pitch mark determination for speech, and more particularly to a method for detecting the pitch marks of a speech signal, which is applied in a speech processing system.
2. Description of the Related Art
Speech is the most natural means of human communication, and speech processing has made great progress over the past few decades. As a result, speech has become widely used in human/machine interfaces, especially for information acquisition via telephone, in systems such as the PABX (Private Automatic Branch Exchange) System, the Automated Weather Source System, the Stock Information System, the E-mail Reader System, and so forth. These applications mainly cover the fields of speech recognition, speech coding, speaker verification, and speech synthesis.
Speech signals include unvoiced speech and voiced speech. Voiced speech is largely periodic, whereas unvoiced speech is largely random. In most speech systems, the pitch mark information (the start or end point of each pitch period) is first generated automatically by a program and then corrected manually. Improving the accuracy with which the program detects the pitch and the pitch marks is therefore necessary to reduce the workload of manual correction. This is especially helpful for a speech synthesis system that must establish new voices quickly or process a large amount of speech. In addition to the pitch information, the pitch mark information is used to analyze the speech characteristics within a period, which helps advance the technology in speech-related fields.
These application fields usually require the fundamental frequency or the pitch information. For example, tone recognition needs the pitch contour, speech coding requires the pitch information, speaker verification may use the fundamental frequency to assist in identity verification, and waveform-concatenation speech synthesis requires the pitch information to modify the pitch. In addition, the pitch mark information is important to speech synthesis, and its accuracy influences the speech quality and the rhythm. In speech synthesis and text-to-speech (TTS), pitch modification requires an accurate pitch mark or pitch-period mark.
Two problems are usually encountered when trying to detect the pitch mark: (1) how to acquire the pitch, and (2) how to determine the pitch mark. The pitch can be acquired in the frequency domain, in the time domain, or in both, and calculating the autocorrelation coefficient is a commonly used approach. The pitch mark indicates the highest position or the lowest position of the waveform in the pitch period. Several issued patents serve as references and use the following methods: U.S. Pat. No. 5,671,330 searches the local peaks of the dyadic wavelet transform as pitch marks; U.S. Pat. No. 5,630,015 performs a cepstrum analysis process to detect a peak of the obtained cepstrum; U.S. Pat. No. 6,226,606 identifies the pitch track according to the cross-correlation of two window vectors estimated by the energy of the speech; U.S. Pat. No. 6,199,036 uses an autocorrelation algorithm to detect the pitch period; U.S. Pat. No. 6,208,958 uses spectro-temporal autocorrelation to prevent pitch determination errors; U.S. Pat. No. 6,140,568 filters out harmonic components to determine which frequencies are fundamental frequencies; U.S. Pat. No. 6,047,254 uses order-two Linear Predictive Coding (LPC) and autocorrelation to determine the pitch period; U.S. Pat. Nos. 4,561,102 and 4,924,508 find the peak on the LPC residual; U.S. Pat. No. 5,946,650 uses an error function to estimate the low-pass filtering of the speech; U.S. Pat. No. 5,809,453 performs the autocorrelation and cosine transform on the log power spectrum; U.S. Pat. No. 5,781,880 uses the Discrete Fourier Transform (DFT) to transform the LPC residual; U.S. Pat. No. 5,353,372 introduces a Finite Impulse Response (FIR) filter; U.S. Pat. Nos. 5,321,350 and 4,803,730 find the point on the waveform whose energy exceeds a predetermined value; and U.S. Pat. No. 5,313,553 uses two filters.
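The two problems named above, acquiring the pitch by autocorrelation and then placing a pitch mark at the highest position of the waveform in each pitch period, can be illustrated with a minimal sketch. It is a generic textbook-style illustration only and does not reproduce the method of any of the cited patents. The function names, the synthetic test signal, the 8 kHz sampling rate, and the lag search range are all assumptions introduced for the example.

```python
import math

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest normalized autocorrelation.

    Hypothetical helper: a plain time-domain autocorrelation search.
    Returns None for a frame with no energy.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove the DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        # Biased autocorrelation at this lag, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_score:
            best_lag, best_score = lag, r
    return best_lag

def mark_peaks(signal, period):
    """Return the index of the highest sample in each period-long window.

    Hypothetical helper: windows are consecutive and non-overlapping,
    which is only reasonable when the period is constant.
    """
    marks = []
    for start in range(0, len(signal) - period + 1, period):
        window = signal[start:start + period]
        offset = max(range(period), key=lambda i: window[i])
        marks.append(start + offset)
    return marks

# Assumed test signal: a 100 Hz fundamental plus its second harmonic,
# sampled at 8 kHz, so the true pitch period is 80 samples.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    for t in range(400)
]

# Search lags of 20 to 160 samples, i.e. pitches between 400 Hz and 50 Hz.
period = estimate_pitch_period(signal, min_lag=20, max_lag=160)
marks = mark_peaks(signal, period)
print(period, marks)
```

Real voiced speech has a slowly varying period, noise, and formant structure, so a practical detector would work frame by frame and would need to guard against picking a multiple or a fraction of the true period. The fixed windows used here are a simplification for a perfectly periodic signal.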