The present invention relates to computer speech systems. In particular, the present invention relates to pitch tracking in computer speech systems.
Computers are currently being used to perform a number of speech-related functions, including transmitting human speech over computer networks, recognizing human speech, and synthesizing speech from input text. To perform these functions, computers must be able to recognize the various components of human speech. One of these components is the pitch or melody of speech, which is created by the vocal cords of the speaker during voiced portions of speech. Examples of pitch can be heard in vowel sounds such as the "ih" sound in "six".
The pitch in human speech appears in the speech signal as a nearly repeating waveform that is a combination of multiple sine waves at different frequencies. The interval between successive repetitions of this waveform, referred to as the pitch period, determines the pitch.
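The relationship between the repeating waveform and the pitch can be illustrated with a short sketch. Everything in it is an assumption chosen for illustration and does not come from the cited art: the 8000 Hz sample rate, the 80-sample period, the three harmonic amplitudes and phases, the small decay factor that makes the waveform only nearly repeat, and the helper names `harmonic_waveform` and `pitch_hz`.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed for illustration)
PERIOD = 80          # pitch period in samples (assumed for illustration)

def harmonic_waveform(period, harmonics, length, decay=0.0):
    """Sum sine waves at integer multiples of the fundamental frequency.

    `harmonics` is a list of (amplitude, phase) pairs; the k-th pair
    completes k cycles every `period` samples. A non-zero `decay` slowly
    shrinks the amplitude so that successive cycles are only nearly equal.
    """
    samples = []
    for n in range(length):
        value = sum(
            amp * math.sin(2 * math.pi * k * n / period + phase)
            for k, (amp, phase) in enumerate(harmonics, start=1)
        )
        samples.append(value * math.exp(-decay * n))
    return samples

def pitch_hz(period_samples, sample_rate):
    """Convert a pitch period in samples to a pitch frequency in Hz."""
    return sample_rate / period_samples

wave = harmonic_waveform(
    PERIOD, [(1.0, 0.0), (0.5, 0.3), (0.25, 1.1)], length=3 * PERIOD, decay=0.0005
)

# Largest sample-by-sample difference between the first cycle and the second.
max_cycle_diff = max(abs(wave[n] - wave[n + PERIOD]) for n in range(PERIOD))

print(pitch_hz(PERIOD, SAMPLE_RATE), max_cycle_diff)
```

With these assumed values, each cycle differs only slightly from the one before it, and an 80-sample period at 8000 samples per second corresponds to a pitch of 100 Hz.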
To identify pitch in a speech signal, the prior art uses pitch trackers. A comprehensive study of pitch tracking is presented in D. Talkin, "A Robust Algorithm for Pitch Tracking (RAPT)," Speech Coding and Synthesis, pp. 495-518, Elsevier, 1995. One such pitch tracker identifies two portions of the speech signal that are separated by a candidate pitch period and compares the two portions to each other. If the candidate pitch period is equal to the actual pitch period of the speech signal, the two portions will be nearly identical to each other. This comparison is generally performed using a cross-correlation technique that compares multiple samples of each portion to each other.
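The comparison described above can be sketched as a normalized cross-correlation between two windows of the signal. This is a minimal illustration under stated assumptions, not the RAPT algorithm or any particular prior-art tracker: the function names, the 8000 Hz sample rate, the synthetic two-harmonic test signal, the 160-sample window, and the 40 to 120 sample search range are all hypothetical choices.

```python
import math

def normalized_cross_correlation(signal, start, lag, window):
    """Compare signal[start:start+window] with the portion `lag` samples later.

    Returns a value near 1.0 when the two portions are nearly identical.
    """
    first = signal[start:start + window]
    second = signal[start + lag:start + lag + window]
    if len(first) < window or len(second) < window:
        raise ValueError("signal too short for this start, lag, and window")
    dot = sum(x * y for x, y in zip(first, second))
    energy = math.sqrt(sum(x * x for x in first) * sum(y * y for y in second))
    return dot / energy if energy > 0 else 0.0

def best_candidate_period(signal, start, window, min_lag, max_lag):
    """Score every candidate period in [min_lag, max_lag]; return the best one."""
    scores = {
        lag: normalized_cross_correlation(signal, start, lag, window)
        for lag in range(min_lag, max_lag + 1)
    }
    return max(scores, key=scores.get), scores

SAMPLE_RATE = 8000   # samples per second (assumed for illustration)
TRUE_PERIOD = 80     # pitch period of the synthetic signal, in samples

# Synthetic voiced signal: a fundamental plus one weaker harmonic.
signal = [
    math.sin(2 * math.pi * n / TRUE_PERIOD)
    + 0.5 * math.sin(4 * math.pi * n / TRUE_PERIOD + 0.3)
    for n in range(400)
]

best, scores = best_candidate_period(signal, start=0, window=160, min_lag=40, max_lag=120)
print(best, SAMPLE_RATE / best)
```

On this clean synthetic signal the search selects the 80-sample candidate, where the two portions match almost exactly, while a half-period candidate of 40 samples scores poorly because the fundamental is inverted. The search range is deliberately capped below twice the true period, since a candidate of 160 samples would match equally well.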
Unfortunately, such pitch trackers are not always accurate. Their inaccuracies produce pitch-tracking errors that can impair the performance of computer speech systems. In particular, pitch-tracking errors can cause computer systems to misidentify voiced portions of speech as unvoiced portions and vice versa, and can cause speech systems to segment the speech signal poorly.