Recent advances in computing power and related technology have fostered the development of a new generation of powerful software applications, including web browsers, word processors, and speech recognition applications. Newer speech recognition applications offer a wide variety of features with impressive recognition and prediction accuracy rates. To be useful to an end user, however, these features must execute in substantially real time.
Despite the advances in computing system technology, achieving real-time performance in speech recognition systems remains a challenge. Speech recognition systems must often trade off performance against accuracy. Accurate speech recognition systems typically rely on digital signal processing algorithms and complex statistical models generated from large speech and textual corpora.
In addition to the computational complexity of the language model, another challenge to accurate speech recognition is modeling and predicting the voice characteristics of the speaker. Indeed, in certain languages, the meaning of a word depends on its tone, i.e., the pitch at which it is spoken. Many Asian languages are tonal languages, wherein the meaning of a word is conveyed in part by the pitch (or tone) in which it is presented. Thus, speech recognition for such tonal languages must include a pitch tracking algorithm that can track changes in pitch (tone) in near real time. As with the language model above, to be useful in very large vocabulary continuous speech recognition systems, a pitch tracking system must be fast while providing an accurate estimate of fundamental frequency. Unfortunately, conventional pitch tracking systems that provide acceptably accurate results are often slow, because the algorithms that analyze and track voice content for fundamental pitch values are computationally expensive and time consuming. Such systems are therefore unsuited for real-time interactive applications such as, for example, a computer interface technology.
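By way of illustration only, the estimate of fundamental frequency discussed above can be sketched with a generic brute-force autocorrelation search over a single frame of audio. This is not the method of any particular prior art system. The function name `estimate_f0`, the parameters `f_min` and `f_max`, and their default values are hypothetical and chosen solely for this sketch.

```python
import math


def estimate_f0(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Illustrative brute-force autocorrelation; all names and the
    default search range are assumptions made for this sketch.
    Returns 0.0 when no positive correlation peak is found.
    """
    # Candidate pitch periods, expressed as lags in samples.
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)

    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag`.
        # The sum covers fewer terms at longer lags, which biases the
        # search toward the shortest matching period.
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    return sample_rate / best_lag if best_lag else 0.0


# Usage: a 40 ms frame of a 200 Hz sine sampled at 16 kHz.
rate = 16000
frame = [math.sin(2 * math.pi * 200 * i / rate) for i in range(640)]
f0 = estimate_f0(frame, rate)
```

In this sketch every candidate lag requires a pass over nearly the whole frame, so the cost per frame grows with the product of the frame length and the number of lags searched. Repeating that search for every frame of continuous speech is one way to see why such estimators become expensive.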
Thus, a method and apparatus for pitch tracking in audio analysis applications are required, unencumbered by the deficiencies and limitations commonly associated with prior art pitch tracking techniques.