The parameter indicative of the pitch period is very important for speech sound analysis and synthesis because the pitch has a material effect on the quality of the synthesized speech sound. An error in the measurement of the pitch seriously affects the quality of the synthesized sound.
Some methods of pitch detection have been disclosed in U.S. Pat. No. 3,717,756 granted Feb. 20, 1973 to Stitt; U.S. Pat. No. 4,282,406 granted Aug. 4, 1981 to Yato; and U.S. Pat. No. 4,081,605 granted Mar. 28, 1978 to Kitawaki et al.
Some methods of pitch period detection use block processing of speech signals, in which a finite number of consecutive speech samples are periodically selected as a group and stored for processing. Such a pitch period detection method is useful in off-line analysis. Stream processing of sampled speech signals, on the other hand, is useful for real-time processing. In stream processing, a continuous group of consecutive signal samples is selected by passing the signal stream past a window. As each new sample is added to the group, the oldest sample is deleted.
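By way of illustration only, the windowing used in stream processing can be sketched as follows. The function name `stream_windows` and its parameters are hypothetical and are not taken from the cited references; the sketch merely shows a window that drops its oldest sample as each new sample arrives.

```python
from collections import deque


def stream_windows(samples, window_len):
    """Yield successive windows of `window_len` consecutive samples.

    Illustrative sketch: a bounded deque discards its oldest element
    automatically whenever a new sample is appended to a full window.
    """
    window = deque(maxlen=window_len)
    for sample in samples:
        window.append(sample)           # newest sample enters the window
        if len(window) == window_len:   # emit only once the window is full
            yield list(window)
```

Under this sketch, each new sample produces one new window that overlaps the previous window in all but one sample, in contrast to block processing, where groups of samples are selected only periodically.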
A common problem in known methods of pitch detection relates to the substantial amount of memory and computation required to process speech signal samples. Typically, in stream processing with pitch detection by the autocorrelation function (ACF), a window of about 320 samples at 8 kHz may be used. Each ACF value requires about 200 operations comprising multiplications and additions. Assuming about 100 ACF values are necessary, about 20,000 operations are needed for each estimate. Further, assuming about 200 shifts per second, about 4,000,000 operations per second are required. Additional processing required for the ACF method of pitch detection, such as searching for the maximum, reading the ACF value from memory, writing the ACF value in memory, and the like, would increase the number of operations to at least 16,000,000 operations per second.
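The operation count above can be made concrete with a minimal sketch of a direct ACF pitch search. The function names, the lag range passed by the caller, and the choice of the lag with the largest ACF value as the pitch period are assumptions made for illustration; only the per-value, per-estimate, and per-second figures are taken from the text above.

```python
def acf(window, lag):
    """Autocorrelation of `window` at `lag`: one multiply and one add per term."""
    return sum(window[n] * window[n + lag] for n in range(len(window) - lag))


def estimate_pitch_period(window, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest ACF value.

    Assumption: the pitch period is taken as the location of the ACF
    maximum over the searched lags; practical detectors add further checks.
    """
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = acf(window, lag)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


# Operation budget using the figures stated in the text.
OPS_PER_ACF_VALUE = 200          # multiplications and additions per ACF value
ACF_VALUES_PER_ESTIMATE = 100    # ACF values computed for each estimate
SHIFTS_PER_SECOND = 200          # pitch estimates per second

ops_per_estimate = OPS_PER_ACF_VALUE * ACF_VALUES_PER_ESTIMATE   # 20,000
ops_per_second = ops_per_estimate * SHIFTS_PER_SECOND            # 4,000,000
```

The nested structure of the search, one inner sum for every candidate lag and one full search for every shift of the window, is what drives the count into the millions of operations per second before any overhead for memory access and maximum search is included.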
Single-chip microprocessors are available on the market. These microprocessors are desirable, because of their size and cost, for use in speech processing. Some of these microprocessors, however, have small memory capacity for storage of dynamic data, for example, 120 words of 20 bits each, which is substantially less than the amount required as described above. Furthermore, available microprocessors do not meet the computation speed requirements. It is therefore desirable to modify the ACF method of pitch detection so that it can be carried out on low-cost, small-size microprocessors.