The present invention relates to speech analyzing and synthesizing techniques, and particularly to a speech pitch period extracting apparatus.
Analysis methods have been developed that eliminate the redundancy included in a speech signal and code the speech at high efficiency by means of characteristic parameters, together with synthesis methods that synthesize speech from the resulting code. The most typical such system is known as the partial auto-correlation (PARCOR) method. These methods are widely used in the speech research field and are therefore not described in detail here. One of the characteristic parameters of speech obtained by this analysis is the speech pitch period, that is, the fundamental oscillation period of the vocal cords. Together with the PARCOR coefficients, linear prediction coefficients and amplitude information, the pitch period is one of the most important parameters determining the sound quality of synthesized speech. A variety of methods have been discussed for reducing the error rate of pitch extraction.
Pitch extraction methods can be roughly classified into (a) methods using the correlation value of the speech, (b) methods using the correlation value of the waveform (residual waveform) left after the parameters of the human vocal tract have been extracted from the speech signal, and (c) the cepstrum method, which uses the maximum value obtained by the inverse Fourier transformation of the logarithm of the Fourier transformation of the speech signal. In terms of the hardware they would need, these methods require large-scale computation, on the order of 20 thousand multiply and add operations for each 20 msec frame, and thus take considerable time to perform. The above-mentioned methods are therefore not suitable for real-time analysis of speech and have been used only for off-line analysis by computer, in which the speech waveform information is first stored in a memory and the pitch is then determined slowly by calculation.
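By way of illustration only, a method of type (a) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited reference: the 8 kHz sampling rate, the 60 to 400 Hz search range, the 40 msec synthetic test frame and the function name `autocorr_pitch` are all hypothetical choices made for the example. The sketch also counts the multiply and add operations it performs, to show why the computational load of such a method is considerable.

```python
import math

def autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate the pitch period of one frame as the lag that maximizes
    the autocorrelation (normalized by the frame energy).

    Returns (period_in_seconds, multiply_add_count).
    The search range f_min..f_max is an assumption of this sketch.
    """
    n = len(frame)
    lag_min = int(fs / f_max)               # shortest period searched
    lag_max = min(int(fs / f_min), n - 1)   # longest period searched
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None, 0                      # silent frame: no pitch

    best_lag, best_val, macs = 0, -1.0, 0
    for lag in range(lag_min, lag_max + 1):
        acc = 0.0
        for i in range(n - lag):
            acc += frame[i] * frame[i + lag]   # one multiply and one add
        macs += n - lag
        val = acc / energy
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / fs, macs

# Hypothetical test signal: 125 Hz fundamental plus its second harmonic.
fs = 8000
f0 = 125.0
frame = [math.sin(2 * math.pi * f0 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / fs)
         for i in range(320)]

period, macs = autocorr_pitch(frame, fs)
print(f"estimated period: {period * 1000:.1f} msec, multiply-adds: {macs}")
```

For this synthetic frame the sketch recovers the 8 msec period of the 125 Hz fundamental, and the operation count runs into the tens of thousands for a single frame, which is the same order of magnitude as the figure quoted above.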
However, speech analysis has varied applications, for example as the input to a speech synthesizing apparatus, in a variety of control apparatus to which speech is applied, in a speech-responsive control apparatus, in a speech recording and/or reproducing apparatus, and so on. Such applications must operate in real time. There is therefore a strong need for a method of analyzing speech in real time, and in particular for a pitch extracting method that extracts the speech pitch simply, in a short time and at high accuracy, using hardware whose circuits can be constituted in LSI form.
The pitch extracting techniques using the correlation method and the polarity correlation method are described in, for example, Nobuhiko Kitawaki et al., "On Pitch Extraction in Lattice type PARCOR Analysis," in the articles of the Japan Acoustic Society, October 1975, pp. 321-322.