The present invention relates to a pitch detecting device for detecting the fundamental pitch frequency of a voice and, more particularly, to a pitch detecting device for a voice analyzer/synthesizer in which voice spectrum data, fundamental pitch frequency data, and the like are used as transmission parameters.
In voice transmission over a digital transmission system, a method such as linear prediction coding is used to compress the amount of data or to provide secret (secure) conversation. In this method, only the basic parameters that constitute a voice, such as voice signal spectrum data, voiced/unvoiced data, the fundamental pitch frequency, and voice amplitude data, are extracted at predetermined intervals, digitized, and transmitted, and the voice is reproduced from them by a receiver. For example, assume that a voice signal is band-compressed into a 2,400 bps digital signal. If the frame period, which is the unit of basic parameter extraction, is set to 20 ms, then 48 bits are assigned to each frame.
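The 48-bit figure is simply the bit rate multiplied by the frame period: 2,400 bit/s × 0.020 s = 48 bits, with 50 frames sent per second. A trivial Python sketch of this budget calculation follows; the function name `bits_per_frame` is a hypothetical illustration, not part of the device described here.

```python
def bits_per_frame(bit_rate_bps: int, frame_period_ms: int) -> int:
    """Bits available for one frame's parameters (hypothetical helper)."""
    # bits per second times seconds per frame, kept in integer arithmetic
    total = bit_rate_bps * frame_period_ms
    if total % 1000:
        raise ValueError("the frame does not hold a whole number of bits")
    return total // 1000


print(bits_per_frame(2400, 20))  # 48
```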
The spectrum data, which represents the phonemic content of a voice, is called the prediction coefficient in the linear prediction coding method, the PARCOR coefficient in the partial autocorrelation method, and the LSP coefficient in the line spectrum pair analysis method. The voiced/unvoiced data is used during speech synthesis to select the sound source according to whether the analysis frame is voiced or unvoiced. The fundamental pitch frequency is the fundamental frequency of the voice in a voiced frame; during speech synthesis, the corresponding pitch period (its reciprocal) is used as the pulse interval of the voiced sound source. The amplitude data represents the power of the input voice and is usually expressed as the product of the mean amplitude of the input voice and the prediction residual amplitude obtained when the spectrum data is extracted.
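Of the spectrum representations named above, the prediction coefficients and the PARCOR coefficients can be obtained from a single computation. A common textbook route, assumed here rather than taken from this document, is the Levinson-Durbin recursion on the frame's autocorrelation: each stage produces one reflection (PARCOR) coefficient, the final stage leaves the prediction coefficients, and the remaining error is the residual energy, which relates to the prediction residual amplitude mentioned for the amplitude data. A minimal pure-Python sketch follows; the function names are hypothetical and LSP conversion is not shown.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion on the autocorrelation sequence r.

    Returns (a, k, err):
      a   -- prediction coefficients a[1..order] for the predictor
             x_hat[n] = sum_i a[i] * x[n - i]   (a[0] is unused)
      k   -- reflection (PARCOR) coefficients k[1..order] (k[0] is unused)
      err -- energy of the residual left by the order-`order` predictor
    """
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        # Reflection coefficient of stage m.
        k[m] = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        a[m] = k[m]
        for i in range(1, m):
            a[i] = prev[i] - k[m] * prev[m - i]
        # Each stage shrinks the residual energy by the factor (1 - k^2).
        err *= 1.0 - k[m] ** 2
    return a, k, err
```

For a 10th-order analysis of a hypothetical sample list `frame`, one would call `a, k, err = levinson_durbin(autocorrelation(frame, 10), 10)`. By construction, `err / r[0]` equals the product of `(1 - k[m]**2)` over all stages, so strongly predictable frames leave a small residual. Some texts define the PARCOR coefficients with the opposite sign.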
A pitch detecting device used in a conventional voice analyzer/synthesizer detects the pitch from the lag that maximizes the autocorrelation function, or minimizes the average magnitude difference function (AMDF), computed either on the input voice waveform or on the residual waveform obtained by passing the input voice through an inverse filter. With the residual waveform in particular, the spectral envelope of the input voice is removed, so the vocal cord impulses stand out clearly, as shown in FIG. 1B. This approach therefore performs better than detecting the pitch directly from the input voice waveform. FIG. 1A shows the original waveform. In FIGS. 1A and 1B, time is plotted on the abscissa in units of 4 ms.
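The conventional residual-based procedure can be sketched as follows, under stated assumptions: the inverse filter is the LPC filter A(z) estimated by the autocorrelation method (order 8), the pitch search covers periods of 2.5 to 20 ms, and the test signal is synthetic (an impulse train with a 100-sample period at 8 kHz driving a two-pole resonator). All names, parameters, and the test signal are illustrative, not taken from the device described here.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def lpc(x, order):
    """Prediction coefficients a[1..order] (a[0] unused) by Levinson-Durbin."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # degenerate frame: stop refining
            break
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Residual e[n] = x[n] - sum_i a[i] * x[n - i], i.e. x filtered by A(z)."""
    p = len(a) - 1
    return [x[n] - sum(a[i] * x[n - i] for i in range(1, min(p, n) + 1))
            for n in range(len(x))]


def pitch_by_autocorrelation(e, min_lag, max_lag):
    """Lag in [min_lag, max_lag] at which the autocorrelation of e is largest."""
    r = autocorrelation(e, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])


def pitch_by_amdf(e, min_lag, max_lag):
    """Lag in [min_lag, max_lag] at which the average magnitude difference is smallest."""
    def amdf(lag):
        n = len(e) - lag
        return sum(abs(e[i] - e[i + lag]) for i in range(n)) / n
    return min(range(min_lag, max_lag + 1), key=amdf)


# Synthetic voiced frame (illustrative assumptions): 8 kHz sampling, an impulse
# train with a 100-sample period (80 Hz) standing in for vocal cord pulses, shaped
# by a two-pole resonator y[n] = x[n] + 1.3 y[n-1] - 0.8 y[n-2] as the "vocal tract".
fs, period, frame_len = 8000, 100, 400
pulses = [1.0 if n % period == 0 else 0.0 for n in range(frame_len)]
voice = []
for n, v in enumerate(pulses):
    y1 = voice[n - 1] if n >= 1 else 0.0
    y2 = voice[n - 2] if n >= 2 else 0.0
    voice.append(v + 1.3 * y1 - 0.8 * y2)

a = lpc(voice, order=8)
residual = inverse_filter(voice, a)

# Search pitch periods from 2.5 ms to 20 ms (400 Hz down to 50 Hz).
min_lag, max_lag = fs // 400, fs // 50
print(pitch_by_autocorrelation(residual, min_lag, max_lag))
print(pitch_by_amdf(residual, min_lag, max_lag))
```

For this synthetic frame the inverse filter essentially recovers the impulse train, so both the autocorrelation peak and the AMDF dip should fall at the 100-sample period. Both measures also respond at multiples of the period, which is why the lag range is bounded here.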
However, when the input voice waveform is, for example, a sine wave, which the inverse filter filters with a very high gain, the residual waveform becomes white noise with no conspicuous impulses, as shown in FIG. 2B. It then becomes difficult to detect the pitch even by autocorrelation or similar methods. FIG. 2A shows the original waveform. In FIGS. 2A and 2B, time is plotted on the abscissa in units of 4 ms.
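The failure case can be illustrated with a deliberately idealized model, assumed here for illustration only: a 100 Hz tone sampled at 8 kHz with a small amount of added noise, and, in place of an estimated LPC inverse filter, the second-order filter A(z) = 1 - 2cos(ω)z^-1 + z^-2 (ω is `w` in the code), whose zeros sit exactly on the tone. Because a sampled sinusoid satisfies x[n] = 2cos(ω)x[n-1] - x[n-2], this filter cancels the tone completely and leaves only filtered noise. The prediction gain (input energy over residual energy) is therefore very large, and the residual shows no significant autocorrelation peak at the 80-sample period, whereas the input itself still does.

```python
import math
import random


def autocorr_normalized(e, lag):
    """Biased autocorrelation at `lag`, normalized by the zero-lag energy."""
    r0 = sum(v * v for v in e)
    return sum(e[i] * e[i + lag] for i in range(len(e) - lag)) / r0


fs = 8000               # assumed sampling rate (Hz)
f0 = 100.0              # assumed tone frequency: period of 80 samples
w = 2.0 * math.pi * f0 / fs
rng = random.Random(0)  # fixed seed so the demo is repeatable
noise_std = 1e-3        # hypothetical low-level background noise

x = [math.sin(w * n) + rng.gauss(0.0, noise_std) for n in range(400)]

# Idealized inverse filter A(z) = 1 - 2cos(w) z^-1 + z^-2 with zeros exactly at
# the tone frequency, standing in for an LPC inverse filter locked onto the tone.
# Start at n = 2 so every output sample has a full history (no start-up transient).
c = 2.0 * math.cos(w)
e = [x[n] - c * x[n - 1] + x[n - 2] for n in range(2, len(x))]

prediction_gain = sum(v * v for v in x[2:]) / sum(v * v for v in e)
period = round(fs / f0)  # 80 samples

print(f"prediction gain: {10 * math.log10(prediction_gain):.1f} dB")
print(f"input    r[{period}]/r[0] = {autocorr_normalized(x, period):.2f}")
print(f"residual r[{period}]/r[0] = {autocorr_normalized(e, period):.2f}")
```

Only the added noise survives the inverse filter, so any pitch search on `e` is searching filtered noise. An estimated LPC filter would not cancel the tone exactly, but the qualitative picture is the same when the tone dominates the frame.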