1. Field of Industrial Application
The present invention relates to a method and an apparatus for synthesizing speech using sinusoidal synthesis, such as the so-called MBE (Multiband Excitation) coding system and the Harmonic coding system.
2. Description of the Related Art
Several kinds of coding methods have been proposed in which a signal is compressed by exploiting statistical properties of an audio signal (including a speech signal and an acoustic signal) in the time and frequency domains, together with characteristics of human hearing. These coding methods may be roughly divided into time-domain coding methods, frequency-domain coding methods, coding methods based on analyzing and synthesizing an audio signal, and the like.
High-efficiency coding methods for a speech signal include the MBE (Multiband Excitation) method, the SBE (Singleband Excitation) method, the Harmonic coding method, the SBC (Sub-band Coding) method, the LPC (Linear Predictive Coding) method, the DCT (Discrete Cosine Transform) method, the MDCT (Modified DCT) method, the FFT (Fast Fourier Transform) method, and the like.
Among these speech coding methods, those that use sinusoidal synthesis, such as the MBE coding method and the Harmonic coding method, interpolate the amplitude and the phase based on data coded by and sent from an encoder, such as the harmonic amplitude and phase data. According to the interpolated parameters, these methods derive the time waveform of each harmonic, whose frequency and amplitude change over time, and then sum as many time waveforms as there are harmonics to synthesize the output waveform.
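The synthesis described above can be sketched as follows. This is a minimal illustration, not the coded implementation of any particular system: the function names, the per-frame linear interpolation of amplitude and frequency, and the sample-by-sample phase accumulation are assumptions made for clarity.

```python
import math

def synthesize_frame(amp0, amp1, phase0, freq0, freq1, num_samples):
    """Sinusoidal synthesis of one frame (illustrative sketch).

    Each harmonic's amplitude and frequency are linearly interpolated
    from their values at the frame start (amp0, freq0) to those at the
    frame end (amp1, freq1); the phase of each harmonic is accumulated
    sample by sample so the waveform remains continuous.  Frequencies
    are given in cycles per sample."""
    num_harmonics = len(amp0)
    phase = list(phase0)              # running phase of each harmonic (radians)
    out = [0.0] * num_samples
    for n in range(num_samples):
        t = n / num_samples           # interpolation factor within the frame
        for m in range(num_harmonics):
            amp = (1.0 - t) * amp0[m] + t * amp1[m]    # interpolated amplitude
            frq = (1.0 - t) * freq0[m] + t * freq1[m]  # interpolated frequency
            out[n] += amp * math.cos(phase[m])         # add this harmonic's sample
            phase[m] += 2.0 * math.pi * frq            # advance the running phase
    return out, phase
```

The returned running phases would then carry over as the starting phases of the next frame, which is what keeps the synthesized waveform continuous at the frame border.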
However, transmission of the phase data is often restricted in order to reduce the transmission bit rate. In this case, the phase used for synthesizing the sinusoidal waveforms may be a value predicted so as to keep continuity at the frame border. This prediction is executed at each frame; in particular, it is executed continuously across the transition from a voiced frame to an unvoiced frame, and vice versa.
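One common form of such a prediction is to advance each harmonic's phase across the frame by the average of the old and new angular frequencies. The sketch below assumes this linear-frequency model; the function name, interface, and units (frequencies in cycles per sample) are illustrative, not taken from any specific codec.

```python
import math

def predict_phase(prev_phase, prev_freq, cur_freq, frame_len):
    """Predict each harmonic's phase at the next frame border so the
    synthesized sinusoid stays continuous when no phase data is sent
    (illustrative sketch assuming linearly changing frequency)."""
    predicted = []
    for ph, f0, f1 in zip(prev_phase, prev_freq, cur_freq):
        # average angular frequency over the frame: 2*pi * (f0 + f1) / 2
        avg_omega = math.pi * (f0 + f1)
        # advance the phase over frame_len samples, wrapped to [0, 2*pi)
        predicted.append((ph + avg_omega * frame_len) % (2.0 * math.pi))
    return predicted
```

When the transmitted pitch (and hence the harmonic frequencies) is accurate, this prediction tracks the true phase; when it is not, the error accumulates frame after frame, which is the problem described next.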
In an unvoiced frame, no pitch exists, and hence no pitch data is transmitted. As a result, the predicted phase value deviates from the correct one as the prediction continues, gradually drifting away from the zero phase or the π/2 phase that was originally expected. This deviation may degrade the acoustic quality of the synthesized sound.