(1) Field of the Invention
This invention relates to a voice encoding and decoding device.
(2) Description of the Prior Art
To encode and decode a voice for the transmission or storage of voice information, a voice encoding and decoding device initially separates an input voice, expressed as either an analog or a digital signal, into a predictive parameter and a predictive error signal.
The predictive parameter is encoded directly and transmitted or stored. As for the predictive error signal, because it has a flat and very wide frequency spectrum, only its base band component is extracted, encoded, and transmitted or stored. Thereafter, the encoded predictive parameter and the encoded base band component are decoded. In principle, a reproduced voice is composed by controlling the predictive error signal itself with the predictive parameter.
However, only the base band component of the predictive error signal is obtainable by decoding the transmitted or stored signals. A higher frequency component must therefore be prepared from the base band component and added to it to generate an exciting signal, which is used in place of the predictive error signal. As the frequency spectrum of the exciting signal thus obtained is not as flat as that of the original predictive error signal, a satisfactory composite voice is not obtainable.
In the prior art mentioned above, the frequency characteristics of an emphasis circuit and the gain of an amplifier which amplifies the output signal of the emphasis circuit must be set so that the mean value of the frequency spectrum of the exciting signal is as flat as possible over a long time period, in order to obtain a satisfactory composite voice.
FIG. 1 shows a circuit diagram of a conventional voice encoding and decoding device.
FIG. 2 shows frequency characteristics of main portions of the circuit shown in FIG. 1.
For ease of explanation, the input voice signal 1 is described as an analog signal, but it may equally be a digital signal. In FIG. 1, the input voice signal 1 fed to a predictor 2 is processed by a linear predictor 2a to produce a predictive parameter 3. A predictive error signal 5 is obtained by passing the voice through a filter 2c, such as a transversal filter, whose frequency characteristics are controlled by an encoded predictive parameter 4 produced from the predictive parameter 3 by an encoder 2b. Since a voice is considered to be formed by passing an impulsive sound and a white noise through a filter constituted by the throat and mouth, a voice can be expressed by an impulsive sound, a white noise, and the frequency characteristics of that filter. The linear predictor 2a predicts the frequency characteristics of that filter, and the predictive parameter 3 expresses these characteristics. The frequency characteristics of the filter 2c are controlled by the encoded predictive parameter 4 so as to be the inverse of those of the filter constituted by the throat and the like. For this reason, the more accurate the prediction, the more closely the output of the filter 2c, namely the predictive error signal 5, approaches the original waveform of either the impulsive sound or the white noise, and consequently the frequency spectrum of the predictive error signal 5 becomes flat, as shown in FIG. 2(a). The reason for controlling the frequency characteristics of the filter 2c with the encoded predictive parameter 4 is to absorb the quantization errors produced in encoding into the predictive error signal 5. A large number of bits would be required if the predictive error signal 5 were encoded directly.
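The operation of the linear predictor 2a and the filter 2c can be illustrated by the following minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which the text does not specify; the function names, the predictor order, and the synthetic two-pole test signal are hypothetical, and encoding of the parameter by the encoder 2b is omitted.

```python
import random

def autocorrelation(x, max_lag):
    """Return the autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] from autocorrelation r.

    The prediction is x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, residual prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def prediction_error(x, coeffs):
    """Transversal (FIR) inverse filter: e[n] = x[n] - sum_k a[k] * x[n - k]."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out

# Hypothetical stand-in for a voice: white noise shaped by a fixed two-pole filter.
rng = random.Random(0)
voice = []
for n in range(4000):
    past1 = voice[n - 1] if n >= 1 else 0.0
    past2 = voice[n - 2] if n >= 2 else 0.0
    voice.append(1.3 * past1 - 0.6 * past2 + rng.gauss(0.0, 1.0))

r = autocorrelation(voice, 2)
predictive_parameter, _ = levinson_durbin(r, 2)             # role of predictor 2a
error_signal = prediction_error(voice, predictive_parameter)  # role of filter 2c
```

In this sketch the estimated coefficients approach those of the shaping filter, and the error signal approaches the white noise that drove it, which corresponds to the flat spectrum of FIG. 2(a).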
Therefore, as shown in FIG. 2(b), only a base band component 7 is extracted from the predictive error signal 5 by a low-pass filter 6 having, for example, a cutoff frequency fc = 800 Hz, and is encoded by an encoder 8. The encoded base band component 9 and the above-mentioned encoded predictive parameter 4 are used for transmission or storage. Reference numeral 10 denotes a transmission line or a memory. The high frequency component of the predictive error signal 5, which has been removed by the low-pass filter 6, is regenerated from the base band component in the manner described hereinafter and is used as a supplement when a voice is composed.
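The extraction of the base band component by the low-pass filter 6 can be illustrated by the following sketch. Only fc = 800 Hz is taken from the text; the sampling rate of 8000 Hz, the windowed-sinc FIR design, the number of taps, and all function names are assumptions made for illustration.

```python
import math

SAMPLE_RATE_HZ = 8000.0  # assumed; the text does not state a sampling rate
CUTOFF_HZ = 800.0        # fc of the low-pass filter 6, from the text

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass taps, normalised to unit gain at DC."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def fir_filter(x, taps):
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out

def tone(freq_hz, num_samples):
    """Unit-amplitude sine wave at the assumed sampling rate."""
    return [math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE_HZ) for n in range(num_samples)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

taps = lowpass_taps(CUTOFF_HZ, SAMPLE_RATE_HZ)
in_band = fir_filter(tone(300.0, 800), taps)       # below fc: retained
out_of_band = fir_filter(tone(2500.0, 800), taps)  # above fc: removed
```

A tone below fc passes with nearly unchanged level, while a tone well above fc is strongly attenuated, so only the portion of the spectrum shown in FIG. 2(b) remains to be encoded.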
After the encoded base band component 9 and the encoded predictive parameter 4 have been transmitted or stored, they are decoded by decoders 11 and 12, respectively. The output of the decoder 11 is freed from decoding noise by a low-pass filter 13 and becomes a decoded base band component 14, which is the same as the original base band component 7. The decoded base band component 14 is input to a non-linear circuit 15, which generates a signal 16 having higher harmonic components, as shown in FIG. 2(c). The signal 16 is input to an emphasis circuit 17, which emphasizes its high frequency component to give a signal 18, as shown in FIG. 2(d). The signal 18 is then supplied to a high-pass filter 19 to obtain a high frequency component 20, shown in FIG. 2(e), which corresponds to the component removed by the low-pass filter 6 or 13. The high frequency component 20 is amplified by an amplifier 21 to give a high frequency component 22 for supplementing the base band component 14. The high frequency component 22 is added to the base band component 14 by an adder circuit 23 to give an exciting signal 24.
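The chain from the non-linear circuit 15 to the adder circuit 23 can be illustrated by the following sketch. The text does not specify the circuits, so a full-wave rectifier for the non-linear circuit 15, a first-order difference for the emphasis circuit 17, a complementary FIR high-pass filter, the 8000 Hz sampling rate, and all names and default values are assumptions. The delay of the high-pass filter is not compensated in this simplified form.

```python
import math

SAMPLE_RATE_HZ = 8000.0  # assumed; the text does not state a sampling rate
CUTOFF_HZ = 800.0        # fc of the low-pass filters 6 and 13, from the text

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass taps, normalised to unit gain at DC."""
    fc = cutoff_hz / sample_rate_hz
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def highpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Complement of the low-pass: a centred unit impulse minus the low-pass taps."""
    taps = [-h for h in lowpass_taps(cutoff_hz, sample_rate_hz, num_taps)]
    taps[(num_taps - 1) // 2] += 1.0
    return taps

def fir_filter(x, taps):
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out

def make_exciting_signal(base_band, gain=1.0, emphasis=0.9):
    """Return (exciting signal 24, high frequency component 22)."""
    # Non-linear circuit 15 (assumed full-wave rectifier): creates harmonics.
    rectified = [abs(v) for v in base_band]
    # Emphasis circuit 17 (assumed first-order difference): boosts high frequencies.
    emphasized = [
        rectified[n] - emphasis * (rectified[n - 1] if n > 0 else 0.0)
        for n in range(len(rectified))
    ]
    # High-pass filter 19: keeps only the band above fc.
    high = fir_filter(emphasized, highpass_taps(CUTOFF_HZ, SAMPLE_RATE_HZ))
    # Amplifier 21: fixed gain.
    amplified = [gain * v for v in high]
    # Adder circuit 23: base band plus regenerated high band.
    exciting = [b + h for b, h in zip(base_band, amplified)]
    return exciting, amplified

# Hypothetical decoded base band component 14: a single 300 Hz tone.
base_band = [math.sin(2.0 * math.pi * 300.0 * n / SAMPLE_RATE_HZ) for n in range(1000)]
exciting_signal, high_component = make_exciting_signal(base_band)
```

Because the gain and the emphasis coefficient are fixed in this sketch, the level of the regenerated band relative to the base band does not follow short-term changes in the signal, which is the limitation of the conventional device discussed in this section.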
A voice composing filter 25 is, for example, a transversal filter whose frequency characteristics are controlled by the decoded predictive parameter 26 so as to be substantially the same as those of the filter constituted by the throat and the like. The voice composing filter 25 composes and outputs a reproduced voice by passing the exciting signal 24. The voice composing filter 25 can also be controlled directly by the encoded predictive parameter 4. However, as mentioned above, the frequency characteristics of the emphasis circuit 17 and the gain of the amplifier 21 are determined so that the mean value of the frequency spectrum of the exciting signal 24 is flat over a long time period; consequently, the frequency spectrum over a short time period is not flat, as shown in FIG. 2(f). This is the cause of the inferior quality of the composite voice produced by the conventional device explained above.
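The relation between the filter 2c and the voice composing filter 25 can be illustrated by the following sketch. The text names a transversal filter only as an example of the voice composing filter 25; this sketch instead assumes a recursive form, chosen because it is the exact inverse of the transversal analysis filter sketched for the filter 2c. The coefficient values and function names are hypothetical.

```python
def analysis_filter(x, coeffs):
    """Role of filter 2c: e[n] = x[n] - sum_k a[k] * x[n - k]."""
    return [
        x[n] - sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        for n in range(len(x))
    ]

def synthesis_filter(excitation, coeffs):
    """Role of filter 25 (assumed recursive): y[n] = e[n] + sum_k a[k] * y[n - k]."""
    y = []
    for n, e in enumerate(excitation):
        fed_back = sum(a * y[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        y.append(e + fed_back)
    return y

# Hypothetical predictive parameter (a stable two-coefficient predictor).
predictive_parameter = [1.3, -0.6]

# Hypothetical input samples standing in for a short voice segment.
voice = [float(((i * 37) % 17) - 8) for i in range(200)]

error_signal = analysis_filter(voice, predictive_parameter)
reproduced = synthesis_filter(error_signal, predictive_parameter)
```

When the synthesis filter is driven by the true predictive error signal, the input is recovered. In the conventional device it is instead driven by the exciting signal 24, so any departure of that signal from a flat short-time spectrum appears directly in the reproduced voice.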