This invention relates to a system for high-efficiency coding, and more particularly to a voice coding method and system suitable for providing high-quality reproduced voice at a high information compression rate.
Many high-efficiency voice coding systems have been proposed in the past. For example, "Digital Information Compression" by Kazuo Nakada (Electronic Science Series 100, Kohsaido Sampoh Shuppan) gives an accessible explanation of many systems belonging to the waveform coding and information source coding (parameter coding) families. See also "Study of Vector Coding of Voice" by Moriya et al., Papers SP86-16 (1986) of the Voice Research Conference, The Institute of Electronics and Communication Engineers of Japan, and JP-A-63-285599.
Of these conventional systems, waveform coding can generally ensure good voice quality but has difficulty raising the information compression efficiency. Parameter coding, by contrast, provides high compression efficiency, but increasing the amount of information yields only limited improvement in voice quality, so sufficiently high quality cannot be obtained. Consequently, in the intermediate compression region (near 10 kb/s) lying between the bands to which each system is well suited, performance, particularly voice quality relative to the amount of information, is degraded. Under these circumstances, hybrid systems combining the advantages of both have recently been proposed, including the multipulse type (for example, B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proc. ICASSP 82, pp. 614-617 (1982)), the CELP type (B. S. Atal et al., "Stochastic Coding of Speech Signals at Very Low Bit Rates," Proc. ICC 84, pp. 1610-1613 (1984)) and the TOR type (A. Ichikawa et al., "A Speech Coding Method Using Thinned-out Residual," Proc. ICASSP 85, pp. 961-964 (1985)), and these have been studied from various viewpoints. However, hybrid systems remain unsatisfactory in terms of both voice quality and processing cost.
In general, highly efficient coding systems exploit the fact that voice information is concentrated locally within the range the parameters can take. This idea has been developed further: a combination of several parameters is represented as a vector, and the localization of these vectors is exploited so that voice information can be represented with less information. Such a system, called vector quantization and described, for example, in R. M. Gray, "Vector Quantization," IEEE ASSP Magazine, pp. 4-29 (April 1984), has attracted considerable attention. More specifically, when a voice is expressed by suitable parameters, the parameters are distributed in a characteristic pattern because of the structure of the human mouth. As an example, FIG. 1 shows a voice expressed in terms of two parameters a and b. Most human speech can be expressed by parameter values falling within an area A. For vector quantization, the area A is divided into a large number of domains, and codes 1, 2, 3, ... identifying the individual domains are allotted to them.
In the case of scalar quantization, when the voice represented by a point x in FIG. 1 is coded, a_1 as parameter "a" and b_1 as parameter "b" are transmitted independently. In the case of vector quantization, on the other hand, only code 12 is transmitted; code 12 specifies the divided region containing the point x.
In the case of scalar quantization, to cover the whole area in which voice information exists, parameter "a" takes values from a_min to a_max and parameter "b" takes values from b_min to b_max. Since "a" and "b" are coded independently, codes are allotted to every divided region within the rectangle B in FIG. 1. As a result, information is allotted to the region (B - A) even though no voice is actually present there. In vector quantization, by contrast, information is allotted only to the region A in FIG. 1, where voice is present, so the information can be compressed further than is possible with scalar quantization.
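The bit saving described above can be illustrated with a toy count. This is an illustration only, assuming a hypothetical area A (a narrow band around a = b in the unit square) and a uniform 16 x 16 grid of cells; neither comes from FIG. 1. Scalar quantization spends log2(16) bits on each parameter, while vector quantization numbers only the cells whose centres fall inside A.

```python
import math

GRID = 16  # hypothetical: each parameter axis split into 16 uniform steps


def in_region_a(a, b, width=0.1):
    """Hypothetical stand-in for area A: a narrow band around the line a == b."""
    return abs(a - b) < width


def scalar_bits(grid=GRID):
    # Scalar quantization codes "a" and "b" independently, so every cell of the
    # rectangle B is addressable: log2(grid) bits for each of the two parameters.
    return 2 * math.ceil(math.log2(grid))


def vector_bits(grid=GRID):
    # Vector quantization assigns codes only to cells whose centre lies in area A.
    centres = [(i + 0.5) / grid for i in range(grid)]
    cells_in_a = sum(1 for a in centres for b in centres if in_region_a(a, b))
    return cells_in_a, math.ceil(math.log2(cells_in_a))


cells, bits = vector_bits()
print(scalar_bits(), cells, bits)  # 8 46 6
```

Under these assumptions, 46 of the 256 cells lie in A, so one 6-bit code replaces two 4-bit indices. The actual saving depends entirely on the shape of A and on how finely it is divided.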
The method of decoding transmitted codes in vector quantization is as follows. Each divided region is represented by a representing vector, which holds a value of each parameter for that region. The representing vector is called a code vector or centroid. The system is provided with a table, called a code book, that lists each representing vector together with its code. Identical code books are provided on the transmitting (coding) side and on the receiving (decoding) side, so that the representing vector corresponding to a transmitted code can be obtained by looking it up in the code book. In general, however, there is a difference between the vector representing the actual input voice (hereinafter referred to as the input vector) and the representing vector obtained; this difference is the quantization distortion.
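Coding and decoding with a code book can be sketched as follows, assuming a hypothetical four-entry code book of two-parameter vectors and nearest-neighbour selection by squared Euclidean distance (a common choice, not one specified here). Only the integer code crosses the channel; the receiver recovers the representing vector from its identical copy of the code book, and the remaining distance is the quantization distortion.

```python
# Hypothetical 2-D code book: index k is the code, entry k its representing vector.
CODE_BOOK = [
    (0.1, 0.2),
    (0.4, 0.5),
    (0.7, 0.6),
    (0.9, 0.9),
]


def sq_dist(u, v):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def encode(x, code_book=CODE_BOOK):
    """Coding side: choose the code whose representing vector is nearest to x."""
    return min(range(len(code_book)), key=lambda k: sq_dist(x, code_book[k]))


def decode(code, code_book=CODE_BOOK):
    """Decoding side: look the code up in the identical code book."""
    return code_book[code]


x = (0.42, 0.47)   # input vector
code = encode(x)   # only this integer is transmitted
y = decode(code)   # representing vector recovered at the receiver
print(code, y, round(sq_dist(x, y), 4))  # 1 (0.4, 0.5) 0.0013
```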
To realize high-quality voice coding with vector quantization, a high-quality code book that can express voice with as high fidelity as possible must be prepared in advance. This requires solving many problems, including the need for a sufficiently large amount of speech as training data and the choice of how many codes the code book should contain and which parameters should be used. As a countermeasure to the problems of code book preparation, a fuzzy vector quantization method (for example, H. P. Tseng et al., "Fuzzy Vector Quantization Applied to Hidden Markov Modeling," ICASSP 87, 4 (1987)) has been proposed, in which a membership function is used to determine the input voice by interpolation. The membership function expresses numerically the degree of similarity between the input vector and each representing vector; concretely, the similarity is given by the distance between the input vector and each representing vector. Although the voice quality of the fuzzy vector method is expected to improve in proportion to the quality of the code book, the method has not been used for transmission because the membership function values amount to a large quantity of information. At present, its use has been studied at most as pre-processing for speech recognition. In addition, a KNN method (for example, Nakamura et al., "Study of Normalization of Spectrogram by Using Fuzzy Vector Quantization," Papers SP87-123 of the Voice Research Conference, Feb. 19, 1988) has been proposed to reduce the amount of information: the input vector is compared with every representing vector registered in the code book, and only the N representing vectors closest to the input vector are used.
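The membership-function idea can be sketched as follows. This assumes the fuzzy c-means form of membership with fuzziness exponent m = 2 and Euclidean distance; the cited work may define the function differently, and the code book is again hypothetical. One membership value per code vector is produced for every input frame, which is the information burden noted above.

```python
import math

# Hypothetical 2-D code book; code k is the index of its representing vector.
CODE_BOOK = [(0.1, 0.2), (0.4, 0.5), (0.7, 0.6), (0.9, 0.9)]


def memberships(x, code_book=CODE_BOOK, m=2.0):
    """Fuzzy c-means style membership of x in each code vector (assumed form).

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with d the Euclidean distance.
    The values lie in [0, 1] and sum to 1; nearer code vectors get larger values.
    """
    d = [math.dist(x, c) for c in code_book]
    hits = [dk == 0.0 for dk in d]
    if any(hits):  # x coincides with code vector(s): share membership among them
        n_hits = sum(hits)
        return [1.0 / n_hits if h else 0.0 for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


def interpolate(u, code_book=CODE_BOOK):
    """Reconstruct the input as the membership-weighted sum of the code vectors."""
    dim = len(code_book[0])
    return tuple(sum(uk * c[i] for uk, c in zip(u, code_book)) for i in range(dim))


x = (0.42, 0.47)
u = memberships(x)      # one value per code vector would be sent for every frame
x_hat = interpolate(u)  # interpolated estimate of the input vector
print([round(v, 3) for v in u], x_hat)
```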
The KNN method, however, requires a sorting process to select the N representing vectors (code vectors) closest to the input vector, and the amount of processing this sorting requires is a very severe problem in practice. Furthermore, transmitting the codes of all N representing vectors increases the amount of information to be transmitted.
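The selection step of the KNN method and the resulting transmission cost can be sketched as below. The code book and the 1024-entry size used for the bit count are hypothetical, and the bit count covers only the N code indices, not any weights that might accompany them.

```python
import math

# Hypothetical 2-D code book; code k is the index of its representing vector.
CODE_BOOK = [(0.1, 0.2), (0.4, 0.5), (0.7, 0.6), (0.9, 0.9)]


def knn_codes(x, n, code_book=CODE_BOOK):
    """Return the codes of the n representing vectors closest to x.

    Mirrors the sorting step described above: all M distances are computed and
    fully sorted (O(M log M)) even though only n of them are kept.
    """
    order = sorted(range(len(code_book)), key=lambda k: math.dist(x, code_book[k]))
    return order[:n]


def bits_per_frame(n, code_book_size):
    """Bits needed to send n codes, each indexing a code book of the given size."""
    return n * math.ceil(math.log2(code_book_size))


x = (0.42, 0.47)
print(knn_codes(x, 2))          # [1, 2]
print(bits_per_frame(1, 1024))  # 10  (plain vector quantization: one code)
print(bits_per_frame(4, 1024))  # 40  (KNN with N = 4)
```

Python's `heapq.nsmallest` could replace the full sort with a partial selection, but all M distances must still be computed for every input frame.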