1. Field of the Invention
The present invention relates to a voice coding system and a decoding system based on hierarchical coding.
2. Description of the Related Art
Conventionally, a voice coding and decoding system based on hierarchical coding, in which the sampling frequency of a reproduction signal varies depending upon the bit rate to be decoded, has been employed so that a voice signal can be decoded with relatively high quality, albeit with a narrow band width, even when some of the packets are dropped while the voice signal is transmitted over a packet communication network. For example, Japanese Unexamined Patent Publication No. Heisei 8-263096 (hereinafter referred to as "publication 1") proposes a coding method and a decoding method for effecting hierarchical coding of an acoustic signal by band division. In this coding method, when hierarchical coding with N hierarchies is realized, a signal consisting of a low band component of an input signal is coded in the first hierarchy. In the (n)th hierarchy (n=2, . . . , N-1), a differential signal is coded, the differential signal being derived by subtracting the n-1 signals coded and decoded up to the (n-1)th hierarchy from a signal consisting of a component of the input signal having a wider band than that of the (n-1)th hierarchy. In the (N)th hierarchy, a differential signal derived by subtracting the N-1 signals coded and decoded up to the (N-1)th hierarchy from the input signal is coded.
Referring to FIG. 12, operation of a voice coding and decoding system employing a Code Excited Linear Predictive (CELP) coding method for coding each hierarchy will be discussed. For simplicity of disclosure, the discussion will be given for the case where the number of hierarchies is two. A similar discussion applies to three or more hierarchies. FIG. 12 illustrates a construction in which a bit stream coded by a voice coding system can be decoded at two bit rates (hereinafter referred to as the high bit rate and the low bit rate) in a voice decoding system. It should be noted that FIG. 12 has been prepared by the inventors as a technology relevant to the present invention on the basis of the foregoing publication and publications identified later.
Referring to FIG. 12, the voice coding system will be discussed. A down-sampling circuit 1 down-samples an input signal (e.g. converts the sampling frequency from 16 kHz to 8 kHz) to generate a first input signal and outputs it to a first CELP coding circuit 2. The operation of the down-sampling circuit 1 is discussed in P. P. Vaidyanathan, "Multirate Systems and Filter Banks", Chapter 4.1.1 (FIG. 4.1-7) (hereinafter referred to as "publication 2"). Since reference can be made to the disclosure of publication 2, a detailed discussion is omitted.
The first CELP coding circuit 2 performs linear predictive analysis of the first input signal for every predetermined frame to derive linear predictive coefficients expressing the spectrum envelope characteristics of the voice signal, and encodes the derived linear predictive coefficients and an excitation signal of the corresponding linear predictive synthesizing filter, respectively. Here, the excitation signal consists of a frequency component indicative of a pitch frequency, a remaining residual component, and gains thereof. The frequency component indicative of the pitch frequency is expressed by an adaptive code vector stored in a code book storing past excitation signals, called an adaptive code book. The residual component is expressed as a multipulse signal as disclosed in J-P. Adoul et al. "Fast CELP Coding Based on Algebraic Codes" (Proc. ICASSP, pp. 1957-1960, 1987) (hereinafter referred to as "publication 3").
By weighted summing of the foregoing adaptive code vector and the multipulse signal with a gain stored in the gain code book, the excitation signal is generated.
A reproduced signal can be synthesized by driving the foregoing linear predictive synthesizing filter with the foregoing excitation signal. Here, the adaptive code vector, the multipulse signal and the gain are selected so as to minimize the power of the audibility-weighted error signal between the reproduced signal and the first input signal. Then, the indexes corresponding to the adaptive code vector, the multipulse signal, the gain and the linear predictive coefficient are output to a first CELP decoding circuit 3 and a multiplexer 7.
The first CELP decoding circuit 3 takes the indexes corresponding to the adaptive code vector, the multipulse signal, the gain and the linear predictive coefficient as input and decodes each of them. The excitation signal is derived by summing the adaptive code vector and the multipulse signal, each weighted by its gain. The reproduced signal is generated by driving the linear predictive synthesizing filter with the excitation signal, and is output to an up-sampling circuit 4.
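By way of illustration only, the excitation generation and synthesis described above may be sketched as follows. The function names, the sign convention of the synthesis filter (s(n) = e(n) + sum of a(i) s(n-i)) and the sample values are assumptions introduced for this sketch and are not taken from the publications cited.

```python
def make_excitation(adaptive, multipulse, g_adaptive, g_pulse):
    """Weighted sum of the adaptive code vector and the multipulse signal."""
    return [g_adaptive * a + g_pulse * c for a, c in zip(adaptive, multipulse)]


def synthesize(excitation, lpc, history=None):
    """Drive an all-pole synthesis filter with the excitation.

    Assumed convention: s(n) = e(n) + sum_i lpc[i-1] * s(n-i).
    `history` holds past outputs, most recent last (zeros if omitted).
    """
    order = len(lpc)
    # Zero padding guarantees that `order` past samples are always available.
    past = [0.0] * order + list(history or [])
    out = []
    for e in excitation:
        s = e + sum(lpc[i] * past[-1 - i] for i in range(order))
        out.append(s)
        past.append(s)
    return out


# Hypothetical values: one adaptive sample, one pulse, first-order filter.
excitation = make_excitation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.5, 2.0)
reproduced = synthesize(excitation, lpc=[0.5])
```

In an actual codec the filter history would be carried over from the preceding sub-frame rather than reset to zero.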
The up-sampling circuit 4 up-samples the reproduced signal (e.g. converts the sampling frequency from 8 kHz to 16 kHz) and outputs the resulting signal to a differential circuit 5. Since reference can be made to Chapter 4.1.1 (FIG. 4.1-8) of publication 2 with respect to the up-sampling circuit 4, a detailed discussion is omitted.
The differential circuit 5 generates a differential signal of the input signal and the up-sampled reproduction signal and outputs it to a second CELP coding circuit 6.
The second CELP coding circuit 6 codes the input differential signal in the same manner as the first CELP coding circuit 2, and outputs the indexes corresponding to the adaptive code vector, the multipulse signal, the gain and the linear predictive coefficient to the multiplexer 7. The multiplexer 7 converts the four kinds of indexes input from the first CELP coding circuit 2 and the four kinds of indexes input from the second CELP coding circuit 6 into a bit stream and outputs the bit stream.
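The signal flow of the two-hierarchy coding side described above may be sketched as follows. This is a structural sketch only: a uniform scalar quantizer stands in for each CELP coding and decoding circuit, pair averaging and sample repetition stand in for the multirate filters of publication 2, and all function names and step sizes are hypothetical.

```python
def down_sample(x):
    """Halve the sampling rate by averaging sample pairs (crude stand-in)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def up_sample(x):
    """Double the sampling rate by repeating each sample (crude stand-in)."""
    return [v for v in x for _ in range(2)]


def toy_encode(x, step):
    """Placeholder for a CELP coding circuit: uniform scalar quantization."""
    return [round(v / step) for v in x]


def toy_decode(indexes, step):
    """Placeholder for a CELP decoding circuit."""
    return [i * step for i in indexes]


def encode_two_layers(x, step1=0.5, step2=0.1):
    low = down_sample(x)                          # down-sampling circuit 1
    idx1 = toy_encode(low, step1)                 # first coding circuit 2
    local = up_sample(toy_decode(idx1, step1))    # circuits 3 and 4
    diff = [a - b for a, b in zip(x, local)]      # differential circuit 5
    idx2 = toy_encode(diff, step2)                # second coding circuit 6
    return idx1, idx2                             # passed to multiplexer 7


idx1, idx2 = encode_two_layers([0.0, 0.2, 1.0, 1.4])
```

The point of the sketch is that the second hierarchy codes only what the locally decoded first hierarchy failed to reproduce.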
Next, the voice decoding system will be discussed. The voice decoding system switches operation by means of a demultiplexer 8 and a switch circuit 13, depending on a control signal identifying which of the two decodable bit rates is to be used.
The demultiplexer 8 receives the bit stream and the control signal. When the control signal indicates the high bit rate, the four kinds of indexes coded in the first CELP coding circuit 2 and the four kinds of indexes coded in the second CELP coding circuit 6 are extracted and output to a first CELP decoding circuit 9 and a second CELP decoding circuit 10, respectively. On the other hand, when the control signal indicates the low bit rate, the four kinds of indexes coded in the first CELP coding circuit 2 are extracted and output only to the first CELP decoding circuit 9.
The first CELP decoding circuit 9 decodes the adaptive code vector, the multipulse signal, the gain and the linear predictive coefficient from the four kinds of input indexes by the same operation as the first CELP decoding circuit 3, generates the first reproduced signal, and outputs it to the switch circuit 13.
In the up-sampling circuit 11, the first reproduced signal input via the switch circuit 13 is up-sampled in the same manner as in the up-sampling circuit 4, and the up-sampled first reproduced signal is output to the adder circuit 12.
The second CELP decoding circuit 10 decodes the adaptive code vector, the multipulse signal, the gain and the linear predictive coefficient from the four kinds of input indexes, generates a reproduced signal, and outputs it to the adder circuit 12.
The adder circuit 12 adds the input reproduced signal and the first reproduced signal up-sampled by the up-sampling circuit 11 to output to the switch circuit 13 as a second reproduced signal.
The switch circuit 13 receives the first reproduced signal, the second reproduced signal and the control signal. When the control signal indicates the high bit rate, the input first reproduced signal is output to the up-sampling circuit 11, and the input second reproduced signal is output as the reproduced signal of the voice decoding system. On the other hand, when the control signal indicates the low bit rate, the input first reproduced signal is output as the reproduced signal of the voice decoding system.
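The bit-rate-dependent switching on the decoding side may be sketched as follows. The function names are hypothetical, sample repetition is a crude stand-in for the up-sampling circuit 11, and the two input lists are assumed to be the outputs of the first and second CELP decoding circuits.

```python
def up_sample(x):
    """Double the sampling rate by repeating each sample (crude stand-in)."""
    return [v for v in x for _ in range(2)]


def select_output(first_reproduced, second_layer_reproduced, high_bit_rate):
    """Select the decoder output in the manner of the switch circuit 13.

    first_reproduced: output of the first CELP decoding circuit 9.
    second_layer_reproduced: output of the second CELP decoding circuit 10,
    or None when only the low-bit-rate indexes were extracted.
    """
    if not high_bit_rate:
        # Low bit rate: narrow-band signal at the lower sampling frequency.
        return list(first_reproduced)
    up = up_sample(first_reproduced)                  # up-sampling circuit 11
    # Adder circuit 12 yields the second reproduced signal.
    return [a + b for a, b in zip(up, second_layer_reproduced)]


low = select_output([1.0, 2.0], None, high_bit_rate=False)
high = select_output([1.0, 2.0], [0.5, 0.5, -0.5, -0.5], high_bit_rate=True)
```

Note that the two outputs have different lengths, reflecting the different sampling frequencies of the two bit rates.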
Next, referring to FIG. 13, discussion will be given with respect to the coding circuit on the basis of the CELP coding method used in the first CELP coding circuit 2 and the second CELP coding circuit 6, shown in FIG. 12.
Referring to FIG. 13, a frame dividing circuit 101 divides the input signal input via an input terminal 100 into frames and outputs them to a sub-frame dividing circuit 102. The sub-frame dividing circuit 102 further divides the input signal in each frame into sub-frames and outputs them to a linear predictive analyzing circuit 103 and a target signal generating circuit 105. The linear predictive analyzing circuit 103 performs linear predictive analysis of the signal input via the sub-frame dividing circuit 102 per sub-frame and outputs linear predictive coefficients a(i), i=1, . . . , Np, to a linear predictive coefficient quantizing circuit 104, the target signal generating circuit 105, an adaptive code book retrieving circuit 107 and a multipulse retrieving circuit 108. Here, Np is the order of the linear predictive analysis, e.g. "10". As the linear predictive analyzing method, the autocorrelation method, the covariance method and so forth are known. Details have been discussed in Furui, "Digital Voice Processing" (Tokai University Shuppan Kai), Chapter 5 (hereinafter referred to as "publication 4").
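As one example of the autocorrelation method mentioned above, the Levinson-Durbin recursion may be sketched as follows. The function names and the sign convention (x(n) predicted as the sum of a(i) x(n-i)) are assumptions; windowing and lag weighting, which practical coders usually apply, are omitted.

```python
def autocorrelation(x, max_lag):
    """r(k) = sum_n x(n) * x(n-k) for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a(1..order).

    Assumed convention: x(n) is predicted as sum_i a(i) * x(n-i).
    Returns (a, residual_energy).
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            a.append(0.0)   # degenerate input: stop refining
            continue
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err       # reflection coefficient of stage m
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= (1.0 - k * k)
    return a, err


# Autocorrelation of an ideal first-order process with coefficient 0.5.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the second coefficient comes out as zero, since a first-order predictor already explains the given autocorrelation sequence.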
In the linear predictive coefficient quantization circuit 104, the linear predictive coefficients obtained per sub-frame are collectively quantized per frame. In order to reduce the bit rate, quantization is performed only for the final sub-frame in the frame. For obtaining the quantized values of the other sub-frames, a method using values interpolated between the quantized values of the relevant frame and those of the immediately preceding frame is frequently used. The quantization and interpolation are performed after conversion of the linear predictive coefficients into line spectrum pairs (LSP). Here, conversion from the linear predictive coefficients into LSP has been disclosed in Sugamura, et al. "Voice Information Compression by Linear Spectrum Pair (LSP) Voice Analysis Synthesizing Method" (Paper of Institute of Electronics and Communication Engineers of Japan, J64-A, pp. 599-606, 1981) (hereinafter referred to as "publication 5"). As the quantization method of LSP, a known method can be used. A particular method has been disclosed in Japanese Unexamined Patent Publication No. Heisei 4-171500 (Patent Application No. 2-297600) (hereinafter referred to as "publication 6"), for example. The disclosure of the publication 6 is herein incorporated by reference.
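The interpolation of the quantized values across sub-frames may be sketched as follows. Linear weighting and the function name interpolate_lsp are assumptions made for this sketch; the passage above does not specify the interpolation weights.

```python
def interpolate_lsp(prev_lsp, curr_lsp, num_subframes):
    """Return one LSP vector per sub-frame; the last equals curr_lsp.

    prev_lsp: quantized LSP of the final sub-frame of the preceding frame.
    curr_lsp: quantized LSP of the final sub-frame of the current frame.
    The linear weights (m + 1) / num_subframes are an assumption.
    """
    result = []
    for m in range(num_subframes):
        w = (m + 1) / num_subframes
        result.append([(1.0 - w) * p + w * c
                       for p, c in zip(prev_lsp, curr_lsp)])
    return result


# Hypothetical two-dimensional LSP vectors and four sub-frames per frame.
lsp_per_subframe = interpolate_lsp([0.0, 1.0], [1.0, 2.0], 4)
```

Only the final sub-frame costs bits; the remaining sub-frames are derived at both the coding side and the decoding side from values each already holds.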
Also, the linear predictive coefficient quantization circuit 104 converts the quantized LSP into quantized linear predictive coefficients a' (i), i=1, . . . , Np, outputs the quantized linear predictive coefficients to the target signal generating circuit 105, the adaptive code book retrieving circuit 107 and the multipulse retrieving circuit 108, and outputs the index indicative of the quantized linear predictive coefficients to an output terminal 113.
The target signal generating circuit 105 generates an audibility weighted signal by driving an audibility weighted filter Hw(z) as expressed by the following equation (1) with the input signal: ##EQU1##
wherein R1 and R2 are weighting coefficients controlling the amount of audibility weighting and, for example, R1=0.6 and R2=0.9.
Next, the audibility weighted synthesizing filter Hsw(z) of the immediately preceding sub-frame held in the same circuit, formed by cascade-connecting the linear predictive synthesizing filter (see the following equation (2)) and the audibility weighted filter Hw(z), is driven by the excitation signal of the immediately preceding sub-frame. Subsequently, the filter coefficients of the audibility weighted synthesizing filter are changed to those of the current sub-frame, and the same filter is driven by a zero input signal, all signal values of which are zero, to derive a zero input response signal. ##EQU2##
Furthermore, by subtracting the zero input response signal from the audibility weighted signal, the target signal X(n), n=0, . . . , N-1, is generated. Here, N is the sub-frame length. The target signal X(n) is output to the adaptive code book retrieving circuit 107, the multipulse retrieving circuit 108 and the gain retrieving circuit 109.
In the adaptive code book retrieving circuit 107, the adaptive code book storing past excitation signals is updated with the excitation signal of the immediately preceding sub-frame obtained via a sub-frame buffer 106. The adaptive code vector signal Adx(n), n=0, . . . , N-1, corresponding to a pitch dx is a signal of N samples taken starting dx samples back from the sample immediately preceding the current sub-frame. Here, when the pitch dx is shorter than the sub-frame length N, the dx samples thus taken are repeatedly concatenated up to the sub-frame length to generate the adaptive code vector signal.
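The construction of the adaptive code vector signal, including the repetition applied when the pitch is shorter than the sub-frame length, may be sketched as follows. The function name and the buffer layout (most recent sample last) are assumptions.

```python
def adaptive_code_vector(past_excitation, pitch, subframe_len):
    """Build Adx(n), n = 0..N-1, from the past excitation buffer.

    The vector starts `pitch` samples before the end of the buffer; when
    pitch < subframe_len the `pitch` samples are repeated to fill N.
    """
    if not 1 <= pitch <= len(past_excitation):
        raise ValueError("pitch must lie within the stored excitation")
    start = len(past_excitation) - pitch
    return [past_excitation[start + (n % pitch)] for n in range(subframe_len)]


past = [1, 2, 3, 4, 5, 6]                      # hypothetical past excitation
short_pitch = adaptive_code_vector(past, 2, 4)  # pitch shorter than N
long_pitch = adaptive_code_vector(past, 5, 4)   # pitch longer than N
```

When the pitch is at least the sub-frame length, the modulo has no effect and the vector is a plain excerpt of the buffer.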
Using the generated adaptive code vector signal Adx(n), n=0, . . . , N-1, the audibility weighted synthesizing filter initialized per sub-frame (hereinafter referred to as the audibility weighted synthesizing filter Zsw(z) in zero state) is driven to generate a reproduced signal SAdx(n), n=0, . . . , N-1. Then, a pitch d that minimizes the error E1(dx) between the target signal X(n) and the reproduced signal SAdx(n), expressed by the following equation (3), is selected from a predetermined retrieving range (e.g. dx=17, . . . , 144). The adaptive code vector signal of the pitch d and the reproduced signal thereof are set to be Ad(n) and SAd(n), respectively. ##EQU3##
On the other hand, the adaptive code book retrieving circuit 107 outputs the index of the selected pitch d to an output terminal 110 and the selected adaptive code vector signal Ad(n) to the gain retrieving circuit 109, and the reproduced signal SAd(n) thereof to the gain retrieving circuit 109 and the multipulse retrieving circuit 108.
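The pitch retrieval may be sketched as follows. Since the equation itself is not reproduced in this text, the error is assumed here to take the form commonly used in CELP coders, E1(dx) = sum X(n)^2 - (sum X(n) SAdx(n))^2 / sum SAdx(n)^2, i.e. the residual error under an optimal gain. Driving a filter in zero state is modeled as a convolution with its impulse response truncated to the sub-frame; all function names and sample values are hypothetical.

```python
def adaptive_code_vector(past, pitch, n_len):
    """Adx(n): excerpt starting `pitch` samples back, repeated if short."""
    start = len(past) - pitch
    return [past[start + (n % pitch)] for n in range(n_len)]


def zero_state_response(signal, impulse_response):
    """Output of a filter started from zero state: truncated convolution."""
    n_len = len(signal)
    return [sum(signal[k] * impulse_response[n - k]
                for k in range(n + 1) if n - k < len(impulse_response))
            for n in range(n_len)]


def search_pitch(target, past, impulse_response, pitch_range):
    """Return the pitch d minimizing the assumed error
    E1(dx) = sum X(n)^2 - (sum X(n) SAdx(n))^2 / sum SAdx(n)^2."""
    energy_x = sum(x * x for x in target)
    best_pitch, best_err = None, None
    for dx in pitch_range:
        vec = adaptive_code_vector(past, dx, len(target))
        rep = zero_state_response(vec, impulse_response)
        energy = sum(s * s for s in rep)
        if energy == 0.0:
            continue                      # silent candidate: no usable gain
        corr = sum(x * s for x, s in zip(target, rep))
        err = energy_x - corr * corr / energy
        if best_err is None or err < best_err:
            best_pitch, best_err = dx, err
    return best_pitch


# Past excitation with period 3; the target continues the same pattern.
past = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
d = search_pitch([1.0, 0.0, 0.0], past, [1.0], range(2, 6))
```

With an identity impulse response the search simply recovers the period of the stored excitation.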
In the multipulse retrieving circuit 108, the P non-zero pulses constituting the multipulse signal are retrieved. Here, the positions of the respective pulses are limited to predetermined pulse position candidates, and all of the pulse position candidates take mutually different values. For example, an example of the pulse position candidates for the case of sub-frame length N=40 and pulse number P=5 is shown in FIG. 15.
On the other hand, the amplitude of each pulse is expressed by its polarity only. Accordingly, the multipulse signal may be coded as follows. Assuming that the total number of combinations of the pulse position candidates and polarities is J, a multipulse signal Cjx(n), n=0, . . . , N-1, is established for each index jx indicative of a combination, the audibility weighted synthesizing filter Zsw(z) in zero state is driven by the multipulse signal to generate a reproduced signal SCjx(n), n=0, . . . , N-1, and the index j that minimizes the error E2(jx) expressed by the following equation (4) is selected. This method has been disclosed in the foregoing publication 3 and Japanese Unexamined Patent Publication No. Heisei 9-160596 (Patent Application No. 7-318071) (hereinafter referred to as "publication 7"). The disclosure is herein incorporated by reference. The multipulse signal corresponding to the selected index j and the reproduced signal thereof are assumed to be Cj(n) and SCj(n), respectively. ##EQU4##
where X' (n), n=0, . . . , N-1 are signals derived by orthogonalizing the target signal X(n) with respect to the reproduced signal SAd(n) of the adaptive code vector signal as expressed by the following equation (5). ##EQU5##
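The orthogonalization of the target signal with respect to the reproduced signal of the adaptive code vector may be sketched as follows. The sketch assumes the usual projection-removal (Gram-Schmidt) form, X'(n) = X(n) - [sum X(m) SAd(m) / sum SAd(m)^2] SAd(n); the function name is hypothetical.

```python
def orthogonalize(target, reproduced_adaptive):
    """Remove from X(n) its projection onto SAd(n) (Gram-Schmidt step)."""
    energy = sum(s * s for s in reproduced_adaptive)
    if energy == 0.0:
        return list(target)               # nothing to project onto
    scale = sum(x * s for x, s in zip(target, reproduced_adaptive)) / energy
    return [x - scale * s for x, s in zip(target, reproduced_adaptive)]


x_orth = orthogonalize([1.0, 1.0], [1.0, 0.0])
# The result has zero correlation with the adaptive contribution.
residual_corr = sum(a * b for a, b in zip(x_orth, [1.0, 0.0]))
```

The multipulse retrieval thereby spends its bits only on the part of the target that the adaptive code vector cannot express.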
On the other hand, the multipulse retrieving circuit 108 outputs the selected multipulse signal Cj(n) and the reproduced signal SCj(n) thereof to the gain retrieving circuit 109 and corresponding index to the output terminal 111.
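An exhaustive multipulse retrieval over a small candidate table may be sketched as follows. FIG. 15 is not reproduced in this text, so the interleaved candidate table below is hypothetical. The selection criterion (maximizing correlation squared over energy, with positive correlation) is likewise an assumed form of the error to be minimized. Practical coders use faster search procedures than the brute-force loop shown.

```python
from itertools import product


def candidate_tracks(subframe_len, num_pulses):
    """Hypothetical interleaved candidate table: pulse p may sit at
    p, p + P, p + 2P, ... so that all candidates are mutually different."""
    return [list(range(p, subframe_len, num_pulses))
            for p in range(num_pulses)]


def build_multipulse(positions, signs, subframe_len):
    """Place one +1 / -1 pulse at each chosen position."""
    c = [0.0] * subframe_len
    for pos, sign in zip(positions, signs):
        c[pos] = float(sign)
    return c


def search_multipulse(target_orth, impulse_response, tracks):
    """Exhaustive toy search maximizing corr^2 / energy, i.e. minimizing
    an error of the assumed form sum X'^2 - corr^2 / energy."""
    n_len = len(target_orth)
    best, best_score = None, -1.0
    for positions in product(*tracks):
        for signs in product((1, -1), repeat=len(tracks)):
            c = build_multipulse(positions, signs, n_len)
            # Zero-state filtering modeled as a truncated convolution.
            rep = [sum(c[k] * impulse_response[n - k]
                       for k in range(n + 1) if n - k < len(impulse_response))
                   for n in range(n_len)]
            energy = sum(s * s for s in rep)
            corr = sum(x * s for x, s in zip(target_orth, rep))
            if energy == 0.0 or corr <= 0.0:
                continue                  # keep only positive-gain candidates
            score = corr * corr / energy
            if score > best_score:
                best, best_score = (positions, signs), score
    return best


# Toy case: N = 8, P = 2, identity filter, target with pulses at 2 and 5.
tracks = candidate_tracks(8, 2)
choice = search_multipulse([0.0, 0.0, 1.0, 0.0, 0.0, -1.0, 0.0, 0.0],
                           [1.0], tracks)
```

For N=40 and P=5 the same hypothetical table gives eight candidates per pulse, which is why the exhaustive loop is practical only for toy sizes.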
In the gain retrieving circuit 109, the gains of the adaptive code vector signal and the multipulse signal are quantized by two-dimensional vector quantization. The gains of the adaptive code vector signal and the multipulse signal stored in the gain code book of code book size K are respectively assumed to be Gkx(0) and Gkx(1), kx=0, . . . , K-1. The index k of the optimal gain is selected so as to minimize the error E3(kx) expressed by the following equation (6), using the reproduced signal SAd(n) of the adaptive code vector, the reproduced signal SCj(n) of the multipulse signal and the target signal X(n). The gains of the adaptive code vector signal and the multipulse signal of the selected index k are respectively assumed to be Gk(0) and Gk(1). ##EQU6##
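The gain retrieval may be sketched as follows, assuming the error takes the squared-error form E3(kx) = sum (X(n) - Gkx(0) SAd(n) - Gkx(1) SCj(n))^2 suggested by the quantities named above. The function name and the code book entries are hypothetical.

```python
def search_gain(target, rep_adaptive, rep_pulse, gain_codebook):
    """Return the index k minimizing the assumed error
    E3(kx) = sum (X(n) - Gkx(0) SAd(n) - Gkx(1) SCj(n))^2."""
    def error(gains):
        g0, g1 = gains
        return sum((x - g0 * a - g1 * c) ** 2
                   for x, a, c in zip(target, rep_adaptive, rep_pulse))
    return min(range(len(gain_codebook)),
               key=lambda k: error(gain_codebook[k]))


# Hypothetical two-dimensional gain code book of size K = 3.
codebook = [(0.5, 0.5), (1.0, 2.0), (2.0, 1.0)]
k = search_gain([1.0, 2.0], [1.0, 0.0], [0.0, 1.0], codebook)
```

Quantizing the two gains jointly as one vector allows a single index to be transmitted for both.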
On the other hand, the excitation signal is generated using the selected gain, the adaptive code vector and the multipulse signal and output to a sub-frame buffer 106. Also, the index corresponding to the gain is output to the output terminal 112.
Next, referring to FIG. 14, a construction of the decoding circuit based on the CELP coding system, employed in the first CELP decoding circuit 3 on the coding side and also in the first CELP decoding circuit 9 and the second CELP decoding circuit 10 on the decoding side, will be discussed.
In the linear predictive coefficient decoding circuit 118, the quantized linear predictive coefficients a' (i), i=1, . . . , Np, are decoded from the index input via the input terminal 114 and are output to the reproduced signal generating circuit 122.
In the adaptive code book decoding circuit 119, the adaptive code vector signal Ad(n) decoded from the index of the foregoing pitch via the input terminal is output to the gain decoding circuit 121, and in the multipulse decoding circuit 120, the multipulse signal Cj(n) decoded from the index of the multipulse signal input via the input terminal 117 is also output to the gain decoding circuit 121.
In the gain decoding circuit 121, the gains Gk(0) and Gk(1) are decoded from the index of the gains input via the input terminal 115 to generate the excitation signal using the adaptive code vector signal, the multipulse signal and the gain to output to the reproduced signal generating circuit 122.
In the reproduced signal generating circuit 122, the reproduced signal is generated by driving the linear predictive synthesizing filter Hs(z) by the excitation signal to output to an output terminal 123.
However, the voice coding and decoding system discussed with reference to FIGS. 12 to 14 has a problem in that coding efficiency is insufficient in the hierarchical CELP coding of the voice signal in the second and subsequent hierarchies.
The reason is that, in the (n)th hierarchy (n=2, . . . , N), the differential signal derived by subtracting from the input signal the n-1 reproduced signals CELP coded and decoded up to the (n-1)th hierarchy is itself CELP coded.
Namely, in the (n)th hierarchy, the respective coding parameters (linear predictive coefficient, pitch, multipulse signal and gain) obtained upon CELP coding of the differential signal differ from the quantization errors of the corresponding parameters up to the (n-1)th hierarchy. Therefore, the information expressed by the coder of each parameter in the (n-1)th hierarchy and the information expressed by the coder of the (n)th hierarchy overlap, so that the coding efficiency of the respective coding parameters is not improved and, consequently, neither is the quality of the reproduced signal.