The present invention relates to coded speech decoding systems and, more particularly, to a method of decoding coded speech with less computational effort than in the prior art when the number of speech signal channels that a coded speech decoder outputs is less than the number of channels encoded in the coded speech signal.
Heretofore, multi-channel speech signals have been coded and decoded by, for instance, a system called "Dolby AC-3". "Dolby AC-3" techniques are detailed in "ATSC Doc. A/52", Advanced Television Systems Committee, November 1994 (hereinafter referred to as Literature Ref. 1, and incorporated herein in its entirety).
The prior art coded speech decoding system will first be briefly described. In the prior art coding, the input speech signal is first converted to MDCT coefficients in the frequency domain through an MDCT (modified discrete cosine transform), which serves as the mapping transform. In this mapping transform, one of two different MDCT functions prepared in advance is used, depending on the character of the speech signal to be coded. Which of the MDCT functions is used is coded in auxiliary data. The MDCT coefficients thus obtained are expressed as binary floating point numbers, and their exponents and mantissas are coded separately. The mantissas are variable-length coded based on the importance of the MDCT coefficients to the subjective coding quality. Specifically, the coding is performed by using a larger number of bits for the mantissa of an MDCT coefficient with greater importance and a smaller number of bits for the mantissa of an MDCT coefficient with less importance. The exponents and mantissas obtained as a result of the coding, together with the auxiliary data, are multiplexed to obtain the coded speech (in the form of a coded bit stream).
FIG. 3 is a block diagram showing a prior art coded speech decoding system. The illustrated prior art coded speech decoding system comprises a coded speech input terminal 1, a coded speech separating unit 2, an exponent decoding unit 3, a mantissa decoding unit 4, an assigned bits calculating unit 5, an IMDCT (inverse MDCT: inverse mapping transform) unit 60 and a decoded speech output terminal 7. In the following description of the operation of the prior art coded speech decoding system, a case is taken in which coded speech, obtained as a result of coding an n-channel speech signal, is decoded to an m-channel decoded speech signal. This process of converting a number n of coded audio channels to a smaller number m of decoded channels without loss of information is known in the art as downmixing (see Ref. 1, p. 82). It is used, for example, to convert coded five-channel "surround" sound (n=5) to two-channel stereo (m=2), and the following description will be presented in terms of that application.
The coded speech signal obtained through the coding of the 5 channel speech signal is inputted to the coded speech signal input terminal 1. The coded speech signal inputted to the input terminal 1 is outputted to the coded speech signal separating unit 2.
The coded speech signal separating unit 2 separates the coded speech bit stream into exponent data, mantissa data and auxiliary data, and outputs these data to the exponent decoding unit 3, the mantissa decoding unit 4 and the IMDCT unit 60, respectively.
The exponent decoding unit 3 decodes the exponent data to generate 256 MDCT exponent coefficients per channel for each of the 5 channels. The generated MDCT exponent coefficients for the 5 channels are outputted to the assigned bits calculating unit 5 and the IMDCT unit 60. Hereinunder, the MDCT exponent coefficients of the CH-th (CH=1, 2, . . . , 5) channel are referred to as EXP(CH, 0), EXP(CH, 1), . . . , EXP(CH, 255), and N in the MDCT exponent coefficient EXP(CH, N) is referred to as the frequency exponent.
The assigned bits calculating unit 5 generates assigned bits data for the MAXCH channels (MAXCH=5 in this example) by a procedure described in Literature Ref. 1, taking human psychoacoustic characteristics into consideration, with reference to the MDCT exponent coefficients inputted from the exponent decoding unit 3, and outputs the generated assigned bits data to the mantissa decoding unit 4.
The mantissa decoding unit 4 generates the MDCT mantissa coefficients, each expressed as a floating point binary number, for the 5 channels.
The generated MDCT mantissa coefficients for the 5 channels are outputted to the IMDCT unit 60. Hereinunder, the MDCT mantissa coefficient of the CH-th (CH=1, 2, . . . , 5) channel for the N'th frequency exponent is referred to as MAN(CH, N).
The IMDCT unit 60 first derives the MDCT coefficients from the MDCT mantissa coefficients and the MDCT exponent coefficients. Then, the unit 60 converts the MDCT coefficients to the MAXCH-channel speech signal through IMDCT, using the transform function designated by the auxiliary data, and through windowing. Finally, the unit 60 converts the 5-channel speech signal to a 2-channel decoded speech signal through weighting multiplication of the 5-channel speech signal by weighting coefficients predetermined for each channel. The 2-channel decoded speech signal thus generated is outputted from the decoded speech signal output terminal 7.
FIG. 4 is a block diagram showing an example of the internal structure of the IMDCT unit 60 in the prior art coded speech signal decoding system when the number of the channels is 5.
MDCT exponent coefficient EXP(CH, N) of the CH-th (CH=1, 2, . . . , 5) channel for the N'th frequency exponent (N=0, 1, . . . , 255) is inputted to the input terminal 100.
MDCT mantissa coefficient MAN(CH, N) of the CH-th (CH=1, 2, . . . , 5) channel for frequency exponent N (N=0, 1, . . . , 255) is inputted to the input terminal 101.
Auxiliary data including identification of transform function data of CH-th (CH=1, 2, . . . , 5) channel is inputted to the input terminal 102.
The MDCT exponent coefficient EXP(CH, N) and the MDCT mantissa coefficient MAN(CH, N) are outputted to an MDCT coefficient generator 110.
The MDCT coefficient generator 110 generates the MDCT coefficient MDCT(CH, N) of the CH-th (CH=1, 2, . . . , 5) channel for the N'th frequency exponent (N=0, 1, . . . , 255) by executing the computational operation expressed as
MDCT(CH, N)=MAN(CH, N)×2^(−EXP(CH, N))
where X^Y represents raising X to the power Y.
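The reconstruction of one MDCT coefficient from its mantissa and exponent can be sketched as follows. This is a minimal illustration only; the function names are hypothetical, and it assumes the exponent enters with a negative sign, i.e. MAN(CH, N)×2^(−EXP(CH, N)).

```python
def mdct_coefficient(mantissa: float, exponent: int) -> float:
    """Return MAN * 2^(-EXP) for a single frequency exponent N."""
    return mantissa * 2.0 ** (-exponent)


def mdct_coefficients(mantissas, exponents):
    """Apply the same operation to all coefficients of one channel."""
    return [mdct_coefficient(m, e) for m, e in zip(mantissas, exponents)]
```

For example, a mantissa of 0.5 with an exponent of 3 yields 0.5×2^(−3)=0.0625.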
MDCT coefficient MDCT(CH, N) of the CH-th (CH=1, 2, . . . , 5) channel for frequency exponent N (N=0, 1, . . . , 255) is outputted to transform function selector 12-CH of the CH-th channel (i.e., transform function selectors 12-1 to 12-5 as shown in FIG. 4).
Transform function selection data of the CH-th (CH=1, 2, . . . , 5) channel, inputted to the input terminal 102, is outputted to the pertinent transform function selector 12-CH. According to the transform function data of the CH-th (CH=1, 2, . . . , 5) channel, transform function selector 12-CH selects either the 512-point IMDCT 22-CH or the 256-point IMDCT 23-CH for the CH-th channel as the transform function to be used, and outputs the CH-channel MDCT coefficients MDCT(CH, 0), MDCT(CH, 1), . . . , MDCT(CH, 255) to the selected IMDCT.
CH-channel 512-point IMDCT 22-CH, when selected for CH-th (CH=1, 2, . . . , 5) channel by the pertinent CH-channel transform function selector 12-CH, converts MDCT coefficient MDCT (CH, N) of CH-channel to windowing signal WIN(CH, N) of CH-channel for frequency exponent N (N=0, 1, . . . , 255) through 512-point IMDCT.
The windowing signal WIN(CH, N) of the CH-th channel thus obtained is outputted to windowing processor 24-CH of the CH-channel. At this time, 256-point IMDCT 23-CH of the CH-channel is not operated and does not output any signal. 256-point IMDCT 23-CH of the CH-channel, when selected by the pertinent CH-channel transform function selector 12-CH, converts the CH-channel MDCT coefficient MDCT(CH, N) for frequency exponent N (N=0, 1, . . . , 255) to the CH-channel windowing signal WIN(CH, N) through 256-point IMDCT. At this time, CH-channel 512-point IMDCT 22-CH is not operated and does not output any signal.
The 512-point IMDCT 22-CH for CH-channel executes the 512-point IMDCT in the following procedure, which is shown in Literature Ref. 1. The 512-point IMDCT is a linear transform.
(1) The 256 MDCT coefficients to be converted are referred to as X(0), X(1), . . . , X(255).
Also,
xcos1(k)=−cos(2π(8k+1)/4096)
and
xsin1(k)=−sin(2π(8k+1)/4096)
are set as such.
(2) Calculations on
Z(k)=(X(255−2k)+j×X(2k))×(xcos1(k)+j×xsin1(k))
are executed for k=0, 1, . . . , 127.
(3) Calculations on
$$z(n)=\sum_{k=0}^{127} Z(k)\cdot\left(\cos(8\pi kn/N)+j\cdot\sin(8\pi kn/N)\right) \qquad \text{(Formula 1)}$$
where N here denotes the transform length, 512, and not the frequency exponent, are executed for n=0, 1, . . . , 127.
(4) Calculations on
y(n)=z(n)×(xcos1(n)+j×xsin1(n))
are executed for n=0, 1, . . . , 127.
(5) Calculations on
x(2n)=−yi(64+n),
x(2n+1)=yr(63−n),
x(128+2n)=−yr(n),
x(128+2n+1)=yi(128−n−1),
x(256+2n)=−yr(64+n),
x(256+2n+1)=yi(64−n−1),
x(384+2n)=yi(n)
and
x(384+2n+1)=−yr(128−n−1)
where yr(n) and yi(n) are the real and imaginary parts, respectively, of y(n), are executed for n=0, 1, . . . , 63, so that every index into y(n) stays within 0 to 127.
(6) Signals x(0), x(1), . . . , x(511) are outputted as the windowing signal.
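Steps (1) to (6) above can be sketched in Python as follows. This is an unoptimized illustration under stated assumptions, not a reference implementation of Literature Ref. 1: the function name is hypothetical, the index in step (2) is taken as 255−2k, the second twiddle factor is taken as xsin1, N in Formula 1 is taken as 512, step (5) is run for n=0 to 63, and the output is taken to be the 512 samples x(0) to x(511).

```python
import cmath
import math


def imdct_512(X):
    """Map 256 MDCT coefficients to 512 windowing-signal samples."""
    assert len(X) == 256
    # Step (1): twiddle factors xcos1(k) + j*xsin1(k).
    tw = [complex(-math.cos(2 * math.pi * (8 * k + 1) / 4096),
                  -math.sin(2 * math.pi * (8 * k + 1) / 4096))
          for k in range(128)]
    # Step (2): pre-twiddle, Z(k) = (X(255-2k) + j*X(2k)) * tw(k).
    Z = [complex(X[255 - 2 * k], X[2 * k]) * tw[k] for k in range(128)]
    # Step (3): 128-point complex sum of Formula 1 (direct form, no FFT).
    z = [sum(Z[k] * cmath.exp(1j * 8 * math.pi * k * n / 512)
             for k in range(128))
         for n in range(128)]
    # Step (4): post-twiddle.
    y = [z[n] * tw[n] for n in range(128)]
    # Step (5): de-interleave real and imaginary parts into 512 samples.
    x = [0.0] * 512
    for n in range(64):
        x[2 * n] = -y[64 + n].imag
        x[2 * n + 1] = y[63 - n].real
        x[128 + 2 * n] = -y[n].real
        x[128 + 2 * n + 1] = y[127 - n].imag
        x[256 + 2 * n] = -y[64 + n].real
        x[256 + 2 * n + 1] = y[63 - n].imag
        x[384 + 2 * n] = y[n].imag
        x[384 + 2 * n + 1] = -y[127 - n].real
    # Step (6): x is the windowing signal.
    return x
```

Because every step is a multiplication by a constant or a sum, the output for a sum of two inputs equals the sum of the two outputs, which is the linearity noted in the text.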
The 256-point IMDCT 23-CH of CH-channel executes the 256-point IMDCT in the following procedure, which is shown in Literature Ref. 1. This 256-point IMDCT is a linear transform.
(1) The 256 MDCT coefficients to be converted are referred to as X(0), X(1), . . . , X(255).
Also,
xcos2(k)=−cos(2π(8k+1)/2048)
and
xsin2(k)=−sin(2π(8k+1)/2048)
are set as such.
(2) Calculations on
X1(k)=X(2k)
and
X2(k)=X(2k+1)
are executed for k=0, 1, . . . , 127.
(3) Calculations on
Z1(k)=(X1(128−2k−1)+j×X1(2k))×(xcos2(k)+j×xsin2(k))
and
Z2(k)=(X2(128−2k−1)+j×X2(2k))×(xcos2(k)+j×xsin2(k))
are executed for k=0, 1, . . . , 63.
(4) Calculations on
$$z1(n)=\sum_{k=0}^{63} Z1(k)\cdot\left(\cos(16\pi kn/512)+j\cdot\sin(16\pi kn/512)\right) \qquad \text{(Formula 2)}$$
and
$$z2(n)=\sum_{k=0}^{63} Z2(k)\cdot\left(\cos(16\pi kn/512)+j\cdot\sin(16\pi kn/512)\right) \qquad \text{(Formula 3)}$$
are executed for n=0, 1, . . . , 63.
(5) Calculations on
y1(n)=z1(n)×(xcos2(n)+j×xsin2(n))
and
y2(n)=z2(n)×(xcos2(n)+j×xsin2(n))
are executed for n=0, 1, . . . , 63.
(6) Calculations on
x(2n)=−yi1(n),
x(2n+1)=yr1(64−n−1),
x(128+2n)=yr1(n),
x(128+2n+1)=yi1(64−n−1),
x(256+2n)=−yr2(n),
x(256+2n+1)=yi2(64−n−1),
x(384+2n)=yi2(n)
and
x(384+2n+1)=yr2(64−n−1)
where yr1(n) and yi1(n) are the real and imaginary parts, respectively, of y1(n), and yr2(n) and yi2(n) are the real and imaginary parts, respectively, of y2(n), are executed for n=0, 1, . . . , 63.
(7) Signals x(0), x(1), . . . , x(511) are outputted as the windowing signal.
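The 256-point procedure of steps (1) to (7) can be sketched in the same way. This is an illustration under stated assumptions: the function and helper names are hypothetical, the closing parenthesis in the Z2(k) expression is taken to sit after X2(2k), the output is taken to be the 512 samples x(0) to x(511), and the signs in step (6) follow the text as printed here, which should be checked against Literature Ref. 1 before any real use.

```python
import cmath
import math


def imdct_256(X):
    """Map 256 MDCT coefficients to 512 samples via two 64-point sums."""
    assert len(X) == 256
    # Step (1): twiddle factors xcos2(k) + j*xsin2(k).
    tw = [complex(-math.cos(2 * math.pi * (8 * k + 1) / 2048),
                  -math.sin(2 * math.pi * (8 * k + 1) / 2048))
          for k in range(64)]
    # Step (2): split into even-indexed and odd-indexed coefficients.
    X1 = [X[2 * k] for k in range(128)]
    X2 = [X[2 * k + 1] for k in range(128)]

    def half_transform(Xh):
        # Step (3): pre-twiddle.
        Z = [complex(Xh[127 - 2 * k], Xh[2 * k]) * tw[k] for k in range(64)]
        # Step (4): 64-point complex sum of Formulas 2 and 3 (direct form).
        z = [sum(Z[k] * cmath.exp(1j * 16 * math.pi * k * n / 512)
                 for k in range(64))
             for n in range(64)]
        # Step (5): post-twiddle.
        return [z[n] * tw[n] for n in range(64)]

    y1 = half_transform(X1)
    y2 = half_transform(X2)
    # Step (6): de-interleave, with signs as printed in the text.
    x = [0.0] * 512
    for n in range(64):
        x[2 * n] = -y1[n].imag
        x[2 * n + 1] = y1[63 - n].real
        x[128 + 2 * n] = y1[n].real
        x[128 + 2 * n + 1] = y1[63 - n].imag
        x[256 + 2 * n] = -y2[n].real
        x[256 + 2 * n + 1] = y2[63 - n].imag
        x[384 + 2 * n] = y2[n].imag
        x[384 + 2 * n + 1] = y2[63 - n].real
    # Step (7): x is the windowing signal.
    return x
```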
Windowing processor 24-CH of the CH-th (CH=1, 2, . . . , 5) channel converts the windowing signal WIN(CH, n) of the CH-channel to the speech signal PCM(CH, n) (n=0, 1, . . . , 255) of the CH-th channel by executing calculations on the linear transform formulas
PCM(CH,n)=2×(WIN(CH,n)×W(n)+DELAY(CH,n)×W(256+n))
and
DELAY(CH,n)=WIN(CH,256+n)
where W(n) is a constant representing a window function as prescribed in Literature Ref. 1. DELAY(CH, n) is a storage area prepared in the decoding system, and it should be initialized once to zero when starting the decoding. The speech signal PCM(CH, n) of CH-channel thus obtained as a result of the conversion is outputted to a weighting adding processor 250.
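The windowing with the DELAY storage area can be sketched as a small stateful object, one instance per channel. This is a minimal illustration: the class name is hypothetical, the window table W(n) of Literature Ref. 1 is not reproduced and must be supplied by the caller as 512 values, and the windowing signal is assumed to carry 512 samples so that WIN(CH, 256+n) exists.

```python
class WindowingProcessor:
    """Overlap-add windowing for one channel, holding its DELAY area."""

    def __init__(self, window):
        assert len(window) == 512
        self.window = list(window)
        # DELAY(CH, n) is initialized once to zero when decoding starts.
        self.delay = [0.0] * 256

    def process(self, win):
        """Turn one 512-sample windowing signal into 256 PCM samples."""
        assert len(win) == 512
        pcm = [2.0 * (win[n] * self.window[n]
                      + self.delay[n] * self.window[256 + n])
               for n in range(256)]
        # DELAY(CH, n) = WIN(CH, 256 + n), kept for the next block.
        self.delay = list(win[256:512])
        return pcm
```

The second half of each block is held back and combined with the first half of the next block, which is why the storage area must persist between calls.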
The weighting adding processor 250 generates decoded speech signals LPCM(n) and RPCM(n) (n=0, 1, . . . , 255) of the 1-st and 2-nd channels by executing calculations on
$$\mathrm{LPCM}(n)=\sum_{i=1}^{\mathrm{MAXCH}} \mathrm{LW}(i)\cdot \mathrm{PCM}(i,n) \qquad \text{(Formula 4)}$$
and
$$\mathrm{RPCM}(n)=\sum_{i=1}^{\mathrm{MAXCH}} \mathrm{RW}(i)\cdot \mathrm{PCM}(i,n) \qquad \text{(Formula 5)}$$
which are linear transform formulas. In this instance, LW(1), LW(2), . . . , LW(5) and RW(1), RW(2), . . . , RW(5) are weighting constants, which are described as constants in Literature Ref. 1. Decoded speech signals LPCM(n) and RPCM(n) of the 1-st and 2-nd channels are outputted from output terminals 26-1 and 26-2, respectively.
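The weighting addition of Formulas 4 and 5 amounts to two weighted sums over the channels for each sample. A minimal sketch follows; the function name is hypothetical, and the weights passed in are placeholders chosen by the caller, not the constants of Literature Ref. 1.

```python
def weighting_add(pcm, lw, rw):
    """Downmix per-channel PCM to left and right outputs.

    pcm[i][n] is sample n of channel i (0-based here); lw[i] and rw[i]
    are the left and right weights of channel i.
    """
    assert len(pcm) == len(lw) == len(rw)
    n_samples = len(pcm[0])
    lpcm = [sum(lw[i] * pcm[i][n] for i in range(len(pcm)))
            for n in range(n_samples)]
    rpcm = [sum(rw[i] * pcm[i][n] for i in range(len(pcm)))
            for n in range(n_samples)]
    return lpcm, rpcm
```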
The prior art coded speech decoding system described above has the problem that it requires a large amount of IMDCT computation, because the IMDCT and the windowing are each executed once for every coded channel, even when fewer channels are outputted.
An object of the present invention is to provide a coded speech decoding system, which permits IMDCT with less computational effort.
According to the present invention, there is provided a coded speech decoding system comprising: a mapping transform means for converting a time domain speech signal having a first channel number n to a frequency domain speech signal; a weighting addition means for executing a predetermined weighting adding process on the frequency domain speech signal obtained in the mapping transform means to output a speech signal having a second channel number m; an inverse mapping transform means for converting the second channel number speech signal to a time domain speech signal; and windowing means for executing a predetermined windowing process on the time domain speech signal obtained in the inverse mapping transform means.
The mapping transform is the modified discrete cosine transform (MDCT), and the inverse mapping transform is the inverse modified discrete cosine transform (IMDCT). When the inverse mapping transform is executed by using one of a plurality of different transform functions prepared in advance, the process of converting the channel number is executed for each transform function. If a transform function is not used for any of the n channels, neither the n-to-m channel conversion nor the inverse mapping transform is performed with that unused transform function.
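The saving rests on linearity: when the channels being mixed share one transform function, applying the weighting addition to the frequency domain coefficients and then transforming gives the same result as transforming every channel and then mixing, while running m transforms instead of n. The sketch below illustrates this with a small stand-in cosine transform, which is an assumption made for brevity and is not the IMDCT of Literature Ref. 1; all function names and weights are hypothetical.

```python
import math


def toy_inverse_transform(coeffs):
    """Stand-in linear transform used only to illustrate linearity."""
    size = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
                for k in range(size))
            for n in range(size)]


def mix(channels, weights):
    """Weighted sum of equally long channel vectors."""
    return [sum(w * ch[i] for w, ch in zip(weights, channels))
            for i in range(len(channels[0]))]


def transform_then_mix(coeffs_per_channel, lw, rw):
    """Prior-art order: one transform per coded channel (n transforms)."""
    time = [toy_inverse_transform(c) for c in coeffs_per_channel]
    return mix(time, lw), mix(time, rw)


def mix_then_transform(coeffs_per_channel, lw, rw):
    """Proposed order: one transform per output channel (m transforms)."""
    return (toy_inverse_transform(mix(coeffs_per_channel, lw)),
            toy_inverse_transform(mix(coeffs_per_channel, rw)))
```

Channels that use different transform functions cannot be mixed before the transform in this way, which is why the channel number conversion is carried out separately for each transform function.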
According to another aspect of the present invention, there is provided a coded speech decoding system featuring: converting a time domain speech signal having n channels to a frequency domain speech signal; executing a predetermined weighting adding process on the frequency domain speech signal for each of a plurality of different transform functions; converting the speech signal obtained after the weighting adding process to a time domain speech signal; and executing a predetermined windowing process on the time domain speech signal thus obtained.
According to another aspect of the present invention, there is provided a coded speech decoding apparatus comprising: an MDCT coefficient generator for generating MDCT coefficients on the basis of channel MDCT exponent coefficients, channel MDCT mantissa coefficients and auxiliary data including channel transform function data; a channel transform function selector for selecting one of a plurality of weighting processors according to the channel transform function data contained in the auxiliary data; a weighting adder processor for executing a weighting adding process on the MDCT coefficients, as a frequency domain signal, from the output of the channel transform function selector; an IMDCT processor for executing IMDCT on the output signal from the weighting adder processor; a channel adder for generating a windowing signal on the basis of the output of the IMDCT processor; and a window processor for converting the windowing signal from the channel adder into a speech signal.
According to still another aspect of the present invention, there is provided a coded speech decoding method comprising the steps of: converting an n-channel time domain speech signal to a frequency domain speech signal; executing a predetermined weighting adding process on the frequency domain speech signal for each of a plurality of different transform functions; converting the speech signal obtained through the weighting adding process to a time domain speech signal; and executing a predetermined windowing process on the time domain speech signal.
Other objects and features will be clarified from the following description with reference to attached drawings.