Parametric coding technology has recently been under active development in the audio coding area because of its high coding efficiency and its ability to reproduce sound images. Compared with traditional waveform coding schemes, a parametric coding scheme not only exploits the limitations of the human auditory system but also attempts to model the incoming audio signal by capturing the characteristics of the sound scene. Examples of known art in this technical field include coding methods based on parametric stereo and MPEG Surround.
A typical parametric coding device 100 is shown in FIG. 1. The parametric coding device 100 shown in FIG. 1 includes a time-frequency (T-F) transform unit 101, an analyzer 102, a frequency-time (F-T) transform unit 103, and a downmix encoder 104.
The T-F transform unit 101 transforms a plurality of input audio signals 110, which are time signals, into a plurality of frequency signals 111.
The analyzer 102 analyzes the resulting frequency signals 111 in two ways. The analyzer 102 includes a downmix unit 102A and a parameter extraction unit 102B.
The downmix unit 102A generates a mono or stereo intermediate downmix signal 112 from the frequency signals 111. The parameter extraction unit 102B extracts parameters from the frequency signals 111 to generate a parameter sub-stream 113 including the extracted parameters.
The F-T transform unit 103 transforms the intermediate downmix signal 112 back to the time domain to generate downmix time signals 114.
The downmix encoder 104 compresses the downmix time signals 114 to generate a downmix sub-stream 115 including the compressed signal.
Finally, the parametrically coded audio stream is composed of the downmix sub-stream 115 and the associated parameter sub-stream 113.
Note that in practice these two sub-streams are multiplexed into one audio stream. For clarity, however, the multiplexing operation in the encoder and the de-multiplexing operation in the decoder are omitted from this description.
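The encoder signal flow described above can be illustrated with a minimal sketch. It is not the method of any particular codec: the naive DFT merely stands in for the T-F transform unit 101, the averaging downmix and the per-channel energy-fraction parameter are assumptions chosen for brevity, and the function names (dft, idft, encode_frame) are hypothetical. The downmix encoder 104 is left out because it would be a conventional core codec.

```python
import cmath
import math

def dft(frame):
    """Naive DFT, standing in for the T-F transform unit 101 (assumption)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, standing in for the F-T transform unit 103 (assumption)."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real
            for t in range(n)]

def encode_frame(channels):
    """Toy parametric encoding of one frame of multi-channel audio."""
    spectra = [dft(ch) for ch in channels]                  # frequency signals 111
    n_ch = len(spectra)
    # Downmix unit 102A: average the channel spectra bin by bin (assumed rule).
    downmix_spec = [sum(bins) / n_ch for bins in zip(*spectra)]
    # Parameter extraction unit 102B: energy fraction of each channel
    # (hypothetical parameter, used here only for illustration).
    energies = [sum(abs(c) ** 2 for c in s) for s in spectra]
    total = sum(energies) or 1.0                            # avoid dividing by zero on silence
    params = [e / total for e in energies]                  # parameter sub-stream 113
    downmix_time = idft(downmix_spec)                       # downmix time signals 114
    return downmix_time, params
```

For a two-channel frame in which only the first channel carries energy, encode_frame returns a mono downmix at half amplitude together with parameters that assign all of the energy to the first channel.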
A typical parametric decoding device 200 is shown in FIG. 2. The parametric decoding device 200 includes a downmix decoder 201, a T-F transform unit 202, a parameter synthesis unit 203, and an F-T transform unit 204.
The downmix decoder 201 decodes the received downmix sub-stream 115 into a mono or stereo time signal 213.
The T-F transform unit 202 transforms the time signal 213 back into the parametric analysis domain to generate frequency signals 214.
The parameter synthesis unit 203 synthesizes the frequency signals 214 into a plurality of transformed signals 215, guided by the information derived from the received parameter sub-stream 113.
The F-T transform unit 204 transforms the transformed signals 215 back to the time domain, resulting in a plurality of output audio signals 216 that perceptually represent the same spatial sound images as the input audio signals 110.
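The decoder signal flow can be sketched in the same spirit. The following is an illustration under stated assumptions only: the downmix decoder 201 is taken to have already produced the time signal 213, the naive DFT stands in for the T-F transform unit 202, and the parameter sub-stream 113 is assumed to have been converted into one real gain per frequency bin and output channel. The names dft, idft, and decode_frame are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive DFT, standing in for the T-F transform unit 202 (assumption)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, standing in for the F-T transform unit 204 (assumption).

    Only the real part is kept, so gains that break spectral symmetry
    lose their imaginary contribution in this toy version.
    """
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real
            for t in range(n)]

def decode_frame(downmix_time, gains):
    """Toy parametric synthesis: one gain per bin and per output channel."""
    spec = dft(downmix_time)                                # frequency signals 214
    outputs = []
    for channel_gains in gains:
        # Parameter synthesis unit 203: shape the downmix spectrum per channel.
        shaped = [g * c for g, c in zip(channel_gains, spec)]  # transformed signals 215
        outputs.append(idft(shaped))                        # output audio signals 216
    return outputs
```

With a mono downmix and two gain vectors, decode_frame yields two output channels; a gain vector of zeros produces a silent channel, while a constant gain scales the downmix uniformly.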
The above coding procedures show two features of a parametric encoder: high coding efficiency, which comes from the reduced number of transmitted channels; and realistic acoustic scene reconstruction, which is realized by synthesis using the spatially relevant parameters.
Because of these two features, a parametric encoder is well suited to telecommunication systems, where each communication site may have a plurality of input audio signals 110 from a plurality of speakers, and a realistic tele-presence effect is usually expected.
FIG. 3 is a diagram showing a telecommunication system 300 operating among four teleconferencing sites 301A to 301D. Where it is not necessary to distinguish the sites 301A to 301D from each other, each of them is referred to as a site 301.
At each site 301 (site 301A, for example), a parametric codec is adopted. The site 301 performs parametric coding on all of its input audio signals 110 to generate a coded bitstream 116 (including a downmix sub-stream DmxA and a parameter sub-stream ParasA). The generated coded bitstream 116 is transmitted to each of the other three sites 301B to 301D.
Meanwhile, the site 301 receives coded bitstreams 116 from the other sites 301 and performs parametric decoding on each of the received coded bitstreams 116. (Here, the received coded bitstreams 116 include three downmix sub-streams DmxB, DmxC, and DmxD and three parameter sub-streams ParasB, ParasC, and ParasD.)
However, to meet set-up requirements and to keep the transmission bandwidth reasonably low, it is generally not feasible to transmit a plurality of coded bitstreams 116 directly from a plurality of transmitting sites to one receiving site. To ensure that each site 301 receives and sends only one audio stream, a combination device (multipoint control unit: MCU 305), which is connected to all sites 301A to 301D, is introduced.
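The bandwidth argument can be made concrete with a small count of the audio streams each site handles. This is a sketch under the assumption that, without the MCU, every site sends its coded bitstream separately to every other site, as in the four-site example above; the function name streams_per_site is hypothetical.

```python
def streams_per_site(n_sites, use_mcu):
    """Return (sent, received) audio streams for one site.

    Without an MCU, a site sends one stream to, and receives one stream
    from, each of the other n_sites - 1 sites. With an MCU, it exchanges
    a single stream with the MCU in each direction.
    """
    if use_mcu:
        return 1, 1
    return n_sites - 1, n_sites - 1
```

For the four sites 301A to 301D, this gives three streams in each direction per site without the MCU 305 and one stream in each direction with it.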
The object of the MCU 305 is, for each site 301, to combine a plurality of received coded bitstreams 116 into a single combined bitstream 124 in a computationally efficient way. Ideally, the combined bitstream 124 should approximate the one that would be obtained by coding at a single virtual site at which the signals of all the coded bitstreams 116 from the other sites 301 are present.
For this purpose, a straightforward combining scheme can be designed, as shown in FIG. 4. FIG. 4 is a block diagram showing a functional structure of the MCU 305. As shown in FIG. 4, the MCU 305 includes three individual parametric decoders 401 to 403, an adder (adding unit) 404, and a parametric encoder 405.
The three parametric decoders 401 to 403 decode all of the coded bitstreams 116 that are destined for a given site 301 (site 301A, for example) and that originate from the other sites 301 (sites 301B, 301C, and 301D), in order to generate decoded signals 411B, 411C, and 411D in the time domain.
The adder 404 sums the generated decoded signals 411B, 411C, and 411D to generate a sum signal 412.
The parametric encoder 405 re-codes the sum signal 412 to generate a combined bitstream 124.
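The straightforward scheme of FIG. 4 can be sketched as follows. The parametric decoders 401 to 403 and the parametric encoder 405 are passed in as callables because their internals are not specified here; the function name mcu_combine, the dictionary keyed by site name, and the representation of a decoded signal as a list of channels (each a list of samples) are assumptions made for illustration.

```python
def mcu_combine(target_site, coded_bitstreams, decode, encode):
    """Decode-sum-re-encode combining for one receiving site.

    coded_bitstreams maps a site name to its coded bitstream 116.
    decode and encode are placeholders for a parametric decoder and a
    parametric encoder (hypothetical callables).
    """
    # Decode every stream except the one that came from the target site.
    decoded = [decode(bitstream)                      # decoded signals 411B to 411D
               for site, bitstream in sorted(coded_bitstreams.items())
               if site != target_site]
    # Adder 404: sum channel by channel and sample by sample.
    sum_signal = [[sum(samples) for samples in zip(*channels)]
                  for channels in zip(*decoded)]      # sum signal 412
    return encode(sum_signal)                         # combined bitstream 124
```

Every stream is fully decoded to the time domain and the sum is fully re-encoded, which is the tandem processing whose cost and delay are discussed next.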
Even in this simple scenario, it can be seen that, for an N-site telecommunication system, such an MCU 305 needs N individual tandem parametric decoding and coding processes. As a result, the complexity of the MCU 305 increases, which in turn increases the signal transmission delay. The complexity increases linearly with the number of sites. Therefore, such an MCU 305 is not feasible for real-time applications.
To design a low-latency, low-complexity MCU 305, a further advantage of parametric coding needs to be exploited: the audio stream format of parametric coding supports combining two or more streams into a single stream in a computationally efficient way. In detail, the format allows the downmix sub-streams to be combined in the downmix coding domain and the parameter sub-streams to be combined in the parameter analysis domain.
The state of the art suggests some similar methods for efficient MCU design.
For example, Patent Reference 1 proposes a scheme for efficiently combining a plurality of parametrically encoded audio signals. However, in pursuit of simplicity, Patent Reference 1 performs the downmix combination and the parameter combination independently. Moreover, its downmix combination scheme is only partial and uses very coarse combination methods, and its parameter combination scheme does not address the problem of differing parameter analysis domains.
[Prior Art]
[Patent References]
    [Patent Reference 1] US Patent Application Publication No. 2008/0008323, Specification
[Non-Patent References]
    [Non-Patent Reference 1] S.-W. Huang et al., “A low complexity design of psycho-acoustic model for MPEG-2/4 advanced audio coding”, IEEE Trans. on Consumer Electronics, November 2004
    [Non-Patent Reference 2] T.-H. Tsai et al., “An MDCT-based psychoacoustic model co-processor design for MPEG-2/4 AAC audio encoder”, Proc. of the 7th Int. Conference on Digital Audio Effects, 2004
    [Non-Patent Reference 3] I. Dimković et al., “Fast software implementation of MPEG advanced audio encoder”, 14th Int. Conference on DSP, 2002