Standards for coding multi-channel audio signals include the Dolby digital standard and the Moving Picture Experts Group-Advanced Audio Coding (MPEG-AAC) standard. These standards transmit a multi-channel audio signal by essentially coding the audio signal of each channel separately. They are therefore referred to as discrete multi-channel coding, and in practice discrete multi-channel coding requires a bit rate of about 384 kbps at minimum to code a 5.1-channel signal.
On the other hand, Spatial-Cue Audio Coding (SAC) codes and transmits multi-channel audio signals by a fundamentally different method. An example of SAC is the MPEG surround standard. As described in NPL 1, the MPEG surround standard (i) downmixes a multi-channel audio signal into a 1-channel or 2-channel audio signal, (ii) codes the resulting downmix signal using, for example, the MPEG-AAC standard (NPL 2) or the High-Efficiency (HE)-AAC standard (NPL 3) to generate a downmix coded stream, and (iii) adds spatial information (spatial cues), generated from the individual channel signals at the same time, to the downmix coded stream.
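As a rough illustration of step (i), the sketch below folds a five-channel signal into stereo and then into mono. It is not the MPEG surround encoder's downmix: the channel order, the ITU-R BS.775-style weights (centre and surround channels mixed in at about -3 dB), the omission of the LFE channel, and the function names are all assumptions made for illustration. Steps (ii) and (iii) are not shown.

```python
import math

# Assumed layout for this sketch: five full-band channels L, R, C, Ls, Rs
# (LFE omitted), each given as a list of float samples of equal length.
INV_SQRT2 = 1.0 / math.sqrt(2.0)  # gain of about -3 dB

def downmix_to_stereo(L, R, C, Ls, Rs):
    """Passive 5-to-2 downmix with ITU-R BS.775-style weights (assumed)."""
    Lo = [l + INV_SQRT2 * (c + ls) for l, c, ls in zip(L, C, Ls)]
    Ro = [r + INV_SQRT2 * (c + rs) for r, c, rs in zip(R, C, Rs)]
    return Lo, Ro

def downmix_to_mono(Lo, Ro):
    """Fold the stereo downmix into one channel by averaging."""
    return [0.5 * (a + b) for a, b in zip(Lo, Ro)]
```

For example, a signal present only in the centre channel appears in both Lo and Ro at about 0.707 of its amplitude, and the mono downmix is their average.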
The spatial information includes channel separation information for separating a downmix signal into the channel signals of the multi-channel audio signal. The separation information indicates relationships between the downmix signal and the channel signals from which it was derived, such as correlation values, power ratios, and phase differences. An audio decoding apparatus decodes the coded downmix signal and the spatial information, and then generates the multi-channel audio signal from the decoded downmix signal and spatial information. In this way, multi-channel audio signals can be transmitted.
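The kinds of relationships listed above can be sketched for a single pair of channels. The code below is a minimal sketch, not the MPEG surround parameter extraction: it computes a broadband power ratio in dB and a zero-lag normalized correlation over one frame, then shows how a decoder could split a mono downmix back into two channels using only the power ratio. Real systems compute such cues per frequency band and per frame, also carry phase information, and use decorrelators; the function names and the single-band simplification are assumptions.

```python
import math

def power(x):
    """Energy of one frame of samples."""
    return sum(s * s for s in x)

def channel_level_difference_db(x1, x2, eps=1e-12):
    """Power ratio of x1 to x2 in dB (a level-difference style cue)."""
    return 10.0 * math.log10((power(x1) + eps) / (power(x2) + eps))

def inter_channel_correlation(x1, x2, eps=1e-12):
    """Zero-lag normalized cross-correlation in [-1, 1] (a correlation cue)."""
    num = sum(a * b for a, b in zip(x1, x2))
    return num / math.sqrt((power(x1) + eps) * (power(x2) + eps))

def upmix_two_channels(downmix, cld_db):
    """Split a mono downmix into two channels whose power ratio equals cld_db.

    Gains satisfy g1**2 / g2**2 = 10**(cld_db / 10) and g1**2 + g2**2 = 1,
    so the total energy of the downmix is preserved. No decorrelator is used.
    """
    r = 10.0 ** (cld_db / 10.0)
    g1 = math.sqrt(r / (1.0 + r))
    g2 = math.sqrt(1.0 / (1.0 + r))
    return [g1 * m for m in downmix], [g2 * m for m in downmix]
```

Because only the level cue is applied, the two upmixed channels are fully correlated; restoring a lower correlation value would require adding a decorrelated component, which this sketch leaves out.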
Since the spatial information used in the MPEG surround standard has a small amount of data, it adds very little to the 1-channel or 2-channel downmix coded stream. Multi-channel audio signals can therefore be coded with almost the same amount of data as a 1-channel or 2-channel audio signal, so the MPEG surround standard can transmit them at a lower bit rate than the MPEG-AAC standard and the Dolby digital standard.
One useful application of a coding standard that codes signals with high sound quality at a low bit rate is a realistic sensations communication system. Generally, such a system interconnects two or more sites through bidirectional communication, and coded data is mutually transmitted and received between or among the sites. The audio coding apparatus and the audio decoding apparatus at each site code the data to be transmitted and decode the received data, respectively.
FIG. 7 illustrates a configuration of a conventional multi-site teleconferencing system, which shows an example of coding and decoding audio signals when a teleconference is held at 3 sites.
In FIG. 7, each of the sites (sites 1 to 3) includes an audio coding apparatus and an audio decoding apparatus, and bidirectional communication is implemented by exchanging audio signals through communication paths of a predetermined bandwidth.
In other words, the site 1 includes a microphone 101, a multi-channel coding apparatus 102, a multi-channel decoding apparatus 103 that corresponds to the site 2, a multi-channel decoding apparatus 104 that corresponds to the site 3, a rendering device 105, a speaker 106, and an echo canceller 107. The site 2 includes a multi-channel decoding apparatus 110 that corresponds to the site 1, a multi-channel decoding apparatus 111 that corresponds to the site 3, a rendering device 112, a speaker 113, an echo canceller 114, a microphone 108, and a multi-channel coding apparatus 109. The site 3 includes a microphone 115, a multi-channel coding apparatus 116, a multi-channel decoding apparatus 117 that corresponds to the site 2, a multi-channel decoding apparatus 118 that corresponds to the site 1, a rendering device 119, a speaker 120, and an echo canceller 121.
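The arrangement in FIG. 7, with one coding apparatus per site plus one decoding apparatus per remote site, generalizes to any number of sites. The sketch below uses a hypothetical `Site` structure to build that full-mesh configuration and count decoders: with n sites, each site needs n-1 decoders, so the system as a whole needs n(n-1), which is 6 for the three sites in FIG. 7.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """Hypothetical description of one site in a full-mesh teleconference."""
    site_id: int
    coders: int = 1  # one multi-channel coding apparatus per site
    decoders_for: list = field(default_factory=list)  # one decoder per remote site

def build_mesh(n_sites):
    """Give every site one decoder for each of the other n_sites - 1 sites."""
    return [
        Site(site_id=i, decoders_for=[j for j in range(1, n_sites + 1) if j != i])
        for i in range(1, n_sites + 1)
    ]

def total_decoders(sites):
    """Total number of decoding apparatuses across all sites."""
    return sum(len(s.decoders_for) for s in sites)
```

For three sites, `build_mesh(3)` gives the site 1 decoders for the sites 2 and 3, matching the decoding apparatuses 103 and 104.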
In many cases, each site includes an echo canceller for suppressing echo that occurs during communication through the teleconferencing system. Furthermore, when each site can transmit and receive multi-channel audio signals, the site may include a rendering device using a Head-Related Transfer Function (HRTF) so that the multi-channel audio signals can be localized in various directions.
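The document does not specify how the echo canceller works. One widely used approach to acoustic echo cancellation is an adaptive FIR filter updated with the normalized least-mean-squares (NLMS) algorithm, sketched below as an assumption rather than as the method of the echo cancellers 107, 114, and 121. The filter models the path from the loudspeaker to the microphone, predicts the echo of the far-end signal, and subtracts it from the microphone signal. The tap count, step size, and the toy echo path passed to the hypothetical `make_echo` helper are illustrative.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Adaptive FIR echo canceller updated with NLMS.

    far_end: samples played by the local loudspeaker (received from a remote site)
    mic:     local microphone samples containing the echo of far_end
    Returns (residual, weights); residual = mic minus the estimated echo.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # echo estimate
        e = d - y                                   # echo-suppressed sample
        norm = sum(b * b for b in buf) + eps        # input power for normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w

def make_echo(far_end, echo_path):
    """Simulate microphone pickup of far_end through a toy echo path."""
    return [
        sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n >= k)
        for n in range(len(far_end))
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    mic = make_echo(far, [0.6, 0.3, -0.1])
    residual, w = nlms_echo_canceller(far, mic)
    print([round(v, 3) for v in w[:3]])
```

With a noise-free toy echo path shorter than the filter, the weights converge toward the echo path and the residual approaches zero. Real rooms need much longer filters and handling of double talk, which this sketch omits.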
For example, at the site 1, the microphone 101 collects an audio signal, and the multi-channel coding apparatus 102 codes the audio signal at a predetermined bit rate. The coded audio signal is converted into a bit stream bs1, which is transmitted to the sites 2 and 3. At the site 2, the multi-channel decoding apparatus 110 decodes the transmitted bit stream bs1 into a multi-channel audio signal, the rendering device 112 renders the decoded multi-channel audio signal, and the speaker 113 reproduces the rendered signal.
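The rendering step can be illustrated with HRTF-based binaural rendering, the type of rendering device mentioned in this section. In the time domain, an HRTF corresponds to a pair of head-related impulse responses (HRIRs), one per ear; placing a source in a direction amounts to convolving it with the HRIR pair for that direction and summing the results over all sources. The sketch below assumes that approach. The function names are hypothetical, and any HRIR values passed to it here are toy numbers, not measured data.

```python
def convolve(x, h):
    """Direct-form linear convolution of two float lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def render_binaural(sources):
    """Mix sources into a left/right ear signal pair.

    sources: list of (signal, hrir_left, hrir_right) tuples; the HRIR pair
    encodes the direction chosen for that source.
    """
    n = max(len(sig) + max(len(hl), len(hr)) - 1 for sig, hl, hr in sources)
    left, right = [0.0] * n, [0.0] * n
    for sig, hl, hr in sources:
        for i, v in enumerate(convolve(sig, hl)):
            left[i] += v
        for i, v in enumerate(convolve(sig, hr)):
            right[i] += v
    return left, right
```

For example, `render_binaural([([1.0], [0.5, 0.25], [0.0, 0.0, 0.4])])` returns the two toy HRIRs themselves, padded to a common length, because the input is a unit impulse.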
Similarly, at the site 3, the multi-channel decoding apparatus 118 decodes the bit stream bs1 into a multi-channel audio signal, the rendering device 119 renders the decoded multi-channel audio signal, and the speaker 120 reproduces the rendered signal.
Although the site 1 is the sender and the sites 2 and 3 are the receivers in the above description, the roles can also change: (i) the site 2 may be the sender and the sites 1 and 3 the receivers, or (ii) the site 3 may be the sender and the sites 1 and 2 the receivers. These processes are repeated concurrently and continuously, which is how the realistic sensations communication system operates.
The main goal of the realistic sensations communication system is to provide communication with realistic sensations. Thus, bidirectional communication between any two interconnected sites must cause as little discomfort as possible. Another problem is that bidirectional communication is costly.
Performing bidirectional communication with little discomfort and at low cost requires satisfying several requirements. The requirements for the coding standard used to code an audio signal include (1) a short time for coding the audio signal in the audio coding apparatus and decoding it in the audio decoding apparatus, that is, low algorithmic delay of the coding standard, (2) transmission of the audio signal at a low bit rate, and (3) high sound quality.
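Requirement (1) can be made concrete with simple arithmetic. A frame-based coder cannot process a frame until all of its samples have arrived, so the frame duration is a lower bound on the delay it contributes; for example, 1024 samples (the AAC-LC frame length) at 48 kHz take about 21.3 ms to collect. The helpers below compute that figure and add it to other delay terms in a crude one-way budget. The function names and the budget model are assumptions for illustration, and real codecs add further delay from overlapping transforms and look-ahead.

```python
def frame_duration_ms(frame_len, sample_rate_hz):
    """Time needed to collect one frame of samples, in milliseconds."""
    return 1000.0 * frame_len / sample_rate_hz

def one_way_delay_ms(algorithmic_ms, processing_ms, network_ms):
    """Crude one-way delay budget (assumed model): codec algorithmic delay
    plus encoder/decoder processing time plus network transit time."""
    return algorithmic_ms + processing_ms + network_ms
```

Halving the frame length halves this lower bound, which is one reason low-delay codecs use shorter frames at the cost of coding efficiency.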
Because sound quality degrades severely as the bit rate decreases under, for example, the MPEG-AAC standard and the Dolby digital standard, it is difficult to keep sound quality high enough to convey realistic sensations while also reducing communication cost. In contrast, SAC standards such as the MPEG surround standard can reduce the transmission bit rate while maintaining sound quality. The SAC standard is therefore relatively well suited to achieving a realistic sensations communication system with low communication cost.
In particular, the MPEG surround standard, which belongs to the SAC standard and offers superior sound quality, is based on the idea that the spatial information of an input signal is represented by parameters with a small amount of information, and a multi-channel audio signal is synthesized from these parameters and a transmitted downmix signal that has been downmixed to a 1-channel or 2-channel audio signal. Reducing the number of transmitted channels in this way lowers the bit rate, which satisfies requirement (2), transmission of an audio signal at a lower bit rate, that is important for the realistic sensations communication system. Compared to conventional multi-channel coding standards such as the MPEG-AAC standard and the Dolby digital standard, the SAC standard can transmit a signal with higher sound quality at a much lower bit rate, for example, 192 kbps for 5.1-channel audio.
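Using the two figures quoted in this section, about 384 kbps for discrete 5.1-channel coding and 192 kbps for the SAC standard, the sketch below estimates the incoming load on one site in a full-mesh conference like FIG. 7, where each site receives one stream from every other site. The mesh model (no central mixing server) and the helper names are assumptions.

```python
# Rates quoted in the text; the rest of this sketch is an illustrative assumption.
DISCRETE_5_1_KBPS = 384  # approximate lower limit for discrete 5.1-channel coding
SAC_5_1_KBPS = 192       # example SAC (MPEG surround) rate for 5.1-channel audio

def per_site_receive_kbps(n_sites, stream_kbps):
    """Full mesh without a mixing server: one incoming stream per remote site."""
    return (n_sites - 1) * stream_kbps

def megabytes_per_minute(kbps):
    """Traffic for one minute of a single stream (1 MB = 10**6 bytes)."""
    return kbps * 1000 * 60 / 8 / 1e6

if __name__ == "__main__":
    for name, rate in (("discrete", DISCRETE_5_1_KBPS), ("SAC", SAC_5_1_KBPS)):
        print(name, per_site_receive_kbps(3, rate), "kbps in,",
              round(megabytes_per_minute(rate), 2), "MB/min per stream")
```

For three sites, each site receives 768 kbps with discrete coding and 384 kbps with SAC, and one minute of a single stream amounts to about 2.88 MB versus 1.44 MB.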
Thus, the SAC standard is a useful means for a realistic sensations communication system.