At present, multimedia services generally require the processing of multi-channel audio signals. For example, a video-conferencing system often consists of more than two participating conference terminals, so multiple audio signals are involved. A device such as a multipoint control unit (MCU) is required to process and control the multiple audio signals and to mix the audio signals transmitted by the conference terminals. Take as an example a conference in which the three highest-volume conference terminals may speak at the same time. The handling processes in the prior art are as follows:
In prior art 1, the encoders of the MCU are in one-to-one correspondence with the participating conference terminals; that is, the number of encoders equals the number of conference terminals. For a conference terminal whose volume is not among the top three, the corresponding encoder encodes the mixed audio signals of the three highest-volume terminals and sends the encoded signals to that terminal. In other words, every encoder of the MCU performs full encoding. However, when all encoders continuously perform full encoding, system processing capability is wasted, cost is increased, and the capacity and the number of conference terminals that can be supported are reduced.
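The allocation in prior art 1 can be sketched as follows. This is a simplified model; the `Encoder` class, `mix` function, and terminal counts are illustrative assumptions, not details from the original scheme.

```python
# Sketch of prior art 1: one encoder per conference terminal,
# every encoder performing a full encode on every frame.
# All names here (Encoder, mix, encode) are illustrative assumptions.

def mix(signals):
    """Mix the speakers' audio signals (sample-wise sum)."""
    return [sum(samples) for samples in zip(*signals)]

class Encoder:
    def __init__(self, terminal_id):
        self.terminal_id = terminal_id
        self.frames_encoded = 0   # counts how often this encoder runs

    def encode(self, frame):
        self.frames_encoded += 1
        return bytes(f"enc{self.terminal_id}:{frame}", "ascii")

# One encoder per terminal, even though at most 3 terminals speak at once.
terminals = list(range(8))
encoders = {t: Encoder(t) for t in terminals}
speakers = [0, 1, 2]                        # top-three-volume terminals
signals = {0: [1, 1], 1: [2, 2], 2: [3, 3]}

for t in terminals:
    # Each terminal receives the mix of the top-three speakers,
    # excluding its own signal if it is itself a speaker.
    sources = [signals[s] for s in speakers if s != t]
    encoders[t].encode(mix(sources))

# Every encoder did a full encode this frame: 8 encodes for 8 terminals,
# although the five non-speaking terminals all receive the identical mix.
total_encodes = sum(e.frames_encoded for e in encoders.values())
print(total_encodes)   # → 8
```

The waste the text describes is visible here: the five non-speaking terminals each get the same mixed signal, yet five separate full encodes are performed for it.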
In prior art 2, the number of encoders of the MCU is one more than the maximum number of conference terminals that can speak at the same time, and one of the encoders is selected as a fixed encoder. With three simultaneous speakers, the MCU uses four encoders. The fixed encoder fully encodes the mixed audio signals of the three highest-volume terminals and sends the encoded signals to the remaining conference terminals whose volume is not among the top three. Each of the other three encoders corresponds to one of the three speaking terminals; it encodes the mixed audio signals of the other two speaking terminals (that is, of the three highest-volume terminals excluding the terminal corresponding to that encoder) and sends the encoded signals to its corresponding terminal. When the set of three highest-volume terminals changes, the fixed encoder transmits the encoded audio signals of the new three speakers, and the audio signals for the new speaking terminals are encoded by three newly assigned encoders. An encoder must retain encoding status information during encoding. For example, an encoder under the Advanced Audio Coding (AAC) protocol retains the data of the two previously encoded frames as encoding status information to predict the encoding of the current frame.
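The routing in prior art 2 can be sketched as follows. The `Encoder` class, the `route` function, and the two-frame `state` list are hypothetical stand-ins for the scheme described above (the state list mimics AAC's reliance on the two previous frames).

```python
# Sketch of prior art 2: (max simultaneous speakers + 1) encoders,
# here 3 + 1 = 4. All names are illustrative assumptions.

class Encoder:
    def __init__(self, name):
        self.name = name
        self.state = []          # e.g. AAC keeps the last two encoded frames

    def encode(self, frame):
        context = list(self.state)            # prediction context
        self.state = (self.state + [frame])[-2:]
        return (self.name, context, frame)

fixed = Encoder("fixed")                      # serves all non-speaking terminals
per_speaker = {i: Encoder(f"enc{i}") for i in range(3)}

def route(speakers, all_terminals):
    """Return which encoder serves each terminal."""
    table = {}
    for t in all_terminals:
        if t in speakers:
            # Encoder i serves speaker i: it encodes the mix of the
            # other two speakers (the speaker must not hear itself).
            table[t] = per_speaker[speakers.index(t)]
        else:
            table[t] = fixed   # full mix of all three speakers
    return table

terminals = list(range(8))
table = route([0, 1, 2], terminals)

# Only 4 encoders run, regardless of how many terminals attend.
active = len({id(e) for e in table.values()})
print(active)   # → 4
```

Compared with prior art 1, only four encoders run no matter how many terminals attend; the cost is that a change in the speaker set reassigns encoders, which is exactly what causes the state problem discussed next.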
In addition, the decoders of the conference terminals, that is, of the signal sources, need to decode the signals produced by the encoders. Therefore, the decoding status information of a decoder is coupled to the encoding status information of the corresponding encoder. When signals encoded by different encoders are sent to the same decoder, the encoding status information is inconsistent: the decoder cannot decode the current data, or the sound quality after decoding is poor, because the current decoded data is inconsistent with the data predicted from the previous frames. Consequently, when the speaking conference terminals change and the encoder is switched, the decoder cannot correctly decode the signals, and the sound quality is poor, especially during free discussion. Therefore, in prior art 2, when the three highest-volume terminals change, the decoders of the conference terminals cannot decode the signals correctly, resulting in poor sound quality.
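The state-mismatch failure can be demonstrated with a toy predictive codec. The DPCM-style classes below are a simplified stand-in for AAC's dependence on prior frames: each encoder transmits the difference from its own previous sample, and the decoder reconstructs by adding to its own previous output. Switching encoders mid-stream then corrupts the decoded signal, as described above.

```python
# Toy illustration (an assumption, not the AAC algorithm): a DPCM-style
# codec where encoder and decoder each keep one sample of state.

class DpcmEncoder:
    def __init__(self):
        self.prev = 0                 # encoding status information

    def encode(self, sample):
        diff = sample - self.prev     # predict from own previous sample
        self.prev = sample
        return diff

class DpcmDecoder:
    def __init__(self):
        self.prev = 0                 # decoding status information

    def decode(self, diff):
        self.prev += diff             # reconstruct from own previous output
        return self.prev

enc_a, enc_b, dec = DpcmEncoder(), DpcmEncoder(), DpcmDecoder()

signal = [10, 12, 15, 13]
# Frames 0-1 come from encoder A; frames 2-3 come from encoder B after
# a switch (e.g. the speaking terminals changed). Encoder B's state
# does not match the decoder's state at the switch point.
stream = [enc_a.encode(s) for s in signal[:2]] + \
         [enc_b.encode(s) for s in signal[2:]]
decoded = [dec.decode(d) for d in stream]

print(decoded)   # → [10, 12, 27, 25], not the original [10, 12, 15, 13]
```

Before the switch the decoder tracks the signal exactly; from the switch onward every reconstructed sample is wrong, because encoder B predicted from a state the decoder never saw.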
When implementing the present invention, the inventor found at least the following defects in the prior art:
Prior art 1: The system processing capability is wasted, the cost is increased, and the capacity and number of signal sources supported by the MCU are reduced.
Prior art 2: When the speaking conference terminal is changed, the encoder of the MCU is switched. As a result, the decoder cannot correctly decode the signals, and the sound effect is poor especially during free discussion.