A video communication system based on an Internet Protocol (IP) network encapsulates audio data and video data in Real-time Transport Protocol (RTP) packets. The system generally transmits these audio and video RTP packets over the User Datagram Protocol (UDP) at the transport layer. UDP is an unreliable transport protocol: it does not guarantee that data sent to the network reaches its destination.
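The encapsulation step can be illustrated with a minimal sketch that prepends the 12-byte fixed RTP header defined in RFC 3550 to a media payload. The function name `build_rtp_packet`, its parameters, and the sample values are assumptions made for illustration; payload type 0 is the static RTP payload type for G.711 mu-law audio.

```python
import struct


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a 12-byte fixed RTP header (RFC 3550) to a media payload.

    Hypothetical helper: no padding, no header extension, no CSRC entries.
    """
    byte0 = 2 << 6  # version = 2, P = 0, X = 0, CC = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                 # network byte order: 1 + 1 + 2 + 4 + 4 bytes
        byte0,
        byte1,
        seq & 0xFFFF,             # 16-bit sequence number
        timestamp & 0xFFFFFFFF,   # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,        # 32-bit synchronization source identifier
    )
    return header + payload


# Example: 20 ms of G.711 audio at 8 kHz is 160 one-byte samples.
packet = build_rtp_packet(0, seq=1, timestamp=160, ssrc=0x1234, payload=b"\x00" * 160)
```

Such a packet would typically be handed to a UDP socket (for example with `socket.sendto`), which sends it without any delivery guarantee.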
When congestion occurs on the network, data packets in transit may be discarded. Because the audio data and video data are transmitted over unreliable UDP, packet loss on the network means that audio data and video data are lost, which degrades the quality of the sound and images decoded and reproduced by the receiving terminal. When the packet loss ratio is low, the reproduced sound may be discontinuous and the reproduced image may be erratic. When the packet loss ratio is high, the reproduced sound may be unclear and the reproduced image may be so erratic that it cannot be watched.
In the prior art, a data recovery method is used to reduce the negative impact that packet loss has on the quality of sound and video in video communication. That is, error resilience is performed on the video data in the video communication system: the video data to be transmitted is processed, and associated redundant data is generated; the video data and the associated redundant video data are then transmitted to the receiving terminal through the network. In this way, when packet losses occur on the network, the receiving terminal can restore the lost data packets with a certain probability according to the received video data and redundant data.
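The text does not name a specific redundancy scheme. As an illustration only, the following minimal sketch uses simple XOR parity over a group of equal-length packets, which is one common way to generate redundant data. The function names `xor_parity` and `recover_missing` are hypothetical, and equal packet lengths are an assumption of the sketch.

```python
def xor_parity(packets):
    """Return one redundant packet: the byte-wise XOR of all packets in a group.

    Assumes every packet in the group has the same length.
    """
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Restore the single lost packet of a group, if recovery is possible.

    `received` lists the group's packets in order, with None marking a loss.
    Returns the restored packet, or None when zero or several packets are lost.
    """
    lost = [i for i, pkt in enumerate(received) if pkt is None]
    if len(lost) != 1:
        return None  # one parity packet can repair exactly one loss
    restored = bytearray(parity)
    for pkt in received:
        if pkt is not None:
            for i, byte in enumerate(pkt):
                restored[i] ^= byte
    return bytes(restored)


group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)
restored = recover_missing([b"abcd", None, b"ijkl"], parity)
```

The sketch also shows why recovery succeeds only "with a certain probability": a single parity packet repairs one loss per group, so a burst that removes two or more packets of the same group cannot be repaired.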
In current video communication systems, the video communication bandwidth, the encoded audio data bandwidth, and the encoded video data bandwidth are determined once the communication is established and cannot be changed during the communication. For example, in a video conference with a bandwidth of 768 Kbps, the audio is, after negotiation, compression-coded with G.711, so the audio data may occupy a bandwidth of 64 Kbps. If no other specific data (for example, a data conference) needs to be transmitted during the conference, the remaining bandwidth is used to transmit video data, so the video data may occupy at most 704 Kbps (=768−64). When there are packet losses on the network, the data recovery technology may be used to protect the video data and so guarantee the video image quality at the receiving terminal: redundant data is generated from the video data to be transmitted and is sent together with it, so that the receiving terminal can restore lost video packets with a certain probability. The audio data in the communication, however, may also be discarded when packet losses occur. When the remaining audio data is decoded and reproduced, the sound may suffer from interruptions, discontinuity, and unclearness. Because the data recovery technology performs error resilience only on the video data and cannot restore lost audio packets, it fails to bring a better experience to users.
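The fixed bandwidth split in the 768 Kbps example above can be checked with a short sketch. The function name `video_bandwidth_kbps` and the `other_kbps` parameter (standing in for other data such as a data conference) are assumptions made for illustration.

```python
def video_bandwidth_kbps(total_kbps, audio_kbps, other_kbps=0):
    """Return the bandwidth left for video after audio and other data are allocated."""
    remaining = total_kbps - audio_kbps - other_kbps
    if remaining < 0:
        raise ValueError("audio and other data exceed the negotiated bandwidth")
    return remaining


# 768 Kbps conference with G.711 audio at 64 Kbps and no other data.
video_kbps = video_bandwidth_kbps(768, 64)
```

Because the split is fixed at call setup, any redundant data generated for the video must fit inside this video share, and no share is set aside for protecting the audio.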