FIG. 1 is a diagram illustrating the structure of a Voice over LTE (VoLTE) protocol stack. As shown in FIG. 1, the application layer first encodes voice data, then calls the Real-time Transport Protocol (RTP) to generate an RTP packet, and then calls the Transmission Control Protocol/Internet Protocol (TCP/IP) stack to encapsulate the packet with a User Datagram Protocol (UDP)/IP header, finally producing a general Voice over IP (VoIP) packet. VoIP can be carried over various layer 2 protocols, for example the widely used Ethernet; when layer 2 is a Long-Term Evolution (LTE) network, the VoIP service is called VoLTE.
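The encapsulation order described above can be illustrated with a minimal sketch. It assumes the fixed 12-byte RTP header of RFC 3550 (no CSRC list, no extension) and the 8-byte UDP header; the payload type, ports, SSRC and the dummy 13-byte frame are hypothetical values chosen only for illustration, and the IP header is left to the operating system's stack.

```python
import struct

def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Pack the 12-byte fixed RTP header (version 2, no padding, extension or CSRC)."""
    first_byte = 2 << 6                 # version=2, P=0, X=0, CC=0
    second_byte = payload_type & 0x7F   # marker bit = 0
    return struct.pack("!BBHII", first_byte, second_byte,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def build_udp_header(src_port: int, dst_port: int, payload_len: int) -> bytes:
    """Pack the 8-byte UDP header; checksum 0 means 'not computed' (permitted over IPv4)."""
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, 0)

# Hypothetical values: a dummy 13-byte encoded voice frame, dynamic payload type 97.
voice_frame = bytes(13)
rtp_packet = build_rtp_header(97, seq=1, timestamp=160, ssrc=0x1234) + voice_frame
udp_datagram = build_udp_header(50000, 50002, len(rtp_packet)) + rtp_packet

print(len(rtp_packet), len(udp_datagram))  # RTP adds 12 bytes, UDP adds 8 more
```

An IPv4 header would add a further 20 bytes in front of the UDP datagram, which gives the 40-byte RTP/UDP/IP header size commonly quoted for VoIP.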
VoIP voice payloads are generally small, averaging between ten and twenty bytes. Frames of the mainstream Adaptive Multi-Rate (AMR) audio codec range from 13 bytes to 32 bytes. An RTP/UDP/IP header, however, occupies 40 bytes or more (for example, 60 bytes with IPv6). As a result, the bandwidth utilization of the air interface during radio link transmission of VoIP is low, generally about 20%.
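The overhead can be checked with simple arithmetic. The sketch below assumes the standard header sizes (RTP 12 bytes, UDP 8 bytes, IPv4 20 bytes, IPv6 40 bytes) and ignores any layer 2 overhead, so it gives an upper bound on the payload share rather than an exact air interface figure.

```python
RTP_HEADER = 12   # bytes, fixed RTP header
UDP_HEADER = 8    # bytes
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, without extension headers

def payload_efficiency(payload_bytes: int, ip_header_bytes: int) -> float:
    """Fraction of an uncompressed RTP/UDP/IP packet that is voice payload."""
    header = RTP_HEADER + UDP_HEADER + ip_header_bytes
    return payload_bytes / (payload_bytes + header)

for payload in (13, 32):  # smallest and largest AMR frame sizes cited above
    for name, ip_header in (("IPv4", IPV4_HEADER), ("IPv6", IPV6_HEADER)):
        share = payload_efficiency(payload, ip_header)
        print(f"{payload}-byte payload over {name}: {share:.1%}")
```

For a 13-byte frame the payload share is roughly 25% over IPv4 and 18% over IPv6, which is consistent with the low utilization described above once lower-layer overhead is also taken into account.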
Air interface resources for wireless transmission are limited and precious. To increase the proportion of payload and save air interface bandwidth, Robust Header Compression (ROHC) is generally enabled to compress the protocol headers when VoLTE media plane data is transmitted between a user equipment (UE) and a base station.
A characteristic of LTE trunking group calls is that the numerous listening users have only downlink data (similar to multicast). If ROHC is enabled, only the Unidirectional (U) mode, which has no feedback path, is available. In the U mode, a complete RTP/UDP/IP header must be sent periodically to keep the receiver synchronized, yet even so synchronization of the receiver cannot be guaranteed. Because complete headers are sent frequently, this technical solution compresses poorly; worse, the receiver may fail to decompress when radio link conditions change.
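The effect of periodic full-header refreshes on compression can be illustrated with a deliberately simplified model. It is not the ROHC U-mode state machine: it merely assumes that one packet in every `refresh_period` packets carries a full 40-byte header and that all other packets carry a compressed header of a hypothetical 3 bytes.

```python
def average_header_bytes(full_header: int, compressed_header: int,
                         refresh_period: int) -> float:
    """Mean header size when 1 of every `refresh_period` packets is uncompressed."""
    if refresh_period < 1:
        raise ValueError("refresh_period must be at least 1")
    total = full_header + (refresh_period - 1) * compressed_header
    return total / refresh_period

FULL = 40        # bytes, uncompressed RTP/UDP/IPv4 header
COMPRESSED = 3   # bytes, hypothetical compressed header size

for period in (1, 5, 10, 50):
    avg = average_header_bytes(FULL, COMPRESSED, period)
    print(f"full header every {period:>2} packets: average {avg:.2f} header bytes")
```

Under these assumptions, the shorter the refresh period needed to keep receivers synchronized, the closer the average header size stays to the uncompressed 40 bytes, which is the trade-off described above.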
No effective solution has yet been proposed for the problem in the related art that it is impossible to save the air interface resources used for transmitting media plane data between a user equipment and a base station while at the same time achieving reordering of out-of-order packets and voice-video synchronization of the media plane data.