Technical Field
This disclosure relates generally to adaptive jitter buffer management in multimedia communications. While the use of jitter buffers and adaptive playback is well known, the embodiments provide significant enhancements in spike detection and handling, network-specific dynamic adaptation, and playback adaptation in the presence of speech or silence.
Description of the Related Art
Fixed and mobile computing systems typically communicate through a network. The network may be wired or wireless and may transmit data in discrete portions, or packets. When receiving data, a computing system may sequentially receive numerous data packets and assemble them to reconstruct the original data.
As data packets are transmitted from one device on a network to another, individual packets may experience different transit times to their respective destinations. Such variation in transit times is referred to as “jitter” or “network jitter” and may result from differences in queuing and scheduling, differences in the routing path through the network, and the like. A certain amount of jitter may always be present on a network, while in other cases, network jitter may vary as a function of time. Network congestion, a hot-spot on the communication channel, and the like, may contribute to temporary high values of jitter (commonly referred to as “spikes”) on a network.
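The notion of jitter as variation in transit time can be illustrated with a minimal Python sketch. It is not the method of the embodiments: the function names, the millisecond timestamps, and the fixed spike threshold are assumptions made for illustration. Because only differences between transit times are used, a constant offset between the sender and receiver clocks cancels out.

```python
def transit_times(send_ms, recv_ms):
    """Per-packet transit time: receive timestamp minus send timestamp."""
    return [r - s for s, r in zip(send_ms, recv_ms)]


def packet_jitter(transits):
    """Absolute change in transit time between consecutive packets."""
    return [abs(b - a) for a, b in zip(transits, transits[1:])]


def find_spikes(jitters, threshold_ms):
    """Indices of jitter values above a (hypothetical) fixed threshold."""
    return [i for i, j in enumerate(jitters) if j > threshold_ms]


# Five packets sent every 20 ms; the fourth is delayed in the network.
send_ms = [0, 20, 40, 60, 80]
recv_ms = [50, 72, 91, 190, 132]

transits = transit_times(send_ms, recv_ms)   # [50, 52, 51, 130, 52]
jitters = packet_jitter(transits)            # [2, 1, 79, 78]
spikes = find_spikes(jitters, threshold_ms=40)
```

In this example the delayed packet produces two large jitter values, one when the transit time jumps up and one when it returns to its usual level.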
Network jitter may present problems when transmitting certain types of data. For example, when the transmitted data contains audio and/or video data, it is often necessary to temporarily buffer some number of packets so that smooth playback is possible at the receiving end of the transmission. The number of packets that need to be stored may be a function of network jitter, and may need to be optimized to maintain good quality and low delay: storing too few packets risks gaps in playback when packets arrive late, while storing too many adds unnecessary delay.
One particular method of compensating for network jitter employs a jitter buffer in a receive chain of a multimedia device. A jitter buffer may be configured to store a target number of data packets as they are received from a network. A control unit may be configured to calculate a jitter value for at least some of the received data packets, and also to estimate a target size of the buffer based on the network type, network conditions, and measured jitter.
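The role of such a control unit can be sketched as follows, under stated assumptions. The exponentially smoothed estimator resembles the interarrival jitter calculation used in RTP (which uses a gain of 1/16); it stands in for whatever jitter calculation an embodiment uses. The names `JitterEstimator` and `target_depth`, the 20 ms frame duration, the safety margin, and the clamping limits are all hypothetical.

```python
import math


class JitterEstimator:
    """Running jitter estimate using exponential smoothing (assumed method)."""

    def __init__(self, gain=1 / 16):
        self.gain = gain
        self.jitter_ms = 0.0
        self._prev_transit = None

    def update(self, transit_ms):
        """Fold one packet's transit time into the estimate and return it."""
        if self._prev_transit is not None:
            delta = abs(transit_ms - self._prev_transit)
            self.jitter_ms += self.gain * (delta - self.jitter_ms)
        self._prev_transit = transit_ms
        return self.jitter_ms


def target_depth(jitter_ms, frame_ms=20.0, margin=2.0, min_pkts=1, max_pkts=25):
    """Target buffer size in packets: enough frames to cover margin * jitter.

    The margin could be chosen per network type; the value here is
    only an illustrative assumption.
    """
    pkts = math.ceil(margin * jitter_ms / frame_ms)
    return max(min_pkts, min(max_pkts, pkts))


est = JitterEstimator(gain=0.5)   # large gain so the numbers are easy to follow
first = est.update(50)            # no previous packet yet, estimate stays 0.0
second = est.update(70)           # delta 20 ms, estimate moves to 10.0
third = est.update(70)            # delta 0 ms, estimate decays to 5.0
depth = target_depth(45.0)        # ceil(2.0 * 45 / 20) = 5 packets
```

Clamping the result keeps a quiet network from shrinking the buffer to zero and keeps a spike from growing it without bound.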
In conjunction with a jitter buffer, a multimedia device may also employ a playback adaptation unit to improve playback of audio and/or video data. The playback adaptation unit may be configured to retrieve stored packets from the jitter buffer, decode the data in the packets into samples, and process the samples, thereby controlling the actual size of the buffer to match the adaptive target while maintaining the same playback rate. When processing audio data, the playback adaptation unit may detect whether samples contain speech or silence. Samples containing speech or silence may be stretched or compressed as needed. Compression of speech is called “warping”, compression of silence is called “skipping”, and expansion of silence is called “duping”. Adapting samples during a silence period, i.e., silence-addition and silence-removal, may be desirable in some cases to maintain performance and audio quality.
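Skipping and duping can be illustrated with a short Python sketch. It is a simplification under stated assumptions: silence is detected with a plain mean-energy threshold, buffer depth is counted in whole frames, and speech frames are passed through unchanged (warping is omitted). The names `is_silence` and `adapt_frames` and the threshold value are hypothetical.

```python
def is_silence(frame, energy_threshold=1e-4):
    """Classify a non-empty frame of samples as silence by mean energy."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy < energy_threshold


def adapt_frames(frames, actual_depth, target_depth):
    """Move the buffer toward its target by adapting silence frames only.

    If the buffer is too full, silence frames are dropped ("skipping").
    If it is too empty, silence frames are repeated ("duping").
    """
    excess = actual_depth - target_depth
    out = []
    for frame in frames:
        silent = is_silence(frame)
        if silent and excess > 0:
            excess -= 1               # skip: drop this silence frame
            continue
        out.append(frame)
        if silent and excess < 0:
            out.append(list(frame))   # dupe: play this silence frame twice
            excess += 1
    return out


speech = [0.5, -0.5]
silence = [0.0, 0.0]
frames = [speech, silence, silence, speech]

shrunk = adapt_frames(frames, actual_depth=5, target_depth=4)
grown = adapt_frames(frames, actual_depth=3, target_depth=5)
```

Here `shrunk` has one silence frame removed and `grown` has two silence frames added, while both speech frames are preserved in order.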