Traditionally, IP-based voice and video conferencing systems have communicated over reliable enterprise networks that control for quality of service. In such networks, the most significant timing impairment comes from relative clock drift between the endpoints. As users increasingly set up remote and home offices, however, conferencing systems are now connected over less reliable networks such as wireless links and the public Internet. In such networks, timing impairments such as jitter and out-of-order packets are likely to occur with greater frequency and severity.
Consider jitter. During a period of network congestion, packets may arrive at the conferencing system in large bursts. For audio, the passive solution is a deep buffer, which the system fills from the network at a bursty rate. Meanwhile, the system plays the audio out of the buffer to the listener at a smooth, constant rate equal to the desired play-out frame rate. While this solution is simple, the large buffer it requires adds significant audio latency.
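The trade-off between buffer depth and latency can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the function name `simulate_jitter_buffer`, the `prefill_frames` parameter, and the arrival pattern are hypothetical and are not taken from any particular conferencing system. One tick stands for one play-out frame period.

```python
from collections import deque


def simulate_jitter_buffer(arrivals_per_tick, prefill_frames):
    """Simulate a passive (fixed-depth) jitter buffer.

    arrivals_per_tick: frames received from the network during each
        play-out tick; the values may be bursty (0, then many).
    prefill_frames: hypothetical knob, the number of frames that must be
        buffered before play-out starts.

    Returns (underruns, startup_delay_ticks). startup_delay_ticks is None
    if play-out never started.
    """
    buffer = deque()
    next_seq = 0
    startup_delay = None
    underruns = 0

    for tick, arrived in enumerate(arrivals_per_tick):
        # Fill from the network at whatever (bursty) rate it delivers.
        for _ in range(arrived):
            buffer.append(next_seq)
            next_seq += 1

        # Start play-out once the buffer has reached the prefill depth.
        if startup_delay is None and len(buffer) >= prefill_frames:
            startup_delay = tick

        # Drain at a constant rate: exactly one frame per tick.
        if startup_delay is not None:
            if buffer:
                buffer.popleft()
            else:
                underruns += 1  # nothing to play: audible gap

    return underruns, startup_delay


# Steady arrivals, then a three-tick stall followed by a burst of four.
pattern = [1, 1, 1, 1, 0, 0, 0, 4, 1, 1]
print(simulate_jitter_buffer(pattern, prefill_frames=1))  # (3, 0)
print(simulate_jitter_buffer(pattern, prefill_frames=4))  # (0, 3)
```

With this pattern, the shallow buffer starts playing immediately but runs dry three times, while the deeper buffer absorbs the stall at the cost of three extra ticks of delay before any audio is heard.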
Conferencing systems have attempted to avoid the latency problem by using a time-compression algorithm to modify the speed of audio play-out. Such algorithms use signal processing to shorten the duration of an audio signal without affecting its pitch. When used to combat network jitter, a burst of many frames from the network is time-compressed to reduce the number of frames to be played out to the listener.
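As a rough illustration of the idea, the sketch below reduces a burst of frames by cross-fading adjacent pairs into single frames. It is deliberately naive and is not the algorithm of any specific system: the names `crossfade_merge` and `compress_burst` are hypothetical, frames are assumed to be equal-length lists of samples, and compression is limited to at most 2:1 per pass. Because samples are blended rather than resampled, pitch is not shifted, but merging at arbitrary frame boundaries is exactly the kind of operation that can produce audible artifacts.

```python
def crossfade_merge(frame_a, frame_b):
    """Blend two equal-length frames into one with a linear cross-fade.

    The output starts as frame_a and ends as frame_b, so two frames of
    audio occupy the duration of one.
    """
    n = len(frame_a)
    if n == 0 or len(frame_b) != n:
        raise ValueError("frames must be non-empty and of equal length")
    merged = []
    for i in range(n):
        # Weight ramps from 0 (all frame_a) to 1 (all frame_b).
        w = i / (n - 1) if n > 1 else 0.5
        merged.append((1.0 - w) * frame_a[i] + w * frame_b[i])
    return merged


def compress_burst(frames, target_count):
    """Reduce a burst of frames to target_count frames.

    Adjacent pairs are merged from the start of the burst until enough
    frames have been removed; remaining frames pass through unchanged.
    """
    frames = list(frames)
    if target_count < 1 or 2 * target_count < len(frames):
        raise ValueError("this sketch supports at most 2:1 compression")
    excess = max(0, len(frames) - target_count)
    out = []
    i = 0
    while i < len(frames):
        if excess > 0 and i + 1 < len(frames):
            out.append(crossfade_merge(frames[i], frames[i + 1]))
            i += 2
            excess -= 1
        else:
            out.append(frames[i])
            i += 1
    return out


# A burst of six 4-sample frames, compressed to four frames for play-out.
burst = [[float(k)] * 4 for k in range(6)]
played = compress_burst(burst, target_count=4)
print(len(played))  # 4
```

Practical time-compression algorithms generally choose where and how to overlap segments far more carefully than this, for example by aligning on similar waveform regions; the sketch only shows how a burst can be shortened without changing the play-out rate.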
Ideally, time-compression algorithms would produce very natural-sounding audio while handling the significant compression needed to absorb network jitter. In practice, however, these algorithms often introduce audio artifacts at high compression rates. The two dominant artifacts found in systems using existing algorithms can be described as sounding rough and sounding ghostly.
Some systems have used frequency-domain techniques such as phase vocoders. These techniques tend to produce artifacts that could be described as ghostly. Other time-compression techniques have frequently generated rough-sounding artifacts. Reducing these artifacts would improve the user's experience with conferencing systems.