Modern voice conference bridge architectures frequently employ Internet Protocol (IP) packet-based methods for transporting voice between the endpoints of a conference call. While generally providing voice quality equivalent to traditional time-division multiplexed (TDM) transport, packet-based transmission is not immune to issues affecting voice quality such as line echo.
Line echo occurs on a conference call when a discontinuity (an echo return point) exists somewhere in the transmission path of a bridge call leg. Because of the way the Echo Canceller (ECAN) is oriented, the call leg lies in the echo tail for echo control purposes. When the mixed conference speech signal travels in the bridge's outbound (ECAN receive) direction, a portion of it is reflected at the discontinuity in the echo tail back into the bridge's inbound (ECAN send) direction. IP-based communication is inherently 4-wire (send and receive paths are separate) and is generally thought to be free of echo sources. Nevertheless, an electro-mechanical echo return point can exist on the switched telephone network side of an IP voice gateway, or acoustic coupling can exist at the endpoint, for example between a handset microphone and speaker. Echoes from these discontinuities often escape control by the voice gateway network ECANs.
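The reflection described above can be illustrated with a minimal sketch, assuming the echo return point behaves as a single pure delay with a fixed attenuation. The function name `add_line_echo` and the parameters `delay_samples` and `return_loss_db` are hypothetical and chosen for illustration only; a real echo path is typically dispersive and may vary over time.

```python
def add_line_echo(receive_out, near_end, delay_samples, return_loss_db):
    """Model the signal seen in the ECAN send direction.

    receive_out    : samples sent toward the call leg (ECAN receive direction)
    near_end       : samples produced by the talker on that call leg
    delay_samples  : round-trip delay to the echo return point (assumed constant)
    return_loss_db : attenuation of the reflected signal in dB (assumed constant)

    Both input sequences are assumed to have the same length.
    """
    # Convert the attenuation in dB to a linear amplitude gain.
    gain = 10 ** (-return_loss_db / 20)

    send_in = []
    for n, talker in enumerate(near_end):
        # The reflected portion is a delayed, attenuated copy of the outbound signal.
        if n >= delay_samples:
            reflected = gain * receive_out[n - delay_samples]
        else:
            reflected = 0.0
        send_in.append(talker + reflected)
    return send_in
```

With a silent near-end talker, a single outbound impulse reappears in the inbound direction `delay_samples` later, scaled by the linear gain. That delayed copy is the echo an ECAN would be expected to remove.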
Jitter buffers usually operate in one of two modes: fixed or adaptive. In fixed mode the buffer depth is constant and is ideally set to accommodate worst-case network jitter. If jitter exceeds this depth, however, even briefly, the decoder may be starved of data, and its output may suffer a loss of voice quality in the form of dropouts, clicks, pops or distorted speech. Adaptive mode attempts to resolve this issue by allowing the buffer depth to adapt automatically to the current worst-case network conditions. When network jitter is small, the buffer depth is also small; when network jitter is large, the buffer depth grows large enough (within the maximum physical buffer limits) to accommodate it. Data starvation in the stream to the decoder is thereby avoided, and voice quality is maintained.
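The difference between the two modes can be sketched as follows. This is a simplified model, not a description of any particular jitter buffer: each packet is reduced to a single network delay in milliseconds, a packet counts as late when its delay exceeds the buffer depth in force when it arrives, and the adaptive depth is assumed to follow the largest delay in a short sliding window plus a safety margin. The names `count_late_fixed` and `count_late_adaptive` and the parameters `window` and `margin_ms` are hypothetical.

```python
from collections import deque


def count_late_fixed(delays_ms, depth_ms):
    """Count packets whose network delay exceeds a constant buffer depth."""
    return sum(1 for d in delays_ms if d > depth_ms)


def count_late_adaptive(delays_ms, min_ms=20, max_ms=200, window=5, margin_ms=10):
    """Count late packets when the depth tracks the recent worst-case delay.

    Returns (late_count, depths), where depths[i] is the buffer depth that
    was in force when packet i arrived.
    """
    recent = deque(maxlen=window)  # most recent delays observed (assumed window)
    depth_ms = min_ms              # start at the smallest allowed depth
    late = 0
    depths = []
    for d in delays_ms:
        depths.append(depth_ms)
        if d > depth_ms:
            late += 1              # packet missed its playout time
        recent.append(d)
        # Re-target the depth: recent worst case plus margin, clamped to the
        # physical limits of the buffer.
        depth_ms = min(max_ms, max(min_ms, max(recent) + margin_ms))
    return late, depths


if __name__ == "__main__":
    # Illustrative delays: jitter ramps up, peaks, then settles again.
    delays = [10, 12, 15, 22, 30, 38, 45, 50, 48, 20, 12, 10, 10, 10, 10, 10, 10]
    print("fixed 40 ms, late packets:", count_late_fixed(delays, 40))
    late, depths = count_late_adaptive(delays)
    print("adaptive, late packets:", late)
    print("adaptive depths:", depths)
```

In this sketch the adaptive depth grows as jitter ramps up and shrinks again once the larger delays leave the window. Because the depth is derived only from packets already seen, an abrupt jump larger than `margin_ms` would still produce a late packet here; a practical design would choose the window and margin to trade added latency against that risk.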