The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Referring now to FIG. 1, a functional block diagram of a Voice over Internet Protocol (VoIP) phone 100 is presented. The VoIP phone 100 includes a network interface 102, which may be wireless and/or wired. Packets received by the network interface 102 are passed to a buffer 104. Because the packets travel over a dynamic network, they may arrive out of order. The buffer 104 stores the packets and reorders them.
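By way of illustration only, the reordering performed by the buffer 104 may be sketched as follows. The class name `ReorderBuffer`, its methods, and the use of an integer sequence number carried with each packet are assumptions made for this sketch and are not taken from the disclosure. Sequence-number wraparound is not handled.

```python
import heapq


class ReorderBuffer:
    """Hypothetical buffer that releases packets in sequence-number order."""

    def __init__(self):
        # Min-heap of (sequence_number, payload) tuples.
        self._heap = []

    def push(self, seq, payload):
        # Packets may be pushed in any arrival order.
        heapq.heappush(self._heap, (seq, payload))

    def pop_in_order(self):
        # Yield every buffered packet, lowest sequence number first.
        while self._heap:
            yield heapq.heappop(self._heap)


# Packets arrive out of order: 2, 1, 3.
buf = ReorderBuffer()
for seq, payload in [(2, b"b"), (1, b"a"), (3, b"c")]:
    buf.push(seq, payload)

ordered = list(buf.pop_in_order())
```

A heap is used here so that insertion of a late packet does not require re-sorting the entire buffer; other data structures may serve equally well.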
The delay in receiving each packet may also vary. The buffer 104 may store a number of packets so that packets can continue to be extracted from the buffer 104 while waiting for delayed packets from the network interface 102. This creates a buffering delay, which may be distracting to a user of the VoIP phone 100.
In order to prevent the buffer 104 from running out of packets, the delay built into the buffer 104 is set to be as long as the greatest expected difference in transmission times between two packets. For example, if every packet arriving over the network is received at least 100 ms after it is transmitted, the minimum network delay is 100 ms. If some packets take as much as 300 ms to arrive, an additional 200 ms of delay may be built into the buffer 104. In this way, the buffer 104 will not empty even if a packet is received 300 ms after it is transmitted. The difference between packet delay times is referred to as jitter. A larger amount of jitter is addressed by a longer delay in the buffer 104.
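The arithmetic of the foregoing example may be sketched as follows. The function name `jitter_buffer_delay_ms` and the list of observed transit times are hypothetical and are chosen only to reproduce the 100 ms and 300 ms figures given above.

```python
def jitter_buffer_delay_ms(transit_times_ms):
    """Return (minimum network delay, jitter) for observed transit times.

    The jitter is the greatest difference between any two transit times,
    which is the additional delay the buffer would need to absorb.
    """
    network_delay = min(transit_times_ms)
    jitter = max(transit_times_ms) - network_delay
    return network_delay, jitter


# Hypothetical measurements: fastest packet 100 ms, slowest 300 ms.
network_delay, buffer_delay = jitter_buffer_delay_ms([100, 140, 300, 120])
```

With these assumed measurements, the network delay is 100 ms and the buffer delay is 200 ms.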
Some packets may never be received by the network interface 102. These lost packets may result in degradation of the sound quality of the received data. Further, some packets may arrive after the longest expected delay. These packets may arrive so late that subsequent packets have already arrived and have been processed. Late-arriving packets may therefore present the same quality problems as packets that are lost completely. A decoder 106 may implement Packet Loss Concealment (PLC) to help mask the effects of lost and late packets.
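As a minimal sketch of why a late packet is equivalent to a lost packet at playout time, the following example substitutes the previous frame whenever the expected frame is absent. Repeating the previous frame is a deliberately simple concealment chosen for illustration and is not asserted to be the PLC of the decoder 106 or of any standard. The function name `play_out` and the 160-byte frame size are assumptions.

```python
def play_out(frames_by_seq, first_seq, count, frame_size=160):
    """Return `count` frames starting at `first_seq`, concealing any gaps."""
    output = []
    last_good = bytes(frame_size)  # silence until a first frame is available
    for seq in range(first_seq, first_seq + count):
        frame = frames_by_seq.get(seq)
        if frame is None:
            # Frame was lost, or has not arrived by its playout time:
            # conceal the gap by repeating the previous frame.
            frame = last_good
        else:
            last_good = frame
        output.append(frame)
    return output


# Frame 2 is missing at playout time (lost or late).
received = {1: b"\x01" * 160, 3: b"\x03" * 160}
played = play_out(received, first_seq=1, count=3)
```

In this sketch the second output frame is a copy of the first, regardless of whether frame 2 was lost or merely delayed past its playout time.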
Packets are output from the buffer 104 to the decoder 106. The decoder 106 may be a speech decoder, and may include an implementation of a standard such as International Telecommunication Union Telecommunication Standardization Sector (ITU-T) G.711 and/or ITU-T G.729. Decoded audio is output from the decoder 106 to an acoustic echo control module 108.
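For illustration, one decoding step of the kind the decoder 106 may perform is sketched below for the mu-law variant of G.711, which expands each 8-bit code word to a linear sample in a 16-bit range. The sketch follows the commonly used bit-manipulation form of the expansion; the function name `ulaw_decode` is hypothetical, and A-law and G.729 are not shown.

```python
def ulaw_decode(code):
    """Expand one 8-bit mu-law code word to a signed linear sample."""
    code = ~code & 0xFF            # mu-law code words are stored inverted
    sign = code & 0x80             # top bit: sign
    exponent = (code >> 4) & 0x07  # next three bits: segment number
    mantissa = code & 0x0F         # low four bits: step within the segment
    # Rebuild the magnitude, then remove the bias of 0x84 (132).
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Decode a short hypothetical payload of mu-law bytes.
samples = [ulaw_decode(b) for b in bytes([0xFF, 0x80, 0x00])]
```

A practical decoder would typically use a precomputed 256-entry table rather than recomputing the expansion for every byte.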
The acoustic echo control module 108 may remove acoustic echo and/or add a sidetone from a microphone 110 onto the decoded audio. The acoustic echo control module 108 then outputs audio data to a speaker 112. The acoustic echo control module 108 also receives audio data from the microphone 110, may reduce echo between the speaker 112 and the microphone 110, and outputs the resulting audio data to a noise suppression module 114.
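The addition of a sidetone onto decoded audio may be sketched as follows, assuming 16-bit linear samples. The function name `add_sidetone` and the attenuation factor of 8 are hypothetical; echo removal itself is not shown.

```python
def add_sidetone(decoded, mic, attenuation=8):
    """Mix an attenuated copy of the microphone signal into decoded audio."""
    mixed = []
    for d, m in zip(decoded, mic):
        sample = d + m // attenuation
        # Clip to the signed 16-bit range to avoid overflow.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed


# Hypothetical samples, including values at both ends of the 16-bit range.
mixed = add_sidetone([1000, 32767, -32768], [800, 800, -800])
```

The second and third samples show the clipping applied when the mixed value would fall outside the 16-bit range.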
The noise suppression module 114 suppresses noise and outputs the resulting audio data to an encoder 116. The encoder 116 encodes the data and outputs encoded data to the network interface 102. The encoded speech may be transmitted and received over the network using a transport protocol, such as the Real-time Transport Protocol (RTP).
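By way of example, the fixed 12-byte RTP header that precedes each encoded speech payload may be parsed as sketched below. The field layout follows the RTP specification; the function name `parse_rtp_header` and the example packet contents are hypothetical. The sequence number and timestamp recovered here are the kind of information a buffer may use for reordering and for measuring jitter.

```python
import struct


def parse_rtp_header(packet):
    """Parse the fixed 12-byte RTP header at the start of `packet`."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: two bytes of flags, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 0, 160 payload bytes.
example = struct.pack("!BBHII", 0x80, 0, 7, 160, 0x1234) + bytes(160)
header = parse_rtp_header(example)
```

Any contributing-source identifiers or header extension indicated by the parsed fields would follow the fixed header and are not handled in this sketch.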