Packet loss is an important consideration in mobile VoIP communications, particularly when the client is connected to a Wi-Fi network, because it results in lost voice data. Packet loss concealment (PLC) is used to mask the effects of packet loss in VoIP communications. Packet loss can result from two causes, namely network packet loss (or network loss), in which packets are lost in the network (i.e. packets that are sent but never received at the receiving end), and delayed packet loss (or delay loss). A delayed packet is a voice packet that arrives too late to be played out, causing the receiver to issue a PLC to mask the loss. Delay and the variation of delay, also known as jitter, arise as packets travel from the sender to the receiver through intermediate store-and-forward nodes and packet-switching networks. Because voice packets must be decoded in strict order of their sequence numbers, a jitter buffer is used at the receiving end to temporarily store arriving packets in order to absorb delay variations. Packets that arrive too late are discarded. Missing data from discarded packets must be masked using a PLC technique in order to avoid degradation in voice quality. Different voice decoders use different PLC techniques. For example, G.711 uses simple waveform substitution while G.729 uses more sophisticated algorithms.
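The distinction between network loss and delay loss can be illustrated with a minimal sketch. The function name `playout_actions`, the use of `None` for a packet that never arrives, and the per-packet deadlines are assumptions made for illustration only; they are not taken from any particular decoder or VoIP stack.

```python
from typing import List, Optional, Sequence


def playout_actions(
    arrival_ms: Sequence[Optional[float]],
    deadline_ms: Sequence[float],
) -> List[str]:
    """Decide, per sequence number, whether to decode or to conceal.

    arrival_ms[i] is the arrival time of packet i, or None if the packet
    never arrived (hypothetical convention for a network loss).
    deadline_ms[i] is the time at which packet i must be played out.
    """
    actions = []
    for arrival, deadline in zip(arrival_ms, deadline_ms):
        if arrival is None:
            # Sent but never received: network loss, masked by a PLC.
            actions.append("plc:network-loss")
        elif arrival > deadline:
            # Received, but after its playout slot: delay loss, masked by a PLC.
            actions.append("plc:delay-loss")
        else:
            # Arrived in time: decode in sequence order.
            actions.append("decode")
    return actions


# Hypothetical arrival times and deadlines, in milliseconds.
print(playout_actions([100, None, 190, 160], [120, 140, 160, 180]))
```

In this sketch both kinds of loss lead to the same concealment step; only the cause differs, which is why the buffering time affects delay loss but not network loss.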
In general, there is a tradeoff between the number of delayed packets and the buffering playout time. A long buffering time means that the voice packets will be decoded and played out later in order to absorb the jitter, but with longer latency. A short buffering time means that the voice packets will be decoded earlier in order to minimize the latency, but with a larger number of delayed packets. In VoIP, particularly voice over WLAN (VoWLAN), latency is more noticeable because a large portion of the latency is the one-way trip time from the sender to the receiver. If voice latency is too large, for example over 300 ms, conversation becomes difficult because of double talk. For example, the one-way trip time of a data packet transmitted over the internet between, say, Asia and North America might range from 100 to 160 ms. In addition, the one-way trip time for a packet between the VoIP client and the network access point over a Wi-Fi network may range from 5 to 40 ms. The total one-way latency between two VoIP clients, one in Asia and one in the USA, may therefore be between (100+5)=105 ms and (160+40)=200 ms, excluding packetization delay, decoding delay and processing overheads. Consider the situation in which the first packet takes the best-case time of 105 ms to travel from the sending end to the receiving end, the next packet in the sequence takes 190 ms, and the third packet takes the best-case time of 105 ms. Suppose the buffering time is 70 ms. Relative to the first packet, the second packet is delayed by an additional (190-105)=85 ms, which exceeds the 70 ms buffering time, so the second packet must be discarded and masked by a PLC. A single PLC is an acceptable price for shorter voice latency because a single PLC is usually unnoticeable to the listener. However, consecutive PLCs degrade voice quality and should be avoided.
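The three-packet example can be worked through in a short sketch. It assumes, for illustration only, that playout is anchored to the arrival of the first packet, so that every packet is played out at the first packet's trip time plus the buffering time; the function name `late_packets` is hypothetical.

```python
def late_packets(trip_ms, buffer_ms):
    """Return (playout latency in ms, indices of packets that arrive too late).

    Assumption: the receiver anchors playout to the first packet, so each
    packet is played out trip_ms[0] + buffer_ms after it was sent.
    """
    latency = trip_ms[0] + buffer_ms
    late = [i for i, trip in enumerate(trip_ms) if trip > latency]
    return latency, late


trips = [105, 190, 105]  # trip times from the example, in ms

for buffer_ms in (70, 90):
    latency, late = late_packets(trips, buffer_ms)
    print(f"buffer={buffer_ms} ms -> latency={latency} ms, late packets={late}")
# buffer=70 ms -> latency=175 ms, late packets=[1]
# buffer=90 ms -> latency=195 ms, late packets=[]
```

Under this assumption, a 70 ms buffering time keeps latency at 175 ms at the cost of one PLC, whereas a 90 ms buffering time avoids the PLC but raises latency to 195 ms.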
Various techniques have been proposed for calculating or estimating jitter buffer size as a tradeoff between delayed packet loss and voice latency. Some techniques use buffer occupancy only; for example, if the buffer underflows, the buffer size is increased, and if it overflows, the size is decreased by dropping some voice samples. Other techniques consider the number of delayed packets that the receiver encounters. If the total number exceeds a predetermined threshold value, the buffer size is increased. However, this causes consecutive PLCs, which in turn severely degrade voice quality. Yet further techniques use jitter estimate functions, e.g. based on the jitter characteristics of arriving packets, to calculate the desired buffer size.
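As one well-known example of a jitter estimate function, the sketch below uses the interarrival jitter estimator defined in RFC 3550, expressed here in milliseconds rather than RTP timestamp units. The text above does not name a specific estimator, so this choice is an assumption, and the mapping from jitter to buffer size (`target_buffer_ms`, with its `multiplier` and `floor_ms` parameters) is purely hypothetical.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running jitter estimate in the style of RFC 3550.

    For consecutive packets, d is the change in transit time; the estimate
    moves 1/16 of the way towards |d| on every packet.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def target_buffer_ms(jitter_ms, multiplier=4.0, floor_ms=20.0):
    """Hypothetical rule: buffer a multiple of the jitter, with a lower bound."""
    return max(floor_ms, multiplier * jitter_ms)


# Assumed 20 ms packet interval, with trip times of 105, 190 and 105 ms.
send = [0, 20, 40]
recv = [105, 210, 145]

jitter = interarrival_jitter(send, recv)
print(jitter, target_buffer_ms(jitter))
```

Because the estimate is smoothed, a single late packet raises it only gradually; how aggressively the buffer size should follow the estimate is a design choice that this sketch does not attempt to settle.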