Real-time audio, such as a telephone conversation, may be transmitted over a data network, such as the Internet. The audio transmitted during the telephone conversation includes desired audio (spoken words) and undesired audio (background noise, such as the sound of an air conditioner). While words are being spoken, the transmitted audio contains both spoken words and background noise. While words are not being spoken, the transmitted audio contains only background noise.
To transmit real-time audio over the data network, an audio packet transmitter in the source stores the audio in the payload of one or more data packets and transmits the data packets over the data network. Each data packet includes a destination address in its header.
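The packetizing step described above can be illustrated with a minimal Python sketch. The names `DataPacket` and `packetize`, the fixed payload size, and the per-packet sequence number are assumptions made for illustration only; the text requires only a payload and a destination address in the header.

```python
from dataclasses import dataclass


@dataclass
class DataPacket:
    destination: str  # destination address carried in the header
    seq: int          # hypothetical sequence number (an assumption, not stated in the text)
    payload: bytes    # slice of the audio carried by this packet


def packetize(audio: bytes, destination: str, payload_size: int) -> list:
    """Split audio into consecutive payloads of at most payload_size bytes."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        DataPacket(destination, seq, audio[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(audio), payload_size))
    ]
```

For example, 50 bytes of audio with a 20-byte payload size would yield three packets, the last carrying the remaining 10 bytes.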
Unlike a telephone network, in which there is a dedicated connection between the source and the destination, a data network may carry each data packet on a different path from the source to the destination, and some data packets may travel faster than others. Thus, data packets transmitted over the data network may arrive out of order at the receiver.
To compensate for these path differences, an audio packet receiver in the destination stores the received data packets in a jitter buffer and forwards the stored audio to the listener at the rate at which it was generated in the audio packet transmitter in the source. Jitter buffer latency is the period of time that a received data packet is stored in the jitter buffer before being forwarded to the listener. Thus, the jitter buffer latency is the delay after which the receiver forwards the received data packet to the listener. The jitter buffer latency is dependent on the size of the data packet being transmitted and on the slowest path between the source and the destination. However, if data packets are not being received on the slowest path, the jitter buffer latency may be reduced.
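The jitter buffer behavior described above can be sketched as follows. This is a simplified illustration, not a definitive implementation: the class names, the use of sequence numbers to restore order, the millisecond clock passed in by the caller, and the rule that a packet's playout time is the first arrival time plus the latency plus one frame period per sequence step are all assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class AudioPacket:
    seq: int                               # hypothetical sequence number set by the transmitter
    payload: bytes = field(compare=False)  # audio carried by the packet


class JitterBuffer:
    """Holds packets for latency_ms, then releases them in sequence order."""

    def __init__(self, latency_ms: float, frame_ms: float):
        self.latency_ms = latency_ms  # jitter buffer latency
        self.frame_ms = frame_ms      # audio duration carried by one packet
        self._heap = []               # min-heap ordered by sequence number
        self._base_seq = None         # sequence number of the first packet received
        self._base_arrival_ms = None  # arrival time of the first packet received

    def push(self, packet: AudioPacket, now_ms: float) -> None:
        if self._base_seq is None:
            self._base_seq = packet.seq
            self._base_arrival_ms = now_ms
        heapq.heappush(self._heap, packet)

    def playout_time_ms(self, seq: int) -> float:
        # Packets are played at the rate at which they were generated:
        # one frame period per sequence number, offset by the latency.
        return (self._base_arrival_ms + self.latency_ms
                + (seq - self._base_seq) * self.frame_ms)

    def pop_due(self, now_ms: float) -> list:
        """Return, in sequence order, every stored packet whose playout time has come."""
        due = []
        while self._heap and self.playout_time_ms(self._heap[0].seq) <= now_ms:
            due.append(heapq.heappop(self._heap))
        return due
```

In this sketch a packet that arrives out of order is still played in its proper position, provided it arrives before its playout time; a larger `latency_ms` tolerates slower paths at the cost of more delay.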
Thus, in a low-loss network, the inter-packet arrival time is monitored and the jitter buffer latency is adjusted according to the inter-packet arrival time in order to minimize the delay. This adjustment of the jitter buffer latency is performed while no spoken words are being transmitted, so as to minimize the loss of spoken words.
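One way such an adjustment could work is sketched below. The text does not specify how the latency is derived from the inter-packet arrival time, so the estimator here is an assumption: it exponentially smooths the deviation of each inter-arrival interval from the nominal frame period, sets a target latency of one frame period plus a multiple of that deviation, and applies the target only when the caller reports a period with no spoken words. The class name, `gain`, and `safety_factor` are hypothetical.

```python
class LatencyAdapter:
    """Tracks inter-packet arrival jitter and adapts latency only during silence."""

    def __init__(self, frame_ms: float, latency_ms: float,
                 gain: float = 1 / 16, safety_factor: float = 4.0):
        self.frame_ms = frame_ms            # nominal spacing between packets
        self.latency_ms = latency_ms        # latency currently in use
        self.gain = gain                    # smoothing gain (assumed value)
        self.safety_factor = safety_factor  # headroom multiplier (assumed value)
        self.jitter_ms = 0.0                # smoothed inter-arrival deviation
        self._last_arrival_ms = None

    def on_packet(self, arrival_ms: float) -> None:
        """Update the jitter estimate; the latency in use is left unchanged."""
        if self._last_arrival_ms is not None:
            interval = arrival_ms - self._last_arrival_ms
            deviation = abs(interval - self.frame_ms)
            self.jitter_ms += self.gain * (deviation - self.jitter_ms)
        self._last_arrival_ms = arrival_ms

    def target_latency_ms(self) -> float:
        return self.frame_ms + self.safety_factor * self.jitter_ms

    def on_silence(self) -> float:
        """Apply the target latency while no spoken words are being transmitted."""
        self.latency_ms = self.target_latency_ms()
        return self.latency_ms
```

Deferring the change to `on_silence` reflects the point made above: the latency in use stays fixed while speech is arriving, so changing the buffer depth does not drop or stretch spoken words.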
One standard protocol for packetizing real-time audio for transmission over a data network is the Real-Time Transport Protocol (“RTP”) (Request for Comments (“RFC”) 1889, Jan. 1996) at http://www.ietf.org/rfc/rfc1889.txt. The RTP provides a method for a transmitter to detect the start of a period in which the audio does not contain spoken words. The period in which the audio does not contain spoken words is sometimes called “a period of silence” even though it is not true silence because the audio contains background noise. Upon detecting a period of silence, the transmitter may either transmit no data packets during the period of silence or transmit non-speech audio (background noise with no spoken words). When the transmitter transmits no data packets during a period of silence, the audio packet receiver may adjust the jitter buffer latency while no data packets are being transmitted, and the number of spoken words lost is minimized.
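As an illustration of how a receiver might recognize the end of a period of silence, the sketch below parses the 12-byte fixed RTP header defined in RFC 1889. It relies on a convention from the companion audio/video profile (RFC 1890), under which the marker bit is set on the first packet after a silence period; whether a given transmitter follows that convention is an assumption. The function name `parse_rtp_header` is hypothetical, and header extensions and CSRC entries are ignored.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 1889) into a dictionary."""
    if len(packet) < 12:
        raise ValueError("an RTP packet carries at least a 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,          # top two bits of the first byte
        "marker": bool(byte1 & 0x80),   # set on the first packet after silence (RFC 1890 audio convention)
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

A receiver following this sketch could treat the gap before a packet whose `marker` field is true as a period of silence, which is when the jitter buffer latency may be adjusted.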
Thus, to minimize the number of spoken words lost while the jitter buffer latency is modified, the transmitter does not transmit non-speech audio packets during the period of silence. During the period of silence, the receiver generates comfort noise to reconstruct background noise for the listener and forwards the comfort noise to the listener. Comfort noise is generated and forwarded to reassure the listener that the telephone conversation has not ended. However, the comfort noise reduces the quality of the real-time audio: a listener who hears comfort noise instead of the actual background noise during a period of silence receives a negative impression that the telephone conversation is being transmitted over a data network instead of a telephone network.
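A minimal sketch of receiver-side comfort noise generation follows. It assumes 16-bit little-endian PCM samples and uses low-amplitude uniform random noise as a stand-in for the background noise; the function name and the default amplitude are hypothetical. Because such noise is synthetic, it will generally not match the caller's actual background, which is the quality problem described above.

```python
import random
import struct


def comfort_noise_frame(num_samples: int, amplitude: int = 200, rng=None) -> bytes:
    """Return one frame of low-level random noise as 16-bit little-endian PCM."""
    if not 0 <= amplitude <= 32767:
        raise ValueError("amplitude must fit in a signed 16-bit sample")
    if rng is None:
        rng = random.Random()
    # Each sample is drawn uniformly from [-amplitude, amplitude].
    samples = [rng.randint(-amplitude, amplitude) for _ in range(num_samples)]
    return struct.pack("<%dh" % num_samples, *samples)
```

During a period of silence the receiver could forward one such frame per frame period in place of the missing audio packets.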