While packet switching technology is a widely used and efficient means for transmitting digital data, problems arise when a packet network is employed for voice communication. The conversion of a talker's analog voice signals into digital form for transmission through the network, and the subsequent re-conversion of the digital signals into analog signals for delivery to the listener, are easily accomplished. It is, however, in the nature of the packet network that delays can be introduced that are disconcerting to the parties to a conversation.
In a packet switching system, such as the well-known Ethernet running the UDP (datagram) protocol, audio is sampled, typically at a rate of 8000 samples per second. The samples are digitized, assembled into packets containing address information, and then sent through the packet network. In such a system, the rate at which packets from the source arrive at the destination is a random variable. The arrival time of a packet may be expressed as a probabilistic time distribution, p(A). When transmitting data, this probabilistic distribution does not usually present a problem. However, in conducting a "live" conversation, each of the parties anticipates a response from the other within a familiar interval of time. If an expected response is not forthcoming within that interval, the delay may be disconcerting.
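The probabilistic arrival time described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made only for illustration: the packet size of 160 samples, the Gaussian shape chosen for p(A), the mean delay and jitter values, and the function name simulate_arrivals. The text above states only that arrival time follows some distribution p(A).

```python
import random

SAMPLES_PER_SECOND = 8000   # sampling rate given in the text
SAMPLES_PER_PACKET = 160    # assumed for illustration: 20 ms of audio per packet


def simulate_arrivals(num_packets, mean_delay_ms=30.0, jitter_ms=10.0, seed=0):
    """Return a list of (send_ms, arrival_ms) pairs, one per packet.

    The network delay is drawn from a Gaussian purely as a stand-in for
    p(A); real networks need not follow this shape. Negative draws are
    clamped to zero so that no packet arrives before it is sent.
    """
    rng = random.Random(seed)
    packet_interval_ms = 1000.0 * SAMPLES_PER_PACKET / SAMPLES_PER_SECOND
    arrivals = []
    for n in range(num_packets):
        send_ms = n * packet_interval_ms
        delay_ms = max(0.0, rng.gauss(mean_delay_ms, jitter_ms))
        arrivals.append((send_ms, send_ms + delay_ms))
    return arrivals


if __name__ == "__main__":
    for send_ms, arrival_ms in simulate_arrivals(5):
        print(f"sent at {send_ms:6.1f} ms, arrived at {arrival_ms:6.1f} ms")
```

Packets leave the source at regular intervals, but the gaps between arrivals vary from packet to packet, and that variation is what the receiving side must absorb.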
In a typical packet network, the queue of incoming digital packets is entered into one end of a shift register known as the arrival buffer. The arrival buffer is unloaded at its output stage, at its other end, where data words (containing data or speech samples) are removed to be processed and converted to analog speech by a digital-to-analog (D/A) converter. Because the arrival of packets is probabilistic, it is possible that the output stage of the arrival buffer, which is periodically sampled by the D/A converter, may be empty. If the output stage of the arrival buffer is empty when it is sampled, the D/A converter generates a silent interval and the listening party hears silence. At the same time, a gross delay of one sample is introduced. For a typical system in which the sampling rate is 8000 Hz, samples are taken every 125 microseconds. Accordingly, if the output stage of the arrival buffer is empty when sampled, any data arriving before the next sampling instant may be delayed by up to 125 microseconds, and the average arrival queue size is increased by one sample. If the number of samples accumulated in the arrival buffer increases beyond a certain amount, a disconcerting delay is introduced into the conversation beyond that occurring due to normal pauses in speech. While the accumulation of unprocessed samples in the arrival queue is of no concern in one-way communication, such as a broadcast system, gross delay becomes noticeable to most users engaged in a two-way conversation when it reaches approximately one-eighth of a second, corresponding to 1000 samples at the 8000 Hz sampling rate.
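The figures in the preceding paragraph, and the way each empty read adds one sample of gross delay, can be checked with a short sketch. It is a simplified model rather than a description of any particular implementation: the arrival buffer is reduced to a counter, one tick stands for one 125 microsecond sampling interval, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate given in the text


def sample_period_us(rate_hz=SAMPLE_RATE_HZ):
    """Time between successive D/A reads, in microseconds."""
    return 1_000_000 / rate_hz


def queue_delay_seconds(queued_samples, rate_hz=SAMPLE_RATE_HZ):
    """Gross delay represented by a backlog of queued samples."""
    return queued_samples / rate_hz


def simulate_backlog(arrivals_per_tick):
    """Model the arrival buffer as a simple counter.

    arrivals_per_tick[i] is the number of samples that reach the buffer
    during tick i. At each tick the D/A converter removes one sample if
    one is available; otherwise it outputs silence.
    Returns (silent_ticks, samples_left_in_buffer).
    """
    queued = 0
    silent_ticks = 0
    for arrived in arrivals_per_tick:
        queued += arrived
        if queued > 0:
            queued -= 1        # one sample converted to analog
        else:
            silent_ticks += 1  # buffer empty: listener hears silence
    return silent_ticks, queued


if __name__ == "__main__":
    print(sample_period_us())         # 125.0
    print(queue_delay_seconds(1000))  # 0.125
    # Eight samples arrive over eight ticks, but unevenly.
    print(simulate_backlog([1, 1, 0, 2, 1, 0, 0, 3]))  # (2, 2)
```

In the example run, the same number of samples arrives as there are ticks, yet two reads find the buffer empty and two samples remain queued at the end. Each silent interval has therefore become one sample of standing backlog, or 250 microseconds of gross delay in total.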
Accordingly, it would be advantageous to reduce the accumulation of probabilistic gross delay in packetized voice samples in a packet network without increasing processing overhead.