Packet telephony involves the transmission of audio signals in discrete blocks, or packets, of digital data. FIG. 1 depicts a typical prior art packet telephony communication path 18. Packet telephony transmitter 14 converts a digitized audio stream 20, e.g., audio sampled at 8 kHz and quantized at 8 bits/sample, into packets. Transmitter 14 places these packets onto packet network 28, which routes the packets to packet telephony receiver 16. Receiver 16 converts packet data back into a continuous digital audio stream 36 which resembles input audio stream 20. Transmitter 14 and receiver 16 typically employ a codec (a compression/decompression algorithm) to reduce the communication bandwidth required for path 18 on packet network 28.
A basic packet voice transmitter 14 includes a voice encoder 22, a packetizer 24, and a transmitter 26. Voice encoder 22 implements the compression half of a codec, compressing audio stream 20 to a lower bit-rate. Packetizer 24 accepts compressed voice data from encoder 22 and formats the data into packets for transmission. Transmitter 26 places voice packets from packetizer 24 onto network 28.
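The encoder-to-packetizer hand-off described above can be illustrated with a minimal sketch. It assumes the 8 kHz, 8 bits/sample stream given as an example in the text; the 20 ms frame duration, the `VoicePacket` type, the sequence-number field, and the pass-through `identity_encoder` are hypothetical and stand in for whatever a real codec and packet format would use.

```python
from dataclasses import dataclass
from typing import Callable, List

SAMPLE_RATE_HZ = 8000   # from the text: audio sampled at 8 kHz
BITS_PER_SAMPLE = 8     # from the text: quantized at 8 bits/sample
FRAME_MS = 20           # assumption: 20 ms of audio per packet


@dataclass
class VoicePacket:
    seq: int        # hypothetical sequence number added by the packetizer
    payload: bytes  # encoded audio for one frame


def uncompressed_bit_rate() -> int:
    """Bit rate of the raw stream in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def identity_encoder(frame: bytes) -> bytes:
    """Placeholder for the voice encoder; a real codec would compress here."""
    return frame


def packetize(stream: bytes,
              encode: Callable[[bytes], bytes] = identity_encoder) -> List[VoicePacket]:
    """Split a raw byte stream into fixed-duration frames and wrap each one."""
    bytes_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BITS_PER_SAMPLE // 8
    packets = []
    for seq, start in enumerate(range(0, len(stream), bytes_per_frame)):
        frame = stream[start:start + bytes_per_frame]
        packets.append(VoicePacket(seq, encode(frame)))
    return packets
```

Under these assumptions the raw stream runs at 64,000 bits per second and each full frame carries 160 bytes before compression; a low bit-rate encoder passed as `encode` would shrink each payload before it reaches the network.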
Receiver 16 reverses the process utilized by transmitter 14. Depacketizer 30 accepts packets from network 28. Jitter buffer 32 buffers received data frames and outputs them to voice decoder 34 in an orderly manner. Voice decoder 34 implements the decompression half of the codec employed by voice encoder 22.
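The role of the jitter buffer can be sketched as follows. This is only an illustration of the general idea of holding a few frames so that out-of-order arrivals can be released in sequence; the `JitterBuffer` class, its `depth` parameter, and the use of integer sequence numbers are assumptions, and the sketch does not handle duplicates, clock drift, or loss concealment.

```python
import heapq
from typing import List, Tuple


class JitterBuffer:
    """Holds up to `depth` frames and releases them in sequence order."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth                        # assumption: fixed buffer depth
        self._heap: List[Tuple[int, bytes]] = []  # min-heap keyed on sequence number
        self._last_released = -1

    def push(self, seq: int, frame: bytes) -> bool:
        """Store a frame; return False if it arrived too late to be played."""
        if seq <= self._last_released:
            return False
        heapq.heappush(self._heap, (seq, frame))
        return True

    def pop_ready(self) -> List[Tuple[int, bytes]]:
        """Release the oldest frames once more than `depth` are buffered."""
        out = []
        while len(self._heap) > self.depth:
            out.append(self._release())
        return out

    def flush(self) -> List[Tuple[int, bytes]]:
        """Release everything that remains, still in sequence order."""
        out = []
        while self._heap:
            out.append(self._release())
        return out

    def _release(self) -> Tuple[int, bytes]:
        seq, frame = heapq.heappop(self._heap)
        self._last_released = seq
        return seq, frame
```

A larger `depth` tolerates more reordering at the cost of added playout delay, which is the basic trade-off any jitter buffer has to make.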
Low bit-rate voice codecs used in a packet voice encoder/decoder pair 22, 34 exploit physiological limitations on human hearing ability in order to reduce bit rate. One such human limitation is termed the spectral masking effect, i.e., high-energy sound at one frequency masks lower-energy sound at nearby frequencies in the human auditory system. A codec may choose to ignore potentially masked sounds when coding, since a human would be unable to hear them even if they were faithfully reproduced. Low bit-rate codecs typically also model the bandpass filter arrangement of the human auditory system, including the frequency dependence of our auditory perception, in allocating bits to different portions of a signal. In essence, low bit-rate encoding involves many decisions to throw away actual audio information that is undetectable or only marginally detectable by a human.
Because it is optimized for humans, voice encoding can produce undesirable effects if the audio signal being encoded is not meant for human hearing. Computer modem and facsimile audio signals are examples of such signals; both can be badly distorted by voice encoding. Modems and facsimile machines employ in-band signaling, i.e., they utilize the audio channel of a telephony connection to convey data to a non-human receiver. However, modem and facsimile traffic does not "share" a voice line with a human speaker. Packet telephony systems can therefore detect such in-band traffic during call connection and switch it to a higher bandwidth, non-voice encoding channel.
Other types of in-band signals share a voice channel with a human speaker. Most common among these are the DTMF (dual-tone multi-frequency) in-band signals generated by a common 12-button telephone keypad. Voice mail, paging, automated information retrieval, and remote control systems are among the wide variety of automated telephony receivers that rely on DTMF in-band control signals keyed in by a human speaker.
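For illustration, a DTMF signal for a 12-button keypad is the sum of one row tone and one column tone. The sketch below uses the standard DTMF frequency assignments (697, 770, 852, and 941 Hz for rows; 1209, 1336, and 1477 Hz for columns); the function names, the 50 ms default duration, the 8 kHz sample rate, and the per-tone amplitude of 0.5 are assumptions chosen for the example.

```python
import math
from typing import List, Tuple

ROW_HZ = (697, 770, 852, 941)          # standard DTMF row (low-group) tones
COL_HZ = (1209, 1336, 1477)            # standard DTMF column (high-group) tones
KEYPAD = ("123", "456", "789", "*0#")  # 12-button layout, one string per row


def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (row_hz, col_hz) pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a 12-button keypad key: {key!r}")


def dtmf_tone(key: str, duration_s: float = 0.05,
              sample_rate: int = 8000) -> List[float]:
    """Synthesize the two-tone signal as floats in the range [-1.0, 1.0]."""
    row_hz, col_hz = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * row_hz * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * col_hz * i / sample_rate)
        for i in range(n)
    ]
```

Because the signal consists of two narrow spectral peaks rather than speech-like content, it is the kind of input a perceptually tuned voice codec is not designed to preserve.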
Voice encoding/decoding of DTMF signals can render these signals unrecognizable to an automated DTMF receiver. More sophisticated packet telephony systems are capable of detecting DTMF in an input audio data stream in parallel with voice encoding. FIG. 2 depicts a parallel voice-encoding/DTMF detector packet telephony transmitter 38. Transmitter 38 operates a DTMF in-band signal detector 40 on uncompressed audio data stream 20, in parallel with voice encoder 22. If speech is present in data stream 20, packetizer 24 will be supplied with a voice-encoded signal from encoder 22. If a DTMF signal appears in data stream 20, the DTMF signal, rather than the voice-encoded signal, is supplied separately to packetizer 24. This system allows DTMF signals to effectively bypass the voice codec, thereby avoiding DTMF signal distortion.
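The parallel arrangement can be sketched as a per-frame decision: run a DTMF detector on the uncompressed frame and, if a digit is found, hand the digit to the packetizer instead of the voice-encoded frame. The text does not say how detector 40 works; the sketch below swaps in the Goertzel algorithm, a common choice for single-frequency power measurement, purely as an assumed stand-in. The function names, the `MIN_TONE_FRACTION` threshold, and the tagged-tuple return format are all hypothetical, and a practical detector would add checks (tone duration, level balance between the two tones, rejection of speech that imitates tones) that are omitted here.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

ROW_HZ = (697, 770, 852, 941)          # standard DTMF row tones
COL_HZ = (1209, 1336, 1477)            # standard DTMF column tones
KEYPAD = ("123", "456", "789", "*0#")
MIN_TONE_FRACTION = 0.1  # assumption: minimum normalized power per tone


def goertzel_power(samples: Sequence[float], freq_hz: float,
                   sample_rate: int = 8000) -> float:
    """Squared magnitude of the signal's spectrum at freq_hz (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples: Sequence[float],
                sample_rate: int = 8000) -> Optional[str]:
    """Return the keypad character if both a row and a column tone are strong."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None
    scale = len(samples) * energy  # normalizes a lone pure tone to about 0.5
    row_p = [goertzel_power(samples, f, sample_rate) / scale for f in ROW_HZ]
    col_p = [goertzel_power(samples, f, sample_rate) / scale for f in COL_HZ]
    r = max(range(len(ROW_HZ)), key=row_p.__getitem__)
    c = max(range(len(COL_HZ)), key=col_p.__getitem__)
    if row_p[r] < MIN_TONE_FRACTION or col_p[c] < MIN_TONE_FRACTION:
        return None
    return KEYPAD[r][c]


def select_payload(samples: Sequence[float],
                   voice_encode: Callable[[Sequence[float]], bytes]
                   ) -> Tuple[str, object]:
    """Choose what the packetizer receives for this frame."""
    key = detect_dtmf(samples)
    if key is not None:
        return ("dtmf", key)                  # digit bypasses the voice codec
    return ("voice", voice_encode(samples))   # ordinary speech path
```

In this sketch a 20 ms frame (160 samples at 8 kHz) containing a clean two-tone signal is routed as a digit, while a frame without a strong row and column tone falls through to the voice encoder.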