1. Technical Field
The present invention relates, generally, to the transmission of voice in the form of digital packets of data sent over a network using an internet protocol and, more particularly, to the removal of digitized audio representative of DTMF signals from packets to be sent over an IP network, and replacing such signals with special control packets containing information sufficient to characterize the detected DTMF signal.
2. Background Art and Technical Problems
Telephone communications have, in the past, typically involved voice signals transmitted over the public switched telephone network, sometimes referred to as the PSTN. In-band signaling is commonly utilized to dial a number, control certain devices, and indicate responses. The most common form of in-band signaling is the use of dual tone multi-frequency signals, or DTMF signals, generated by pressing the buttons on a push button telephone. For example, when dialing in to access a voicemail system remotely, a user's access code or PIN may be provided to the voicemail system by pressing the appropriate buttons on a push button phone to generate DTMF signals that can be decoded by the voicemail system. Similarly, some businesses employ automated attendant systems to answer incoming phone calls, and callers may indicate the extension to which they wish to be transferred by pressing corresponding buttons on a push button phone, or speak to an operator by pressing zero. The DTMF signals generated by the caller's phone may be decoded by the automated attendant system and used to complete the call without the intervention of a human operator. Voice response systems respond to DTMF signals to allow callers to retrieve information such as the balance of the caller's bank account, local weather forecasts, movie times, and many other types of useful information.
The PSTN is based upon a design that dates back many years before the days of personal computers, modems, and the Internet. The modern development of the Internet now provides an alternative route for the possible transmission of voice signals, in the form of digital packets of data that can be transmitted over a network using the internet protocol, sometimes referred to as an IP network. However, problems have arisen with the transmission of voice signals over an IP network, sometimes referred to as voice over IP. With the voice codecs commonly employed in voice over IP applications, DTMF and other in-band signals cannot be adequately reproduced if sent as digitized representations of the signals. The digital-to-analog conversion and compression techniques employed in voice over IP applications fail to reproduce DTMF signals without substantial distortion. The distortion is severe enough that the DTMF signals recreated by digital-to-analog conversion at the receive end of the circuit fall outside the specified requirements for such in-band signaling, and the DTMF tones will often not be recognized correctly. The end result is that conventional DTMF detectors can be expected to fail frequently to detect such distorted DTMF signals reproduced from digitized representations received from an IP network.
DTMF signals consist of two simultaneous tones that must have certain characteristics to be recognized as valid DTMF signals. The low group of frequencies comprises 697, 770, 852 and 941 Hz. The high group of frequencies comprises 1209, 1336, 1477 and 1633 Hz. To be recognized as valid, a DTMF signal must consist of two frequencies, one selected from the low group and one selected from the high group. Specifications in effect in the United States provide that a DTMF detector must detect a DTMF signal when the dual tones are each within plus or minus 1.5% of the specified frequency. A DTMF detector must reject a DTMF signal if either of the dual tones deviates more than plus or minus 3.5% from the specified frequency. In addition, a DTMF signal must meet certain signal requirements sometimes referred to as “twist,” where twist is defined as the ratio of the high group frequency tone energy to the low group frequency tone energy. The twist that is detected must be within a specified range for the signal to be recognized as a valid DTMF signal: for U.S. applications it must be greater than or equal to −8 dB and less than or equal to 4 dB. In addition, the “on” time of a DTMF signal must be a minimum of 40 milliseconds in U.S. applications, followed by an “off” time of a minimum of 40 milliseconds. The minimum cycle time is 93 milliseconds. The tolerable frequency deviation, twist, on-time, off-time, and cycle-time may vary for different countries. In many applications, however, DTMF detection must consider all of these parameters, even though the applicable values may vary from the examples provided herein. For convenience, the applicable U.S. parameters are discussed herein, but those skilled in the art will appreciate that other parameters may be substituted, as applicable, without departing from the spirit and scope of the present invention.
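By way of illustration only, the U.S. parameters recited above can be collected into a simple validity check. The following Python sketch is not part of any described embodiment. The function names are hypothetical, the cycle time is assumed to be the sum of the on-time and the off-time, and a tone whose deviation falls between 1.5% and 3.5% is reported as "unspecified" because the text above does not prescribe detector behavior in that band (the sketch conservatively treats such tones as not valid).

```python
import math

# U.S. parameters taken from the text above.
LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)
ACCEPT_DEVIATION = 0.015   # must detect within +/-1.5%
REJECT_DEVIATION = 0.035   # must reject beyond +/-3.5%
MIN_TWIST_DB, MAX_TWIST_DB = -8.0, 4.0
MIN_ON_MS, MIN_OFF_MS, MIN_CYCLE_MS = 40.0, 40.0, 93.0


def classify_frequency(freq_hz, group):
    """Classify a measured tone against the nearest nominal frequency."""
    nominal = min(group, key=lambda f: abs(freq_hz - f))
    deviation = abs(freq_hz - nominal) / nominal
    if deviation <= ACCEPT_DEVIATION:
        return "accept"
    if deviation > REJECT_DEVIATION:
        return "reject"
    return "unspecified"  # assumption: behavior here is implementation-defined


def twist_db(high_energy, low_energy):
    """Twist: ratio of high-group to low-group tone energy, in dB."""
    return 10.0 * math.log10(high_energy / low_energy)


def is_valid_dtmf(low_hz, high_hz, low_energy, high_energy, on_ms, off_ms):
    """Return True only if every recited requirement is met."""
    if classify_frequency(low_hz, LOW_GROUP_HZ) != "accept":
        return False
    if classify_frequency(high_hz, HIGH_GROUP_HZ) != "accept":
        return False
    if not MIN_TWIST_DB <= twist_db(high_energy, low_energy) <= MAX_TWIST_DB:
        return False
    if on_ms < MIN_ON_MS or off_ms < MIN_OFF_MS:
        return False
    # Assumption: cycle time is on-time plus off-time.
    return on_ms + off_ms >= MIN_CYCLE_MS
```

Under these assumptions, a signal with exactly 40 milliseconds on and 40 milliseconds off satisfies each individual minimum yet fails the 93 millisecond cycle-time requirement, which is why the cycle time is checked separately.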
The draft specification for RTP packet transport across an IP network currently specifies that DTMF signals should be removed from the RTP packets. However, in order to remove DTMF signals from other audio signals such as voice, the DTMF signals must be detected. Detection of DTMF signals takes a finite amount of time. In addition, normal speech often contains mixtures of various frequencies and many harmonics, which from time to time may momentarily contain frequency components equivalent to a DTMF signal. False detections are a problem. In addition, a valid DTMF signal should meet certain requirements in terms of the duration of the signal followed by a minimum “off” time, and a detection scheme preferably should examine the signal at least for the minimum cycle time to determine whether a valid DTMF signal has been detected.
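To illustrate why detection takes a finite amount of time, the following Python sketch measures tone energy over a block of samples using the Goertzel algorithm, a technique commonly used for DTMF detection. The text above does not name any particular detection algorithm, so the choice of Goertzel, the 8000 Hz sample rate, the 40 millisecond block, and the function names are all assumptions made for illustration. The point is only that a complete block of samples must be accumulated before the energies can be compared.

```python
import math

LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)


def goertzel_power(samples, sample_rate_hz, target_hz):
    """Squared magnitude of the signal at target_hz over the whole block."""
    omega = 2.0 * math.pi * target_hz / sample_rate_hz
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def strongest_pair(samples, sample_rate_hz):
    """Return the (low, high) nominal frequencies with the most energy.

    This only picks the strongest candidate in each group; a real detector
    would also apply the tolerance, twist, and timing checks.
    """
    low = max(LOW_GROUP_HZ,
              key=lambda f: goertzel_power(samples, sample_rate_hz, f))
    high = max(HIGH_GROUP_HZ,
               key=lambda f: goertzel_power(samples, sample_rate_hz, f))
    return low, high


def make_tone(low_hz, high_hz, sample_rate_hz, duration_ms):
    """Synthesize an equal-amplitude dual tone for testing."""
    n = int(sample_rate_hz * duration_ms / 1000)
    return [math.sin(2 * math.pi * low_hz * i / sample_rate_hz)
            + math.sin(2 * math.pi * high_hz * i / sample_rate_hz)
            for i in range(n)]


# Usage: a 40 ms block at an assumed 8000 Hz sample rate (320 samples).
block = make_tone(697, 1209, 8000, 40)
pair = strongest_pair(block, 8000)
```

Shorter blocks reduce the detection delay but widen each frequency bin, which makes it harder to separate adjacent DTMF frequencies and to reject speech that momentarily resembles a tone pair.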
In the past, efforts to remove DTMF signals have included schemes to delay all packets of digitized audio until the device could be sure that no DTMF tones were present, and then the packets would be transmitted. This method may introduce objectionable delay into the transmission. Such delay can detract from the quality of the voice over IP application, and interfere with efforts to conduct a natural conversation with someone. The performance is likely to be noticeably different from a conventional telephone conversation.
In the past, other efforts to remove DTMF signals have included schemes to transmit packets of digitized audio, and the stream of transmitted packets would be interrupted only when the detection of a valid DTMF signal was confirmed. Since it takes a finite amount of time to reliably detect a valid DTMF signal, some distorted DTMF tones were allowed to be transmitted for a length of time equal to the DTMF detection delay. This method is unsatisfactory because it does not completely remove DTMF signals, but instead allows distorted DTMF signals to be received at least momentarily on the remote end of the IP network. In many applications, the reception of distorted DTMF tones at the receiver can be objectionable, even if the tones are only for a momentary duration.
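The trade-off between the two earlier approaches described above can be expressed as simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that audio is handled in whole packets of a fixed duration, so that the detection delay is rounded up to a whole number of packets.

```python
import math


def delay_all_scheme(detection_delay_ms, packet_ms):
    """Hold every packet until detection completes.

    No tone reaches the remote end, but all audio is delayed.
    """
    held_packets = math.ceil(detection_delay_ms / packet_ms)
    return {"added_delay_ms": held_packets * packet_ms, "leaked_tone_ms": 0}


def interrupt_on_detect_scheme(detection_delay_ms, packet_ms):
    """Send packets immediately and stop only once a tone is confirmed.

    No delay is added, but the packets sent during detection carry tone.
    """
    leaked_packets = math.ceil(detection_delay_ms / packet_ms)
    return {"added_delay_ms": 0, "leaked_tone_ms": leaked_packets * packet_ms}
```

With an assumed detection delay of 45 milliseconds and 20 millisecond packets, the first scheme adds 60 milliseconds of delay to all audio, while the second adds none but passes 60 milliseconds of distorted tone to the remote end.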
While the removal of DTMF signals from transmissions of digitized audio data has been recognized as a problem in voice over IP applications, efforts in the past to remove DTMF signals have not been altogether satisfactory. There is a significant need for an improved method and apparatus for removing DTMF signals from voice over IP packets that does not introduce excessive delay into the system, while at the same time effectively removing the DTMF signals so that distorted DTMF signals are not heard at the remote end of the IP connection.