The present invention relates to methods and apparatus for the detection of DTMF symbols.
Dual-tone multi-frequency (DTMF) signaling is used in telephone dialing, voice mail, and electronic banking systems. A DTMF signal corresponds to one of sixteen touchtone symbols (0-9, A-D, #, *) as shown in FIG. 1. Each symbol is represented by one of four frequencies in a low frequency band and one of four frequencies in a higher frequency band. In FIG. 1 the symbols are shown in a matrix format. Each symbol is represented by a frequency representing the column in which the symbol appears and by a frequency representing the row in which the symbol appears. The columns are represented by frequencies in a band between 1 kHz (kilo-Hertz) and 2 kHz, and the rows are represented by frequencies in a band between 500 Hz and 1 kHz. The first three columns of symbols form the telephone keypad layout familiar to consumers of voice telephone services. The last column of symbols is available for more particularized applications. Whenever a key of a touch-tone keypad is depressed, the high frequency and the low frequency corresponding to the symbol assigned to that key are generated and transmitted to a receiving device. The device that receives this dual tone signal must detect which one of the four low frequencies and which one of the four high frequencies have been received in order to determine which symbol has been transmitted.
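The row and column encoding described above can be sketched as a lookup table. FIG. 1 is not reproduced in this text, so the frequency values below are the standard nominal DTMF frequencies, stated here as an assumption; the names `symbol_to_freqs` and `freqs_to_symbol` are illustrative only.

```python
# Standard nominal DTMF frequencies in Hz (assumed; FIG. 1 is not reproduced here).
LOW_FREQS = (697, 770, 852, 941)        # one frequency per keypad row
HIGH_FREQS = (1209, 1336, 1477, 1633)   # one frequency per keypad column

# Keypad matrix: each string is one row, each character one column.
KEYPAD = ("123A", "456B", "789C", "*0#D")


def symbol_to_freqs(symbol):
    """Return the (low_hz, high_hz) pair that encodes a single DTMF symbol."""
    if len(symbol) != 1:
        raise ValueError("expected exactly one character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(symbol)
        if col >= 0:
            return LOW_FREQS[row], HIGH_FREQS[col]
    raise ValueError(f"not a DTMF symbol: {symbol!r}")


def freqs_to_symbol(low_hz, high_hz):
    """Return the symbol encoded by a nominal (low, high) frequency pair."""
    return KEYPAD[LOW_FREQS.index(low_hz)][HIGH_FREQS.index(high_hz)]
```

A receiver that has identified one low band and one high band frequency can recover the symbol with `freqs_to_symbol`.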
The problem of DTMF signal detection is non-trivial for several reasons. The eight frequencies used to encode the symbols are within the spectrum of frequencies generated by voice data. Therefore, when voice data is transmitted, symbol simulation (also called digit simulation) may occur. A DTMF detector must be able to discriminate against these voice-simulated symbols. Also, the DTMF signal is attenuated by the transmission medium through which it is transmitted. Typical transmission media attenuate high frequencies more than low frequencies. Thus, the higher frequency in the dual tone pair may have significantly less power at the receiver than the low frequency in the pair. Conversely, devices do not typically generate all DTMF frequencies at the same power level. It is therefore possible for the lower frequency to be received at a lower power level than the high frequency. This disparity in power between the low and high frequency is called “twist”. Further, both tones must be detected in the presence of noise power which may be a significant fraction of the signal power of the received DTMF signal. An additional problem is that not all devices will generate the exact dual tone frequencies shown in FIG. 1 because of poor design or system degradation. The DTMF receiver must be able to detect the DTMF signals at frequencies slightly offset from the nominal values while rejecting frequencies outside a given tolerance band. Because the nominal DTMF frequencies are closely spaced, the tolerance band must be very narrow. The problem of signal detection within narrow frequency bands is complicated by the fact that each signal is transmitted only for a short time duration of uncertain length with an uncertain delay time between transmission of successive symbols.
To standardize the performance of devices for DTMF signal generation and reception, the International Telecommunication Union (ITU) has developed a set of performance standards with which these devices should comply. These standards have achieved virtually worldwide acceptance, and refine the standards previously developed by Bell Communications Research, Inc. (“Bellcore”). The ITU standards are summarized in Table 1. Voice-simulated tones must be rejected as invalid tones. Signal frequencies that are within +/−1.5% of the nominal frequencies listed should be detected as valid DTMF tones. A signal frequency outside the band of +/−3.5% of a nominal frequency must be rejected as an invalid tone. Two twist parameters are also specified. The twist, which is the ratio of the low frequency power to the high frequency power in decibels (dB), is specified to be greater than −4 dB and less than 8 dB. A positive twist value is a forward twist condition, which is the case when the low frequency signal power exceeds the high frequency power. A negative twist value is a reverse twist condition, which exists when the high frequency signal power exceeds the low frequency signal power. When the twist is within the range of −4 dB to +8 dB, the signal must be accepted as valid. Also, according to Bellcore standards, a valid DTMF signal must be detected if the signal-to-noise ratio (SNR) is at least 15 dB. In addition to frequency and power tolerances, temporal constraints are also imposed. A DTMF signal of duration at least 40 msec (milli-seconds) must be detected. A signal of duration 23 msec or less must be rejected. Also, if the time between the end of one DTMF signal and the beginning of the next successive DTMF signal, the interdigit time, is at least 40 msec, the signals must be distinguished as two distinct symbols. Conversely, a signal interruption of 10 msec or less must not cause detection of two separate tones.
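The frequency and timing limits quoted above can be expressed as simple classification rules. The following is a minimal sketch, not a conformance test: the function names and the "unspecified" label for values that fall between a must-accept and a must-reject limit are assumptions made for illustration.

```python
def classify_frequency(measured_hz, nominal_hz):
    """Classify a tone frequency against the +/-1.5% and +/-3.5% limits."""
    deviation = abs(measured_hz - nominal_hz) / nominal_hz
    if deviation <= 0.015:
        return "accept"        # within +/-1.5% of nominal
    if deviation > 0.035:
        return "reject"        # outside +/-3.5% of nominal
    return "unspecified"       # between the two limits


def classify_duration(duration_ms):
    """Classify a tone duration against the 40 msec and 23 msec limits."""
    if duration_ms >= 40.0:
        return "accept"
    if duration_ms <= 23.0:
        return "reject"
    return "unspecified"


def classify_gap(gap_ms):
    """Classify a silent gap between tones against the 40 msec and 10 msec limits."""
    if gap_ms >= 40.0:
        return "two_symbols"   # interdigit pause: report two distinct symbols
    if gap_ms <= 10.0:
        return "one_symbol"    # brief interruption: still a single symbol
    return "unspecified"
```

For example, a 704 Hz tone is about 1% above the 697 Hz nominal and is classified as "accept", while a 725 Hz tone is about 4% above and is classified as "reject".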
Within the telephone network, DTMF signals are typically transmitted digitally at a sampling rate of about 8 kHz (8000 samples per second), to give sample durations of approximately 0.125 msec. One way to detect the presence of a valid DTMF signal is by digital-to-analog conversion followed by a bank of analog filters centered at the nominal DTMF frequencies. This method is not efficient because of the required conversion process and the size and complexity of analog filter implementation. It is more desirable to achieve DTMF signal detection using digital methods which can be implemented by an integrated circuit digital signal processor.
The most common digital methods for DTMF detection involve repetitively or iteratively computing the frequency content of the received signal over a finite duration of time referred to as a frame. For each frame, the power at each frequency of interest is determined. Once the power at each desired frequency is detected, a decision process, in the form of a series of tests, is usually employed to determine whether a valid DTMF signal has been detected. For example, voice-simulated DTMF signal tones can be discriminated by computing the signal power at the first harmonic of the fundamental DTMF signal tones listed in Table 1. A DTMF signal that is not voice-simulated will have little or no signal power at these harmonics, whereas the spectrum of a voice signal usually does generate these harmonics at significant power levels. To discriminate against voice-simulated tones, the power level at harmonic frequencies of the fundamental DTMF frequencies can be compared to specified threshold values. If the power in any of the harmonics exceeds the given threshold for that harmonic, a decision is made that an invalid detection has occurred. To determine if a valid tone has been detected, the DTMF frequency in the high band at which the power is greatest is determined. Similarly, the DTMF frequency in the low band at which the power is greatest is also determined. Each of these signals must exceed a certain threshold power or a decision is made that no valid DTMF signal has been detected within the current frame. For static thresholding, the threshold is a fixed, predetermined amount. For dynamic thresholding, the threshold is the minimum amount by which the power in the strongest tone in the band must exceed the power of the signals at the other three DTMF frequencies in the band. Further, the power of the strongest tone in the high band is compared to the power of the strongest tone in the low band to determine if the twist is within the range of −4 dB to 8 dB.
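The per-frame tests described above (strongest tone per band, static threshold, dynamic threshold, twist) can be sketched as a single decision function. This is an illustration under stated assumptions: the function name, the default threshold values, and the interpretation of the dynamic threshold as a power ratio are hypothetical and are not taken from the text. The harmonic test is omitted for brevity.

```python
import math


def decide_frame(low_powers, high_powers, static_threshold=1.0,
                 dynamic_ratio=4.0, twist_range_db=(-4.0, 8.0)):
    """Return (low_index, high_index) if the frame passes all tests, else None.

    low_powers and high_powers each hold the measured power at the four
    DTMF frequencies of one band. All threshold defaults are placeholders.
    """
    picks = []
    for powers in (low_powers, high_powers):
        best = max(range(len(powers)), key=lambda i: powers[i])
        peak = powers[best]
        # Static thresholding: the peak must exceed a fixed level.
        if peak <= static_threshold:
            return None
        # Dynamic thresholding: the peak must dominate the other tones
        # in the same band (modeled here as a minimum power ratio).
        others = [p for i, p in enumerate(powers) if i != best]
        if any(peak < dynamic_ratio * p for p in others):
            return None
        picks.append(best)

    # Twist: ratio of low band power to high band power, in dB.
    twist_db = 10.0 * math.log10(low_powers[picks[0]] / high_powers[picks[1]])
    if not (twist_range_db[0] < twist_db < twist_range_db[1]):
        return None
    return picks[0], picks[1]
```

The returned indices select one row frequency and one column frequency, which together identify the symbol.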
One approach to analyzing the frequency content of the received signal is by use of a Fast Fourier Transform (FFT) algorithm. The FFT would produce a sampled frequency spectrum with equally spaced samples. To obtain the frequency resolution required to detect signals within +/−1.5% of a nominal DTMF frequency at a sample rate of 8 kHz, an FFT of at least 256 points would be required. The number of computations required to implement an N-point FFT is proportional to N log2 N. However, since the frequency spectrum needs to be computed at only a small number of frequencies (8 DTMF frequencies plus some harmonics), it is more efficient to compute the Discrete Fourier Transform (DFT) at these particular frequencies. Further, since it is desirable to process the signal in real time as it is received, without the need to store a large number of samples in a buffer, the Goertzel filter is commonly employed. The Goertzel filter is an implementation of the DFT as a digital filter which is structured to reduce the number of computations required to compute the transform. The number of computations to compute the spectrum of a signal of N samples at M discrete frequencies using the Goertzel filter is proportional to N*M. When the number M is less than log2 N, the Goertzel method requires fewer computations than the FFT.
The transfer function implemented by the Goertzel filter is:

$$H_k(z) = \frac{1 - e^{-j(2\pi k/N)}\, z^{-1}}{1 - 2\cos(2\pi k/N)\, z^{-1} + z^{-2}} \qquad (1)$$
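As an illustration of equation (1), the sketch below implements the conventional Goertzel recursion for a single DFT bin. The denominator of equation (1) is realized as a real second-order recursion applied sample by sample, and the complex numerator is applied once at the end of the frame. The function name is illustrative, and this is a textbook form of the filter rather than the implementation of any particular detector.

```python
import cmath
import math


def goertzel_dft_bin(samples, k):
    """Compute DFT bin k of a frame using the Goertzel recursion."""
    n = len(samples)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)

    # Denominator of equation (1): v[n] = x[n] + 2cos(w) v[n-1] - v[n-2].
    v1 = 0.0  # v[n-1]
    v2 = 0.0  # v[n-2]
    for x in samples:
        v0 = x + coeff * v1 - v2
        v2, v1 = v1, v0

    # One extra step with zero input so the output is taken at index N.
    v0 = coeff * v1 - v2

    # Numerator of equation (1): y[N] = v[N] - exp(-j w) v[N-1].
    return v0 - cmath.exp(-1j * w) * v1
```

Only two state variables are carried between samples, which is why no frame buffer is needed.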
This filter requires no signal buffering because each sample is processed when received. Given a sequence of N samples, the Goertzel filter produces the energy at a frequency that is an integer multiple of 2π/N (in radians). However, since the DTMF tones are not equally spaced in frequency, there is no single value of N for which H_k(z) can be computed precisely at all 8 DTMF frequencies for any set of integers, k. One way to overcome this problem is to take N sufficiently large so that k/N will be arbitrarily close to each of the 8 DTMF frequencies (normalized by N) for some values of k. However, in order to achieve the required frequency resolution, the frame size, N, would have to be so large that it would not be possible to determine whether the received signal was less than 23 msec or greater than 40 msec in duration. Another approach is to use a different frame size for each DTMF frequency, choosing N for each frequency such that k/N is arbitrarily close to the desired normalized frequency for some value of k, consistent with the frame size required to discriminate signals of valid duration. However, this approach results in considerable computational complexity and increased data storage due to the processing of signals accumulated over different durations of time. A better approach is to alter the Goertzel filter to compute the z-transform at the precise frequencies of interest. The altered Goertzel filter is obtained from the transfer function of equation (1) by replacing k with the exact frequency of interest f1 and N with the sampling frequency f2. This implementation is referred to as the Non-uniform DFT (NDFT). Further simplification can be achieved by modification of the algorithm to compute signal power rather than signal energy, since only signal power is needed for DTMF detection. This eliminates the need for complex multiplication to implement the transfer function of equation (1).
Implementation of the modified NDFT results in a considerable reduction in computational complexity, since it can be implemented using 3N real multiplications/additions and four words of memory. Therefore, the modified NDFT may be used to detect power at the exact DTMF frequencies and the selected harmonics efficiently, within the limits of machine precision.
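A minimal sketch of this idea follows, assuming the usual real-valued form of the Goertzel recursion evaluated at an arbitrary frequency. The function name and the default sample rate are illustrative, and the operation count of this sketch is not claimed to match the figures quoted above. The returned value is the squared magnitude of the frame's spectrum at the requested frequency.

```python
import math


def tone_power(samples, freq_hz, sample_rate_hz=8000.0):
    """Squared spectral magnitude of a frame at an exact frequency.

    Equation (1) is evaluated with k/N replaced by freq_hz / sample_rate_hz,
    and only real arithmetic is used.
    """
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # s[n-1]
    s2 = 0.0  # s[n-2]
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    # |s1 - exp(-j w) s2|^2 expanded so that no complex multiply is needed.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
```

For a 106-sample frame containing a 770 Hz tone, `tone_power(frame, 770.0)` is far larger than `tone_power(frame, 697.0)`, which is the property the band tests rely on.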
Computing the frequency spectrum of the received signal over a finite-length frame is equivalent to determining the frequency content of the received signal multiplied by a rectangular window. In continuous time, rectangular windowing corresponds to convolving the frequency spectrum of the received signal with a sinc function:

$$\frac{\sin(2\pi f/N)}{2\pi f/N} \qquad (2)$$
An example of this function is shown in FIG. 2. The effect of windowing in the time domain is to spread the tonal energy of the DTMF signal in the frequency domain. The sidelobes of the windowing function can be reduced by using a tapered window such as a Hamming window or other tapered windowing function. However, using a tapered window increases the width of the main lobe, thereby reducing frequency selectivity. Increasing the window size narrows the width of the main lobe, thereby increasing frequency selectivity. However, increasing the window size increases the difficulty in meeting the ITU timing specifications.
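The trade-off between frame length and frequency selectivity can be illustrated numerically. The sketch below uses the discrete-time counterpart of the sinc function (the Dirichlet kernel) for a rectangular frame, which is a substitution made here for sampled data; the function names and the 8000 Hz default are illustrative assumptions.

```python
import math


def rect_window_response(offset_hz, frame_len, sample_rate_hz=8000.0):
    """Normalized magnitude response of a rectangular frame of frame_len
    samples to a tone that is offset_hz away from the analysis frequency."""
    x = math.pi * offset_hz / sample_rate_hz
    if math.isclose(math.sin(x), 0.0, abs_tol=1e-12):
        return 1.0  # zero offset: the tone sits at the center of the main lobe
    return abs(math.sin(frame_len * x) / (frame_len * math.sin(x)))


def first_null_hz(frame_len, sample_rate_hz=8000.0):
    """Offset of the first spectral null, i.e. half the main lobe width."""
    return sample_rate_hz / frame_len
```

Doubling the frame length halves `first_null_hz`, so a tone at a fixed offset from the analysis frequency is attenuated more strongly by the longer frame.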
Prior art methods have used a frame size of about 13.3 msec which corresponds to a frame of 106 samples at the standard nominal sampling rate of 8000 samples per second. This frame size guarantees that a signal of at least 40 msec duration would fill at least two frames. After the conclusion of each frame, the detected signal in the current frame is compared to the detected signal of the previous two frames. If the result of the current frame is the same as the previous frame, but different from that of the frame before the previous frame, then a decision is made that a new valid DTMF signal has been found. However, this decision logic will incorrectly detect two distinct signals when a brief interruption occurs, because a 10 msec interruption will generate an invalid frame. Also, a signal of 20 msec duration that is centered between two frames could incorrectly result in a valid DTMF detection, even though the signal is less than 23 msec in duration.
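The prior art decision rule described above, and its failure on a brief interruption, can be sketched as follows. Each frame result is modeled as a detected symbol or `None` for an invalid frame; the function name and this representation are assumptions made for illustration.

```python
def detect_symbols(frame_results):
    """Apply the three-frame comparison rule to a sequence of frame results.

    A new symbol is reported when the current frame matches the previous
    frame but differs from the frame before the previous frame.
    """
    detected = []
    before_prev = None
    prev = None
    for cur in frame_results:
        if cur is not None and cur == prev and cur != before_prev:
            detected.append(cur)
        before_prev, prev = prev, cur
    return detected
```

With the frame sequence `["5", "5", "5", None, "5", "5"]`, where the single `None` stands for a frame invalidated by a short interruption, the rule reports the symbol "5" twice, which is the erroneous double detection noted above.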
Moreover, a 106-sample frame will not meet the ITU frequency specifications for all frequencies. In particular, the detector would be unable to reject frequencies in the low band frequency group that fall outside the +/−3.5% band centered at each DTMF frequency. A larger frame size would be necessary to meet the frequency rejection specification in the low band. However, increasing the frame size has two negative effects. First, a longer frame duration makes it difficult to meet the signal timing specifications. Second, a larger frame size increases the selectivity of the filter, making it more likely to reject frequencies within the +/−1.5% band pass region centered at the DTMF frequencies in the high band. In fact, no single frame size has been reported that will satisfy all of the ITU specifications.
As already noted, different frame sizes could be used for different DTMF frequencies at the expense of increased computational complexity and data storage requirements. Using frames of different lengths implies accumulation of the received samples over different time durations for different DTMF frequencies. Therefore, the outputs of the NDFT filters at each DTMF frequency will occur at different rates, making it difficult to achieve a meaningful comparison of power levels at different DTMF frequencies over the same time interval without the necessity of storing the received samples in a data buffer. The cost of increased storage and computational complexity to implement multiple frame lengths with signal buffering is considerable, especially for devices that must simultaneously process multiple channels. Therefore, there is great need for a computationally efficient DTMF detection method that meets all of the ITU specifications without the necessity of signal buffering.
An object of the present invention is to provide apparatus and methods for DTMF detection that minimize computational complexity and data storage requirements and that meet all ITU specifications.
The present invention comprises a high band filter block with four filters that detect power at each of the four high band DTMF tones, two low band filter blocks, each with four filters, that detect power at each of the four low band DTMF tones, and two filters for detecting power at the harmonic frequencies of the high and low band tones. The frame length of each low band filter block is twice the frame length of the high band filter block and is chosen to meet the ITU frequency selectivity requirements for both the high and low band tones. The frames of the low band filter blocks are staggered by a duration of time equal to the frame length of the high band filter block, and are aligned to start and end concurrently with the high band filter block frames. In this way, the outputs of the low band filter blocks alternate and coincide with the outputs of the high band filter block, so that a low band and high band result is obtained at the end of each high band frame without the need for signal buffering. This results in a substantial reduction in data storage requirements.
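The staggered frame arrangement can be visualized with a small scheduling sketch. The block labels "A" and "B", the function name, and the frame length used in the example are hypothetical; the text above fixes only the two-to-one length ratio and the one-frame stagger.

```python
def frame_schedule(num_high_frames, high_len):
    """List which low band block completes at the end of each high band frame.

    Each entry is (high_start, high_end, low_block, low_start, low_end),
    with sample indices measured from the start of processing. Low band
    frames are twice as long as high band frames, and blocks "A" and "B"
    are offset from each other by one high band frame.
    """
    events = []
    for i in range(num_high_frames):
        high_start = i * high_len
        high_end = high_start + high_len
        low_start = high_end - 2 * high_len
        if low_start < 0:
            continue  # no full low band frame has elapsed yet
        block = "A" if i % 2 == 1 else "B"
        events.append((high_start, high_end, block, low_start, high_end))
    return events
```

With `frame_schedule(4, 100)`, the low band results come from block A, then B, then A, each ending on the same sample as a high band frame, so every high band result after the first is paired with a low band result.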
The present invention comprises methods for implementing power level tests that result in improved DTMF detection and voice rejection while meeting the ITU frequency and timing constraints. Improved performance and sensitivity are achieved by basing detection decisions on power levels determined from the average of two high band frames and from one low band frame that is twice the length of a high band frame.
At the end of each high band frame, a harmonic power level test is applied only to the symbol detected in the previous frames. This reduces the total number of filters required. Improved rejection of voice-simulated symbols is achieved by applying a harmonic power level test only to the high band tones, and not the low band tones, when one of the symbols “2,” “6,” or “C” has been detected in the previous frame, because the harmonic frequencies of the low band tones for these three symbols are too close to their high band tones to provide rejection that exceeds what can be achieved by a total power test. A total power check that improves performance by detecting valid DTMF signals at lower signal-to-noise ratios is achieved by experimentally determining an optimal threshold ratio for each DTMF symbol.
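The special handling of these three symbols can be checked with simple arithmetic. The sketch below assumes the standard nominal DTMF frequencies and assumes that the relevant harmonic is twice the low band frequency; neither assumption is stated explicitly in the text, and the function names are illustrative.

```python
# Standard nominal DTMF frequencies in Hz (assumed).
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def harmonic_gaps():
    """Map each symbol to the distance in Hz between twice its low band
    frequency and its own high band frequency."""
    gaps = {}
    for row, low in enumerate(LOW_FREQS):
        for col, high in enumerate(HIGH_FREQS):
            gaps[KEYPAD[row][col]] = abs(2 * low - high)
    return gaps


def closest_symbols(count=3):
    """Return the symbols whose low band harmonic lies nearest the high tone."""
    gaps = harmonic_gaps()
    return sorted(gaps, key=gaps.get)[:count]
```

Under these assumptions the three smallest gaps belong to the symbols 2, 6, and C (58 Hz, 63 Hz, and 71 Hz), which is consistent with the exception described above.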
The present invention also provides a finite state machine that ensures that all of the ITU timing constraints are met. A length test is imposed which ensures that a signal of duration 23 msec or less will be rejected while ensuring that a DTMF signal that is at least 40 msec long is accepted. A pause test is imposed which ensures that an interdigit time of 10 msec or less will not result in an erroneous detection of two successive symbols, while ensuring that a pause of at least 40 msec will result in detection of two successive symbols.
An embodiment of the present invention meets the ITU specifications by using the Non-uniform Discrete Fourier Transform (NDFT) in conjunction with dual-windowing and a computationally efficient finite-state machine. The present invention requires no buffering of input data, and is simple enough to decode 24 digitized telephone channels of a time-division multiplexed T1 line (1.544 Megabits/second) using a standard single fixed-point digital signal processor (DSP).
These and other features, aspects, and advantages of the present invention will become better understood with reference to the following description, appended claims, and drawings.