1. Field of the Invention
This invention generally relates to the recognition of a human voice as distinct from various call progress and signaling tones on a telephone network and, more particularly, to an improved technique for separating valid audio signals from noise and then classifying those audio signals as call progress or other signaling tones, voice, and even recorded voice.
2. Description of the Prior Art
The invention has specific utility in automatic telephone dialing systems used to deliver prerecorded messages or to connect an operator to a called party when the call is answered. Such systems are increasingly used for many commercial applications, from the sale of subscriptions to periodicals or the sale of securities by various brokerage houses to the collection of bills. Generally, the system is given a list of telephone numbers, which it dials in order, keeping track of the numbers to be redialed at a later time when a call is not completed. When a call is answered, the system recognizes that the call has been answered, then connects the called party to an available operator who is provided with data on a video display terminal (VDT) screen concerning the called party. If there is no available operator, the system connects the called party to a source of a prerecorded message until an operator becomes available. In this way, the operators are saved the time required to dial and monitor a call, greatly enhancing their efficiency.
These automatic dialing systems must be able to recognize a human voice and various call progress tones. The public telephone network is a challenging environment in which to attempt this. At first, one would expect this environment to be ideal: call progress tones are composed of various distinct tones generated at predictable intervals, whereas the human voice is, by its nature, indistinct and asymmetric. In practice, however, reliable detection of the human voice has proved to be much more difficult than would at first be expected.
Actually, there are two separate environments which exist in the public telephone network. The first may be described as a "clean" environment wherein calls have a moderate amount of background noise and the call progress tones follow the Bell Standard Precise Tone Plan; that is, those specific tone frequencies and cadence patterns as specified in Bell publication 61100, "Description of the Analog Voiceband Interface Between the Bell System Local Exchange Lines and Terminal Equipment." The second, in contrast, may be described as a "dirty" environment wherein calls are subject to much noise and/or have call progress tones which do not follow the Bell Standard Precise Tone Plan. Generally, a "clean" environment is associated with new Central Office equipment or large metropolitan areas, and a "dirty" environment is associated with older Central Office equipment or small, independent Central Offices. It is these noise conditions of a "dirty" environment which pose the greatest obstacle to rapid and efficient detection of an answer to a placed call while still filtering out busy signals and other results of call progress.
In the past, several approaches to human voice detection have been employed. For example, U.S. Pat. No. 4,405,833 to Cave et al. describes a call progress detection circuit which makes use of unique properties of call progress tones. More specifically, Cave et al. detect the modulation envelope produced by the difference frequency of two frequencies which compose a call progress tone. The frequency and other timing parameters of the modulation envelope are examined to determine the type of signal. If, for example, a busy signal is detected, the circuit notifies the associated automatic dialing system which will release the line which has been dialed, store the telephone number in memory for later recall, and then dial the next number on a list. If a ringing tone is detected, the circuit will determine if the call is answered.
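The difference-frequency envelope that Cave et al. rely on can be illustrated numerically. The following is a minimal sketch, not the circuit of the cited patent: it uses the trigonometric identity sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2), which shows that the sum of two tones behaves as a carrier whose envelope repeats at the difference of the two frequencies. The frequency pairs in the table are the commonly cited Precise Tone Plan values and are an assumption here, since this text does not list them; the function names are hypothetical.

```python
import math

# Commonly cited Precise Tone Plan frequency pairs in Hz (assumed, not from this text).
PRECISE_TONES = {
    "dial": (350.0, 440.0),
    "ringback": (440.0, 480.0),
    "busy": (480.0, 620.0),
}


def envelope_frequency(f1, f2):
    """Repetition rate in Hz of the modulation envelope of a two-tone signal."""
    return abs(f1 - f2)


def two_tone(f1, f2, t):
    """Sum of two unit-amplitude sinusoids at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def product_form(f1, f2, t):
    """Same signal written as a carrier multiplied by a slowly varying envelope term."""
    carrier = math.sin(math.pi * (f1 + f2) * t)
    envelope = math.cos(math.pi * (f1 - f2) * t)
    return 2 * carrier * envelope


if __name__ == "__main__":
    for name, (f1, f2) in PRECISE_TONES.items():
        print(name, envelope_frequency(f1, f2), "Hz envelope")
```

Under these assumed values, each call progress tone has a distinct envelope rate, which is the kind of property an envelope detector can examine without resolving the individual tone frequencies.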
Another approach is taken by Szlam et al. in U.S. Pat. Nos. 4,477,698 and 4,540,855, which disclose a combination of hardware and digital signal processing for detecting pick-up of a telephone call. The apparatus employs a high gain band pass filter, the output of which goes to a window comparator. The output of the window comparator goes to a digital high pass filter and from there to an integrator, which provides a digital output signal indicative of the presence or absence on the telephone line of a signal exceeding a predetermined magnitude within the filter pass band. This digital signal is then processed by a microprocessor based digital filter having a set of predetermined threshold values of durations for states of the digital output signal, by which a determination of a telephone pick-up is made.
Also known is the voice detection circuitry used in the ComPlus equipment manufactured and sold by International Telesystems Corporation of Herndon, Va. This detection circuitry used an algorithm, referred to hereafter as the old recognition method or ORM, which made use of a SAMPLE, a value representing half the number of zero crossings of an incoming audio signal during a 25 ms period. Eight SAMPLEs are collected into a WINDOW of data. To classify a WINDOW, all eight SAMPLEs are compared with successive ranges of values. For example, the SAMPLEs with values between seven and twenty-two are tallied and, if the count is greater than six, a "TONE" is returned. If this test fails, the number of SAMPLEs with values greater than five is counted. If this count is greater than five, an "ANSWER" is returned, but if it is five or less, then "NOISE" is returned.
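The WINDOW classification just described can be sketched as follows. This is a minimal illustration written from the description above, not the ComPlus code itself; the function names are hypothetical, and treating "between seven and twenty-two" as inclusive of both limits is an assumption. Because a steady tone of frequency f crosses zero about 2f times per second, half the zero crossings in 25 ms is roughly f/40, so a SAMPLE of seven to twenty-two corresponds to roughly 280 Hz to 880 Hz for a pure tone.

```python
WINDOW_SIZE = 8            # SAMPLEs per WINDOW, as described in the text
TONE_LOW, TONE_HIGH = 7, 22  # inclusive limits are an assumption


def approx_sample(freq_hz, period_ms=25):
    """Approximate SAMPLE value for a steady tone of integer frequency freq_hz.

    Half the zero crossings in period_ms is about freq_hz * period_ms / 1000.
    """
    return freq_hz * period_ms // 1000


def classify_window(window):
    """Classify one WINDOW of eight SAMPLEs as "TONE", "ANSWER" or "NOISE"."""
    if len(window) != WINDOW_SIZE:
        raise ValueError("a WINDOW must hold exactly eight SAMPLEs")

    # First pass: tally SAMPLEs in the tone range.
    in_tone_range = sum(1 for s in window if TONE_LOW <= s <= TONE_HIGH)
    if in_tone_range > 6:
        return "TONE"

    # Second pass over the same data: tally SAMPLEs above five.
    above_five = sum(1 for s in window if s > 5)
    if above_five > 5:
        return "ANSWER"
    return "NOISE"


if __name__ == "__main__":
    print(classify_window([approx_sample(440)] * 8))
    print(classify_window([11, 30, 9, 41, 25, 8, 33, 27]))
    print(classify_window([4, 5, 3, 4, 12, 5, 4, 3]))
```

Note that the sketch scans the same eight values once per test, which is the repeated searching of the data that the description attributes to this method.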
Notice that the data is repeatedly searched for values between certain limits. This has proved to be inefficient. Furthermore, if, while collecting the WINDOW, the value of any one SAMPLE is below three (a predefined quiet threshold), the collection is stopped and a value of "NOISE" is returned. This event will misclassify a voice introduction where, because of the nature of voice, one SAMPLE out of eight is quite likely to fall into the frequency range which is normally filtered out. In that case, the "NOISE" result is ignored, and the "hello" is lost. The answer can then be detected only by another "hello" or by detecting a break in the cadence of the call. The WINDOW collecting portion does no testing for small or large deviations of SAMPLE values within a frequency range, and the frequency "bands" tested are too broad to properly qualify a WINDOW of data as a "TONE". This method does not identify continuous tones in the range of a modem and improperly classifies as "TELCO" some classes of "rings" which have SAMPLEs across several frequency ranges.
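The early termination on a quiet SAMPLE can be sketched as below to show how a single low value discards an otherwise voice-like WINDOW. This is an illustrative reading of the description, not the original implementation; the function name is hypothetical, the SAMPLE values are invented for the example, and returning None for an abandoned or incomplete collection (where the ORM would report "NOISE") is an assumption made to keep the sketch short.

```python
QUIET_THRESHOLD = 3  # predefined quiet threshold from the text
WINDOW_SIZE = 8      # SAMPLEs per WINDOW


def collect_window(sample_stream):
    """Collect eight SAMPLEs, abandoning the WINDOW on the first quiet SAMPLE.

    Returns the list of eight SAMPLEs, or None if collection was stopped
    (the case in which the ORM reports "NOISE") or the stream ran out.
    """
    window = []
    for sample in sample_stream:
        if sample < QUIET_THRESHOLD:
            return None  # one quiet SAMPLE discards the whole WINDOW
        window.append(sample)
        if len(window) == WINDOW_SIZE:
            return window
    return None


if __name__ == "__main__":
    # Invented values: a voice-like burst with one momentary dip below the threshold.
    hello = [12, 9, 30, 2, 14, 10, 8, 11]
    print(collect_window(hello))      # the dip at the fourth SAMPLE ends collection
    print(collect_window([12] * 8))   # an uninterrupted WINDOW is returned whole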
The ORM decides to analyze a WINDOW based on information gathered by two functions which monitor for a change of state. These functions are capable only of detecting a change from "tone" to "quiet" or from "quiet" to "tone" and will not detect a frequency shift such as a "ring" becoming a "hello". Finally, the cadence analysis portion of the ORM bases classification decisions on only half the information available for a tone, such as "if the tone lasts as long as a ring, then ringing", but the intervening quiet period may not have also met the "ringing" criteria. This portion will also begin to look for the next change of state without first qualifying the previous change against a possible noise "hit".
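The cadence weakness described above, deciding on the tone duration alone, can be illustrated with a small comparison. This is a sketch under stated assumptions, not the ORM code: the nominal ringing cadence of 2 s of tone followed by 4 s of quiet and the 0.5 s tolerance are assumed values chosen for the example, and both function names are hypothetical.

```python
RING_TONE_S = 2.0   # assumed nominal ringing tone duration
RING_QUIET_S = 4.0  # assumed nominal quiet period between rings
TOLERANCE_S = 0.5   # hypothetical matching tolerance


def near(value, target, tolerance=TOLERANCE_S):
    """True if value lies within tolerance of target."""
    return abs(value - target) <= tolerance


def is_ring_tone_only(tone_s, quiet_s):
    """Decision based on the tone duration alone; quiet_s is deliberately ignored."""
    return near(tone_s, RING_TONE_S)


def is_ring_tone_and_quiet(tone_s, quiet_s):
    """Decision that also requires the intervening quiet period to match."""
    return near(tone_s, RING_TONE_S) and near(quiet_s, RING_QUIET_S)


if __name__ == "__main__":
    # A 2 s burst followed by only 0.3 s of quiet is not a ringing cadence,
    # yet a tone-only test accepts it.
    print(is_ring_tone_only(2.0, 0.3))
    print(is_ring_tone_and_quiet(2.0, 0.3))
```

The first test accepts any burst of ring-like length regardless of what follows it, which is the half-information decision the description criticizes.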