Telephone systems were originally developed for use by people. A person would dial the desired number, then determine whether the called party answered by listening to the line. The telephone system provides various signals to aid the calling party in determining the progress of the call being placed. These include dial tone, audible ring (the ring indication heard by the calling party), busy signals, and special service information tones (SSITs). An SSIT is typically followed by a recorded voice announcement indicating a condition such as all circuits being busy or the dialed number no longer being in service.
Increasingly, various types of apparatus are working with people in their use of the telephone system, and in some cases automated systems are placing calls independent of any person's supervision.
The following example illustrates the need for automatic call progress analysis. It is typical for voice message systems to require that message recipients call the voice message system to receive their messages. Sometimes it is desirable for a voice message system to call the recipient, rather than wait for the recipient to call the system. In such situations it is necessary for the voice message system to perform call progress analysis, in order to reliably deliver voice messages.
Automatic call progress analysis can also be useful when a computer is placing a call for a person. Someone might select a number to be dialed from a computer-based telephone list. The computer could place the call and indicate to the person when the telephone has been answered without the person needing to listen to the intervening telephone line activity.
In order for a machine to be able to reliably place calls, it is useful for the machine to be able to track a call's progress by recognizing the various signals a calling party receives. It is important to be able to detect and identify the various progress tones (e.g., dial tone, ring, busy, SSIT) and to detect speech and accurately distinguish it from the other signals indicating call progress (i.e., the progress tones).
Dial tone, ring, and busy signals are normally double-tone signals, i.e., each is the sum of two tones. For example, a dial tone typically is the sum of a 350 Hz tone and a 440 Hz tone; audible ring typically is the sum of a 440 Hz tone and a 480 Hz tone. SSITs are single-tone signals lasting about one second, in which the tone frequency changes in discrete steps after each one-third of a second; an SSIT is normally followed by speech (in the form of a recorded announcement).
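By way of illustration only, the double-tone structure described above can be sketched in Python as the sample-by-sample sum of two sinusoids. The 8000 Hz sampling rate, the equal amplitudes, and the function name dual_tone are assumptions made for this sketch and are not taken from the text.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed sampling rate, not specified in the text


def dual_tone(freq_a, freq_b, duration_s, sample_rate=SAMPLE_RATE):
    """Return samples of the sum of two equal-amplitude sine tones.

    Each tone has amplitude 0.5 (an assumption), so the sum stays within [-1, 1].
    """
    n = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * freq_a * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * freq_b * i / sample_rate)
        for i in range(n)
    ]


# Frequencies as given in the text: dial tone 350 + 440 Hz, audible ring 440 + 480 Hz.
dial_tone = dual_tone(350, 440, 0.1)
audible_ring = dual_tone(440, 480, 0.1)
```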
Speech is one of the most important signals to recognize, yet also one of the most difficult for a machine to recognize because of the great variability in speech waveforms. It is important to distinguish speech from the other call progress signals because the presence of speech typically indicates the end of a call delivery sequence. Normally, speech indicates that the phone has been answered and the call should be "delivered" (a connection to the called party has been completed and the call progress analysis portion of the system passes control to that portion of the system that will conduct the substance of the call). The exception is speech that follows an SSIT, which also marks the end of the call delivery sequence but indicates that the call attempt should be terminated.
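The rule just stated (speech ends the call delivery sequence, and a preceding SSIT determines whether the call is delivered or terminated) might be sketched as follows. The event labels, the function name call_outcome, and the "pending" result are hypothetical; the sketch assumes some detector has already labeled each received signal, and it models only the speech rule, not the handling of busy signals or timeouts.

```python
def call_outcome(events):
    """Apply the speech rule to an ordered list of detected signal labels.

    Returns "deliver" if speech arrives without a preceding SSIT,
    "terminate" if speech follows an SSIT, and "pending" if no speech
    has been detected yet.
    """
    ssit_seen = False
    for event in events:
        if event == "ssit":
            ssit_seen = True
        elif event == "speech":
            return "terminate" if ssit_seen else "deliver"
    return "pending"


answered = call_outcome(["dial_tone", "ring", "ring", "speech"])
intercepted = call_outcome(["dial_tone", "ssit", "speech"])
still_ringing = call_outcome(["dial_tone", "ring"])
```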
One class of approaches to detection and discrimination of progress signals involves analysis of the cadence of the received signals. This starts by summarizing the received signals as an alternating series of signal-energy-present and signal-energy-absent intervals. The patterns of the durations of the signal-on intervals and the durations of the intervening signal-off intervals can often be used to distinguish among the various types of progress tones. This approach deals poorly with speech because speech can occur with highly varied timing patterns, including patterns that match those of progress tones. Although the irregularity of speech is a useful clue, this irregularity may only reveal itself after an unacceptably long period of time. Timing-based detection schemes also have difficulty dealing with nonstandard progress tones. For example, ring signals can take on a wide variety of different patterns depending on the telephone exchange generating the ring. Also, actual signals may vary in their timing substantially from the standard, due to equipment variability.
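A minimal sketch of the cadence approach described above is given here for illustration. It assumes the received signal has already been reduced to one energy-present flag per fixed-length frame; the 0.1 second frame length, the 20% tolerance, the nominal patterns used in the example (0.5 second on and 0.5 second off for busy, 2 seconds on and 4 seconds off for ring), and the function names are all assumptions, not values from the text.

```python
from itertools import groupby


def cadence_intervals(energy_flags, frame_s):
    """Collapse per-frame energy-present flags into (is_on, duration_s) runs."""
    return [
        (is_on, sum(1 for _ in run) * frame_s)
        for is_on, run in groupby(energy_flags)
    ]


def matches_cadence(intervals, on_s, off_s, tolerance=0.2):
    """Return True if every interval is within a relative tolerance of the nominal pattern."""
    if not intervals:
        return False
    for is_on, duration in intervals:
        nominal = on_s if is_on else off_s
        if abs(duration - nominal) > tolerance * nominal:
            return False
    return True


FRAME_S = 0.1  # assumed frame length in seconds

# A regular on/off pattern such as a busy signal might produce.
regular = cadence_intervals(([True] * 5 + [False] * 5) * 2, FRAME_S)

# An irregular pattern such as speech might produce.
irregular = cadence_intervals(
    [True] * 3 + [False] * 1 + [True] * 7 + [False] * 2, FRAME_S
)

regular_is_busy = matches_cadence(regular, on_s=0.5, off_s=0.5)
irregular_is_busy = matches_cadence(irregular, on_s=0.5, off_s=0.5)
```

As the text notes, a comparison of this kind says nothing useful about speech whose timing happens to fall within the tolerance of a tone pattern, and a tolerance loose enough to accept nonstandard exchanges makes such coincidences more likely.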
Another class of approaches involves the use of frequency measurement. In these systems, the detection/discrimination system either has a set of band-pass filters or is capable of determining the frequency spectrum of the received signal (e.g., by performing an FFT computation). As progress tones generally have characteristic tone structure, frequency analysis of the waveform received by the caller can extract a great deal of information about call progress. Again, speech can create discrimination problems because the frequency content of speech is highly variable. There is also variability in the frequency information due to differences between telephone systems and the state of particular telephone equipment. In addition, the needed filters can be expensive and the FFT is computationally intensive.
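For illustration, frequency measurement at a handful of known tone frequencies can be sketched with the Goertzel algorithm, a standard technique that computes the energy in a single DFT bin. It is used here in place of the band-pass filters or full FFT mentioned above and is not drawn from the text. The 8000 Hz sampling rate, the 800-sample window (which places each candidate frequency on an exact 10 Hz bin), the candidate frequency list, and the function names are assumptions.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed
WINDOW = 800        # samples; gives 10 Hz bin spacing at 8000 Hz
CANDIDATES = [350, 440, 480, 620]  # Hz; assumed list of tone frequencies to test


def goertzel_power(samples, freq, sample_rate=SAMPLE_RATE):
    """Return the squared magnitude of the DFT bin nearest to freq."""
    n = len(samples)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def strongest_pair(samples, candidates=CANDIDATES):
    """Return the two candidate frequencies carrying the most energy, in ascending order."""
    ranked = sorted(candidates, key=lambda f: goertzel_power(samples, f), reverse=True)
    return tuple(sorted(ranked[:2]))


# Synthetic dial tone (350 Hz + 440 Hz) used as test input.
signal = [
    0.5 * math.sin(2 * math.pi * 350 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
    for i in range(WINDOW)
]

detected = strongest_pair(signal)
```

A clean synthetic tone is the easy case. Speech, or tones whose frequencies are offset by equipment variation, would spread energy across bins, which is the difficulty the text attributes to frequency-based schemes.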
Both cadence-based and frequency-based approaches tend to be limited by the lack of uniformity among and within telephone systems. There are no universally accepted standards: there are differences between older systems and newer systems; there are differences between systems in different geographic areas. The effect of even those standards that do exist is qualified by the fact that the standards are generally only recommendations and compliance is not mandated.
Therefore, it is an object of the present invention to detect and discriminate among call progress signals with a high degree of accuracy despite variations that exist among and within telephone systems.
A further object is to reliably distinguish speech from other call progress signals.
Further, it is an object to provide such detection and discrimination using a relatively modest amount of computing resources.