Generally, this invention relates to the field of automatic speech recognition technology, and in particular, to an apparatus and method for transmitting speech signals over a control channel in a telecommunications system to initiate calls.
Conventional telephone systems use speech recognition technology to enable voice-activated dialing services and voice-activated directory assistance. With these systems, a directory receives a spoken name, a speech recognition process recognizes the received name, and system elements use the recognized name to find the corresponding telephone number. Once the number is located, a call is launched to the desired destination. Longstanding problems with such systems, however, have limited their performance in terms of both accuracy and computational speed. Further, to ensure the most accurate speech recognition, conventional systems and methods must transmit the entire speech signal "in-band," which requires a telecommunication data channel because of the high bandwidth of the speech signal.
In conventional telephone networks, control, or signaling, channels transmit control information for establishing terminal links (session set-ups), terminating terminal links (session tear-downs), etc. In contrast, data channels carry data, or media type, signals such as voice and video transmissions. Control channels operate at a much lower data rate than data channels because control information requires less bandwidth than media type data signals. In most cases, signaling information is transmitted over a control channel at around 8 or 16 kbps, while data information is transmitted at around 64 kbps. In addition, data channels occupy a greater portion of a communication line's capacity, and thereby limit the number of calls a particular transmission line can accommodate.
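The difference in rates can be made concrete with a short calculation. The following is only an illustrative sketch using the approximate figures quoted above (8 or 16 kbps for a control channel, 64 kbps for a data channel); the function name is hypothetical and actual rates depend on the network.

```python
# Approximate rates taken from the text above; real systems may differ.
DATA_CHANNEL_KBPS = 64
CONTROL_CHANNEL_KBPS = (8, 16)


def streams_per_data_channel(control_kbps, data_kbps=DATA_CHANNEL_KBPS):
    """Return how many control-rate streams fit in the bandwidth of one data channel."""
    return data_kbps // control_kbps


for rate in CONTROL_CHANNEL_KBPS:
    count = streams_per_data_channel(rate)
    print(f"{count} streams at {rate} kbps fit in one {DATA_CHANNEL_KBPS} kbps data channel")
```

Under these assumed figures, a signal carried at control-channel rates consumes between one eighth and one quarter of the capacity that a data channel would occupy.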
Other speech recognition systems perform the entire speech recognition process locally and dial a number based on the result. These systems use a telephone terminal that can perform the three basic stages of speech recognition: feature extraction, pattern classification, and decision logic. In the first stage, relevant characteristics of the speech signal are extracted. The later stages use the extracted features to correlate the spoken name with a previously stored name template. A database lookup is then performed to retrieve a telephone number corresponding to the recognized name.
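The three stages and the subsequent database lookup can be sketched as follows. This is a toy illustration only: the per-frame energy feature, the Euclidean distance score, the rejection threshold, and all function names and telephone numbers are assumptions made for the example, and are not the techniques used by any particular terminal.

```python
import math


def extract_features(samples, frame_size=4):
    """Stage 1 (feature extraction): reduce the signal to per-frame mean energy."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [sum(s * s for s in frame) / frame_size for frame in frames]


def classify(features, templates):
    """Stage 2 (pattern classification): distance from the features to each stored template."""
    scores = {}
    for name, template in templates.items():
        n = min(len(features), len(template))
        scores[name] = math.sqrt(
            sum((features[i] - template[i]) ** 2 for i in range(n)))
    return scores


def decide(scores, max_distance=0.5):
    """Stage 3 (decision logic): accept the closest template only if it is close enough."""
    if not scores:
        return None
    best = min(scores, key=scores.get)
    return best if scores[best] <= max_distance else None


def voice_dial(samples, templates, directory):
    """Run the three stages, then look up the number for the recognized name."""
    name = decide(classify(extract_features(samples), templates))
    return directory.get(name) if name is not None else None


# Hypothetical stored name templates and directory entries.
templates = {"alice": [1.0, 0.0], "bob": [0.0, 4.0]}
directory = {"alice": "555-0100", "bob": "555-0101"}

utterance = [0.9, -1.1, 1.0, -1.0, 0.1, 0.0, 0.0, -0.1]
print(voice_dial(utterance, templates, directory))
```

In a terminal of the kind described, all of these functions, together with the template store and the directory, would have to reside in the terminal itself.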
Systems employing this solution are currently expensive and impractical to implement. One drawback of such systems is that every telephone terminal capable of providing full speech recognition must be able to perform the entire speech recognition process locally before setting up or initiating a call. This requirement forces the terminal to contain both the hardware and software to perform all three phases of the speech recognition process.
The terminal also requires access to a database of recognizable names or speech patterns. The more names the speech processor can recognize, the greater and more practical the benefit to the individual user. In the past, this goal has been accomplished by allowing the user to train the speech processor to recognize certain speech patterns and recalling these patterns when a voice-dialing request was made. Alternatively, preprogramming the processor with a number of "templates" allows multiple users to implement voice-dialing from the same terminal. In both scenarios, however, the resulting terminal is expensive and usually has limited voice recognition capabilities.
Other solutions have been proposed to overcome the problems associated with local speech recognition. For example, U.S. Pat. No. 5,488,652 (Bielby et al.) discloses a method and apparatus for training a speech recognition algorithm for directory assistance applications. The disclosed system allows terminal users to send their voice-activated dialing requests to a remote speech recognition server. With the system disclosed in Bielby et al., the user speaks a name into a receiver at a standard terminal interface and, upon receiving the speech/voice signal, the remote server performs the entire speech recognition process and initiates the desired call. That system, however, requires a high-bandwidth data channel to transmit the speech signal received from the user.
In addition to transmitting the entire speech signal in-band over the data channel, systems such as Bielby et al. require the call to be "set up" through an analog channel bank or digital interface prior to processing the call information. Call set-up is a procedure used between the call routing switch and the telephone terminal elements. The procedure uses a protocol and switching mechanism that operate jointly to negotiate the set-up and establish the connection between parties. For example, if A places a call to B, A would send a call-request message to the switch with B as the destination number. The switch would then check the status of B and, if B is not busy, send a call-initiate message to A (at which point A hears ringing) and a call-setup message to B (at which point B's phone starts ringing). When B picks up, a call-accept message is sent from B to the switch. At this point, the switch completes the connection, switching the call, and changes its internal state to show that both A and B are busy.
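The A-to-B message exchange described above can be sketched as a small state model of the switch. This is a simplified illustration, not an actual signaling protocol: the class and method names are hypothetical, and the "busy" reply returned when B is unavailable is an assumption, since the text names only the call-request, call-initiate, call-setup, and call-accept messages.

```python
class Switch:
    """Toy model of the call set-up exchange between a switch and two terminals."""

    def __init__(self):
        self.busy = set()     # parties currently in a connected call
        self.pending = {}     # ringing callee -> caller awaiting an answer
        self.log = []         # messages sent by the switch, in order

    def call_request(self, caller, callee):
        """Handle a call-request from caller with callee as the destination."""
        if callee in self.busy or callee in self.pending:
            # Assumed reply for the busy case; not named in the text.
            self.log.append(("busy", caller))
            return False
        self.pending[callee] = caller
        self.log.append(("call-initiate", caller))  # caller hears ringing
        self.log.append(("call-setup", callee))     # callee's phone rings
        return True

    def call_accept(self, callee):
        """Handle a call-accept sent when the callee picks up."""
        caller = self.pending.pop(callee, None)
        if caller is None:
            return False
        # The switch completes the connection and marks both parties busy.
        self.busy.update({caller, callee})
        return True


switch = Switch()
switch.call_request("A", "B")
switch.call_accept("B")
print(sorted(switch.busy))
```

Note that in this model the connection exists only after the full exchange has completed, which is why a system that needs the speech signal in-band must finish call set-up before recognition can begin.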
In digital telephone systems, digital interfaces such as T1, DS30, or other proprietary mechanisms, provide the protocol and switching mechanisms necessary for call set-up between the user and the remote speech processor. Normally, call set-up is required to establish a complete connection because the remote processor needs the entire speech signal before speech recognition can occur.
In the alternative, allowing a user to transmit voice-activated dialing requests "out of band," over a lower bandwidth control channel, would eliminate the need for the call to be set up prior to the speech recognition process. As a consequence, the digital interface between the user and the speech recognition processor could also be eliminated, which in turn would result in significant cost and equipment savings.