The use of voice processing technology in both public and private telecommunication networks is widespread. The most familiar type of voice processing technology is a telephone system equipped with a voice mail system. In a voice mail system, an incoming caller is routed to a voice mailbox associated with a particular person or department. When the owner of the voice mailbox is not available to speak with the caller immediately, the caller is invited to leave or record a message on the system, much as with a telephone answering machine. Many callers would rather speak to a live person than to a computerized machine, and some callers avoid leaving a message. At least some of these persons find speaking to a voice messaging system an unpleasant experience, in part because the voice messaging system may not give responsive feedback during the recording session. Such responsive feedback is generally referred to as audible backchannel responses, such as “mm-hummm”, “O.K.”, “yeah”, “uh-huh”, or “yes”. These backchannel responses are generally what a human listener says while listening to another person speak.
The purpose of backchannel responses is to make the speaker feel more natural and comfortable while speaking. These audible backchannel responses are generally utterances made during a conversation that signify to the speaker that the listener has understood what the speaker said. In contrast, when one person records a spoken message on an automated recording device for delivery to another, no backchannel responses are provided to that person. Without backchannel responses, the speaker generally becomes uncomfortable and communicates less efficiently. Thus, a spoken message recorded on an automated recording device, such as a voice mail system, may be longer and sometimes difficult to understand.
Research has shown that people leaving a message on the telephone tend to repeat themselves and use more words to convey the same information when they do not hear backchannel responses. This additional message length tends to fill the storage medium, such as a hard disk drive, of a voice messaging system. Telecommunication managers must then spend additional labor resources to clean the system storage, purchase additional storage capacity, or force voice mailbox owners to delete messages. This can increase the operating cost of voice messaging systems in terms of additional labor hours and out-of-pocket capital equipment expenditures. Therefore, if messages can be shortened, both storage space and money can be saved.
Conventional voice processing systems do not provide automated backchannel responses keyed to the caller while the caller is speaking and, in particular, while the caller is recording or dictating a message. Voice messaging systems record a message only by allowing the caller to speak first. Currently available voice messaging systems play pre-recorded messages or voice prompts to the caller at the end of the caller's message, that is, after recording. After the caller finishes recording the message, the voice mail system, voice processing system, or automated attendant tells the caller how to navigate the system. Further, interactive voice response (“IVR”) systems do not provide automated backchannel responses. Conventional IVR systems generally perform an action upon receiving an audible voice command or telephone keypad input. The audible voice command takes the place of keyboard input. Some IVR systems provide audible information, such as stock quotes or banking account information. IVR systems provide conversational responses by waiting for the end of a voice command and then either performing an action or playing pre-recorded information. Again, the response comes only after the voice command is complete, that is, as post processing. Some voice mail systems or IVR systems alert the user, for example by beeping, to a time limit for the message. This alerting or beeping is not a backchannel response based on the pattern of speech and silence in the voice of the user.
There has been some research in the area of backchannel responses. For example, Ward and Tsukahara, in “Prosodic Features which Cue Back-Channel Responses in English and Japanese”, Journal of Pragmatics, Volume 32, Issue 8, 2000, disclose research that focuses on changes in sound or pitch in the speaker's voice to determine when to produce a backchannel response. This research focuses on prosodic cues to trigger a backchannel response. Software must be provided to determine the syntactic cues in a person's speech. There is no disclosure of a voice processing system that uses the pattern of speech and non-speech to determine when to produce a backchannel response for a user.
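By way of illustration only, cueing from the pattern of speech and non-speech, as distinguished above from prosodic cueing, might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the cited references: the function name `backchannel_times`, the representation of the audio as per-frame speech/non-speech flags, and all threshold values are hypothetical. The sketch cues a response once a pause of a minimum length follows a run of speech of a minimum length, and for simplicity treats any pause as ending the current run of speech.

```python
from typing import List, Sequence


def backchannel_times(
    frames: Sequence[bool],
    frame_ms: int = 20,          # assumed frame duration
    min_speech_ms: int = 1000,   # assumed minimum speech run before a cue
    min_pause_ms: int = 300,     # assumed minimum pause that triggers a cue
) -> List[int]:
    """Return the times (in ms) at which a backchannel response would be cued.

    `frames` holds one flag per audio frame: True for speech, False for
    non-speech. No pitch or other prosodic information is used.
    """
    cues: List[int] = []
    speech_ms = 0   # length of the speech run preceding the current pause
    pause_ms = 0    # length of the current pause
    cued = False    # True once the current pause has produced a cue

    for i, is_speech in enumerate(frames):
        if is_speech:
            if pause_ms > 0:
                # Speech resumes after a pause: start a new speech run.
                speech_ms = 0
                pause_ms = 0
                cued = False
            speech_ms += frame_ms
        else:
            pause_ms += frame_ms
            if (not cued
                    and speech_ms >= min_speech_ms
                    and pause_ms >= min_pause_ms):
                # Cue time is the end of the frame that completes the pause.
                cues.append((i + 1) * frame_ms)
                cued = True
    return cues


# 1.2 s of speech, 0.4 s pause, 0.2 s of speech, 0.4 s pause.
pattern = [True] * 60 + [False] * 20 + [True] * 10 + [False] * 20
print(backchannel_times(pattern))  # [1500]
```

In this example only the first pause produces a cue, because the second pause follows a run of speech shorter than the assumed minimum.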
Voice transcription devices are known in the art. These include hand-held devices and computer-based systems, as disclosed in U.S. Pat. No. 5,197,052 to Schroder et al. and U.S. Pat. No. 6,122,614 to Kahn et al. Some transcription devices convert speech to text using speech-recognition software. Conventional voice transcription devices lack the ability to facilitate the dictation process by providing automated backchannel responses based on the speech pattern of a user.
As both consumers and businesses are flooded with electronic messages in various media types, the ability to process these messages efficiently becomes more valuable. Thus, what is needed is a system and method of providing audible backchannel responses in voice processing systems without the aforementioned drawbacks of conventional voice processing technology. In particular, what is needed is a voice messaging system that treats the problem at the source by influencing the caller or speaker to leave a shorter, more efficient voice message. Also, what is needed is a voice recording/messaging system that simulates a human listener.