The present invention relates to relay systems for providing voice-to-text captioning for hearing impaired users and more specifically to a relay system that uses automated voice-to-text captioning software to transcribe voice-to-text.
Many people have at least some degree of hearing loss. For instance, in the United states, about 3 out of every 1000 people are functionally deaf and about 17 percent (36 million) of American adults report some degree of hearing loss which typically gets worse as people age. Many people with hearing loss have developed ways to cope with the ways their loss effects their ability to communicate. For instance, many deaf people have learned to use their sight to compensate for hearing loss by either communicating via sign language or by reading another person's lips as they speak.
When it comes to remotely communicating using a telephone, unfortunately, there is no way for a hearing impaired person (e.g., an assisted user (AU)) to use sight to compensate for hearing loss as conventional telephones do not enable an assisted user to see a person on the other end of the line (e.g., no lip reading or sign viewing). For persons with only partial hearing impairment, some simply turn up the volume on their telephones to try to compensate for their loss and can make do in most cases. For others with more severe hearing loss conventional telephones cannot compensate for their loss and telephone communication is a poor option.
An industry has evolved for providing communication services to assisted users whereby voice communications from a person linked to an assisted user's communication device are transcribed into text and displayed on an electronic display screen for the assisted user to read during a communication session. In many cases the assisted user's device will also broadcast the linked person's voice substantially simultaneously as the text is displayed so that an assisted user that has some ability to hear can use their hearing sense to discern most phrases and can refer to the text when some part of a communication is not understandable from what was heard.
U.S. Pat. No. 6,603,835 (hereinafter “the '835 patent) titled system for text assisted telephony teaches several different types of relay systems for providing text captioning services to assisted users. One captioning service type is referred to as a single line system where a relay is linked between an AU's device and a telephone used by the person communicating with the AU. Hereinafter, unless indicated otherwise the other person communicating with the assisted user will be referred to as a hearing user (HU) even though the AU may in fact be communicating with another assisted user. In single line systems, one line links the HU to the relay and one line (e.g., the single line) links to the relay to the AU device. Voice from the HU is presented to a relay call assistant (CA) who transcribes the voice-to-text and then the text is transmitted to the AU device to be displayed. The HU's voice is also, in at least some cases, carried or passed through the relay to the AU device to be broadcast to the AU.
The other captioning service type described in the '835 patent is a two line system. In a two line system a hearing user's telephone is directly linked to an assisted user's device for voice communications between the AU and the HU. When captioning is required, the AU can select a captioning control button on the AU device to link to the relay and provide the HU's voice to the relay on a first line. Again, a relay CA listens to the HU voice message and transcribes the voice message into text which is transmitted back to the AU device on a second line to be displayed to the AU. One of the primary advantages of the two line system over one line systems is that the AU can add captioning to an on-going call. This is important as many AUs are only partially impaired and may only want captioning when absolutely necessary. The option to not have captioning is also important in cases where an AU device can be used as a normal telephone and where non-assisted users (e.g., a spouse that has good hearing capability) that do not need captioning may also use the AU device.
With any relay system, the primary factors for determining the value of the system are accuracy, speed and cost to provide the service. Regarding accuracy, text should accurately represent voice messages from hearing users so that an AU reading the text has an accurate understanding of the meaning of the message. Erroneous words provide inaccurate messages and also can cause confusion for an AU reading the messages.
Regarding speed, ideally text is presented to an AU simultaneously with the voice message corresponding to the text so that an AU sees text associated with a message as the message is heard. In this regard, text that trails a voice message by several seconds can cause confusion. Current systems present captioned text relatively quickly (e.g. 1-3 seconds after the voice message is broadcast) most of the time. However, at times a CA can fall behind when captioning so that longer delays (e.g., 10-15 seconds) occur.
Regarding cost, existing systems require a unique and highly trained CA for each communication session. In known cases CAs need to be able to speak clearly and need to be able to type quickly and accurately. CA jobs are also relatively high pressure jobs and therefore turnover is relatively high when compared jobs in many other industries which further increases the costs associated with operating a relay.
One innovation that has increased captioning speed appreciably and that has reduced the costs associated with captioning at least somewhat has been the use of voice-to-text transcription software by relay CAs. In this regard, early relay systems required CAs to type all of the text presented via an AU device. To present text as quickly as possible after broadcast of an associated voice message, highly skilled typists were required. During normal conversations people routinely speak at a rate between 110 to 150 words per minute. During a conversation between an AU and an HU, typically only about half the words voiced have to be transcribed (e.g., the AU typically communicates to the HU during half of a session). This means that to keep up with transcribing the HU's portion of a typical conversation a CA has to be able to type at around 55 to 75 words per minute. To this end, most professional typists type at around 50 to 80 words per minute and therefore can keep up with a normal conversation for at least some time. Professional typists are relatively expensive. In addition, despite being able to keep up with a conversation most of the time, at other times (e.g., during long conversations or during particularly high speed conversations) even professional typists fall behind transcribing real time text and more substantial delays occur.
In relay systems that use voice-to-text transcription software trained to a CA's voice, a CA listens to an HU's voice and revoices the HU's voice message to a computer running the trained software. The software, being trained to the CA's voice, transcribes the revoiced message much more quickly than a typist can type text and with only minimal errors. In many respects revoicing techniques for generating text are easier and much faster to learn than high speed typing and therefore training costs and the general costs associated with CA's are reduced appreciably. In addition, because revoicing is much faster than typing in most cases, voice-to-text transcription can be expedited appreciably using revoicing techniques.
At least some prior systems have contemplated further reducing costs associated with relay services by replacing CA's with computers running voice-to-text software to automatically convert HU voice messages to text. In the past there have been several problems with this solution which have resulted in no one implementing a workable system. First, most voice messages (e.g., an HU's voice message) delivered over most telephone lines to a relay are not suitable for direct voice-to-text transcription software. In this regard, automated transcription software on the market has been tuned to work well with a voice signal that includes a much larger spectrum of frequencies than the range used in typical phone communications. The frequency range of voice signals on phone lines is typically between 300 and 3000 Hz. Thus, automated transcription software does not work well with voice signals delivered over a telephone line and large numbers of errors occur. Accuracy further suffers where noise exists on a telephone line which is a common occurrence.
Second, most automated transcription software has to be trained to the voice of a speaker to be accurate. When a new HU calls an AU's device, there is no way for a relay to have previously trained software to the HU voice and therefore the software cannot accurately generate text using the HU voice messages.
Third, many automated transcription software packages use context in order to generate text from a voice message. To this end, the words around each word in a voice message can be used by software as context for determining which word has been uttered. To use words around a first word to identify the first word, the words around the first word have to be obtained. For this reason, many automated transcription systems wait to present transcribed text until after subsequent words in a voice message have been transcribed so that context can be used to correct prior words before presentation. Systems that hold off on presenting text to correct using subsequent context cause delay in text presentation which is inconsistent with the relay system need for real time or close to real time text delivery.