Traditional telephony presents a problem for persons who are deaf, hard of hearing, or speech-impaired (D-HOH-SI). Communication by telephone requires each party to a call to be able to hear and/or speak to the other party. For hearing- or speech-impaired persons, audio communication is difficult or impossible, which makes telephone communication correspondingly difficult or impossible.
Early approaches to facilitating telecommunications for D-HOH-SI persons included text-based telecommunications relay service (TRS). Text-based TRS allows a D-HOH-SI person to communicate with other people over an existing telecommunications network using devices capable of transmitting and receiving text characters over that network. Such devices include the telecommunications device for the deaf (TDD) and the teletypewriter (TTY). Text-based TRS was well-suited to the bandwidth limitations of subscriber lines of the time. Those same bandwidth limitations also hindered the widespread use of video telephony.
The availability of affordable, high-speed packet-switched communications has led to growth in the use of video relay services (VRS) by D-HOH-SI persons. Using VRS equipment, D-HOH-SI persons can place video calls to communicate among themselves and with hearing individuals using sign language. VRS equipment enables D-HOH-SI persons to talk to hearing individuals via a sign language interpreter, who simultaneously uses a conventional telephone to communicate with the party or parties with whom the D-HOH-SI person wants to communicate. The interpretation flow is normally within the same principal language, such as American Sign Language (ASL) to spoken English or spoken Spanish.
While VRS is a useful service for people who rely on sign language to communicate, captioned telephone service can be used by people who can use their own voice to speak but need assistance to hear what is being said to them on the other end of a telephone call. Captioned telephone service is a telecommunication service that enables people who are hard of hearing, oral deaf, or late-deafened to speak directly to another party on a telephone call. Typically, a telephone displays captions, substantially in real time, of what the hearing party says during a conversation. The captions are displayed on a screen embedded in the telephone base. Captioned telephone services can be provided in traditional telephone environments as well as in voice-over-internet-protocol (VOIP) environments.
Initially, captioned telephone service was available only to people in states that had captioned telephone service as part of their state relay program. The FCC then made internet protocol captioned telephone service (IP-CTS) a part of the federally mandated services under the TRS fund. IP-CTS requires an internet connection to deliver the captions to the user. Most users also rely on their regular landline telephone for the audio portion of the call, but some configurations of IP-CTS allow the use of VOIP to carry the call audio. IP-CTS has allowed captioned telephone service to be provided on smartphones and tablets.
IP-CTS is a relatively new industry that is growing rapidly. IP-CTS services are paid for by the FCC's TRS fund and delivered by private companies, such as ClearCaptions, LLC, assignee of the present application. IP-CTS is particularly useful to anyone who can use their own voice to speak but who needs assistance to hear or understand what is being said on the other end of the call.
To reduce the costs associated with operating a call center with human captioners for IP-CTS, automated speech recognition (ASR) software is an alternative that can be used to deliver a caption stream in real time to a user on the user's telephone, computer, tablet, or table-top phone. Because ASR software does not have a human element, the costs of operating a call center with human captioners can be reduced or eliminated. However, an issue with using ASR exclusively (i.e., without a human captioner) for IP-CTS is that ASR is only as good as its ability to understand the speech of a particular individual and to accurately generate captions for each and every individual using the service. Because not everyone talks the same way, ASR may not caption everyone with an acceptable level of accuracy (i.e., ASR would work with sufficient accuracy for some people's voices, but not for others). What is needed is a way to provide accurate captioned telephone service using automated speech recognition assisted by human captioning.