1. Field of the Invention
The present invention relates to various computer-implemented methods, computer systems, and computer program products for providing text of voice conversations to users using a form of Telecommunications Relay Service known as Captioned Telephone Service.
2. Background and Relevant Art
According to a 2006 National Health Interview Survey conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention, at least 37 million adults in the U.S. have some form of diminished hearing. Some persons are born with diminished hearing, some lose hearing due to injury, and many experience hearing loss as they age. Persons with diminished hearing often experience isolation and difficulty communicating with others.
Participating in telephone calls is especially challenging for persons with diminished hearing. This is because the Public Switched Telephone Network (PSTN), due to technical limitations and legacy design constraints, provides an audio experience that uses only a fraction of the frequency spectrum that most humans are able to hear. For example, FIG. 1 illustrates that the frequency range conveyed by a standard landline (i.e., the portion labeled “low fidelity”), the part of the PSTN with which people are most familiar, cannot convey much of the frequency spectrum associated with the pronunciations of one or more of j, m, f, s, th, v, d, g, n, ng, and e. Cellular voice networks, which are also considered part of the PSTN, have similar limitations. These networks include Second-Generation (2G) and Third-Generation (3G) wireless technologies such as Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), and Enhanced Data rates for GSM Evolution (EDGE). While people with normal hearing are generally able to overcome the limitations of the PSTN by filling in the gaps based partly on the context of the communication, for people with diminished hearing there are often too many gaps to fill.
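The band limitation described above can be illustrated with a short sketch. It assumes the commonly cited narrowband telephony passband of roughly 300 Hz to 3,400 Hz and a nominal human hearing range of 20 Hz to 20 kHz. Neither figure is stated in this document, and the example sound band is a hypothetical placeholder rather than a value read from FIG. 1. The linear-frequency overlap it computes is a crude measure, intended only to show the idea.

```python
# Illustrative sketch only. The passband and hearing-range figures are
# commonly cited nominal values, not values taken from this document.
PSTN_PASSBAND_HZ = (300.0, 3400.0)    # assumed narrowband voice channel
HEARING_RANGE_HZ = (20.0, 20000.0)    # assumed nominal human hearing range


def overlap_fraction(band, passband):
    """Return the fraction of `band` (low, high in Hz) lying inside `passband`."""
    low, high = band
    if high <= low:
        raise ValueError("band must have high > low")
    overlap = min(high, passband[1]) - max(low, passband[0])
    return max(0.0, overlap) / (high - low)


# Share of the nominal hearing range that the assumed passband carries.
hearing_share = overlap_fraction(HEARING_RANGE_HZ, PSTN_PASSBAND_HZ)

# Hypothetical band for a high-frequency consonant (placeholder values).
example_sound_band_hz = (3000.0, 8000.0)
sound_share = overlap_fraction(example_sound_band_hz, PSTN_PASSBAND_HZ)

print(f"hearing range carried: {hearing_share:.1%}")
print(f"example sound carried: {sound_share:.1%}")
```

Under these assumptions, only a small part of either band falls inside the passband, which is the kind of gap a listener must fill in from context.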
Under present technology, hearing- and speech-impaired individuals communicate over a telephone call with people without such impairments by means of a relay service (RS). A conventional RS is provided by having a human operator participate in a 3-way conversation between the hearing- or speech-impaired individual and the remote party, with the operator providing assistance as needed to allow the two parties to communicate. One conventional form of relay service is called Captioned Telephone Service (CTS). A CTS provides a textual transcription (i.e., captions) of the remote party's portion of a live telephone conversation, enabling persons with hearing loss to both listen to and read the words spoken by the remote party in the conversation. FIG. 2 illustrates a conventional CTS telephone, which includes a display for presenting text captions of the words spoken by the remote party.
FIG. 3A illustrates an overview of some conventional CTS implementations. In FIG. 3A, a CTS phone 301 (e.g., the CTS phone of FIG. 2) of a CTS party 301a is in a bi-directional voice communication with a remote party device 303 (e.g., landline phone, cellular device) of a remote party 303a. A CTS provider 302 is also involved in the communication, at least to the extent that the remote party device 303 transmits the voice of the remote party 303a to the CTS provider 302 over the PSTN. The CTS provider 302 converts the speech of the remote party 303a to text captions to be transmitted to the CTS phone 301. At the CTS provider 302, a human operator listens to the speech originating from the remote party device 303 and generates text captions from it. Generating text captions may include the human operator creating a transcription by typing the remote party's speech or through use of stenography, or re-voicing the words into a microphone. When re-voicing is used, the human operator's speech, as captured by the microphone, is fed to a speech recognition software application. The speech recognition software application converts the operator's speech to text, and the text is transmitted to the CTS phone 301 for viewing by the CTS party 301a.
FIGS. 3B through 3D illustrate some more specific conventional CTS implementations. FIG. 3B illustrates a conventional “1-line” CTS implementation (i.e., the CTS phone 301 requires one phone line connected to the PSTN). In FIG. 3B, the CTS phone 301 of the CTS party 301a is connected to the CTS provider 302 through a first PSTN connection 304a, and the remote party device 303 of the remote party 303a is also connected to the CTS provider 302 through a second PSTN connection 304b. In this configuration, a user initiates a call by first calling a “1-800” number of the CTS provider 302 and then providing the phone number of the person to be called. The voice conversation between the CTS party 301a and the remote party 303a is relayed through the CTS provider 302 over the PSTN connections 304a/304b. The CTS provider 302 generates text captions from the voice of the remote party 303a as described above, and transmits the text captions to the CTS phone 301 through the first PSTN connection 304a.
FIG. 3C illustrates a conventional “2-line” CTS implementation (i.e., the CTS phone 301 requires two phone lines connected to the PSTN). In FIG. 3C, the CTS phone 301 of the CTS party 301a is connected to the CTS provider 302 through a first PSTN connection 304a (i.e., over a first phone line connected to the CTS phone 301), and is also connected to the remote party device 303 of the remote party 303a through a second PSTN connection 304b (i.e., over a second phone line connected to the CTS phone 301). Thus, the voice conversation between the CTS party 301a and the remote party 303a is carried over the second PSTN connection 304b. The CTS phone 301 also communicates the speech of the remote party 303a to the CTS provider 302 over the first PSTN connection 304a. The CTS provider 302 generates text captions from the voice of the remote party 303a as described above, and transmits the text captions to the CTS phone 301 through the first PSTN connection 304a.
FIG. 3D illustrates a conventional Internet Protocol (“IP”) CTS implementation. The implementation of FIG. 3D is the same as that of FIG. 3C, except that instead of connecting to the CTS provider 302 through a connection to the PSTN, the CTS phone 301 connects to the CTS provider 302 through an Internet connection 305. The CTS phone 301 still connects to remote party devices 303 over the PSTN 304.
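The differences among the implementations of FIGS. 3B through 3D can be summarized in a small data model. The following is a minimal sketch for illustration only; the class, field, and function names are hypothetical and do not come from any CTS system. The single PSTN line listed for the IP implementation is an inference from the statement that FIG. 3D matches FIG. 3C except for the provider link.

```python
from dataclasses import dataclass
from enum import Enum


class Link(Enum):
    """Kind of connection between the CTS phone and the CTS provider."""
    PSTN = "PSTN"
    INTERNET = "Internet"


@dataclass(frozen=True)
class CtsTopology:
    """Hypothetical summary of one conventional CTS implementation."""
    name: str
    pstn_lines_at_phone: int      # PSTN phone lines required at the CTS phone
    provider_relays_voice: bool   # True if the voice conversation passes through the provider
    phone_to_provider: Link       # link that carries captions to the CTS phone


TOPOLOGIES = (
    CtsTopology("1-line (FIG. 3B)", 1, True, Link.PSTN),
    CtsTopology("2-line (FIG. 3C)", 2, False, Link.PSTN),
    # Assumption: one PSTN line remains for the call to the remote party.
    CtsTopology("IP (FIG. 3D)", 1, False, Link.INTERNET),
)


def describe(topology):
    """Return a one-line description of where voice and captions travel."""
    voice = ("relayed through the CTS provider"
             if topology.provider_relays_voice
             else "carried directly between the parties over the PSTN")
    return (f"{topology.name}: voice {voice}; captions delivered over "
            f"{topology.phone_to_provider.value}")


for topology in TOPOLOGIES:
    print(describe(topology))
```

In all three cases the captions themselves are still produced by a human operator at the CTS provider; the model captures only how the connections are arranged.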
Due to the technical limitations of the PSTN, all existing CTS implementations, including each of the foregoing example implementations, require extensive involvement of a human operator at a CTS provider. Such human involvement is undesirable for a variety of reasons. For example, use of a human operator in each CTS call makes relay services very expensive. In addition, involvement of a human operator presents privacy concerns, introduces delay in the conversation (i.e., a delay as an operator is re-voicing, or a delay as the operator creates a transcription), can lead to inaccuracies in the text captions, and may require human operators with specialized skills (e.g., the ability to speak and listen at the same time, fast and accurate typing skills, stenography skills), among other things.