To allow people whose speech or hearing disabilities prevent them from using conventional telephones to communicate over the public switched telephone network, text telephones (TTY devices), also known as telecommunications devices for the deaf (TDD devices), have been developed. In general, such devices encode characters of text as sequences of audible tones. In particular, in response to receiving a command to transmit a character, a TTY device generates a sequence of audible tones that is transmitted through the telephone network to a similar TTY device at the receiving end. The TTY device at the receiving end decodes the sequence of audible tones and displays or otherwise outputs the encoded character.
In the United States, TTY devices communicate with one another using a 45.45 baud frequency-shift keying (FSK) protocol commonly referred to as Baudot signaling. Baudot signaling transmits each character as a sequence of seven audible tones, each at either 1400 Hz or 1800 Hz. In particular, a Baudot character comprises a start bit of 1800 Hz, five tones of either 1400 Hz or 1800 Hz to signal the series of five bits specifying the character, and a stop bit of 1400 Hz. There is no error correction. At 45.45 baud, the duration of each individual tone is approximately 22 ms (1/45.45 of a second). By coincidence, the duration of the individual tones used in Baudot signaling is very close to the time segment of a voice communication that is included in a packet of data transmitted in connection with a typical voice over IP (VoIP) communication system.
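The framing described above can be illustrated with a minimal sketch. The function names are hypothetical, and the mapping of data bits to tones (a 1 bit sent at 1400 Hz, a 0 bit sent at 1800 Hz) is an assumption made for illustration; the text above specifies only the start-bit and stop-bit frequencies.

```python
BAUD = 45.45     # signaling rate stated in the text
MARK_HZ = 1400   # stop-bit tone; ASSUMED to also represent a 1 data bit
SPACE_HZ = 1800  # start-bit tone; ASSUMED to also represent a 0 data bit


def tone_duration_ms(baud=BAUD):
    """Duration of one tone in milliseconds (1 / baud rate)."""
    return 1000.0 / baud


def frame_character(data_bits):
    """Return the seven tone frequencies for one character.

    data_bits is a sequence of exactly five 0/1 values in transmission order.
    The frame is: start bit, five data tones, stop bit.
    """
    if len(data_bits) != 5 or any(b not in (0, 1) for b in data_bits):
        raise ValueError("expected exactly five bits, each 0 or 1")
    data_tones = [MARK_HZ if b else SPACE_HZ for b in data_bits]
    return [SPACE_HZ] + data_tones + [MARK_HZ]


if __name__ == "__main__":
    print(frame_character([1, 0, 0, 0, 0]))
    print(round(tone_duration_ms(), 2), "ms per tone")
```

Under these assumptions, each tone lasts about 22 ms, so a seven-tone character occupies roughly 154 ms of audio.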
VoIP systems are increasingly popular as a way to efficiently allow parties to engage in voice communications. In particular, VoIP systems allow parties to communicate by voice over computer networks, such as the Internet, rather than requiring the establishment of a dedicated, point-to-point communication link, as in traditional circuit-switched telephony. Accordingly, VoIP can be a very economical way to conduct voice communications.
Unfortunately, existing VoIP systems are problematic when used in conjunction with TTY devices. The problem is caused by packet loss. Specifically, VoIP systems transmit digital audio streams, such as voice, by breaking the streams into individual packets (typically 20 ms in length, although packet sizes of other lengths are not precluded). Each of these packets is assigned header information, such as the digital audio encoding scheme that was used, a sequence number, and a destination. It is important to note that the route to the destination is not part of the header information.
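The packetization just described might be sketched as follows. This is an illustrative model only: the `Packet` type, the `packetize` function, the 8000 Hz sampling rate, the `"PCMU"` codec label, and the example destination address are assumptions, not details taken from any particular VoIP system. Note that the header carries a destination but no route.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    codec: str         # digital audio encoding scheme that was used
    seq: int           # sequence number
    destination: str   # where the packet is going (not how it gets there)
    payload: list      # audio samples covering one time segment


def packetize(samples, destination, codec="PCMU", sample_rate=8000, packet_ms=20):
    """Break an audio sample stream into packets of packet_ms milliseconds each."""
    per_packet = sample_rate * packet_ms // 1000  # 160 samples at 8 kHz / 20 ms
    return [
        Packet(codec, seq, destination, samples[start:start + per_packet])
        for seq, start in enumerate(range(0, len(samples), per_packet))
    ]


if __name__ == "__main__":
    stream = list(range(400))  # 50 ms of placeholder samples at 8 kHz
    for p in packetize(stream, "192.0.2.10:5004"):
        print(p.seq, p.codec, p.destination, len(p.payload))
```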
The ability of each packet to take what is, at that instant, the “best” route to the destination is where VoIP derives its economic advantage. It is also the reason why TTY-on-VoIP is unreliable: because packets are free to take different pathways, they cannot be relied upon to arrive at the receiving device before it is their “turn” to be played. Although these packets often arrive eventually, they are regarded as lost because they did not arrive in time, and must therefore be discarded.
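The notion of a packet missing its “turn” can be made concrete with a simplified sketch. The function name, the fixed 60 ms playout delay, and the arrival times are hypothetical; real receivers typically adapt their buffering, which is not modeled here.

```python
def classify_arrivals(arrivals_ms, playout_delay_ms=60, packet_ms=20):
    """Split packets into those played and those treated as lost.

    arrivals_ms maps a sequence number to the packet's arrival time in
    milliseconds. Packet n is due to be played at
    playout_delay_ms + n * packet_ms; a packet arriving after that
    deadline is discarded even though it did eventually arrive.
    """
    played, lost = [], []
    for seq in sorted(arrivals_ms):
        deadline = playout_delay_ms + seq * packet_ms
        if arrivals_ms[seq] <= deadline:
            played.append(seq)
        else:
            lost.append(seq)
    return played, lost


if __name__ == "__main__":
    # Packet 2 took a slower path and arrives after packet 3.
    arrivals = {0: 30, 1: 55, 2: 140, 3: 95}
    print(classify_arrivals(arrivals))
```

In this example packet 2 is due at 100 ms but arrives at 140 ms, so it is counted as lost even though it was delivered.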
Under most circumstances, the loss of occasional packets is not detectable in voice communication. The reason is that VoIP telephones employ packet loss concealment algorithms that trick the human ear by mimicking the acoustic properties of adjacent packets. Although these techniques work well with voice, they do not work with TTY characters. If a packet containing a TTY tone is lost, the VoIP packet loss concealment techniques of the present art are unable to recover it or rebuild it.
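A deliberately simplified sketch shows why concealment that imitates neighboring audio cannot restore a lost tone. It assumes, for illustration only, that each packet carries exactly one tone and that concealment simply repeats the previous packet; actual concealment algorithms operate on waveforms, and tones do not align exactly with packet boundaries.

```python
def conceal(tones, lost_indexes):
    """Replace each lost tone with a copy of the tone played just before it.

    tones is a list of tone frequencies in Hz, one per packet (a simplifying
    assumption). A lost first packet has no predecessor and becomes None.
    """
    out = []
    for i, tone in enumerate(tones):
        if i in lost_indexes:
            out.append(out[-1] if out else None)
        else:
            out.append(tone)
    return out


if __name__ == "__main__":
    # Start bit (1800), five data tones, stop bit (1400).
    sent = [1800, 1400, 1800, 1800, 1800, 1800, 1400]
    received = conceal(sent, {1})
    print(received)
    print("character intact:", received == sent)
```

Here the lost 1400 Hz tone is replaced by a copy of the preceding 1800 Hz tone, which flips one data bit. Because Baudot signaling has no error correction, the receiver would decode a different character without any indication that an error occurred.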
Systems for improving the reliability of TTY transmissions have been developed in other domains, for example in connection with digital wireless telephony applications. (In wireless telephony, the problem being addressed was not due to packet loss, but was instead caused by the use of voice-optimized audio encoding techniques that cannot encode TTY tones without distortion.) All of these approaches rely on a modem-type mechanism, in which the TTY's Baudot tones are not transmitted as an audio stream, but are instead translated into a non-audio data stream. Despite their inherent reliability, these approaches are not entirely satisfactory because they tend to preclude mixed-mode voice and TTY dialog. This is a significant problem because nearly half of all TTY users are hard of hearing, but still speak clearly. These individuals prefer to receive with their TTYs and then speak in response, something they are unable to do on systems that do not permit TTY and voice transmissions to be intermixed.