TTYs (also known as telecommunication devices for the deaf, or TDDs) are the text terminals that people with hearing impairments use in order to communicate over telephone lines. In the United States, the most commonly used TTY communication standard is described by ANSI/TIA/EIA standard 825. It describes a 45.45 Baud frequency-shift-keyed (FSK) modem for use on the public switched telephone network. Important aspects of this standard include:
1. TTYs are silent when not transmitting. Unlike fax machines and computer modems, TTYs have no “handshake” procedure at the start of a call, nor do they have a carrier tone during the call. (Although absence of a carrier tone tends to limit the speed of transmission, it has the advantage of permitting TTY tones, Dual Tone Multi-Frequency signals—also known as DTMF or touch tones—and voice transmissions to be intermixed on the same call.)
2. Operation is “half duplex.” In other words, TTY users must take turns transmitting, and typically cannot interrupt each other. If both people try to type at the same time, their TTYs will show no text at all, or will show text that is gibberish. There is no automatic mechanism that lets TTY users know when a character they have typed correctly has been received incorrectly.
3. Each TTY character consists of a sequence of seven individual tones. The first is always a “start tone” at 1800 Hz. This is followed by a series of five tones, at either 1400 or 1800 Hz, which specify the character. The final tone in the sequence is always a “stop tone” at 1400 Hz. The “stop tone” is a border that separates this character from the next. Each of the first six tones is 22 milliseconds in duration. The final “stop tone” may also be 22 milliseconds, but is permitted to be as long as 44 milliseconds. This means that the duration of each TTY character is at least 154 milliseconds, which works out to approximately six and a half characters per second. (The description of this as a 45.45 Baud protocol is based on the number of 22-millisecond tones that can be transmitted in one second, not the number of characters.)
4. The protocol is moded. That is, the same five-bit (five-tone) sequence will code for a letter and for a number or punctuation mark, as shown in Table 1. Illustratively, when a TTY is in “letters” mode, the sequence 00001 corresponds to the letter E. By contrast, when a TTY is in “figures” mode, the sequence 00001 corresponds to the digit 3. It should be noted that the mode shifts are likewise specified by five-bit sequences, “11011” and “11111,” as shown in Table 1.
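The character framing described in item 3 can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of the standard: the function name `char_tones` is hypothetical, and the mapping of a "1" bit to 1400 Hz and a "0" bit to 1800 Hz, as well as sending the bits left to right as written, are assumptions made only for the example.

```python
START_HZ = 1800  # start tone, per the text
STOP_HZ = 1400   # stop tone, per the text
TONE_MS = 22     # duration of each of the first six tones


def char_tones(bits, stop_ms=22):
    """Return (frequency_hz, duration_ms) pairs for one TTY character."""
    if len(bits) != 5 or set(bits) - {"0", "1"}:
        raise ValueError("expected five characters, each '0' or '1'")
    if not 22 <= stop_ms <= 44:
        raise ValueError("stop tone must last between 22 and 44 ms")
    # Assumption: '1' -> 1400 Hz, '0' -> 1800 Hz, sent in the order written.
    data = [(1400 if b == "1" else 1800, TONE_MS) for b in bits]
    return [(START_HZ, TONE_MS)] + data + [(STOP_HZ, stop_ms)]


tones = char_tones("00001")
duration_ms = sum(ms for _, ms in tones)
print(duration_ms)                    # 154
print(round(1000 / duration_ms, 2))   # 6.49 characters per second
print(round(1000 / TONE_MS, 2))       # 45.45 tones per second
```

The last two figures reproduce the distinction drawn in item 3: about six and a half characters per second, but 45.45 tones per second.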
TABLE 1

Binary Sequence    Letters          Figures
00000              N/A              N/A
00001              E                3
00010              LF               LF
00011              A                -
00100              Space            Space
00101              S                BELL
00110              I                8
00111              U                7
01000              CR               CR
01001              D                $
01010              R                4
01011              J                ′
01100              N                ,
01101              F                !
01110              C                :
01111              K                (
10000              T                5
10001              Z                ″
10010              L                )
10011              W                2
10100              H                #
10101              Y                6
10110              P                0
10111              Q                1
11000              O                9
11001              B                ?
11010              G                &
11011              Figures Shift    Figures Shift
11100              M                .
11101              X                /
11110              V                ;
11111              Letters Shift    Letters Shift
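To make the moded behavior concrete, the following sketch decodes a list of five-bit sequences using the assignments in Table 1. The function name `decode`, the choice of “letters” as the starting mode, and the use of ASCII quote characters and the `\a` control character for the prime marks and BELL are assumptions made for illustration.

```python
FIGS_SHIFT = 0b11011
LTRS_SHIFT = 0b11111

# Indexed by the numeric value of the five-bit sequence in Table 1.
LETTERS = [None, "E", "\n", "A", " ", "S", "I", "U",
           "\r", "D", "R", "J", "N", "F", "C", "K",
           "T", "Z", "L", "W", "H", "Y", "P", "Q",
           "O", "B", "G", None, "M", "X", "V", None]
FIGURES = [None, "3", "\n", "-", " ", "\a", "8", "7",
           "\r", "$", "4", "'", ",", "!", ":", "(",
           "5", '"', ")", "2", "#", "6", "0", "1",
           "9", "?", "&", None, ".", "/", ";", None]


def decode(codes, mode="letters"):
    """Decode five-bit strings into text, tracking the current mode."""
    out = []
    for code in codes:
        value = int(code, 2)
        if value == FIGS_SHIFT:
            mode = "figures"
        elif value == LTRS_SHIFT:
            mode = "letters"
        else:
            ch = (LETTERS if mode == "letters" else FIGURES)[value]
            if ch is not None:  # 00000 is unassigned in Table 1
                out.append(ch)
    return "".join(out)


sent = ["10100", "00110", "00100", "11011", "01010", "10011"]
print(decode(sent))                                  # HI 42
print(decode([c for c in sent if c != "11011"]))     # HI RW
```

The second call drops the figures shift, showing how a single missing shift sequence turns the digits that follow into letters.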
Many of the techniques that are commonly employed by telephone systems to digitize voice signals are able to digitize TTY tones with perfect accuracy. Unfortunately, some techniques that are optimized for low-bit-rate encoding of voice signals tend to distort TTY tones. An example of the former is the ITU standard G.711 encoding (also known as 64 kilobit-per-second μ-law Pulse Code Modulation) that is commonly employed in digital telephones. An example of the latter is the Global System for Mobile Communications (GSM) encoding used on many wireless telephones.
A problem of a different sort is presented when trying to use a TTY in conjunction with packet-switched systems, such as Voice over Internet Protocol (VoIP) telephony networks. These systems transmit audio streams by digitally encoding the audio and then breaking the streams into individual packets. A typical packet contains 20 milliseconds of audio, although packets of other lengths may also be employed. Each of these audio packets is tagged with header information, such as an identifier of the audio encoding scheme that was used, a sequence number, and the destination's IP address. The complete packet is then delivered by the originating device to the network, which transports the packet via shared pathways that often contain packets from many different sources, with many different destinations.
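The packetization step can be sketched as follows. The `AudioPacket` class, its field names, and the `packetize` function are hypothetical and stand in for whatever header format a real system uses; the example assumes G.711 audio at 8000 samples per second with one byte per sample, and uses a documentation-range IP address.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 8000  # G.711 sampling rate, one byte per sample
PACKET_MS = 20         # typical packet length from the text


@dataclass
class AudioPacket:
    codec: str      # identifier of the audio encoding scheme
    seq: int        # sequence number
    dest_ip: str    # destination IP address
    payload: bytes  # encoded audio for this packet


def packetize(audio, dest_ip, codec="G.711", packet_ms=PACKET_MS):
    """Split encoded audio into fixed-length packets with simple headers."""
    size = SAMPLE_RATE_HZ * packet_ms // 1000  # bytes per packet
    return [
        AudioPacket(codec, seq, dest_ip, audio[start:start + size])
        for seq, start in enumerate(range(0, len(audio), size))
    ]


# 154 ms of audio (one minimum-length TTY character) is 1232 samples.
one_character = bytes(SAMPLE_RATE_HZ * 154 // 1000)
packets = packetize(one_character, "192.0.2.10")
print(len(packets))              # 8
print(len(packets[0].payload))   # 160
```

Under these assumptions, a single minimum-length TTY character is spread across eight packets, so the loss of any one of them damages that character.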
Although the destination is specified in the packet header information, the route to the destination is not specified. The ability of each packet to take what is, at that instant, the “best” route to its destination is where VoIP derives much of its economic advantage. It is also the reason why TTY-on-VoIP can be unreliable: because packets are free to take different pathways, they cannot be relied upon to arrive at the receiving device before it is their “turn” to be played. Although these packets often arrive eventually, they are regarded as lost because they did not arrive in time, and must therefore be discarded.
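The notion of a packet missing its “turn” can be illustrated with a simplified receiver model. The function `classify_arrivals` and the fixed playout schedule are assumptions made for this sketch; real receivers typically use more elaborate buffering, and packets that never arrive at all are not modeled here.

```python
def classify_arrivals(arrivals_ms, first_playout_ms, packet_ms=20):
    """Split packets into those that arrive in time and those that do not.

    arrivals_ms maps sequence number -> arrival time in milliseconds.
    Packet n is scheduled to play at first_playout_ms + n * packet_ms.
    """
    on_time, late = [], []
    for seq in sorted(arrivals_ms):
        deadline = first_playout_ms + seq * packet_ms
        if arrivals_ms[seq] <= deadline:
            on_time.append(seq)
        else:
            late.append(seq)  # arrived, but too late to be played
    return on_time, late


# Packet 2 took a slower route and arrives after packet 3.
arrivals = {0: 5, 1: 30, 2: 130, 3: 75}
print(classify_arrivals(arrivals, first_playout_ms=60))
# ([0, 1, 3], [2])
```

Packet 2 does reach the receiver, but only after its scheduled playout time of 100 ms, so it is treated as lost.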
Under most circumstances, the loss of occasional packets is not detectable in voice communication. Although 20-millisecond periods of silence would certainly be noticeable in a voice stream (sounding a bit like static), VoIP telephones employ packet loss concealment algorithms that trick the human ear, typically by mimicking the contents of adjacent packets that have been received. Although these techniques work well with voice, they do not work with TTY tones. If a packet containing a TTY tone is lost, the current generation of VoIP techniques is unable to recover it or rebuild it.
With regard to the percentage of packets that one might expect to lose, it is generally the case that packet loss of 0.2% or less is achievable when the two VoIP endpoints are on the same campus, using communication pathways that are not congested. By contrast, for VoIP calls that originate or terminate “off campus”—in other words, for calls in which there is a wider range of packet routing possibilities—or for VoIP calls that are transported on congested networks, packet loss of 2.0% or higher is typical.
With regard to the impact of packet loss on TTY performance, consider the following illustrative example: assume that the VoIP packet size is 20 milliseconds (a typical value) and that the packet loss rate is 0.5% (a rate generally regarded as excellent for VoIP communication). Keep in mind that an individual TTY text character is at least 154 milliseconds in length, and therefore spans eight packets (154 divided by 20 is 7.7, rounded up). This means that, if there is a 0.5% likelihood that any one of those packets is missing, approximately four percent of all TTY characters will lose at least one of their packets (1 − 0.995^8 ≈ 3.9%, assuming packets are lost independently). If any one of the eight packets within a character is lost, that character will not be displayed properly on the receiving device. This is true of the mode shift “characters” as well: if a packet is lost, the signaled mode shift will not be recognized.
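The arithmetic behind this estimate can be checked with a short calculation. The function name `char_error_rate` is hypothetical, and the model assumes that packets are lost independently of one another and that the loss of any single packet corrupts the character.

```python
import math


def char_error_rate(packet_loss, packet_ms=20, char_ms=154):
    """Probability that a character loses at least one of its packets."""
    packets_per_char = math.ceil(char_ms / packet_ms)  # 8 for the defaults
    return 1 - (1 - packet_loss) ** packets_per_char


print(round(100 * char_error_rate(0.005), 2))  # 3.93 (percent)
```

With a 0.5% packet loss rate, the model gives a character error rate of just under four percent.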
Even though the simple statistical model above would seem to predict a four percent TTY error rate under the described conditions (20-millisecond packet size, 0.5% packet loss rate), the actual error rate would tend to be much higher. This is because, if the lost packet is the one that contained the “stop tone” for that character, subsequent characters, even if transmitted without packet loss, might nevertheless be decoded improperly.
As a point of comparison, a TTY character error rate of more than one percent is generally regarded as unacceptable, chiefly because the transmission of information such as bank balances and credit card numbers becomes unreliable. Using a simple statistical model that is based on a 20-millisecond packet size, and ignoring the additional deleterious effects that result from dropping a “stop tone,” the one percent character error rate threshold is exceeded when VoIP packet loss rates exceed approximately 0.12%.
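The threshold can be obtained by inverting the same simple model. The function name `max_packet_loss` is hypothetical; the calculation again assumes eight packets per character and independent packet loss, and ignores the additional effect of a dropped “stop tone.”

```python
def max_packet_loss(target_char_error, packets_per_char=8):
    """Largest packet loss rate that keeps character errors at the target."""
    return 1 - (1 - target_char_error) ** (1 / packets_per_char)


threshold = max_packet_loss(0.01)
print(f"{100 * threshold:.3f}%")
```

Under these assumptions the result falls between 0.12% and 0.13%, in line with the approximate figure quoted above.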
Federal laws, such as Section 255 of the Telecommunications Act of 1996 and Section 508 of the Workforce Investment Act of 1998, require telecommunication systems to retain compatibility with standard TTY devices. Given the problems associated with TTY-on-VoIP transmissions, many manufacturers of VoIP systems are exploring methods by which TTY tones may be translated into a standard non-audio text protocol, such as the ITU standard T.140, for reliable transmission within IP networks. Specifically, under the proposals that have been submitted recently to standards bodies such as the Telecommunications Industry Association, incoming TTY tones that are received by the system via an input audio channel (e.g., via an analog trunk on the PSTN) would be converted to their text equivalents and then transmitted within the IP network via data channels that employ an error-correcting protocol such as TCP/IP. Although this text stream could be reconverted to audio tones at the receiving end, thereby permitting a standard TTY to be used, most of the proposals envision piping the text stream directly to non-TTY endpoints, such as desktop computers that are equipped with T.140-compatible “Instant Messaging” software.
FIG. 1 illustrates the architecture of this prior art. User 102 is communicating via a standard TTY device 104. The tones generated by TTY device 104 are transmitted via connection 106, which may be an analog line or a TTY-compatible digital connection that does not distort the tones. Connection 106 terminates at gateway 108, which decodes the tones and translates them into Internet-compatible text equivalents, using a standard protocol such as T.140. The text is transmitted within the IP network 110 to an IP endpoint 112; illustratively, endpoint 112 may be a desktop computer that is able to decode T.140-encoded text and present it on a display. Text transmissions, which originated with TTY user 102, may then be read by non-TTY user 114.