To allow people with speech and/or hearing disabilities that prevent them from using conventional telephones to communicate over the public switched telephone network, text telephones or teletypewriters (TTY devices), also known as telecommunication devices for the deaf (TDD devices), have been developed. In general, such devices encode characters of text using sequences of audible tones. In particular, in response to receiving a command to transmit a character, a TTY device will generate a sequence of audible tones that is transmitted through the telephone network to a similar TTY device at the receiving end. The TTY device at the receiving end decodes the sequence of audible tones, and displays or otherwise outputs the encoded character.
In the United States, TTY devices communicate with one another using a 45.45 baud frequency shift key protocol defined in ANSI/TIA/EIA 825, “A 45.45 Baud FSK Modem,” commonly referred to as Baudot signaling. Baudot signaling transmits characters using a sequence of seven audible tones at either 1400 Hz or 1800 Hz. As shown in FIG. 1, a Baudot or TTY character 100 comprises a start tone 104 of 1800 Hz, five tones 108-124 of either 1400 or 1800 Hz to signal the series of five bits specifying the character, and a stop tone 128 of 1400 Hz. The stop tone 128 is a border separating this TTY character 100 from the next. Between each adjacent pair of tones, a tone border exists, such as the tone borders 132a-h. To provide numbers, letters, and punctuation marks, each TTY endpoint operates in two modes, namely a number/figure mode and a letter mode. There is no error correction. At 45.45 baud, the duration of each individual tone is 22 milliseconds, though the stop tone 128 is permitted to be as long as 44 milliseconds. The duration of each TTY character 100 is at least 154 milliseconds, which works out to approximately six and a half characters per second. By coincidence, the duration of individual tones used in Baudot signaling is very close to the time segment of a voice communication that is included in a packet of data transmitted in connection with a typical voice over IP (VoIP) communication system.
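The framing and timing described above can be illustrated with a minimal Python sketch. The function names are hypothetical. The mapping of a 1 bit to 1400 Hz and a 0 bit to 1800 Hz, and the least-significant-bit-first ordering, are assumptions made for illustration; the text above states only the start tone, the stop tone, and the nominal 22 millisecond tone duration.

```python
# Illustrative sketch only; bit-to-frequency mapping and bit order are assumptions.
MARK_HZ = 1400    # stop tone (from the text); also assumed here to encode a 1 bit
SPACE_HZ = 1800   # start tone (from the text); also assumed here to encode a 0 bit
TONE_MS = 22      # nominal duration of one tone at 45.45 baud


def baudot_tones(code, lsb_first=True):
    """Return the seven (frequency_hz, duration_ms) tones for a 5-bit code."""
    if not 0 <= code < 32:
        raise ValueError("code must fit in five bits")
    bits = [(code >> i) & 1 for i in range(5)]
    if not lsb_first:
        bits.reverse()
    tones = [(SPACE_HZ, TONE_MS)]                                    # start tone
    tones += [(MARK_HZ if b else SPACE_HZ, TONE_MS) for b in bits]   # five data tones
    tones.append((MARK_HZ, TONE_MS))                                 # stop tone (minimum length)
    return tones


def character_duration_ms(tones):
    """Total duration of one character, in milliseconds."""
    return sum(duration for _, duration in tones)


if __name__ == "__main__":
    tones = baudot_tones(0b00011)
    print(tones)
    print(character_duration_ms(tones))               # 7 tones x 22 ms = 154 ms
    print(1000 / character_duration_ms(tones))        # characters per second
```

With the minimum-length stop tone, every character lasts 154 milliseconds regardless of its bit pattern, which is where the figure of roughly six and a half characters per second comes from.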
Voice over IP or IP telephony is rapidly gaining in popularity due to the widespread availability of the Internet. In IP telephony, voice communications are “packetized” or divided into a number of packets at the source communication device and sent over a packet-switched network, such as the Internet, to the destination communication device. This mechanism permits efficient bandwidth utilization, allowing voice and nonvoice data to be mixed on the same infrastructure. The voice communication is converted into a digital representation for inclusion in packets using either waveform codecs or vocoders. The resulting numerical representation is divided up into frames, a number of which are included within a given packet payload. The payload of each packet typically represents 20 milliseconds of audio. Each host packet further includes a header (containing the audio encoding scheme, a packet sequence number, the source and destination addresses and other information), a trailer, and other “overhead” bytes.
The ability of each packet to take what is, at that instant, the “best” route to the destination is the reason why TTY-on-VoIP is unreliable: because packets are free to take different pathways, they cannot be relied upon to arrive at the receiving device before it is their “turn” to be played. Although these packets often arrive eventually, they are regarded as lost because they did not arrive in time, and must therefore be discarded.
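The late-arrival behavior described above can be sketched as follows. This is a simplified illustration, not a description of any particular VoIP receiver: the function name, the fixed 60 millisecond playout delay, and the sample arrival times are all hypothetical.

```python
# Illustrative sketch only; names, the fixed playout delay, and the data are assumptions.
def classify_packets(arrivals_ms, packet_ms=20, playout_delay_ms=60):
    """Split packets into those played and those discarded as lost.

    arrivals_ms maps a packet sequence number to its arrival time in
    milliseconds, or to None if the packet never arrives. Packet n is
    due to be played at playout_delay_ms + n * packet_ms.
    """
    played, discarded = [], []
    for seq in sorted(arrivals_ms):
        deadline = playout_delay_ms + seq * packet_ms
        arrival = arrivals_ms[seq]
        if arrival is not None and arrival <= deadline:
            played.append(seq)
        else:
            # Never arrived, or arrived after its turn: treated as lost either way.
            discarded.append(seq)
    return played, discarded


if __name__ == "__main__":
    arrivals = {0: 35, 1: 58, 2: 130, 3: 95, 4: None}
    played, discarded = classify_packets(arrivals)
    print(played)     # packets that met their playout deadline
    print(discarded)  # packet 2 arrived late, packet 4 never arrived
```

In this example packet 2 does reach the receiver, but 30 milliseconds after its playout deadline, so it is discarded exactly as if it had never been delivered.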
Under most circumstances, the loss of occasional packets is not detectable in voice communication. The reason is that VoIP telephones employ packet loss concealment algorithms that trick the human ear by mimicking the acoustic properties of adjacent packets. Although these techniques work well with voice, they do not work with TTY characters. If a packet containing a TTY tone is lost, the VoIP packet loss concealment techniques of the present art are unable to recover it or rebuild it.
Systems for improving the reliability of TTY transmissions have been developed in other domains, for example in connection with digital wireless telephony applications. In wireless telephony, the problem being addressed was not due to packet loss, but was instead caused by the use of voice-optimized audio encoding techniques that cannot encode TTY tones without distortion. All of these approaches rely on a modem-type mechanism, in which the TTY's Baudot tones are not transmitted as an audio stream, but are instead translated into a non-audio data stream. Despite their inherent reliability, these approaches are not entirely satisfactory because they tend to preclude mixed-mode voice and TTY dialog. This is a significant problem because nearly half of all TTY users are hard of hearing, but still speak clearly. These individuals prefer to receive with their TTYs and then speak in response, something they are unable to do on systems that do not permit TTY and voice transmissions to be intermixed.
The impact of packet loss on the quality of TTY communications can be illustrated by a simple example. Assume the VoIP packet size is 20 milliseconds (a typical value) and the packet loss rate is 0.5% (a rate generally regarded as excellent for VoIP communication). An individual TTY text character 100 is at least 154 milliseconds in length and therefore spans eight packets. When there is a 0.5% likelihood that any one of these packets is missing, approximately 4% of all TTY characters will lose one of their packets. Worse yet, the 4% error rate is deceptively low in that if the lost packet is the one that contains the stop tone 128 for that character, subsequent characters, even if transmitted without packet loss, might nonetheless be decoded improperly. A TTY character error rate of more than 1% is generally regarded as unacceptable, primarily because the transmission of information such as bank balances and credit card numbers becomes unreliable. Using a simple statistical model that is based on a 20 millisecond packet size and ignoring the additional deleterious effects that result from dropping a stop tone 128, the 1% character error rate threshold is exceeded when VoIP packet loss rates exceed approximately 0.12%, a packet loss rate generally regarded as unachievable in standard VoIP systems.
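The arithmetic in this example can be reproduced with a short Python sketch. It assumes, as one plausible reading of the "simple statistical model" mentioned above, that packets are lost independently and that a character is corrupted whenever at least one of the eight packets it spans is lost. The function names are hypothetical.

```python
# Illustrative sketch only; the independence assumption and names are not from the source.
import math


def char_error_rate(packet_loss, char_ms=154, packet_ms=20):
    """Probability that a character loses at least one of its packets."""
    packets_per_char = math.ceil(char_ms / packet_ms)   # 154 ms / 20 ms -> 8 packets
    return 1 - (1 - packet_loss) ** packets_per_char


def max_packet_loss(target_char_error, char_ms=154, packet_ms=20):
    """Largest packet loss rate that keeps character errors at or below the target."""
    packets_per_char = math.ceil(char_ms / packet_ms)
    return 1 - (1 - target_char_error) ** (1 / packets_per_char)


if __name__ == "__main__":
    # 0.5% packet loss gives a character error rate of roughly 0.039 (about 4%).
    print(char_error_rate(0.005))
    # A 1% character error target allows roughly 0.0013 packet loss (about 0.12% to 0.13%).
    print(max_packet_loss(0.01))
```

Under these assumptions the sketch agrees with the figures quoted above: about 4% of characters are affected at a 0.5% packet loss rate, and the 1% character error threshold corresponds to a packet loss rate slightly above 0.12%.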