The invention relates generally to data network telephony. More particularly, the invention relates to providing a minimum acceptable quality of service for a voice conversation conducted over a data network.
The public switched telephone network (PSTN) is a circuit switched network that has been optimized for real time or synchronous voice communication with a guaranteed quality of service (QoS). When a telephone call is initiated, a circuit is established between the calling party and the called party, and the PSTN guarantees QoS by dedicating a full duplex circuit between the parties of a telephone conversation. Regardless of whether the parties are speaking or silent, they occupy the entire dedicated circuit until the call ends. Since the occupied bandwidth remains constant, the cost of a telephone call on the PSTN is based on distance and time.
On the other hand, typical data networks are packet switched networks that have been used for applications such as e-mail and file transfers where a variable QoS is tolerable. Typical packet switched networks do not dedicate a path between a sender and a receiver and therefore it is harder to guarantee a particular QoS. As data networking technology has improved, the ability to conduct real time conversations over data networks has been developed. By conducting conversations over data networks, access to a PSTN may not be needed and PSTN charges may be avoided. For example, many corporations have extensive enterprise data networks that have untapped capability to carry voice conversations in addition to the data that is being exchanged throughout the network. By channeling voice traffic onto a data network, a corporation may be able to significantly reduce PSTN expenses.
In data networks that transfer data packets according to the popular Internet Protocol (IP), conducting real time voice conversations over a data network is commonly referred to as IP telephony. The term IP telephony is used in the present specification to refer generally to all real time voice conversations conducted through a data network. Besides private data networks such as enterprise networks, IP telephony can be carried out over the global Internet, which also allows users to avoid PSTN expenses beyond the expenses related to Internet access.
Although IP telephony has many advantages, it also has some disadvantages. The main disadvantage of IP telephony is the unpredictable QoS that is provided. The unpredictability is predominantly a result of bandwidth limitations and latency. Bandwidth limitations and latency are often tied together, since when there is insufficient bandwidth in a network to transfer a voice conversation at a desired rate, some packets may be delayed in the transmission to the destination or some packets may be dropped altogether from the transfer because they have taken too much time to be transferred. When packets generated from a voice conversation are delayed or dropped, the quality of the voice conversation carried over the network declines.
One conventional technique used to minimize bandwidth limitations and latency problems in the transmission of voice conversations over a data network is data compression. Data compression allows the voice data from a conversation to be reduced into fewer packets or smaller packets for transfer through a network. One problem with compression/decompression algorithms is that the more compressed the data is, the harder it is to decompress the data into an exact replica of the original voice data. As a result, there is a tradeoff between the compression ratio applied to a voice conversation and the quality of the decompressed product.
In order to match the optimal compression ratio to the current bandwidth capacity of a network that is used for voice conversations, systems have been designed that intelligently negotiate and terminate a current compression/decompression algorithm in favor of a more appropriate algorithm depending on current network traffic conditions. An example of a dynamically changing compression/decompression algorithm system is disclosed in U.S. Pat. No. 5,546,395, entitled “Dynamic Selection of Compression Rate for a Voice Compression Algorithm in a Voice Over Data Modem,” issued to Sharma et al. (hereafter Sharma). Although Sharma may work well for its intended purpose, the transition between compression/decompression algorithms typically costs valuable setup time that degrades voice conversation quality.
In view of the current bandwidth limitations and latency involved with IP telephony and the disadvantages of compression/decompression negotiation, what is needed is a method and apparatus for conducting voice conversations over a data network, with sufficient quality and reliability.
A method and apparatus for transmitting delay-sensitive data over a packet-based network involves converting the delay-sensitive data into two versions for transmission through the network and then using one of the two versions to regenerate the original delay-sensitive data and using the other version to supplement the regeneration of the delay-sensitive data when necessary to compensate for transmission errors or delay that occur during the transmission of the version that was initially used for regeneration. In a preferred embodiment, the delay-sensitive data represents a real time voice conversation and the two versions of the delay-sensitive data include the same segments of the conversation that have been compressed into packets using two different compression algorithms. The first version of the delay-sensitive data is more highly compressed than the second version and because of compression/decompression inefficiencies, the first version provides a lower quality reproduction of the original voice conversation than the second version. Although the first version is of a lower quality, it consumes less bandwidth and has lower latency when transmitted over the network, relative to the more voluminous high quality second version. The low quality version is then used to fill in voice data gaps that are caused when the high quality version does not arrive at its destination on time.
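The conversion of one conversation segment into two versions can be illustrated with a minimal Python sketch. The sketch is not the compression scheme of the invention: plain 8-bit samples stand in for the less compressed, higher quality version, and 4-bit quantization stands in for the more highly compressed, lower quality version. The names `Packet`, `make_versions`, `encode_low`, `decode_low`, `encode_high`, and `decode_high` are hypothetical and appear only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Packet:
    seq: int        # index of the conversation segment
    version: str    # "low" = highly compressed, "high" = less compressed
    payload: bytes


def encode_high(samples: List[int]) -> bytes:
    """Assumed stand-in codec: keep all 8 bits of each sample (0..255)."""
    return bytes(samples)


def decode_high(payload: bytes) -> List[int]:
    return list(payload)


def encode_low(samples: List[int]) -> bytes:
    """Assumed stand-in codec: keep the top 4 bits, two samples per byte."""
    padded = list(samples) + [0] * (len(samples) % 2)
    out = bytearray()
    for i in range(0, len(padded), 2):
        out.append((padded[i] & 0xF0) | (padded[i + 1] >> 4))
    return bytes(out)


def decode_low(payload: bytes, count: int) -> List[int]:
    """Rebuild approximate samples; the lost low bits are set to mid-range."""
    samples: List[int] = []
    for byte in payload:
        samples.append((byte & 0xF0) | 0x08)
        samples.append(((byte & 0x0F) << 4) | 0x08)
    return samples[:count]


def make_versions(seq: int, samples: List[int]) -> Tuple[Packet, Packet]:
    """Return (highly compressed packet, less compressed packet) for a segment."""
    low = Packet(seq, "low", encode_low(samples))
    high = Packet(seq, "high", encode_high(samples))
    return low, high
```

In this stand-in, the highly compressed payload is roughly half the size of the less compressed payload, and its reconstruction is only approximate. This mirrors the tradeoff described above between bandwidth consumed and reproduction quality.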
To optimize the quality of the voice conversation, packets from the highly compressed version of the data are sent before packets from the less compressed version, where both sets of packets represent the same segment of the voice conversation. The highly compressed packets are buffered at the receiving end in case they are needed to supplement the less compressed version. Packets from the less compressed version are utilized whenever possible to regenerate the conversation at the receiving end of the transmission. However, if packets from the less compressed version are overly delayed or are dropped, then the corresponding packets from the highly compressed version of the data are used to regenerate the segments of the conversation that would otherwise have been lost or distorted.
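The transmission order and the fallback at the receiving end can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed apparatus: segments are identified by a sequence number, payloads are opaque bytes, and a caller decides when a segment's playout deadline has been reached. The names `send_order`, `Receiver`, `on_packet`, and `regenerate` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (segment number, "low" or "high", payload); "low" is the highly compressed version
Packet = Tuple[int, str, bytes]


def send_order(segments: Iterable[Tuple[int, bytes, bytes]]) -> Iterator[Packet]:
    """Yield each segment's highly compressed packet before its less compressed one."""
    for seq, low_payload, high_payload in segments:
        yield (seq, "low", low_payload)
        yield (seq, "high", high_payload)


class Receiver:
    """Buffers both versions and picks one per segment at playout time."""

    def __init__(self) -> None:
        self._buffers: Dict[str, Dict[int, bytes]] = {"low": {}, "high": {}}

    def on_packet(self, packet: Packet) -> None:
        seq, version, payload = packet
        self._buffers[version][seq] = payload

    def regenerate(self, seq: int) -> Tuple[Optional[str], Optional[bytes]]:
        """Call at the segment's playout deadline (deadline policy is assumed).

        Prefers the less compressed packet; falls back to the buffered
        highly compressed packet if the former is late or was dropped.
        """
        high = self._buffers["high"].pop(seq, None)
        low = self._buffers["low"].pop(seq, None)
        if high is not None:
            return ("high", high)
        if low is not None:
            return ("low", low)
        return (None, None)  # neither version arrived in time
```

A caller would feed every arriving packet to `on_packet` and call `regenerate` once per segment when that segment is due for playback. How the deadline is chosen is left open here, since it depends on the network and on the acceptable end-to-end delay.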
The preferred method of the invention is applicable to any packet-based network, where the term packet includes data segments referred to as cells, frames, etc. Networks applicable to the invention include Internet Protocol-based networks, Ethernet networks, token ring networks, frame relay networks, and asynchronous transfer mode (ATM) networks.
An IP telephony device designed to enable data transmission in accordance with the invention includes a microphone, a speaker, a converter, memory, a processor, a compression unit, a decompression unit, and a transceiver. The speaker and microphone are conventional devices that are used to send and receive audio information in the frequency range of normal conversation. The converter is a conventional device that converts analog information to digital information and converts digital information to analog information. The converter interfaces with the speaker and the microphone to convert analog voice data from the microphone into digital voice data for the IP telephony device and to convert digital voice data from the IP telephony device to analog voice data for the speaker. The memory is conventional memory that is used to buffer incoming and/or outgoing packets. The processor performs data management functions, which include controlling the flow of data between the functional units of the IP telephony device. The compression unit compresses the digital voice data into packets before the packets are sent to a receiving device, and the decompression unit decompresses compressed packets that are received from a sending device. The transceiver includes any conventional device, such as a network interface card or a modem, that sends and receives packets of data to and from the data network. Although the components of the IP telephony device are described separately, the functions of the components can be incorporated into a single device or into groups of devices other than as explained.