1. Field of the Invention
The present invention relates to data transmission systems and more particularly to a method and apparatus for facilitating correction of data loss in such a system. The invention is suitable for use in any telecommunications network or transmission path that includes an end-to-end or node-to-node connection for communication of multiple data streams between a pair of devices.
By way of example, and without limitation, the invention will be described in the context of transmitting packet based real time voice, video, both voice and video, or other media signals over a packet switched computer network, for use in internet-based telephony (e.g., voice over IP (VoIP)). These are generally referred to herein as multimedia signals. However, the invention may also be suitably employed to transmit other types of signals and over other networks (such as local area (LAN), metropolitan area (MAN) or wide area (WAN) networks, and circuit switched networks, for example) or direct end-to-end connections, as well as with other transmission protocols.
2. Description of the Related Art
Packet switched networks now provide interactive communications services such as telephony and multi-media conferencing. In the context of packet switched networks operating according to the Internet Protocol (IP), this technology is presently known as internet telephony, IP telephony or, where voice is involved, Voice over IP (VoIP).
VoIP presents an attractive technology for use in long distance telephone calls, as compared to the public switched telephone network (PSTN), which has been the traditional transmission medium. The advantage of VoIP calls over PSTN calls is cost. In the United States, for instance, long distance service providers for the PSTN provide domestic services at rates ranging from roughly 10 to 30 cents per minute, and international services at substantially higher rates, depending on the time of day, day of the week, and the distances involved. In contrast, the cost of a VoIP call anywhere in the world is potentially the cost of a local telephone call to a local internet telephony service provider at one end and the cost of a local call from an internet telephony service provider at the far end to the destination telephone. Once the call is routed from the local VoIP provider onto the IP network, the cost to transmit the data from the local internet telephony provider to the far end internet telephony provider can be free for all practical purposes, regardless of where the two parties are located. Similarly, the cost to facilitate a direct dial internet telephony call can theoretically be free, except for possible access fees charged by local exchange carriers. VoIP service providers can thus potentially charge users far less for VoIP calls than the users would pay for comparable calls placed strictly over the PSTN.
In a packet switched network, a message to be sent is divided into blocks, or data packets, of fixed or variable length. The packets are then sent individually over the network through multiple locations and then reassembled at a final location before being delivered to a user at a receiving end. To ensure proper transmission and re-assembly of the blocks of data at the receiving end, various control data, such as sequence and verification information, is typically appended to each packet in the form of a packet header. At the receiving end, the packets are then reassembled and transmitted to an end user in a format compatible with the user's equipment.
To facilitate packet-based communication over interconnected networks that may include computers of various architectures and operating systems, the networks and computers typically operate according to an agreed set of packet switching protocols. A variety of such protocols are available, and these protocols range in degree of efficiency and reliability. Those skilled in the art are familiar, for instance, with the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols, which is used to manage transmission of packets throughout the Internet and other packet switched networks.
Each protocol in the TCP/IP suite is designed to establish communication between common layers on two machines, or hosts, in the network. The lowest layer in the Internet is the "physical" layer, which is concerned with ensuring that actual bits and bytes of information pass along physical links between nodes of the network. The next layer is the link layer, which ensures a reliable connection between nodes in the network. The next layer is the "network" or "IP" layer, which is concerned with permitting hosts to inject packets of data into the network to be routed independently to a specified destination. The next layer in turn is the "transport" layer, which is concerned with allowing peer entities on source and destination hosts to carry on a conversation. Generally speaking, the IP and transport layers of the Internet are not concerned with the physical arrangement of the network, such as whether source and destination machines are on the same sub-network or whether there are other sub-networks between them.
The transport layer of TCP/IP can utilize two end-to-end protocols, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). TCP is a reliable connection-oriented protocol, which includes intelligence necessary to confirm successful transmission between the sending and receiving ends in the network. UDP, in contrast, is an unreliable connectionless protocol, which facilitates sending and receiving of packets but does not include any intelligence to establish that a packet successfully reached its destination. In general, UDP is used by applications that do not want TCP's sequencing or flow control and wish to provide their own.
According to UDP, the transport layer takes a data stream to be transmitted and breaks it up into independent connectionless segments or "datagrams." UDP adds to each of these packages an 8 byte header, which includes overhead information such as a source port number, a destination port number, a length and a checksum, designed to allow the receiving end to properly reassemble the datagrams into the original message. The transport layer then "passes" each of these packages to the IP layer.
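By way of illustration only, the 8 byte UDP header described above may be sketched as four 16 bit fields packed in network byte order. The function names and the port number 5004 below are hypothetical and chosen purely for the example; the checksum is left at zero, which in UDP over IPv4 indicates that no checksum was computed.

```python
import struct

UDP_HEADER_FORMAT = "!HHHH"  # source port, destination port, length, checksum


def build_udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack the four 16 bit UDP header fields in network byte order."""
    length = struct.calcsize(UDP_HEADER_FORMAT) + len(payload)  # header plus payload
    return struct.pack(UDP_HEADER_FORMAT, src_port, dst_port, length, checksum)


def parse_udp_header(datagram: bytes) -> dict:
    """Unpack the first 8 bytes of a datagram into named header fields."""
    src_port, dst_port, length, checksum = struct.unpack(UDP_HEADER_FORMAT, datagram[:8])
    return {"src_port": src_port, "dst_port": dst_port, "length": length, "checksum": checksum}


# Hypothetical usage: a 5 byte payload sent between two illustrative ports.
payload = b"voice"
datagram = build_udp_header(5004, 5004, payload) + payload
fields = parse_udp_header(datagram)
```

The length field counts the header together with the payload, so a receiver can separate one datagram from the next without any additional framing.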
The IP layer in turn adds another header to each package, providing additional overhead information, such as a source IP address and a destination IP address. The IP layer then transmits the resulting packages through the Internet, possibly fragmenting each package into pieces as it goes. As the pieces of the package finally reach the destination machine, they are reassembled by the IP layer and passed to the transport layer.
For real time data or media signals (such as voice or video) to be transmitted over packet switched networks, the packets to be transmitted may be encapsulated by one or more additional header layers according to established higher level protocols. An example of one such higher level protocol is the Real-time Transport Protocol or RTP. RTP may provide each packet with at least a 12 byte header containing timestamps and sequence numbers. Included in this header may be a 7 bit payload type, which may define the type of payload in the underlying data packet. In practice, when the transmitting and receiving network ends establish communication of such signals, they will negotiate a mutually acceptable meaning for these RTP payload types. By way of example, the RTP payload type may indicate the type of voice or video codec (e.g., G.729, G.723.1, etc.) used to compress the underlying media signal, thereby facilitating proper decoding at the receiving end.
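By way of illustration only, the fixed 12 byte RTP header may be sketched as follows. The field layout (2 bit version, marker bit, 7 bit payload type, 16 bit sequence number, 32 bit timestamp and 32 bit synchronization source identifier) follows the published RTP specification; the function names and the example values are hypothetical, and padding, header extensions and contributing source entries are assumed to be absent.

```python
import struct

RTP_VERSION = 2
RTP_HEADER_FORMAT = "!BBHII"  # flags, marker + payload type, sequence, timestamp, SSRC


def build_rtp_header(payload_type: int, sequence: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Pack a minimal 12 byte RTP header (no padding, extension or CSRC list)."""
    if not 0 <= payload_type < 128:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | payload_type
    return struct.pack(RTP_HEADER_FORMAT, byte0, byte1,
                       sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the fixed RTP header fields from the first 12 bytes of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack(RTP_HEADER_FORMAT, packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical usage: payload type 4 and the other values are illustrative only.
header = build_rtp_header(payload_type=4, sequence=17, timestamp=240, ssrc=0x1234)
info = parse_rtp_header(header)
```

The receiving end would consult the negotiated meaning of the payload type to select a decoder, and would use the sequence number and timestamp to detect loss and restore ordering.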
Packet switched networks such as the Internet thus serve to provide end-to-end (or node-to-node) communication between a pair of network devices or machines. These network devices may access or be connected to the Internet through any suitable configuration. In a usual arrangement, for instance, each device is connected via a communications link (such as the public switched telephone network (PSTN) or a LAN) to a server or gateway that provides access to the Internet. The gateway is typically owned and operated by an Internet service provider (ISP) and is known as a network access server (NAS) or remote access server (RAS).
Of course, the gateway itself may also be considered a network device or machine, as it serves to communicate over the network with a machine (e.g., another gateway) at another end or node.
Network access servers are commercially available from 3Com Corporation and other telecommunications equipment manufacturers such as Ascend Communications, Livingston Enterprises, and Multitech. A representative NAS is the Total Control Enterprise Network Hub from 3Com Corporation, as described in the patent of Dale M. Walsh, et al., U.S. Pat. No. 5,597,595 ("the Walsh patent"), which is fully incorporated herein by reference. This NAS has a telephone line interface that can be connected to a high-speed multiplexed digital telephone line, such as a T1 line or an ISDN line. The NAS further provides a plurality of digital modems to perform signal conversions (such as voice or video encoding) on the data from the telephone line channels and a bus network connecting the digital modems to a network interface card or module. Examples of such network interface cards are the NetServer™ and EdgeServer™ cards from 3Com Corporation. The network interface card in turn couples the NAS to a local or wide area network, such as the ISP backbone network or the Internet.
While packet switched networks have traditionally been used to carry non-realtime transmissions (such as e-mail messages or other data transfers), one of the promising new uses of these networks is to carry telephone conversations and other interactive communications. Known as "IP telephony" in the context of IP networks, the goal of this new technology is to replace or enhance conventional circuit switched telephone networks with more versatile and universal packet switched communications.
FIG. 1 illustrates a basic IP telephony configuration. In this configuration, users at two or more telephone devices are set to engage in a conversation over an IP network. Each telephone device may take any of a variety of forms. For instance, without limitation, the device may be a conventional analog telephone or a personal computer (PC) equipped with a handset (or a microphone and speakers) to facilitate a conversation. Each telephone device is served by an IP telephony gateway (ITG), which is owned by an IP telephony service provider (ITSP) and provides connectivity to the network. In practice, users may subscribe to the service provided by an ITSP and may then place and receive calls over the IP network via a communications link to their respective gateways.
The communications link may take any suitable form. For instance, if the telephone device is a conventional telephone, the communications link may be the conventional PSTN, with a T1 span extending to the ITG. In that case, a subscriber may place a call to the ITG over the PSTN. As another example, if the telephone device is a PC on a LAN, the communications link may be the LAN extending to the ITG. In that case, a subscriber may contact the ITG via the existing network connection. Of course, other suitable communications links are known or will be developed as well.
The ITG may take the form of a network access server similar to those described above, modified to the extent necessary to facilitate telephone conversations over the network. For instance, while the modems in a conventional NAS modulate and demodulate signals to communicate with subscribers' PC modems, the "modems" in an ITG may not need to modulate or demodulate signals. Instead, the modems may be configured to receive the telephone signals originating at subscriber telephone devices and to sample (if necessary), compress and packetize the signals for transmission over the network, and vice versa for signals coming from the network.
Like other network access servers, an ITG will typically receive and process a plurality of telephone conversation signals from subscriber devices and transmit these signals in parallel over the IP network to a destination gateway. At a given moment, for instance, the ITG may simultaneously receive a plurality of unrelated speech signals from a given communications link such as a T1 span, process those signals as necessary, and place a series of corresponding RTP packets onto the network in one or more outgoing packet streams for transmission to a destination gateway.
Ideally, all of the packets transmitted into a packet switched network by the ITG should arrive successfully at the designated remote gateway, for conversion as necessary and transmission to the destination device. Either the remote gateway or the destination device, as the case may be, should then receive the transmitted IP packets, extract the payload from the packets and reconstruct an ordered data stream or signal for receipt by an end user.
Unfortunately, however, deficiencies in the existing communication infrastructure have precluded the successful widespread transmission of real time media signals, such as digitized voice, audio and video, from end-to-end over packet switched networks. One of the principal reasons for this lack of success is a high rate of packet loss and delay.
The Internet, for example, suffers from a high rate of packet loss and resulting transmission delays. In particular, depending on conditions such as how congested the Internet is at any given time, loss of entire packets has been found to occur on the Internet at a rate of up to 25%, or up to one in every four packets. Typically, this packet loss occurs one packet at a time, which might or might not perceptibly distort a real-time audio signal, but may perceptibly distort a real-time video signal, and would certainly distort a pure data signal such as an e-mail message. Often, however, burst errors occur on the Internet and result in the loss of multiple sequential packets in a row. Unlike the sporadic loss of a single packet, if left uncorrected, these burst errors can and will substantially and perceptibly distort almost any transmitted signal.
The connection-oriented TCP protocol provides a mechanism for responding to packet loss in an IP network. According to TCP, when a segment arrives at its destination, the receiving TCP entity should send back to the sending entity a segment bearing an acknowledgement number equal to the next sequence number that it expects to receive. If the sending entity does not receive an acknowledgement within a specified time period, it will re-transmit the package of data.
Generally speaking, this acknowledgment and re-transmission system works well to correct for packet loss. However, the system can unfortunately delay the complete transmission of a data stream. For the transmission of packets representing pure data signals such as e-mail messages, transmission delay is not ideal, although it is of secondary concern compared to an unrecoverable loss of information. Real-time media signals, however, are by definition highly sensitive to delay and will appear jumpy, interrupted or otherwise distorted if parts of the signal do not flow continuously to the receiving end. Further, in the context of interactive real-time communications such as packet-switched telephony, delay is even more problematic, since participants to such communications expect the network connection to simulate immediate, in-person interaction, without delay.
Rather than employing (or invoking) an acknowledgement and retransmission system, less delay in packet loss correction can be achieved by transmitting a correction code of some sort concurrently with the payload data, thereby providing the receiving end with sufficient information to recover lost packets. Several error correction code mechanisms are available for this purpose. These mechanisms include, for instance, convolution coding, interleaving and block coding, all of which are well known to those skilled in the art. Of these mechanisms, perhaps the most common is block coding.
Block coding calls for mapping a frame of source data into a coded block of data that includes a set of redundant parity symbols. By conventional terminology, an "(n, k)" block coder typically converts a group of k payload units (such as bytes or bits) from a data stream into a larger group of n units by deriving p = n − k parity units or forward error correction (FEC) codes. Each parity unit is generated through a predetermined coding technique based on all or some subset of the k payload units.
The parity units may then be transmitted in-stream with the underlying payload units (e.g., interleaved with the payload, or after the payload, or appended to the payload). Alternatively or additionally, the parity units may be transmitted in a separate stream in parallel with the underlying payload stream. This latter technique is described, for instance, in J. Rosenberg, H. Schulzrinne, An RTP Payload Format for Generic Forward Error Correction, Internet Engineering Task Force, Internet Draft, July 1998, the entirety of which is hereby incorporated herein by reference.
Many forms of block coding are now known. One of the simplest forms of a block code, for instance, is a repetition code, in which the source data is repeated as a set of parity bits. One of the more popular but complex block codes is the Reed-Solomon (RS) class of codes over the 2^8 Galois field. These codes are optimal in their ability to correct erased bytes. For example, provided that 8 bytes are protected with 3 parity bytes (a total of 11 bytes), any three bytes can be lost, and the original 8 bytes may still be recovered.
Another example of block coding is to append or concatenate redundant parity information to existing data packets in the packet stream. For instance, as an offshoot of traditional repetition codes, the transmitting node may append to each data packet redundant copies of the preceding k data packets. In this way, the receiving end may readily recover a lost packet D_i from one of the k subsequent packets D_(i+1) ... D_(i+k). As more preceding packets are concatenated with each current packet in the stream, the network can then tolerate a higher rate of packet loss.
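By way of illustration only, the repetition-style scheme just described may be sketched as follows. The dictionary-based packet representation and the function names are hypothetical and are used solely to show how a lost packet can be recovered from a copy carried by a later packet.

```python
def add_redundancy(payloads, k):
    """Build packets in which packet i carries payload i plus copies of up to k preceding payloads."""
    packets = []
    for i, payload in enumerate(payloads):
        copies = {j: payloads[j] for j in range(max(0, i - k), i)}
        packets.append({"seq": i, "payload": payload, "copies": copies})
    return packets


def recover(received, seq):
    """Return payload number seq, taken directly or from a redundant copy, or None if unavailable."""
    for packet in received:
        if packet["seq"] == seq:
            return packet["payload"]
    for packet in received:
        if seq in packet["copies"]:
            return packet["copies"][seq]
    return None


# Hypothetical usage: packets 1 and 2 are lost in a burst; with k = 2, packet 3 carries both.
payloads = [b"a", b"b", b"c", b"d"]
packets = add_redundancy(payloads, k=2)
received = [p for p in packets if p["seq"] not in (1, 2)]
restored = [recover(received, seq) for seq in range(len(payloads))]
```

In this sketch a burst of up to k consecutive losses is recoverable once the next packet arrives, at the cost of each packet growing to roughly k + 1 times the size of its own payload.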
Still another block coding technique is described in co-pending U.S. patent application Ser. No. 08/989,616, entitled "A Forward Error Correction System for Packet Based Real Time Media" and filed on Dec. 12, 1997, the entirety of which is hereby incorporated by reference. According to this technique, parity bits associated with current packets are piggy-backed onto future packets. In particular, as a sequence of payload blocks is being transmitted, every k payload blocks in the sequence are fed through a block coder to create p = n − k forward error correction (FEC) codes or parity packets, where p ≤ k. Each of these p parity packets may then be concatenated respectively with one of the next p data packets being transmitted. In turn, at the receiving end, if a packet is lost, the associated payload may be extracted from the parity blocks carried by the appropriate subsequent group of packets.
Yet another coding technique is described in U.S. patent application Ser. No. 08/989,483, also entitled "A Forward Error Correction System for Packet Based Real Time Media" and filed on Dec. 12, 1997, the entirety of which is also hereby incorporated by reference. According to this technique, a single parity block p may be derived as an XOR sum of the payload carried by the preceding k packets in the stream and then concatenated with the current packet for transmission. With this technique, regardless of the number of sequential packets to be recovered at the receiving end, the size of the forward error correction code remains of the same order as the payload itself.
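By way of illustration only, the basic XOR relationship underlying such a parity block may be sketched as follows. This is a simplified sketch showing recovery of a single lost payload within one group of k payloads, and it is not intended to reproduce the full scheme of the referenced application; the function names are hypothetical, and payloads are assumed to be of equal length.

```python
def xor_blocks(blocks):
    """XOR the blocks byte by byte; shorter blocks are treated as zero padded."""
    size = max(len(block) for block in blocks)
    result = bytearray(size)
    for block in blocks:
        for index, byte in enumerate(block):
            result[index] ^= byte
    return bytes(result)


def make_parity(payloads, k):
    """Map each packet index i >= k to the XOR of the k payloads preceding packet i."""
    return {i: xor_blocks(payloads[i - k:i]) for i in range(k, len(payloads))}


def recover_one(parity, surviving):
    """Recover the single missing payload of a group from its parity and the surviving payloads."""
    return xor_blocks([parity] + list(surviving))


# Hypothetical usage: payload 1 is lost; packet 3 carries the parity of payloads 0 to 2.
payloads = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = make_parity(payloads, k=3)
restored = recover_one(parity[3], [payloads[0], payloads[2]])
```

Because XOR is its own inverse, combining the parity block with the k − 1 surviving payloads yields the missing one, and the parity block is no larger than a single payload.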
While each of these forward error correction coding techniques has its advantages, the existing techniques still suffer from at least one inherent disadvantage: delay. In particular, since the parity information, p, is derived as some function of a group of preceding payload information, k, the receiving end will usually not receive the parity information until it first receives all of the payload information. Therefore, in response to a loss of some payload information, the receiving end will need to wait until the necessary parity information arrives in order to recover the lost information.
Further, provided with a complex coding scheme in which a number of the k payload units (as well as the parity unit(s)) are required in order to recover from a loss of one or more of the k payload units, the receiving end will need to wait until all of those necessary payload units arrive as well. Thus, regardless of whether the parity units for a given stream are transmitted in-stream with the underlying payload or in a separate FEC stream, some additional delay will inherently occur in responding to packet loss.
As noted above, any such delay is problematic in the context of real time media transmissions and particularly so in the context of interactive network communications such as IP telephony. While one way to reduce this delay may be to use less complex FEC schemes (such as simple repetition codes), that solution is likely to be unacceptable as the quality of error correction may decrease and the bandwidth may increase.
In view of these deficiencies in the existing art, a need exists for an improved system of forward error correction coding.
The present invention provides a simple yet elegant mechanism for improved end-to-end transmission of real time audio or voice signals. In addition, video, both voice and video, or other media signals may be sent. These are generally referred to herein as multimedia signals. For convenience, the invention will be described primarily with reference to audio signals, but it should be understood that the description applies equally to multimedia signals generally. In the context of VoIP, the telecommunications network comprises a packet switched network and an IP telephony gateway serving as an interface between a telephone device and the IP network. The IP telephony gateway receives a conversation signal from the telephone device. At least two different digital encoding systems are employed on a single audio source. In one preferred embodiment, for instance, both the G.711 and G.723.1 encoding standards are applied simultaneously to produce two different sets of data for the identical conversation. By way of illustration, the preferred embodiment utilizes the high-level, time-conscious protocol RTP. The two different sets of coded data are transmitted simultaneously through two separate RTP packet streams. The G.711 stream contains frames of 10 milliseconds worth of audio samples, whereas the G.723.1 stream contains frames of 30 milliseconds worth of audio samples. Both streams are frame aligned. It follows that for every three G.711 frames sent on the G.711 stream, a G.723.1 frame representing the same information as those three G.711 frames will be transmitted shortly afterward on the G.723.1 stream. It should be noted that in other embodiments, data samples of alternative time lengths are possible, and standards other than G.711 and G.723.1 are also possible.
In this preferred embodiment, the receiving end of the VoIP conversation would preferably buffer the frames from the G.711 stream and decode them back into audible signals. The G.723.1 stream is preferably ignored and left undecoded. In the event of packet loss on the G.711 stream, the lost information is recovered from the G.723.1 stream. Further, due to the frame alignment described above and the timestamp feature of RTP, the G.723.1 frame from which the FEC information is to be derived and decoded can be easily located.
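By way of illustration only, locating the G.723.1 frame that covers a lost G.711 frame may be sketched as follows. The sketch assumes an 8000 Hz RTP timestamp clock for both streams and assumes that both streams count timestamps from a common origin; the function names, the buffer keyed by timestamp, and the decode_g7231 callable are hypothetical placeholders rather than any particular codec interface.

```python
SAMPLE_RATE_HZ = 8000                               # assumed RTP clock for both streams
G711_FRAME_TICKS = SAMPLE_RATE_HZ * 10 // 1000      # 10 ms frame = 80 timestamp ticks
G7231_FRAME_TICKS = SAMPLE_RATE_HZ * 30 // 1000     # 30 ms frame = 240 timestamp ticks


def covering_g7231_timestamp(g711_timestamp, origin=0):
    """Timestamp of the frame-aligned G.723.1 frame that spans the given G.711 frame."""
    offset = g711_timestamp - origin
    return origin + (offset // G7231_FRAME_TICKS) * G7231_FRAME_TICKS


def position_in_g7231_frame(g711_timestamp, origin=0):
    """Index (0, 1 or 2) of the 10 ms segment within the covering 30 ms frame."""
    offset = g711_timestamp - origin
    return (offset % G7231_FRAME_TICKS) // G711_FRAME_TICKS


def recover_lost_g711(lost_timestamp, g7231_buffer, decode_g7231):
    """Decode the covering G.723.1 frame and return the samples replacing the lost 10 ms."""
    frame = g7231_buffer.get(covering_g7231_timestamp(lost_timestamp))
    if frame is None:
        return None  # the redundant frame has not arrived or was itself lost
    samples = decode_g7231(frame)  # hypothetical decoder yielding 240 samples
    start = position_in_g7231_frame(lost_timestamp) * G711_FRAME_TICKS
    return samples[start:start + G711_FRAME_TICKS]


# Hypothetical usage with a stand-in decoder that returns sample indices 0..239.
buffer = {0: b"frame-a", 240: b"frame-b"}
replacement = recover_lost_g711(400, buffer, lambda frame: list(range(240)))
```

The G.723.1 stream is decoded only when a loss is detected, consistent with leaving that stream undecoded during normal operation.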
Alternatively, in another embodiment, the simultaneous transmission of G.711 and G.723.1 frames uses only one RTP stream. In this manner, the voice signal is, once again, subjected to both G.711 and G.723.1 coding, which simultaneously produces two different sets of digital code representing the voice signal. However, the transmission method is different. For every 30 millisecond segment of VoIP audio, two G.711 frames each having 10 milliseconds of data would be transmitted separately, and the remaining G.711 frame having 10 milliseconds of data would be transmitted together with the 30 milliseconds of data in a G.723.1 frame corresponding to the three 10 millisecond G.711 frames. Hence a total of three packets are transferred on a single RTP stream. Once again, in the event of a data loss, the G.723.1 frame already received would be decoded for FEC information. It should also be noted once again that in other embodiments, data samples of alternative time lengths are possible, and standards other than G.711 and G.723.1 are also possible.
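By way of illustration only, the single stream packetization may be sketched as follows. The text above does not state which of the three packets carries the G.723.1 frame, so this sketch assumes, purely for the example, that it accompanies the third G.711 frame of each 30 millisecond group; the packet representation and function name are likewise hypothetical.

```python
def packetize_single_stream(g711_frames, g7231_frames):
    """Emit three packets per 30 ms group; the third also carries the G.723.1 frame (assumed)."""
    if len(g711_frames) != 3 * len(g7231_frames):
        raise ValueError("expected three G.711 frames for every G.723.1 frame")
    packets = []
    for group, g7231_frame in enumerate(g7231_frames):
        first, second, third = g711_frames[3 * group:3 * group + 3]
        packets.append({"g711": first, "g7231": None})
        packets.append({"g711": second, "g7231": None})
        packets.append({"g711": third, "g7231": g7231_frame})
    return packets


# Hypothetical usage: 60 ms of audio, i.e. six G.711 frames and two G.723.1 frames.
g711 = [b"p0", b"p1", b"p2", b"p3", b"p4", b"p5"]
g7231 = [b"r0", b"r1"]
stream = packetize_single_stream(g711, g7231)
```

Under this assumed ordering, a loss of either of the first two packets in a group can be repaired once the third packet arrives, whereas a loss of the third packet removes both its own G.711 frame and the redundant G.723.1 frame.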
The preferred embodiment using a single RTP stream is more efficient when compared with the dual RTP streams. More overhead is involved in establishing two streams. By using a single stream, there is no increase in overhead. However, in a dual stream embodiment, there is a 33% increase in packet count with the extra and separate transmission of G.723.1 frames. Finally, in the single stream embodiment, when the packet containing both the G.711 and G.723.1 code is lost, then the FEC information is unavailable and reconstruction may not be possible.
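By way of illustration only, the packet count comparison may be worked through as follows. The 33% figure follows directly from sending four packets rather than three per 30 milliseconds. The byte counts are a rough sketch under stated assumptions: an IPv4 header of 20 bytes without options, G.711 at 64 kbit/s (80 bytes per 10 milliseconds), G.723.1 at its 6.3 kbit/s rate (24 bytes per 30 millisecond frame), and no additional framing for the combined packet. All names below are hypothetical.

```python
from fractions import Fraction

IPV4_HEADER = 20      # assumption: IPv4 header without options
UDP_HEADER = 8
RTP_HEADER = 12
HEADERS = IPV4_HEADER + UDP_HEADER + RTP_HEADER

G711_FRAME = 80       # bytes per 10 ms at 64 kbit/s
G7231_FRAME = 24      # assumption: bytes per 30 ms frame at the 6.3 kbit/s rate


def g711_only_cost():
    """(packets, bytes) per 30 ms with no redundant G.723.1 data."""
    return 3, 3 * (HEADERS + G711_FRAME)


def dual_stream_cost():
    """(packets, bytes) per 30 ms when the G.723.1 frame travels in its own packet."""
    return 4, 3 * (HEADERS + G711_FRAME) + (HEADERS + G7231_FRAME)


def single_stream_cost():
    """(packets, bytes) per 30 ms when the G.723.1 frame shares a packet with a G.711 frame."""
    return 3, 2 * (HEADERS + G711_FRAME) + (HEADERS + G711_FRAME + G7231_FRAME)


baseline_packets, _ = g711_only_cost()
dual_packets, _ = dual_stream_cost()
packet_increase = Fraction(dual_packets - baseline_packets, baseline_packets)  # one third
```

Under these assumptions the difference between the two embodiments is one set of packet headers per 30 milliseconds, since the redundant G.723.1 payload itself is the same size in both cases.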
Nonetheless, the availability of the different frames renders the transmission of data practically redundant in the preferred embodiment. Upon the loss of any data, FEC according to the present invention allows simultaneous or near-simultaneous recovery from the alternate frames. As a result, in the context of VoIP, the resulting telephone conversation may be carried out in a coherent and satisfying manner, without concern for audio loss in transmission.