The present document is related to two other U.S. patent applications filed on the same date, each in the name of the same inventors. The other two applications are entitled “System for Dynamic Jitter Buffer Management Based on Synchronized Clocks” and “System for Routing Real-Time Media Transmissions Based on Delay.” The entirety of each of these other applications is incorporated herein by reference.
1. Field of the Invention
The present invention relates to the transmission of real-time media signals over data networks and more particularly to a method and apparatus for using absolute time information and/or other parameters to assess, improve and manage such transmission. The invention is particularly useful in the context of IP networks such as the Internet or an intranet. However, the invention is not limited to use in this context but extends more generally to use in the context of any store-and-forward network such as any packet switched network, including, for instance, ATM, frame relay, X.25 and SNA networks.
2. Description of Related Art
There has long been a need in the art to transmit real-time media signals from one location to another. In early days, the need to convey voice signals was satisfied through the use of relatively simple analog telephone systems. More recently, the availability of digital telephone systems and advanced computer networks such as the Internet has facilitated the communication of assorted real-time media signals, such as voice, audio and/or video, over long distances at a fraction of the cost of conventional systems. Currently, two types of networks can be used to convey real-time media signals: circuit switched networks and packet switched networks.
In a circuit switched network, a point-to-point communication path or circuit is established between two or more users, such that the users have exclusive and full use of the circuit until the connection is released. A media signal to be transmitted is then sent in whole over the dedicated circuit, received by the other side and played out to a user. The public switched telephone network is an example of a circuit switched network.
In a packet switched network, in contrast, a message to be sent is divided into blocks, or data packets, of fixed or variable length. The packets are then sent individually over the network through multiple locations, and then reassembled at a final location before being delivered to a user at a receiving end. To ensure proper transmission and re-assembly of the blocks of data at the receiving end, various control data, such as sequence and verification information, may be appended to each packet in the form of a packet header, or otherwise associated with the packet. At the receiving end, the packets are then reassembled and transmitted to an end user in a format compatible with the user's equipment. The Internet is an example of a packet switched network.
At their inception, each type of telecommunications network was designed to support the transmission of select types of media. Circuit switched networks were designed to carry real-time audio signals (e.g., voice). Packet switched networks, on the other hand, were designed to carry pure data signals (e.g., e-mail). Today, however, these networks compete to provide multi-media transmission services, including, for instance, the transmission of data, voice, audio and/or video. Further, with the growth of the Internet and other advances in technology, packet switched networks are now competing with conventional circuit switched networks to provide interactive communications services such as telephony and multi-media conferencing. In the context of packet switched networks operating according to Internet Protocol (IP), this technology is presently known as internet telephony, IP telephony or, where voice is involved, Voice over IP (VoIP).
Internet telephony presents an attractive technology for use in long distance telephone calls, as compared to the public switched telephone network (PSTN), which has been the traditional transmission medium. One of the primary advantages of internet telephony is its flexibility and features, such as the ability to selectively provide different levels of service quality and to integrate voice and data services (for instance, integrating e-mail and voice mail functions).
Another primary advantage of internet telephony is cost. In the United States, for instance, long distance service providers for the PSTN provide domestic services at rates ranging from roughly 10 to 30 cents per minute, and international rates for substantially more, depending on the time of day, day of the week, and the distances involved. In contrast, the cost of an internet telephony call anywhere in the world is potentially the cost of a local telephone call to a local internet telephony service provider at one end and the cost of a local call from an internet telephony service provider at the far end to the destination telephone. Once the call is routed from the local internet telephony provider onto the IP network, the cost to transmit the data from the local internet telephony provider to the far end internet telephony provider can be free for all practical purposes, regardless of where the two parties are located. Similarly, the cost to facilitate a direct dial internet telephony call can theoretically be free, except for possible access fees charged by local exchange carriers. Internet telephony service providers can thus potentially charge users far less for internet telephony calls than the users would pay for comparable calls placed strictly over the PSTN.
To transmit a real-time media signal over a packet switched network, the media signal is typically first sampled, divided into frames, and channel coded or compressed according to an established media coding standard. Each encoded frame of data is then inserted as payload into a packet, which is then labeled with one or more headers (often depending on various transmission protocols). The header usually identifies a packet sequence number, source and destination network addresses for the packet, and a sender timestamp.
In general, a purpose of the sender timestamp is to record the time spacing between packets in a sequence. Therefore, the sender timestamp may identify any suitable time at the transmitting end, consistently for the packets in a sequence. For instance, without limitation, the sender timestamp may identify when the first sample of the payload in a packet was taken or when the packet was sent into the network.
In this regard, each packet of a real-time media sequence typically represents a successive time block of the underlying media signal. For instance, according to the G.723.1 standard, a 16 bit PCM representation of an original analog speech signal is partitioned into consecutive segments of 30 ms length, and each of these segments is encoded into a frame of 240 samples, represented by either 20 or 24 bytes (depending on a selected transmission rate). The time spacing between each of these frames is significant, as it serves in part to define the underlying signal. For example, under G.723.1, it is important to know that a sequence of four packets was transmitted at times t, t+30, t+60, and t+90 (in milliseconds). With this inter-packet time spacing information and sequence number information, a receiving device ideally should be able to reconstruct the packet sequence and decode and play out the underlying signal.
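The framing arithmetic described above can be illustrated with a minimal sketch. It assumes only the figures given in the text (30 ms frames of 240 samples, 20 or 24 bytes per frame); the names MediaPacket, packetize, reorder, sampling_rate_hz and bitrate_bps are hypothetical and are not part of G.723.1 or of any particular protocol.

```python
from dataclasses import dataclass

FRAME_MS = 30            # frame duration stated in the text
SAMPLES_PER_FRAME = 240  # samples per frame stated in the text


@dataclass
class MediaPacket:
    """Hypothetical packet: only the header fields needed for this sketch."""
    seq: int           # packet sequence number
    sender_ts_ms: int  # sender timestamp, in milliseconds
    payload: bytes     # one encoded frame (20 or 24 bytes in the example)


def packetize(frames, start_ms=0):
    """Wrap encoded frames in packets spaced FRAME_MS apart."""
    return [MediaPacket(i, start_ms + i * FRAME_MS, frame)
            for i, frame in enumerate(frames)]


def reorder(packets):
    """Restore the original order from the sequence numbers."""
    return sorted(packets, key=lambda p: p.seq)


def sampling_rate_hz():
    """Sampling rate implied by 240 samples per 30 ms frame."""
    return SAMPLES_PER_FRAME * 1000 // FRAME_MS


def bitrate_bps(frame_bytes):
    """Payload bit rate (rounded down) for a given frame size."""
    return frame_bytes * 8 * 1000 // FRAME_MS
```

Under these assumptions, four frames packetized from a start time t carry timestamps t, t+30, t+60 and t+90, and a receiver that obtains them out of order can recover both the order and the spacing from the header fields alone.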
As a stream of real-time media packets is created, each packet is sent independently into the network and routed to the receiving end as identified by the destination address in the packet header. The packets may be sent back to back or with a holding time between packets. Ideally (excepting packet loss, for instance), each packet will then traverse the network and arrive at the destination end, to be decoded and played out to an end user.
As is well known in the art, the transmission of any data signal from one location to another over a telecommunications network is generally not instantaneous but rather involves some end-to-end (e.g., user to user) delay or latency. This end-to-end delay may depend on a number of factors, including, for instance, the available network bandwidth, the current network load, the distance between transmitter and receiver, the number of processing points (e.g., switches, routers and buffers) encountered prior to media play-out, and the processing time delay required at each processing point.
In the context of interactive real-time communications such as internet telephony, delay is particularly problematic, since participants to such communications expect the network connection to simulate immediate, in-person interaction, without delay. Provided with more than a maximum tolerable end-to-end delay (a matter of design choice), conversation participants may be faced with the unsettling experience of having to wait some time after one person speaks before the other person hears what was spoken. Consequently, in most telecommunications networks carrying real-time media signals, there is a need to reduce or minimize the total end-to-end (e.g., user-to-user) transmission delay.
Further, because each packet in a stream representing a real-time media signal is routed independently, multiple packets in the stream may traverse the network from originating end to destination end by different routes. These routes may be of varying lengths and may include varying numbers of packet switches and routers that operate on the packets. Consequently, the various packets in a given stream may experience differing levels of propagation delay (a variation known as “delay variance” or “jitter”) and will thus typically arrive at the destination address with varying inter-packet time spacing. This varying inter-arrival time spacing is especially disruptive to real-time media communications, as it can give rise to packet loss, which produces audible pops and clicks and other distortion. Therefore, there is a need to reduce the effect of jitter on real-time media transmissions.
In an effort to mask network-induced expansion and contraction of packet inter-arrival times, the packets that arrive at the destination end are typically received by a playout buffer or jitter buffer. The jitter buffer operates by holding packets for a period of time and then successively releasing them to be played out in sequential order, ideally with inter-packet time spacing corresponding to the inter-departure time spacing that was employed at the originating end.
The size of the jitter buffer bears on its effectiveness and involves a trade-off between increased end-to-end transmission delay and increased packet loss (and resultant distortion). A large jitter buffer can theoretically respond to large delay variances, as it can maintain packets for a longer period of time and thus release a packet for decoding with a high probability that the next packet in the sequence can be subsequently released with the appropriate inter-packet time spacing. However, by its very nature, the jitter buffer directly contributes to the total end-to-end transmission delay by holding packets before they are played out. Therefore, the larger the jitter buffer, the more the real-time media communication may be distorted from delay.
On the other hand, a small jitter buffer would be less likely to delay end-to-end real-time media transmission. Unfortunately, however, the length of buffer processing time bears an inverse relationship to the media frame loss: as buffer processing time decreases, media frame loss attributable to the buffering process increases. A small jitter buffer can also distort a real-time media transmission, since packets that arrive too late to be processed (e.g., after the previous packet has already been released for play out) may be deemed lost. Consequently, as quality of service is increased by a decrease in buffer contribution to end-to-end transmission delay, quality of service is simultaneously decreased by the increase in media frame loss caused by the buffer. Therefore, in designing a jitter buffer, a delicate balance exists between delay and packet loss.
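The trade-off between buffer delay and packet loss can be illustrated with a minimal sketch of a fixed-hold jitter buffer. This is one simple scheme chosen for illustration, not necessarily the one used in any given system: the first packet is held for hold_ms, every later packet is scheduled at the same offset from its sender timestamp, and a packet arriving after its scheduled play-out time is deemed lost. The function name schedule_playout and the tuple layout are hypothetical.

```python
def schedule_playout(arrivals, hold_ms):
    """Schedule play-out for packets given in arrival order.

    arrivals: list of (seq, sender_ts_ms, arrival_ms) tuples.
    hold_ms:  time the first packet is held in the buffer (assumed fixed).
    Returns (playout, lost): playout is a list of (seq, play_ms) sorted by
    sequence number; lost lists the sequence numbers that arrived too late.
    """
    if not arrivals:
        return [], []
    _, first_ts, first_arrival = arrivals[0]
    # Constant offset that maps a sender timestamp to a play-out time.
    offset = first_arrival + hold_ms - first_ts
    playout, lost = [], []
    for seq, sender_ts, arrival in arrivals:
        play_at = sender_ts + offset
        if arrival <= play_at:
            playout.append((seq, play_at))  # held until play_at
        else:
            lost.append(seq)                # arrived after its slot
    playout.sort()
    return playout, lost
```

For example, with arrivals [(0, 0, 50), (1, 30, 95), (3, 90, 150), (2, 60, 170)], a 40 ms hold plays packets 0, 1 and 3 at 90, 120 and 180 ms and drops packet 2, whereas a 70 ms hold plays all four packets but adds 30 ms to the end-to-end delay of each.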
Buffer processing algorithms may use estimates of network transmission delay when making the tradeoff decision between additional processing delay and additional media frame loss. In general, such delay estimates either are computed on a case-by-case basis from observed standard deviation of network delay or are pre-set based upon a selected standard value. For instance, in a given network, a transmission line element may periodically transmit a timestamped test packet to a remote element and arrange to have the remote element return the packet to the line element. The line element may then determine a round trip transmission delay for the test packet by comparing the initial timestamp with the time when the packet returns. In turn, the line element may estimate that the one-way transmission delay is half of this round-trip transmission delay. By repeating this process periodically, for instance, the line element may establish a statistical estimate (e.g., average, mean, etc.) of the one-way transmission delay in the network. Assuming a maximum tolerable end-to-end transmission delay (including the time a packet could be held in the jitter buffer before being played out), the line element may then set its jitter buffer size to be the difference between the maximum tolerable end-to-end transmission delay and the one-way network transmission delay.
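The round-trip procedure just described reduces to a few lines of arithmetic, sketched below. The function names are hypothetical, the arithmetic mean is used as the statistical estimate, and clamping the buffer size at zero is an added assumption for the case where the estimated network delay alone exceeds the tolerable end-to-end delay.

```python
from statistics import mean


def one_way_estimate_ms(round_trips_ms):
    """Estimate one-way delay as half of the mean round-trip delay."""
    return mean(round_trips_ms) / 2


def jitter_buffer_size_ms(max_end_to_end_ms, round_trips_ms):
    """Delay budget left for the jitter buffer after network transit.

    Clamped at zero (an assumption of this sketch) when the estimated
    one-way delay already exceeds the tolerable end-to-end delay.
    """
    return max(0.0, max_end_to_end_ms - one_way_estimate_ms(round_trips_ms))
```

With round-trip samples of 100, 120 and 110 ms and a maximum tolerable end-to-end delay of 200 ms, the one-way estimate is 55 ms and the jitter buffer size is 145 ms.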
Packet switched real-time transmission schemes can use one of these delay estimates. However, these estimates of network transmission delay are imperfect and can therefore give rise to inefficiency in the buffer tradeoff decision and a resulting decrease in quality of the media signal played out at the receiving end. For instance, it is possible that the transmission delay in one direction between two line elements may be much higher than the transmission delay in the other direction between the two elements. Therefore, half of the round trip delay between these elements is not necessarily representative of a one-way transmission delay between the elements. Consequently, using such estimates as a basis for setting jitter buffer size can result in oversized or undersized jitter buffers and can cause packet loss or excessive (e.g., unnecessary) delay and therefore limit the overall quality of real-time media service deliverable over switched-packet networks.
In view of the deficiencies in the existing art, a need therefore exists for an improved method of assessing, improving and managing real-time media transmission over switched-packet networks.
The present invention provides a method and apparatus for establishing, providing and/or facilitating improved buffering, billing and/or routing of real-time media signals.
According to one aspect of an exemplary embodiment, for instance, the receiving and transmitting ends for a real-time media transmission may maintain substantially synchronized time signals. These time signals may be substantially synchronized by any suitable mechanism such as by originating from a common clock. Provided with synchronized clock signals, it is possible to make substantially accurate and appropriate measurements and adjustments in the transmission system. Exemplary measurements include network transmission delay and packet delay variance (i.e., jitter). Exemplary adjustments include changing the jitter buffer size at the receiving end, changing the fees charged for a given transmission, and changing (or selecting) the routing for a given real-time media signal.
Network transmission delay may be measured, for instance, by comparing packet departure time with packet arrival time (or other benchmarks or timestamps provided at the transmitting and receiving ends), as indicated by the synchronized time signals at the transmitting and receiving ends. Packet delay variance may be measured (approximated), for instance, by comparing the differences between network transmission delays for packets in a sequence. Statistically, the variance is then the square of the estimated standard deviation of a population of delay measurements. Of course, other modifications or estimates of variance may be used in addition or instead.
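A minimal sketch of these measurements follows, assuming each packet is described by a departure time and an arrival time read from substantially synchronized clocks. The function names are hypothetical, and the sample variance from Python's statistics module stands in for whichever variance estimate a given embodiment might use.

```python
from statistics import stdev, variance


def one_way_delays_ms(packets):
    """One-way delay per packet: arrival time minus departure time.

    packets: iterable of (departure_ms, arrival_ms) pairs taken from
    synchronized clocks at the transmitting and receiving ends.
    """
    return [arrival - departure for departure, arrival in packets]


def delay_differences_ms(delays):
    """Differences between the delays of consecutive packets."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


def delay_variance(delays):
    """Sample variance of the delays (square of the sample std deviation)."""
    return variance(delays)


def delay_stdev(delays):
    """Sample standard deviation of the delays."""
    return stdev(delays)
```

For packets sent at 0, 30, 60 and 90 ms and received at 40, 75, 110 and 125 ms, the one-way delays are 40, 45, 50 and 35 ms, the consecutive differences are 5, 5 and -15 ms, and the sample variance is 125/3, or about 41.7 ms squared.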
Jitter buffer size and jitter buffer operation may be dynamically altered in order to provide substantially the same inter-packet time spacing at the receiving end as existed or was established at the transmitting end. This may be done, for instance, by having the receiver delay play-out of successfully transmitted packets until the time signal at the receiving end indicates a time that is substantially a predetermined end-to-end delay period after a benchmark time for the packet at the transmitting end.
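The play-out rule described above can be sketched as follows, assuming synchronized clocks and a predetermined end-to-end delay chosen by the designer. The function names are hypothetical, and treating a packet that arrives after its play-out time as lost (returning None) is an assumption of this sketch.

```python
def playout_time_ms(sender_ts_ms, end_to_end_delay_ms):
    """Receiver clock time at which the packet should be played out."""
    return sender_ts_ms + end_to_end_delay_ms


def hold_time_ms(sender_ts_ms, arrival_ms, end_to_end_delay_ms):
    """Time the packet waits in the jitter buffer before play-out.

    Returns None when the packet arrived after its play-out time, in
    which case this sketch treats it as lost.
    """
    remaining = playout_time_ms(sender_ts_ms, end_to_end_delay_ms) - arrival_ms
    return remaining if remaining >= 0 else None
```

With a predetermined end-to-end delay of 100 ms, packets stamped 0, 30 and 60 ms that arrive at 40, 95 and 80 ms are held for 60, 35 and 80 ms and play out at 100, 130 and 160 ms, which restores the 30 ms spacing established at the transmitting end regardless of the order or spacing of arrival.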
Since delay and/or jitter can bear directly on the quality of real-time media transmission, a service provider may adjust the fees that it charges for such transmission, based on a measure of transmission delay and/or jitter. These measurements are preferably but not necessarily based on substantially synchronized time signals at the transmitting and receiving ends. Further, the measurements may be taken generally or, preferably, with respect to the very signal(s) as to which the service provider may charge a fee. Based on these measurements, for instance, if a network is experiencing particularly high delay and/or jitter, the service provider may decrease its fee or otherwise alter its fee schedule.
Similarly, a transmitting end may use delay and/or jitter measurements as a basis for selecting a transmission path over which to route a given real-time media signal. Again, these measurements are preferably but not necessarily based on substantially synchronized time signals at the transmitting and receiving ends. For instance, a transmitting end may regularly monitor the delay and/or jitter for transmissions over a plurality of transmission paths that can be used to convey a real-time media signal to a receiving end. For transmission of a given signal, the transmitting end may then select the transmission path having the lowest delay and/or jitter. As a particular example, if an Internet telephony gateway determines that the Internet is particularly congested and may substantially delay the transmission of telephone signals, the gateway may opt to route a given telephone signal over the PSTN instead of over the Internet.
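Path selection of this kind can be sketched in a few lines. The text leaves open how delay and jitter are to be combined, so the weighted sum used here, the jitter_weight parameter, the function name select_path and the path labels are all assumptions made for illustration.

```python
def select_path(measurements, jitter_weight=1.0):
    """Pick the path with the lowest combined delay and jitter score.

    measurements: dict mapping a path name to (delay_ms, jitter_ms).
    jitter_weight: hypothetical tuning knob; 0 selects on delay alone.
    """
    def score(path):
        delay_ms, jitter_ms = measurements[path]
        return delay_ms + jitter_weight * jitter_ms

    return min(measurements, key=score)
```

For instance, given measurements of (250, 40) for a congested Internet path and (60, 2) for a PSTN path, the sketch selects the PSTN path; if the Internet path later measures (40, 5), it is selected instead.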