1. The Field of the Invention
This invention relates generally to data transfer over a network or mixed networks. More specifically, this invention relates to acceptable multimedia data streaming over one or more combined networks in the presence of reduced bandwidth and less reliable network paths.
2. Relevant Technology
The public switched telephone network (PSTN) is designed to carry voice traffic as inexpensively as possible. Until about the end of the 1970s, the PSTN was an entirely analog communication system throughout the world. With the advent of digital computers, it became very desirable to provide a means through which computers could exchange digital information. Special signal transforming devices, called modems, were created, allowing digital devices to communicate over analog communication channels.
Since the end of the 1970s, the core of the PSTN in the United States and other industrialized countries has been completely digital. Still, for cost reasons, most users have an analog connection to their telephone company's digital central office. Since the bandwidth of this analog connection, again for cost reasons, is limited to about 3000 Hz, and since the signal-to-noise ratio is slightly over 30 dB, it follows that according to Shannon's theory, the maximum speed at which information can be exchanged is slightly over 56,000 bits/second. This maximum speed is presently achieved. See International Telecommunications Union, Telecommunication Standardization Sector (ITU-T) Recommendation V.90, Geneva, Switzerland (1998). Modems able to communicate at speeds up to 56,000 bits/second are widely available in the marketplace with the largest modem vendor being the assignee, 3Com Corporation.
Until the 1990s, almost 99% of the traffic over the PSTN was voice traffic. The fact that the PSTN is poorly suited to carry data traffic was therefore of little concern, since only a small portion of the PSTN traffic was data. More recently, the Internet has caused an unprecedented data communication revolution. Currently, about 15% of the total PSTN traffic in the United States is data traffic. This figure is rapidly increasing and is expected to reach about 90% in the next several years. The Internet continues to grow exponentially and the growth rate shows no sign of slowing. Additionally, the vast majority of present users are connected to the Internet using a V.34 (up to 33,600 bits/second) or V.90 (up to 56,000 bits/second in the downstream direction) protocol.
Those familiar with communication network architectures appreciate that the PSTN is a circuit-switched network. A circuit-switched network is one in which the communicating entities are interconnected via a circuit or direct line dedicated interface. A circuit-switched network offers low bandwidth, but high reliability. The high reliability is largely due to the direct dedicated coupling of the communicating entities. Modems used in circuit-switched applications can communicate digital information at a low probability of bit error.
In contrast to the dedicated direct interface of a circuit-switched network, other topologies exist such as a packet-switched network. In a packet-switched network, rather than establishing a dedicated direct connection between the communicating entities, data information packets are addressed and delivered into the network. Routing entities within the network then examine the packet addressing associated with the data information packets and route the packets toward their destination. Additionally, while the PSTN was originally optimized to carry voice traffic, packet-switched networks such as the Internet are optimized to carry non real-time data traffic. Packet networks offer high bandwidth, but do not provide the necessary Quality of Service (QoS) for multimedia communications. Such a low QoS is primarily due to the fact that data information is delivered into a connectionless network where packets may be delivered late, out of order or even lost within the system, unlike in a circuit-switched environment where a direct connection is established between the communicating entities. In a packet-switched network bandwidth is measured in bits/second, as is common in computer networks.
A significant impediment to reliable transmission of multimedia over packet networks is packet delay, reordering, or loss. The most significant of these is packet loss, meaning the concept of bit error rate is meaningless in a packet-switched network. Packets may be lost for a variety of reasons, namely:
congestion of routers and gateways, which leads to a packet being discarded;
delays in packet transmission, which may cause a packet to arrive too late at the receiver to be played back in real-time;
heavy loading of the workstations, leading to scheduling difficulties in real-time multitasking operating systems.
To combat the realities of lost packets in a non-real-time system, there exist retransmission protocols such as TCP that facilitate recovery of lost packets. TCP operates by sending a positive acknowledgement only when a packet is received both in an expected sequence and within a designated time-out period. Furthermore, packets are often re-transmitted due to excessive delay, even though they may not be lost. Such unnecessary retransmission not only increases the overhead, but may be counter-productive in an attempt to maximize bandwidth. Multimedia, especially video, requires significant bandwidth. Unnecessary retransmission of packets can easily cause congestion resulting in exacerbated packet loss. It is widely recognized that TCP is not well-suited to real-time multimedia packet transfers, especially real-time "streaming" types of transfers.
Several approaches for multimedia streaming from a server to a client have been attempted. According to one approach, an entire multimedia file is downloaded using the existing protocols (such as TCP) from the server to the client and then, at the client, the file is played back locally. The shortcomings of such an approach are apparent in that only relatively small multimedia files may be downloaded, otherwise the client has to wait for a long time before the start of playback.
In a second approach, multimedia information is streamed immediately to the client without any re-transmission being performed to recover lost packets. Such an approach eliminates the delay associated with the first approach; however, quality suffers dramatically as a result of packet loss. In general, packet loss is at least about 3% and in some extreme cases reaches 25%. Any prior success of either of the aforementioned approaches has been short-lived, and neither is presently commercially viable. Therefore, other approaches continue to be actively investigated.
One alternate approach is described in "Real Time Video and Audio in the World Wide Web", by Z. Chen, S.-M. Tan, R. H. Campbell and Y. Li, published in the Fourth Int. WWW conference, 1995. That approach recognizes that not all packets have inherent equal value in a multimedia stream; for example, some packets are more important to an individual perceiving the multimedia data stream than others. In the above approach, a receiver detects which packets are lost and only requests re-transmission of the more important packets. In such an approach, the client also maintains control of the bit rate of the streaming based on the packet loss rate and any re-transmission requests. Such an approach is still not very efficient when the packet loss rate is high. It can lead to congestion and unacceptably low quality.
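The receiver-side selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the cited paper: the per-packet importance scores, the threshold, and the function name `retransmission_requests` are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def retransmission_requests(
    expected: Iterable[int],
    received: Set[int],
    importance: Dict[int, float],
    threshold: float,
) -> List[int]:
    """Return sequence numbers of lost packets worth requesting again.

    `importance` maps each sequence number to a hypothetical perceptual
    weight; only lost packets at or above `threshold` are requested.
    """
    lost = [seq for seq in expected if seq not in received]
    return [seq for seq in lost if importance.get(seq, 0.0) >= threshold]


# Packets 1 and 3 are lost, but only packet 1 is important enough to request.
requests = retransmission_requests(
    expected=range(5),
    received={0, 2, 4},
    importance={0: 1.0, 1: 0.9, 2: 0.2, 3: 0.1, 4: 0.5},
    threshold=0.5,
)
```

Even with such filtering, every request still adds traffic to the packet network, which is consistent with the inefficiency noted above at high loss rates.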
An additional alternative scheme is described in U.S. Pat. No. 5,768,527, assigned to Motorola Inc. of Schaumburg, Ill. While that patent takes into consideration the low bandwidth provided by dial-up modems as a result of the fundamental limitations of the PSTN, its fundamental disadvantage is that the QoS manager is situated at the client and is responsible for the QoS over both the packet network and the low-speed access link. Thus, from the client's point of view, the overall system described in the Motorola patent is low-bandwidth and of low reliability. The result of such an implementation also impacts potential performance.
Yet another approach relates to forward error-correction (FEC), a concept known in the prior art for circuit-switched networks, and more recently for packet networks as well. This is a very promising technique for ensuring high-quality multimedia streaming. It incurs no additional retransmission delay and is, in principle, suitable for real-time operation. Two approaches for FEC are presently known.
The first approach for FEC is a "joint source and channel coding" approach. In short, it is known from Shannon's theory that source coding (performed at the data originating entity, generally for the purpose of reducing the amount of data used to represent the original image, sound, etc., and traditionally employing lossy and lossless compression techniques) and channel coding (performed to make the transfer characteristics of the channel more robust) should be performed separately. However, it is now recognized in the engineering community that this only holds under theoretical assumptions, which do not hold in many practical cases. In practice, it is possible and sometimes advantageous to design the source coding and the channel coding simultaneously, i.e., incorporating both source coding and channel coding into the data prior to transmission. In such an approach, a lossy, compressed version of the signal is interlaced over the current bit stream. For example, assume each packet k contains not only the multimedia data of frame k, but also the compressed encoded data of an earlier frame k-l. If packet k-l is lost, it will not be recovered exactly, but a lower-quality version of the missing data may be recovered. This idea is illustrated in FIG. 1, where a series of data packets 10 are coded according to two separate approaches.
In one approach depicting packets 12, low loss rates are assumed such that an immediately successive packet may be used to recreate a low resolution version of an immediately previous packet. In an approach showing packets 14, higher loss rates are assumed and low resolution versions of earlier packets are appended to packets that are not immediately successive.
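The piggybacking idea of FIG. 1 can be sketched as follows. This is a minimal illustration, not the coding scheme of any cited reference: frames are modeled as strings, `low_res` is a hypothetical stand-in for a lossy re-encoding, and the parameter `lag` plays the role of l in the text (lag of 1 for packets 12, a larger lag for packets 14).

```python
from typing import List, Optional, Tuple

# (sequence number k, full frame k, low-resolution copy of frame k - lag)
Packet = Tuple[int, str, Optional[str]]


def low_res(frame: str) -> str:
    """Hypothetical lossy re-encoding: keep every other character."""
    return frame[::2]


def packetize(frames: List[str], lag: int) -> List[Packet]:
    """Attach to packet k a low-resolution copy of frame k - lag."""
    packets = []
    for k, frame in enumerate(frames):
        redundant = low_res(frames[k - lag]) if k >= lag else None
        packets.append((k, frame, redundant))
    return packets


def receive(packets: List[Packet], n_frames: int, lag: int) -> List[Optional[str]]:
    """Rebuild the stream; a lost frame k is approximated from packet k + lag."""
    by_seq = {k: (frame, red) for k, frame, red in packets}
    out: List[Optional[str]] = []
    for k in range(n_frames):
        if k in by_seq:
            out.append(by_seq[k][0])           # received intact
        elif k + lag in by_seq and by_seq[k + lag][1] is not None:
            out.append(by_seq[k + lag][1])     # lower-quality substitute
        else:
            out.append(None)                   # unrecoverable
    return out


frames = ["AAAA", "BBBB", "CCCC", "DDDD"]
sent = packetize(frames, lag=1)
arrived = [p for p in sent if p[0] != 1]       # packet 1 is lost in the network
rebuilt = receive(arrived, len(frames), lag=1)
```

With a lag of 1, a burst that drops two consecutive packets also drops the redundant copy of the first, which is why a larger lag suits higher loss rates.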
Joint source and channel coding may be additionally promising if subband coding is employed. Subband coding involves the partitioning of the signal into specific bands, each of which represents different characteristics of a signal such as frequency components. Additionally, some bands may be more essential than others to the intelligibility of the transferred information. Joint source and channel coding using subband coding does not require re-transmission and is very suitable for real-time multimedia streaming over packet networks. One disadvantage, however, is that such a technique is not well suited to a low-speed connection, such as a circuit-switched PSTN link, used in combination with a packet-switched network, such as the Internet. In particular, this technique is not well suited when the probability of a packet loss over the low-speed connection is tiny compared to the probability of a packet loss over the packet network. This, however, is exactly the case in practice. Since redundant information is transmitted over the low-speed, but highly reliable modem link, this technique of including redundant information wastes precious bandwidth over the highly reliable low-speed connection.
A second approach for FEC is to modify techniques such as parity checks, or even Reed-Solomon coding, and apply them to packets. Such techniques detect error conditions and recover through reconstruction of the erroneous portions of the packet. In such a case, a lost packet can be recovered exactly at the receiver by performing the corresponding decoding operation. The disadvantage of this technique is that it requires redundant packets to be transmitted over the packet network. In the case of one or a few multimedia streams being transmitted, such a technique may achieve the highest quality of all techniques. However, in the case of many multimedia streams transmitting redundant packets, congestion results, which leads to increased packet loss and lower quality. Furthermore, this technique has inherent implementation disadvantages, as packet-based decoding may require a huge buffer to ensure real-time operation.
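As a minimal sketch of the parity-based variant, the following example sends one XOR parity packet per group of data packets and recovers a single lost packet exactly. The grouping, the equal packet lengths, and the function names are assumptions made for illustration; Reed-Solomon codes, which can repair more than one loss per group, are not shown.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Redundant packet: XOR of every data packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Fill in at most one missing packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with all surviving packets yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)
restored = recover([b"abcd", None, b"ijkl"], parity)
```

Note that the receiver must hold the whole group before it can decode, and the parity packet adds one extra packet per group to the network load, which reflects the buffering and congestion costs described above.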
Such an FEC technique is separate from the source coding, i.e., the method falls in the class of separate or disjoint source and channel coding. When the client has a low-bandwidth connection to the packet network, that technique can also result in reduced quality, because precious bandwidth is wasted to transmit redundant packets. To remedy this, the concept of spatially disjoint source and channel coding was proposed by G. Schuster; see, for example, the paper from the IEEE Int. Conf. on Image Processing '98, "Spatially Disjoint Source Channel Coding: Taking Advantage of the Current Dial-up Architecture for Video Over the Internet", by G. Schuster, I. Sidhu, and M. Borella.
Thus, there continues to exist a need for an efficient approach to ensure high-quality multimedia streaming over high-bandwidth packet networks accessed via the highly reliable low-speed modem links.
It is an object of the present invention to provide a system and method for improving multimedia streaming of data from a source server to a client.
It is an additional object of the present invention to provide a method and system for streaming data from a server to a client in an improved reliable manner by employing reliability enhancements such as coding at network portions that can directly benefit, without unduly burdening network portions that are adequately inherently reliable.
The present invention employs spatially disjoint source and channel coding in an attempt to preserve precious bandwidth on the low-speed link for source coding. The present invention also employs channel coding, however, no redundant packets are transmitted. While a lost packet cannot be recovered exactly, a lower-quality version of the lost packet can be recovered if the packet is important. In the present invention, the joint source and channel coding are performed at spatially disjoint places.
The present invention provides an improved approach to streaming multimedia data from a server or source to a client while minimizing the redundant information transferred over the various networks. In the preferred embodiment, a server either having therein or having access to a multimedia data stream performs source coding on the real-time multimedia data stream. The source coding is performed to reduce the overall amount of data that must be transferred to the client, which eventually performs the source decoding. The server, prior to transmitting the source coded data, also performs channel coding on the source coded data. In the preferred and most widely anticipated network topology, the server is operably coupled to a packet-switched network such as the ubiquitous Internet. It is also known that a packet-switched network is a very lossy network in the sense that data packets may be lost or delayed. Therefore, the present invention also employs channel coding to aid in the reconstruction of any packets that are lost in the less reliable packet-switched network.
Since the real-time multimedia data is ultimately destined for a client that accesses the packet-switched network via a direct dialup network, the packet-switched network must interface with a gateway that is coupled to the circuit-switched network. A remote access gateway or concentrator (RAC) provides the transition between the packet-switched and circuit-switched networks. The RAC is comprised of a packet processor that evaluates the received packets to determine if they are all present, in order, etc., and attempts to recreate the multimedia data stream. In an effort to recreate the data stream, the packet processor performs channel decoding. The recreated multimedia data is therefore comprised of one of the following: data that was entirely present and timely received over the packet network at the RAC; a low-quality reconstruction of any missing packets from the redundant information provided in the channel coding process; or, when redundant information is not available, either because it too was lost in the packet network or because a less important packet was lost and no redundant information was ever channel coded for it, the output of an error concealment process that is invoked to bridge the unavailable multimedia data.
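The three outcomes handled by the packet processor can be illustrated with the following sketch. It is a simplified model under stated assumptions rather than the RAC implementation: the two dictionaries, the labels, and the choice of repeating the previous output frame as the error concealment step are hypothetical.

```python
from typing import Dict, List, Tuple


def rebuild_stream(
    primary: Dict[int, bytes],
    redundant: Dict[int, bytes],
    n_frames: int,
) -> List[Tuple[str, bytes]]:
    """Recreate the stream at the gateway before forwarding it to the modem link.

    `primary` holds frames that arrived intact and on time; `redundant` holds
    low-quality copies extracted from the channel coding of other packets.
    """
    out: List[Tuple[str, bytes]] = []
    last_output = b""
    for k in range(n_frames):
        if k in primary:
            source, payload = "received", primary[k]
        elif k in redundant:
            source, payload = "reconstructed", redundant[k]
        else:
            # Assumed concealment strategy: repeat the previous output frame.
            source, payload = "concealed", last_output
        out.append((source, payload))
        last_output = payload
    return out


stream = rebuild_stream(
    primary={0: b"F0", 3: b"F3"},
    redundant={1: b"f1"},
    n_frames=4,
)
```

Only the payloads would be forwarded over the circuit-switched link, so no channel coding redundancy travels past the gateway in this model.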
In the present invention, it is appreciated that for many applications, and in particular for high-bandwidth applications such as multimedia streaming, the bandwidth available over the traditionally lower-bandwidth but highly reliable circuit-switched link is very precious and should not be squandered through the transmission of unnecessary channel coding redundancy data. The method and system of the present invention utilize bandwidth much more efficiently, i.e., no redundant packets are sent over the low-speed modem link. By moving the channel decoding away from the client, this technique frees up bits on the low-speed, but highly reliable modem link for the source coding, which in turn results in a compressed video of higher quality.
Disjoint source and channel coding is efficient because packet error rates (packet loss) are not an issue for modems. The reliability of a modem connection is described in terms of bit error rates. Modem connections typically have bit error rates of about $10^{-6}$. If a packet contains N bits and if we assume that a single bit error renders the packet useless, then the probability of a packet loss over the modem link is $P = 1 - (1 - 10^{-6})^{N}$. As an example, if N = 1000, then P is approximately 0.1%. This is at least an order of magnitude smaller than the probability of a packet loss over the Internet.
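The arithmetic above can be checked with a short calculation. The function name is hypothetical, and the model assumes independent bit errors, as the formula in the text does.

```python
def packet_loss_probability(bit_error_rate: float, bits_per_packet: int) -> float:
    """Probability that at least one of the packet's bits is in error.

    Assumes independent bit errors and that any single bit error makes
    the whole packet unusable.
    """
    return 1.0 - (1.0 - bit_error_rate) ** bits_per_packet


# Bit error rate of 1e-6 and N = 1000 bits, as in the example above.
p_modem = packet_loss_probability(1e-6, 1000)
print(f"modem-link packet loss: {p_modem:.6f}")
```

For small bit error rates the result is close to N times the bit error rate, which is why 1000 bits at an error rate of one in a million gives roughly one lost packet in a thousand.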
These and other objects and features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.