Interactive television services provide a television viewer the ability to interact with their television. Such services have been used, for example, to provide navigable menuing systems and ordering systems that are used to implement electronic program guides and on-demand and pay-per-view program reservations without the need to call a television provider. These services typically employ an application that is executed on a server located remotely from the viewer. Such servers may be, for example, located at a cable television headend. The output of the application is streamed to the viewer, typically in the form of an audiovisual MPEG Transport Stream. This allows the stream to be displayed on virtually any client device that has MPEG decoding capabilities, including a television set top box. The client device allows the user to interact with the remote application by capturing keystrokes and passing these back to the application.
The client and the server are, in cable deployments, separated by a managed digital cable-TV network that uses well-known protocols such as ATSC or DVB-C. Here, ‘managed’ means that any bandwidth resources required to provide these services may be reserved prior to use. Once resources are allocated, the bandwidth is guaranteed to be available, and the viewer is assured of receiving a high-quality interactive application experience.
In recent years, audio-visual consumer electronics devices increasingly support a Local Area Network (LAN) connection, giving rise to a new class of client devices: so-called “Broadband Connected Devices”, or BCDs. These devices may be used in systems other than the traditional cable television space, such as on the Internet. For example, consider FIG. 1, in which a client device 110 (such as a Blu-ray player) implements a client application 112 to deliver audiovisual applications streamed over a public data network 120 from an audiovisual application streaming server 130 to a television 140. A user may employ a remote control 142 in conjunction with the client device 110 to transmit interactive commands back to the application streaming server 130, thereby controlling the content interactively.
However, because public data networks are not managed in the same way that private cable systems are, challenges arise. The protocols that are commonly used on the open Internet (such as the Transmission Control Protocol (TCP) or RTSP) do not support bandwidth reservation. Since bandwidth cannot be guaranteed, the application server is not assured that the network connection can deliver the requested bandwidth. The actual throughput of an Internet connection can vary from second to second depending on many factors, including: network congestion anywhere between the application server and the client device; high-throughput downloads or uploads sharing the same physical Internet connection as the client device (e.g., an ADSL line); mechanisms at lower (data link) layers that introduce delay, for example Automatic Repeat reQuest (ARQ) mechanisms in (wireless) access protocols; lost packets at any link between the client and the server; TCP state, and more specifically the TCP congestion window size; and reordering of packets caused by any link between the client and the server. To the server streaming the data to the client device, these factors all manifest themselves as fluctuations in actual achieved throughput. Small fluctuations can be addressed by using sufficient buffering; however, buffering causes larger end-to-end delays (the time between the moment a user presses a remote control button and the moment that the screen update resulting from the key press has been rendered on the user's screen). Delays as short as five seconds may result in an unpleasant viewer experience in some applications such as an electronic program guide, while delays of even one-half second may be extremely noticeable in high-performance gaming applications. Further, such buffering cannot compensate for large fluctuations in throughput.
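To make the buffering trade-off concrete, the following minimal Python sketch models only the client buffer. It assumes throughput is sampled in fixed 10 ms ticks and expressed as a fraction of the stream's real-time rate (1.0 means exactly one tick's worth of playback arrives per tick); the function name `first_underrun` and the tick size are illustrative assumptions, not part of the described system.

```python
def first_underrun(prebuffer_ms, throughput, tick_ms=10):
    """Return the index of the first tick at which the client buffer underruns,
    or None if the prebuffered data absorb every dip in `throughput`.

    prebuffer_ms: playback time buffered before playout starts; this is also
                  the extra end-to-end delay that the buffer adds.
    throughput:   per-tick delivery rate as a fraction of the real-time rate
                  (hypothetical units chosen for this sketch).
    """
    level_ms = prebuffer_ms
    for i, rate in enumerate(throughput):
        level_ms += rate * tick_ms  # playback time delivered during this tick
        level_ms -= tick_ms         # playback time consumed during this tick
        if level_ms < 0:
            return i
    return None


# A 30 ms outage is absorbed by a 50 ms buffer...
print(first_underrun(50, [0.0] * 3 + [1.0] * 10))  # None
# ...but a 200 ms outage is not.
print(first_underrun(50, [0.0] * 20))              # 5
# A 250 ms buffer survives the 200 ms outage, at the cost of 250 ms of added delay.
print(first_underrun(250, [0.0] * 20))             # None
```

The sketch illustrates both halves of the argument: a larger prebuffer tolerates longer dips, but every millisecond of prebuffer is also a millisecond of added response time.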
FIGS. 2-4 illustrate an example of the type of end-to-end playback latency in a typical network system, such as that of FIG. 1, during a transient network outage. There are three sources of playback latency: a server buffer that represents a source of pre-transmission latency; a network buffer that represents transmission latency in the public data network; and a client buffer that represents post-transmission latency in the client device before the audiovisual data are shown. Because of these sources of latency, at a time T1, as the application streaming server 130 generates data for display, the client device 110 is displaying data generated at an earlier time T0. The data that have been generated but not yet viewed are distributed in the three buffers awaiting display. The data themselves are visually represented and discussed in terms of video frames for ease of understanding.
FIG. 2 shows the system operating normally at time T1, just before a network outage occurs between the public data network 120 and the client device 110. FIG. 3 shows the system at a time T2 that is 200 ms later, at the end of the network outage. FIG. 4 shows the system at a time T3 that is another 30 ms later (that is, 230 ms after the start of the outage), after the network has had a chance to transmit some of its buffered data to the client device. These figures are now described in more detail.
More particularly, FIG. 2 shows a server buffer, a network buffer, and a client buffer at a time T1. This network is operating in equilibrium: on average, application server 130 generates one frame of video data in the length of time that each frame of video is displayed on the client device 110 (typically, 1/30 of a second). There are 180 ms of buffered playout data in this figure: 50 ms in the server buffer, 80 ms in the network buffer, and 50 ms in the client buffer. To be even more specific, the 50 ms of data in the server buffer represent data generated in the 50 ms prior to time T1. Thus, the server buffer contains data spanning the playback range (T1−50 ms, T1), and the first frame of data in the server buffer was generated at time T1−50 ms, as indicated. The data in the network buffer were generated over the 80 ms prior, and therefore span the playback range (T1−130 ms, T1−50 ms). The data in the client buffer were generated over the 50 ms prior, and span the playback range (T1−180 ms, T1−130 ms). Therefore, the display device 140 is playing out the video frame for T0=T1−180 ms from the top of the client buffer. Assuming that the system continues operating with these latencies, and assuming that the application server can generate a frame instantly in response to user input, a keystroke entered using remote control 142 at time T1 will cause a visible reaction on the display device 140 at time T1+180 ms. That is, the keystroke will have a visible effect as soon as the buffered frames have emptied out of the three buffers onto the display, and the new frame can be displayed. Thus, the system as shown has a response time of 180 ms, just under two tenths of a second. This delay is barely noticeable for an electronic program guide application.
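The playback ranges above follow mechanically from the buffer durations: the server buffer holds the newest data, ending at the current time, and each buffer downstream holds the span immediately preceding it. A minimal Python sketch of that bookkeeping, assuming contiguous buffers and times expressed in milliseconds relative to T1 (the function name `playback_ranges` is hypothetical):

```python
def playback_ranges(now_ms, server_ms, network_ms, client_ms):
    """Return the (start, end) playback range held by each buffer.

    Assumes the three buffers hold contiguous data: the server buffer ends at
    the current time, the network buffer ends where the server buffer begins,
    and the client buffer ends where the network buffer begins.
    """
    server = (now_ms - server_ms, now_ms)
    network = (server[0] - network_ms, server[0])
    client = (network[0] - client_ms, network[0])
    return {"server": server, "network": network, "client": client}


# FIG. 2 at time T1 (now_ms = 0 means "T1 + 0 ms").
ranges = playback_ranges(0, server_ms=50, network_ms=80, client_ms=50)
print(ranges)
# {'server': (-50, 0), 'network': (-130, -50), 'client': (-180, -130)}

# The frame being displayed is the oldest one in the client buffer, so the
# end-to-end latency is the current time minus that frame's timestamp.
latency_ms = 0 - ranges["client"][0]
print(latency_ms)  # 180
```

Called with `now_ms=230` and durations of 120, 190, and 50 ms, the same function reproduces the ranges of FIG. 4.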
Continuing the example, suppose a network outage between the network buffer and the client occurs immediately after time T1 and lasts for 200 ms. At the end of the outage, the buffers may appear as in FIG. 3. Here, the client has drained its 50 ms of data, and playout is paused at T1−130 ms. It has been paused there for 150 ms (i.e., the 200 ms outage less the 50 ms of buffered data that it was able to play out). Meanwhile, the server has generated 200 ms of additional audiovisual data for playback. Based on the particular bandwidths available in the network during the outage, only 110 ms of playback have been sent to the network. Thus, the network buffer has 190 ms of stored data: 110 ms of new data, plus the 80 ms that it had at the beginning of the outage. No data have been sent from the network buffer to the client buffer, so the network buffer has data for 190 ms of playback in the range (T1−130 ms, T1+60 ms). In FIG. 2, the server buffer had data that began at T1−50 ms. In the intervening 200 ms, the server generated 200 ms of data but sent only 110 ms to the network, so its buffer grew by a net 90 ms. Added to the 50 ms it held at time T1, this gives the server buffer 140 ms of playback data. These data span the range (T1+60 ms, T1+200 ms).
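The accounting in FIG. 3 is a conservation rule: each buffer grows by what flows into it and shrinks by what flows out, while the server adds newly generated playback in real time. A minimal Python sketch, in which `BufferState` and `advance` are hypothetical names and all quantities are milliseconds of playback:

```python
from dataclasses import dataclass


@dataclass
class BufferState:
    server_ms: int
    network_ms: int
    client_ms: int


def advance(state, generated_ms, server_to_network_ms,
            network_to_client_ms, played_ms):
    """Apply one interval of flows to the three-buffer model.

    Every quantity is playback time; no buffer may go negative.
    """
    new = BufferState(
        server_ms=state.server_ms + generated_ms - server_to_network_ms,
        network_ms=state.network_ms + server_to_network_ms - network_to_client_ms,
        client_ms=state.client_ms + network_to_client_ms - played_ms,
    )
    if min(new.server_ms, new.network_ms, new.client_ms) < 0:
        raise ValueError("flows would drive a buffer below zero")
    return new


fig2 = BufferState(server_ms=50, network_ms=80, client_ms=50)
# 200 ms outage: the server generates 200 ms and forwards 110 ms; nothing
# reaches the client, which plays out its 50 ms and then stalls.
fig3 = advance(fig2, generated_ms=200, server_to_network_ms=110,
               network_to_client_ms=0, played_ms=50)
print(fig3)  # BufferState(server_ms=140, network_ms=190, client_ms=0)
print(fig3.server_ms + fig3.network_ms)  # 330 (non-client buffering, ms)
```

The same rule also covers the 30 ms that follow the outage: generating 30 ms, forwarding 50 ms to the network and 50 ms to the client, and playing nothing yields 120, 190, and 50 ms.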
The 200 ms of playback generated by the server during the outage have therefore all been buffered upstream of the client. They are split between the server buffer (a 90 ms increase) and the network buffer (a 110 ms increase). The total non-client buffering has increased from only 130 ms (about ⅛ of a second) to 330 ms (about ⅓ of a second).
Thus, after the outage has been resolved, an additional 200 ms of data will be buffered upstream of the client. This can be seen in FIG. 4, which corresponds to the state of the system 30 ms after the outage. In these 30 ms, the network provided enough bandwidth to the client to transfer 50 ms of playback data, which are seen in the client buffer. These data span the range (T1−130 ms, T1−80 ms). The client has just received enough data to safely resume playback, so playback resumes at T1−130 ms. Looking at the network buffer, 50 ms of playout data have been sent to the client, but 50 ms of playout data have been received from the server, so the network buffer still has 190 ms of data, now spanning the range (T1−80 ms, T1+110 ms). Meanwhile, the server has generated an additional 30 ms of data and transmitted 50 ms of data to the network, so the server buffer has 120 ms of data spanning the range (T1+110 ms, T1+230 ms).
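Whether the client stalls, and for how long, depends on its rebuffering rule. The text says only that the client resumes once it has enough data to safely resume playback; the following minimal Python sketch assumes a hypothetical rule in which the client stalls when it cannot play a full 10 ms tick and resumes once it holds at least 50 ms (the threshold, tick size, and the name `simulate_client` are all assumptions). Fed with arrivals matching FIGS. 2-4, it ends with a 50 ms client buffer at the moment playback resumes.

```python
def simulate_client(level_ms, arrivals_ms, tick_ms=10, resume_ms=50):
    """Simulate client playout tick by tick.

    level_ms:    playback time initially buffered (the client starts playing).
    arrivals_ms: playback time delivered to the client in each tick.
    resume_ms:   hypothetical threshold for leaving the stalled state.
    Returns (total stall time, final buffer level, playing at the end?).
    """
    playing = True
    stalled_ms = 0
    for arrived in arrivals_ms:
        level_ms += arrived
        if playing and level_ms >= tick_ms:
            level_ms -= tick_ms  # play one full tick of video
        else:
            playing = False      # cannot play a full tick: playout freezes
            stalled_ms += tick_ms
        if not playing and level_ms >= resume_ms:
            playing = True       # enough data buffered to resume safely
    return stalled_ms, level_ms, playing


# 200 ms outage (no arrivals), then 50 ms of data over the next 30 ms.
arrivals = [0] * 20 + [20, 20, 10]
print(simulate_client(50, arrivals))  # (180, 50, True)
```

Under these assumptions, the 180 ms of stall time equals the growth in end-to-end latency between FIG. 2 and FIG. 4: wall-clock time advanced 230 ms while the displayed timestamp advanced only 50 ms (from T1−180 ms to T1−130 ms).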
From these figures it is clear that a buffer underrun at the client can lead to playback latency buildup. The system of FIG. 2 had 180 ms of end-to-end delay, but by the time playback resumed in FIG. 4, an additional 180 ms of delay had been introduced into the system: the 150 ms for which playout was stalled during the outage, plus the 30 ms spent refilling the client buffer. Thus, in FIG. 4, there are 180 ms+180 ms=360 ms of total latency in the system, distributed among the three buffers (120 ms+190 ms+50 ms). This playback latency buildup occurs for each client buffer underrun, and such buildups are cumulative. This is a highly undesirable situation for interactive applications.
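Because the server keeps generating frames in real time while the client's displayed timestamp is frozen, every stall adds its full duration to the end-to-end latency, and nothing in the system described so far removes it again. A minimal Python sketch of that accumulation (the function name and the repeated-outage scenario are illustrative assumptions; `initial=` requires Python 3.8 or later):

```python
from itertools import accumulate


def latency_after_stalls(initial_latency_ms, stall_durations_ms):
    """Return the end-to-end latency after each successive client stall.

    Latency is wall-clock time minus the timestamp of the displayed frame.
    Both advance together during normal playout, but only wall-clock time
    advances during a stall, so each stall adds its duration permanently.
    """
    return list(accumulate(stall_durations_ms, initial=initial_latency_ms))[1:]


# The single underrun of FIGS. 2-4 stalls playout for 150 + 30 = 180 ms.
print(latency_after_stalls(180, [180]))            # [360]
# Three such outages in a row.
print(latency_after_stalls(180, [180, 180, 180]))  # [360, 540, 720]
```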
The prior art does not adequately solve this problem. The client cannot simply skip individual frames because typical encoding schemes, such as MPEG, may encode each frame based on the data contained in previous and subsequent frames. The client could skip to its next intracoded frame, but these frames may be infrequent, and in any event such a strategy might be jarring for the viewer watching the stream. The server cannot pause frame generation, since it has no indication of the playout problems at the client. A new approach is therefore needed.
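The cost of the skip-to-intra-frame strategy can be estimated from the group-of-pictures (GOP) structure. A minimal Python sketch, assuming frame types are given as a string of 'I', 'P', and 'B' characters displayed at 30 frames per second; it deliberately ignores the fact that B-frames following an I-frame may still reference earlier frames, and `frames_to_skip` and the 15-frame GOP are hypothetical choices for illustration:

```python
FRAME_MS = 1000 / 30  # display time of one frame at 30 frames per second


def frames_to_skip(frame_types, position):
    """Return how many frames the client must discard, starting at `position`,
    before it reaches an intra-coded ('I') frame it can decode on its own.

    Returns None if no intra-coded frame follows.
    """
    next_intra = frame_types.find("I", position)
    return None if next_intra == -1 else next_intra - position


# Hypothetical stream with one I-frame every 15 frames.
stream = ("I" + "BBP" * 4 + "BB") * 3
skipped = frames_to_skip(stream, 1)
print(skipped)                    # 14
print(round(skipped * FRAME_MS))  # 467 (ms of video jumped over)
```

Even with this fairly short GOP, resynchronizing just after an I-frame discards almost half a second of video, which is the visible jump the text describes.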