1. Technical Field
The disclosed invention relates to compressed digital video delivery systems such as cable television (CATV), satellite television, Internet protocol television (IPTV) and Internet based video distribution systems. In particular, it relates to the use of a low-delay and layered codec and the corresponding low-delay transport, typically used for videoconferencing systems, in connection with digital video delivery systems to enable fast browsing of video content of multiple TV channels or video files while simultaneously watching one or more selected channels or video files. It is also concerned with the technology used in the endpoints of a digital video delivery system, such as a set-top-box or game console.
2. Background Art
Subject matter related to the present application can be found in co-pending U.S. patent application Ser. No. 12/015,956, filed and entitled “System And Method For Scalable And Low-Delay Videoconferencing Using Scalable Video Coding,” U.S. patent application Ser. No. 11/608,776, filed and entitled “Systems And Methods For Error Resilience And Random Access In Video Communication Systems,” U.S. patent application Ser. No. 11/682,263, filed and entitled “System And Method For Providing Error Resilience, Random Access And Rate Control In Scalable Video Communications,” and U.S. Pat. No. 7,593,032, filed and entitled “System And Method For A Conference Server Architecture For Low Delay And Distributed Conferencing Applications,” each of which is hereby incorporated by reference herein in its entirety.
Traditionally, TV programs are carried over CATV networks. CATV is one of the most popular broadband digital cable networks in Europe, Australia, America, and Asia. With a CATV system, many video channels are multiplexed on a single cable medium with very high bandwidth and distributed through dispersed cable head-end offices that serve a geographical area. The cable head-end of the CATV infrastructure simultaneously carries the digitized and encoded video of each and every channel, regardless of whether the user watches a channel or not.
Recently, IPTV, which transmits TV programs over packet networks, has gained significant momentum due to the ease with which it can deliver new services. One of the drawbacks of IPTV is the relatively narrow bandwidth of the user's access line. For example, a user's access line may be a telephone line employing asymmetric digital subscriber line (ADSL) or similar technologies, which have limited bandwidth available to deliver high quality video content. Sending all channels at the same time, as a CATV system does, is not practical in an IPTV system due to this lack of bandwidth. Furthermore, given the vast amount of video material available over the public Internet, it is practically impossible to deliver all video content of interest to the user simultaneously. In addition, IPTV may rely on the public Internet or a private IP network, which may have notable transport delays. On the other hand, while the CATV infrastructure is designed for broadcast TV systems, video on demand (VoD) and pay per view (PPV) services, which require a unicast transmission to a user's TV for “personalized TV” services, are ideally suited to IPTV.
Endpoints optimized for video conferencing have been disclosed, amongst other things, in co-pending U.S. patent application Ser. No. 12/015,956, incorporated herein by reference. IPTV endpoints share many commonalities with video conferencing endpoints relevant to this invention.
An IPTV endpoint comprises a set of devices and/or software located in the user's premises. One typical implementation of an IPTV endpoint comprises a network interface (for example, a DSL modem, a cable modem, or an ISDN T1 interface) connected to the Internet, a set-top-box device that connects via a local area network (for example, Ethernet) to the network interface, and a TV monitor. The set-top-box translates the data received from the Internet into a signal format the TV understands; traditionally, a combination of analog audio and video signals has been used, but recently all-digital interfaces (such as HDMI) have also become common. The set-top-box therefore typically comprises, on the TV side, analog or digital audio/video outputs and interfaces.
Internally, set-top-boxes have a hardware architecture similar to general purpose computers: a central processing unit (CPU) executes instructions stored in random access memory (RAM) or read-only memory (ROM), and utilizes interface hardware to connect to the network interface, to the audio/video output interface, and to a form of user control (e.g., a TV remote control, computer mouse, keyboard, or other similar user input device). Most set-top-boxes also comprise accelerator units (for example, dedicated digital signal processors (DSPs)) that help the CPU with the computationally complex tasks of video decoding and video processing. Those units are typically present for reasons of cost efficiency, rather than for technical necessity.
General purpose computers, such as personal computers (PCs), can often be configured to act like a set-top-box. In some cases, additional hardware needs to be added to the general purpose computer to provide the interfaces that a typical set-top-box contains, and/or additional accelerator hardware must be added to augment the CPU for video decoding and processing.
The operating system controlling the set-top-box typically offers services that can be used for the present invention, for example, receivers and transmitters according to certain protocols. The protocols of most interest here are those for the transmission of real-time application data: Internet Protocol (IP), User Datagram Protocol (UDP) and/or Transmission Control Protocol (TCP), and Real-time Transport Protocol (RTP). RTP receivers and transmitters are also commonly implemented in the application, rather than in the operating system. Most operating systems support the parallel or quasi-parallel use of more than one protocol receiver and/or transmitter.
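As an illustration of an RTP receiver implemented in the application, the following minimal sketch parses the 12-byte fixed RTP header defined in RFC 3550 from a datagram payload. The names `RtpHeader` and `parse_rtp_header` are hypothetical and chosen for this example only; header extensions, CSRC lists, and payload handling are deliberately left out.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the fixed RTP header (RFC 3550, section 5.1)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed header at the start of an RTP packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

In a receiver, the bytes passed to `parse_rtp_header` would typically come from a UDP socket; the sequence number and timestamp are what allow the application to detect loss and reorder packets before decoding.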
The term codec is used both for the techniques for encoding and decoding (or the description of such techniques) and for implementations of these techniques. A (media) encoder converts input media data into a bitstream or a packet stream, and a (media) decoder converts an input bitstream or packet stream into a media representation suitable for presentation to a user, for example digital or analog video ready for presentation through a monitor, or digital or analog audio ready for presentation through loudspeakers. Encoders and decoders can be dedicated hardware devices or building blocks of a software-based implementation running on a general purpose CPU.
It is possible to build set-top-boxes such that many encoders or decoders run in parallel or quasi-parallel. For hardware encoders or decoders, one easy way to support multiple encoders/decoders is to integrate multiple instances in the set-top-box. For software implementations, similar mechanisms can be employed. For example, in a multi-process operating system, multiple instances of encoder/decoder code can be run quasi-simultaneously.
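The software case can be sketched as follows, assuming one decoder instance per channel. The function `decode_stream` is a hypothetical stand-in that only counts bytes; a real implementation would call an actual decoder, and might use separate processes rather than the threads used here for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_stream(channel: str, packets: list[bytes]) -> tuple[str, int]:
    """Hypothetical stand-in for a decoder: counts the bytes it consumes."""
    return channel, sum(len(p) for p in packets)


def decode_all(streams: dict[str, list[bytes]]) -> dict[str, int]:
    """Run one decoder instance per channel, quasi-simultaneously."""
    # One worker per stream; at least one so an empty input is still valid.
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        futures = [pool.submit(decode_stream, ch, pk) for ch, pk in streams.items()]
        return dict(f.result() for f in futures)
```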
The basic approach to program navigation, i.e., successive channel skipping or “channel surfing,” was suitable in the early days of broadcast TV systems, when there were only a few channels. As the number of broadcast channels has increased to many hundreds, successive channel skipping has become more cumbersome and time consuming. Although several solutions, such as text based electronic program guides, have been proposed to alleviate this problem, they are not substitutes for the easy-to-use channel surfing experience of the older systems.
Increases in channel-change times have also made channel surfing more difficult. Digital video codecs, alternatively known as digital video coding/decoding techniques (e.g., MPEG-2, H-series codecs such as H.263 and H.264), and packet network delivery, have increased channel-change times primarily for the following two reasons:
(1) Transport Delays: These delays result from buffering by the decoder at the receiving end, which is necessary to alleviate the effects of: (a) bandwidth changes in the transport network (such as variable link bandwidths experienced in wireless networks); (b) delay jitter caused by varying queuing delays in transport network switches; and/or (c) packet loss in the network.
(2) Encoding Delays: To display a video, the decoder at the receiver, alternatively known as the receiver/receiving end or receiver/receiving application, must receive an I-frame, alternatively known as an intra-coded frame, from the encoder before the video can be decoded. The time interval between I-frames in an encoder is fixed (for example, 0.5 sec or more) to reduce the required coding bandwidth. Therefore, when a user changes a channel, it can take as long as 0.5 seconds or more before the receiver can decode the video. Furthermore, the encoders used in TV systems use “future frames” as well as “previous frames” as references to efficiently compress the current frame. As such, the decoder must wait for both the I-frame and the future reference frames to arrive so that the frames are generated in the correct sequence, causing inherent delays in the instant display of the video.
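The two contributions above can be combined into a rough estimate of channel-change time. The sketch below is a simplified model under stated assumptions, not a measurement: I-frames are assumed to arrive at exact multiples of a fixed interval, the receiver buffer is assumed to fill only after the I-frame arrives, and the extra wait for future reference frames is ignored. The function names and the example figures are hypothetical.

```python
def iframe_wait_ms(switch_ms: int, iframe_interval_ms: int) -> int:
    """Milliseconds from a channel switch until the next I-frame.

    Assumes I-frames arrive at t = 0, interval, 2 * interval, ...
    """
    if iframe_interval_ms <= 0:
        raise ValueError("I-frame interval must be positive")
    # Distance from the switch time up to the next multiple of the interval.
    return (-switch_ms) % iframe_interval_ms


def channel_change_delay_ms(switch_ms: int, iframe_interval_ms: int,
                            buffer_ms: int) -> int:
    """Encoding delay (I-frame wait) plus transport delay (buffering)."""
    return iframe_wait_ms(switch_ms, iframe_interval_ms) + buffer_ms


# Assumed figures: 500 ms I-frame interval, 200 ms receiver buffer,
# user switches 120 ms after the last I-frame.
print(channel_change_delay_ms(120, 500, 200))  # 380 ms wait + 200 ms buffer = 580
```

Under this model the I-frame wait alone ranges from 0 up to just under the full interval, which is why shortening or removing the fixed interval has a direct effect on channel-change time.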
While CATV and satellite TV systems suffer only from encoding delays, IPTV and other packet network-based video distribution systems also suffer from transport delays, which can be significantly longer. In the evolving IPTV environment, the channel-change time has become significantly longer, particularly when video channels are delivered over a best effort network such as the public Internet, where the network conditions are unpredictable.
In order to improve the channel surfing experience, significant changes are needed. In particular, an encoder is needed that: (a) generates a synchronization frame (i.e., the I-frame of the prior systems) without a fixed time delay; (b) employs a small number of future frames to minimize algorithmic delay; and (c) compensates for possible packet loss or insurmountable delay, rather than relying on receiving end buffering as the sole mechanism for error resilience. Because transport delays can significantly affect channel-change time, a generic video teleconferencing codec alone cannot completely eliminate the delay problems.
Traditional video codecs, for example H.261 and H.263 (used for person-to-person communication purposes such as videoconferencing) or MPEG-1 and MPEG-2 Main Profile (used in Video CDs and DVDs, respectively), are designed with single layer coding, which provides a single bitstream at a given bitrate. Some video codecs are designed without rate control, thus resulting in a variable bitrate stream (e.g., MPEG-2). However, video codecs used for communication purposes (e.g., H-series codecs) establish a target operating bitrate depending on the specific infrastructure. These video codec designs assume that the network is able to provide a constant bitrate due to a practically error-free channel between the sender and the receiver. The H-series codecs offer some additional features to increase robustness in the presence of channel errors but are still only tolerant to a small percentage of packet losses.
A limitation of single layer coding arises where a lower spatial resolution is required, such as for a smaller frame size. The full resolution signal must still be sent and decoded at the receiving end, thus wasting bandwidth and computational resources. However, support for lower resolutions is essential in a channel surfing application displaying several channels simultaneously, as one goal is to fit as many channels as possible, displayed in mini browsing windows (MBWs), into a specific screen area, and the MBWs are naturally of lower resolution than the main video program.
Layered codecs, alternatively known as layered coding or scalable codecs/coding, are media (for example, video) compression techniques that have been developed explicitly for heterogeneous environments. In such codecs, two or more layers are generated for a given source video signal: a base layer and at least one enhancement layer. The base layer offers a basic representation of the source signal at a reduced quality, which can be achieved, for example, by reducing the Signal-to-Noise Ratio (SNR) through coarse quantization, using a reduced spatial and/or temporal resolution, or a combination of these techniques. The base layer can be transmitted using a reliable channel, i.e., a channel with guaranteed or enhanced quality of service (QoS). Each enhancement layer increases the quality by increasing the SNR, spatial resolution, or temporal resolution, and can be transmitted with reduced or no QoS. In effect, a user is guaranteed to receive a signal with at least the minimum level of quality of the base layer signal.
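How a receiver or intermediate server might choose among layers can be sketched as follows. This is an illustrative model only: the `Layer` type, the `select_layers` function, and all bitrates are assumptions made for the example, and each enhancement layer is assumed to depend on every layer listed before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int


def select_layers(layers: list[Layer], budget_kbps: int) -> list[Layer]:
    """Pick the base layer plus as many successive enhancement layers as fit.

    layers[0] is the base layer. Selection stops at the first layer that
    would exceed the budget, because later layers depend on earlier ones.
    """
    chosen: list[Layer] = []
    used = 0
    for layer in layers:
        if used + layer.kbps > budget_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    return chosen


# Hypothetical bitrates for one channel.
channel = [Layer("base", 200), Layer("spatial", 600), Layer("snr", 1200)]

# A mini browsing window needs only the base layer; the main program
# takes every layer the access line can carry.
print([l.name for l in select_layers(channel, 250)])   # ['base']
print([l.name for l in select_layers(channel, 2000)])  # ['base', 'spatial', 'snr']
```

With a single layer codec the same low-resolution window would require the full 2000 kbps stream in this example; with layers, only the 200 kbps base layer is sent.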
Accordingly, there exists a need in the art for techniques for transmitting audio-visual signals using a low-delay, layered codec and the corresponding low-delay transport to enable customized display and fast channel surfing.