1. Technical Field
The disclosed invention relates to compressed digital video distribution systems such as cable television (CATV), satellite television, Internet protocol television (IPTV) and Internet-based video distribution systems. In particular, it relates to digital video distribution systems to enable fast browsing of video content of multiple TV channels or video files while simultaneously watching one or more selected TV channels or video files. It is also concerned with the technology used in the endpoints of a digital video distribution system, such as a set-top-box or game console.
2. Background Art
Subject matter related to the present application can be found in co-pending U.S. patent application Ser. Nos. 12/015,956, filed Jan. 17, 2008 and entitled “System And Method For Scalable And Low-Delay Videoconferencing Using Scalable Video Coding,” 11/608,776, filed Dec. 8, 2006 and entitled “Systems And Methods For Error Resilience And Random Access In Video Communication Systems,” and 11/682,263, filed Mar. 5, 2007 and entitled “System And Method For Providing Error Resilience, Random Access And Rate Control In Scalable Video Communications,” and U.S. Pat. No. 7,593,032, filed Jan. 17, 2008 and entitled “System And Method For A Conference Server Architecture For Low Delay And Distributed Conferencing Applications,” each of which is hereby incorporated by reference herein in its entirety.
Traditionally, TV programs are often carried over CATV networks. CATV is one of the most popular broadband digital cable networks in Europe, Australia, America, and Asia. With a CATV system, many video channels are multiplexed on a single cable medium with very high bandwidth and distributed through dispersed cable head-end offices that serve a geographical area. The cable head-end of the CATV infrastructure simultaneously carries the digitized and encoded video of each and every channel, regardless of whether the user watches a channel or not.
Recently, IPTV, which transmits TV programs over packet networks, has gained significant momentum because it makes new services easy to deliver. One of the drawbacks of IPTV is the relatively narrow bandwidth of the user's access line. For example, a user's access line may be a telephone line employing asymmetric digital subscriber line (ADSL) or similar technologies, which have limited bandwidth available to deliver high quality video content. Sending a large number of programs at the same time is therefore not practical in an IPTV system. Furthermore, given the vast amount of video material available over the Internet, it is practically impossible to deliver all video content of interest to the user simultaneously. In addition, IPTV may rely on the public Internet or a private IP network, which may have notable transport delays. On the other hand, while the CATV infrastructure is designed for broadcast TV systems, video on demand (VoD) and pay per view (PPV) services, which require a unicast transmission to a user's TV for “personalized TV” services, are ideally suited to IPTV.
Endpoints designed for video conferencing have been disclosed, amongst other things, in co-pending U.S. patent application Ser. No. 12/015,956, incorporated herein by reference. Video distribution, e.g., IPTV, endpoints share many commonalities with video conferencing endpoints relevant to this invention.
Referring to FIG. 1, a typical endpoint (101) includes a set of devices and/or software that is located at the user's premises. One typical endpoint includes a network interface (102) (for example, a DSL modem, a cable modem, or an ISDN T1 interface) connected to a network (103) (for example, the Internet or another private or public IP network), a computer (104) (for example, a set-top box, game console, personal computer or another type of computer) that connects via a local area network (105) (for example, Ethernet) to the network interface (102), a video display (106) (for example, a TV or computer monitor), and an audio output (for example, a set of loudspeakers). The set-top-box translates the data received from the Internet into a signal format the TV understands; traditionally, a combination of analog audio and video signals is used, but recently all-digital interfaces (such as HDMI) have become common. The set-top-box therefore typically includes analog or digital audio/video outputs and interfaces. Both the TV monitor and the set-top-box are typically controlled by an input device (107), alternatively known as a pointing device (for example, a remote control, computer mouse, keyboard, or another input device). However, most prior art set-top-boxes lack media input devices, such as a camera or microphone, that are common to videoconference endpoints.
As depicted in FIG. 2, a set-top-box (200) has a hardware architecture similar to a general purpose computer: a central processing unit (CPU) (201) executes instructions stored in Random Access Memory (RAM) (202) and/or read-only-memory (ROM) (203), and utilizes interface hardware to connect to the network interface (204), the audio/video output interface (205), and the user interface (206) (which is connected to a user input device (207), for example, a remote control). All these components are under the control of the CPU. A typical set-top-box also includes an accelerator unit (208) (for example, a dedicated Digital Signal Processor (DSP)) that helps the CPU (201) with computationally complex tasks, such as video decoding and video processing. An accelerator unit (208) is typically present for reasons of cost efficiency, rather than for technical necessity. That is, a much faster CPU can often substitute for an accelerator or DSP, but such CPUs (and their required infrastructure such as power supplies and faster memory) may be more expensive than dedicated accelerator units.
General purpose computers, such as Personal Computers (PCs), can often be configured to act like a set-top-box. In some cases, additional hardware can be added to the general purpose computer to provide the interfaces a typical set-top-box contains, and/or additional accelerator hardware can be added to augment the CPU for video decoding and processing.
The operating system controlling the set-top-box typically offers services that applications can use, for example, receivers and transmitters for certain protocols. The protocols of most interest here are those for the transmission of real-time application data: Internet Protocol (IP), User Datagram Protocol (UDP) and/or Transmission Control Protocol (TCP), and Real-time Transport Protocol (RTP). RTP receivers and transmitters can also be implemented in the application, rather than in the operating system. Most operating systems support the parallel or quasi-parallel use of more than one protocol receiver and/or transmitter.
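By way of illustration only, an application-level RTP receiver could read the 12-byte fixed RTP header (as laid out in RFC 3550) from a received UDP payload roughly as follows. The names `RtpHeader` and `parse_rtp_header` are hypothetical and are not taken from any system described herein; header extensions and CSRC lists are not handled in this sketch.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the 12-byte fixed RTP header (hypothetical helper type)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header from the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, a 16-bit sequence number,
    # a 32-bit timestamp and a 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

A receiver of this kind would typically use the sequence number to detect loss and reordering, and the timestamp to schedule playout.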
The term “codec” is used both for techniques for encoding and decoding and for implementations of these techniques. A (media) encoder converts input media data into a bitstream or a packet stream, and a (media) decoder converts an input bitstream or packet stream into a media representation suitable for presentation to a user (for example, digital or analog video for presentation on a video display, or digital or analog audio for presentation through loudspeakers). Encoders and decoders can be dedicated hardware devices or building blocks of a software-based implementation running on a general purpose CPU and/or an associated accelerator unit.
Set-top-boxes can be constructed such that many encoders or decoders run in parallel or quasi-parallel. For hardware encoders or decoders, one easy way to support multiple encoders/decoders is to integrate multiple instances in the set-top-box. For software implementations, similar mechanisms can be employed. For example, in a multi-process operating system, multiple instances of encoder/decoder code can be run quasi-simultaneously.
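As a minimal sketch of the software case, several decoder instances could be run quasi-simultaneously, one per channel, using a thread pool. The function `decode_stream` is a hypothetical stand-in that merely counts bytes; an actual decoder would typically run in native code or on an accelerator unit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def decode_stream(channel_id: int, packets: List[bytes]) -> Tuple[int, int]:
    """Hypothetical stand-in for one decoder instance.

    It only counts the bytes it was given; a real decoder would
    produce pictures for display.
    """
    return channel_id, sum(len(p) for p in packets)


def decode_channels(streams: Dict[int, List[bytes]]) -> Dict[int, int]:
    """Run one decoder instance per channel, quasi-simultaneously."""
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        futures = [
            pool.submit(decode_stream, channel_id, packets)
            for channel_id, packets in streams.items()
        ]
        # Collect (channel_id, decoded_bytes) pairs once each instance finishes.
        return dict(f.result() for f in futures)
```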
A basic approach to program navigation, i.e., successive channel skipping or “channel surfing,” was suitable in the early days of broadcast TV systems, where there were only a few channels. As the number of broadcasting channels increased to many hundreds, successive channel skipping has become more cumbersome and time consuming. Although several proposed solutions, such as text-based electronic program guides, have been offered to alleviate this problem, they are not substitutes for the easy-to-use channel surfing experience of the older systems.
Increases in channel-change times have made channel surfing more difficult. Digital video codecs, alternatively known as digital video coding/decoding techniques (e.g., MPEG-2, H-series codecs such as H.263 and H.264), in conjunction with packet network delivery, have increased channel-change times to several hundred milliseconds or even seconds in many cases, for at least the following two reasons:
(1) Transport Delays: These delays result from buffering by the decoder at the receiving end, i.e., the endpoint, which is necessary to alleviate the effects of: (a) bandwidth changes in the transport network (such as variable link bandwidths experienced in wireless networks); (b) delay jitter caused by varying queuing delays in transport network switches; and/or (c) packet loss in the network.
(2) Encoding Delays: To display a video, the decoder at the endpoint, alternatively known as the receiver, receiver/receiving end, or receiver/receiving application, must receive an I-frame, alternatively known as an intra-coded frame, from the encoder before a video can be decoded. The temporal interval between I-frames in an encoder is in most prior art systems fixed (for example, 0.5 sec or more in most CATV systems) to reduce the required coding bandwidth. Therefore, when a user changes a channel, it can take as long as 0.5 seconds or more before the receiver can decode the video. Furthermore, it is well known that increasing the interval between I-frames improves the coding efficiency. As a result, many IPTV service providers trade channel change times for better picture quality, with the result that channel change times of several seconds are not uncommon in deployed IPTV systems.
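The combined effect of the two delay sources can be illustrated with a simple additive model. The sketch below assumes that I-frames occur at fixed multiples of the I-frame interval and that the receiver buffering delay simply adds to the wait for the next I-frame; both assumptions, and all numeric values, are hypothetical simplifications for illustration.

```python
def iframe_wait(switch_time: float, iframe_interval: float) -> float:
    """Seconds from the channel switch until the next I-frame arrives.

    Assumes I-frames are sent at times 0, T, 2T, ... for interval T.
    """
    if iframe_interval <= 0:
        raise ValueError("iframe_interval must be positive")
    remainder = switch_time % iframe_interval
    return 0.0 if remainder == 0 else iframe_interval - remainder


def channel_change_time(switch_time: float,
                        iframe_interval: float,
                        buffer_delay: float) -> float:
    """Simple additive model: wait for an I-frame, then fill the buffer."""
    return iframe_wait(switch_time, iframe_interval) + buffer_delay


# Switching 0.125 s after an I-frame, with I-frames every 0.5 s and
# 0.25 s of receiver buffering (all values are assumed for illustration).
print(channel_change_time(0.125, 0.5, 0.25))
```

Under this model the wait for an I-frame approaches the full I-frame interval in the worst case, which is why lengthening the interval for coding efficiency directly lengthens channel change times.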
While CATV and satellite TV systems suffer only from encoding delays, IPTV and other packet network-based video distribution systems also suffer from transport delays, which can be significantly longer. In the evolving IPTV environment, the channel change time has become significantly longer, particularly when video channels are delivered over a best effort network such as the public Internet, where the network conditions can be highly unpredictable.
In order to improve the channel surfing experience, significant changes are needed. In particular, an encoder is needed that: (a) generates a synchronization frame (i.e., the I-frame of the prior systems) only when needed (that is, not necessarily in a fixed time interval); (b) employs no or only a small number of future frames to minimize algorithmic delay; and (c) compensates for possible packet loss or insurmountable delay, rather than relying on receiving end buffering and error mitigation as the sole mechanism for error resilience. Because transport delays can significantly affect channel-change time, even a generic video teleconferencing codec (which normally implements all aforementioned features) cannot completely eliminate the delay problems.
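Requirement (a) can be sketched as an encoder that emits a synchronization frame only when one has been requested, for example when a new receiver tunes in, and predicted frames otherwise. The class `OnDemandSyncEncoder` and its string output are hypothetical placeholders for illustration; no actual compression is performed.

```python
class OnDemandSyncEncoder:
    """Hypothetical encoder that sends I-frames on request, not on a timer."""

    def __init__(self) -> None:
        # The very first frame must be decodable on its own.
        self._sync_requested = True

    def request_sync(self) -> None:
        """Ask for the next encoded frame to be a synchronization frame."""
        self._sync_requested = True

    def encode(self, frame_index: int) -> str:
        """Return a label for the encoded frame: 'I<n>' or 'P<n>'."""
        frame_type = "I" if self._sync_requested else "P"
        self._sync_requested = False
        return f"{frame_type}{frame_index}"


encoder = OnDemandSyncEncoder()
labels = [encoder.encode(0), encoder.encode(1)]
encoder.request_sync()  # e.g., a viewer has just switched to this channel
labels += [encoder.encode(2), encoder.encode(3)]
print(labels)
```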
Traditional video codecs, for example H.261 and H.263 (used for person-to-person communication purposes such as videoconferencing) or MPEG-1 and MPEG-2 Main Profile (used in Video CDs and DVDs, respectively), are designed with single layer coding, which provides a single bitstream at a given bitrate. Depending on the application, that bitrate can be either fixed, or variable and dictated by the media content. That is, the more complex a scene becomes, the higher the bitrate that is generated.
A limitation of single layer coding arises where the final rendering on the screen requires a lower spatial resolution than the one typically used for full-screen video reproduction (such as in TV). The full resolution signal must be sent and decoded at the receiving end, and its spatial resolution must then be reduced to fit the lower required resolution, which wastes both bandwidth and computational resources. However, support for lower resolutions is essential in a channel surfing application displaying several channels simultaneously, as one goal is to fit as many channels, displayed in mini browsing windows (MBWs), as possible into a specific screen area, so the MBWs are naturally of lower resolution than the main video program.
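The scale of this waste can be illustrated with simple arithmetic. The sketch below compares the number of pixels decoded per picture at full resolution with the number actually shown in an MBW; the resolutions of 1920x1080 and 320x180 are assumed purely for illustration and pixel count is used only as a rough proxy for decoding effort.

```python
from typing import Tuple


def pixels(width: int, height: int) -> int:
    """Number of pixels in one picture."""
    return width * height


def decode_overhead(full: Tuple[int, int], mbw: Tuple[int, int]) -> float:
    """Ratio of pixels decoded at full resolution to pixels shown in an MBW."""
    return pixels(*full) / pixels(*mbw)


FULL_RESOLUTION = (1920, 1080)  # assumed main-program resolution
MBW_RESOLUTION = (320, 180)     # assumed mini browsing window resolution

print(decode_overhead(FULL_RESOLUTION, MBW_RESOLUTION))
```

With these assumed figures, each MBW fed by a single layer stream requires decoding many times more pixels than are ultimately displayed, and the cost is multiplied by the number of MBWs on screen.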
Layered video codecs, alternatively known as layered or scalable codecs/coding, are video compression techniques that have been developed explicitly for heterogeneous environments. In such codecs, two or more layers are generated for a given source video signal: a base layer and at least one enhancement layer. The base layer offers a basic representation of the source signal at a reduced quality, which can be achieved, for example, by reducing the Signal-to-Noise Ratio (SNR) through coarse quantization, using a reduced spatial and/or temporal resolution, or a combination of these techniques. The base layer can advantageously be transmitted using a reliable channel, i.e., a channel with guaranteed or enhanced quality of service (QoS). Each enhancement layer increases the quality by increasing the SNR, spatial resolution, or temporal resolution, and can often be transmitted with reduced QoS or best effort. In effect, a user is guaranteed to receive a signal with at least a minimum level of quality of the base layer signal.
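The layered structure can be sketched as a selection of layers for a given bandwidth budget: the base layer is always kept, and enhancement layers are added in order while they fit. The sketch assumes a simple linear dependency in which each enhancement layer depends on all lower layers; the type `Layer`, the function `select_layers` and the bitrates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Layer:
    """One layer of a layered bitstream (hypothetical helper type)."""
    name: str
    kbps: int


def select_layers(layers: Sequence[Layer], budget_kbps: int) -> List[Layer]:
    """Pick the base layer plus as many enhancement layers as fit the budget.

    `layers` must be ordered base layer first. The base layer is always
    selected, reflecting that it is sent over the reliable channel.
    """
    if not layers:
        raise ValueError("at least a base layer is required")
    selected = [layers[0]]
    used = layers[0].kbps
    for layer in layers[1:]:
        if used + layer.kbps > budget_kbps:
            break  # higher layers depend on this one, so stop here
        selected.append(layer)
        used += layer.kbps
    return selected


stream = [Layer("base", 300), Layer("enh1", 700), Layer("enh2", 2000)]
print([layer.name for layer in select_layers(stream, 1200)])
```

In a channel surfing application, a small budget of this kind could correspond to an MBW that needs only the base layer, while the main video program would receive the base layer together with its enhancement layers.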