With the evolution of packet-based communications and the Internet, virtually all types of media may be delivered to end users in real time over the Internet. The most recent trend is to deliver broadcast television services to end users over the Internet in an effort to compete with over-the-air (OTA), cable, and satellite television services. Providing television services that have numerous channels over the Internet is generally referred to as Internet Protocol Television (IPTV). As with other services delivered over the Internet, such as telephony, end users expect the quality and user experience to match those of legacy systems. With OTA, cable, and satellite television services, end users may change from one television channel to another with little, if any, perceptible delay. One of the major issues with providing television services over the Internet is the delay incurred when the end user changes from one channel to another. If the interval between an end user pressing a button to change channels and the content of the new channel appearing on the screen is much longer for IPTV services than for legacy systems, the end user will likely not be satisfied with the IPTV services. There are many sources of channel change delay in IPTV systems.
One significant source of delay is the manner in which the media content is encoded in IPTV systems that employ MPEG (Moving Picture Experts Group) encoding, such as MPEG-2 or MPEG-4, for audio and video content. Video content is generally broken into a sequence of still frames, which are displayed in rapid succession to create the illusion of smooth motion. There are three types of frames used in MPEG compression: I frames (intra-coded frames), P frames (predictive frames), and B frames (bi-directional frames). I frames are compressed without depending on any other frames. In other words, an I frame is compressed using only the information in the frame itself, in essentially the same manner in which still images are compressed. Encoding a frame without depending on another frame is referred to as intra-coding. For video, there are typically two or more I frames during each second of video.
P frames and B frames are encoded based on information from other frames and are thus inter-coded. In particular, P frames are encoded using forward prediction, wherein each P frame depends at least in part on a previous frame in the frame sequence. B frames are encoded using both forward and backward prediction, wherein each B frame depends at least in part on both a previous frame and a future frame in the frame sequence. B frames may also be encoded such that they depend only on future frames. Forward and backward prediction increase compression ratios because only the changes in the video content from one frame to the next need to be encoded.
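The dependency rules above can be illustrated with a small Python sketch. It assumes a simplified model in which I and P frames are the only reference frames and each predicted frame uses at most one reference in each direction; real MPEG-2 and MPEG-4 encoders offer more options. The function name `references` and the one-letter frame-type strings are illustrative only.

```python
from typing import List, Optional, Tuple


def references(frame_types: str) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return (backward_ref, forward_ref) indices for each frame in display order.

    Simplified, hypothetical model (not the MPEG bitstream syntax):
    - I and P frames are the only reference ("anchor") frames.
    - An I frame has no references.
    - A P frame predicts from the nearest earlier anchor.
    - A B frame predicts from the nearest earlier and nearest later anchor;
      either may be missing at the edge of the sequence.
    """
    anchors = [i for i, t in enumerate(frame_types) if t in "IP"]
    refs: List[Tuple[Optional[int], Optional[int]]] = []
    for i, t in enumerate(frame_types):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if t == "I":
            refs.append((None, None))  # intra-coded: self-contained
        elif t == "P":
            refs.append((prev_anchor, None))  # forward prediction only
        elif t == "B":
            refs.append((prev_anchor, next_anchor))  # both directions
        else:
            raise ValueError(f"unknown frame type {t!r} at index {i}")
    return refs
```

For the display-order pattern `IBBPBBP`, the sketch reports that the first two B frames reference frames 0 and 3, while the P frame at index 3 references only frame 0. In `BBI`, the two leading B frames have only a forward reference, which corresponds to B frames that depend only on future frames.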
MPEG defines the concept of a Group of Pictures (GOP). Although the MPEG standard appears to allow more than one I frame in a GOP, a GOP is generally defined as a group of frames that starts with an I frame, followed by zero or more B frames and zero or more P frames, up to but not including the next I frame in the video stream. The number of frames in a GOP may vary depending on the encoding parameters required for different video and display formats.
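As a rough illustration of this definition, the following sketch splits a string of frame types, listed in the order they appear in the stream, into GOPs that each run from one I frame up to, but not including, the next. The function `split_gops` is hypothetical; frames seen before the first I frame do not form a complete GOP within the window and are returned separately.

```python
from typing import List, Tuple


def split_gops(frame_types: str) -> Tuple[str, List[str]]:
    """Split a frame-type sequence such as "IBBPBBP" into GOPs.

    Each GOP starts at an I frame and ends just before the next I frame
    (or at the end of the sequence). Returns (leading_fragment, gops),
    where leading_fragment holds any frames preceding the first I frame.
    """
    starts = [i for i, t in enumerate(frame_types) if t == "I"]
    if not starts:
        # No I frame at all: nothing here forms a complete GOP.
        return frame_types, []
    leading = frame_types[: starts[0]]
    bounds = starts + [len(frame_types)]
    # Pair each I-frame position with the next boundary.
    gops = [frame_types[s:e] for s, e in zip(bounds, bounds[1:])]
    return leading, gops
```

For example, `split_gops("BPIBBPBBPIBBP")` returns `("BP", ["IBBPBBP", "IBBP"])`, showing that GOP length is simply the distance between successive I frames and can differ from one GOP to the next.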
In each GOP, the I frame is the only frame that does not depend on other frames. All of the other B and P frames in the GOP depend either directly on the I frame or on a frame that depends, directly or indirectly, on the I frame. If the I frame of a GOP is not available, none of the other B or P frames in that GOP can be decoded. Thus, when an end user changes channels, the video receiver cannot start decoding the video stream for the new channel until an I frame is received. If the channel change occurs after the I frame of a GOP has passed, the B and P frames of that GOP are not decodable, and decoding of the video stream starts only when the I frame of the next GOP is received. The B and P frames of the next GOP can then be decoded using the decoded I frame.
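The effect of joining a stream mid-GOP can be sketched as a search for the first I frame at or after the join point. This sketch assumes closed GOPs, in which no frame after an I frame references the previous GOP; with open GOPs, a few B frames immediately following the I frame may also be undecodable. The function `first_decodable` is a hypothetical helper.

```python
from typing import Optional


def first_decodable(frame_types: str, join_index: int) -> Optional[int]:
    """Index of the first frame a receiver can decode after joining at join_index.

    Assumes closed GOPs: once an I frame arrives, it and the frames that
    follow it in the same GOP are decodable, while everything received
    before that I frame must be discarded. Returns None if no I frame
    arrives within the given sequence.
    """
    if not 0 <= join_index <= len(frame_types):
        raise ValueError("join_index out of range")
    idx = frame_types.find("I", join_index)  # -1 if no I frame remains
    return None if idx == -1 else idx
```

With a stream of two 15-frame GOPs, `"IBBPBBPBBPBBPBB" * 2`, joining at index 0 allows decoding to start immediately, but joining at index 1 means decoding starts at index 15, so 14 frames are received and discarded.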
Thus, the delay between selecting a new channel and having the new channel displayed may approach the duration of a GOP. It is not uncommon to have GOPs of 15 or more frames. At a frame rate of 30 frames per second (fps), a 15-frame GOP lasts 0.5 seconds, so the delay incurred while waiting for an I frame can approach or exceed 0.5 seconds, which is a significant portion of the overall channel change delay in IPTV systems. Accordingly, there is a need for a technique to reduce the channel change delay associated with decoding media streams that employ encoding techniques where certain frames in a GOP depend on other frames in the GOP. There is a further need to decrease the overall channel change delay in IPTV systems.
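The arithmetic behind the 0.5-second figure can be written out as a short sketch. It assumes a fixed GOP length with one I frame per GOP, a constant frame rate, and a channel-change instant that is equally likely at any time; under those assumptions the wait for the next I frame is uniformly distributed between zero and one GOP duration. The function name `i_frame_wait` is illustrative.

```python
from typing import Tuple


def i_frame_wait(gop_length: int, fps: float) -> Tuple[float, float]:
    """Return (worst_case, mean) seconds spent waiting for the next I frame.

    Assumes one I frame every gop_length frames at a constant fps and a
    uniformly random channel-change instant, so the wait is uniform on
    [0, gop_length / fps): the worst case approaches one GOP duration
    and the mean is half of it.
    """
    if gop_length <= 0 or fps <= 0:
        raise ValueError("gop_length and fps must be positive")
    gop_duration = gop_length / fps  # seconds between successive I frames
    return gop_duration, gop_duration / 2
```

Under these assumptions, `i_frame_wait(15, 30)` gives a worst case of 0.5 seconds and a mean of 0.25 seconds, and doubling the GOP to 30 frames doubles both values.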
For the following description, a group of frames (GOF) is defined as a series of frames that includes an I frame (intra-coded frame) and one or more frames that depend on the I frame. These dependent frames are referred to generally as D frames. For an MPEG video application, a GOP is a type of GOF in which P frames (predictive frames) and B frames (bi-directional frames) are considered D frames.