A video signal is typically made up of a number of frames, where each frame represents an image. The individual frames are displayed at a rate high enough that a moving image appears to a viewer. A digital video signal is a video signal in digital form. A digital video signal may be created using a digital video source, such as a digital camera. Alternatively, an analog video signal may be converted to digital form through the use of a frame grabber or other similar device.
Large amounts of data may be used in a digital video signal to produce a video for a viewer. Typically, higher-quality digital videos require more data than lower-quality videos. In addition, the longer the digital video sequence, the more data must be transmitted.
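As a rough illustration of how the amount of data grows with picture quality and duration, the following Python sketch computes the size of an uncompressed sequence. The function name and the example resolution, bit depth, and frame rate are assumptions chosen for illustration only and are not taken from any particular system.

```python
def raw_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size in bytes of an uncompressed video sequence.

    All parameters are illustrative assumptions: frame dimensions in
    pixels, bits used per pixel, frames per second, and duration.
    """
    total_bits = width * height * bits_per_pixel * fps * seconds
    return total_bits // 8


# Hypothetical example: 720x480 pixels, 24 bits per pixel, 30 frames/s, 60 s.
one_minute = raw_video_bytes(720, 480, 24, 30, 60)
print(one_minute)  # 1866240000 bytes, roughly 1.9 GB before any compression
```

Under these assumed figures, doubling either the duration or the number of pixels per frame doubles the amount of data.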
Compression schemes, such as MPEG-2, MPEG-4, H.263+, and the like, are often used to reduce the amount of data used to represent digital video signals. Compression reduces transmission costs and, where a fixed transmission capacity is available, results in a better quality of multimedia presentation. As an example, a 6-MHz analog cable TV channel can carry between four and ten digitized, compressed channels, thereby increasing the overall capacity (in terms of the number of programs carried) of an existing cable television plant. Alternatively, a 6-MHz broadcast television channel can carry a digitized, compressed High-Definition Television (HDTV) signal to give significantly better audio and picture quality without requiring additional bandwidth.
Compression involves eliminating redundancy present in the frames of a digital video signal. There are two types of redundancy: spatial and temporal. Spatial redundancy refers to redundant information within a single frame. Temporal redundancy refers to redundant information between different frames. Intra frames, or I-frames, eliminate only spatial redundancy, and are encoded independently of other frames. Predictive frames, or P-frames, eliminate both spatial and temporal redundancy, and are encoded with respect to immediately previous I- or P-frames. A group of pictures (GOP) consists of an I-frame and any number of P-frames that successively follow the I-frame. The higher the GOP value, the more P-frames successively follow a single I-frame.
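The GOP structure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the GOP value counts the total number of frames in each group (one I-frame followed by P-frames); the function name `frame_types` is hypothetical.

```python
def frame_types(gop_size, num_frames):
    """Label each frame of a sequence as 'I' or 'P'.

    Assumption for illustration: gop_size is the total number of frames
    in a group of pictures, i.e. one I-frame followed by
    (gop_size - 1) P-frames.
    """
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    return ["I" if i % gop_size == 0 else "P" for i in range(num_frames)]


# A GOP of 4 frames: every fourth frame is an I-frame.
print("".join(frame_types(4, 10)))  # IPPPIPPPIP
```

Raising `gop_size` in this sketch lengthens the run of P-frames between consecutive I-frames, which is the effect of a higher GOP value.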
Transmitting I-frames requires more bandwidth than transmitting P-frames. Therefore, to reduce the bandwidth required to transmit a digital video signal, it is advantageous to encode a digital video signal using a high GOP value. A decoder, however, cannot begin decoding an encoded bitstream at a P-frame. If a decoder randomly accesses an encoded digital video signal having a high GOP value, there is a relatively high likelihood that the decoder will first access a P-frame. In such a case, the decoder waits until it receives an I-frame before it can begin the process of decoding. Benefits may be realized by methods and apparatus that may, on average, reduce this latency associated with randomly accessing an encoded digital video signal.
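The relationship between GOP value and random-access latency can be made concrete with a short Python sketch. It rests on simplifying assumptions that are not stated in the text: every GOP has the same fixed number of frames, the decoder is equally likely to join at any frame position, and the wait is counted as the number of frames that arrive before the next I-frame. The function names are hypothetical.

```python
from fractions import Fraction


def frames_until_i_frame(offset, gop_size):
    """Frames the decoder must skip if it joins at position `offset`
    within a GOP (offset 0 is the I-frame itself)."""
    return (gop_size - offset) % gop_size


def mean_wait_frames(gop_size):
    """Average wait, in frames, over all equally likely join positions."""
    total = sum(frames_until_i_frame(k, gop_size) for k in range(gop_size))
    return Fraction(total, gop_size)


print(mean_wait_frames(15))  # 7
print(mean_wait_frames(30))  # 29/2
```

Under these assumptions the average wait works out to (N - 1) / 2 frames for a GOP of N frames, so doubling the GOP value roughly doubles the expected latency before decoding can begin.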