1. Field
This disclosure is directed to methods, apparatus and systems for distributing digital data encoded in a way to enable random access of a data stream.
2. Description of the Related Art
Digital video and audio compression technologies have ushered in an era of explosive growth in digital multimedia distribution. Since the early 1990s, international standards groups such as, for example, the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC, have developed international video recording standards. The standards developed include, for example, MPEG-1, MPEG-2, MPEG-4 (collectively referred to as MPEG-x), H.261, H.262, H.263, and H.264 (collectively referred to as H.26x).
The international video recording standards follow what is known as a block-based hybrid video coding approach. In the block-based hybrid video coding approach, pixels serve as the basis of digital representation of a picture or, as it is commonly called and will be referred to in this application, a frame. A group of pixels forms what is known as a block. A common block size on which digital compression operations are performed is the macroblock. Macroblocks are made up of 16×16 pixels. Sub-macroblocks are made up of smaller sets of pixels including, for example, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels. Compression operations can also be performed on sub-macroblocks. Therefore, in order not to obscure the inventive aspects of the invention, the operations will be discussed as operating on portions of a frame, which can include all block sizes or groups of block sizes. A group of macroblocks forms what is known as a slice. Slices can be made up of contiguous macroblocks in the form of, for example, a row, a column, a square, or a rectangle. Slices can also be made up of separated macroblocks or a combination of separated and contiguous macroblocks. Slices are grouped together to form a frame at one point in time of a sequence of frames that form a video sequence.
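The pixel, macroblock, slice, and frame hierarchy described above can be illustrated with a minimal sketch. The function names and the one-slice-per-macroblock-row layout are assumptions chosen for illustration only; as noted above, slices may take many other shapes.

```python
import math

MB_SIZE = 16  # macroblock edge length in pixels, as stated in the text


def macroblock_grid(width, height):
    """Return (columns, rows) of 16x16 macroblocks needed to cover a frame.

    Dimensions that are not multiples of 16 are rounded up, on the
    assumption that the frame is padded to a whole number of macroblocks.
    """
    return math.ceil(width / MB_SIZE), math.ceil(height / MB_SIZE)


def row_slices(width, height):
    """Partition a frame into slices, one slice per macroblock row.

    Each slice is a list of macroblock indices in raster-scan order.
    This is only one of the slice shapes mentioned in the text.
    """
    cols, rows = macroblock_grid(width, height)
    return [list(range(r * cols, (r + 1) * cols)) for r in range(rows)]


if __name__ == "__main__":
    cols, rows = macroblock_grid(352, 288)  # CIF resolution
    print(cols, rows, cols * rows)          # 22 18 396
    print(len(row_slices(352, 288)))        # 18 slices of 22 macroblocks each
```

For a 352×288 frame this gives a 22×18 grid, that is, 396 macroblocks grouped into 18 row-shaped slices.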
The MPEG-x and H.26x standards describe data processing and manipulation techniques that are well suited to the compression and delivery of video, audio and other information using fixed or variable length source coding techniques. In particular, the above-referenced standards, and other hybrid coding standards and techniques will compress video information using intra-frame coding techniques (such as, for example, run-length coding, Huffman coding and the like) and inter-frame coding techniques (such as, for example, forward and backward predictive coding, motion compensation and the like). Specifically, in the case of video processing systems, hybrid video processing systems are characterized by prediction-based compression encoding of video frames with intra-frame and/or inter-frame motion compensation encoding.
Inter-frame coding techniques exploit temporal correlation between frames in video sequences. Temporal prediction, which is typically used for this purpose, reduces the number of random access points in the compressed bitstream because a temporally predicted frame cannot be decoded unless the frame that it references has already been decoded. Hence, at the decoder or user application end, the received bitstream (in the form of downloaded files or streamed bits in the case of streaming media) may not be played back instantaneously. Instead, decoding may start only at pre-determined random access points in the stream/file such as, for example, Intra-coded frames or IDR frames. IDR, or Instantaneous Decoder Refresh, frames were introduced in H.264 and may be used as random access points. Information prior (in time) to an IDR frame may not be used as a reference for subsequent frames with any of the above-mentioned inter-coding techniques. In video streaming applications, particularly in multicast scenarios, the ability to decode instantaneously (or sooner rather than later) may be preferable from a user experience point of view.
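The effect of temporal prediction on random access can be sketched as follows. This is a simplified model and not part of the disclosure: it assumes a stream of only I- and P-frames in which each P-frame references just the immediately preceding frame, so that every I-frame acts as a random access point. The function names are hypothetical.

```python
def decode_start(frame_types, target):
    """Index of the latest I-frame at or before `target`.

    When seeking within a stored file, decoding must begin here so that
    every reference needed to reconstruct frame `target` is available.
    """
    for i in range(target, -1, -1):
        if frame_types[i] == "I":
            return i
    raise ValueError("no random access point at or before target")


def frames_to_decode(frame_types, target):
    """Number of frames that must be decoded to display frame `target`."""
    return target - decode_start(frame_types, target) + 1


def tune_in_delay(frame_types, join):
    """Frames a receiver joining a live stream at `join` must wait.

    Earlier frames are unavailable to a late joiner, so playback can
    start only at the next I-frame at or after the join point.
    """
    for i in range(join, len(frame_types)):
        if frame_types[i] == "I":
            return i - join
    raise ValueError("no random access point at or after join point")


if __name__ == "__main__":
    stream = list("IPPPPIPPPP")         # hypothetical 10-frame stream
    print(decode_start(stream, 7))      # 5
    print(frames_to_decode(stream, 7))  # 3
    print(tune_in_delay(stream, 2))     # 3
```

In this model, the spacing of I-frames bounds both the extra decoding work for a seek and the waiting time for a receiver that joins mid-stream.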
Intra-coding techniques result in less compression than inter-coding techniques. As a result, increasing the frequency of IDR and Intra-coded frames may cause too high a bit rate while supplying frequent random access points. An improved, potentially lower bit rate method of providing a random access point is needed.
Streaming video systems often need to switch between different channels. The maximum time spent switching from an old channel to a new channel should be upper-bounded to improve the user experience.
Traditionally, intra (I-) frames are introduced at the beginning of every group of pictures (GOP) to limit drift between the encoder and the decoder. I-frames can also be used to mitigate error propagation caused by noisy channels, and they are especially effective when combined with the concept of instantaneous decoder refresh (IDR) in the framework of advanced video coding (AVC).
The methodology of using I-frames can be borrowed for channel switching. An IDR I-frame can be placed at the beginning of every GOP, which can remove the dependency of the video content in the new GOP on the content in the old GOP.
However, there are several disadvantages to using this scheme.
First, I-frames are large, which typically causes a peak in instantaneous bit rate at the beginning of every GOP. Large I-frames increase the peak-to-average ratio of frame sizes, which may require a bigger decoding buffer and more stringent decoder timing; otherwise, bursts of data may clog the decoder. This effect can make the design of hardware decoders based on ARM or DSP more complex and expensive.
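The peak-to-average ratio mentioned above can be computed with a short sketch. The frame sizes below are hypothetical values chosen for illustration, not measurements, and the function name is an assumption.

```python
def peak_to_average(frame_sizes):
    """Ratio of the largest frame size to the mean frame size in a GOP."""
    if not frame_sizes:
        raise ValueError("frame_sizes must not be empty")
    average = sum(frame_sizes) / len(frame_sizes)
    return max(frame_sizes) / average


if __name__ == "__main__":
    # Hypothetical GOP: one 46,000-bit I-frame followed by nine 6,000-bit P-frames.
    gop_with_i_frame = [46_000] + [6_000] * 9
    # The same 100,000 bits spread evenly over ten frames (an idealized case).
    gop_evenly_spread = [10_000] * 10

    print(peak_to_average(gop_with_i_frame))   # about 4.6
    print(peak_to_average(gop_evenly_spread))  # 1.0
```

Both hypothetical GOPs carry the same total number of bits, but the one that begins with a large I-frame presents the decoder with a burst several times the average frame size.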
Second, there is a lot of spatial dependency between macroblocks (MBs) in an I-frame. Although the AVC standard allows spatial prediction inside I-frames, the prediction is limited to adjacent neighbors and operates in a causal fashion only. The total number of intra-coded MBs is at least the number of MBs in a picture, because all the MBs are intra-updated at the same point in time. However, if portions of a picture are intra-updated at multiple points in time, motion estimation may be used to reduce the number of intra-coded MBs that are required.
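The idea of intra-updating portions of a picture at multiple points in time can be illustrated with a simple schedule. This is a minimal sketch under stated assumptions, not the method of the disclosure: it splits the macroblocks evenly into contiguous groups, one group per frame, and it does not model the motion-estimation-based reduction in the number of intra-coded MBs. The function name is hypothetical.

```python
def refresh_schedule(num_mbs, num_frames):
    """Assign each macroblock index to exactly one frame of a refresh period.

    Returns a list of `num_frames` ranges. Entry f holds the raster-order
    macroblock indices that are intra-updated in frame f. Group sizes
    differ by at most one macroblock.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    base, extra = divmod(num_mbs, num_frames)
    schedule, start = [], 0
    for f in range(num_frames):
        size = base + (1 if f < extra else 0)  # first `extra` groups get one more
        schedule.append(range(start, start + size))
        start += size
    return schedule


if __name__ == "__main__":
    # 396 macroblocks (a 352x288 frame) refreshed over 9 frames.
    schedule = refresh_schedule(396, 9)
    print([len(group) for group in schedule])  # nine groups of 44
```

With this schedule, each frame carries 44 intra-updated macroblocks instead of one frame carrying all 396, which spreads the intra-coding cost over the refresh period.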
Third, some intra-coded MBs in the initial I-frame may never be referenced in ensuing pictures. For example, an object can disappear within a period of several frames. This occurs if the object moves out of the picture or is covered by other objects. In this case, the MBs representing this object need not be intra-coded, because ensuing frames no longer contain the object and are not predicted from it. Another example is single-frame camera flashes. Due to the significant luma shift in a camera flash frame, its MBs are normally useless for prediction of future frames (without camera flashes). Similarly, the camera-flashed area need not be intra-coded, which improves encoding efficiency.