Video codecs (COmpressor-DECompressor) are compression algorithms designed to encode (compress) and decode (decompress) video data streams, reducing the size of the streams for faster transmission and smaller storage space. While typically lossy, video codecs attempt to maintain video quality while compressing the binary data of a video stream. Examples of popular video codecs and related formats are MPEG-4, AVI, WMV, RM, RV, H.261, H.263, and H.264.
A video stream is comprised of a sequence of video frames where each frame is comprised of multiple macroblocks. A video codec encodes each frame in the sequence by dividing the frame into one or more slices or sub-portions, each slice containing an integer number of macroblocks. A macroblock is typically a 16×16 array of pixels (although other sizes of macroblocks are also possible) and can be divided into partitions for encoding and decoding. As an example, FIG. 1 illustrates the different ways that a macroblock can be partitioned in the H.264 compression standard. As shown in FIG. 1, a macroblock can be partitioned in one of 259 possible ways:
    1. one partition,
    2. two vertical partitions,
    3. two horizontal partitions, and
    4. four smaller square partitions.
In the last case, each resulting square partition can be partitioned in a similar manner (accounting for the other 256 ways to partition a macroblock) for up to a maximum of 16 partitions for a single macroblock.
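The count of 259 follows from the four cases described above: three ways that do not use the four-square split, plus four squares that can each independently take one of four forms. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative assumptions and are not taken from any codec specification or implementation.

```python
# Illustrative sketch only; names are hypothetical, not from the H.264 standard.

# Ways to partition a macroblock without the four-square split:
# one partition, two vertical partitions, two horizontal partitions.
NON_SPLIT_MODES = 3

# Each of the four square partitions can itself be left whole, split into
# two vertical parts, two horizontal parts, or four smaller squares.
SUB_BLOCK_MODES = 4
SQUARES_PER_MACROBLOCK = 4


def count_macroblock_partitionings():
    """Return the number of distinct ways to partition one macroblock."""
    # The four squares are partitioned independently of one another.
    four_square_ways = SUB_BLOCK_MODES ** SQUARES_PER_MACROBLOCK  # 4^4 = 256
    return NON_SPLIT_MODES + four_square_ways


def max_partitions_per_macroblock():
    """Return the largest partition count: four squares, each split in four."""
    return SQUARES_PER_MACROBLOCK * 4


print(count_macroblock_partitionings())   # 259
print(max_partitions_per_macroblock())    # 16
```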
Macroblock content can be self-contained or predicted from one or two different frames. In a received bitstream (created during the encoding process), the following predictive information can be derived for each partition: a motion vector (comprised of x and y components) and an associated indicator of the frame (in a sequence of frames) that the motion vector is based upon. This indicator may be, for example, a reference frame index that is used in conjunction with an associated reference frame list to indicate the particular frame that the motion vector is based upon.
FIG. 2 illustrates the concept of reference frame indexes and reference frame lists. For each slice of a frame, there are stored one or more reference frame lists that are used to identify particular frames for motion vectors. In the example of FIG. 2, a first and a second reference frame list are used to identify particular frames for motion vectors. Typically, when a slice is received, the header of the slice contains information from which the reference frame lists are derived.
A reference frame index associated with a motion vector specifies an entry (containing a frame number) in a reference frame list that indicates the frame in a sequence of frames that the motion vector is based upon. In the example of FIG. 2, there are seven active frames (i.e., frames that are presently held in storage) numbered 0 through 6. The frame numbered “3” is currently being processed. If a reference frame index specifies a value of 0 for an associated first motion vector, this indicates that the frame number in the first entry of the first reference frame list is the frame that the motion vector is based upon. Therefore, as shown in the example of FIG. 2, the frame numbered “2” is the frame that the first motion vector is based upon. As a further example, if a reference frame index specifies a value of 2 for an associated second motion vector, this indicates that the frame number in the third entry of the second reference frame list (for second motion vectors) is the frame that the second motion vector is based upon. Therefore, as shown in the example of FIG. 2, the frame numbered “6” is the frame that the second motion vector is based upon.
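The lookup described in the example of FIG. 2 can be sketched as a simple list index. In the sketch below, only two list entries are taken from the text (the first entry of the first list is frame 2, and the third entry of the second list is frame 6); the remaining entries, and all names, are placeholder assumptions chosen for illustration and do not reproduce FIG. 2.

```python
# Illustrative sketch only. Entries marked "placeholder" are assumptions;
# the text specifies just list 0 entry 0 (frame 2) and list 1 entry 2 (frame 6).

CURRENT_FRAME = 3  # the frame currently being processed in the example

# First reference frame list (used for first motion vectors).
REF_LIST_0 = [2, 1, 0]   # entry 0 from the text; entries 1 and 2 are placeholders

# Second reference frame list (used for second motion vectors).
REF_LIST_1 = [4, 5, 6]   # entry 2 from the text; entries 0 and 1 are placeholders


def resolve_reference_frame(ref_list, ref_frame_index):
    """Return the frame number that a motion vector is based upon.

    The reference frame index selects an entry in the reference frame list,
    and that entry holds the frame number.
    """
    return ref_list[ref_frame_index]


# A first motion vector with reference frame index 0 is based on frame 2.
print(resolve_reference_frame(REF_LIST_0, 0))
# A second motion vector with reference frame index 2 is based on frame 6.
print(resolve_reference_frame(REF_LIST_1, 2))
```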
FIG. 3 shows a conceptual diagram of a conventional storage structure 305 containing partition data for a slice of a frame. In the example of FIG. 3, the slice is comprised of three macroblocks where a first macroblock (macroblock 0) is comprised of 1 partition, a second macroblock (macroblock 1) is comprised of 16 partitions, and a third macroblock (macroblock 2) is comprised of 2 partitions.
Typically, during decoding, storage is allocated for a slice on a “worst case scenario” basis that assumes each macroblock of the slice is divided into the maximum number of partitions (e.g., 16 partitions under H.264 standards). As such, under H.264 standards, for each macroblock of the slice, there is allocated enough storage space for a header and 16 partition entries. A partition entry in a data structure stores partition data during decoding of a frame. Each partition entry contains data for a single partition (e.g., motion vector and reference frame index data).
The diagram of FIG. 3 shows an allocated portion 310 of the storage structure that has been allocated for the slice. Since each macroblock of a slice will typically not be divided into 16 partitions, a macroblock will often be allocated storage for more partition entries than partitions contained in the macroblock. As such, the allocated portion of the storage structure for a macroblock will typically contain one or more used partition entries (entries that contain data for an actual partition of the macroblock) as well as one or more unused partition entries (entries that do not contain data for a partition of the macroblock). A used partition entry contains meaningful/useful data (such as motion vector and reference frame index data for a partition) whereas unused partition entries do not contain meaningful/useful data.
As shown in FIG. 3, for each macroblock of the slice, the storage structure contains a header section and a partition entry section. Typically, during decoding, storage is allocated for a header on a “worst case scenario” basis that assumes that the macroblock is divided into 16 partitions. As such, for each macroblock header, there is allocated enough storage space for 16 header partition entries. A conventional header for a macroblock contains data describing how the macroblock is partitioned. Such descriptive data includes, for example, position and dimension data of each partition. FIG. 4 shows a conceptual diagram of a conventional header 405 stored in the storage structure for macroblock 2. Macroblock 2 is divided into 2 partitions. As such, the header will include 2 used header partition entries, each entry containing descriptive data of a particular partition. The remaining 14 header partition entries will be empty (unused). In addition, each header typically contains data indicating the number of partitions in the macroblock.
As shown in FIG. 3, a first portion 315 of the storage structure 305 contains data for macroblock 0. Since macroblock 0 is comprised of 1 partition, the storage structure contains a used partition entry (partition entry 0) only for a first partition of macroblock 0, while the remaining 15 partition entries allocated for macroblock 0 (partition entries 1-15) are unused entries. A second portion 320 of the storage structure 305 contains data for macroblock 1. Since macroblock 1 is comprised of 16 partitions, the storage structure contains a used partition entry for each of the first through sixteenth partitions of macroblock 1 (partition entries 0-15), so that all entries allocated for macroblock 1 are used. A third portion 325 of the storage structure 305 contains data for macroblock 2. Since macroblock 2 is comprised of 2 partitions, the storage structure contains used partition entries for a first and second partition of macroblock 2, while the remaining 14 partition entries allocated for macroblock 2 are unused.
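The worst-case allocation described for the slice of FIG. 3 can be modeled in a few lines to show how many allocated partition entries actually hold data. This is a minimal sketch under stated assumptions: the dictionary layout, the placeholder motion vector and reference index values, and the function names are all hypothetical and are not the conventional storage structure itself.

```python
# Illustrative model only; field names and values are hypothetical.

MAX_PARTITIONS_PER_MB = 16  # worst case per macroblock under H.264


def allocate_worst_case(partition_counts):
    """Allocate 16 partition entries per macroblock, filling only the used ones.

    partition_counts[i] is the actual number of partitions in macroblock i.
    Unused entries are represented by None.
    """
    slice_storage = []
    for count in partition_counts:
        if not 1 <= count <= MAX_PARTITIONS_PER_MB:
            raise ValueError("partition count must be between 1 and 16")
        entries = [None] * MAX_PARTITIONS_PER_MB
        for p in range(count):
            # Placeholder partition data: a motion vector and a reference frame index.
            entries[p] = {"motion_vector": (0, 0), "ref_frame_index": 0}
        slice_storage.append({"header": {"num_partitions": count},
                              "entries": entries})
    return slice_storage


def occupancy(slice_storage):
    """Return (used_entries, allocated_entries) for the whole slice."""
    used = sum(entry is not None
               for mb in slice_storage for entry in mb["entries"])
    allocated = sum(len(mb["entries"]) for mb in slice_storage)
    return used, allocated


# The slice of FIG. 3: macroblocks with 1, 16, and 2 partitions.
fig3_slice = allocate_worst_case([1, 16, 2])
print(occupancy(fig3_slice))  # (19, 48)
```

In this model, 19 of the 48 allocated partition entries are used, and the 29 unused entries sit between the used entries of macroblock 0, macroblock 1, and macroblock 2.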
As such, the partition data for the slice is typically stored in the storage structure in a haphazard pattern where unused partition entries are interspersed with used partition entries. This haphazard pattern of data storage in the storage structure causes decoding of the slice to be inefficient. This is because, when a CPU loads partition data from storage during decoding, it retrieves chunks of memory (such as adjacent partition data) from the storage structure rather than only the precise data the CPU requires at the moment. The retrieved chunks of data may contain both used and unused partition entries. The retrieved chunks of data are stored in a cache (e.g., CPU cache) that the CPU can access quickly (typically in a significantly shorter time than the CPU can access the storage structure).
If the CPU later needs particular partition data during processing of the slice, the CPU first determines whether the particular partition data exists in the cache, since the particular partition data may have been included in a previously retrieved chunk of data and the access time to the cache is shorter than to the storage structure. If the particular partition data exists in the cache, this is referred to as a “cache hit,” and retrieval of the particular partition data from the cache is fast. If the particular partition data does not exist in the cache, this is referred to as a “cache miss,” and the CPU must then retrieve the particular partition data from the storage structure, which is slower.
When partition data is stored in the storage structure in a haphazard manner where unused partition entries are interspersed with used partition entries, there is typically a higher rate of “cache misses” during processing of the slice. This is because the retrieved chunks of data will also contain unused partition entries interspersed with used partition entries, and the unused partition entries contain non-useful data. Each retrieved chunk therefore brings fewer used partition entries into the cache, so a needed partition entry is less likely to already be in the cache.
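The effect described above can be illustrated with a toy model that counts how many fixed-size chunks (cache lines) must be retrieved to read every used partition entry of a slice. The sizes below (16 bytes per partition entry, 64 bytes per cache line) are assumptions chosen for illustration, macroblock headers are ignored, and the function names are hypothetical; real entry sizes and cache behavior will differ.

```python
# Toy model only; the sizes and names below are assumptions, not measured values.

ENTRY_BYTES = 16             # assumed size of one partition entry
CACHE_LINE_BYTES = 64        # assumed size of one retrieved chunk
MAX_PARTITIONS_PER_MB = 16   # worst-case entries allocated per macroblock


def cache_lines_touched(partition_counts):
    """Count distinct cache lines holding used entries in the worst-case layout."""
    lines = set()
    for mb_index, count in enumerate(partition_counts):
        # Each macroblock owns a fixed block of 16 entries, used or not.
        base = mb_index * MAX_PARTITIONS_PER_MB * ENTRY_BYTES
        for p in range(count):
            address = base + p * ENTRY_BYTES
            lines.add(address // CACHE_LINE_BYTES)
    return len(lines)


def fewest_lines_possible(partition_counts):
    """Fewest cache lines that could hold the same number of used entries."""
    used_bytes = sum(partition_counts) * ENTRY_BYTES
    return -(-used_bytes // CACHE_LINE_BYTES)  # ceiling division


# The slice of FIG. 3 (1, 16, and 2 partitions).
print(cache_lines_touched([1, 16, 2]), fewest_lines_possible([1, 16, 2]))

# A hypothetical slice of eight macroblocks with one partition each.
print(cache_lines_touched([1] * 8), fewest_lines_possible([1] * 8))
```

Under these assumptions, the slice of FIG. 3 touches 6 cache lines where 5 would hold the same used entries, and the hypothetical slice of eight single-partition macroblocks touches 8 cache lines where 2 would suffice. The gap grows as macroblocks use fewer of their allocated entries.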
As such, there is a need for a method of organizing partition data in the storage structure that allows for more efficient processing of the partition data.