Certain embodiments of the invention relate to the field of compression and decompression of digital video signals, also referred to as coding, or encoding, and decoding. More specifically, certain embodiments of the invention relate to a method and apparatus for DRAM 2D video word formatting.
In some conventional systems that perform compression or decompression of digital video, one or more pictures or fields of video may be stored in dynamic random access memory (DRAM). The video data in DRAM is stored in a format that may be chosen by the system designer. In certain instances, there are conflicting goals for the choice of the DRAM storage format of video pictures.
Video encoders and decoders may use specialized hard-wired logic, software operating on general purpose processors, software on specialized processors, or some combination of these. Digital video used in high volume applications such as digital video broadcasts, storage, and video on demand (VOD) most commonly utilizes the format known as MPEG-2, following the main profile of MPEG-2. These high volume applications are utilized in, for example, terrestrial broadcast, digital cable systems, digital satellite systems, digital video discs (DVD), video over DSL, and other applications. Main profile specifies that video pictures use the so-called 4:2:0 sample format. In the 4:2:0 format, chroma sampling is defined such that there is one chroma component pair, for example Cb and Cr, for each four-pixel (2×2) set of luma samples. Similar 4:2:0 sampling is also used in other video formats, including MPEG-1 and proprietary formats.
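As a rough illustration of the 4:2:0 relationship described above, the following sketch counts the samples in each plane of a picture. The function name plane_sizes_420, the 720×480 picture size, and the restriction to even picture dimensions are illustrative assumptions rather than anything defined by the format itself.

```python
def plane_sizes_420(width, height):
    """Return (luma, cb, cr) sample counts for a 4:2:0 picture.

    Assumption for this sketch: width and height are even, so every
    2x2 set of luma samples maps to exactly one Cb/Cr pair.
    """
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even picture dimensions")
    luma = width * height
    chroma = (width // 2) * (height // 2)  # one Cb and one Cr per 2x2 luma set
    return luma, chroma, chroma


if __name__ == "__main__":
    luma, cb, cr = plane_sizes_420(720, 480)  # hypothetical picture size
    print(luma, cb, cr, luma + cb + cr)
```

Under these assumptions the two chroma planes together hold half as many samples as the luma plane.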
DRAM systems and/or sub-systems have word widths that are determined by the design of the DRAM system or sub-system. As demand for system performance continues to increase, the DRAM word width tends to increase accordingly. For example, in many MPEG-2 main profile at main level (MP@ML) decoders, the DRAM word width is 32 bits. High definition decoders, for example MPEG-2 main profile at high level (MP@HL) decoders, commonly utilize 64-bit word widths. High performance decoders with, for example, unified memory architecture (UMA), capability for decoding multiple streams of MPEG-2 MP@HL, or capability for decoding more advanced formats such as MPEG-4 AVC, may utilize wider DRAM word widths such as 128 bits or greater. The word width for double data rate (DDR) DRAM is twice the width of the data port. For example, a 64-bit DDR has a 128-bit word width. The term GWord is used herein to refer to a data word with a width of 128 bits. The term JWord is used to refer to a data word with a width of 256 bits.
In a decoder or an encoder, video data is generally arranged into pictures in DRAM, where a picture can be frame structured or field structured. Due to the nature of video compression and decompression algorithms, the same data structure in DRAM is generally used for writing decoded blocks of pixels, for reading previously decoded pixels, for example for motion compensation, and for reading decoded pictures for display. Additional functions may also require DRAM access. An arrangement of data that is efficient for one of these purposes may not be the most efficient for another.
An important decision in the design of a video encoder or decoder is the arrangement of video sample data in DRAM. Video samples are generally 8 bits per sample in most consumer applications. Typically, conventional video samples are arranged in DRAM words in raster scan order, with separate DRAM words for luma and chroma. Chroma is typically grouped such that the two chroma components, Cb and Cr, are interleaved. Therefore, a 32 bit DRAM word may contain either 4 luma samples from one scan line, or 2 chroma samples from each chroma component, again from one scan line. Similarly, a 64 bit DRAM word may contain 8 luma samples or 4 chroma sample pairs, in both cases from one scan line each, and a 128 bit DRAM word or GWord would contain 16 luma samples from one scan line or 8 chroma sample pairs from one scan line.
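The word capacities listed above follow directly from the word width and the 8-bit sample size. A minimal sketch, assuming 8 bits per sample and Cb/Cr interleaving as described (the function name samples_per_word is hypothetical):

```python
def samples_per_word(word_bits, bits_per_sample=8):
    """Return (luma samples, Cb/Cr pairs) that fit in one DRAM word.

    A luma word holds only luma samples; a chroma word holds
    interleaved Cb and Cr samples, so it holds half as many pairs.
    """
    samples = word_bits // bits_per_sample
    return samples, samples // 2


if __name__ == "__main__":
    for bits in (32, 64, 128):
        luma, pairs = samples_per_word(bits)
        print(f"{bits}-bit word: {luma} luma samples or {pairs} chroma pairs")
```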
With regard to displaying digital video, these arrangements of video samples in DRAM are efficient and sensible, since display of video is generally in raster scan order. Similarly, such arrangements are usually efficient for writing blocks of video samples to DRAM, since most common video formats utilize a macroblock structure wherein each macroblock is 16 pixels wide. However, such arrangements may be inefficient for reading video samples from DRAM for motion compensation reference data fetching.
Motion compensation generally requires reading groups of video samples from DRAM where the address, width and height of the data to be read are highly dependent on the video data being decompressed. In certain instances, it may be possible for a video stream compliant with the applicable standard to result in a large number of DRAM reads from DRAM addresses that result in inefficient DRAM operation. In MPEG-2 video, motion compensation blocks can be 16×16 or 16×8 samples of luma (width×height), and the blocks of samples to be read may be 16 or 17 pixels wide and 8, 9, 16 or 17 pixels high, depending on the motion vectors and other parameters found in the compressed data stream. Chroma motion compensation blocks are correspondingly reduced in size according to the chroma sampling, such as 4:2:0. In more advanced video formats such as AVC, motion compensation blocks may be as small as 4×4 luma samples, with widths and heights each being 4, 8, or 16 samples. Due to the effect of the 6 tap motion compensation filter in the AVC standard, the number of samples to be read from DRAM may include 5 additional samples in each dimension. As a result, a large number of possible groups of luma pixels may have to be read from DRAM, including such shapes as 9×9, 13×9, 9×13, etc. up to 21×21. There is also another set of sizes and shapes of blocks of chroma samples that may need to be read from DRAM.
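The fetch dimensions quoted above can be reproduced with a small sketch. It assumes that a dimension needs no extra samples when the motion vector is integer-valued in that dimension, and a fixed number of extra samples otherwise: 1 for MPEG-2 and 5 for the AVC 6 tap filter. The function name fetch_extents is hypothetical.

```python
def fetch_extents(block_sizes, extra):
    """Return the sorted set of fetch lengths along one dimension.

    block_sizes: possible block lengths (in luma samples) in that dimension.
    extra: additional samples needed when the motion vector is fractional
           in that dimension (assumed 1 for MPEG-2, 5 for AVC).
    """
    lengths = set()
    for size in block_sizes:
        lengths.add(size)          # integer-position vector: no extra samples
        lengths.add(size + extra)  # fractional vector: interpolation support
    return sorted(lengths)


if __name__ == "__main__":
    print("MPEG-2 widths: ", fetch_extents([16], 1))
    print("MPEG-2 heights:", fetch_extents([8, 16], 1))
    print("AVC extents:   ", fetch_extents([4, 8, 16], 5))
```

Any pairing of a width and a height from these lists is a candidate fetch shape, which is why the number of distinct shapes grows quickly for AVC.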
The number of DRAM cycles required for motion compensation fetches may be significantly more than the number of cycles required for display or for writing to DRAM, particularly in the case of decoding worst case compliant streams. This is very significant because a well designed decoder should be able to perform all steps of decoding worst case streams in real time, and the availability of DRAM cycles may be a limiting factor in the performance of the decoder. Similar considerations apply to encoders.
Conventional arrangements of video samples in DRAM words result in inefficient use of DRAM cycles when performing motion compensation fetches, and as a result the number of DRAM cycles required for decoding may be increased. For example, with a 128 bit DRAM word arranged as 16 luma samples in raster scan order, a motion vector in the incoming bit stream may require reading a 4×4 block of luma samples where the block straddles DRAM page boundaries in both horizontal and vertical directions. As a result, 8 GWords of 128 bits each (2 horizontally and 4 vertically) would have to be read from DRAM, from possibly 4 different DRAM banks. Since there are 16 bytes of data required in this block, and each of the 8 DRAM words accesses 16 bytes, 7/8 or 87.5% of the DRAM bandwidth is wasted, plus many DRAM cycles may have to be spent to account for the use of different DRAM banks. This problem is aggravated as the DRAM word width increases.
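The 8-word, 87.5% figure above can be checked with a short sketch. It assumes the raster arrangement described, with each DRAM word holding 16 horizontally adjacent luma samples of a single scan line, and it counts only words read, not bank or page penalties. The function names and the example x offsets are hypothetical.

```python
def words_touched(x, width, height, samples_per_word=16):
    """Count DRAM words read for a width x height luma block.

    x is the horizontal position of the block's left edge in samples.
    Each word holds samples_per_word samples from one scan line, so
    every row of the block costs at least one word.
    """
    first = x // samples_per_word
    last = (x + width - 1) // samples_per_word
    return (last - first + 1) * height


def wasted_fraction(x, width, height, samples_per_word=16):
    """Fraction of fetched samples that the block does not need."""
    fetched = words_touched(x, width, height, samples_per_word) * samples_per_word
    return 1 - (width * height) / fetched


if __name__ == "__main__":
    # A 4x4 block starting at x = 14 straddles a word boundary horizontally.
    print(words_touched(14, 4, 4), wasted_fraction(14, 4, 4))
    # The same block aligned within one word still wastes most of each word.
    print(words_touched(0, 4, 4), wasted_fraction(0, 4, 4))
```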
The problem is further compounded by utilizing both frame type and field type picture codings, with the types intermixed within a video stream. In frame coding, the lines of a picture are sequential from top to bottom. In field coding, a frame is conceptually divided into two fields, namely an odd field containing the odd numbered lines, and an even field containing the even numbered lines. With the picture types intermixed within a video sequence, a motion vector could require blocks of video samples to be read from frame coded pictures, top field pictures or bottom field pictures. Accordingly, it is desirable to find improved arrangements of data in DRAM that result in more efficient use of DRAM cycles when performing all of the DRAM accesses required for decoding video in real time.
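To make the frame and field cases concrete, the following sketch lists which stored lines a reference fetch would touch. It rests on several assumptions that are not stated in the text: both fields are stored interleaved in a single frame buffer, lines are numbered from 0, and the top field holds the even numbered lines. The function name reference_rows is hypothetical.

```python
def reference_rows(y, height, structure):
    """Return the frame-buffer line numbers read for a reference block.

    y: first line of the block, counted within the referenced structure.
    structure: "frame", "top", or "bottom".
    Assumes interleaved field storage with the top field on even lines.
    """
    if structure == "frame":
        return [y + i for i in range(height)]
    if structure == "top":
        return [2 * (y + i) for i in range(height)]
    if structure == "bottom":
        return [2 * (y + i) + 1 for i in range(height)]
    raise ValueError("structure must be 'frame', 'top', or 'bottom'")


if __name__ == "__main__":
    for structure in ("frame", "top", "bottom"):
        print(structure, reference_rows(3, 4, structure))
```

Under these assumptions a field fetch touches every other stored line, so a block of a given height spans roughly twice as many frame lines as the same block fetched from a frame coded picture.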
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.