HDTV promises much better picture quality than contemporary standard-definition digital TV by substantially increasing the picture resolution. The picture resolution for HDTV applications can be as high as 2K×1K, which demands a very high rate of compression of the video data. Standards like H.264 or WMV9 can provide compression ratios of 60:1 or higher at certain levels, which makes them suitable for HDTV applications at the cost of increased complexity in the compression tools. The motion estimation (ME) process in H.264, which is the major source of compression, gives an insight into the complexities involved and the corresponding consequences on the decoder side. To achieve very high compression, the standard allows quarter-pixel locations in the reference frame to be interpolated for motion estimation, and the number of reference frames can be as high as 16 for certain levels. Reference may be had to FIG. 2 in this context, which shows the different block sizes. The block sizes are variable and include 16×16 (200), 16×8 (201), 8×16 (202), 8×8 (203), 8×4 (206), 4×8 (205) and 4×4 (204), so a 16×16 macroblock can be partitioned into many different combinations of these blocks. As a consequence, the decoder has to handle high complexity: more computation due to quarter-pixel interpolation, the handling of multiple reference frames and, above all, huge data transfers with the external memory. Referring to FIG. 3, for quarter-pixel interpolation of the luminance component, the motion compensation MC (104) needs 5 more pixels in each of the horizontal and vertical directions, so the MC must fetch a 9×9 block 301 of pixels 300 to compensate a 4×4 block 302. This means roughly four times extra data (65 additional pixels for the 16 required) for each 4×4 block.
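The fetch overhead described above can be tabulated for every partition size of FIG. 2. The sketch below is a minimal illustration, assuming the 6-tap luma interpolation filter implied by the 9×9 fetch for a 4×4 block, 8-bit samples, a single reference frame, and a hypothetical 1920×1088 frame (1080-line video coded in whole macroblocks) at 30 frames per second. The function and constant names are illustrative only.

```python
# Sketch: reference-pixel fetch cost of H.264 luma motion compensation.
# Assumption: a 6-tap interpolation filter, so a w x h partition needs
# (w + 5) x (h + 5) reference pixels in total.

PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
FILTER_TAPS = 6  # assumed luma interpolation filter length


def fetch_size(w, h, taps=FILTER_TAPS):
    """Return (width, height) of the reference block needed for a w x h partition."""
    return w + taps - 1, h + taps - 1


def fetch_ratio(w, h, taps=FILTER_TAPS):
    """Pixels fetched per pixel actually compensated."""
    fw, fh = fetch_size(w, h, taps)
    return (fw * fh) / (w * h)


def luma_bytes_per_second(frame_w, frame_h, fps, w, h):
    """Reference luma bytes per second if every partition is w x h.

    Assumes one reference per partition, 8-bit samples, and ignores
    burst rounding and page-change overhead.
    """
    blocks = (frame_w // w) * (frame_h // h)
    fw, fh = fetch_size(w, h)
    return blocks * fw * fh * fps


if __name__ == "__main__":
    for w, h in PARTITIONS:
        fw, fh = fetch_size(w, h)
        print(f"{w:2}x{h:<2} -> fetch {fw}x{fh}, ratio {fetch_ratio(w, h):.2f}")
    # Hypothetical worst case: every partition of a 1920x1088 frame is 4x4.
    print(luma_bytes_per_second(1920, 1088, 30, 4, 4))
```

Under these assumptions a 4×4 partition fetches about 5.06 pixels per compensated pixel, against about 1.72 for 16×16, and a frame made entirely of 4×4 partitions needs roughly 317 million bytes per second of reference luma alone, before any burst or page overhead is counted.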
In a real-time application, the huge amount of data that must be fetched within the decoding time of a high-resolution frame, together with its 2-D organization in the external memory, demands extremely high external memory bandwidth and creates a bottleneck in the design of an HDTV video decoder. Further bandwidth inefficiency results from the fixed burst size of the DDR (Double Data Rate) memory 106 (which is invariably required because of the high bandwidth requirement of HDTV applications), because of which extra data has to be fetched along with the required data. The amount of required data cannot be reduced, since the standard mandates it for a high compression ratio; however, the bandwidth requirement for MC can be reduced by packing the frame data efficiently in the external memory, which reduces both the extra data fetched and the page or bank change latencies.
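The effect of a fixed burst size can be quantified with simple arithmetic. The sketch below is a hypothetical illustration, not the behavior of any particular DDR controller: it assumes one byte per pixel, a raster-order frame 1920 bytes wide, and 32-byte bursts aligned to multiples of the burst size, and it counts the bytes transferred for the 9×9 window of a 4×4 block.

```python
# Sketch: bytes actually transferred when pixel rows are read from DDR memory
# that only supports fixed-size, aligned bursts.
# Assumptions (hypothetical): 8-bit samples (one pixel per byte), raster-order
# frame storage, bursts aligned to multiples of burst_bytes.


def bursts_for_row(start, length, burst_bytes):
    """Number of aligned bursts covering bytes [start, start + length)."""
    first = start // burst_bytes
    last = (start + length - 1) // burst_bytes
    return last - first + 1


def block_fetch_bytes(x, y, w, h, frame_width, burst_bytes):
    """Bytes transferred to read a w x h block at (x, y) from a raster-order frame."""
    total = 0
    for row in range(y, y + h):
        start = row * frame_width + x
        total += bursts_for_row(start, w, burst_bytes) * burst_bytes
    return total


if __name__ == "__main__":
    # 9x9 interpolation window, 1920-byte rows, 32-byte bursts (all illustrative).
    for x in (0, 28):
        fetched = block_fetch_bytes(x, 0, 9, 9, 1920, 32)
        print(f"x={x}: needed 81 bytes, transferred {fetched} bytes")
```

With these parameters, even the best-aligned window transfers 288 bytes for 81 useful bytes, and a window that crosses a burst boundary (for example at x = 28) doubles the transfer to 576 bytes.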
The data in the external memory can be stored either in raster scan order of the pixels, or by grouping a few macroblocks and storing the pixels of each group in raster scan order in one page of the memory. If the data is stored in raster scan order of pixels, fetching one M×N block may require N page changes, each of which incurs high latency. Even if the pixel rows are stored in different banks in round-robin order, the row precharge and activation times of the different banks cannot be completely hidden for small burst sizes. Since a 4×4 block needs about five times as much data for interpolation (a 9×9 block, that is, 81 pixels instead of 16), such frequent row-change latency makes the worst-case bandwidth requirement extremely high and the bus efficiency extremely poor.
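The difference between the two storage orders can be made concrete by counting how many DRAM pages a 9×9 fetch touches. The sketch assumes hypothetical parameters: 1024-byte pages, one byte per pixel, a 1920-pixel-wide frame and, for the grouped layout, tiles of 2×2 macroblocks (32×32 pixels) that each fill exactly one page. Counting distinct pages ignores access ordering and bank interleaving, so it is a lower bound on page activations rather than a timing model.

```python
# Sketch: count distinct DRAM pages touched by a block fetch under two layouts.
# Assumptions (hypothetical): 8-bit samples, 1024-byte pages, 1920-pixel-wide
# frame; the grouped layout stores 2x2 macroblocks (32x32 pixels) per page,
# with each tile's pixels in raster order.

PAGE_BYTES = 1024
FRAME_WIDTH = 1920
TILE_W, TILE_H = 32, 32  # 32 * 32 = 1024 bytes = one page


def raster_address(x, y):
    """Byte address of pixel (x, y) when the frame is stored in raster order."""
    return y * FRAME_WIDTH + x


def tiled_address(x, y):
    """Byte address of pixel (x, y) when 32x32 tiles are stored one per page."""
    tiles_per_row = FRAME_WIDTH // TILE_W
    tile_index = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    offset = (y % TILE_H) * TILE_W + (x % TILE_W)
    return tile_index * TILE_W * TILE_H + offset


def pages_touched(x, y, w, h, address_fn):
    """Set of page numbers covering every pixel of the w x h block at (x, y)."""
    return {
        address_fn(col, row) // PAGE_BYTES
        for row in range(y, y + h)
        for col in range(x, x + w)
    }


if __name__ == "__main__":
    for x, y in ((100, 100), (28, 28)):
        r = len(pages_touched(x, y, 9, 9, raster_address))
        t = len(pages_touched(x, y, 9, 9, tiled_address))
        print(f"9x9 block at ({x},{y}): raster {r} pages, tiled {t} pages")
```

With these numbers the raster layout touches a separate page for each of the 9 rows, while the grouped layout touches a single page, or four when the window straddles a tile corner. The tile size in this sketch is fixed, which is the limitation raised against the prior art for variable block sizes.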
Although motion compensation takes around 70% of the total bandwidth, the data required by other compression tools, such as the in-loop filter 107 in both the H.264 and WMV9 standards, is also significant, so the data storage scheme needs to suit the requirements of all the different tools.
As prior art in the related field, the following publications may be referred to:
1. Tetsuro Takizawa (Multimedia Research Laboratory, NEC Corporation) and Masao Hirasawa (1st System LSI Division, NEC Electron Devices), “An efficient memory arbitration algorithm for a single chip MPEG2 AV decoder”, IEEE Transactions on Consumer Electronics, Vol. 47, No. 3, August 2001.
2. Marco Winzker, Peter Pirsch (Laboratorium für Informationstechnologie, Universität Hannover, Germany) and Jochen Reimers (Deutsche Bundespost TELEKOM, Forschungs- und Technologiezentrum, Germany), “Architecture and memory requirements for stand-alone and hierarchical MPEG2 HDTV-decoders with synchronous DRAMs”, IEEE.
3. Egbert G. T. Jaspers and Peter H. N. de, “Bandwidth reduction for video processing for consumer systems”, September 2001.
U.S. Pat. No. 6,614,442, titled “Macroblock tiling format for motion compensation”, issued to Ouyang, et al., tiles the luminance and chrominance components of several MBs and places them in a single page in the memory. The problem with such storage is that the tile size is fixed. A small block, such as the 9×9 block required in H.264, may either have to fetch the complete tile if the DDR RAM (Random Access Memory) is configured for a larger burst, or fetch data in small bursts to avoid fetching redundant data, which is highly inefficient for other kinds of data transfers. This tiling gives no advantage where the block sizes are variable, and it does not separate the top and bottom fields of an interlaced picture.
The prior art generally relates to tiling data in the external memory by packing sets of macroblocks in different ways, and is suitable only for fixed, larger data fetches.
There is therefore a need for an efficient, adaptable data storage technique that reduces the bandwidth requirement for variable block sizes and is suitable for the various decoder tools that require external memory transactions.