Commercial memory integrated circuits (chips), particularly low-cost 16 Mbit dynamic random access memory (DRAM) and synchronous dynamic random access memory (SDRAM), are organized into two "banks", each bank typically consisting of 2 k to 4 k pages of 256 words per page. A word may be 8 or 16 bits wide. In the former case there are 4 k pages per bank, and in the latter 2 k pages per bank, so the total number of bits is always 16 M. The chips may be ganged in parallel to increase the effective word width, e.g., to 32 bits for a pair of chips. To illustrate the present state of the art, the case of 2 k pages of 16-bit words (or 32-bit words when ganged) is described below.
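The capacity arithmetic above can be checked with a short sketch. The function name and constants are illustrative only; the figures (two banks, 256 words per page, 2 k or 4 k pages per bank) are those stated in the preceding paragraph.

```python
# Illustrative check of the 16 Mbit organization described above.
BANKS = 2             # two banks per chip
WORDS_PER_PAGE = 256  # words on each page

def total_bits(pages_per_bank, bits_per_word):
    """Total storage of one chip, in bits."""
    return BANKS * pages_per_bank * WORDS_PER_PAGE * bits_per_word

# 8-bit words with 4 k pages per bank, and 16-bit words with 2 k pages per bank.
for pages, width in ((4096, 8), (2048, 16)):
    print(pages, "pages x", width, "bits:", total_bits(pages, width) // 2**20, "Mbit")
```

Both organizations give the same total because halving the page count exactly offsets doubling the word width.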
Reading data from or writing data to such a memory involves opening (activating) a single page in one bank. Once the page is open, any number of read and write operations to that page can be performed quickly. To access data on a different page of that bank, the first page must be closed, the bank containing the new page precharged, and the new page opened. The precharge and activation involve an overhead of as many as 9 cycles, or even more, and are a major factor in limiting the quantity of data that can be accessed in a given time period (i.e., the effective memory bandwidth).
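The effect of this overhead on bandwidth can be illustrated with a deliberately simplified model. It assumes, purely for illustration, that one word is transferred per cycle once a page is open and that every access pays the full precharge and activation overhead; real devices have further timing constraints that are not modeled here.

```python
# Simplified model (an assumption, not a device specification):
# one word per cycle on an open page, plus a fixed per-access overhead.
def transfer_efficiency(words_per_access, overhead_cycles=9):
    """Fraction of cycles that actually move data."""
    return words_per_access / (words_per_access + overhead_cycles)

for n in (3, 9, 64, 256):
    print(n, "words per access:", round(transfer_efficiency(n), 3))
```

Under this model, an access of only a few words spends most of its cycles on overhead, while an access that fills much of a page approaches full efficiency.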
This overhead can be minimized in various ways: both banks can be used (their operation is essentially independent apart from sharing common communication channels with the accessing device); a bank can be precharged as soon as its use is complete, thus preparing it for another access; and the data can be organized in such a way that many words of data are accessed on a single page.
In a video decoder that is compliant with the Moving Picture Experts Group (MPEG) standard, commonly referred to as an MPEG decoder, a stored decoded anchor frame is accessed to predict a first approximation to a macroblock (MB) in a frame that is being decoded. Generally, an MPEG encoder contains an embedded decoder that is used to perform predictive encoding of the video frames. The decoder within an encoder stores decoded anchor frames in the same manner as a stand-alone decoder stores anchor frames.
In writing the anchor frames to memory for use in subsequent decoding operations, large enough quantities of data can be buffered (accumulated in a register bank in the decoding device) to make the memory usage fairly efficient. However, in reading this data to construct the macroblock predictions, the data is typically accessed in small quantities scattered in a random manner throughout the memory. The small size of the data retrieved with each access makes each transaction very inefficient, and the random distribution of the data makes traditional caching strategies ineffective. The result is that reading anchor frame data requires a very large memory bandwidth that makes decoders difficult to implement in a cost-efficient manner.
To be more specific about this storage problem: the memory is commonly addressed in a linear manner. That is, the data words are regarded as a sequence in order of increasing address, with the column address (identifying the word on a page), bank index, and page address being treated as the successively more significant parts of the overall address. The rectangular array of pixels constituting the luma or chroma information of a frame or field of video is then written into memory in a raster scan fashion. That is, pixels are written into memory in sequence, scanning left to right along each row, with the rows taken in succession from top to bottom. Several pixels are typically written into a single data word; for example, if a data word is 32 bits wide and the pixels are one byte each, four pixels are packed into each word. (The position of a pixel in the data word is effectively the least significant part of the pixel address.)
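The linear addressing described above can be sketched as follows. The function name, the frame width parameter, and the starting offset are hypothetical; the field order (position in word, column, bank, page, from least to most significant) and the packing of four one-byte pixels per 32-bit word follow the preceding paragraph.

```python
# Sketch of conventional raster-scan addressing (names are illustrative).
PIXELS_PER_WORD = 4   # four 1-byte pixels in a 32-bit word
WORDS_PER_PAGE = 256
BANKS = 2

def raster_address(x, y, frame_width, base_pixel=0):
    """Map pixel (x, y) to (page, bank, column, byte_in_word)."""
    linear = base_pixel + y * frame_width + x          # raster-scan pixel index
    byte_in_word = linear % PIXELS_PER_WORD            # least significant part
    word = linear // PIXELS_PER_WORD
    column = word % WORDS_PER_PAGE                     # word on the page
    bank = (word // WORDS_PER_PAGE) % BANKS            # next more significant
    page = word // (WORDS_PER_PAGE * BANKS)            # most significant part
    return page, bank, column, byte_in_word

print(raster_address(5, 0, 720))   # second word of the first page
print(raster_address(0, 2, 720))   # two rows down, a different bank
```

With this mapping, vertically adjacent pixels are separated by a full row of words, so a tall, narrow block of pixels is spread over many widely separated addresses.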
The video data is decoded macroblock by macroblock; for present purposes, a macroblock may be regarded simply as a rectangular array of pixels. Because the decoding proceeds in a raster scan order on a macroblock basis, it is possible to accumulate several macroblocks adjacent along a horizontal row before writing the decoded data to the memory. The data for each horizontal row of the array can be efficiently written since the storage method can make certain that the data is all on a single page. Successive rows may require a page change, with or without a bank change as well. The frequency of such changes can be minimized by accumulating several macroblocks if necessary. Furthermore, writing this data is regular and can be aligned on data word boundaries, which increases the access efficiency.
In reading the macroblock data for motion compensated prediction, however, the data is not aligned. The rectangular array of data that is required to decode a predicted frame may begin and end in the middle of the group of pixels packed into a word. Consequently, extra words must be read in order to extract the desired data. Furthermore, the rectangular array of pixels needed for prediction may be broken into subarrays (by field, for example), which further tends to convert a few large memory transactions into many small ones. As a result, the process used to read macroblock data is very inefficient.
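The cost of misalignment can be illustrated by counting the words that must be read to fetch one horizontal run of pixels. This is a minimal sketch with a hypothetical function name; it assumes four pixels per word and that each frame row starts on a word boundary.

```python
# Sketch: words touched by a horizontal run of pixels in raster-packed memory.
# Assumes each frame row begins on a word boundary (an illustrative assumption).
def words_for_span(x_start, n_pixels, pixels_per_word=4):
    """Number of data words that must be read to cover the run."""
    first = x_start // pixels_per_word
    last = (x_start + n_pixels - 1) // pixels_per_word
    return last - first + 1

print(words_for_span(0, 16))  # aligned 16-pixel run
print(words_for_span(3, 16))  # same run shifted by 3 pixels needs one more word
```

The extra word is small in absolute terms, but it recurs on every row of every prediction block, and the penalty is proportionally larger for short runs.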
Therefore, a need exists in the art for a method of storing pixel data to facilitate efficient memory utilization and reduce the memory bandwidth required in a predictive video decoder.
The disadvantages of the prior art are overcome by the invention of a method of storing pixel data in a memory of a predictive video decoder or such a decoder that is embedded in a predictive video encoder. The method stores pixel data from spatial blocks of pixels in each data word. For example, each data word contains pixel data from a rectangular block of pixels (RBP), e.g., a 2×2 pixel block is stored in a 4-byte data word. The data words for a horizontal row of RBPs are stored on a succession of pages from the same memory bank. Any leftover word storage space on the last such page is used for purposes other than storing video data. As such, a row of RBPs does not overlap from one page to another. The next lower row of RBPs is stored in a similar sequence beginning with a new page in the other memory bank.
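One possible reading of this storage scheme is sketched below. The function names, the byte order of the four pixels within a word, and the way pages are numbered within each bank are assumptions made for illustration; the text above specifies only that each word holds a 2×2 block, that a row of RBPs occupies whole pages of one bank, and that the next row starts on a new page in the other bank.

```python
# Illustrative sketch of RBP-based storage; not a definitive implementation.
WORDS_PER_PAGE = 256
BANKS = 2
RBP_W, RBP_H = 2, 2   # 2x2 pixel block per 4-byte word

def pages_per_rbp_row(frame_width):
    """Whole pages reserved for one row of RBPs (leftover words hold no video)."""
    words = (frame_width + RBP_W - 1) // RBP_W
    return (words + WORDS_PER_PAGE - 1) // WORDS_PER_PAGE

def rbp_address(x, y, frame_width):
    """Map pixel (x, y) to (bank, page, column, byte_in_word)."""
    rbp_row, word_in_row = y // RBP_H, x // RBP_W
    bank = rbp_row % BANKS                              # rows alternate between banks
    first_page = (rbp_row // BANKS) * pages_per_rbp_row(frame_width)
    page = first_page + word_in_row // WORDS_PER_PAGE   # assumed per-bank numbering
    column = word_in_row % WORDS_PER_PAGE
    byte_in_word = (y % RBP_H) * RBP_W + (x % RBP_W)    # assumed byte order
    return bank, page, column, byte_in_word

print(rbp_address(719, 1, 720))  # last pixel of the first RBP row
print(rbp_address(0, 2, 720))    # next RBP row starts in the other bank
```

In this sketch, a 720-pixel-wide row of RBPs needs 360 words, so it occupies two pages and leaves part of the second page free for non-video data. Because consecutive rows of RBPs sit in different banks, one bank can be precharged while the other is being accessed.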