Throughout the disclosure, the term “block” of video data is used to denote a subset of the data comprising a frame of video data, namely the data having spatial locations within a rectangular region of the frame. A block of video data can, but need not, consist of compressed (or otherwise encoded) video data. Examples of blocks of video data are the conventionally defined macroblocks of MPEG-encoded video frames.
Conventional media processors often cache video data during decoding or other processing, but they are typically highly inefficient in several respects: the data cache hit rate is low, power consumption and cost are high, and performance is low. Even when a large amount of on-chip RAM is available, it may not be feasible to cache all of the necessary data on the chip on which the processor is implemented. As a result, most current media processor designs do not use caches for referencing pixel data.
In one type of conventional video decoding, a Motion Compensation Unit (MCU) generates decoded video data from encoded (compressed) frames and reference frames of encoded video. In typical operation, an MCU consumes a large amount of data bandwidth. Most of the memory accesses triggered by an MCU during decoding are for the purpose of bringing in the two-dimensional pixel blocks needed for computations. These accesses are typically very expensive for several critical reasons, including the following: (1) most of the reference pixel data (pixels of reference frames) are stored in external SDRAM (SDRAM external to the chip in which the MCU is implemented), and hence accessing such data consumes much power; (2) two-dimensional blocks of reference pixel data may be needed at any pixel boundary, and due to restrictions on the operation of SDRAM and of the memory controllers that control accesses to SDRAM, the bandwidth actually consumed is higher than the amount of data requested by the MCU; and (3) the latency to bring data into an MCU from external SDRAM is high, and hence accesses to the external SDRAM result in performance loss. To overcome these issues, it has become standard practice to include large amounts of on-chip RAM in integrated circuits that implement video decoding. By providing a sufficient amount of internal storage (on-chip RAM or “IRAM”) for reference pixel data, it is possible to reduce the bandwidth consumed by accesses to external memory (e.g., an external SDRAM) and to reduce power consumption.
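Reason (2) above can be illustrated with a minimal sketch. It assumes, hypothetically, one byte per luma pixel stored in row-major order and a memory controller that can transfer each row only in aligned bursts of a fixed size; the 16-byte burst length and the name `fetched_bytes` are illustrative choices, not taken from the disclosure, and the model ignores other SDRAM overheads such as row activation. Under these assumptions, a 17×17 block (e.g., a 16×16 block plus one extra row and column for half-pixel interpolation) whose left edge falls at column 5 costs 32 bytes per row instead of 17.

```python
def fetched_bytes(x: int, width: int, height: int, burst_bytes: int = 16) -> int:
    """Bytes transferred to read a width x height block whose left edge is at column x,
    assuming 1 byte per pixel and that every row is read in aligned bursts of burst_bytes."""
    first = (x // burst_bytes) * burst_bytes              # round the start down to a burst boundary
    end = -(-(x + width) // burst_bytes) * burst_bytes    # round the end up to a burst boundary
    return (end - first) * height


requested = 17 * 17
fetched = fetched_bytes(x=5, width=17, height=17)
print(requested, fetched, round(fetched / requested, 2))  # 289 544 1.88
```

In this model the overhead comes entirely from the misalignment of the block with burst boundaries, which is why arbitrarily positioned motion-compensated blocks cost more bandwidth than their nominal size.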
It has been proposed to cache video data (from external RAM) in on-chip memory during decoding or other processing. For example, U.S. Patent Application Publication No. 2004/0264565, published Dec. 30, 2004, discloses caching of video data blocks (that have previously been read from a DRAM) in a cache memory organized into tag blocks, with dynamic tracking of pixel addresses. However, this caching technique consumes undesirably large amounts of power in typical applications, because maintaining each tag RAM and performing address comparisons (i.e., comparing the requested data addresses with the tag block address ranges in response to each read request for a block of data) consumes much power: these calculations and address mappings must be performed on every block transaction.
U.S. Pat. No. 6,618,440, issued on Sep. 9, 2003, U.S. Patent Application Publication No. 2006/0002475, published on Jan. 5, 2006, and U.S. Patent Application Publication No. 2006/0023789, published on Feb. 2, 2006, also disclose caching of video data from external RAM during decoding or other processing.
Even with a large amount of on-chip RAM (“IRAM”) available for caching reference pixel data, it may not be practical to store complete reference frames in such IRAM. For example, many encoding schemes (e.g., MPEG-2) require that an MCU employ two reference frames during operation, and since a single reference frame at 720×480 resolution occupies 720 × 480 × 1.5 bytes = 518,400 bytes (about 518 Kbytes), caching two complete reference frames (about 1 Mbyte of data) is typically not practical. IRAM memory requirements for caching depend strongly on the resolution and format of the video being decoded. As a result, most conventional caching schemes do not cache reference data in IRAM when the resolution of the video data undergoing processing exceeds a certain limit (e.g., 352×288 pixels).
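The frame-size arithmetic above can be reproduced with a short sketch. It assumes 8-bit samples and 4:2:0 chroma subsampling (the usual MPEG-2 main-profile format, which accounts for the factor of 1.5 bytes per pixel); `frame_bytes_420` is a hypothetical helper, not part of the disclosure.

```python
def frame_bytes_420(width: int, height: int) -> int:
    """Size in bytes of one 8-bit 4:2:0 frame: a full-resolution luma plane plus
    two chroma planes subsampled by 2 both horizontally and vertically."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


print(frame_bytes_420(720, 480))      # 518400 (about 518 Kbytes)
print(2 * frame_bytes_420(720, 480))  # 1036800 for two reference frames
print(frame_bytes_420(352, 288))      # 152064
```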
Having analyzed motion vectors associated with typical video content, the present inventors have recognized that caching of partial reference frames can be used effectively to minimize bandwidth for accesses to external memory during video decoding. FIG. 1 shows the vertical range of reference data requested by an MCU during decoding of MPEG-2-encoded video streams from four different movies. In FIG. 1, the “current macroblock lines” are the lines (of the current frame being decoded) that contain the current macroblock undergoing decoding. Bars “A” represent requested blocks consisting of data in lines of a reference frame within a vertical range of 15 pixels of the current macroblock lines (i.e., blocks consisting of data in lines of the reference frame vertically separated by not more than 15 lines from the current macroblock lines). Bars “B” represent requested blocks (other than those in bars “A”) consisting of data in lines of a reference frame within a vertical range of 31 pixels of the current macroblock lines; bars “C” represent requested blocks (other than those in bars “A” and “B”) consisting of data in lines of a reference frame within a vertical range of 47 pixels of the current macroblock lines; and bars “D” represent requested blocks (other than those in bars “A,” “B,” and “C”) consisting of data in lines of a reference frame within a vertical range of 63 pixels of the current macroblock lines. Consistent with FIG. 1, the inventors have recognized that most (typically more than 90%) of the blocks of reference data requested during typical video decoding are contained in lines of a reference frame that are vertically separated by not more than 16 pixels from the current macroblock lines.
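The bar categories of FIG. 1 can be expressed as a classification of each requested block by how far it extends above or below the current macroblock lines. The following is a sketch under stated assumptions: 16-line macroblock rows, distances measured in reference-frame lines, and a hypothetical function `vertical_range_bin`. It is not the procedure used to produce FIG. 1.

```python
from collections import Counter


def vertical_range_bin(block_top: int, block_height: int, mb_row: int, mb_size: int = 16) -> str:
    """Classify a requested reference block into the FIG. 1 categories A-D by the number
    of lines it extends above or below the current macroblock lines."""
    mb_top = mb_row * mb_size
    mb_bottom = mb_top + mb_size - 1
    block_bottom = block_top + block_height - 1
    # Largest extension above or below the current macroblock lines (0 if fully inside)
    d = max(mb_top - block_top, block_bottom - mb_bottom, 0)
    for label, limit in (("A", 15), ("B", 31), ("C", 47), ("D", 63)):
        if d <= limit:
            return label
    return "beyond D"


# Hypothetical requests (top line, height) while decoding macroblock row 10 (lines 160-175)
requests = [(150, 17), (130, 17), (180, 17), (100, 16)]
print(Counter(vertical_range_bin(t, h, mb_row=10) for t, h in requests))
```

Tallying such labels over every request in a stream gives the kind of distribution that FIG. 1 plots.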
Thus, a class of preferred embodiments of the present invention performs on-chip pre-caching of lines (of reference frames) that are vertically separated by not more than 16 pixels from the current macroblock lines, to minimize accesses during video decoding to the external memory (e.g., an external SDRAM) in which each reference frame is stored.
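The extent of such a pre-cached band can be sketched as follows, assuming 16-line macroblock rows, a 16-line margin above and below the current macroblock lines, clamping at the frame edges, and 8-bit 4:2:0 data. The names `precache_window` and `window_bytes_420` are hypothetical, and the sketch does not describe how any embodiment actually manages its IRAM.

```python
def precache_window(mb_row: int, frame_height: int, margin: int = 16, mb_size: int = 16) -> tuple:
    """Return (first_line, last_line) of the reference-frame luma lines to keep on chip
    while decoding macroblock row mb_row: the macroblock lines plus `margin` lines
    above and below, clamped to the frame."""
    mb_top = mb_row * mb_size
    mb_bottom = mb_top + mb_size - 1
    return max(mb_top - margin, 0), min(mb_bottom + margin, frame_height - 1)


def window_bytes_420(width: int, first_line: int, last_line: int) -> int:
    """Bytes for a band of luma lines plus the matching 4:2:0 chroma lines (8-bit samples)."""
    luma_lines = last_line - first_line + 1
    return width * luma_lines * 3 // 2


first, last = precache_window(mb_row=10, frame_height=480)
print(first, last, window_bytes_420(720, first, last))  # 144 191 51840
```

Under these assumptions the band for one 720×480 reference frame is 51,840 bytes, one tenth of the full frame. Because the band moves down by 16 lines per macroblock row, it could plausibly be maintained as a sliding window that fetches only the newly entering lines.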
The inventors have recognized that caching of partial reference frames also presents its own set of problems. For example, if a two-dimensional region of a reference frame is cached in IRAM and an MCU requests data from this cached region, the data can be sent directly from the cache memory without any access to external SDRAM. However, if the MCU requests a block of reference data that is only partially cached (e.g., a boundary block), fulfilling the request is inherently complicated and requires accessing at least some of the requested data from external memory.
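The boundary-block case can be made concrete with a sketch that splits a request, by reference-frame lines, into the part held in a cached band and the part that must be read from external memory. It assumes, hypothetically, that the cache holds full-width line bands, so only the vertical dimension matters; `split_request` and its parameters are illustrative and do not represent the request-handling method of the disclosure.

```python
from typing import List, Optional, Tuple

Rows = Tuple[int, int]  # inclusive (first_line, last_line)


def split_request(block_top: int, block_height: int,
                  cache_top: int, cache_bottom: int) -> Tuple[Optional[Rows], List[Rows]]:
    """Split a block request into the lines served from the on-chip band
    [cache_top, cache_bottom] and the line ranges that must come from external memory."""
    block_bottom = block_top + block_height - 1
    lo, hi = max(block_top, cache_top), min(block_bottom, cache_bottom)
    cached = (lo, hi) if lo <= hi else None
    external: List[Rows] = []
    if block_top < cache_top:  # lines above the cached band
        external.append((block_top, min(block_bottom, cache_top - 1)))
    if block_bottom > cache_bottom:  # lines below the cached band
        external.append((max(block_top, cache_bottom + 1), block_bottom))
    return cached, external


print(split_request(185, 17, cache_top=144, cache_bottom=191))  # ((185, 191), [(192, 201)])
```

A fully cached block yields an empty external list, while a boundary block yields both a cached portion and one or two external ranges that must be fetched and merged before the block is returned.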
The present invention accomplishes efficient caching (in IRAM) of the reference data needed to perform video decoding (or other processing) by pre-caching partial reference frames. Some preferred embodiments of the invention accomplish efficient on-chip pre-caching of partial reference frames, and efficient handling of requests from a video processor for reference data, in a manner that is conveniently programmable (e.g., in firmware).