Digital video devices (e.g., high definition televisions, digital video recorders, cameras, and conference systems) have data rate or bandwidth limitations. A digital video recording creates an enormous amount of digital data. To accommodate device bandwidth limitations, and to deliver digital data more quickly, digital image data is compressed, or encoded, before being transmitted or stored. A compressed digital video image is decompressed, or decoded, prior to its display. Examples of widely used (“standards-based”) compression techniques are those that comply with the Moving Picture Experts Group (“MPEG”) and Joint Photographic Experts Group (“JPEG”) standards.
Decoding encoded image data is computationally intensive and is associated with a heavily pipelined data-path where vast amounts of data move through a processor. The decoding process may be performed with a dedicated hardware decoder or it may be performed on a general purpose computer. The present invention is applicable to both types of decoders. However, by way of example, the present invention is disclosed in the context of a decoder implemented on a general purpose computer.
A number of advanced processor instruction sets (e.g., Sun SPARC VIS, sold by Sun Microsystems, Inc., Palo Alto, Calif.) have been introduced for use in general purpose computers. These advanced processor instruction sets optimize the computational aspects of standards-based video decoding. However, there is still a need for a standards-based video decoder design that optimizes the heavily pipelined data-path aspects of the video decoder's underlying memory system. The time a CPU takes to run a program is the sum of the time the CPU spends executing instructions and the time the CPU spends waiting for the memory system. Reducing memory system access times therefore enhances CPU performance.
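The relationship between execution time and memory wait time can be illustrated with the commonly used textbook model in which total CPU time is the sum of execution cycles and memory stall cycles, divided by the clock rate. The following is a minimal sketch under that assumption; the function name, parameter names, and the numbers in the usage lines are hypothetical and are not taken from this disclosure.

```python
def cpu_time_seconds(exec_cycles, cache_misses, miss_penalty_cycles, clock_hz):
    """Estimate run time with a simple two-term model (an assumption,
    not a measurement): cycles spent executing plus cycles spent stalled
    while the memory system services cache misses."""
    stall_cycles = cache_misses * miss_penalty_cycles
    return (exec_cycles + stall_cycles) / clock_hz


# Hypothetical workload: 1,000,000 execution cycles on a 1 MHz clock,
# with a 50-cycle penalty per cache miss.
baseline = cpu_time_seconds(1_000_000, 10_000, 50, 1_000_000)
fewer_misses = cpu_time_seconds(1_000_000, 2_000, 50, 1_000_000)
print(baseline, fewer_misses)
```

In this model the execution cycles are unchanged between the two calls; only the number of cache misses differs, which is why the second estimate is lower.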
An optimized decoder design involves several challenges. The design must consider the effects that its implementation can have on the performance of the decoder's underlying memory system. For example, to facilitate rapid access to data and instructions, a general purpose CPU typically uses a Data Cache (“D-cache”) and a separate Instruction Cache (“I-cache”). Cache use is optimized when required data and instructions are located in their respective caches. CPU stall-cycles come primarily from cache misses (a cache miss occurs when the necessary data or instructions are not in a cache). To optimize cache use and increase overall processor performance, the data-path of a video decoder should be designed so that cache misses are minimized and the caches are not underutilized.
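The effect of working-set size on cache misses can be illustrated with a small simulation. The sketch below assumes a direct-mapped data cache, which is a simplification chosen for illustration and is not a description of any particular processor or of the decoder disclosed here. The function name, cache size, line size, and buffer sizes are all hypothetical.

```python
def count_misses(addresses, cache_bytes, line_bytes):
    """Count misses in a simulated direct-mapped cache (assumed model).

    Each byte address maps to exactly one cache line slot; a miss occurs
    when that slot does not currently hold the addressed memory block."""
    num_lines = cache_bytes // line_bytes
    slots = [None] * num_lines          # memory block held by each slot
    misses = 0
    for addr in addresses:
        block = addr // line_bytes      # memory block containing addr
        index = block % num_lines       # slot that block must occupy
        if slots[index] != block:
            slots[index] = block        # fetch the block on a miss
            misses += 1
    return misses


# Read a buffer twice, byte by byte, through a 1024-byte cache
# with 32-byte lines (hypothetical sizes).
fits = count_misses(list(range(1024)) * 2, 1024, 32)
too_big = count_misses(list(range(2048)) * 2, 1024, 32)
print(fits, too_big)
```

When the buffer fits in the simulated cache, only the first pass misses. When the buffer is twice the cache size, the second pass misses on every line because each block has been evicted before it is reused.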
Most existing cache use optimization schemes are tailored to a particular platform. Thus, a cache use optimization scheme developed for one computer architecture is not readily ported to another.
Designers often need to make difficult cache use design tradeoffs to design a video decoder that is portable across several architectures. These tradeoffs involve balancing cache use factors that cannot all be maximized at the same time. The results of implementing these design tradeoffs are unpredictable and often lead to costly software rewrites to accommodate some new knowledge about the costs and benefits of the design tradeoffs.
In view of the foregoing, it would be highly desirable to provide an improved video decoder with scalable buffers that can be dynamically re-sized to optimally process a video input stream.