This invention relates to processor architecture and image processing applications, and more particularly to the register file(s) in a mediaprocessor.
Built-in parallelism in superscalar and Very Long Instruction Word ('VLIW') architectures allows mediaprocessors, such as the Philips Trimedia processor and the Hitachi/Equator Technologies MAP, to perform multiple operations per clock cycle. The multimedia data processed by these mediaprocessors are typically supplied as streams of data. A stream is a sequence of data with predictable addresses. This attribute makes a stream a good candidate for cache prefetching.
Processing these streams in a cache-based system, however, is inefficient for two main reasons. First, many data streams have little temporal locality. For example, in many cases the data of a video stream are used once and not referenced again. This makes placing them in a data cache wasteful. Second, many streams have a non-unit stride, where stride is the distance between successive stream elements. With a non-unit stride, entire cache blocks are transferred to the cache even though much of the data in each block is never referenced.
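By way of illustration only, the cost of a non-unit stride can be estimated by counting how many cache blocks a stream touches and comparing that with the bytes actually referenced. The sketch below is not part of the described architecture; the function name `cache_block_utilization`, the 32-byte block size, and the assumption that the stream starts at address 0 are all hypothetical choices made for the example.

```python
def cache_block_utilization(stride_bytes, element_bytes, n_elements, block_bytes=32):
    """Return the fraction of fetched cache-block bytes that a stream references.

    Assumptions (illustrative only): the stream starts at address 0, each
    element is element_bytes wide, and every block touched is fetched whole.
    """
    touched_blocks = set()
    for i in range(n_elements):
        first_byte = i * stride_bytes
        # Record every cache block that holds part of this element.
        for addr in range(first_byte, first_byte + element_bytes):
            touched_blocks.add(addr // block_bytes)
    referenced = n_elements * element_bytes
    fetched = len(touched_blocks) * block_bytes
    return referenced / fetched


if __name__ == "__main__":
    for stride in (4, 16, 64):
        print(stride, cache_block_utilization(stride, 4, 64))
```

Under these assumptions a unit-stride stream of 4-byte elements uses every fetched byte, while a 64-byte stride uses only one eighth of what is transferred.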
One method of improving cache performance is to place prefetched data into stream buffers instead of the cache. One kind of stream buffer is a FIFO prefetch buffer, which holds consecutive cache blocks. If a memory address produces a miss in the cache but a hit in the stream buffer, the data are moved from the stream buffer into the cache, avoiding an access to the external memory. Since many algorithms use multiple streams at a time, multi-way stream buffers have been developed. A multi-way stream buffer is a group of stream buffers in parallel, which allows multiple streams to be prefetched concurrently. FIG. 1 shows a typical architecture with stream buffers.
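By way of illustration only, the behavior of a single FIFO stream buffer can be modeled as follows. The sketch tracks block numbers rather than data. The class name `StreamBuffer`, the depth of four blocks, the comparison against the head entry only, and the policy of flushing and restarting the prefetch after a miss in both the cache and the buffer are assumptions made for the example and are not taken from the description above.

```python
from collections import deque


class StreamBuffer:
    """Toy model of a FIFO prefetch buffer holding consecutive cache blocks."""

    def __init__(self, depth=4):
        self.depth = depth      # assumed number of blocks the buffer holds
        self.fifo = deque()     # block numbers that have been prefetched

    def _refill(self, next_block):
        # Prefetch consecutive blocks until the buffer is full.
        while len(self.fifo) < self.depth:
            self.fifo.append(next_block)
            next_block += 1

    def access(self, block, cache):
        if block in cache:
            return "cache hit"
        if self.fifo and self.fifo[0] == block:
            # Miss in the cache, hit in the buffer: move the block into the cache.
            self.fifo.popleft()
            cache.add(block)
            self._refill(self.fifo[-1] + 1 if self.fifo else block + 1)
            return "stream buffer hit"
        # Miss in both: fetch from external memory and restart the prefetch
        # (assumed policy).
        cache.add(block)
        self.fifo.clear()
        self._refill(block + 1)
        return "miss"


if __name__ == "__main__":
    cache, buf = set(), StreamBuffer()
    for b in (10, 11, 12, 10, 13):
        print(b, buf.access(b, cache))
```

In this model only the first access of a sequential stream goes to external memory; later blocks are found at the head of the buffer.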
The stream buffers discussed above require a large additional silicon area for storing the streamed data. In addition, the storage area of unused queues is wasted. Accordingly, there is a need for a more efficient and effective architecture for handling streams of data.
According to the invention, a processor includes a register file with a dynamically configurable operand queue extension. The register file is configured by a user's application program into registers and operand queues. The program designer determines how the register file is to be configured. Specifically, the programmer determines the trade-off between the number and size of the operand queue(s) versus the number of registers to be available to the program.
According to one aspect of the invention, all or a portion of the register file is allocatable into registers, and a portion of the register file is allocatable into zero or more operand queues. In one embodiment an additional address bit is used for each register address to define whether it is functioning as a register or part of an operand queue.
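By way of illustration only, one possible reading of the additional address bit is a flag appended above the register index in each operand specifier. The field width of six bits, the placement of the flag in the most significant position, and the names `encode_operand` and `decode_operand` are hypothetical and are not specified above.

```python
REG_INDEX_BITS = 6                  # assumed: 64 addressable register locations
QUEUE_FLAG = 1 << REG_INDEX_BITS    # assumed: extra bit sits above the index


def encode_operand(index, is_queue):
    """Pack a register index and the register/queue flag into one specifier."""
    if not 0 <= index < (1 << REG_INDEX_BITS):
        raise ValueError("register index out of range")
    return index | (QUEUE_FLAG if is_queue else 0)


def decode_operand(spec):
    """Split a specifier into (index, is_queue)."""
    return spec & (QUEUE_FLAG - 1), bool(spec & QUEUE_FLAG)


if __name__ == "__main__":
    spec = encode_operand(5, is_queue=True)
    print(spec, decode_operand(spec))
```

With this encoding the same index can name either an ordinary register or an operand queue access, depending only on the extra bit.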
According to another aspect of the invention, the application program sets the location and depth of each operand queue within the register file. A given queue occupies a consecutive set of registers, although different queues need not be adjacent to one another.
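By way of illustration only, the allocation just described can be modeled as assigning each queue a start location and a depth and checking that the resulting ranges fit in the register file without overlapping. The function name `configure_register_file`, the 64-entry register file, and the `(start, depth)` tuple format are assumptions made for the example.

```python
def configure_register_file(num_registers, queues):
    """Map each register location to a queue id, or None for a plain register.

    queues is a list of (start, depth) pairs, one per operand queue
    (hypothetical format). Each queue occupies depth consecutive locations
    beginning at start.
    """
    owner = [None] * num_registers
    for queue_id, (start, depth) in enumerate(queues):
        if depth <= 0 or start < 0 or start + depth > num_registers:
            raise ValueError(f"queue {queue_id} does not fit in the register file")
        for reg in range(start, start + depth):
            if owner[reg] is not None:
                raise ValueError(f"queue {queue_id} overlaps queue {owner[reg]}")
            owner[reg] = queue_id
    return owner


if __name__ == "__main__":
    layout = configure_register_file(64, [(32, 10), (50, 6)])
    print(layout.count(None), "locations remain general-purpose registers")
```

Here two queues of depth ten and six leave 48 of the 64 locations available as ordinary registers, and the gap between the queues shows that they need not be adjacent.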
According to another aspect of the invention, queue state logic maintains operand queue status information, such as a header pointer, tail pointer, start address, end address and number of vacancies for a given operand queue.
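By way of illustration only, the status information listed above can be modeled as a circular queue laid over a consecutive range of register locations. The class name `OperandQueue`, the inclusive end address, the wrap-around behavior, and the use of exceptions for a full or empty queue are assumptions made for the example; the actual queue state logic may differ.

```python
class OperandQueue:
    """Toy model of the per-queue state kept for one operand queue."""

    def __init__(self, regs, start, end):
        self.regs = regs            # shared register file storage
        self.start = start          # first location of the queue
        self.end = end              # last location of the queue (inclusive, assumed)
        self.head = start           # next location to read
        self.tail = start           # next location to write
        self.vacancies = end - start + 1

    def _advance(self, pointer):
        # Wrap from the end address back to the start address.
        return self.start if pointer == self.end else pointer + 1

    def push(self, value):
        if self.vacancies == 0:
            raise OverflowError("operand queue is full")
        self.regs[self.tail] = value
        self.tail = self._advance(self.tail)
        self.vacancies -= 1

    def pop(self):
        if self.vacancies == self.end - self.start + 1:
            raise IndexError("operand queue is empty")
        value = self.regs[self.head]
        self.head = self._advance(self.head)
        self.vacancies += 1
        return value


if __name__ == "__main__":
    regs = [0] * 16
    q = OperandQueue(regs, start=4, end=6)
    for v in (1, 2, 3):
        q.push(v)
    print(q.pop(), q.vacancies)
```

The vacancy count lets the model distinguish a full queue from an empty one even though the head and tail pointers are equal in both cases.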
According to an advantage of this invention, by implementing the operand queues as a configuration of registers in the register file, the size of each queue can be optimized for efficient use of silicon. The wasted area of conventional stream buffers is avoided. The number of queues and the depth of each queue can vary (e.g., one function may need three queues each with a depth of ten, while another function may need five queues each with a depth of six). Furthermore, in the case where no queues are needed, there is no silicon sitting unused, since the operand queue memory can be used for general-purpose registers.
These and other aspects and advantages of the invention will be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings.