This invention relates to memory circuits, and more particularly to addressable memory circuits located on a processor chip.
It is common for processors to dedicate a significant portion of their die area to an on-chip static memory. Such on-chip memory typically is organized as a level-1 or a level-2 cache. The on-chip cache memory serves to temporarily hold data from (and for) external main memory, and to hold intermediate data used in processing. As the throughput of processors has increased, outpacing the speeds of off-chip memory, the on-chip cache has come to play a key role in keeping the functional processing units of the processor busy. The on-chip cache fetches data in a small block around a requested word, and attempts to keep frequently accessed data in storage, replacing less frequently accessed data. A data reference pattern with high temporal locality takes advantage of the cache and enables efficient processing. Typically, however, an on-chip cache does not reduce access latency relative to off-chip memory when there is little spatial or temporal locality and the data set is significantly larger than the cache memory size. In particular, for streaming data applications, such as image processing, the cache tends to replace other useful data with streaming data. Such streaming data is not likely to be accessed again within a short time. When it is re-accessed, or when nearby data is accessed, chances are high that the corresponding data block has already been replaced by other data. Accordingly, on-chip caches typically do not yield the same benefits for streaming data applications as for other applications.
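By way of illustration only, the effect described above can be sketched with a small software model. The model assumes a fully associative cache with least-recently-used replacement; the function name `hit_rate` and the parameters `num_blocks` and `block_words` are hypothetical and are not taken from any particular processor.

```python
from collections import OrderedDict

def hit_rate(addresses, num_blocks=4, block_words=4):
    """Fraction of word accesses that hit a small fully associative LRU cache."""
    cache = OrderedDict()  # block number -> None, ordered oldest to newest
    hits = 0
    for addr in addresses:
        block = addr // block_words
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark block as most recently used
        else:
            if len(cache) >= num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return hits / len(addresses)

# A small working set reused many times: high temporal locality.
reuse = [a for _ in range(100) for a in range(8)]

# A long stream touching one word per block, read twice: every block is
# evicted long before it is read again.
stream = list(range(0, 4096, 4)) * 2
```

In this model the reused working set misses only on its first touch of each block, whereas the stream never hits, even on its second pass.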
Mediaprocessors have been developed to handle streaming data applications more efficiently. Some mediaprocessors, such as the Texas Instruments TMS320C80 and the TMS320C6x, replace the on-chip cache with a similar-sized addressable on-chip memory. The TM-1000 of the Philips Trimedia family and the MAP1000 developed by Equator Technologies, Inc. and Hitachi Ltd. have lockable on-chip caches that can be reconfigured into addressable memory. Addressable on-chip memory is more desirable for streaming data, such as in image processing applications.
Streaming data are often fetched from external main memory sequentially. A direct memory access (‘DMA’) engine does not need to be very complex to handle such access. When the addressable on-chip memory can fit an entire data structure, the memory is very effective at keeping the processor's functional units busy.
The most significant disadvantage of the on-chip addressable memory is the complexity in managing it. The programmer specifies exactly how data is to be laid out in the addressable on-chip memory and initiates all DMA transfers at the correct times. It is a challenge for programmers to achieve such management efficiently with current compiler technologies. Another disadvantage is that the streaming data is short-lived. Still another disadvantage is that extra registers are needed to achieve the lowest cycle time for processing the streams of data.
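The management burden described above can be illustrated with a minimal sketch of double buffering, one common way a programmer might schedule transfers into an addressable on-chip memory. The sketch is an assumption for illustration, not a method taken from the processors named above; `process_stream`, `dma_fetch`, and `tile_words` are hypothetical names, and the DMA transfer is modeled as a simple list slice.

```python
def process_stream(main_memory, tile_words, compute):
    """Process main_memory tile by tile using two on-chip buffers."""
    buffers = [None, None]  # two halves of the addressable on-chip memory
    results = []
    num_tiles = (len(main_memory) + tile_words - 1) // tile_words

    def dma_fetch(tile_index):
        # Hypothetical stand-in for a DMA transfer from external main memory.
        start = tile_index * tile_words
        return main_memory[start:start + tile_words]

    if num_tiles:
        buffers[0] = dma_fetch(0)  # prime the first buffer
    for t in range(num_tiles):
        if t + 1 < num_tiles:
            # In hardware this transfer would overlap with the compute below;
            # here it simply runs first.
            buffers[(t + 1) % 2] = dma_fetch(t + 1)
        results.extend(compute(buffers[t % 2]))
    return results
```

Even in this simplified form, the programmer chooses the tile size, the buffer layout, and the point at which each transfer is started.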
Accordingly, there is a need for an efficient on-chip memory scheme for handling streaming data.
According to the invention, a multi-ported pipelined memory is located on a processor die serving as an addressable on-chip memory. Such on-chip memory enables efficient processing of streaming data. Specifically, the memory sustains multiple wide memory accesses per cycle. It clocks synchronously with the rest of the processor, and it stores a significant portion of an image.
According to one aspect of the invention, the multi-ported pipelined memory is able to bypass the register file and serve as a direct data provider to the processor's functional units. When operated in such manner, multiple wide access patterns are achieved per cycle. This is desirable and advantageous for multimedia applications and multiprocessing environments. It also is desirable and advantageous when using a superscalar or a very long instruction word (‘VLIW’) architecture.
According to another aspect of the invention, the multi-ported pipelined memory includes multiple memory banks which permit multiple memory accesses per cycle. In a preferred embodiment the memory banks are connected in pipelined fashion to pipeline registers placed at regular intervals on a global bus. The pipeline registers allow wire lengths to be kept short and are omitted in some embodiments to reduce the number of cycles for an on-chip memory access operation. The multi-ported pipelined memory sustains multiple transactions per cycle, and at a larger memory density than that of a multi-ported static memory (e.g., a register file).
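As a rough illustration of why multiple banks permit multiple accesses per cycle, the following sketch counts the cycles needed to service a group of simultaneous requests. It assumes that each bank services one request per cycle and that words are interleaved across banks by the low-order address bits; neither assumption is stated in the text above, and `cycles_needed` and `num_banks` are hypothetical names.

```python
def cycles_needed(addresses, num_banks=4):
    """Cycles to service simultaneous requests when each bank handles one per cycle."""
    per_bank = [0] * num_banks
    for addr in addresses:
        per_bank[addr % num_banks] += 1  # assumed low-order interleaving
    return max(per_bank)                 # the busiest bank sets the total

# Four requests to four different banks complete together.
spread = cycles_needed([0, 1, 2, 3])

# Four requests to the same bank must be serviced one after another.
clustered = cycles_needed([0, 4, 8, 12])
```

Under these assumptions, requests that fall in different banks proceed in the same cycle, while requests that fall in one bank are serialized.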
According to another aspect of the invention, the multi-ported pipelined memory performs read and write operations simultaneously on a shared data bus of a read/write port, significantly reducing the number of wires allocated.
According to another aspect of the invention, a given read port is able to perform multiple memory transactions in a single access operation. In a preferred embodiment, such read port is able to perform four parallel 64-bit memory transactions in a single access operation. The transactions are returned as a single concatenated word. An advantage of such feature is that the number of accesses performed by a multimedia application (e.g., warping, histogram equalization) accessing non-consecutive memory words is reduced.
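The concatenation of four 64-bit transactions into a single returned word can be sketched as follows. This is a behavioral model only; the names `gather4` and `extract`, and the choice to place the first transaction in the low-order bits, are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1

def gather4(memory, addresses):
    """Read four 64-bit words and return them as one 256-bit integer."""
    if len(addresses) != 4:
        raise ValueError("expected exactly four addresses")
    word = 0
    for slot, addr in enumerate(addresses):
        # Slot 0 occupies the low-order 64 bits (assumed ordering).
        word |= (memory[addr] & MASK64) << (64 * slot)
    return word

def extract(word, slot):
    """Return the 64-bit field held in the given slot of a concatenated word."""
    return (word >> (64 * slot)) & MASK64

# Four non-consecutive words fetched in one access operation.
memory = {10: 1, 20: 2, 30: 3, 40: 4}
wide = gather4(memory, [40, 10, 30, 20])
fields = [extract(wide, i) for i in range(4)]
```

An application that needs four scattered words, such as a histogram update, would in this model issue one access rather than four.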
According to another aspect of the invention, one port of the multiple ports is capable of serving as a read port or a write port. A given access request initiated during any given clock cycle may be a read access request or a write access request. As the request is processed in pipelined fashion over multiple clock cycles, the ensuing access requests also may be either read access requests or write access requests. Thus, within the pipeline of the one port, both read and write operations may be implemented concurrently. In this sense, the multiple transactions partially overlap in time.
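The overlap of read and write requests within one port can be modeled, purely as a sketch, by a fixed-latency pipeline that accepts one request per clock cycle. The class name `PipelinedPort`, the three-stage depth, and the in-order completion of requests are assumptions for illustration and are not details given above.

```python
from collections import deque

class PipelinedPort:
    def __init__(self, size, stages=3):
        self.mem = [0] * size
        self.pipe = deque([None] * stages)  # one slot per pipeline stage

    def clock(self, request=None):
        """Advance one cycle; request is ('r', addr) or ('w', addr, value)."""
        self.pipe.appendleft(request)       # new request enters the first stage
        done = self.pipe.pop()              # request leaving the last stage
        if done is None:
            return None
        if done[0] == 'w':
            self.mem[done[1]] = done[2]
            return None
        return self.mem[done[1]]

port = PipelinedPort(8, stages=3)
requests = [('w', 5, 42), ('r', 5), ('w', 5, 7), ('r', 5), None, None, None]
outputs = [port.clock(r) for r in requests]
```

In this model, by the third cycle a write, a read, and a second write occupy successive stages of the same port, and each read returns the value stored by the write issued before it.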
These and other aspects and advantages of the invention will be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings.