The technical field of this invention is data processing systems and particularly data processing systems with cache memory, static random access memory and direct memory access.
Data processing systems typically employ data caches or instruction caches to improve performance. A small amount of high speed memory is used as the cache. This cache memory is filled from main memory on an as needed basis. When the data processor requires data or an instruction, this is first sought from the cache memory. If the data or instruction sought is already stored in the cache memory, it is recalled faster than it could have been recalled from main memory. If the data or instruction sought is not stored in the cache memory, it is recalled from main memory for use and also stored in the corresponding cache. A performance improvement is achieved using cache memory based upon the principle of locality of reference. It is likely that the data or the instruction just sought by the data processor will be needed again in the near future. Use of cache memories speeds the accesses needed to service these future needs. A typical high performance data processor will include instruction cache, data cache or both on the same integrated circuit as the data processor core.
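The hit and miss behavior described above can be illustrated with a minimal sketch of a direct-mapped cache. The class name, the line count, the line size and the address trace are all assumptions chosen for illustration; the text does not specify any particular cache organization.

```python
class DirectMappedCache:
    """Toy direct-mapped cache model; all sizes are illustrative assumptions."""

    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per line; None means invalid
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit. On a miss, record the fill from main memory."""
        block = address // self.line_size   # which memory block is addressed
        index = block % self.num_lines      # which cache line that block maps to
        tag = block // self.num_lines       # identifies the block within the line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the block is fetched from main memory and replaces the old line.
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache()
# Addresses 0, 4 and 8 share one line (locality); address 64 maps to the
# same line index with a different tag and so evicts it.
trace = [0, 4, 8, 64, 0, 4]
results = [cache.access(address) for address in trace]
```

In this trace the second and third accesses hit because they fall in the line fetched by the first access, which is the locality of reference that makes caching effective.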
Cache memories are widely used in general purpose microprocessors employed in desktop personal computers and workstations. Cache memories are frequently used in microprocessors employed in embedded applications in which the programmable nature of the microprocessor controller is invisible to the user. Caching provides a hardware managed, programmer transparent access to a large memory space via a physically small static random access memory (SRAM) with an average memory access time approaching the access time of the SRAM. The hardware managed and programmer transparent aspect of cache systems enables better performance while freeing the programmer from explicit memory management.
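The claim that the average access time approaches that of the SRAM can be made concrete with the standard average memory access time relation, hit time plus miss rate times miss penalty. The sketch below is illustrative only; the cycle counts and miss rates are assumed values, not figures from this document.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the expected miss cost.

    hit_time and miss_penalty are in cycles; miss_rate is a fraction in [0, 1].
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Assumed values: 1-cycle SRAM hit, 50-cycle main memory penalty.
low_miss = average_access_time(hit_time=1, miss_rate=0.02, miss_penalty=50)
high_miss = average_access_time(hit_time=1, miss_rate=0.20, miss_penalty=50)
```

With a low miss rate the average stays close to the SRAM hit time, while a higher miss rate moves it toward the main memory latency.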
Cache memories are typically not used with digital signal processors. Digital signal processors are generally used in applications with real time constraints. Such real time constraints typically do not operate well with cache memories. When employing cache memories the access time for a particular instruction or data cannot be predetermined. If the sought item is stored in the cache, then the access time is a known short time. However, if the item sought is not stored in the cache, then the access time will be very much longer. Additionally, other demands for main memory access will make the access time from main memory vary greatly. This variation in memory access time makes planning for real time applications extremely difficult or impossible.
Digital signal processors will more typically include some directly addressable SRAM on the same integrated circuit as the data processor core. The programmer must manage transfer of critically needed instructions and data to the on-chip SRAM. Often this memory management employs a direct memory access unit. A direct memory access unit typically controls data moves between memories or between a memory and a peripheral ordered by the data processor core. Once begun on a particular data transfer the direct memory access unit operates autonomously from the data processor core. Once stored in the on-chip SRAM, these items are available to the data processor core at a greatly lowered access time. Thus these items will be available to service the real time constraints of the application. Note that both the data processor core and the direct memory access unit may access the on-chip SRAM. The memory management task is difficult to program. The programmer must anticipate the needs of the application for instructions and data and assure that these items are loaded into the on-chip SRAM ahead of their need. Additionally, the programmer must juggle conflicting needs for the typically limited space of the on-chip SRAM. While this is a difficult programming task, it is generally preferable to the unknown memory latencies of cache systems in real time applications.
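One common way a programmer manages on-chip SRAM with a direct memory access unit is double (ping-pong) buffering, in which the next block is loaded while the current block is processed. The following is a minimal sequential sketch of that idea, not a method taken from this document. The function names, the two-buffer arrangement and the block size are assumptions, and a real direct memory access unit would run concurrently with the core rather than in sequence.

```python
def process_stream(external_memory, block_size, compute):
    """Simulate ping-pong buffering between external memory and on-chip SRAM.

    external_memory: list of samples held off-chip (hypothetical).
    block_size: number of samples per on-chip buffer (hypothetical).
    compute: function applied by the core to each loaded block.
    """
    buffers = [[], []]  # two on-chip SRAM buffers
    results = []
    num_blocks = (len(external_memory) + block_size - 1) // block_size

    def dma_load(block_index, buffer_index):
        # Stand-in for a DMA transfer from external memory to on-chip SRAM.
        start = block_index * block_size
        buffers[buffer_index] = external_memory[start:start + block_size]

    if num_blocks == 0:
        return results

    dma_load(0, 0)  # prime the first buffer before the core starts
    for block in range(num_blocks):
        current = block % 2
        if block + 1 < num_blocks:
            # Start loading the next block into the other buffer so that it
            # is resident before the core needs it.
            dma_load(block + 1, 1 - current)
        results.append(compute(buffers[current]))
    return results


block_sums = process_stream(list(range(10)), block_size=4, compute=sum)
```

The sketch shows why this task is hard to program by hand: the programmer must order every load ahead of its use and must size the buffers to fit the limited on-chip SRAM.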
Digital signal processor architectures are becoming more complex. The complexity of new applications has increased and their real time constraints have become more stringent. These advances have made the programming problem of real time memory management using on-chip SRAM increasingly difficult. This has slowed applications development. Because the size of on-chip SRAM and the latency of external memory vary between products, these programs have increasingly been limited to specific product configurations. Thus it has not been possible to employ the same set of instructions to solve a similar memory management problem in a similar product. This need for custom algorithms for each product prevents re-use of instruction blocks and further slows product development. The increasing architectural capabilities of processors also require bigger on-chip memories (either cache or SRAM) to prevent processor stalls. Processor frequencies are increasing. These increases in memory size and processor frequency work against easy scaling of the on-chip memory with increasing data processing requirements.
These increasing demands upon digital signal processors create a need in the art for a cache system that better utilizes the data movement hardware to achieve better cache performance.
This invention is a data processing system including a central processing unit executing program instructions to manipulate data, at least one level one cache temporarily storing at least one of program instructions and data, a level two unified cache for supply of instructions and data to the at least one level one cache, a directly addressable memory and a direct memory access unit adapted for connection to an external memory. A superscalar memory transfer controller schedules plural non-interfering memory movements to and from the level two unified cache and the directly addressable memory each memory cycle in accordance with a predetermined priority of operation. The at least one level one cache preferably includes a level one instruction cache and a level one data cache.
The level two unified cache includes a cache tag memory with plural read ports and a single write port. The superscalar memory transfer controller is capable of scheduling plural cache tag memory read accesses and one cache tag memory write access in a single memory cycle.
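The port constraint described above can be sketched as a per-cycle arbitration step: requests are considered in priority order, and each is granted only if a port of the required kind remains free. The function name, the requester names and the choice of two read ports are assumptions for illustration; the text states only that there are plural read ports and a single write port.

```python
def schedule_tag_accesses(requests, read_ports=2, write_ports=1):
    """Grant tag memory accesses for one memory cycle.

    requests: list of (requester, kind) tuples, highest priority first,
              where kind is "read" or "write".
    Returns (granted, deferred), each a list of (requester, kind) tuples.
    """
    granted, deferred = [], []
    reads_left, writes_left = read_ports, write_ports
    for requester, kind in requests:
        if kind not in ("read", "write"):
            raise ValueError(f"unknown access kind: {kind!r}")
        if kind == "read" and reads_left > 0:
            reads_left -= 1
            granted.append((requester, kind))
        elif kind == "write" and writes_left > 0:
            writes_left -= 1
            granted.append((requester, kind))
        else:
            # No free port of this kind: the request waits for a later cycle.
            deferred.append((requester, kind))
    return granted, deferred


# Hypothetical requesters, listed in an assumed priority order.
cycle_requests = [
    ("L1D", "read"),
    ("L1I", "read"),
    ("DMA", "read"),
    ("miss_fill", "write"),
    ("victim", "write"),
]
granted, deferred = schedule_tag_accesses(cycle_requests)
```

Under these assumptions two reads and one write proceed in the same cycle, and the remaining read and write are held over.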
The level two unified cache includes a cache access state machine for each level one cache, a cache access state machine for the direct memory access unit, a cache access state machine for level two unified cache read miss service, a cache access state machine for level two unified cache write miss service and a cache access state machine for victim eviction service. The superscalar memory transfer controller is capable of scheduling plural cache access state machines in a single memory cycle.
The level two unified cache consists of plural memory banks. The superscalar memory transfer controller is capable of scheduling plural memory accesses to non-interfering memory banks of the level two unified cache in a single memory cycle.
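Scheduling accesses to non-interfering banks can be sketched in the same way: two accesses interfere when they map to the same bank, so in each cycle the scheduler grants at most one access per bank in priority order. The bank count, the bank width, the address-to-bank mapping and the requester names below are assumptions for illustration and are not specified in the text.

```python
def schedule_bank_accesses(requests, num_banks=4, bank_width=16):
    """Grant at most one access per memory bank for one memory cycle.

    requests: list of (requester, address) tuples, highest priority first.
    Returns (granted, stalled): granted is a list of (requester, bank)
    tuples and stalled is a list of requester names.
    """
    busy_banks = set()
    granted, stalled = [], []
    for requester, address in requests:
        # Assumed mapping: consecutive bank_width-byte chunks rotate across banks.
        bank = (address // bank_width) % num_banks
        if bank in busy_banks:
            stalled.append(requester)  # bank conflict: retry in a later cycle
        else:
            busy_banks.add(bank)
            granted.append((requester, bank))
    return granted, stalled


# Hypothetical requesters and addresses, in an assumed priority order.
bank_requests = [("L1D", 0x00), ("L1I", 0x10), ("DMA", 0x40), ("victim", 0x30)]
bank_granted, bank_stalled = schedule_bank_accesses(bank_requests)
```

Here the third request maps to the same bank as the first and is stalled, while the other three proceed together because they address different banks.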