The technical field of this invention is data processing systems and particularly data processing systems with combined cache memory and static random access memory, and direct memory access.
Data processing systems typically employ data caches or instruction caches to improve performance. A small amount of high speed memory is used as the cache. This cache memory is filled from main memory on an as needed basis. When the data processor requires data or an instruction, this is first sought from the cache memory. If the data or instruction sought is already stored in the cache memory, it is recalled faster than it could have been recalled from main memory. If the data or instruction sought is not stored in the cache memory, it is recalled from main memory for use and also stored in the corresponding cache. A performance improvement is achieved using cache memory based upon the principle of locality of reference. It is likely that the data or the instruction just sought by the data processor will be needed again in the near future. Use of cache memories speeds the accesses needed to service these future needs. A typical high performance data processor will include instruction cache, data cache or both on the same integrated circuit as the data processor core.
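The lookup sequence just described can be illustrated with a minimal sketch. The class name `SimpleCache`, the direct-mapped placement and the four-line size are assumptions made only for illustration; they are not taken from this description.

```python
class SimpleCache:
    """Toy direct-mapped cache in front of a slower main memory (hypothetical model)."""

    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory      # dict: address -> value
        self.num_lines = num_lines
        self.lines = {}                     # line index -> (address, value)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines    # assumed direct-mapped placement
        entry = self.lines.get(index)
        if entry is not None and entry[0] == address:
            self.hits += 1                  # already cached: fast path
            return entry[1]
        self.misses += 1                    # not cached: recall from main memory
        value = self.main_memory[address]
        self.lines[index] = (address, value)  # keep a copy for future accesses
        return value


memory = {addr: addr * 10 for addr in range(16)}
cache = SimpleCache(memory)
first = cache.read(5)    # miss: filled from main memory
second = cache.read(5)   # hit: served from the cache
```

The second read of the same address is served from the cache, which is the locality-of-reference benefit described above.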
Cache memories are widely used in general purpose microprocessors employed in desktop personal computers and workstations. Cache memories are frequently used in microprocessors employed in embedded applications in which the programmable nature of the microprocessor controller is invisible to the user. Caching provides a hardware managed, programmer transparent access to a large memory space via a physically small static random access memory (SRAM) with an average memory access time approaching the access time of the SRAM. The hardware managed and programmer transparent aspect of cache systems enables better performance while freeing the programmer from explicit memory management.
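The statement that the average access time approaches that of the SRAM can be made concrete with the standard average memory access time relation (hit time plus miss rate times miss penalty). The cycle counts and miss rate below are illustrative assumptions, not figures from this description.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the expected cost of misses."""
    return hit_time + miss_rate * miss_penalty


# Assumed values: 1-cycle SRAM hit, 5% miss rate, 20 extra cycles per miss.
amat = average_access_time(hit_time=1.0, miss_rate=0.05, miss_penalty=20.0)
```

With these assumed values the average is about 2 cycles, much closer to the 1-cycle SRAM access than to the cost of going to main memory on every access.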
Cache memories are typically not used with digital signal processors. Digital signal processors are generally used in applications with real time constraints. Such real time constraints typically do not operate well with cache memories. When employing cache memories the access time for a particular instruction or data cannot be predetermined. If the sought item is stored in the cache, then the access time is a known short time. However, if the item sought is not stored in the cache, then the access time will be very much longer. Additionally, other demands for main memory access will make the access time from main memory vary greatly. This variation in memory access time makes planning for real time applications extremely difficult or impossible.
Digital signal processors will more typically include some directly addressable SRAM on the same integrated circuit as the data processor core. The programmer must manage transfer of critically needed instructions and data to the on-chip SRAM. Often this memory management employs a direct memory access unit. A direct memory access unit typically controls data moves between memories or between a memory and a peripheral ordered by the data processor core. Once begun on a particular data transfer the direct memory access unit operates autonomously from the data processor core. Once stored in the on-chip SRAM, these items are available to the data processor core at a greatly lowered access time. Thus these items will be available to service the real time constraints of the application. Note that both the data processor core and the direct memory access unit may access the on-chip SRAM. The memory management task is difficult to program. The programmer must anticipate the needs of the application for instructions and data and assure that these items are loaded into the on-chip SRAM ahead of their need. Additionally, the programmer must juggle conflicting needs for the typically limited space of the on-chip SRAM. While this is a difficult programming task, it is generally preferable to the unknown memory latencies of cache systems in real time applications.
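A minimal sketch of this programmer-managed staging follows. The function `dma_transfer` is a hypothetical stand-in for a direct memory access unit, and the memory sizes are assumptions chosen for illustration; a real unit would run autonomously rather than as a synchronous copy.

```python
def dma_transfer(src, src_start, dst, dst_start, count):
    """Copy `count` words from src to dst; stands in for a DMA move (hypothetical)."""
    dst[dst_start:dst_start + count] = src[src_start:src_start + count]


external_memory = list(range(100))   # large, slow memory
on_chip_sram = [0] * 8               # small, fast, directly addressable memory

# The program stages the block it is about to need, then works only from SRAM.
dma_transfer(external_memory, 40, on_chip_sram, 0, 8)
block_sum = sum(on_chip_sram)
```

The burden described above lies in choosing which block to stage, when to stage it, and which earlier contents of the limited SRAM to overwrite.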
Digital signal processor architectures are becoming more complex. The complexity of new applications has increased and their real time constraints have become more stringent. These advances have made the programming problem of real time memory management using on-chip SRAM increasingly difficult. This has slowed applications development. With variety in the size of on-chip SRAM and the variations in external memory latency, these programs have increasingly been limited to specific product configurations. Thus it has not been possible to employ the same set of instructions to solve a similar memory management problem in a similar product. This need for custom algorithms for each product prevents re-use of instruction blocks and further slows product development. The increasing architectural capabilities of processors also require bigger on-chip memories (either cache or SRAM) to prevent processor stalls. Processor frequencies are increasing. These increases in memory size and processor frequency work against easy scaling of the on-chip memory with increasing data processing requirements.
A recent development is the provision of a single memory on the integrated circuit which can be partitioned into varying amounts of cache and ordinary SRAM. This development is evidenced in co-pending U.S. patent application Ser. No. 09/603,645 filed contemporaneously with this application entitled UNIFIED MEMORY SYSTEM ARCHITECTURE INCLUDING CACHE AND ADDRESSABLE STATIC RANDOM ACCESS MEMORY claiming priority from U.S. Provisional Application No. 60/144,550 filed Jul. 15, 1999 and U.S. Provisional Application No. 60/166,534 filed Nov. 19, 1999. The programmer can then select the proportions of cache and SRAM appropriate for the then current operation of the digital signal processor.
There is a need in the art for a manner of ensuring cache coherence in a data processing system employing cache, directly addressable SRAM and direct memory access.
This invention concerns a data processing system having a central processing unit, at least one level one cache, a level two unified cache, a directly addressable memory and a direct memory access unit. The data processing system further includes a snoop unit generating snoop accesses to the at least one level one cache upon a direct memory access to the directly addressable memory. The at least one level one cache preferably includes a level one instruction cache and a level one data cache.
The snoop unit generates a write snoop access to both level one caches upon a direct memory access write to the directly addressable memory. The level one instruction cache invalidates a cache entry upon a snoop hit following a write snoop access. The level one data cache also invalidates a cache entry upon a snoop hit following a write snoop access. The level one data cache further writes back the cache entry to the directly addressable memory if the cache entry is dirty, that is, if it has been modified in the level one data cache.
The snoop unit generates a read snoop access to the level one data cache upon a direct memory read access from the directly addressable memory. The level one data cache invalidates a cache entry upon a snoop hit following a read snoop access and writes back the cache entry to the directly addressable memory if dirty.
The snoop unit generates an eviction snoop access to the level one data cache upon a cache entry eviction from the level two unified cache. The level one data cache invalidates a cache entry upon a snoop hit following an eviction snoop access and writes back the cache entry to the level two unified cache if the cache entry is dirty.
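The write, read and eviction snoop responses described above can be sketched as a small behavioral model. All class and function names are hypothetical, the caches are modeled as dictionaries keyed by address, and the ordering of the write back relative to the direct memory access write is an assumption of this sketch rather than something stated here.

```python
class L1DataCache:
    """Toy level one data cache: address -> (value, dirty flag). Hypothetical model."""

    def __init__(self):
        self.entries = {}

    def cpu_write(self, address, value):
        self.entries[address] = (value, True)    # modified in L1, so dirty

    def fill(self, address, value):
        self.entries[address] = (value, False)   # clean copy

    def snoop(self, address, backing):
        """Handle a write, read or eviction snoop; return True on a snoop hit."""
        entry = self.entries.pop(address, None)  # pop invalidates on a hit
        if entry is None:
            return False                         # snoop miss: nothing to do
        value, dirty = entry
        if dirty:
            backing[address] = value             # write back modified data
        return True


class L1InstructionCache:
    """Toy level one instruction cache; entries are never dirty."""

    def __init__(self):
        self.entries = {}

    def fill(self, address, value):
        self.entries[address] = value

    def write_snoop(self, address):
        return self.entries.pop(address, None) is not None  # invalidate on hit


def dma_read(address, sram, l1d):
    l1d.snoop(address, sram)     # read snoop: dirty data reaches SRAM first
    return sram[address]


def dma_write(address, value, sram, l1d, l1i):
    l1d.snoop(address, sram)     # write snoop to the data cache
    l1i.write_snoop(address)     # write snoop to the instruction cache
    sram[address] = value        # assumed ordering: DMA data lands last


sram = {0x10: 1, 0x20: 2}
l1d, l1i = L1DataCache(), L1InstructionCache()

l1d.cpu_write(0x10, 99)                  # CPU modifies a cached copy
seen_by_dma = dma_read(0x10, sram, l1d)  # DMA observes the modified value

l1d.fill(0x20, 2)
l1i.fill(0x20, 2)
dma_write(0x20, 7, sram, l1d, l1i)       # stale L1 copies are invalidated

l2_cache = {}
l1d.cpu_write(0x30, 5)
l1d.snoop(0x30, l2_cache)                # eviction snoop: write back to level two
```

In this model the direct memory access unit never reads stale data from the directly addressable memory, and neither level one cache retains a copy that a direct memory access write has made obsolete.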
In the preferred embodiment, a level two memory is selectively configurable as part level two unified cache and part directly addressable memory.