1. Technical Field
The present invention generally relates to an improved data processing system and in particular to improved memory management in a data processing system. Still more particularly, the present invention relates to improved cache memory management in a data processing system, which includes dynamic cache management algorithms driven by processor access sequence tracking.
2. Description of the Related Art
Most data processing systems are controlled by one or more processors and employ various levels of memory. Typically, programs and data are loaded into a data processing system's memory storage areas for execution or reference by the processor, and are stored in different portions of the memory storage depending on the processor's current need for such programs or data. A running program or data referenced by a running program must be within the system's main memory (primary or main storage, which is typically random access memory). Programs or data which are not needed immediately may be kept in secondary memory (secondary storage, such as a tape or disk drive) until needed, and then brought into main storage for execution or reference. Secondary storage media are generally less costly than random access memory components and have much greater capacity, while main memory storage may generally be accessed much faster than secondary memory.
Within the system storage hierarchy, one or more levels of high-speed cache memory may be employed between the processor and main memory to improve performance and utilization. Cache storage is much faster than the main memory, but is also relatively expensive as compared to main memory and is therefore typically employed only in relatively small amounts within a data processing system. In addition, limiting the size of cache storage enhances the speed of the cache. Various levels of cache memory are often employed, with trade-offs between size and access latency being made at levels logically further from the processor(s). Cache memory generally operates faster than main memory, typically by a factor of five to ten, and may, under certain circumstances, approach the processor operational speed. If program instructions and/or data which are required during execution are pre-loaded in high speed cache memory, average overall memory access time for the system will approach the access time of the cache.
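The relationship between hit rate and average access time described above can be illustrated with a minimal sketch. It assumes a single cache level in front of main memory and a miss that costs the cache lookup plus one main memory access; the function name and the latency values (a cache ten times faster than main memory) are illustrative assumptions, not figures from any particular system.

```python
def average_access_time(hit_rate, cache_time, memory_time):
    """Average access time for one cache level in front of main memory.

    Assumption: every access pays the cache lookup, and a miss additionally
    pays one main-memory access.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_time + (1.0 - hit_rate) * memory_time


if __name__ == "__main__":
    # Hypothetical latencies in arbitrary time units.
    CACHE_TIME = 1.0
    MEMORY_TIME = 10.0
    for rate in (0.5, 0.9, 0.99, 1.0):
        t = average_access_time(rate, CACHE_TIME, MEMORY_TIME)
        print(f"hit rate {rate:.2f}: average access time {t:.2f}")
```

As the hit rate approaches 1, the second term vanishes and the average approaches the cache access time alone.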
In order to enhance performance, contemporary data processing systems often utilize multiple processors which concurrently execute portions of a given task. To further enhance performance, such multiple processor or multi-processor (MP) data processing systems often utilize a multi-level cache/memory hierarchy to reduce the access time required to retrieve data from memory. A multi-processor system may include a number of processors each with an associated on-chip, level-one (L1) cache, a number of level-two (L2) caches, and a number of system memory modules. Typically, the cache/memory hierarchy is arranged such that each L2 cache is accessed by a subset of the L1 caches within the system via a local bus. In turn, each L2 cache and system memory module is coupled to a system bus (or interconnect switch) such that an L2 cache within the multi-processor system may access data from any of the system memory modules coupled to the bus.
The use of cache memory imposes one more level of memory management overhead on the data processing system. Logic must be implemented to control allocation, deallocation, and coherency management of cache content. When space is required, instructions or data previously residing in the cache must be "swapped" out, usually on a "least-recently-used" (LRU) basis. Accordingly, if there is no room in the cache for additional instructions or data, then the information which has not been accessed for the longest period of time will be swapped out of the cache and replaced with the new information. In this manner, the most recently used information, which has the greatest likelihood of being again required, is available in the cache at any given time.
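The least-recently-used replacement described above can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any hardware implementation: the class name `LRUCache`, its `access` method, and the use of string tags are hypothetical, and real caches apply LRU within each congruence class (set) rather than across the whole cache.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache that tracks only line tags."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Ordered from least recently used (front) to most recently used (back).
        self._lines = OrderedDict()

    def access(self, tag):
        """Touch `tag`; return the evicted tag, or None if nothing was evicted."""
        if tag in self._lines:
            # Hit: the line becomes the most recently used.
            self._lines.move_to_end(tag)
            return None
        victim = None
        if len(self._lines) >= self.capacity:
            # Miss with a full cache: swap out the least recently used line.
            victim, _ = self._lines.popitem(last=False)
        self._lines[tag] = None
        return victim

    def tags(self):
        """Resident tags, least recently used first."""
        return list(self._lines)
```

For example, with a capacity of two, accessing A, B, A, and then C evicts B, because A was touched more recently than B when space was needed.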
As noted, previous cache management techniques mostly depend on least-recently-used (LRU) algorithms in selecting a cache line victim for eviction and replacement. However, empirical measurements have shown that strict least-recently-used algorithms are unsatisfactory in many cases. Various enhancements to LRU algorithms have been proposed or implemented in recent years, such as software managed LRU, pseudo-random influences, etc. Basic symmetric multi-processor snooping protocols have also been utilized to influence cache management.
Even with a cache memory management scheme, there are additional, related problems that can cause system performance to suffer. For example, in data processing systems with several levels of cache/memory storage, a great deal of shuttling of instructions and data between the various cache/memory levels occurs, which consumes system resources such as processor cycles and bus bandwidth which might otherwise be put to more productive processing use. The problem has been exacerbated in recent years by the growing disparity between processor speeds and the operational speeds of the different system components used to transfer information and instructions to the processor.
It would be desirable, therefore, to provide a system increasing the "intelligence" of cache management, and in particular to explicitly utilize the detection of frequently employed storage access sequences (load/store instruction streams) to dynamically optimize cache management.
It is therefore one object of the present invention to provide an improved data processing system.
It is another object of the present invention to provide improved memory management in a data processing system.
It is yet another object of the present invention to provide improved cache memory management in a multiprocessor data processing system, which includes dynamic cache management algorithms driven by processor access sequence tracking.
The foregoing objects are achieved as is now described. In addition to an address tag, a coherency state and an LRU position, each cache directory entry includes historical processor access information for the corresponding cache line. The historical processor access information includes different subentries for each different processor which has accessed the corresponding cache line, with subentries being "pushed" along the stack when a new processor accesses the subject cache line. Each subentry contains the processor identifier for the corresponding processor which accessed the cache line, one or more opcodes identifying the operations which were performed by the processor, and timestamps associated with each opcode. This historical processor access information may then be utilized by the cache controller to influence victim selection, coherency state transitions, LRU state transitions, deallocation timing, and other cache management functions so that smaller caches are given the effectiveness of very large caches through more intelligent cache management.
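The directory entry layout described above may be modeled, purely for illustration, as the following data structure. Several details are assumptions not stated in the text: the history depth (`MAX_SUBENTRIES`), the single-letter coherency state strings, the opcode names, integer timestamps, and the rule that a new subentry is pushed whenever the accessing processor differs from the most recent one. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_SUBENTRIES = 4  # Assumption: the text does not specify the history depth.


@dataclass
class AccessSubentry:
    processor_id: int
    # (opcode, timestamp) pairs for operations performed by this processor.
    operations: List[Tuple[str, int]] = field(default_factory=list)


@dataclass
class DirectoryEntry:
    address_tag: int
    coherency_state: str  # Assumption: a short state label such as "S" or "M".
    lru_position: int
    # Index 0 holds the subentry for the most recent processor.
    history: List[AccessSubentry] = field(default_factory=list)

    def record_access(self, processor_id: int, opcode: str, timestamp: int) -> None:
        """Log one access, pushing older subentries down for a new processor."""
        if not self.history or self.history[0].processor_id != processor_id:
            self.history.insert(0, AccessSubentry(processor_id))
            # The oldest subentries fall off the end of the stack.
            del self.history[MAX_SUBENTRIES:]
        self.history[0].operations.append((opcode, timestamp))


if __name__ == "__main__":
    entry = DirectoryEntry(address_tag=0x1A, coherency_state="S", lru_position=0)
    entry.record_access(0, "load", 100)
    entry.record_access(0, "store", 105)
    entry.record_access(1, "load", 110)
    for sub in entry.history:
        print(sub.processor_id, sub.operations)
```

A cache controller could consult such a history when choosing a victim or a coherency state transition; how it weighs the history is left open in this sketch.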
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.