1. Technical Field
The present invention relates to a data processing system in general, and in particular to a data processing system having a cache memory hierarchy. Still more particularly, the present invention relates to a data processing system having a highly scalable shared cache memory hierarchy that includes multiple local invalidation buses.
2. Description of the Related Art
In a symmetric multiprocessor (SMP) data processing system, all of the processing units are generally identical; that is, they have the same architecture and utilize a common set or subset of instructions and protocols to operate. Each processing unit includes a processor core having multiple registers and execution units for carrying out program instructions. The SMP data processing system may also include a cache memory hierarchy.
A cache memory hierarchy is a cache memory system consisting of several levels of cache memories, each level having a different size and speed. Typically, the first level cache memory, commonly known as the level one (L1) cache, has the fastest access time and the highest cost per bit. The remaining levels of cache memories, such as level two (L2) caches, level three (L3) caches, etc., have relatively slower access times but also a relatively lower cost per bit. Each successively lower cache memory level typically has a slower access time and a larger size than the level above it.
Within a cache memory hierarchy, when multiple L1 caches share a single L2 cache, the L2 cache is typically inclusive of all the L1 caches; that is, every cache line held in any of the L1 caches is also held in the L2 cache. To track this inclusivity, the L2 cache has to maintain, in an L2 directory, a dedicated inclusivity bit per L1 cache for each L1 cache line. Consequently, the L2 directory, which is a costly resource, grows substantially as the total number of L1 cache lines increases. The additional inclusivity bit information leads to a relatively large L2 cache design with a relatively slow access time to the L2 directory. The present disclosure provides an improved inclusivity tracking and cache invalidation apparatus to solve the above-mentioned problem.
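The growth of the L2 directory under precise inclusivity tracking can be illustrated with a short calculation. The following is only a sketch under stated assumptions: it assumes one inclusivity bit per L1 cache in every L2 directory entry, and the function name and the directory size used are hypothetical values chosen for illustration, not figures from this disclosure.

```python
def precise_inclusivity_bits(l2_directory_entries: int, num_l1_caches: int) -> int:
    """Total inclusivity bits, assuming each L2 directory entry
    holds one dedicated bit per L1 cache (illustrative model only)."""
    return l2_directory_entries * num_l1_caches


if __name__ == "__main__":
    entries = 32 * 1024  # hypothetical number of L2 directory entries
    for num_l1 in (2, 4, 8, 16):
        bits = precise_inclusivity_bits(entries, num_l1)
        # Convert bits to KiB for readability.
        print(f"{num_l1:2d} L1 caches: {bits // 8 // 1024} KiB of inclusivity bits")
```

Under this model the inclusivity storage scales linearly with the number of L1 caches sharing the L2 cache, which is the scaling cost described above.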
In accordance with a preferred embodiment of the present invention, a symmetric multiprocessor data processing system includes multiple processing units. Each of the processing units is associated with a level one cache memory. All the level one cache memories are associated with an imprecisely inclusive level two cache memory. In addition, a group of local invalidation buses is connected between all the level one cache memories and the level two cache memory. The imprecisely inclusive level two cache memory includes a tracking means for imprecisely tracking cache line inclusivity of the level one cache memories. Thus, the level two cache memory does not have dedicated inclusivity bits for tracking the cache line inclusivity of each of the associated level one cache memories. The tracking means includes a last_processor_to_store field and a more_than_two_loads field per cache line. When the more_than_two_loads field is asserted for a cache line, every copy of that cache line within the level one cache memories is invalidated via the local invalidation buses, except for the copy held in the level one cache memory associated with the processor indicated in the last_processor_to_store field.
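The invalidation behavior summarized above can be sketched in software. This is a minimal, non-authoritative model: each L1 cache is represented as a set of line addresses, the local invalidation buses are modeled as a simple loop over the L1 caches, and the names L2DirectoryEntry and broadcast_invalidate are hypothetical. How and when the two fields are updated is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class L2DirectoryEntry:
    """Per-line tracking fields named in the summary (no per-L1 inclusivity bits)."""
    last_processor_to_store: int
    more_than_two_loads: bool


def broadcast_invalidate(entry: L2DirectoryEntry,
                         l1_caches: List[Set[int]],
                         address: int) -> List[int]:
    """Invalidate `address` in every L1 cache except the one belonging to
    last_processor_to_store. Returns the ids of L1 caches that dropped the line."""
    if not entry.more_than_two_loads:
        return []  # field not asserted: no invalidation is issued in this sketch
    invalidated = []
    for pid, cache in enumerate(l1_caches):
        if pid == entry.last_processor_to_store:
            continue  # the last storing processor keeps its copy
        # The L2 does not know which L1 caches hold the line;
        # each L1 checks its own contents when it sees the invalidation.
        if address in cache:
            cache.discard(address)
            invalidated.append(pid)
    return invalidated


# Example: processors 0, 1 and 3 hold line 0x40; processor 1 stored last.
caches = [{0x40}, {0x40}, set(), {0x40, 0x80}]
entry = L2DirectoryEntry(last_processor_to_store=1, more_than_two_loads=True)
dropped = broadcast_invalidate(entry, caches, 0x40)
print(dropped)  # [0, 3]
```

In this model the invalidation is sent to every L1 cache other than that of the last storing processor, whether or not it holds the line, which is what makes the tracking imprecise: the level two cache memory trades some unnecessary invalidation traffic for a directory entry that does not grow by one bit per L1 cache.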
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.