This section is intended to introduce the reader to various aspects of the art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Since the introduction of the first personal computer (“PC”) over 20 years ago, technological advances to make PCs more useful have continued at an amazing rate. Microprocessors that control PCs have become faster and faster, with operational speeds eclipsing a gigahertz (one billion clock cycles per second) and continuing well beyond.
Productivity has also increased tremendously because of the explosion in development of software applications. In the early days of the PC, people who could write their own programs were practically the only ones who could make productive use of their computers. Today, there are thousands and thousands of software applications ranging from games to word processors and from voice recognition to web browsers.
One of the most important advances in recent years is the development of multiprocessor computer systems. These powerful computers may have two, four, eight or even more individual processors. The processors may be given individual tasks to perform or they may cooperate to perform a single, large job.
In a multiprocessor computer system, processors may control specific processes. One of the processors may be designated to boot the operating system before the other processors are initialized to do useful work. Typically, the processor designated to boot the operating system is referred to as the bootstrap processor or BSP. The other processors in the system are typically designated application processors or APs. The system memory in a multiprocessing computer system may be connected to one of the processors, which may be referred to as a home processor or home node. Other processors may direct requests for data stored in the memory to the home node, which may retrieve the requested information from the system memory.
Each processor in the computer system may include a cache memory system, which may be integrated into the processor or external to the processor, to enhance performance. A cache memory may hold the most recently accessed data, or data organized in a particular manner, in a location that allows fast and easy access to the data. By keeping this data in the cache memory system, execution time may be reduced and bottlenecks prevented, because the data is quickly accessible during the operation of a program or application. For instance, software programs may run in a relatively small loop in consecutive memory locations. To reduce execution time, the recently accessed lines of memory may be stored in the cache memory system, eliminating the time associated with repeatedly retrieving the program from memory. However, faster cache memory is generally more expensive, so as the speed of the system increases, the expense of the system may increase as well. Thus, in designing a cache memory system, organization, speed, and associated cost limitations may influence the configuration.
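The benefit of caching a small loop can be illustrated with a minimal simulation. The sketch below assumes a hypothetical fully associative cache of fixed capacity with least-recently-used (LRU) replacement, and treats each memory line as an integer address; `LRUCache` and `run_loop` are illustrative names, not part of any described system.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache holding `capacity` lines, with LRU replacement.

    Hypothetical model: each "line" is just an integer line address.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, line_address: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        if line_address in self.lines:
            self.lines.move_to_end(line_address)  # now the most recently used line
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line_address] = None
        return False


def run_loop(cache: LRUCache, loop_lines: int, iterations: int) -> None:
    """Execute a loop whose code spans `loop_lines` consecutive lines."""
    for _ in range(iterations):
        for line_address in range(loop_lines):
            cache.access(line_address)


small = LRUCache(capacity=8)
run_loop(small, loop_lines=4, iterations=10)
print(small.hits, small.misses)  # 36 4
```

When the loop fits in the cache, only the first pass misses. Under this LRU policy, a loop spanning 10 lines in an 8-line cache misses on every access, which is one reason cache size enters the cost and speed trade-off.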
To organize the cache memory system, layers or levels may be utilized to enhance the performance of the system. For instance, in a three level cache system, a first level cache may maintain a certain amount of data and may be coupled to one of the microprocessors. A second level cache may be connected to one or more first level caches and include the first level cache data along with additional data. Finally, a third level cache may be connected to multiple second level caches and include the second level cache data along with additional data. The higher levels of cache may be designed for faster access, while the lower cache levels may be designed for slower access. The interconnectivity and complexity of the system increase dramatically with each added level of cache, but the added levels may enable the system to operate more efficiently.
To provide an effective cache memory configuration, a cache memory system may include a large amount of dynamic random access memory (“DRAM”) along with static random access memory (“SRAM”). As SRAM is capable of providing faster access, it may be utilized as a memory cache to store frequently accessed information and reduce access time for the computer system. In selecting the appropriate combination of SRAM and DRAM, the cost and speed of the different memories may be weighed against each other. SRAM may be more expensive but may enable faster access to the data, while DRAM may be less expensive but may provide slower access to the data. Accordingly, the structure of the cache system may be influenced by the access speed and cost factors.
In operation, the cache levels may be designed to increase in size from the highest level down to the lowest level so that the cache memory system provides these benefits. For instance, if the first level cache is unable to supply the data (i.e., the cache misses), then the second level cache may be able to supply the data to the requestor. Likewise, if the second level is unable to supply the data, then the third level is accessed next. If none of the caches are able to supply the data, then the memory is accessed to retrieve the data. As discussed above, the purpose of the cache levels is to supply the data from the caches, which are faster to access than the memory. If the lower level caches are the same size as or smaller than the upper level caches, then the lower level caches may not be able to hold all of the information within the upper level caches and therefore may not satisfy the inclusion principle. In addition, even if the lower level cache is larger than the upper level caches, the inclusion property may not hold in some specific designs.
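The lookup order described above can be sketched as follows, assuming each cache level is modeled as a Python dictionary from address to data; the function name `lookup` and the sample contents are hypothetical.

```python
from typing import Dict, List, Tuple


def lookup(levels: List[Dict[int, str]], memory: Dict[int, str],
           address: int) -> Tuple[str, str]:
    """Return (source, data), searching the first level cache first.

    Each cache level is modeled as a dict from address to data.
    """
    for number, cache in enumerate(levels, start=1):
        if address in cache:
            return f"L{number}", cache[address]  # hit at this level
    # Every cache level missed: fall back to the slower memory.
    return "memory", memory[address]


memory = {0x10: "a", 0x20: "b", 0x30: "c", 0x40: "d"}
l1 = {0x10: "a"}
l2 = {0x10: "a", 0x20: "b"}
l3 = {0x10: "a", 0x20: "b", 0x30: "c"}

print(lookup([l1, l2, l3], memory, 0x20))  # ('L2', 'b')
print(lookup([l1, l2, l3], memory, 0x40))  # ('memory', 'd')
```

In the sample data each lower level is larger than, and holds everything in, the level above it, so the example also satisfies the inclusion principle.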
Under the inclusion principle, a lower level cache includes the information within every upper level cache that is connected to it, in addition to any other information. This allows the lower level cache to provide additional functionality to the system, such as responding to requests or probes on behalf of the upper level caches, and enables the system to operate more efficiently. If lower levels of cache fail to follow the inclusion principle, problems or complications may arise with the cache coherency protocol because the lower levels do not include the upper level information. This may result in the lower level caches being unable to respond to requests or probes. For the second level cache to provide this enhanced functionality to the first level cache, the second level cache may be larger than the first level cache so that it can include and maintain more data. Likewise, the same principle applies to the third level cache in providing increased functionality to the second level cache. This growth in cache size at each successive level increases the cost of the system, because the faster SRAM is more expensive than the slower DRAM. Thus, the number and size of the cache levels, along with the interaction between levels, may be adjusted to design caches that balance factors such as cost, interconnectivity, and speed.
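The inclusion principle reduces to a subset test. The following sketch assumes caches are modeled as sets of line addresses; `satisfies_inclusion` and `capacity_allows_inclusion` are hypothetical helpers. The second helper captures the worst case behind the size requirement: if the upper caches hold entirely distinct lines, a lower level with fewer lines than all of them combined cannot include them all.

```python
from typing import List, Set


def satisfies_inclusion(upper_levels: List[Set[int]], lower_level: Set[int]) -> bool:
    """True if every line held by the connected upper caches is also held below."""
    return all(upper <= lower_level for upper in upper_levels)


def capacity_allows_inclusion(upper_capacities: List[int], lower_capacity: int) -> bool:
    """Worst case: the upper caches hold entirely distinct lines, so the lower
    level needs at least as many lines as all of them combined."""
    return lower_capacity >= sum(upper_capacities)


print(satisfies_inclusion([{1, 2}, {3}], {1, 2, 3, 4}))  # True
print(satisfies_inclusion([{1, 2}, {3}], {1, 2}))        # False
print(capacity_allows_inclusion([4, 4], 8))              # True
print(capacity_allows_inclusion([4, 4], 6))              # False
```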
In addition to the access speed and cost factors, the system design may be influenced by the information to be provided by the system. In providing information, the cache memory system may have the cache divided into individual lines of data. The individual cache lines may include information that is unique to that cache line, such as cache data and associated cache tag information. Cache data may include information, instructions, or address information for the particular line of cache. Similarly, the cache tag may include information about the status of the cache line and the owner of the cache line. Based on the information provided in each of the lines, the cache memory system may be able to enhance the performance of the memory system.
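A cache line's tag and data can be modeled with a small record type. The sketch below assumes a hypothetical geometry of 64-byte lines and 256 sets, and shows the common practice of splitting an address into a tag, a set index, and a byte offset; the field names, `OFFSET_BITS`, `INDEX_BITS`, and `split_address` are illustrative, not taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OFFSET_BITS = 6  # hypothetical: 64-byte lines
INDEX_BITS = 8   # hypothetical: 256 sets


@dataclass
class CacheLine:
    tag: int                     # upper address bits identifying the cached block
    state: str = "I"             # status of the line, e.g. a MESI state letter
    owner: Optional[int] = None  # hypothetical: id of the processor owning the line
    data: bytes = b""            # the cached contents


def split_address(address: int) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345)
line = CacheLine(tag=tag, state="E", owner=0, data=bytes(64))
print(hex(tag), hex(index), hex(offset))  # 0x4 0x8d 0x5
```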
As another organizational factor, a cache memory system may include a cache controller to track the information within the cache memory. In operation, the cache controller may respond to requests from processors, thus reducing the wait time experienced in the system. The cache controller may be utilized to control the flow of data or information within a cache memory system. For instance, a request for data may be received by the cache controller, which may review the request to determine the appropriate action. If the cache controller determines that the information is within the cache, it may respond to the requestor without incurring the delay of a memory access. However, if the information is not in the cache, then it may be accessed from other memory, which will likely increase the wait time. Accordingly, the cache controller may manage the information within the cache to improve performance.
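The hit-or-miss decision made by a cache controller might be sketched as below. This is a minimal model under stated assumptions: the cache is an unbounded dictionary with no replacement policy, and `CacheController` and `handle_request` are hypothetical names.

```python
from typing import Dict, Tuple


class CacheController:
    """Hypothetical controller in front of a cache and a slower backing memory."""

    def __init__(self, backing_memory: Dict[int, int]) -> None:
        self.cache: Dict[int, int] = {}
        self.backing_memory = backing_memory

    def handle_request(self, address: int) -> Tuple[int, bool]:
        """Return (data, hit). A miss reads backing memory and fills the cache."""
        if address in self.cache:
            return self.cache[address], True  # respond directly from the cache
        data = self.backing_memory[address]   # slower access to other memory
        self.cache[address] = data            # keep the data for later requests
        return data, False


controller = CacheController({0x100: 7})
print(controller.handle_request(0x100))  # (7, False): the first request misses
print(controller.handle_request(0x100))  # (7, True): the repeat request hits
```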
To operate properly with a cache controller, the cache memory subsystem should maintain the latest updated information to ensure that the cache includes the most recent data and is consistent between the multiple caches and microprocessors. The maintenance of the data within the cache may be referred to as cache consistency or coherency. Data integrity may be compromised if the copy of the line in the cache no longer matches the data stored in memory. Various techniques may be used to identify and control the individual lines of the cache. In a multiprocessor computer system, several cache subsystems may exist, which further complicates the task of keeping the various caches coherent.
With complex multiprocessor systems, a directory or snoop protocol may be utilized to control the flow of information and ensure that the consistency of the cache is maintained. For instance, in a directory-based system, the directory may act as a central controller that tracks and maintains the various lines of cache within a system. With a directory, various subsystems communicate with the directory, which manages the cache memory for the system and maintains the cache coherency protocol. A cache consistency model may be used to handle the complexity of the multiprocessing environment and enable the directory to manage the caches.
For instance, a status model, such as the MESI cache consistency model, may provide a method for tracking the states of information in each cache line. Under the MESI cache consistency model, a cache line may be in one of four states: modified, exclusive, shared, or invalid. The modified state may indicate that the cache line has been updated in this cache and differs from memory, so the modified line must be written back to memory before the memory copy can be used by other caches. The exclusive state may indicate that the cache line is held only in this cache and matches memory, so it is not available at other caches. The shared state may indicate that copies of the cache line are also located in other caches, while the invalid state may indicate that the data in the cache line is not present, uncached, or invalid. These states may be used in handling the requests for cache lines.
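The MESI transitions for a single cache line can be written as small functions. The sketch follows one common textbook formulation of MESI; variants differ in details such as the transitions on remote reads, and the function names are hypothetical.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def on_local_read(state: State, other_copies_exist: bool) -> State:
    """This processor reads the line."""
    if state is State.INVALID:
        # Miss: fill as SHARED if another cache holds the line, else EXCLUSIVE.
        return State.SHARED if other_copies_exist else State.EXCLUSIVE
    return state  # M, E and S can all satisfy a read locally


def on_local_write(state: State) -> Tuple[State, bool]:
    """This processor writes the line. Returns (new state, must invalidate others).

    From S or I, other copies must be invalidated first; E upgrades silently."""
    return State.MODIFIED, state in (State.SHARED, State.INVALID)


def on_remote_read(state: State) -> Tuple[State, bool]:
    """Another cache reads the line. Returns (new state, write-back needed)."""
    if state is State.MODIFIED:
        return State.SHARED, True  # dirty data is written back; keep a shared copy
    if state is State.EXCLUSIVE:
        return State.SHARED, False
    return state, False  # S stays S, I stays I


def on_remote_write(state: State) -> Tuple[State, bool]:
    """Another cache writes the line. Returns (new state, write-back needed)."""
    return State.INVALID, state is State.MODIFIED


print(on_local_read(State.INVALID, other_copies_exist=False).value)  # E
```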
Under the MESI model, each processor may maintain a list of cache information in a cache list, which may include the state and owner of the cache line. In maintaining this cache list, the coherency protocol may be utilized to control the flow of information within the system. For the list to be properly maintained, the directory is consulted on each communication or request related to the cache lines. This allows the directory to keep the caches supplied with the most recent and correct data.
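A directory that is consulted on every request might keep, for each line, a state, an owner, and the set of processors holding copies. The following sketch is a simplified, hypothetical model: `DirectoryEntry`, `Directory`, `read`, and `write` are illustrative names; it ignores evictions, message latency, and the data transfer itself; and it treats EXCLUSIVE and MODIFIED alike, assuming the directory cannot see a silent upgrade from one to the other.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirectoryEntry:
    state: str = "I"             # "M", "E", "S" or "I" as seen by the directory
    owner: Optional[int] = None  # processor holding the line in M or E
    sharers: Set[int] = field(default_factory=set)  # all processors with a copy


class Directory:
    """Hypothetical central directory consulted on every request for a line."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def read(self, line: int, requester: int) -> Set[int]:
        """Record a read; return processors whose copies must be downgraded."""
        entry = self.entries.setdefault(line, DirectoryEntry())
        if requester in entry.sharers:
            return set()  # the requester already holds a valid copy
        if entry.state == "I":
            entry.state, entry.owner = "E", requester  # sole copy in the system
            entry.sharers = {requester}
            return set()
        # An M or E owner must write back (if dirty) and drop to SHARED.
        downgrade = {entry.owner} if entry.state in ("M", "E") else set()
        entry.state, entry.owner = "S", None
        entry.sharers.add(requester)
        return downgrade

    def write(self, line: int, requester: int) -> Set[int]:
        """Record a write; return processors whose copies must be invalidated."""
        entry = self.entries.setdefault(line, DirectoryEntry())
        invalidate = entry.sharers - {requester}
        entry.state, entry.owner = "M", requester
        entry.sharers = {requester}
        return invalidate


directory = Directory()
directory.read(0x40, requester=0)                   # processor 0 gets an E copy
print(directory.read(0x40, requester=1))            # {0}: owner downgraded to S
print(sorted(directory.write(0x40, requester=2)))   # [0, 1]: both invalidated
```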
However, the problem with this design is that the size of the lower level caches may be limited by cost factors. If the lower level caches are limited in size, they may not be able to include the contents of the upper level caches in addition to other data. Accordingly, the cache inclusion property does not hold for these caches. If the cache inclusion property does not hold for lower level caches, any requests or probes may have to be forwarded to the upper level caches. In this situation, a processor associated with the cache may have to compete with probes or other requests for cache and bus time. Thus, with the processor having to compete for bus and cache access, the efficiency of the system may be reduced and coherency latency may increase.
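The cost of losing inclusion can be illustrated by counting how many probes reach the upper level caches. This sketch assumes a hypothetical model in which an inclusive lower level filters any probe for a line it does not hold, while a non-inclusive lower level must forward every probe to all of its upper caches; `upper_probe_count` is an illustrative name.

```python
from typing import Iterable, Set


def upper_probe_count(probes: Iterable[int], lower_level: Set[int],
                      num_upper_caches: int, inclusive: bool) -> int:
    """Count probe deliveries that reach the upper level caches."""
    forwarded = 0
    for address in probes:
        if inclusive and address not in lower_level:
            # Inclusion guarantees no upper cache holds this line: filter the probe.
            continue
        # Non-inclusive, or the line is present below: an upper cache may hold a
        # (possibly modified) copy, so this sketch conservatively forwards the probe.
        forwarded += num_upper_caches
    return forwarded


probes = [1, 3, 4, 5]
print(upper_probe_count(probes, {1, 2}, num_upper_caches=2, inclusive=True))   # 2
print(upper_probe_count(probes, {1, 2}, num_upper_caches=2, inclusive=False))  # 8
```

In this example the inclusive design forwards 2 probes and the non-inclusive design forwards 8, each of which may compete with the processor for cache and bus time.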