This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Since the introduction of the first personal computer (“PC”) over 20 years ago, technological advances to make PCs more useful have continued at an amazing rate. Microprocessors that control PCs have become faster and faster, with clock speeds eclipsing a gigahertz (one billion cycles per second) and continuing well beyond.
Productivity has also increased tremendously because of the explosion in development of software applications. In the early days of the PC, people who could write their own programs were practically the only ones who could make productive use of their computers. Today, there are thousands and thousands of software applications ranging from games to word processors and from voice recognition to web browsers.
One of the most important advances in recent years is the development of multiprocessor computer systems. These powerful computers may have two, four, eight or even more individual processors. The processors may be given individual tasks to perform or they may cooperate to perform a single, large job.
In a multiprocessor computer system, processors may control specific processes. One of the processors may be designated to boot the operating system before the other processors are initialized to do useful work. Typically, the processor designated to boot the operating system is referred to as the bootstrap processor or BSP. The other processors in the system are typically designated application processors or APs. The system memory in a multiprocessing computer system may be connected to one of the processors, which may be referred to as a home processor or home node. Other processors may direct requests for data stored in the memory to the home node, which may retrieve the requested information from the system memory.
Each processor in the computer system may include a cache memory system, which may be integrated into the processor or external to the processor to enhance performance. A cache memory may include the most recently accessed data or data organized in a particular manner, which may be stored in a location that allows fast and easy access to the data. By saving this data in the cache memory system, execution time may be reduced and bottlenecks prevented by having data quickly accessible during the operation of a program. For instance, software programs may run in a relatively small loop in consecutive memory locations. To reduce execution time, the recently accessed lines of memory may be stored in the cache memory system to eliminate the time associated with retrieving the program from memory. However, faster cache memory is generally more expensive, so as the speed of the system increases, the expense of the system may increase as well. Thus, in designing a cache memory system, speed and associated cost limitations may influence the configuration.
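The benefit of keeping recently accessed lines close at hand can be illustrated with a minimal sketch. It assumes a least-recently-used replacement policy, which the discussion above does not specify, and the names `RecentLineCache`, `read`, and `memory` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class RecentLineCache:
    """Holds the most recently accessed lines (assumed LRU replacement)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, address, memory):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        self.misses += 1
        data = memory[address]  # slow path: fetch from main memory
        self.lines[address] = data
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return data


# Hypothetical main memory and a small loop over four consecutive locations.
memory = {addr: addr * 10 for addr in range(16)}
cache = RecentLineCache(capacity=4)
for _ in range(3):
    for addr in range(4):
        cache.read(addr, memory)
```

In this sketch only the first pass through the loop reaches main memory; the later passes are served from the cache, which is the effect described above for programs that run in a small loop.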
In designing a cache memory configuration, levels may be utilized to further enhance the performance of the system. In the various configurations, the number of cache levels may be adjusted or the interaction between levels may be modified to organize the cache to specific requirements. For instance, in a two level cache system, a first level cache maintains a certain amount of data and a second level cache may include the data within the first level cache along with other additional data. By structuring the cache into levels, data may be accessed efficiently through a hierarchical configuration. In the system, the higher level caches may remain smaller with limited amounts of data, while lower level caches may include larger amounts of data. For instance, if the first level cache is unable to supply the data (i.e., the cache misses), then the second level cache may be able to supply the data to the requestor. With the second level cache supplying the data, the system does not have to access the slower main memory for the requested data. One of the objects of the cache levels may be to provide the data from the caches, which is faster than accessing the memory.
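The lookup order described above may be sketched as follows. This is an illustrative model only: the caches are plain dictionaries with no capacity limit or eviction, and the function name `lookup` and the sample addresses are assumptions made for the example.

```python
def lookup(address, l1, l2, memory):
    """Search the first level, then the second level, then main memory.

    Returns (data, source), where source names the level that supplied the data.
    """
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        l1[address] = l2[address]  # copy the line into the first level
        return l2[address], "L2"
    data = memory[address]  # slowest path: main memory
    l2[address] = data      # fill both levels so L2 still contains L1's data
    l1[address] = data
    return data, "memory"


# Hypothetical contents: one line already in the second level cache.
memory = {0x10: "a", 0x20: "b"}
l1 = {}
l2 = {0x20: "b"}

first = lookup(0x20, l1, l2, memory)   # first level misses, second level supplies
second = lookup(0x20, l1, l2, memory)  # now served by the first level
third = lookup(0x10, l1, l2, memory)   # both levels miss, memory supplies
```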
To contain cost and provide efficiency, a cache memory system may include a large amount of dynamic random access memory (“DRAM”) along with static random access memory (“SRAM”). As SRAM is capable of providing faster access, it may be utilized as a memory cache to store frequently accessed information and reduce access time for the computer system. In selecting the appropriate combination of SRAM and DRAM, the cost and speed of the different memories may be weighed to design the appropriate cache. SRAM may be more expensive, but may enable faster access to the data. DRAM may be less expensive, but may provide slower access to the data. As the access speed and cost factors may influence the design of the system, DRAM may be utilized at lower cache levels, while SRAM is utilized at higher cache levels. This allows the cache memory system to be efficient and cost effective.
In addition, the design of the cache memory system may be influenced by the information provided within the system. The cache memory system may have the cache divided into individual lines of data. The individual cache lines may include information that is unique to that cache line, such as cache line data and associated cache tag information. Cache line data may include information, instructions, or address information for the particular line of cache. Similarly, the cache tag may include information about the status of the cache line and other information. Based on the information provided in each of the lines, the cache memory system may enhance the performance of the memory system.
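One way to picture a cache line and its tag information is the sketch below. The 64-byte line size, the 256 sets, the field names, and the `split_address` helper are all assumptions made for illustration; the discussion above does not fix any particular geometry or layout.

```python
from dataclasses import dataclass

LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_SETS = 256   # number of sets in the cache (assumed for illustration)


@dataclass
class CacheLine:
    tag: int     # upper address bits identifying which block of memory is held
    state: str   # status information for the line, e.g. "valid" or "invalid"
    data: bytes  # the cached bytes themselves


def split_address(address):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address % LINE_SIZE
    index = (address // LINE_SIZE) % NUM_SETS
    tag = address // (LINE_SIZE * NUM_SETS)
    return tag, index, offset


tag, index, offset = split_address(0x12345)
line = CacheLine(tag=tag, state="valid", data=bytes(LINE_SIZE))
```

With these assumed sizes, address 0x12345 splits into tag 4, set index 141, and offset 5.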
As another design factor, the structure and size of the caches may influence operation. Under the inclusion principle, a lower cache level may include the information held in any upper cache levels that are connected to it, in addition to other information. This allows the lower cache level to provide additional functionality to the system, may enable the system to operate more efficiently, and may assist in maintaining cache coherency. If the lower cache levels are the same size as or smaller than the upper cache levels, then the lower level caches may not be able to hold all of the information within the upper level caches and so may not satisfy the inclusion principle. In that case, problems or complications may arise with the cache coherency protocol, because lower level caches that do not include the upper level information may be unable to respond to requests or probes. Thus, for the second level cache to provide this enhanced functionality, the second level cache may be larger than the first level cache because it includes more data than the first level cache. Accordingly, as the number of cache levels or the networking between levels increases, the amount of SRAM implemented in the cache levels may increase dramatically.
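The inclusion principle can be stated as a simple check, sketched below. Each cache is modeled as a set of line addresses, and the function name `satisfies_inclusion` and the sample contents are hypothetical.

```python
def satisfies_inclusion(upper_levels, lower_level):
    """Return True if every line held by any upper cache is also in the lower cache."""
    return all(set(upper) <= set(lower_level) for upper in upper_levels)


# Two first level caches sharing one second level cache (hypothetical contents).
l1_a = {0x100, 0x140}
l1_b = {0x200}

l2_inclusive = {0x100, 0x140, 0x200, 0x300}  # holds all upper-level lines and more
l2_too_small = {0x100, 0x200}                # missing line 0x140

inclusive_ok = satisfies_inclusion([l1_a, l1_b], l2_inclusive)
too_small_ok = satisfies_inclusion([l1_a, l1_b], l2_too_small)
```

In the second case the lower level cache could not answer a probe for line 0x140 on behalf of the upper level cache that holds it, which is the kind of complication noted above.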
To operate the cache structure, a cache memory system may include a cache controller to track the information within the cache memory. In operation, the cache controller may respond to requests from processors, thus reducing the wait time in the system. The cache controller may be utilized to control the flow of data or information within a cache memory system. For instance, a request for data may be received by the cache controller, which may review the request to determine the appropriate action. If the cache controller determines that the information is within the memory cache, it may respond to the requestor without incurring the wait time of a memory access. However, if the cache controller does not have the information, then the information may be accessed from other memory, which will likely increase the wait time. Accordingly, the cache controller may be able to manage the information within the memory to improve performance.
To operate properly with a cache controller, the cache memory subsystem should maintain the latest updated information to ensure that the cache includes the most recent data and is consistent between the multiple caches and microprocessors. The maintenance of the data within the cache may be referred to as cache consistency or coherency. Data integrity may be compromised if the copy of the line in cache no longer matches the data stored in memory. Various techniques may be used to identify and control the individual lines of the cache. In a multiprocessor computer system, several cache systems may exist, which further complicates the task of maintaining the various caches.
With complex multiprocessor systems, a directory may be utilized to control the flow of information and ensure that the consistency of the cache is maintained. The directory may act as a central controller that tracks and maintains the various lines of cache within a system. With a directory, various systems communicate with the directory to request data. For the directory to function in the system, a cache consistency model may be used to handle the complexity of a multiprocessing environment and may enable the directory to manage the caches.
To handle the complexity of a multiprocessing environment, a status model, such as the MESI cache consistency model, may provide a method for tracking the states of the information in each cache line. Under the MESI cache consistency model, four states may exist for a cache line: modified, exclusive, shared, and invalid. The modified state may indicate that the cache line has been updated and may alert systems to write the modified line to memory. The exclusive state may indicate that the cache line is not present in other caches. The shared state may indicate that copies of the cache line are located in other caches, while the invalid state may indicate that the cache line is not present, uncached, or contains invalid data. These states may be used in handling the requests for cache lines.
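The four states may be sketched as a small state machine. The event names and the transition rules below follow the commonly described textbook form of MESI and are assumptions for illustration; actual implementations may differ in detail, and `next_state` and `needs_writeback` are hypothetical helper names.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return the next state of a line in one cache for a given event.

    Events (assumed names): "local_read" and "local_write" come from this
    cache's own processor; "remote_read" and "remote_write" are observed
    requests from other caches.
    """
    if event == "local_read":
        if state is State.INVALID:
            # A fill is shared if another cache holds the line, else exclusive.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state
    if event == "local_write":
        return State.MODIFIED  # other copies, if any, must be invalidated
    if event == "remote_read":
        return State.INVALID if state is State.INVALID else State.SHARED
    if event == "remote_write":
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


def needs_writeback(state, event):
    """A modified line must be written back before another cache may use it."""
    return state is State.MODIFIED and event in ("remote_read", "remote_write")


# A line is read by one processor, written, then read by another processor.
s = State.INVALID
s = next_state(s, "local_read")    # no other copies
after_read = s
s = next_state(s, "local_write")
after_write = s
writeback = needs_writeback(s, "remote_read")
s = next_state(s, "remote_read")
after_remote_read = s
```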
Under the directory based cache coherency system, each processor may maintain a list of cache information, which may be managed by a directory that may include the state and owner of each cache line. In maintaining this directory, a coherency protocol may be utilized to control the flow of information within the system. For the list to be properly maintained, the directory is consulted with each communication or request related to data lines in the memory. This allows the directory to maintain the caches with the most recent and correct data. However, a problem with this design is that the cache list or directory may become the central point of contention and frequently is a bottleneck, which results in increased effective memory latency. In addition, if a cache line is in the exclusive state, then the request path for that cache line may be substantially lengthened because the request for the cache line will flow to the directory, then to the owner of the cache line, back to the directory, and back to the requester. Thus, the resulting transaction path through the directory may increase the response time.
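The lengthened request path can be illustrated with a minimal sketch that lists the stops a request makes. The directory layout, the node names, and the assumption that the reply returns through the directory before reaching the requester are illustrative choices and not a description of any particular system.

```python
def request_path(line, requester, directory):
    """Return the sequence of stops for a request, starting and ending at the requester.

    `directory` maps a line address to a record holding the line's state and
    owner (a hypothetical layout chosen for this example).
    """
    entry = directory.get(line, {"state": "invalid", "owner": None})
    if entry["state"] == "exclusive" and entry["owner"] != requester:
        # The directory must forward the request to the owning processor.
        return [requester, "directory", entry["owner"], "directory", requester]
    # Otherwise the directory (home node) can answer the request itself.
    return [requester, "directory", requester]


directory = {
    0x100: {"state": "exclusive", "owner": "cpu1"},
    0x200: {"state": "shared", "owner": None},
}

owned_path = request_path(0x100, "cpu0", directory)
shared_path = request_path(0x200, "cpu0", directory)
owned_hops = len(owned_path) - 1
shared_hops = len(shared_path) - 1
```

Under these assumptions, the request for the exclusively owned line takes four hops rather than two, and every request, whatever its state, passes through the directory, which is why the directory tends to become the point of contention.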