1. Field of the Invention
This invention relates to a data processor having a hierarchical memory arrangement that includes a cache memory to speed data retrieval. More specifically, the present invention relates to an improved system and method for cache control and management that processes out-of-order returns of data and instructions and/or multiple fetch requests. According to another aspect of the present invention, sets of cache memory can be checked for multiple requests. Multiple instructions and/or data can then be returned to fulfill more than one request at a time.
2. Related Art
Increases in processor speed have led to the development of memory devices having very fast access times. The cost of such memory devices, however, is often proportional to the speed at which the devices can be accessed. Thus, storing all of a processor's data and program instructions in very fast memory can lead to very high memory costs.
To minimize the cost associated with high-speed memory while still reaping the benefits of fast access times, system designers have implemented cache memories. In a cache memory system, the majority of instructions and program data are stored in standard memory such as a disc, hard drive, or low-speed random access memory (RAM). A relatively small amount of high-speed memory, called a cache, is provided to store a subset of the program data and/or instructions. In this patent document, the term data, when used in reference to storage in the cache, refers generally to either instruction execution data or program instructions.
Typically, those data that are most frequently accessed by the processor are stored in the cache. As a result, these data can be accessed by the processor at a much faster rate. Additionally, some systems implement an instruction prefetch, wherein instructions are fetched from low-speed storage in advance and stored in the cache. As a result, the instructions are already in cache when needed by the processor and can be accessed quickly.
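As a rough illustration of why prefetching helps, the following Python sketch counts cache misses for a stream of instruction addresses with and without a simple sequential prefetcher. The function name `count_misses`, the unbounded cache, and the policy of prefetching the next `prefetch_depth` addresses after every access are assumptions chosen for brevity. Prefetches are treated as completing instantly, whereas in hardware their benefit comes from overlapping the slow fetch with other work.

```python
def count_misses(addresses, prefetch_depth):
    """Count misses for an instruction-address stream (hypothetical model).

    The cache is modeled as an unbounded set of resident addresses. After each
    access, a sequential prefetcher loads the next `prefetch_depth` addresses.
    """
    cache = set()
    misses = 0
    for addr in addresses:
        if addr not in cache:
            misses += 1          # not yet in the cache: fetch from slow storage
            cache.add(addr)
        # Prefetch the instructions most likely to be needed next.
        for offset in range(1, prefetch_depth + 1):
            cache.add(addr + offset)
    return misses


straight_line = list(range(8))          # eight sequential instructions
print(count_misses(straight_line, 0))   # 8: every instruction misses
print(count_misses(straight_line, 3))   # 1: only the first instruction misses
```

A taken branch defeats purely sequential prefetching: for the stream `[0, 1, 2, 10, 11, 12]` with a depth of 2, both address 0 and the branch target 10 miss, and every other access hits.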
System designers frequently implement cache storage to speed access times for data and instructions. In such systems, a cache control unit is often implemented to manage the storage of data in the cache and to provide the data to the processor. The instruction fetch unit and the instruction execution unit send their requests for data to the cache control unit. Upon receipt of a request, the cache control unit first searches the cache for the requested data. If the requested data exist in the cache, they are provided to the requesting unit from the cache; this condition is known as a cache hit. If the data are not present in the cache, a condition known as a cache miss, the cache control unit retrieves the data from storage and stores them in a known location in the cache.
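The hit/miss flow described above can be sketched in Python for a direct-mapped cache, in which the "known location" for an address is simply the address modulo the number of cache lines. The class `CacheControlUnit`, the direct-mapped placement, and the dictionary standing in for slow storage are illustrative assumptions, not necessarily the arrangement used by the invention.

```python
class CacheControlUnit:
    """Hypothetical cache control unit managing a direct-mapped cache."""

    def __init__(self, num_lines, backing_store):
        self.num_lines = num_lines
        self.backing_store = backing_store  # slow storage: address -> data
        self.tags = [None] * num_lines      # address currently held by each line
        self.lines = [None] * num_lines     # data currently held by each line
        self.hits = 0
        self.misses = 0

    def request(self, address):
        """Serve a request from the fetch unit or the execution unit."""
        index = address % self.num_lines    # the one line this address may occupy
        if self.tags[index] == address:
            self.hits += 1                  # cache hit: serve directly from the cache
            return self.lines[index]
        self.misses += 1                    # cache miss: go to slow storage
        data = self.backing_store[address]
        self.tags[index] = address          # store in the known location, evicting
        self.lines[index] = data            # whatever occupied that line before
        return data


storage = {addr: f"word{addr}" for addr in range(16)}
ccu = CacheControlUnit(num_lines=4, backing_store=storage)
for addr in [0, 1, 0, 4, 0]:
    ccu.request(addr)
print(ccu.hits, ccu.misses)  # 1 4
```

Addresses 0 and 4 map to the same line in this four-line cache, so alternating between them causes repeated misses. Such conflicts are a known weakness of direct mapping that set-associative organizations reduce.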
Instruction requests are often handled in sequence according to their order in a particular application program. Bottlenecks and delays occur as outstanding requests, which might otherwise be executed more quickly, wait for preceding slower instructions to be processed.
For example, in multiprocessor systems sharing two or more buses, a central processing unit (CPU) stall condition arises when one of the buses is occupied in servicing a particular processor. When data requests or instructions are executed in sequence, the other processors that depend on the occupied bus wait for that bus to become available before proceeding to process other data requests and instructions, regardless of the availability of other buses.
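The stall described above can be illustrated with a toy timing model in Python. Each request, listed in program order, names the bus it needs and how long it occupies that bus. Under strict in-order issue, a request may not start before its predecessor, so requests for an idle bus wait behind requests for a busy one; without that constraint, each request starts as soon as its own bus is free. The function `finish_times` and the `(bus, duration)` request format are hypothetical.

```python
def finish_times(requests, num_buses, in_order):
    """Return the finish time of each (bus, duration) request, in program order."""
    bus_free = [0] * num_buses  # time at which each bus next becomes idle
    prev_start = 0              # start time of the previous request in program order
    finishes = []
    for bus, duration in requests:
        start = bus_free[bus]   # earliest time this request's own bus is idle
        if in_order:
            # Strict program order: no request may start before its predecessor,
            # even when its own bus is already idle.
            start = max(start, prev_start)
        prev_start = start
        bus_free[bus] = start + duration
        finishes.append(start + duration)
    return finishes


# Two long transfers on bus 0 followed by two short transfers on bus 1.
requests = [(0, 5), (0, 5), (1, 1), (1, 1)]
print(finish_times(requests, 2, in_order=True))   # [5, 10, 6, 7]
print(finish_times(requests, 2, in_order=False))  # [5, 10, 1, 2]
```

In the in-order case, the two bus-1 requests sit idle until time 5 even though bus 1 is free from time 0.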
Delays can also arise when sequential data requests and instructions are made to wait for data returned from storage devices that have different rates of data return. For instance, data returned from a faster lower-level storage device, such as dynamic random access memory (DRAM), must still wait until preceding data requests made to a slower lower-level storage device, such as an input/output (I/O) device, have been serviced.
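The effect of mixing slow and fast devices can be sketched in Python by comparing in-order and out-of-order delivery of returned data. All requests are assumed to be issued at the same time and serviced concurrently, and the latency constants are made-up values chosen only to show the ordering. Tagging each return with its request id is one common way to let the requester match out-of-order data to its request, not necessarily the mechanism of the invention.

```python
import heapq

IO_LATENCY = 100   # hypothetical return latency of a slow I/O device, in cycles
DRAM_LATENCY = 10  # hypothetical return latency of DRAM, in cycles


def deliveries(latencies, in_order):
    """Return (request_id, delivery_time) pairs in the order data reaches the requester.

    All requests are issued at time 0; latencies[i] is the return latency of the
    device servicing request i.
    """
    if in_order:
        result, last = [], 0
        for req_id, latency in enumerate(latencies):
            # Data cannot be delivered before the data of every earlier request.
            last = max(last, latency)
            result.append((req_id, last))
        return result
    # Out-of-order: deliver each return as soon as it arrives. The request id
    # travels with the data so the requester can match it to its request.
    arrivals = [(latency, req_id) for req_id, latency in enumerate(latencies)]
    heapq.heapify(arrivals)
    result = []
    while arrivals:
        arrival_time, req_id = heapq.heappop(arrivals)
        result.append((req_id, arrival_time))
    return result


latencies = [IO_LATENCY, DRAM_LATENCY, DRAM_LATENCY]  # an I/O read, then two DRAM reads
print(deliveries(latencies, in_order=True))   # [(0, 100), (1, 100), (2, 100)]
print(deliveries(latencies, in_order=False))  # [(1, 10), (2, 10), (0, 100)]
```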
To improve efficiency and speed in processing program instructions, many contemporary processors are capable of dynamic scheduling of instructions. Dynamic scheduling means that instructions can be retrieved from memory, scheduled, and executed, all in an order that differs from the program order. At a general level, dynamic scheduling allows a pipelined processor to maximize the utilization of its resources by prioritizing the use of those resources among the multiple processors. Such dynamic scheduling, however, does not consider the potential disparity between the storage devices, or between the resources themselves, in executing instructions or data requests. Nor does it overcome the CPU stalls that occur when a bus is busy servicing a particular processor.
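A minimal sketch of dynamic scheduling, in Python, contrasts strict in-order issue with issuing the oldest instruction whose operands are ready. It assumes one issue per cycle, unlimited functional units, and unique destination registers (as if registers had already been renamed), so only true data dependences are checked. The function `schedule`, the instruction tuples, and the latencies are hypothetical.

```python
def schedule(program, in_order):
    """Return (cycle, name) issue events for a list of instructions.

    Each instruction is (name, dest_register, source_registers, latency).
    """
    ready_at = {}          # register -> cycle at which its value becomes available
    pending = list(program)
    issued = []
    cycle = 0
    while pending:
        # In-order issue may only look at the oldest instruction.
        window = 1 if in_order else len(pending)
        for idx in range(window):
            name, dest, srcs, latency = pending[idx]
            # Registers that an older, not-yet-issued instruction will still write.
            older_dests = {d for _, d, _, _ in pending[:idx]}
            if all(r not in older_dests and ready_at.get(r, 0) <= cycle for r in srcs):
                issued.append((cycle, name))
                ready_at[dest] = cycle + latency
                pending.pop(idx)
                break      # at most one instruction issues per cycle
        cycle += 1
    return issued


program = [
    ("load", "r1", [], 3),        # long-latency load
    ("add", "r2", ["r1"], 1),     # depends on the load
    ("mul", "r3", ["r4"], 1),     # independent of the load
    ("sub", "r5", ["r4"], 1),     # independent of the load
]
print(schedule(program, in_order=True))   # [(0, 'load'), (3, 'add'), (4, 'mul'), (5, 'sub')]
print(schedule(program, in_order=False))  # [(0, 'load'), (1, 'mul'), (2, 'sub'), (3, 'add')]
```

The scheduler reasons only about operand readiness; like the schemes discussed above, it has no notion of which storage device or bus will service the load.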
The inventors have discovered that there is a need for optimizing instruction execution and data requests after a cache miss. In particular, there is a need for accommodating out-of-order data returns and servicing multiple requests to increase the speed and efficiency of data retrieval. In multiprocessor systems sharing multiple buses, it is especially desirable to avoid CPU stalls. Further, it is desirable to avoid bottlenecks and to accommodate quickly and efficiently data requests made to a variety of lower-level storage devices with different response times.