1. Field of the Invention
The present invention generally relates to computer systems that include components subject to cycles in which data is read, modified and written back by a central processing unit (CPU) or other system device. More particularly, the present invention relates to a computer system implementation in which non-cacheable data or data in block-oriented devices can be selectively read in relatively large blocks and temporarily stored in a stream read buffer.
2. Description of the Relevant Art
For most computer systems, the number of clock cycles required for a data access to a memory device depends upon the component accessing the memory and the speed of the memory unit. Most of the memory devices in a computer system are slow compared to the clock speed of the central processing unit (CPU). As a result, the CPU is forced to enter wait states when seeking data from the slower memory devices. Because of the relative slowness of most memory devices, the efficiency of the CPU can be severely compromised. As the operating speed of processors increases and as new generations of processors evolve, it is advantageous to minimize wait states in memory transactions to fully exploit the capabilities of these new processors.
In an effort to reduce wait states, it has become commonplace to include one or more cache memory devices in a computer system. A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system, generally between a slower system memory (and/or external memory) and a processor, to improve effective memory transfer rates and accordingly improve system performance. The cache memory unit is essentially hidden and appears transparent to the user, who is aware only of a larger system memory. The cache memory usually is implemented by semiconductor memory devices having access times that are comparable to the clock period of the processor, while the system and other external memories are implemented using less costly, lower-speed technology.
The cache concept is based on the locality principle, which anticipates that the microprocessor will tend to repeatedly access the same group of memory locations. To minimize access times, this frequently used data is stored in the cache memory, which has much faster access times than system memory. Accordingly, the cache memory may contain, at any point in time, copies of information from both external and system memories. If the data is stored in cache memory, the microprocessor will access the data from the cache memory and not the system or external memory. Because of the cache memory's superior speed relative to external or system memory, overall computer performance may be significantly enhanced through the use of a cache memory.
A cache memory typically includes a plurality of memory sections, wherein each memory section stores a block, or "line," of two or more words of data. A line may consist, for example, of four "doublewords" (wherein each doubleword comprises four 8-bit bytes). Each cache line has associated with it an address tag that uniquely associates the cache line with a line of system memory.
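As an illustration of how an address tag can tie a cache line to a line of system memory, the following minimal sketch splits a byte address into a tag, a line index, and a byte offset. The 16-byte line follows the four-doubleword example above; the direct-mapped organization, the 256-line cache size, and the name `split_address` are assumptions made only for this illustration and are not taken from the text.

```python
LINE_BYTES = 16  # four doublewords x four bytes each, per the example above


def split_address(addr, num_lines=256):
    """Split a byte address into (tag, index, offset).

    Assumes a hypothetical direct-mapped cache with `num_lines` lines.
    """
    offset = addr % LINE_BYTES          # byte position within the line
    line_number = addr // LINE_BYTES    # which line of system memory
    index = line_number % num_lines     # which cache line slot it maps to
    tag = line_number // num_lines      # identifies the memory line held in that slot
    return tag, index, offset


tag, index, offset = split_address(0x12345)
print(hex(tag), hex(index), hex(offset))
```

Under these assumptions, two addresses that share a tag and an index fall within the same 16-byte line and differ only in their offset.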
According to normal convention, when the processor initiates a read cycle to obtain data or instructions from the system or external memory, an address tag comparison first is performed to determine whether a copy of the requested information resides in the cache memory. If a copy is present, the data is used directly from the cache. This event is referred to as a cache read "hit." If a copy is not present, the line in memory containing the requested word is retrieved from system memory and stored in the cache memory, and the requested word is simultaneously supplied to the processor. This event is referred to as a cache read "miss."
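The read behavior just described can be sketched as a small software model. This is an illustrative sketch only: the class name `ReadCache`, the use of word addresses, the four-word line, and the unbounded, fully associative store (a dictionary keyed by tag) are all assumptions chosen for brevity, not details from the text.

```python
LINE_WORDS = 4  # words per line (assumed, matching the four-doubleword example)


class ReadCache:
    """Toy model of a cache read hit and a cache read miss."""

    def __init__(self, memory):
        self.memory = memory  # list of words standing in for system memory
        self.lines = {}       # tag -> list of words (hypothetical unbounded cache)
        self.hits = 0
        self.misses = 0

    def read(self, word_addr):
        tag, offset = divmod(word_addr, LINE_WORDS)
        if tag in self.lines:
            # Read hit: the tag comparison succeeds and the cached copy is used.
            self.hits += 1
        else:
            # Read miss: fetch the whole line containing the word into the cache.
            self.misses += 1
            base = tag * LINE_WORDS
            self.lines[tag] = self.memory[base:base + LINE_WORDS]
        return self.lines[tag][offset]


memory = list(range(100, 116))
cache = ReadCache(memory)
cache.read(5)   # miss: the line holding words 4..7 is loaded
cache.read(6)   # hit: same line, served from the cache
print(cache.hits, cache.misses)
```

The second read lands in the line fetched by the first, which is the locality effect the cache relies on.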
In addition to using a cache memory during data retrieval, the processor may also write data directly to the cache memory instead of to the system or external memory. When the processor desires to write data to memory, an address tag comparison is made to determine whether the line into which data is to be written resides in the cache memory. If the line is present in the cache memory, the data is written directly into the line in cache. This event is referred to as a cache write "hit." A "dirty bit" for the line is then set in an associated status bit (or bits). The dirty status bit indicates that data stored within the line is dirty (i.e., modified), and thus, before the line is deleted from the cache memory or overwritten, the modified data must be written into system or external memory. This procedure for cache memory operation is commonly referred to as "copy back" or "write back" operation. During a write transaction, if the line into which data is to be written does not exist in the cache memory, the data typically is written directly into the system memory. This event is referred to as a cache write "miss."
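The write-back behavior described above can be modeled in a few lines. The sketch below is a simplified illustration under stated assumptions: the names `WriteBackCache`, `load_line`, and `evict` are hypothetical, lines hold four words, and the cache is modeled as a dictionary keyed by tag rather than as real hardware.

```python
LINE_WORDS = 4  # words per line (assumed)


class WriteBackCache:
    """Toy model of write hits, dirty bits, write-back, and write misses."""

    def __init__(self, memory):
        self.memory = memory  # list of words standing in for system memory
        self.lines = {}       # tag -> {"data": [...], "dirty": bool}

    def load_line(self, tag):
        # Bring a line into the cache, e.g. as the result of an earlier read miss.
        base = tag * LINE_WORDS
        self.lines[tag] = {"data": self.memory[base:base + LINE_WORDS], "dirty": False}

    def write(self, word_addr, value):
        tag, offset = divmod(word_addr, LINE_WORDS)
        line = self.lines.get(tag)
        if line is not None:
            # Write hit: update only the cached line and mark it dirty.
            line["data"][offset] = value
            line["dirty"] = True
            return "hit"
        # Write miss: the data goes directly to system memory.
        self.memory[word_addr] = value
        return "miss"

    def evict(self, tag):
        # A dirty line must be copied back before it leaves the cache.
        line = self.lines.pop(tag)
        if line["dirty"]:
            base = tag * LINE_WORDS
            self.memory[base:base + LINE_WORDS] = line["data"]


memory = [0] * 8
cache = WriteBackCache(memory)
cache.load_line(0)
print(cache.write(1, 42), memory[1])  # hit; system memory is still stale
cache.evict(0)
print(memory[1])                      # the modified data has been written back
```

Note that after a write hit the system memory copy is out of date until the line is evicted, which is exactly the condition the dirty bit records.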
While cache memory devices have proven effective in reducing latency times in processors, there are certain memory devices which contain data that cannot be cached in a cache memory. Video and graphics cards are examples of devices that contain data that typically is not cacheable. CPU accesses to memory devices which contain non-cacheable data thus tend to be inefficient because the data cannot be stored in cache memory, but instead must be directly accessed from the slower memory device. Thus, despite the fact that cache memories do improve system efficiency and reduce CPU latency, there are a number of components in computer systems which are being accessed in an inefficient manner because the data stored in these devices is non-cacheable.
In addition to the problem with non-cacheable data, there are a number of devices which fall into the category of being "block-accessed" devices. A block-accessed device may be defined as a device which operates more efficiently when data is accessed in large blocks as compared to situations in which data is read as individual bytes or words. Some devices have a block-accessed nature inherently, such as DRAM memory devices. It is preferable to read data from DRAM in large blocks because the first data access incurs a high latency. Subsequent accesses to DRAM, by contrast, proceed at a very high bandwidth. Consequently, if a large block of data in DRAM is accessed, a very high average bandwidth is obtained. Conversely, if the access is to a single byte in DRAM, a low average bandwidth results. While DRAM is given as an example of a block-accessed device, one skilled in the art will understand that many other structures have similar operation, and would similarly benefit from a system which is capable of accessing large blocks of data.
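The effect of block size on average bandwidth can be illustrated with a simple timing model. The sketch below assumes that a transfer costs a fixed first-access latency plus a per-byte time at the burst rate; the function name `average_bandwidth` and the numeric values (100 ns latency, 1 byte per ns burst rate) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def average_bandwidth(block_bytes, first_access_ns, burst_bytes_per_ns):
    """Average bytes per nanosecond for one block transfer.

    Assumed model: total time = first-access latency + block size / burst rate.
    """
    transfer_ns = first_access_ns + block_bytes / burst_bytes_per_ns
    return block_bytes / transfer_ns


# Hypothetical device: 100 ns first-access latency, 1 byte/ns once streaming.
single_byte = average_bandwidth(1, 100, 1.0)      # 1 / 101
large_block = average_bandwidth(1000, 100, 1.0)   # 1000 / 1100
print(single_byte, large_block)
```

Under this model the single-byte access achieves about 1% of the burst rate, while the 1000-byte block achieves about 91%, because the fixed latency is amortized over many more bytes.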
Thus, there exist various block-accessed devices in computer systems which ideally should be accessed in large blocks whenever possible to maximize system efficiency. Despite the readily apparent advantages of accessing large blocks of data in block-accessed devices, there currently is no support structure in existing PC systems to efficiently handle data transfers involving block-accessed devices.