1. Field of the Invention
This invention relates to accesses to memory by input/output devices and more particularly to providing more efficient accesses to system memory.
2. Description of the Related Art
Computer systems rely on complex logic to communicate data within and between their various subsystems. Referring to FIG. 1, which illustrates relevant aspects of an exemplary prior art computer system, memory controller 10 resides on an integrated circuit 12 coupled between main memory 30 and a central processing unit (CPU) (not shown). Integrated circuit 12 typically includes an input/output (I/O) controller circuit 20 that interfaces to an I/O channel such as the Peripheral Component Interconnect (PCI) bus. The memory controller controls access to the main memory by various components of the computer system. For example, read requests from the I/O channel are provided to the memory controller circuit 10 which then accesses memory 30 to retrieve the requested data.
Note that, in addition to main memory, "system memory" in computer systems may also include cache memory. It is fairly typical for the CPU to have access to one or two levels of cache memory which provide the CPU local copies of data stored in the main memory. The availability of the local copies of data (and instructions) speeds up memory access by the CPU since the CPU only has to access the local copy rather than main memory. A variety of techniques known in the art maintain coherency between the cache memories and the main memory. When a read request is received over the I/O channel for data in main memory, the cache memories are "snooped" to determine if an updated version of the data is available. The most up-to-date data is then provided in response to the read request.
Access to main memory is an important aspect of the computer system that requires a high level of efficiency to ensure good system performance. One problem with main memory is that it is almost always dynamic random access memory (DRAM). Read accesses to this type of memory require a primary slow read access cycle, in which the entire memory 'page' is accessed; after that cycle (but before any other page is accessed), additional read accesses can occur very quickly as long as they all address that open 'page' (where page size depends on DRAM type and size). That fact encourages designs that read as much data from a DRAM as possible at one time, in a contiguous burst, to reduce the average word access time.
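The effect of burst reads on average word access time can be illustrated with a short calculation. The following is a minimal sketch only; the function name and the cycle counts (five cycles for the primary slow access, one cycle for each further access to the open page) are hypothetical values chosen for illustration and are not taken from any particular DRAM.

```python
def average_cycles_per_word(words, first_access_cycles, page_hit_cycles):
    """Average cycles per word when `words` are read in one burst from a single open page.

    The first word pays the slow page access; every further word pays only
    the fast in-page access time.
    """
    if words < 1:
        raise ValueError("words must be at least 1")
    total_cycles = first_access_cycles + (words - 1) * page_hit_cycles
    return total_cycles / words


# Hypothetical timings: 5 cycles for the primary slow access, 1 cycle per in-page access.
single_word = average_cycles_per_word(1, 5, 1)   # one word read on its own
four_word_burst = average_cycles_per_word(4, 5, 1)  # four words read as a burst
print(single_word, four_word_burst)
```

Under these assumed timings, a four-word burst costs eight cycles in total, so the average cost per word is well below that of four separate single-word accesses, each of which would pay the slow access again.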
Many CPU and CPU interface chips utilize this block read process to retrieve instructions for processors to execute, because most modern processors retrieve instructions in blocks as described above. Such accesses for blocks of instructions take a longer time (page access time) to access the first instruction in the block, e.g., several bus clock cycles, but then the remaining accesses are each very quick, taking only, e.g., one additional clock per access. That feature is further used by current cache controller designs that access memory only in cache 'line' sizes, which are groups of data sized to the operation of the internal processor cache.
Because accesses to main memory can be of different types and sizes, some individual accesses and some block accesses as described above, the memory controller tailors its access to the type of request. It would be inefficient to read an entire cache line worth of memory (or even more) if only the first word was needed. Such an extraneous access, even if seemingly an efficient way to retrieve data, actually wastes memory and bus bandwidth. Also, non-processor accesses from mastering devices on external busses, such as the PCI bus, Accelerated Graphics Port (AGP) and Industry Standard Architecture (ISA), as well as from the DMA (direct memory access) controller in personal computer (PC) systems, access memory in different ways. Accordingly, memory controllers have typically accepted non-CPU accesses to memory as either all individual accesses or as all block accesses. If the memory controller treats non-CPU accesses as all individual accesses, each access requires a re-issuance of each individual memory request. If the memory controller treats all non-CPU accesses as block accesses, then the memory controller may end up reading more data than is needed, thus wasting bandwidth and memory resources.
When certain I/O devices require access to main memory, especially when reading, their accesses are mostly to sequential words in memory. Slower I/O devices may frequently read sequential words of data but read them one at a time. Such slower I/O devices issue a new read request for each sequential word of read data requested by the I/O device. Such slower I/O based accesses therefore always require a 'new' access by the memory controller to service the I/O controller request (irrespective of the memory-access policy of the memory controller). To service such memory requests, the memory controller accesses the memory using an initial access type (primary slow read access) and then continues to use the slow initial access type for each subsequent read request even though each read cycle may access the same DRAM page. The re-requesting of the same memory page is costly and robs bandwidth from other peripherals and the CPU.
Additionally, since most memory controllers are already optimized to read a full CPU cache line of data (typically four 32-bit words) at once, the additional time used to access a full CPU cache line is further wasted for each single word read request, as the I/O device uses only the single word of data, and a whole cache line (maybe the same cache line) is re-requested for each of the contiguous words of data requested by the I/O device. Thus, reading eight sequential words may require reading the same cache line eight times.
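The cost of servicing sequential single-word requests with full cache line reads can be counted directly. The sketch below is illustrative only: it assumes the four-word cache line mentioned above, uses word addresses rather than byte addresses, and the function name is hypothetical.

```python
WORDS_PER_LINE = 4  # assumption: a cache line of four 32-bit words, as in the text


def line_fetches_without_buffer(word_addresses):
    """Return the cache line index fetched for each single-word read request.

    With no buffering of previously read data, every request triggers a
    fresh full-line fetch, even when it targets a line that was just read.
    """
    return [addr // WORDS_PER_LINE for addr in word_addresses]


fetches = line_fetches_without_buffer(range(8))  # eight sequential word reads
total_line_fetches = len(fetches)
distinct_lines = len(set(fetches))
words_transferred = total_line_fetches * WORDS_PER_LINE
print(total_line_fetches, distinct_lines, words_transferred)
```

Under these assumptions, eight sequential single-word reads cause eight full-line fetches and transfer 32 words from memory, although the requested words lie in only two distinct lines and only eight words are actually used by the I/O device.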
It would therefore be desirable to have an approach for I/O accesses that accounts for the type of I/O access being made (block or single word) and makes efficient use of data that has already been accessed.
In one embodiment, the invention provides an integrated circuit coupled between system memory of a computer system and an input/output channel such as the PCI bus. The integrated circuit includes an input/output request circuit that receives a read request for data in system memory from an input/output device over the input/output channel. A read ahead buffer, which is coupled to the input/output request circuit, stores data from a previous read access to system memory. The input/output request circuit is coupled to selectively provide data from either the system memory or the read ahead buffer in response to the read request. The read ahead buffer is maintained as non-coherent memory with respect to system memory.
In another embodiment, a method of operating a computer system is provided. The computer system includes a read ahead buffer coupled to a memory controller and an input/output controller coupled to an input/output channel. An I/O device provides an initial read request over the input/output channel which specifies an address in system memory. The memory controller retrieves an amount of data from system memory larger than specified by the read request and provides the requested data to the input/output channel and thus to the I/O device. At least a portion of the data retrieved from system memory is stored in the read ahead buffer. The read ahead buffer is marked as valid and identified by at least a portion of the address specified in the read request. When the same I/O device performs a subsequent read access, the I/O request circuit determines whether at least a portion of the address of the subsequent read request matches the portion of the address identifying the read ahead buffer and provides a tag match signal as an indication thereof. Data is then selectively provided from either the read ahead buffer or system memory to the input/output device in response to the subsequent read request according to the tag match signal and the valid indication.
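The method described above can be modeled in software to show how the tag match signal and the valid indication select the data source. The following is a simplified sketch, not the circuit itself: the class names, the four-word line size, the use of word addresses, the single-entry buffer, and the omission of any per-device tracking or invalidation logic are all assumptions made for illustration.

```python
LINE_WORDS = 4  # assumption: each memory access retrieves a four-word line


class ReadAheadBuffer:
    """Single-entry buffer identified by a tag (the line-address portion of the request)."""

    def __init__(self):
        self.valid = False
        self.tag = None
        self.data = []


class IORequestCircuit:
    """Hypothetical model: serves reads from the buffer on a valid tag match, else from memory."""

    def __init__(self, memory):
        self.memory = memory        # list of words standing in for system memory
        self.buffer = ReadAheadBuffer()
        self.memory_accesses = 0    # counts accesses that actually reach system memory

    def read(self, addr):
        tag = addr // LINE_WORDS
        offset = addr % LINE_WORDS
        tag_match = self.buffer.tag == tag
        if self.buffer.valid and tag_match:
            return self.buffer.data[offset]   # served from the read ahead buffer
        # Otherwise retrieve more data than requested and keep it for later requests.
        base = tag * LINE_WORDS
        self.buffer.data = self.memory[base:base + LINE_WORDS]
        self.buffer.tag = tag
        self.buffer.valid = True
        self.memory_accesses += 1
        return self.buffer.data[offset]


memory = list(range(100, 116))      # sixteen hypothetical data words
circuit = IORequestCircuit(memory)
values = [circuit.read(addr) for addr in range(8)]  # eight sequential single-word reads
print(values, circuit.memory_accesses)
```

In this model, eight sequential single-word reads reach system memory only twice, once per line, with the remaining six requests served from the buffer. Because the sketch never snoops or updates the buffer, a write to `memory` after a fill is not reflected in the buffered words; when and how the buffer is invalidated is not modeled here.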