The present invention concerns the use of access hints for input/output address translation mechanisms.
Most modern computer systems include a central processing unit (CPU) and a main memory. The speed at which the CPU can decode and execute instructions and operands depends upon the rate at which the instructions and operands can be transferred from main memory to the CPU. In an attempt to reduce the time required for the CPU to obtain instructions and operands from main memory, many computer systems include a cache memory between the CPU and main memory.
A cache memory is a small, high-speed buffer memory which is used to hold temporarily those portions of the contents of main memory which it is believed will be used in the near future by the CPU. The main purpose of a cache memory is to shorten the time necessary to perform memory accesses, either for data or instruction fetch. The information located in cache memory may be accessed in much less time than information located in main memory. Thus, a CPU with a cache memory needs to spend far less time waiting for instructions and operands to be fetched and/or stored.
A cache memory is made up of many blocks of one or more words of data. Each block has associated with it an address tag that uniquely identifies which block of main memory it is a copy of. Each time the processor makes a memory reference, an address tag comparison is made to see if a copy of the requested data resides in the cache memory. If the desired memory block is not in the cache memory, the block is retrieved from the main memory, stored in the cache memory and supplied to the processor.
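By way of illustration only, the address tag comparison described above may be sketched as follows. The sketch assumes a direct-mapped organization in which the address tag is simply the number of the main memory block; the block size, the number of blocks, and the names `Cache`, `BLOCK_WORDS` and `NUM_BLOCKS` are hypothetical and are not taken from any particular system.

```python
# Illustrative sketch only: a direct-mapped cache with assumed sizes.
BLOCK_WORDS = 4   # words per block (assumed)
NUM_BLOCKS = 8    # blocks held by the cache (assumed)


class Cache:
    def __init__(self, main_memory):
        self.memory = main_memory          # main memory as a list of words
        self.tags = [None] * NUM_BLOCKS    # address tag for each cache block
        self.data = [None] * NUM_BLOCKS    # copy of one main memory block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number = address // BLOCK_WORDS   # main memory block referenced
        index = block_number % NUM_BLOCKS       # cache block that may hold it
        if self.tags[index] == block_number:    # address tag comparison
            self.hits += 1
        else:
            # Miss: retrieve the block from main memory and store it in the cache.
            self.misses += 1
            start = block_number * BLOCK_WORDS
            self.data[index] = self.memory[start:start + BLOCK_WORDS]
            self.tags[index] = block_number
        # Supply the requested word to the processor.
        return self.data[index][address % BLOCK_WORDS]
```

In this sketch, a second reference to the same block is satisfied from the cache, while a reference to a different block that maps to the same cache block replaces the earlier copy.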
In addition to using a cache memory to retrieve data from main memory, the CPU may also write data into the cache memory instead of directly to the main memory. When the processor desires to write data to the memory, the cache memory makes an address tag comparison to see if the data block into which data is to be written resides in the cache memory. If the data block exists in the cache memory, the data is written into the data block in the cache memory. In many systems a data "dirty bit" for the data block is then set. The dirty bit indicates that data in the data block is dirty (i.e., has been modified), and thus before the data block is deleted from the cache memory the modified data must be written into main memory. If the data block into which data is to be written does not exist in the cache memory, the data block must be fetched into the cache memory or the data written directly into the main memory. A data block which is overwritten or copied out of cache memory when new data is placed in the cache memory is called a victim block or a victim line.
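The write path and the role of the dirty bit may likewise be sketched, purely as an illustration. The sketch assumes the first of the two alternatives named above, namely that a block missing from the cache is fetched into the cache before being written; the class name `WriteBackCache` and the sizes used are hypothetical.

```python
# Illustrative sketch only: write-back behavior with a dirty bit per block.
BLOCK_WORDS = 4   # words per block (assumed)
NUM_BLOCKS = 2    # blocks held by the cache (assumed)


class WriteBackCache:
    def __init__(self, main_memory):
        self.memory = main_memory
        self.tags = [None] * NUM_BLOCKS
        self.data = [None] * NUM_BLOCKS
        self.dirty = [False] * NUM_BLOCKS

    def _fill(self, block_number):
        index = block_number % NUM_BLOCKS
        if self.tags[index] == block_number:
            return index                         # block already resides in cache
        # The block currently in this position is the victim block.
        if self.tags[index] is not None and self.dirty[index]:
            # Modified data must be written into main memory before deletion.
            start = self.tags[index] * BLOCK_WORDS
            self.memory[start:start + BLOCK_WORDS] = self.data[index]
        start = block_number * BLOCK_WORDS
        self.data[index] = self.memory[start:start + BLOCK_WORDS]
        self.tags[index] = block_number
        self.dirty[index] = False
        return index

    def write(self, address, value):
        index = self._fill(address // BLOCK_WORDS)
        self.data[index][address % BLOCK_WORDS] = value
        self.dirty[index] = True                 # block now differs from memory
```

Main memory is left unchanged by a write until the modified block becomes a victim, at which point its contents are copied back.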
For further related information on cache memories, see U.S. Pat. No. 4,928,239 issued on May 22, 1990 to Allen Baum, et al., for Cache Memory with Variable Fetch and Replacement Schemes.
Input/output (I/O) adapters which interact with memory need to be designed to integrate with all features of the computing system. To this end, address translation maps within the I/O adapters are often used to convert I/O bus addresses to memory addresses. Such address translation maps have been used when the I/O bus address range is smaller than the memory address range, so that I/O accesses can reference any part of memory.
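As a minimal sketch of such an address translation map, and not a description of any particular adapter, the following assumes a page-granular map with a hypothetical page size of 4096 bytes. The names `IoTranslationMap`, `load` and `translate` are illustrative only.

```python
# Illustrative sketch only: a page-granular I/O address translation map.
PAGE_SIZE = 4096  # assumed page size in bytes


class IoTranslationMap:
    def __init__(self, num_entries):
        # One entry per I/O page; None marks an entry that is not loaded.
        self.entries = [None] * num_entries

    def load(self, io_page, memory_page):
        # Software explicitly allocates and loads each entry.
        self.entries[io_page] = memory_page

    def translate(self, io_address):
        io_page, offset = divmod(io_address, PAGE_SIZE)
        if not 0 <= io_page < len(self.entries):
            raise LookupError("I/O address outside the translation map")
        memory_page = self.entries[io_page]
        if memory_page is None:
            raise LookupError("no translation loaded for this I/O page")
        # The memory page number may exceed the range of the I/O bus address.
        return memory_page * PAGE_SIZE + offset
```

Because each entry holds a full memory page number, a small range of I/O bus addresses can be directed at any page of a larger memory.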
In the prior art, I/O address translation maps have been managed by software. Each entry in the address translation map is explicitly allocated and loaded by operating system software. When an I/O adapter accesses the main memory in a system where one or more processors utilize a cache, it is necessary to take steps to ensure the integrity of data accessed in memory. For example, when the I/O adapter reads data from or writes data to memory, it is important to determine whether an updated version of the data resides in the cache of a processor on the system. If an updated version of the data exists, something must be done to ensure that the I/O adapter accesses the updated version of the data. An operation that assures that the updated version of the data is utilized in a memory reference is referred to herein as a coherence operation.
Various schemes have been suggested to ensure coherence of data accessed by an I/O adapter from the system memory. For example, previous schemes have included software explicitly flushing caches prior to performing I/O operations.
When designing an I/O adapter, it is necessary to analyze the computing system to assure that the I/O adapter is optimized for the computing system. Some I/O buses require that data transactions be atomic. That is, other transactions need to be "locked out" during atomic data transactions. An I/O adapter which interfaces with such an I/O bus needs to be able to implement this feature. However, performing atomic transactions slows down system performance.
Likewise, when data is stored sequentially, it is an advantage if the I/O adapter prefetches data when reading from memory and coalesces data when writing to memory. Prefetching means fetching data before a request for the data is actually made. Generally, data is selected for prefetch because it sequentially follows data that is actually accessed. Coalescing is used when data arrives from the I/O bus in units which are smaller than the units which can be transferred onto the memory bus. For example, coalescing can be used for blocks which are smaller than a cache line. Data transfer is made more efficient by coalescing the blocks received from the I/O bus into a cache line before forwarding the entire cache line over the memory bus to the memory. When data is not stored sequentially, such prefetching and coalescing can be a hindrance to system performance.
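Coalescing may be illustrated by the following sketch, which is offered only as an example under stated assumptions: a cache line of 32 bytes, I/O units that never cross a cache line boundary, and a hypothetical callback `memory_write` standing in for a transfer over the memory bus. The name `WriteCoalescer` is likewise illustrative.

```python
# Illustrative sketch only: coalescing small I/O writes into cache lines.
CACHE_LINE_BYTES = 32  # assumed cache line size


class WriteCoalescer:
    def __init__(self, memory_write):
        self.memory_write = memory_write  # callable(address, data), assumed
        self.start = None                 # address of the first buffered byte
        self.buffer = bytearray()

    def io_write(self, address, data):
        # Assumes each unit from the I/O bus fits within one cache line.
        if self.buffer and address != self.start + len(self.buffer):
            self.flush()                  # not sequential: forward partial data
        if not self.buffer:
            self.start = address
        self.buffer += data
        if (self.start + len(self.buffer)) % CACHE_LINE_BYTES == 0:
            self.flush()                  # end of the cache line reached

    def flush(self):
        if self.buffer:
            self.memory_write(self.start, bytes(self.buffer))
            self.buffer = bytearray()
```

In this sketch, four sequential 8-byte units beginning on a line boundary result in a single 32-byte transfer to memory, whereas units at scattered addresses are each forwarded separately after being held in the buffer, so that coalescing yields no benefit for them.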
Sometimes it is an advantage on data transactions to invalidate all corresponding data in system caches. Other times, it is preferable to update cache on writes or cleanse cache data on reads. Further, for systems in which data is aligned on cache line boundaries, "fast" DMA can be performed relying on the alignment. However, for systems in which data is not aligned on cache line boundaries, it is necessary to perform a slower "safe" DMA.
When an I/O adapter is designed to facilitate a number of different uses within a computing system, it is desirable for the I/O adapter to run optimally with each usage and when performing all types of data transactions. Such a versatile I/O adapter is not found in the prior art.