The invention relates to the field of input-output (I/O) address space mapping hardware for computer systems. In particular, the invention relates to I/O address translation devices capable of automatically maintaining coherency in computer systems having multiple DMA-capable peripherals and potentially multiple processors.
Most modern computer systems use virtual memory addressing. These systems compute effective addresses of operands in a virtual memory space. An address translation system then maps portions of the virtual memory space to actual data storage. The actual data storage typically includes one or more swap files or partitions as well as RAM memory. The advantages of these systems are well known in the art.
Virtual memory systems typically allocate memory in discrete blocks known as pages. When a program references memory at a particular virtual address, that address is translated to determine whether the associated real memory page is in memory, or located in a swap file. If the address corresponds to data in a swap file, real memory is allocated and loaded from the swap file. When the address corresponds to data in real memory (possibly after loading memory from a swap file), the address is translated to a physical memory address in the associated page of real, physical, memory. The reference is then allowed to take place.
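The translation sequence described above can be sketched in a few lines. This is a minimal illustration only, assuming a 4096-byte page and a page table that records whether each page is resident or in a swap file; the class name `VirtualMemory`, its methods, and the page size are hypothetical and do not come from any particular system.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class VirtualMemory:
    """Toy page table: each virtual page is either in RAM or in a swap file."""

    def __init__(self):
        self.page_table = {}      # virtual page number -> ("ram", frame) or ("swap", slot)
        self.next_frame = 0       # next free physical frame (never reclaimed here)
        self.loads_from_swap = 0  # counts simulated swap-in operations

    def _alloc_frame(self):
        frame = self.next_frame
        self.next_frame += 1
        return frame

    def map_resident(self, vpn):
        self.page_table[vpn] = ("ram", self._alloc_frame())

    def map_swapped(self, vpn, slot):
        self.page_table[vpn] = ("swap", slot)

    def translate(self, vaddr):
        """Return the physical address for vaddr, swapping the page in if needed."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        where, value = self.page_table[vpn]
        if where == "swap":
            # Allocate real memory and (notionally) load it from the swap file.
            value = self._alloc_frame()
            self.loads_from_swap += 1
            self.page_table[vpn] = ("ram", value)
        return value * PAGE_SIZE + offset
```

In this sketch a second reference to the same page finds it resident, so no further swap load is counted.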
A subset of these modern computer systems also use virtual addressing for Input-Output (I/O) devices. In these systems, a virtual I/O address space is created for Direct Memory Access (DMA) operations. This virtual I/O address space is typically used for I/O buffers, including disk data buffers. These buffers may each comprise contiguous locations in virtual I/O address space, but may map to discontiguous locations in physical memory. Typically, DMA-capable peripherals such as disk interface adapters transfer blocks of data between themselves and memory, generating virtual I/O addresses to address the memory.
A subset of computer systems supporting virtual I/O addressing, including those supporting the InfiniBand I/O interconnect system (InfiniBand documents are available at http://www.infinibandta.org), also support Remote Direct Memory Access (RDMA) operations. In these systems, DMA operations may be initiated by peripherals that are remotely located; such peripherals may, but need not, be physically located in the same cabinet as the processor and memory. The InfiniBand architecture requires support for RDMA read, write, and atomic operations to buffers in a virtual I/O address space. The term DMA-capable herein includes RDMA-capable devices.
Such systems have hardware and software for address translation between DMA-capable I/O devices and physical memory to support virtual I/O addresses. I/O address translation of this type is a requirement of systems supporting the InfiniBand I/O architecture.
Systems supporting address translation between DMA-capable I/O devices and system memory allow for rapid and convenient dynamic allocation of memory buffers and I/O cache space. In these systems, it is possible to configure a DMA-capable I/O device, such as a disk drive, to transfer a block of data. Address translation software and hardware then allow the transfer to take place even if the block of data straddles multiple discontiguous pages of real memory.
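As an illustration of a transfer that straddles discontiguous pages, the following sketch splits one contiguous range of virtual I/O addresses into per-page physical segments. The function name `split_transfer`, the dictionary-based I/O page table, and the 4096-byte page size are assumptions made for the example, not details of any described system.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def split_transfer(io_vaddr, length, io_page_table):
    """Split a contiguous virtual I/O range into (physical_address, length) segments.

    io_page_table maps a virtual I/O page number to a physical frame number.
    """
    segments = []
    while length > 0:
        vpn, offset = divmod(io_vaddr, PAGE_SIZE)
        chunk = min(length, PAGE_SIZE - offset)  # stop at the page boundary
        frame = io_page_table[vpn]
        segments.append((frame * PAGE_SIZE + offset, chunk))
        io_vaddr += chunk
        length -= chunk
    return segments
```

For example, with virtual I/O page 0 mapped to frame 9 and page 1 mapped to frame 2, a 200-byte transfer starting at virtual I/O address 4000 yields two segments, one at the end of frame 9 and one at the start of frame 2.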
An I/O adapter having address translation capability is described in U.S. Pat. No. 5,784,708, hereinafter the '708 patent, the disclosure of which is hereby incorporated by reference. The input-output adapter of the '708 patent incorporates an I/O Translation Lookaside Buffer (TLB). A TLB is a hardware mapping device that stores part, or all, of a page table used for address translation. Typically, a TLB stores one or more entries of the page table, thereby permitting rapid access to those page table entries, while the full page table is stored in memory. When an I/O device specifies target addresses in memory, those addresses are checked against the page table entries stored in the TLB; if a corresponding page table entry is found, those addresses are translated according to the page table entry and the requested transfer occurs. If no corresponding page table entry is found, the TLB is updated from memory with the corresponding page table entry before the transfer occurs.
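The hit-or-refill behavior of a TLB described above can be sketched as follows. This is not the design of the '708 patent; the class `IoTlb`, its two-entry capacity, the least-recently-used replacement policy, and the 4096-byte page size are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class IoTlb:
    """Toy I/O TLB caching a few entries of a page table held in memory."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table  # full table: virtual I/O page -> frame
        self.capacity = capacity
        self.entries = OrderedDict()  # cached subset, oldest use first
        self.hits = 0
        self.misses = 0

    def translate(self, io_vaddr):
        vpn, offset = divmod(io_vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used (assumed policy)
            # Update the TLB from the page table in memory before the transfer.
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset
```

Note that nothing in this sketch notices when `page_table` changes after an entry has been cached; a cached entry can then go stale.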
Modern computer systems often have multiple DMA-capable I/O devices, and may have multiple processors. These systems may have multiple I/O TLBs. It is known that page tables of such systems may change as the system operates and memory pages are allocated, moved, pushed out to swap files, fetched from swap files, and deallocated. It is also known that page tables of such systems may change as video bitmaps are replaced with updated, alternate bitmaps.
A TLB is considered coherent with a page table in memory if each map entry in the TLB refers to the same place in real memory or swap file as corresponding entries in the page table data in memory. Multiple TLBs are considered coherent with each other if each is coherent with the same page table in memory.
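The definition above can be expressed as a simple predicate. In this sketch a TLB and a page table are both modeled as dictionaries from page number to mapped location; the function names are hypothetical.

```python
def tlb_is_coherent(tlb_entries, page_table):
    """True if every cached entry matches the corresponding page table entry."""
    return all(page_table.get(vpn) == target for vpn, target in tlb_entries.items())


def tlbs_mutually_coherent(tlbs, page_table):
    """True if each TLB is coherent with the same page table in memory."""
    return all(tlb_is_coherent(entries, page_table) for entries in tlbs)
```

A TLB holding only a subset of the page table is still coherent under this check, provided every entry it does hold agrees with memory.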
It is desirable that each I/O TLB of the system be kept coherent with its associated page table data in memory, regardless of whether each I/O TLB uses a common or a unique page table in memory.
It is known that the coherency of I/O TLBs with their associated page tables in memory can be maintained through software, and this is common practice in the art. Software maintenance of coherency can, however, consume considerable processor cycles, during which time the processor is unable to perform other useful work. It is therefore desirable to improve, indeed automate, the maintenance of TLB coherency.
Coherency is also required in cache memory of computer systems, especially in those having multiple processors. There are several known protocols for maintaining cache coherency in multiple processor cache-equipped systems. One such protocol is known as "snooping." Snooping involves having each processor cache of the system monitor a bus for cacheline accesses by other caches of the system. When a cache sees an access to a cacheline it holds, it returns or invalidates its copy of the cacheline if necessary.
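A much-simplified model of snooping is sketched below. It assumes a write-through, write-invalidate scheme in which every write is broadcast on a shared bus and each other cache drops its copy of the affected cacheline; real snooping protocols track more states than this, and the names `SnoopBus` and `SnoopingCache` are hypothetical.

```python
class SnoopBus:
    """Shared bus on which every attached cache observes writes by the others."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, source, line):
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(line)


class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory  # dict: cacheline address -> value
        self.lines = {}       # locally cached copies
        bus.attach(self)

    def read(self, line):
        if line not in self.lines:
            self.lines[line] = self.memory[line]  # fill on miss
        return self.lines[line]

    def write(self, line, value):
        self.bus.broadcast_write(self, line)  # let other caches see the access
        self.lines[line] = value
        self.memory[line] = value             # write-through (assumed, for simplicity)

    def snoop_write(self, line):
        # Another cache wrote this line: invalidate the local copy if one is held.
        self.lines.pop(line, None)
```

After one cache writes a line, any other cache that held it must miss on its next read and refetch the current value.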
Another known protocol for maintaining cache coherency is "directory-based." In these systems, a central controller maintains a cache directory, a directory of cachelines "owned" by each cache in the system; a cacheline being "owned" by a cache if that cache has a copy of associated data. All accesses of data in the system are cleared against the cache directory to ensure that "dirty" data is written to memory and to ensure that prior owning caches are invalidated as needed. This may be done by messaging each prior owning cache when ownership of a cacheline changes.
Another form of directory-based cache coherency protocol is one in which a memory system stores a cache state for each cacheline. This cache state includes information about caches owning the associated cacheline, and is maintained by a system controller. With this system, the system controller messages each prior owning cache when ownership of a cacheline changes.
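The bookkeeping performed by a directory can be sketched as follows. The sketch records which caches hold each cacheline and, on a write, lists the prior owners that must be messaged; the class `Directory`, its method names, and the use of string cache identifiers are assumptions, and write-back of dirty data is omitted.

```python
class Directory:
    """Toy cache directory tracking which caches own each cacheline."""

    def __init__(self):
        self.owners = {}    # cacheline address -> set of owning cache ids
        self.messages = []  # (cache_id, line) invalidation messages sent so far

    def record_read(self, cache_id, line):
        # The reading cache now holds a copy, so it becomes an owner.
        self.owners.setdefault(line, set()).add(cache_id)

    def record_write(self, cache_id, line):
        """Transfer ownership to the writer; return the prior owners that were messaged."""
        prior = sorted(self.owners.get(line, set()) - {cache_id})
        for other in prior:
            self.messages.append((other, line))  # message each prior owning cache
        self.owners[line] = {cache_id}
        return prior
```

Unlike snooping, only the caches listed in the directory for a given line receive a message; caches that never held the line see no traffic.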
While snoop-based and directory-based coherency maintenance mechanisms have historically been used to maintain cache coherency, they are not typically used to maintain coherency of address mapping hardware such as TLBs.
A bridge is a device that couples through a first port to a first computer interconnect and through a second port to a second computer interconnect. The first and second interconnects may, but need not, be of different types, and are often, but need not be, busses. Bridges transfer data between the first and second ports. For example, a typical computer system has a host bridge that couples a processor's local bus to a PCI or other I/O local bus, and perhaps a second bridge that couples the PCI bus to an ISA bus. A bridge may also couple the interconnect of a host computer to the internal interconnect of a peripheral device. For example, a peripheral device card may incorporate a bridge that couples a PCI bus and connector to an internal PCI bus; the internal PCI bus in turn couples a processor and other devices local to the peripheral device.
In modern computer systems, a memory system may attach directly to a processor's local bus, or the memory controller and host bridge may be integrated as a single component.
Bus bridges typically include at least a first and a second interconnect interface, with state machines for transmitting and receiving data on either interconnect interface. Bus bridges typically have address window hardware, such that they respond to a particular set of addresses on the first bus interface, and to a particular set of addresses on the second bus interface. They also have buffers capable of receiving data on the first bus interface, and transmitting it on the second bus interface, and of receiving data on the second bus interface, and transmitting it on the first bus interface.
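The address-window and buffering behavior described above can be modeled roughly as follows. The sketch assumes one half-open address window per interface and a simple queue per outgoing direction; the class `Bridge`, the side labels "first" and "second", and the window values in the usage note are hypothetical, and the state machines of a real bridge are not modeled.

```python
from collections import deque


class Bridge:
    """Toy bus bridge with one address window and one outgoing buffer per interface."""

    def __init__(self, first_window, second_window):
        # Each window is a (low, high) pair; the bridge responds to low <= addr < high.
        self.windows = {"first": first_window, "second": second_window}
        # Data waiting to be transmitted on each interface.
        self.buffers = {"first": deque(), "second": deque()}

    def receive(self, side, addr, data):
        """Accept a transaction seen on `side` if it falls in that side's window."""
        low, high = self.windows[side]
        if not (low <= addr < high):
            return False  # outside the window: the bridge does not respond
        other = "second" if side == "first" else "first"
        self.buffers[other].append((addr, data))  # queue for the opposite interface
        return True
```

For instance, a bridge built as `Bridge((0x1000, 0x2000), (0x8000, 0x9000))` would accept a transaction at address 0x1800 on its first interface and queue it for the second, while ignoring one at 0x3000.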
I/O address translation logic can be located in a bus bridge. Systems exist wherein I/O address translation logic is located in a host bridge, where the I/O translation logic serves to translate virtual I/O addresses received over the I/O bus into physical addresses in the memory system.
A computer system is provided with I/O address mapping hardware. In a particular embodiment, this I/O address mapping hardware is located in a host bridge. In a particular embodiment, this address mapping hardware incorporates an I/O TLB.
The address mapping hardware is provided with apparatus for maintaining coherency between its mapping entries and a page table in memory. In a particular embodiment, this apparatus for maintaining coherency includes apparatus for snooping.
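One way to picture snooping applied to address mapping hardware is sketched below. It is an illustrative model only, not the apparatus of any embodiment: it assumes the page table occupies a known physical address range, that each page table entry is 8 bytes, and that the TLB simply discards a cached mapping when it observes a write to the corresponding entry. The class `SnoopingIoTlb` and its methods are hypothetical names.

```python
PTE_SIZE = 8  # assumed bytes per page table entry, for illustration only


class SnoopingIoTlb:
    """Toy I/O TLB that watches memory writes falling inside its page table."""

    def __init__(self, page_table_base, num_entries):
        self.base = page_table_base
        self.limit = page_table_base + num_entries * PTE_SIZE
        self.entries = {}  # virtual I/O page number -> physical frame

    def fill(self, vpn, frame):
        self.entries[vpn] = frame

    def snoop_write(self, phys_addr):
        """Observe a write; return True if a cached mapping was invalidated."""
        if self.base <= phys_addr < self.limit:
            vpn = (phys_addr - self.base) // PTE_SIZE
            return self.entries.pop(vpn, None) is not None
        return False  # write was outside the page table
```

In this model the next translation of an invalidated page would miss and be refilled from the updated page table, without software having to flush the TLB explicitly.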
In another embodiment, a host channel adapter has I/O address mapping hardware that incorporates an I/O TLB. In this embodiment, the I/O address mapping hardware cooperates with a directory-based coherency protocol, such that mappings stored in the I/O TLB are invalidated when their corresponding page table entries in memory are modified by some other cache in the system.