The virtual interface architecture (VIA) has been jointly developed by a number of computer and software companies. VIA provides consumer processes with a protected, directly accessible interface to network hardware, termed a virtual interface. VIA is especially designed to provide low latency message communication over a system area network (SAN) to facilitate multi-processing utilizing clusters of processors.
A SAN is used to interconnect nodes within a distributed computer system, such as a cluster. The SAN is a type of network that provides high bandwidth, low latency communication with a very low error rate. SANs often utilize fault-tolerant technology to assure high availability. The performance of a SAN resembles a memory subsystem more than a traditional local area network (LAN).
The VIA is described in the Virtual Interface Architecture Specification, Draft Revision 1.0, Dec. 4, 1997. The VI Architecture comprises four basic components: Virtual Interfaces, Completion Queues, VI Providers, and VI Consumers. The VI Provider is composed of a physical network adapter and a software Kernel Agent. The VI Consumer is generally composed of an application program and an operating system communication facility. The organization of these components is illustrated in FIG. 1.
A VI is depicted in FIG. 2 and consists of a pair of Work Queues: a send queue and a receive queue. VI Consumers post requests, in the form of Descriptors, on the Work Queues to send or receive data. A Descriptor is a memory structure that contains all of the information that the VI Provider needs to process the request, such as pointers to data buffers.
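The relationship between a VI, its two Work Queues, and Descriptors can be sketched as follows. This is a minimal illustration only, assuming hypothetical names (DataSegment, Descriptor, VirtualInterface, post_send, post_receive) and a simplified Descriptor layout; it is not the layout defined by the VIA Specification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataSegment:
    # Hypothetical fields: a pointer to a data buffer and its length.
    virtual_address: int
    length: int


@dataclass
class Descriptor:
    # Holds the information the VI Provider needs to process the request,
    # such as pointers to data buffers (simplified, assumed layout).
    segments: list
    done: bool = False  # assumed completion flag, set by the Provider


@dataclass
class VirtualInterface:
    # A VI consists of a pair of Work Queues: a send queue and a receive queue.
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)

    def post_send(self, descriptor):
        """Post a Descriptor on the send queue."""
        self.send_queue.append(descriptor)

    def post_receive(self, descriptor):
        """Post a Descriptor on the receive queue."""
        self.receive_queue.append(descriptor)
```

A VI Consumer in this sketch would build a Descriptor that points at its buffers and post it on whichever queue matches the request.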
The VI Provider is the set of hardware and software components responsible for instantiating a Virtual Interface. The VI Provider consists of a network interface controller (NIC) and a Kernel Agent (KA).
The VI NIC implements the Virtual Interfaces and directly performs data transfer functions. The NIC provides an electromechanical attachment of a computer to a network. Under program control, a NIC copies data from memory to the network medium, i.e., transmission, and from the medium to the memory, i.e., reception.
The Kernel Agent is a privileged part of the operating system, usually a driver supplied by the VI NIC vendor, that performs the setup and resource management functions needed to maintain a Virtual Interface between VI Consumers and VI NICs. These functions include the creation/destruction of VIs, VI connection setup/teardown, interrupt management and/or processing, management of system memory used by the VI NIC, and error handling. VI Consumers access the Kernel Agent using standard operating system mechanisms such as system calls. Kernel Agents interact with VI NICs through standard operating system device management mechanisms.
The VI Architecture requires the VI Consumer to identify memory used for a data transfer prior to submitting the request. Only memory that has been registered with the VI Provider can be used for data transfers. This memory registration process allows the VI Consumer to reuse registered memory buffers, thereby avoiding duplication of locking and translation operations. Memory registration also takes this processing overhead out of the performance-critical data transfer path.
Memory registration enables the VI Provider to transfer data directly between the buffers of a VI Consumer and the network without copying any data to or from intermediate buffers.
Memory registration consists of locking the pages of a virtually contiguous memory region into physical memory and providing the virtual to physical translations to the VI NIC. The VI Consumer gets an opaque handle for each memory region registered. The VI Consumer can reference all registered memory by its virtual address and its associated handle.
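The handle-based view of registration seen by the VI Consumer might be modeled as below. This is a sketch under stated assumptions: the class and method names (MemoryRegistry, register, deregister, covers) are hypothetical, handles are plain integers, and page locking and the translations passed to the VI NIC are not modeled.

```python
import itertools


class MemoryRegistry:
    """Toy model of registered memory regions keyed by an opaque handle."""

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._regions = {}  # handle -> (base virtual address, length in bytes)

    def register(self, virtual_address, length):
        # A real provider would lock the pages into physical memory here
        # and hand the virtual to physical translations to the VI NIC.
        handle = next(self._next_handle)
        self._regions[handle] = (virtual_address, length)
        return handle

    def deregister(self, handle):
        del self._regions[handle]

    def covers(self, handle, virtual_address, length):
        """Return True if the buffer lies entirely inside the region named by handle."""
        region = self._regions.get(handle)
        if region is None:
            return False
        base, size = region
        return base <= virtual_address and virtual_address + length <= base + size
```

Because the handle stays valid until deregistration, the same buffer can be reused for many transfers without repeating the locking and translation work.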
Memory is registered with the VI NIC for two reasons:
1) to allow the NIC to perform virtual to physical address translation; and
2) to allow the NIC to perform protection checking.
Consumers are able to use virtual addresses to refer to VI Descriptors and communication buffers. The VI NIC is able to translate from virtual to physical addresses through the use of its Translation and Protection Table (TPT). The TPT of the NIC described in the VIA Specification resides on the NIC in order to assure fast, noncontentious access and because it is accessed during performance critical data movement. A TPT and method of accessing the TPT are depicted in FIG. 3. The fields of each TPT entry are:
a) a valid indication bit
b) a physical page address
c) a protection tag
d) an RDMA Write Enable Bit
e) an RDMA Read Enable Bit
f) a Memory Write Enable Bit
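A TPT entry with the six fields (valid bit, physical page address, protection tag, and the three enable bits) and a virtual to physical lookup can be sketched as follows. The indexing scheme is an assumption made for illustration: the names TptEntry, translate, first_index, and region_base are hypothetical, the page size is assumed to be 4 KB, and the actual access method of FIG. 3 is not reproduced here.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class TptEntry:
    valid: bool = False                # valid indication bit
    physical_page_address: int = 0     # physical page address
    protection_tag: int = 0            # protection tag
    rdma_write_enable: bool = False    # RDMA Write Enable Bit
    rdma_read_enable: bool = False     # RDMA Read Enable Bit
    memory_write_enable: bool = False  # Memory Write Enable Bit


def translate(tpt, first_index, region_base, virtual_address):
    """Map a virtual address inside a registered region to a physical address.

    first_index is the (assumed) index of the region's first TPT entry and
    region_base is the virtual address at which the region starts.
    """
    page_offset = virtual_address // PAGE_SIZE - region_base // PAGE_SIZE
    if page_offset < 0:
        raise LookupError("address lies below the registered region")
    entry = tpt[first_index + page_offset]
    if not entry.valid:
        raise LookupError("TPT entry is not valid")
    return entry.physical_page_address + virtual_address % PAGE_SIZE
```

Consecutive virtual pages map to consecutive TPT entries in this sketch, while the physical pages they name need not be adjacent.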
The size of the TPT is configurable. There is one entry in the TPT for each page that can be registered by the user. A memory region of N contiguous virtual pages consumes N contiguous entries in the TPT.
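The number of TPT entries a region consumes follows from the number of pages it touches. A small worked sketch, assuming a hypothetical helper name (tpt_entries_needed) and a default page size of 4 KB:

```python
def tpt_entries_needed(virtual_address, length, page_size=4096):
    """Return the number of pages, and hence TPT entries, spanned by a region."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_page = virtual_address // page_size
    last_page = (virtual_address + length - 1) // page_size
    return last_page - first_page + 1
```

Note that a region that is not page aligned can touch one more page than its length alone suggests; for example, a 2-byte region that straddles a page boundary needs two entries.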
When a memory region is registered with the NIC, the Kernel Agent allocates a contiguous set of entries from the TPT and initializes them with the corresponding physical page addresses and protection tag specified by the process that registered the memory region. The protection tag specified by the process when it creates a VI is stored in the context memory of the VI. The NIC has access to the protection tag in both of these areas, allowing it to compare these values to detect invalid accesses. Page sizes larger than 4 KB are supported and page size may differ among nodes of the SAN.
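The initialization of a contiguous set of entries and the protection tag comparison can be sketched as follows. This is an illustrative model only: the function names (initialize_entries, tags_match) and the dictionary representation of an entry are assumptions, and the VI's context memory is reduced to a single tag value passed as an argument.

```python
def initialize_entries(tpt, first_index, physical_pages, protection_tag):
    """Fill a contiguous run of TPT entries for a newly registered region."""
    for offset, page_address in enumerate(physical_pages):
        tpt[first_index + offset] = {
            "valid": True,
            "physical_page_address": page_address,
            "protection_tag": protection_tag,
        }


def tags_match(tpt, index, vi_protection_tag):
    """Compare the tag stored for the VI with the tag in the TPT entry."""
    entry = tpt[index]
    return (
        entry is not None
        and entry["valid"]
        and entry["protection_tag"] == vi_protection_tag
    )
```

An access is treated as invalid in this sketch whenever the entry is unused or the two tags differ.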
The above-described implementation of the TPT has several disadvantages. If TPT entries are allowed to exist anywhere in memory, an application could set up bogus TPT entries which point to any physical address. An RDMA Write descriptor could then be set up, given an appropriate Virtual Address and Memory Handle, to use this bogus TPT entry and scribble anywhere in memory. The standard solution is to limit the locations of legal TPT entries. However, the resulting requirement to allocate contiguous memory to facilitate bounds checking consumes a large amount of memory. Another problem resulting from the standard solution is that it may lead to fragmentation of entries in the TPT, which can result in a failure when attempting to find the multiple consecutive entries required when registering large memory regions.
The fragmentation problem is illustrated in FIG. 4, which depicts an exaggerated example where the TPT range is limited to only eight entries. There are three active registered memory regions, with TPT owner IDs X, Y, and Z, which differentiate the registered memory regions. An application cannot register a new two-page memory region, Mem Region 4, because, due to previous fragmentation of the TPT, no two free TPT entries are contiguous. Thus, Mem Region 4 cannot be registered even though there are three available entries in the TPT.
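The failure can be reproduced with a simple first-fit search for a run of free entries. The sketch below is illustrative: the function name (find_contiguous_free) is hypothetical, and the particular arrangement of owners X, Y, and Z is an assumed layout chosen so that three entries are free but no two are adjacent; it is not taken from FIG. 4.

```python
def find_contiguous_free(tpt, count):
    """Return the index of the first run of `count` free entries, or None."""
    run_start, run_length = None, 0
    for index, owner in enumerate(tpt):
        if owner is None:
            if run_length == 0:
                run_start = index
            run_length += 1
            if run_length == count:
                return run_start
        else:
            run_length = 0
    return None


# Assumed eight-entry layout: None marks a free entry.
tpt_owners = ["X", None, "Y", "Y", None, "Z", "Z", None]

free_entries = tpt_owners.count(None)               # three entries are free
two_page_slot = find_contiguous_free(tpt_owners, 2)  # but no two are adjacent
```

Here `two_page_slot` is None, so a two-page region cannot be registered, whereas a one-page request would still succeed.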
If the Memory Handles could be reassigned, then larger contiguous sets of free locations could be found. Unfortunately, this is not possible because the Memory Handles returned to the applications earlier are already in use in descriptors and it would be undesirable to stop VI processing and update all the descriptors.