1. Field of the Invention
The present invention relates to interaction between a user mode (ring 3) process and a host channel adapter configured for communication with target channel adapters in an InfiniBand™ server system.
2. Background Art
Networking technology has seen improvements in server architectures and design aimed at providing servers that are more robust and reliable in mission-critical networking applications. In particular, the use of servers to respond to client requests has made it necessary for servers to be extremely reliable so that the network remains operable. Hence, there has been substantial concern about server reliability, availability, and serviceability.
In addition, processors used in servers have improved substantially, to the point where microprocessor speed and bandwidth exceed the capacity of the connected input/output (I/O) buses, limiting server throughput to the bus capacity. Accordingly, different server standards have been proposed in an attempt to improve server performance in terms of addressing, processor clustering, and high-speed I/O.
These proposed server standards led to the development of the InfiniBand™ Architecture Specification (Release 1.0), adopted by the InfiniBand™ Trade Association. The InfiniBand™ Architecture Specification specifies a high-speed networking connection between end nodes (e.g., central processing units, peripherals, etc.) and switches inside a server system. The term “InfiniBand™ network” thus refers to a private system area network (SAN) that connects end nodes and switches into a cluster within a server system, enabling the sharing of cluster resources. The InfiniBand™ Architecture Specification covers both I/O operations and interprocessor communications (IPC).
A particular feature of the InfiniBand™ Architecture Specification is the proposed hardware implementation of the transport layer services present in existing networking protocols, such as TCP/IP-based protocols. This hardware-based implementation of transport layer services, referred to as a “channel adapter”, reduces the processing requirements of the central processing unit (i.e., it “offloads” processor code execution) and hence reduces the load on the operating system of the server system. Host channel adapters (HCAs) are implemented in processor-based nodes, and target channel adapters (TCAs) are implemented in peripheral-based nodes (e.g., network interface devices, mass storage devices, etc.).
However, arbitrary hardware implementations may result in costly or inefficient hardware designs. One example involves the servicing of work notifications, also referred to as “doorbells”. Doorbells are generated by verbs consumer processes (e.g., operating-system-supplied agents) that post a work request (e.g., a work queue entry (WQE)) to a prescribed queue of an assigned queue pair in system memory; the verbs consumer process then sends the work notification to inform the host channel adapter (HCA) that the work request is available in system memory.
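The two-step sequence described above (post a WQE to a queue in system memory, then send a work notification) can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the `struct wqe` and `struct queue_pair` layouts, the queue depth, the `post_send` name, and the doorbell encoding are not taken from the InfiniBand™ Architecture Specification or from any real HCA, and the doorbell "register" is simulated by an ordinary variable rather than a memory-mapped device register.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only; not the InfiniBand spec or any real HCA. */
#define SQ_DEPTH 8u

struct wqe {
    uint32_t opcode;   /* hypothetical operation code, e.g. "send" */
    uint64_t buf_addr; /* address of the data buffer in system memory */
    uint32_t length;   /* number of bytes to transfer */
};

struct queue_pair {
    uint32_t qp_number;              /* identifies the assigned queue pair */
    struct wqe send_queue[SQ_DEPTH]; /* work queue residing in system memory */
    uint32_t producer;               /* count of WQEs posted by the consumer */
    uint32_t consumer;               /* count of WQEs consumed by the HCA */
    volatile uint32_t *doorbell;     /* stand-in for a memory-mapped doorbell register */
};

/* Post one WQE, then ring the doorbell.
   Returns 0 on success, -1 if every queue slot is still in use. */
int post_send(struct queue_pair *qp, const struct wqe *req)
{
    assert(qp != NULL && req != NULL && qp->doorbell != NULL);
    if (qp->producer - qp->consumer == SQ_DEPTH)
        return -1; /* the HCA has not yet consumed earlier WQEs */

    /* Step 1: deposit the work request in the queue in system memory. */
    memcpy(&qp->send_queue[qp->producer % SQ_DEPTH], req, sizeof *req);
    qp->producer++;

    /* Ordering fence so the WQE is written before the notification; a real
       driver would use a platform-specific write barrier (e.g. wmb() in Linux)
       before an MMIO doorbell write. */
    atomic_thread_fence(memory_order_release);

    /* Step 2: the work notification ("doorbell"): tell the HCA which queue
       pair has new work (hypothetical encoding: QP number | low producer bits). */
    *qp->doorbell = (qp->qp_number << 8) | (qp->producer & 0xFFu);
    return 0;
}
```

Because the HCA learns of new work only through the doorbell write, any process able to store to the doorbell address of another queue pair could produce a notification that the HCA would attribute to that queue pair, which is the concern raised in the following paragraph.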
One concern in servicing work notifications is the susceptibility of the HCA to unauthorized work notifications. In particular, the InfiniBand™ Architecture Specification specifies that verbs consumer processes may be implemented as “ring 0” (kernel mode) or “ring 3” (user mode) processes, and kernel mode processes have unrestricted access to any hardware resource accessible by the operating system. Hence, if a malicious or malfunctioning process improperly accesses an unauthorized address, for example a work notification address assigned to a second verbs consumer process, the improper access may cause the HCA to erroneously determine that the second verbs consumer process generated a work notification. The susceptibility of the HCA to unauthorized work notifications from a malicious or malfunctioning process therefore raises a reliability concern for HCA operations. Moreover, such a process may further affect the reliability of the overall server system, for example by compromising security routines normally used to prevent unauthorized transmission of private data (e.g., credit card information) across a public network such as the Internet.
In view of the foregoing, there is a need for an efficient arrangement that enables user mode processes to access InfiniBand™ resources without compromising security. In particular, the InfiniBand™ Architecture Specification indicates that the operating system (OS) can provide its clients with communication mechanisms that bypass the OS kernel and directly access HCA resources. Kernel bypass is needed to give a user mode process access to HCA resources because (1) the InfiniBand™ Architecture Specification allows no more than 5 microseconds for any transition to kernel mode, and (2) existing user mode to kernel mode transitions cannot be completed within this 5-microsecond limit.
FIG. 1 is a diagram illustrating the kernel bypass concept proposed by the InfiniBand™ Architecture Specification. In particular, a computing node 10 includes user mode (ring 3) socket applications and verbs consumer processes 12 configured to perform user operations (e.g., file system calls) without any knowledge of the HCA 14. System calls by the socket application 12 are intercepted by a sockets or virtual interface provider library (VIPL) application programming interface (API) 16. A conventional kernel mode transition using existing OS resources involves passing the system call to a TCP/IP sockets provider 18 operating at a user/kernel boundary 20. The TCP/IP sockets provider 18 accesses a TCP/IP transport driver 22, which references a driver 24 that must transition to kernel mode before accessing the HCA 14.
The computing node 10 also includes a dynamically linked library (DLL) 26 operating as a ring 3 (user mode) process configured for accessing a SAN management or VIPL driver 27 (requiring kernel transition by the driver 24).
As illustrated in FIG. 1, it is contemplated that the DLL 26 can perform kernel bypass operations 28 in order to “ring the doorbell” of the HCA 14. To date, however, there has been no disclosure or suggestion of how to implement the proposed kernel bypass operations 28. In particular, there is no disclosure or suggestion of how the SAN sockets/VIPL provider 26 can communicate with system memory 29 to deposit descriptors of work, and then ring the HCA doorbells uniquely for each ring 3 process 12, without requiring the services of kernel mode software.
In addition, concerns exist about any proposed implementation of kernel bypass operations 28 that requires a substantial amount of added processing code, since such code would reduce HCA throughput.
Page-based addressing has been used in processor architectures, for example the Intel x86 architectures, to reconcile differences between physical address space and virtual address space. For example, a personal computer capable of addressing 512 Mbytes may have only 128 Mbytes of installed memory; the operating system divides memory into discrete blocks, referred to as pages, that can be transferred between physical memory and virtual memory allocated on a hard disk. Hence, an attempt to execute code that does not reside in physical memory generates a page fault exception, causing the operating system to swap unused pages in physical memory with the pages in virtual memory that contain the required code. However, different processes may still access the same physical page of memory, since the operating system typically provides the processes with a common mapping between a virtual page address and a physical address in I/O address space, enabling the processes to access the same control registers of an I/O device within the I/O address space.
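The page-based translation described above can be illustrated with a minimal sketch using the 512-Mbyte virtual and 128-Mbyte physical sizes from the example. It assumes 4-Kbyte pages (the standard x86 page size) and, for simplicity, a single-level page table per process; real x86 processors use multi-level tables. The names `struct pte`, `translate`, and `map_page` are hypothetical. A failed lookup stands in for the page fault exception that the operating system would service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes from the example in the text; 4 KB pages assumed. */
#define PAGE_SHIFT 12u
#define PAGE_SIZE (1u << PAGE_SHIFT)               /* 4096 bytes */
#define VIRT_PAGES ((512u << 20) >> PAGE_SHIFT)    /* 512 MB / 4 KB virtual pages */
#define PHYS_FRAMES ((128u << 20) >> PAGE_SHIFT)   /* 128 MB / 4 KB physical frames */

/* Hypothetical page table entry (one entry per virtual page). */
struct pte {
    bool present;   /* is the page currently mapped to a physical frame? */
    uint32_t frame; /* physical frame number (RAM or memory-mapped I/O) */
};

/* Translate a virtual address using one process's page table.
   Returns true and stores the physical address in *phys on success;
   returns false to signal a page fault (page not present). */
bool translate(const struct pte table[VIRT_PAGES], uint32_t vaddr, uint32_t *phys)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;          /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);  /* byte offset within the page */
    assert(vpn < VIRT_PAGES);                    /* must lie in the 512 MB space */
    if (!table[vpn].present)
        return false;                            /* page fault: OS must map the page */
    *phys = (table[vpn].frame << PAGE_SHIFT) | offset;
    return true;
}

/* Map a virtual page to a physical frame, as the operating system would after
   servicing a fault or when giving a process access to device registers. */
void map_page(struct pte table[VIRT_PAGES], uint32_t vpn, uint32_t frame)
{
    assert(vpn < VIRT_PAGES);
    assert(frame < (1u << (32u - PAGE_SHIFT))); /* keep result within 32 bits */
    table[vpn].present = true;
    table[vpn].frame = frame;
}
```

If two processes each map some virtual page to the same frame, both translations resolve to the same physical address; page-granular mapping by itself therefore does not prevent different processes from reaching the same physical page, such as a page holding I/O device control registers.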