1. Field of the Invention
Embodiments of this invention relate generally to the field of computer processing and more specifically to discovery and capability exchange management in a virtualized computing environment that utilizes an SR-IOV adapter.
2. Description of the Related Art
Within the computing industry, significant effort has been expended to increase the effective hardware resource utilization (e.g., application execution, etc.) through the use of virtualization technology. The Single Root I/O Virtualization and Sharing Specification (SR-IOV) defines extensions to the PCI Express (PCIe) specification suite to enable multiple System Images (SI) to share PCI hardware resources.
The generic platform configuration comprises: a processor (e.g., a general purpose, embedded, or specialized processing element, etc.); a memory (e.g., general purpose, embedded, etc.); a PCIe Root Complex (RC); a PCIe Root Port (RP), where each RP represents a separate hierarchy per the PCI Express Base Specification, and each hierarchy is referred to as a single root hierarchy to delineate it from the multiple hierarchy technology defined within the Multi Root I/O Virtualization Specification; a PCIe Switch, which provides I/O fan-out and connectivity; a PCIe Device or adapter (e.g., network adapter, storage adapter, etc.); a System Image (SI), or software such as an operating system that is used to execute applications or trusted services, such as a shared or non-shared I/O device driver; Single Root PCI Manager (SR-PCIM) software, which is responsible for the configuration of the SR-IOV capability, management of Physical Functions and Virtual Functions, and processing of associated error events and overall device controls such as power management and hot-plug services; a Physical Function (PF), which is a PCIe Function (per the PCI Express Base Specification) that supports the SR-IOV capability and is accessible to an SR-PCIM, a VI, or an SI; and a Virtual Function (VF), which is a PCIe Function that is directly accessible by an SI.
In order to increase the effective hardware resource utilization without requiring hardware modifications, multiple SIs can be executed. Software termed a Virtualization Intermediary (VI) is interposed between the hardware and the SI. The VI takes sole ownership of the underlying hardware. Using a variety of methods outside of the scope of the standard, the VI abstracts the hardware to present each SI with its own virtual system. The actual hardware resources available to each SI can vary based on workload or customer-specific policies. While this approach works well for many environments, I/O intensive workloads can suffer significant performance degradation. Each I/O operation, whether inbound or outbound, must be intercepted and processed by the VI, adding significant platform resource overhead.
To reduce platform resource overhead, PCI-SIG® developed SR-IOV technology, which has the following benefits: the ability to eliminate VI involvement in main data movement actions (DMA, memory space access, interrupt processing, etc.); elimination of VI interception and processing of each I/O operation, which can provide significant application and platform performance improvements; a standardized method to control SR-IOV resource configuration and management through the Single Root PCI Manager (SR-PCIM); the ability to reduce the hardware requirements and associated cost of provisioning a potentially significant number of I/O Functions within a device; and the ability to integrate SR-IOV with other I/O virtualization technologies such as Address Translation Services (ATS), Address Translation and Protection Table (ATPT) technologies, and interrupt remapping technologies to create a robust, complete I/O virtualization solution.
For more information about SR-IOV please refer to the “Single Root I/O Virtualization and Sharing Specification Revision 1.0”, herein incorporated by reference in its entirety.
The data center discovery and capability exchange protocol (DCBX) is used by Data Center Bridging (DCB) devices to exchange configuration information with directly connected peers. DCB devices have certain capabilities for supporting multiple traffic classes on a single switch/port. The DCBX protocol may also be used for misconfiguration detection and for configuration of the peer.
DCBX is used to determine the capabilities of the peer device. It is a means to know if the peer device supports a particular feature such as Priority Groups (PG) or Priority-based Flow Control (PFC). For example, it can be used to determine if two link peer devices support PFC. DCBX can be used to detect misconfiguration of a feature between the peers on a link. Misconfiguration detection is feature-specific because some features may allow asymmetric configuration. DCBX can be used by a device to perform configuration of DCB features in its link peer.
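The capability and misconfiguration checks described above can be illustrated with a minimal sketch. The `FeatureAdvert` record, the `check_feature` function, and the per-feature `allow_asymmetric` flag are hypothetical names introduced only for illustration; they are not taken from the DCBX specification, and the sketch assumes that a feature's settings can be compared for simple equality.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FeatureAdvert:
    """What one link peer advertises for one DCB feature (hypothetical layout)."""
    name: str                 # e.g. "PFC" or "PG"
    supported: bool           # whether the device supports the feature at all
    config: Tuple[int, ...]   # feature-specific settings, e.g. per-priority enables


def check_feature(local: FeatureAdvert, peer: FeatureAdvert,
                  allow_asymmetric: bool) -> str:
    """Classify the state of one feature on a link.

    Returns "unsupported" if either peer lacks the feature, "misconfigured"
    if the settings differ and the feature does not allow asymmetric
    configuration, and "ok" otherwise.
    """
    if not (local.supported and peer.supported):
        return "unsupported"
    if local.config != peer.config and not allow_asymmetric:
        return "misconfigured"
    return "ok"


# Example: two peers that both support PFC but enable it on different priorities.
local_pfc = FeatureAdvert("PFC", True, (0, 0, 0, 1, 0, 0, 0, 0))
peer_pfc = FeatureAdvert("PFC", True, (0, 0, 0, 0, 1, 0, 0, 0))
result = check_feature(local_pfc, peer_pfc, allow_asymmetric=False)
```

The `allow_asymmetric` argument reflects that misconfiguration detection is feature-specific: the same difference in settings is an error for one feature and acceptable for another.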
Each DCB feature has a set of parameters. DCB parameters are classified into two broad categories: Exchanged Parameters and Local Parameters. Exchanged parameters are sent to the peer. Within the Exchanged parameter group, there are two sub-groups: Administered parameters and Operational parameters. Administered parameters are the configured parameters. Operational parameters are associated with the operational state of the related administered parameter. The operational state might differ from the administrative (configured) state, primarily as a result of the DCBX exchange with the peer. Operational parameters accompany those administered parameters for which the operational state may differ from what was set by the administrator. The operational parameters may be included in the Link Layer Discovery Protocol (LLDP) message for informational purposes. A device might use them to learn the current operational state of the peer. Local parameters are not exchanged in LLDP messages.
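The parameter classification above can be sketched as a small data structure. The class names, field names, and the dictionary-shaped payload are assumptions made for illustration only; the actual encoding of parameters in an LLDP message is defined by the DCBX specification and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ExchangedParameter:
    """A parameter that is sent to the peer (hypothetical layout)."""
    administered: int                  # the configured value
    operational: Optional[int] = None  # set only when the operational state
                                       # can differ from the configured value


@dataclass
class FeatureParameters:
    """All parameters of one DCB feature, split by category."""
    exchanged: Dict[str, ExchangedParameter] = field(default_factory=dict)
    local: Dict[str, int] = field(default_factory=dict)  # never sent to the peer

    def lldp_payload(self) -> Dict[str, Dict[str, int]]:
        """Collect only the exchanged parameters for transmission."""
        payload: Dict[str, Dict[str, int]] = {}
        for name, param in self.exchanged.items():
            entry = {"administered": param.administered}
            if param.operational is not None:
                # Included for informational purposes only.
                entry["operational"] = param.operational
            payload[name] = entry
        return payload
```

Keeping local parameters in a separate mapping makes it structurally impossible for them to leak into the transmitted payload.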
DCBX uses LLDP to exchange parameters between two link peers. LLDP is a unidirectional protocol. It advertises connectivity and management information about the local station to adjacent stations on the same IEEE 802 LAN.
DCBX is defined as a DCBX control state machine and a set of DCB feature state machines. The DCBX control state machine ensures that the two DCBX peers get in sync by exchanging LLDPDUs after link up or following a configuration change. The DCB feature state machines handle the local operational configuration for each feature by comparing and synchronizing with the peer's feature settings. The DCBX control state machine uses a DCBX Control sub-TLV (Type Length Value) to exchange information with the peer. It also maintains local state variables to manage the state machine operation.
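One way a control state machine could track whether two peers are in sync is sketched below. The two-byte TLV header (a 7-bit type followed by a 9-bit length) follows the general LLDP TLV layout. Everything else is an assumption made for illustration: the sub-TLV type code, the sequence/acknowledge counter payload, and the `ControlState` class are not the wire format or state variables of the DCBX specification.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

CONTROL_SUBTLV_TYPE = 1  # hypothetical type code, for illustration only


def pack_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode a TLV with a 7-bit type and a 9-bit length header."""
    if not 0 <= tlv_type < 128:
        raise ValueError("type must fit in 7 bits")
    if len(value) > 511:
        raise ValueError("value must fit in a 9-bit length")
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


def unpack_tlv(data: bytes) -> Tuple[int, bytes, bytes]:
    """Decode one TLV; return (type, value, remaining bytes)."""
    if len(data) < 2:
        raise ValueError("truncated TLV header")
    (header,) = struct.unpack("!H", data[:2])
    tlv_type, length = header >> 9, header & 0x1FF
    value = data[2:2 + length]
    if len(value) != length:
        raise ValueError("truncated TLV value")
    return tlv_type, value, data[2 + length:]


def decode_control(data: bytes) -> Tuple[int, int]:
    """Extract (seq_no, ack_no) from an encoded control sub-TLV."""
    tlv_type, value, _ = unpack_tlv(data)
    if tlv_type != CONTROL_SUBTLV_TYPE:
        raise ValueError("not a control sub-TLV")
    return struct.unpack("!II", value)


@dataclass
class ControlState:
    """Local state variables for the sync check (assumed, simplified)."""
    seq_no: int = 0       # incremented on every local configuration change
    ack_no: int = 0       # last sequence number received from the peer
    peer_ack_no: int = 0  # last local sequence number the peer acknowledged

    def local_change(self) -> None:
        self.seq_no += 1

    def encode(self) -> bytes:
        return pack_tlv(CONTROL_SUBTLV_TYPE,
                        struct.pack("!II", self.seq_no, self.ack_no))

    def receive(self, peer_seq: int, peer_ack: int) -> None:
        self.ack_no = peer_seq
        self.peer_ack_no = peer_ack

    def in_sync(self) -> bool:
        # In sync once the peer has acknowledged the latest local change.
        return self.peer_ack_no == self.seq_no
```

Because LLDP is unidirectional, each side learns that its change was seen only when the acknowledgement arrives in the peer's own advertisement, which is why the sketch carries both counters in every message.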
For more information about DCBX please refer to the “DCB Capability Exchange Protocol Base Specification, Rev. 1.01”, herein incorporated by reference in its entirety.