In a PCIe system, a root complex entity utilizes PCIe ports to connect a host processor and its memory subsystem to one or more endpoints. The host memory subsystem can include a number of caching levels and off-chip RAM. The one or more endpoints often provide interfaces to non-cabled endpoints or external cabled communications links, such as Ethernet. Alternatively, in a PCIe based server, the one or more endpoints provide access to bulk storage devices, such as hard disk drives, solid-state storage drives, or other types of storage media. Bulk storage devices may be connected directly to a host processor and may communicate with the host processor utilizing a PCIe protocol. Alternatively, bulk storage devices may be connected to a host processor via a PCIe bridge and may communicate utilizing a “storage specific protocol”, such as Serial Attached Small Computer System Interface (SAS) protocol or Serial Advanced Technology Attachment (SATA) protocol. When the required number of endpoints exceeds the number of PCIe ports that are natively available from a computer chipset of a host processor, an external PCIe switch provides port expansion.
An external PCIe switch passes PCIe transactions between a root complex entity and endpoints, or between two endpoints within a single domain. Examples of PCIe transactions include request transactions, such as a Memory Read Request and a Memory Write Request, response transactions, such as Completions, and configuration transactions, such as a Configuration Read Request and a Configuration Completion. Generally, a domain includes exactly one root complex entity, and the root complex entity is responsible for enumerating all switch ports and endpoints in a particular domain. Enumeration refers to the discovery and numbering of buses, for example by reading a vendor ID and device function. A PCIe switch with a single domain may also be referred to as a transparent PCIe switch because the PCIe switch allows configuration transactions from a root complex entity to pass to all endpoints. The PCIe switch also allows transactions to pass between any initiator and target attached to the PCIe switch.
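The enumeration performed by the root complex entity can be illustrated with a minimal sketch. The sketch assumes a simplified, hypothetical model in which every device that has devices below it acts as a bridge and every device without children is an endpoint; the `Node` class, its field names, and `enumerate_domain` are illustrative only and do not correspond to any real driver or configuration-space interface. Bus numbers are assigned depth first, in the way a root complex entity typically walks a single domain.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical model of a device discovered during enumeration."""
    name: str
    children: List["Node"] = field(default_factory=list)  # empty for endpoints
    bus: Optional[int] = None              # bus the device itself sits on
    secondary_bus: Optional[int] = None    # first bus below a bridge or port
    subordinate_bus: Optional[int] = None  # highest bus below a bridge or port


def enumerate_domain(root: Node) -> int:
    """Assign bus numbers depth first and return the highest number used."""
    next_bus = 0

    def visit(node: Node, bus: int) -> None:
        nonlocal next_bus
        node.bus = bus
        if node.children:  # simplification: only nodes with children are bridges
            next_bus += 1
            node.secondary_bus = next_bus
            for child in node.children:
                visit(child, node.secondary_bus)
            node.subordinate_bus = next_bus

    visit(root, 0)
    return next_bus


# Topology loosely following a transparent switch with two downstream ports.
ep1, ep2 = Node("EP1"), Node("EP2")
dssp1, dssp2 = Node("DSSP1", [ep1]), Node("DSSP2", [ep2])
upstream_port = Node("upstream switch port", [dssp1, dssp2])
root_port = Node("root port", [upstream_port])

highest_bus = enumerate_domain(root_port)
for node in (root_port, upstream_port, dssp1, ep1, dssp2, ep2):
    print(node.name, node.bus, node.secondary_bus, node.subordinate_bus)
```

Because every endpoint receives a bus number from the single root complex entity, the sketch also shows why a transparent PCIe switch makes all endpoints visible to the host processor.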
An example of a known PCIe system 100 that includes a transparent PCIe switch 102 is shown in FIG. 1. In the PCIe system 100, a host processor 104 is connected to multiple endpoints EP1, EP2, EP3, EP4, . . . , EPn through the transparent PCIe switch 102. The transparent PCIe switch 102 includes an upstream switch port 106, and multiple downstream switch ports DSSP1, DSSP2, DSSP3, DSSP4, . . . , DSSPn. The host processor 104 includes a root complex entity 108. The root complex entity 108 of the host processor is connected to the upstream switch port 106 of the transparent PCIe switch 102. Each downstream switch port DSSP1, DSSP2, DSSP3, DSSP4, . . . , DSSPn of the transparent PCIe switch 102 is connected to a respective one of the endpoints EP1, EP2, EP3, EP4, . . . , EPn. As shown in FIG. 1, the endpoints EP1, EP2, EP3, EP4, . . . , EPn, which are physical storage devices, such as hard disk drives or solid state drives, are external to the transparent PCIe switch 102. Alternatively, when one or more of the endpoints EP1, EP2, EP3, EP4, . . . , EPn are not physical storage drives, these endpoints may be internal to the transparent PCIe switch 102.
In contrast to the PCIe system 100 shown in FIG. 1, a PCIe bridge is a component of a PCIe system that supports translation, such as between protocols or between virtual domains. There are generally two types of PCIe bridges. The first type is a PCIe bridge that changes a communication protocol between an initiator and target. For example, a PCIe to PCI-X bridge performs protocol conversion between two different protocol standards to allow communication between an initiator and a target.
The second type is a PCIe bridge that allows transactions to pass between two distinct and separate Virtual Switches, also known as PCIe switch domains. PCIe switch domains are used, for example, to provide domain isolation, such as electrical and logical isolation of processor domains. A Non-Transparent PCIe Bridge (NTB) is an example of the second type of PCIe bridge. In an NTB, two or more entirely separate PCIe switch domains, each with a host root complex entity, may communicate with each other and may share communications with any number of endpoints in either PCIe switch domain. A significant limitation of an NTB is that a root complex entity on one side of the bridge is unaware of endpoints, or another root complex entity, on the opposing side of the bridge. Custom host drivers that utilize a switch vendor proprietary mechanism must therefore be developed to communicate information about a PCIe switch domain to a host processor behind the NTB.
An example of a known PCIe system 200 that includes a single domain PCIe switch 202 with Non-Transparent Bridge support is shown in FIG. 2. The single domain PCIe switch 202 connects two host processors 204, 206 to multiple endpoints EP1, EP2, EP3, EP4, . . . , EPn.
The single domain PCIe switch 202 includes an upstream switch port 208, a non-transparent bridge (NTB) 210, a switch crossbar 212, and multiple downstream switch ports DSSP1, DSSP2, DSSP3, DSSP4, . . . , DSSPn. The upstream switch port 208 and the NTB 210 are each connected to the switch crossbar 212, which provides the PCIe switch routing. Each downstream switch port DSSP1, DSSP2, DSSP3, DSSP4, . . . , DSSPn is also connected to the switch crossbar 212 and to a respective one of the endpoints EP1, EP2, EP3, EP4, . . . , EPn.
The host processor 204 includes a root complex entity 214 that is connected to the upstream switch port 208 of the single domain PCIe switch 202. The host processor 206 includes a root complex entity 216 that is connected to the NTB 210 of the single domain PCIe switch 202.
The NTB 210 allows the root complex entity 216 to communicate with PCIe targets within this single domain by presenting two endpoints 218, 220, each with associated Base Address Registers (BARs), memory windows, and address translation between the PCI Address spaces. Other features such as doorbell registers to support messaging between domains may also be supported.
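The address translation performed through a memory window can be sketched as follows. This is an illustration only, assuming a single window described by a base address, a size, and a translated base in the other domain; the `BarWindow` class, its field names, and the example addresses are hypothetical and are not taken from any particular NTB implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarWindow:
    """Hypothetical memory window exposed by one NTB endpoint."""
    base: int        # BAR base address as seen in the local domain
    size: int        # window size in bytes
    xlat_base: int   # base address the window maps to in the remote domain

    def translate(self, addr: int) -> int:
        """Translate a local-domain address into the remote domain."""
        offset = addr - self.base
        if not 0 <= offset < self.size:
            raise ValueError(f"address {addr:#x} is outside the window")
        return self.xlat_base + offset


# Example values are assumptions chosen only to show the arithmetic.
window = BarWindow(base=0x9000_0000, size=0x10_0000, xlat_base=0x2_0000_0000)
remote_addr = window.translate(0x9000_0040)
print(hex(remote_addr))
```

A transaction that falls outside every window is not forwarded, which is how the memory windows preserve isolation between the two domains while still permitting selected traffic to cross.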
While the PCIe protocol can provide access to individual storage drives or endpoints, there are often situations in which it is desirable to use a redundant array of inexpensive disks (RAID) system. A storage RAID system generally includes one or more host devices, a RAID controller, and two or more storage drives. In general, separate communication protocols are utilized for the host device to communicate with the RAID controller and for the RAID controller to communicate with the storage drives. The RAID controller presents one or more RAID volumes to the host device. The RAID volumes are virtual storage elements that may bear no resemblance to a physical storage drive topology.
A host device is typically interconnected with a RAID controller utilizing a PCIe interconnect protocol, while the RAID controller may be interconnected with storage drives utilizing another protocol, such as the SAS protocol, the SATA protocol, or the PCIe protocol. When the PCIe protocol is utilized to connect a RAID controller to the multiple storage drives, inherent problems arise from the domain switching and the address ranges involved.
A generic block diagram for a RAID system is shown in FIG. 3. The RAID system 300 includes a host system 302, two storage drives 304, and a RAID controller 306 that connects the host system 302 to the two storage drives 304. The RAID controller 306 connects and presents the two storage drives 304 as a logical unit, or volume, to the host system 302. The logical unit, as seen by the host system 302, bears no resemblance to the physical topology of the storage drives 304. Data is distributed to the two storage drives 304 by the RAID controller 306 to improve redundancy and/or performance as compared to using only a single storage drive.
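The way a logical unit can differ from the physical drive layout may be illustrated with two common data distributions: striping, which targets performance, and mirroring, which targets redundancy. The sketch below is a simplified assumption and not a description of the RAID controller 306; the function names, the stripe size of eight blocks, and the two-drive default are hypothetical.

```python
from typing import List, Tuple


def raid0_map(lba: int, num_drives: int = 2, stripe_blocks: int = 8) -> Tuple[int, int]:
    """Map a logical block address to (drive index, drive block) under striping."""
    stripe, offset = divmod(lba, stripe_blocks)
    drive = stripe % num_drives                       # stripes rotate across drives
    drive_lba = (stripe // num_drives) * stripe_blocks + offset
    return drive, drive_lba


def raid1_map(lba: int, num_drives: int = 2) -> List[Tuple[int, int]]:
    """Map a logical block address to every mirrored copy under mirroring."""
    return [(drive, lba) for drive in range(num_drives)]


# The host only ever uses logical block addresses; the mapping is hidden from it.
for logical_block in (0, 8, 16, 19):
    print(logical_block, raid0_map(logical_block), raid1_map(logical_block))
```

In both cases the host system addresses only the logical block, and the drive index and drive block are known only to the controller.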
The host system 302 does not directly address or access the two storage drives 304 but rather communicates with them through the RAID controller 306. The RAID controller 306 provides redundant protocol algorithms, virtualizes both the transactions between the host system 302 and the two storage drives 304 and the addresses of the two storage drives 304, and provides error handling. The RAID controller 306 includes a controller host interface 308, a RAID processing engine 310, and a drive bridge 312.
The host system 302 communicates with the controller host interface 308 of the RAID controller 306 using a protocol, such as the PCIe protocol. The RAID processing engine 310 may communicate with the storage drives 304 through the drive bridge 312 using a protocol, such as the PCIe protocol, the SAS protocol, or the SATA protocol. The drive bridge 312 performs the translation between the protocols. The RAID processing engine 310 may also provide read and/or write caching of data from the two storage drives 304. The RAID processing engine 310 may also temporarily stage data that passes between the host system 302 and the two storage drives 304, which increases the latency of transactions between the host system 302 and the two storage drives 304.
RAID storage systems may benefit from the availability of high performance, low latency PCIe based storage drives. However, the traditional RAID controller architecture and the existing PCIe based switching solutions either fail to meet the requirements of a RAID system, or significantly decrease the performance benefits of RAID systems that use low latency PCIe drives.
In a storage RAID system, the host system must not directly address or access the physical storage drives. Instead, the host system must only see virtualized drives comprised of RAID volumes inside a RAID controller.
For SAS or SATA based storage systems, this virtualization is achieved by having logical protocol separation between the host bridge and the controller bridge. When SAS or SATA based storage drives are simply replaced by PCIe based storage drives, the I/O data needs to be staged temporarily within a RAID controller. This significantly increases the latency of transactions between the storage drive and the host system, which is undesirable and contrary to the performance requirements of a native PCIe architecture. An alternative architecture uses existing PCIe switching technology, which exhibits inherently low latency.
Existing PCIe switches, however, do not inherently provide support for storage drive virtualization. Significant effort in the development of custom drivers is therefore necessary to support the requirements of a RAID system. For example, an existing transparent PCIe switch cannot be used in a RAID system because the host system will enumerate and have access to the physical storage drives, which is unacceptable in a RAID system. Thus, the RAID controller must use two physical or virtual switches to separate the host domain from the drive domain.
When existing multi-domain switches are used in a RAID system, all transactions are required to pass through the RAID controller's internal memory. This increases the latency between the host processor and the storage drive relative to the latency achieved when the storage drive is directly connected to the host processor. The increased latency when using a multi-domain switch in a RAID system negates the primary benefit of adopting PCIe based drives.
A Non-Transparent Bridge (NTB), implemented using a PCIe switch with multiple domains, allows isolation between a host domain and drive domain, while still allowing transactions to flow between the host domain and the drive domain. The use of an NTB in a RAID controller, however, creates complications. For example, when an NTB is connected to an upstream switch port, as shown in FIG. 2, the host processor may only see the endpoint associated with the NTB. The host processor cannot enumerate the resources, including the virtual functions, in the internal endpoint of the RAID controller, and standard PCIe configuration cycles cannot be used to configure and manage the RAID controller. Thus, standard PCIe compatible drivers cannot be used to communicate between the host processor and a RAID controller. This makes configuration and management of a RAID controller by the host processor significantly more complex. Also, the NTB inherently adds latency to all transactions because the transactions must traverse two endpoints and address translation logic within the NTB.
Improvements in methods for communicating between a storage device and a host utilizing a PCIe communication protocol are desirable.