A SAS expander can be generally described as a switch that allows initiators and targets to communicate with each other, and allows additional initiators and targets to be added to the system. In SAS-1.1, the total number of initiators and targets was limited to on the order of hundreds of devices. However, with SAS-2, SAS expanders can be connected to each other up to 16 levels deep, and therefore thousands of initiators and targets may be connected.
FIG. 1 illustrates two SAS expanders 102 and 114 connected together in exemplary storage network 100. Each SAS expander 102 and 114 can include a plurality of Phy 104, expander connection manager (ECM) 106 for allowing pathways to be built between two Phy, expander connection router (ECR) 108 which makes decisions regarding routing connections between Phy, and broadcast primitive processor (BPP) 110 for propagating broadcast change notifications (BCNs) to all other ports in the SAS expander except the port that caused the BCN to be generated.
FIG. 1 also shows an initiator 112 connected to a Phy 104 on first SAS expander 102, a SAS disk drive or other device 116 connected directly to another Phy on the first SAS expander as a direct attached drive, and second SAS expander 114 connected to the first SAS expander. Newly inserted device 116 is normally identified through the IDENTIFY process, and a broadcast change notification (BCN) is then generated by BPP 110 in SAS expander 102 to notify other devices in storage network 100 of the change. External initiators can then use the SAS Management Protocol (SMP) DISCOVER command to identify device attachments. In addition, if device 116 is not properly seated in its connector, has a bad cable, or otherwise misbehaves in ways that produce errors, BPP 110 would normally send out BCNs to other ports in SAS expander 102, which would cause all initiators (e.g. initiator 112) to perform a re-discovery process for every device in the network to understand the contents of the SAS fabric. If device 116 generates intermittent errors, many BCNs (a “BCN storm”) could be transmitted, causing many re-discovery processes to be performed. Alternatively, if a downstream SAS expander has an illegal configuration such as two Phy (on the same expander) connected together, this could also cause a BCN storm. Constant BCNs degrade the usability of a SAS network.
Therefore, there is a need to be able to isolate, test and validate devices before they are made visible to the network.
Two device types, SAS and Serial Advanced Technology Attachment (SATA), can commonly be connected to a storage network using SAS expanders. SAS devices have a unique 64-bit SAS address already assigned to them. However, current SATA devices do not have a SAS address. This is important, because to enable a SAS expander-attached SATA device to be visible to other devices in the network, the Phy to which it is attached must be assigned a unique SAS address. (Note that every Phy in a SAS domain must have a unique SAS address, with the exception of the Phy in a wide port, which all connect to the same device and can therefore share a common SAS address.) Because SATA devices do not present a SAS address, they are assigned an address by the SAS expander. Each expander port maintains a SATA Tunneling Protocol (STP) SAS address, which identifies a SATA device connected to the port. However, if the SATA device is removed and replaced by a new SATA device, the old STP SAS address remains bound to the port, and thus any outstanding input/output (I/O) requests targeted to the removed SATA device will instead be delivered to the new SATA device connected to the same port, potentially corrupting the new SATA device.
FIG. 1 illustrates exemplary SAS expander 114 connected to SAS and SATA devices 118 and 120, respectively, in storage network 100, and the addressing problem created by attached SATA devices. SAS device 118 connected to Phy 122 within SAS expander 114 has its own SAS address Z, and thus if the SAS device is moved at 132 to another Phy (see Phy 134), the SAS address follows the SAS device. In contrast, SATA device 120 is connected to STP port 124 and Phy 126. STP port 124 provides translation functionality between SATA and SAS. SATA device 120 is assigned an address X which is bound to Phy 126, so that if the SATA device is moved to another port at 136 and a new SATA device is inserted in its place, the address X stays with the Phy, and the new SATA device receives the old STP address X. When this happens, outstanding I/O requests may complete to the new SATA device plugged into Phy 126 with address X, and the data will be written to or read from the wrong device, resulting in corruption.
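The addressing problem described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not expander firmware: the class names, the `serial` field, the second Phy number 127 and its address "Y" are hypothetical, and only Phy 126 and address X come from the description. The point it shows is that a write is routed by the address bound to the Phy, so it lands on whichever drive currently occupies that Phy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SataDrive:
    serial: str  # hypothetical stand-in for the drive's identity
    blocks: Dict[int, bytes] = field(default_factory=dict)


class ToyExpander:
    """Toy model: the STP SAS address belongs to the Phy, not to the drive."""

    def __init__(self, stp_address_of_phy: Dict[int, str]) -> None:
        self.stp_address_of_phy = dict(stp_address_of_phy)
        self.drive_on_phy: Dict[int, Optional[SataDrive]] = {
            phy: None for phy in stp_address_of_phy
        }

    def insert(self, phy: int, drive: SataDrive) -> None:
        self.drive_on_phy[phy] = drive

    def remove(self, phy: int) -> None:
        self.drive_on_phy[phy] = None

    def write(self, sas_address: str, lba: int, data: bytes) -> SataDrive:
        # Route purely by the Phy-bound address; the drive's identity is never checked.
        for phy, address in self.stp_address_of_phy.items():
            drive = self.drive_on_phy[phy]
            if address == sas_address and drive is not None:
                drive.blocks[lba] = data
                return drive
        raise LookupError(f"no drive reachable at address {sas_address}")


def demo_misdirected_write() -> Tuple[str, Dict[int, bytes], Dict[int, bytes]]:
    expander = ToyExpander({126: "X", 127: "Y"})  # Phy 127 and "Y" are hypothetical
    old, new = SataDrive("OLD"), SataDrive("NEW")
    expander.insert(126, old)
    pending = ("X", 0, b"meant for OLD")  # I/O request queued while OLD sat on Phy 126
    expander.remove(126)
    expander.insert(127, old)   # OLD is moved to another port
    expander.insert(126, new)   # NEW takes its place and inherits address X
    hit = expander.write(*pending)
    return hit.serial, old.blocks, new.blocks
```

In this model the pending write completes to the drive labeled NEW, while the drive it was intended for receives nothing.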
One initiator-based solution to this problem is as follows. When a SATA device is removed, a BCN is generated. Because the source of the BCN cannot be distinctly identified down to an expander and port, all SATA drives in the SAS domain are placed in a hold state. All existing SATA I/O requests are aborted and new SATA I/O requests are rejected until the driver re-validates the SATA devices and removes them from the hold state.
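The hold-state behavior just described might be sketched as follows. This is an illustrative model only, assuming a hypothetical `InitiatorDriver` class with per-address request queues; it is not an actual driver interface. It shows that a single BCN, whatever its cause, aborts outstanding requests and blocks new ones for every SATA device until each is re-validated.

```python
from enum import Enum
from typing import Dict, Iterable, List


class State(Enum):
    READY = "ready"
    HOLD = "hold"


class InitiatorDriver:
    """Hypothetical initiator-side bookkeeping for SATA devices, keyed by STP SAS address."""

    def __init__(self, sata_addresses: Iterable[str]) -> None:
        addresses = list(sata_addresses)
        self.state: Dict[str, State] = {a: State.READY for a in addresses}
        self.outstanding: Dict[str, List[str]] = {a: [] for a in addresses}

    def submit(self, address: str, request: str) -> bool:
        # New requests are rejected while the device is in the hold state.
        if self.state[address] is State.HOLD:
            return False
        self.outstanding[address].append(request)
        return True

    def on_bcn(self) -> List[str]:
        # The BCN carries no source, so every SATA device is held
        # and every outstanding SATA request is aborted.
        aborted: List[str] = []
        for address in self.state:
            self.state[address] = State.HOLD
            aborted.extend(self.outstanding[address])
            self.outstanding[address].clear()
        return aborted

    def revalidate(self, address: str) -> None:
        # After the driver re-validates a device, it leaves the hold state.
        self.state[address] = State.READY
```

Because `on_bcn` takes no argument identifying the changed port, the sketch has no way to hold only the affected device, which mirrors the limitation of the initiator-based approach.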
A disadvantage of this solution is that any BCN, including one caused by an unrelated event such as a SAS hot-insert, causes all SATA devices to be placed into the hold state, because there is no way to know what changed. While this provides the maximum protection, it also places an additional burden on initiators for managing domain changes (above and beyond normal discovery).
Therefore, there is also a need to provide some level of persistent binding for a SATA device. If a SATA device is moved from one port to another, the STP SAS address should follow it to the new port.
As mentioned above, in earlier versions of SAS (e.g. SAS-1.1), only several hundred devices could practically be attached in the network. However, with SAS-2, due to changes such as higher link rates (3 Gb/s to 6 Gb/s) and innovations such as connection multiplexing, which allows a single link to be time-division multiplexed to improve access, more than a thousand devices can be attached. This large number of devices can cause problems, as will be discussed below.
FIG. 1 illustrates initiator 112 and several SAS expanders 102 and 114 chained together. Each SAS expander 102 and 114 is self-configuring—that is, each expander takes care of its own route-table programming. Each expander has a plurality of ports to which devices or other expanders may be attached.
SMP allows initiators to perform discovery, in which each initiator communicates with every device to discover what is attached within the network. Each initiator may have to send an SMP REPORT_GENERAL command to each SAS expander, which responds by indicating how many devices are attached and providing other basic SAS expander information back to the initiator. Additionally, one SMP DISCOVER command would have to be sent out for each Phy on each SAS expander, and SAS expanders typically have 36-38 Phys. Initiators must therefore keep track of the device tree (which devices are connected to which expander, etc.).
Similarly, if a device is unplugged or otherwise changed, the discovery process requires that the SAS expander to which the device was attached send a BCN out to all devices in the network. The BCN provides a notification of fabric changes without specific details. The initiator must then determine which devices are now unavailable (the expander, and all devices behind it). To accomplish this, each initiator must perform a full re-discovery of the entire SAS domain. Full re-discovery involves many SMP DISCOVER commands to pinpoint one or two changes that have occurred in the fabric. For large SAS networks of 1000+ drives (requiring as many as 42 expanders), thousands of SMP commands may be required to identify what changed in the network. In other words, there is no mechanism available to determine what changed in the SAS network without enumerating every Phy of every SAS expander. This process can be time-consuming, and in multi-initiator settings, can be performed by multiple initiators simultaneously. This has the effect of burdening the SAS network with management traffic and reducing the available I/O bandwidth. Thus, SAS does not scale well to large networks.
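The scale of this management traffic can be estimated with a short calculation. The sketch below assumes only the command counts stated in this description (one REPORT_GENERAL per expander and one DISCOVER per Phy), takes 36 Phys per expander from the typical 36-38 range, and uses a hypothetical function name; actual discovery may issue additional SMP commands, so the result is best read as a rough floor.

```python
def full_rediscovery_commands(
    expanders: int, phys_per_expander: int, initiators: int = 1
) -> int:
    """Estimate SMP commands issued for one full re-discovery of the domain."""
    # One REPORT_GENERAL per expander plus one DISCOVER per Phy on that expander.
    per_initiator = expanders * (1 + phys_per_expander)
    # Every initiator repeats the same walk independently.
    return per_initiator * initiators


single_initiator = full_rediscovery_commands(expanders=42, phys_per_expander=36)
four_initiators = full_rediscovery_commands(42, 36, initiators=4)
```

Under these assumptions, a 42-expander domain costs 42 × 37 = 1,554 SMP commands per initiator for every BCN, and 6,216 when four initiators re-discover at once, all to locate what may be a single changed Phy.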
Therefore, there is also a need to enable an initiator to quickly and efficiently discover a SAS network by obtaining re-discovery information about all devices in the network without having to perform a full re-discovery process.