Conventional FC SANs. Fibre Channel (FC) is a serial transport protocol that was developed for carrying other transport protocols. In conventional Storage Area Networks (SANs), FC carries Small Computer System Interface (SCSI), which is a parallel protocol. In other words, parallel SCSI commands are encapsulated within FC frames and transported over FC links in FC SANs.
FIG. 1 illustrates an exemplary conventional SAN 100 which includes one or more hosts 102 connected to two Redundant Array of Independent Disks (RAID) controllers 104 over a network 106. The host side of the RAID controllers 104 is referred to as the “front end” 112. In conventional SANs 100, the RAID controllers 104 are connected to a plurality (e.g. 30 to 100) of drives in disk drive enclosures 108 and send and receive FC frames over a FC link 110. The disk drive enclosure side of the RAID controllers 104 is referred to as the “back end” 114. In conventional SANs 100, the disk drives within the disk drive enclosures are FC drives 118 that operate according to the SCSI protocol.
FC-ATA SANs. FC drives offer the best performance, but are expensive. Therefore, less expensive (but lower performance) Advanced Technology Attachment (ATA) drives of the type commonly used in desktop or notebook computers have been used in place of FC drives, or along with FC drives in what is referred to as tiered storage. The ATA drives may be Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) drives. FIG. 1 illustrates a SAN in which one of the disk drive enclosures 108 contains PATA drives 120 rather than FC drives. PATA drives require a FC-to-PATA bridge 116, which is relatively expensive and effectively makes the PATA disk drives 120 appear as SCSI drives to the RAID controller 104. In other words, the RAID controllers 104 send FC-encapsulated SCSI commands to the disk drive enclosures, and receive FC-encapsulated SCSI commands from the disk drive enclosures, and the conversion between FC and PATA occurs in the bridge 116, transparent to the RAID controllers 104 and the rest of the SAN 100. Because PATA drives are different from FC drives in terms of interfaces, error recovery and discovery, FC-to-PATA bridges are designed to be specific to a particular type of PATA drive. As a consequence, every time a new PATA drive is developed, the FC-to-PATA bridge may require modification.
In disk drive technology, as well as in transport technology, there are speed and cable distance benefits to utilizing serial protocols rather than parallel protocols. SATA drives, the serial counterpart to PATA drives, are therefore now being contemplated as an upgrade to PATA. SATA was envisioned for consumer applications.
SAS-SATA SANs. FC, as described above, is a serial transport protocol that has historically been used for carrying the SCSI protocol in enterprise applications over large connectivity spaces. Serial Attached SCSI (SAS) is a relatively new serial protocol intended to replace parallel SCSI within an enterprise host or computer. Both FC and SAS use 8b10b encoding and similar ordered sets, and both are high performance and expensive. SAS includes several protocols. One such protocol is the Simple Management Protocol (SMP), a protocol for device-to-device management that enables each entity to communicate with other entities regarding management aspects.
To take advantage of lower cost SATA drives, SATA drives have been utilized alongside higher cost, higher performance SAS drives in SAS networks (a SAS network including the initiator, target, and any attached expander devices). As mentioned above, tiered storage is the concept of having different types of drives in the same network (e.g. some 73 GByte FC drives and some 200-500 GByte SATA drives), each for a different purpose. FIG. 2 illustrates a SAS SAN incorporating tiered storage, where SATA drives are utilized in addition to SAS drives. As illustrated in FIG. 2, within a host 200, a motherboard (MB) 202 includes a processor 204, an embedded SAS Input/Output Controller (IOC) 206, and a SAS expander 208 to provide multiple ports to the MB 202 and multiple connections to drives. Connected to the host 200 are SAS drives 210 and SATA drives 212 within the host 200. In addition, the host 200 is connected to enclosures 214 containing both SAS and SATA drives. To accommodate tiered storage, another protocol was developed within SAS, the SATA Tunneling Protocol (STP), which enables lower cost SATA drives to be employed in SAS systems.
Unlike FC, which is a loop technology where drives share a common infrastructure, SAS is a point-to-point technology. SAS employs a shared infrastructure with the ability to create a point-to-point connection between two devices through which data may be transferred without interruption. Similar to FC, SAS goes through a discovery process where the first SAS entity that is discovered is the SAS expander 208. The number of ports in the SAS expander 208 is also discovered. Each port is then discovered in turn by the initiator, and the device connected to each port is determined (e.g. a SAS device). For example, if a SAS discovery ordered set is sent to a SAS drive, the SAS drive returns an affirmative response indicating that it is a SAS drive. However, if the SAS ordered set is sent to a SATA drive, nothing is returned. Similarly, if a SATA discovery ordered set is sent to a SATA drive, the SATA drive returns an affirmative response, indicating that it is a SATA drive. From that point forward, the initiator communicates with the device as a SATA device.
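The port-by-port discovery described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the discovery procedure of the SAS standard: the `Port` class, its `probe` method, and the string labels are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    """One expander port; `attached` is 'SAS', 'SATA', or None (hypothetical model)."""
    attached: Optional[str]

    def probe(self, ordered_set: str) -> bool:
        # A device answers only a discovery ordered set of its own protocol.
        return self.attached == ordered_set


def discover(ports):
    """Classify the device on each expander port, probing SAS first, then SATA."""
    result = []
    for port in ports:
        if port.probe("SAS"):
            result.append("SAS")
        elif port.probe("SATA"):
            result.append("SATA")
        else:
            result.append(None)  # nothing answered either probe
    return result


expander = [Port("SAS"), Port("SATA"), Port(None)]
print(discover(expander))
```

After discovery, the initiator would use the recorded type to decide how to talk to each port, for example using STP for the ports classified as SATA.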
In the simplified ladder diagram of FIG. 2 showing a half-duplex operation, SAS ordered sets are sent between the initiator 200 and the enclosure expander. The enclosure expander makes a connection between the initiator 200 and the correct target. Once the connection is created, SATA ordered sets 216 flow between a host or initiator 200 and a target 218. The SAS communications effectively build a point-to-point connection between the SAS IOC 206 and a target (e.g. SATA drive 212), and thereafter SATA ordered sets are passed across this connection that are natively understood by the SATA drive 212. Intermixed with the SATA ordered sets will be SATA Frame Information Structures (FISs) flowing from the initiator 200 to the target 218 (see reference character 220), and from the target 218 to the initiator 200 (see reference character 222) according to STP.
Because of the reliability, speed and cable distance benefits inherent in FC, and the lower cost of SATA drives, there was a need to utilize SATA drives in FC SANs that have historically utilized SCSI drives. Conventional solutions for utilizing SATA drives in FC SANs provided a conversion interface, or bridge, between the FC link and the SATA device. These conversion interfaces terminated all FC exchanges and initiated corresponding SATA exchanges at or near the targets. These bridging solutions required a bridge unit per SATA device or a bridge per SATA enclosure and as a result became a prohibitively expensive solution in a SAN environment. In addition, all error cases were dealt with at or near the drive level. In the other direction, SATA exchanges were also terminated, and FC exchanges were created and sent to the FC initiator. Because the FC-to-SATA translation was performed independently at each SATA drive or enclosure, there was no clean way of performing this conversion and the approach was prone to performance and interoperability issues. Error recovery in FC also differs considerably from error recovery in SATA. The interface had to deal with the differences, which added complexity and additional cost to the system.
Therefore, there was a need to be able to utilize SATA drives while preserving the FC infrastructure and FC transport to the greatest extent possible to minimize the changes needed to legacy FC SANs. There was a further need to move the translation and protocol handling into the RAID controllers, which is a much more cost effective solution because the RAID controllers can perform the protocol translation for a large number of drives.
FC-SATA SANs. FIG. 3 illustrates a SAN 300 including SATA drives and a conversion from FC to SATA that satisfied the needs described above. Such a system is described in further detail in U.S. patent application Ser. No. 11/104,341, filed on Apr. 11, 2005 and entitled “Method and Apparatus for SATA Tunneling Over Fibre Channel,” the contents of which are incorporated herein by reference. In particular, FIG. 3 illustrates a system that encapsulates Serial Advanced Technology Attachment (SATA) Frame Information Structures (FISs) into Fibre Channel (FC) frames for transmission over FC SANs that utilize SATA disk drives.
When SCSI commands are to be sent from host 330 to SATA drives 342 in disk drive enclosure 332, a FC HBA 334 in host 330 sends FC frames encapsulating the SCSI commands out over the fabric 318 to a RAID controller 320, where they are received in one of the ports 336 on the RAID controller 320. Note that the ports 336 may also be connected to other hosts in the SAN 300. Note also that a RAID controller need not be employed, but rather any device providing an IOC function may be utilized. The FC frames are then routed to FC IOCs 322 in the RAID controller 320. The SCSI commands within the FC frames are then de-encapsulated by the FC IOCs 322 and passed over a Peripheral Component Interconnect (PCI) bus 324 to a processor 326, which performs the RAID function and creates multiple commands to satisfy the received SCSI command. The created commands may be SCSI commands or SATA commands and will be sent to one or more disk drives within enclosures 332.
The SCSI commands 306 are then passed from the processor 326 over a custom interface 328 (which may include, but is not limited to, a PCI bus) to Fibre Channel Attached SATA Tunneling (FAST) enabled IOCs 304. The FAST IOCs 304 contain the same hardware as conventional FC IOCs, but include additional firmware 302 that allows them to handle both FC and SATA. SCSI commands 306 from processor 326 are converted in SCSI-to-SATA translation firmware 308 to SATA FISs. Alternatively, the SCSI-to-SATA translation may be performed by the processor 326 rather than in the FAST IOC 304. The SATA FISs are then encapsulated by FAST encapsulation firmware 312 into FC frames. In particular, each 8 kByte SATA FIS is encapsulated into four 2 kByte FC frames along with modifications to the header in the FC frames that enable the SATA-encapsulated FC frames to traverse a FC link. The FAST IOC 304 then sends the FC frames out over a FC link 346 via a FC port 344.
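The splitting of one 8 kByte SATA FIS into four 2 kByte FC frames can be illustrated with a short sketch. The `FcFrame` class and its fields are hypothetical simplifications introduced for this example; the actual FAST header modifications are not specified here and are not modeled.

```python
from dataclasses import dataclass

FC_PAYLOAD_SIZE = 2048  # 2 kByte per FC frame, per the text


@dataclass
class FcFrame:
    """Simplified FC frame: only the fields this sketch needs (hypothetical layout)."""
    seq_cnt: int      # position of this frame within the sequence
    last: bool        # True on the final frame carrying this FIS
    payload: bytes


def encapsulate_fis(fis: bytes, payload_size: int = FC_PAYLOAD_SIZE):
    """Split one SATA FIS into FC frames of at most `payload_size` bytes."""
    chunks = [fis[i:i + payload_size] for i in range(0, len(fis), payload_size)]
    return [FcFrame(seq_cnt=n, last=(n == len(chunks) - 1), payload=c)
            for n, c in enumerate(chunks)]


def de_encapsulate(frames):
    """Reassemble the FIS from frames ordered by their sequence count."""
    return b"".join(f.payload for f in sorted(frames, key=lambda f: f.seq_cnt))


frames = encapsulate_fis(bytes(8192))
print(len(frames))  # 8192 / 2048 = 4 frames
```

The `de_encapsulate` helper corresponds to the inverse step performed on the target side of the FC link.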
The FC frames are received by FAST switches 340 in disk drive enclosures 332, which are utilized instead of FC-to-SATA bridges. Because FC-to-SATA bridges are no longer required, the problem of new SATA drive types requiring reworking the FC-to-SATA bridge disappears. The drives can be presented as pure ATA throughout the SAN, while using FC as the transport. The FAST switches 340 include a FAST engine 352, which de-encapsulates the FC frames to retrieve the SATA FISs, handles initialization, sequences, exchanges, and all of the low-level FC commands and structures. Note that conventional FC switches only route frames between the initiator and target (which handle all exchanges themselves). However, because SATA drives do not utilize the concept of exchanges, the FAST switches are responsible for creating and terminating exchanges. The de-encapsulated SATA FISs are then communicated over a pure SATA connection 348 to the SATA drives 342.
Note that the front end devices 350 and the SAN 300 are not aware of the existence of the back end devices 338. For example, when host 330 sends SCSI data to a particular logical drive, it acts as a front-end initiator and sends the FC-encapsulated SCSI data to a virtual address associated with one of the ports 336 and a FC IOC controller 322 connected to that port 336, which acts as a front-end target. Unknown to the host 330, the processor 326 performing the RAID function identifies multiple addresses in multiple disk drive enclosures 332, and sends the SCSI data to one or more FAST IOCs 304, which act as back-end initiators. The FAST IOCs 304 translate the SCSI data into SATA FISs, encapsulate the SATA FISs into FC frames, and send the FC frames to those multiple addresses in multiple disk drive enclosures 332, which act as back-end targets. This process is referred to as virtualizing the storage. The processor 326 maintains the association between the virtual address and the addresses in the multiple disk drive enclosures, so that when a request to read that data is received from the host 330, the data can be pulled out of the multiple disk drive enclosures and sent back to the host 330.
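The association that the processor maintains between a virtual address and the back-end addresses can be pictured with a toy model. Everything in this sketch is an assumption made for illustration: the `Virtualizer` class, the enclosure names, and the simple round-robin striping, which stands in for whatever RAID function the processor actually performs.

```python
class Virtualizer:
    """Toy model of the RAID processor's virtual-to-physical association."""

    def __init__(self, backends, stripe_size=4):
        self.backends = backends                  # hypothetical back-end addresses
        self.stripe_size = stripe_size
        self.store = {b: {} for b in backends}    # stands in for the drives
        self.table = {}                           # virtual address -> [(backend, key)]

    def write(self, virtual_addr, data: bytes):
        """Spread `data` across the back ends and remember where each piece went."""
        placements = []
        for n, off in enumerate(range(0, len(data), self.stripe_size)):
            backend = self.backends[n % len(self.backends)]
            key = (virtual_addr, n)
            self.store[backend][key] = data[off:off + self.stripe_size]
            placements.append((backend, key))
        self.table[virtual_addr] = placements

    def read(self, virtual_addr) -> bytes:
        """Pull the pieces back from the back ends in their original order."""
        return b"".join(self.store[b][k] for b, k in self.table[virtual_addr])


v = Virtualizer(["enclosure0", "enclosure1"])
v.write(0x10, b"hello world!")
print(v.read(0x10))
```

The host only ever sees the virtual address `0x10`; the placement table is private to the virtualizing processor.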
The reverse of the above-described process is employed when a SATA drive 342 sends SATA FISs back to the host 330. Thus, when SATA FISs are to be sent from a SATA drive 342 to the RAID controller 320, the SATA FISs are sent over the SATA connection 348 to the FAST switch 340, where they are encapsulated in FC frames. The FAST switch 340 then transmits the FC frames over the FC link 346 to the RAID controller 320, where they are received by the FAST IOC 304. The FAST IOC 304 receives the FC frames, de-encapsulates the frames to retrieve the SATA FISs, and performs a SATA to SCSI translation 308 so that the RAID controller will see the target drive 342 as a SCSI device. The SCSI commands are sent to the processor 326 over PCI bus 328, which performs the RAID function and identifies the hosts (initiators) for which the SCSI data is destined. The SCSI data is then sent to the FC IOCs 322 over PCI bus 324, where it is encapsulated into FC frames and sent to the appropriate hosts over the fabric 318. The hosts then de-encapsulate the FC frames to retrieve the SCSI commands.
The benefit of performing the encapsulation/de-encapsulation and the SATA/SCSI translation in the FAST IOC 304 is that other than the addition of the FAST IOC 304, legacy RAID controllers 320 need not be changed to support SATA commands. Because FC is a mature interconnection technology, the FC link 346 is retained between the RAID controller 320 and the multiple disk drive enclosures 332, even though the FC frames are now encapsulating SATA FISs. The conversion from SCSI to SATA could occur in the FAST IOCs 304 or in the processor 326. In either case, the FAST IOCs 304 would then communicate SATA FISs to the disk drive enclosures 332 over a pure FC connection. In general, the SCSI/SATA translation and FAST encapsulation could occur anywhere on the initiator side of a FC link, while the FAST de-encapsulation/encapsulation could occur anywhere on the target side of the FC link.
A primary difference between SAS-SATA SANs described above and the system of FIG. 3 is that in SAS-SATA SANs, there is a mixture of SATA FISs and SAS in the interconnect, while in FIG. 3, everything in the interconnect is FC. There are no SATA FISs, just FC frames with SATA FISs encapsulated within them.
Alternatively, a host may encapsulate SATA FISs in FC frames and pass these frames to a RAID controller, where the SATA FISs may either be de-encapsulated, virtualized and re-encapsulated into FC frames destined for multiple SATA drives in the back end, or simply passed through the RAID controller and sent directly to SATA drives through the FC network.
The FAST switch. FIG. 4 illustrates an exemplary FAST switch 400 resident in a FAST disk drive enclosure (not shown). The FAST switch 400 contains a number of FC Phy 402 and FC link layers 404 for interfacing with the FC ports on one or more RAID controllers over a FC link 404. The FC Phy 402 and FC link layers 404 handle all the primitives in FC. These layers monitor received FC primitives, modifying the active switch matrix connections in response to traffic going across the FC link. The FC link layers 404 are connected to a crossbar switch 406, which is also connected to a number of port link layers 408 for connecting to either a FC device or a SATA device. The crossbar switch 406 operates in FC Arbitrated Loop (FC_AL) space, and performs a switching function. It uses the FC Arbitrated Loop Physical Address (AL_PA) and OPeN (OPN) ordered sets to determine the destination of a connection request, and makes a connection across the crossbar switch to the target device.
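The switching function of the crossbar can be sketched as a lookup from the AL_PA carried in an OPN to a port, followed by a connection attempt. The `Crossbar` class, its method names, and the port labels are hypothetical; the busy check is an assumption about how a connection request to an already connected target might be refused.

```python
class Crossbar:
    """Toy crossbar: maps AL_PAs to ports and tracks active connections."""

    def __init__(self):
        self.al_pa_to_port = {}   # AL_PA -> port name
        self.connections = {}     # source port -> destination port

    def register(self, al_pa: int, port: str):
        """Record which port owns an AL_PA (learned during loop initialization)."""
        self.al_pa_to_port[al_pa] = port

    def open(self, src_port: str, dest_al_pa: int) -> bool:
        """Handle an OPN: find the destination port and connect if it is free."""
        dest = self.al_pa_to_port.get(dest_al_pa)
        if dest is None or dest in self.connections.values():
            return False          # unknown AL_PA or target already connected
        self.connections[src_port] = dest
        return True

    def close(self, src_port: str):
        """Tear down the connection made for `src_port`, if any."""
        self.connections.pop(src_port, None)


xb = Crossbar()
xb.register(0xE8, "port3")
print(xb.open("fc0", 0xE8))   # connection made
print(xb.open("fc1", 0xE8))   # refused while port3 is connected
```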
Each port link layer 408 includes a FC/SATA link layer 410, a FC Tunneling SATA (FTS) layer 412, and a FC/SATA Phy 414. The FTS layer 412 contains logic which detects whether the port link layer 408 is connected to a SATA drive by detecting SATA ordered sets, and determines the status of the SATA drive. The FC/SATA Phy 414 are connected to SATA or FC drives 416.
Also connected to the crossbar switch 406 are FAST port/buffers 418, which are in turn coupled to one or more (e.g. four) FAST engines 420. The FAST engine 420 contains a full SATA core (and a Register Transfer Level (RTL) state machine) that understands the lower levels of the SATA protocol. The FAST engines 420 are viewed as initiators to the SATA drives 416. Note that because it would take up too much space to have a FAST engine and buffers for each port, a reduced number of FAST engines and buffers are shared between the port link layers 408. A small Ordered Set (OS) generation and detection circuit in the FC link layer 404 is used to keep the SATA drive interface serviced. The OS generator sends ALIGN characters to the SATA drive when the drive is not connected to the SATA link-layer in one of the FAST engine blocks. The detection circuit determines when the SATA drive is making a request that requires servicing by the SATA link-layer block in the FAST engine and passes the request to the router 422 to request a connection. The router 422 is connected to the crossbar switch 406 and makes routing decisions within the crossbar switch 406. Also connected to the crossbar switch 406 is an enclosure management function 424 controllable by a Central Processing Unit (CPU) port 426. The CPU port is a path to allow a processor to monitor FC frames locally.
To handle the FC protocol for the SATA targets, the FAST switch will take a FC address for each SATA device connected to a port (this would be an AL_PA for FC_AL topologies and a Destination IDentifier (D_ID) for Fabric or Point-to-Point topologies) during the initialization sequence. The FAST engine will also respond to all Port LOGIns (PLOGIs) and PRocess LogIns (PRLIs) and will generate a Fabric LOGIn (FLOGI) if a fabric is present. The FAST engine knows from the presence of AL_PA 00 that a fabric is present. The logins will identify the targets as SATA devices. This allows a tunneling capable initiator to discover the devices and initiate a SATA connection to them. All non-ATA commands will be directed to the CPU port for analysis and response. Using the CPU port to process the FC login commands allows the flexibility of firmware to handle error and out-of-bounds conditions.
The FAST port/buffers 418 are notified by the FAST engine 420 that there is an active SATA drive attached, and perform several functions. During FC loop initialization, the FAST port/buffers 418 take an AL_PA to reserve an address in the FC_AL subsystem. The FAST port/buffers 418 act as a FC target and receive FC primitives, OPNs, ARBitrates (ARBs), IDLEs, Loop Initialization Primitives (LIPs) and frames, and generate and send OPNs, ARBs, LIPs, and frames back to the initiator to make it appear to the FC port like a virtual disk drive. The FAST port/buffers 418 also terminate all FC frames coming across the FC link, handle all the FC protocols, and put the data into a First In First Out (FIFO) buffer for subsequent processing by the FAST engine 420. The FAST port/buffers 418 can also be statically configured by setting a bit to support either standard FC or SATA-encapsulated FC frames, and thus can be connected to either FC or SATA drives. The FAST port/buffers 418 also have buffers to translate from FC speeds to SATA speeds and perform speed matching with the drives.
SATA supports frame sizes of up to 8 kBytes. In order to transfer the SATA frames through a FC environment, the SATA frames must be divided into the negotiated FC frame size. This is accomplished by filling a FC buffer in the FAST port/buffers 418 and then sending a HOLD to the SATA target until another FC buffer is available. When the FC buffer is available, the HOLD is released and additional data is received from the SATA device. This facility requires that the FAST switch increment the SEQuence IDentifier (SEQ_ID) on each frame of a response so that the initiator can detect out-of-order frames and track lost frames. When data is being received on the FC side destined for the SATA device, the data arrives in frames of the negotiated size and is assembled into the proper SATA frame sizes. If the entire SATA frame is not available to send to the target, a HOLD is sent to the target until the next frame is received from the FC interface or the last sequence of the exchange is received.
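The HOLD-based flow control in the SATA-to-FC direction can be modeled in a few lines. This is a sketch under stated assumptions: the `FastPortBuffers` class, the pool of two buffers, and the 2048-byte frame size are illustrative choices standing in for the negotiated values, and the HOLD primitive is reduced to a boolean flag.

```python
from collections import deque


class FastPortBuffers:
    """Toy model of the SATA-to-FC direction with a fixed pool of FC buffers."""

    def __init__(self, num_buffers=2, fc_frame_size=2048):
        self.free = num_buffers
        self.fc_frame_size = fc_frame_size   # stands in for the negotiated size
        self.filled = deque()                # payloads waiting to go out on the FC link
        self.hold = False                    # True while HOLD is asserted to the target

    def accept_from_sata(self, sata_frame: bytes) -> int:
        """Consume as much of `sata_frame` as the buffers allow; return bytes taken."""
        taken = 0
        while taken < len(sata_frame) and self.free > 0:
            chunk = sata_frame[taken:taken + self.fc_frame_size]
            self.filled.append(chunk)
            self.free -= 1
            taken += len(chunk)
        # Out of buffers with data still pending: assert HOLD.
        self.hold = taken < len(sata_frame)
        return taken

    def send_to_fc(self) -> bytes:
        """Transmit one filled buffer, freeing it and releasing HOLD."""
        payload = self.filled.popleft()
        self.free += 1
        self.hold = False
        return payload


port = FastPortBuffers(num_buffers=2)
data = bytes(8192)
taken = port.accept_from_sata(data)           # two buffers fill, HOLD is asserted
port.send_to_fc()                             # one buffer drains, HOLD is released
taken += port.accept_from_sata(data[taken:])  # one more chunk fits, HOLD again
print(taken, port.hold)
```

A real implementation would drain buffers as the FC link permits; here the caller drives both sides by hand to make the HOLD transitions visible.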
The FAST engines 420 emulate the host initiator to the SATA drive 416, and take the data out of the FIFO buffer (in the case of data going from the initiator to the SATA drive). The FAST engines 420 also check each incoming frame to determine whether it is a valid SATA frame to be passed on to the SATA drive 416, a PLOGI frame that needs to be responded to (low level responses) without involvement of the driver, or a frame that the FAST engine does not recognize and must send to the CPU port for the processor to handle on an exception basis.
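The three-way disposition of incoming frames can be expressed as a small classifier. The frame kinds below are symbolic labels chosen for this sketch; the actual field encodings used to recognize each kind are not given here, so none are assumed.

```python
from enum import Enum


class Route(Enum):
    SATA_DRIVE = "forward to SATA drive"
    HW_RESPONSE = "low-level response without the driver"
    CPU_PORT = "exception: hand to the local processor"


# Hypothetical symbolic frame kinds used only for this illustration.
TUNNELED_SATA = "tunneled_sata"
PLOGI = "PLOGI"


def disposition(kind: str) -> Route:
    """Decide where a received frame goes, following the three cases in the text."""
    if kind == TUNNELED_SATA:
        return Route.SATA_DRIVE
    if kind == PLOGI:
        return Route.HW_RESPONSE
    return Route.CPU_PORT       # anything unrecognized is an exception


for kind in (TUNNELED_SATA, PLOGI, "vendor_unique"):
    print(kind, "->", disposition(kind).value)
```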
When a SATA drive 416 is ready to transfer data, the SATA drive sends a Transmit (Tx) Receiver ReaDY (R_RDY) ordered set to the port link layer 408, indicating that data is ready to be transferred. However, if the FAST switch is not ready to receive the data because no FAST engine 420 is available, for example, the FTS 412 detects this condition and continues to send an idle character to the SATA drive, which will not start sending data until R_RDY ordered sets are sent, signaling it is okay to start transmitting data back to the FAST switch. When a FAST engine 420 becomes available, the FTS 412 sends a routing request to the router 422 (which knows from the discovery process that the requestor is a SATA drive 416), requesting that the SATA drive be connected to a FAST engine 420. When a FAST engine 420 is assigned, the FAST engine becomes the initiator to the SATA drive (the target). Because SATA is a point-to-point protocol, SATA believes there is only the initiator and the target, nothing else. In effect, there is no addressing, because none is needed.
Affiliations. SATA targets are designed to be controlled by a single initiator. In order to use these devices in a multi-initiator environment, an affiliation method is deployed. The affiliation method provides a reserve and release control mechanism to ensure non-queueable commands from multiple initiators do not collide. Prior to issuing a non-queueable command, an initiator must request and be granted an affiliation with the desired SATA target. Only after being granted an affiliation may an initiator issue a non-queueable command to a SATA drive.
Affiliations can be used in loop and fabric environments. The initiator first sends an affiliation frame to the target. The FAST switch receives this frame and, assuming the disk is available, generates a response frame for granting the affiliation to the initiator. If the disk is unavailable, either by being already in an affiliation or processing queueable commands, the FAST switch may queue up the affiliation request, to be processed when the disk is available, or may immediately generate an affiliation not available response.
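The grant, queue, and reject outcomes of an affiliation request can be sketched as a small state holder. The `AffiliationManager` class and its return strings are hypothetical. The sketch models only the case where the disk is busy because another initiator holds the affiliation, and it serves queued requests in first-in, first-out order, which is an assumption rather than a stated policy.

```python
from collections import deque


class AffiliationManager:
    """Toy reserve/release control for one SATA target."""

    def __init__(self, queue_requests: bool = True):
        self.owner = None                  # initiator currently holding the affiliation
        self.pending = deque()             # requesters waiting for the disk
        self.queue_requests = queue_requests

    def request(self, initiator: str) -> str:
        """Return the outcome of an affiliation request from `initiator`."""
        if self.owner is None or self.owner == initiator:
            self.owner = initiator
            return "GRANTED"
        if self.queue_requests:
            if initiator not in self.pending:
                self.pending.append(initiator)
            return "QUEUED"
        return "NOT_AVAILABLE"

    def close(self):
        """Release the affiliation and pass it to the next queued requester, if any."""
        self.owner = self.pending.popleft() if self.pending else None
        return self.owner


mgr = AffiliationManager()
print(mgr.request("initiator_A"))   # disk is free
print(mgr.request("initiator_B"))   # disk is held by A
print(mgr.close())                  # B becomes the new owner
```

Constructing the manager with `queue_requests=False` gives the alternative behavior in which a busy disk produces an immediate not-available response.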
Once the point-to-point connection is made, the FAST engine 420 is responsible for accepting the responses from the SATA drive 416, performing all the handshaking with the SATA drive, encapsulating the received data (e.g. data in response to a read request) into FC frames along with the proper context for the response, and storing the encapsulated FC frames into the FAST port/buffers 418. The FAST engine 420 tracks that the request came in from a particular device with a particular OXID, Source Identifier (S_ID) and D_ID. The FAST engine 420 utilizes this context information to build FC frames, move completed FC frames (having the SATA FIS encapsulated within) to the FAST port/buffers 418 and ensure that the response is sent to the right place, using the correct exchange, and in the proper sequence. The SATA core in the FAST engine 420 is also responsible for telling the drive 416 to hold off if the FAST port/buffers 418 are full.
When multiple FC frames have been built and stored in the FAST port/buffers 418, and either all buffers are full or the SATA response is complete and a complete exchange is stored, the buffer state machine makes a routing request to the router 422, which has access to the context of the response and knows which device is the initiator, to route the FC frames out of a FC Phy 402 connected to the initiator.
The FAST engines are also used in a similar manner to fill the frames for write commands to the SATA drives.
When the current initiator is finished sending requests, a close affiliation frame is sent to the FAST switch, which removes the affiliation at the completion of all pending I/Os. The tunnel device then sends an affiliation removal accept following the completion of the last outstanding I/O. Affiliations may also be closed by the FAST switch, to facilitate fairness between the multiple initiators. To close an affiliation, the FAST switch generates an affiliation close frame to the initiator if it desires to close the affiliation and allow another initiator access to the target. Upon receipt of the affiliation close, the initiator normally stops sending I/O requests and forwards an affiliation close accept frame. The initiator could reject the affiliation close request from the switch, keeping the affiliation active if the initiator so desires.
In a private environment the affiliations are handled automatically. In this configuration, the first initiator to send an I/O request receives the affiliation. This affiliation is kept open for as long as the initiator has outstanding I/Os to the target, or until the associated timer expires. Upon completion of the last outstanding I/O, the affiliation is automatically closed. The initiator can close an affiliation by not sending additional I/Os to the target and allowing the outstanding I/Os to complete. If a second initiator sends an open to the target, the target will respond with a close. The FAST engine may place the AL_PA of the initiator into the requestor stack and, when an affiliation is available, send a full duplex open to the initiator. An affiliation close frame is still used to allow the target to break an affiliation. In this case, the target generates an affiliation close frame to the initiator if it desires to close the affiliation and allow another initiator access to the target. Upon receipt of the affiliation close, the initiator stops sending I/O requests and the affiliation is closed when all remaining I/Os complete. The fabric affiliation method may also be used in private mode if desired.
Upon removal of an affiliation, the current affiliation owner is removed from the affiliation stack. An affiliation response frame is generated and sent to the next highest priority member of the affiliation stack with an affiliation field of 0 and the cycle is repeated.
FAST engines. FIG. 5 illustrates an exemplary FAST engine 500. In FIG. 5, the FC Receiver (Rx) frame decoder/router 502 checks received frames and dispositions them as described above. The FC Rx frame decoder/router 502 checks the frame R_CTL and Type fields to determine where to route the received frame. Routing of all frame types may be selectable via hardware registers to allow hardware generated responses, routing tunneled frames to the SATA IP core 510 or firmware processing via a SATA processor port.
If the frame is a valid SATA-encapsulated FC frame to pass on to a SATA drive, the frame is sent to the Tx FAST Link/Transport Layer block 504, where it is processed at a higher level to set up an active exchange between the host initiator and the virtual target. The Tx FAST Link/Transport Layer block 504 also de-encapsulates the FC frame and strips off and maintains the context information, and sends the SATA FISs to the SATA IP interface block 508 and a SATA IP core 510. The SATA IP interface block 508 contains any glue logic required to tie the SATA IP core 510 into the design. Among features supported is resetting targets on non-stealth LIPs, aborting the transmit requests on errors, etc. The SATA IP core 510 contains the physical and link layer logic for interfacing to SATA devices, sorts the SATA ordered sets, makes sure the spacing between frames is correct, and processes holds and hold acknowledgements and other low level SATA protocols.
The hardware response manager block 514 offloads the FAST switch's local processor by generating response frames to many FC commands. Unrecognized frames are routed through the CPU port to the local processor to be resolved. This block will respond to at least the frames listed below:
IOC-Generated Frames                 FAST Engine Response Frames
FLOGI                                LS_ACC (Class 3 info only)
PLOGI                                LS_ACC (Class 3 info only)
PRLI                                 LS_ACC (Type code is TBD)
Command w/o available affiliation    P_RJT/F_RJT (Port busy, reason is N_PORT busy)
Command not supported                P_RJT/F_RJT ( )
Transaction specific fields (D_ID, S_ID, OXID, etc) are read from all received frames to allow insertion in the proper fields of the response frame. A response frame request may be generated by either the FC Rx frame decoder/router 502 as a canned response to a known received frame type or by the affiliation and command queue manager block 516 to indicate busy conditions. While each FAST engine contains its own Hardware Response Block control circuitry, the data for these frames is held in a module shared by all the FAST engines to save gates. Note that only a small number of FAST engines are used to save gates. Because the router connects requests to/from SATA ports on a first-come, first-served basis, the FAST engine associated with a particular request probably will not be the port that the response is returned through. Thus, a shared context table must be maintained so all FAST engines can correctly store and/or generate the appropriate headers for the FC frames.
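Generating a canned response from the transaction-specific fields of a received frame might look like the sketch below. The `Header` class and the `build_response` function are hypothetical; the mapping of login frames to LS_ACC follows the frames listed above, and the swap of source and destination identifiers reflects the usual FC convention that a reply is addressed back to the sender within the same exchange.

```python
from dataclasses import dataclass
from typing import Optional

# Received login frame -> canned response, per the frames listed in the text.
CANNED = {"FLOGI": "LS_ACC", "PLOGI": "LS_ACC", "PRLI": "LS_ACC"}


@dataclass(frozen=True)
class Header:
    """Only the fields this sketch needs (hypothetical, simplified header)."""
    d_id: int
    s_id: int
    ox_id: int
    kind: str


def build_response(rx: Header, busy: bool = False) -> Optional[Header]:
    """Build a response header, or return None if hardware has no canned reply."""
    if busy:
        kind = "P_RJT"                 # e.g. no affiliation available: port busy
    elif rx.kind in CANNED:
        kind = CANNED[rx.kind]
    else:
        return None                    # unrecognized: left for the local processor
    # Reply goes back to the sender within the same exchange.
    return Header(d_id=rx.s_id, s_id=rx.d_id, ox_id=rx.ox_id, kind=kind)


rx = Header(d_id=0x0000E8, s_id=0x000001, ox_id=0x1234, kind="PLOGI")
print(build_response(rx))
```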
When a response is returned from the SATA drive, the context can be associated with the return data so that it can be routed back through the FC fabric to the initiator. In other words, the Tx FAST Link/Transport Layer block 504 opens the FC exchange structure and keeps track of the context of all frames being sent to the drives, so that when a response comes back from a drive, the Tx FAST Link/Transport Layer block 504 can put the context back into the FC frame wrapper. Tx FAST Link/Transport Layer block 504 monitors the received FC frames, and after verifying a valid Cyclic Redundancy Check (CRC) on the frame, the logic accesses the affiliation and command queue manager block 516 to determine if an affiliation already exists with the targeted SATA drive. If an affiliation exists and the frame is part of a transaction within the existing affiliation, the frame is forwarded to the disk. If the frame is not from an initiator affiliated with the disk, the frame is buffered within the FAST switch until the affiliation is closed and the disk is available to receive the command. Note that only non-queueable command FISs must be protected within the affiliation mechanism, because queueable command FISs do not require affiliations.
The logic verifies the received frame is the next frame in the sequence. If so, the frame is passed to the SATA IP interface 508 to be forwarded on to the target device, and the status of the transaction is sent to the affiliation and command queue manager block 516 to update the database. Because FAST protocol engines 500 are dynamically assigned, transaction information that spans more than one frame is passed through the affiliation and command queue manager 516 for storage in the shared database, accessible by all FAST engines. If the received FAST frame is not the next frame in the sequence, the frame is discarded.
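Because engines are assigned dynamically, the in-sequence check has to consult state that every engine can reach. The sketch below keeps the next expected frame count per exchange in one shared table. The `SharedContext` class, the use of a per-frame count starting at zero, and the choice of key are assumptions made for illustration.

```python
class SharedContext:
    """Toy shared database of the next expected frame per exchange."""

    def __init__(self):
        self.expected = {}   # (s_id, ox_id) -> next expected frame count

    def accept(self, s_id: int, ox_id: int, seq_cnt: int) -> bool:
        """Return True and advance if this is the next frame; otherwise discard."""
        key = (s_id, ox_id)
        if seq_cnt != self.expected.get(key, 0):
            return False                      # out of sequence: frame is dropped
        self.expected[key] = seq_cnt + 1      # record progress for all engines
        return True


shared = SharedContext()
# Two different engines may handle consecutive frames of one exchange;
# both consult the same table, so the check still works.
print(shared.accept(0x01, 0x1234, 0))   # engine 0 takes frame 0
print(shared.accept(0x01, 0x1234, 1))   # engine 1 takes frame 1
print(shared.accept(0x01, 0x1234, 3))   # frame 2 was skipped, so this is discarded
```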
In the case of a write, once all the data has been written to the SATA drive, the SATA drive sends a “status good” response through the SATA IP core 510 and the SATA IP interface 508 back to the Rx FAST link/transport layer 506, which retrieves the context information and sends it out through the FC Tx frame multiplexer 512 as a FC frame back to the host, indicating to the host that the write is complete. The stored FC fields are used to generate the FC frame. The module also checks and generates Cyclic Redundancy Checks (CRCs) as part of the receive/retag operation. If a bad CRC is received from the disk, the CRC generator will generate a bad CRC, passing the error handling responsibility up to the host.
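The CRC handling during retagging can be sketched as follows. This is an illustration under stated assumptions, not the described circuit: `zlib.crc32` stands in for the link CRC calculations, the header bytes are placeholders, and inverting the computed CRC is just one hypothetical way to guarantee that the outgoing CRC is bad.

```python
import zlib


def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def retag(payload: bytes, crc_from_disk: int, fc_header: bytes):
    """Wrap a payload from the disk in an FC header and attach a CRC."""
    frame = fc_header + payload
    crc = crc32(frame)
    if crc32(payload) != crc_from_disk:
        # Payload arrived corrupted: emit a CRC that cannot match, so the
        # host detects the error and handles recovery.
        crc ^= 0xFFFFFFFF
    return frame, crc


payload = b"status good"
header = b"\x00\x00\x01"  # placeholder bytes, not a real FC header

good_frame, good_crc = retag(payload, crc32(payload), header)
bad_frame, bad_crc = retag(payload, crc32(payload) ^ 1, header)
```

In the first call the host sees a frame whose CRC verifies. In the second, the CRC check at the host fails, which is how responsibility for the error is passed upward.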
The Tx FAST Link/Transport layer block 506 and FC Tx frame mux blocks can also detect corrupted frames and pass them to a local processor to handle as an exception, and the local processor can send frames down to the SATA drive as needed to do some background work.
The FC TX Mux block 518 selects between the various sources of data to send the desired data to the FAST Port interface.
Communications between mixed speed devices. FIG. 6 is a simplified illustration of an exemplary Application Specific Integrated Circuit (ASIC) 600 containing the functions of FAST switch 400 of FIG. 4. One application of the ASIC 600 is as a FC switching device capable of servicing multiple Just a Bunch Of Disks (JBODs), making them Switched Bunch Of Disks (SBODs), to boost overall performance of a system. The basic architecture is shown in FIG. 6.
Each port 602, identified individually as Port0-N, may be configured as a FC port of a particular FC speed (e.g. 1 Gbit/sec, 2 Gbit/sec, or 4 Gbit/sec) connectable to a FC loop (e.g. FC loop 604) that may contain one or more FC devices (e.g. a JBOD). The ports 602 in FIG. 6 correlate to the port link layers 408 in FIG. 4. The ports 602 may also be configured as a SATA port of a particular SATA speed (e.g. 1.5 Gbit/sec or 3 Gbit/sec), connectable to a single SATA device (e.g. SATA device 606).
It is desirable that devices on one port be able to communicate with devices on another port. In the event that two ports are running at different rates, the ASIC 600 routes traffic from an incoming port to a rate-matching memory Buffer Bank (BB) 610 before sending it to the outgoing port. Note that each BB 610 also includes some associated control logic. Shown in FIG. 6 is a router/switch core 608 and multiple BBs 610, identified individually as BB0-M. Generally, M will be less than N (where N+1 is the number of ports). Each BB 610 is capable of storing a fixed number of frames of data (e.g. four). The BBs 610 in FIG. 6 correlate to the FAST Port/Buffers 418 in FIG. 4. The router/switch core 608 makes connections between the ports 602 and the BBs 610. The router/switch core 608 correlates to the router 422 and crossbar switch 406 in FIG. 4.
Each port 602 contains information identifying the speed of the port. Note that each port 602 is capable of supporting only one speed at a time. During initialization of the ASIC 600, each port 602 performs speed negotiation to determine the speed of the connected device, and the speed information is read from the ports and stored in the router/switch core 608 in a port speed bit vector 612.
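One possible encoding of a port speed bit vector is sketched below. The text does not specify the layout, so the three-bit code per port, the code assignments, and the function names are all assumptions made for illustration.

```python
# Hypothetical 3-bit codes for the speeds mentioned above (Gbit/sec).
SPEED_CODES = {1.0: 0, 1.5: 1, 2.0: 2, 3.0: 3, 4.0: 4}
CODE_TO_SPEED = {code: speed for speed, code in SPEED_CODES.items()}
BITS_PER_PORT = 3
MASK = (1 << BITS_PER_PORT) - 1


def pack_speeds(speeds):
    """Pack the negotiated speed of each port into one integer."""
    vector = 0
    for port, gbps in enumerate(speeds):
        vector |= SPEED_CODES[gbps] << (port * BITS_PER_PORT)
    return vector


def port_speed(vector, port):
    """Read one port's speed back out of the vector."""
    return CODE_TO_SPEED[(vector >> (port * BITS_PER_PORT)) & MASK]


def speeds_differ(vector, port_a, port_b):
    """True when a connection between the two ports needs rate matching."""
    return port_speed(vector, port_a) != port_speed(vector, port_b)


# Port0 at 4 Gbit/sec FC, Port1 at 1.5 Gbit/sec SATA, Port2 at 1 Gbit/sec FC.
vector = pack_speeds([4.0, 1.5, 1.0])
```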
Connections for communicating between mixed speed devices. By way of example, suppose that a FC HBA 1 operating at 4 Gbit/sec connected in a FC loop 604 to Port0 wants to send data to SATA device 606 operating at 1.5 Gbit/sec connected to Port1. In this case, FC HBA 1 arbitrates for the FC loop 604 using well-known FC protocols, and upon obtaining access to it, sends an OPN primitive to Port0 (see reference character 614) in an attempt to open a connection between FC HBA 1 and the SATA device 606. Port0 then forwards the OPN to the router/switch core 608 (see reference character 616), which then extracts the source and destination addresses from the OPN primitive and determines that Port0 is the source port and Port1 is the destination port.
The router/switch core 608 then determines from the port speed bit vector 612 that a speed mismatch exists. Because a speed mismatch exists, the router/switch core 608 selects an available buffer bank 610 (assume BB0 in this example) to act as a buffer and absorb the speed mismatch. The router/switch core 608 then forwards the OPN primitive to BB0 (see reference character 618). BB0 then makes repeated attempts to send the OPN primitive to router/switch core 608. When the arbitration and fairness scheme implemented by router/switch core 608 identifies BB0 as having the highest priority, and if destination port Port1 is not busy (e.g. not involved in another connection), the router/switch core forwards the OPN primitive to Port1 (see reference character 620), and a connection between FC HBA 1 and the SATA device 606 is established.
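The routing decision for an OPN primitive can be sketched as follows. The function name and its arguments are hypothetical. The arbitration and fairness scheme is not modeled, and the direct port-to-port path for matching speeds is an assumption inferred from the statement that buffer banks are used when port rates differ.

```python
def route_open(src, dst, speeds, free_banks, busy_ports):
    """Return the path an OPN would take, or None if it must be retried."""
    if dst in busy_ports:
        return None  # destination is involved in another connection
    if speeds[src] == speeds[dst]:
        # Assumption: matching speeds need no rate-matching buffer.
        return [f"Port{src}", f"Port{dst}"]
    if not free_banks:
        return None  # speed mismatch but no buffer bank available
    bank = free_banks.pop(0)  # claim the first available buffer bank
    return [f"Port{src}", f"BB{bank}", f"Port{dst}"]


speeds = {0: 4.0, 1: 1.5, 2: 4.0}  # Gbit/sec per port
banks = [0, 1]                     # BB0 and BB1 are free

mismatch_path = route_open(0, 1, speeds, banks, busy_ports=set())
matched_path = route_open(0, 2, speeds, banks, busy_ports=set())
blocked_path = route_open(0, 1, speeds, banks, busy_ports={1})
```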
In addition, after BB0 receives the OPN primitive 618, it may send back one or more buffer credits to Port0 (see reference character 622) through router/switch core 608, one buffer credit for each available frame in BB0. Alternatively, BB0 may wait until it receives confirmation that a connection with Port1 has been established before sending buffer credits 622 back to Port0. In either case, upon receiving the buffer credits 622, Port0 sends one or more R_RDY primitives back to FC HBA 1 (see reference character 624), one R_RDY primitive for each buffer credit, indicating to FC HBA 1 that it may now commence transmission of one frame of data for every R_RDY primitive received.
At this point, FC HBA 1 transmits one or more frames 626 to SATA device 606 through Port0, router/switch core 608, BB0, and Port1, respectively. As space becomes available in BB0, BB0 sends a buffer credit 622 to Port0 through router/switch core 608. Port0 then sends an R_RDY primitive back to FC HBA 1, and an additional frame 626 is transmitted from FC HBA 1 to Port0. When all frames are indicated as having been transmitted and received by the SATA device 606, the connection is closed down by the router/switch core 608 after Port0 sends in a CLS primitive.
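The credit-based flow through a buffer bank can be illustrated with a small simulation. The tick-based timing, the `drain_every` parameter, and the function name are assumptions made for the sketch. The four-frame capacity follows the example capacity given above. Each held credit stands for one R_RDY primitive received by the sender.

```python
from collections import deque


def transfer(num_frames, capacity=4, drain_every=3):
    """Simulate a fast sender feeding a slow receiver through one buffer bank."""
    bank = deque()
    credits = capacity  # one initial credit per free frame slot
    sent = delivered = tick = 0
    max_occupancy = 0
    order = []
    while delivered < num_frames:
        tick += 1
        # Fast side: one frame per tick, but only while a credit is held.
        if sent < num_frames and credits > 0:
            credits -= 1
            bank.append(sent)
            sent += 1
        max_occupancy = max(max_occupancy, len(bank))
        # Slow side: removes one frame every `drain_every` ticks.
        if tick % drain_every == 0 and bank:
            order.append(bank.popleft())
            delivered += 1
            credits += 1  # freed slot becomes a new credit for the sender
    return order, max_occupancy


order, max_occupancy = transfer(num_frames=10)
```

The sender stalls whenever it holds no credit, so the bank never holds more frames than its capacity even though the sender is faster than the receiver.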
Premature connection closure. In the example of FIG. 7, suppose FC HBA 1 wants to send data to a 1 Gbit/sec FC device 728 connected in a FC loop to Port2. Again, the router/switch core 708 will detect a speed mismatch, and will select an available buffer bank (e.g. BB1). After a connection with Port2 is established as described above, FC HBA 1 begins transmitting frames to FC device 728.
It should be understood, however, that because there are more ports 702 than BBs 710, the BBs in ASIC 700 must be shared. Because of this sharing, it is inevitable that a port 702 connected to a BB 710 will, on occasion, become disconnected before completion of a data transfer due to various error or normal conditions, such as when a destination port becomes unable to accept more data.
Thus, in the example of FIG. 7, suppose that before all frames from FC HBA 1 have been received by FC device 728, the FC device closes the connection prematurely by sending a CLoSe (CLS) primitive to Port2. This closing may be the result of receiving a higher priority task or a destination port becoming unavailable, among other things. As a result of the premature connection closure, some frames remain in BB1. Further assume that FC HBA 1 sends a CLS primitive to Port0, thus closing down the connection between Port0 and BB1. The FC HBA 1 still has frames to send to Port2. Further assume that based on the priority and fairness scheme implemented by the router/switch core 708, BB1 is considered to have received its chance to connect to Port2, and has lost its priority.
Now assume that FC HBA 1, connected to Port0, wants to finish transferring its frames to FC device 728 and sends an OPN primitive 730 to Port0, which forwards it to the router/switch core 708 and BB0. BB0 then makes repeated attempts to send the OPN primitive to router/switch core 708. When the arbitration and fairness scheme implemented by router/switch core 708 identifies BB0 as having the highest priority, router/switch core 708 forwards the OPN primitive to Port2 (see reference character 732). Note that Port2 is no longer busy due to the premature connection closure described above. Further assume that BB0 now has a higher priority than BB1 according to the fairness scheme implemented by the router/switch core 708. Because Port2 is ready to accept connections, it will receive the OPN primitive from BB0, establish a connection between Port0 and Port2, and receive frames from FC HBA 1. Note that because BB1 still contains frames from FC HBA 1 that were not received by FC device 728, the FC device will receive frames that are out of order. This is the Out-Of-Order (OOO) frame problem.
Therefore, there is a need to drain any data remaining in a BB that was prematurely disconnected from a port before any other initiator can connect to it, if the reception of OOO data is to be avoided.
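The effect of draining can be shown with a minimal sketch that compares the order in which the destination sees frames with and without draining the prematurely disconnected buffer bank first. The function names and frame numbers are hypothetical, and the sketch models only the resulting order, not the mechanism that enforces it.

```python
def delivery_order(stranded, new_frames, drain_first):
    """Order in which the destination receives frames after the reconnect."""
    received = []
    if drain_first:
        received.extend(stranded)    # empty the old bank before anyone connects
        received.extend(new_frames)
    else:
        received.extend(new_frames)  # new bank wins arbitration first
        received.extend(stranded)    # old bank delivers its leftovers later
    return received


def in_order(frames):
    return all(a < b for a, b in zip(frames, frames[1:]))


stranded_in_bb1 = [2, 3]  # frames left behind by the premature closure
new_via_bb0 = [4, 5]      # frames sent over the new connection

without_drain = delivery_order(stranded_in_bb1, new_via_bb0, drain_first=False)
with_drain = delivery_order(stranded_in_bb1, new_via_bb0, drain_first=True)
```

Without draining, the destination sees frames 4 and 5 before frames 2 and 3. With draining, the original order is preserved.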