A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
This invention is directed to application buffer-to-buffer transfers over a network, and more particularly to DMA transfer over a network between application buffers using Fibre Channel.
Fibre Channel is a data transport mechanism that includes hardware and a multi-layer protocol. Fibre Channel is described in "Fibre Channel Physical and Signaling Interface (FC-PH)" (ANSI X3.230-1994) by the American National Standard for Information Systems, which is incorporated by reference in its entirety. Fibre Channel is used today as a communication path between computers and disks. For example, Fibre Channel is used in Storage Area Networks ("SANs"). When Fibre Channel is used as a communication path between computers and disks, the Small Computer System Interface ("SCSI") protocol runs on top of the Fibre Channel protocol so that legacy SCSI drivers can still be used to control the data flow. Since a common use of Fibre Channel protocol is to interpret SCSI commands, Fibre Channel adapter cards often have built-in SCSI Assist Hardware to accelerate this process.
Fibre Channel includes a buffer-to-buffer DMA transfer mechanism. If two computers are connected together with Fibre Channel, the Fibre Channel adapter card in the sending computer is given the address of a sending buffer, and the Fibre Channel adapter card in the receiving computer is given the address of a destination buffer, the two adapter cards can transfer data across a Fibre Channel medium (e.g., a copper or optical cable) from the sending buffer to the destination buffer in a single DMA burst. This feature works whether the two nodes are connected point-to-point, through a Fibre Channel hub connecting up to 126 nodes together, or through a series of Fibre Channel switches connecting up to 16 million nodes together. When used to connect computers to disks, the disk hardware serves as one of the computers and the buffer-to-buffer DMA transfer simply moves data between an application buffer in the computer and a buffer in the disk.
The SCSI Assist Hardware in Fibre Channel adapter cards accelerates the common SCSI disk transactions. SCSI Assist Hardware lets the host driver place the SCSI command containing the SCSI disk request into the card hardware and relieves the host computer from being interrupted until the data has been transferred and the response phase of the SCSI operation completes. Thus, SCSI Assist Hardware allows a Fibre Channel adapter card to execute the SCSI command phase, the SCSI data phase, and the SCSI response phase without interrupting the host computer.
Networks today communicate by breaking application data into smaller units, called datagrams. Each datagram is sent across the network as a separate unit. Breaking long messages into smaller network units is done to share the network resource so that a long message does not dominate the bandwidth.
Network applications use a protocol stack to interface the application to the physical network. FIG. 1 shows the layers of a conventional protocol stack based on the Open System Interconnection ("OSI") Seven Layer Reference Model. FIG. 1 compacts layers 5-7 into a single Application layer for ease of reference in relation to the present disclosure. "Application" in this disclosure refers to any program residing above the transport layer, including software that services network requests for file data, such as the SRV server module in the Windows NT operating system.
The transport layer (e.g., Transmission Control Protocol, or "TCP") provides to an application in a local computer a "virtual circuit" that connects the application to an application in a remote computer, even where the remote computer is halfway around the world. The transport layer maintains this virtual circuit even though the physical network may frequently lose data.
The transport layer breaks the application data into "segments" that it gives to the network layer. Segments created by the transport layer may be up to 64 Kbytes. Segments which are not acknowledged by the transport layer on the destination computer are resent.
The application data given to the transport layer may have its own application header A (FIG. 1) describing the data. File transfers under Windows NT ("NT"), for example, have a Server Message Block ("SMB") header placed before the data. The application may divide the data into units smaller than 64 Kbytes. The file server software SRV that handles remote requests for files in NT, for example, breaks data into units of about 60 Kbytes. The transport layer adds its own header T (FIG. 1) and passes the segment down to the network layer.
The transport process that creates a virtual circuit requires an acknowledgment signal ("ACK") back from the final destination for the data sent. If a specified number of ACKs is not received, the transport layer on the sending side stops sending data. If the missing ACKs are not received in a predetermined time, the data is resent. The transport layer thus implements both a flow-control mechanism and an error-control mechanism.
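The ACK-driven behavior described above can be illustrated with a minimal sketch. It is not TCP itself: the class name `WindowedSender`, the `window` and `timeout` parameters, and the integer clock ticks are all assumptions made for illustration. The sender stops issuing new segments once `window` segments are unacknowledged (flow control) and resends any segment whose ACK has not arrived within `timeout` ticks (error control).

```python
class WindowedSender:
    """Toy sender: hypothetical names, integer ticks instead of real timers."""

    def __init__(self, segments, window=4, timeout=3):
        self.segments = list(segments)
        self.window = window      # max segments allowed in flight without an ACK
        self.timeout = timeout    # ticks to wait before resending a segment
        self.unacked = {}         # sequence number -> tick at which it was last sent
        self.next_seq = 0
        self.log = []             # record of ("send" | "resend", sequence number)

    def tick(self, now):
        # Error control: resend every segment whose ACK is overdue.
        for seq, sent_at in sorted(self.unacked.items()):
            if now - sent_at >= self.timeout:
                self.unacked[seq] = now
                self.log.append(("resend", seq))
        # Flow control: send new segments only while the window has room.
        while self.next_seq < len(self.segments) and len(self.unacked) < self.window:
            self.unacked[self.next_seq] = now
            self.log.append(("send", self.next_seq))
            self.next_seq += 1

    def ack(self, seq):
        # An ACK from the destination frees one slot in the window.
        self.unacked.pop(seq, None)

    def done(self):
        return self.next_seq == len(self.segments) and not self.unacked


sender = WindowedSender(["s0", "s1", "s2"], window=2, timeout=3)
sender.tick(0)   # window of 2: segments 0 and 1 go out, segment 2 must wait
sender.ack(0)    # ACK for segment 0 arrives; the ACK for segment 1 is lost
sender.tick(1)   # a slot is free, so segment 2 is sent
sender.tick(3)   # segment 1 has waited 3 ticks without an ACK and is resent
```

In this run the sender's log shows segment 2 held back until an ACK frees the window, and segment 1 resent after its ACK fails to arrive.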
The network layer (e.g., Internet Protocol, or "IP") breaks the transport segment into datagrams that will fit in the Maximum Transfer Unit (MTU) of the network, which is 1500 bytes for an Ethernet physical layer. The network layer then attempts to move each of these MTU-size datagrams through the network to the destination. The network layer gives each of these 1500-byte datagrams a network header N (FIG. 1) containing the address of the final destination node. The network layer also adds a Media Access Control ("MAC") address to each datagram before passing it down to the data link layer. The MAC address is the physical address of the very next node in the network path. As the datagram makes its way through the network toward its final destination, the MAC address is replaced at each hop with the address of the next node on the route.
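To give a sense of scale, the following sketch splits one transport segment into MTU-size pieces. It is a simplification under stated assumptions: the function name `fragment_segment` is hypothetical, the 20-byte `header_len` assumes a minimal IPv4 header with no options, and real fragmentation also records offsets and flags in each header. The 60-Kbyte segment size is taken from the SRV example above.

```python
def fragment_segment(segment: bytes, mtu: int = 1500, header_len: int = 20) -> list[bytes]:
    """Split a segment into payloads that each fit in one MTU-size datagram.

    header_len is an assumption (minimal IPv4 header); the payload carried
    per datagram is the MTU minus that header.
    """
    payload_size = mtu - header_len
    if payload_size <= 0:
        raise ValueError("MTU must be larger than the network header")
    return [segment[i:i + payload_size] for i in range(0, len(segment), payload_size)]


segment = bytes(60 * 1024)             # one 60-Kbyte unit of application data
fragments = fragment_segment(segment)  # 1480 payload bytes per 1500-byte datagram
print(len(fragments))                  # 42
print(len(fragments[-1]))              # 760 (the last datagram is only partly full)
```

Under these assumptions, a single 60-Kbyte segment becomes 42 separate datagrams on Ethernet, each of which is routed, buffered, and processed individually.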
The data link layer instructs the network interface card ("NIC") to move the datagram fragment over the physical network to the next node. The data link layer includes the NIC drivers. As FIG. 1 shows, as the application data moves down the protocol stack, it accumulates headers 10. At the data link layer, the first few hundred bytes of the final datagram contain all of headers 10.
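The accumulation of headers 10 can be pictured with a small sketch. The bracketed byte strings below are placeholders chosen for illustration, not real SMB, TCP, IP, or MAC header layouts, and the function name `encapsulate` is hypothetical.

```python
def encapsulate(data: bytes,
                app_hdr: bytes = b"[A]",
                transport_hdr: bytes = b"[T]",
                network_hdr: bytes = b"[N]",
                mac_hdr: bytes = b"[MAC]") -> bytes:
    """Prepend one placeholder header per layer as data moves down the stack."""
    message = app_hdr + data            # application layer adds header A
    segment = transport_hdr + message   # transport layer adds header T
    datagram = network_hdr + segment    # network layer adds header N
    frame = mac_hdr + datagram          # MAC address is added for the next hop
    return frame


print(encapsulate(b"payload"))  # b'[MAC][N][T][A]payload'
```

The outermost header belongs to the lowest layer, which is why the leading bytes of the datagram handed to the NIC contain every header added above it.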
The description above for the transport, network, and data link layers applies equally to a Wide Area Network (WAN), which could span the entire globe and pass through numerous routers, and to a local area network (LAN), where the nodes may all be in the same building. In a LAN, each node is often just one hop away. That is, the MAC address also points to the final destination.
In a conventional network, a read operation can be seen as a write of the read request by a client computer to a server, followed by a write back of the data by the server to the client computer. For example, when a client computer wants to read data from a remote server, the client computer writes a request to the server asking for certain file data. The network is then quiescent with no state maintained about the read operation. When the server locates the data, it writes the data back to the client computer.
In the write back operation, the transport layer sets up a virtual circuit to the application in the destination computer, or uses a virtual circuit that already exists to this application, and passes a segment of data to the network layer. For example, if the application is a remote NT file server, the software in the NT server is SRV. After receiving the request for file data, the server locates and returns the data. The application source buffer in this case is most likely the cache in the NT server. If the data is already in cache, the cache serves the data directly. If the data is not in the cache, NT reads the data into cache before satisfying the network request.
As discussed above, the network layer fragments the segment into MTU-size datagrams which are passed to the data link layer. Since each datagram is a separate entity that may take a different route through the network, the datagrams could arrive at the destination in a different order than they were sent. Because of the possibility of receiving datagrams out of order, the receiving layers below the transport layer in the destination computer buffer and reorder the datagram fragments, if necessary, before passing them to the upper layers. While the chance of datagrams arriving out of order is small on a LAN, LAN datagrams are processed the same way as WAN datagrams.
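The reordering step can be sketched as follows. This is an illustrative model only: it assumes each arriving fragment is tagged with its byte offset within the original segment, and the function name `reassemble` is hypothetical.

```python
def reassemble(fragments):
    """Rebuild a segment from (offset, payload) pairs given in arrival order.

    All fragments must be buffered before they can be sorted, which is why
    the receiver needs temporary storage.
    """
    expected = 0
    out = bytearray()
    for offset, payload in sorted(fragments, key=lambda f: f[0]):
        if offset != expected:
            raise ValueError(f"missing data at offset {expected}")
        out += payload
        expected += len(payload)
    return bytes(out)


arrived = [(4, b"efgh"), (0, b"abcd"), (8, b"ij")]  # second fragment arrived first
print(reassemble(arrived))                          # b'abcdefghij'
```

Even when every fragment arrives in order, as is typical on a LAN, the receiver still performs this buffering and checking.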
Another reason buffering is required at the receiver is that the datagrams in a conventional network are unsolicited, i.e., the receiving network hardware does not yet know the final destination for the data in the datagrams. The receiving node puts the unsolicited datagrams into a temporary buffer until the final application buffer is found, at which time the data is copied from the temporary buffer to the application buffer. Thus, receiver buffering moves the received data twice.
Because of the unreliable physical network, the transport layer uses a "checksum" in one of the fields of the transport header T (FIG. 1). The checksum is recalculated at the receiving end as the data arrives and compared with the checksum sent. Computing the checksum adds significant overhead to network processing, because every byte of data must be read by the processor.
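As an illustration of why the checksum is costly, the sketch below computes the 16-bit one's-complement checksum used by TCP and IP (described in RFC 1071). It is a simplified version: it omits the pseudo-header that TCP includes in its calculation, and the function name `internet_checksum` is chosen for this example. Note that every byte of the data passes through the loop.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (TCP pseudo-header omitted)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


data = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(data)
print(hex(checksum))                          # 0x220d

# The receiver recomputes over data plus the transmitted checksum; 0 means no error detected.
print(internet_checksum(data + checksum.to_bytes(2, "big")))  # 0
```

The receiver repeats the same pass over all received data, so the cost is paid on both ends of the connection.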
On the receiving side, there are two conventional ways to handle arriving datagrams. The first puts each datagram into a temporary buffer reserved for unsolicited transmissions, reorders the datagrams as necessary, and passes them up the protocol stack, where they are copied to the application buffer. Alternatively, the first datagram received is passed up while succeeding datagrams are placed in temporary buffers. This first datagram contains headers 10, so the upper layers can locate the designated application. The application then passes down an application buffer address and the data link layer begins copying the buffered data to this address, reordering datagrams as necessary. In both cases, the arriving data is first put into a temporary buffer and later copied to the application buffer.
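The double movement of data common to both approaches can be made concrete with a small model. The class name `ConventionalReceiver` and its `bytes_moved` counter are hypothetical and exist only to count how many bytes are written on the receiving side.

```python
class ConventionalReceiver:
    """Toy model of a receiver that stages unsolicited datagrams in a temporary buffer."""

    def __init__(self):
        self.temp = []         # temporary buffers for unsolicited datagrams
        self.bytes_moved = 0   # total bytes written on the receiving side

    def on_datagram(self, payload: bytes) -> None:
        # First move: data from the network lands in a temporary buffer.
        self.temp.append(bytes(payload))
        self.bytes_moved += len(payload)

    def deliver(self, app_buffer: bytearray) -> int:
        # Second move: once the application buffer is known, copy the data into it.
        pos = 0
        for payload in self.temp:
            app_buffer[pos:pos + len(payload)] = payload
            pos += len(payload)
            self.bytes_moved += len(payload)
        self.temp.clear()
        return pos


receiver = ConventionalReceiver()
receiver.on_datagram(b"abc")
receiver.on_datagram(b"de")
app_buffer = bytearray(5)
received = receiver.deliver(app_buffer)
print(received, receiver.bytes_moved)   # 5 10
```

Five bytes of application data cost ten bytes of memory writes in this model, which is the overhead a direct buffer-to-buffer transfer avoids.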
In one embodiment, a method for transferring data over a network includes specifying a Maximum Transfer Unit ("MTU") greater than or equal to the segment size, sending the network headers of the application data over the network, receiving a start-transfer signal indicating that the destination application buffer is ready to receive the application data over the network, and sending the application data from a first application buffer to a second application buffer over the network. In one implementation, the network includes a Fibre Channel network. In another implementation, the network includes any network media that allows buffer-to-buffer direct memory access ("DMA") transfers of data. In yet another implementation, the sending of the network headers, the receiving of the start-transfer signal, the sending of the application data, and the receiving of the transfer status are accomplished using a single hardware SCSI exchange.
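The sequence of steps in this embodiment can be sketched as a small simulation. It is only an illustration of the ordering of events, not the disclosed implementation or a Fibre Channel API: the names `Receiver`, `transfer`, `START_TRANSFER`, `request_id`, and the dictionary used for headers are all assumptions, and a plain in-memory copy stands in for the single DMA burst.

```python
class Receiver:
    """Hypothetical destination node that places data directly in an application buffer."""

    def __init__(self):
        self.app_buffers = {}   # request id -> destination application buffer

    def on_headers(self, headers: dict) -> str:
        # Headers arrive first, so the application buffer can be located
        # (here simply allocated) before any data is sent.
        self.app_buffers[headers["request_id"]] = bytearray(headers["length"])
        return "START_TRANSFER"

    def on_data(self, request_id: int, data: bytes) -> str:
        # Stand-in for the DMA burst: data lands in the application buffer
        # with no temporary buffer and no second copy.
        self.app_buffers[request_id][:] = data
        return "OK"


def transfer(source_buffer: bytes, receiver: Receiver, request_id: int, mtu: int) -> str:
    # Step 1: the MTU must be at least the segment size, so no fragmentation occurs.
    if mtu < len(source_buffer):
        raise ValueError("MTU is smaller than the segment")
    # Step 2: send only the headers.
    headers = {"request_id": request_id, "length": len(source_buffer)}
    # Step 3: wait for the start-transfer signal.
    if receiver.on_headers(headers) != "START_TRANSFER":
        raise RuntimeError("destination buffer not ready")
    # Step 4: send the application data and return the transfer status.
    return receiver.on_data(request_id, source_buffer)


rx = Receiver()
status = transfer(b"x" * 1000, rx, request_id=7, mtu=64 * 1024)
print(status)   # OK
```

Because the data is solicited by the start-transfer signal, the receiver in this sketch already knows the destination buffer when the data arrives, in contrast to the conventional unsolicited datagrams described earlier.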