The present invention relates to data transfer in a computer system. In particular, the invention relates to methods and apparatus for transferring data among various sources and sinks for data.
Queued, message-based I/O ("QIO") in a system with shared memory is discussed fully in U.S. application Ser. No. 08/377,302, filed Jan. 23, 1995, now abandoned and assigned as well to the Assignee of the instant application. U.S. application Ser. No. 08/377,302 is incorporated herein by reference and is loosely summarized below.
FIG. 1 is a block diagram showing a fault-tolerant, parallel data processing system 100 incorporating a QIO shared memory system. FIG. 1 includes a node 102 and a workstation 104 that communicate over a Local Area Network (LAN) 105. The node 102 includes processors 106 and 108, connected by an interprocessor bus (IPB) 109. The IPB 109 is a redundant bus of a type known by persons of ordinary skill in the art. Although not shown in FIG. 1, the system 100 is a fault-tolerant, parallel computer system, where at least one processor checkpoints data from other processors in the system. In prior-art systems of this kind, memory is not shared among processors, so that the memory does not become a bottleneck or a common point of failure. Such a fault-tolerant system is described generally in, for example, U.S. Pat. No. 4,817,091 to Katzman et al.
The processor 106 includes a CPU 110 and a memory 112 and is connected via a disk driver 132 and a disk controller 114 to a disk drive 116. The memory 112 includes a shared memory segment 124 (including QIO queues 125), an application process 120 and a disk process 122. The application and disk processes 120, 122 access the shared memory segment 124 through the QIO library routines 126. As is the nature of QIO, messages sent between the application process 120 and the disk process 122 using the shared memory segment 124 and the QIO library 126 are sent without duplication of data from process to process.
The processor 108 also includes a CPU 142 and a memory 144 and is connected via a LAN controller 140 to LAN 105. The memory 144 includes a shared memory segment 150 (including QIO queues 151), a TCP/IP process 146 and an NFS distributor process 148. The TCP/IP process 146 communicates through the shared memory segment 150 using the QIO library routines 152 with the NFS distributor process 148 and the software LAN driver 158. Again, communications using the QIO shared memory segment 150 do not involve copying data between processes.
The TCP/IP process 146 and the LAN 105 exchange data by means of the LAN driver 158 and the LAN controller 140.
The process 120 communicates over the IPB 109 with the TCP/IP process 146 using message systems (MS) 128 and 154 and file systems (FS) 130 and 156. Unlike QIO communications, communications using message systems and file systems do require data copying.
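The contrast between the two transfer styles can be illustrated with a minimal sketch. It is not the QIO library itself: the names SharedSegment, qio_put, qio_get and ms_send are hypothetical, and a Python bytearray stands in for a shared memory segment. The sketch assumes only what is stated above, namely that a QIO-style queue passes a message without duplicating its data, whereas a message-system transfer hands the receiver a copy.

```python
from collections import deque


class SharedSegment:
    """Hypothetical stand-in for a shared memory segment with one queue."""

    def __init__(self, size):
        self.buffer = bytearray(size)  # message data lives here, once
        self.queue = deque()           # holds (offset, length) descriptors only


def qio_put(segment, offset, length):
    """Enqueue a descriptor; no payload bytes are duplicated."""
    segment.queue.append((offset, length))


def qio_get(segment):
    """Dequeue a descriptor and return a view onto the same bytes."""
    offset, length = segment.queue.popleft()
    return memoryview(segment.buffer)[offset:offset + length]


def ms_send(payload):
    """Message-system style transfer: the receiver gets its own copy."""
    return bytes(payload)


def demo():
    segment = SharedSegment(64)
    segment.buffer[0:5] = b"hello"
    qio_put(segment, 0, 5)
    shared_view = qio_get(segment)         # refers to the original bytes
    copied = ms_send(segment.buffer[0:5])  # independent duplicate
    segment.buffer[0] = ord("J")           # sender alters the buffer afterwards
    return bytes(shared_view), copied
```

In this sketch the queued view reflects the later change to the buffer while the copied message does not, which is the observable difference between passing a reference through shared memory and copying data between processes.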
Thus, FIG. 1 shows a QIO shared memory system for communicating between processes located on a single processor. A shared memory queuing system increases the speed of operation of communication between processes on a single processor and, thus, increases the overall speed of the system. In addition, a shared memory queuing system frees programmers to implement both vertical modularity and horizontal modularity when defining processes. This increased vertical and horizontal modularity improves the ease of maintenance of processes while still allowing efficient transfer of data between processes on a single processor and between processes and drivers on a single processor.
FIG. 2 illustrates a computer system generally designated as 200. The computer system 200 contains nodes 210, 211, 212 and 213. The nodes 210, 211, 212 and 213 are interconnected by means of a network (TNet) 220. The nodes 210, 211, 212 and 213 run a disk process 230, an application server process 231, an intermediate protocol process 232 and a TCP/IP and ATM driver 233, respectively.
The application server process 231 receives user requests for data and directs the transfer of that data to the user over the TNet 220. The data requested generally resides on disks accessible only via disk controllers such as the disk controller 240. In fact, access to the data on a disk controller is mediated by a particular disk process. Here, the disk process 230 on the node 210 mediates access to the disk controller 240. The disk process 230 is responsible for transferring data to and from the disk attached to the disk controller 240.
With regard to the system 200 of FIG. 2, assume that a multimedia application needs to obtain some large amount of data 260, say, an MPEG video clip, from a data disk. Assume that the application does not need to examine or transform any (or at least a majority) of the individual bytes of that MPEG video clip. The application seeks that data 260 because an end user somewhere on the net has requested that video clip. A user interface and the application server process 231 communicate using an intermediate protocol implemented on TCP/IP. (The user interface may be an application process or may be a hardware device with minimal software. In any event, the user interface is not shown here.) Accordingly, the intermediate protocol information 262 must be added to messages from the application server process 231, and the intermediate protocol process 232 has the responsibility for attaching such header information 262 as the intermediate protocol requires. Likewise, TCP/IP protocol information 263 must then be layered onto the outbound message, and the TCP/IP driver process 233 in the node 213 supplies such TCP/IP headers 263 as the TCP/IP protocol requires. Therefore, to transfer the data 260 on demand from the disk attached to disk controller 240, the application server process 231 employs the disk process 230 to retrieve the data 260 from disk and employs the intermediate protocol and TCP/IP & ATM driver processes 232, 233 to forward the data 260 to the user interface.
Further assume that, among its functions, the application server process 231 attaches some application-specific data 261 at the beginning of the outgoing data 260.
When the application server process 231 recognizes that the disk process 230 mediates access to the data 260 for the requesting user's consumption, the application server process 231 communicates a message to the disk process 230 via the TNet 220 in order to retrieve that data 260.
The disk process 230 builds a command sequence which the disk controller 240 on receipt will interpret as instructions to recover the data of interest. The disk process 230 directs the disk controller 240 to transfer the data 260 into the memory 250 of the sub-processing system 210. The disk controller 240 informs the disk process 230 on successful completion of the directed data transfer.
The disk process 230 in turn responds to the application server process 231 that the data transfer has completed successfully and includes a copy of the data 260 in its response. Thus, the requested data 260 is copied into the application server node 211. As one of ordinary skill in the art will appreciate, several copies may be necessary in order to transfer the data 260 from the TNet driver buffers (not shown) of the application server node 211 into the memory space of the application server process 231. Yet another copy is typically necessary to make the application-specific data 261 contiguous with the disk data 260. The QIO system related above may obviate a number of these intra-processor copies, but it obviates none of the interprocessor copies.
Indeed, the combined data 261, 260 migrates by means of another interprocessor copy from the node 211 to the node 212. The node 212 adds its intermediate protocol header data 262, probably by copying the data 262, 261 and 260 into a single buffer within the memory of the intermediate protocol process 232.
Again, the combined data 262, 261, 260 migrates from the node 212 to the node 213 by means of another interprocessor copy. The TCP/IP process 233 desires to divide the combined data 262, 261, 260 into TCP/IP packet sizes and insert TCP/IP headers 263a, 263b, . . . , 263n at the appropriate points. Accordingly, the TCP/IP process 233 copies all or at least substantially all of the combined data 262, 261, 260 and TCP/IP header data 263a, 263b, . . . , 263n to fracture and reconstruct the data in the correct order in the memory 253. The TCP/IP protocol process 233 then transfers these packets to the ATM controller 270 which sends them out on the wire.
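The copy-based fracturing described above can be sketched as follows. This is an illustration only: the function name packetize_by_copy, the segment size and the four-byte header layout are hypothetical and do not represent actual TCP/IP headers. The sketch assumes that each packet is built by copying a slice of the combined data and prepending a freshly generated header, so that substantially all of the payload is copied once more.

```python
import struct

HEADER_FORMAT = "!HH"  # hypothetical header: sequence number, payload length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def make_header(seq, length):
    """Build a placeholder header; a real TCP/IP header carries far more."""
    return struct.pack(HEADER_FORMAT, seq, length)


def packetize_by_copy(payload, segment_size):
    """Split payload into packets, copying every payload byte.

    Returns the list of packets and the number of payload bytes copied.
    """
    packets = []
    bytes_copied = 0
    for seq, start in enumerate(range(0, len(payload), segment_size)):
        chunk = payload[start:start + segment_size]  # slicing bytes copies
        packets.append(make_header(seq, len(chunk)) + chunk)
        bytes_copied += len(chunk)
    return packets, bytes_copied


def reassemble(packets):
    """Strip the placeholder headers and rejoin the payload."""
    return b"".join(packet[HEADER_SIZE:] for packet in packets)
```

For a ten-byte payload and a segment size of four, the sketch yields three packets and reports ten payload bytes copied, that is, the whole payload passes through memory again solely to interleave the headers.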
(A system designer may wish to separate the processing of layered protocols into separate sub-processing systems for reasons of parallelism, to increase the throughput of the system 200. Such sub-processing systems do not share memory in systems of this type in order to achieve greater fault tolerance and to avoid memory bottlenecks.)
A computer system of this art requires that the disk data 260 be copied five times among the sub-processing systems, and typically an additional 2-4 times within each sub-processing system not practicing QIO as related above. The computer system 200 consumes memory bandwidth at (a minimum of) five times the rate of a system wherein interprocessor copying is not performed. The copying presents a potential bottleneck in the operation of the system 200, wasting I/O bandwidth and memory bandwidth and causing cache misses in the target CPU, all of which reduce performance.
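The cost of this copying can be estimated with a short calculation. The sketch below takes the five interprocessor copies and the 2-4 intra-processor copies per sub-processing system from the description above; the function name bytes_moved, the one-megabyte clip size, and the simplifying assumptions that every copy touches the entire payload and that all four nodes of FIG. 2 perform intra-processor copies are hypothetical.

```python
def bytes_moved(payload_bytes, interprocessor_copies=5,
                intra_copies_per_node=0, nodes=4):
    """Estimate total payload bytes copied for one transfer.

    Assumes each copy moves the whole payload (a simplification).
    """
    total_copies = interprocessor_copies + intra_copies_per_node * nodes
    return payload_bytes * total_copies


clip = 1_000_000  # hypothetical 1 MB video clip

minimum = bytes_moved(clip)                             # interprocessor only
low = bytes_moved(clip, intra_copies_per_node=2)        # plus 2 copies per node
high = bytes_moved(clip, intra_copies_per_node=4)       # plus 4 copies per node
```

Under these assumptions a 1 MB clip causes at least 5 MB of copy traffic, and between 13 MB and 21 MB when none of the four sub-processing systems practices QIO.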
Accordingly, there is a need for a system which avoids interprocessor copying of data, while avoiding shared memory bottlenecks and fault tolerance problems.
A goal of this invention is therefore a computer system which obviates unnecessary copying of data, both intraprocessor and interprocessor.
This and other goals of the invention will be readily apparent to one of ordinary skill in the art on reading the background above and the description below.