1. Field of the Invention
The present invention relates to the field of computer systems. More specifically, the present invention relates to data transfer among devices within a computer system.
2. Art Background
The input/output (I/O) subsystem is an area of operating systems (kernels) that traditionally has received less attention than other areas with respect to increasing speed and efficiency. In the past, I/O devices were slow compared to central processing unit (CPU) and memory speeds, the software demands on the I/O system were modest, and most machines were uniprocessors.
Today, the I/O subsystems of most operating systems have not departed significantly from those older designs, even though many of the basic assumptions on which those designs were built are no longer valid. For example, when an application reads data from a disk file and sends the data that was read to a network device, the system assumes that the application will modify the data before sending it to the network device. This assumption is founded on a data processing paradigm wherein an application receives data as input and modifies it before providing the modified data as output. Therefore, in anticipation of the data being modified, the data to be transferred is copied from the disk file into a buffer where it can be modified. This paradigm does not apply, for example, to a video server application that does not modify stored video data, but instead only directs that the stored video data be sent to a network device unchanged.
The advent of high-speed network media such as 100baseT Ethernet and ATM has served to focus the spotlight on the inefficiencies of existing I/O and buffer management frameworks. Data copying between an application and an operating system (kernel) represents a major overhead in this problem space.
The normal interface for an application to retrieve data from a peripheral device is to issue a system call to read the data. This read interface can be represented generically as: read (descriptor, buf, length), where the application presents a buffer, "buf", of size "length" bytes for the system to fill in from a device/file represented by the descriptor. Similarly, the normal interface used by an application to send data to a peripheral device is to issue a system call to write the data. This write interface can be represented generically as: write (descriptor, buf, length), where the application presents a buffer, "buf", of size "length" bytes to the system to be written out. The application can reuse this buffer as soon as the write call returns. In the case of memory objects, such as files or frame buffers, the application can also map the object into its address space with read/write access. The read/write interface above typically requires a copy from the user buffer to the kernel/device buffer when writing, and a copy from the kernel/device buffer to the user buffer when reading.
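A minimal sketch of this conventional path, in Python on a POSIX system, assuming `os.readv` as a stand-in for read (descriptor, buf, length) because it fills a buffer supplied by the caller. The function name `copy_via_user_buffer` and the 4096-byte chunk size are illustrative choices, not part of the interface described above. Every byte crosses the user/kernel boundary twice: once into `buf` on the read and once out of it on the write.

```python
import os


def copy_via_user_buffer(src_fd: int, dst_fd: int, chunk: int = 4096) -> int:
    """Move all remaining data from src_fd to dst_fd through one user buffer.

    Hypothetical helper: read() copies kernel -> user buffer, and write()
    copies user buffer -> kernel, so each byte is copied twice.
    """
    buf = bytearray(chunk)              # the application-supplied "buf"
    total = 0
    while True:
        n = os.readv(src_fd, [buf])     # kernel fills the caller's buffer
        if n == 0:                      # 0 bytes read means end of file
            break
        view = memoryview(buf)[:n]
        while view:                     # write() may accept fewer bytes
            written = os.write(dst_fd, view)
            view = view[written:]
        total += n                      # buf may be reused after write returns
    return total
```

Reusing `buf` on each iteration reflects the rule that the application may reuse the buffer as soon as the write call returns.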
An improvement on the normal read/write interface is to implement a fast buffer (fbuf) interface. A key aspect of the fbuf interface is the explicit exchange of buffers between the various domains (address spaces) that handle the data. Caching of buffers is implemented to take advantage of locality in interprocess communication. Once data has been exchanged between a set of domains, the buffer used is saved for other data exchanges within the same set of domains.
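The buffer caching described above can be modeled, very loosely, as a free list keyed by the set of domains that exchanged the data. This is a toy sketch: `FbufCache`, its methods and the domain names are hypothetical, and a real fbuf implementation would also keep each cached buffer mapped into the address space of every domain on the path, which plain Python objects cannot show.

```python
from collections import defaultdict


class FbufCache:
    """Hypothetical cache: buffers released after an exchange among a set of
    domains are kept and handed out again for that same set of domains."""

    def __init__(self, size: int = 4096):
        self.size = size
        self._free = defaultdict(list)   # frozenset of domains -> free buffers
        self.allocations = 0             # fresh buffers created (cache misses)

    def get(self, domains) -> bytearray:
        key = frozenset(domains)         # order of domains does not matter
        if self._free[key]:
            return self._free[key].pop() # hit: reuse a buffer from this set
        self.allocations += 1
        return bytearray(self.size)      # miss: allocate a new buffer

    def release(self, domains, buf: bytearray) -> None:
        # Save the buffer for later exchanges within the same set of domains.
        self._free[frozenset(domains)].append(buf)
```

A buffer released by one set of domains is never handed to a different set, so a domain cannot receive a buffer previously mapped only into unrelated domains.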
The fbuf read interface can be represented generically as: read2 (descriptor, buf, length), where the application requests data to be read but does not provide the buffer; instead, the system allocates the buffer. Similarly, the fbuf write interface can be represented generically as: write2 (descriptor, buf, length), where the buffer presented by the application is surrendered to the system before the call returns. An example of an fbuf mechanism appears in: Fbufs: A High-Bandwidth Cross Domain Transfer Facility, Peter Druschel and Larry L. Peterson, 14th ACM Symposium on Operating Systems Principles, 1993.
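The ownership rules of read2 and write2 can be illustrated with a toy model. Everything here (`ToyFbufSystem`, `Fbuf`, the in-memory `sources` table, the descriptor numbers) is hypothetical, and read2 returns the buffer rather than filling an out parameter, which is the idiomatic Python equivalent. The model shows that the system, not the caller, supplies the buffer on read2, and that write2 hands the same memory to the device without copying and revokes the application's access before returning.

```python
class Fbuf:
    """A buffer whose ownership moves between the application and the system."""

    def __init__(self, payload: memoryview):
        self._payload = payload
        self.owned_by_app = True

    def view(self) -> memoryview:
        # After write2() the application may no longer touch the buffer.
        if not self.owned_by_app:
            raise PermissionError("fbuf was surrendered by write2()")
        return self._payload


class ToyFbufSystem:
    """Hypothetical model of the read2/write2 semantics (not a real API)."""

    def __init__(self, sources: dict):
        self._sources = sources        # descriptor -> bytes-like data source
        self.device_queue = []         # (descriptor, memoryview) sent to devices

    def read2(self, descriptor: int, length: int) -> Fbuf:
        # The system supplies the buffer; slicing a memoryview shares memory.
        return Fbuf(memoryview(self._sources[descriptor])[:length])

    def write2(self, descriptor: int, fbuf: Fbuf) -> None:
        # Queue the same memory for the device (no copy), then revoke access
        # so the buffer is surrendered before the call returns.
        self.device_queue.append((descriptor, fbuf.view()))
        fbuf.owned_by_app = False
```

Because the application cannot touch the buffer after write2, the system is free to send that memory directly instead of first copying it to protect it from later changes.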
The advantage of the fbuf interface is that it enables a zero-copy data transfer mechanism when reading from, or writing to, a character stream device (a device that permits input and output of arbitrary strings of characters, examples of which are network interfaces and serial line devices).
A significant disadvantage of the fbuf interface, however, is that it still requires a copy when reading from, or writing to, a disk file. This copy requirement applies, for example, when an application reads data from a disk file and sends the data read to a network device, as a video server does. Using a fast buffer to transfer data from a file to the network still requires copying the data from the kernel file buffer (or a user buffer mapped to the disk file) into the fast buffer to maintain the semantics of the read operation. Thus, the prior art does not address the issues involved in optimizing transfers from disk to the network.
A method and apparatus is described that permits an application to control data transfer from a memory object of a source device such as a disk, tape or network to a sink device such as a disk, tape or network. The application can request that an operating system establish a mapping or association for the purpose of data transfer between a fast buffer and a memory object storing the data. The operating system then establishes the mapping between the fast buffer and the memory object, thereby permitting the application to direct that the data of the memory object be transferred to the sink device. The sink device can use direct memory access (DMA) to the source device to transfer the data from the memory object. Furthermore, if the application modifies a portion of the data of the memory object prior to directing the transfer, only the modified portion of the data is copied to main memory prior to transfer to the sink.
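The rule that only the modified portion is copied can be sketched as page-granular copy-on-write over a mapped memory object. This is an illustration under stated assumptions, not the claimed implementation: `MappedFbuf`, the 4096-byte page size and the `sink` callback are hypothetical, and handing the sink a view of the memory object stands in for a DMA transfer from the source device.

```python
PAGE_SIZE = 4096  # assumed page granularity for copy-on-write


class MappedFbuf:
    """Hypothetical fast buffer mapped onto a memory object (e.g. cached file
    pages). Unmodified pages are never copied; a modification copies only
    the page(s) it touches into private main memory."""

    def __init__(self, memory_object: bytes, page: int = PAGE_SIZE):
        self.obj = memory_object
        self.page = page
        self.private = {}          # page index -> private copy (bytearray)
        self.bytes_copied = 0      # bytes copied into main memory so far

    def modify(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if offset < 0 or end > len(self.obj):
            raise ValueError("modification outside the memory object")
        pos = offset
        while pos < end:
            idx = pos // self.page
            base = idx * self.page
            if idx not in self.private:          # first write: copy this page
                self.private[idx] = bytearray(self.obj[base:base + self.page])
                self.bytes_copied += len(self.private[idx])
            stop = min(end, base + self.page)
            self.private[idx][pos - base:stop - base] = data[pos - offset:stop - offset]
            pos = stop

    def transfer(self, sink) -> None:
        """Send every page to the sink: modified pages from their private
        copy, unmodified pages straight from the memory object."""
        npages = -(-len(self.obj) // self.page)  # ceiling division
        view = memoryview(self.obj)
        for idx in range(npages):
            if idx in self.private:
                sink(memoryview(self.private[idx]))
            else:
                sink(view[idx * self.page:(idx + 1) * self.page])
```

For example, writing 3 bytes at offset 5000 of a 12,388-byte object copies only the one page that contains them (4096 bytes), whereas the prior-art fbuf path copies every byte read from the file into the fast buffer.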
The mechanism described herein allows data to be transferred in a general purpose computing environment from a source to a sink without copying to main memory. This mechanism can operate within a general purpose operating system to provide a general mechanism that interoperates well with different devices, file systems and user applications. The transfer can be performed by whichever part of the system is most efficient for a given source and destination. For example, if the source memory object represents a file and the destination is a network, with both the network adapter and the disk adapter coupled to the same I/O bus, a direct memory access (DMA) transfer can be initiated directly from the disk adapter to the network adapter without consuming any memory bandwidth. Previously, this operation would have required a transfer from the disk to kernel main memory, followed by a transfer from kernel memory to user buffers, then a transfer from the user buffers to the network device buffers, and finally a transfer of the data to the network.
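The mechanism above is not a standard API, but for comparison only, a widely available facility with a related goal is `sendfile(2)`, exposed in Python as `os.sendfile(out_fd, in_fd, offset, count)`. It asks the kernel to move file data to a socket without passing it through a user-space buffer, which removes the kernel-to-user and user-to-kernel copies listed above. Whether it also avoids main memory entirely (adapter-to-adapter DMA) depends on the kernel and hardware, so this sketch should not be read as an implementation of the described mechanism. The helper name `send_file_in_kernel` is hypothetical.

```python
import os
import socket


def send_file_in_kernel(sock: socket.socket, file_fd: int, count: int) -> int:
    """Ask the kernel to move `count` bytes from file_fd to sock with
    os.sendfile, so the payload never enters a user-space buffer."""
    offset = 0
    while offset < count:
        # An explicit offset is passed, so the file position is not used.
        sent = os.sendfile(sock.fileno(), file_fd, offset, count - offset)
        if sent == 0:          # end of file reached before `count` bytes
            break
        offset += sent         # sendfile may transfer fewer bytes than asked
    return offset
```

The loop is needed because a single `sendfile` call may transfer fewer bytes than requested.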