This invention relates generally to storage systems associated with computer systems and more particularly to providing a method for improving the data throughput associated with a storage system during long sequential input/output (I/O) transactions.
As it is known in the art, computer systems generally include a central processing unit, a memory subsystem and a storage subsystem. According to a networked or enterprise model of a computer system, the storage subsystem, which may be associated with or in addition to a local computer system, may include a large number of independent storage devices or disks housed in a single enclosure. This array of storage devices is typically connected to several computers over a network. Such a model allows for the centralization of data which is to be shared among many users and also provides a single point of maintenance for the storage functions associated with computer systems.
One type of storage subsystem known in the art is one which includes a number of redundant disk storage devices configured as an array. Such a system is typically known as a RAID storage system. One of the advantages of a RAID type storage system is that it provides a massive amount of storage (typically in the several gigabyte range) and depending upon the RAID configuration may provide several differing levels of fault tolerance. Fault tolerance is typically achieved by providing, in addition to the disk devices that are used for storing the data, a disk device which is used to store parity data. The parity data may be used in combination with the remaining data on the other disk devices to reconstruct data associated with a failed disk device.
A disk storage system such as the RAID system described above will typically include one or more front end (or host) adapters/controllers which are responsible for receiving and processing requests from the various host devices which may be connected to the storage system. Additionally, a RAID storage system as described above may also include several disk adapters/controllers which are used to control the transactions between the disk storage devices and the host controller/adapter described above. Some storage systems may also include a very large buffer (e.g. a cache memory) for buffering the data transfers between the disk adapters and the host adapters.
In addition, a requirement of typical present day storage systems is that they present (or emulate) a particular storage geometry to the host computer. The geometry includes the configuration of the storage devices (i.e. number of cylinders, heads, sectors per track, etc.). The geometry presented to the host may not be the actual physical configuration of the storage devices in the system. As a result, some level of translation must be carried out between the emulated storage parameters and the physical storage parameters.
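The geometry translation described above can be illustrated by the conventional mapping between cylinder/head/sector (CHS) coordinates and a flat logical block address; the specification does not define a particular mapping, so the following is only an assumed sketch:

```python
def chs_to_lba(cyl: int, head: int, sector: int,
               heads_per_cyl: int, sectors_per_track: int) -> int:
    # Sectors are conventionally numbered from 1, hence the -1.
    return (cyl * heads_per_cyl + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba: int, heads_per_cyl: int, sectors_per_track: int):
    cyl, rem = divmod(lba, heads_per_cyl * sectors_per_track)
    head, sector = divmod(rem, sectors_per_track)
    return cyl, head, sector + 1
```

A controller emulating one geometry over another would apply such a translation in each direction: host CHS requests are flattened to logical addresses, which are then mapped onto the physical devices.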
A common approach to front-end emulation and back-end control is to delegate the two responsibilities to two separate processors. The front-end processor manages everything concerning operation of the front-end (host) controller. That is, it maintains information about the file system, such as the list of logical volumes and logical subdivisions such as cylinders, tracks, etc. The back-end processor manages tasks which are transparent to the host, such as data mirroring, data striping, RAID protection, concurrent copy and others. As such, the back-end processor typically carries much more functionality than the front-end processor.
In a RAID storage system as described above, the storage system (front-end processor) may be connected to the host via a bus which has different physical characteristics (e.g., data throughput, contention time, etc.) than the bus which connects the physical devices to the storage controller (back-end processor). For example, the host adapter may be coupled to the network (i.e., host or requesting systems) via a so called wide SCSI bus operating at speeds which allow the transfer of data at up to 20 megabytes per second. The host adapter is then typically coupled to either a cache or directly to a disk adapter via a communication bus which allows for data transmission rates at least as fast as the wide SCSI bus. The disk adapters, however, may be connected to the associated disk storage devices by a so called narrow SCSI bus which runs at half the speed of a wide SCSI bus (i.e., data rates up to 10 megabytes per second).
It will be appreciated by those of skill in the art that for the configuration described above, the transmission rate mismatch may result in a performance bottleneck during long sequential input/output activity. That is, during a read of large amounts of sequential data, the associated disk device will transmit its data to the disk controller at a rate which is half the speed at which the host controller can transmit the data to the requesting device. Thus the host adapter, and, as a result, the host device, spend an inordinate amount of time waiting for data, thus wasting processing time. Similarly, when a host device needs to write a large amount of sequential data to a particular disk storage device, the host device will be able to transmit the data to the host adapter at a rate which is twice as fast as the rate at which the disk adapter can transmit the data to the storage device. As a result, the host device and its associated bus are stalled while waiting for the disk adapter to transmit the associated data to the disk.
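The rate mismatch above can be made concrete with a small calculation; the function name and the notion of an "idle fraction" are illustrative, not terms from the specification:

```python
def host_idle_fraction(front_rate: float, back_rate: float) -> float:
    """Fraction of time the faster front-end link sits idle when a single,
    slower back-end device feeds it one long sequential stream.
    Rates may be in any common unit (e.g. megabytes per second)."""
    return max(0.0, 1.0 - back_rate / front_rate)
```

With a 20 MB/s front end fed by a 10 MB/s back end, the host-side link is idle half the time, which is precisely the stall the invention seeks to eliminate.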
Two attempts at solving the above problem have included the use of pipelining techniques or so called prefetch mechanisms. During pipelining, for a read operation, the host adapter will begin transferring the associated data to the requesting host device before the entire read has been satisfied from the disk adapter into the associated buffer or cache. The objective of this technique is to tune the data transfer such that the host adapter sends data to the host device in blocks, where one block has been completely transferred to the host device just after the disk adapter has finished placing the next sequential block of data into the cache for transfer. A similar scheme is employed during writes, with data transferred from the host adapter to the disk device.
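The pipelining technique described above can be sketched as a double-buffered loop in which the next block is fetched while the current block is being sent; the following is an illustrative simulation (using a worker thread in place of the adapters' concurrent hardware), not the mechanism of the invention:

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_read(fetch_block, send_block, num_blocks: int) -> None:
    """Overlap disk fetches with host transfers: block k+1 is fetched
    in the background while block k is transmitted to the host."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_block, 0)       # prime the pipeline
        for k in range(num_blocks):
            block = future.result()                # wait for block k
            if k + 1 < num_blocks:
                future = pool.submit(fetch_block, k + 1)  # start next fetch
            send_block(block)                      # transmit block k
```

When the fetch time per block matches the send time per block, each side finishes just as the other's data becomes ready, which is the tuning goal described above.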
The second approach, which utilizes a prefetch scheme, involves the use of a caching algorithm designed to minimize the occurrence of cache misses. To do so, large amounts of data are prefetched from the disk storage device into the cache in the expectation that the prefetched data will be the data subsequently requested by the host device, thus eliminating a second transaction between the disk adapter and the disk storage device to place the data in the cache for transfer to the host device. The drawback to the prefetching scheme is that its effectiveness depends crucially on the size of the cache and assumes that the disk adapter is not continuously busy fulfilling I/O requests.
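The prefetch scheme described above can be sketched as a read-ahead cache that, on a miss, fills the requested block together with a window of following blocks; the class and its parameters are illustrative assumptions, not the prior-art algorithm itself:

```python
class ReadAheadCache:
    """On a cache miss, fetch the requested block plus the next
    `window` sequential blocks from the backing store."""

    def __init__(self, backing: dict, window: int = 4):
        self.backing = backing   # block number -> block contents
        self.window = window
        self.cache = {}

    def read(self, block_no: int) -> bytes:
        if block_no not in self.cache:
            # Miss: prefetch a run of sequential blocks in one pass.
            for b in range(block_no, block_no + self.window + 1):
                if b in self.backing:
                    self.cache[b] = self.backing[b]
        return self.cache[block_no]
```

A subsequent sequential read within the window is served from the cache without a second disk transaction; a non-sequential access pattern, or a cache too small to hold the window, defeats the scheme, which is the drawback noted above.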
It would be advantageous, therefore, to provide a system which allows long sequential I/O transactions between a disk storage device and a host device to occur without the usual bottlenecks associated with such a transfer.
In accordance with the present invention, a method of performing input/output operations between a requesting device and a responding device, where the responding device includes a plurality of storage devices, includes the following steps. In response to receipt of a write access request from a requesting device, where the write access is a write of a block of sequential data, portions of the block of sequential data are stored on each of the plurality of storage devices. In response to receipt of a read access request from the requesting device for the previously stored block of sequential data, an access request is generated to each of the storage devices on which the portions of data are stored. Each of the portions of the block of sequential data is retrieved from each of the plurality of storage devices, the retrieved portions are assembled into the original block of sequential data, and the assembled data is transmitted to the requesting device as a block of sequential data. With such an arrangement, data may be delivered to a requesting device at a constant rate without stalling, even though the communications path to any one of the storage devices is slower than the communication rate of the requesting device.
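The store-and-reassemble steps above can be sketched as round-robin striping of the sequential block across the devices; the function names and chunk size are illustrative, not claim language:

```python
def stripe_write(data: bytes, num_devices: int, chunk: int) -> list[list[bytes]]:
    """Distribute a sequential block across the devices, round-robin,
    in chunk-sized portions."""
    devices = [[] for _ in range(num_devices)]
    for i in range(0, len(data), chunk):
        devices[(i // chunk) % num_devices].append(data[i:i + chunk])
    return devices

def stripe_read(devices: list[list[bytes]]) -> bytes:
    """Retrieve the portions from each device and reassemble them into
    the original sequential block."""
    out = []
    for depth in range(max(len(d) for d in devices)):
        for dev in devices:
            if depth < len(dev):
                out.append(dev[depth])
    return b"".join(out)
```

Because each device need only supply every Nth chunk, N devices on slow back-end links can together feed the faster front-end link without stalls, which is the effect the method claims.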
In accordance with another aspect of the present invention, a storage system is provided which includes a plurality of storage devices each configured to store blocks of data. The storage system further includes a plurality of storage controllers, where each storage controller is coupled respectively to one of the storage devices. Each of the storage controllers is operable to transmit and receive data to and from its corresponding storage device. The storage system further includes a request controller which receives from a requesting device an access request for a sequential block of data. The request controller is coupled to each of the storage controllers and is responsive to the access request for generating an access request to each of the plurality of storage controllers having the portions of data associated with the requested sequential block stored thereon. With such an arrangement, a storage system is provided which allows for increased performance for transactions involving large amounts of sequential data, since the data may be stored and subsequently retrieved from several individual devices rather than retrieved as a stream from a single device.