1. Field of the Invention
The invention relates generally to data storage systems associated with computer systems and, in particular, to a method and system for improving the transfer of data to and from a storage system that has been configured to act as a RAID-1 system by storing two sets of duplicated data and which uses application-specific input/output characteristics.
2. Description of the Related Art
During the last decade, the amount of data to be processed, stored and accessed in everyday business operations by certain industries, such as banks, financial and insurance institutions, automobile manufacturers and airlines, and in particular the number of data access requests, has exploded. These vastly increased data processing needs have spurred the creation of new systems for storing and accessing data, for example, Redundant Arrays of Inexpensive Disks (RAID) and Storage Area Networks (SAN), as well as the development of faster computer-to-storage device interface technologies and protocol standards, such as Fibre Channel and Small Computer System Interface (SCSI), by which to improve the rate of data transfer, i.e., data throughput.
Accompanying this explosion in data processing needs has been a concomitant increase in the number of patents related to improving the performance of storing and accessing data using the new storage system technologies. Indeed, a simple search for United States patents shows that there have been at least 50 patents this year alone that relate to arrayed memory in a computer. For example, U.S. Pat. No. 6,076,143 to Blumenau discloses a method that manages the writing of data blocks to be accessed on a disc drive storage system whereby a data block is written to one disc at a different physical sector address than the address at which the same data block is written on a redundant disc. Also, U.S. Pat. No. 5,974,502 to DeKoning et al. provides a method for improving throughput between the host computer and an array of disk drives by splitting up large I/O requests from the computer into smaller, more manageable pieces and processing these pieces as if they were individual I/O requests. Further, U.S. Pat. No. 5,787,463 to Gajjar introduces a dual-ported staging memory between the RAID engine and the host bus so that both the host and the RAID engine can concurrently access such staging memory, thereby allowing asynchronous memory operation.
These patents seek to improve data transfer or throughput in a disc drive storage system largely by focusing on upgrading the performance, sequencing, or timing of the storage hardware. However, there is also another approach to improving data throughput, by taking into consideration the kinds of data accessed, the kinds of application software used to input the data as well as the kinds of data requests processed. These are especially important considerations in the industries named above, inasmuch as in these and many other industrial contexts, data is processed in two very different yet predictable ways: first, decision support system processing and second, transaction processing.
The conflicting behaviors of Transaction Processing (TP) Applications and Decision Support System (DSS) Applications have caused the replication of data to flourish and have created huge costs and latencies in the effort to speed up the storing and accessing of data. For example, users of DSS Applications in a banking context are usually either requesting reports or performing complex arithmetic operations that involve reading out from storage disks a long and huge stream of data, which typically requires the disk head to move sequentially around the platter between sectors that are more or less adjacent to each other. On the other hand, users of TP Applications are usually writing or requesting short blocks of data that are not written or read sequentially but are stored or accessed across platter tracks in a manner that typically requires the disk head to "skip" all over the platter.
Conflicts inevitably arise when one disk head is called upon both to read out long streams of sequentially-stored data and to read and write short bursts of non-sequentially-stored data. In short, in responding to processing requests from both TP software and DSS software, the disk heads will be working at cross purposes, which implies that the physical data path from disk to storage cannot be shared for processing requests from these two kinds of software.
Because DSS software typically reads data sequentially, TP software generally does not allow real-time access to the DSS system, in order not to negatively impact business performance. Due to the disparity between how DSS applications and TP applications store and access data, users in the data warehouse, datamart and data mining lines of a business, who typically use the DSS software, have had to create copies of the "real-time" data in order to crunch or report on them. This need to duplicate data within an enterprise in order to have them available for different processing needs has in turn created a massive sub-industry of copy management and has fostered data bandwidth and CPU capacity obstacles.
To solve the problems inherent in required data duplication due to different processing needs, an enterprise can rely on a storage system that either has two sets of disk heads or that has been configured to operate as if there are two sets of disk heads. A storage system that stores duplicate data is a RAID-1 engine, which is an array of paired storage devices. A storage system that does not actually comprise a RAID-1 engine may nevertheless be configured to store duplicate sets of mirrored data and so operate as if it were a RAID-1 configuration.
The present invention provides a method of accessing and storing data in a memory system communicating with one or more computers generating read and write requests. The memory system comprises a controller, a memory cache for temporarily storing data, and a pairwise-redundant direct access storage device comprising an A-DASD and a B-DASD. The memory cache comprises an A-cache and a B-cache. The B-cache is a read-ahead cache of data read from B-DASD.
One embodiment of a method of the present invention comprises the steps of providing an A-interface and a B-interface to the memory system, configuring transaction processing applications on a computer communicating with the memory system to direct read and write requests to the A-interface, configuring decision support system applications on a computer communicating with the memory system to direct read and write requests to the B-interface, fulfilling write requests received at the A-interface by writing data to the A-cache, and fulfilling write requests received at the B-interface by writing data to the A-cache.
The method also comprises the steps of fulfilling read requests received at the A-interface by reading data from the A-cache whenever it contains the requested data or else reading data from the A-DASD, fulfilling read requests received at the B-interface by reading data from the B-cache whenever it contains the requested data or else reading data from the B-DASD, writing data that has not yet been committed to A-DASD from the A-cache to the A-DASD whenever the A-DASD is not fulfilling a read request, and writing data that has not yet been committed to B-DASD from the A-cache to the B-DASD whenever the B-DASD is not fulfilling a read request. The average time for fulfilling read requests is improved over that of a corresponding memory system using a RAID-1 controller.
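The routing described in the steps above can be sketched as a small toy model. This is an illustrative sketch only, not the claimed implementation: the class name `MirroredStore`, its method names, the use of Python dictionaries to stand in for the caches and the DASDs, and the choice to drop a stale B-cache entry when its record is committed to B-DASD are all assumptions introduced for the example.

```python
class MirroredStore:
    """Toy model: dicts stand in for the A-cache, B-cache, A-DASD and B-DASD."""

    def __init__(self):
        self.a_cache = {}       # address -> data (written data and A-side reads)
        self.b_cache = {}       # address -> data read from B-DASD
        self.a_dasd = {}
        self.b_dasd = {}
        self.pending_a = set()  # addresses not yet committed to A-DASD
        self.pending_b = set()  # addresses not yet committed to B-DASD

    def write_a(self, address, data):
        # Writes received at the A-interface go to the A-cache.
        self.a_cache[address] = data
        self.pending_a.add(address)
        self.pending_b.add(address)

    def write_b(self, address, data):
        # Writes received at the B-interface are also written to the A-cache.
        self.write_a(address, data)

    def read_a(self, address):
        # A-interface reads: A-cache first, otherwise A-DASD.
        if address in self.a_cache:
            return self.a_cache[address]
        return self.a_dasd.get(address)

    def read_b(self, address):
        # B-interface reads: B-cache first, otherwise B-DASD.
        if address in self.b_cache:
            return self.b_cache[address]
        data = self.b_dasd.get(address)
        if data is not None:
            self.b_cache[address] = data
        return data

    def flush(self, a_reading, b_reading):
        # Commit pending records to each DASD only while it is not being read.
        if not a_reading:
            for address in sorted(self.pending_a):
                self.a_dasd[address] = self.a_cache[address]
            self.pending_a.clear()
        if not b_reading:
            for address in sorted(self.pending_b):
                self.b_dasd[address] = self.a_cache[address]
                self.b_cache.pop(address, None)  # assumption: drop stale copy
            self.pending_b.clear()
```

In this sketch a record written through either interface is visible immediately to A-interface reads, but becomes visible to B-interface reads only after a `flush` call made while B-DASD is idle.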
Another embodiment of a method of the present invention further comprises the steps of interrupting, whenever A-cache becomes full, the flow of data at A-interface and B-interface, including any read operation from B-DASD, and writing to B-DASD records in A-cache that are changed but not yet committed to B-DASD. The records are written in a preferential sequence: first, those records which are logically in read sequence before the current reading position of B-DASD, and then, if additional records must be written in order to generate sufficient space in A-cache, those records which are logically in read sequence after the current reading position of B-DASD and are most distant from it, until space in A-cache has been freed. This embodiment then allows the flow of data at A-interface and B-interface to resume, including any interrupted read operation from B-DASD. The likelihood is thereby minimized that the data read from B-DASD in a resumed read operation was changed from the corresponding data before the write operation to B-DASD records in A-cache was performed.
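The preferential write sequence of this embodiment can be illustrated with a short selection function. It is a sketch under stated assumptions rather than the claimed method: the function name `records_to_flush` and its parameters are hypothetical, record addresses are modeled as integers ordered in read sequence, each written record is assumed to free one slot in A-cache, and a record located exactly at the reading position is treated as not yet read.

```python
def records_to_flush(uncommitted, read_position, records_needed):
    """Return addresses to write to B-DASD, in priority order.

    uncommitted    -- addresses of A-cache records not yet committed to B-DASD
    read_position  -- current reading position of the interrupted B-DASD read
    records_needed -- number of records that must be written to free space
    """
    # First priority: records the sequential read has already passed.
    behind = sorted(a for a in uncommitted if a < read_position)
    # Second priority: records still ahead of the read, farthest first.
    ahead = sorted((a for a in uncommitted if a >= read_position), reverse=True)
    return (behind + ahead)[:records_needed]
```

For example, with uncommitted records at addresses 5, 12, 40, 60 and 90, a reading position of 30, and three records needed, the function selects 5 and 12 (already passed by the read) and then 90 (the record farthest ahead), leaving the records the read will reach soonest untouched.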
A further embodiment of the method comprises either of the above embodiments wherein A-cache contains the records in the memory cache that have been generated from input from A-interface and the records that have been read from A-DASD and wherein B-cache contains the records in the memory cache that have been read from B-DASD.
The present invention also provides a data structure for an A-cache in a memory system that comprises a pairwise-redundant direct access storage device having an A-DASD and a B-DASD. The data structure comprises a plurality of records in a rapidly accessible cache memory. Each record comprises an entry comprising one or more fields, which correspond to an address on the direct access storage device, a flag indicating whether the record in the memory cache has been changed by new input since being committed to A-DASD or B-DASD, a flag indicating whether the record has been committed to A-DASD, a flag indicating whether the record has been committed to B-DASD, and a data field.
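One possible in-memory layout of such a record is sketched below. The class name `ACacheRecord`, the field names, and the example address fields (cylinder, head, sector) are hypothetical. The rule that the changed flag is cleared once the record has been committed to both A-DASD and B-DASD is an assumption made for the example; the text does not specify when the flag is reset.

```python
from dataclasses import dataclass


@dataclass
class ACacheRecord:
    address: tuple             # one or more address fields, e.g. (cylinder, head, sector)
    data: bytes                # the data field
    changed: bool = False      # changed by new input since last commit
    committed_a: bool = False  # committed to A-DASD
    committed_b: bool = False  # committed to B-DASD

    def update(self, new_data):
        """Record new input: the data is now newer than either DASD copy."""
        self.data = new_data
        self.changed = True
        self.committed_a = False
        self.committed_b = False

    def mark_committed(self, target):
        """Mark the record as written to 'A' (A-DASD) or 'B' (B-DASD)."""
        if target == "A":
            self.committed_a = True
        elif target == "B":
            self.committed_b = True
        else:
            raise ValueError("target must be 'A' or 'B'")
        # Assumption: the record counts as unchanged once both copies match.
        if self.committed_a and self.committed_b:
            self.changed = False
```

Keeping separate commit flags for the two devices lets a controller write a record to A-DASD promptly while deferring the matching write to B-DASD.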
The present invention also provides an improvement to a memory system that communicates with one or more computers, which generate read and write requests. The memory system comprises a controller, a memory cache for temporarily storing data, and a mirroring direct access storage device comprising an A-DASD and a B-DASD. The improvement comprises an A-interface receiving read and write requests generated by transaction processing software running on a computer, a B-interface receiving read and write requests generated by decision support software running on a computer, the B-interface being configured to send write requests to the A-interface, an A-cache, to which is sent all read and write requests received by the A-interface, a B-cache, to which is sent all read requests received by the B-interface, and a controller programmed to cause the changed contents of the A-cache to be written to the A-DASD when the A-DASD is not being read from and to be written to the B-DASD when the B-DASD is not being read from. The average time for fulfilling read requests is improved over that of a corresponding memory system using a RAID-1 controller.
The present invention provides another embodiment of the above improvement to a memory system, wherein the controller is also programmed to interrupt the flow of data from the A-interface and write to the B-DASD when the A-cache is full. The present invention provides a still further embodiment of the previous improvement, wherein the controller is programmed to write to the B-DASD in a sequence that minimizes the likelihood that an interrupted long sequential read being performed on B-DASD will, when resumed, read data that was changed during the write that was programmed.
The present invention provides an improved RAID-1 controller for an A-DASD and a B-DASD that also comprises programming resident in the memory of the controller. The programming provides for an A-interface and a B-interface at which read and write requests may be received and executed. The programming directs write requests received at the B-interface to the A-interface, operates an A-cache that receives data from the A-interface which is requested to be written to storage and causes the data to be written immediately to the A-DASD when not otherwise occupied and, to the extent permitted by the availability of cache memory, avoids writing to the B-DASD until completion of a long sequential read therefrom. The A-interface is thereby optimized to process read and write requests for shorter blocks of data and the B-interface is optimized to process read requests for relatively longer blocks of data.
Another embodiment of the present invention of a RAID-1 controller provides that the controller operates the A-cache so as to comprise a plurality of records which indicate whether or not the data in each such record has been committed to A-DASD and whether or not the data in each such record has been committed to B-DASD.
The present invention provides a computer system having a data storage system with improved throughput, wherein a read request from decision support application software generally accesses a long sequence of data blocks and a read or write request from transaction processing application software generally accesses non-sequentially read or written data blocks. The system comprises a host computer and a storage subsystem, between which data blocks are transferred. The storage subsystem comprises a storage device, a memory cache for temporarily storing data blocks being transferred between the host computer and the storage device, and a pairwise-redundant disk configuration of the storage device whereby the configuration provides for creating a redundant pair of data sets. Industry-standard protocols are used for interfacing the storage subsystem with the host computer.
The storage subsystem also comprises a controller for the storage device that configures the storage device whereby a data block is stored twice, into a first and a second of a pair of storage sets for the purpose of storing the data in a pairwise-redundant manner, stores data from non-sequentially written data blocks into the first storage set while and as responding to a processing command from a decision support system software to read out sequentially-read data blocks and stores data from non-sequentially written data blocks into the memory cache while and as responding to a read command from the decision support system software to read out sequentially-read data blocks, so long as the storage capacity of the memory cache has not been reached. Further, so long as the storage capacity of the memory cache has not been reached, the storage subsystem transfers the set of data blocks stored therein into the second storage set upon completion of processing a read request from decision support system software to read out sequentially-read data blocks. When the storage capacity of the memory cache has been reached, the storage system interrupts the processing of a read request from decision support system application software to read out sequentially-read data blocks by transferring the set of data blocks stored in cache memory into the second storage set.
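The controller's handling of the second storage set can be summarized as a three-way decision. The following is a minimal sketch, assuming the controller tracks the number of data blocks held in the memory cache; the names `Action` and `second_set_action` and the block-count measure of cache capacity are hypothetical.

```python
from enum import Enum


class Action(Enum):
    DEFER = "keep blocks in cache"
    FLUSH = "transfer cached blocks to the second storage set"
    INTERRUPT_AND_FLUSH = "interrupt the sequential read, then transfer"


def second_set_action(read_in_progress, cached_blocks, cache_capacity):
    """Decide what to do with cached blocks destined for the second set."""
    if cached_blocks == 0:
        return Action.DEFER                # nothing to transfer
    if not read_in_progress:
        return Action.FLUSH                # sequential read has completed
    if cached_blocks >= cache_capacity:
        return Action.INTERRUPT_AND_FLUSH  # cache capacity reached
    return Action.DEFER                    # protect the ongoing read
```

Deferring for as long as the cache has room keeps the head of the second storage set on its sequential path; the interruption occurs only when the cache capacity has been reached.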
Alternate embodiments of a computer system of the present invention use industry standard protocols that may comprise Fibre Channel standards, SCSI standards, IDE/ATA standards, and PCI standards.
An alternate embodiment of the system comprises a plurality of storage devices and a storage device controller that further configures an even number of storage devices so that there are discrete pairs of storage devices whereon the same sets of data blocks are stored twice, as a first storage set and a second storage set in a pair, creating pairwise-redundant sets of data. The controller stores a set of non-sequentially written data blocks into the first storage set while and as responding to a processing command from decision support system application software to read out sequentially-read data blocks. So long as cache memory has not been exceeded, the controller transfers the set of data blocks stored therein into the second storage device upon completion of processing a request from decision support system application software to read out sequentially-read data blocks. When cache memory is full, the controller interrupts the processing of a request from decision support system application software to read out sequentially-read data blocks by transferring the set of data blocks from cache memory into the second storage device.
An alternate embodiment of the system comprises storage devices that include at least one pair of RAID disk drives, a pair of storage devices within a storage area network, or a pair of CD-ROMs. An alternate embodiment of the system comprises a storage device controller that uses industry standard protocols comprising Fibre Channel standards, SCSI standards, IDE/ATA standards, PCI standards, or Internet Protocol standards.
The present invention also provides a machine readable medium containing executable code, which optimizes the read-write throughput of a programmed general purpose computer comprising a memory system of the present invention by directing write requests and non-sequential read requests to the A-interface and which directs sequential read requests to the B-interface.
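The routing rule performed by such executable code can be sketched in a few lines. The `Request` type, its `sequential` hint, and the function name `choose_interface` are hypothetical; how a request comes to be classified as sequential is not specified in the text and is assumed here to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    op: str           # "read" or "write"
    sequential: bool  # assumed hint: True for a long sequential access


def choose_interface(request):
    """Return 'A' or 'B' according to the routing rule described above."""
    if request.op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    # Only sequential reads go to the B-interface; all writes and
    # non-sequential reads go to the A-interface.
    if request.op == "read" and request.sequential:
        return "B"
    return "A"
```

Under this rule a sequential write is still directed to the A-interface, since all write requests are directed there.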