1. Field of the Invention
This invention relates generally to computer storage, and more particularly to load balancing techniques in multi-path storage systems.
2. Description of the Related Art
Computer storage systems, such as disk drive systems, have grown enormously in both size and sophistication in recent years. These systems typically include many large storage units controlled by a complex multi-tasking controller. Large scale computer storage systems generally can receive commands from a large number of host computers and can control a large number of mass storage elements, each capable of storing in excess of several gigabytes of data.
FIG. 1 is an illustration showing a prior art computer storage system 100. The prior art computer storage system 100 includes computer systems 102, 104, and 106, and workstations 108 and 110 all coupled to a local area network 112. The computer systems 102, 104, and 106 are also in communication with storage devices 114 via a storage area network 116. Generally, the computer systems 102, 104, and 106 can be any computer operated by users, such as PCs, Macintosh, or Sun Workstations. The storage devices can be any device capable of providing mass electronic storage, such as disk drives, tape libraries, CDs, or RAID systems.
Often, the storage area network 116 is an Arbitrated Loop. However, the storage area network 116 can be any storage area network capable of providing communication between the computer systems 102, 104, and 106, and the computer storage devices 114. Another typical storage area network is a Fabric/switched storage area network, wherein the storage area network 116 comprises several nodes, each capable of forwarding data packets to a requested destination.
In use, the computer systems 102, 104, and 106 transmit data to the storage devices 114 via the storage area network 116. The storage devices 114 then record the transmitted data on a recording medium using whatever apparatus is appropriate for the particular medium being used. Generally the conventional computer storage system 100 operates satisfactorily until a failure occurs, which often results in data loss that can have catastrophic side effects.
It is more than an inconvenience to the user when the computer storage system 100 goes "down" or off-line, even when the problem can be corrected relatively quickly, such as within hours. The resulting lost time adversely affects not only system throughput performance, but also user application performance. Further, the user is often not concerned whether it is a physical disk drive or its controller that fails, or whether there is no drive response because too much data has been sent down the data path to the storage device. It is the inconvenience and failure of the system as a whole that causes user difficulties.
As the systems grow in complexity, it is increasingly less desirable to have interrupting failures at either the device or the controller level. Further, storage devices, such as disks, may be limited by the number of access or retrieval streams capable of accessing the files on the storage devices. When a large number of users demand access to files on a disk, their access may require the use of all available retrieval streams to the disk, and a condition known as disk bottlenecking will occur. Essentially, with a bottlenecked disk, no bandwidth is available to service any further user requests from that disk. Disk bottlenecking, sometimes referred to as load imbalancing, typically occurs when user demand for access to files on a disk is greater than the number of retrieval streams capable of accessing the files. When disk bottlenecking occurs, some of the users desiring access to the files on the disk are unable to access the desired files because no retrieval streams are available.
In view of the foregoing, there is a need for a method that can provide load balancing for storage devices. The method should have the capability to automatically detect failures and load imbalances and act to address the failure or imbalance in a manner that is transparent to the user. The method should be capable of increasing system reliability while not interfering with the productivity of the user.
Broadly speaking, the present invention fills these needs by providing an intelligent load balancing method that balances traffic among data paths providing access to a specific I/O device. In addition to load balancing, the embodiments of the present invention further detect data path errors and modify data path selection based on both I/O traffic and data path status, such as connection state. In one embodiment, a method is disclosed for intelligent load balancing in a multi-path computer system. Initially, an input/output (I/O) request to access a computer I/O device is intercepted. Then, to properly balance data path traffic, the number of I/O requests that have been sent along each data path of a plurality of data paths providing access to the computer I/O device is detected, and a failure probability is calculated for each data path based on the number of I/O requests that have been sent along that data path. A data path is then selected that has a failure probability lower than the failure probability of the other data paths of the plurality of data paths, and the computer I/O device is accessed using the selected data path.
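By way of illustration only, the selection step of the method summarized above can be sketched in Python. The sketch is not the disclosed implementation. The summary does not give a formula for the failure probability, so the estimate used here (a data path's share of all requests sent so far, with a disconnected path treated as certain to fail) is an assumption, and the names DataPath, failure_probability, select_path, and send_request are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataPath:
    name: str
    connected: bool = True   # data path status (connection state)
    requests_sent: int = 0   # I/O requests already sent along this path


def failure_probability(path, total_requests):
    """Assumed estimate: the path's share of all requests sent so far."""
    if not path.connected:
        return 1.0  # a disconnected path is treated as certain to fail
    # Always below 1.0, so a connected path beats a disconnected one.
    return path.requests_sent / (total_requests + 1)


def select_path(paths):
    """Return the path with the lowest estimated failure probability."""
    total = sum(p.requests_sent for p in paths)
    return min(paths, key=lambda p: failure_probability(p, total))


def send_request(paths):
    """Handle one intercepted I/O request: pick a path, record the send."""
    path = select_path(paths)
    if not path.connected:
        raise RuntimeError("no connected data path")
    path.requests_sent += 1
    return path
```

Under this assumed estimate, a lightly used connected path is always preferred over a heavily used one, and a disconnected path is chosen only when no connected path remains, in which case the sketch raises an error rather than sending the request.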
In another embodiment, a system for providing intelligent load balancing in a multi-path computer system is disclosed. The system includes a processor and a computer I/O device in communication with the processor via a plurality of data paths. Further included in the system is a failover filter driver that is in communication with the plurality of data paths. The failover filter driver can determine the number of I/O requests previously sent using each data path in the plurality of data paths. In addition, the failover filter driver can perform load balancing by selecting a particular data path from the plurality of data paths based on the number of I/O requests sent using each data path.
In a further embodiment, a failover filter driver for providing intelligent load balancing in a multi-path computer system is disclosed. The failover filter driver includes an intercept code module that intercepts I/O requests to a computer I/O device from an operating system. In addition, a manual-select code module is included that selects a data path from a plurality of data paths to the computer I/O device based on data path information provided from a requesting computer application. Further, an auto-select code module is included that selects a data path from the plurality of data paths based on the number of I/O requests previously sent along each data path in the plurality of data paths.
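As a rough sketch of how the three code modules described above might fit together, the following Python class models them as methods. It is an illustrative user-space model and not actual driver code. The class and method names are hypothetical, and the auto-select rule shown (choose the data path with the fewest previously sent requests) is one assumed way of selecting based on the number of I/O requests previously sent.

```python
class FailoverFilterDriver:
    """Toy model of the three code modules; all names are hypothetical."""

    def __init__(self, path_names):
        # Number of I/O requests previously sent along each data path.
        self.sent = {name: 0 for name in path_names}

    def manual_select(self, requested_path):
        """Use the data path named by the requesting application."""
        if requested_path not in self.sent:
            raise KeyError(f"unknown data path: {requested_path}")
        return requested_path

    def auto_select(self):
        """Assumed rule: pick the path with the fewest requests sent."""
        return min(self.sent, key=self.sent.get)

    def intercept(self, request, requested_path=None):
        """Intercept an I/O request and route it along a selected path."""
        if requested_path is not None:
            path = self.manual_select(requested_path)
        else:
            path = self.auto_select()
        self.sent[path] += 1
        return path, request
```

In this model, an application that supplies data path information is routed through the manual-select method, while any other request falls through to the auto-select method. Both kinds of request update the same per-path counts, so manually routed traffic is taken into account by later automatic selections.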
Advantageously, the embodiments of the present invention provide intelligent load balancing in multi-path computer systems, which results in greatly increased reliability. Since multiple data paths to a particular storage device can increase the reliability of successful I/O requests to the device if properly balanced, the ability to automatically detect failures and perform load balancing among the data paths greatly increases system reliability. Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.