Redundant Array of Inexpensive Disks (RAID) storage systems are the predominant form of data storage in computer systems today that require high-performance and/or high-availability data storage. Such systems are used in applications such as transaction processing, banking, medical applications, e-commerce, database applications, internet applications, mail servers, and scientific computing. A RAID system typically includes a number of physically independent disk drives coupled to a RAID controller. The RAID controller is a device that interfaces to a group of physical disk drives and presents them as a single logical disk drive (or multiple logical disk drives) to a computer operating system. RAID controllers employ the techniques of data striping and data redundancy to increase performance and data availability.
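As an illustration of data striping only, the following minimal sketch maps a logical block address to a physical drive and a block on that drive. It assumes a simple RAID 0-style layout with no redundancy; the function name `locate_block` and the parameters `num_drives` and `stripe_unit_blocks` are hypothetical and are not taken from any particular controller.

```python
def locate_block(logical_block, num_drives, stripe_unit_blocks):
    """Map a logical block to (drive index, block on that drive).

    Assumes a plain striped layout: consecutive stripe units are placed
    on consecutive drives, wrapping around after the last drive.
    """
    stripe_unit = logical_block // stripe_unit_blocks      # which stripe unit overall
    offset_in_unit = logical_block % stripe_unit_blocks    # position inside that unit
    drive = stripe_unit % num_drives                       # drive holding the unit
    row = stripe_unit // num_drives                        # stripe (row) on that drive
    return drive, row * stripe_unit_blocks + offset_in_unit
```

With four drives and a stripe unit of 128 blocks, logical blocks 0 through 127 fall on the first drive, blocks 128 through 255 on the second, and so on, so a large sequential transfer is spread across all drives.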
Data storage subsystems are commonly used to store data that can be accessed either randomly or sequentially, depending on the characteristics of the requesting application program that runs on a host computer. For example, transaction processing or OLTP programs tend to access data in a random fashion, whereas a video server tends to access data in a sequential fashion. Although hard disk drive storage devices can be used to store either random or sequential data, access efficiency usually differs between random and sequential workloads.
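The distinction between sequential and random access can be made concrete with a minimal sketch. It assumes each host request is described by a starting logical block address and a block count, and it treats a request as sequential when it begins exactly where the previous request ended. The function name `sequential_fraction` and this particular test are illustrative assumptions, not a description of how any specific controller classifies workloads.

```python
def sequential_fraction(requests):
    """Return the fraction of requests that directly follow their predecessor.

    requests: list of (start_lba, block_count) tuples in arrival order.
    """
    if len(requests) < 2:
        return 0.0
    sequential = 0
    for (prev_lba, prev_len), (lba, _) in zip(requests, requests[1:]):
        # Sequential if this request starts at the block after the previous one ends.
        if lba == prev_lba + prev_len:
            sequential += 1
    return sequential / (len(requests) - 1)
```

A video-server-like stream such as `[(0, 128), (128, 128), (256, 128)]` yields a fraction of 1.0, while a stream of scattered small requests, as an OLTP program might issue, yields a fraction near zero.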
Hard disk drives are well known in the storage art, and have various latencies specific to their technology. There is rotational latency, which is the time it takes for the accessed disk platter to rotate such that the data to be accessed is beneath the magnetic head. There is also seek latency, which is the time it takes for the disk drive magnetic head to slew radially to a position where the data to be accessed is beneath the magnetic head. In addition, there are latencies associated with disk drive electronics and firmware to process incoming commands, manage the onboard disk drive cache memory, and send appropriate positioning commands to electromechanical mechanisms. The combination of the various latencies determines the data access time from incoming command to data processing completion (whether read or write).
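The combination of latencies can be approximated with simple arithmetic. The sketch below assumes the access time is the sum of seek latency, average rotational latency, and a fixed electronics and firmware overhead, and it uses the standard approximation that average rotational latency is the time for half a revolution. The function names and any input values are hypothetical and are chosen for illustration only.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency: time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def access_time_ms(seek_ms, rpm, overhead_ms):
    """Approximate access time as seek + average rotational latency + overhead."""
    return seek_ms + average_rotational_latency_ms(rpm) + overhead_ms
```

For a drive spinning at 7200 RPM, one revolution takes about 8.33 ms, so the average rotational latency is about 4.17 ms. A sequential access that needs little or no seek avoids most of the seek term, which is one reason access efficiency differs between the two workload types.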
Furthermore, hard disk drive devices have onboard buffers of varying size that cache read and write data. Storage controllers manage these buffers via queue depth and I/O size parameters. The maximum number of concurrent I/Os is the number of read or write commands that a disk drive can process simultaneously using onboard memory. This maximum depends on the drive technology and manufacturer, and is a function of disk cache buffer size and disk drive design. It ranges from a minimum of one to a present-day maximum of 16 to 32 or higher. I/O size is usually highly variable, and can range from a single 512-byte block to a megabyte or more. Storage controllers manage the number of concurrent I/Os to each storage device. Based on empirical testing, the number of allowed concurrent I/Os to an individual storage device is generally lower than the maximum number of concurrent I/Os the device supports. This number of allowed concurrent I/Os is called the queue depth. Sequential workloads are generally optimized by utilizing a low queue depth and a large I/O size, while random workloads are generally optimized by utilizing a higher queue depth and a small I/O size.
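The relationship between workload type, queue depth, and I/O size can be sketched as a pair of per-device settings. The specific values below (a queue depth of 2 with 1 MB I/Os for sequential access, a queue depth of 16 with 4 KB I/Os for random access) are illustrative assumptions only, as are the names `DeviceProfile` and `profile_for`; actual values would be determined by empirical testing of a given drive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    queue_depth: int      # allowed concurrent I/Os to one storage device
    io_size_bytes: int    # size of each I/O issued to the device


# Hypothetical starting points, not measured values.
SEQUENTIAL = DeviceProfile(queue_depth=2, io_size_bytes=1024 * 1024)
RANDOM = DeviceProfile(queue_depth=16, io_size_bytes=4096)


def profile_for(workload, max_concurrent_ios):
    """Pick settings for a workload, never exceeding what the drive supports."""
    if workload == "sequential":
        base = SEQUENTIAL
    elif workload == "random":
        base = RANDOM
    else:
        raise ValueError(f"unknown workload: {workload!r}")
    # Queue depth must be at least one and at most the drive's maximum.
    depth = max(1, min(base.queue_depth, max_concurrent_ios))
    return DeviceProfile(depth, base.io_size_bytes)
```

For example, `profile_for("random", 8)` caps the queue depth at 8 for a drive that supports at most eight concurrent I/Os, while keeping the small I/O size.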
Some storage controllers are designed to operate in an entirely sequential or entirely random environment. They are set up to provide efficient access in a given mode at all times, without regard to a changing workload that may alternate between random and sequential access. Such controllers may work well for their intended purpose, and are outside the scope of this invention. Many, if not most, storage controllers, on the other hand, are designed and intended to be used for general storage requirements, where the workload is unknown and may be a combination of sequential and random access. There are several approaches to dealing with an unknown workload.
A first approach is to optimize the controller for a sequential workload, where sequential accesses are handled efficiently and with minimal latency, and random accesses are handled relatively inefficiently and with correspondingly higher latencies. This works well if all or most host accesses are sequential in nature and any random accesses that occur are allowed to be inefficient. However, if the workload changes to predominantly random host requests, or if random accesses must also be handled efficiently, this scheme will not work well, as it is efficient only for sequential accesses.
A second approach is to optimize the controller for a random workload, where random accesses are handled efficiently and with minimal latency, and sequential accesses are handled relatively inefficiently and with correspondingly higher latencies or lower bandwidth. This works well if all or most host accesses are random in nature and any sequential accesses that occur are allowed to be inefficient. However, if the workload changes to predominantly sequential host requests, or if sequential accesses must also be handled efficiently, this scheme will not work well, as it is efficient only for random accesses.
A third approach is to optimize a controller for neither sequential nor random workloads, but rather for a compromise between, or average of, sequential and random workloads. This approach favors neither sequential nor random accesses, and instead spreads the inefficiency across both workloads. While this yields improved performance for mixed sequential and random workloads over the other two approaches, it does not handle either sequential or random workloads as efficiently as possible.
Accordingly, it would be advantageous to have a storage controller that automatically and dynamically optimizes operations to the individual physical storage devices that form the RAID arrays, according to the actual sequential or random workload.