A computer system typically comprises a central processing unit coupled to a number of other modules, including memory, storage devices, a mouse, a keyboard, printers, and scanners. Some or all of these modules may reside within a single computer case. Alternatively, some modules may be external to a case holding the central processing unit. Input/Output (I/O) refers to the transfer of data to and from one or more of these modules.
Input/Output (I/O) schedulers are a component of a computer operating system and typically reside between a core kernel layer/file system layer and various hardware drivers for supporting hardware coupled to the computer. Such hardware typically includes hard disk drives, floppy disk drives, optical disk drives, printers, and scanners. I/O schedulers receive job requests from upper layers of the operating system and determine the order, number, and fashion in which those jobs are submitted to underlying hardware drivers. A job is an operation to be performed by the I/O scheduler as a result of a request submitted by a process. A process is an application program in execution. In the context of I/O and I/O schedulers, a process is a uniquely identifiable entity submitting I/O jobs to the hardware. The hardware drivers perform the actual I/O tasks.
FIG. 1 is a schematic block diagram representation 100 of the relationship between an I/O scheduler and other computer system components. Applications 110 occupy a highest layer and provide job requests to an operating system kernel and file system 120. The operating system kernel and file system 120 submits job requests to an I/O scheduler 130, which in turn distributes the job requests among disk drivers 140. Information relating to the processing of a job is returned from the disk drivers 140 to the applications 110, via the I/O scheduler 130 and the operating system kernel and file system 120. Such information may include how much data has been written to a disk drive or read from a disk drive.
I/O schedulers determine and monitor the nature of the workload being undertaken by a computer system by collecting heuristics on job requests submitted. A read is an I/O job requested by an executing application through the operating system kernel to a particular piece of storage hardware. The read job requests particular data to be read from the hardware. A write is an I/O job requested by an executing application through the operating system kernel to a particular piece of storage hardware. The write job requests particular data to be written to the storage hardware.
Typical heuristics monitored include the number of reads and writes submitted, the proportion of reads to writes, the average time taken for processes to submit subsequent jobs once an initial job completes, I/O throughput, and disk utilization. The I/O schedulers use the collected heuristics to fine-tune the scheduling process.
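The following is a minimal sketch of how some of these heuristics might be accumulated. It is not taken from any particular scheduler; the class name WorkloadHeuristics, its fields, and its methods are hypothetical names chosen for illustration, and only the read/write counts, the read-to-write proportion, and the average time between job completion and the next submission are modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkloadHeuristics:
    """Running counters for a stream of I/O job requests (illustrative only)."""
    reads: int = 0
    writes: int = 0
    # Time, in seconds, between a job completing and the same process
    # submitting its next job.
    think_times: List[float] = field(default_factory=list)

    def record_job(self, kind: str) -> None:
        if kind == "read":
            self.reads += 1
        elif kind == "write":
            self.writes += 1
        else:
            raise ValueError(f"unknown job kind: {kind!r}")

    def record_think_time(self, seconds: float) -> None:
        self.think_times.append(seconds)

    def read_write_ratio(self) -> Optional[float]:
        # Undefined until at least one write has been seen.
        return self.reads / self.writes if self.writes else None

    def mean_think_time(self) -> Optional[float]:
        if not self.think_times:
            return None
        return sum(self.think_times) / len(self.think_times)


# Example: three reads, one write, and two observed think times.
stats = WorkloadHeuristics()
for kind in ["read", "read", "write", "read"]:
    stats.record_job(kind)
stats.record_think_time(0.002)
stats.record_think_time(0.004)
```

Throughput and disk utilization are omitted from the sketch because they depend on timing information returned by the hardware drivers.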
FIG. 2 is a schematic block diagram representation 200 of the collection of heuristics in the computer system of FIG. 1. An operating system module 210 submits job requests to an I/O scheduler module 220. The I/O scheduler module 220 distributes the job requests among hardware drivers 230. The hardware drivers 230 perform the jobs and return information relating to those jobs to the I/O scheduler module 220. The I/O scheduler module 220 collects heuristics pertaining to the job requests received from the operating system module 210 and feeds those heuristics back into its own operation, using them to fine-tune its operating and performance parameters.
One goal of an I/O scheduler is to increase throughput and disk utilization. Another goal of an I/O scheduler is to provide a degree of fairness among various competing processes that submit jobs. There are a number of known scheduling algorithms that seek to achieve these goals. Such scheduling algorithms include First In First Out (FIFO) (also known as First Come First Served (FCFS)), Shortest Positioning Time First (SPTF), Anticipatory Scheduler (AS), Deadline scheduler, and the Fairness Queue scheduler.
The FIFO scheduler handles job requests on a first-in first-out basis. Thus, the first job received by the FIFO scheduler is the first job to be passed to a device driver.
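The first-in first-out behaviour described above can be sketched as a simple queue. The class name FifoScheduler and the method names submit and next_job are hypothetical and chosen only for this illustration.

```python
from collections import deque


class FifoScheduler:
    """Dispatches jobs strictly in arrival order (illustrative sketch)."""

    def __init__(self):
        self._queue = deque()

    def submit(self, job):
        # New jobs always join the back of the queue.
        self._queue.append(job)

    def next_job(self):
        # The oldest job is the next one passed to the device driver.
        return self._queue.popleft() if self._queue else None


fifo = FifoScheduler()
for job in ["job-A", "job-B", "job-C"]:
    fifo.submit(job)
dispatch_order = [fifo.next_job() for _ in range(3)]
```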
The anticipatory scheduler attempts to anticipate successive job requests from a process. When a job has been performed by a device driver and the completed job has been returned to the requesting process, the anticipatory scheduler waits for a predetermined time in anticipation of another request from the same process. This property of the anticipatory scheduler is advantageous when a process is performing a synchronous read from a disk and a large proportion of the data being read has spatial locality on the disk. Anticipating a further job in the same location avoids the disk seek overhead that would otherwise be incurred. A description of this algorithm is provided in Iyer, S. and Druschel, P., “Anticipatory scheduling: A disk scheduling framework to overcome deceptive idleness in synchronous I/O”, 18th Association for Computing Machinery Symposium on Operating Systems Principles (SOSP 2001). The anticipatory scheduling algorithm has recently been implemented for the Linux 2.6 kernel.
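The waiting decision described above might be sketched as follows. This is a simplified illustration, not the algorithm of the cited paper or of the Linux implementation: the function name anticipatory_pick, the representation of a job as a (process id, sector) pair, the 6 ms window, and the fall-back to the oldest pending job are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Job = Tuple[str, int]  # (process id, disk sector); hypothetical representation


def anticipatory_pick(pending: List[Job], last_pid: str,
                      elapsed: float, wait_window: float) -> Optional[Job]:
    """Return the job to dispatch now, or None to keep the disk idle."""
    # Prefer a new request from the process whose job has just completed.
    for job in pending:
        if job[0] == last_pid:
            return job
    # No such request yet: stay idle while the anticipation window is open.
    if elapsed < wait_window:
        return None
    # Window expired: fall back to the oldest pending job, if any.
    return pending[0] if pending else None


WINDOW = 0.006  # assumed anticipation window, in seconds

# Process "A" has just completed a job; only "B" has a request pending.
still_waiting = anticipatory_pick([("B", 9000)], "A", 0.001, WINDOW)
# "A" submits a nearby request before the window closes.
picked = anticipatory_pick([("B", 9000), ("A", 101)], "A", 0.003, WINDOW)
# "A" never submits again, so "B" is served once the window expires.
after_expiry = anticipatory_pick([("B", 9000)], "A", 0.010, WINDOW)
```

In the third case, process "B" is served only after the full window has elapsed, which is the cost the scheduler pays when the anticipated request never arrives.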
The Fairness Queue algorithm services each process having I/O job requests in a round-robin fashion to ensure that each process is treated fairly. The Fairness Queue algorithm is suited to situations in which many processes submit I/O jobs, without any pattern to the job requests.
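A round-robin policy of this kind can be sketched with one queue per process. The class name FairnessQueueScheduler and its methods are hypothetical, and serving exactly one job per process per turn is an assumption made to keep the example short.

```python
from collections import OrderedDict, deque


class FairnessQueueScheduler:
    """Serves one job per process per turn, in round-robin order (sketch)."""

    def __init__(self):
        # Maps process id -> queue of that process's pending jobs.
        # Dictionary order is the round-robin order.
        self._queues = OrderedDict()

    def submit(self, pid, job):
        self._queues.setdefault(pid, deque()).append(job)

    def next_job(self):
        if not self._queues:
            return None
        # Take the process at the front of the rotation.
        pid, queue = self._queues.popitem(last=False)
        job = queue.popleft()
        if queue:
            # The process still has work, so it rejoins at the back.
            self._queues[pid] = queue
        return pid, job


fq = FairnessQueueScheduler()
for pid, job in [("A", "a1"), ("A", "a2"), ("A", "a3"),
                 ("B", "b1"), ("C", "c1"), ("C", "c2")]:
    fq.submit(pid, job)
service_order = [fq.next_job()[1] for _ in range(6)]
```

Although process "A" submitted its three jobs first, they are interleaved with the jobs of processes "B" and "C" rather than served back to back.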
The deadline scheduler attempts to pick the closest job, in disk spatial locality terms, to the current job. However, the deadline scheduler also maintains a certain priority for each job request process. The deadline scheduler reduces disk seek time, whilst ensuring that there is an upper limit to the latency that any job request can experience.
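One way to combine nearest-job selection with a bound on latency is sketched below. The text above does not specify the mechanism, so the sketch assumes that each request carries an expiry time and that an expired request pre-empts the nearest-job choice; the names Request and deadline_pick are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    name: str
    sector: int
    deadline: float  # assumed: absolute time by which the job should start


def deadline_pick(pending: List[Request], head: int,
                  now: float) -> Optional[Request]:
    """Pick the nearest job unless some job has exceeded its deadline."""
    if not pending:
        return None
    expired = [r for r in pending if r.deadline <= now]
    if expired:
        # Bound latency: serve the most overdue request first.
        return min(expired, key=lambda r: r.deadline)
    # Otherwise minimise seek distance from the current head position.
    return min(pending, key=lambda r: abs(r.sector - head))


pending = [Request("far", 9000, deadline=0.5),
           Request("near", 120, deadline=2.0)]
early_choice = deadline_pick(pending, head=100, now=0.1)
late_choice = deadline_pick(pending, head=100, now=0.6)
```

Before any deadline passes, the request closest to the head is chosen; once the distant request becomes overdue, it is served regardless of its position.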
Due to the nature, variable frequency and competing demands of I/O jobs, it is difficult, if not impossible, for an I/O scheduler using any given scheduling algorithm to attain the desired goals of maximum throughput and disk utilisation whilst maintaining fairness among competing processes. For example, if a first process is performing numerous and extended, or heavy, reads from a secondary storage device and an associated I/O scheduler is implementing a Fairness Queue algorithm, reads from the secondary storage device appear to be stalled, because the scheduler treats the reading process as just another process and allocates jobs in a round-robin fashion. An anticipatory scheduler, which waits for a process to submit job requests in the same spatial locality, provides better performance in such a scenario.
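The effect can be illustrated with a rough calculation of disk head movement. The sector numbers below are invented for the example, and total head movement is used only as a crude stand-in for seek cost; the function name total_head_movement is hypothetical.

```python
from typing import Iterable


def total_head_movement(sectors: Iterable[int], start: int) -> int:
    """Sum of absolute head displacements when serving sectors in order."""
    position = start
    moved = 0
    for sector in sectors:
        moved += abs(sector - position)
        position = sector
    return moved


# A sequential reader (sectors 100-103) competing with two other processes.
# Round-robin service interleaves the reader's jobs with distant ones.
interleaved = [100, 9000, 5000, 101, 9001, 5001, 102, 103]
# Serving each process's neighbouring requests together avoids most seeks.
grouped = [100, 101, 102, 103, 5000, 5001, 9000, 9001]

interleaved_cost = total_head_movement(interleaved, start=100)
grouped_cost = total_head_movement(grouped, start=100)
```

With these made-up values the interleaved order moves the head roughly four times as far as the grouped order, which is consistent with the stalled behaviour described above for the sequential reader.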
In a different example, in which numerous processes submit I/O job requests, such as in a web server, the anticipatory scheduler may appear to be biased towards a particular process. In such a scenario, the anticipatory scheduler waits for a further job request from a first process that submitted an initial job request, even if the first process has no further job request to submit. Consequently, other processes wait unnecessarily. Accordingly, the anticipatory scheduler might not be the optimal scheduler for the given circumstances. In such a scenario, a deadline or Fairness Queue algorithm may be a better choice.
Attempts to address every situation within a single scheduler usually result in a compromised implementation of any algorithm. The system fails to deliver the potential throughput capabilities of the system hardware. Utilising the wrong algorithms for given workloads typically results in large latencies, irrespective of the speed of the processor, amount of memory on board, or other system parameters. Accordingly, if a system is implemented with a single type of scheduler, then that system should be dedicated to a certain kind of workload to provide acceptable performance.
Some operating systems provide multiple I/O schedulers. Applications select an I/O scheduler, based on the nature of the workload. Thus, application designers must be able to predict the nature of I/O workload that the applications are likely to generate. As the nature of the workload is typically dynamic and quite variable, an I/O scheduler selected by an application may, in fact, result in degraded performance if circumstances are different from those predicted by the application designer. Whilst a predicted scheduler is suited to an application in isolation, the same scheduler can be detrimental to that application in a particular computing system having many other competing processes producing similar or different workloads. The overall workload seen by the I/O scheduler is the net result of all the workloads generated by the competing processes. Consequently, the result produced by the predicted scheduler may not be optimal for that application in a given computer system.
Another approach provides a system administrator with an interface for selecting an I/O scheduler. The interface may be implemented using one or more configurable system parameters. Whilst such an implementation provides a system administrator with the flexibility to change schedulers dynamically, the system administrator must monitor the nature of workloads being undertaken by the system. Further, in the scenario in which each I/O scheduler is implemented on a per-disk basis, as is common, the system administrator must attempt to monitor the nature of the workload on each disk in the system at all times to optimise the scheduling of tasks.
In each of the approaches described above, there is significant dependency on external interaction to select an appropriate I/O scheduler. Thus, a need exists to provide a method of selecting an I/O scheduler that is not dependent on applications or system administrators.