The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices. However, even today's most sophisticated computer systems continue to include many of the basic features that were present in some of the first computer systems. One such feature is a computer system's use of a program to control its actions. These computer programs are a collection of instructions or "jobs" that the computer performs in response to various commands. Because of this, the performance of a computer system is directly related to the speed at which it can process its jobs.
A computer performs a wide variety of jobs during the execution of a program. In single-threaded computer systems, a job is executed sequentially until the job is completed. In contrast, parallel-processing, multi-tasking, or multi-threaded computer systems divide the job into independent tasks and can execute multiple tasks simultaneously. The multiple tasks can be executed using multiple central processing units (CPUs) that run in parallel, or by providing software that keeps several tasks (or threads) active simultaneously by switching back and forth between them. By executing multiple tasks simultaneously, a parallel-processing computer is able to execute program code and perform tasks more efficiently than a single-threaded computer system. The performance increase of a multi-tasking computer system becomes even more dramatic as multiple processors (i.e., CPUs) are added to provide concurrent processing.
In some cases simultaneous tasks will have conflicting resource needs. For example, in a parallel-processing system, two simultaneous tasks may each wish to access different data on the same storage device. If the data storage device can only handle one task request at a time, the second task will have to wait for the first to finish. This in effect makes the parallel-processing computer behave as a single-threaded computer for these tasks, serializing their execution and thereby degrading the performance of the computer system.
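The serialization described above can be sketched in a few lines. The following is a minimal illustration, not any particular system's implementation: a single lock stands in for a storage device that services one request at a time, and two nominally parallel tasks contend for it, so their accesses execute one after the other.

```python
import threading
import time

# A hypothetical device that can service only one request at a time,
# modeled here as a lock.
arm_lock = threading.Lock()

def read_task(name, log):
    # Both tasks contend for the same device; the second must wait on
    # the lock, so the "parallel" reads actually run back to back.
    with arm_lock:
        log.append(f"{name} start")
        time.sleep(0.01)  # simulate the device servicing the request
        log.append(f"{name} end")

log = []
t1 = threading.Thread(target=read_task, args=("task1", log))
t2 = threading.Thread(target=read_task, args=("task2", log))
t1.start()
t2.start()
t1.join()
t2.join()
# The log shows one task's start/end pair completing before the
# other task begins, i.e., the conflicting tasks were serialized.
```

Because the lock is held across the whole access, the log always contains one task's complete start/end pair before the other's, regardless of scheduling order.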
One particular area where parallel processing systems commonly have resource conflicts is in file read operations. Computer files are typically stored on direct access storage devices, called DASDs. The most common example of a DASD is a hard disk drive, but any storage device that allows direct access to the data it contains may be classified as a DASD. Each DASD accesses its data through one or more access means, typically referred to as DASD arms, because a typical disk drive uses an arm to read data off its surface. For the discussion herein, it is assumed that a DASD has a single arm to access its data, realizing that other DASDs may have multiple arms.
A large computer system typically uses multiple DASDs in a parallel arrangement to store data and program files. Because of this, computer files are generally spread out across multiple DASDs, resulting in different file portions being stored under different arms. With a file spread out across multiple DASDs (and hence under multiple arms), a parallel-processing computer is able to read the file in parallel fashion, with simultaneous tasks reading data from different DASDs (and hence, different arms) at the same time. This results in faster and more efficient access for the computer system. If, however, parallel tasks try to access data under the same arm at the same time, one task must wait for the other task to finish before it can proceed. This is called arm contention; it causes the read tasks to become serialized instead of executing in parallel, and read performance is degraded.
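The parallel read described above can be sketched as follows. This is a hedged illustration, not a description of any particular system: the `dasds` mapping is a hypothetical striped layout in which each device (arm) holds a different portion of the file, and one task per arm reads its portion, so no two tasks contend for the same arm.

```python
import concurrent.futures

# Hypothetical striped layout: the file's blocks alternate across
# three DASDs, so each arm holds a different portion of the file.
dasds = {
    "dasd0": [b"block0", b"block3"],
    "dasd1": [b"block1", b"block4"],
    "dasd2": [b"block2", b"block5"],
}

def read_arm(blocks):
    # One task reads everything under a single arm; with exactly one
    # task per DASD there is no arm contention, so the per-arm reads
    # can proceed concurrently.
    return b"".join(blocks)

with concurrent.futures.ThreadPoolExecutor() as pool:
    parts = list(pool.map(read_arm, dasds.values()))
# parts now holds one contiguous chunk per arm, read in parallel.
```

With one task per arm, the number of concurrent reads equals the number of DASDs the file spans, which is the best case the round-robin approach below fails to guarantee when the layout is skewed.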
In prior art methods for executing read jobs in parallel-processing computer systems, a round-robin approach was used to assign the various tasks to read portions of the file. This involves simply running through the file and assigning its portions to tasks in straight sequential order. This approach is inefficient when the file is not evenly distributed across DASDs. In particular, where the file distribution is heavily skewed toward a few DASDs, the probability of arm contention is increased, especially at the end of the read job, where it is more likely that the various tasks will each be attempting to access data under the same arm (i.e., on the same DASD). As a result, tasks that could theoretically run in parallel must wait in line for sequential access to the DASD, creating a bottleneck in the reading of the file from the DASDs.
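The prior-art round-robin assignment and its weakness under a skewed layout can be sketched as follows. The extent list and DASD names are hypothetical; the point is only that sequential assignment ignores which arm each portion sits under, so with a skewed layout several tasks end up reading the same DASD at the same time.

```python
def round_robin_assign(extents, num_tasks):
    """Assign file extents (each tagged with the DASD holding it) to
    tasks in straight sequential order, as in the prior-art approach."""
    assignments = [[] for _ in range(num_tasks)]
    for i, extent in enumerate(extents):
        assignments[i % num_tasks].append(extent)
    return assignments

# A skewed layout: six of the eight extents sit on dasd0.
extents = ["dasd0"] * 6 + ["dasd1", "dasd2"]
tasks = round_robin_assign(extents, 4)
# Round-robin gives every task a dasd0 extent as its first read, so
# all four tasks immediately contend for dasd0's single arm and are
# serialized, even though dasd1 and dasd2 sit idle.
```

An arm-aware assignment would instead group extents by DASD (as in the parallel read above) so that at most one task works under each arm at a time.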
As described above, the known methods of parallel processing suffer from drawbacks. Without improved methods and apparatus for allocating simultaneous tasks to different DASD arms during a file read operation, arm contention will continue to be an impediment to system performance.