In general, a large scale application such as numerical fluid mechanics requires a computer or processor with a performance exceeding several hundred GFLOPS. Such high performance is realized by parallel processing, that is, by dividing the processing among a large number of computers or processors so as to speed up the processing as a whole.
On the other hand, a memory configuration suited to the large scale application is also necessary. Such memory configurations may be classified into the following types of parallel systems.
(A) Central Memory Type
This type, as shown in FIG. 1A, is configured by a plurality of processors (described as CPUs in the figure) 1 having equivalent functions and one memory space (meaning a shared memory and hereinafter simply referred to as a “memory”) 3 connected to the CPUs 1 through an integration network 2. The CPUs 1 are usually provided with cache memories 4.
(B) Distributed Shared Memory Type
This type, as shown in FIG. 1B, has the CPUs 1 connected through the integration network 2 to dispersed local memories 3A to 3N (A to N are any natural numbers). The local memories 3A to 3N can be logically accessed as shared memory. In this example, illustration of the cache memories in the CPUs 1 is omitted.
(C) Distributed Memory Type
This type, as shown in FIG. 1C, has a plurality of system elements 4 comprising CPUs 1 and memories 3 integrated by an integration network 2. Access to the data in the memories 3 of the different system elements 4 is realized by communication through the integration network 2. In this example as well, illustration of the cache memories in the CPUs 1 is omitted.
As explained above, high end computers (distributed parallel computer systems) 10 in the field of science and technology are configured by large numbers of CPUs (tens to thousands of CPUs) and giant memories (logical/physical). FIG. 2 shows an example of a job execution environment in such a high end computer. In a high end computer, a plurality of parallel computer systems 5 are connected in parallel by an integration network 2. Each parallel computer system 5 includes a CPU 1 and a memory 3. Further, the integration network 2 has a disk device 7 such as a hard disk device, a magnetic disk device, or an optical disk device connected to it. The actual data, programs, etc. are stored in this disk device 7.
When a server 8 of the high end computer 10 is given a job 6 from a user, the server 8 determines the parallel computer system 5 to be made to execute the job 6. The job 6 is sometimes executed in one parallel computer system 5 and sometimes executed over a plurality of parallel computer systems 5. In this way, a high end computer 10 is an environment wherein a variety of large and small jobs are executed mixed together.
In such a job execution environment, operating technology is sought which makes efficient use of the resources (CPUs and memories) of the large scale system (a high system operating rate) and realizes stable service to end users (jobs), namely a guaranteed job execution time, low fluctuation in execution time, and a fine job control function.
In a conventional environment where a plurality of jobs operate in a computer system, in order to process a certain job within a set time, the practice has been to predict the ending time of the job from its amount of processing at the present point of time and to control this job with priority over other jobs so as to bring its predicted ending time close to the desired ending time. Further, when predicting the ending time of a certain job, a method has been adopted of taking into consideration fluctuation of the load of the computer system, such as the amount of input/output of data, in addition to the usage times of the CPUs (see for example Japanese Unexamined Patent Publication (Kokai) No. 5-265775).
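By way of illustration only, the conventional practice described above may be sketched as follows. This is a minimal sketch assuming that the job progresses at a constant rate, so that the ending time can be extrapolated linearly; the function names and parameters are hypothetical and do not appear in the cited publication.

```python
def predict_end_time(now, started_at, work_done, work_total):
    """Extrapolate the ending time of a job from its progress so far.

    Assumption (not from the cited publication): the job keeps the
    average processing rate it has shown between started_at and now.
    All times are in seconds; work is in arbitrary units.
    """
    elapsed = now - started_at
    if elapsed <= 0 or work_done <= 0:
        raise ValueError("no progress observed yet; cannot extrapolate")
    remaining = work_total - work_done
    # remaining / rate, where rate = work_done / elapsed
    return now + remaining * elapsed / work_done


def needs_priority(predicted_end, desired_end):
    """True when the job should be controlled with priority over others."""
    return predicted_end > desired_end


# Hypothetical job: 40 of 100 units done after 100 s, desired end at 200 s.
predicted = predict_end_time(now=100.0, started_at=0.0,
                             work_done=40.0, work_total=100.0)
boost = needs_priority(predicted, desired_end=200.0)
```

Such an extrapolation is only as good as the assumption of a constant processing rate.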
Further, the usage time of a CPU expresses the performance only in the case of ideal execution. Therefore, when predicting the performance of a CPU in an environment wherein a plurality of jobs are executed, the practice has been to find the total CPU usage time for each job by adding the actual CPU usage time of the job (charged time) and the time during which the CPU was used on behalf of the job but which is not added to the charged time (see for example Japanese Unexamined Patent Publication (Kokai) No. 5-289891). In other words, this method predicts the performance of a CPU in an environment where a plurality of jobs are executed from a total CPU usage time obtained by adding, to the CPU usage time for the job, the usage time of the CPU for services carried out by a system daemon for executing the job.
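The addition described above may be sketched as follows. This is a minimal sketch under stated assumptions; the record type, its field names, and the numerical values are hypothetical and serve only to show which two times are summed.

```python
from dataclasses import dataclass


@dataclass
class JobCpuRecord:
    """CPU usage of one job, in seconds (hypothetical record layout)."""
    charged_s: float    # CPU time used by the job itself (charged time)
    uncharged_s: float  # CPU time used by system daemons on the job's behalf

    @property
    def total_s(self) -> float:
        # Total CPU usage time = charged time + time not added to the charge
        return self.charged_s + self.uncharged_s


# Illustrative values only
record = JobCpuRecord(charged_s=120.0, uncharged_s=15.0)
total = record.total_s
```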
In both of the technologies described in Japanese Unexamined Patent Publication (Kokai) No. 5-265775 and Japanese Unexamined Patent Publication (Kokai) No. 5-289891, however, the CPU usage time of a job itself was considered constant (ideal) under all environments. Deviation therefore occurred in the prediction when predicting the ending time of a certain job or when predicting the performance of a CPU in an environment where a plurality of jobs are executed. The cause of this prediction deviation will be explained below.
In principle, when executing the same program a plurality of times or when simultaneously executing a plurality of jobs, the elapsed time from the start of execution of the processing to the end is regarded as fluctuating every time, while the usage time of the CPU is regarded as constant every time. However, the CPU usage time is constant every time only in an ideal computer. In an actual computer, the memory access performance is not constant every time, so the CPU usage time also fluctuates every time.
For example, when a CPU has a cache memory, the time for reading data, and therefore the memory access performance, differs depending on whether the data to be read out next exists in the cache memory (a cache hit) or does not exist in the cache memory and has to be newly read out from a shared memory etc. (a cache miss). Further, in the case of a computer system having a plurality of system boards, the memory access performance of the CPU for a job to be processed differs between a case where the CPU and the memory accessed by it are arranged on the same system board and a case where they are arranged on different system boards. Further, when a plurality of jobs or parallel jobs are being executed by sharing a memory, the memory access wait time of a certain job due to memory competition fluctuates depending upon the state of the memory access load of every job, so the CPU usage time dynamically fluctuates every time.
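The influence of the memory access performance on the CPU usage time may be illustrated by the following sketch. It assumes a simple model in which the CPU usage time is the sum of a fixed computation time and the time spent on memory accesses at an average latency weighted by the cache hit rate; the function name, the model, and all latency values are hypothetical and chosen only for illustration.

```python
def cpu_time_s(compute_s, accesses, hit_rate, t_hit_ns, t_miss_ns):
    """CPU usage time of a job under a simple memory access model.

    compute_s : time spent on computation alone (the "ideal" part), seconds
    accesses  : number of memory accesses made by the job
    hit_rate  : fraction of accesses served from the cache memory (0 to 1)
    t_hit_ns  : latency of a cache hit, nanoseconds
    t_miss_ns : latency of a cache miss, nanoseconds
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    avg_ns = hit_rate * t_hit_ns + (1.0 - hit_rate) * t_miss_ns
    return compute_s + accesses * avg_ns * 1e-9


# The same job (10 s of computation, 1e9 accesses, 95% hit rate) run twice.
# Run 1: misses are served by memory on the same system board.
same_board = cpu_time_s(10.0, 1_000_000_000, 0.95, 2.0, 100.0)
# Run 2: misses are served by memory on a different system board.
other_board = cpu_time_s(10.0, 1_000_000_000, 0.95, 2.0, 300.0)
```

Under this model, the two runs of an identical job yield different CPU usage times solely because of the difference in the miss latency. A change in the hit rate or an added wait time due to memory competition would have a similar effect.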
In this way, the CPU usage time of a job itself is constant only in the ideal case and in practice fluctuates every time, a fact which the prior art does not take into consideration. Therefore, when predicting the ending time of a certain job or when predicting the performance of a CPU in an environment wherein a plurality of jobs are executed, there were the problem of deviation occurring in the prediction and the problem that the CPU usage time was not correctly charged.