The present invention relates generally to high-speed data processing by a computer equipped with a cache memory. More particularly, the invention is concerned with a data processing method and a data processing system of an internally reconstitutable or changeable structure type in which the program executing method or mode can be changed in correspondence to the parallelistic feature (referred to as the parallelism) inherent in the programs to be executed.
In general, there are known two types of implementation or structurization methods in conjunction with a data processing system which includes a plurality of processor elements, wherein the individual processor elements execute instructions in parallel with one another. A system to which the first structurization method is applied is a multiprocessor type computer system, while a system to which the second structurization method is applied is a super-scalar type computer system.
In the case of the multiprocessor type computer (or data processing) system, a plurality of processor elements operate by sharing memory equipment (a main memory) or a cache memory. In this connection, typical cache memory structuring schemes in the multiprocessor computer system are disclosed in JP-A-56-127261 and James R. Goodman: "USING CACHE MEMORY TO REDUCE PROCESSOR-MEMORY TRAFFIC", The 10th Annual International Symposium on COMPUTER ARCHITECTURE, Vol. 11, No. 3, Jun. 13-17, 1983. In this system, the individual processor elements execute distinct programs independent of one another.
The multiprocessor computer system is certainly suited for execution of parallel processing on a task basis. However, no consideration is paid to the parallel processing of a plurality of instructions contained in a program. Moreover, there arises a problem concerning data consistency between or among a plurality of caches when the amount of data shared by the processors increases. By way of example, invalidation of cached data due to data writing occurs frequently, which disadvantageously lowers the cache hit ratio.
In contrast, in the case of the super-scalar computer system, a plurality of (or n, n being an integer) processor elements execute a plurality of instructions existing in a program in parallel and in synchronism, whereby parallel execution can be carried out at a fine level of granularity. As one type of the super-scalar computer system, there is known a VLIW (Very Long Instruction Word) computer system. Typical examples of the multiprocessor system, the super-scalar system and the VLIW computer system are, respectively, disclosed in the literature mentioned below:
S. Thakkar et al.: "The Balance Multiprocessor System", IEEE MICRO, 1988. 2, pp. 57-69;
K. Murakami et al.: "SIMP (Single Instruction stream/Multiple instruction Pipelining): A Novel High-Speed Single-Processor Architecture", Proc. 16th International Symposium on Computer Architecture, 1989, pp. 78-85; and
H. Hagiwara et al.: "A Dynamically Micro-programmable computer with Low-Level Parallelism", IEEE Trans. C-29, No. 7, 1980, pp. 577-595.
In conjunction with the super-scalar computer system, it is noted that even when hardware is implemented by using n (n being an integer) processor elements for executing instructions in parallel, there may occur such a situation that only a limited number of the processor elements implemented in hardware can be operated unless parallelism is found in the program to be executed. In an extreme case, when a program exhibiting no parallelism at all among the successive instructions is to be executed, only one of the n processor elements operates in reality. The remaining (n-1) processor elements are thereby rendered useless. Assuming that n programs of this sort exist and that the time taken for executing each of these programs is given by Si (1 ≤ i ≤ n), then the time required for execution of all the programs amounts to ΣSi (where Σ represents the total sum over i = 1 to n).
In contrast, when the programs of the above-mentioned type are executed by using the multiprocessor computer system, execution may often be realized very effectively. More specifically, so long as no problem arises even when the order or sequence of execution of the individual programs is changed (such as in the case of execution of n different user programs, as encountered frequently in routine work), it is possible to execute the programs with n processor elements independent of one another. In that case, the execution of all the programs should ideally be completed within the longest of the times required for execution of the individual programs. In practice, however, the time taken for completing execution of all the programs will exceed this longest time because of possible conflict or competition for a cache memory and/or a main memory among the processor elements. Nevertheless, for programs of this type, the multiprocessor computer system is more advantageous than the super-scalar computer system.
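The comparison made in the two preceding paragraphs can be illustrated by a minimal sketch. The function names and the sample execution times below are hypothetical and chosen only for illustration; the sketch assumes programs with no internal parallelism and ignores the cache and main memory contention mentioned above.

```python
def superscalar_total_time(times):
    """Time to run all programs one after another on a super-scalar
    system when no program exhibits internal parallelism: the sum of Si."""
    return sum(times)


def multiprocessor_ideal_time(times):
    """Ideal time to run all programs on a multiprocessor system with one
    processor element per program: the longest Si. Contention for the
    cache or main memory is ignored, so this is a lower bound."""
    return max(times)


# Hypothetical execution times S1..S4 (arbitrary time units).
S = [4.0, 7.0, 3.0, 6.0]

print(superscalar_total_time(S))     # sum of all Si
print(multiprocessor_ideal_time(S))  # longest single Si
```

With these sample values the sequential total is 20.0 time units, whereas the ideal multiprocessor completion time is 7.0 time units; in practice the latter would be exceeded because of memory contention.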
It should, however, be noted that the multiprocessor computer system may encounter an unfavorable situation. By way of example, there may be mentioned environmental conditions under which a plurality of programs are not allowed to be executed simultaneously, and applications where the number of processor elements is greater than that of the programs to be executed. More specifically, in the case of the multiprocessor computer system, only a degree of parallelism that corresponds to the number of programs to be executed can be realized. Accordingly, when the number of the programs is represented by p and the number of the processor elements is represented by n (n being an integer), (n-p) processor elements remain useless in case the number n of the processor elements is greater than the number p of the programs. In contrast, in the super-scalar computer system, the processor elements can effectively be protected against being rendered useless even when the number of programs is smaller than the number of processor elements, because parallelism may be found even in a single program.
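The count of useless processor elements described above can be sketched as follows. The function names are hypothetical, and the parameter `parallel_instructions` (the number of instructions in a single program that can be issued together) is an assumption introduced here as a simple model of instruction-level parallelism; it does not appear in the description above.

```python
def idle_elements_multiprocessor(n, p):
    """Processor elements left useless on a multiprocessor system with n
    elements and p programs: (n - p) when n > p, otherwise none."""
    return max(n - p, 0)


def idle_elements_superscalar(n, parallel_instructions):
    """Processor elements left useless on a super-scalar system running a
    single program, under the assumed model that at most
    `parallel_instructions` instructions can execute together."""
    return n - min(n, parallel_instructions)


# Hypothetical configuration: 8 processor elements, 3 programs.
print(idle_elements_multiprocessor(8, 3))
# Hypothetical single program offering 6 independent instructions at a time.
print(idle_elements_superscalar(8, 6))
```

For this hypothetical configuration, five processor elements remain useless in the multiprocessor case, while only two remain useless in the super-scalar case. With `parallel_instructions` equal to 1, the super-scalar case reduces to the (n-1) useless processor elements noted earlier.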
As will be understood from the foregoing, the optimum computer structure or architecture frequently differs depending on the program(s) to be executed.