The present invention relates to a hierarchically-configured parallel computer system and, more particularly, to a high-speed processor system that can perform high-speed parallel processing without requiring modification of existing programming styles, to a method of using the high-speed processor system, and to a recording medium.
A high-speed processor system that has a CPU and a low-speed large-capacity DRAM with cache memories has been known as a system for high-speed processing of large-sized data. Such a known high-speed processor system has, as shown in FIG. 1, a CPU 1 incorporating a primary cache, and a plurality of parallel DRAMs 2 connected to the CPU 1 through a common bus line, each DRAM 2 being equipped with a secondary cache 3 which serves to enable the DRAM 2 to process at a speed approximating the processing speed of the CPU 1.
In the operation of the circuitry shown in FIG. 1, contents of one of the DRAMs 2 are read in accordance with an instruction given by the CPU 1, and writing of information into the DRAM 2 is likewise executed in accordance with an instruction from the CPU 1. If the read instruction hits, i.e., if the desired content to be read from the DRAM 2 is held in the secondary cache 3, the CPU 1 can perform high-speed data processing by accessing the secondary cache 3. However, in the case of a miss, i.e., when the desired content does not exist in the secondary cache 3, the secondary cache 3 is required to read the target content from the DRAM 2.
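The hit and miss behavior described above can be illustrated with a minimal sketch in Python. It is a toy model only, not the circuitry of FIG. 1: the class name `CachedDram`, the `cache_lines` capacity, the hit and miss counters, and the first-in first-out eviction rule are all assumptions introduced for illustration and do not appear in the known system.

```python
class CachedDram:
    """Toy model of one DRAM 2 fronted by its secondary cache 3 (hypothetical)."""

    def __init__(self, dram_contents, cache_lines=4):
        self.dram = dict(dram_contents)  # address -> value, the slow store
        self.cache = {}                  # address -> value, the fast store
        self.cache_lines = cache_lines   # assumed capacity, in entries
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Hit: the desired content is already held in the cache.
        if address in self.cache:
            self.hits += 1
            return self.cache[address]
        # Miss: the cache must read the target content from the DRAM.
        self.misses += 1
        value = self.dram[address]
        if len(self.cache) >= self.cache_lines:
            # Assumed policy: evict the oldest inserted entry (FIFO).
            self.cache.pop(next(iter(self.cache)))
        self.cache[address] = value
        return value


memory = CachedDram({0: "a", 1: "b"})
memory.read(0)  # first access misses and fills the cache
memory.read(0)  # second access hits
```

In this sketch only the miss path touches the slow store, which is why a high hit rate lets the DRAM appear to operate at a speed close to that of the CPU.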
The described basic configuration of the high-speed processor system having a processor, DRAMs, and caches is nowadays the dominant one, because it advantageously permits the use of an ordinary programming style for the control.
This high-speed processor system employing a hierarchical arrangement of caches, however, cannot perform parallel processing because it employs only one CPU 1. In addition, ordinary programming style is not inherently intended for parallel processing and cannot easily be used for running a parallel processing system unless it is modified, thus causing an impediment in practical use.
Under these circumstances, the present invention is aimed at providing a novel high-speed processor system, a method of using the high-speed processor system, and a recording medium for recording a computer-readable and computer-executable program.
In view of the foregoing, an object of the present invention is to provide a high-speed processor system that implements parallel processing without requiring any change or modification of a conventional programming style, a method of using such a high-speed processor system, and a recording medium recording a computer-readable and computer-executable program.
In accordance with the present invention, there is provided a high-speed processor system, comprising: a CPU having a primary cache memory; a secondary cache memory arranged on a hierarchical level lower than that of the CPU, the secondary cache memory having a first MPU; and a plurality of main memories connected to the secondary cache memory and arranged in parallel with one another, each of the main memories having a tertiary cache memory provided with a second MPU; wherein each of the first MPU and the second MPUs has both a cache logic function and a processor function, thereby enabling distributed concurrent processing.
In the high-speed processor system of the invention, the tertiary cache memories may have a greater line size than the secondary cache memory, which in turn may have a greater line size than the primary cache memory.
The secondary cache memory is accessed as a secondary cache memory from the CPU and as a primary cache memory from the first MPU.
The tertiary cache memories are accessed as tertiary cache memories from the CPU, as secondary cache memories from the first MPU, and as primary cache memories from the second MPU.
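The way one physical cache appears at different levels to different accessors can be sketched as a small lookup in Python. This is an illustrative reading of the two preceding paragraphs only; the names `HIERARCHY`, `NEAREST_CACHE`, and `apparent_level` are hypothetical, and the string labels are chosen for this example.

```python
# Cache levels as named from the CPU's point of view.
HIERARCHY = ["primary", "secondary", "tertiary"]

# Index of the cache that each accessor sees as its own primary cache.
# The CPU starts at its primary cache, the first MPU at the secondary
# cache, and a second MPU at its tertiary cache.
NEAREST_CACHE = {"CPU": 0, "first MPU": 1, "second MPU": 2}


def apparent_level(accessor, cache):
    """Return the level at which `cache` appears to `accessor`."""
    offset = HIERARCHY.index(cache) - NEAREST_CACHE[accessor]
    if offset < 0:
        # Assumption of this sketch: an accessor does not reach caches
        # located above its own position in the hierarchy.
        raise ValueError(f"{accessor} does not access the {cache} cache")
    return HIERARCHY[offset]


level_for_cpu = apparent_level("CPU", "tertiary")
level_for_first_mpu = apparent_level("first MPU", "tertiary")
level_for_second_mpu = apparent_level("second MPU", "tertiary")
```

Under these assumptions the same tertiary cache memory is reported as tertiary, secondary, and primary for the CPU, the first MPU, and a second MPU, respectively.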
Each data processing operation performed by the first MPU and the second MPUs is executed in accordance with a control protocol carried by a prefetch instruction or an intelligent prefetch instruction given by the CPU. Meanwhile, each of the first MPU and the second MPUs selectively performs the data processing, depending on the data transfer size and the data transfer frequency.
For instance, the first MPU executes mainly global transfer processing or low-computation-level, high-transfer-rate processing by using data and programs stored in the plurality of main memories. Each second MPU executes mainly local object processing by using data and a program stored in the single main memory associated with it.
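The division of work between the two kinds of MPU can be sketched as a simple selection rule in Python. The sketch is a hypothetical heuristic, not the selection logic of the invention: the `Task` fields, the function `choose_mpu`, and both threshold values are assumptions made for illustration, since the description states only that the selection depends on the data transfer size and the data transfer frequency.

```python
from dataclasses import dataclass


@dataclass
class Task:
    memories_touched: int        # number of main memories holding the data
    transfer_bytes: int          # size of each data transfer
    transfers_per_second: float  # how often transfers occur


def choose_mpu(task, size_threshold=4096, rate_threshold=1000.0):
    """Pick an MPU for a task; both thresholds are illustrative assumptions."""
    # Data spread over several main memories implies global transfer
    # processing, which the first MPU mainly handles.
    spans_memories = task.memories_touched > 1
    # Large or frequent transfers are treated here as high-transfer-rate
    # processing, also assigned to the first MPU.
    bulk_transfer = (
        task.transfer_bytes >= size_threshold
        or task.transfers_per_second >= rate_threshold
    )
    if spans_memories or bulk_transfer:
        return "first MPU"
    # Otherwise the work is local object processing on a single main
    # memory, handled by the second MPU associated with that memory.
    return "second MPU"


local_task = Task(memories_touched=1, transfer_bytes=64, transfers_per_second=10.0)
global_task = Task(memories_touched=4, transfer_bytes=64, transfers_per_second=10.0)
local_choice = choose_mpu(local_task)
global_choice = choose_mpu(global_task)
```

A rule of this shape keeps small, local work next to the memory that holds its data and reserves the first MPU for work that moves data across memories.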
The high-speed processor system may be implemented in a single chip as an ASIC-DRAM.
The present invention also provides a method of using a high-speed processor system which includes a CPU having a primary cache memory, a secondary cache memory arranged on a hierarchical level lower than that of the CPU, the secondary cache memory having a first MPU, and a plurality of main memories connected to the secondary cache memory and arranged in parallel with one another, each of the main memories having a tertiary cache memory provided with a second MPU, the method comprising: causing the CPU to execute mainly high-level arithmetic processing; causing the first MPU to execute mainly global transfer processing and low-computation-level, high-transfer-rate processing; and causing one of the second MPUs to execute mainly local object processing by using data and a program stored in the main memory associated with that second MPU, whereby distributed concurrent processing is performed.
Each data processing operation performed by the first MPU and the second MPUs may be executed in accordance with a control protocol carried by a prefetch instruction or an intelligent prefetch instruction given by the CPU. The high-speed processor system can therefore be controlled using an ordinary programming style.
The high-speed processor system of the present invention may be implemented to comprise a CPU having a primary cache memory, and a plurality of main memories connected to the CPU and arranged in parallel with one another, each of the main memories having a secondary cache memory provided with an MPU, wherein each of the MPUs has both a cache logic function and a processor function, thereby enabling distributed concurrent processing.