Recent arithmetic processing apparatuses that include processor cores, such as CPUs, are generally provided with cache memories to increase processing speed. A cache memory is located between a main storage unit, such as a main memory, and a processor core, and it temporarily stores data that the processor core uses frequently. When executing arithmetic processing, the processor core reads data from the cache memory, which is closer to the processor core than the main memory is, so the time required for memory access is shortened.
In conventional arithmetic processing apparatuses, data that has been processed by a processor core is temporarily stored in a cache memory and is then written to a main storage unit at predetermined time intervals. However, when data is written to the main storage unit via the cache memory, writing the data takes correspondingly longer (see, for example, Japanese Laid-open Patent Publication No. 63-20640). To improve performance by shortening the time needed to write data, some arithmetic processing apparatuses are provided with a data path that directly connects the processor core and the main storage unit. FIG. 7 illustrates an example of an arithmetic processing apparatus provided with such a data path.
As illustrated in FIG. 7, an arithmetic processing apparatus 500 includes a processor core (hereinafter, “core”) 501, a first queue 502, a second queue 503, and a third queue 504. The arithmetic processing apparatus 500 further includes a selector 505, a data memory 506, a control unit 507, and a memory access controller (MAC) 508.
The core 501 is an arithmetic processing unit that executes various types of arithmetic processing using data stored in the data memory 506. The first queue 502 temporarily stores data that the core 501 writes back to the data memory 506. The second queue 503 temporarily stores data that is written back from the data memory 506 to a main storage unit (not illustrated) via the MAC 508 when a cache replacement is performed. The third queue 504 temporarily stores data that is transferred from the MAC 508 to the data memory 506 when a move-in is performed in response to a cache miss.
The selector 505 selects either the data stored in the first queue 502 or the data stored in the third queue 504 and outputs the selected data to the data memory 506. The data memory 506 temporarily stores data that is frequently used by the core 501 and data that has been processed by arithmetic processing in the core 501. The control unit 507 writes and reads data by pipeline processing according to instructions from the core 501. Specifically, the control unit 507 includes a move-out (MO) port unit 511, a priority unit 512, a pipeline 513, and a tag memory 514.
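The queue and selector arrangement described above can be pictured with a small software model. The sketch below is purely illustrative and rests on assumptions not stated in the text: the queues are FIFOs of (address, data) tuples, the data memory is a dictionary, and the selector's arbitration policy (move-in data before core write-backs) is hypothetical, as are all function and variable names.

```python
from collections import deque

# Hypothetical software model of the queues in FIG. 7. Entries are
# (address, data) tuples. The selector's arbitration policy is an
# assumption: the text says only that it picks one of the two sources.
first_queue = deque()   # core 501 -> data memory 506 (write-back)
second_queue = deque()  # data memory 506 -> MAC 508 (cache replacement)
third_queue = deque()   # MAC 508 -> data memory 506 (move-in on a miss)
data_memory = {}        # stands in for data memory 506: address -> data


def selector_step():
    """Model selector 505: move one entry into the data memory.

    Assumed policy: move-in data from the third queue takes priority
    over write-backs from the first queue. Returns the source name,
    or None if both queues are empty.
    """
    if third_queue:
        address, data = third_queue.popleft()
        source = "move-in"
    elif first_queue:
        address, data = first_queue.popleft()
        source = "write-back"
    else:
        return None
    data_memory[address] = data
    return source


def evict(address):
    """Cache replacement: queue a line from the data memory for the MAC."""
    second_queue.append((address, data_memory.pop(address)))


if __name__ == "__main__":
    first_queue.append((0x100, "core result"))
    third_queue.append((0x200, "line from main storage"))
    print(selector_step())     # move-in
    print(selector_step())     # write-back
    print(selector_step())     # None
    evict(0x100)
    print(list(second_queue))  # [(256, 'core result')]
```

A fixed priority was chosen only to keep the model deterministic; a real selector could arbitrate differently.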
Orders from the core 501, such as data write and data read orders, are set in the MO port unit 511. The priority unit 512 makes adjustments among these orders (data interference control) and inputs them to the pipeline 513. The tag memory 514 stores the physical addresses of the data stored in the data memory 506, as well as logical addresses that are used for tag searches. When a data write request is input from the priority unit 512, the pipeline 513 searches the tag memory 514 using the logical address contained in the request and identifies the physical address of the data requested by the core 501. The MAC 508 is connected to the main storage unit (not illustrated) and writes data received from the second queue 503 to the main storage unit. When a cache miss occurs, for example, the MAC 508 receives the missed data from the main storage unit and transfers it to the third queue 504.
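The tag search performed by the pipeline 513 can be sketched as follows, assuming a direct-mapped tag memory indexed by the logical address. The line size, the number of entries, and the class and method names are hypothetical; the text states only that the tag memory holds logical addresses for searching together with the corresponding physical addresses.

```python
from dataclasses import dataclass
from typing import Optional

LINE_SIZE = 64   # assumed cache-line size in bytes
NUM_SETS = 256   # assumed number of tag-memory entries (direct-mapped)


@dataclass
class TagEntry:
    logical_line: int   # logical line address, compared during the search
    physical_line: int  # physical line address of the cached data


class TagMemory:
    """Hypothetical direct-mapped model of tag memory 514."""

    def __init__(self) -> None:
        self.entries: list[Optional[TagEntry]] = [None] * NUM_SETS

    def _index(self, logical_address: int) -> int:
        # Entry selected by the low-order bits of the logical line address.
        return (logical_address // LINE_SIZE) % NUM_SETS

    def fill(self, logical_address: int, physical_address: int) -> None:
        """Record a line, e.g. after a move-in from the MAC."""
        self.entries[self._index(logical_address)] = TagEntry(
            logical_address // LINE_SIZE, physical_address // LINE_SIZE)

    def lookup(self, logical_address: int) -> Optional[int]:
        """Pipeline search: return the physical address, or None on a miss."""
        entry = self.entries[self._index(logical_address)]
        if entry is None or entry.logical_line != logical_address // LINE_SIZE:
            return None  # cache miss: the line must be fetched via the MAC
        offset = logical_address % LINE_SIZE
        return entry.physical_line * LINE_SIZE + offset
```

For example, after `fill(0x1000, 0x8000)`, a lookup of logical address `0x1010` returns physical address `0x8010`, while a different logical line that maps to the same entry misses.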
In this configuration, if the latest data is stored in the data memory 506, the data path L20, which runs from the data memory 506 to the MAC 508, is used to write the data to the main storage unit. To write data back from the core 501 to the data memory 506, the data path L10 is used, which runs from the core 501 to the data memory 506 via the first queue 502 and the selector 505.
If the latest data is stored in the core 501 and the data memory 506 holds only old data, the data path L30, which transfers data from the core 501 directly to the MAC 508, is used to write the data to the MAC 508. Because the data path L30 directly connects the core 501 and the MAC 508, the latest data held in the core 501 can be written to the main storage unit quickly without passing through the data memory 506.
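The choice among the data paths L10, L20, and L30 described for FIG. 7 can be summarized as a small decision function. This is a hypothetical sketch; in particular, the fallback route used when no path L30 exists (write the data into the data memory over L10 and then out over L20) is an assumption based on the earlier description of writing to the main storage unit via the cache memory.

```python
from enum import Enum


class Location(Enum):
    """Where the latest copy of the data resides (hypothetical names)."""
    DATA_MEMORY = "data memory 506"
    CORE = "core 501"


def write_path_to_main_storage(latest_data_in: Location,
                               has_l30: bool = True) -> list[str]:
    """Return the data paths used to write the latest data to the MAC 508.

    - latest data in the data memory -> L20 (data memory -> MAC)
    - latest data in the core, L30 present -> L30 (core -> MAC directly)
    - latest data in the core, no L30 (assumed fallback) ->
      L10 into the data memory, then L20 to the MAC
    """
    if latest_data_in is Location.DATA_MEMORY:
        return ["L20"]
    if has_l30:
        return ["L30"]
    return ["L10", "L20"]
```

The fallback case shows why L30 helps: without it, data held only in the core must make two transfers before reaching the MAC.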
However, adding a data path that directly connects a core and an MAC increases wiring costs. This problem is particularly pronounced in a CPU that includes multiple cores and multi-bank storage units, as explained below.
Power consumption in recent single-core CPUs, each of which includes one core, has grown too large to ignore, and further performance improvement is approaching its limit. To improve CPU performance further, multi-core CPUs that include multiple cores on a single board are used in some cases. In addition to providing multiple cores, some designs improve the throughput between each core and the cache memory or main storage unit by dividing the cache memory and the main storage unit into banks. FIG. 8 is a diagram of the schematic configuration of a conventional CPU.
As illustrated in FIG. 8, in a CPU 700 that includes multiple cores and multi-bank storage units, cores #0 to #7, data memories #0 to #3, and MACs #0 to #3 are arranged near the periphery of the board, and a control unit that controls all data transfers is located at the center of the board. Because each bank of the main storage unit stores different data, each of the cores #0 to #7 may need to write data to any of the MACs #0 to #3. Consequently, providing data paths that directly connect the cores and the MACs requires connecting every core to every MAC, which increases wiring costs.
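To see why every core may need to reach every MAC, consider one possible bank-selection scheme. The sketch below assumes, purely for illustration, that consecutive cache lines are interleaved across the four banks by their line address; the line size and function names are hypothetical, and the text does not specify how data is assigned to banks.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes
NUM_BANKS = 4   # one bank per MAC (#0 to #3 in FIG. 8)


def mac_for_address(physical_address: int) -> int:
    """Assumed bank selection: interleave consecutive lines across banks."""
    return (physical_address // LINE_SIZE) % NUM_BANKS


def macs_touched(addresses) -> set[int]:
    """Return the set of MACs that a sequence of writes reaches."""
    return {mac_for_address(a) for a in addresses}


if __name__ == "__main__":
    # A single core writing a 512-byte buffer, one line at a time.
    buffer = range(0x10000, 0x10000 + 512, LINE_SIZE)
    print(sorted(macs_touched(buffer)))  # [0, 1, 2, 3]
```

Under this assumption, even one core writing a small contiguous buffer reaches all four MACs.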
More specifically, in the CPU 700, data paths L30 that connect the cores and the MACs are provided between every one of the cores #0 to #7 and every one of the MACs #0 to #3. For example, as illustrated in FIG. 8, the core #1 is provided with data paths L30a to L30d that connect to the MACs #0 to #3, respectively. The core #1 is further provided with data paths L20a to L20d that connect to the data memories #0 to #3, and the data memories #0 to #3 are provided with data paths L10a to L10d, respectively, that connect to the corresponding MACs #0 to #3.
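The number of direct paths grows with the product of the number of cores and the number of MACs. The sketch below enumerates the (core, MAC) pairs that need a path L30 in the configuration of FIG. 8; the function name is hypothetical, and the letter suffixes follow only the labeling given for core #1.

```python
from itertools import product

NUM_CORES = 8  # cores #0 to #7 in FIG. 8
NUM_MACS = 4   # MACs #0 to #3 in FIG. 8


def direct_paths(num_cores: int, num_macs: int) -> list[tuple[int, int]]:
    """All (core, MAC) pairs that need a direct core-to-MAC path (L30)."""
    return list(product(range(num_cores), range(num_macs)))


if __name__ == "__main__":
    paths = direct_paths(NUM_CORES, NUM_MACS)
    print(len(paths))  # 32
    # Suffixes a-d for MACs #0 to #3, as labeled for core #1 in FIG. 8
    # (this indexing only works for up to four MACs).
    print([f"L30{'abcd'[mac]}" for core, mac in paths if core == 1])
    # ['L30a', 'L30b', 'L30c', 'L30d']
```

With eight cores and four MACs this gives 32 direct paths in addition to the paths to the data memories; doubling both the cores and the MACs would quadruple that count.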
In particular, the data paths L30b and L30d may have to be routed across the control unit located at the center of the board, which may further increase wiring costs. It has therefore been difficult to mount data paths L30 that directly connect the cores and the MACs in the CPU 700, which includes multiple cores and multi-bank storage units. The areas A1 and A2 illustrated in FIG. 8, which lie between the cores and MACs and the data memories and control unit, are areas where wiring is particularly concentrated. Because providing the data paths L30 in such areas increases the circuit size, mounting them is difficult.