An information processing apparatus including a processor core such as a CPU (Central Processing Unit) is usually provided with a cache memory to realize higher-speed processing. A cache memory is a memory that is accessible at a higher speed than a main storing unit such as a main memory, and the cache memory stores only that portion of the data stored in the main storing unit which the CPU frequently uses. Therefore, when the CPU executes various computing processes, the CPU first accesses the cache memory and requests the necessary data from the cache memory. In this case, when the necessary data is not stored in the cache memory, a cache miss occurs and the necessary data is transferred from the main storing unit to the cache memory. That is, when a READ is executed on the cache memory and a cache miss occurs as a result, the data is transferred from the main storing unit to the cache memory by a MOVE-IN.
As described above, when a cache miss occurs, the necessary data is stored in the cache memory by the MOVE-IN, and the CPU then executes a READ again to read the data from the cache memory. Consequently, two READs and one MOVE-IN are executed by the time the CPU obtains the data, and the delay time (hereinafter, “latency”) is extended. To improve the performance of the information processing apparatus by reducing the delay incurred in data acquisition, the data may be transferred from the main storing unit to the cache memory and simultaneously transferred to the CPU (see, e.g., Japanese Laid-open Patent Publication No. 10-111798).
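By way of illustration only, the READ and MOVE-IN sequence described above may be sketched in Python as follows. The class name Cache, the function cpu_load, the operation log, and the example address and data are hypothetical and are introduced solely for this sketch; an actual cache memory is implemented in hardware and is not limited to this behavior.

```python
class Cache:
    """Toy model of a cache memory backed by a main storing unit."""

    def __init__(self, main_store):
        self.main_store = main_store  # hypothetical main storing unit: address -> data
        self.lines = {}               # data currently held in the cache memory
        self.log = []                 # operations executed so far ("RD" or "MI")

    def read(self, address):
        """Execute a READ; return (hit, data)."""
        self.log.append("RD")
        if address in self.lines:
            return True, self.lines[address]
        return False, None

    def move_in(self, address):
        """Execute a MOVE-IN from the main storing unit to the cache memory."""
        self.log.append("MI")
        self.lines[address] = self.main_store[address]


def cpu_load(cache, address):
    """Obtain data the way the CPU does without any bypass path."""
    hit, data = cache.read(address)
    if not hit:
        cache.move_in(address)            # cache miss: transfer the data first
        hit, data = cache.read(address)   # then execute the READ again
    return data
```

In this sketch, a first call to cpu_load for an address that is not yet cached leaves the sequence RD, MI, RD in the log, whereas a later call for the same address adds only a single RD.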
FIG. 7 is a diagram of the configuration of an information processing apparatus described in Japanese Laid-open Patent Publication No. 10-111798. The information processing apparatus depicted in FIG. 7 includes a main storing unit 1, an instruction control unit 2, and a storage control unit 3. When the instruction control unit 2 requests data, the storage control unit 3 executes a READ for the data. That is, a data request from the instruction control unit 2 is transferred to a cache 4 through a selector 7 in the storage control unit 3. When the requested data is stored in the cache 4 (cache hit), the data is read to a buffer 8 in the instruction control unit 2 through a selector 5. In this case, as illustrated in the upper portion of FIG. 8, the instruction control unit 2 is able to obtain the data from the storage control unit 3, and only one READ (in FIG. 8, “RD”) is executed by the time the data is obtained. Therefore, there is almost no latency in data acquisition by the instruction control unit 2.
On the other hand, when the requested data is not stored in the cache 4 (cache miss), a MOVE-IN that causes the data stored in the main storing unit 1 to be transferred to the cache 4 is executed. That is, the data request from the instruction control unit 2 is transferred to the main storing unit 1 through the selector 7 and the requested data is transferred to the cache 4 through a selector 6. In the normal case, thereafter: the instruction control unit 2 again requests the data; the storage control unit 3 executes a READ; and the requested data is read from the cache 4 to the buffer 8 through the selector 5. In this case, as illustrated in the middle portion of FIG. 8, the two READs (RD) and the one MOVE-IN (in FIG. 8, “MI”) are executed by the time the instruction control unit 2 obtains the data. Therefore, the latency in data acquisition by the instruction control unit 2 becomes long.
However, in Japanese Laid-open Patent Publication No. 10-111798, a line L is provided that directly connects the main storing unit 1 and the buffer 8 of the instruction control unit 2 through the selector 5. The data is therefore transferred from the main storing unit 1 to the cache 4 through the selector 6 and is simultaneously read to the buffer 8 through the line L. As illustrated in the lower portion of FIG. 8, the instruction control unit 2 is thus able to obtain the data simultaneously with the MOVE-IN (MI) to the cache 4, so that the second READ becomes unnecessary and the latency can be reduced.
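The three cases illustrated in FIG. 8 may be compared with the following minimal Python sketch, which tallies the operations executed by the time the instruction control unit 2 obtains the data. The names Case, operations, and latency are hypothetical, and the cycle counts passed to latency are assumptions chosen only for illustration; they do not appear in the cited publication. Treating the case using the line L as one READ followed by one MOVE-IN is likewise a simplification of the lower portion of FIG. 8.

```python
from enum import Enum


class Case(Enum):
    HIT = "cache hit"
    MISS_REREAD = "cache miss, READ executed again after the MOVE-IN"
    MISS_LINE_L = "cache miss, data forwarded over the line L during the MOVE-IN"


def operations(case):
    """Operations executed by the time the data is obtained."""
    if case is Case.HIT:
        return ["RD"]
    if case is Case.MISS_REREAD:
        return ["RD", "MI", "RD"]
    return ["RD", "MI"]  # data arrives together with the MOVE-IN


def latency(case, rd_cycles, mi_cycles):
    """Sum assumed per-operation costs (in cycles) for the given case."""
    cost = {"RD": rd_cycles, "MI": mi_cycles}
    return sum(cost[op] for op in operations(case))
```

Under any positive cost assumed for a READ, the case using the line L is shorter than the case with the second READ by exactly that one READ, which corresponds to the reduction in latency described above.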
Recently, for a single-core semiconductor integrated circuit including one processor core (hereinafter, “core”), problems such as increased power consumption can no longer be ignored, and performance improvement is approaching its limit. Further performance improvement of a semiconductor integrated circuit may be realized by a multi-core configuration that includes a plurality of cores on one substrate. When the cache memory and the main storing unit are divided into a plurality of banks in the semiconductor integrated circuit having the multi-core configuration, throughput among the cores, the cache memory, and the main storing unit may be improved.
In a semiconductor integrated circuit employing the multi-core configuration divided into the banks: a plurality of cores, a plurality of cache memories, and a plurality of main storage control units each connected to a main storing unit are disposed on the outer edge of a substrate; and a control unit that controls the entire data transfer is disposed in the center of the substrate. Each divided bank of the main storing unit stores data at addresses different from those of the other banks and, therefore, each core may request data from any of the main storage control units on the substrate. Therefore, to directly connect the main storing unit and the cores as in the above Japanese Laid-open Patent Publication No. 10-111798, all the cores and all the main storage control units need to be mutually connected, and a problem arises in that the wiring on the substrate becomes complicated.
That is, a core may request data from a main storage control unit disposed on the opposite side of the substrate, with the control unit disposed in the center of the substrate lying between them. Therefore, to reduce the latency in data acquisition by the core, a main storage control unit and a core that are disposed away from each other on the substrate also need to be directly connected. As a result, the wiring on the substrate needs to be significantly changed and expanded, which results in a larger semiconductor integrated circuit. Recently, apparatuses equipped with semiconductor integrated circuits have been increasingly downsized and, therefore, increasing the size of the semiconductor integrated circuit is not practical as a means for reducing the latency in data acquisition by the cores.
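The growth of the wiring may be estimated with the following rough Python sketch, which counts point-to-point connections only. The function names are hypothetical, the cache memories and the physical length of each wire are ignored, and any numbers of cores and main storage control units supplied to the functions are assumptions for illustration rather than values taken from an actual circuit.

```python
def direct_links(num_cores, num_controllers):
    """Connections needed when every core is wired directly to every
    main storage control unit."""
    return num_cores * num_controllers


def central_links(num_cores, num_controllers):
    """Connections needed when every core and every main storage control
    unit is wired only to the control unit in the center of the substrate."""
    return num_cores + num_controllers
```

For example, under an assumed configuration of eight cores and four main storage control units, direct connection requires 32 connections, whereas routing through the central control unit requires 12. The difference widens as either number increases, and the direct connections include those between units located on opposite sides of the substrate.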