1. Technical Field
The present invention relates to an information processing device, a memory access control device, and an address generation method thereof, and particularly to an information processing device that accesses a storage unit with an access unit of a plurality of word lengths, a memory access control device, and an address generation method thereof.
2. Background Art
In recent years, many information processing devices (for example, processors) that improve computing power by processing data in parallel have been proposed. One such information processing device is a vector operation device. Examples of a related-art memory access method for such a vector operation device are disclosed in Patent Application Publication No. H06-103491 and Japanese Patent No. 3789316.
Patent Application Publication No. H06-103491 discloses that throughput is reduced when the word length of an operation unit and the word length of a main memory differ. Therefore, in Patent Application Publication No. H06-103491, performance is improved by accessing a plurality of continuous words in the main memory collectively. However, when the word length of the operation unit and the word length of the main memory differ, the related-art method, which assigns continuous words of a plurality of continuous operation units to the words of the main memory, causes the performance degradation described below.
Here, a technology of a related art is explained with Japanese Patent No. 3789316 as an example. Japanese Patent No. 3789316 relates to a routing address generation method of a vector processing device. The vector operation device includes a vector operation unit, a storage unit, and a memory access control unit. The vector operation unit makes an access request by outputting a vector element, a top element address, and a distance between elements. The storage unit is composed of a plurality of memory banks that are capable of operating simultaneously in parallel, and includes a plurality of connection ports. The memory access control unit is located between the vector operation unit and the storage unit, and performs access control independently for each connection port through which the plurality of access requests are connected to the storage unit.
Further, the memory access control unit includes an adder unit, an exclusive-OR circuit, a routing address generation unit, and a crossbar unit. The adder unit generates an access address of the access request for each vector element by adding the top element address and the distance between elements that are transmitted from the vector operation unit. For each vector element, the exclusive-OR circuit obtains an exclusive OR of the low-order one bit of the routing address, which is a part of the access address, and bits of the access address other than the routing address. The routing address generation unit replaces the low-order one bit of the routing address with the output of the exclusive-OR circuit to generate a new routing address. A conflict arbitration unit performs conflict arbitration of the access requests for each connection port that connects to the storage unit according to the routing address generated by the routing address generation unit. The crossbar unit outputs the access request for each vector element according to the conflict arbitration of the conflict arbitration unit.
In the vector operation device disclosed in Japanese Patent No. 3789316, the above configuration can prevent conflict of the connection ports assigned to the access address generated for one access request, and improve access performance.
A DDR (Double Data Rate) DRAM (Dynamic Random Access Memory) has become mainstream in recent years because of the improvement in the processing speed of the information processing device. In this DRAM, a burst access that continuously accesses continuous addresses is performed, and an access unit is defined by bus width×the number of bursts. For example, the number of bursts is two in DDR, four in DDR2, and eight in DDR3. Because a DDR3 DIMM, which is currently becoming mainstream, has a bus width of 64 bits (8 bytes: hereinafter referred to as 8 B), the access unit is 64 B when the number of bursts is eight, as eight pieces of 8 B data are transferred continuously. That is, the access unit increases as a higher-speed DDR DRAM is used. As the access unit of the main memory increases in this way and comes to differ from the access length of the operation unit, the performance for accessing the main memory is degraded.
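The relationship between bus width, number of bursts, and access unit can be checked with a short calculation. The following Python sketch is illustrative only. The helper name `access_unit_bytes` is hypothetical, and the 64-bit bus width is applied to DDR and DDR2 merely for comparison, since the text states it only for the DDR3 DIMM.

```python
def access_unit_bytes(bus_width_bits: int, num_bursts: int) -> int:
    """Access unit in bytes = bus width (in bytes) x number of bursts."""
    return (bus_width_bits // 8) * num_bursts


# Number of bursts per DRAM generation, as stated in the text.
BURSTS = {"DDR": 2, "DDR2": 4, "DDR3": 8}

if __name__ == "__main__":
    for name, bursts in BURSTS.items():
        # 64-bit bus assumed for every generation (stated only for DDR3 DIMM).
        print(name, access_unit_bytes(64, bursts), "B")
```

With a 64-bit bus and eight bursts, the function returns 64, which matches the 64 B access unit given for the DDR3 DIMM.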
An example of the access address issued by the vector operation unit to a general storage unit is shown in FIGS. 8 to 10. FIG. 8 shows a data structure of the access address in the case of specifying an address by an address of a block and an address in the block. FIG. 9 shows a data structure of the access address in the case of accessing the storage unit by a memory interleave. FIG. 10 shows a data structure of the access address in the case of accessing the storage unit with the direct mapping method.
Further, a cache may be provided in order to reduce the memory access time to the storage unit. In the information processing device, the access request to the storage unit is made by the access unit. Therefore, the access efficiency to the storage unit improves by registering all the data accessed in one access unit to the cache. Thus, the unit to manage the data on the cache (cache line width) is an integral multiple of this access unit. Note that the cache line width increases with the increase of the access unit.
In the memory access method of a related art, at least one of the access unit and the cache line width is treated as one block. In this case, when the cache is divided and interleaved by the access unit and data transfer is performed, there has been a problem in that the data width of one access unit increases and the memory access time increases as well. Likewise, when the cache is divided and interleaved by the cache line width and data transfer is performed, there has been a problem in that one cache line width increases and the memory access time increases as well.
The abovementioned issue is explained more specifically. FIGS. 11 and 12 show an example of the data structure of the access address used to access the main memory. In this example, as shown in FIG. 11, the access address is composed of 24 bits. The high-order 15 bits of the access address are defined as an access line address in a port. The three bits (bits a9, a8, and a7) of the access address that follow the access line address in the port are defined as the routing address. Furthermore, as shown in FIG. 12, the address in the port is generated from the access line address in the port and the address in the access unit (bits a6 to a1) of the access address.
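The field layout of this 24-bit access address can be illustrated with a minimal Python sketch. It assumes that bit a1 is the least significant bit of a byte address, and that the address in the port is formed by concatenating the access line address in the port with the address in the access unit. The text says only that the address in the port is generated "using" these two fields, so the concatenation is an assumption. The names `Fields`, `split_address`, and `address_in_port` are hypothetical.

```python
from typing import NamedTuple

OFFSET_BITS = 6    # bits a6..a1: address in the access unit
ROUTING_BITS = 3   # bits a9..a7: routing address
LINE_BITS = 15     # bits a24..a10: access line address in the port
ADDRESS_BITS = LINE_BITS + ROUTING_BITS + OFFSET_BITS  # 24 bits in total


class Fields(NamedTuple):
    line: int      # access line address in the port
    routing: int   # routing address (selects the connection port)
    offset: int    # address in the access unit


def split_address(addr: int) -> Fields:
    """Split a 24-bit access address into its three fields."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("access address must fit in 24 bits")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    routing = (addr >> OFFSET_BITS) & ((1 << ROUTING_BITS) - 1)
    line = addr >> (OFFSET_BITS + ROUTING_BITS)
    return Fields(line, routing, offset)


def address_in_port(addr: int) -> int:
    """Assumed form of FIG. 12: line address followed by in-unit offset."""
    f = split_address(addr)
    return (f.line << OFFSET_BITS) | f.offset
```

Under these assumptions, the byte address 64 B has line address 0, routing address 1, and in-unit address 0.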
Additionally, FIG. 13 shows a timing chart of the access request output by the vector operation unit when the vector operation unit includes operators 0 to 7 and the operators 0 to 7 access continuous words in one process cycle. In the example shown in FIG. 13, in a cycle T0, the vector operation unit outputs vector elements v0, v1, v2, v3, v4, v5, v6, and v7, a top element address of 0 B, and a distance between elements of 8 B as the access request. Making such an access request makes it possible to access the storage unit efficiently. In response to this access request, the memory access control unit of the vector operation device generates access addresses 0 B, 8 B, 16 B, 24 B, 32 B, 40 B, 48 B, and 56 B corresponding to the vector elements v0, v1, v2, v3, v4, v5, v6, and v7. FIG. 13 shows the access address generated by the memory access control unit for each operational timing.
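The address generation for the cycle T0 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic. The function name `element_addresses` is hypothetical, and the address of the i-th vector element is taken to be the top element address plus i times the distance between elements, which is consistent with the addresses listed in the text.

```python
from typing import List


def element_addresses(top: int, distance: int, count: int = 8) -> List[int]:
    """Byte address of each vector element v0..v(count-1)."""
    return [top + i * distance for i in range(count)]


if __name__ == "__main__":
    # Cycle T0: top element address 0 B, distance between elements 8 B.
    print(element_addresses(0, 8))
```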
The routing address generation unit of the memory access control unit selects the three bits in the access address as the routing address, as shown in FIG. 11. A request is sent to a cache unit 3 using the connection port specified by the routing address. For the access request at the timing T0, the routing addresses 0, 0, 0, 0, 0, 0, 0, and 0 are generated for the vector elements v0, v1, v2, v3, v4, v5, v6, and v7. That is, the crossbar unit processes all of the access addresses 0 B, 8 B, 16 B, 24 B, 32 B, 40 B, 48 B, and 56 B generated in response to the access request at the timing T0 in the connection port 0. Specifically, in the connection port 0, the vector elements v0, v1, v2, v3, v4, v5, v6, and v7 are processed in order.
FIG. 14 shows the assignment of the access region of the storage unit that is accessed in accordance with the timing chart shown in FIG. 13 in the case in which the storage unit includes the connection ports 0 to 7 and the vector operation unit 10 includes the operators 0 to 7. As shown in FIG. 14, in the storage unit of a related art, continuous words (words of the 0th, 8th, 16th, . . . , and 56th byte) are stored in one access line width of a connection port. Likewise, continuous words are written inside one access line width of each of the other connection ports.
Moreover, when the access request is made in accordance with the timing chart shown in FIG. 13, the access addresses generated for the vector elements v0, v1, v2, v3, v4, v5, v6, and v7 in response to the access request at the timing T1 are 64 B, 72 B, 80 B, 88 B, 96 B, 104 B, 112 B, and 120 B, and the routing address of each of these access addresses is 1. That is, the routing addresses generated at the timing T1 are 1, 1, 1, 1, 1, 1, 1, and 1. Accordingly, the access addresses generated in response to the access request at the timing T1 are processed using the connection port 1.
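The routing addresses at the timings T0 and T1 can be reproduced with the following Python sketch. It assumes that bit a1 is the least significant bit of a byte address, so that the routing address (bits a9 to a7) is obtained by shifting right by six bits and masking three bits. The function names are hypothetical.

```python
from typing import List


def routing_address(addr: int) -> int:
    """Routing address = bits a9..a7 of the access address (a1 is the LSB)."""
    return (addr >> 6) & 0b111


def routing_for_request(top: int, distance: int = 8, count: int = 8) -> List[int]:
    """Routing address of each vector element of one access request."""
    return [routing_address(top + i * distance) for i in range(count)]


if __name__ == "__main__":
    t0 = routing_for_request(0)    # access addresses 0 B .. 56 B
    t1 = routing_for_request(64)   # access addresses 64 B .. 120 B
    print("T0:", t0, "ports used:", sorted(set(t0)))
    print("T1:", t1, "ports used:", sorted(set(t1)))
```

In each cycle all eight vector elements map to a single connection port (port 0 at T0, port 1 at T1), so the remaining seven ports receive no request from that access request.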
FIG. 15 shows a timing chart on the storage unit side in the case of accessing the storage unit from the vector operation unit 10 in accordance with the abovementioned procedure. As shown in FIG. 15, the continuous words are assigned to one access unit in the access method of the related art. Further, in this access method, the continuous words are assigned to one cache line width. Because the continuous words are assigned in this way, a period arises in the access method of the related art during which the connection ports cannot be used efficiently, which results in the problem of an increased access time.