1. Field of Invention
The invention relates to an inter-cluster communication module using a memory access network and, in particular, to such a module that uses the memory access network both to exchange data among clusters and to access the memory.
2. Related Art
Modern processors are usually designed with several functional modules that operate in parallel, so that the processor can process several instructions at the same time. As the number of functional modules grows, however, exchanging data and routing data streams among them eventually becomes a serious problem.
Early processors were provided with a single centralized register file RF through which the functional modules FU exchange data, as shown in FIG. 1. Under this architecture, the register file RF must contain a large number of registers to keep programs running smoothly and to keep the functional modules FU well utilized. Moreover, the number of connection ports (i.e., read and write ports) of the register file RF increases linearly with the number of functional modules FU, so that each functional module FU can obtain the data required for its operations. In practice, however, data exhibit spatial locality during computation: most functional modules exchange data only with their adjacent functional modules. Providing a huge number of connection ports for data exchanges that rarely occur is therefore wasteful. The clustered architecture was developed to solve this poor scalability of the centralized register file.
In the clustered architecture, the functional modules FU, each having one or more functional units, are divided into several clusters 110, 111˜11N. Accordingly, the originally centralized register file is also divided into smaller register files RF0, RF1˜RFn, each of which handles the data exchange among the functional modules FU within its own cluster 110, 111˜11N. Data exchange among the clusters 110, 111˜11N is carried out by a dedicated switch device, namely the inter-cluster communication (ICC) network 120, as illustrated in FIG. 2.
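To make the port-count argument concrete, the following Python sketch compares the ports needed on one centralized register file with the ports needed on each per-cluster register file. It is a rough model under stated assumptions, not a description of any particular processor: each functional module is assumed to need 2 read ports and 1 write port, the modules are assumed to divide evenly among the clusters, and the names `centralized_ports` and `clustered_ports` are hypothetical. Ports used for inter-cluster traffic are deliberately left out.

```python
# Hypothetical port model: each functional module (FU) is assumed to need
# 2 read ports and 1 write port on the register file it uses.
READ_PORTS_PER_FU = 2
WRITE_PORTS_PER_FU = 1


def centralized_ports(num_fus: int) -> int:
    """Ports on a single shared register file RF; grows linearly with FUs."""
    return num_fus * (READ_PORTS_PER_FU + WRITE_PORTS_PER_FU)


def clustered_ports(num_fus: int, num_clusters: int) -> int:
    """Ports on each per-cluster register file (RF0 ... RFn) when the FUs are
    split evenly among the clusters. Inter-cluster ports are not counted."""
    if num_fus % num_clusters:
        raise ValueError("this sketch assumes FUs divide evenly among clusters")
    fus_per_cluster = num_fus // num_clusters
    return fus_per_cluster * (READ_PORTS_PER_FU + WRITE_PORTS_PER_FU)


if __name__ == "__main__":
    # Columns: FUs, ports on one central RF, ports on each RF with 4 clusters
    for n in (4, 8, 16):
        print(n, centralized_ports(n), clustered_ports(n, 4))
```

Under these assumptions, 16 functional modules would need 48 ports on a single register file but only 12 ports on each of four cluster register files; the cost is moved into the inter-cluster communication network instead.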
Data exchange among the clusters may be achieved by a copy operation, that is, by executing a copy instruction that transfers data from one cluster to another. In one approach, the copy instruction is executed by the existing functional modules: the functional modules FU are given additional connection ports (i.e., load and store ports) connected to the register file RF0, RF1 of the other cluster 110, 111, as shown in FIG. 3. However, each inter-cluster data exchange then occupies one or more functional modules, so that ordinary operations cannot be executed efficiently. In another approach, a specialized functional module cFU with complete connection ports (i.e., input and output ports) is provided to perform the data exchange, as shown in FIG. 4. Moreover, when the number of clusters is large, both approaches require an additional switch communication network and a controller to perform the switching for data exchange.
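The trade-off between the two copy-based approaches can be sketched as follows. This is a simplified, hypothetical single-cycle model, not the scheduling logic of any real processor: register files are plain Python lists, `copy` and `issue_one_cycle` are illustrative names, and in the shared-FU case copies are assumed to be issued before ordinary operations.

```python
# Sketch of inter-cluster data exchange by a copy instruction.
# Hypothetical model: each cluster has a register file (a list of values)
# and a fixed number of functional-module (FU) issue slots per cycle.

def copy(register_files, src_cluster, src_reg, dst_cluster, dst_reg):
    """Copy instruction: read a register from one cluster's register file
    and write it into another cluster's register file."""
    register_files[dst_cluster][dst_reg] = register_files[src_cluster][src_reg]


def issue_one_cycle(num_fus, pending_ops, pending_copies, has_cfu, cfu_width=1):
    """Return (ops_issued, copies_issued) for one cluster in one cycle.

    has_cfu=False: copies run on the ordinary FUs (FIG. 3 style); in this
    sketch they are issued first, so each copy takes an FU slot away from
    ordinary operations.
    has_cfu=True: copies run on a dedicated cFU (FIG. 4 style) that issues
    up to cfu_width copies per cycle, leaving all FUs for ordinary operations.
    """
    if has_cfu:
        copies = min(pending_copies, cfu_width)
        ops = min(pending_ops, num_fus)
    else:
        copies = min(pending_copies, num_fus)
        ops = min(pending_ops, num_fus - copies)
    return ops, copies


if __name__ == "__main__":
    rfs = [[0] * 8 for _ in range(2)]  # RF0 and RF1, 8 registers each
    rfs[0][3] = 42
    copy(rfs, 0, 3, 1, 5)              # move r3 of cluster 0 into r5 of cluster 1
    print(rfs[1][5])
    print(issue_one_cycle(4, 4, 2, has_cfu=False))
    print(issue_one_cycle(4, 4, 2, has_cfu=True))
```

With four functional modules, four pending operations and two pending copies, the shared-FU model issues only two ordinary operations in the cycle, while the cFU model issues all four but, with a width of one, only one copy; neither approach removes the need for a switching network once the number of clusters grows.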
Alternatively, data exchange among the clusters may be achieved by extended access. Here, the register file of each cluster is given an extra read port ER (FIG. 5) or an extra write port EW (FIG. 6), and these extra ports are connected to the functional modules of other clusters. The functional modules in each cluster thus have a limited ability to read or write the register files of other clusters. Nonetheless, this approach still requires extra control units to detect, in the front stages of the pipeline, whether any data exchange is needed. Furthermore, a switch device (i.e., a communication network) is still required to carry out the data exchange.
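Why extended access gives only a limited ability to reach other clusters can be illustrated with a small arbitration sketch for the extra-read-port case. It is a hypothetical model: each register file is assumed to have one extra read port ER, local reads are assumed to always succeed on the normal ports, and the function `arbitrate_extra_reads` merely stands in for the extra control unit that detects data exchange; it is not taken from any actual design.

```python
# Hypothetical one-cycle arbitration for extended access via extra read ports.
# A request is (reader_cluster, owner_cluster, reg). Local reads use the
# register file's normal ports; remote reads compete for the owner's ER port(s).

def arbitrate_extra_reads(requests, extra_read_ports=1):
    """Return (granted, stalled) lists of requests for one cycle."""
    used = {}  # owner cluster -> extra read ports already granted this cycle
    granted, stalled = [], []
    for req in requests:
        reader, owner, _reg = req
        if reader == owner:
            granted.append(req)            # local read on normal ports
        elif used.get(owner, 0) < extra_read_ports:
            used[owner] = used.get(owner, 0) + 1
            granted.append(req)            # remote read through ER
        else:
            stalled.append(req)            # ER busy: must wait a cycle
    return granted, stalled


if __name__ == "__main__":
    reqs = [(0, 1, 3), (2, 1, 4), (1, 1, 0), (0, 2, 1)]
    granted, stalled = arbitrate_extra_reads(reqs)
    print("granted:", granted)
    print("stalled:", stalled)
```

In this example, clusters 0 and 2 both try to read cluster 1's register file in the same cycle; only one of them obtains the single ER port, and the other request stalls.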
In all of the above approaches, data exchange relies on additional connection ports. As the number of clusters increases, the complexity of the communication network also increases. Moreover, in pipelined processors using these methods, the forwarding network, which bypasses values between parallel instructions on the fly, must cross the boundaries of all clusters, which greatly increases the complexity of the inter-cluster switching network. Without such forwarding, the functional modules of other clusters must wait until a functional module has completely written its data into the register file before their operations can start, which results in stalls. In addition, the complicated inter-cluster communication network is likely to become a critical path. The operating speed may be raised by adding pipeline stages, but the more stages the pipeline has, the more difficult forwarding becomes.
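The growth described above can be estimated with two simple counting formulas. Both are assumptions made for illustration rather than figures from this document: the inter-cluster network is modeled as fully connected (one directed link between every ordered pair of clusters), and the forwarding network is modeled as allowing every functional module's result, in each bypassed pipeline stage, to reach either of the two operand inputs of every functional module. The function names are hypothetical.

```python
# Rough, hypothetical counting model for inter-cluster network and
# forwarding (bypass) network complexity.

def crossbar_links(num_clusters: int) -> int:
    """Directed point-to-point links in a fully connected inter-cluster
    network: every cluster can send to every other cluster."""
    return num_clusters * (num_clusters - 1)


def bypass_paths(num_fus: int, bypass_stages: int, operands_per_fu: int = 2) -> int:
    """Forwarding paths when each FU result, in each bypassed stage, can be
    routed to every operand input of every FU across all clusters."""
    producers = num_fus * bypass_stages
    consumers = num_fus * operands_per_fu
    return producers * consumers


if __name__ == "__main__":
    for c in (2, 4, 8):
        print("clusters:", c, "links:", crossbar_links(c))
    for stages in (1, 2, 3):
        print("bypass stages:", stages, "paths (8 FUs):", bypass_paths(8, stages))
```

Under this model, the link count grows quadratically with the number of clusters (2, 12 and 56 links for 2, 4 and 8 clusters), and every added bypassed pipeline stage adds another full set of forwarding paths, which is consistent with forwarding becoming harder as the pipeline deepens.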