The present invention relates to a method of and an apparatus for controlling storage in a computer system, and in particular, to a method of and an apparatus for controlling storage in a computer system in which there are disposed a plurality of memory access pipelines to effect parallel processing of elements of a vector so as to solve a problem of contentions among access requests issued from vector processors to storage, thereby processing an access instruction at a high speed.
As a conventional technology related to a storage control method in which a plurality of access request controllers simultaneously issue access requests to a storage including a plurality of storage units (memory banks), each being independently accessible, there has been known, for example, the technology described in JP-A-62-251956 and the like.
The storage control method according to the conventional technology of this type will next be described with reference to FIGS. 1 to 5.
FIG. 1 shows a configuration example of a primary section of a computer system achieving parallel pipeline processing on elements of a vector. The computer system includes a plurality of (four in this example) arithmetic units 50A to 50D, vector registers 51A to 51D functioning as data buffers between the arithmetic units 50A to 50D and a storage 55, access request control devices 52A to 52D, a storage control device 53, and the storage 55. The storage 55 comprises a plurality of (four in this example) memory banks 55A to 55D which are independently accessible based on a signal attained by decoding address information associated with an access request. The storage control device 53 includes access request stack circuits 53A to 53D corresponding to the access request control devices 52A to 52D, read data buffer circuits 56A to 56D, and access request priority decision circuits 54A to 54D corresponding to the memory banks 55A to 55D, respectively.
Referring to the computer system of FIG. 1, description will be given of an example of operations in which data is read from the storage, an arithmetic operation is conducted thereon, and data is written in the storage.
First, in a case where vector data is read from the storage 55 so as to be loaded in the vector registers 51A to 51D, the respective elements of the vector are assigned to the access request control devices 52A to 52D as follows to create access requests.
Access request control device
52A: Elements 0, 4, 8, . . . , 4n
52B: Elements 1, 5, 9, . . . , 4n+1
52C: Elements 2, 6, 10, . . . , 4n+2
52D: Elements 3, 7, 11, . . . , 4n+3
(where n is 0 or a positive integer).
For access requests simultaneously created, the respective four elements are simultaneously sent to the corresponding access request stack circuits 53A to 53D. Each of the circuits 53A to 53D then issues, based on the address of the access request, a request to any one of the objective priority decision circuits 54A to 54D. In a case where there occurs a contention among a plurality of access requests, each of the priority decision circuits 54A to 54D selects an access request according to a predetermined priority and then transmits the access request to a corresponding one of the memory banks 55A to 55D. Read data obtained in association with the access request issued to each memory bank is returned to the storage control device 53 after a fixed period of time (equivalent to an access time of a random access memory (RAM) constituting the storage) so as to be sent to one of the read data buffer circuits 56A to 56D associated with the access request control devices 52A to 52D. The read data items are returned, when all data items associated with the four access requests simultaneously issued from the access request control devices 52A to 52D are read out, to the respective access control devices 52A to 52D in an order of issuance of the requests and are then loaded in the vector registers 51A to 51D at the same time. Assignment of the elements to the vector registers is as follows.
Vector register
51A: Elements 0, 4, 8, . . . , 4n
51B: Elements 1, 5, 9, . . . , 4n+1
51C: Elements 2, 6, 10, . . . , 4n+2
51D: Elements 3, 7, 11, . . . , 4n+3
Next, in a case where data stored in the vector registers 51A to 51D undergo an arithmetic operation, the respective elements of the vector are assigned as follows and the results of the arithmetic operation are stored again in the vector registers 51A to 51D.
Arithmetic unit
50A: Elements 0, 4, 8, . . . , 4n
50B: Elements 1, 5, 9, . . . , 4n+1
50C: Elements 2, 6, 10, . . . , 4n+2
50D: Elements 3, 7, 11, . . . , 4n+3
In this arithmetic operation, four arithmetic units 50A to 50D effect operations completely in a synchronized fashion such that, for example, the results of the elements 0, 1, 2, and 3 are simultaneously attained and are then loaded in the vector registers 51A to 51D at the same time.
Finally, in a case where data stored in the vector registers 51A to 51D are to be written in the storage 55, the respective elements are assigned to the access request control devices 52A to 52D like in the case of the data read operation above such that four elements, for example, the elements 0, 1, 2, and 3 are sent to the corresponding access request stack circuits 53A to 53D. The subsequent processing up to an access request issuance to the storage 55 is similar to that of the read operation above.
As described above, the respective four arithmetic units 50A to 50D, the four vector registers 51A to 51D, and the four access request control devices 52A to 52D synchronously effect processing on the respective elements. In consequence, in a parallel element processing method associated with the synchronized operation, it is possible to employ a logic configuration in which a control system controls the four arithmetic units 50A to 50D, the vector registers 51A to 51D, and the access request control devices 52A to 52D.
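The element assignment used throughout this description (element i is handled by the i mod 4 member of each group of four devices) can be illustrated with a short sketch. The names NUM_LANES, lane_of, and elements_of are hypothetical and introduced only for this illustration; a lane index of 0 to 3 stands for the suffixes A to D.

```python
NUM_LANES = 4  # four-way parallel element processing, as in FIG. 1


def lane_of(element: int) -> int:
    """Return the lane index (0 for A, 1 for B, 2 for C, 3 for D) of an element."""
    return element % NUM_LANES


def elements_of(lane: int, vector_length: int) -> list[int]:
    """List the elements lane, lane+4, lane+8, ... below vector_length."""
    return list(range(lane, vector_length, NUM_LANES))


# Print the assignment for a 12-element vector, one line per lane.
for lane, suffix in enumerate("ABCD"):
    print(suffix, elements_of(lane, 12))
```

Because every group of four consecutive elements maps to four different lanes, elements 4n to 4n+3 can be handled in the same cycle, which is what permits the single synchronized control system described above.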
However, in the storage control device 53, there may occur a case where due to a state (for example, a busy state caused by a preceding access request) or a contention with another access, the four access requests respectively issued at the same time from the access request control devices 52A to 52D in the synchronized fashion are not simultaneously processed and hence there appears a shift in time between transmissions of the access requests to the memory banks 55A to 55D. In consequence, there has been adopted a control method in which in the read data buffers 56A to 56D of the storage control device 53, the subsequent operations are set to the wait state until all read data items corresponding to the access requests simultaneously sent from the access request control devices 52A to 52D are completely stored therein such that at a point of time when all the read data items are loaded therein, the four read data items are simultaneously sent to the access request control devices 52A to 52D.
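The wait control of the read data buffers can be sketched as follows. This is a minimal model, assuming only one group of four requests is in flight at a time; the class ReadGroupBuffer and its method store are hypothetical names, not elements of the apparatus of FIG. 1.

```python
class ReadGroupBuffer:
    """Holds read data per lane and releases it only when the group is complete."""

    def __init__(self, lanes: int = 4):
        self.lanes = lanes
        self.slots = {}  # lane index -> read data item

    def store(self, lane: int, data):
        """Store one read data item; return the full group once all lanes arrived."""
        self.slots[lane] = data
        if len(self.slots) < self.lanes:
            return None  # wait state: some requests of the group are still pending
        group = [self.slots[i] for i in range(self.lanes)]
        self.slots = {}  # ready for the next group of simultaneous requests
        return group


buf = ReadGroupBuffer()
# Data arrives out of step because the bank accesses were shifted in time.
early = [buf.store(2, "e2"), buf.store(0, "e0"), buf.store(3, "e3")]
released = buf.store(1, "e1")
print(early, released)
```

The three early arrivals are held back, and the group is released in lane order only when the last item is stored, so the slowest of the four accesses determines when all four data items reach the access request control devices.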
When a program of FIG. 2 is executed by a vector processor constituted with components like those shown in FIG. 1 such as vector arithmetic units and vector registers, the result of the operation is obtained in general as follows. Assume that B(i), C(i), and A(i) are vectors stored in a storage device.
1. Operand data B(i) arranged in the storage are sequentially loaded in a vector register (X) (vector load instruction).
2. In a similar fashion to that of the operation 1 above, operand data C(i) arranged in the storage are sequentially loaded in a vector register (Y) (vector load instruction).
3. Data respectively of the vector registers (X) and (Y) are sequentially read out so as to undergo an arithmetic operation by use of a vector arithmetic unit, thereby sequentially storing a result of an addition thereof in a vector register (Z) (vector add instruction).
4. The contents of the vector register (Z) containing the result of the addition are sequentially read out so as to be written in the storage (vector store instruction).
Through the operations (instructions) above, there can be attained a result of the operation of B(i)+C(i).
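The four instructions can be mimicked in software as below. This is only an illustrative sketch: the helper names vector_load, vector_add, and vector_store are hypothetical, the storage is modeled as a dictionary of lists, and writing the result to A(i) is an assumption, since the program of FIG. 2 is not reproduced in this text.

```python
def vector_load(storage: dict, name: str) -> list:
    """Vector load instruction: copy a vector from the storage into a register."""
    return list(storage[name])


def vector_add(x: list, y: list) -> list:
    """Vector add instruction: element-wise addition of two vector registers."""
    return [a + b for a, b in zip(x, y)]


def vector_store(storage: dict, name: str, z: list) -> None:
    """Vector store instruction: write a vector register back to the storage."""
    storage[name] = list(z)


storage = {"B": [1, 2, 3, 4], "C": [10, 20, 30, 40], "A": [0, 0, 0, 0]}
x = vector_load(storage, "B")   # operation 1
y = vector_load(storage, "C")   # operation 2
z = vector_add(x, y)            # operation 3
vector_store(storage, "A", z)   # operation 4 (destination A is assumed)
print(storage["A"])
```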
FIG. 3A shows a timing chart of processing of the operation effected in a computer having only one transfer pipeline from the storage to the vector registers and only one transfer pipeline from the vector registers to the storage. That is, since there is provided only one transfer pipeline for the transfer operation associated with the load operation, the data of C(i) cannot be loaded until the load operation of B(i) is finished. Although the vector processor can start the arithmetic operation beginning from an element for which an operand is loaded (in an ascending order of the element numbers 0, 1, 2, etc.) by use of a chaining mechanism of the load, add, and store operations, the chaining mechanism does not start an operation thereof unless the load operation of C(i) is initiated. In order to fully develop a high-speed operation, for example, the operation of the chaining mechanism of the arithmetic unit, it is necessary to provide at least two transfer pipelines from the storage to the vector registers. FIG. 3B shows a timing chart of processing of the operation when two transfer pipelines are provided. With two transfer pipelines thus provided, the program of FIG. 2 can be executed in a processing time which is about half that required when only one pipeline is used as described above.
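The "about half" figure can be checked with a rough cycle count. The sketch below rests on assumptions that are not stated in the text: each transfer pipeline moves one element per machine cycle, startup latency is ignored, and the add and store operations are fully chained so that only the load phase determines the total. The function name transfer_cycles is hypothetical.

```python
import math


def transfer_cycles(n_elements: int, load_pipelines: int, operands: int = 2) -> int:
    """Cycles needed to load all operand vectors under the stated assumptions."""
    # Operands that cannot be loaded concurrently must be loaded one after another.
    rounds = math.ceil(operands / load_pipelines)
    return rounds * n_elements


one = transfer_cycles(64, load_pipelines=1)  # B(i), then C(i), as in FIG. 3A
two = transfer_cycles(64, load_pipelines=2)  # B(i) and C(i) together, as in FIG. 3B
print(one, two, two / one)
```

Under these assumptions a 64-element vector needs 128 load cycles with one pipeline and 64 with two, a ratio of one half; the real ratio is only approximately one half because startup and drain cycles are the same in both cases.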
However, in a computer system employing a so-called parallel element pipeline processing method in which access requests are simultaneously issued, if the number of transfer pipelines is increased to two, there appears a contention between memory accesses achieved through the respective transfer pipelines and hence the efficiency of the system may be lowered in some cases.
On the other hand, in a computer system having a plurality of vector processors of the conventional technology, in both cases where a plurality of jobs are assigned to the plural vector processors for the processing thereof and where a job is subdivided into partitions to be assigned to the plural vector processors for the processing thereof, it is possible to reduce the processing time if all the vector processors can be used for the processing. In the conventional technology having a memory access pipeline to be processed in a so-called parallel element pipeline processing method, the processing is achieved by establishing a synchronization among access requests issued at the same time. When memory access pipelines are disposed, like in a case of a multiprocessor, between a plurality of (for example, two) vector processors and a storage, there arises a problem that the processing efficiency is decreased due to a contention between the access requests on the respective memory access pipelines. This problem will be concretely described with reference to FIGS. 4 and 5.
FIG. 4 shows a case of a computer system including one or two vector processors. When a single vector processor is disposed, there are provided two transfer pipelines to storage; whereas when two vector processors are used, there are arranged two transfer pipelines respectively from the vector processors to the storage. Each transfer pipeline possesses in either arrangement four access request control devices so as to effect the processing while establishing a synchronization among four access requests. Assume here that, as shown in FIG. 5, a transfer pipeline A accesses a consecutive address region in the storage beginning from a bank number 00 and a transfer pipeline B effects an access, with a delay of one machine cycle with respect to the access of the transfer pipeline A, to the consecutive address region in the storage beginning from a bank number 06. First, access request control devices S00, S01, S02, and S03 are assigned at time T0 with the bank numbers 00, 01, 02, and 03, respectively. In this situation, since the transfer pipeline B is initiated with a delay of one machine cycle, there does not exist any access request contending with an access request from the transfer pipeline A, and hence the access requests issued from the access request control devices S00 to S03 are transmitted to the storage device. At time T2, the access request control devices S00 to S03 of the transfer pipeline A are assigned with the bank numbers 04, 05, 06, and 07, respectively; whereas the access request control devices S10 to S13 of the transfer pipeline B are assigned with the bank numbers 06, 07, 08, and 09, respectively. Access requests issued from S00 and S01 are sent to the storage since there does not occur any contention in this case. The access requests from S02 and S03 respectively access the bank numbers to be accessed by the access requests respectively issued from S10 and S11 (occurrence of contentions). 
Consequently, the access requests are selected according to the predetermined priority so as to be sent to the storage. In this situation, assuming that the priority is assigned as S00>S01>S02>S03>S10>S11>S12>S13, the access requests issued from S02 and S03 are to be selected. Furthermore, the access requests issued from S12 and S13 are sent to the storage since there does not exist any contention. The access requests transmitted to the storage at T4 finally include those issued from the access request control devices S00, S01, S02, S03, S12, and S13. Next, at time T4, the access request control devices S00 to S03 are assigned with the bank numbers 08, 09, 0A, and 0B, respectively; whereas the access request control devices S10 to S13 are assigned with the bank numbers 0A, 0B, 0C, and 0D, respectively. Access requests respectively issued from S00 and S01 are associated with the bank numbers of the access requests sent from S12 and S13 in the preceding machine cycle (at T2); in consequence, the access operation is set to a wait state for a period of the bank busy state. (The access request transmission is prevented for a cycle time of the RAM constituting the storage.) The access requests from S10 and S11 contend with the access requests from S02 and S03 like in the case of the previous machine cycle such that the access requests from S02 and S03 are selected according to the priority. As a result, the access requests transmitted to the storage at time T6 include those from S02, S03, S12, and S13. Similarly, the access requests sent to the storage at points of time T8, T10, T12, etc. are the same as those issued at time T6. That is, as shown in FIG. 5, the access requests sent to the storage include those from the access request control devices S02, S03, S12, and S13, whereas the access requests from the access request control devices S00, S01, S10, and S11 are set to the wait state for a period of the bank busy state.
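The selection made in each machine cycle can be reproduced with a small sketch of a fixed-priority arbiter. It is a simplified model under stated assumptions: the bank assignments at T2 and T4 are taken directly from the description above, the busy banks are passed in by hand, and the carrying over of waiting requests in the access request stack circuits is not modeled. The function name arbitrate is hypothetical.

```python
PRIORITY = ["S00", "S01", "S02", "S03", "S10", "S11", "S12", "S13"]


def arbitrate(requests: dict, busy_banks=frozenset()):
    """Split requests (device -> bank number) into (sent, waiting) by fixed priority."""
    sent, waiting, granted = [], [], set()
    for device in PRIORITY:  # highest priority first
        if device not in requests:
            continue
        bank = requests[device]
        if bank in busy_banks or bank in granted:
            waiting.append(device)  # bank busy, or lost the contention
        else:
            granted.add(bank)
            sent.append(device)
    return sent, waiting


# Assignments at time T2: pipeline A on banks 04 to 07, pipeline B on banks 06 to 09.
t2 = {"S00": 0x04, "S01": 0x05, "S02": 0x06, "S03": 0x07,
      "S10": 0x06, "S11": 0x07, "S12": 0x08, "S13": 0x09}
sent_t2, wait_t2 = arbitrate(t2)

# Assignments at time T4; banks 08 and 09 are still busy from S12 and S13.
t4 = {"S00": 0x08, "S01": 0x09, "S02": 0x0A, "S03": 0x0B,
      "S10": 0x0A, "S11": 0x0B, "S12": 0x0C, "S13": 0x0D}
sent_t4, wait_t4 = arbitrate(t4, busy_banks={0x08, 0x09})

print(sent_t2, wait_t2)
print(sent_t4, wait_t4)
```

For the T4 assignments the sketch sends S02, S03, S12, and S13 and leaves S00, S01, S10, and S11 waiting, which is the repeating pattern described for FIG. 5: in each cycle only half of each synchronized group of four is served.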
As described above, for a transfer pipeline processed in the parallel element pipeline processing method, since the processing is achieved by establishing a synchronization between the access requests issued at the same time, there is required a wait time associated with the access requests simultaneously issued and hence there occurs a considerable decrease in the efficiency.
In order to prevent the deterioration of the efficiency, it may be possible to process a preceding transfer pipeline with a higher priority such that the processing of the succeeding transfer pipeline is interrupted (is set to a wait state) for a period of time of the bank busy state so as to delay the time when the processing is started, thereby reducing the processing time. In consequence, it is desirable to delay the processing start time of the transfer pipeline to be processed in the parallel element pipeline processing method by the period of time of the bank busy state. However, the storage control method of the prior art technology described above is established on the premise that the processing is conducted by use of one transfer pipeline; namely, consideration has not been given to a storage control method in which a plurality of transfer pipelines are employed. In consequence, in a case where a plurality of transfer pipelines are processed in the parallel element pipeline processing method, there arises a problem as described above that the efficiency is greatly lowered in a case of a memory access in which the addresses are consecutive in a memory, such cases accounting for at least 50% of the overall memory accesses.