(1) Field of the Invention
The present invention relates to a memory accessing system for a vector processing apparatus which carries out a vector calculation and to a vector processing apparatus incorporating the memory accessing system.
In scientific or technological calculations, loop calculations are frequently carried out. The loop calculations can be transformed into vector calculations, e.g., A(i)+B(i)=C(i), (i = 0 to n-1).
To increase the speed of vector calculations, computer systems carrying out high speed calculations, such as supercomputers, are equipped with a vector processing apparatus for carrying out vector calculations, in addition to a scalar data processing apparatus.
In a vector processing apparatus, a plurality of vector processing units are provided, and each vector processing unit carries out a share of a vector calculation concurrently with the other vector processing units. For example, a vector calculation A(i)+B(i)=C(i), (i = 0 to n-1), is shared by first and second vector processing units as follows. The first vector processing unit takes half of the elements, A(4m)+B(4m)=C(4m) and A(4m+1)+B(4m+1)=C(4m+1), and the second vector processing unit takes the other half of the elements, A(4m+2)+B(4m+2)=C(4m+2) and A(4m+3)+B(4m+3)=C(4m+3), where 0 ≤ 4m, 4m+3 ≤ n-1, and n is the total number of elements. The first and second vector processing units carry out their respective shares of the vector calculation concurrently, which further increases the calculation speed.
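The sharing of elements described above can be illustrated with a minimal sketch. The function names and the 0/1 unit indices (0 for the first vector processing unit, 1 for the second) are assumptions made for illustration only and do not appear in the apparatus itself.

```python
def unit_for_element(i, elems_per_group=2, num_units=2):
    """Return the index of the unit that handles element i.

    Elements are dealt out in consecutive pairs: 4m and 4m+1 go to
    unit 0, 4m+2 and 4m+3 go to unit 1 (indices are illustrative).
    """
    return (i // elems_per_group) % num_units


def partition(n):
    """Split element indices 0..n-1 between the two units."""
    shares = {0: [], 1: []}
    for i in range(n):
        shares[unit_for_element(i)].append(i)
    return shares


print(partition(8))  # {0: [0, 1, 4, 5], 1: [2, 3, 6, 7]}
```

With n = 8, the first unit receives elements 0, 1, 4 and 5, and the second unit receives elements 2, 3, 6 and 7, so each unit handles half of the elements.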
In the above vector processing apparatus having a plurality of vector processing units, each of the plurality of vector processing units independently requests access to the main storage, and the scalar unit (CPU) and an input and output processor (IOP) each also independently request access to the main storage. Therefore, arbitration regarding access to the main storage between the plurality of vector processing units, the scalar unit (CPU), and the input and output processor (IOP) must be carried out to avoid a conflict. In addition, the order of the elements of the vector data must be maintained in the result of the arbitration between accesses from the plurality of vector processing units.
(2) Description of the Related Art
FIG. 1 shows an outline of the construction of an example of a conventional vector processing apparatus having a plurality of vector processing units.
In FIG. 1, reference numeral 1 denotes a vector processing controller, 2 and 3 each denote a vector processing unit, 4 denotes a main storage, 5 and 9 each denote an address generator, 6 and 10 each denote a priority controller, 8 and 12 each denote a data buffer, 7 and 11 each denote a vector register, and 13 denotes a route for a control signal.
In this example, the vector processing unit 2 carries out calculations of the (4m)-th and (4m+1)-th elements, and the vector processing unit 3 carries out calculations of the (4m+2)-th and (4m+3)-th elements, where 0 ≤ 4m, 4m+3 ≤ n-1, and n is the total number of elements.
When the vector processing controller 1 receives a vector instruction from a scalar unit (CPU) (not shown), the vector processing controller 1 supplies control data to the address generators 5 and 9. The control data includes a start signal, an operation code (which defines the action to be performed, e.g., writing or reading), a leading address (LA, the address of the top element of the vector data), a distance (D, the distance between the addresses of successive elements), a vector length (the number of elements in the vector data), etc.
The address generator 5 in the vector processing unit 2 generates addresses for accessing the main storage 4 for the calculations of the (4m)-th and (4m+1)-th elements, and the address generator 9 in the vector processing unit 3 generates addresses for accessing the main storage 4 for the calculations of the (4m+2)-th and (4m+3)-th elements, based on the above control data.
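The address generation described above can be sketched as follows. Since the leading address LA is the address of the top element and the distance D separates the addresses of successive elements, the i-th element is taken to lie at LA + i×D. The function name, the `residues` parameter, and the sample values (LA = 0x1000, D = 8, vector length 8) are assumptions chosen only for illustration.

```python
def element_addresses(leading_address, distance, vector_length, residues):
    """List (element index, address) pairs for the elements whose index
    modulo 4 is in `residues`; element i sits at LA + i * D."""
    return [
        (i, leading_address + i * distance)
        for i in range(vector_length)
        if i % 4 in residues
    ]


# Address generator 5 (vector processing unit 2): elements 4m and 4m+1.
unit2 = element_addresses(0x1000, 8, 8, residues=(0, 1))
# Address generator 9 (vector processing unit 3): elements 4m+2 and 4m+3.
unit3 = element_addresses(0x1000, 8, 8, residues=(2, 3))

print([(i, hex(a)) for i, a in unit2])
print([(i, hex(a)) for i, a in unit3])
```

Both generators start from the same control data and differ only in which element numbers they select.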
The priority controller 6 in the vector processing unit 2 carries out an arbitration between requests for accesses to the main storage 4 for the (4m)-th and (4m+1)-th elements, requests from the scalar unit (CPU), and requests from the input and output processor (IOP), and gives allowance to one of the above requests (for the (4m)-th and (4m+1)-th elements) from the vector processing unit 2, based on the addresses generated in the address generator 5, and the order of the elements. The highest priority is given to the request from the input and output processor (IOP), the next highest priority is given to the request from the scalar unit (CPU), and the allowance is given in the order of the elements, i.e., the (4m)-th element has priority over the (4m+1)-th element.
Similarly, the priority controller 10 in the vector processing unit 3 carries out an arbitration between requests for accesses to the main storage 4 for the (4m+2)-th and (4m+3)-th elements, requests from the scalar unit (CPU), and requests from the input and output processor (IOP), and gives allowance to one of the above requests (for the (4m+2)-th and (4m+3)-th elements) from the vector processing unit 3, based on the addresses generated in the address generator 9, and the order of the elements. Similarly, the highest priority is given to the request from the input and output processor (IOP), the next highest priority is given to the request from the scalar unit (CPU), and the allowance is given in the order of the elements, i.e., the (4m+2)-th element takes precedence over the (4m+3)-th element.
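The priority order applied by each priority controller can be sketched as below. This is a simplified model under stated assumptions: the function name and the tuple return values are hypothetical, and the sketch ignores the dependence on the generated addresses (i.e., on which portion of the main storage is requested), considering only the priority among requesters.

```python
def arbitrate(iop_request, cpu_request, pending_elements):
    """Pick the request to allow in one arbitration round.

    Priority: IOP first, then the scalar unit (CPU), then the pending
    vector element with the smallest element number.
    Returns a (requester, element) tuple, or None if nothing is pending.
    """
    if iop_request:
        return ("IOP", None)
    if cpu_request:
        return ("CPU", None)
    if pending_elements:
        return ("VECTOR", min(pending_elements))
    return None


# Priority controller 6 with elements 4 and 5 pending (m = 1):
print(arbitrate(True, True, [4, 5]))    # ('IOP', None)
print(arbitrate(False, True, [4, 5]))   # ('CPU', None)
print(arbitrate(False, False, [5, 4]))  # ('VECTOR', 4)
```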
Further, to maintain the order of the elements of the vector data across both the vector processing units 2 and 3, before allowing the access for an element (i), one of the vector processing units 2 and 3 must know whether or not the access for the element which must be allowed before the element (i) (i.e., an element designated by a smaller number than (i)) in the other vector processing unit has already been allowed. For example, before allowing the access for the element 4m, the vector processing unit 2 must know whether or not the access to the main storage 4 for the element 4m-1 has already been allowed in the vector processing unit 3.
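The ordering condition above can be expressed as a small check. The function name and the use of a set to record the elements already allowed in either vector processing unit are assumptions made for illustration; the sketch models only the condition itself, not how the status is exchanged between the units.

```python
def may_allow(i, allowed):
    """True if every element numbered below i has already been allowed."""
    return all(j in allowed for j in range(i))


# Elements 0 and 1 (unit 2) and element 2 (unit 3) have been allowed,
# but element 3 (unit 3) has not.
allowed = {0, 1, 2}
print(may_allow(4, allowed))  # False: unit 2 must wait for element 3

allowed.add(3)
print(may_allow(4, allowed))  # True: element 4 may now be allowed
```

Here the vector processing unit 2 cannot allow the access for the element 4 until it learns that the element 3 has been allowed in the vector processing unit 3.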
To assure the above order, in the construction of FIG. 1, the status of allowance in each vector processing unit is transmitted to the other vector processing unit through the route 13. However, in practice, each of the vector processing units 2 and 3 is formed on an individual circuit board, and therefore, the propagation delay time through the above route 13 is large compared with the clock cycle of high speed computers when reporting the status of allowance across the different circuit boards.
In addition, in the construction of FIG. 1, to avoid competition between requests from both the vector processing units 2 and 3, predetermined alternative accessible timings are respectively assigned for the (4m)-th and (4m+1)-th elements in the vector processing unit 2, and the (4m+2)-th and (4m+3)-th elements in the vector processing unit 3.
Usually, a main storage is comprised of a plurality of portions (banks or memory units), where each portion can be concurrently accessed. For example, the main storage 4 in the construction of FIG. 1 is comprised of four memory units, SU-0, SU-1, SU-2 and SU-3. For accessing these memory units SU-0, SU-1, SU-2 and SU-3 from the vector processing units 2 and 3 without competition, the alternative timings shown in FIG. 2 are assigned.
As mentioned above, in the conventional vector processing apparatus, each vector processing unit must wait to receive the status of allowance of access to the main storage from the other vector processing unit(s) before carrying out the arbitration. In addition, since predetermined (fixed) alternative timings are assigned to the plurality of vector processing units in the conventional vector processing apparatus as mentioned above, each vector processing unit cannot access the main storage at a time other than its predetermined (fixed) time, even when the portion of the main storage to be accessed is not busy at that time. Namely, in the conventional vector processing apparatus, the accesses to the main storage from the plurality of vector processing units are not carried out efficiently, and therefore, the total access time to the main storage for performing and completing a vector calculation is long.
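The cost of fixed timings can be illustrated with a deliberately simplified sketch. The schedule used here, in which clock cycle c belongs to the unit with index c mod 2, is a hypothetical stand-in and is not the timing chart of FIG. 2; the function name and the 0/1 unit indices are likewise assumptions.

```python
def fixed_slot_wait(request_cycle, unit, num_units=2):
    """Cycles a unit must wait for its next assigned slot, assuming
    (hypothetically) that cycle c is assigned to unit c % num_units."""
    return (unit - request_cycle) % num_units


# A request raised at cycle 4 by unit 0 falls in its own slot.
print(fixed_slot_wait(4, unit=0))  # 0
# The same request raised by unit 1 must wait one cycle, even if the
# memory unit it targets is idle at cycle 4.
print(fixed_slot_wait(4, unit=1))  # 1
```

Under such a fixed assignment, the waiting time depends only on the cycle number and not on whether the requested portion of the main storage is actually busy.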