The present invention generally relates to a vector processor for processing in parallel a plurality of vector elements contained in the same vector data. More particularly, the present invention relates to a vector processor in which a memory skewing scheme is adopted for preventing degradation of access performance when accesses are made successively with addresses each incremented by a predetermined value.
When arrayed data are stored as vector data in a main storage of a vector processor, there is usually adopted a method of storing the elements of each row of an array consecutively at successive memory locations. Consequently, the elements belonging to the same row of an array are successively accessed by means of a sequence of consecutive addresses, while the elements belonging to the same column are accessed by means of a sequence of addresses which are equally distanced from one another by a given incremental value (hereinafter referred to as the stride).
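The row-wise storage described above can be illustrated by a minimal sketch. The function name, the base address and the array width of five columns are hypothetical values chosen only for illustration; they do not appear in the specification.

```python
def row_major_address(row, col, n_cols, base=0):
    """Address of element (row, col) when each row is stored consecutively."""
    return base + row * n_cols + col

n_cols = 5  # hypothetical array width, for illustration only

# Elements of one row occupy consecutive addresses (stride 1).
row_addrs = [row_major_address(1, c, n_cols) for c in range(n_cols)]

# Elements of one column are spaced by the row length (stride n_cols).
col_addrs = [row_major_address(r, 2, n_cols) for r in range(4)]

print(row_addrs)
print(col_addrs)
```

Under this layout the stride of a column access equals the number of elements per row, which is why strides other than one arise routinely in array processing.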
In this conjunction, it is known that the access speed or rate (i.e., the rate at which the access is executed) will vary in dependence on methods of assigning addresses to the individual memory locations in a plurality of memory modules, which constitute a main storage. By way of example, as one of the conventional methods for the memory address assignment well known in the art, there may be mentioned an interleaving method or scheme. FIG. 8 of the accompanying drawings shows, by way of example, the addresses as assigned by the interleaving method in the case where the number of memory modules constituting a main storage is four. As can be seen in the figure, the four memory modules are sequentially allocated with the identifiers or numbers (hereinafter referred to as the ID numbers) "0", "1", "2" and "3", respectively, in this order, wherein a symbol "MM#" represents generally or collectively the memory module ID numbers. According to the illustrated interleaving scheme, consecutive addresses are assigned to the memory modules which differ sequentially from one to another.
At this juncture, it should be mentioned that the term "address" represents the number assigned or affixed to a memory location in a memory on an access-by-access basis. This definition applies throughout the specification unless specified otherwise. Further, in the following description which is directed to the vector processors known heretofore, as well as the vector processors shown in conjunction with the exemplary embodiments of the invention, it is assumed that each memory access has a length equivalent to the data length of a single element of the vector data. In other words, an address is assigned to each vector data element.
According to the interleaving method described above, consecutive addresses are successively assigned to the memory modules which differ sequentially from one to another. Accordingly, in the memory accesses with the consecutive addresses, the mutually different memory modules are sequentially accessed. Thus, the memory access operation can be accomplished at a very high speed. However, in the case of the equi-distant accesses which are spaced from one another by a given value of the stride defined previously, it has been known that the accesses may be concentrated on a particular one of the memory modules, incurring degradation in the access performance. FIG. 9 of the accompanying drawings shows relations between the strides and the memory access performance indexes. By way of example, in the case of the scheme illustrated in FIG. 8, the access with a stride of "4" presents a problem. More specifically, when the memory access is to be made with a stride of "4" sequentially, starting from the address "0", the accesses will then be made to the addresses "0", "4", "8", "12" . . . , and so forth sequentially in this order. In this conjunction, it will be noted that these addresses are all assigned to the memory module MM0. Thus, it becomes impossible to realize the access of high speed.
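The concentration of accesses described above can be reproduced with a short sketch. It assumes that the interleaving scheme maps an address to the module given by the address modulo the number of modules, which is consistent with the example addresses "0", "4", "8" and "12" all falling on the memory module MM0; the function names are hypothetical.

```python
def interleaved_module(adr, n_modules=4):
    """Assumed interleaving rule: MM# = ADR mod N."""
    return adr % n_modules

def modules_for_stride(start, stride, count, n_modules=4):
    """Module IDs touched by `count` accesses beginning at `start`."""
    return [interleaved_module(start + i * stride, n_modules)
            for i in range(count)]

print(modules_for_stride(0, 1, 8))  # consecutive addresses
print(modules_for_stride(0, 4, 4))  # stride equal to the module count
```

With a stride of one the module IDs cycle through all four modules, whereas with a stride of four every access lands on the module 0.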
As a measure for coping with or mitigating such degradation in the memory access performance as mentioned above, there is known a so-called memory skewing scheme (also referred to as a skewed storage scheme) according to which the address assignments to the memory modules are skewed. The mathematical basis for this memory skewing scheme is elucidated in D. J. Kuck: "ILLIAC IV SOFTWARE AND APPLICATION PROGRAMMING", IEEE Transactions on Computers, Vol. C-17, No. 8, pp. 758-770, (August 1968) or P. Budnik and D. J. Kuck: "THE ORGANIZATION AND USE OF PARALLEL MEMORIES", IEEE Transactions on Computers, pp. 1566-1569, (December 1971). The memory skewing scheme is by no means limited to a single method but has many variations and modifications, some of which are disclosed in D. T. Harper, III and J. R. Jump: "PERFORMANCE EVALUATION OF VECTOR ACCESS IN PARALLEL MEMORIES USING A SKEWED STORAGE SCHEME", IEEE Transactions on Computers, C-36 (12), pp. 1440-1449 (December 1987) or "PERFORMANCE EVALUATION OF VECTOR ACCESSES IN PARALLEL MEMORIES USING A SKEWED STORAGE SCHEME", Conf. Proc. of the 13th Annual International Symposium on Computer Architecture, pp. 324-328 (June 1986), IEEE, and U.S. Pat. No. 4,918,600. The description which follows will be directed to a vector processor in which the memory skewing scheme (or skewed storage scheme) is adopted and to typical variations of the skewing scheme where the numbers of memory modules are four and eight, respectively.
Vector processors in which the memory skewing scheme is adopted are disclosed in U.S. Pat. Nos. 4,370,732 and 4,918,600. FIG. 10 shows a vector processor disclosed in U.S. Pat. No. 4,918,600.
In the figure, reference numeral 500 denotes a processor which sequentially issues access requests, numerals 510 to 513 denote memory modules, respectively, numerals 520 to 523 denote buffers for holding temporarily the access requests issued from the processor 500, and reference numerals 530 to 533 denote buffers for holding temporarily the data read out or retrieved from the memory modules 510 to 513, respectively. The memory module to which a given access request is to be sent is determined by an address mapping circuit 540 which serves to select the memory module of concern in accordance with the address information contained in that given access request. The processor 500 is capable of issuing one access request in one cycle. The access request contains the address information on the basis of which the access request to the memory module to be accessed is issued. In the case of the prior art vector processor now under consideration, the access to one memory module requires four cycles. However, so long as the four memory modules are accessed in turn, one access request can be processed in every cycle. When a plurality of access requests are successively issued to one and the same memory module, those access requests which succeed the preceding one are temporarily held by the buffer 520, 521, 522 or 523 during the period in which the memory module mentioned above is being accessed by the preceding access request. Thus, the succeeding access requests can be issued until the buffer 520, 521, 522 or 523 becomes full. On the other hand, the buffers 530 to 533 serve to hold temporarily the data read out from the memory modules for sending them back to the processor 500 in the order in which the relevant access requests were issued from the processor 500.
FIGS. 11 and 12 of the accompanying drawings illustrate address assignments to memory modules in accordance with a first skewing scheme illustrated in FIGS. 4 and 5 of U.S. Pat. No. 4,918,600.
According to the illustrated memory skewing schemes, the memory modules are shifted or changed over from one to another one by one every time a number of addresses corresponding to that of the memory modules (four and eight in the case of the examples now shown in FIGS. 11 and 12) have been assigned. In this case, relation among the memory module ID number MM#, the address ADR and the number N of memory modules is given by the expression (1) mentioned below:

$$\mathrm{MM\#} = (\mathrm{ADR} + \mathrm{ADR} \div N) \bmod N \qquad (1)$$

where "mod N" represents the modulo-N operation and the division is an integer division in which the remainder is discarded. Assuming, for example, that N=4, the address assignment is performed in such a way that the address "0" is assigned to the memory module 0, the address "4" is assigned to the memory module 1, the address "8" is assigned to the memory module 2, and so forth.
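The expression (1) can be checked with a minimal sketch. The division is read here as integer division, an assumption that agrees with the example in which the addresses "0", "4" and "8" fall on the memory modules 0, 1 and 2; the function names are hypothetical.

```python
def skew1_module(adr, n):
    """Expression (1): MM# = (ADR + ADR div N) mod N, using integer division."""
    return (adr + adr // n) % n

def layout(n, rows):
    """Module ID for each address, listed n addresses per row."""
    return [[skew1_module(r * n + c, n) for c in range(n)]
            for r in range(rows)]

# Each printed row covers N consecutive addresses; the module IDs rotate
# by one position from row to row.
for row in layout(4, 4):
    print(row)
```

Because each group of N addresses is rotated by one module relative to the preceding group, accesses made with a stride of N starting from the address "0" visit the modules 0, 1, 2 and 3 in turn rather than a single module.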
FIGS. 13 and 14 of the accompanying drawings show address assignment to memory modules according to a second skewing scheme shown in FIG. 6 of U.S. Pat. No. 4,918,600.
According to this skewing scheme, the memory modules to be assigned with the addresses are shifted one by one every time a number of addresses corresponding to a multiple (e.g. 8) of the number of memory modules (i.e., 4) has been assigned. In this case, the relation among the memory module ID number MM#, the address ADR and the number N of the modules is given by the following expression (2):

$$\mathrm{MM\#} = (\mathrm{ADR} + \mathrm{ADR} \div (N \times 2)) \bmod N \qquad (2)$$

Thus, according to this scheme, the address assignment is made in such a manner that the address "0" is assigned to the memory module 0, the address 8 is assigned to the memory module 1, the address 16 is assigned to the memory module 2, and so forth.
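The expression (2) can be sketched in the same manner, again reading the division as integer division, which is an assumption consistent with the addresses "0", "8" and "16" falling on the memory modules 0, 1 and 2. The function names are hypothetical.

```python
def skew2_module(adr, n):
    """Expression (2): MM# = (ADR + ADR div (2N)) mod N, using integer division."""
    return (adr + adr // (2 * n)) % n

def modules_for_stride(stride, count, n=4, start=0):
    """Module IDs touched by `count` accesses with the given stride."""
    return [skew2_module(start + i * stride, n) for i in range(count)]

print(modules_for_stride(8, 4))  # stride 2N
print(modules_for_stride(4, 8))  # stride N
```

With a stride of four each module is visited twice in succession before the next module is reached, so that the request buffers must absorb the paired requests; averaged over eight accesses, however, every module receives the same share.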
FIG. 15 shows the relation between the inter-address distance or the access stride and the performance in the case where the first skewing scheme defined by the expression (1) or shown in FIG. 11 is adopted, while FIG. 16 shows the relation between the stride and the access performance in the case where the second skewing scheme defined by the expression (2) or shown in FIGS. 13 and 14 is adopted, both on the assumption that the number of memory modules is four in the system configuration shown in FIG. 10. Further, FIG. 17 shows the relation between the stride and the performance in the case where the first skewing scheme defined by the expression (1) or shown in FIG. 12 is adopted on the assumption that the number of the memory modules is eight in the system shown in FIG. 10. In this conjunction, it should be noted that the access performance of concern is determined after the lapse of sufficient time from the time point at which the processor 500 started to issue the access requests and in the state where the number of access requests processed by the memory during one cycle has become steady. Further, it is to be added that the performance at which one element is processed in one cycle is represented by "1" (unity).
As can be seen from the comparison of FIG. 9 with FIGS. 15 and 16, the variety of the strides which give rise to degradation in the performance can be decreased by adopting the first skewing scheme given by the expression (1) or illustrated in FIG. 11. This effect becomes more significant when the second skewing scheme defined by the expression (2) or illustrated in FIGS. 13 and 14 is adopted.
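The tendency noted above can be approximated by a rough sketch which computes an upper bound on the steady-state performance rather than the values plotted in the figures. It assumes four memory modules, a module access time of four cycles, request buffers deep enough never to fill, accesses starting from the address "0", and integer division in the expressions (1) and (2); all function names are hypothetical.

```python
def interleave(adr, n):
    return adr % n

def skew1(adr, n):
    return (adr + adr // n) % n          # expression (1)

def skew2(adr, n):
    return (adr + adr // (2 * n)) % n    # expression (2)

def throughput_bound(mapping, stride, n=4, access_cycles=4):
    """Upper bound on elements per cycle, limited by the busiest module."""
    accesses = 2 * n * n  # a multiple of the repetition period of all three mappings
    counts = [0] * n
    for i in range(accesses):
        counts[mapping(i * stride, n)] += 1
    busiest_share = max(counts) / accesses
    # A module completes one access every `access_cycles` cycles, and the
    # processor issues at most one request per cycle.
    return min(1.0, 1.0 / (access_cycles * busiest_share))

for stride in (1, 2, 4, 8, 16):
    print(stride, [throughput_bound(m, stride) for m in (interleave, skew1, skew2)])
```

Under these assumptions a stride of four is bounded at 0.25 with the interleaving scheme but reaches unity with either skewing scheme, while a stride of sixteen remains bounded at 0.25 with the first skewing scheme. With buffers of finite depth the actual performance may be lower than this bound.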
It must be pointed out that the vector processor disclosed in U.S. Pat. No. 4,918,600 is designed to issue the access requests sequentially at a rate of one access request per cycle. However, there already exist vector processors designed to process simultaneously a plurality of elements belonging to the same vector data in response to a single instruction in order to enhance the processing capability of the vector processor. For convenience of description, the simultaneous processing of plural elements as mentioned above is referred to as the element parallel processing, while the number of elements susceptible to the simultaneous processing will be referred to as the element parallelism factor. Further, the vector processing other than the element parallel processing will hereinafter be referred to as the sequential processing. Now, description will turn to a hitherto known vector processor designed for executing the element parallel processing. FIG. 18 shows a vector processor whose element parallelism factor is four and which is disclosed in JP-A-63-66661.
In the figure, a reference numeral 14 denotes a vector register unit which includes four vector data controllers 14-0, 14-1, 14-2 and 14-3 and vector registers (not shown).
Further, reference numeral 15 denotes a requester module which issues access requests to a main storage 13. As can be seen, the requester module 15 is comprised of four access request control units 1 to 4 which are connected to vector data controllers 14-0 to 14-3, respectively.
In FIG. 18, reference numerals 5 to 8 denote, respectively, access request buffers for holding temporarily the access requests issued by the access request control units.
Finally, reference numerals 9 to 12 denote access request priority determining units for determining the priority with which the access requests in conflict are to be processed.
The main storage 13 includes memory modules MM0, MM1, MM2 and MM3 which are affixed with ID numbers "0", "1", "2" and "3", respectively. The addresses in the memory modules MM0, MM1, MM2 and MM3 are so assigned that a single continuous memory space can be implemented for all of the four memory modules.
Next, description will turn to a flow along which the access requests are processed.
At first, the access request control units 1, 2, 3 and 4 issue access requests in parallel to the associated access request buffer units 5, 6, 7 and 8, respectively, provided that the access request buffer 5-2 of each of these access request buffer units is unoccupied or idle. However, in case none of the access request buffers 5-2 of the access request buffer units 5, 6, 7 and 8 is idle, the access request control units 1, 2, 3 and 4 issue no access requests.
In the access request buffer unit 5, the address signal accompanying the access request is decoded by an address decoder unit 5-1, whereby the ID number of the memory module to be accessed is determined. The access request priority determining unit 9, 10, 11 or 12 corresponding to the memory module as determined is then selected and the corresponding information is sent to an access request send-out control unit 5-3 incorporated in the access request buffer unit 5 with the access request being transferred to the access request buffer 5-2.
When access requests are present in the access request buffer 5-2, the access request send-out control unit 5-3 selects the access requests, beginning with the oldest one, and sends the access request thus selected to the one of the access request priority determining units 9, 10, 11 and 12 which corresponds to the memory module designated by the selected access request.
The other access request send-out control units 6-3, 7-3 and 8-3 perform similar processing.
In the access request priority determining unit 9, an access request priority determining circuit 9-1 determines the processing priority to be imparted to the access requests supplied from the individual access request buffer units 5, 6, 7 and 8.
An access request accept controller 9-2 accepts the access request of the highest priority and sends out an access request accept acknowledge signal to the one of the access request buffer units 5, 6, 7 and 8 from which the accepted access request originates.
The other access request priority determining units 10, 11 and 12 perform similar processing.
The access request send-out control unit 5-3 incorporated in each of the access request buffer units 5, 6, 7 and 8 responds to reception of the access request accept acknowledge signal by sending out a succeeding access request to the corresponding one of the access request priority determining units 9, 10, 11 and 12.
The access request priority determining unit 9 in turn responds to reception of the access request issued from one of the access request buffer units 5, 6, 7 and 8 by sending out that access request to the memory module MM0.
Each of the other access request priority determining units 10, 11 and 12 performs similar processing.
In this manner, the four vector elements can be processed in parallel.
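The flow described above, taken for a single cycle, can be sketched as follows. The sketch rests on two assumptions which the description does not state: the address decoder is taken to use plain interleaving (address modulo four), and each priority determining unit is taken to favor the buffer unit of the lowest index. The function names and the queue contents are hypothetical.

```python
from collections import deque

N_MODULES = 4

def decode_module(adr):
    # Assumption: plain interleaving; the decoder's actual mapping is not stated.
    return adr % N_MODULES

def arbitrate(buffers):
    """One cycle: each buffer unit offers its oldest request and each
    priority determining unit accepts at most one request.
    Returns {module: (buffer_index, address)}."""
    offers = {}
    for idx, fifo in enumerate(buffers):
        if fifo:
            adr = fifo[0]  # oldest request in this buffer unit
            offers.setdefault(decode_module(adr), []).append((idx, adr))
    accepted = {}
    for module, candidates in offers.items():
        winner = min(candidates)      # assumed fixed priority: lowest buffer index
        accepted[module] = winner
        buffers[winner[0]].popleft()  # the acknowledge lets that buffer unit advance
    return accepted

# Four requesters addressing four different modules: all are accepted at once.
spread = [deque([0]), deque([1]), deque([2]), deque([3])]
print(arbitrate(spread))

# Four requesters addressing the same module: only one is accepted per cycle.
clash = [deque([0]), deque([4]), deque([8]), deque([12])]
print(arbitrate(clash))
```

When the four oldest requests designate different memory modules, all four are accepted in the same cycle, whereas requests designating the same memory module are serialized by the priority determining unit of that module.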