The present invention relates to a vector processor capable of executing a vector operation at high speed, and more particularly to a vector processor which efficiently processes a vector operation having more elements than vector elements storable in vector registers.
FIG. 1 shows the situation for a vector operation in a prior-art vector processor which has a plurality of vector registers. In the figure, numeral 1 designates a main storage unit, numeral 7 a vector register unit, and numeral 8 an arithmetic logic unit.
When a load instruction is executed, vectors X and Y are fetched from the main storage unit 1 and stored in vector registers VR0 and VR1, respectively. Then, when a vector operation instruction is executed, the contents of VR0 and VR1 are fetched and inputted to the arithmetic logic unit 8, in which the assigned operation is performed, the result being stored in a vector register VR2. Further, when a store instruction is executed, the content of the vector register VR2 is stored as a vector Z in the main storage unit 1. Here, k represents the number of elements of a vector to be processed (called the "vector length"), and l represents the number of vector elements storable in a single vector register (hereinbelow, called the "vector register length").
The vector length k differs widely depending upon the programs, and in general, it can reach 10 to 10000 or more. On the other hand, the vector register length l has its upper limit determined by restrictions in the hardware of the processor to be used, and it ranges from 64 to 256 in the present situation.
Therefore, for k ≤ l, all necessary processing can be completed with one vector operation, whereas for k > l, more than one vector operation is required. For the latter situation, the following two methods are considered.
The first method is to divide the processing into units by software and to process the respective processing units independently by hardware.
The second method is the direct processing by hardware, in which the vector data is divided into groups each consisting of at most l elements and the individual groups are then processed. The l elements shall hereinbelow be referred to as a "segment". That is, in this method, after one segment has been processed by a vector instruction series, the vector instruction series is fetched again so as to process the next segment.
Owing to the hardware control, the latter method has the advantages that the overhead can be curtailed and that fast processing is permitted.
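The division into segments described above can be illustrated by the following minimal software sketch. It is an illustration only and does not represent the hardware; the function names `segments` and `vector_add`, the choice of addition as the assigned operation, and the default vector register length of 256 are assumptions introduced here.

```python
def segments(k, l):
    """Yield (head_element_number, element_count) for each loop processing.

    k is the vector length and l is the vector register length.
    """
    if l <= 0:
        raise ValueError("vector register length must be positive")
    head = 0
    while head < k:
        # The last segment may hold fewer than l elements.
        yield head, min(l, k - head)
        head += l


def vector_add(x, y, l=256):
    """Compute Z = X + Y one segment at a time (addition is an assumed operation)."""
    z = [0] * len(x)
    for head, count in segments(len(x), l):
        vr0 = x[head:head + count]                # load X into VR0
        vr1 = y[head:head + count]                # load Y into VR1
        vr2 = [a + b for a, b in zip(vr0, vr1)]   # vector operation into VR2
        z[head:head + count] = vr2                # store VR2 as Z
    return z
```

For example, a vector of 600 elements with a vector register length of 256 would be processed as three segments of 256, 256 and 88 elements.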
In the latter processing method, it is important to speed up the step in which, after the vector instruction series has been processed for a certain segment (hereinbelow, this shall be called "loop processing", and "loop i" shall signify the loop processing for the i-th segment), the addresses of the segment for use in the next loop processing are obtained.
Such an address updating method is described in "PROCESSING VECTORS AS A PLURALITY OF SEGMENTS" (IBM Technical Disclosure Bulletin, Vol. 13, No. 12, May 1971). This publication teaches the address updating method, but it does not disclose a circuit which executes operations in pipeline fashion by the use of vector registers as shown in FIG. 1.
FIG. 2A is a schematic block diagram of the known art, while FIG. 2B is a diagram of the general processing flow thereof.
In the case of accessing (load and store) a main storage unit 1, the load addresses (or store addresses) of respective elements are successively generated on the basis of a head address (the address of the first element and termed the "base address") and an address interval (hereinbelow, termed the "increment value") between the adjacent elements, whereupon the main storage unit is accessed on the basis of the generated addresses. Hereinbelow, a case where the vector data is arranged in the main storage unit 1 without any vacancy between the adjacent elements, namely, with all the elements being continuous, shall be called an "address continuation" condition, while the other case shall be called an "address discontinuation" condition. Assuming that the increment value is assigned in byte amounts, the address continuation condition holds for an increment value of 8 when the width of the vector data is 8 bytes, and it holds for an increment value of 4 when the width of the vector data is 4 bytes. Here, the vector register length is assumed to be 256.
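The generation of element addresses from the base address and the increment value, and the test for the address continuation condition, can be sketched as follows. The function names and the example addresses are assumptions made for illustration and do not appear in the described apparatus.

```python
def element_addresses(base_address, increment, count):
    """Return the byte address of each of `count` successive vector elements."""
    return [base_address + i * increment for i in range(count)]


def is_address_continuation(increment, element_width):
    """True when the elements are stored with no vacancy between them,
    that is, when the increment value equals the data width in bytes."""
    return increment == element_width
```

With a base address of 0x1000 and 8-byte data, an increment value of 8 gives the addresses 0x1000, 0x1008, 0x1010 and so on, which is the address continuation condition; an increment value of 16 for the same data would be the address discontinuation condition.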
First of all, the base address is set in a base register 105, the increment value is set in an increment register 101, and a countup register 102 is reset to zero.
In a segment base register 109, the addresses of the head elements of segments to which respective loop processings are directed are set as stated below. Such address shall hereinbelow be called the "segment base address". In addition, the address displacement from the address of the head element of the vector data to a certain segment base address shall hereinbelow be called the "segment address displacement".
The countup register 102 retains the head element number in each loop processing.
Accordingly, the segment base address is found by adding the value of the base register 105 to the product of the value of the increment register 101 and that of the countup register 102. The product is calculated in one of the following two ways.
The first case is the case of the address continuation condition. The value of the countup register 102 is inputted to a leftshift register 104, in which for 8-byte data the value is shifted leftwards by 3 bits, or for 4-byte data the value is shifted leftwards by 2 bits. The resulting output is set in an address register 107 through a selector 106.
The second case is the case of the address discontinuation condition. In this case, the product of the value of the increment register 101 and that of the countup register 102 is calculated by a multiplier 103, and the result is set in the address register 107. In either case, the contents of the address register 107 and the base register 105 are added by an adder 108, whereby the segment base address is obtained. An increment decoder 100 detects the address continuation/discontinuation condition, and controls the selector 106 on the basis of the detected result.
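The two ways of obtaining the segment address displacement, and the final addition of the base address, can be sketched as below. This is a software model under stated assumptions, not the circuit itself: the function names are hypothetical, and the data width is passed as an explicit parameter so that the continuation condition can be tested.

```python
def segment_address_displacement(countup, increment, element_width):
    """Model the path selected by the increment decoder 100."""
    shift_for_width = {8: 3, 4: 2}  # 8-byte data: shift by 3; 4-byte data: shift by 2
    if increment == element_width and element_width in shift_for_width:
        # Address continuation: the left shift replaces the multiplication.
        return countup << shift_for_width[element_width]
    # Address discontinuation: a full multiplication is required.
    return countup * increment


def segment_base_address(base_address, countup, increment, element_width):
    """Model the adder 108: base address plus segment address displacement."""
    return base_address + segment_address_displacement(
        countup, increment, element_width
    )
```

For instance, with a base address of 0x1000, a countup value of 256 and 8-byte data, an increment value of 8 gives a segment base address of 0x1800 by way of the shift, while an increment value of 24 gives 0x2800 by way of the multiplication.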
The countup register 102 needs to be increased by the vector register length for the next loop processing. As shown in FIG. 2B by way of example, this register 102 is updated at the last step of an instruction series which begins with a load instruction and ends with a store instruction. Further, since the multiplication in the multiplier 103 requires a considerable time under the address discontinuation condition, there is the problem that the starting time c of the load instruction in loop 2 is delayed.
For some kinds of processing, it is desirable for the starting time c of the load instruction of loop 2 to precede the starting time b of the store instruction of loop 1 in FIG. 2B; in other words, it is desirable to overlap the instruction starting times between the processing loops. In the prior art, however, the countup register 102 is updated for the next loop processing at the last step of the preceding loop processing, and the address is generated from the value of the countup register 102 as stated before, so that such overlap processing has been impossible.
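The ordering constraint of the prior art may be made concrete with the following sketch, which records the order of steps and the value of the countup register seen at each step. It models only the sequence of events, not their timing; the function name, the step labels and the trace format are assumptions introduced for illustration.

```python
def prior_art_trace(k, l=256):
    """Return (loop, step, countup_value) tuples in the order they occur.

    k is the vector length and l is the vector register length.
    """
    trace = []
    countup = 0  # countup register 102, reset to zero beforehand
    loop = 1
    while countup < k:
        # Every address in this loop is generated from the current countup value.
        trace.append((loop, "load", countup))
        trace.append((loop, "operate", countup))
        trace.append((loop, "store", countup))
        # The register is updated only at the last step of the loop.
        countup += l
        trace.append((loop, "update", countup))
        loop += 1
    return trace
```

In such a trace the load of loop 2 can appear only after the update step of loop 1, which itself follows the store of loop 1, so the load of loop 2 cannot be started ahead of the store of loop 1.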