The present invention relates to a vector processor capable of fast vector computation and, particularly, to a vector processor with simplified control for writing vector data into the vector register so as to perform vector processing at a high speed.
A vector processor has a main storage for storing vector data and a vector register for storing vector data read out from the main storage. The main storage and the vector register are accessible for reading and writing data by specification of a bit position in each word. The specified bit position for reading or writing is called an "address bound". The main storage and the vector register can store data of a plurality of byte widths.
Where data is stored in the main storage in l_1-byte width, data is stored in the vector register in l_2-byte width, and a vector element has m-byte width, it is assumed that l_1/m is an integer and that l_2 is larger than or equal to m. In the vector processor, vector data stored in the main storage is read out by specifying its address bound, each element is stored in the vector register, the vector data is transferred from the vector register to an arithmetic unit for computation, and the computational result is stored in the vector register.
In a prior art vector processor having two arithmetic units, where each word of the vector register includes two vector data elements to be processed by separate arithmetic units, e.g., arithmetic unit 0 and arithmetic unit 1, it is necessary to determine which arithmetic unit is to be used for processing each vector data element included in each word of the vector register, and to input each vector data element to the arithmetic unit so determined. This makes the control circuit complex and requires a long vector computation time because of the complicated control needed for the frequent data transfer from the vector register to the arithmetic units. Referring to FIG. 2(e), showing an example of data element arrangement in the main storage of the conventional vector processor, vector data elements arranged in a main storage 100 are put into a vector register 200-1 as shown in the figure. In the figure, each word of the vector register 200-1 has an 8-byte width or size, while data such as F(1) and F(2) have a data size of 4 bytes; data F(1), F(3) and F(5) are to be entered to one arithmetic unit, while data F(2) and F(4) are to be entered to another arithmetic unit. In transferring vector data from the vector register 200-1 to the arithmetic unit, it is necessary to determine whether effective data F(1) resides in the right-hand or left-hand 4 bytes of the 8-byte word which has been read out as the first word from the vector register, and also to determine which arithmetic unit is to be used for the computation of the data F(1). For the second word, it is necessary to determine which arithmetic unit is to be used for each of data F(1) and F(2) in the word read out of the vector register. The same determinative process is required for the following words.
FIG. 2(f) also shows an example of the data element arrangement in the conventional vector processor using two arithmetic units. In FIG. 2(f), vector data elements arranged in the main storage 100 as shown in the figure are placed in the vector register 200-1 as shown in the figure. Also in this case, each word of the vector register 200-1 has an 8-byte size and data such as G(1) and G(2) have a 4-byte size. Data G(1), G(3) and G(5) are to be entered to one arithmetic unit, while data G(2) and G(4) are to be entered to another arithmetic unit. Also in the case of FIG. 2(f), in transferring vector data elements in the vector register 200-1 to an arithmetic unit 300, it is necessary to determine whether an effective vector data element resides in the right-hand or left-hand 4 bytes of an 8-byte word read out from the vector register 200-1, and to determine which arithmetic unit is to receive the effective data.
In both cases of FIG. 2(e) and FIG. 2(f), a complex control circuit is needed to determine how data is transferred from the vector register to the arithmetic units, and extra time is required for the complicated control of that transfer, resulting disadvantageously in a lengthy vector computation time.
In a conventional vector processor, where a single arithmetic unit is used for each word of the vector register, vector data cannot be processed if a plurality of vector elements of the vector data are stored in one data area of the memory.
The technology used in the conventional two-unit vector processor for determining which vector data element in each word from the vector register is to be entered to which arithmetic unit might be applied in an attempt to solve the above problem of the single arithmetic unit. However, this poses the following problem. One possible way of reading out vector data stored in the main storage and writing the data into the vector register is to write the vector data elements into the vector register without changing their arrangement in the main storage. In this case, however, the vector data elements arranged in the words from the vector register must be rearranged into positions suitable for processing by the arithmetic unit before they are input to the arithmetic unit. For example, when the vector register has 8-byte words with a 4-byte vector data element placed in the right-hand 4 bytes of the word, and the arithmetic unit requires its input word to be arranged in left-to-right order, the vector data element in the word read out from the vector register must be shifted to the left-hand side of the word. In another example, in which each vector element has a different position in a word of the vector register, as in the case where a plurality of vector data elements exist continuously within a word, the rearrangement must be made differently for each vector element. Therefore, a decision process for determining the rearrangement of vector data elements is needed, which requires a complex control circuit, and moreover it takes a longer time to transfer vector data from the vector register to the arithmetic unit.
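The left-hand shift mentioned in the example above can be illustrated with a minimal Python sketch. It assumes that an 8-byte register word is modeled as a 64-bit integer whose most significant 4 bytes form the left-hand half; the names `left_justify` and `half` are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # mask selecting one 4-byte half of an 8-byte word


def left_justify(word, half):
    """Return an 8-byte word with the effective 4-byte element in the left half.

    word: 64-bit integer modeling one vector register word (assumption).
    half: "left" or "right", the half that currently holds the element.
    """
    if half == "left":
        # Already in position; clear the unused right-hand 4 bytes.
        return word & (MASK32 << 32)
    if half == "right":
        # Shift the element from the right-hand to the left-hand 4 bytes.
        return (word & MASK32) << 32
    raise ValueError("half must be 'left' or 'right'")


shifted = left_justify(0x00000000DEADBEEF, "right")
kept = left_justify(0x12345678FFFFFFFF, "left")
```

The point of the sketch is that the shift itself is trivial; what burdens the hardware is that the value of `half` has to be known, per word, every time a word leaves the vector register.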
In vector computation, the read-out of vector data from the main storage to the vector register is followed by frequent transfers of vector data and results between the vector register and the arithmetic unit. Because the rearrangement of a vector data element, and the determination of that rearrangement, must be carried out each time a vector data element is transferred from the vector register to the arithmetic unit, the result is disadvantageously a lengthy vector computation time and a complex control circuit for the determination.
The above problems will be detailed further in the following. FIG. 1 illustrates the process of vector computation by a vector processor. The arrangement shown in the figure includes a main storage 100, a vector register group 200, vector registers 200-1 through 200-n each made up of l' elements, and an arithmetic unit 300. In operation, l-element vector data A(1-l) and B(1-l) in the main storage 100 are read out to the vector registers 200-1 and 200-2 so that a necessary computation is performed on these data elements by the arithmetic unit 300, and the resultant vector data C(1-l) is fed through the vector register 200-3 and stored in the main storage 100. It would be possible for the main storage 100 to have a fixed word length of 8 bytes for vector data elements so as to simplify the data structure. However, if a vector data element may have not only the 8-byte size but also, for example, the 4-byte size, larger-scale vector data can be handled by a main storage of the same capacity, and such capability is strongly desired. When vector data elements of different sizes (e.g., 4-byte data and 8-byte data) in the main storage 100 are read out into the vector registers for computation, the following problems arise, as will be described in connection with FIG. 2.
In FIGS. 2(a) to 2(f), reference number 100 denotes a main storage and 200-1 denotes a vector register. It is assumed that the elements of the vector register have a data size of 8 bytes, vector data stored in the main storage has a data size of 8 bytes, and an address of data in the main storage is given on a byte basis. It is also assumed that vector data is stored in the main storage with the following address format: the leading element (the first element) is addressed directly, while each following element is pointed to by an incremental address, given on a byte basis, relative to the preceding element. In FIG. 2, (a) and (b) show the operations of reading out 8-byte vector data stored in the main storage 100 into the vector register 200-1 and then inputting the data to the arithmetic unit 300. In case (a), vector data is stored continuously in the main storage 100 with an address increment of 8 for each element, while in case (b), vector data has an address increment of 16 and the elements are stored in every second location of the main storage. In both cases, vector data read out from the main storage to the vector register can be fed in order without any manipulation (e.g., a shift operation). Cases (c) and (d) are transfer operations for vector data with a data size of 4 bytes for each element. In case (c), the elements have an address increment of 4 and are stored continuously in the main storage, allowing the main storage to read out two elements as a pair of 8-byte data. In this method, the first and second elements of vector data may be read out into the left and right halves of the first element of the vector register as shown in the figure. In case (d), where the address increment is 20, odd-numbered elements are read out into the left halves of the elements of the vector register and even-numbered elements are read out into the right halves as effective 4-byte data, as shown in the figure.
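The placements described for cases (a) through (d) follow directly from the address increment, as the following Python sketch suggests. It is only an illustration under stated assumptions: the leading element is taken to sit at byte address 0 (a hypothetical, 8-byte aligned value), the lower-addressed 4 bytes of an 8-byte word are taken to be its left half, and the function name `placements` is not from the original description.

```python
WORD_BYTES = 8  # width of one main-storage word and one vector register element


def placements(base, increment, elem_bytes, count):
    """Return (element_number, word_address, half) for each vector element.

    half is "full" for 8-byte elements; for 4-byte elements it is "left" or
    "right", naming the 4 bytes of the 8-byte word that hold the element.
    Assumes base and increment keep every element aligned to its own size.
    """
    result = []
    for i in range(count):
        addr = base + i * increment        # byte address of element i + 1
        offset = addr % WORD_BYTES         # position inside the 8-byte word
        if elem_bytes == WORD_BYTES:
            half = "full"
        else:
            half = "left" if offset == 0 else "right"
        result.append((i + 1, addr - offset, half))
    return result


case_a = placements(base=0, increment=8, elem_bytes=8, count=3)
case_b = placements(base=0, increment=16, elem_bytes=8, count=3)
case_c = placements(base=0, increment=4, elem_bytes=4, count=4)
case_d = placements(base=0, increment=20, elem_bytes=4, count=4)
```

Under these assumptions, case (c) puts elements 1 and 2 in the left and right halves of the same word, while case (d) puts each element in a different word, alternating between the left half (odd-numbered elements) and the right half (even-numbered elements).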
In cases (c) and (d), when vector data read out into the vector register is to be input to the arithmetic unit left-justified and in ascending order of element number, the following problems arise:
The vector register read counter must be updated differently for 4-byte vector data elements stored at continuous addresses than for the other cases, which complicates the updating operation for the counter.
In the case of 4-byte data, the vector register controller needs information on whether the 8-byte data of one element of the vector register contains two effective 4-byte data or one and, in the latter case, whether the data is located in the right half or the left half. Moreover, a circuit for taking out 4-byte data from 8-byte data is needed, which makes the vector register controller complex. Thus, the complex vector register controller takes extra time to transfer data from the vector register to the arithmetic unit, and the register controller needs increased hardware.
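The two problems above can be made concrete with a small software model of the read-out path. This is a sketch only, not the controller of any actual machine: each register word is modeled as a pair of a 64-bit integer and a hypothetical validity tag ("both", "left" or "right"), and the name `read_elements` is an assumption introduced here.

```python
MASK32 = 0xFFFFFFFF  # mask selecting one 4-byte half of an 8-byte word


def read_elements(register):
    """Return the 4-byte elements of `register` in ascending element order.

    register: list of (word, valid) pairs, where word is a 64-bit integer and
    valid tells which halves hold effective data (hypothetical encoding).
    """
    out = []
    counter = 0          # vector register read counter (word index)
    second_half = False  # True after the left half of a two-element word is sent
    while counter < len(register):
        word, valid = register[counter]
        left, right = word >> 32, word & MASK32
        if valid == "both":
            if not second_half:
                out.append(left)
                second_half = True   # same word is read again: no counter update
            else:
                out.append(right)
                second_half = False
                counter += 1         # counter advances only every second element
        elif valid == "left":
            out.append(left)
            counter += 1
        elif valid == "right":
            out.append(right)
            counter += 1
        else:
            raise ValueError("valid must be 'both', 'left' or 'right'")
    return out


# Case (c): continuous 4-byte elements, two per word.
case_c_out = read_elements([((1 << 32) | 2, "both"), ((3 << 32) | 4, "both")])
# Case (d): one effective element per word, alternating halves.
case_d_out = read_elements([(1 << 32, "left"), (2, "right"), (3 << 32, "left")])
```

Even in this simplified model, the counter update rule depends on the validity tag, and a half-selection step sits in front of every transfer, which corresponds to the extra state and extraction circuitry the text attributes to the vector register controller.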
The foregoing cases of FIGS. 2(a) through 2(d) concern the transfer of vector data elements one by one to the arithmetic unit irrespective of the data size of the vector data. For a data size of 4 bytes, however, it is possible to transfer two elements as a pair to the arithmetic unit and carry out the computations for the two elements concurrently, as shown by (e) and (f). In these cases, in addition to the foregoing problems, two vector data read out of two adjacent elements of the vector register need to be merged in accordance with information on how the effective 4-byte data is stored, and complicated control is required for this purpose. It should be noted that in FIGS. 2(e) and 2(f) the unit 300 includes two arithmetic units.
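The merging needed for the two-unit cases can be sketched in Python as follows. The register layout used in the example is hypothetical (the first word holds only one effective element in its right half, and the following words hold two each), as are the validity tags and the name `split_for_two_units`; the sketch does not reproduce the exact arrangement of FIG. 2(e) or 2(f).

```python
MASK32 = 0xFFFFFFFF  # mask selecting one 4-byte half of an 8-byte word


def split_for_two_units(register):
    """Distribute 4-byte elements to two arithmetic units.

    register: list of (word, valid) pairs, valid in {"both", "left", "right"}
    (hypothetical encoding). Odd-numbered elements go to one unit and
    even-numbered elements to the other, as described for F(1)..F(5).
    """
    elements = []
    for word, valid in register:
        if valid not in ("both", "left", "right"):
            raise ValueError("valid must be 'both', 'left' or 'right'")
        if valid in ("both", "left"):
            elements.append(word >> 32)      # effective left-hand 4 bytes
        if valid in ("both", "right"):
            elements.append(word & MASK32)   # effective right-hand 4 bytes
    unit0 = elements[0::2]  # elements 1, 3, 5, ...
    unit1 = elements[1::2]  # elements 2, 4, ...
    return unit0, unit1


# Elements 1..5 stored so that pairs for the two units straddle adjacent words.
register = [(1, "right"), ((2 << 32) | 3, "both"), ((4 << 32) | 5, "both")]
unit0, unit1 = split_for_two_units(register)
```

In this layout the pair (1, 2) that should enter the two units together comes from two different register words, which is the kind of merge across adjacent elements that the text says requires complicated control.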
In order for vector data to be read out from the main storage and transferred in order, in a predetermined format, to the arithmetic unit, the above-mentioned complicated data manipulation is needed. If this data manipulation is to be done while the data is transferred from the vector register to the arithmetic unit, a complex control circuit must be provided for each of the vector registers. In addition, because of the complex vector register control circuit, reading out vector data from the vector register into the arithmetic unit takes a long time, making it difficult to speed up the operation.