The present invention relates to the field of memories in general and, specifically, to the field of vector registers for data processors.
Computers have long used both special and general purpose registers to store frequently accessed data. More recently, certain computers have added vector registers to store vector elements, which are the data operated on by vector instructions. Vector processing is especially well-suited to expedite repetitive operations performed on sequential data elements. Examples of vector registers are shown in Cray, Jr., U.S. Pat. No. 4,128,880; Chen et al., U.S. Pat. No. 4,636,942; and Chen et al., U.S. Pat. No. 4,661,900.
Vector processing is often performed in a pipelined fashion. The tasks of fetching the vector elements, performing an arithmetic operation on them, and storing the results are broken down into small fragments which are executed in parallel by dedicated hardware. Because vector operations are repetitive, such pipelining increases the speed of processing by a factor almost equal to the number of pipeline stages, provided that the vector registers can simultaneously supply enough vector elements for all the pipeline stages and also store the results. "Chaining" is a term used to refer to mechanisms that use results from certain operations as operands in later operations without added delay. A discussion of chaining, although in a slightly different context, appears in Chen et al., U.S. Pat. No. 4,661,900.
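The statement that pipelining speeds up processing by a factor almost equal to the number of pipeline stages can be illustrated with a minimal sketch. It assumes an idealized model that is not taken from the cited patents: every stage takes one cycle, a new vector element enters the pipeline each cycle, and the vector registers never stall the pipeline. The function name and parameters are hypothetical.

```python
def pipeline_speedup(num_stages: int, num_elements: int) -> float:
    """Idealized speedup of a pipeline over unpipelined execution.

    Assumptions (illustrative only): each stage takes one cycle, one
    element enters the pipeline per cycle, and operands are always
    available from the vector registers.
    """
    if num_stages < 1 or num_elements < 1:
        raise ValueError("num_stages and num_elements must be positive")
    # Without pipelining, every element passes through all stages in turn.
    unpipelined_cycles = num_stages * num_elements
    # With pipelining, the first result appears after num_stages cycles,
    # and one further result completes on each following cycle.
    pipelined_cycles = num_stages + num_elements - 1
    return unpipelined_cycles / pipelined_cycles


if __name__ == "__main__":
    # Hypothetical four-stage pipeline with increasing vector lengths.
    for length in (1, 8, 64, 1024):
        print(length, round(pipeline_speedup(4, length), 2))
```

Under these assumptions the speedup is 1 for a single element and approaches, but never reaches, the number of stages as the vector grows longer, which is why the benefit depends on the registers keeping every stage supplied.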
Even with the use of chaining and pipeline processing, however, conventional approaches to vector storage suffer from several disadvantages. For example, if vectors are stored in main memory, large memory delays result and a high-bandwidth communication path must be established between memory and the vector processors. It is therefore attractive to store vectors in special vector registers.
Designing special vector registers involves several tradeoffs. One is hardware cost and another is performance. To obtain both speed and flexibility, vector registers are usually built using flip-flops or latches. Such registers, however, are both bulky and expensive. Conventional RAM chips avoid the size and cost problems but, because of their monolithic organization, do not satisfy the access requirements.
Vector registers must supply operands to multiple processing streams and accept the results. To maximize performance, the vector register file must support multiple simultaneous accesses during each processor cycle. This requirement is very difficult to achieve using conventional RAM components.
Furthermore, vector registers need to be physically connected to the processing elements to exchange operands. Such connections often create problems of excessive pin requirements, excessive signal load, and conflicts from wire sharing.
Finally, the main purpose of vector processing is to achieve high processing performance. Thus, it is necessary to cycle the storage elements in very short time periods. With traditional RAM chip implementations, short cycle times are difficult to achieve because each cycle must include address distribution, RAM access time, and data distribution. The attainment of high speeds is further complicated by the need to allow for clock skew.
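The cycle-time constraint described above can be sketched as a simple budget. The sketch assumes that address distribution, RAM access, and data distribution occur strictly one after another within a cycle and that clock skew is added as a margin. The function name, its parameters, and all timing values are hypothetical and are not taken from any actual device.

```python
def min_cycle_time_ns(address_distribution_ns: float,
                      ram_access_ns: float,
                      data_distribution_ns: float,
                      clock_skew_ns: float) -> float:
    """Lower bound on the cycle time of a RAM-based register file.

    Assumes the three phases are sequential within one cycle and that
    the clock skew allowance is added on top of them.
    """
    return (address_distribution_ns + ram_access_ns
            + data_distribution_ns + clock_skew_ns)


if __name__ == "__main__":
    # Hypothetical figures chosen only to make the arithmetic visible.
    budget = min_cycle_time_ns(address_distribution_ns=2.0,
                               ram_access_ns=10.0,
                               data_distribution_ns=2.0,
                               clock_skew_ns=1.0)
    print(f"minimum cycle time: {budget} ns")
```

Even in this simplified model, the RAM access time is only one of four terms, so a faster memory cell alone does not shorten the cycle unless the distribution and skew terms shrink as well.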
Therefore, it is an object of the present invention to provide vector registers which can be used with great efficiency to speed up vector processing, for example by using short cycle times.
It is also an object of the present invention to configure the vector registers using nonstandard RAM cell technology, to satisfy the access requirements of high speed vector registers.
Another object of the invention is to minimize the physical interconnection problems between vector registers and the processing elements.
Still another object of the invention is to improve the capability of the vector registers to use overlapping techniques and pipeline processing, by providing simultaneous multiple accesses to vector registers.
A further object of the invention is to provide a vector register which minimizes conflicts during consecutive register accesses to any individual register.
Additional objects and advantages of the present invention will be set forth in part in the description which follows and in part will be obvious from that description or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by the methods and apparatus particularly pointed out in the appended claims.