The invention disclosed broadly relates to data processing systems and more particularly relates to processing elements within a data processing system. Even more particularly, the invention relates to the instruction execution mechanisms of such processors and the data handling associated with these mechanisms.
Present-day computers are machines that accept instructions and data and produce results from the data according to their instructions. They are made up of elements that communicate with the external world, store data, change the forms of the data, and convert their instructions (which are moved around within the computer as data) into the signals that control the computer's activity. This invention is concerned with those components of the computer, called processing units or elements, that actually carry out the computations. A processing element is typically constructed of one or more circuits that convert the instructions into control signals and perform certain computations, one or more memory circuits for temporarily storing the data to be operated on and the results, and the communications links, or buses, capable of communicating data, instructions, and results between the processing and memory circuits. The present invention consists of an arrangement for connecting the various processing and memory circuits together and a set of instructions for the resulting processor.
When configuring a processing element, the most important attributes to consider are its processing speed, its versatility, and its overall cost. The present invention is partitioned so that very fast technologies can be used to implement its critical parts and so that the overall processing element can be customized for the application. Although speed was the primary design criterion, the invention achieves a good speed-to-cost ratio by using relatively simple circuits and relatively simple connections between them. It is this simplicity that allows the circuits to be implemented in low-density but very fast technologies, such as those based on gallium arsenide. The architecture contains only two primary internal buses so that, even though the processing element may include several integrated circuits, power dissipation is kept at a reasonable level. Each integrated circuit would need to drive only one of these buses, and the width and speed of these buses could be set according to the design requirements.
The invention comprises a control unit and a very-high-speed register set/data RAM combination that constitutes its central memory. The control unit receives its instructions from a code stream and uses them to transfer data from the central memory to a parallel set of processing circuits over an output bus. The processing circuits also receive from the control unit the portion of each instruction that indicates which processing circuit is to perform the computation and the precise form of that computation. The selected processing circuit acts as a simple transponder: it inputs the data, performs its computation, and returns its result(s) to the central memory via an input bus. Instruction execution is coordinated with the arrival of results using a destination validation scheme similar to the scheme often referred to as scoreboarding (see Richard Y. Kain, Computer Architecture: Hardware and Software, Vol. 2, Addison-Wesley, 1989, pp. 236-237). Scoreboarding attaches a valid bit to each data location in the local store of the processor. The present invention, by contrast, retains the destination addresses of data in the local store of the control unit until the corresponding results are returned by the processing circuits. It does use valid bits on these destination addresses to denote when a destination address is in use by a processing circuit, but it does not associate a valid bit with each local store address.
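The destination validation scheme can be illustrated with a small behavioral model. The sketch below is hypothetical: the class and method names, the four-entry slot table, and the stall policy (hold an instruction while any of its sources or its destination still awaits a result) are assumptions made for illustration, not details taken from this disclosure. What it is meant to show is the contrast with scoreboarding: valid bits live only on the retained destination addresses, not on every central-memory location.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DestinationSlot:
    """One retained destination address plus its valid ('in use') bit."""
    address: int = 0
    valid: bool = False  # True while a processing circuit owes a result here


class ControlUnit:
    """Hypothetical model of destination validation.

    Only the destinations of in-flight instructions carry a valid bit;
    ordinary central-memory words carry none, unlike classic scoreboarding.
    """

    def __init__(self, memory_size: int, num_slots: int = 4) -> None:
        self.memory: List[int] = [0] * memory_size
        self.slots = [DestinationSlot() for _ in range(num_slots)]

    def _pending(self, address: int) -> bool:
        # A location is busy only if some valid slot holds its address.
        return any(s.valid and s.address == address for s in self.slots)

    def issue(self, sources: Sequence[int], dest: int) -> Optional[Tuple[int, List[int]]]:
        """Dispatch an instruction, or return None to signal a stall.

        Assumed policy: stall if any source or the destination is pending,
        or if every destination slot is already in use.
        """
        if any(self._pending(a) for a in (*sources, dest)):
            return None
        for tag, slot in enumerate(self.slots):
            if not slot.valid:
                slot.address, slot.valid = dest, True
                # Operands would travel to the processing circuit on the output bus.
                return tag, [self.memory[a] for a in sources]
        return None

    def retire(self, tag: int, result: int) -> None:
        """A result arrives on the input bus: store it and free the slot."""
        slot = self.slots[tag]
        self.memory[slot.address] = result
        slot.valid = False


cu = ControlUnit(memory_size=8)
cu.memory[0], cu.memory[1] = 3, 4
tag, (a, b) = cu.issue(sources=(0, 1), dest=2)  # dispatch r2 <- r0 + r1
print(cu.issue(sources=(2,), dest=3))            # None: r2 still pending
cu.retire(tag, a + b)                            # processing circuit returns 7
print(cu.memory[2])                              # 7
```

Because the slot table in this sketch grows with the number of results in flight rather than with the size of the central memory, it stays small even when the register set/data RAM is large.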
In addition, most processing elements must use an explicit instruction to load a datum into, or store a datum from, their internal memories. The present invention allows sequential data to be input to and output from its central memory automatically. This is done by using partitions and, possibly, first-in/first-out (FIFO) buffers within the central memory.
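The automatic inputting and outputting of sequential data can be sketched in the same way. In this hypothetical model, two central-memory addresses (the values 0xF0 and 0xF1 are assumptions) are backed by FIFO buffers, so that reading one address consumes the next input datum and writing the other emits a result. A loop over an array then needs no explicit load or store instructions. How the partitions are actually mapped and refilled is not specified here and is assumed.

```python
from collections import deque
from typing import Iterable, List


class CentralMemory:
    """Hypothetical sketch of a register set with FIFO-backed partitions.

    Two addresses (assumed values) are mapped to streams instead of storage,
    so ordinary operand reads and result writes move sequential data without
    explicit load or store instructions.
    """

    IN_PORT = 0xF0   # assumed: reading this address pops the next input datum
    OUT_PORT = 0xF1  # assumed: writing this address appends to the output stream

    def __init__(self, size: int = 256, stream: Iterable[int] = ()) -> None:
        self.regs: List[int] = [0] * size
        # In a real system something external (e.g. an I/O processor) would
        # keep the input FIFO filled and the output FIFO drained.
        self.in_fifo = deque(stream)
        self.out_fifo: deque = deque()

    def read(self, addr: int) -> int:
        if addr == self.IN_PORT:
            return self.in_fifo.popleft()
        return self.regs[addr]

    def write(self, addr: int, value: int) -> None:
        if addr == self.OUT_PORT:
            self.out_fifo.append(value)
        else:
            self.regs[addr] = value


# Scale a three-element array and keep a running sum in register 1,
# with no explicit load or store instructions in the loop body.
mem = CentralMemory(stream=[1, 2, 3])
for _ in range(3):
    x = mem.read(CentralMemory.IN_PORT)
    mem.write(CentralMemory.OUT_PORT, 2 * x)
    mem.write(1, mem.read(1) + x)
print(list(mem.out_fifo), mem.regs[1])  # [2, 4, 6] 6
```

Mapping streams onto ordinary addresses lets the same operand-read and result-write paths serve both register traffic and array traffic. In this sketch a read from an empty input FIFO simply raises an IndexError, where hardware would presumably stall until data arrives.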
The overall architecture of the present invention is similar to that of the invention described in Trubisky et al., U.S. Pat. No. 4,521,851, issued Jun. 4, 1985 (the '851 patent), but it also differs from the '851 patent in several respects. The '851 patent also uses separate code and data streams, a central memory, multiple processors, and an output bus for sending source operands to the processors and an input bus for returning results. However, the '851 patent uses a conventional data cache for its central memory, includes separate first-in/first-out buffers for temporarily storing the results until they can be returned to the data cache, coordinates the return of the results with the instruction's execution by including a second instruction execution queue that is associated with the result storage circuitry, and does not provide for automatic prefetching and storing of array data. Also, the processors in the '851 patent serve specific purposes, some of which are related to address calculations. Address calculations are not applicable to the present architecture because it uses only the immediate, direct, and register indirect addressing modes. The equivalent of base, index, and virtual addressing is left to external circuitry such as that described in the invention given in patent application (B) of the foregoing list of copending patent applications. The processors indicated in the present invention may serve extremely varied purposes and may be designed to fit a specific application. Communication with the external data memory hierarchy is handled by one of the processors, the I/O processor, instead of a data cache as in the '851 patent. A final observation is that the architecture of the '851 patent must be synchronous.
For the invention herein disclosed, the simplicity of the control unit/central memory design and the separation of this circuitry from the processors make it possible to use either a synchronous or an asynchronous design for any of the components: the control unit, the central memory, or any or all of the processors. The use of asynchronous circuitry is highly desirable when designing very-high-speed circuitry.