The present invention relates, in general, to the field of computer architectures incorporating multiple processing elements. More particularly, the present invention relates to a multiprocessor computer architecture incorporating a number of memory algorithmic processors (“MAP”) in the memory subsystem or closely coupled to the processing elements to significantly enhance overall system processing speed.
As commodity microprocessors increase in capability, there is an ever-increasing push to use them in high performance multiprocessor systems capable of performing trillions of calculations per second at significantly lower cost than systems built from custom counterparts. However, many of these processors lack specific features common to systems in this category that employ much more expensive custom processors. One such feature is the ability to perform vector processing.
In this form of processing, a data register or buffer is filled with operands forming what is called a vector. All of these operands are then passed one after the other through a functional unit capable of performing operations such as multiplication. This functional unit will output one result every clock cycle. This type of processing requires that the same operation be performed on all operands in the input vector; it is nevertheless widely used because it exhibits much higher processing rates than the traditional scalar method of computation used in most microprocessors.
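By way of illustration only, the vector processing model described above may be sketched in software as follows. The function name `vector_apply` and the sample operands are hypothetical and do not appear in the disclosure; the sketch models only the requirement that one operation be applied to every operand pair in order, not the one-result-per-clock timing of a hardware functional unit.

```python
from typing import Callable, List


def vector_apply(op: Callable[[float, float], float],
                 a: List[float], b: List[float]) -> List[float]:
    """Pass two equal-length operand vectors through one 'functional unit'."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    # The same operation is applied to every operand pair, one after the other.
    # In hardware, one result would leave the functional unit per clock cycle.
    return [op(x, y) for x, y in zip(a, b)]


def multiply(x: float, y: float) -> float:
    """Stand-in for a multiplier functional unit."""
    return x * y


products = vector_apply(multiply, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(products)  # [4.0, 10.0, 18.0]
```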
Nevertheless, neither vector nor scalar processors perform very well when required to perform bit manipulation as is required, for example, in matrix arithmetic. One such function is a bit matrix multiply operation in which two matrices of different sizes are multiplied together to form a third matrix. Another shortfall of both vector and scalar processing is their inability to quickly perform pattern searches such as those used in a variety of pattern recognition programs.
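For illustration, one common form of bit matrix multiply may be sketched as follows. The disclosure does not define the arithmetic used; this sketch assumes that bits are combined with AND and accumulated with XOR (arithmetic over GF(2)), and the function name `bit_matrix_multiply` and the sample matrices are hypothetical.

```python
from typing import List

BitMatrix = List[List[int]]  # each entry is 0 or 1


def bit_matrix_multiply(a: BitMatrix, b: BitMatrix) -> BitMatrix:
    """Multiply an (r x n) bit matrix by an (n x c) bit matrix.

    Assumption: AND acts as multiplication and XOR as addition.
    """
    inner = len(b)
    cols = len(b[0]) if b else 0
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    if any(len(row) != cols for row in b):
        raise ValueError("rows of b must all have the same length")
    result = []
    for row in a:
        out_row = []
        for j in range(cols):
            bit = 0
            for k in range(inner):
                bit ^= row[k] & b[k][j]  # accumulate the AND terms with XOR
            out_row.append(bit)
        result.append(out_row)
    return result


a = [[1, 0, 1],
     [0, 1, 1]]          # 2 x 3
b = [[1, 1],
     [0, 1],
     [1, 0]]             # 3 x 2
print(bit_matrix_multiply(a, b))  # [[0, 1], [1, 1]]
```

On a scalar processor each output bit costs several instructions, which is the inefficiency the passage above refers to.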
A solution to all of these deficiencies can be found by building a high performance computer which contains a number of commodity microprocessors to reduce the system cost together with MAP elements developed by SRC Computers, Inc., assignee of the present invention, to provide the deficient functions at very low cost. The MAP architecture and specific features thereof are disclosed in the aforementioned patent applications, the disclosures of which are herein specifically incorporated by this reference.
The enhanced memory algorithmic processor architecture for multiprocessor computer systems of the present invention is an assembly that not only contains, for example, field programmable gate arrays functioning as the memory algorithmic processors, but also operand storage, intelligent address generation, on board function libraries, result storage and multiple I/O ports. Like the original MAP architecture disclosed in the aforementioned patent applications, this architecture differs from other so called “reconfigurable” computers in many ways.
First, its function is intended to be altered every few seconds, distinguishing it from other systems with very long reconfiguration times that are primarily intended for a single function. Secondly, it contains dedicated hardware to provide for large data set operand storage (on the order of 16 Mbytes or more), allowing the MAP element to function autonomously from its host system once operands are loaded. Thirdly, it contains dedicated data ports to allow, but not require, multiple MAP elements to be chained together to perform very large operations. As currently contemplated, it is intended that typically 32 to 512 or more MAP sections can be connected in a single system.
Further, the MAP element is intended to augment, not replace, the high performance microprocessors in the system. As such, in a particular embodiment of the present invention, it may be connected through the memory subsystem of the computer system resulting in it being very tightly coupled to the system as well as being globally accessible from any processor in the system. This technique was developed by SRC Computers, Inc. and distinguishes the MAP architecture from all other so called “attached array processor” systems that may exist today. While such “attached array processor” systems may bear some superficial similarities to MAP based systems, they are entirely separate units connected to the host computer through relatively slow interconnects resulting in lost system performance.
The MAP architecture developed by SRC Computers, Inc. as defined in the aforementioned patent applications overcomes many of the limitations of such “attached array processor” systems. However, because of particular limitations in the exemplary architecture disclosed therein concerning the attachment of input storage and chaining capabilities, certain vector processing functions, unlike relatively smaller algorithms, may not have been optimally implemented.
Through the addition of these and other features to the MAP architecture, a much more powerful multiprocessor computer system is provided. Moreover, while, as originally disclosed, another feature of the MAP architecture was its ability to perform direct memory access (“DMA”) into the common memory of the system, enhancements disclosed herein have expanded the potential utilization of this feature.
Particularly disclosed herein is a Memory Algorithmic Processor (“MAP”) assembly (or element) comprising reconfigurable field programmable gate array (“FPGA”) circuitry, an intelligent address generator, input data buffers, output first-in, first-out (“FIFO”) devices and ports to allow connection to a memory array and chaining of multiple MAP assemblies for the purpose of augmenting the capability of a microprocessor in a high performance computer.
Further disclosed herein is a MAP assembly comprising an intelligent address generator capable of supporting a data gather function from its associated input buffer or common memory. The MAP assembly may also comprise circuitry to allow the reconfigurable elements to reprogram their on-board configuration read only memory (“ROM”) devices to cause alterations in the functionality of the reconfigurable circuitry.
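As an illustrative sketch only, a data gather function of the general kind referred to above may be modeled as an address generator that produces a sequence of buffer addresses, followed by a read of the operands at those addresses. The strided addressing pattern and the names `strided_addresses` and `gather` are assumptions made for this example; the disclosure does not specify which addressing patterns the address generator supports.

```python
from typing import List, Sequence


def strided_addresses(start: int, stride: int, count: int) -> List[int]:
    """Hypothetical address generator: 'count' addresses spaced 'stride' apart."""
    return [start + k * stride for k in range(count)]


def gather(buffer: Sequence[int], addresses: Sequence[int]) -> List[int]:
    """Collect non-contiguous operands from the buffer into a dense vector."""
    for addr in addresses:
        if not 0 <= addr < len(buffer):
            raise IndexError(f"address {addr} is outside the buffer")
    return [buffer[addr] for addr in addresses]


input_buffer = [10, 11, 12, 13, 14, 15, 16, 17]
addresses = strided_addresses(start=1, stride=3, count=3)  # [1, 4, 7]
print(gather(input_buffer, addresses))                     # [11, 14, 17]
```

Performing the gather next to the buffer means the dense operand vector can be formed without a host processor issuing each individual read.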
Still further disclosed herein is a MAP assembly comprising dedicated input and output ports for the purpose of allowing an infinite number of MAP elements to be chained together to accomplish a single function. The MAP assembly may also incorporate provisions to create a single MAP chain or multiple independent MAP chains automatically based on the contents of the reconfigurable circuitry.
Further disclosed herein is a MAP assembly comprising output FIFOs for the purpose of holding output data and allowing the MAP element not to stall in the event the processor reading these results is delayed due to outside factors such as workload or crossbar switch conflicts. The MAP assembly may further comprise relatively large dedicated input storage buffers to allow for optimization of operand transfer as well as to allow multiple accesses to an operand without requiring external processor intervention.
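The effect of an output FIFO on stalling may be illustrated with the following simplified cycle-by-cycle model. It is a sketch under stated assumptions, not a description of the disclosed hardware: the producer is assumed to emit one result per cycle, the reader is assumed to drain at most one result per cycle except during a given set of busy cycles, and the name `simulate_output_fifo` and all parameter values are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple


def simulate_output_fifo(num_results: int, fifo_depth: int,
                         reader_busy_cycles: Set[int]) -> Tuple[int, List[int]]:
    """Return (producer stall cycles, results in the order the reader saw them)."""
    if fifo_depth < 1:
        raise ValueError("fifo_depth must be at least 1")
    fifo: deque = deque()
    produced = 0
    consumed: List[int] = []
    stalls = 0
    cycle = 0
    while len(consumed) < num_results:
        # Reader side: drain one result unless delayed this cycle.
        if fifo and cycle not in reader_busy_cycles:
            consumed.append(fifo.popleft())
        # Producer side: emit one result if the FIFO has room, else stall.
        if produced < num_results:
            if len(fifo) < fifo_depth:
                fifo.append(produced)
                produced += 1
            else:
                stalls += 1
        cycle += 1
    return stalls, consumed


busy = {1, 2, 3}  # cycles in which the reading processor is delayed
print(simulate_output_fifo(6, fifo_depth=1, reader_busy_cycles=busy)[0])  # 3
print(simulate_output_fifo(6, fifo_depth=4, reader_busy_cycles=busy)[0])  # 0
```

In this model a FIFO deep enough to cover the reader's longest delay removes the producer stalls entirely, while results still arrive in order.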
Still further disclosed herein is a MAP assembly comprising a dedicated port for connection to an input buffer so that the MAP element can simultaneously receive operands via the chained input (chain) port and the input buffer. This allows the MAP element to perform mathematical processing at the maximum possible rate while also allowing the MAP element to accept operands via the chain port while accessing reference data in the input buffer (such as reciprocal look up tables) to allow the MAP element to perform operations such as division at the fastest possible rate.
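As a hedged illustration of the reciprocal look up table example, the following sketch holds a table of reciprocals as reference data (standing in for the input buffer) and streams numerator and divisor operands (standing in for data arriving on the chain port), replacing each division with a table read and a multiplication. The integer-indexed table layout and the names `build_reciprocal_table` and `divide_stream` are assumptions for this example only; the disclosure does not specify how the table is organized.

```python
from typing import List, Sequence


def build_reciprocal_table(max_divisor: int) -> List[float]:
    """Reference data: table[d] holds 1/d for d in 1..max_divisor."""
    # Index 0 is an unused placeholder so that the divisor is the index.
    return [0.0] + [1.0 / d for d in range(1, max_divisor + 1)]


def divide_stream(numerators: Sequence[float], divisors: Sequence[int],
                  table: Sequence[float]) -> List[float]:
    """Divide each numerator by its divisor using one lookup and one multiply."""
    if len(numerators) != len(divisors):
        raise ValueError("operand streams must have the same length")
    results = []
    for n, d in zip(numerators, divisors):
        if not 1 <= d < len(table):
            raise ValueError(f"divisor {d} is not covered by the table")
        results.append(n * table[d])  # multiply by the stored reciprocal
    return results


table = build_reciprocal_table(16)
quotients = divide_stream([10.0, 10.0, 10.0], [2, 4, 8], table)
print(quotients)  # [5.0, 2.5, 1.25]
```

Because the table and the operand stream are read through separate paths in this model, neither access needs to wait for the other, which is the benefit the passage above attributes to the dedicated input buffer port.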
Also further disclosed herein is a MAP assembly which may comprise connections to the memory subsystem of a high performance computer for the purpose of providing global access to it from all processors in a multiprocessor high performance computer system. The MAP assembly incorporates the capability to update multiple on board function ROMs under program control while in the system and may also include connections to the memory subsystem of a high performance computer utilizing DMA to accept commands from a microprocessor.