The present invention is directed to a multiprocessor data processing system. More particularly, the present invention provides a new and improved digital computer system generally comprising a plurality of identical processors each of which is connected to a common data bus and a specific tap of a tapped delay line instruction bus. The novel arrangement for a multiprocessor data processing system disclosed herein is operable to achieve highly overlapped operation of the plurality of identical processors whereby the system may be arranged to include a large number (e.g., hundreds to thousands) of processors with no interprocessor competition for the system resources. The present invention has significant practical applications in areas of data processing wherein an identical series of processing steps must be performed on a very large amount of input data.
Throughout the entire development of electronic computer systems, those skilled in the art have continuously sought to develop computer machines having high data throughput rates to achieve a throughput speed-up in applications wherein a fixed set of instructions is applied to a large data base, in a repetitious fashion. An early prior art design is embodied in the so-called "Von Neumann" computer. A traditional Von Neumann computer utilizes a single, central processing unit (CPU) wherein both the program instructions and data flow to and from the single CPU from a single memory device. A problem associated with such Von Neumann computers is that the ultimate throughput rate achievable in such systems is limited by the memory bandwidth of the single memory device. Even the utilization of expensive, state of the art, high speed memory devices achieves a relatively small improvement in the throughput rate in that such high speed implementation technology generally involves a considerable and disproportionate increase in the overall cost of the system. The above-discussed practical limitations imposed by implementation technology and the cost thereof have resulted in commercially feasible Von Neumann computer systems capable of performing only a few Millions of Instructions Per Second (MIPS), at best.
Significant advances in the data processing art have been achieved pursuant to many prior art proposals including the separation of the instruction stream and the data stream into two physically distinct buses as well as the utilization of a large plurality of storage registers in the CPU to receive and temporarily store information to be processed by the CPU. These proposals, when coupled with advances made in solid state technology, i.e., low cost, high density integrated circuits, led to the multicomputer concept to achieve high data throughput rates. In a multicomputer system, a plurality of separate and independent processing elements are arranged whereby the processing power of the system is distributed across the several independent processing elements. In this manner, a given problem is divided among the several processing elements to achieve a throughput rate speed-up that is a function of the composite throughput rates of the several elements. Thus, the more processors employed, the greater the speed-up in the throughput rate of the system. In an ideal system, the throughput rate speed-up will be a near linear speed-up with increasing numbers of processors.
In well known prior art multicomputer systems (e.g., ILLIAC-IV, PEPE and Holland Machines), the several processing elements are arranged to be operable simultaneously. As implemented in commercial embodiments, these prior systems have encountered practical limitations in their throughput rates due to interprocessor and processor-to-memory conflicts. Accordingly, the number of processors which may be utilized at any one time to apply a series of instructions to a large data base is limited by the capabilities of the processor interconnect systems used to control and coordinate processing operations. While sophisticated and complex processor interconnect systems have been developed to minimize interprocessor and processor-to-memory conflicts, the cost of these advanced interconnect systems is considerable, particularly when viewed in relation to the number of additional processing elements addable to the system by virtue of the implementation of such advanced interconnect systems.
It is a primary objective of the present invention to provide a new and improved multiprocessor system including features designed to maximize the throughput rate without encountering the excessive costs and other practical limitations associated with the prior art proposals. Generally, the present invention provides a multicomputer architecture which permits substantially independent operation of each of the processing elements while accommodating the flow of instructions and data through the system without any interprocessor conflicts. This is achieved pursuant to a significant feature of the invention whereby each of the processing elements comprises a microcomputer arranged to be connected to a specific tap of a tapped delay line. The tapped delay line includes an input instruction bus interconnecting the tapped delay line with an instruction memory. The specific set of instructions to be applied by each of the microcomputers to the data is stored in the instruction memory.
As will be described in greater detail below, the instructions contained in the stored set of instructions are applied in a timed sequence, one at a time, from the instruction memory to the tapped delay line. The several taps of the tapped delay line are in a time skewed relation to one another such that each instruction applied to the tapped delay line will appear on one tap at a time and progress from tap to tap under the control of a clock associated with the tapped delay line. The same clock or a second synchronized time control is used to control the timed sequential application of the instructions from the instruction memory to the tapped delay line. Thus, when instruction one is applied to the tapped delay line, it will appear on the first tap interconnecting the first microcomputer to the tapped delay line. When the clock applies its next control signal, instruction two will be applied to the tapped delay line and appear at the first tap while instruction one is simultaneously transmitted to the second tap of the tapped delay line. The second tap is arranged to interconnect the second microcomputer to the tapped delay line. In this manner, the entire set of instructions stored in the instruction memory is sequentially applied to all the microcomputers of the system on a time-skewed basis. When the nth instruction is applied to the tapped delay line, the first instruction will have arrived at the nth tap. Accordingly, the entire set of common instructions to be applied by each of the microcomputers to its particular data parcel is effectively and orderly transmitted from the instruction memory to each and every one of the plurality of microcomputers without any conflict between the individual processors or between any of the processors and the system resources.
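The time-skewed progression of instructions from tap to tap may be illustrated by a short software model. The following is a minimal sketch only, written in Python for illustration; the function name `simulate_delay_line` and the representation of the delay line as a list are assumptions made for this example and are not part of the disclosed hardware. One instruction is applied to the first tap per clock tick, and every instruction already on the line advances by one tap.

```python
from typing import List, Optional


def simulate_delay_line(
    instructions: List[str], num_taps: int
) -> List[List[Optional[str]]]:
    """Model a tapped delay line as a shift register (illustrative only).

    Returns one snapshot of the taps per clock tick. Entry k of a snapshot
    is the instruction currently visible at tap k, or None if that tap is
    idle on that tick.
    """
    taps: List[Optional[str]] = [None] * num_taps
    history: List[List[Optional[str]]] = []
    # The last instruction needs num_taps - 1 extra ticks to reach the last tap.
    total_ticks = len(instructions) + num_taps - 1
    for tick in range(total_ticks):
        # The instruction memory applies at most one instruction per tick.
        incoming = instructions[tick] if tick < len(instructions) else None
        # Every instruction on the line advances one tap; the oldest drops off.
        taps = [incoming] + taps[:-1]
        history.append(list(taps))
    return history


if __name__ == "__main__":
    for tick, snapshot in enumerate(simulate_delay_line(["I1", "I2", "I3"], 3)):
        print(tick, snapshot)
```

In this model each tap carries at most one instruction on any tick, and a given instruction reaches tap k exactly k ticks after it was applied to the line, which corresponds to the skew described above.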
Pursuant to another feature of the invention, each of the microcomputers is also connected to a common data bus. The common data bus provides a means of communication between the several microcomputers and a common data memory. In accordance with the invention, the data to be processed is segmented into several data parcels with each parcel containing the entire number of data points to be processed by a particular microcomputer pursuant to the common set of instructions. The system of the invention includes appropriate control means to coordinate the operation of the instruction memory with the operation of the data memory whereby the data memory will present the parcel of data for each particular microcomputer to the common data bus when the instruction to take data from the data bus is being executed by the particular microcomputer. For example, if the first instruction concerns taking data from the data bus, the data parcel for the first microcomputer will be placed on the data bus when the first instruction is being executed by the first microcomputer and so on until the nth parcel of data is placed on the data bus when the first instruction is being executed by the nth microcomputer.
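The coordination between the instruction stream and the data memory may be sketched as a simple schedule. The following Python fragment is an illustrative assumption, not the disclosed control means: the name `parcel_schedule`, the zero-based numbering of ticks and microcomputers, and the premise that a single instruction in the set takes data from the bus are all hypothetical choices made for the example. Because microcomputer k sees each instruction k ticks after the first microcomputer, its parcel is due on the bus k ticks later as well.

```python
from typing import Dict


def parcel_schedule(load_step: int, num_processors: int) -> Dict[int, int]:
    """Map each clock tick to the microcomputer whose parcel is on the bus.

    load_step is the zero-based position, in the common instruction set, of
    the instruction that takes data from the data bus. Microcomputer k
    (zero-based) executes that instruction at tick load_step + k because of
    the skew between adjacent taps.
    """
    return {load_step + k: k for k in range(num_processors)}


if __name__ == "__main__":
    # With the load as the first instruction and four microcomputers, the
    # data memory would present parcels 0, 1, 2, 3 on ticks 0, 1, 2, 3.
    schedule = parcel_schedule(load_step=0, num_processors=4)
    for tick in sorted(schedule):
        print(f"tick {tick}: parcel for microcomputer {schedule[tick]}")
```

Under these assumptions every tick maps to at most one microcomputer, so the parcels follow one another on the common data bus without two microcomputers requesting the bus on the same tick.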
In the event the common set of instructions results in a parcel of output data from each of the microcomputers, the common data bus may also be used to transmit the output data parcels from each of the microcomputers to a system output device. The coordinated utilization of the common data bus to transmit output data parcels will be similar to the input data parcel approach described above.
Thus, the present invention provides a straightforward and highly effective architecture for a multiprocessor data processing system. The flows of data and instructions are controlled and coordinated to achieve a maximum throughput rate for the system with minimal interference between the several microprocessors. Indeed, the throughput rate speed-up factor of the present invention is substantially linear with the speed-up factor approximately equaling the ideal of the product of the number of processors in the system times the processing rate of each microprocessor (measured in Millions of Instructions Per Second). This is particularly true when the total number of data points is large, e.g., 1,000,000 data points. The system of the invention has significant utility in application environments wherein a large array of input data points must be processed pursuant to the common set of instructions to produce an array of output data points. Suitable applications for the present invention exist in the fastest growing areas of technology, such as graphics and image processing, quality control testing and computer-aided design systems. As will be discussed in the following detailed description of preferred embodiments of the invention, the basic principles of the present invention may be applied to achieve optimal throughput rates in highly economically feasible hardware and software systems. Such embodiments of the invention are compatible with existing computer technology and "real world" interfaces for convenient implementation to upgrade an overall system's effectiveness and speed of operation.
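The near-linear character of the speed-up may be seen from a rough timing model. The model below is a simplifying assumption introduced for illustration and is not taken from the disclosure: it supposes that each microcomputer needs `stream_length` clock ticks of instructions to process its parcel, that a single processor would need that many ticks for every parcel in turn, and that the only overhead of the skewed arrangement is the `num_processors - 1` ticks required for the instruction stream to reach the last tap. The function name `estimated_speedup` is likewise hypothetical.

```python
def estimated_speedup(num_processors: int, stream_length: int) -> float:
    """Estimate speed-up over one processor under a simple skew model.

    Assumptions (illustrative only): one instruction per tick, every parcel
    needs stream_length ticks, and the skewed system finishes when the last
    microcomputer, which starts num_processors - 1 ticks late, completes.
    """
    single_processor_ticks = num_processors * stream_length
    skewed_system_ticks = stream_length + num_processors - 1
    return single_processor_ticks / skewed_system_ticks


if __name__ == "__main__":
    # 1,000 microcomputers, each spending 100,000 ticks on its parcel.
    print(round(estimated_speedup(1000, 100_000), 1))
```

In this model the fill time of the delay line is fixed by the number of taps, so its share of the total run time shrinks as the parcels grow, and the estimate approaches the number of processors. This is consistent with the observation above that the speed-up is closest to the ideal when the total number of data points is large.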
Moreover, the present invention contemplates several configurations for the several processors as well as enhancement features for each of the processors to further improve the throughput rate speed-up capability of the system. The multiple processor configurations include either linear or parallel chain arrangements or a shared chain arrangement whereby two separate sets of instructions may be applied to data by "sharing" the several processors of one linear array between two instruction memory-tapped delay line arrangements. These various configurations are further enhanced, where appropriate, by data buffers and double buffered instruction interconnections between each of the microcomputers and the common data bus and tapped delay line. In this manner, each of the microcomputers may temporarily store data and/or several steps of instructions at one time to increase the flexibility of operation of each unit without interfering with the system's common resources.
Furthermore, paired microcomputers may be utilized for each of the data processors whereby each of the pair processes the same data parcel and the outputs of the pair are compared to verify accuracy. Such an arrangement lends itself to a fault tolerant embodiment of the invention wherein the detection of faulty operation in any of the processing means by a mismatch between the outputs of the paired microcomputers will activate a control signal to remove the faulty processor pair from the chain and transfer the data parcel earmarked for that particular processing unit to another properly functioning unit.
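The paired-processor check and the reassignment of a data parcel may be sketched in software as follows. This is a minimal illustration under stated assumptions, not the disclosed control circuitry: the function `process_with_pairs`, the modeling of each microcomputer as a Python callable, and the policy of offering a displaced parcel to the lowest-numbered remaining unit are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Processor = Callable[[Sequence[int]], int]


def process_with_pairs(
    pairs: List[Tuple[Processor, Processor]],
    parcels: List[Sequence[int]],
) -> Tuple[Dict[int, int], Set[int]]:
    """Run each parcel on a processor pair and compare the two outputs.

    A mismatch marks the pair as faulty and removes it from the chain; the
    parcel is then offered to another pair that is still active. Returns the
    verified result per parcel and the set of pairs still in the chain.
    """
    if len(pairs) != len(parcels):
        raise ValueError("expected one parcel per processor pair")
    active: Set[int] = set(range(len(pairs)))
    results: Dict[int, int] = {}
    for parcel_id, parcel in enumerate(parcels):
        # Try the earmarked pair first, then any other pair still active.
        candidates = [parcel_id] + sorted(active - {parcel_id})
        for unit in candidates:
            if unit not in active:
                continue
            first, second = pairs[unit]
            out_first, out_second = first(parcel), second(parcel)
            if out_first == out_second:
                results[parcel_id] = out_first
                break
            # Outputs disagree: take the faulty pair out of the chain.
            active.discard(unit)
        else:
            raise RuntimeError("no functioning processor pair remains")
    return results, active


if __name__ == "__main__":
    def faulty(parcel: Sequence[int]) -> int:
        return sum(parcel) + 1  # simulated hardware fault

    units = [(sum, sum), (sum, faulty), (sum, sum)]
    print(process_with_pairs(units, [[1, 2], [3, 4], [5, 6]]))
```

Here a comparison of the two outputs stands in for the control signal described above. In the sample run the second pair disagrees on its parcel, is dropped from the chain, and its parcel is processed by the first pair instead.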
To advantage, a physical embodiment of the present invention may be assembled from commercially available subcomponents in a manner whereby the system is dynamically reconfigurable and adaptive to accommodate changes in the number and configuration of microcomputers employed. Accordingly, the operation of the system may be conveniently modified to obtain optimum results for the number of data points and instructions involved in a particular practical application.
For a better understanding of the above and other features and advantages of the invention, reference should be made to the following detailed description of preferred embodiments of the invention and to the accompanying drawings.