Parallel processing systems have been utilized as an expedient approach for increasing processing speeds to create computer systems which can manage voluminous data and handle complex computational problems quickly and efficiently. A number of parallel or distributed processing systems are well known in the prior art.
A massively parallel processing system may include a relatively large number, often hundreds or even thousands, of separate, though relatively simple, microprocessor-based processing elements interconnected through a communications fabric. The fabric typically comprises a high-speed packet network in which each of the processing elements appears as a separate node on the network. Messages, in the form of packets, are routed over the network between these processing elements to enable communication between them. Each processing element typically includes a separate microprocessor and associated support circuitry, including but not limited to storage circuitry such as random access memory (RAM) and read only memory (ROM), and input/output circuitry, as well as a communication subsystem comprising a communications interface and associated hardware and software which enable the processing element to interface with the network. The communication fabric permits simultaneous or parallel execution of instructions by the processing elements or nodes.
In such a parallel processing system including a number of interconnected nodes, the number of possible combinations of internode communications (or communication patterns between the nodes) grows as 2^(n^2), where n represents the number of nodes capable of passing messages between each other in the system. Thus, a distributed computing system having as few as eight nodes may exhibit over 2^64 (or approximately 1.845 × 10^19) different potential communication patterns.
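The growth figure can be checked with a few lines of Python. The sketch below assumes one reading of the count: a communication pattern is an n-by-n matrix of zeros and ones in which each (sender, receiver) entry, including the diagonal, is independently present or absent, which gives 2^(n^2) matrices. The function name `pattern_count` is illustrative only.

```python
def pattern_count(n: int) -> int:
    """Count the n-by-n binary communication matrices: 2 ** (n * n).

    Assumption: every ordered (sender, receiver) pair, including a node
    paired with itself, is an independent yes/no choice.
    """
    return 2 ** (n * n)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"n={n}: {pattern_count(n)} patterns")
```

For n = 8 the function returns exactly 2^64, which matches the approximate figure quoted in the text.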
While it is not practical to implement a system which exercises every possible communication pattern in a parallel computing system, it is desirable to enable a system to quickly and efficiently exercise any communication pattern which is of interest at any time.
For example, and without loss of generality, consider the generation of test programs for testing the interconnected nodes in a typical parallel processing system. Current test programs implement fixed communication patterns, with the consequence that the communication subsystem of the parallel computing system is not well tested. A typical current-generation test program is written to test one particular communication pattern. The creation of such a test program is a labor-intensive process entailing the dedication of considerable programming resources. Thus, the generation of a number of such test programs, to test a variety of communication patterns which may be implemented in a parallel computing system, would consume a significant amount of programming resources. Consequently, given the current state of the testing art, the creation of test programs to achieve adequate test coverage of communication patterns in distributed computing systems is a cost-prohibitive endeavor.
The use of graphical analysis to define abstract binary relationships has long been understood. Parallel computing system topologies which achieve higher fault tolerance have been constructed utilizing graphical analysis techniques. For example, U.S. Pat. No. 5,280,607, entitled "Method and Apparatus For Tolerating Faults in Mesh Architectures", issued on Jan. 18, 1994 to Bruck et al., describes the construction of fault-tolerant mesh architectures to be used in parallel computing systems. The invention is implemented via the arrangement of a graphical construction wherein the nodes in a circulant graph represent identical components and the edges of the graph represent communication links between the nodes of a fault-tolerant mesh designed for a given target mesh. For a predetermined number of possible faults, the so-designed fault-tolerant mesh is guaranteed to contain, as a subgraph, the graph corresponding to the target mesh.
Graphical analysis has additionally been employed to implement improved message routing between processing nodes in a parallel computing system. U.S. Pat. No. 5,170,393 issued to Peterson et al., entitled "Adaptive Routing of Messages in Parallel and Distributed Processor Systems", describes a method and apparatus for reducing message latency between communicating nodes by implementing a heuristic-based adaptive routing algorithm which prunes failing communication path segments from consideration, thereby driving a node-to-node communication to success or failure in a minimal amount of time. The routing model described by Peterson et al. utilizes a partitioned interconnection graph of the system to define a specific set of nodes which must be visited while pruning the network for a successful path between the source and destination nodes. The pruning algorithm may be implemented on an adjacency matrix which includes a one at the intersections of nodes which are separated by one hop (or a single edge of the interconnection graph).
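The adjacency matrix mentioned above can be illustrated with a small sketch. It is not the routing algorithm of Peterson et al.; it only shows, under the assumption of bidirectional links, how a one marks each pair of nodes separated by a single hop and how a failing link might be removed from consideration. The names `adjacency_matrix` and `prune_link` are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Build an n-by-n matrix with a 1 wherever two nodes are one hop apart.

    Assumption: links are bidirectional, so the matrix is symmetric.
    """
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


def prune_link(matrix, u, v):
    """Remove a failing one-hop link from consideration (both directions)."""
    matrix[u][v] = 0
    matrix[v][u] = 0


# A four-node ring: 0-1, 1-2, 2-3, 3-0.
ring = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

In this ring, nodes 0 and 2 are two hops apart, so their entry stays zero; calling `prune_link(ring, 0, 1)` clears the entries for the link between nodes 0 and 1.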
In an article authored by Wang et al., entitled "Scheduling of Unstructured Communications on the Intel iPSC/860", appearing in Supercomputing 94, Proceedings, pp. 360 et seq., 1994, a communication matrix representing a predetermined communication pattern for a highly parallel computing system is decomposed via a number of algorithms into partial permutations. These partial permutations are used to schedule all-to-many personalized communications for the highly parallel computing system so as to avoid node and link contention.
The communication matrix is used, in the context of the Wang et al. article, as an input for the scheduling algorithms, and the article contemplates scheduling messages for a known communication pattern so as to avoid contention. Thus, Wang et al. describe the capture of a communication pattern for an existing program in incidence graph form in order to optimize the execution of that program.
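The idea of decomposing a communication matrix into partial permutations can be sketched as follows. This is a simple greedy heuristic written for illustration, not one of the algorithms of Wang et al. It assumes a 0/1 matrix in which entry (s, r) means node s sends to node r, and it only prevents node contention (each node sends at most once and receives at most once per phase); link contention is not modeled. The name `decompose` is hypothetical.

```python
def decompose(comm):
    """Greedily split a 0/1 communication matrix into partial permutations.

    Each returned phase is a list of (sender, receiver) pairs in which no
    sender and no receiver appears more than once.
    """
    n = len(comm)
    remaining = {(s, r) for s in range(n) for r in range(n) if comm[s][r]}
    phases = []
    while remaining:
        busy_senders, busy_receivers = set(), set()
        phase = []
        # Sorted order keeps the result deterministic.
        for s, r in sorted(remaining):
            if s not in busy_senders and r not in busy_receivers:
                phase.append((s, r))
                busy_senders.add(s)
                busy_receivers.add(r)
        remaining.difference_update(phase)
        phases.append(phase)
    return phases


if __name__ == "__main__":
    all_to_all = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    for i, phase in enumerate(decompose(all_to_all)):
        print(f"phase {i}: {phase}")
```

Every message in the matrix lands in exactly one phase, so executing the phases one after another delivers the whole pattern. A greedy pass does not guarantee the minimum number of phases.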
U.S. Pat. No. 5,313,645, issued to Rolfe, entitled "Method For Interconnecting And System Of Interconnected Processing Elements By Controlling Network Density", (commonly assigned to the present assignee and incorporated herein by reference) describes, inter alia, a method for interconnecting processing elements so as to balance the number of connections per element against the network diameter. The patent teaches the use of a network connection algorithm to accomplish the novel interconnection technique. Though directed toward defining physical node interconnections (i.e. via cable links or other means) in a massively parallel computer network, the algorithm may be adapted to the generation of logical communication paths for exercising a number of communication patterns within a parallel computing system. Thus, incidence graphs corresponding to a set of important communication patterns, which may include hypercubes and n-dimensional toruses of varying diameter containing interconnected arcs, may be generated via this algorithm embodied in a single computer program.
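To make the notion of generating such patterns in a single program concrete, the sketch below builds incidence matrices for a hypercube and for an n-dimensional torus. It does not reproduce the network connection algorithm of the Rolfe patent, whose details are not given here; it uses the textbook definitions of the two topologies, and the function names `hypercube` and `torus` are hypothetical.

```python
from itertools import product


def hypercube(dim):
    """Incidence matrix of a dim-dimensional hypercube (2**dim nodes).

    Two nodes are linked when their binary labels differ in exactly one bit.
    """
    n = 1 << dim
    matrix = [[0] * n for _ in range(n)]
    for node in range(n):
        for bit in range(dim):
            matrix[node][node ^ (1 << bit)] = 1
    return matrix


def torus(shape):
    """Incidence matrix of a torus whose side lengths are given by shape.

    Each node is linked to its neighbours one step away along every axis,
    with wrap-around at the edges.
    """
    coords = list(product(*(range(size) for size in shape)))
    index = {c: i for i, c in enumerate(coords)}
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for c in coords:
        for axis, size in enumerate(shape):
            for step in (-1, 1):
                neighbour = list(c)
                neighbour[axis] = (c[axis] + step) % size
                j = index[tuple(neighbour)]
                if j != index[c]:  # skip self-links on axes of length 1
                    matrix[index[c]][j] = 1
    return matrix


if __name__ == "__main__":
    print(hypercube(2))
    print(torus((3, 3)))
```

Varying `dim` or `shape` yields patterns of different size and diameter from the same program, which is the property the text relies on.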
Utilizing this automatically generated set of communication patterns for a parallel processing network as an input, it is possible to implement a system wherein these incidence graphs (or communication matrices) serve to direct the passage of messages between the nodes in a parallel computing system so as to implement any communication pattern which may be of particular interest for the parallel computing system. Moreover, in such a system, an incidence graph may be alternatively utilized to drive the execution of non-communication based operations at the processing nodes in the parallel computing system.
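A minimal sketch of a matrix-driven message pass is shown below. It assumes an in-memory simulation in which each node has a list acting as its inbox; a real system would replace the append with a send over the communication fabric. The function `run_pattern` and its `payload_for` callback are hypothetical names introduced for illustration.

```python
def run_pattern(matrix, payload_for):
    """Deliver one message for every 1 in the communication matrix.

    matrix[src][dst] == 1 means node src sends to node dst.
    payload_for(src, dst) supplies the message body (caller-defined).
    Returns one inbox per node, each a list of (sender, payload) pairs.
    """
    n = len(matrix)
    inboxes = [[] for _ in range(n)]
    for src in range(n):
        for dst in range(n):
            if matrix[src][dst]:
                inboxes[dst].append((src, payload_for(src, dst)))
    return inboxes


if __name__ == "__main__":
    # Three nodes passing a message around a one-way ring: 0 -> 1 -> 2 -> 0.
    one_way_ring = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    for node, inbox in enumerate(run_pattern(one_way_ring, lambda s, d: f"{s}->{d}")):
        print(node, inbox)
```

Because the matrix is the only input that changes, a different communication pattern can be exercised without rewriting the driver. Swapping the append for a call that starts some other operation at the destination node would correspond to the non-communication use mentioned above.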