This invention generally relates to a method of selecting from among a plurality of computer processors connected together, and more particularly to a method of distributing portions of an overall numerical problem to be solved among a plurality of interconnected processors for optimizing the speed of communication between the processors when solving the overall problem.
Parallel processing utilizes the combined power of a plurality of computer processors that are connected together and communicating with each other via physical network connections. The processors operate simultaneously in parallel to solve a numerical problem that is generally too large and complex for a single computer to handle in a timely manner. A further advantage of parallel processing is that the numerical problem solution capability afforded by a large number of networked processors can be much less expensive than that offered by a supercomputer. This is due primarily to rapid advances and decreasing costs in the design and manufacture of semiconductor computer processors and associated devices, which have allowed greater computational power and memory storage capacity to be integrated into a desktop machine such as a personal computer or workstation.
The overall numerical problem or domain to be solved is divided up into a number of smaller problems or “sub-domains”, which are then each assigned to an associated processor. Each processor solves its associated sub-domain by means of local calculation and communication of intermediate results to other processors in order to achieve a complete solution for the domain. Due to the nature of the overall numerical problem to be solved, certain processors often communicate more frequently than other processors.
Rapid communication between the interconnected processors is essential for acceptable parallel processing performance. The time required to communicate a data message between processors is generally much longer, and thus more costly, than the time spent by each processor in computing the data. For example, communication time may be measured in hundredths of a second, whereas computational time may be measured in millionths of a second. Thus, it is important to reduce, as far as possible, the communication time between processors when solving the overall problem.
One approach to the parallel solution of many large and complex problems is to utilize tens or hundreds of distributed processors. This approach is often known as “distributed”, “cluster”, or “network of workstations” parallel computing. The processors may typically be utilized to perform dedicated tasks, such as CAD drafting, during normal working hours. When used as such, the processors are unavailable for parallel processing in solving computational problems. However, a major advantage of distributed parallel processing is in its use of the processors during other than normal working hours to solve the computational problem. In this way, the processors may be utilized virtually continuously, 24 hours a day.
Typically, one or more processors may be located within a single workstation or personal computer. The workstations and computers may be located physically close to each other, or they may be remotely located apart from one another; for example, in different buildings or facilities, or in different cities, states or countries. The workstations and computers are linked together by a computer network connection, such as the popular Ethernet connection. Such a network connection typically contains a backbone, or main data communication routing path (such as wire connections within a building), together with numerous communication branches or paths connected to the backbone. A branch path may have one or more workstations or personal computers connected to the backbone via switches, which serve to connect multiple workstations and to pass communications as necessary from the workstations to the backbone.
Given this common type of multiple processor network connection, the resulting speed of communication between any two processors within the network is dependent upon the location of each processor within the network. Generally, the fastest inter-processor communication is between two processors located within the same workstation or personal computer. In contrast, communication is somewhat slower between two processors located in different workstations but connected by a single network switch. Further, communication is even slower between processors connected to different network switches. This is because communication from a processor on one switch must be routed out through its switch and over the backbone connection and back through a second switch to reach the second processor. Generally, communication between processors becomes relatively slower with an increasing number of devices physically interposed between the processors.
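The three speed tiers just described can be illustrated with a short sketch. It is offered only as an illustration: the names `Processor`, `Tier`, and `tier`, and the machine and switch identifiers, are hypothetical and do not appear in the original text. Only the ordering of the tiers is taken from the description above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Larger value = more devices between the two processors = slower."""
    SAME_MACHINE = 0   # both processors inside one workstation or PC
    SAME_SWITCH = 1    # different machines connected by a single switch
    VIA_BACKBONE = 2   # different switches, routed over the backbone


@dataclass(frozen=True)
class Processor:
    name: str
    machine: str  # hypothetical identifier of the hosting workstation or PC
    switch: str   # hypothetical identifier of the switch serving that machine


def tier(a: Processor, b: Processor) -> Tier:
    """Classify the path between two processors by what they share."""
    if a.machine == b.machine:
        return Tier.SAME_MACHINE
    if a.switch == b.switch:
        return Tier.SAME_SWITCH
    return Tier.VIA_BACKBONE


# Hypothetical four-processor network: two machines on switch A, one on switch B.
p0 = Processor("p0", machine="ws1", switch="swA")
p1 = Processor("p1", machine="ws1", switch="swA")
p2 = Processor("p2", machine="ws2", switch="swA")
p3 = Processor("p3", machine="ws3", switch="swB")

for other in (p1, p2, p3):
    print(p0.name, other.name, tier(p0, other).name)
```

Because `Tier` is an integer enumeration, tiers compare directly, so `tier(p0, p1) < tier(p0, p3)` expresses that the same-machine path is expected to be the faster of the two.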
Regardless of the type and number of physical connections employed, the number of communications between processors that are required during the computational solution of a problem generally depends upon both the type and size of the overall problem. Also, when solving a large computational problem using parallel processing, it is often the case that a processor must communicate with some processors more often than with others.
In the prior art of distributed parallel processing, it is known to utilize a plurality of computer processors to solve a computational problem as the processors are physically connected together. That is, the individual problem segments or sub-domains are not logically distributed across the network of processors for solution. Instead, the individual problem segments are randomly distributed across the processors for solution. As a result, there is no selection made from among the fixed connections between the plurality of processors to optimize the speed of communication between the processors. Rarely does this type of random connection result in an optimal connection in terms of speed of processor communication. Basically, this type of connection scheme does not take into account the various factors that inherently reside in a physical network connection of processors. These factors could potentially optimize or provide for much faster communication between the processors as they are simultaneously solving the computational problem.
An object of the present invention is to optimize the speed of communication between a plurality of distributed computer processors connected together and operating in parallel to solve a complex numerical problem.
Another object of the present invention is to increase the overall problem solving speed of a plurality of interconnected, distributed computer processors each operating to solve a sub-domain of the overall problem.
According to the present invention, a method of selecting from among a plurality of distributed computer processors connected together in a network utilizes various groups of factors to determine how individual segments of an overall problem are distributed among the processors for solution. In a preferred embodiment, three groups of factors are used. The factors generally relate to the known, existing physical connections among the processors in the network. Several factors within one group specifically relate to the topology of the network. The method makes no attempt to change these physical connections. Instead, the method takes the physical connections (and, thus, the associated communication times) between processors as being fixed. All of the factors influence the resulting speed of communication between the processors when solving the overall problem. Also, the method does not affect how the overall problem is broken up into individual sub-domains, nor does it influence how the overall problem is ultimately solved by the individual processors. Instead, after the problem is segmented, the method distributes the sub-domains for solution by the individual processors to optimize the speed of communication between processors thereby reducing the total time required to solve the overall problem.
The three groups of factors include: (1) a listing of the distributed computer processors available to solve the overall problem; (2) the known communication requirements of the problem to be solved; and (3) a number of specific topology factors. The topology factors include: (a) whether any of the processors are located within the same computer or workstation; (b) whether any of the processors share a network switch; (c) whether any of the processors are located on the same sub-network within a larger network; (d) the speed of the individual network connections; and (e) any user-configurable groupings of the processors.
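One way to picture the three groups of factors is as a small data structure, sketched below. This is an assumption-laden illustration, not the representation used by the invention: the class names `ProcessorInfo` and `Inputs`, the function `pair_cost`, and every numeric weight are hypothetical. The text states which topology factors exist but not how they are weighted, so the weights here are placeholders chosen only to respect the ordering described in the background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorInfo:
    machine: str       # (a) hosting computer or workstation
    switch: str        # (b) network switch
    subnet: str        # (c) sub-network within the larger network
    link_mbps: float   # (d) speed of this processor's network connection
    group: str = ""    # (e) optional user-configured grouping


@dataclass
class Inputs:
    processors: dict   # groups (1) and (3): processor name -> ProcessorInfo
    traffic: dict      # group (2): (sub-domain i, sub-domain j) -> relative volume


def pair_cost(a: ProcessorInfo, b: ProcessorInfo) -> float:
    """Relative communication cost between two processors (hypothetical weights)."""
    if a.machine == b.machine:
        return 1.0                                   # (a) cheapest case
    if a.switch == b.switch:
        cost = 10.0                                  # (b) one switch between them
    elif a.subnet == b.subnet:
        cost = 50.0                                  # (c) same sub-network
    else:
        cost = 100.0                                 # different sub-networks
    cost *= 100.0 / min(a.link_mbps, b.link_mbps)    # (d) slower link, higher cost
    if a.group and a.group == b.group:
        cost *= 0.5                                  # (e) user-preferred pairing
    return cost


inputs = Inputs(
    processors={
        "p0": ProcessorInfo("ws1", "swA", "net1", 100.0),
        "p1": ProcessorInfo("ws1", "swA", "net1", 100.0),
        "p2": ProcessorInfo("ws2", "swA", "net1", 100.0),
        "p3": ProcessorInfo("ws3", "swB", "net2", 10.0),
    },
    traffic={(0, 1): 5.0, (1, 2): 1.0},
)
procs = inputs.processors
print(pair_cost(procs["p0"], procs["p1"]), pair_cost(procs["p0"], procs["p3"]))
```

Keeping the processor list and topology factors together in one mapping, with the problem's communication requirements held separately, mirrors the distinction the text draws between properties of the network and properties of the problem.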
A combinatorial optimization technique, such as the well-known simulated annealing algorithm, then uses these groups of factors to determine the optimal arrangement of processors for an overall problem. In arriving at the optimal arrangement, the simulated annealing algorithm uses the topology factors directly in an equation, while it uses the other groups of factors indirectly. Simulated annealing is an iterative process that proposes different distributions of the sub-domains among the available processors until an arrangement is reached whereby optimal communication speed among processors is achieved. The overall method of the present invention is preferably implemented in software that is executed by a computer processor within one or more of the workstations or personal computers within the network.
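The iterative search described above might be sketched as follows. This is a generic simulated annealing loop under stated assumptions, not the equation or procedure of the invention, which this passage does not spell out. Assumed here: one sub-domain per processor, an objective equal to the sum of traffic volume times link cost over communicating pairs, a swap of two sub-domains as the proposal move, Metropolis acceptance, and geometric cooling. The names `total_cost` and `anneal`, the parameter defaults, and the example data are all hypothetical.

```python
import math
import random


def total_cost(assign, traffic, link_cost):
    """Sum of traffic volume times link cost over communicating sub-domain pairs."""
    return sum(v * link_cost[assign[i]][assign[j]] for (i, j), v in traffic.items())


def anneal(traffic, link_cost, n_sub, steps=5000, t0=10.0, alpha=0.999, seed=0):
    """Search for a low-cost assignment; assign[i] is the processor for sub-domain i."""
    rng = random.Random(seed)
    assign = list(range(n_sub))            # start: sub-domain i on processor i
    cost = total_cost(assign, traffic, link_cost)
    best, best_cost = assign[:], cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n_sub), 2)
        assign[i], assign[j] = assign[j], assign[i]          # propose a swap
        new_cost = total_cost(assign, traffic, link_cost)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = assign[:], cost
        else:
            assign[i], assign[j] = assign[j], assign[i]      # undo the swap
        temp *= alpha                                        # geometric cooling
    return best, best_cost


# Hypothetical network: processors 0 and 1 share a machine, as do 2 and 3.
link_cost = [
    [0, 1, 100, 100],
    [1, 0, 100, 100],
    [100, 100, 0, 1],
    [100, 100, 1, 0],
]
# Sub-domains 0 and 2 exchange many messages, as do 1 and 3.
traffic = {(0, 2): 10, (1, 3): 10, (0, 1): 1, (2, 3): 1}

start_cost = total_cost(list(range(4)), traffic, link_cost)
best, best_cost = anneal(traffic, link_cost, n_sub=4)
print(start_cost, best_cost, best)
```

In this small example the starting assignment places the heavily communicating sub-domains on different machines. The search is expected to co-locate them instead, which lowers the objective considerably. Note that the physical link costs are never modified, consistent with the text's statement that the network connections are taken as fixed.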
The above and other objects and advantages of the present invention will become more readily apparent when the following description of a best mode embodiment of the present invention is read in conjunction with the accompanying drawings.