A multiple processor computer apparatus, such as a supercomputer, is typically used in a wide variety of applications that require massive amounts of computation. Examples of such applications include shock physics, radiation transport, materials aging and design, computational fluid dynamics, and structural dynamics.
Historically, the performance of supercomputers has been measured in a number of ways, including by peak floating-point operations per second, by simple benchmarks such as MPLINPACK, and by complex physical simulations. The best conventional supercomputers have achieved 70-75% of peak performance on the MPLINPACK benchmark. However, for many complex simulation codes, the performance is only 10-20% of peak for a single processor and can be as low as one or two percent when parallel efficiency is considered. The performance, as measured against peak, for complex simulation codes has been declining in recent supercomputing generations. This trend seems to be continuing in the newest supercomputers.
One area of computer hardware design that has contributed significantly to this trend is the machine interconnect structure. Interconnect hardware development has lagged severely behind the pace of increasing processor performance. The shift from tightly coupled Massively Parallel Processor (MPP) designs, such as the Intel ASCI Red and Cray T3E, to clusters that use I/O buses for their interconnects has resulted not only in a relative reduction in interconnect performance but also in an absolute reduction. At the same time, processor performance has been increasing rapidly. This combination has resulted in a growing performance imbalance in large parallel computer systems. In addition, the size of machines, in terms of the number of processors, has been increasing, which puts even more stress on interconnect performance. The result has been poor scalability compared to that achieved on earlier generations of tightly coupled MPPs, and poor overall efficiency of computer systems.
Another factor that degrades performance is the poor scalability of the operating system and of operating system services such as compute processor allocation, job loading, internal communication, network communication, file management, and file I/O.
Many users will typically use a supercomputer to run a wide variety of applications, including the examples given above. Some of these applications may include classified information that can be made available only to a limited number of users, and must not be made available to all users of the supercomputer. Accordingly, some type of partitioning mechanism is necessary to separate classified applications from unclassified users. Although this partitioning is necessary, it is nevertheless desirable to effectuate it with a minimum amount of inconvenience to the unclassified users. Providing classified/unclassified partitioning while also minimizing the inconvenience to unclassified users has been a problem in conventional systems.
It is desirable in view of the foregoing to provide for a multiple processor computing apparatus which can avoid the various difficulties described above.
Exemplary embodiments of the invention provide a compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus. The compute processor allocator architecture is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. In some embodiments, the compute processor allocators can share a common database of information pertinent to compute processor allocation. In some embodiments, a communication path permits retrieval of information from the database independently of the compute processor allocators.
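The distributed arrangement summarized above can be illustrated with a minimal sketch. It is not the claimed implementation. The class names AllocationDatabase and ComputeProcessorAllocator, the method names, the in-memory dictionary, and the lock are all hypothetical choices made for illustration. The sketch assumes that several allocators, each hosted on a different processor of the subset, share one database, and that the database can also be read directly without going through any allocator.

```python
import threading


class AllocationDatabase:
    """Hypothetical common database of compute processor ownership."""

    def __init__(self, processor_ids):
        self._lock = threading.Lock()
        # Maps each compute processor id to the job that owns it, or None.
        self._owner = {pid: None for pid in processor_ids}

    def claim(self, count, job_id):
        """Atomically assign `count` free processors to `job_id`.

        Returns the claimed ids, or an empty list if too few are free.
        """
        with self._lock:
            free = [pid for pid, owner in self._owner.items() if owner is None]
            if len(free) < count:
                return []
            chosen = free[:count]
            for pid in chosen:
                self._owner[pid] = job_id
            return chosen

    def release(self, job_id):
        """Mark every processor owned by `job_id` as free again."""
        with self._lock:
            for pid, owner in self._owner.items():
                if owner == job_id:
                    self._owner[pid] = None

    def snapshot(self):
        """Read-only view, usable independently of any allocator."""
        with self._lock:
            return dict(self._owner)


class ComputeProcessorAllocator:
    """Hypothetical allocator instance hosted on one processor of the subset."""

    def __init__(self, host_processor, database):
        self.host_processor = host_processor
        self.database = database

    def allocate(self, job_id, count):
        return self.database.claim(count, job_id)

    def free(self, job_id):
        self.database.release(job_id)


# Two allocators on different (hypothetical) host processors share one database.
db = AllocationDatabase(range(8))
alloc_a = ComputeProcessorAllocator("host0", db)
alloc_b = ComputeProcessorAllocator("host1", db)

job_a = alloc_a.allocate("jobA", 5)        # claims processors 0 through 4
job_b_fail = alloc_b.allocate("jobB", 5)   # only 3 remain free, so nothing is claimed
job_b = alloc_b.allocate("jobB", 3)        # claims processors 5 through 7
status = db.snapshot()                     # direct read, bypassing both allocators
```

In this sketch the lock stands in for whatever consistency mechanism a real shared database would supply, and the snapshot method plays the role of the communication path that retrieves information independently of the allocators.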