High performance computing (HPC) tasks typically require large amounts of computing power for relatively short periods of time, and HPC systems generally deal with tightly coupled parallel jobs. As such, HPC workloads usually must execute within a particular site that provides low-latency interconnects.
High performance computing generally refers to the practice of aggregating computing power in a manner that delivers very high performance relative to the performance of a typical desktop computer or workstation. HPC systems (e.g., HPC systems that use supercomputers to achieve the large amounts of required computing power) typically: (i) are large installations; (ii) include a plurality of sub-systems (e.g., server computer nodes); and (iii) use different server computer nodes to respectively accomplish different “roles.” Some examples of “roles” within an HPC system include storage, management, login, compute, and bridge.
HPC systems play a prominent role in the field of computational science, and are utilized for a wide array of computationally intensive tasks in a plurality of subject areas, including quantum mechanics, weather forecasting, molecular modeling, climate research, cryptanalysis, and physical simulation. An HPC system can be composed of a multitude of server racks, each rack including a multitude of server and system nodes, and each server or system node including a multitude of computer components (e.g., computer processors, computer memory, I/O devices, storage, etc.).
High throughput computing (HTC) tasks also require large amounts of computing power, but for much longer periods of time, typically months or years rather than hours or days. A primary interest in the field of HTC is how many jobs can be completed over a long period of time, rather than how fast an individual job can complete. Many-task computing (MTC) is aimed at bridging the gap between the two computing paradigms of HTC and HPC. MTC is similar to HTC, but differs in its emphasis on utilizing many computing resources over short periods of time to accomplish many computational tasks (i.e., both dependent and independent tasks), where the primary metrics are measured per second (e.g., floating-point operations per second (FLOPS), tasks per second, and I/O rates in megabytes (MB) per second), as opposed to operations (e.g., jobs) per month.
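The difference in time scale between the MTC and HTC metrics described above can be illustrated with a short calculation. The following is only a minimal sketch: the function names, the 30-day month, and the use of decimal megabytes (10^6 bytes) are assumptions made for illustration and do not come from any particular scheduler or system.

```python
# Illustrative helpers for the throughput metrics discussed above.
# All names here are hypothetical; a 30-day month is assumed.
SECONDS_PER_MONTH = 30 * 24 * 3600


def tasks_per_second(tasks_completed: int, elapsed_seconds: float) -> float:
    """MTC-style metric: tasks completed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tasks_completed / elapsed_seconds


def mb_per_second(bytes_moved: int, elapsed_seconds: float) -> float:
    """MTC-style I/O metric, using decimal megabytes (1 MB = 10**6 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_moved / 1e6 / elapsed_seconds


def jobs_per_month(jobs_completed: int, elapsed_seconds: float) -> float:
    """HTC-style metric: jobs completed, normalized to a 30-day month."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return jobs_completed * SECONDS_PER_MONTH / elapsed_seconds


if __name__ == "__main__":
    # An MTC-style run: many short tasks over ten minutes.
    print(tasks_per_second(120_000, 600))
    print(mb_per_second(3_000_000_000, 60))
    # An HTC-style campaign: jobs accumulated over ninety days.
    print(jobs_per_month(9_000, 90 * 24 * 3600))
```

The same quantity (work completed divided by elapsed time) underlies both kinds of metric; only the normalization interval changes, which reflects the different time horizons of the two paradigms.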