1. Field
The embodiments relate to reducing overall latency in processing technologies, and more particularly to pragmatically truncating processes in a multi-fabric environment.
2. Description of the Related Art
In today's communication passing environments, such as parallel systems and dedicated switching networks, different types of protocols and devices can be combined. In such a combination, each constituent device and protocol can exhibit a different latency.
Various standards have been developed to simplify communication passing. One such standard is the message passing interface (MPI, see MPI: A Message-Passing Interface Standard, Message Passing Interface Forum, May 5, 1994; MPI-2: Extensions to the Message-Passing Interface, Message Passing Interface Forum, Jul. 18, 1997). MPI is a de facto standard for communication among the nodes running a parallel program on a parallel system. MPI comprises a library of routines that can be called from programming languages such as FORTRAN and C. MPI is portable, and its implementations are fast because they are optimized for the platform on which they run.
In MPI implementation practice, it may be necessary to combine two or more MPI devices (e.g., lower MPI layers capable of dealing with, for example, only shared memory, or Transmission Control Protocol/Internet Protocol (TCP/IP), or direct access programming library (DAPL) connections) in order to obtain a multi-fabric device (for example, a device that would be able to work with the shared memory, TCP/IP and DAPL connections at the same time).
In a multi-fabric device, most of the processing is accomplished by either embedding or invoking the corresponding parts of the respective MPI devices, in the proper order, in the upper layer device code. The difficulty with this approach is getting the resulting multi-fabric device to perform at least nearly as well as its constituent devices. This is particularly challenging when the characteristic latencies of the constituent devices differ broadly.
There are several ways of accommodating the latency differences among the constituent devices. One way is to call the respective fabric progress processes adaptively, depending on the expected frequency and/or volume of the messages that each fabric has to communicate.
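By way of illustration only, the adaptive calling of fabric progress processes described above might be sketched in C as follows. This is a minimal sketch under stated assumptions, not the code of any particular MPI implementation: the names fabric, progress_fn, progress_pass and idle_poll, the interval limits, and the back-off rule (double the polling interval while a fabric is idle, reset it once the fabric reports traffic) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-fabric progress callback: returns the number of
   messages advanced during this call (0 if the fabric was idle). */
typedef int (*progress_fn)(void *state);

typedef struct {
    progress_fn poll;    /* fabric-specific progress routine           */
    void *state;         /* opaque fabric state handed to poll()       */
    unsigned interval;   /* poll this fabric once every N passes       */
    unsigned countdown;  /* passes remaining until the next poll       */
} fabric;

/* Assumed tuning limits, chosen only for the sketch. */
enum { MIN_INTERVAL = 1, MAX_INTERVAL = 64 };

/* One pass of the upper-layer progress loop over all fabrics.
   Returns the total number of messages advanced in this pass. */
int progress_pass(fabric *fabrics, size_t n)
{
    int total = 0;

    assert(fabrics != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        fabric *f = &fabrics[i];

        if (f->countdown > 1) {      /* not this fabric's turn yet */
            f->countdown--;
            continue;
        }

        int advanced = f->poll(f->state);
        total += advanced;

        if (advanced > 0)
            f->interval = MIN_INTERVAL;   /* active: poll on every pass */
        else if (f->interval < MAX_INTERVAL)
            f->interval *= 2;             /* idle: back off             */

        f->countdown = f->interval;
    }
    return total;
}

/* Stub fabric for demonstration: counts its polls and is always idle. */
static int idle_poll(void *state)
{
    ++*(unsigned *)state;
    return 0;
}
```

In this sketch, a fabric initialized with interval and countdown both equal to 1 and backed by idle_poll is polled on the first, third and seventh passes, so a fabric that stays idle is consulted progressively less often while an active fabric is polled on every pass. How the intervals are seeded and bounded is a design choice that the sketch leaves arbitrary.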
Variations exist as to how the relative frequencies are to be initialized and tracked. The relative level of activity on the fabrics may change substantially during a typical application run, and there is no generally applicable solution. The same is true of central processing unit (CPU) yielding. These techniques, however, are either cumbersome and prone to producing unpredictable results, or inadequate.