The present invention relates to controlling interrupts in multiprocessor systems, and more particularly relates to providing control of interrupts to application programmers of multiprocessor systems.
In a multiprocessor distributed memory machine, processes typically executing on different processors communicate with each other by sending and receiving messages. Sending and receiving messages is usually accomplished in one of two modes:
(a) Synchronous mode--in this mode the two communicating processes execute a send or a receive (recv) function. The data sent by the sender using the send function call is received by the receiver using the recv function call.
(b) Asynchronous mode--in this mode the sending side sends a message using the send function call, but the receiving side may not be expecting this message. The sent message needs to be handled by the receiver in a timely fashion to avoid losing messages and to avoid congestion in the network buffer, where the message sits waiting for the receiver to pull it out. This is typically accomplished by setting up a mechanism such that the incoming message causes an interrupt to the system.
In IBM's implementation of Message Passing Interface (MPI), "synchronous mode" is also referred to as "polling mode", and "asynchronous mode" is also referred to as "interrupt mode". In polling mode, interrupts are essentially disabled. In interrupt mode, the sending node (origin) sends a message to the receiving node (target). When the message arrives at the target, an interrupt to the node is generated by the adapter on the target.
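The difference between the two modes described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `TargetNode` and its methods are hypothetical names invented for illustration and do not correspond to any MPI or adapter interface. In polling mode the data stays in the network buffer until the application calls `recv`; in interrupt mode the arrival itself triggers the handling.

```python
from collections import deque


class TargetNode:
    """Toy model of a receiving node (hypothetical, for illustration only)."""

    def __init__(self, interrupt_mode):
        self.interrupt_mode = interrupt_mode
        self.network_buffer = deque()  # packets waiting to be pulled out
        self.received = []             # data absorbed by the application
        self.interrupts_raised = 0

    def packet_arrives(self, data):
        # The adapter always copies incoming data into the network buffer.
        self.network_buffer.append(data)
        if self.interrupt_mode:
            # Asynchronous mode: the arrival itself causes an interrupt,
            # and the handler pulls the data out of the buffer.
            self.interrupts_raised += 1
            self._drain()

    def recv(self):
        # Synchronous (polling) mode: the application explicitly asks for data.
        self._drain()
        return list(self.received)

    def _drain(self):
        # Move everything from the network buffer to the application.
        while self.network_buffer:
            self.received.append(self.network_buffer.popleft())
```

In this model a polling-mode node leaves an unexpected message in `network_buffer` until `recv` is called, which is the congestion risk the asynchronous mode is meant to avoid.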
The flow chart for the interrupt mode of operation at the target is shown in FIG. 1. The message from the origin is partitioned into fixed sized units called packets. All packets of a message are sent to the target. These packets arrive at the target at 10 (possibly out of order), and the adapter, to be discussed, copies the incoming data into the communication buffer (also called network buffer) area at 12. The first packet of the message that arrives at the target causes an interrupt to be generated by the adapter to the system at 14. The interrupt is received by the OS FLIH (operating system's first level interrupt handler) at 16. The operating system may be, for instance, the AIX operating system from IBM. As shown at 18, the FLIH decodes the interrupt to find the source of the interrupt. The FLIH then calls the corresponding SLIH (second level interrupt handler). The SLIH is installed in the kernel by the MPI initialization process and is part of the adapter device driver. The SLIH in the device driver reads the interrupt mask on the adapter to determine from which adapter port (also called window) the interrupt was generated, shown at 20. The device driver looks up its tables to determine the PID (process id) of the job running in the user space window (assuming it was a user space interrupt), at 22. The device driver then sends a signal to the PID of the running user space job at 24, and then exits. The signal handler that fields the signal sent by the device driver is installed when the MPI library is initialized.
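The dispatch chain just described (FLIH decodes the source, calls the SLIH, which maps the interrupting window to a PID) can be sketched as follows. This is a simplified model and not the AIX kernel interface: the function names, the table layouts, and the choice to represent the adapter's interrupt mask as one bit per window are all assumptions made for illustration.

```python
def slih(adapter_mask, window_to_pid):
    """Second level handler (hypothetical model).

    Reads the interrupt mask to find the interrupting window, then looks up
    the PID of the job running in that window. Assumes one mask bit per window.
    """
    for window, pid in sorted(window_to_pid.items()):
        if adapter_mask & (1 << window):
            return pid  # the driver would now send a signal to this PID
    return None  # no user space window matched


def flih(source, adapter_mask, slih_table, window_to_pid):
    """First level handler (hypothetical model).

    Decodes the interrupt source and calls the SLIH registered for it.
    """
    handler = slih_table.get(source)
    if handler is None:
        return None  # no SLIH installed for this source
    return handler(adapter_mask, window_to_pid)


# Example: the SLIH is registered for the adapter; window 2 raised the interrupt.
slih_table = {"adapter": slih}
pid_to_signal = flih("adapter", 0b100, slih_table, {0: 111, 2: 222})
```

Here `pid_to_signal` identifies the process whose signal handler must eventually be scheduled; every hop in this chain adds to the latency between packet arrival and handler execution.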
The operating system (AIX) scheduler marks the PID with the signal handler as a runnable entity and puts it in the queue of runnable processes at 26. The operating system (AIX) dispatcher eventually schedules the signal handler for execution at 28. The signal handler then receives the incoming data and absorbs it into the ongoing computation. As shown at 30, the handler reads the headers of the incoming packets and determines the number of packets in the message. The handler waits for and receives all packets of the message, then re-enables interrupts (by setting the appropriate threshold) before returning control to the application. Prior to the present invention, the adapter generated only one interrupt for a particular threshold. This means that if two packets arrive at the target node and the interrupt threshold has not been changed since before the first packet arrived, only the first packet will cause an interrupt.
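The one-interrupt-per-threshold behavior and the handler's re-arming step can be illustrated with a toy model. The sketch below makes several assumptions: the threshold is reduced to a boolean `armed` flag, a packet is a hypothetical `(sequence, total, payload)` tuple whose header fields stand in for the real packet header, and the handler is called only after all packets of the message have arrived rather than actually waiting for them.

```python
class Adapter:
    """Toy adapter that interrupts once per threshold setting (hypothetical)."""

    def __init__(self):
        self.armed = True     # stands in for the interrupt threshold
        self.interrupts = 0
        self.buffer = []      # communication (network) buffer

    def packet_arrives(self, packet):
        self.buffer.append(packet)
        if self.armed:
            # Only the first packet after the threshold is set interrupts.
            self.armed = False
            self.interrupts += 1

    def reset_threshold(self):
        # Re-enable interrupts for the next message.
        self.armed = True


def handle_message(adapter):
    """Signal handler model: assumes the whole message is already buffered."""
    total = adapter.buffer[0][1]            # header gives the packet count
    packets = sorted(adapter.buffer[:total])  # packets may arrive out of order
    del adapter.buffer[:total]
    adapter.reset_threshold()               # re-arm before returning
    return "".join(payload for _, _, payload in packets)
```

With this model, two packets arriving back to back raise a single interrupt, and a further interrupt is possible only after `handle_message` has reset the threshold.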
The cost of an interrupt is therefore very high. On the IBM RISC System/6000 Scalable POWERparallel (SP) system containing Power-2 (Model 591) wide nodes, with the TB2/TB3 adapter between each node and the SP switch (SPS), the time from when the data reaches the target to the time the handler is invoked may be as high as 65 microseconds.
U.S. Pat. No. 5,265,215 issued Nov. 23, 1993 to Fukuda et al. for MULTIPROCESSOR SYSTEM AND INTERRUPT ARBITER THEREOF discloses a tightly coupled multiprocessor system in which I/O interrupts are distributed to respective processors in accordance with load conditions of the processors.
U.S. Pat. No. 5,359,730 issued Oct. 25, 1994 to Marron for METHOD OF OPERATING A DATA PROCESSING SYSTEM HAVING A DYNAMIC SOFTWARE UPDATE FACILITY discloses a dynamic software update facility in a data processing system. Pieces of software in large data processing systems are updated using interrupts without having to shutdown the whole system, thereby allowing other parts of the system to continue operation.
U.S. Pat. No. 5,495,615 issued Feb. 27, 1996 to Nizar et al. for MULTIPROCESSOR INTERRUPT CONTROLLER WITH REMOTE READING OF INTERRUPT CONTROL REGISTERS discloses a multiprocessor programmable interrupt controller system which has an interrupt bus distinct from the system bus for handling interrupt-related messages.
U.S. Pat. No. 5,561,809 issued Oct. 1, 1996 to Elko et al. for IN A MULTIPROCESSING SYSTEM HAVING A COUPLING FACILITY, COMMUNICATING MESSAGES BETWEEN THE PROCESSORS AND THE COUPLING FACILITY IN EITHER A SYNCHRONOUS OPERATION OR AN ASYNCHRONOUS OPERATION discloses a mechanism for communicating messages between processors in a multiprocessor system in either a synchronous or an asynchronous operation without using interrupts.
IBM Technical Disclosure Bulletin, Vol. 38, No. 02, February 1995 for PARALLELIZED MANAGEMENT OF ADVANCED PROGRAM-TO-PROGRAM COMMUNICATIONS/VM IN A SERVER SUPERSTRUCTURE discloses a method for letting multiple threads of execution running in parallel on multiple control processing units manage a set of Advanced Program-To-Program Communications/VM (APPC/VM) conversations wherein one instance of an interrupt handler is registered for each APPC/VM resource being managed.
IBM Technical Disclosure Bulletin, Vol. 38, No. 07, July 1995 for REDUCING CPU UTILIZATION BY CONTROLLING TRANSMIT COMPLETE INTERRUPTS discloses a method that significantly reduces the number of interrupts generated by a LAN adapter, helping to alleviate system CPU utilization by delaying transmit complete interrupts.