1. Field of the Invention
The present invention generally relates to bus hang prevention operations, and more particularly, to a bus hang prevention and recovery system and method for use in a multi-master bus system.
2. Description of Related Art
Digital communication over a communication channel is well known in the art. Modern data communication systems often have multiple high performance data processors and generally include a plurality of external devices interconnected by one or more buses. For example, modern computer systems typically include a system processor coupled through a high bandwidth local expansion bus, such as the peripheral component interconnect (PCI) bus or the VESA (Video Electronics Standards Association) VL bus, to an external shared memory, peripheral devices, and other processors. Examples of devices which can be coupled to local expansion buses include SCSI adapters, network interface cards, video adapters, etc.
High performance bus architectures, such as the PCI bus architecture, provide a hardware mechanism for transferring large sequential groups of data between a peripheral controller's local memory and a system processor's shared memory via burst cycles. In many bus architectures, the maximum burst length is typically not defined.
Systems in which many devices share a common resource typically utilize arrangements for allocating access to the resource under conditions in which a plurality of associated devices may concurrently request access. High performance systems have the potential to generate multiple independent requests for access to one or more external components, often via a single shared bus interface unit (BIU). Since multiple independent input/output (I/O) requests may appear at the BIU at any given time, the data communication system requires a shared bus arbitration scheme to determine the priority of the I/O requests for accessing the shared bus. In multi-master systems, where one or more data processors have the capability of becoming a bus master, the bus arbitration protocol determines which data processor becomes the bus master first. Typically, these multi-master systems employ an arbiter, external to the data processors, to control the shared bus arbitration, and each data processor requests access to an external shared memory or another external device from the arbiter.
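By way of illustration only, the role of an external arbiter may be sketched as follows. The sketch assumes a simple fixed-priority scheme; the arbitration protocol itself is not specified above, and the function name `arbitrate` and the master names are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def arbitrate(requests: Iterable[str], priority: Sequence[str]) -> Optional[str]:
    """Return the requesting master that appears first in the priority order.

    Fixed priority is an assumption made for this sketch; a real arbiter
    might use round-robin or another fairness scheme instead.
    """
    pending = set(requests)
    for master in priority:
        if master in pending:
            return master
    return None  # no master is requesting the shared bus


# Hypothetical master names, listed from highest to lowest priority.
PRIORITY = ["control_master", "dma_master", "external_master"]

# Two masters request the shared bus in the same cycle; one is granted.
granted = arbitrate({"external_master", "dma_master"}, PRIORITY)
```

In this sketch, each master that loses arbitration simply remains pending until a later cycle, which is the situation in which a master holding the bus too long can starve the others.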
In typical microprocessor systems, the bus transports data among the processor and other components. The central processing unit (CPU) is usually the master of the bus, controlling the flow of data to and from the CPU and to the other components of the system, such as printers, memory, displays, and parallel and serial ports. Rather than have the CPU perform complex mathematical calculations, which is very slow, the data may be sent to a dedicated math co-processor where the calculations are performed, freeing the CPU to perform another task. Other masters in a multi-master arrangement may be used for Ethernet control as part of a local area network (LAN), for video control, or for some other customized operation.
In a multi-master communication system, a shared bus may become hung up for various reasons. For example, a hang condition could occur due to an unrecognized address on a shared bus, when the system cannot abort the transfer or does not have the ability to ban the bus master from the shared bus. Sometimes a bus master does not give up the shared bus for a long time, so that other masters are unable to proceed with a transfer in time. At other times, a condition elsewhere in the system makes buffer space or data unavailable for an unacceptable amount of time, so that the bus becomes unusable.
If a bus hang condition occurs on a shared bus within a subsystem of a communication system with several subsystems, so that a transfer operation cannot be completed, it is possible that the entire subsystem will not be able to proceed any further. The subsystem processor may itself be unable to proceed (e.g., it is presently attempting to read an address via the hung shared bus) and therefore cannot be used to recover from the hang condition. If the subsystem hang condition must be reset from an external source (i.e., from the system's main computer via a bus external to the subsystem), information about the data being transferred and/or about error conditions may be lost. The subsystem may also be unable to interact with other subsystems while the recovery is taking place and/or during the time it takes for the external source to realize that a problem has occurred in the subsystem. This may in turn necessitate further recovery efforts. In other conventional systems, the entire subsystem has to be reset via an external source. This not only causes the loss of error/recovery information but may also cause additional problems with any other subsystem of the communication system with which the subsystem being reset is interfacing.
Therefore, there is a need for an improved hang prevention and recovery system and method, usable in high performance multi-master data communication systems with multiple shared external devices. This system and method should be able to prevent a permanent bus hang condition and allow recovery of the subsystem to a known state, so that the propagation of problems to other subsystems, which may otherwise cause severe consequences, can be avoided.
The foregoing and other objects, features, and advantages of the present invention will be apparent from the following detailed description of the preferred embodiments which makes reference to several drawing figures.
One preferred embodiment of the present invention is a shared bus hang prevention and recovery device usable in a multi-master data communication system. The system preferably has a plurality of bus masters and corresponding slaves. The hang prevention and recovery device is connected to a shared bus, and the shared bus is located between an external bus connected to a system processor and an internal bus connected to an internal processor. Some masters are associated with the external bus and other masters are associated with the internal bus. One bus master on the internal bus, associated with the internal processor, is designated a control master.
The shared bus hang prevention and recovery device has circuitry for timing each pending request of the control master for the shared bus, and control program instructions for monitoring and controlling the circuitry. The circuitry initiates bus recovery if the shared bus becomes hung up, that is, when the control master exceeds a pre-determined time period allowed for waiting to acquire control of the shared bus and complete the transfer on the shared bus. Upon a bus hang-up, the circuitry terminates the transfer in progress, thereby freeing the hung shared bus, and performs shared bus recovery. During the recovery, the circuitry prevents bus request grants to the master attached to the external bus until the master is subsequently reset. Next, the circuitry initiates transfers for all pending requests for the shared bus from the control master queue, where each transfer is timed and is terminated if the shared bus becomes hung up again. Upon the control master queue clearing, the control program instructions instruct the circuitry to reset and reinitialize all masters and slaves on the shared bus.
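The timing of a pending control master request may be modeled, purely as an illustrative sketch, by a counter that advances once per clock tick. The class name `RequestWatchdog`, its methods, and the tick-based timeout are assumptions introduced here and are not taken from the embodiment, which performs this function in circuitry.

```python
class RequestWatchdog:
    """Software model of a timer for one pending control master request.

    Hypothetical sketch: the timeout is counted in abstract clock ticks.
    """

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks  # pre-determined time period
        self.elapsed = 0
        self.pending = False

    def start(self) -> None:
        """Begin timing when the control master requests the shared bus."""
        self.elapsed = 0
        self.pending = True

    def complete(self) -> None:
        """Stop timing when the transfer completes on the shared bus."""
        self.pending = False

    def tick(self) -> bool:
        """Advance one tick; return True if the request has timed out."""
        if not self.pending:
            return False
        self.elapsed += 1
        return self.elapsed > self.timeout_ticks


# A request that never completes is flagged once the period is exceeded.
watchdog = RequestWatchdog(timeout_ticks=3)
watchdog.start()
timeouts = [watchdog.tick() for _ in range(5)]
```

In this model, a `True` result from `tick()` corresponds to the point at which bus recovery would be initiated; the timer covers both the wait to acquire the bus and the transfer itself, since it is stopped only by `complete()`.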
Another embodiment of the present invention is the method for shared bus hang prevention and recovery, corresponding to the device embodiment described above.
Yet another embodiment of the present invention is a shared bus multi-master data communication system which has bus hang prevention and recovery capability. The system includes a shared bus located between an external bus connected to a system processor and an internal bus connected to an internal processor, and a plurality of bus masters and corresponding slaves connected to the shared bus. Some of the masters are associated with the external bus and other masters are associated with the internal bus. One of the bus masters is a control master associated with the internal processor. The system has a shared bus hang prevention and recovery device connected to the shared bus. The device includes circuitry and control program instructions. The circuitry times each pending request of the control master for the shared bus and initiates bus recovery if the shared bus becomes hung up, that is, when the control master exceeds a pre-determined time period allowed for waiting to acquire control of the shared bus and complete the transfer on the shared bus.
The control program instructions monitor and control the circuitry and initiate termination of the transfer in progress that is causing the shared bus hang-up. During the bus recovery, the circuitry prevents bus request grants to the master attached to the external bus until the master is subsequently reset. The circuitry initiates transfers for all pending requests for the shared bus from the control master queue, and each transfer is timed and is terminated if the shared bus becomes hung up again. Upon the control master queue clearing, the control program instructions instruct the circuitry to reset and reinitialize all masters and slaves on the shared bus.
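The order of the recovery steps described above may be sketched as follows. This is a minimal software model under stated assumptions, not the circuitry of the embodiment: the function `recover_bus`, the `transfer` callback (which returns `False` when a timed transfer hangs again), and the log strings are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


def recover_bus(queue: Deque[str], transfer: Callable[[str], bool]) -> List[str]:
    """Run the recovery steps in order and return a log of the actions taken.

    `transfer(request)` is a hypothetical callback that attempts one timed
    transfer and returns True if it completed, False if the bus hung again.
    """
    log = ["terminate transfer in progress",
           "block grants to external master"]
    # Drain every pending request from the control master queue.
    while queue:
        request = queue.popleft()
        if transfer(request):
            log.append(f"completed {request}")
        else:
            log.append(f"terminated {request}")  # hung again, so cut short
    # Only after the queue has cleared are the masters and slaves reset.
    log.append("reset and reinitialize all masters and slaves")
    return log


# Example: the second pending request hangs again and is terminated.
pending = deque(["req1", "req2", "req3"])
actions = recover_bus(pending, lambda request: request != "req2")
```

The sketch reflects one design point of the described sequence: the queue is drained before any reset, so that pending control master requests are either completed or individually terminated rather than discarded by a wholesale reset.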