Improving process and design technologies have enabled many processors to be integrated onto the same silicon integrated-circuit chip. These processors may work together to accomplish complex processing tasks, such as examining and operating on Internet-Protocol (IP) packets that pass through a server, router, or other network device. Each processor may operate on a different IP packet, so that hundreds of IP packets can be processed in parallel at the same instant.
FIG. 1 is a block diagram of a packet processor. Such a packet processor was developed by the Applicant, has been on sale for more than one year, and is marketed as the LANShield Processor I.
Packet-transfer memory 40 is on-chip memory that stores incoming and outgoing packets for packet interface 20. Control interface 22 may also read and write data streams in packet-transfer memory 40.
The many processors are arranged into groups, or clusters, of processors known as tribes. Four multi-processor tribes 10 each have 32 processors 16 that access packets and scratch-pad data in packet-transfer memory 40. Thread controller 18 in each of multi-processor tribes 10 assigns processing loads among the processors 16 within that tribe 10. Thread controller 18 receives new processing work from central packet-transfer controller 42 and sends processing information, such as pointers and initial register values, to one of processors 16 to launch a new thread of the processing workload.
During thread execution, processors 16 may access local memory 34 through memory controller 32. Local memory 34 may be external DRAM or other kinds of memory for use by each of multi-processor tribes 10. Processors 16 may also access packet-transfer memory 40, such as to read headers of incoming IP packets that were received by packet interface 20 and initially written into packet-transfer memory 40.
Central packet-transfer controller 42 receives processor requests to access packet-transfer memory 40 and arbitrates among these requests, as well as among requests from packet interface 20 and control interface 22. Other control functions, such as ordering packets and transferring control to and from processors 16 through thread controllers 18, may be handled by central packet-transfer controller 42 or by other logic not shown. Active and sticky bits are used to detect when one of processors 16 is stalled for an unusual reason, such as executing an endless loop of instructions, which might occur after executing defective program code or reading an illegal or out-of-bounds parameter value. A stuck processor 16 is detected when its active bit is still set but its sticky bit, which is periodically cleared, remains in the cleared state.
While useful, such stuck-processor detection may miss other kinds of starvation or error conditions, such as those that occur when processors arbitrate for access to shared resources. For example, a processor may request access to packet-transfer memory 40 yet, for some reason, never be granted that access. The processor then waits an excessively long time for access to packet-transfer memory 40, perhaps due to an arbitration failure. Alternatively, another processor may hold the lock on a certain memory location, such as a semaphore, and never release it, preventing the current processor from being granted access to that memory location or semaphore. The current processor may thus be starved by another aggressive user of the same resource. It can still execute instructions but cannot access the requested semaphore in the shared memory, so it may keep executing instructions to poll the semaphore without making forward progress on its true workload. Nor can the stuck-processor detection determine which processor is the aggressor and which resource is the source of contention.
Another undetectable condition is an error caused by a processor that, for some reason, fails to release a lock before deactivating itself. The stuck-processor detection sees this processor deactivate and therefore treats it as not stuck; it may then see other processors as stuck but provides no further information about the cause.
What is desired is a starvation-detection system that can detect lock-outs from shared resources such as shared memory or semaphores, shared buses, system or global registers, descriptors, and shared I/O. A method to easily monitor the arbitration status of shared resources for many processors is also desirable. Simple but accurate detection of starvation or lock-out from shared resources is desirable in a multi-processor system.