While the invention is generic in nature and capable of use with a large variety of multi-threaded processor systems, it will be described in conjunction with a multi-threaded processor system such as the IBM Part No. IBM32NPR161EPXCAE133 Network Processor which employs a plurality of processors or threads each of which concurrently process data frames which may be from the same or different data flows. The individual threads/processors share common resources in the network processor. Semaphores defined to be associated with specific resources are used to allocate the specific resources to the individual threads as requested.
Within such a network processor several data frames are processed at the same time. Each data frame is processed by one processor/thread. Each processor/thread operates independently from all the other processors/threads. Thus, as the software (picocode) processes a data frame, the software has no knowledge of other frames which have been, are being, or will be processed. As data frames are processed, a thread may need access to a shared resource. This shared resource is shared among all threads. To allow a thread access to the resource without interference from other threads, semaphores are used. A semaphore is a mechanism which allows a processor/thread to use a resource without interference from another processor/thread. Semaphores exist in almost every multi-processor environment where multiple processors can access common resources. A semaphore is used to ensure that one and only one processor/thread has “ownership” or use of a given resource at any given time.
A network processor is a multi-processor environment with resources which can be accessed by all processors/threads. Thus, semaphores are an intricate part of network processors. As discussed above, network processors process data frames which belong to one or more data flows. Traditionally, semaphores are implemented in software using “read modify write” or “test and set” instructions. When these instructions are used as a basis to create and allocate semaphores, valuable system resources must be used. To implement a semaphore, system memory must be used. To access a semaphore, several lines of code must be executed. If these system resources were not used for semaphore implementation, they could be used for other functions or provide a performance increase by not executing extra line(s) of code.
When semaphores are implemented in software, several lines of code must be executed to access and lock the semaphore, impacting performance. If the semaphore is unavailable (locked by another thread/processor), the software would need to poll on the semaphore. This would waste valuable bandwidth on the arbitrated memory holding semaphore locks to be accessed by all threads/processors. To implement a fair semaphore access in software requires more system memory and lines of code. For example, if a semaphore is locked, the thread/processor would need to put itself in a queue waiting for access. This queue would be implemented in system memory and require software management impacting performance. This allows threads/processors to have fair access to resources.
In a software semaphore environment, multiple threads/processors cannot unlock their respective semaphores at the same time. Typically, all the semaphores are in the same system memory. Each thread/processor must arbitrate to access the memory to unlock their semaphore. This may add to the processing time of other threads/processors waiting to access the same memory to access the semaphore locks. The same is true for locking semaphores. When semaphores are implemented in software, only one semaphore can be unlocked/locked at a time since all the semaphores reside in a common area of system memory.
In the IBM Network Processor System identified above a device termed Completion Unit monitors the order in which frames or packets in a flow are processed by the threads or Dyadic Protocol Processor Units (DPPUs) and generates information used by a semaphore sub-system to control the order in which semaphores are assigned. Such systems require ordered semaphores which must perform two functions. First, the well known semaphore function, ensure that one and only one processor/thread has access to a single resource at any time. And second, ordered semaphores must ensure that the processors/threads which are processing data frames of the same data flow access the common resource in frame order, for example, an e-mail message which must be encrypted using an encryption co-processor shared among all of the processors/threads. The encryption of the data frames must occur in order to properly encrypt the message. The software would use an ordered semaphore mechanism to access the encryption co-processor.
This would ensure two things. First, only one processor/thread accesses the co-processor at a time. And second, the encryption of the data frames of the data flow (e-mail message) occurs in order. Ordered Semaphores are needed since processing time for each data frame can be different. Data frames from the same data flow will take different amounts of time to process. For example, tree searches for each data frame can take different amounts of time. Threads which share a common ALU will stall occasionally to allow the other thread to process data. Thus, frames in the same data flow being processed by different threads will attempt to access a shared resource at different times and not in data flow order. Because of this, ordered semaphores are required to ensure the shared resource is accessed in data flow order.
The Completion Unit logic block contains all the information required to put processed data frames (received from processors/threads) back in the correct order for each data flow. USPTO applications Ser. No. 09/479,028 filed Jan. 7, 2000, now issued as U.S. Pat. No. 6,633,920 and Ser. No. 09/548,906 filed Apr. 13, 2000, now issued as U.S. Pat. No. 6,977,928 incorporated herein by reference describe how the Completion Unit performs this function. Within the completion unit, linked lists of the data frames assigned to processors/threads represent the data frame order of the data flows. One linked list exists for each data flow which currently has a data frame being processed by a processor/thread. The head of the linked list is associated with a processor/thread. It is from this processor/thread that the next processed data frame is to be taken from and sent out onto the network. When the processed data frame is sent, the head of the linked list is removed and the next element of the linked list is examined; see the referenced applications for details.