Multithreading and multiprocessing are common programming techniques used to maximize the efficiency of computer programs by permitting concurrency or multitasking. Threads allow a computer program to be divided into multiple distinct sequences of programming instructions, where each sequence is treated as a single task and the sequences may be processed simultaneously. One application that may use the multithreaded programming technique is a packet-switched network application that concurrently processes network packets in a high-speed packet-switched system.
To maintain and organize the different packets, a new thread may be created for each incoming packet. In a single processor environment, the processor may divide its time between different threads. In a multiprocessor environment, different threads may be processed on different processors. For example, the Intel™ IXA network processors (IXPs) have multiple microengines (MEs) processing network packets in parallel where each ME supports multiple threads.
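By way of illustration only, the thread-per-packet arrangement described above may be sketched as follows. The names `service_packet` and `handle_packets` are hypothetical and do not appear in the figures, and the servicing step (recording the packet length) is merely a stand-in for the actual packet processing.

```python
import threading


def service_packet(packet, results, lock):
    """Hypothetical per-packet task; here it only records the payload length."""
    length = len(packet)
    # The results list is shared by all threads, so guard the append.
    with lock:
        results.append((packet, length))


def handle_packets(packets):
    """Create one new thread for each incoming packet and wait for all of them."""
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=service_packet, args=(p, results, lock))
        for p in packets
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The order in which results are recorded depends on thread scheduling, so a caller should not assume the results follow packet arrival order.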
The network performance in processing these packets depends on the time required to process a packet; the faster a packet can be processed, the more efficient a switch is. The service time of a switch usually refers to the time between the arrival and the departure of a packet. When a packet arrives, a series of tasks such as the receipt of the packet, routing table look-up, and queuing can be performed by the new thread to service the packet. Resource access latency usually refers to the time delay between the instant when a resource access, such as a memory access, is initiated and the instant when the accessed data in the resource becomes available. For example, the time it takes to perform a routing table look-up is resource access latency. In many instances, the resource access latency in processing a packet takes up the majority of the service time.
In a multithreaded environment, a processor that would otherwise be idle during resource access latency may be used to execute a different thread. Overlapping the time in which the processor executes the different thread with the resource access latency of the previous thread is usually referred to as resource access latency overlapping or resource access latency hiding. Multiple threads may access the same resource concurrently if one thread does not depend on another thread. The following example demonstrates a dependency relationship between two instructions and resource access latency overlapping and hiding.
FIG. 1a depicts a sequence of programming instructions N1 to Nk+2. Instruction N1 loads the data stored in memory location R2 into memory or register R1. After R1 is loaded with the data from memory location R2, instruction N1 asserts a signal s. Instructions N2 through Nk are independent of N1 because these instructions do not need the data from R1. Thus, they may be processed concurrently while N1 accesses the data from memory location R2.
The duration in which N1 loads the data may be referred to as the resource access latency 101. FIG. 1b is a diagram illustrating the execution of overlapping instructions. In this diagram, instruction N1 loads data from memory location R2 into a register or memory R1 and sends signal s after the data is loaded. Instructions N2 through Nk are executed concurrently with N1. Instruction Nk+2 depends on N1 because Nk+2 needs the data from memory or register R1. Consequently, instruction Nk+1 waits 104 for the signal s from instruction N1 and blocks all subsequent executions until the wait instruction is satisfied, that is, until the signal s is detected. Because instruction N1 only asserts the signal s when it finishes loading the data from memory location R2 at 103, Nk+2 is not executed until the signal s is cleared at 102. Subsequently, instruction Nk+2 uses R1 in its execution at 105.
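The sequence of FIGS. 1a and 1b may be approximated, purely as a sketch, with a background thread standing in for the load issued by N1 and a `threading.Event` standing in for signal s. The function name `run_sequence`, the value 42 held in R2, the simulated latency, and the addition performed by Nk+2 are all assumptions made for illustration and are not taken from the figures.

```python
import threading
import time


def run_sequence(latency=0.05, k=5):
    """Simulate N1 .. Nk+2 with resource access latency overlapping."""
    registers = {"R1": None, "R2": 42}   # 42 is an arbitrary example value
    signal_s = threading.Event()          # stands in for signal s
    trace = []

    def n1_load():
        time.sleep(latency)               # simulated resource access latency
        registers["R1"] = registers["R2"]
        signal_s.set()                    # N1 asserts s once R1 is loaded

    loader = threading.Thread(target=n1_load)
    loader.start()                        # N1 is issued

    # N2 .. Nk do not need R1, so they run while the load is outstanding.
    for i in range(2, k + 1):
        trace.append(f"N{i}")

    signal_s.wait()                       # Nk+1: block until s is asserted
    trace.append("wait satisfied")

    result = registers["R1"] + 1          # Nk+2: first use of R1
    trace.append("Nk+2")
    loader.join()
    return result, trace
```

In this sketch the independent instructions always appear in the trace before the wait is satisfied, and the use of R1 always follows it, which mirrors the dependency between N1 and Nk+2.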
The instructions listed in FIG. 1a may be run in a multithreaded environment where each thread handles one instruction. In such a scenario, threads communicate with other threads through shared resources such as global memory, registers, or signals. For example, signal s and registers R1 . . . R5 are the shared resources and are accessible by instructions N1 . . . Nk+2. In many instances, the shared resource may only be accessed by one thread at a time. For example, until instruction N1 asserts a signal s, no instructions may be executed before instruction Nk+2. This duration is usually referred to as a critical section because instructions are executed in a mutually exclusive manner. A critical section may also be defined in terms of a program, where a computer programmer marks a part of the program as the critical section. For example, a critical section may begin before instruction Nk+1, when it waits for signal s, and end after the assertion of signal s.
A conventional method to implement a critical section is to use an entry and an exit protocol. For example, a token or a signal may be used to permit the entering or to indicate the exiting of a critical section. An example of the token or signal based critical section is illustrated in FIG. 2 where a thread 202 waits for a token or signal 204 from a previous thread 201. After accessing its critical section, the thread 202 then passes another token or signal 205 to a thread 203. Before the thread 203 receives the token or signal 205, the thread 202 has exclusive access to a shared resource 210.
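The token-based entry and exit protocol of FIG. 2 may be sketched as a chain of events, one per hand-off between consecutive threads. This is a minimal illustration under stated assumptions: `run_token_chain` is a hypothetical name, a Python list stands in for the shared resource 210, and `threading.Event` objects stand in for the tokens or signals 204 and 205.

```python
import threading


def run_token_chain(num_threads=3):
    """Each thread enters its critical section only after receiving a token."""
    shared_resource = []                  # stands in for shared resource 210
    # tokens[i] admits thread i; tokens[i + 1] is what thread i passes on.
    tokens = [threading.Event() for _ in range(num_threads + 1)]

    def worker(index):
        tokens[index].wait()              # entry protocol: wait for the token
        shared_resource.append(index)     # critical section: exclusive access
        tokens[index + 1].set()           # exit protocol: pass the token on

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    # Start in reverse to show that the tokens, not start order, fix the order.
    for t in reversed(threads):
        t.start()
    tokens[0].set()                       # hand the first token to thread 0
    for t in threads:
        t.join()
    return shared_resource
```

Because a thread cannot touch the shared list until its predecessor has passed the token, the accesses occur strictly in thread order regardless of how the threads are scheduled.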
In a situation where an instruction that blocks all subsequent executions, such as the wait instruction Nk+1 in FIG. 1a, is included in a critical section, the critical section becomes longer than necessary. The critical section is longer than necessary because the wait instruction already blocks all subsequent executions, so a critical section may not be needed during the wait to ensure exclusivity in accessing a shared resource.