1. Field
This disclosure relates generally to compiling technologies in a computing system, and more specifically but not exclusively, to code optimization techniques.
2. Description
Multithreading and multiprocessing are common programming techniques often used to maximize the efficiency of computer programs by permitting concurrency or multitasking. Threads allow a computer program to be divided into multiple distinct sequences of programming instructions, where each sequence is treated as a single task and the sequences may be processed simultaneously.
One example application that may use the multithreaded programming technique is a packet-switched network application that concurrently processes network packets in a high-speed packet-switched system. To maintain and organize the different packets, a new thread may be created for each incoming packet. In a single processor environment, the processor may divide its time between different threads. In a multiprocessor environment, different threads may be processed on different processors. For example, the Intel® IXA™ network processors (IXPs) have multiple microengines (MEs) processing network packets in parallel, where each ME supports multiple threads.
In such a parallel programming paradigm, accesses to shared resources, including shared memory, global variables, shared pipes, and so on, are typically protected by critical sections to ensure mutual exclusion and synchronization between threads. Normally, critical sections are created by using a signal mechanism in a multiprocessor system. A signal may be used to permit entry into a critical section or to indicate exit from it. For instance, in an Intel® IXP™, packets are distributed to a chain of threads in order (i.e., an earlier thread in the chain processes an earlier packet). Each thread waits for a signal from the previous thread before entering the critical section. After the signal is received, the thread executes the critical section code exclusively. Once the thread has left the critical section, it sends the signal to the next thread.
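The signal chain described above may be illustrated with a minimal sketch. It is written in Python using threading.Event objects as stand-ins for hardware signals, which is an assumption made for illustration only and does not reflect IXP microengine code. The names run_signal_chain, signals, and processed are hypothetical.

```python
import threading


def run_signal_chain(num_threads):
    """Pass one signal down a chain of threads guarding one critical section."""
    # signals[i] models the signal that permits thread i to enter the
    # critical section (hypothetical stand-in for a hardware signal).
    signals = [threading.Event() for _ in range(num_threads + 1)]
    processed = []  # shared resource protected by the critical section

    def worker(index):
        signals[index].wait()      # wait for the signal from the previous thread
        processed.append(index)    # critical section: exclusive access
        signals[index + 1].set()   # after leaving, signal the next thread

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    # Start the threads in reverse order so that any ordering observed in
    # the result comes from the signals rather than from start order.
    for thread in reversed(threads):
        thread.start()
    signals[0].set()               # the first thread has no predecessor
    for thread in threads:
        thread.join()
    return processed
```

In this sketch, thread i handles packet i, so the shared list is filled in packet order regardless of how the threads are scheduled.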
Due to the cost of hardware, the number of signals that can be used for critical sections is limited by the scale of the processing elements in a computing system. In order for the signal resource to be used more effectively, critical section merging is typically performed by a compiler when optimizing code. On the other hand, the size of a critical section also affects the performance of program code. Typically, the larger a critical section is, the longer the shared resource access latency is. Additionally, the latency of a small critical section is normally more easily hidden by technologies such as multithreading than that of a large critical section. Hence, a compiler also performs critical section minimization in addition to critical section merging when optimizing code. Code motion techniques may be used to at least partly merge critical sections and reduce the sizes of critical sections. To merge critical sections, it is desirable to first determine the order of the critical sections, since that order may differ across different traces.
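As a simplified illustration of critical section minimization only, the following Python sketch narrows a critical section to the span between the first and last instruction that accesses the shared resource. It assumes a hypothetical representation in which each instruction is a (name, touches_shared) pair, and it keeps the original instruction order, so no dependence analysis is needed. It is not the compiler technique of this disclosure, and the function name shrink_critical_section is hypothetical.

```python
def shrink_critical_section(instructions):
    """Split a critical section into (before, critical, after).

    instructions: list of (name, touches_shared) tuples that currently all
    sit inside one critical section (hypothetical representation).
    """
    shared_positions = [
        i for i, (_, touches_shared) in enumerate(instructions) if touches_shared
    ]
    if not shared_positions:
        # Nothing accesses the shared resource, so no critical section is needed.
        return list(instructions), [], []
    first, last = shared_positions[0], shared_positions[-1]
    # Instructions before the first shared access and after the last one
    # can run outside the critical section without being reordered.
    return (
        instructions[:first],
        instructions[first:last + 1],
        instructions[last + 1:],
    )


section = [
    ("parse_header", False),
    ("load_counter", True),
    ("increment", False),
    ("store_counter", True),
    ("log_result", False),
]
before, critical, after = shrink_critical_section(section)
```

Here the entry and exit points of the critical section move inward, so parse_header and log_result no longer execute while the signal is held. An actual compiler would additionally need data-flow information to move instructions across one another.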