Computer applications having concurrent threads executed on multiple processing systems (such as multiple processors, multiple processor cores, or other forms of parallelism) present great promise for increased performance but also present great challenges to developers. The growth of raw sequential processing power has flattened as processor manufacturers have reached roadblocks in providing significant increases to processor clock frequency. Processors continue to evolve, but the current focus for improving processing power is to provide multiple processor cores on a single die to increase processor throughput. Sequential applications, which previously benefited from increased clock speed, scale significantly less well as the number of processing systems increases. In order to take advantage of multiple processing systems, concurrent (or parallel) applications are written to include concurrent threads distributed over the processing systems.
A process includes one or more threads and the code, data, and other resources of a program in memory. Typical program resources are open files, semaphores, and dynamically allocated memory. A thread is a path of execution through a program. A thread typically includes a stack, the state of the processor registers, and an entry in the execution list of the system scheduler. Each thread shares the resources of its process. A program executes when the system scheduler gives one of its threads execution control. The scheduler determines which threads will run and when they will run. Threads of lower priority might have to wait while higher-priority threads complete their tasks. On multiprocessor machines, the scheduler can move individual threads to different processors to balance the workload. Each thread in a process operates independently. Unless the threads are made visible to each other, the threads execute individually and are unaware of the other threads in the process. Threads sharing common resources, however, coordinate their work by using semaphores or another method of inter-process communication.
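As an illustration of threads coordinating access to a common resource, the following is a minimal sketch in C++. It assumes a mutex (std::mutex) as the coordination method rather than a semaphore, and the function name run_shared_counter and its parameters are hypothetical names chosen for this example only.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: starts `num_threads` threads that each add 1 to a
// shared counter `increments_per_thread` times, then returns the final value.
long run_shared_counter(int num_threads, int increments_per_thread) {
    assert(num_threads >= 0 && increments_per_thread >= 0);

    long counter = 0;         // resource shared by all threads of the process
    std::mutex counter_lock;  // assumption: a mutex stands in for a semaphore

    auto worker = [&counter, &counter_lock, increments_per_thread]() {
        for (int i = 0; i < increments_per_thread; ++i) {
            // Only one thread at a time may hold the lock and update the counter.
            std::lock_guard<std::mutex> guard(counter_lock);
            ++counter;
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker);
    }
    for (auto& th : threads) {
        th.join();  // wait for every thread to finish before reading the result
    }
    return counter;
}
```

With the lock in place, a call such as run_shared_counter(4, 1000) returns 4000. Without it, the concurrent increments would constitute a data race and the result would be unpredictable.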
Thread Local Storage (TLS) is a method by which each thread in a given multithreaded process can allocate locations in which to store thread-specific data, using static or global memory that is local to the thread. Typically all threads in a process share the same address space, which is sometimes undesirable. Data in a static or global variable is typically located at the same memory location when referred to by threads from the same process. Variables on the stack are local to threads, because each thread has its own stack, residing in a different memory location. Sometimes it is desirable that two threads referring to the same static or global variable are actually referring to different memory locations, thereby making the variable thread-local. If a variable the size of a memory address can be made thread-local, arbitrarily sized memory blocks can be made thread-local by allocating such a memory block and storing the memory address of that block in a thread-local variable.
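The two ideas above, a thread-local variable and a thread-local memory block reached through a thread-local pointer, can be sketched in C++ using the standard thread_local storage specifier. This is one possible realization, not the only one; the names tls_counter, tls_block, run_tls_counters, and run_tls_blocks are hypothetical and exist only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// One name in the source, but each thread refers to its own memory location.
thread_local int tls_counter = 0;

// A pointer-sized thread-local variable holding the address of an
// arbitrarily sized block; the block is freed when the owning thread exits.
thread_local std::unique_ptr<std::vector<char>> tls_block;

// Each thread increments its own copy of tls_counter `steps` times and
// reports the value it observed.
std::vector<int> run_tls_counters(int num_threads, int steps) {
    std::vector<int> observed(num_threads, -1);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&observed, t, steps]() {
            for (int i = 0; i < steps; ++i) {
                ++tls_counter;  // no lock needed: the variable is not shared
            }
            observed[t] = tls_counter;  // each thread writes a distinct slot
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return observed;
}

// Each thread allocates its own block, fills it with a per-thread tag, and
// reports how many bytes of the block still carry that tag.
std::vector<int> run_tls_blocks(int num_threads, std::size_t block_size) {
    std::vector<int> matching(num_threads, -1);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&matching, t, block_size]() {
            const char tag = static_cast<char>('a' + t);
            tls_block.reset(new std::vector<char>(block_size, tag));
            matching[t] = static_cast<int>(
                std::count(tls_block->begin(), tls_block->end(), tag));
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return matching;
}
```

Under these assumptions, every entry returned by run_tls_counters(4, 100) is 100 rather than some fraction of 400, because no thread ever sees another thread's increments, and the calling thread's own tls_counter is left unchanged.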