1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a method for deterministic and pre-emptive thread scheduling. Still more particularly, the present invention relates to a method for deterministic and pre-emptive thread scheduling utilized to support cyclical debugging of multithreaded applications.
2. Description of the Related Art
The basic structure of a conventional computer system includes a system bus or a direct channel that connects one or more processors to input/output (I/O) devices (such as a display monitor, keyboard, and mouse), a permanent memory device for storing the operating system and user programs (such as a magnetic hard disk), and a temporary memory device that is utilized by the processors to carry out program instructions (such as random access memory or "RAM").
When a user program runs on a computer, the computer's operating system (OS) first loads the program files into system memory. The program files include data objects and instructions for handling the data and other parameters which may be input during program execution.
The operating system creates a process to run a user program. A process is a set of resources, including (but not limited to) values in RAM, process limits, permissions, registers, and at least one execution stream. Such an execution stream is commonly termed a "thread." The utilization of threads in operating systems and user applications is well known. Threads allow multiple execution paths within a single address space (the process context) to run concurrently on a processor. This "multithreading" increases throughput and modularity in multiprocessor and uniprocessor systems alike. For example, if a thread must wait for the occurrence of an external event, then it stops and the computer processor executes another thread of the same or a different computer program to optimize processor utilization. Multithreaded programs can also exploit the existence of multiple processors by running the application program in parallel. Parallel execution reduces response time and improves throughput in multiprocessor systems.
FIG. 1 illustrates multithreading in a uniprocessor computer system 10, which includes a bus 18 that connects one processor 12 to various I/O devices 14 and a memory device 16. Memory device 16 contains a set of thread context fields 20, one for each thread associated with a particular process. Each thread consists of a set of registers and an execution stack 24. The register values are loaded into the CPU registers when the thread executes. The values are saved back in memory when the thread is suspended. The code that the thread runs is determined by the contents of a program counter within the register set. The program counter typically points to an instruction within the code segment of the application program. Memory device 16 further contains all of the logical addresses for data and instructions utilized by the process, including the stacks of the various threads. After a thread is created and prior to termination, the thread will most likely utilize system resources to gain access to process context 22. Through the process context 22, process threads can share data and communicate with one another in a simple and straightforward manner.
Thread scheduling is an important aspect of implementing threads. In a first category, there are cooperative threads, which do not rely on a scheduler and cooperate among themselves to share the CPU. Because of the complexities involved in programming such threads, and because they cannot react quickly to external events, cooperative threads are not utilized much in the art nowadays. In a second category are pre-emptive threads. Such threads rely on a scheduler that can decide to switch the CPU from one thread to another at any point during the execution. Pre-emptive threads react quickly to external events because the currently running thread can be pre-empted and another thread can take over the CPU to handle the emergency, if needed. Unlike cooperative threading, pre-emptive scheduling relieves the programmer of the burden of implementing the scheduling mechanism within the application program.
Pre-emption of a running thread can generally occur at any point during program execution. Typically, pre-emption occurs when a timer expires, allowing the scheduler to intervene and switch the CPU among threads. In the art, this is referred to as "time slicing" the CPU among threads, and each thread is said to run for a "time slice." This form of intervention allows the scheduler to implement various scheduling mechanisms, including round robin and priority scheduling, among others. Additionally, pre-emption can also occur in response to external events that may require the immediate attention of some thread.
However, when two copies of the same program run on two different machines, the timer interrupts do not preempt the threads at the same execution points because of the differences in the clock speeds of the two machines. This effect results in the threads potentially accessing shared memory in different orders on the two machines, yielding different results. FIG. 2 illustrates this effect.
FIG. 2 shows what occurs when the same set of threads 32 is run on two different processors, Processor A 36a and Processor B 36b, utilizing time slicing. Processor A 36a runs at one clock speed and executes the threads as shown in Output A 38a. Processor B 36b runs at a slightly different clock speed and executes the threads as shown in Output B 38b. The processors yield different outputs from threads running identical programs. The output connected to "write X" will depend on which thread wrote to that variable last, and this in turn varies depending on whether Processor A 36a or Processor B 36b is utilized. Thus, "write X" 39a yields a different output from "write X" 39b. Similarly, the output of "write Y" differs between the two processors. Processor A 36a yields the value read during execution, while Processor B 36b yields an unknown value stored prior to the thread's execution. Such inconsistencies are common when utilizing time slices as done in the current art, making the current art incapable of supporting cyclical debugging of multithreaded applications. Cyclical debugging requires that two runs of the same program produce the same outputs and go through the same execution paths. The two different results shown in FIG. 2 may equally correspond to two different runs on the same processor, making debugging very difficult.
The result of the application program often depends on the exact points within the execution streams where pre-emptions occur. Therefore, pre-emptive thread scheduling in a multithreaded program is a source of nondeterminism that affects its final results. If a multithreaded application runs twice on the same machine, the threads may access shared memory in different orders and yield different results. Thus, even if the program starts with the same input on the same machine, two runs of the program may yield different results.
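The dependence of the result on the pre-emption points can be illustrated with a small simulation. The following is a minimal sketch and not part of the disclosed system: the `run` function, the three-operation `increment` program, and the `slice_len` parameter (standing in for how many instructions happen to fit in one timer period on a given machine) are all assumptions made for illustration.

```python
def run(threads, slice_len):
    """Round-robin over threads, pre-empting after slice_len operations."""
    shared = {"X": 0}
    # Each simulated thread has its own program counter and private register.
    state = [{"pc": 0, "reg": 0} for _ in threads]
    pending = list(range(len(threads)))
    while pending:
        tid = pending.pop(0)
        ctx, prog = state[tid], threads[tid]
        for _ in range(slice_len):
            if ctx["pc"] == len(prog):
                break
            op, arg = prog[ctx["pc"]]
            if op == "read":
                ctx["reg"] = shared[arg]
            elif op == "add":
                ctx["reg"] += arg
            elif op == "write":
                shared[arg] = ctx["reg"]
            ctx["pc"] += 1
        if ctx["pc"] < len(prog):
            pending.append(tid)  # pre-empted before finishing; run again later
    return shared["X"]


# Hypothetical program: read X, add 1 in a register, write the result back.
increment = [("read", "X"), ("add", 1), ("write", "X")]
threads = [increment, increment]

fast = run(threads, slice_len=3)  # each thread finishes within one slice
slow = run(threads, slice_len=2)  # each thread is pre-empted before its write
print(fast, slow)  # prints "2 1"
```

With a slice of three operations each increment completes before the other thread runs, so both updates survive. With a slice of two operations both threads read the old value before either writes, so one update is lost.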
One method of debugging is discussed in a 1989 article by Mellor-Crummey and LeBlanc. The article, "A Software Instruction Counter," describes a method for debugging utilizing a software instruction counter; however, unlike this invention, the article does not rely on any particular scheduling support, and assumes the utilization of timers. Thus, in their technique they generate a log record each time a timer expires, making the log size impracticably large. This invention solves this problem via an innovative scheduling mechanism that can also support efficient debugging and general purpose preemptive and deterministic scheduling, when such scheduling is needed.
It would therefore be desirable and advantageous to provide an improved method of cyclical debugging by scheduling of threads in a way which would be deterministic and preemptive.
It is therefore one object of the present invention to provide an improved method and system of scheduling threads on a computer CPU.
It is another object of the present invention to provide such a method and system whereby said thread scheduling will be preemptive and deterministic.
It is yet another object of the present invention to provide such a method and system which does not depend on time slices, but utilizes a more accurate method of instruction slices so as to be able to reproduce execution of threads for utilization in debugging.
The foregoing objects are achieved as is now described in a computer system generally comprising a CPU connected to a memory which contains a process context and a series of one or more threads. Additionally, the computer system includes an instruction counter. An instruction counter is a register that counts down by one each time a thread executes an instruction on the CPU. When the count reaches zero, the counter generates an interrupt that activates the scheduler. The counter can exist in hardware or can be emulated in software. In the disclosed embodiment, the scheduler allocates "instruction slices" on the CPU, such that each slice consists of executing N instructions (N is fixed). At the beginning of each instruction slice, the instruction counter is set to N. The thread then executes until one of several terminating events occurs: for example, it executes N instructions, stops to wait for input, or blocks on a synchronization variable. When any of these events occurs, a new thread is scheduled, and its instruction slice executes until one of the terminating events occurs. The process is repeated until all of the threads are completed.
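The instruction-slice mechanism described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: threads are modeled as Python generators in which each `yield` stands for one executed instruction, and the names `worker`, `schedule`, and `make_threads` are hypothetical. A blocked thread is simply re-queued, which assumes the event it waits for has occurred by the time it runs again.

```python
from collections import deque


def worker(steps, block_at=None):
    """Hypothetical thread body: each yield models one executed instruction."""
    for i in range(steps):
        # "block" models waiting for input or on a synchronization variable.
        yield "block" if i == block_at else "instr"


def schedule(make_threads, n):
    """Run threads in instruction slices of at most n instructions each."""
    ready = deque(make_threads())
    trace = []
    while ready:
        name, body = ready.popleft()
        counter = n  # the instruction counter is set to N at slice start
        finished = False
        while counter > 0:
            event = next(body, None)
            if event is None:  # the thread has completed
                finished = True
                break
            counter -= 1  # count down once per executed instruction
            if event == "block":
                break  # a blocking event ends the slice early
        executed = n - counter
        if executed:
            trace.append((name, executed))
        if not finished:
            ready.append((name, body))
    return trace


def make_threads():
    # Thread A runs 5 instructions; thread B runs 3 and blocks on its first.
    return [("A", worker(5)), ("B", worker(3, block_at=0))]


trace = schedule(make_threads, n=2)
print(trace)  # [('A', 2), ('B', 1), ('A', 2), ('B', 2), ('A', 1)]
```

Because a slice ends only on an instruction count or on an event caused by the thread itself, repeating the call with the same threads and the same N yields the same trace regardless of how fast the host machine runs.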
New threads may be admitted into the sequence. This is controlled by an admission control window (ACW) which permits the new threads to be appended to the queue every K instruction slices.
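One way to picture the admission control window is sketched below. The function `run_with_acw`, its arguments, and the choice to count idle slices toward the window are assumptions for illustration only. Each thread is reduced to the number of instruction slices it needs, and newly created threads wait outside the queue until the window opens.

```python
from collections import deque


def run_with_acw(initial, arrivals, k):
    """Return the order in which threads receive instruction slices.

    initial:  list of (name, slices_needed) already in the queue
    arrivals: dict mapping a slice index to threads created during that slice
    k:        admission control window, measured in instruction slices
    """
    queue = deque(initial)
    waiting = []  # created but not yet admitted
    order = []
    slice_no = 0
    while queue or waiting:
        # Threads created during this slice wait outside the queue.
        waiting.extend(arrivals.get(slice_no, []))
        if queue:
            name, remaining = queue.popleft()
            order.append(name)
            if remaining > 1:
                queue.append((name, remaining - 1))
        slice_no += 1
        # Every k slices the window opens and waiting threads are appended.
        if slice_no % k == 0:
            queue.extend(waiting)
            waiting.clear()
    return order


order = run_with_acw(
    initial=[("A", 4), ("B", 3)],
    arrivals={1: [("C", 2)]},  # thread C is created during slice 1
    k=3,
)
print(order)  # ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'A', 'C']
```

Thread C is created during slice 1 but joins the queue only after slice 2, when the window opens, so the point at which it enters the schedule is a function of the slice count rather than of wall-clock timing.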
This support ensures a deterministic and pre-emptive scheduler that is repeatable across different application runs. This feature can be utilized to support cyclical debugging. This is true because the invention described herein forces the threads to access shared variables in the same order and eliminates all sources of nondeterminism from scheduling. A log is utilized by the debugger to record the events of thread creation, thread admission to the instruction queue, and input values. This enables practical debugging without the need for having the log record the state of the program each time a timer expires.
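The role of the log in cyclical debugging can be sketched as a record-and-replay wrapper. This is an illustrative assumption about how such a log might be organized, not the disclosed format: the `EventLog` class, its method names, and the `program` function are hypothetical. Only thread creation, thread admission, and input values are recorded, and a replay run takes its inputs from the log instead of from the live source.

```python
class EventLog:
    """Records events in one run and replays the logged inputs in the next."""

    def __init__(self, recorded=None):
        self.replaying = recorded is not None
        self.events = list(recorded) if self.replaying else []
        # During replay, inputs are served in the order they were logged.
        self._inputs = iter([v for kind, v in self.events if kind == "input"])

    def note(self, kind, value):
        """Record a thread creation or admission event (ignored in replay)."""
        if not self.replaying:
            self.events.append((kind, value))

    def read_input(self, source):
        """Return a fresh input while recording, the logged one in replay."""
        if self.replaying:
            return next(self._inputs)
        value = source()
        self.events.append(("input", value))
        return value


def program(log, source):
    """Hypothetical application: creates one thread and reads two inputs."""
    log.note("create", "T1")
    log.note("admit", "T1")
    a = log.read_input(source)
    b = log.read_input(source)
    return a * 10 + b


live_inputs = iter([4, 2])
recording = EventLog()
first = program(recording, lambda: next(live_inputs))

replay = EventLog(recorded=recording.events)
second = program(replay, lambda: 0)  # the live source is ignored in replay
print(first, second, len(recording.events))  # prints "42 42 4"
```

The log here holds four records for the whole run. Its size grows with the number of threads and inputs, not with the number of timer expirations.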
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.