The present invention relates to information processing system organizations, more particularly to the parallel execution of computer programs or jobs, and even more particularly to techniques for enabling the speculative execution of concurrent jobs in an information processing system.
The traditional electronic computer has a single processing unit, and operates in accordance with a model whereby program instructions are retrieved ("fetched") one-by-one from an addressable memory, and then executed. Instructions that are to be executed in sequence are typically stored at sequential address locations within the memory. Exceptions to this sequential storage of instructions often occur, as for example when execution of a program segment is made conditional on some condition to be tested (e.g., whether two values are equal to one another), or when execution of the present program segment is to be interrupted by execution of another program segment (e.g., in response to a subroutine call or an interrupt). In such cases, program execution may take what is called a "branch" or "jump" to another location, whereby the fetching of instructions continues not with the next sequentially stored instruction, but with one stored at some other location within the memory.
Regardless of how the instructions are stored, it is the expectation of the programmer that the instructions that constitute a particular job will be executed in a particular order. A consequence of this expectation is that variables will be operated upon (e.g., modified or tested) in a particular sequence. Failure to comply with this expectation can result in a job that generates error-laden results.
It continues to be a goal of computer architects to design systems that can complete more work in less time. One approach for doing this has concentrated on making processing elements that are capable of operating faster. This approach has no impact on the programmer's expectation of sequential program execution.
Another approach to improving processing speed has been to devise processors that are capable of operating concurrently. For example, in a so-called "super-scalar" processor, the elements within a single processor are organized in such a way as to permit several instructions to be performed concurrently. Another way to provide concurrent execution of instructions (so-called "instruction level parallel" (ILP) processing) is to provide multiple processing units, each attached to a shared memory, and to allocate individual instructions of a single program to be run on different ones of the processing units.
In order to ensure that the programmer's expectation of sequential program execution is carried out, these architectures need to deal with two types of dependencies: "control dependency" and "data dependency". Control dependency refers to the dependency of instructions that are to be executed only as a function of whether a conditional branch or jump has been taken by a preceding instruction. Data dependency is a dependency of instructions that use data created or changed by earlier instructions. The later instructions may correctly execute only if the earlier instructions that use the same data either do not change the common data or have already completed their changes to it.
Rather than holding up the execution of an instruction whose execution is in some way dependent on the results generated by another instruction, these architectures often turn to the speculative execution of an instruction. That is, an instruction is executed as if there were no control or data dependency. The results of such a speculatively executed instruction must be undone in the event that it is later discovered that the originally planned sequential execution of the instructions would have achieved different results. U.S. Pat. No. 5,781,752 describes an ILP architecture that employs a table based data speculation circuit.
In yet another approach to increasing overall processing speed, some computer systems achieve high processing performance through a computer architecture known as Symmetric Multi Processing (SMP). In contrast to the fine-grained parallelism achieved by the above-described ILP architectures, the SMP architecture exploits coarse-grained parallelism that is either explicitly specified in programs designed in accordance with concurrent programming principles, or extracted from programs designed for sequential execution on a single-processor system during compilation.
Coarse-grained parallelism means task-level parallelism as opposed to instruction-level parallelism (although the two types of parallelism are not mutually exclusive: different tasks could be assigned to separate processors, each of which then employs instruction-level parallelism to carry out its respective task). In an SMP architecture, each one of several rather self-contained and complex computing tasks is carried out on a respective one of several processors. These tasks are mutually concurrent processes, threads or other similar constructs well-known in the information processing arts.
In another computer architecture having multiple processors, further parallelism is extracted during program execution by creating different threads from a single program, and assigning several tasks to different processors for concurrent execution. Because they derive from the same program, these threads may have dependencies similar to those described above with respect to instruction level parallelism. In particular, it is important that the two or more threads maintain data consistency; that is, that a thread intended for later execution not use a data variable that has yet to be updated by a thread intended for earlier execution, and that the thread intended for later execution not modify a data variable that will subsequently be accessed by a thread intended for earlier execution. The occurrence of either of these events is called a "collision".
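By way of a hypothetical illustration (not part of the original disclosure), the two collision cases described above can be sketched as a simple set-intersection test. The function name and the representation of accesses as sets of variable names are assumptions made purely for exposition.

```python
# Hypothetical sketch of the two collision cases described above.
# "Earlier" and "later" refer to the order the threads would have if
# execution were sequential.

def is_collision(earlier_writes, earlier_reads, later_reads, later_writes):
    """A collision occurs if the later thread reads a variable that the
    earlier thread writes (it may use a not-yet-updated value), or if the
    later thread writes a variable that the earlier thread will still
    access (it may modify it prematurely)."""
    read_after_write = bool(set(later_reads) & set(earlier_writes))
    premature_write = bool(set(later_writes) & set(earlier_reads))
    return read_after_write or premature_write

# The later thread reads 'x', which the earlier thread writes: collision.
assert is_collision({'x'}, set(), {'x'}, set())
# No shared variables: no collision.
assert not is_collision({'x'}, {'y'}, {'z'}, {'w'})
```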
Because of the possibility of collisions, it is common to insert locks (semaphores) into the code in order to maintain data consistency. This prevents any collisions from happening. However, algorithms that extract parallelism and insert locks for this purpose must employ a very conservative strategy because they must guarantee that a collision never occurs. This has the drawback of limiting the amount of parallelism that can be extracted.
As another solution to the problem presented when threads that share a data memory space are concurrently executed, one may employ speculative execution. In speculative execution, a collision between threads is detected and the erroneous results of executed threads are undone or purged and threads are restarted in such a way as to guarantee progress (i.e., to guarantee that at least one of the restarted jobs will complete without a collision).
In one architecture, one of a number of parallel threads is designated as a "committed thread". All other concurrently executed threads are referred to as "speculative threads". The committed thread is the thread that would be executed earliest if execution were sequential. The committed thread stores its state directly in a main memory. (As used herein, the term "state" refers to the execution results of a thread or job, such as memory updates, heap, stack, signaling and so forth.) Speculative threads, however, temporarily store their states not in the shared memory, but in a memory (or memory area) distinct from the shared memory.
Since the committed thread is the thread intended for the earliest execution if execution were sequential, and since the results of the execution of the speculative threads do not affect the shared memory, there is no question concerning the accuracy of the result of the committed thread. When execution of the committed thread is complete, it is simply retired. No particular action is taken with regard to the memory because an accurate state of the committed thread is already part of the shared memory.
After retirement of the committed thread, another thread is designated as a new committed thread. Designating a thread as a new committed thread is called "committing a thread". The order in which threads are committed is always maintained the same as the order in which threads would be executed if they were executed sequentially. Committing a thread is done provided that no collision is detected for the thread. When committing a thread that is speculatively executing (or has been speculatively executed), the temporarily stored memory states are copied to the shared memory.
If a speculative thread encounters a collision, the collision is resolved by purging the temporarily stored states of one or more speculatively executed threads, and executing them anew. Purging the temporarily stored states is also referred to as a "roll-back" or "flush".
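The committed/speculative state handling described above can be sketched as follows. This is a minimal illustrative model under assumed names, not the actual mechanism of any cited architecture: the committed thread writes the shared memory directly, a speculative thread buffers its writes in a distinct memory area, a commit copies the buffer into the shared memory, and a roll-back purges the buffer.

```python
# Minimal sketch of speculative state management: speculative writes go
# to a private buffer; commit copies them to shared memory; roll-back
# purges them. All names are illustrative assumptions.

shared_memory = {"counter": 0}

class SpeculativeThread:
    def __init__(self):
        self.buffer = {}          # temporary state, distinct from shared memory

    def write(self, var, value):
        self.buffer[var] = value  # speculative write: shared memory untouched

    def read(self, var):
        # A speculative thread sees its own pending writes first.
        return self.buffer.get(var, shared_memory[var])

    def commit(self):
        shared_memory.update(self.buffer)  # states take effect on commit
        self.buffer.clear()

    def rollback(self):
        self.buffer.clear()       # purge ("flush") the temporary states

thread = SpeculativeThread()
thread.write("counter", 42)
assert shared_memory["counter"] == 0   # speculative state not yet visible
thread.commit()
assert shared_memory["counter"] == 42  # visible only after the commit

thread.write("counter", 99)
thread.rollback()
assert shared_memory["counter"] == 42  # rolled-back state never took effect
```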
Speculative execution, in conjunction with collision detection and rolling back state changes when necessary, offers a high potential for extracting parallelism from a program. Good performance is achieved so long as collisions do not occur too often (i.e., so long as the overhead associated with performing roll-backs is not excessive).
The "Program Language for EXchanges" (PLEX) programming model by Telefonaktiebolaget LM Ericsson employs essentially non-preemptive scheduling. Each PLEX program is divided into multiple jobs. A job is the execution of a sequential program that is initiated by a scheduler in response to an event, and that uninterruptedly continues until it finishes without external intervention. An event may result from an externally generated request (such as by a telephony subscriber) or it may result from a request generated by another job. Several jobs are generally queued in the scheduler, and carried out in a first-come-first-served manner.
PLEX lends itself well to parallel processing. Jobs are simply scheduled on multiple processors by a scheduler. However, when PLEX programs that are designed for execution on a single processor are executed on multiple processors in parallel, dependencies may emerge because jobs operate on a shared memory.
According to another concept developed at Telefonaktiebolaget LM Ericsson, called "Job Parallel Computer" (JPC), dependencies between jobs executed in parallel are resolved through speculative execution. In JPC, one and only one job at a time is committed. States of the committed job are effectuated immediately in the shared memory during execution. If there is no dependency when execution of the committed job finishes, a speculatively executed job becomes committed as determined by the scheduler. States generated by the speculatively executed job being committed take effect in the shared memory only after the previously committed job finishes execution and the speculatively executed job becomes the new committed job.
In the event of a dependency, speculatively executed jobs are flushed and execution of the speculatively executed jobs is repeated. A strict scheduling order is always maintained.
Dedicated hardware is used for managing coarse-grained parallelism with speculative execution. The dedicated hardware includes a memory area for temporarily storing information from speculative execution of threads or jobs. When it is time to commit a speculatively executed job, the information is copied from the temporary storage area into the shared memory. The dedicated hardware further includes logic for dependency checking.
The existing approaches to enabling coarse-grained parallelism with speculative execution generally require dedicated hardware support in order to be efficient. However, it would be desirable to be able to benefit from the full potential of computer architectures implemented with standard processors. In particular, programs designed under the sequential programming paradigm have not previously benefitted from the coarse-grained parallel capabilities of a multiprocessor based computer that uses standard processors.
Moreover, even with dedicated hardware support, conventional techniques for implementing coarse-grained parallelism with speculative execution require considerable resource and processing overhead in connection with dependency checking between the concurrently executed jobs. In particular, these techniques require the allocation of extra storage for every addressable data item that is shared by the concurrently executed jobs. This extra storage is used to keep track of which jobs have accessed the particular shared address, and what type of access was performed (i.e., read or write). In addition, a great deal of extra processing overhead is incurred by the need to perform a dependency check just prior to each and every attempt to access the shared memory. There is, therefore, a need for more efficient techniques for performing dependency checking between concurrently executed jobs that share a memory space.
It is therefore an object of the present invention to provide a technique for enabling coarse-grained execution of concurrent jobs that does not require special support hardware to handle speculative execution of jobs.
In accordance with one aspect of the present invention, the foregoing and other objects are achieved in a computer that performs dependency checking between two or more concurrently executed jobs that share a memory space. In some embodiments, this involves defining a first job and a second job, each having a set of shared individually addressable data items stored in a corresponding set of locations within a memory. The set of locations is partitioned into a set of data areas, wherein at least one of the data areas stores more than one of the data items. The first job and the second job are then run. To determine whether a collision has occurred between the first job and the second job, it is determined whether the first job accessed a same data area as was accessed by the second job, regardless of whether a same data item within the same data area was accessed by both the first job and the second job. By checking for conflicting accesses to a data area rather than to individual data items, the overhead associated with dependency checking can be greatly reduced.
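The area-granularity check just described can be illustrated with a short sketch. The area size and the representation of accesses as sets of addresses are assumptions chosen for exposition; the point is that two jobs touching different data items within the same area still register as a collision, which trades some false conflicts for far less bookkeeping.

```python
# Illustrative sketch of area-granularity dependency checking: addresses
# are partitioned into fixed-size data areas (here 256 bytes, an assumed
# size), and a collision is flagged whenever both jobs touch the same
# area, even if they touched different data items within it.

AREA_SIZE = 256

def area_of(address):
    return address // AREA_SIZE

def collides(job1_addresses, job2_addresses):
    areas1 = {area_of(a) for a in job1_addresses}
    areas2 = {area_of(a) for a in job2_addresses}
    return bool(areas1 & areas2)

# Different items in the same 256-byte area still count as a collision:
assert collides({0x100}, {0x110})
# Accesses confined to different areas do not:
assert not collides({0x100}, {0x400})
```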
In another aspect, a set of marker fields is created, each uniquely associated with a corresponding one of the data areas. For each of the data areas, a first subfield (e.g., a bit) is set in the associated marker field in response to the first job accessing any of the data stored within the data area. Also, for each of the data areas, a second subfield is set in the associated marker field in response to the second job accessing any of the data stored within the data area. These flags can be used to determine the occurrences of collisions between the first and second jobs.
In one class of embodiments, determining whether a collision has occurred between the first job and the second job comprises determining whether there exists a marker field having both the first subfield and the second subfield set. In this way, collision checking can be performed after the first and second jobs have accessed the data area.
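The marker-field arrangement of this class of embodiments can be sketched as follows. The bit positions and names are illustrative assumptions: each data area has one marker field, each job owns one subfield (here a bit) of it, and after both jobs have run, a collision is indicated by any marker in which both subfields are set.

```python
# Sketch of the marker-field scheme: one marker field per data area,
# one subfield (bit) per job. Bit positions are assumptions.

JOB1_BIT = 0b01
JOB2_BIT = 0b10

markers = {}  # data-area index -> marker field

def mark_access(area, job_bit):
    # Set the job's subfield when it accesses any item in the area.
    markers[area] = markers.get(area, 0) | job_bit

def any_collision():
    # After both jobs have run, scan for a marker with both subfields set.
    return any(m & JOB1_BIT and m & JOB2_BIT for m in markers.values())

mark_access(3, JOB1_BIT)
assert not any_collision()     # only one job has touched area 3
mark_access(3, JOB2_BIT)       # second job touches the same data area
assert any_collision()         # both subfields set: collision detected
```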
In another class of embodiments, the step of determining whether a collision has occurred between the first job and the second job comprises determining that the first job is attempting to access one of the data items stored in a first data area; and determining whether the second subfield in the marker field associated with the first data area is set. In this way, collision checking can be performed dynamically as the first job's access is being attempted, rather than waiting for both jobs to finish accessing the data area. This is particularly useful when a non-privatization strategy is adopted for maintaining speculative states.
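The dynamic variant can be sketched along the same lines (names and bit layout are again assumptions): the check runs at the moment the access is attempted, by testing the other job's subfield in the area's marker before setting the job's own subfield.

```python
# Sketch of the dynamic (access-time) check: before marking its own
# subfield, a job tests the other job's subfield in the area's marker.

JOB1_BIT, JOB2_BIT = 0b01, 0b10
markers = {}  # data-area index -> marker field

def access(area, own_bit, other_bit):
    """Attempt an access; return True if it collides with the other job."""
    field = markers.get(area, 0)
    if field & other_bit:          # other job already touched this area
        return True                # collision detected as access is attempted
    markers[area] = field | own_bit
    return False

assert access(7, JOB1_BIT, JOB2_BIT) is False  # first access: no collision
assert access(7, JOB2_BIT, JOB1_BIT) is True   # second job hits same area
```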
In yet another aspect, for each of the data areas, setting a first subfield in the associated marker field in response to the first job accessing any of the data stored within the data area may be performed only in response to the first time the first job accesses any of the data stored within the data area.
In still another aspect, for each of the data areas, setting the first subfield in the associated marker field may be performed in response to a software trap instruction that is executed just prior to another program instruction that causes the first job to access one of the data items stored within the data area.
Alternatively, setting the first subfield in the associated marker field may be performed in response to a memory protect interrupt caused by the first job accessing any of the data items stored within the data area. In another aspect of this embodiment, a memory protect bit associated with the first job and the data area is reset after the first memory protect interrupt caused by the first job accessing any of the data items stored within the data area. This prevents further memory protect interrupts associated with this job and data area from reoccurring.
In another class of embodiments, at least one of the data areas is associated with a program block; and for said at least one of the data areas, the first subfield in the associated marker field is set in response to initiating program execution in the program block.
In yet another class of embodiments, at least one of the data areas is a page of the memory. In an alternative embodiment, at least one of the data areas is a data record defined by the first job and the second job, wherein the data record comprises a plurality of record variables. In yet another alternative embodiment, the first job and the second job are created by an object oriented programming language; and at least one of the data areas is a portion of the memory containing a method or an entire object that is part of the first job and the second job.
In another aspect, the step of determining whether the collision has occurred between the first job and the second job comprises determining whether the first job read from the same data area as was accessed by the second job, regardless of whether the same data item within the same data area was accessed by both the first job and the second job.
In still another aspect, determining whether the collision has occurred between the first job and the second job comprises determining whether the first job wrote to the same data area as was accessed by the second job, regardless of whether the same data item within the same data area was accessed by both the first job and the second job.