In complex software environments, it is often helpful to program software for a particular task or application as a plurality of software state machines that interact by passing messages so as to simplify programming of the application. A state machine is a software component that maintains data defining its state between messages from other state machines, and whose behavior at any given time in response to incoming messages from other state machines depends on its state. An example of a data processing application that may be programmed as message-passing state machines is a transaction processing system, such as an order processing system for an on-line bookstore. The state machines in this example may implement individual transactions (e.g., a single customer's book order) or parts of a transaction (e.g., updating inventory, scheduling shipment, processing a credit card payment, etc.). The individual state machines that implement small parts of the overall order processing task in the system simplify programming the overall application in that the programmer simply defines the various states that the individual state machines can assume, and the behavior of the state machines in response to input messages in those states.
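By way of illustration only, a state machine of the kind described above might be sketched as follows. The class name `OrderMachine`, the state names, and the message strings are hypothetical and are chosen to echo the on-line bookstore example; they are not taken from any particular implementation.

```python
from enum import Enum, auto


class OrderState(Enum):
    AWAITING_PAYMENT = auto()
    AWAITING_SHIPMENT = auto()
    DONE = auto()


class OrderMachine:
    """Hypothetical state machine for a single customer's book order."""

    def __init__(self):
        # Data defining the machine's state, retained between messages.
        self.state = OrderState.AWAITING_PAYMENT

    def handle(self, message):
        """Process one incoming message and return the outgoing messages."""
        if self.state is OrderState.AWAITING_PAYMENT and message == "payment_ok":
            self.state = OrderState.AWAITING_SHIPMENT
            return ["schedule_shipment"]
        if self.state is OrderState.AWAITING_SHIPMENT and message == "shipped":
            self.state = OrderState.DONE
            return ["notify_customer"]
        return []  # the message has no effect in the current state


order = OrderMachine()
first = order.handle("shipped")      # ignored: payment not yet received
second = order.handle("payment_ok")  # moves to AWAITING_SHIPMENT
third = order.handle("shipped")      # moves to DONE
```

The same "shipped" message produces different behavior depending on the state in which it arrives, which is the defining property of the state machine.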
In large volume data processing systems, a large number of state machines may need to be active at any time in order to handle processing demand. For example, the on-line bookstore application may need to process tens, hundreds or more orders from customers at a time. Where parts of the processing task for each order are implemented in different state machines, each of the orders may require multiple state machines operating concurrently. To handle this processing demand, the state machines are executed in a multi-tasking or multi-threaded operating environment. Because implementing each of hundreds of active state machines in such a system as a separate process having its own thread or threads of execution would be prohibitively expensive in terms of computing resources (i.e., memory, processor time, etc.), the state machines preferably execute in an operating environment that provides a set of worker threads that are not exclusively assigned to any one state machine, but rather are scheduled to execute in the state machines as needed.
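As a rough sketch of a shared set of worker threads, the following uses Python's standard `concurrent.futures.ThreadPoolExecutor`. The function `process_order` and the figures of 100 orders and 4 workers are assumptions made for illustration only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_order(order_id):
    # Stand-in for running one step of one state machine (hypothetical).
    return order_id, threading.current_thread().name


# 100 units of work are served by at most 4 worker threads; no thread is
# exclusively assigned to any one order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_order, range(100)))

worker_names = {name for _, name in results}
```

Here the number of threads is bounded by the pool size rather than by the number of active orders.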
The terms "process" and "thread" both refer to an execution context or sequential stream of instruction execution. A process refers to an execution context associated with an entire program that is allocated resources such as memory, processor time, and files. A thread, also called a lightweight process, refers to an execution context that is independently scheduled, but shares a single address space with other threads. (See, Tucker Jr., Allen B. (editor), The Computer Science and Engineering Handbook, pp. 1662-1665, CRC Press 1997.)
Multi-threaded operating environments provide necessary concurrency for meeting large volume data processing demands, but also can lead to well-known synchronization problems. Particularly, multiple threads that access a common shared memory region can interfere with each other and cause data corruption. This data corruption or loss may be due to the thread scheduler pre-empting execution of a first thread while the first thread is modifying a data structure or otherwise executing code in a shared region of memory, and then scheduling execution of another thread that also modifies the data structure or executes the code while the first thread is pre-empted. For example, when two threads each try to add an element to a doubly linked list at the same time, one or the other element may be lost, or the list could be left in an inconsistent state.
Another synchronization problem arises in circumstances where code executed by a first thread utilizes a result produced in code executed by a second thread. In this circumstance, the first thread is said to have a data dependency on the second thread. For correctness, the first thread must halt execution and wait until the result is available (also known as "blocking").
There are several well-known mechanisms to synchronize multiple threads so as to ensure that only a single thread executes in a particular block of code or accesses a data structure at one time. One such mechanism is known as a "lock." A lock is associated with a particular block of code (called a "critical section") that only one thread at a time is allowed to execute. Before a thread can execute the critical section, the thread must first acquire the lock. If the lock already is held by another thread, the thread requesting the lock blocks until the lock is released. A programmer can restrict access by threads to a data structure by placing any code that operates on the data structure within a critical section. The lock thus serves to serialize access by threads to a block of code or data structures. In other words, the lock isolates code or data structures from concurrency (i.e., provides concurrency isolation).
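A minimal sketch of a lock guarding a critical section, using Python's standard `threading.Lock`, is given below. The shared counter and the function `add_many` are hypothetical; the point is only that the acquire and release steps appear explicitly in the application code.

```python
import threading

counter = 0                      # shared data structure
counter_lock = threading.Lock()  # lock associated with the critical section


def add_many(n):
    global counter
    for _ in range(n):
        # Acquire the lock; blocks if another thread already holds it.
        with counter_lock:
            counter += 1         # critical section
        # The lock is released on leaving the "with" block.


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the read-modify-write on `counter` could interleave between threads and some increments could be lost.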
Another synchronization mechanism, known as the "condition variable," is used for resolving data dependencies between threads. The condition variable is a mechanism by which a thread blocks until an arbitrary condition has been satisfied. Another thread that makes the condition true is responsible for unblocking the waiting thread. Two special forms of condition variable are a "barrier" and a "join." A "barrier" is a form of condition variable which operates to synchronize a set of threads at a specific point in the program. The arbitrary condition of the barrier therefore is whether all threads have reached the barrier. When the final thread reaches the barrier, it satisfies the condition and unblocks all the waiting threads (known as "raising" the barrier).
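The condition variable and the barrier can be sketched with Python's standard `threading.Condition` and `threading.Barrier`. The names `consumer`, `producer`, and `worker`, and the values used, are assumptions made for illustration.

```python
import threading

# Condition variable: the consumer blocks until a result exists.
result = None
result_ready = threading.Condition()
seen = []


def consumer():
    with result_ready:
        # wait_for re-checks the predicate each time the thread is woken.
        result_ready.wait_for(lambda: result is not None)
        seen.append(result * 2)


def producer():
    global result
    with result_ready:
        result = 21                # make the condition true
        result_ready.notify_all()  # unblock the waiting thread


# Barrier: no thread passes until all three threads have arrived.
barrier = threading.Barrier(3)
events = []
events_lock = threading.Lock()


def worker():
    with events_lock:
        events.append("before")
    barrier.wait()                 # the last arrival raises the barrier
    with events_lock:
        events.append("after")


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
threads += [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every worker records "before" prior to waiting, all three "before" entries necessarily precede any "after" entry.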
In a join, a first thread spawns (i.e., creates) another thread to execute a procedure and proceeds with other processing work in parallel. Later, when the result of the procedure is needed, the first thread blocks until the spawned thread completes the procedure, returns its result, and unblocks the first thread. The condition for a join therefore is whether a given thread has finished.
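A join might look as follows in Python, using `threading.Thread.join`. The function `procedure` and the dictionary used to carry its result are hypothetical.

```python
import threading

result = {}


def procedure():
    # Work performed by the spawned thread.
    result["total"] = sum(range(1000))


spawned = threading.Thread(target=procedure)
spawned.start()                 # spawn the thread

other = len("other work")       # the first thread proceeds in parallel

spawned.join()                  # block until the spawned thread has finished
total = result["total"]         # the result is now safe to read
```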
A problem with these existing concurrency isolation mechanisms is that they generally require explicit programming in application program code. In the case of locks for example, the programmer must write instructions into the application program's code to acquire a lock before entering a specific critical section, and to release the lock when execution in the critical section is complete. This greatly complicates the programming task.
The present invention provides a form of concurrency isolation and thread management in a programming environment having a plurality of state machines communicating via messages sent on connections. In an illustrated embodiment of the invention, each connection is a communications path that preferably provides full-duplex, exactly-once, in-order delivery of messages between exactly two state machines. Each state machine communicates using one or more connections to other state machines.
In accordance with the invention, connections to a state machine or set of state machines with private data in shared memory or other need for concurrency isolation are grouped into collections that are herein termed "cliques." A clique is defined as a collection of connections that deliver a single message at a time in a serialized fashion. A system or method (e.g., a connection manager in the illustrated embodiment) that manages message delivery to the cliques ensures serialized delivery of messages to each clique. As further discussed below, the connection manager also ensures that only a single thread at a time executes in the state machines within a clique.
With each connection to a state machine grouped into one clique, the connection manager effectively provides concurrency isolation by ensuring that a state machine (or group of state machines) within its "sphere of control" never receives an incoming message on any of its connections while processing a previous message and that only one thread executes in the state machine or group at a time. Because the connection manager enforces these restrictions, the programmer need not implement locks and critical sections, or other forms of concurrency isolation, when programming individual state machines. More specifically, each state machine can be programmed as a purely sequential code block that simply awakens, accepts an incoming message, processes the message, generates zero or more outgoing messages on any of its connections, and returns to dormancy. The programmer need not be concerned about the state machine dealing with more than one message at a time or having more than one thread executing in the state machine at a time. In other words, the state machine is freed from the responsibility of locking access to its private code and data.
According to another aspect of the invention, the cliques and connection manager also promote efficient use of system threads in executing the state machines. In the connection manager of the illustrated embodiment, a thread that delivers an incoming message on a connection must first check whether the clique that contains the connection is busy. If not busy, the clique is marked as busy, and delivery and processing of the message in the state machine proceeds using the thread. On the other hand, if the clique already is busy, the thread instead places the message on a queue for later delivery to the clique and continues with other processing (e.g., attempting delivery of another message). After any thread completes processing a message in a state machine, the thread checks whether any messages have been queued for delivery to that clique by other threads, and also delivers and processes each of the queued messages in turn. When the thread completes processing a message in a state machine of the clique and no further messages have been queued by other threads for delivery to the same clique, the thread is free to move on to other work. In this way, the connection manager ensures that only a single thread at a time executes in any state machine whose connections are grouped in a clique.
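The busy check and queueing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the connection manager of the illustrated embodiment: the class name `Clique`, the method `deliver`, and the use of a short internal lock to make the busy check atomic are all hypothetical, and error handling for a failing handler is omitted.

```python
import threading
from collections import deque


class Clique:
    """Hypothetical clique: serializes message delivery to one handler."""

    def __init__(self, handler):
        self._handler = handler
        self._guard = threading.Lock()  # assumption: protects the two fields below
        self._busy = False
        self._pending = deque()

    def deliver(self, message):
        """Process the message now, or queue it if the clique is busy."""
        with self._guard:
            if self._busy:
                self._pending.append(message)
                return False            # caller is free to do other work
            self._busy = True           # mark the clique as busy
        while True:
            self._handler(message)      # only one thread is ever in here
            with self._guard:
                if not self._pending:
                    self._busy = False  # nothing queued: leave the clique
                    return True
                message = self._pending.popleft()


# The handler is purely sequential code with no locking of its own.
state = {"count": 0, "inside": 0, "max_inside": 0}


def handler(message):
    state["inside"] += 1
    state["max_inside"] = max(state["max_inside"], state["inside"])
    state["count"] += message
    state["inside"] -= 1


clique = Clique(handler)


def sender():
    for _ in range(1000):
        clique.deliver(1)


threads = [threading.Thread(target=sender) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch the internal lock is held only while the busy flag and the queue are examined, never while a message is being processed, so a thread that finds the clique busy does not block and simply returns after queueing its message.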
The use of one thread to deliver and process all messages that are pending for a state machine or group of state machines with shared memory also reduces thread contention or blocking, and saves a thread switch. In particular, when a state machine generates an outgoing message during processing of an incoming message by a thread, the outgoing message may result in an immediate second incoming message to the state machine, such as a "buffer filled" message. The second incoming message becomes queued while the thread completes processing of the first incoming message. Upon completing processing of the first incoming message, the same thread picks up and delivers the queued, second incoming message. In the absence of cliques, the thread that generates an outgoing message would not be used to deliver more incoming messages, such as the "buffer filled" message, because a deadlock condition may result. To avoid a possible deadlock without use of cliques, the additional incoming messages would have to be delivered using another thread. Cliques thus also help to minimize thread processing overhead.
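The "buffer filled" scenario can be illustrated with a compact, hypothetical `Clique` class in which a handler causes a second message to arrive at its own clique while the first is still being processed. The class, the message strings, and the internal lock are assumptions made for illustration.

```python
import threading
from collections import deque


class Clique:
    """Hypothetical clique with a busy flag and a pending-message queue."""

    def __init__(self, handler):
        self._handler = handler
        self._guard = threading.Lock()  # assumption: guards busy flag and queue
        self._busy = False
        self._pending = deque()

    def deliver(self, message):
        with self._guard:
            if self._busy:
                self._pending.append(message)  # queue instead of blocking
                return
            self._busy = True
        while True:
            self._handler(message)
            with self._guard:
                if not self._pending:
                    self._busy = False
                    return
                message = self._pending.popleft()


log = []


def handler(message):
    log.append(("start", message, threading.get_ident()))
    if message == "write":
        # The outgoing message triggers an immediate second incoming message.
        clique.deliver("buffer filled")
    log.append(("end", message, threading.get_ident()))


clique = Clique(handler)
clique.deliver("write")
```

The second message is queued rather than processed re-entrantly, and the same thread picks it up once the first message is finished, so no second thread and no thread switch are needed.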
Additional features and advantages of the invention will be made apparent from the following detailed description of an illustrated embodiment which proceeds with reference to the accompanying drawings.