1. Field of the Invention
The present invention relates to computer systems with multiple processing units. More particularly, the invention concerns a method for managing access to a shared resource among competing processing units.
2. Description of the Related Art
Today people are confronted with an astonishing amount of electronic information to manage. Such management involves transmitting, receiving, processing, and storing electronic data. To meet these challenges, many people choose computer systems with multiple processing units. These systems enjoy significant computing power by using separate computers, microprocessors, processing threads, or other types of processing units. These processing units may also be known by terms such as processors, processing elements, etc.
One recurring challenge to systems with multiple processors involves the sharing of resources by the multiple processors. As one example, digital data storage such as magnetic "hard" disk drive storage is often shared by multiple storage "adapters." Sharing such a resource is challenging because of the difficulties in arbitrating access to the resource. At any given time, which processor should be permitted access to the shared resource? Should other processors be given limited concurrent access? This is further complicated by the need to plan for possible failure of a processor or communications between the processors.
One popular approach to sharing computer resources is called "mutual exclusion," which is often applied at the device level. With this approach, processors access the resource one-at-a-time. While one processor is accessing the resource, all other processors are excluded from that device. Although this approach is attractive in its simplicity, shared computer resources often possess significantly more input/output ("I/O") capability than the processors that manage them. In this case, the full throughput of the shared resource is wasted when it is being used by one processor to the exclusion of the other processors.
In the case of storage resources, the system takes longer to store and retrieve data when the processors are confined by one-at-a-time access rules. This is undesirable, since slower data storage and retrieval are frustrating to most computer users. Furthermore, slow data access may be intolerable in certain data-critical applications, such as automated teller networks, airline reservation systems, stock brokerage, etc. In addition, the use of mutual exclusion is complicated by the possibility that a processor with exclusive access to the shared resource experiences a failure, causing a severe problem for the excluded processors.
To orchestrate mutual exclusion, competing processors must exchange messages of some type. A different set of problems is thus presented by the possibility that messages are lost while a device is reserved to one processor, causing a situation known as "livelock." A further difficulty inherent to mutual exclusion schemes is the need to fairly allocate access to the shared resource among competing processors, the consequences of misallocation potentially including "starvation" of the losing processor.
Consequently, known strategies for arbitrating processor access to shared resources are not completely adequate for some applications due to various unsolved problems.
Broadly, the present invention concerns a method and apparatus for managing access to a shared resource among competing processors. The invention includes features that are particularly optimized for environments with two "processors," also referred to as processing units, processing elements, nodes, servers, computers, adapters, etc. The invention is applied in a system with multiple processors that commonly access a shared resource, such as digital data storage. The processors receive and process access requests originating at one or more hosts.
Each processor separately stores a lock table, listing subparts of the shared resource, such as memory addresses, extents, logical devices, or an entire physical data storage device. The lock tables are stored in nonvolatile storage. In each lock table, each subpart of the shared resource is associated with a "state" such as LOCAL or REMOTE. In response to access requests from the hosts, the processors exchange various messages to cooperatively elect a single processor to have exclusive access to the subparts involved in the access requests. After one processor is elected, the lock-holding processor configures its lock table to show the identified subpart in the LOCAL state, and all non-lock-holding processors configure their lock tables to show the identified subpart in the REMOTE state. Thus, rather than replicating one lock table for all processors, the processors separately maintain lock tables that are coordinated with each other. Importantly, each processor refrains from accessing a subpart of the shared resource unless the processor's lock table indicates a LOCAL state for that subpart.
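The per-processor lock table described above can be illustrated with a minimal Python sketch. The class and method names (`LockState`, `LockTable`, `may_access`) and the subpart identifiers are hypothetical, chosen only for illustration, and the table is held in an in-memory dictionary as a stand-in for the nonvolatile storage the text calls for.

```python
from enum import Enum


class LockState(Enum):
    LOCAL = "LOCAL"    # this processor holds the lock on the subpart
    REMOTE = "REMOTE"  # another processor holds the lock on the subpart


class LockTable:
    """Hypothetical per-processor table mapping subparts to lock states.

    Assumption: an in-memory dict stands in for nonvolatile storage.
    """

    def __init__(self, subparts, initial_state):
        self._states = {subpart: initial_state for subpart in subparts}

    def state(self, subpart):
        return self._states[subpart]

    def set_state(self, subpart, state):
        self._states[subpart] = state

    def may_access(self, subpart):
        # A processor refrains from access unless its own table says LOCAL.
        return self._states[subpart] is LockState.LOCAL


# Two separately maintained, coordinated tables (not one replicated table).
subparts = ["extent-0", "extent-1"]
table_a = LockTable(subparts, LockState.LOCAL)
table_b = LockTable(subparts, LockState.REMOTE)
```

In this sketch each processor consults only its own table before touching a subpart, which mirrors the rule that access is permitted solely in the LOCAL state.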
In one embodiment, optimized for the two-processor environment, the messages exchanged by the processors include lock request, lock release, and lock grant messages. When a first processor seeks access to a subpart, but its lock table indicates a REMOTE state for the lock, the other (second) processor owns the lock. In this case, the first processor transmits a lock request to the second processor. The second, lock-holding processor enqueues the lock request. It sequentially processes queued messages and, upon reaching the first processor's lock request, takes steps to hand the lock to the first processor. In particular, the second processor configures its lock table to indicate the REMOTE state for the subpart, and then transmits a lock grant message back to the first processor. In response, the first processor configures its lock table to show the subpart in the LOCAL state, at which point the first processor is free to access the requested shared resource subpart.
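The request and grant handoff just described can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: the `Processor` class, the message names `LOCK_REQUEST` and `LOCK_GRANT`, and the in-memory `inbox` queue are hypothetical, and lock release messages, timeouts, and failures are omitted.

```python
from collections import deque

LOCAL, REMOTE = "LOCAL", "REMOTE"


class Processor:
    """Hypothetical processor with its own lock table and message queue."""

    def __init__(self, name, initial_state):
        self.name = name
        self.lock_table = {"subpart-0": initial_state}
        self.inbox = deque()  # queued messages, processed sequentially
        self.peer = None

    def send(self, kind, subpart):
        # Assumption: delivery is reliable and in order in this sketch.
        self.peer.inbox.append((kind, subpart))

    def request_access(self, subpart):
        """Return True if access is allowed now, else send a lock request."""
        if self.lock_table[subpart] == LOCAL:
            return True
        self.send("LOCK_REQUEST", subpart)
        return False

    def process_next_message(self):
        kind, subpart = self.inbox.popleft()
        if kind == "LOCK_REQUEST" and self.lock_table[subpart] == LOCAL:
            # Give up the lock first, then notify the requester.
            self.lock_table[subpart] = REMOTE
            self.send("LOCK_GRANT", subpart)
        elif kind == "LOCK_GRANT":
            self.lock_table[subpart] = LOCAL


first = Processor("first", REMOTE)
second = Processor("second", LOCAL)
first.peer, second.peer = second, first

granted_immediately = first.request_access("subpart-0")  # lock is REMOTE
second.process_next_message()  # second hands the lock over
first.process_next_message()   # first records the grant
```

Note the ordering in the lock holder: it marks the subpart REMOTE before sending the grant, so at no point do both tables show LOCAL for the same subpart.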
To increase reliability of message exchange, each message may include a token, where the processors require matching tokens for corresponding messages, such as lock grant and lock release messages. Using tokens increases the system's tolerance of lost messages, duplicated messages, misordered messages, communication faults, etc.
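One way token matching could reject duplicated or stale messages is sketched below. The text does not specify how tokens are generated or compared, so everything here is an assumption: the `TokenMatcher` class is hypothetical, tokens are simple increasing integers, and a reply is accepted only if it carries the token of the single outstanding message.

```python
import itertools


class TokenMatcher:
    """Hypothetical helper: issues a token and validates the matching reply."""

    def __init__(self):
        self._counter = itertools.count(1)  # assumption: integer tokens
        self._outstanding = None

    def issue(self):
        # Issuing a new token supersedes any older outstanding token.
        self._outstanding = next(self._counter)
        return self._outstanding

    def accept(self, token):
        if self._outstanding is not None and token == self._outstanding:
            self._outstanding = None  # consume, so a duplicate is rejected
            return True
        return False


matcher = TokenMatcher()
token = matcher.issue()
first_delivery = matcher.accept(token)      # matching reply is accepted
duplicate_delivery = matcher.accept(token)  # same reply delivered again
newer_token = matcher.issue()
stale_delivery = matcher.accept(token)      # reply for a superseded token
```

Under these assumptions a duplicated message fails the check because its token has already been consumed, and a misordered message fails because its token no longer matches the outstanding one.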
The subpart states may also include a FREE state, in which no processor holds a lock on that subpart. In this case, a requesting processor's lock request message can be satisfied with a prompt lock grant from the other processor.
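The effect of the FREE state on the responding processor can be sketched as a small handler. The function name `handle_lock_request`, the message tuple layout, and the choice to queue every request for a subpart that is not FREE are assumptions made for illustration, not details given in the text.

```python
from collections import deque

FREE, LOCAL, REMOTE = "FREE", "LOCAL", "REMOTE"


def handle_lock_request(lock_table, pending, subpart, requester):
    """Responder side (hypothetical): grant at once if the subpart is FREE.

    Returns a grant message tuple, or None if the request was queued.
    """
    if lock_table[subpart] == FREE:
        # Nobody holds the lock, so record the requester as the new owner
        # and answer promptly without waiting on queued work.
        lock_table[subpart] = REMOTE
        return ("LOCK_GRANT", subpart, requester)
    pending.append((subpart, requester))  # handled later, in queue order
    return None


table = {"subpart-a": FREE, "subpart-b": LOCAL}
pending = deque()
reply_free = handle_lock_request(table, pending, "subpart-a", "P1")
reply_held = handle_lock_request(table, pending, "subpart-b", "P1")
```

Here the request for the FREE subpart is answered immediately, while the request for the locally held subpart waits its turn in the queue.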
Accordingly, in one embodiment the invention may be implemented to provide a method to manage access to a shared resource among competing processors. In another embodiment, the invention may be implemented to provide an apparatus, such as an adapter or other processing unit of a system with multiple processors, programmed to participate in the management of shared resource access. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform method steps for managing access to a shared resource among competing processors.
The invention affords its users a number of distinct advantages. First, the invention takes advantage of the high-throughput capability of shared resources by more efficiently sharing the resources. In the data storage environment, for example, the invention stores and retrieves data more quickly. Consequently, computer users are more pleased with their systems, since they are faster to use. The invention is especially beneficial for the common configuration where two adapters or other processors share access to a common resource.
Furthermore, the invention provides a number of desirable properties for a dual locking protocol. These include safety, liveness, fairness, and efficiency. Safety is provided because if a lock is in the LOCAL state at one adapter, then it is in the REMOTE state at the other adapter. Liveness is provided because the invention guarantees eventual progress in granting locks, since individual locks are eventually released (because of completion or timeout), and because frustrated processors make repeated requests for a lock. Fairness is provided because each processor makes eventual progress in obtaining a lock without "starving" the other adapter. Efficiency is provided because there is minimal overhead involved in maintaining the status quo when a lock-holding processor receives multiple local requests for a lock while the other processor receives none.
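The safety property stated above can be expressed as a simple check over the two adapters' lock tables. This is only an illustrative sketch: the function name `safety_violations` and the dictionary representation of the tables are assumptions, and the check covers safety alone, not liveness, fairness, or efficiency.

```python
def safety_violations(table_a, table_b):
    """Return subparts where LOCAL at one adapter is not REMOTE at the other.

    Assumption: both tables are dicts keyed by the same subpart identifiers.
    """
    violations = []
    for subpart in table_a:
        a, b = table_a[subpart], table_b[subpart]
        if (a == "LOCAL" and b != "REMOTE") or (b == "LOCAL" and a != "REMOTE"):
            violations.append(subpart)
    return violations


consistent = safety_violations(
    {"s0": "LOCAL", "s1": "REMOTE"},
    {"s0": "REMOTE", "s1": "LOCAL"},
)
mid_handoff = safety_violations({"s0": "REMOTE"}, {"s0": "REMOTE"})
conflicting = safety_violations({"s0": "LOCAL"}, {"s0": "LOCAL"})
```

A subpart that both tables momentarily show as REMOTE, as can happen between a lock holder giving up a lock and the requester recording the grant, does not violate the property as stated, because neither adapter would access the subpart in that interval.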
The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.