In modern computer systems it is increasingly common for the software processes (SW) running on the processor or processors to use multiple threads of execution. These threads of execution may exhibit perceived concurrency, when multiple threads run on the same processor, or true concurrency, when there are multiple processors in the computer system.
Additionally, modern computer systems increasingly include sophisticated hardware resources (HW), such as direct memory access (DMA) channels; peripherals with internal buffering or internal DMA engines; and media processing elements that work from a private memory area or internal buffers, or that use DMA. Some hardware resources may also have multiple channels, which increases the number of effective hardware resources that can run simultaneously and that must also be managed. [Since a multiple channel device is effectively usable by a computer system as multiple resources, we herein use the label “hardware resource” for each channel to avoid confusion with “hardware device.”]
The hardware resources in a computer system are often desirably made asynchronous, so that the processor or processors managing them can program each one for a desired task, continue with something else, and later come back to retrieve the result of the earlier programmed task. This creates true parallelism.
When a hardware resource is used by multiple threads of execution, regardless of the nature of the concurrency of the threads, some mechanism must be available to make sure that only one thread of execution at a time uses the hardware resource. This is necessary to ensure the correct operation of the hardware resource and the correctness of the results obtained by the threads of execution. This is where the prior art approaches are still wanting, because they are based on software locks (e.g., software mutual exclusion locks, often termed a “mutex”).
FIGS. 1a-b (background art) are block diagrams stylistically showing examples of how a modern computer can be a very complex system. FIG. 1a shows a simpler computer system having a single main processor, and FIG. 1b shows a more complex computer system having multiple main processors. Under the control of an operating system (OS), each processor can potentially run multiple software processes (not directly shown here). For present purposes we are not especially concerned with the software processes at a high level, but rather with basic portions of them that we term “threads of execution” (TOE). Threaded program execution is covered in many excellent texts on the computing arts, so we merely note here that a single software process may have a single thread, or multiple threads, and that all of these may be in competition for resources within the overall computer system. In particular, the threads of execution in a computer system may need to asynchronously share one or more hardware resources (HWR).
Turning now to FIG. 1a, TOE #2 represents the simplest case for a thread of execution. It is not using and it is not waiting for any hardware resources. Similarly, HWR #1 represents a simple case. No thread of execution is using it and none are waiting to use it. TOE #1 and HWR #2 represent a slightly more complex case. TOE #1 is using HWR #2, and no other threads of execution are waiting to use it.
Things rarely stay as simple as just described. For example, what if TOE #1 is using HWR #2 and TOE #2 needs to use HWR #2 too? In a conventional computer system the operating system manages access to the hardware resources with software locks. A dialog like the following can take place:
(1) TOE #1: I need to use HWR #2.
(2) OS: (After checking its locks & updating them) OK TOE #1, go ahead.
(3) TOE #2: I need to use HWR #2.
(4) OS: (After checking its locks) No TOE #2, wait.
(5) TOE #1: I am finished with HWR #2.
(6) OS: (After updating its locks) OK TOE #2, go ahead.
. . .
This dialog is simplistic and does not cover all possible cases. For instance, what if step (5) never occurs, say, because TOE #1 crashes or is poorly programmed? The operating system must handle this as well.
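The dialog above can be modeled as a small lock table that the operating system keeps for one hardware resource. The sketch below is a hypothetical, single-threaded model, not any real operating system interface: `hwr_lock`, `hwr_request`, and `hwr_release` are illustrative names, thread IDs are assumed to be small positive integers, and waiting threads are assumed to be served in FIFO order (the dialog does not specify an order).

```c
#define MAX_WAITERS 8
#define NO_OWNER    0

typedef enum { HWR_GRANTED, HWR_WAIT, HWR_REFUSED } hwr_status;

/* Hypothetical OS-side lock record for one hardware resource.
 * owner == NO_OWNER means the resource is free. */
typedef struct {
    int owner;
    int waiters[MAX_WAITERS];   /* circular FIFO of waiting TOEs */
    int head;                   /* index of the oldest waiter */
    int count;                  /* number of waiters queued */
} hwr_lock;

/* "I need to use HWR": grant if free, otherwise queue the caller. */
hwr_status hwr_request(hwr_lock *l, int toe) {
    if (l->owner == NO_OWNER) {
        l->owner = toe;                       /* "OK, go ahead." */
        return HWR_GRANTED;
    }
    if (l->count == MAX_WAITERS)
        return HWR_REFUSED;                   /* wait queue is full */
    l->waiters[(l->head + l->count) % MAX_WAITERS] = toe;
    l->count++;
    return HWR_WAIT;                          /* "No, wait." */
}

/* "I am finished with HWR": pass ownership to the oldest waiter.
 * Returns the new owner (NO_OWNER if now free), or -1 if the caller
 * does not own the resource. */
int hwr_release(hwr_lock *l, int toe) {
    if (l->owner != toe)
        return -1;
    if (l->count == 0) {
        l->owner = NO_OWNER;
    } else {
        l->owner = l->waiters[l->head];       /* "OK TOE #n, go ahead." */
        l->head = (l->head + 1) % MAX_WAITERS;
        l->count--;
    }
    return l->owner;
}
```

Replaying the dialog, `hwr_request(&l, 1)` grants access to TOE #1, `hwr_request(&l, 2)` queues TOE #2, and `hwr_release(&l, 1)` returns 2, handing the resource to TOE #2. If TOE #1 crashes before step (5), nothing in this model ever calls `hwr_release`, so the operating system would have to detect the crash and release the lock on TOE #1's behalf.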
Turning now to FIG. 1b, it depicts several more complex scenarios, all of which are increasingly common in modern computer systems. For example, there may be multiple processors, and these may run different operating systems (e.g., the Windows™ operating system on Processor #1 and the Linux operating system on Processor #2). As also represented in FIG. 1b, and discussed presently, the hardware resources can themselves have sophisticated features that should also be considered.
In FIG. 1b a scenario is shown that is similar to the one underlying the dialog described above. Here TOE #2 is using HWR #3 and TOE #3 also wants to use it. For instance, say HWR #3 is a printer that TOE #2 is using to print text characters, whereas TOE #3 wants to use it to print an image.
Again, the conventional approach is to have the operating systems controlling the respective threads of execution manage such contention for the hardware resources with a scheme of software locks. Typically, such a software lock is implemented as a mutual exclusion object (often termed a “mutex”).
A mutex is a program object that allows multiple threads of execution to share the same hardware resource, such as a file or a printer, but not simultaneously. When a thread of execution is started, a uniquely named mutex for each shared hardware resource is created and managed by the operating system (or systems). Any thread of execution that needs the resource then locks the mutex while it is using the hardware resource, thus preventing possible interference from other threads of execution. The mutex is unlocked when the hardware resource is no longer needed or when the thread of execution using it is finished.
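As a minimal illustration of the mutex pattern just described, the POSIX threads sketch below has two threads of execution share one resource. The "printer" is hypothetical: it is only a counter, and `print_jobs` and `run_two_threads` are illustrative names. Error checking of the `pthread_*` calls is omitted for brevity.

```c
#include <pthread.h>

/* Hypothetical shared hardware resource: a counter standing in for
 * "jobs sent to one printer". Not a real device interface. */
static pthread_mutex_t printer_mutex = PTHREAD_MUTEX_INITIALIZER;
static long jobs_printed = 0;

/* Body of one thread of execution: lock the mutex, use the resource,
 * unlock it. Only one thread at a time is inside the critical section. */
static void *print_jobs(void *arg) {
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&printer_mutex);
        jobs_printed++;                  /* exclusive use of the resource */
        pthread_mutex_unlock(&printer_mutex);
    }
    return NULL;
}

/* Run two competing threads (think TOE #2 and TOE #3) and return the
 * total number of jobs recorded on the shared resource. */
long run_two_threads(long jobs_each) {
    pthread_t t2, t3;
    jobs_printed = 0;
    pthread_create(&t2, NULL, print_jobs, &jobs_each);
    pthread_create(&t3, NULL, print_jobs, &jobs_each);
    pthread_join(t2, NULL);              /* jobs_each outlives both threads */
    pthread_join(t3, NULL);
    return jobs_printed;
}
```

Calling `run_two_threads(100000)` returns 200000. Without the lock/unlock pair, the two threads' increments could interleave and updates could be lost.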
Continuing with FIG. 1b, TOE #4 here represents a case where a serious problem can occur. If a simple conflict resolution mechanism is used to handle the situation just described for TOE #2 and TOE #3, TOE #4 can end up stalled waiting for TOE #3 to get its turn to use HWR #3. Most sophisticated operating systems today have mechanisms to avoid this scenario, although these may not be as efficient as desired. But not all computer systems use such operating systems, and in many applications there is no other reason to use such a sophisticated operating system and incur the burdens that come with it. Accordingly, there remains a need today for an inherently less burdensome mechanism for accessing hardware resources in computer systems, one that does not require more sophisticated or burdensome operating system capabilities than a particular application actually needs.
Also, somewhat as foreshadowing, FIG. 1b depicts HWR #3 and HWR #4 as part of a multi-channel hardware device (Multi-channel HW). Thus, if HWR #4 is another printer in a pool of printers that are equivalent to HWR #3 in all relevant respects, it would be desirable for our hypothetical print job from TOE #3 to simply be rerouted to HWR #4. Traditionally, managing such multi-channel hardware resources has been left to the operating systems in computer systems.
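For comparison, the traditional approach, in which the operating system manages the channels, might look roughly like the sketch below: a single mutex protects a table of channel states, and each request is given whichever interchangeable channel is idle. The names `pool_acquire`, `pool_release`, and `busy`, and the assumption that all channels are equivalent, are illustrative only.

```c
#include <pthread.h>

#define NUM_CHANNELS 2   /* e.g. HWR #3 and HWR #4 of one device */

/* Hypothetical OS-managed pool of interchangeable channels;
 * busy[i] is nonzero while channel i is in use. */
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static int busy[NUM_CHANNELS];

/* Claim any idle channel. Returns its index, or -1 if all are busy. */
int pool_acquire(void) {
    int chosen = -1;
    pthread_mutex_lock(&pool_mutex);     /* software lock on every request */
    for (int i = 0; i < NUM_CHANNELS; i++) {
        if (!busy[i]) {
            busy[i] = 1;
            chosen = i;
            break;
        }
    }
    pthread_mutex_unlock(&pool_mutex);
    return chosen;
}

/* Return a channel to the pool so another request can use it. */
void pool_release(int ch) {
    pthread_mutex_lock(&pool_mutex);
    busy[ch] = 0;
    pthread_mutex_unlock(&pool_mutex);
}
```

With both channels idle, the first two `pool_acquire()` calls return 0 and 1, a third returns -1, and after `pool_release(0)` the next call gets channel 0 again. Every allocation and release still pays for a software lock.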
Turning now also to FIG. 2 (background art), it is a listing of pseudo code that represents how TOE #2 would obtain the use of HWR #3 in a scheme where traditional software locks are employed. First, ownership of HWR #3 is established with OS #2 on processor #2 (i.e., Lock #2 in FIG. 2). Then ownership is established with OS #1 on processor #1 (i.e., Lock #1). TOE #2 can now use HWR #3, until its use finishes or one of the operating systems steps in. Then Lock #1 is released. And then Lock #2 is released. As discussed next, the areas marked in FIG. 2 with asterisks are ones of particular interest.
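The lock sequence described for FIG. 2 can be approximated with POSIX mutexes. This is only a sketch under stated assumptions: `lock2` and `lock1` are hypothetical stand-ins for the locks held by OS #2 and OS #1, which in the system of FIG. 1b would belong to two different operating systems rather than to one process, and `use_hwr3` is a placeholder for programming HWR #3.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the two OS-level locks of FIG. 2. */
static pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER; /* OS #2, processor #2 */
static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER; /* OS #1, processor #1 */

static int hwr3_uses = 0;   /* hypothetical record of HWR #3 activity */

/* Placeholder for programming and running HWR #3. */
static void use_hwr3(void) { hwr3_uses++; }

/* TOE #2's access sequence: acquire in a fixed order (Lock #2, then
 * Lock #1), use the resource, then release in reverse order. */
int toe2_access_hwr3(void) {
    pthread_mutex_lock(&lock2);     /* establish ownership with OS #2 */
    pthread_mutex_lock(&lock1);     /* establish ownership with OS #1 */
    use_hwr3();
    pthread_mutex_unlock(&lock1);   /* release Lock #1 ... */
    pthread_mutex_unlock(&lock2);   /* ... then Lock #2 */
    return hwr3_uses;
}
```

The fixed acquisition order matters: if one thread took Lock #2 first while another took Lock #1 first, each could end up holding one lock while waiting forever for the other.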
FIGS. 3a-d (background art) are a series of timing diagrams for threads of execution accessing available (idle) hardware resources: FIGS. 3a-b show two representative cases, FIG. 3c shows a hypothetical situation approaching the worst case, and FIG. 3d shows a hypothetical situation approaching the best case. FIGS. 3a-c progressively show the effect of increased hardware resource speed on performance in a traditional computer system. An important point to observe here is that, regardless of the duty cycle of the respective hardware resources, the duty cycle for software lock management remains essentially fixed (of course, it can increase if a hardware resource is not idle). Thus, as represented by FIG. 3c, the worst case is one where a thread of execution spends most of its processor time dealing with the overhead of accessing hardware resources, rather than getting actual work done.
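The trend in FIGS. 3a-c can be put in numbers with a simple model. Assume, hypothetically, that each access costs the thread a fixed lock-management time t_lock on the processor, that the thread does useful work while the hardware resource runs for t_hw, and that it then issues its next request. The fraction of processor time lost to lock overhead is then as follows, shown with an illustrative t_lock = 20 µs (within the "tens of microseconds" range discussed below):

```latex
\begin{aligned}
f_{\text{overhead}} &= \frac{t_{\text{lock}}}{t_{\text{lock}} + t_{\text{hw}}} \\
t_{\text{hw}} = 1000\,\mu\text{s}:\quad f_{\text{overhead}} &= \frac{20}{1020} \approx 2.0\% \\
t_{\text{hw}} = 100\,\mu\text{s}:\quad f_{\text{overhead}} &= \frac{20}{120} \approx 16.7\% \\
t_{\text{hw}} = 10\,\mu\text{s}:\quad f_{\text{overhead}} &= \frac{20}{30} \approx 66.7\%
\end{aligned}
```

The lock cost is identical in all three cases; only the hardware became faster, and the share of processor time consumed by lock overhead grew from about 2% to about 67%.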
In contrast, FIG. 3d represents the best case, one where a thread of execution spends little of its processor time on the overhead of accessing hardware resources, and thus gets more actual work done. The case depicted in FIG. 3d is clearly desirable, but it is unattainable under presently conventional schemes. There is an inherent limit to how far the software used for a processor-based software lock can be optimized, and in most operating systems today that limit has largely been reached.
For example, depending on the implementation of a mutex, the “overhead” to lock and unlock it to maintain atomic access to a given hardware resource can be on the order of tens of microseconds. This is exacerbated when operating systems with protected process memory spaces are used, such as Linux and Windows™, and again when a mutex needs to be multiple-processor safe. In these increasingly common cases the overhead can be significantly higher, tending to negate both the performance advantage of using a hardware accelerator and the advantage of making a hardware resource asynchronous. Furthermore, multi-processor safe locks are usually implemented as spin locks, and all of this may result in priority inversion across the multiple processors.
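A multiprocessor-safe lock of the kind mentioned above is often a spin lock. The minimal C11 sketch below builds one from `atomic_flag`; `spinlock`, `spin_lock`, `spin_trylock`, and `spin_unlock` are illustrative names, not a particular operating system's API. A waiting thread busy-waits on its processor instead of sleeping. If a high-priority thread spins on a lock whose lower-priority holder has been preempted on another processor, the high-priority thread makes no progress until the holder runs again, which is one form of the priority inversion mentioned above.

```c
#include <stdatomic.h>

/* Minimal test-and-set spin lock (illustrative, not an OS API). */
typedef struct {
    atomic_flag held;
} spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

void spin_lock(spinlock *s) {
    /* Acquire ordering: critical-section accesses cannot move before
     * the lock is taken. The loop burns processor time while waiting. */
    while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
        ;  /* busy-wait */
}

/* Single non-blocking attempt; returns 1 if the lock was acquired. */
int spin_trylock(spinlock *s) {
    return !atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire);
}

void spin_unlock(spinlock *s) {
    /* Release ordering: critical-section writes become visible before
     * the lock appears free to other processors. */
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}
```

A lock declared as `spinlock s = SPINLOCK_INIT;` starts free; while it is held, `spin_trylock(&s)` returns 0, and after `spin_unlock(&s)` it returns 1.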
Accordingly, there especially remains a need today for a more efficient mechanism for accessing hardware resources in computer systems.