Bugs caused by unintended data races are common, usually serious, and often difficult to detect and reproduce. These bugs routinely escape long periods of stress testing and cause havoc later. A data race occurs when the same memory location is accessed concurrently, with at least one of the accesses being a write. Programmers can prevent harmful data races, and achieve a desired atomicity and/or ordering property, by using synchronization operations. If a synchronization operation is left out, the wrong one is used, or one is used in the wrong place, a data race bug may be introduced. After a suspected data race bug causes a failure, the failure is typically difficult to reproduce, because after a moderate amount of testing and bug-fixing, the remaining bugs tend to be those least likely to occur in normal execution.
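For illustration only, the following minimal Python sketch shows an unsynchronized read-modify-write on a shared counter alongside the same update protected by a lock. The function name `run_counter` and its parameters are hypothetical and chosen for this example. The unsynchronized variant may lose updates, but whether it does so on any given run depends on the interpreter and on timing.

```python
import threading


def run_counter(num_threads, increments, use_lock):
    """Increment a shared counter from several threads and return the final value."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # The lock makes the read-modify-write atomic with respect
                # to the other workers.
                with lock:
                    counter["value"] += 1
            else:
                # Unsynchronized read-modify-write: two threads may read the
                # same old value, in which case one increment is lost.
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


locked_total = run_counter(4, 10000, use_lock=True)
racy_total = run_counter(4, 10000, use_lock=False)
print("with lock:", locked_total)
print("without lock:", racy_total)
```

The locked run always yields the full count. The unlocked run may or may not fall short on a given execution, which mirrors the difficulty of reproducing such failures.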
One approach to dealing with data races is to locate all shared memory accesses, exclude those that are properly protected by synchronization, prune benign races according to some heuristic, and report the remainder as potentially buggy data races. This approach may have a relatively high rate of false positives. The problem of false positives is even more serious in kernel code, which employs a variety of synchronization methods, such as disabling interrupts, lock instructions, and hardware states, that go beyond well-defined locking application program interfaces (APIs).
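The filtering steps of this approach can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not a description of any particular tool: the `Access` record, the `candidate_races` function, the `benign_locations` heuristic, and all location and lock names are hypothetical. The example input includes a location that is assumed to be protected by disabling interrupts, which a model that only tracks lock API calls cannot see and therefore reports.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    """One recorded shared-memory access (hypothetical record format)."""
    thread: str
    location: str
    is_write: bool
    locks_held: frozenset = frozenset()


def candidate_races(accesses, benign_locations=frozenset()):
    """Return pairs of accesses that survive all three filtering steps."""
    reports = []
    for a, b in combinations(accesses, 2):
        # Only accesses to the same location from different threads can race.
        if a.location != b.location or a.thread == b.thread:
            continue
        # At least one of the two accesses must be a write.
        if not (a.is_write or b.is_write):
            continue
        # Exclude pairs protected by a common lock.
        if a.locks_held & b.locks_held:
            continue
        # Prune races that the (assumed) heuristic considers benign.
        if a.location in benign_locations:
            continue
        reports.append((a, b))
    return reports


accesses = [
    Access("T1", "queue_len", True, frozenset({"q_lock"})),
    Access("T2", "queue_len", False, frozenset({"q_lock"})),
    Access("T1", "stats_hits", True),
    Access("T2", "stats_hits", True),
    # Assumed to be protected by disabling interrupts, which this
    # lock-based model does not track.
    Access("T1", "irq_state", True),
    Access("T2", "irq_state", False),
]

reports = candidate_races(accesses, benign_locations=frozenset({"stats_hits"}))
reported_locations = [a.location for a, _ in reports]
print(reported_locations)
```

Under these assumptions the only report is for `irq_state`, which is a false positive of the kind described above.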
Another approach, known as systematic schedule exploration, systematically exercises the access interleaving space by controlling the scheduling behavior, on the premise that the rare, buggy interleaving must be among those explored. However, modifying scheduling behavior for a kernel is much more difficult than for applications.
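As a toy illustration of the idea, the following Python sketch enumerates every interleaving of two threads that each read a shared counter and then write back the value plus one, and replays each schedule deterministically. The helper names `interleavings` and `run` and the operation encoding are assumptions made for this example. A real schedule explorer controls an actual scheduler rather than a simulated one.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    # Either the next step comes from a, or it comes from b.
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Replay one schedule and return the final value of the shared counter."""
    shared = 0
    local = {}  # per-thread copy of the value it last read
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared
        else:  # "write": store the previously read value plus one
            shared = local[thread] + 1
    return shared


t1 = [("T1", "read"), ("T1", "write")]
t2 = [("T2", "read"), ("T2", "write")]

schedules = list(interleavings(t1, t2))
# A schedule is buggy if an increment was lost, i.e. the result is not 2.
buggy = [s for s in schedules if run(s) != 2]
print(len(schedules), "schedules,", len(buggy), "lose an update")
```

Because every interleaving is visited, the lost-update schedules are found regardless of how rarely they would occur under normal scheduling. The number of interleavings grows rapidly with the number of steps, so exhaustive enumeration of this kind is practical only for very small examples.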