With the widespread popularity of the Internet, distributed applications are becoming more important in the field of computing. Distributed applications include components running in separate runtime environments, such as on separate machines connected by a network. For example, in an order processing application, components of the system may run at different geographical locations and be controlled by different companies. One component may track what the customer has ordered, another component may track remaining inventory, and still another component may track account information. The components may be connected via a network and work together by communicating over the network whenever a customer places an order.
Unfortunately, distributed applications are difficult to design and test. First, there are physical limitations. For example, it may be difficult to assemble all of the software related to a distributed application in one location for review by a programmer. In many cases, the software is operated by different companies, which may be separated by great distances and use entirely different programming environments.
Second, even if it were possible to assemble the code in one location, companies may be reluctant to share their code with others. Such code represents significant work on the part of a team of highly skilled programmers, so it is often simply not open for inspection by outsiders.
Third, many distributed applications are inherently concurrent. Due to the complexities of concurrent programs, errors are easily introduced but difficult to pinpoint. Programmers are quite familiar with sequential programs and have even developed tools to help analyze them. However, concurrent programs pose a difficult challenge that is not easily grasped by most programmers, and tools written for sequential scenarios do not work well for concurrent systems. For example, even a simple distributed application involving three concurrently running components may have a dizzying array of possible sequences of events. If a programmer fails to anticipate a particular rare sequence of events, the program may become deadlocked (e.g., at least one component is blocked waiting for messages that are never sent) or otherwise behave unexpectedly. Error and timeout handling is a particularly common source of such programming flaws.
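To give a sense of scale for the "dizzying array" of event sequences, the following minimal sketch counts the ways the steps of several independent components can interleave while each component's own steps stay in order. It assumes every component performs a fixed number of atomic steps and that no synchronization restricts the ordering; the function name `count_interleavings` is hypothetical and used only for illustration.

```python
from math import factorial


def count_interleavings(steps_per_component):
    """Count the orderings of all steps that preserve each component's own order.

    This is the multinomial coefficient (n1 + n2 + ...)! / (n1! * n2! * ...).
    Assumption: steps are atomic and no synchronization constrains the order.
    """
    total = factorial(sum(steps_per_component))
    for steps in steps_per_component:
        # Dividing by steps! removes reorderings within a single component.
        total //= factorial(steps)
    return total


if __name__ == "__main__":
    # Three components with three steps each, then with five steps each.
    print(count_interleavings([3, 3, 3]))
    print(count_interleavings([5, 5, 5]))
```

Under these assumptions, three components of only three steps each already admit 1,680 distinct interleavings, and five steps each admit 756,756, so a test run that exercises one ordering says little about the others.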
Unfortunately, the appearance of a rare sequence of events can depend on seemingly random factors, such as machine load, connection speed, and the like. Thus, for purposes of testing, the systems appear to behave in a non-deterministic manner. Therefore, it may be almost impossible to reliably reproduce a particular error. As a result, the error becomes an elusive but persistent problem in the software.
Model checking has been used successfully to check sequential programs. However, typical approaches fail to take into account the possible interleavings of events that occur in concurrent applications. In principle, model checking could be used to analyze distributed applications, but such an approach has two serious disadvantages. First, it quickly leads to state explosion, because the state space of the system grows exponentially with the number of concurrent components. Second, it requires that the entire system be available for analysis, which is especially unrealistic for distributed systems such as web services. Therefore, new technologies for analyzing distributed applications are needed.
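The state explosion described above can be illustrated with a small sketch. It assumes a toy system in which each component is an independent counter cycling through a fixed number of local states, and it explores the combined state space breadth-first, as an explicit-state model checker might. The function `reachable_states` and the toy component model are assumptions made for illustration and are not drawn from any particular model checker.

```python
from collections import deque


def reachable_states(num_components, local_states=2):
    """Count the global states reachable in a toy concurrent system.

    Assumption: each component is an independent counter that cycles through
    `local_states` values, and one component advances per transition.
    """
    start = (0,) * num_components
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for i in range(num_components):
            # Advance component i by one step, leaving the others unchanged.
            successor = list(state)
            successor[i] = (successor[i] + 1) % local_states
            successor = tuple(successor)
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return len(seen)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print(n, reachable_states(n))
```

In this toy model the number of reachable global states equals `local_states ** num_components`, so each additional two-state component doubles the space that must be explored.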