1. Field of the Invention
This invention relates to the field of distributed computer systems and, more particularly, to testing heterogeneous distributed systems.
2. Description of the Related Art
As workloads on modern computer systems become larger and more varied, more and more computational resources are needed. For example, a request from a client to a web site may involve one or more load balancers, web servers, databases, application servers, etc. Alternatively, some large-scale scientific computations may require multiple computational nodes operating in synchronization as a kind of parallel computing.
Any such collection of resources tied together by a data network may be referred to as a distributed system. A distributed system may be a set of identical or non-identical nodes connected together by a local area network. Alternatively, the nodes may be geographically scattered and connected by the Internet, or may form a heterogeneous mix of computers, each providing one or more different resources. Each node may have a distinct operating system and be running a different set of applications.
Because of the large number of possible system configurations described above, testing such a system for reliability may prove difficult. Multiple test scenarios are needed to examine the system's response to various crashes, outages, workloads, and other events. Certain common scenarios may be covered by manual testing, but such manual tests may be far from exhaustive and may require extensive manpower resources.
Alternatively, automated testing and scripting methods may be used to expand test coverage, but these methods also have limits. Scripting may have to be individually tailored for each node or resource, and may require extensive modification in the event of a configuration change. In addition, scripts may not be applicable across different operating systems. For example, a UNIX shell script may provide ample testing functionality for a given server cluster, but would need to undergo significant modification if new servers were added to the cluster. The UNIX shell script would also not be operable on a system running Microsoft Windows™.
Furthermore, scripting for each resource may not allow for extensive interaction between resources, and may lack a central point of control and analysis. For example, a UNIX shell script running on one node may not be able to control the operation of another node. Since a distributed system may be useful precisely because it allows the interaction of scattered heterogeneous resources to be directed from a central location, this may represent a limitation in any such testing strategy.