Distributed computer systems have increased in sophistication over the years, with more complex functions being performed and more strategic dependence resting on them. The architecture of such systems has evolved from the individual computer to the network. The spatial distribution of integrated functions includes many types of ground centers, as well as aircraft and satellites. The requirements for interoperation are even growing to cross system boundaries. In military applications, interdependence of air-based and space-based sensors, navigation, communications, and weapons systems is evident in new developments. In civil applications, integration of the formerly separate parts of industrial enterprises is becoming commonplace. Concurrently, automated network-based interactions of organizations with suppliers, customers, financial institutions, and government agencies are being established on local, national, and global scales.
The system-wide and intersystem integration of such computer systems requires functionality and data that are both distributed and networked. Decentralized network architectures incorporating intelligent agent technology are desirable because of the heterogeneity of the mix of operational and developmental systems and the desire to control complexity. However, distributed systems introduce threats to system reliability that centralized systems do not suffer.
As system architectures have evolved toward network infrastructure models, changes in composition have also taken place. The granularity of processing subsystems and software modules is finer in modern systems than in earlier ones, with more dependence on distributed small processor hardware elements and off-the-shelf software functional blocks, such as database management systems, graphical user interfaces, and network interfaces. The basis of integration has changed from direct interface specification to interface specification via government and industry standards. The standards tend to emphasize the data link and transport aspects of interfaces between processing entities. Typically, the standards do not address the more abstract aspects, such as the session and presentation layers of the ISO/OSI protocol model. Thus, modern distributed systems tend to be loosely coupled in terms of application-to-application interaction, with interfaces executed through messages sent asynchronously between nodes and with handshaking protocols that are either nonexistent or negotiated among the designers of the respective applications.
Intelligent agent technology provides a modern approach to the automation of intersystem processes. For the purpose of this discussion, the terms "intelligent agent" or "agent" mean "an entity that functions continuously and autonomously in an environment in which other processes take place and other agents exist" (Shoham 1993). In simple terms, agents are special software programs which autonomously perform tasks for a user. An agent can monitor for a certain condition, decide what to do based on a set of rules provided to it, and take an action based on the rule corresponding to the condition. For example, the agent can monitor a stock price over the Internet and if the price drops below a given value, the agent can automatically purchase that stock at that price. In another example, the agent can be configured to monitor an inventory of a product, and if the inventory falls below a given number, the agent can automatically order more of the product from a supplier.
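The monitor, decide, and act cycle described above can be illustrated with a minimal sketch. The names used here (Rule, MonitoringAgent, REORDER_THRESHOLD) and the threshold value are hypothetical and chosen only for illustration; the sketch assumes a single numeric observation per cycle and is not a description of any particular agent implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    """A condition paired with the action to take when it holds (hypothetical)."""
    condition: Callable[[float], bool]
    action: Callable[[float], str]


class MonitoringAgent:
    """Hypothetical agent: checks each observation against its rules in order."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = list(rules)
        self.log: List[str] = []  # record of actions taken

    def observe(self, value: float) -> Optional[str]:
        # Fire the first rule whose condition matches the observed value.
        for rule in self.rules:
            if rule.condition(value):
                result = rule.action(value)
                self.log.append(result)
                return result
        return None  # no rule matched, so no action is taken


# Assumed threshold for the inventory example; not taken from the source.
REORDER_THRESHOLD = 10

reorder_rule = Rule(
    condition=lambda stock: stock < REORDER_THRESHOLD,
    action=lambda stock: f"order placed (stock={stock})",
)
inventory_agent = MonitoringAgent([reorder_rule])
```

In this sketch, calling inventory_agent.observe with an inventory count at or above the threshold takes no action, while a count below the threshold triggers the ordering action and records it in the agent's log.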
A characteristic that distinguishes agents from ordinary software is that agents operate at a high level in the abstraction spectrum, using symbolic representation of information, cognitive-like functions, and social behavior conventions to perform their tasks.
Intelligent agents permit information flow between their host systems without the need for direct interfaces between the host systems. The agents facilitate large-scale system dynamics by encapsulating among them the messages, protocols, and state behavior of the interacting systems. Because the agents become integral to the proper intersystem function of the distributed computing system, the agents' reliability becomes an important factor in the overall reliability of the distributed computing system and the interaction of the agents.
The integrity of data is critical to the reliability of agent-based distributed computing systems, both for the data owned by the agents and the application data being exchanged. If agents have incorrect data or do not have needed data, the distributed computing system becomes inefficient because the individual agents are not operating as other agents expect. To exacerbate the problem, the operational environment of distributed computing systems can be severe and opportunities for data loss and data corruption are great. Hardware, communications, and on-board memory failures are to be expected, and brute-force reliability is too expensive to guarantee.
Autonomous agents in local systems may use distributed object management ("DOM") techniques to collaborate in the production and utilization of data toward mutual goals. Distributed objects in DOM environments conform to type specifications and have values, as do objects in conventional object-oriented representation systems. The distributed objects in DOM systems differ from conventional objects in that they must contain additional attributes to identify the relationships and locations of their parts.
DOM design policies usually state criteria for access by agents to data owned by other agents and obligations of agents that want data to provide it. The design policies also typically require that DOM services not limit the autonomy of the cooperating agents. These policies conflict to the extent that distributed objects create dependencies among the agents. In fact, they do create dependencies, and the dependencies lead to obligations on the part of the agents. The obligations, if not recognized or taken into account, can induce faults into the distributed objects.
Separately, recent research and development efforts have proven the effectiveness of formal fault tolerance techniques on software in conventional (i.e., non-distributed) computer system architectures. Software fault tolerance methods are intended to defend against defects that consist of design flaws or bad data. In general, fault tolerance techniques protect against faults through the controlled use of redundancy. Fault tolerance techniques can be classified as either masking or dynamic, depending on the way redundancy is used. A typical masking technique provides multiple parallel redundant versions of a critical process or creates multiple copies of the data to be handled. The multiple results are applied to a voting or averaging algorithm, and the best result is output as the product of the protected process. A typical dynamic fault tolerance technique has alternative versions of the process or alternative data representation schemes available, but the alternatives are invoked only when faults are detected. For this discussion, the term "fault tolerance" includes fault detection, fault isolation, fault containment, fault correction, and fault masking. For a fuller discussion of data-diverse software fault tolerance techniques, the reader is directed to P. E. Ammann and J. C. Knight, "Data Diversity: An Approach to Software Fault Tolerance," IEEE Transactions on Computers Vol. 37, pp. 418-425 (Apr. 1988), incorporated herein by reference.
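The distinction between masking and dynamic fault tolerance can be illustrated with a minimal sketch. The function names (masked_result, dynamic_result), the squaring example, and the perfect-square acceptance test are all hypothetical and chosen only for illustration. The masking function assumes majority voting over hashable results; the dynamic function assumes a caller-supplied acceptance test as the means of fault detection.

```python
import math
from collections import Counter
from typing import Any, Callable, Sequence


def masked_result(versions: Sequence[Callable[[Any], Any]], x: Any) -> Any:
    """Masking: run every redundant version and output the majority result."""
    results = [version(x) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among redundant versions")
    return value


def dynamic_result(
    primary: Callable[[Any], Any],
    alternates: Sequence[Callable[[Any], Any]],
    acceptable: Callable[[Any, Any], bool],
    x: Any,
) -> Any:
    """Dynamic: invoke an alternate only when a fault is detected."""
    for version in (primary, *alternates):
        try:
            result = version(x)
        except Exception:
            continue  # a crash counts as a detected fault; try the next version
        if acceptable(x, result):
            return result
    raise RuntimeError("all versions failed the acceptance test")


# Hypothetical redundant versions of one process: squaring an integer.
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_faulty(x: int) -> int:
    return x * x + 1  # deliberately injected design flaw


def is_perfect_square(_x: int, result: int) -> bool:
    # Assumed acceptance test: a squared integer must be a perfect square.
    return result >= 0 and math.isqrt(result) ** 2 == result
```

In this sketch, masked_result([square_a, square_b, square_faulty], 4) outvotes the faulty version, whereas dynamic_result(square_faulty, [square_a], is_perfect_square, 4) runs the alternate only after the primary result fails the acceptance test. The masking form pays the cost of redundancy on every call; the dynamic form pays it only when a fault is detected, but depends on the quality of the detection test.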
Unfortunately, existing technology has not addressed the problem of faults, such as corrupt data or communication failures, between agents interacting in a distributed environment. Faults are merely ignored, resulting in less-than-ideal conclusions to the interactions between the agents. Accordingly, a need exists for a fault-tolerant, intelligent agent-based distributed computer system that provides highly reliable interactions between the agents in the system.