The present invention relates in general to multi-agent systems and more particularly to a fault localization algorithm for automatically solving semantic problems in multi-agent systems.
Telecommunication systems have recently been designed for providing a variety of real-time services and features in an open distributed environment through the collaboration of a set of software components called agents. These multi-agent systems are designed in such a way that they may adapt and evolve in the face of changing environments. One such multi-agent system is known as MANA (Multi-Agent Architecture for Networking Applications) developed by Mitel Corporation. Through the use of a distributed agent architecture, this system meets high reliability levels and adapts to accommodate technological and service evolution. To achieve these goals, intelligence or learning mechanisms are provided to update service information derived from the operation of the agents. This information is used to redefine the agents and to reallocate resources for correcting failures and to meet the requirements of a defined service more precisely.
An application or service in a multi-agent system is mapped as a series of calls amongst agents to perform the service. Each agent specifies its type, quantity and quality of service (QoS) in order to provide for an overall application. Since multi-agent systems are implemented in an open environment, no agent has prior knowledge of any other agent. The only knowledge that an agent possesses is its own requirements and its capabilities to provide a specific type of service. Thus, an agent may be required to find other agents to fulfill certain of its service requirements. A calling agent (referred to herein as a contracting agent) sends out a bid for services to a plurality of called agents (referred to herein as contractor agents), each of which may be capable of providing the necessary resources for the contracting agent to complete its task. The contracting agent receives and evaluates the bids from the various contractor agents and selects the agent that has the best chance of success in performing the requested service. A contract is said to have been made when a particular contractor agent has promised the contracting agent that it will provide its resources for a predetermined period of time. During system setup, the nature of the services embedded in the agents is analyzed by a matching mechanism so that the system is able to recognize which agents can be connected for future collaboration. This loose relationship existing between agents is referred to herein as understanding. For each agent, the matching mechanism identifies all agents that can provide the services it requires, by checking the templates residing in their scripts against the specification (format, content, etc.) of all predefined service requests. A script is stored in the head of an agent and contains a description of a specific job and a set of rules that control job behavior.
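The matching and bidding steps above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each contractor agent's script templates are reduced to a set of accepted request formats and that each bid carries a numeric self-estimated chance of success; the names `Bid`, `ContractorAgent`, `match` and `award_contract` are hypothetical and are not part of MANA or any published API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    contractor: str          # name of the bidding contractor agent
    success_estimate: float  # hypothetical self-estimated chance of success, 0..1
    duration: float          # hypothetical period for which resources are promised


@dataclass
class ContractorAgent:
    name: str
    templates: set           # hypothetical: request formats accepted by this agent's script
    success_estimate: float

    def bid(self, request_format: str, duration: float) -> Optional[Bid]:
        # Answer only requests that match a template in the agent's script.
        if request_format not in self.templates:
            return None
        return Bid(self.name, self.success_estimate, duration)


def match(request_format: str, agents: List[ContractorAgent]) -> List[ContractorAgent]:
    """Setup-time matching: the agents that 'understand' this service request."""
    return [a for a in agents if request_format in a.templates]


def award_contract(request_format: str, candidates: List[ContractorAgent],
                   duration: float) -> Optional[Bid]:
    """Collect bids and select the one with the best estimated chance of success."""
    offers = [a.bid(request_format, duration) for a in candidates]
    bids = [b for b in offers if b is not None]
    return max(bids, key=lambda b: b.success_estimate) if bids else None


agents = [
    ContractorAgent("printer_a", {"text"}, 0.9),
    ContractorAgent("printer_b", {"text", "postscript"}, 0.7),
    ContractorAgent("printer_c", {"postscript"}, 0.95),
]
understood = match("text", agents)                     # printer_a and printer_b
contract = award_contract("text", understood, duration=60.0)
print(contract.contractor)                             # printer_a
```

Separating `match` (done once at setup) from `award_contract` (done per request) mirrors the distinction the text draws between understanding and contracting; a real system would of course use a richer evaluation than a single score.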
During the negotiation of a contract, the contracting agent provides job descriptions to the contractor agent and receives services from it. The contracting agent trusts the contractor agent; only the application agent of the top contracting agent (the first calling agent that initiates the application, usually a help desk agent) has the knowledge of its own requirements needed to determine, based on overall performance, whether the result is successful. With this capability, performance failures of applications are discovered and handled at the semantic level of the application.
External resources connected to a multi-agent system may change from time to time without notifying the system, for example as a result of technological innovation. In many cases, no problems are encountered in the system provided that the protocol between contracting and contractor agents is followed, as set forth above. However, in some circumstances, this protocol does not cover all changes to the system. For example, early dot matrix printers were capable of printing only text data files. Now, most printers employ either ink-jet or laser technology and provide a range of options to the user. The user cannot simply send a flat text file to a laser printer and expect the job to print, because the default input format is PostScript. Instead, the laser printer requires additional details to be specified before the print job can proceed. Consider a multi-agent system with a help desk agent, a printer server agent, a file server and several printer resources. When the user wishes to print a document, he/she simply specifies the name of the data file and selects the appropriate print option at the help desk GUI (graphical user interface). The help desk agent passes the file name to the printer server agent, which then obtains the data file from the file server and selects an available printer for the job. Before the advent of PostScript, the protocol or agreement between the printer server agent and the printer resources was simple: any printer that was not busy would be selected, and that printer would print the contents of the data file obtained from the file server agent. However, with the advent of laser printers and the PostScript format, an idle laser printer would not be capable of printing the flat text file, since the PostScript format requires certain fields that specify the file format.
Because no information was specified in the file format field during system setup, the printer assumes that it has received a PostScript file, and the print job fails. Nonetheless, the printer server agent assumes that the print job has been successful, and the problem is discovered only after the printer output has been passed back to the help desk agent or to the client.
This type of failure is known as a semantic error. For the purposes of describing the present invention, a semantic error in a multi-agent system is defined as:
A fault that can only be discovered when a discrepancy in overall performance is detected by the application agent during the operation of an application.
Thus, semantic level errors occur at the top level of the agent connection hierarchy of an application, where the application agent resides. A semantic error may be caused by a hardware failure that cannot be detected by the underlying agents (i.e. at the syntactic level) during operation, or by a misunderstanding between agents during the application setup stage. The printer example set forth above is very simple. In the general case, semantic problems are more complicated because a faulty resource may cause semantic errors in some applications while working properly for others.
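The distinction between the syntactic and semantic levels can be made concrete with a small sketch of the printer example. It assumes, as a simplification, that a printer which misreads plain text as PostScript produces empty output while still reporting the job as accepted; the class `Printer` and the function `run_application` are hypothetical names introduced only for this illustration.

```python
from typing import Optional, Tuple


class Printer:
    """Hypothetical printer resource that assumes a default input format."""

    def __init__(self, name: str, assumed_format: str):
        self.name = name
        self.assumed_format = assumed_format  # used when the format field is empty

    def print_job(self, data: str, declared_format: Optional[str]) -> Tuple[bool, str]:
        fmt = declared_format or self.assumed_format
        if fmt == "postscript" and not data.startswith("%!"):
            output = ""      # simplification: plain text misread as PostScript prints nothing useful
        else:
            output = data
        return True, output  # the job is always reported as accepted


def run_application(printer: Printer, data: str, declared_format: Optional[str],
                    expected: str) -> Tuple[bool, bool]:
    accepted, output = printer.print_job(data, declared_format)
    syntactic_ok = accepted           # all that the printer server agent can observe
    semantic_ok = output == expected  # only the application agent can judge this
    return syntactic_ok, semantic_ok


laser = Printer("laser_1", assumed_format="postscript")
ps_file = "%!PS\n(hello) show"
text_result = run_application(laser, "hello", None, expected="hello")
ps_result = run_application(laser, ps_file, "postscript", expected=ps_file)
print(text_result)  # (True, False): semantic error, invisible at the syntactic level
print(ps_result)    # (True, True): the same printer works for this application
```

The same printer yields a semantic error for one application and correct output for another, which is exactly why the error cannot be attributed to the resource by the underlying agents alone.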
According to the present invention, a system is provided for automatically locating sources of semantic error in a multi-agent system based on setup connection tree information, and for informing the appropriate agents so that they can avoid using the faulty resources in the future. The setup connection tree model is built from patterns of agent actions and expresses the logical relationship between available resources in disjunctive normal form (d.n.f.). A table is used to record the different sets of resources used in the resource selection process. By comparing the resource sets recorded in this table, faulty resources can then be located by means of induction. A global database, referred to herein as the SEDO Information Base (SIB), where SEDO stands for Semantic Error Diagnostic Operation, is also maintained to keep information on semantic errors in the system up to date. Thus, a system administrator can analyze the cause of semantic problems based on the detailed information maintained in the SIB database.
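The following is a minimal sketch of how a setup connection tree might be expanded into d.n.f. and how recorded resource sets could support fault localization. The text above does not spell out the induction procedure, so this sketch assumes a simple elimination rule: any resource used in a run that the application agent judged successful is cleared, and the uncleared resources of failed runs are reported as suspects. Because a resource may be faulty for one application but not another, the history is assumed to be kept per application. The tree encoding, the function names and the dictionary standing in for the SIB are all hypothetical.

```python
from itertools import product


def to_dnf(node):
    """Expand an AND/OR setup connection tree into disjunctive normal form.

    A leaf is a resource name; an inner node is ("AND", children) or ("OR", children).
    The result is a list of frozensets, each a conjunction of resources that
    together can serve the application.
    """
    if isinstance(node, str):
        return [frozenset([node])]
    op, children = node
    child_terms = [to_dnf(child) for child in children]
    if op == "OR":
        # Any single alternative from any child satisfies a disjunction.
        return [term for terms in child_terms for term in terms]
    if op == "AND":
        # One alternative from every child is needed; merge them into one term.
        return [frozenset().union(*combo) for combo in product(*child_terms)]
    raise ValueError(f"unknown operator: {op!r}")


def locate_faults(history):
    """Localize suspect resources for ONE application by elimination.

    history: list of (resource_set, semantic_ok) pairs recorded for this application.
    Resources used in any semantically successful run are cleared; the uncleared
    resources of failed runs are returned as suspects.
    """
    cleared = set()
    for resources, ok in history:
        if ok:
            cleared |= resources
    return {r for resources, ok in history if not ok for r in resources - cleared}


def record_in_sib(sib, application, suspects):
    """Append one entry per suspect resource to a dict standing in for the SIB."""
    for resource in sorted(suspects):
        sib.setdefault(resource, []).append(application)


# Hypothetical print application: a file server AND any one of three printers.
tree = ("AND", ["file_server", ("OR", ["printer_1", "printer_2", "printer_3"])])
terms = to_dnf(tree)
# Table of resource sets used per run and whether the help desk agent accepted the result.
history = [(terms[0], True), (terms[1], False), (terms[2], True)]
suspects = locate_faults(history)
sib = {}
record_in_sib(sib, "print_document", suspects)
print(suspects)  # {'printer_2'}
print(sib)       # {'printer_2': ['print_document']}
```

When a failed resource set still contains more than one uncleared resource, this rule returns all of them as suspects; further runs with different d.n.f. terms are then needed to narrow the fault down.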