Understanding and processing natural language, either spoken or written, has long been a goal in the field of artificial intelligence. While computers have been programmed to perform feats such as defeating the world's best human chess master at his own game, other skills exhibited by humans are still seemingly beyond the reach of even the most powerful computers. Although a small child may not be able to play chess, that child has the facility to process and understand its native tongue. Computers, on the other hand, have yet to exhibit any significant level of mastery in the realm of natural language processing.
One attempt at simulating natural language skills is the virtual robot, or bot. A bot is a program that engages in a natural language dialog with a user. A bot may use a scripting language to match input sentences from a user against input templates of keywords. An input template might, for example, take a group of related keywords and lump them together for the purposes of responding. Thus, words like "father," "mother," "brother," and "sister" might be grouped together for a response that relied on the concept of "family." In addition to recognizing familiar words, a scripting language capable of recognizing the ways these words are used in the sentence, and of tracking context across sentences, enables a bot program to track and respond to a wide variety of utterances. Generally, the program that makes use of a scripting language will have a "universal" default response if none of the keyword templates matches the input sentence. Thus the bot always has a response.
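The keyword-template matching described above can be illustrated with a minimal sketch. It is not the scripting language the text refers to: the template contents, the response strings, and the names `TEMPLATES`, `UNIVERSAL_DEFAULT`, and `respond` are all assumptions chosen for illustration.

```python
import re

# Hypothetical templates: each groups related keywords under one concept
# and pairs that concept with a canned response.
TEMPLATES = {
    "family": ({"father", "mother", "brother", "sister"},
               "Tell me more about your family."),
    "computer": ({"computer", "printer", "monitor"},
                 "What kind of computer equipment do you have?"),
}

# Hypothetical universal default, used when no template matches.
UNIVERSAL_DEFAULT = "I'm not sure I follow. Could you rephrase that?"


def respond(sentence):
    """Return the response of the first template sharing a keyword with the input."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    for keywords, response in TEMPLATES.values():
        if words & keywords:
            return response
    # No keyword template matched, so the bot still has something to say.
    return UNIVERSAL_DEFAULT
```

Under these assumptions, `respond("My mother called today")` selects the "family" response, while an input with no known keyword falls through to the universal default.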
A virtual robot generally embodies a particular "universe of discourse" reflective of the subject matter of interest; e.g., a bot developed to converse about personal computers should "know" something about computers and their peripherals. The development of such a bot employs the scripting language to recognize aspects of the subject matter and respond with appropriate content. Often these "scripts" are written in an action-response type style wherein the actual language supplied by the user embodies an "action" to which the "response" is written into the script itself.
Scripts are often written by a bot administrator (possibly a machine) by defining a list of "categories" in which the bot will be well conversant. Categories may comprise "topics" that are recognizable by a runtime executive. Topics, in turn, may comprise patterns or words that are matched against the stream of input communication (in either spoken or written or any other suitable form of communication) from the user.
The main drawback with constructing a virtual bot by a list of categories is that the topics developed cannot provide complete coverage of all subjects in the universe of discourse. The result is that the bot responds with the universal default to some queries that are appropriate in the universe of discourse. Such responses are considered "misses," because the bot demonstrates "holes" in its knowledge of the universe of discourse when it is forced to respond with the default. A related drawback is that the universal default response generally provides insufficient guidance to the user as to their original input: it doesn't provide a knowledgeable response to the input, and it doesn't provide information regarding why the input "confused" the bot.
FIG. 1 illustrates the problem that arises when bot programs incompletely cover the universe of discourse. Universe of discourse 100 represents the relevant body of knowledge in which the bot should be conversant. This includes the entire area of the box labeled 100. When a bot is developed with a list of categories, and implemented with an associated list of topics, only a portion of universe of discourse 100 is represented. FIG. 1 illustrates those topics 102 that have been developed. Each topic 102 potentially has its own default 104 that is related to the subject of topic 102. When the system identifies a query as related to the topic subject, but the query does not match topic 102, then default 104 will be used to construct a response. The universal default 106 represents the total area associated with universe of discourse 100 minus the summed areas of all default 104 areas. All queries that fall into this area (i.e., that miss an implemented topic 102 or its default 104) will be responded to by universal default 106. Since the defaults 104 are related to the topics, rather than to each other, or to universe of discourse 100, hitting defaults 104 or 106 provides the user with little information that would assist in continuing the conversation. There is thus an inherent qualitative problem in developing bots from a list of topics.
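The three outcomes shown in FIG. 1 (a hit on topic 102, a fall to topic default 104, and a fall to universal default 106) can be sketched as a simple fallback chain. The topic data, the response strings, and the names `TOPICS` and `respond` are hypothetical; the sketch only assumes that a topic has a way to recognize its subject, a set of patterns, and its own default.

```python
import re

# Hypothetical universal default (106 in FIG. 1).
UNIVERSAL_DEFAULT = "Sorry, I don't understand."

# Hypothetical topics: 'subject' words identify the topic's subject area,
# 'patterns' map a trigger word to an answer, and 'default' is the
# topic-level default (104 in FIG. 1).
TOPICS = [
    {
        "subject": {"printer", "printers"},
        "patterns": {"jam": "Open the rear tray and remove the jammed paper."},
        "default": "I know a little about printers, but not about that.",
    },
]


def respond(query):
    """Answer from a topic, then its default, then the universal default."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for topic in TOPICS:
        if words & topic["subject"]:
            for trigger, answer in topic["patterns"].items():
                if trigger in words:
                    return answer        # query matches an implemented topic
            return topic["default"]      # related to the subject, but no match
    return UNIVERSAL_DEFAULT             # outside every implemented topic
```

Note that neither default tells the user what to ask next, which is the qualitative problem described above.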
Since a bot is a conversational agent, its value derives entirely from how well it interacts with users. In this context, the word "well" may be defined in terms of knowledge content the bot conveys, its friendliness, how easily it is confused, and how much interaction is required for a user to find what is sought. If a user asks questions that confuse the bot, then the bot is perceived as unhelpful and its value is diminished. It is desirable in the bot development task to recognize that the level of quality and value evidenced by users is not judged merely in discrete terms, but rather by the overall impression that they get from their interaction with the bot and by their level of satisfaction with the information the bot provides.
Consequently, there is a need in the art to have a means for easily designing and creating virtual bots that enables a bot to effectively respond to arbitrary utterances with knowledge regardless of the number of topics implemented, guides the user toward providing utterances that will move the user closer to the information they seek, provides the user with information about what in the user's utterance confused the bot when that occurs, and performs these tasks within a framework that eases the maintenance and extendibility of the bot's capabilities.
Method and apparatus are disclosed related to the development and implementation of virtual robots (bots) directed at conducting natural language interaction with computer users. Bots employing the present invention base their natural language interaction on a predefined universe of discourse. A universe of discourse completely covers the subject matter the bot is intended to address. The complete universe of discourse is broken down, or decomposed, hierarchically. A data management structure is established to provide a storage area for each component of the hierarchy resulting from the decomposition of the universe of discourse. Each such component is called a domain. The data management structure, itself, may reflect the hierarchical decomposition of the universe of discourse. For example, a computer file system with hierarchical directory support may include a directory or subdirectory for each domain, the directories and subdirectories having the same hierarchical relationship as that of the domains they represent. Each directory may then contain discourse content associated with the domain.
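As an illustration of the file-system example above, the following sketch creates one directory per domain so that the directory tree mirrors the domain hierarchy. The domain names in `UNIVERSE` and the helper `create_domain_dirs` are assumptions made for this example, and a temporary directory stands in for wherever a real bot would keep its discourse content.

```python
from pathlib import Path
import tempfile

# Hypothetical hierarchical decomposition of a "computers" universe of
# discourse. Each key is a domain; its value holds that domain's children.
UNIVERSE = {
    "computers": {
        "hardware": {"printers": {}, "monitors": {}},
        "software": {"operating_systems": {}},
    }
}


def create_domain_dirs(tree, root):
    """Create one directory per domain, nested like the domain hierarchy."""
    created = []
    for name, children in tree.items():
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
        created.extend(create_domain_dirs(children, path))
    return created


# Build the tree in a throwaway location and record the relative paths.
with tempfile.TemporaryDirectory() as root:
    dirs = create_domain_dirs(UNIVERSE, root)
    relative = sorted(p.relative_to(root).as_posix() for p in dirs)
```

In this sketch `relative` contains entries such as `computers/hardware/printers`, so a domain's parent, siblings, and children can be read directly from its location in the tree.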
Domain topics containing program code to direct the bot's natural language interaction are placed in the storage area for each domain. Pattern lists associate words expected to be "heard" by the bot during natural language interaction with particular domain topics. Domain topics are provided, as appropriate, to selectively direct a user's attention toward other domains. The other domains may be hierarchically related to the first domain as a parent, sibling, or child. Domain topics are constructed in such a way that in conjunction with a specificity-based selection mechanism, the domain topics give preference to children first, siblings second, and parents last, in order to drive the interaction toward the specific information most likely to satisfy the bot user.
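The child, sibling, parent preference order can be sketched as follows. The text attributes this ordering to how domain topics interact with a specificity-based selection mechanism, which is not detailed here; this sketch substitutes an explicit rank table instead. The hierarchy in `PARENT` and the names `relation`, `PREFERENCE`, and `pick_redirect` are hypothetical.

```python
# Hypothetical domain hierarchy, stored as child -> parent.
PARENT = {
    "hardware": "computers",
    "software": "computers",
    "printers": "hardware",
    "monitors": "hardware",
}

# Lower rank is preferred: children first, siblings second, parents last.
PREFERENCE = {"child": 0, "sibling": 1, "parent": 2}


def relation(current, other):
    """Classify 'other' relative to 'current' as child, sibling, parent, or None."""
    if PARENT.get(other) == current:
        return "child"
    if PARENT.get(current) == other:
        return "parent"
    shared_parent = PARENT.get(current)
    if current != other and shared_parent is not None \
            and shared_parent == PARENT.get(other):
        return "sibling"
    return None


def pick_redirect(current, candidates):
    """Pick the related candidate domain with the most preferred relation."""
    ranked = [(PREFERENCE[relation(current, c)], c)
              for c in candidates
              if relation(current, c) in PREFERENCE]
    return min(ranked)[1] if ranked else None
```

For a user currently in the "hardware" domain, `pick_redirect("hardware", ["computers", "software", "printers"])` returns the child "printers" ahead of the sibling and the parent, steering the conversation toward more specific content.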
Domain censoring allows a domain in the hierarchy to be effectively excluded from the natural language interaction without removing the domain from the hierarchy. Such censoring is desirable for domains whose discourse subject matter is not fully developed and for debugging during development.
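One simple way to picture domain censoring is a set of censored domain names that is consulted at match time, leaving the hierarchy data untouched. This is only a sketch under that assumption; the keyword lists and the names `DOMAIN_KEYWORDS`, `CENSORED`, and `matching_domains` are hypothetical, and the sketch takes no position on whether censoring a domain also censors its descendants.

```python
import re

# Hypothetical domains and the words that trigger them.
DOMAIN_KEYWORDS = {
    "printers": {"printer", "toner"},
    "monitors": {"monitor", "screen"},
}

# Hypothetical censor list: "monitors" stays in the hierarchy but is
# excluded from the interaction, e.g. while its content is unfinished.
CENSORED = {"monitors"}


def matching_domains(utterance, censored=CENSORED):
    """Return uncensored domains whose keywords appear in the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    return sorted(name for name, keywords in DOMAIN_KEYWORDS.items()
                  if name not in censored and words & keywords)
```

Passing `censored=set()` re-enables every domain without any change to `DOMAIN_KEYWORDS`, which is convenient when toggling domains during debugging.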
Domain tiebreakers intervene in the natural language interaction where it is advantageous to prompt a user to discriminate between two or more domains having logical subject matter overlap between or among them.
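A tiebreaker can be sketched as a function that produces a clarifying question whenever an utterance matches more than one domain. The wording of the prompt and the name `tiebreaker_prompt` are assumptions; the text does not specify how a tiebreaker phrases its question or how overlapping domains are detected.

```python
def tiebreaker_prompt(matched_domains):
    """Return a clarifying question when several domains match, else None."""
    names = sorted(set(matched_domains))
    if len(names) < 2:
        return None  # zero or one match: nothing to disambiguate
    if len(names) == 2:
        options = f"{names[0]} or {names[1]}"
    else:
        options = ", ".join(names[:-1]) + f", or {names[-1]}"
    return f"Are you asking about {options}?"
```

With two overlapping matches such as "printers" and "scanners", the function asks the user to choose between them rather than guessing.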
Universes of discourse developed in accordance with the present invention may advantageously be stored on portable data storage media for distribution or deployment. Such a medium, used in conjunction with an appropriate computer, creates an operative bot.