For the purposes of the present invention, "virtual robots" (or "BOTs") are software programs that interact and/or communicate with users (human, machine or otherwise) and that take actions or make responses according to input from these users. BOTs are the subject of the co-pending and co-assigned parent application entitled "Methods for Automatically Focusing the Attention of a Virtual Robot Interacting with Users", filed Jun. 4, 1997, Ser. No. 08/868,713 (pending), and incorporated by reference in its entirety herein. A common use of such a BOT is as an interface to a web site, wherein the administrator of that site has programmed the BOT to answer simple inquiries that are typically asked by visitors to the site. The above-identified application discloses a method of creating BOTs according to "scripts"--i.e., programs that are written in a very high level language that closely resembles a human natural language. These scripts embody a certain amount of information concerning the site that the administrator desires the BOT to communicate to a user during a connection session.
If a BOT is to be deployed in a publicly accessible way such as a web page or chat site, there is a need to test the BOT as thoroughly as possible to ensure that, as often as possible, it will produce an appropriate response to the inputs that it is likely to receive and the situations that it is likely to encounter. In this context, "input" refers to any description of a situation the BOT may encounter; although the most common inputs are textual inputs from users, inputs can be actions taken by users, external circumstances, or even events internal to the BOT such as an internal alarm clock. If the BOT can be tested in advance, the person or organization that is deploying the BOT can be more certain of its likely performance, and errors can be detected in advance that might otherwise result in mistakes that could mislead users interacting with the BOT and/or reflect poorly on the authors or deployers of the BOT.
Historically, most BOTs have been tested manually, by having a human user or set of human users interact with the BOT and observe any errors it might make. Such testing is ordinarily done when the BOT is first written, and may continue throughout the lifetime of the BOT as changes are made to it. Testing can also be said to occur after deployment as users interact with the BOT; errors found through this form of testing indicate that the BOT has already made a mistake when publicly deployed. Thus, there is a need to test thoroughly before public deployment.
Such human testing, although usually necessary, has a number of drawbacks. First, it is time-consuming. A typical BOT may contain thousands of possible responses, all of which need to be tested. Second, it is usually incomplete. Unless the testers are given a list of all possible responses that should be tested, the testers will only cover a subset of the possible responses. Furthermore, if the response given to an input may depend on the context, the number of response sequences that must be tested grows exponentially with the length of the sequence. Finally, it is difficult to maintain assurance as changes are made to the BOT. In most BOTs, each change can potentially affect the responses given to many other inputs, so the entire testing effort must be repeated for each set of changes that is made to the BOT.
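The growth in the number of sequences can be illustrated with a short calculation. The following is a minimal sketch only; the function name and the figures used (1,000 distinct inputs, a context of three exchanges) are assumptions chosen for illustration and are not taken from any particular BOT.

```python
def count_input_sequences(num_inputs: int, length: int) -> int:
    """Number of distinct ordered input sequences of exactly `length` inputs,
    when each position may hold any of `num_inputs` inputs (hypothetical figures)."""
    return num_inputs ** length


# Assumed example: 1,000 possible inputs, responses depending on the last 3 exchanges.
sequences = count_input_sequences(1000, 3)
```

Under these assumed figures, exhaustive manual testing would require on the order of a billion three-input conversations, which is why exhaustive testing of context-dependent responses is impractical.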
One possible solution to assist in the testing process is to create a "script" containing possible inputs and the expected responses. These inputs can either be textual inputs to the BOT or descriptions of other conditions for which the BOT should have a response. This script can then be used for automated testing by presenting each input to the BOT and determining whether the proper response is produced. Scripts are commonly used in the verification of other computer programs and could easily be applied to BOTs as well.
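By way of illustration, such a test script and the loop that applies it might be sketched as follows. This is a minimal sketch under stated assumptions: `toy_bot`, `TEST_SCRIPT` and `run_script` are hypothetical names introduced here, the BOT is reduced to a function from one textual input to one response, and the sample inputs and responses are invented for the example.

```python
from typing import Callable, List, Tuple


def toy_bot(user_input: str) -> str:
    """Hypothetical stand-in for a BOT: maps one textual input to one response."""
    text = user_input.lower()
    if "hours" in text:
        return "We are open 9 to 5."
    if "price" in text:
        return "The product costs $10."
    return "I do not understand."


# The test script: pairs of (input, expected response). All entries are invented.
TEST_SCRIPT: List[Tuple[str, str]] = [
    ("What are your hours?", "We are open 9 to 5."),
    ("What is the price?", "The product costs $10."),
    ("Where are you located?", "We are in Springfield."),
]


def run_script(
    bot: Callable[[str], str], script: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Present each input to the BOT; collect (input, expected, actual) for each mismatch."""
    failures = []
    for user_input, expected in script:
        actual = bot(user_input)
        if actual != expected:
            failures.append((user_input, expected, actual))
    return failures


failures = run_script(toy_bot, TEST_SCRIPT)
```

In this sketch the third entry fails, because the toy BOT has no response concerning location. The harness can report only that the expected response was not produced; it has no information about why.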
The use of such scripts has a number of desirable properties: once the script is developed, the BOT can be tested quickly; the script can be made as complete as needed; and the script, with appropriate modifications, can be re-run any time changes are made to the BOT. However, there are still a number of drawbacks to the use of scripts for testing BOT performance. First, it is a significant effort to create the initial script. There may be thousands of inputs and responses that need to be included. Second, modification of such a script is difficult. Every time a response is changed or a new response is added, the script must be updated, and the size of the script increases the complexity of this task: in order to change or add a response, the user must potentially search through thousands of inputs and responses to find the appropriate place to make the change. Third, a straightforward script still does not allow for the easy testing of cases in which the response may vary depending on the sequence of previous inputs--although a more complicated "test scripting" language can help with this problem. Finally, in cases where the correct response was not given, such a script does not ordinarily provide enough information to produce an error message that says anything more than that the correct answer was not given.
There are a variety of well-known techniques that are used for verification of programs in traditional programming languages such as C or FORTRAN. However, the problems faced in automatic verification of natural language systems are significantly different from the problems faced in verification of other computer programs. In most programs, for instance in a typical numerical analysis system, the intended behavior of the system can be described for all possible inputs, and ordinarily there are only one or a few qualitatively different output possibilities. In a typical natural language system, by contrast, there may be thousands of possible responses to inputs, all of which must be tested to ensure that they will be given in response to appropriate inputs, and not given in response to inappropriate inputs. Well-known techniques of black-box testing can be applied to such a system, but as described in the previous paragraph, there are significant problems with such an approach.
Thus, there is a need in the art to have a means of automatically verifying the performance of a BOT that allows the creation of the testing information simultaneously with the development of the BOT and that allows the BOT author to easily modify the testing information as the BOT is modified.
There is also a need for the verification mechanism to be given sufficient information to provide useful diagnostic output when an error is found, in addition to simply reporting the error.
There is also a need, in the case where the response given by the BOT may vary depending on previous inputs given to the BOT, for the verification mechanism to be able to verify that a response will be given correctly regardless of the prior sequence of inputs the BOT has seen, or that a response will be given correctly under the condition that a particular sequence of inputs precedes it. There is a need for such verification to be done efficiently, without the need for testing an exponential number of sequences of inputs and responses.
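The second kind of check described above, namely that a response is given correctly when a particular sequence of inputs precedes it, can be pictured with a small example. The following is a minimal sketch only and is not the verification mechanism of the present invention; `ToyContextBot` and `check_after_prefix` are hypothetical names, and the toy BOT's single piece of context (the last product mentioned) is an assumption made for illustration.

```python
from typing import List, Optional


class ToyContextBot:
    """Hypothetical BOT whose answer to "how much" depends on an earlier input."""

    def __init__(self) -> None:
        self.topic: Optional[str] = None  # last product mentioned, if any

    def respond(self, user_input: str) -> str:
        text = user_input.lower()
        if "widget" in text:
            self.topic = "widget"
            return "Widgets are our main product."
        if "how much" in text:
            if self.topic == "widget":
                return "A widget costs $10."
            return "Which product do you mean?"
        return "I do not understand."


def check_after_prefix(prefix: List[str], final_input: str, expected: str) -> bool:
    """Start a fresh BOT, replay the prefix inputs, then compare the final response."""
    bot = ToyContextBot()
    for earlier_input in prefix:
        bot.respond(earlier_input)
    return bot.respond(final_input) == expected


with_context = check_after_prefix(
    ["Tell me about the widget."], "How much?", "A widget costs $10."
)
without_context = check_after_prefix([], "How much?", "A widget costs $10.")
```

Here the same final input passes the check when the stated prefix precedes it and fails when it does not. A check of this form covers one chosen prefix at a time; verifying a response regardless of the prior sequence by this means alone would require enumerating prefixes, which is the exponential cost that the stated need seeks to avoid.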