The referenced microfiche appendix is on deposit at the U.S. Patent and Trademark Office and was submitted with related application Ser. No. 09/017,760. The microfiche appendix comprises source code of a present embodiment of the present invention. There are 178 frames contained in 2 pages of microfiche.
For the purposes of the present invention, "virtual robots" (or "BOTs") are software programs that interact and/or communicate with users (human, machine or otherwise) that take actions or make responses according to input from these users. BOTs are the subject of the co-pending and co-assigned parent application entitled "Methods for Automatically Focusing the Attention of a Virtual Robot Interacting with Users", filed Jun. 4, 1997, Ser. No. 08/868,713, and incorporated by reference in its entirety herein. A common use of such a BOT is as an interface to a web site wherein the administrator of that site has programmed the BOT to answer simple inquiries that are typically asked by visitors to the site. The above identified application discloses a method of creating BOTs according to "scripts", i.e., programs that are written in a very high level language that closely resembles a human natural language. These scripts embody a certain amount of information concerning the site that the administrator desires the BOT to communicate to a user during a connection session.
If a BOT is to be deployed in a publicly accessible way such as a web page or chat site, there is a need to test the BOT as thoroughly as possible to ensure that, as often as possible, it will produce an appropriate response to the inputs that it is likely to receive and the situations that it is likely to encounter. In this context, "input" refers to any description of a situation the BOT may encounter; although the most common inputs are textual inputs from users, inputs can be actions taken by users, external circumstances, or even events internal to the BOT such as an internal alarm clock. If the BOT can be tested in advance, the person or organization that is deploying the BOT can be more certain of its likely performance, and errors can be detected in advance that might otherwise result in mistakes that could mislead users interacting with the BOT and/or reflect poorly on the authors or deployers of the BOT.
Historically, most BOTs have been tested manually, by having a human user or set of human users interact with the BOT and observe any errors it might make. Such testing is ordinarily done when the BOT is first written, and may continue throughout the lifetime of the BOT as changes are made to it. Testing can also be said to occur after deployment as users interact with the BOT; errors found through this form of testing indicate that the BOT has already made a mistake when publicly deployed. Thus, there is a need to test thoroughly before public deployment.
Such human testing, although usually necessary, has a number of drawbacks. First, it is time-consuming. A typical BOT may contain thousands of possible responses, all of which need to be tested. Second, it is usually incomplete. Unless the testers are given a list of all possible responses that should be tested, the testers will only cover a subset of the possible responses. Furthermore, if the response given to an input may depend on the context, there is an exponential number of response sequences that must be tested. Finally, it is difficult to maintain assurance as changes are made to the BOT. In most BOTs, each change can potentially affect the responses given to many other inputs, so the entire testing effort must be repeated for each set of changes that are made to the BOT.
One possible solution to assist in the testing process is to create a "script" containing possible inputs and the expected responses. These inputs can either be textual inputs to the BOT or descriptions of other conditions for which the BOT should have a response. This script can then be used for automated testing by presenting each input to the BOT and determining whether the proper response is produced. Scripts are commonly used in the verification of other computer programs and could easily be applied to BOTs as well.
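By way of illustration only, the script-based approach described above might be sketched as follows. The function `toy_bot`, the list `TEST_SCRIPT`, and the sample inputs and responses are hypothetical and are not taken from the referenced applications; the BOT is reduced to a function that maps a textual input to a textual response.

```python
from typing import Callable, List, Tuple


def toy_bot(text: str) -> str:
    """Hypothetical stand-in for a BOT: returns one response per input."""
    lowered = text.lower()
    if "hours" in lowered:
        return "We are open 9 to 5."
    if "price" in lowered:
        return "The product costs $10."
    return "I do not understand."


# The test script: each entry pairs an input with its expected response.
TEST_SCRIPT: List[Tuple[str, str]] = [
    ("What are your hours?", "We are open 9 to 5."),
    ("What is the price?", "The product costs $10."),
    ("Where are you located?", "We are in Springfield."),
]


def run_script(
    bot: Callable[[str], str], script: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Present each input to the BOT and collect (input, expected, actual)
    for every case in which the expected response was not produced."""
    failures = []
    for text, expected in script:
        actual = bot(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


failures = run_script(toy_bot, TEST_SCRIPT)
for text, expected, actual in failures:
    print(f"FAIL: {text!r}: expected {expected!r}, got {actual!r}")
```

In this sketch the script is held apart from the BOT, so each change to a response in the BOT requires a matching change to the corresponding script entry.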
The use of such scripts has a number of desirable properties: once the script is developed, the BOT can be tested quickly; the script can be made as complete as needed; and the script, with appropriate modifications, can be re-tested any time changes are made to the BOT. However, there are still a number of drawbacks to the use of scripts for testing BOT performance. First, it is a significant effort to create the initial script. There may be thousands of inputs and responses that need to be included. Second, modification of such a script is difficult. Every time a response is changed or a new response is added, the script must be updated. The size of the script increases the complexity of this task. Thus, in order to change or add a response, the user must potentially search through thousands of inputs and responses to find the appropriate place to change or add the response. Third, a straightforward script still does not allow for the easy testing of cases in which the response may vary depending on the sequence of previous inputs, although a more complicated "test scripting" language can help with this problem. Finally, in cases where the correct response was not given, such a script does not ordinarily provide enough information to produce an error message that contains more information than the fact that the correct answer was not given.
There are a variety of well-known techniques that are used for verification of programs in traditional programming languages such as C or FORTRAN. However, the problems faced in automatic verification of natural language systems are significantly different from the problems faced in verification of other computer programs. In most programs, for instance in a typical numerical analysis system, the intended behavior of the system can be described for all possible inputs, and ordinarily there is only one or a few qualitatively different output possibilities. However, in a typical natural language system, there may be thousands of possible responses to inputs, all of which must be tested to ensure that they will be given in response to appropriate inputs, and not given in response to inappropriate inputs. Well-known techniques of black-box testing can be applied to such a system, but as described in the previous paragraph, there are significant problems with such an approach.
Thus, there is a need in the art to have a means of automatically verifying the performance of a BOT that allows the creation of the testing information simultaneously with the development of the BOT and that allows the BOT author to easily modify the testing information as the BOT is modified.
There is also a need for the verification mechanism to be given sufficient information to provide useful diagnostic output when an error is found, in addition to simply reporting the error.
There is also a need, in the case where the response given by the BOT may vary depending on previous inputs given to the BOT, for the verification mechanism to be able to verify that a response will be given correctly regardless of the prior sequence of inputs the BOT has seen, or that a response will be given correctly under the condition that a particular sequence of inputs precedes it. There is a need for such verification to be done efficiently, without the need for testing an exponential number of sequences of inputs and responses.
The present invention meets these aforementioned needs by providing a variety of mechanisms for verifying the performance of a virtual robot or BOT. In an automated interface program designed to interact and communicate with users, said program executing actions when a category among a predefined set of categories is activated, a method is disclosed for automatically verifying the performance of said program, the steps of said method comprising:
(a) specifying inputs under which the program should be tested;
(b) associating said inputs with conditions within categories in the program, each said condition comprising at least one response which could be given if said condition is satisfied;
(c) executing said program under at least one said input;
(d) determining whether the associated condition is satisfied upon said input; and
(e) determining whether the response associated with said condition is given upon said input.
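Steps (a) through (e) above might be sketched, under simplifying assumptions, as follows. The classes `Condition` and `Category`, the functions `run_bot` and `verify`, and the keyword-matching predicates are hypothetical names introduced for illustration only; they are not the data structures of the source code in the microfiche appendix, and the rule that the first satisfied condition determines the response is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Condition:
    name: str
    predicate: Callable[[str], bool]  # is the condition satisfied by an input?
    response: str                     # response given if it is satisfied
    # Steps (a) and (b): test inputs are specified and associated here.
    test_inputs: List[str] = field(default_factory=list)


@dataclass
class Category:
    name: str
    conditions: List[Condition]


def run_bot(categories: List[Category], text: str) -> Optional[str]:
    """Step (c): execute the program on one input. Assumed policy: the first
    satisfied condition, in category order, supplies the response."""
    for category in categories:
        for condition in category.conditions:
            if condition.predicate(text):
                return condition.response
    return None


def verify(categories: List[Category]) -> List[str]:
    """Run every associated test input and return diagnostic messages."""
    errors = []
    for category in categories:
        for condition in category.conditions:
            for text in condition.test_inputs:
                where = f"{category.name}/{condition.name}"
                if not condition.predicate(text):  # step (d)
                    errors.append(f"{where}: condition not satisfied by {text!r}")
                elif run_bot(categories, text) != condition.response:  # step (e)
                    errors.append(f"{where}: response not given for {text!r}")
    return errors


CATEGORIES = [
    Category("greeting", [Condition(
        "says-hello", lambda t: "hello" in t.lower(), "Hi there!",
        ["Hello"])]),
    Category("hours", [Condition(
        "asks-hours", lambda t: "hours" in t.lower(), "We are open 9 to 5.",
        ["What are your hours?", "Hello, what are your hours?"])]),
]

errors = verify(CATEGORIES)
for message in errors:
    print(message)
```

Because each test input is tied to a specific condition, a failure can be reported against that condition by name, rather than only as a mismatch between an input and an expected response.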
In another aspect of the present invention, the test inputs are embedded within the script itself, and specifically, within categories that can be automatically listed upon compilation of the script. Such list of test inputs can then be automatically executed to test the program. The execution of a test input can be used to check whether the test input activated the category in which the test input is embedded.
The response given upon execution of a test input can then determine whether other categories are erroneously activated; or whether inputs, other than the test input, erroneously activate the category associated with the test input.
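The embedding of test inputs within the script, and their automatic listing upon compilation, might be sketched as follows. The directive syntax (`CATEGORY`, `KEYWORD`, `TEST`, `SAY`) is invented solely for this illustration and is not the scripting language of the parent application; `compile_script`, `activated_category`, and `run_embedded_tests` are likewise hypothetical names, and keyword matching in script order is an assumed activation rule.

```python
from typing import List, Optional, Tuple

# Hypothetical script: each TEST line is embedded in the category it exercises.
SCRIPT = """\
CATEGORY greeting
KEYWORD hello
TEST Hello there
SAY Hi!
CATEGORY hours
KEYWORD hours
TEST What are your hours?
TEST Hello, what are your hours?
SAY We are open 9 to 5.
"""


def compile_script(source: str) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Parse the script; return the categories and the automatically
    collected list of (category name, test input) pairs."""
    categories: List[dict] = []
    tests: List[Tuple[str, str]] = []
    current = None
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        directive, _, argument = line.partition(" ")
        if directive == "CATEGORY":
            current = {"name": argument, "keywords": [], "say": ""}
            categories.append(current)
        elif current is None:
            raise ValueError(f"directive outside any category: {line!r}")
        elif directive == "KEYWORD":
            current["keywords"].append(argument.lower())
        elif directive == "TEST":
            tests.append((current["name"], argument))
        elif directive == "SAY":
            current["say"] = argument
        else:
            raise ValueError(f"unknown directive: {directive!r}")
    return categories, tests


def activated_category(categories: List[dict], text: str) -> Optional[str]:
    """Return the name of the first category with a keyword in the input."""
    lowered = text.lower()
    for category in categories:
        if any(keyword in lowered for keyword in category["keywords"]):
            return category["name"]
    return None


def run_embedded_tests(
    categories: List[dict], tests: List[Tuple[str, str]]
) -> List[Tuple[str, str, Optional[str]]]:
    """Report (input, expected category, activated category) on mismatch."""
    report = []
    for expected, text in tests:
        actual = activated_category(categories, text)
        if actual != expected:
            report.append((text, expected, actual))
    return report


categories, tests = compile_script(SCRIPT)
report = run_embedded_tests(categories, tests)
for text, expected, actual in report:
    print(f"{text!r}: expected category {expected!r}, activated {actual!r}")
```

In this sketch the third test input activates the `greeting` category instead of the `hours` category in which it is embedded, so the report identifies both the category that failed to activate and the category that was erroneously activated.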
Other aspects of the verification mechanisms are disclosed in the description given below when read in conjunction with the accompanying figures.