Field of the Invention
The present invention generally relates to text annotators used in natural language processing, and more particularly to a method of generating test cases (e.g., sentences) used to test a text annotator.
Description of the Related Art
As interactions between users and computer systems become more complex, it becomes increasingly important to provide a more intuitive interface for a user to issue commands and queries to a computer system. As part of this effort, many systems employ some form of natural language processing. Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, while others involve natural language generation, which allows computers to respond in a manner familiar to a user. For example, a non-technical person may enter a natural language query into an Internet search engine, and the search engine intelligence can provide a natural language response that the user is likely to understand. One example of an advanced computer system that uses natural language processing is the Watson™ cognitive technology marketed by International Business Machines Corp.
Text analysis is known in the art pertaining to NLP and typically uses a text annotator program to search text documents and analyze them relative to a defined set of tags. The text annotator can then generate linguistic annotations within the document to extract concepts and entities that might be buried in the text, such as extracting person, location, and organization names or identifying positive and negative sentiment. FIGS. 1A-1B illustrate one example of annotations that may be performed by a prior art text annotator. In this example an annotation takes the form <annot type=“X”>text</annot>, where “X” may be any of a defined set of annotation types such as Person, Organization and Location, and “text” is the particular text in the document that the “X” annotation characterizes. The text annotation is inserted into or otherwise associated with an example text to indicate or delineate the beginning and end of the annotated text. So, in the sentence “‘Economic fundamentals remain sound’ said Alan Gayle, a managing director of Trusco Capital Management in Atlanta, ‘though fourth-quarter growth may suffer’”, “Alan Gayle” is an instance of the annotation type Person, “Trusco Capital Management” is an instance of the annotation type Organization and “Atlanta” is an instance of the annotation type Location. Further to this example, annotation type Location has a feature, shown as “kind”, with example possible values of “city”, “state”, and the like. The text annotator will accordingly annotate the sentence as follows: “‘Economic fundamentals remain sound’ said <annot type=“Person”>Alan Gayle</annot>, a managing director of <annot type=“Organization”>Trusco Capital Management</annot> in <annot type=“Location” kind=“city”>Atlanta</annot>, ‘though fourth-quarter growth may suffer’”. In this manner, artificial intelligence programs using text analysis routines can obtain an “understanding” of the meaning of the annotated sentence. 
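By way of illustration only, the annotation format described above may be sketched in a few lines of Python. The sketch assumes a simple dictionary lookup; the names GAZETTEER and annotate are hypothetical and do not appear in any particular prior art annotator, and practical text annotators generally rely on richer linguistic rules or trained models rather than a fixed list of names.

```python
import re

# Hypothetical lookup table (an assumption for illustration only):
# surface text -> (annotation type, optional features such as "kind").
GAZETTEER = {
    "Alan Gayle": ("Person", {}),
    "Trusco Capital Management": ("Organization", {}),
    "Atlanta": ("Location", {"kind": "city"}),
}


def annotate(text, gazetteer=GAZETTEER):
    """Wrap every gazetteer entry found in `text` in an <annot> tag."""
    if not gazetteer:
        return text
    # Try longer entries first so a longer name wins over a shorter
    # name that it happens to contain.
    names = sorted(gazetteer, key=len, reverse=True)
    # Word boundaries assume each entry starts and ends with a word character.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")

    def wrap(match):
        annot_type, features = gazetteer[match.group(0)]
        attrs = "".join(f' {key}="{value}"' for key, value in features.items())
        return f'<annot type="{annot_type}"{attrs}>{match.group(0)}</annot>'

    # A single substitution pass avoids re-annotating inserted tags.
    return pattern.sub(wrap, text)


if __name__ == "__main__":
    print(annotate("said Alan Gayle, a managing director of "
                   "Trusco Capital Management in Atlanta"))
```

In this sketch the opening and closing tags delineate the beginning and end of the annotated text, and a feature such as “kind” is emitted as an additional attribute of the opening tag.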
Custom annotators can be configured to identify and extract domain-specific information.