1. Technical Field
The present invention relates to the field of text analytics.
2. Discussion of the Related Art
Text analytics solutions involve the process of annotating data within natural language documents with information. The annotations allow a text analytics application to scan information written in a natural language in order to extract information and populate a database or search index with the extracted information. Information is extracted from a document according to a set of rules defined in the text analytics application. Text analytics applications typically comprise two types of rules. The first type is dictionary rules, which define annotations that should be applied whenever a specified phrase is encountered. For example, the phrase ‘International Business Machines’ should be annotated as an ‘Organisation’. The second type is grammatical rules, which define the annotations that should be applied whenever a grammatical pattern is encountered. For example, in a grammatical pattern comprising the phrase ‘member of’ followed by any ‘Name’, the ‘Name’ annotation should be changed to an ‘Organisation’ annotation. In another example, a grammatical pattern comprising a ‘Name’ followed by a ‘Verb’ followed by a ‘Name’ can be extracted into a Subject-Predicate-Object triple for use in a semantic knowledge base.
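By way of illustration only, the two rule types described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and does not represent any particular text analytics product: the `Annotation` class, the `DICTIONARY` table, the function names, and the sample entries ‘Alice’, ‘Acme’ and ‘joined’ are all hypothetical, and tokens are assumed to be produced by simple whitespace splitting. The triple is emitted in the order in which the pattern is matched, namely first ‘Name’, ‘Verb’, second ‘Name’.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int  # index of the first token covered
    end: int    # index one past the last token covered
    label: str
    text: str


# Hypothetical dictionary rules: a token phrase mapped to the annotation to apply.
DICTIONARY = {
    ("International", "Business", "Machines"): "Organisation",
    ("Alice",): "Name",
    ("Acme",): "Name",
    ("joined",): "Verb",
}


def apply_dictionary_rules(tokens):
    """Annotate every dictionary phrase found, preferring the longest match."""
    annotations = []
    i = 0
    while i < len(tokens):
        best = None  # (phrase length, label) of the longest match at position i
        for phrase, label in DICTIONARY.items():
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase and (best is None or n > best[0]):
                best = (n, label)
        if best is None:
            i += 1
            continue
        n, label = best
        annotations.append(Annotation(i, i + n, label, " ".join(tokens[i:i + n])))
        i += n
    return annotations


def apply_member_of_rule(tokens, annotations):
    """Grammatical rule: 'member of' followed by a Name relabels it Organisation."""
    for ann in annotations:
        preceded = ann.start >= 2 and tokens[ann.start - 2:ann.start] == ["member", "of"]
        if ann.label == "Name" and preceded:
            ann.label = "Organisation"
    return annotations


def extract_triples(annotations):
    """Grammatical rule: adjacent Name, Verb, Name annotations yield a triple."""
    triples = []
    for a, b, c in zip(annotations, annotations[1:], annotations[2:]):
        labels_match = (a.label, b.label, c.label) == ("Name", "Verb", "Name")
        adjacent = a.end == b.start and b.end == c.start
        if labels_match and adjacent:
            triples.append((a.text, b.text, c.text))
    return triples
```

In this sketch the dictionary rules run first and the grammatical rules operate on the resulting annotations, so that, for example, the tokens of ‘Alice is a member of Acme’ would first yield two ‘Name’ annotations, after which the ‘member of’ rule would change the annotation on ‘Acme’ to ‘Organisation’.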
When presented with a test corpus of documents, a text analytics application is designed to identify those parts of each document that will cause a rule to be triggered. For example, a dictionary rule is triggered when the text analytics application scans a document and locates a dictionary term.
When working with existing analytics rule development tools, the rule developer typically faces certain challenges, including: (1) ensuring that all phrases and/or variants of text that are the subject of a search are found; (2) identifying and resolving conflicts between two or more rules applied to the document or text corpus during a search; and (3) understanding the impact of a rule change on the overall performance of the analytics rule development tool.