Text analysis, referenced hereinafter as “TA,” is known in the art pertaining to this invention as a sub-area or component of Natural Language Processing, hereinafter referenced as “NLP.” TA is used in a range of commercial, research and educational applications including, for example, information search and retrieval systems, e-commerce and e-learning systems. A typical TA involves an “annotator” which, as known in the relevant art, is a process for searching and analyzing text documents using a defined set of tags. Running the annotator on a text document generates what is known in the art as “linguistic annotations.” Annotators and linguistic annotations are well known. For the interested reader, an example listing of the many available TA publications can be found at the following URL: <http://www.ldc.upenn.edu/>.
An example TA may be illustrated by: <annot type=“X”>text</annot>, where “X” may be any of a defined set of annotation types such as, for example, Person, Organization and Location, and “text” is the text that the “X” annotation characterizes. This example TA, when inserted into or otherwise associated with an example text to indicate or delineate the beginning and end of the annotated text, may be as follows:
“The underlying economic fundamentals remain sound as has been pointed out by the Fed,” said <annot type=“Person”>Alan Gayle</annot>, a managing director of <annot type=“Organization”>Trusco Capital Management</annot> in <annot type=“Location” kind=“city”>Atlanta</annot>, “though fourth-quarter growth may suffer”.
In the above example, “Alan Gayle” is an instance of the annotation type Person, “Trusco Capital Management” is an instance of the annotation type Organization and “Atlanta” is an instance of the annotation type Location. The example annotation type Location has an example feature, shown as “kind,” with example possible values of “city”, “state”, and the like.
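For illustration only, the inline annotation format described above may be read back into structured records as in the following minimal Python sketch. The names ANNOT_RE, ATTR_RE, Annotation, extract_annotations and strip_annotations are hypothetical and are not part of any annotator referenced herein. The sketch assumes that annotations are not nested and that attribute values are enclosed in straight or typographic double quotes.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical patterns for the inline markup illustrated above.
# Assumption: annotations are not nested; attribute values may be wrapped
# in straight or typographic double quotes.
ANNOT_RE = re.compile(r'<annot\s+([^>]*)>(.*?)</annot>', re.DOTALL)
ATTR_RE = re.compile(r'(\w+)\s*=\s*["“”]([^"“”]*)["“”]')


@dataclass
class Annotation:
    type: str                      # annotation type, e.g. "Person"
    text: str                      # the text span the annotation characterizes
    features: Dict[str, str] = field(default_factory=dict)  # e.g. {"kind": "city"}


def extract_annotations(marked_up: str) -> List[Annotation]:
    """Return one Annotation per <annot ...>...</annot> span, in document order."""
    annotations = []
    for match in ANNOT_RE.finditer(marked_up):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        ann_type = attrs.pop("type", "")   # remaining attributes are features
        annotations.append(Annotation(ann_type, match.group(2), attrs))
    return annotations


def strip_annotations(marked_up: str) -> str:
    """Remove the annotation tags, keeping only the annotated text."""
    return ANNOT_RE.sub(lambda m: m.group(2), marked_up)


SAMPLE = (
    'said <annot type="Person">Alan Gayle</annot>, a managing director of '
    '<annot type="Organization">Trusco Capital Management</annot> in '
    '<annot type="Location" kind="city">Atlanta</annot>'
)

if __name__ == "__main__":
    for ann in extract_annotations(SAMPLE):
        print(ann.type, "|", ann.text, "|", ann.features)
```

In this sketch the “type” attribute is separated from the remaining attributes, so that a feature such as “kind” is kept alongside the annotation instance it qualifies.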
A problem can arise, though, when using a new or unknown annotator: the industrial fields or other TA objectives to which the unknown annotator relates, or to which it may be best suited, may not be fully known.
These and other problems can be considerable, because a user often needs a particularly tailored application, e.g., particular annotation rules and annotation types (e.g., annotate all CEOs of IT companies), for documents in a given document collection or domain. There are known ways of building such particularly tailored applications, such as, for example, an interactive learning system (see, for instance, the SAIL (Semi-Automated Interactive Learning) system being researched at IBM). These related art systems can eventually generate a rule-based annotation engine capable of producing the desired annotations, but have at least two shortcomings: human judgment is required, and there is a possibility of producing or converging on an inefficient result. For example, the SAIL system may generate a high number of processing rules in response to interactions with the human user. Further, a new knowledge domain, and even a new document corpus, may require re-training the system and re-generating the rules.