Modern pipelined text processing architectures are characterized by their open-ended, i.e., extensible, nature and by the high expressiveness of feature structure-based annotation schemes. Typically, text analytics pipelines seek to detect semantic elements in the underlying text repository, i.e., corpus, of documents. These semantic elements, which represent portions of the documents, are discovered through language analysis of the documents and are exposed or highlighted using semantic annotations associated with the semantic elements in the corpus. While text analytics applications in a variety of information management scenarios facilitate arbitrarily deep and broad text analysis, these analyses often produce extremely dense annotation repositories in which multiple levels of analysis are encoded for a given semantic element as layered annotations.
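To make the notion of layered annotations concrete, the following is a minimal sketch, not the data model of any particular pipeline. The class name `Annotation`, the helper `layers_at`, the type names, the feature keys, and the sample sentence are all hypothetical and chosen only for illustration. Each annotation covers a character span of the text and carries a type plus a small set of features, and several annotations may cover the same span.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation: a typed character span with features."""
    begin: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


# Illustrative sentence; offsets below are character positions in this string.
text = "Rembrandt painted The Night Watch in 1642."

# Several analysis levels encoded over overlapping spans (layered annotations).
annotations = [
    Annotation(0, 9, "Token", {"pos": "NNP"}),           # lexical layer
    Annotation(0, 9, "Person", {"role": "artist"}),      # entity layer
    Annotation(18, 33, "Title"),                         # entity layer
    Annotation(0, 33, "CreationEvent", {"verb": "painted"}),  # relation layer
]


def layers_at(annos, begin, end):
    """Return the types of all annotations that cover the span [begin, end)."""
    return [a.type for a in annos if a.begin <= begin and end <= a.end]


covered = text[18:33]
layers = layers_at(annotations, 0, 9)
```

Even in this toy case a single nine-character span carries three annotation layers, which suggests how quickly a repository can become dense as more analysis levels are added.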
The resulting layered annotations present challenges in two use-case scenarios, i.e., cases where the annotations are used to locate or to identify the desired semantic elements within a corpus of documents. The first use-case is where an analytics developer seeks to improve the actual detection of semantic annotations that require deep language analysis. The second, related use-case is where an end-user is trying to navigate a semantically annotated corpus and is seeking meaningful relationships between concepts in the domain that would be impossible to formulate in terms of a traditional keyword search. Faceted searches add to the complexity experienced by the end-user in searching the corpus of documents. A faceted search is distinct from a semantic search and involves progressively narrowing the range of choices in multiple dimensions. Therefore, the end-user is faced with the challenge of composing a complex query in multiple dimensions.
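The progressive narrowing that characterizes a faceted search can be sketched as follows. This is an illustration under stated assumptions only: the records, the facet names `author` and `material`, and the helpers `facet_counts` and `narrow` are hypothetical and do not correspond to any existing search facility.

```python
from collections import Counter

# Hypothetical catalog records; each key other than "id" is a facet (dimension).
documents = [
    {"id": 1, "author": "Author A", "material": "oil"},
    {"id": 2, "author": "Author A", "material": "etching"},
    {"id": 3, "author": "Author B", "material": "oil"},
    {"id": 4, "author": "Author A", "material": "oil"},
]


def facet_counts(docs, facet):
    """Count how many of the remaining documents fall under each facet value."""
    return dict(Counter(d[facet] for d in docs))


def narrow(docs, facet, value):
    """Keep only the documents whose facet matches the chosen value."""
    return [d for d in docs if d[facet] == value]


# Step 1: choose a value in the first dimension.
step1 = narrow(documents, "author", "Author A")

# The remaining choices in the second dimension shrink accordingly.
remaining_materials = facet_counts(step1, "material")

# Step 2: choose a value in the second dimension.
step2 = narrow(step1, "material", "oil")
result_ids = [d["id"] for d in step2]
```

Each selection both filters the result set and changes the choices offered in the other dimensions, so the end-user is in effect composing a conjunctive query one dimension at a time.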
Systems for semantic search exist and include, for example, JURU (http://www.haifa.ibm.com/projects/imt/juru/index.html). Other search facilities, including, for example, Lucene (http://lucene.apache.org/java/docs/) and Indri (http://www.lemurproject.org/indri/), provide keyword search but not semantic search. The existing semantic search tools, however, employ text-only searching and in some cases allow only a restricted specification of types. For example, items from a defined and finite set of types, such as “Author”, “Title” or “Material”, are selected from a drop-down menu. These existing systems do not allow for the graphical composition of queries or for the use of semantic or conceptual queries, i.e., queries with no literal term.