It has been estimated that 80% of a company's knowledge lies in employee mailboxes and desktops rather than in databases or reports. The former sources can be explored with search engine and text mining technologies, such as Google, Yahoo, or Business Objects' Inxight product line. Conversely, database vendors and Business Intelligence vendors are developing approaches to open their products, traditionally focused on processing structured data (e.g., database records or spreadsheet cells), to semi-structured or unstructured data (e.g., pieces of text in documents).
Several approaches have been explored for exploiting potential synergies between structured data and semi-structured (or unstructured) data. In one approach, features are extracted from a piece of text, and the features are stored together with the piece of text in a manner suitable for processing by traditional database or Business Intelligence systems. Many commercial tools, including Business Objects' Inxight product line, are able to extract specific features (e.g., sentences, paragraphs, clauses, entities) from a piece of text and, for instance, build an XML file or a database that associates these features with the originating text. The features may then be used to search the text (i.e., the unstructured data) using various front-end tools.
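By way of illustration only, the feature-extraction approach described above might be sketched as follows. This is a minimal sketch and not the method of any commercial tool: the function names `extract_features` and `to_xml`, the XML element names, and the regular expressions standing in for a real linguistic extractor are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET


def extract_features(text):
    """Return naive features as (kind, value, offset) tuples.

    Assumption: regular expressions stand in for a real linguistic extractor.
    """
    features = []
    # Sentences: runs of characters terminated by '.', '!' or '?'.
    for match in re.finditer(r"[^.!?]+[.!?]", text):
        sentence = match.group().strip()
        features.append(("sentence", sentence, text.index(sentence, match.start())))
    # Entities: crude stand-in, two or more consecutive capitalized words.
    for match in re.finditer(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+", text):
        features.append(("entity", match.group(), match.start()))
    return features


def to_xml(doc_id, text):
    """Build an XML element that stores the features next to the originating text."""
    root = ET.Element("document", id=doc_id)
    ET.SubElement(root, "text").text = text
    container = ET.SubElement(root, "features")
    for kind, value, offset in extract_features(text):
        feature = ET.SubElement(container, "feature", kind=kind, offset=str(offset))
        feature.text = value
    return root


if __name__ == "__main__":
    sample = "Jane Doe called about her invoice. She was upset."
    print(ET.tostring(to_xml("doc-1", sample), encoding="unicode"))
```

Because each feature keeps its character offset into the originating text, a front-end tool could use the stored features to locate and highlight the matching passage.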
In another approach, indexes, XML documents, or databases produced through indexing can be adapted for processing by databases or Business Intelligence software to create reports or analytics. For instance, a system may build an index that relates specific terms (e.g., product names, terms indicating a customer “mood”) to their occurrences in a collection of customer support emails. It is then relatively easy to compute aggregates and statistics about the most commonly used terms, for instance, using database or Business Intelligence software.
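The term index and the aggregates mentioned above can be pictured with a short sketch. The sample emails, the term list, and the function names `build_term_index` and `term_frequencies` are hypothetical and chosen only for illustration; an actual system would typically persist the index in a database rather than in memory.

```python
import re
from collections import Counter, defaultdict

# Hypothetical customer support emails and terms of interest (illustrative only).
EMAILS = {
    "mail-1": "The Router X200 keeps dropping connections. I am frustrated.",
    "mail-2": "Happy with the Router X200, but the Modem M5 is slow.",
    "mail-3": "Frustrated again: the Modem M5 failed after the update.",
}
TERMS = ["router x200", "modem m5", "frustrated", "happy"]


def build_term_index(emails, terms):
    """Map each term to its occurrences as (email_id, character_offset) pairs."""
    index = defaultdict(list)
    for email_id, body in emails.items():
        lowered = body.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, lowered):
                index[term].append((email_id, match.start()))
    return dict(index)


def term_frequencies(index):
    """Aggregate the index into a per-term occurrence count."""
    return Counter({term: len(occurrences) for term, occurrences in index.items()})


if __name__ == "__main__":
    index = build_term_index(EMAILS, TERMS)
    for term, count in term_frequencies(index).most_common():
        print(term, count)
```

Once the occurrences are laid out as rows of (term, email, offset), the same aggregate reduces to a grouped count, which is the kind of computation database or Business Intelligence software already performs.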
Vendors have developed drivers that provide indexing of database or spreadsheet records. However, in general, the relevance of this indexing process is low due to a lack of context about the intended semantics of the structured data. For instance, an important piece of customer-related information may be “hidden” in a system table whose name has nothing to do with customers and will therefore never be associated with a search string that involves customers. Business Objects has developed a system that leverages pre-existing knowledge of the business semantics of database data, captured by a “semantic layer”, in order to provide indexing of database content and retrieval of the indexed content using unstructured search terms. Such a system requires user-formulated search terms and may not provide results having suitable relevance.
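The role of a semantic layer in such indexing can be illustrated with a simplified sketch. It is not a description of the Business Objects system: the table and column names, the `SEMANTIC_LAYER` mapping, and the functions `build_index` and `search` are assumptions made for this example. The sketch shows only how business terms attached to cryptically named columns allow a record to be found by a search string that involves customers.

```python
from collections import defaultdict

# Hypothetical semantic layer: business terms for cryptic (table, column) pairs.
SEMANTIC_LAYER = {
    ("SYS_T042", "C7"): "customer name",
    ("SYS_T042", "C9"): "customer region",
    ("ORD_HDR", "AMT"): "order amount",
}

# Hypothetical database rows as (table, row_id, {column: value}).
ROWS = [
    ("SYS_T042", 1, {"C7": "Acme Corp", "C9": "Europe"}),
    ("ORD_HDR", 10, {"AMT": "1200"}),
]


def build_index(rows, semantic_layer):
    """Index each record under its values and under the business terms of its columns."""
    index = defaultdict(set)
    for table, row_id, values in rows:
        for column, value in values.items():
            label = semantic_layer.get((table, column), "")
            for token in (label + " " + str(value)).lower().split():
                index[token].add((table, row_id))
    return index


def search(index, query):
    """Return the records that match every token of an unstructured query."""
    hits = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*hits) if hits else set()


if __name__ == "__main__":
    index = build_index(ROWS, SEMANTIC_LAYER)
    # The table name SYS_T042 says nothing about customers, yet the record is found.
    print(search(index, "customer acme"))
```

Without the mapping, the row in `SYS_T042` would be indexed only under its raw values, and a query containing "customer" would not reach it.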
A further approach (e.g., Microsoft's “English Query”) allows users to express database queries in natural language. This approach relies on the system's ability to “understand” natural language clauses which, for instance, express complex conditions on data, and also requires the creation of a thesaurus that relates database entities to natural language entities. Such an approach is generally not suitable for two reasons. First, the ambiguities of natural language queries force the system to systematically re-phrase and double-check its understanding of the query, which is both frustrating and time-consuming for the user. Second, setting up the necessary linguistic knowledge in an organization is costly.
Another approach, exemplified by Business Objects' Intelligent Question, is to help the user incrementally build a meaningful natural language query through guided navigation in a user interface, thereby ensuring that the question is understood by the system.
Improvements to address one or more shortcomings of the foregoing approaches are desired.