Conventional search technologies have been helpful in guiding persons to possible sources of information that might contain answers they seek, but those technologies generally fail to do so in an effective manner. Traditional search mechanisms commonly overload users with many homogeneous sources of information such as hypertext links, electronic documents, etc. These sources may or may not contain the answer sought, and if they do, the querier still has to examine each of those homogeneous sources (or portions thereof) to determine its relevance. And the longer it takes the querier to acquire the answer sought, the higher the querier's level of frustration and disappointment.
To hasten the retrieval of information likely to satisfy a querier, conventional search technologies have implemented a few common techniques to reduce the time needed to obtain an appropriate answer. One such technique employs document-level relevance, which is a measure of how appropriate a document, taken as a whole, is as a response to a particular query. As such, when a query is found to relate to a certain topic, a traditional search mechanism implementing document-level relevance retrieves one or more documents that best represent that topic. But with the advent of hypertext-based (e.g., Web-based) sources of information, classical document-level relevance has been modified in retrieval systems to perform link analysis when responding to queries. Link analysis examines the structure of the World Wide Web or an enterprise intranet and analyzes the linkages from one web page to the next.
But while document-level relevance can be useful, there are significant drawbacks to retrieval systems based entirely on this measure. First, typical retrieval systems using document-level relevance rely on the frequencies of either tokens or stems, and as such discard or otherwise ascribe de minimis value to “stoplisted” words. “Stoplisted” words are common words of a language, such as the English words “a,” “and,” “but,” “because,” etc. Since these words are generally not considered, syntactic or other linguistic information that otherwise can be used to hone a search is lost. Second, link analysis is effective only over a very large number of links (e.g., collectively linking billions of unique web documents). Over a smaller number of links, such as those among thousands or millions of documents, link analysis is far less effective. Third, document-level relevance works well against short queries of a general nature (e.g., one or two words), which are best answered by highly relevant documents. But it works poorly with specific or detailed questions, which are generally well-answered by a specific piece of text, even if the document from which the piece is taken is not relevant overall to the query.
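The first drawback can be illustrated with a minimal sketch of frequency-based scoring. It is not taken from any particular retrieval system: the stoplist, the function names, and the scoring rule (a sum of raw term frequencies) are all assumptions chosen for illustration. It shows that two queries differing only in a stoplisted word become indistinguishable.

```python
from collections import Counter

# Hypothetical stoplist for illustration; real stoplists are much longer.
STOPLIST = {"a", "an", "and", "but", "because", "the", "of", "by", "for"}

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    stripped = (t.strip(".,?!\"'") for t in text.lower().split())
    return [t for t in stripped if t]

def content_terms(text):
    """Tokens that survive stoplist filtering."""
    return [t for t in tokenize(text) if t not in STOPLIST]

def document_score(query, document):
    """Assumed scoring rule: sum the document frequency of each distinct query term."""
    doc_counts = Counter(content_terms(document))
    return sum(doc_counts[t] for t in set(content_terms(query)))

doc = "A list of books written for children and their parents."
q1 = "books by children"
q2 = "books for children"

# Both queries reduce to the same content terms, so the distinction
# carried by "by" versus "for" is lost before scoring begins.
print(content_terms(q1), content_terms(q2))
print(document_score(q1, doc), document_score(q2, doc))
```

Under these assumptions the document receives the same score for both queries, even though it addresses only one of them.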
Another conventional search technique uses ontologies in responding to natural language queries. An ontology is a set of “concepts,” where one or more concepts are associated by a set of relationships. A concept is generally understood to be an idea or thought (e.g., in the mind) that represents some tangible or intangible entity in some world (i.e., some domain), where the actual entity in the real world is called the referent of the concept. The set of concepts is open; there is no bound to the number of unique concepts constituting an ontology, whereas the set of relations among concepts is closed because there are a limited number of unique relationship types. Each concept is typically linked to at least one other concept by at least one relation. Examples of concepts include “Chief Executive Officer,” “houseplant,” “crying,” etc., and some examples of relations are “child-of,” “member-of,” “synonym-of,” etc. But while the coverage and structural wealth of ontologies have increased dramatically, ontology use typically has not been fully developed.
As an example, consider a typical ontology-based search system that uses the following algorithm (or a variant thereof) to get an answer to a question. Once a query is received, the stopwords are stripped, which leaves the keywords as residue. Then, for each keyword, the system identifies a concept in the ontology. Next, from the relational position of each keyword concept in the ontology, the system follows a predefined traversal to reach a set of result concepts. Lastly, the system retrieves a number of documents containing the maximal set of result concepts from which to generate a response.
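The four steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of any specific system: the ontology contents, the stopword list, the choice of which relations the predefined traversal follows, the one-hop traversal depth, and the substring test for whether a document contains a concept are all hypothetical.

```python
# Hypothetical stopword list.
STOPWORDS = {"a", "an", "the", "of", "is", "who", "what", "in"}

# Hypothetical toy ontology: concept -> list of (relation, related concept).
ONTOLOGY = {
    "ceo": [("synonym-of", "chief executive officer")],
    "chief executive officer": [("member-of", "management")],
    "management": [],
    "houseplant": [("child-of", "plant")],
    "plant": [],
}

# Assumed predefined traversal: relation types followed one hop from each keyword concept.
TRAVERSAL = ("synonym-of", "child-of")

def keywords(query):
    """Step 1: strip stopwords, leaving the keywords."""
    tokens = [t.strip(".,?!") for t in query.lower().split()]
    return [t for t in tokens if t and t not in STOPWORDS]

def result_concepts(query):
    """Steps 2 and 3: map each keyword to a concept, then follow the traversal."""
    results = set()
    for kw in keywords(query):
        if kw not in ONTOLOGY:
            continue  # the keyword has no matching concept in this ontology
        results.add(kw)
        for relation, target in ONTOLOGY[kw]:
            if relation in TRAVERSAL:
                results.add(target)
    return results

def retrieve(query, documents, top_n=1):
    """Step 4: return the documents containing the most result concepts."""
    concepts = result_concepts(query)

    def coverage(doc):
        # Simplification: a concept counts as present if its name is a substring.
        text = doc.lower()
        return sum(1 for c in concepts if c in text)

    ranked = sorted(documents, key=coverage, reverse=True)
    return [d for d in ranked[:top_n] if coverage(d) > 0]

docs = [
    "The chief executive officer, or CEO, announced the merger.",
    "Water a houseplant once a week.",
    "The CEO resigned.",
]
print(retrieve("Who is the CEO?", docs))
print(retrieve("What is a ficus?", docs))
```

In this sketch the first query returns the document that mentions both result concepts, while the second query returns nothing because its only keyword has no concept in the toy ontology.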
But while the classical use of ontologies is functional, there are several significant drawbacks. First, ontology-based retrieval systems are effective in obtaining the best answer only to the extent that the ontology covers the subject matter to which the query has been applied. These systems generally do not include concepts of the ontology that are attuned to match specific queries, such as unique vocabulary words, symbols, etc. Another drawback is that classical ontology-based systems disregard linguistic cues, such as syntax. Without such cues, the response generated is not necessarily the best answer that the querier seeks. This is because a “one-method-fits-all” technique (or algorithm) typically traverses only traditional ontologies, thus either failing to retrieve the answer to some questions or retrieving incorrect answers to others.
In view of the foregoing, it would be desirable to provide a system, a method, and a computer readable medium for efficiently determining an answer that a query seeks to elicit. Ideally, an exemplary system, method, and computer readable medium would minimize or eliminate at least the above-described drawbacks associated with prior art systems.