The phenomenal success of Internet-based search engines has afforded considerable access to information via keyword queries. As a consequence, users of modern information retrieval (IR) systems, on both the Internet and intranets, are coming to demand access through simple keyword queries to a great variety of heterogeneous information, including semi-structured and unstructured documents, also referred to as the “deep web”.
Employing a simple keyword query-based search over a vast and growing number of heterogeneous information sources raises some fundamental challenges, warranting new or revised approaches to search and information retrieval. One attempted solution involves analyzing both search queries and indexed documents with the help of auxiliary data, such as the concepts embedded within them. In such a scenario, the auxiliary data permit the engine to better interpret the search query terms and to retrieve documents matching the “intent” behind the query, as opposed to documents that merely contain literal matches for the query terms. For example, a query such as “nyc map” can elicit an actual map of New York City.
The efficacy of such an auxiliary data-based approach depends on answering questions such as the following effectively: How should the library of concepts be populated? How should the library of instances for the different concepts be populated? How should a given query and/or document be represented through templates derived from these concepts? How can these templates be leveraged to answer a query effectively? To date, conventional efforts have not addressed these and other questions in a manner that scales well or lives up to the expectations of intent-based search (as opposed to keyword-based search).
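
By way of illustration only, the notions of a concept library, concept instances, and concept-derived templates can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular engine: the names CONCEPT_INSTANCES, DOCUMENTS, annotate, and retrieve, the single “city” concept, and the sample documents are all hypothetical, and instances are assumed to be single lowercase tokens.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical concept library: concept name -> known instances.
# How such a library is populated is exactly the open question raised above.
CONCEPT_INSTANCES: Dict[str, Set[str]] = {
    "city": {"nyc", "boston", "chicago"},
}


def annotate(query: str) -> Tuple[str, Dict[str, str]]:
    """Rewrite a keyword query as a template plus concept bindings.

    Each token that is a known instance of a concept is replaced by a
    "<concept>" placeholder; the original token is kept as the binding.
    """
    template_tokens: List[str] = []
    bindings: Dict[str, str] = {}
    for token in query.lower().split():
        concept = next(
            (name for name, instances in CONCEPT_INSTANCES.items()
             if token in instances),
            None,
        )
        if concept is None:
            template_tokens.append(token)
        else:
            template_tokens.append(f"<{concept}>")
            bindings[concept] = token
    return " ".join(template_tokens), bindings


# Hypothetical index: each document is stored with the template and
# bindings it is assumed to satisfy (illustrative data, not real results).
DOCUMENTS = [
    {"id": "nyc-street-map", "template": "<city> map",
     "bindings": {"city": "nyc"}},
    {"id": "boston-street-map", "template": "<city> map",
     "bindings": {"city": "boston"}},
    {"id": "nyc-restaurant-blog", "template": "<city> restaurants",
     "bindings": {"city": "nyc"}},
]


def retrieve(query: str) -> List[str]:
    """Return ids of documents whose template and bindings match the query."""
    template, bindings = annotate(query)
    return [
        doc["id"]
        for doc in DOCUMENTS
        if doc["template"] == template and doc["bindings"] == bindings
    ]
```

Under these assumptions, the query “nyc map” is rewritten as the template “<city> map” with the binding city = nyc, so only the document indexed as an actual map of that city is returned, rather than every document that happens to contain the two keywords.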