Field of the Invention
The present invention relates to systems and methods for searching a large corpus of data to identify contextually relevant search results.
Description of the Related Art
As the amount of available digital data has exploded, so too has the need for more effective search and retrieval systems. Traditional search and retrieval systems, often referred to as “search engines,” typically operate by receiving a textual query of terms and/or phrases (the “search terms”), comparing the search terms against a body of searchable content (the “corpus”), and returning the data items in the corpus that are most relevant to the search terms (the “results”). The classic example is an Internet search engine, which indexes web pages and returns the most relevant pages in response to search term queries.
Search and retrieval alone, however, is insufficient to fully understand and utilize the data in a corpus. Traditional keyword search techniques consider only whether the text of a document contains the search terms, and therefore overlook documents that are semantically relevant but do not contain those terms verbatim. Accordingly, keyword searching alone may fail to identify all semantically relevant documents.
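The limitation described above can be seen in a toy example. The sketch below assumes a hypothetical three-document corpus and a hand-written table of related terms (`CORPUS`, `RELATED_TERMS`, and both search functions are illustrative names, not part of any described system). It compares literal keyword matching with simple query expansion. Query expansion is far weaker than a semantic reasoner; it is used here only to show a relevant document that literal matching misses.

```python
# Hypothetical toy corpus: document id -> document text.
CORPUS = {
    "doc1": "The sedan was parked outside the embassy.",
    "doc2": "A car bomb was reported near the embassy.",
    "doc3": "Quarterly earnings rose for the automaker.",
}

# Hypothetical hand-built knowledge: term -> semantically related terms.
RELATED_TERMS = {
    "car": {"sedan", "automobile", "vehicle"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text, split on whitespace, and strip punctuation."""
    return {word.strip(".,;:!?\"'").lower() for word in text.split()}


def keyword_search(query: str, corpus: dict[str, str]) -> list[str]:
    """Return ids of documents that contain every query term verbatim."""
    terms = tokenize(query)
    return sorted(doc_id for doc_id, text in corpus.items() if terms <= tokenize(text))


def expanded_search(
    query: str, corpus: dict[str, str], related: dict[str, set[str]]
) -> list[str]:
    """Treat each query term as satisfied by itself or by any related term."""
    groups = [{term} | related.get(term, set()) for term in tokenize(query)]
    return sorted(
        doc_id
        for doc_id, text in corpus.items()
        if all(group & tokenize(text) for group in groups)
    )


print(keyword_search("car embassy", CORPUS))
print(expanded_search("car embassy", CORPUS, RELATED_TERMS))
```

With the query `car embassy`, `keyword_search` returns only `doc2`, while `expanded_search` also returns `doc1`, which mentions a sedan rather than a car.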
Semantic reasoners address these limitations by applying inference algorithms to the content and/or metadata of corpus documents to derive logical consequences that are not explicitly stated. Semantic reasoners can thus expose connections that are invisible to traditional search engines and allow users to find more relevant content. For example, a semantic reasoner may be able to identify documents in the corpus that do not contain the given search terms but are nevertheless semantically related; disambiguate entities in the data, whether distinct entities share the same textual name or a single entity appears under different names; reduce the number of results to which a user is exposed; preserve linguistic flexibility in search terms; and rank query results accurately according to trust in their sources.
Although semantic reasoners offer powerful advantages over traditional search engines, they have remained impractical for large corpora such as Internet content, intelligence report databases, corporate document databases, and the like. The inference algorithms applied by traditional semantic reasoners can significantly inflate the already large volume of corpus data, requiring prohibitive storage and/or computing resources. Furthermore, inference algorithms are often brittle and lose accuracy when applied to large volumes of documents that extend beyond a single narrow domain (i.e., “the frame problem”).
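As a rough illustration of the data inflation described above, the sketch below materializes the transitive closure of a single hypothetical subclass chain using semi-naive forward chaining. This is one common inference strategy, assumed here for illustration only; it is not necessarily the strategy of any particular reasoner, and the concept names `c0`, `c1`, … are hypothetical. A chain of n concepts with n - 1 asserted links yields n(n - 1)/2 stored facts after inference, so storage grows quadratically even for this single rule.

```python
def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Forward-chain the rule (a, b), (b, c) => (a, c) until no new facts appear.

    Semi-naive evaluation: each round joins only the facts derived in the
    previous round with the asserted edges, instead of re-joining everything.
    """
    # Index the asserted edges by subject for fast joins.
    successors: dict[str, set[str]] = {}
    for subject, obj in edges:
        successors.setdefault(subject, set()).add(obj)

    closure = set(edges)
    delta = set(edges)  # facts first derived in the most recent round
    while delta:
        derived = {
            (a, c) for (a, b) in delta for c in successors.get(b, set())
        }
        delta = derived - closure
        closure |= delta
    return closure


def chain(n: int) -> set[tuple[str, str]]:
    """Hypothetical hierarchy of n concepts, each a subclass of the next."""
    return {(f"c{i}", f"c{i + 1}") for i in range(n - 1)}


for n in (10, 100, 500):
    asserted = chain(n)
    inferred = transitive_closure(asserted)
    print(f"{n:>4} concepts: {len(asserted):>4} asserted -> {len(inferred):>6} after inference")
```

For 10, 100, and 500 concepts, the asserted facts (9, 99, 499) grow to 45, 4,950, and 124,750 facts after inference. Real corpora combine many such rules over many relations, which compounds the growth.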