The Internet is a ubiquitous source of information. Many search engines exist, and all of them are designed to answer queries by returning responses that are hoped to be relevant. Even so, it remains difficult to sift through search results for answers to certain types of queries that existing search engines do not handle effectively. Current search engines handle poorly those queries that concern not a single entity, such as one person, company, or product, but combinations of entities bounded by co-occurrence criteria between them. This is because such co-occurrences are often unnamed, in the sense that it may not be readily apparent why a particular co-occurrence exists.
For example, consider the sentence “in their speech Sam Palmisano and Steve Mills announced a new version of IBM's database product DB2 will ship by the end of third quarter.” Among others, this sentence contains the following unnamed co-occurrences:
Sam Palmisano and Steve Mills, Sam Palmisano and IBM, Sam Palmisano and DB2, Steve Mills and IBM, Steve Mills and DB2.
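To make the notion of an unnamed co-occurrence concrete, the following is a minimal sketch, not the method of the invention. It makes several assumptions. An upstream entity recognizer (not shown) has already tagged each sentence with (name, type) pairs. The co-occurrence window is a single sentence. Only pairs involving at least one PERSON are kept, which reproduces exactly the five pairs listed above. The functions `person_pairs` and `top_person_for` and the type labels are hypothetical names used for illustration. The second function shows how the kind of question discussed next, "which person co-occurs most often with IBM?", could be answered by counting, over a toy corpus, the sentences in which each person appears alongside a target entity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each sentence is a list of (entity name, entity type)
# tuples, assumed to come from an upstream entity recognizer.

def person_pairs(entities):
    """Unordered entity pairs within one sentence where at least one is a PERSON."""
    unique = list(dict.fromkeys(entities))  # drop repeats, keep mention order
    return [
        (a, b)
        for (a, a_type), (b, b_type) in combinations(unique, 2)
        if "PERSON" in (a_type, b_type)
    ]

def top_person_for(target, sentences):
    """Rank persons by the number of sentences in which they co-occur with target."""
    counts = Counter()
    for entities in sentences:
        if target not in {name for name, _ in entities}:
            continue  # the target entity is absent, so no co-occurrence here
        for name, etype in dict.fromkeys(entities):  # count each person once per sentence
            if etype == "PERSON" and name != target:
                counts[name] += 1
    return counts.most_common()

# The example sentence above, with assumed entity types.
example = [("Sam Palmisano", "PERSON"), ("Steve Mills", "PERSON"),
           ("IBM", "ORG"), ("DB2", "PRODUCT")]
corpus = [example,
          [("Sam Palmisano", "PERSON"), ("IBM", "ORG")],
          [("Steve Mills", "PERSON"), ("DB2", "PRODUCT")]]

print(person_pairs(example))
print(top_person_for("IBM", corpus))  # [('Sam Palmisano', 2), ('Steve Mills', 1)]
```

Counting sentences rather than raw mentions is one design choice among several. A real system might instead use a different window (paragraph or document), weight pairs by distance, or normalize by each entity's overall frequency.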
One might wish to ask a large document corpus such as the Web, “which person co-occurs most often with IBM?”, but present search engines largely cannot answer even a simple co-occurrence query like this one. Other co-occurrence questions have important implications but currently lack effective answers. Examples include which medical conditions are most often mentioned with a drug and which technologies are most often mentioned with a company. With these critical observations in mind, the invention herein is provided.