With the advent of the World Wide Web and the Internet, the volume of publicly available information has grown at an unprecedented rate. In order to make sense of this ever-expanding collection, significant attention has been paid to the development of improved document searching techniques, such as search engines and the like. While such techniques have greatly improved the speed, cost and accuracy of locating relevant documents in an essentially unstructured knowledge base, the realm of entity retrieval and ranking has, until recently, been the subject of limited research. As used herein, an entity is defined by its ability to be described by one or more nouns, e.g., a person, place or thing. By way of non-limiting example, in the context of commercial enterprises, entities may comprise employees, clients, projects, partners, alliances, facility locations, competitors, etc. Of course, similar entities will be readily apparent in numerous endeavors beyond the commercial context. Regardless, the ability to quickly identify entities relevant to a given topic of interest will be useful in a wide variety of applications.
For example, referring again to the commercial context, the preparation of business proposals may be made more efficient if one is able to quickly identify subject matter experts within the organization submitting the proposal. In a similar vein, the ability to accurately identify the most qualified potential team members with specific skill sets would improve project staffing. Further still, identifying the best vendors for certain equipment or service needs would be greatly simplified through provision of a system that enables quick and accurate identification of relevant entities. Stated more generally, various knowledge management tasks can be greatly simplified or assisted by delivering relevant information about entities to those responsible for such knowledge management tasks.
Currently, it is very difficult to retrieve entity-related information. In a business context, any commercial enterprise search engine, in a manner akin to web search engines, will yield a list of documents relevant to a particular topic query. However, such engines are of little help in retrieving a reliable ranked list of entities relevant to the topic, and it is left to the requester to sift through the returned documents to identify any particularly relevant entities.
More recently, entity ranking, and expert ranking in particular, has received a growing amount of attention. For example, the Initiative for the Evaluation of XML Retrieval (INEX) has introduced an entity ranking track. Such systems currently rely on the retrieved entities being marked up with Extensible Markup Language (XML). However, not all content within a given knowledge base may have entities tagged with appropriate mark-up. The Text Retrieval Conference (TREC) recently introduced an enterprise track, including an expert finding task. In one approach, a list of experts is provided and, for a given expert, a pseudo-document is created from all located documents that include a mention of that expert. In another approach, potentially relevant documents for a topic are retrieved and experts are subsequently extracted from (i.e., identified in) the set of documents. The ranking of the extracted experts according to their relevance to the topic is inferred from the number of mentions of each expert; more mentions result in a higher ranking. However, to the extent that the number of mentions of an expert in a set of documents is subject to numerous other factors beyond relevance to a given topic, such systems are susceptible to providing inaccurate results. Further still, some expert identification techniques exploit structural information of documents, such as references from other, topically relevant documents or, in the example of emails, explicit links to other emails. With regard to these expert identification techniques, expert retrieval, while important, is appropriately viewed as a subset of entity retrieval and ranking and is thus limited in scope. That is, a more general entity retrieval and ranking approach represents a more scalable solution allowing for application to a wider variety of situations, and would therefore represent an advancement in the art.
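By way of illustration only, the mention-count approach described above (retrieve documents for a topic, then rank experts by how often each is mentioned in them) may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular TREC system: the function name `rank_experts_by_mentions`, the case-insensitive substring matching, and the sample documents and names are all hypothetical.

```python
from collections import Counter


def rank_experts_by_mentions(documents, experts, topic):
    """Rank candidate experts by mention count in topic-relevant documents.

    Assumptions (hypothetical, for illustration): documents are plain
    strings, "relevant" means the topic string occurs in the document,
    and a "mention" is a case-insensitive occurrence of the expert's name.
    """
    topic_lc = topic.lower()

    # Step 1: retrieve potentially relevant documents for the topic.
    relevant = [doc for doc in documents if topic_lc in doc.lower()]

    # Step 2: extract experts from the retrieved set and count mentions.
    counts = Counter()
    for doc in relevant:
        text = doc.lower()
        for name in experts:
            mentions = text.count(name.lower())
            if mentions:
                counts[name] += mentions

    # Step 3: more mentions yield a higher rank.
    return counts.most_common()


# Hypothetical sample data.
documents = [
    "Alice Chen led the cloud migration. Alice Chen presented the cloud roadmap.",
    "Bob Ruiz reviewed the cloud budget.",
    "Bob Ruiz and Alice Chen attended the offsite.",
]
experts = ["Alice Chen", "Bob Ruiz"]

ranking = rank_experts_by_mentions(documents, experts, "cloud")
```

Because the score in this sketch is a raw count, an expert who merely appears often in the retrieved documents (for example, as a frequent author or recipient) would outrank one who is mentioned rarely but is more relevant, which is the susceptibility to inaccurate results noted above.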