The following relates to the information management arts, relevant person identification arts, data mining arts, and related arts.
Increasingly, documents are being stored and manipulated collaboratively, and are annotated with social context metadata specifying persons associated with a document and their various roles. For example, an electronic mail (that is, “email”) message is typically annotated with information such as the “sender” (i.e., author) and the list of recipients. Typically, these persons are identified by their email addresses, although some email systems may also use personal names when available. A scientific paper in electronic form (e.g., a PDF document) may be annotated with social context metadata such as the authors of the scientific paper, the authors of references cited in the scientific paper, the conference or journal editor, the conference session chairperson, and so forth. Here people are usually identified by legal or commonly used names, e.g., in a “first name” “last name” format, possibly with an intervening middle name or middle initial. A document on a social media network or blog may be annotated with the document owner, the document creator (who may or may not be the same as the document owner), persons who have left comments about the document, and so forth. In this environment a person is typically identified by a nickname, user name, login name, or other naming convention employed by the social media network.
The foregoing are merely illustrative examples. It is seen that the social context includes more than simply a list of the names of persons associated with a document. The social context also identifies the roles of those persons within the social context. For example, associating “John Smith” with an email as the “sender” has a different significance than associating “John Smith” with an email as a recipient. The social context role of the person can be indicative of the extent of the person's involvement in the document (e.g., an author of a paper is usually much more involved with the paper than the author of a reference cited in the paper) and the nature of that involvement (e.g., the paper author has an active, creative role whereas the author of a reference has a passive role). The person's role in a document can also be indicative of that person's social status. For example, the journal editor is typically a senior scientist, whereas the author of a paper appearing in the journal may be a senior scientist, a junior scientist, or an undergraduate student.
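By way of illustrative sketch only, the social context metadata described above might be represented as a list of (person, role) pairs attached to each document. The class names `RoleAnnotation` and `AnnotatedDocument`, the role strings, and the example email addresses below are hypothetical choices made for illustration and do not correspond to any particular existing system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RoleAnnotation:
    """One person associated with a document, together with that person's role."""
    person: str  # identifier as used by the source system (email address, name, user name)
    role: str    # hypothetical role label, e.g. "sender", "recipient", "author"


@dataclass
class AnnotatedDocument:
    """A document plus its social context metadata (assumed layout)."""
    doc_id: str
    annotations: List[RoleAnnotation] = field(default_factory=list)

    def persons_with_role(self, role: str) -> List[str]:
        """Return the persons holding the given role, in annotation order."""
        return [a.person for a in self.annotations if a.role == role]

    def roles_of(self, person: str) -> List[str]:
        """Return every role the given person holds in this document."""
        return [a.role for a in self.annotations if a.person == person]


# Hypothetical email message: the same person list carries different
# significance depending on the role attached to each person.
email = AnnotatedDocument(
    doc_id="email-001",
    annotations=[
        RoleAnnotation("john.smith@example.com", "sender"),
        RoleAnnotation("jane.doe@example.com", "recipient"),
        RoleAnnotation("sam.lee@example.com", "recipient"),
    ],
)

senders = email.persons_with_role("sender")
recipients = email.persons_with_role("recipient")
```

Keeping the role alongside each person, rather than storing a bare list of names, preserves the distinction between, for example, “John Smith” as sender and “John Smith” as recipient.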
Database operations typically focus on document retrieval. However, it can also be useful to leverage a database that contains documents annotated with social context information to identify persons who meet specified selection criteria. For example, a human resources department seeking to fill a skilled position might consider using a database of scientific papers to identify well-qualified candidates for the skilled position.
Performing such a search with existing database systems can be tedious, however. In the above example, the human resources department may be able to use keyword searching to obtain a list of authors of papers directed to subject matter pertaining to the skill set of the skilled position; but then someone must manually go through that list to identify suitable candidates. Many of the listed authors may be too junior for a senior position (that is, underqualified) or, alternatively, too senior for a more junior position (that is, overqualified). Such existing approaches are also prone to missing relevant persons. For example, the aforementioned search may miss persons who do not publish frequently, perhaps because their current employer discourages publication, even though those persons may be visibly active in other ways, such as serving on conference committees or editing special journal issues.