Related fields include linguistic processing of semi-structured documents, mining of networked data, artificial intelligence, and probabilistic scoring. Industrial applications include, but are not limited to, professional networking, recruiting, demographic studies, and trend-spotting.
Finding known individuals has become vastly easier with the development of large-scale networks and efficient search engines. Increasingly, such networks are also used to find unknown individuals with specific desired qualifications (collectively, “candidates”). To do this, searchers gather and analyze documents describing specific people as having those qualifications, such as online resumes or profiles (collectively, “biographies”).
In most cases, candidates with a given expertise are sought by searchers who do not share it. A recruiter with a background in human resources works on behalf of hiring managers with numerous different backgrounds. A salesperson's target market is often not other salespeople. Someone considering a career change wants informational interviews with those already in the prospective new career. A student or junior professional seeks a mentor. A working team notices a need for a skill the present members lack.
Hiring, or being advised by, the wrong person can do significant harm. Studies show that the cost of hiring the wrong person is often equivalent to 6 months of that person's salary. In a position to influence business directions and use of resources, the wrong person may do irreversible damage.
Automated search engines for the Internet and smaller networks are optimized to find and rank bodies of information. Boolean search techniques, which link user-selected keywords or key-phrases with logical operators such as “and,” “or,” and “not,” are highly effective for characterizing information. Improvements include automatically searching for different forms of the same word (“stemming”), learning synonymous terms, and bracketing quantities (e.g., “published between 1995 and 1998,” “costing less than $30 USD”). These “semantic search” extensions reduce the incidence of a wanted result being excluded because the text has something similar but non-identical to the search term.
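The effect of stemming on a Boolean “and” search can be illustrated with a minimal sketch. The suffix-stripping function, the helper names (naive_stem, boolean_match), and the three sample documents below are assumptions made for illustration only; a production search engine would use a far more careful stemmer and index.

```python
import re

def naive_stem(word):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ation", "ators", "ator", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def words(text):
    """Lowercased alphabetic tokens of a document."""
    return re.findall(r"[a-z]+", text.lower())

def boolean_match(text, all_of=(), none_of=()):
    """Boolean AND / NOT over stemmed terms."""
    doc = {naive_stem(w) for w in words(text)}
    return (all(naive_stem(t.lower()) in doc for t in all_of)
            and not any(naive_stem(t.lower()) in doc for t in none_of))

# Illustrative documents, not taken from any real data set.
docs = [
    "Systems administrator for a 200-node office network",
    "Responsible for system administration of Linux servers",
    "Our school trains future system administrators",
]

# Exact-word matching misses all three because of word-form differences.
exact_hits = [d for d in docs if {"system", "administrator"} <= set(words(d))]

# Stemmed matching finds all three, including the unwanted school page.
stemmed_hits = [d for d in docs
                if boolean_match(d, all_of=("system", "administrator"))]

# A "not" operator removes the school page, at the risk of over-filtering.
filtered = [d for d in docs
            if boolean_match(d, all_of=("system", "administrator"),
                             none_of=("school",))]
```

In this sketch, stemming recovers the wanted documents that exact matching excludes, but it does nothing to distinguish a biography from a page that merely mentions the same terms.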
In U.S. Pat. No. 7,599,930, Burns & Rennison develop one approach tailored to evaluating resumes. They represent concepts by patterns and tokens, apply hash functions, and find matches in a lexicon or ontology often implemented for fast lookup in a hash table or a database. Potential pitfalls can occur if items in the resume are not in the underlying ontology (e.g., a period of employment with a small independent company) or if the ontology is not equipped to disambiguate similar names (e.g., if the collection of patterns that were being applied only had “University” within a few words of “Texas,” then resumes citing “University of Texas at Austin” and “Texas A&M University” could not be separated by this method).
Boolean-based searches have also been used to identify individuals having selected qualifications or connections to other individuals, according to information available on the network. However, the volume of accessible information can be overwhelming and is constantly increasing. Besides, information about people (apart from filled-in forms with minimal opportunity for improvisation) is subject to contextual nuances that conventional search engines often do not detect. Consequently, the result list from such a search can be unmanageably large, swollen with erroneous returns.
For example, suppose a company wants to recruit someone experienced to maintain its internal computer network. The most common relevant job title is “system administrator,” also sometimes “systems administrator.” Others with less-common titles might still list “system(s) administration” among their duties. Putting those terms into a Boolean-based search engine for the entire Internet would probably return the online resumes or profiles of people suiting the company's needs. It would also return sites for schools that train systems administrators, news about careers in system administration, and every site that mentions its own system administrator or anyone else's. The number of returns could reach into the millions, but some of those wanted might still be missed because of semantic differences.
To limit the results to resumes and profiles of system administrators, one can either add ‘and (resume or profile)’ to the all-Internet search, or search inside a specialized database of resumes or profiles. Either way, more of the wanted returns are likely to be missed because of the additional constraint; the word “resume” or “profile” may not, per se, be in the document, or the person may not be in the database(s) chosen. Either way, too, the result list will still be very large and glutted with unwanted returns: managerial ‘administrators’ of school ‘systems,’ ‘administrative’ assistants at companies with names containing ‘Systems,’ and perhaps even health workers trained on ‘systems’ for ‘administration’ of anesthesia.
Perhaps someone in the company happens to know that the computer-related type of system administrator will often use the abbreviation “sysadmin,” while those other types generally do not. Adding ‘and sysadmin’ produces a more computer-oriented result list, smaller than the previous ones but still perhaps hundreds of returns long. The returns encompass sysadmins of many levels and subspecialties, including individuals with no relevant experience who list it as a future goal or include it in a keyword-list or metadata. The returns also include those who are not sysadmins themselves but manage, train, or offer products and services designed for them. In common experience, the best department manager is not necessarily the best at performing the actual work of the department, nor vice versa. Meanwhile, more wanted returns will almost certainly be excluded for lack of the abbreviation; many job-hunt advisors discourage use of such “insider language” in resumes and profiles.
Some result lists from Boolean-based search engines are ordered by the number of times the search terms appear in the document. This is a fairly helpful approach when searching for reference materials, but not for candidate resumes or profiles. Consider that “sysadmin” would appear 5 times in a resume describing relevant work for 5 different employers for less than a year apiece, but it would only appear once in a resume describing 10 years' work for a single employer. Alternatively, the result list may be in chronological order with the newest first; the results may be ranked by how many times each document has been viewed (no matter by whom); the result order may be alphabetical by name or completely random; or originators may be able to jump to the top of the list by paying a premium. Other search engines, to hamper aspiring list-jumpers, do not fully disclose how their result lists are organized. None of these ordering methods is a viable proxy for how well a resume or profile fits a set of desired characteristics, so a significant part of the list may need to be perused before even the first promising candidate emerges.
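The term-count ordering described above can be reproduced in a few lines. This is only a sketch: the two resume texts are invented for illustration, and term_count is a hypothetical helper, not the ranking function of any particular search engine.

```python
import re

def term_count(text, term):
    """Number of whole-word occurrences of term in text, ignoring case."""
    return len(re.findall(rf"\b{re.escape(term)}\b", text.lower()))

# Invented resumes: five short stints versus one ten-year position.
resumes = {
    "five_short_jobs": " ".join(
        f"Sysadmin at Employer {i}, under one year." for i in range(1, 6)
    ),
    "one_long_job": "Sysadmin at a single employer for ten years.",
}

# Order the result list by raw term frequency, highest first.
ranked = sorted(
    resumes,
    key=lambda name: term_count(resumes[name], "sysadmin"),
    reverse=True,
)
```

Here the resume with five brief jobs is placed ahead of the one with ten years of continuous experience, which is the opposite of what many searchers would want.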
By contrast, a human very familiar with both the relevant field and the searcher's needs can often select or reject a candidate resume or profile within seconds of quickly skimming the biography. For several reasons, though, this is seldom a practical solution. Such an individual may not be available within the often-urgent timeframe. If available, they require payment that matches their considerable expertise. At a rate of 1 minute per evaluation, a result-list of 3000 resumes or profiles would require 50 expert-hours to sort.
Some human recruiters can reportedly sift 500 resumes per day, but at that pace thoughtfulness is likely to be compromised. A human quickly scanning for terms that “jump out” is arguably performing a machinelike keyword search, which as discussed above has yielded suboptimal results and invited biography writers to attempt to fool the system. Additionally, humans attempting to process information too quickly are subject to error sources to which machines are immune. A human brain immediately reacts to whether the esthetic aspects of a document match its subjective preferences, and only then begins to absorb the document's content. When a human skims through biographies too rapidly, both positive and negative decisions can easily be contaminated by subjective esthetics. Moreover, a human's rapid-processing acuity is sensitive to brain oxygenation, blood sugar, emotional state, and other factors that change over the course of a day. Because biographies in the result list of a conventional search engine are not necessarily in any more useful order than paper resumes that arrive chronologically in the mail, the search engine does not mitigate the need to analyze many, many biographies in what is likely to be insufficient time for high-quality thought.
Therefore, a need exists for someone from one field to be able to reliably identify those candidates from another field who are best suited for a particular set of requirements, and do so quickly and cost-effectively even when the initial pool of candidates is very large.
Identifying promising candidates, while a challenge in itself, is only the first step in most of these processes. The next step is usually to contact those candidates and pique their interest. Unless the candidate craves new contacts or the searcher credibly offers something the candidate already wants, approaching the candidate as a complete stranger is likely to fail. Referral by a mutual acquaintance can help immensely.
Online social networking sites have made it possible to determine quickly whether a searcher and a candidate have mutual acquaintances, and if so, who they are. When Searcher queries a social-networking application about a particular named Candidate, a resulting “referral path” (if one is found) is of the form “Searcher knows A, A knows B, and B knows Candidate.” Most of these applications can only find a referral path if every person represented by a node on the path has entered a biography in the same network and has affirmatively acknowledged (“published”) a connection to the nearest neighbors on the path. Thus, even in very widely used social networks, a single missing link can, sometimes inadvertently, block many connection opportunities. In U.S. Pat. No. 7,818,396, Dolin et al. enable a member of a first social network to retrieve profile and connection data from additional social networks into an aggregate social graph. This method, however, only provides additional data about people who are already members of the first social network.
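A referral-path lookup of this kind can be sketched as a breadth-first search over published connections. The graph representation and the function names (make_graph, referral_path) are assumptions for illustration; the source does not state how any particular social network implements the search.

```python
from collections import deque

def make_graph(links):
    """Build an undirected graph from published (mutual) connections."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def referral_path(graph, searcher, candidate):
    """Breadth-first search; returns a shortest path or None."""
    queue = deque([[searcher]])
    seen = {searcher}
    while queue:
        path = queue.popleft()
        if path[-1] == candidate:
            return path
        for neighbor in graph.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

links = [("Searcher", "A"), ("A", "B"), ("B", "Candidate")]
full = referral_path(make_graph(links), "Searcher", "Candidate")

# If A and B never publish their connection, no path is found.
missing = [link for link in links if link != ("A", "B")]
blocked = referral_path(make_graph(missing), "Searcher", "Candidate")
```

With all three connections published, the search returns the path Searcher, A, B, Candidate; removing the single A-to-B link leaves the searcher with no path at all.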
Therefore, a need exists for effective synthesis of information about people's qualifications and connections from multiple sources with disparate information structures.
At the other end of the spectrum, those “power users” of existing social networks with hundreds or thousands of connections may find themselves with multiple referral paths to a new person they decide to contact. The user must then either take a scattershot approach with many paths, or research all the intermediate links to determine the most promising path.
Typical networks, if they rank alternate referral paths at all, do so only by the number of degrees of separation. For example, in U.S. Pat. App. Pub. 2003/0187813, Goldman & Murphy link data from multiple databases through a central database, calculate the shortest referral paths between pairs of users, and score longer paths by likelihood of closeness. Since each link may represent a single meeting or years of association, and may be social, professional, or both, ranking by degrees of separation does not necessarily identify the best referral path.
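The difference between ranking by degrees of separation and ranking by some measure of link quality can be shown with a small sketch. The numeric link strengths (0 to 1), the weakest-link scoring rule, and the helper names below are all hypothetical; they illustrate the point only and are not drawn from the cited references.

```python
def simple_paths(graph, start, goal, path=None):
    """Yield every cycle-free path from start to goal (depth-first)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in sorted(graph.get(start, {})):
        if neighbor not in path:
            yield from simple_paths(graph, neighbor, goal, path)

def weakest_link(graph, path):
    """Score a path by its weakest connection (an assumed scoring rule)."""
    return min(graph[a][b] for a, b in zip(path, path[1:]))

# Hypothetical link strengths: 0.1 might be a single meeting,
# 0.9 a close association of many years.
strength = {
    "Searcher": {"A": 0.1, "C": 0.9},
    "A": {"Candidate": 0.2},
    "C": {"D": 0.8},
    "D": {"Candidate": 0.9},
}

paths = list(simple_paths(strength, "Searcher", "Candidate"))
by_hops = min(paths, key=len)
by_strength = max(paths, key=lambda p: weakest_link(strength, p))
```

Under these assumed strengths, the two-step path through A has the fewest degrees of separation, while the three-step path through C and D is built entirely from strong connections, so the two rankings disagree.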
Some refinements, such as Hardt's in U.S. Pat. App. Pub. 2010/0082695, estimate closeness of connection by, for example, the number of times a pair of people have communicated. This is only practical in a microcosm, such as an enterprise where employees consent to have their electronic communications logged. Pitfalls exist, such as a tendency to talk to one's closest colleagues in person (which would not be logged by the system), and the multiplicity of communication generated by non-close interactions such as confusion over details in a seldom-used procedure.
Therefore, a need exists for rapid comparison of multiple referral paths and for ranking of those paths, as recommendations, based on meaningful variables.
To summarize: Most of the Boolean and other keyword-based search engines are sub-optimal for finding candidates through a keyword search on desired characteristics. Biographies are structurally, semantically, and idiomatically different from documents containing other types of information. Meanwhile, most social-network advancements have concentrated on locating individuals with known identities rather than identifying individuals with desired characteristics. Large networks of resumes, profiles, and other biographies would be leveraged much more efficiently to find and reach candidates if the speed of automated search were combined with the nuanced judgment of a human—specifically, a human very familiar with biographies of the type of candidate sought. After identifying a human-manageable number of candidates with the requested qualifications, the system would continue to an automated survey of any referral paths between the searcher and each candidate. If multiple referral paths are found, the system would choose the path most likely to yield a prompt, well-received introduction of the searcher to the candidate. This choice is necessarily based on at least some of the criteria a human would consider important.