1. Field of the Invention
This invention pertains in general to information retrieval, and more specifically to contextual personalized retrieval of information in response to user queries.
2. Description of the Related Art
Information retrieval systems face several daunting problems in delivering highly relevant and highly inclusive content in response to a user's query. These problems include synonymy, polysemy, spelling errors, abbreviations, and word concatenations in both the queries and the documents being queried. Information retrieval systems further face problems with partial matches, incomplete queries, complex meanings that extend beyond the words entered in queries and account for the relative significance of a user's query in a document, and the implicit preferences of the individuals conducting queries that were not specified in the query but can be inferred by the information retrieval system. These types of problems can be faced in the searching of various types of documents. For instance, these problems are illustrated in searches conducted for candidates to fill job openings or searches through résumés for particular criteria that match a set of desired criteria in a job description. Some examples of these types of common problems with searches are described in more detail below (using the job search model example for illustration):
Synonymy: There may be many different ways to refer to the same thing, and thus a query using a particular term might not retrieve search results including documents stating synonyms for that term. As one example that involves a job search situation, a résumé document may contain one set of words that refer to a concept (e.g., J2EE), while the job requisition (e.g., job description or list of skills, experience, etc. that a company is looking for in a job candidate) or the query may use a different set of words to refer to the same concept (e.g., Java 2 Enterprise Edition).
Polysemy: The same word(s) can have many different meanings. For example, the word “Berkeley” can refer to the university, “UC Berkeley,” the city of “Berkeley, Calif.,” a company called “Berkeley Systems, Inc.,” etc.
Spelling Errors: There may be spelling errors in a document being searched (e.g., in résumés, as well as the job requisition/query). Thus, a query, “Berkeley,” will not retrieve a document incorrectly stating “Berkley.”
Abbreviations: Similar to the synonymy issue, various different abbreviations can be used to refer to the same term. For example, a résumé can use the abbreviation “NYC,” but a query constructed to search through a database of résumés might use the search term “New York.”
Concatenated Words: Certain words can be concatenated in some instances, but remain separated in others. A résumé can contain the term “MS Office,” whereas the query can be “MSOffice.”
Partial Matches: There can also be partial matches for certain terms. For example, a document can contain the term “Stanford,” whereas the query might be “Stanford University.”
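By way of illustration only, the following minimal sketch shows how a literal keyword comparison misses each of the example pairs given above. The document snippets, the `CASES` table, and the `naive_match` function are hypothetical names introduced for this illustration and are not part of any particular retrieval system.

```python
# Hypothetical (query, document text) pairs based on the examples above.
CASES = {
    "synonymy": ("Java 2 Enterprise Edition", "Built J2EE services"),
    "spelling error": ("Berkeley", "Graduated from Berkley"),
    "abbreviation": ("New York", "Based in NYC"),
    "concatenation": ("MSOffice", "Proficient in MS Office"),
    "partial match": ("Stanford University", "BS, Stanford"),
}


def naive_match(query: str, document: str) -> bool:
    """Return True only if the query appears verbatim (ignoring case) in the document."""
    return query.lower() in document.lower()


# Collect the names of the cases that a literal comparison fails to retrieve.
missed = [name for name, (query, doc) in CASES.items() if not naive_match(query, doc)]
print(missed)
```

Under these assumptions every one of the five pairs is missed, even though a human reader would consider each document relevant to its query.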
In addition, different users may have different requirements and preferences, many of which are not entered as part of the search. Users commonly do not know exactly what they are looking for when conducting a search. Users often do not have the time to be complete and to explicitly specify all the parameters of their search. Even if a user was complete and explicit about all of his parameters, the user might not find any matches because very few candidates would meet all of that user's criteria. Moreover, users do not always know exactly what they are looking for until they see a few results, at which time they can refine their search. Thus, in general, preferences may not be known until a number of outcomes are experienced.
Another problem faced in searching is that, given the exact same search, two different users may have an entirely different ranking of the search results. Thus, the search results may need to be tailored to the person for whom the search is being conducted.
Accounting for hierarchical relationships when searching can also pose a problem. For example, when a user searches for people who went to U.C. Berkeley, the user expects to see people who went to Haas Business School or Boalt Law School within U.C. Berkeley. However, when a user searches for people who went to Haas, the user likely does not expect to find people who went to Boalt, or to other departments of U.C. Berkeley outside of Haas.
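The one-directional nature of this expectation can be sketched as follows. This is a minimal illustration that assumes a simple child-to-parent table; the `PARENT` mapping and the function names are hypothetical and are not drawn from any described system.

```python
# Hypothetical child -> parent links for the example above.
PARENT = {
    "Haas Business School": "U.C. Berkeley",
    "Boalt Law School": "U.C. Berkeley",
}


def ancestors_and_self(concept: str) -> list[str]:
    """Return the concept followed by each of its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def satisfies(document_concept: str, query_concept: str) -> bool:
    """A document concept satisfies a query if it is the queried concept or a descendant of it."""
    return query_concept in ancestors_and_self(document_concept)


print(satisfies("Haas Business School", "U.C. Berkeley"))     # child satisfies a parent query
print(satisfies("Boalt Law School", "Haas Business School"))  # sibling does not satisfy
```

A query for the parent is satisfied by either school, while a query for Haas is satisfied neither by Boalt nor by the parent itself.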
A further problem is accounting for degree of match regarding search results. A piece of information may only contain part of a particular search criterion, so it may be necessary to look at how much of the search criterion is actually contained within the information. Search systems often fail to consider hierarchical relationships in this analysis. For example, if a résumé describes someone who has J2EE experience, that person will implicitly have Java experience. However, someone who has Java experience will not necessarily have J2EE experience. Further, many search systems do not support scoring of documents under a hierarchy. For example, if a user's search criterion is “Web Application Server,” then the system should be able to differentiate between a document that has BEA WebLogic and IBM WebSphere, and a document that only has BEA WebLogic. In addition, search systems commonly are not able to measure the relative importance of content in a document. For example, if a user is searching for candidates whose résumés show “5 years of Web Application Server” experience, then the system should be able to differentiate between a résumé that lists 3 years of WebLogic experience and 2 years of WebSphere experience, and a résumé that lists 5 years of WebLogic experience and 1 year of WebSphere experience, based on date information extracted from the résumés that is correlated to specific contents of the résumés. Search systems also sometimes lack the ability to determine how recent the search requirement is within a document. Degree of match calculations such as these should be configurable and adaptable.
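One possible way to express such a degree of match is sketched below, purely as an illustration. It assumes that the years of experience per skill have already been extracted from each résumé and that the periods do not overlap, so they can simply be summed; the `CHILDREN` table, the `degree_of_match` function, and the two measures it returns are hypothetical.

```python
# Hypothetical parent -> children links for the "Web Application Server" example.
CHILDREN = {"Web Application Server": ["BEA WebLogic", "IBM WebSphere"]}


def degree_of_match(resume_years: dict[str, float], concept: str,
                    required_years: float) -> dict[str, float]:
    """Score one résumé against a hierarchical criterion.

    breadth      -- fraction of the concept's children the résumé mentions
    total_years  -- summed years across those children (assumes no overlap)
    years_ratio  -- total_years relative to the requirement, capped at 1.0
    """
    children = CHILDREN[concept]
    held = [c for c in children if resume_years.get(c, 0) > 0]
    total_years = sum(resume_years.get(c, 0) for c in children)
    return {
        "breadth": len(held) / len(children),
        "total_years": total_years,
        "years_ratio": min(total_years / required_years, 1.0),
    }


both = degree_of_match({"BEA WebLogic": 3, "IBM WebSphere": 2}, "Web Application Server", 5)
only_one = degree_of_match({"BEA WebLogic": 5}, "Web Application Server", 5)
print(both, only_one)
```

In this sketch both résumés meet the five-year requirement, but only the first covers both products under the hierarchy, so the two can be told apart.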
Another problem faced by search systems is that not all search criteria are equal, and not all documents are equal. For example, if a user is searching for a résumé that lists “5 years of Web Application Server” experience, then the system should be able to differentiate between a résumé that refers to 4 years of WebLogic experience and 2 years of WebSphere experience, and a résumé that refers to 6 years of WebLogic experience and 1 year of WebSphere experience, depending upon the collection of résumés in the pool and upon who is doing the search. If all of the résumés in the pool list WebLogic experience and only a few list WebSphere experience, then the first résumé should be ranked higher than the second résumé. However, if all of the résumés in the pool list WebSphere experience and only a few list WebLogic experience, then the second résumé should be ranked higher. If all of the résumés in the pool list WebLogic experience and only a few résumés list WebSphere experience, but the project for which these résumés are being searched is based on WebLogic and not WebSphere, then the second résumé should be ranked higher than the first. A search system should be able to figure out the relative importance of all the search criteria, and personalize the importance of criteria for different individuals.
Furthermore, search systems are generally unable to mimic the way that a human performs a search or finds documents. The system should place a higher priority on concepts (e.g., skills and experience) that are more recent (e.g., from within the last two years). The system should understand which sets of concepts (e.g., skills) are more important than others for a particular user. Setting “required,” “desired,” and “undesired” parameters can be helpful, but in many cases it is much more subtle and complicated to figure out which sets of concepts go together and are more important. In addition, the solution should be intuitive and easy to use (since the more “knobs” people have, and are required to turn, the less likely people will turn them). The system should be able to handle hidden criteria. For example, the user may prefer to hire people from competitors, and thus the system may need to infer the value or weight of these criteria. As another example, a user may not want to hire over-qualified people, and so the system may need to infer the value or weight of job titles. Furthermore, the system should consider how much experience a résumé reflects that a candidate has working in a certain industry and regarding specific sets of skills. Additionally, the system should consider how long the candidate has held particular job positions (e.g., too short or too long may not be considered desirable).
Previous Approaches
A number of different approaches have been used for attempting to solve some of the problems delineated above, including keyword searching or Boolean queries, concept tagging and conceptual searches, automatic classification/categorization, entity extraction using natural language parsing, and the like. These approaches and their limitations are described in more detail below.
Keyword Searching or Boolean Queries
Keyword searches and Boolean queries do not fully address some of the most basic full-text search problems, including synonymy, polysemy, spelling errors, abbreviations, concatenations, and partial matches. Synonymy can be addressed using keyword expansion or elaborate Boolean queries, but very few people know how to perform these types of queries, and even when an elaborate query is constructed, it can still bring back the wrong results because of the other problems. Polysemy can be addressed by contextualizing the search to a specific field, but results can be missed because of spelling errors, abbreviations, concatenations, partial matches, etc.
Concept Tagging and Conceptual Searches
To address the enormous problems surrounding keyword searching and Boolean queries, a commonly accepted practice is to tag documents with “concepts,” i.e., map documents into a “concept space,” and then map the query into the same “concept space” to find search results. If this is done properly, this approach can address the problems of synonymy, polysemy, spelling errors, abbreviations, concatenations, and partial matches with one solution. The key question is how to accurately extract concepts from documents with the highest degree of precision and recall. To be successful when working with résumés (as well as other types of documents), the concept matching algorithms must handle text strings that range from a single word to multiple words with no grammatical structure to short phrases to sentences, paragraphs, and long documents, all with the same degree of accuracy.
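The idea of mapping both documents and queries into a shared concept space can be sketched in a few lines, for illustration only. The lookup table, the concept identifiers, and the function names below are hypothetical, and the sketch assumes the candidate phrases have already been segmented, which sidesteps the extraction question raised above.

```python
import re

# Hypothetical lookup from normalized surface forms to concept identifiers.
SURFACE_TO_CONCEPT = {
    "j2ee": "java_ee",
    "java2enterpriseedition": "java_ee",
    "msoffice": "ms_office",
}


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def concepts(phrases: list[str]) -> set[str]:
    """Map each recognized phrase to its concept identifier; unknown phrases are ignored."""
    keys = (normalize(p) for p in phrases)
    return {SURFACE_TO_CONCEPT[k] for k in keys if k in SURFACE_TO_CONCEPT}


resume_concepts = concepts(["J2EE", "MS Office"])
query_concepts = concepts(["Java 2 Enterprise Edition", "MSOffice"])
print(resume_concepts == query_concepts)
```

Although the résumé and the query share no literal term, both map to the same two concepts in this sketch, so a comparison in concept space finds the match.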
Several approaches are being used today with varying degrees of success. These include categorization, entity extraction using natural language parsing, and manual tagging, as described below.
Automatic Classification/Categorization
There are several algorithms used currently to automatically categorize a document into a taxonomy of concepts. These algorithms typically use various forms of Bayesian networks with a priori learning to classify documents. The limitations of this approach include the following:
A low degree of accuracy, usually in the 60% to 80% range.
A significant amount of training is required in order for the classifiers to work properly. This training requires manual intervention, either in selecting a set of documents to train the classifier to recognize a concept, or in “interpreting” the results of an automatic taxonomy generator.
Poor results, or outright failure, with short phrases or strings of a few words.
No ability to match an input query of a few words to concepts in the taxonomy, which defeats the purpose of concept-based searching in the first place.
While automatic classification/categorization software can provide some benefits, these limitations make it unlikely to provide sufficiently useful results.
Entity Extraction using Natural Language Parsing
Extracting concepts from text using natural language parsing (NLP) techniques is another method commonly used. This approach uses semantic or lexical analysis to parse text into parts of speech. These lexical elements are then matched against grammar rules to extract entities from the text. While this approach is useful for extracting new concepts out of full text documents, it suffers from a number of limitations that make it unusable as a complete solution when dealing with résumés (as well as other documents), including the following:
Content may not have any grammatical structure, in which case the parsing simply fails.
The approach is very brittle: if the text does not follow the grammatical rules, then concepts are missed.
It does not work well when there is ambiguity in the text.
It is language dependent.
Even when a string containing a concept has been successfully extracted, it still has to be matched up against other known concepts, and in doing so, the concept must be normalized to account for spelling errors, synonyms, word order, abbreviations, concatenations, etc.
While Entity Extraction using NLP is useful for finding (potentially) new concepts, it is generally not sufficient for finding existing, or known, concepts.
Traditional Collaborative Filtering Engines
Traditional collaborative filtering engines tend to work well under the following conditions:
When there is a closed set of items (e.g., there are a finite number of books, music tracks, products, etc.).
When the number of users (U) is much greater than the number of items (I): U >> I.
When most of the items have been seen and rated by at least one of the users.
These conditions exist in large marketplaces, such as for companies like AMAZON®. Unfortunately, with most search-related applications, especially when searching résumés, the above conditions do not hold. In fact, the conditions are the opposite, as follows:
The number of searchable items, e.g., résumés, is increasing and changing constantly; new résumés are arriving every day.
The number of users is much less than the number of items: U << I.
More than likely, very few of the items/résumés have been seen and rated in the past.
Given these conditions, traditional collaborative filtering techniques do not work with résumés, or other enterprise document search applications. It is preferable to deliver personalized search results in order to deliver a successful search solution (e.g., for the recruiting process). The current approaches described above do not effectively address this problem.
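The conditions discussed above can be checked on a given data set with a short calculation, sketched here for illustration only. The `cf_conditions` function, the rating structure, and the sample identifiers are hypothetical.

```python
def cf_conditions(ratings: dict[str, set[str]], items: set[str]) -> dict[str, float]:
    """Summarize how well a data set fits collaborative filtering.

    ratings maps a user ID to the set of item IDs that user has rated.
    users_per_item -- U / I; well above 1 in the favorable case
    rated_fraction -- share of items rated by at least one user
    """
    rated = set().union(*ratings.values()) & items
    return {
        "users_per_item": len(ratings) / len(items),
        "rated_fraction": len(rated) / len(items),
    }


# Hypothetical recruiting data: three users, one thousand résumés, two of them rated.
resumes = {f"resume-{i}" for i in range(1000)}
ratings = {"recruiter-a": {"resume-1", "resume-2"}, "recruiter-b": {"resume-2"}, "recruiter-c": set()}
stats = cf_conditions(ratings, resumes)
print(stats)
```

In this example both figures are far below 1, which corresponds to the unfavorable case described for résumé search.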