1. Field of the Invention
The present invention generally relates to natural language document searching in a computer system and, more particularly, to answering queries submitted by a user to the computer system.
2. Background Description
There are several key ideas that together advance the state of the art in document search. In application Ser. No. 09/339,872, we disclosed the concept of categorizing user input to determine a topic or domain to which a subsequent keyword search is restricted. In application Ser. No. 09/570,788, we disclosed the concept of categorizing a session history of user input to further identify and narrow the topic of a search and to determine whether the user has switched topics. The keywords used in the search phase depend on whether the user is refining a topic or switching topics. In the case of topic refinement, our prior applications assumed that the keywords used in the search phase were based solely on all previous input within the identified topic. For example, if a user first said “loans”, then “auto”, then “new”, the system would determine, at the third input, the category “auto loan” and the keywords {loan, auto, new}; all of these keywords would be sent to the search engine and treated as equal. The keyword search is assumed to be disjunctive, so the effect of any keyword on the results is independent of when it was entered.
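To make the equal-weight, disjunctive assumption concrete, the following is a minimal Python sketch of that prior scheme. The function name, the whitespace tokenizer, and the sample documents are hypothetical and chosen only for illustration; the point is that the order in which keywords were entered has no effect on any score.

```python
from typing import Dict, Iterable


def disjunctive_score(document: str, keywords: Iterable[str]) -> int:
    """Count the distinct query keywords present in the document.

    Every keyword has the same weight (1), and any single keyword is enough
    for a document to match (score > 0), i.e. the search is disjunctive.
    """
    words = set(document.lower().split())
    return sum(1 for kw in set(keywords) if kw in words)


# Hypothetical stored documents.
docs: Dict[str, str] = {
    "d1": "rates for a new auto loan",
    "d2": "home loan refinancing",
    "d3": "used auto buying guide",
}

# Keywords accumulated in the "auto loan" topic, in the order entered.
entered = ["loan", "auto", "new"]
scores = {name: disjunctive_score(text, entered) for name, text in docs.items()}
# Reversing the order of entry leaves every score unchanged.
reversed_scores = {name: disjunctive_score(text, entered[::-1]) for name, text in docs.items()}
```

Here d1 scores 3 while d2 and d3 score 1 each, whatever the order of entry; this order independence is what a recency-weighted scheme gives up.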
In a session-history-based search system, as introduced in the above co-pending patent applications, there are several key problems to solve: (1) topic identification, (2) topic refinement, (3) topic switching, and (4) keyword selection and/or weighting as a function of the session history. The first three of these problems were addressed by the inventions disclosed in the co-pending patent applications.
It is therefore an object of the invention to provide a method and apparatus for solving the problem of keyword selection and/or weighting as a function of a session history of user input.
It is another object of the invention to provide a method and apparatus to answer queries submitted by a user to a computer system by providing answers based on stored documents.
According to the invention, the aim is to find the best answers by matching stored natural language documents both to the most recent query by itself and to the most recent query in a context that captures the recent history of interaction. To do this, answers are matched against a set of keywords extracted from the most recent query as well as a set of keywords extracted from the queries received since the last topic switch was detected. The system must therefore detect topic switches, because the detection of a topic switch indicates how far back in the history to go: keywords from queries older than that point are deemed irrelevant to subsequent queries. This requires the assumption that there is a set, a hierarchy, or a partially ordered set (i.e., partially ordered by specificity) of categories to which answers belong, and that, as part of setting up a system implementing this method, the categories to which each answer may belong are recorded in a file or a database. The system then takes as evidence of a topic switch its own decision to display to the user an answer that does not belong to any of the most specific categories to which the last displayed answer belonged. If the answer evidencing a topic switch would have been found solely on the basis of keywords extracted from the last query (in other words, solely on the basis of keywords with age 0), then the system should subsequently stop using keywords from older queries.
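The topic-switch rule and the subsequent trimming of the history can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the category hierarchy in `PARENTS` and all function names are hypothetical; membership in a category is assumed to imply membership in its more general ancestors; and whether the switching answer would have been found from age-0 keywords alone is passed in as a flag (it could be computed, for example, by rerunning the search with only the latest query's keywords).

```python
from typing import Dict, List, Set

# Hypothetical category hierarchy (partial order by specificity): each
# category maps to its direct, more general parent categories.
PARENTS: Dict[str, Set[str]] = {
    "loan": set(),
    "auto loan": {"loan"},
    "new auto loan": {"auto loan"},
    "mortgage": {"loan"},
    "insurance": set(),
}


def ancestors(category: str) -> Set[str]:
    """Return every category strictly more general than `category`."""
    found: Set[str] = set()
    stack = list(PARENTS.get(category, set()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS.get(parent, set()))
    return found


def most_specific(categories: Set[str]) -> Set[str]:
    """Keep only categories that are not more general than another in the set."""
    general: Set[str] = set()
    for c in categories:
        general |= ancestors(c)
    return categories - general


def is_topic_switch(last_cats: Set[str], new_cats: Set[str]) -> bool:
    """True if the new answer is in none of the last answer's most specific
    categories. ASSUMPTION: membership is inherited upward, so an answer
    filed under "new auto loan" also counts as belonging to "auto loan"."""
    closed = set(new_cats)
    for c in new_cats:
        closed |= ancestors(c)
    return not (most_specific(last_cats) & closed)


def trim_history(history: List[Set[str]], switched: bool,
                 found_with_age0_only: bool) -> List[Set[str]]:
    """`history` holds per-query keyword sets, oldest first. After a switch
    that the latest query's keywords alone would have produced, drop the
    keywords of all older queries."""
    if switched and found_with_age0_only:
        return history[-1:]
    return history
```

For example, if the last answer was recorded under {"loan", "auto loan"}, its most specific category is "auto loan"; an answer filed under "mortgage" then signals a switch, while one filed under "new auto loan" does not.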
The user's queries may be submitted in any way that supports an ongoing exchange with the user, such as through a Web site or via a telephone where the user's speech is converted to text by a speech recognition system.
A central feature of the method according to the invention is that the computer system implementing the method maintains a session history for each user session. A system implementing this method extracts keywords from each query. A graded keyword list is a list of keywords paired with ages, which are indicators (generally numerical) of how long ago in the session the user employed each keyword in a query. Graded keyword lists are maintained in the session history so that the system can assign weights to keywords, with more recently received keywords being assigned higher weights than keywords with greater ages. The weights assigned to keywords are used in computing scores that indicate how closely a document matches a list of keywords. The scores for all possible answers are compared with a threshold to determine which answers score high enough to warrant selection as proper responses to the user's query.
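The pipeline just described (graded keyword list, age-based weights, match scores, threshold) can be sketched in Python as below. The method only requires that younger keywords weigh more than older ones; the geometric decay `weight = decay ** age`, the additive score, the whitespace tokenizer, the threshold value, and all function names here are hypothetical choices made for illustration.

```python
from typing import Dict, List


def graded_keywords(queries: List[List[str]]) -> Dict[str, int]:
    """Build a graded keyword list: map each keyword to its age, where age 0
    is the most recent query, 1 the query before it, and so on. A keyword
    used more than once keeps its youngest age."""
    ages: Dict[str, int] = {}
    for age, keywords in enumerate(reversed(queries)):
        for kw in keywords:
            ages.setdefault(kw, age)
    return ages


def keyword_weights(ages: Dict[str, int], decay: float = 0.5) -> Dict[str, float]:
    """ASSUMPTION: geometric decay, weight = decay ** age, so newer keywords
    always weigh more than older ones when 0 < decay < 1."""
    return {kw: decay ** age for kw, age in ages.items()}


def score(document: str, weights: Dict[str, float]) -> float:
    """Closeness of match: sum of the weights of keywords present in the document."""
    words = set(document.lower().split())
    return sum(w for kw, w in weights.items() if kw in words)


def select_answers(docs: Dict[str, str], weights: Dict[str, float],
                   threshold: float) -> List[str]:
    """Return names of documents scoring at least `threshold`, best first."""
    scored = {name: score(text, weights) for name, text in docs.items()}
    passing = [name for name, s in scored.items() if s >= threshold]
    return sorted(passing, key=lambda name: (-scored[name], name))


session = [["loan"], ["auto"], ["new"]]   # oldest query first
ages = graded_keywords(session)            # new: 0, auto: 1, loan: 2
weights = keyword_weights(ages)            # new: 1.0, auto: 0.5, loan: 0.25
docs = {                                   # hypothetical stored answers
    "d1": "rates for a new auto loan",
    "d2": "home loan refinancing",
    "d3": "used auto buying guide",
    "d4": "new checking account offers",
}
answers = select_answers(docs, weights, threshold=0.75)
```

With these settings d1 (score 1.75) and d4 (1.0) pass the threshold, while d3 (0.5) and d2 (0.25) do not. Note that d4 outranks d3 only because "new" is the youngest keyword; under the equal-weight scheme both would score 1.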
Although we refer to the use in search of keywords extracted from queries, the techniques of this method could equally well be applied to other features extracted from queries in lieu of keywords, such as (1) phrases consisting of multiple words or (2) features assigned to text on the basis of other processing. Also, a keyword found in text (either a query or an answer) may be stemmed or replaced by a canonical form in the process of being identified for use in a system employing this method.
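A minimal Python sketch of such feature extraction follows, applied identically to queries and to stored answers. The synonym table `CANONICAL`, the stopword list, the plural-stripping `crude_stem` (a stand-in for a real stemmer such as Porter's), and the two-word phrase option are all hypothetical illustrations rather than part of the described method.

```python
import re
from typing import Dict, List

# Hypothetical table mapping (stemmed) surface forms to a canonical keyword.
CANONICAL: Dict[str, str] = {"automobile": "auto", "car": "auto"}
# Hypothetical stopword list; words here are never used as keywords.
STOPWORDS = {"a", "an", "the", "for", "i", "need", "to", "about"}


def crude_stem(word: str) -> str:
    """Strip a plural "s". A crude stand-in for a real stemmer (e.g. Porter's)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def canonicalize(token: str) -> str:
    """Stem a token, then replace the stem by its canonical form if one exists."""
    stem = crude_stem(token)
    return CANONICAL.get(stem, stem)


def extract_features(text: str, with_phrases: bool = False) -> List[str]:
    """Extract canonical keywords from a query or answer; optionally add
    adjacent keyword pairs as two-word phrase features."""
    tokens = re.findall(r"[a-z]+", text.lower())
    keywords = [canonicalize(t) for t in tokens if t not in STOPWORDS]
    features = list(keywords)
    if with_phrases:
        features += [f"{a} {b}" for a, b in zip(keywords, keywords[1:])]
    return features
```

Because queries and answers pass through the same function, "cars" in a query and "automobile" in an answer both reduce to "auto" and therefore match.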