This invention relates to information retrieval technologies, and more particularly to a method for retrieving documents by intelligently matching a query string to one or more pre-stored strings. A novel ranking method is employed for said intelligent matching.
Frequently Asked Questions (“FAQs”) are commonly presented to a company by its customers. Because FAQs recur so often, standard answers are usually pre-stored in the database of a retrieval system and retrieved in response to a query input into that system. A customer may present a question by dialing into the company's interactive voice response (IVR) system, or may enter the query at the company's website.
Natural language queries are more accessible to ordinary customers because no special search rules need to be learned. A questioner can simply input a question (a query string) in natural language into the retrieval system and receive the correct, pre-stored answer. This is implemented by a mapping technique within the retrieval system. Specifically, a group of sample questions is pre-stored in the database, each with a corresponding answer. Upon receiving a query in natural language format, the system uses a relatively complex artificial intelligence algorithm to map the query to a pre-stored sample question, which is coupled to an answer.
Because words are used casually in natural language query strings, it is important to improve the technique for mapping a query string to the correct sample string. At present, natural language processing techniques are able to detect equivalent strings (strings that have essentially the same meaning as the query string). They can detect equivalent strings that are worded very differently from the query string, and reject strings that are worded similarly to the query string but have a different meaning. Usually, more than one equivalent string is mapped to the same query string, and these equivalent strings are ranked by meaning. The answer coupled to the top-ranked equivalent string (i.e., the one whose meaning is closest to the query string) is retrieved and displayed to the questioner.
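A minimal Python sketch of the conventional flow described above: sample questions are stored with their answers, a sample counts as an equivalent string when its meaning score clears a threshold, equivalents are ranked by meaning, and the answer coupled to the top-ranked one is returned. The `toy_meaning_score` function (synonym-normalized word-set overlap), the `SYNONYMS` table, and the 0.5 threshold are hypothetical stand-ins for the artificial intelligence meaning module, which the text does not specify.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class SampleQuestion:
    text: str    # pre-stored sample question
    answer: str  # standard answer coupled to the sample question

# Hypothetical synonym table standing in for real semantic analysis.
SYNONYMS = {"reset": "change", "modify": "change", "pwd": "password"}

MeaningFn = Callable[[str, str], float]

def toy_meaning_score(a: str, b: str) -> float:
    """Stand-in meaning module: Jaccard overlap of synonym-normalized words."""
    def concepts(s: str) -> set:
        return {SYNONYMS.get(w, w) for w in re.findall(r"[a-z0-9']+", s.lower())}
    ca, cb = concepts(a), concepts(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0

def rank_by_meaning(query: str, samples: List[SampleQuestion],
                    meaning_score: MeaningFn,
                    threshold: float = 0.5) -> List[Tuple[float, SampleQuestion]]:
    """Keep equivalent samples (score >= threshold), highest meaning first."""
    scored = [(meaning_score(query, s.text), s) for s in samples]
    equivalents = [pair for pair in scored if pair[0] >= threshold]
    equivalents.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return equivalents

def retrieve_answer(query: str, samples: List[SampleQuestion],
                    meaning_score: MeaningFn,
                    threshold: float = 0.5) -> Optional[str]:
    """Return the answer coupled to the top-ranked equivalent sample, if any."""
    ranked = rank_by_meaning(query, samples, meaning_score, threshold)
    return ranked[0][1].answer if ranked else None

SAMPLES = [
    SampleQuestion("How do I change my password?", "Go to Settings > Security."),
    SampleQuestion("What are your opening hours?", "We are open 9am to 5pm."),
]

print(retrieve_answer("How do I reset my password?", SAMPLES, toy_meaning_score))
# prints: Go to Settings > Security.
```

With only a meaning score, two samples that score identically cannot be told apart: Python's stable sort simply keeps them in their stored order.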
However, there is no technique for further distinguishing among equivalent strings that share the same ranking in meaning. Furthermore, the ranking of equivalent strings relies solely on either correlation in meaning or correlation in wording pattern; each has its limitations, and neither alone may be sufficiently accurate.
Therefore, there exists a need for improved techniques that enable a retrieval system to map query strings to pre-stored strings more accurately.
In the novel method of the present invention, both meaning and wording pattern are taken into consideration in ranking equivalent strings. Two separate modules are utilized: a first module for matching the meaning of an input string to pre-stored questions, and a second, independently operating module for matching the word pattern of the input string to a pre-stored string. When plural strings are deemed to have an equivalent meaning, the word pattern of each is examined, and the string whose word pattern is closest to a pre-stored word pattern is utilized.
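One way to sketch the two independent modules in Python, assuming the meaning module has already produced a score per sample question (passed in as a dict) and that a tie on meaning is broken by comparing each tied sample's word sequence with the query's. The word-pattern module here uses `difflib.SequenceMatcher` over word lists as a stand-in; the function names and the `tie_tol` parameter are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional

def words(text: str) -> List[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def wording_score(query: str, sample: str) -> float:
    """Stand-in word-pattern module: similarity of the two word sequences (0..1)."""
    return SequenceMatcher(None, words(query), words(sample)).ratio()

def pick_best(query: str, meaning_scores: Dict[str, float],
              tie_tol: float = 1e-9) -> Optional[str]:
    """Rank by meaning first; among samples tied on meaning, prefer closest wording."""
    if not meaning_scores:
        return None
    top = max(meaning_scores.values())
    tied = [s for s, score in meaning_scores.items() if top - score <= tie_tol]
    return max(tied, key=lambda s: wording_score(query, s))

scores = {
    "How can I change my password?": 0.9,  # same meaning score ...
    "Changing a password: how?": 0.9,      # ... so wording decides
    "What are your opening hours?": 0.1,
}
print(pick_best("How can I change my password", scores))
# prints: How can I change my password?
```

Keeping the two scoring functions separate mirrors the independence of the modules: either stand-in can be swapped for a real implementation without touching the tie-break logic.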
In a preferred embodiment, correlation in meaning and correlation in wording pattern are weighted with different factors to obtain a combined correlation for each equivalent string, and the ranking is implemented based on the combined correlation thus obtained.
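The weighted combination of the preferred embodiment can be sketched as follows. The weights 0.7 and 0.3 are illustrative assumptions (the text only says the factors differ), the meaning scores are again supplied by an unspecified meaning module, and the word-pattern score uses a hypothetical `difflib.SequenceMatcher` stand-in.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

def words(text: str) -> List[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def wording_score(query: str, sample: str) -> float:
    """Stand-in word-pattern correlation: similarity of word sequences (0..1)."""
    return SequenceMatcher(None, words(query), words(sample)).ratio()

def rank_combined(query: str, meaning_scores: Dict[str, float],
                  w_meaning: float = 0.7,
                  w_pattern: float = 0.3) -> List[Tuple[str, float]]:
    """Combined correlation = w_meaning * meaning + w_pattern * wording, best first."""
    combined = {
        sample: w_meaning * meaning + w_pattern * wording_score(query, sample)
        for sample, meaning in meaning_scores.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

meaning_scores = {
    "How can I change my password?": 0.80,
    "Changing a password: how?": 0.85,
    "What are your opening hours?": 0.10,
}
for sample, score in rank_combined("How can I change my password", meaning_scores):
    print(f"{score:.3f}  {sample}")
```

Here the second sample has the higher meaning score (0.85 vs 0.80) but much lower wording similarity (0.2 vs 1.0), so the combined score puts the first sample on top (0.860 vs 0.655). Unlike a pure tie-break, the combination lets wording influence the order even when meaning scores differ.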