The present invention relates generally to methods and systems to answer a question, and more particularly to methods and systems to accurately answer a natural-language question.
Numerous search engines in the market have provided us with an unprecedented amount of freely-available information. All we have to do is type in our questions, and we will be inundated by information. For example, there is a search engine that regularly gives us tens of thousands of Web sites in response to a single question. It would take practically days to go through every single site to find our answer, especially if our network connections are through relatively low-speed modems. We do not want thousands of answers to our questions. All we want is a handful of meaningful ones.
Another challenge faced by users of many search engines is to search by key words. We have to extract key words from our questions, and then use them to ask our questions. We might also use enhanced features provided by search engines, such as + or − delimiters before the key words, to indicate our preferences. Unfortunately, this is unnatural. How often do we ask questions using key words? A better way is to ask in a natural language.
There are natural-language search engines. Some of them also provide a limited number of responses. However, their responses are inaccurate, and typically do not provide satisfactory answers to our questions. Their answers are not tailored to our needs.
Providing accurate responses to natural-language questions is a very difficult problem, especially when our questions are not definite. For example, if you ask the question, "Do you like Turkey?", it is not clear if your question is about the country Turkey or the animal turkey. Added to this challenge is the need to get answers quickly. Time is very valuable and we prefer not to wait for a long time to get our answers.
Further complicating the problem is the need to get information from documents written in different languages. For example, if we want to learn about climbing Mount Fuji in Japan, probably most of the information is in Japanese. Many search engines in the United States only search for information in English, and ignore information in all other languages. The reason may be that translation errors would lead to even less accurate answers.
It should be apparent from the foregoing that there is still a need for a natural-language question-answering system that can accurately and quickly answer our questions, without providing us with thousands of irrelevant choices. Furthermore, it is desirable for the system to provide us with information from different languages.
The present invention provides methods and systems that can quickly provide a handful of accurate responses to a natural-language question. The responses can depend on additional information about the user and about the subject matter of the question so as to significantly improve on the relevancy of the responses. The user is allowed to pick one or more of the responses to have an answer generated. Furthermore, the answer to the question can be in a language different from the language of the question to provide more relevant answers.
One embodiment of the present invention includes a system with an input device, an answer generator and an output device. The answer generator, having access to a database of phrases and question formats, identifies at least one phrase in the question to generate phrased questions. This identification process uses phrases in the database and at least one grammatical rule.
The identified phrase can then be linked to at least one category based on, for example, one semantic rule. Then the system provides a score to the categorized phrase. This score can depend on a piece of information about the user and/or about the subject matter of the question. In one embodiment, this piece of information is different from the fact that the user has asked the question.
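The identification, categorization and scoring steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the specification: the contents of `PHRASE_DB`, the function names, the flat base score of 1.0, and in particular the use of a simple longest-match lookup in place of the grammatical and semantic rules, whose details are not given here.

```python
# Hypothetical phrase database: phrase -> candidate categories.
# The entries are illustrative only.
PHRASE_DB = {
    "ball": ["sports", "mechanical parts"],
    "ball bearing": ["mechanical parts"],
    "bridge": ["card games", "civil engineering"],
}


def identify_phrases(question, phrase_db, max_len=3):
    """Find database phrases in the question, preferring the longest match.

    Longest-match lookup is a stand-in for the grammatical rule (assumption).
    """
    words = [w.strip("?.,!").lower() for w in question.split()]
    words = [w for w in words if w]
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in phrase_db:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no phrase starts at this word
    return found


def categorize(phrases, phrase_db):
    """Link each identified phrase to every category listed for it."""
    return [(p, c) for p in phrases for c in phrase_db[p]]


def base_scores(categorized, base=1.0):
    """Give every categorized phrase the same starting score (assumption)."""
    return {pair: base for pair in categorized}
```

For instance, `identify_phrases("How to play bridge?", PHRASE_DB)` yields the single phrase `bridge`, which `categorize` then links to both the card-game and the civil-engineering categories, leaving the ambiguity to be resolved by the score.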
The piece of information can be related to the user's response to an inquiry from the system. For example, the system can ask the user to specify the subject matter of the question. Assume that the user asks the following question: "In the eighteenth century, what did Indians typically eat?" The system can ask the user if the subject matter of the question is related to India or the aboriginal peoples of North America. Based on the user's response, the system can provide a more relevant response to the user.
In another example, the piece of information is related to an interest of the user. Again, if the user is interested in traveling, and not food, certain ambiguities in his question can be resolved. Based on the user's response to certain inquiries from the system, the accuracy of the answer can be enhanced.
In another embodiment, the piece of information about the user is related to a question previously asked by the user. For example, if the user has been asking questions on sports, the word "ball" in his question is probably not related to ball bearings, which are mechanical parts.
Typically, the more information the system has on the user and the subject matter of the question, the more accurate the answer to the user's question is. This is similar to answering a friend's question before the friend even asks it. Sometimes we understand what a friend wants to know through non-verbal communication or our previous interactions.
Based on information about the user, the score of the categorized phrase can change. In another embodiment, the score of the categorized phrase can change based on information about the subject matter of the question.
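One way the score of a categorized phrase might change with information about the user is sketched below, using the categories of the user's previous questions. The function name `rescore`, the `weight` parameter and the additive formula are hypothetical choices made for this illustration; the specification does not prescribe how the score is computed.

```python
from collections import Counter


def rescore(scores, previous_question_categories, weight=0.5):
    """Raise the score of categories the user has asked about before.

    scores: dict mapping (phrase, category) -> current score.
    previous_question_categories: categories of the user's earlier questions.
    The additive boost proportional to history share is an assumption.
    """
    history = Counter(previous_question_categories)
    total = sum(history.values())
    adjusted = {}
    for (phrase, category), score in scores.items():
        # Fraction of earlier questions that fell in this category.
        share = history[category] / total if total else 0.0
        adjusted[(phrase, category)] = score + weight * share
    return adjusted
```

With a history made up mostly of sports questions, the pairing of "ball" with the sports category ends up scoring higher than its pairing with mechanical parts, which mirrors the ball-bearing example above.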
After providing the score to the categorized phrase, the system can identify at least two question formats in the database based on the score. These question formats can again help the system resolve ambiguities in the question. For example, suppose the question is, "How to play bridge?" Assume that the question is in the general subject area of card games. It is not clear if the user wants to find out basic rules on the card game bridge or to learn some more advanced techniques. Then, one question format can be on basic rules on bridge, and the other format can be on bridge techniques. The user is allowed to pick at least one of the question formats to have the corresponding answer generated.
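The selection of question formats based on the score can be sketched as follows. The table `QUESTION_FORMATS`, its wording, and the rule of walking the categorized phrases from highest to lowest score until at least two formats are collected are assumptions made for illustration only.

```python
# Hypothetical question-format table: (phrase, category) -> question formats.
QUESTION_FORMATS = {
    ("bridge", "card games"): [
        "What are the basic rules of the card game bridge?",
        "What are advanced techniques for the card game bridge?",
    ],
    ("bridge", "civil engineering"): [
        "How is a bridge built?",
    ],
}


def pick_formats(scores, formats_db, minimum=2):
    """Collect question formats, starting from the highest-scoring pair.

    Stops once at least `minimum` formats have been gathered, so the user
    always has a choice whenever the table holds enough formats.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    picked = []
    for pair in ranked:
        picked.extend(formats_db.get(pair, []))
        if len(picked) >= minimum:
            break
    return picked
```

If the card-game category outscores the civil-engineering category, the two formats returned are the basic-rules and advanced-techniques questions, and the user would then pick one of them to have the corresponding answer generated.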
In another embodiment, the answer can be in a language different from the language of the question. This improves the accuracy of the answers to the question. For example, if the user is interested in Japan and understands Japanese, then, based on the question format picked, a Japanese answer is identified for his English question. Such answers can provide more relevant information to the user.
Other aspects and advantages of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the accompanying drawings, illustrates by way of example the principles of the invention.