The present invention relates to data analysis and processing, and more specifically, to a system and method for deriving questions and answers from textual information and for summarizing that information based upon the analysis thereof. Part of the method for deriving questions can also be utilized for generating context free grammars that can parse said questions. Said context free grammars can in turn be utilized for speech recognition of said questions (utterances) or for a variety of natural language processing programs, which typically rely on such grammars.
Today, organizations such as companies, libraries, and educational institutions publish and store hundreds of documents on their computer networks that contain pertinent information to be retrieved by users (e.g., employees, students, and the general public). These users typically have similar questions regarding the content of the textual information in these documents. Conventionally, these organizations may provide a document or page (e.g., a web page) of frequently asked questions (i.e., FAQs) that lists questions commonly asked by various users together with their answers. These questions may be general questions concerning, for example, procedures, rules, timelines or an application process. However, these FAQ documents may not provide answers to more specific questions of a user, even when those questions are answered implicitly or explicitly by the textual information contained within the documents. Thus, it may take a user a number of days, weeks or months of reading through the documents before finding the answers to their specific questions. While the textual information of these documents is important for reference, obtaining answers to specific questions regarding the textual information is more important and useful to the users.
In a fully automated computer environment, once the questions have been captured, processing the questions to retrieve the answers typically requires context free grammars that can parse those questions. In this invention, a substantial number of questions would likely be generated. For a software developer, writing by hand the grammars that can process said questions would be a daunting task.
Therefore, it is desirable to provide a system and method for automatically deriving all potential questions and answers available from the content of the textual information within the documents.
It is further desirable to provide a system for automatically deriving the context free grammars that can process the generated questions, at the same time as the questions are generated and in a manner consistent with the method for generating them.
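By way of illustration only, the following minimal sketch shows one way that questions and a context free grammar capable of parsing them might be derived from a single shared source, so that the two remain consistent. It is not the method of the invention. The question templates, the slot name "topic", the slot values, and all function names are hypothetical and are assumed solely for this example.

```python
from itertools import product

# Hypothetical question templates; slot names in braces are illustrative only.
TEMPLATES = ["what is the {topic}", "when is the {topic} due"]
SLOTS = {"topic": ["application deadline", "refund policy"]}


def parse_template(template):
    """Split a template into (is_slot, text) tokens."""
    tokens = []
    for word in template.split():
        if word.startswith("{") and word.endswith("}"):
            tokens.append((True, word[1:-1]))
        else:
            tokens.append((False, word))
    return tokens


def generate_questions(templates, slots):
    """Expand every template with every combination of slot values."""
    questions = []
    for template in templates:
        tokens = parse_template(template)
        names = [text for is_slot, text in tokens if is_slot]
        for values in product(*(slots[name] for name in names)):
            filled = iter(values)
            words = [next(filled) if is_slot else text for is_slot, text in tokens]
            questions.append(" ".join(words))
    return questions


def derive_grammar(templates, slots):
    """Build CFG rules (nonterminal -> alternatives) from the same templates."""
    grammar = {"QUESTION": []}
    for template in templates:
        # Slots become upper-case nonterminals; other words stay as terminals.
        rhs = [text.upper() if is_slot else text
               for is_slot, text in parse_template(template)]
        grammar["QUESTION"].append(rhs)
    for name, values in slots.items():
        grammar[name.upper()] = [value.split() for value in values]
    return grammar


def matches(grammar, symbols, words):
    """Return True if the symbol sequence derives exactly the word sequence."""
    if not symbols:
        return not words
    head, rest = symbols[0], symbols[1:]
    if head in grammar:  # nonterminal: try each alternative in turn
        return any(matches(grammar, rhs + rest, words) for rhs in grammar[head])
    return bool(words) and words[0] == head and matches(grammar, rest, words[1:])


def accepts(grammar, question):
    """Check whether the grammar parses the given question."""
    return matches(grammar, ["QUESTION"], question.lower().split())


questions = generate_questions(TEMPLATES, SLOTS)
grammar = derive_grammar(TEMPLATES, SLOTS)
```

Because both the questions and the grammar rules in this sketch are produced from the same templates, every generated question is accepted by the derived grammar, and no grammar rule needs to be written by hand. The simple recursive recognizer used here assumes the grammar contains no left-recursive rules.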