This invention relates generally to information analysis and, more specifically, to a system and method for generating sentence patterns used to determine the information of interest of an end user.
The world-wide web contains billions of web pages of information. In addition, a large amount of information is stored on enterprise systems, public and commercial databases, etc. As the number of information sources increases, identifying or finding the information of interest requires more time and becomes increasingly difficult for a user. There is a market need to find and present the information of interest to a user from one or more of the aforementioned sources of information.
In order to display to a user the information of interest, culled from a body of source information, in an acceptable amount of time, the co-pending patent application titled “Capturing reading style”, Patent Application No. 1819/CHE/2005, filed in India on Dec. 13, 2005, illustrates a method of capturing the reading style of a user, wherein the reading style is a set of one or more declared patterns. A declared pattern contains a set of source components, and the user declares patterns from these source components. Source components are of different kinds, such as sentences and paragraphs.
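By way of illustration only, the relationship among a reading style, its declared patterns, and their source components may be sketched as the following data structure. The class names `ComponentKind`, `SourceComponent`, `DeclaredPattern`, and `ReadingStyle`, and the method `declare`, are hypothetical names chosen for this sketch and are not taken from the co-pending application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class ComponentKind(Enum):
    """Kinds of source component named in the text (hypothetical enum)."""
    SENTENCE = "sentence"
    PARAGRAPH = "paragraph"


@dataclass(frozen=True)
class SourceComponent:
    """A single unit of source text, e.g. a sentence or a paragraph."""
    kind: ComponentKind
    text: str


@dataclass(frozen=True)
class DeclaredPattern:
    """A pattern the user declares from one or more source components."""
    components: Tuple[SourceComponent, ...]


@dataclass
class ReadingStyle:
    """A reading style: a collection of declared patterns."""
    patterns: List[DeclaredPattern] = field(default_factory=list)

    def declare(self, *components: SourceComponent) -> DeclaredPattern:
        """Record a new declared pattern built from the given components."""
        if not components:
            raise ValueError("a declared pattern needs at least one source component")
        pattern = DeclaredPattern(tuple(components))
        self.patterns.append(pattern)
        return pattern
```

For example, a user who marks a single sentence as interesting would call `ReadingStyle().declare(SourceComponent(ComponentKind.SENTENCE, "..."))`, yielding a reading style that holds one declared pattern with one sentence component.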
In the method and system disclosed herein, a declared pattern is used to determine the information of interest from an information source. There is an unsatisfied market need to fully expand and apply all the manifestations of the declared pattern of an end user's reading style. There is also a need to generate a set of language-specific sentence patterns, specific to the reading style of a user, that expands upon the declared sentence pattern and that is used for recognizing a larger number of matching word patterns in the information source, thereby providing the ability to comprehensively and accurately determine the information of interest from the information source.
Reference resolution is performed to remove ambiguity by identifying the entity referred to by expressions such as “The President”, “he”, “she”, “it”, “they”, etc. There is an unmet market need to resolve references without a significant consumption of computing resources for information processing. Therefore, references need to be resolved at run time without the use of more traditional natural language processing techniques.