The integration of document processing, query generation and user feedback continues to challenge information retrieval (IR) technologies. While search portals that are readily accessible on the Internet and corporate intranets remain among the most successful information retrieval applications, their ability to generate queries and utilize user feedback has some limitations. For example, these search portals typically require that users state their information needs in explicit queries. While this rigid protocol may benefit information technology professionals, lay users have difficulty formulating and satisfying their information needs through explicit queries.
In addition, the query processing mechanism is typically the same for all users, and does not allow fast and intuitive customization. Feedback is obtained through continuous solicitation of relevance judgments, which disrupts many users' information seeking behaviors and subsequently discourages them from either using the search portals or providing feedback. Even when provided, feedback is commonly utilized in the query space alone. Consequently, the search portals' behavior remains the same over multiple interactions.
While the search portals allow users to perform searches on different topics over the Internet, corporate intranets, and private databases, they neither support nor integrate with document processing. Thus, to perform a search relevant to the document at hand, users must disengage from document processing to use a different application. On the other hand, current text editing and word processing applications allow users to create documents about any topic or issue, but lack the means to integrate document creation with simultaneous retrieval of relevant information.
Therefore, what is lacking in the art is the integration of document processing, query generation and feedback in an application-embedded distributed IR system. The implementation of such a system for text processors would make IR transparent yet responsive to the needs of common computer users. What is needed is a non-intrusive, feedback-sensitive IR system that users can embed into their applications to tap into and monitor information sources while still engaged in routine usage of those applications. Such applications include text processing, spreadsheets and other commonly used software. The need for such a system is motivated by a growing number of information sources with a wealth of data, particularly over the Internet, but with few tools for putting the data to use in a timely and efficient manner.
In view of the above, a system and a method are presented for application-embedded information retrieval from distributed free-text information sources. An application's usage is sampled by an embedded IR system. Samples are converted into queries to distributed information sources. Retrieval is managed and adjusted through a user-customized interface. The IR system is preferably embedded in a text processor.
A system for embedded distributed information retrieval includes a module for embedding a distributed information retrieval system in a computer application program. A free-text parser is coupled to the application program. The free-text parser is operative to receive scheduled reads of textual information from the application program, parse the textual information into sentences, and rank the sentences by their content-bearing capacities. A query engine is coupled to receive free-text sentences and generate structured queries in response thereto. The query engine includes a semantic network processor program, and is coupled to at least one knowledge base. A metasearch engine is coupled to receive and submit the structured queries to at least one information source. A retrieval manager is coupled to the metasearch engine. The retrieval manager receives the retrieved links associated with the structured queries, and ranks and filters the retrieved links based upon predefined criteria.
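By way of illustration only, the role of the free-text parser described above can be sketched in Python as follows. The sketch splits a scheduled read of text into sentences and ranks them by a stand-in measure of content-bearing capacity. The stop-word list, the function names and the frequency-based scoring rule are assumptions made for this example and are not taken from the described system.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the description does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "on", "for", "that", "this", "with"}


def split_sentences(text):
    """Split free text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_terms(sentence):
    """Return lower-cased word tokens with stop words removed."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def rank_sentences(text):
    """Return (score, sentence) pairs, highest score first.

    Assumed scoring rule: a sentence's score is the sum of the
    text-wide frequencies of its distinct content terms.
    """
    sentences = split_sentences(text)
    freq = Counter(t for s in sentences for t in content_terms(s))
    scored = [(sum(freq[t] for t in set(content_terms(s))), s)
              for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In this sketch a sentence consisting only of stop words scores zero and sorts last, so it would be the least likely candidate for query generation. Any other measure of content-bearing capacity could be substituted in `rank_sentences` without changing the surrounding flow.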
A method for generating structured queries in an embedded distributed information retrieval environment includes receiving scheduled reads of textual information, and parsing the textual information into sentences. The found sentences are ranked by their content-bearing capacities based on their terms, i.e., words and phrases. Structured queries are then generated using a semantic network processor program. The structured queries are submitted to at least one information source. Retrieved links associated with the structured queries are received. The retrieved links are ranked and filtered based upon predefined criteria.
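The query generation and link handling steps of the method above can likewise be sketched in Python, again purely as an illustration. Here the semantic network is reduced to a hypothetical dictionary of related terms, the structured query is assumed to be a Boolean string, and the predefined criteria are assumed to be a minimum relevance score and a maximum number of links. None of these specifics are taken from the described method, and submission to an information source is omitted.

```python
# Hypothetical toy semantic network: term -> related terms.
SEMANTIC_NETWORK = {
    "solar": ["photovoltaic"],
    "panels": ["modules"],
}


def build_structured_query(terms, network=SEMANTIC_NETWORK):
    """Build a Boolean query string from content terms.

    Each term is OR-ed with its related terms from the network,
    and the resulting groups are AND-ed together.
    """
    groups = []
    for term in terms:
        alternatives = [term] + list(network.get(term, []))
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(term)
    return " AND ".join(groups)


def rank_and_filter(links, min_score=0.5, max_links=10):
    """Rank and filter retrieved links.

    `links` is an iterable of (url, score) pairs. Links below
    `min_score` are dropped, duplicate URLs keep their best score,
    and at most `max_links` links are returned, best first.
    """
    best = {}
    for url, score in links:
        if score >= min_score and score > best.get(url, float("-inf")):
            best[url] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_links]
```

For example, `build_structured_query(["solar", "panels", "cost"])` yields `(solar OR photovoltaic) AND (panels OR modules) AND cost`. The thresholds in `rank_and_filter` stand in for whatever criteria a user would set through the interface.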
The present invention accordingly provides the integration of document processing, query generation and feedback in an application-embedded distributed IR system. The presently preferred implementation is to embed such a system in a text processor application, but other application programs that include textual or numeric data can readily take advantage of the benefits of the invention. These benefits include a non-intrusive, feedback-sensitive IR system that users can use to automatically tap into information sources while still engaged in routine usage of the underlying application program. By automatically generating structured queries in the background, such a system allows periodic access to the growing number of information sources provided over the Internet, as well as on proprietary and intra-corporate data sources. The frequency of query generation and the relevance of retrieved information are controlled by the user to tailor the information retrieval process to the user's precise needs and desires.
These and other features and advantages of the invention will become apparent upon a review of the following detailed description of the presently preferred embodiments of the invention, when viewed in conjunction with the appended drawings.