Modern computing and networking technology, including the Internet, makes it possible to organize, store, and transfer large bodies of electronic data with minimal effort virtually anywhere in the world. With so much material so easily accessible, many have realized that the real issue is no longer obtaining enough information, but separating what is useful to them from vast quantities of irrelevant material. In a conventional search system, indexing (also known as marking up) the content of a document means selecting certain terms or keywords to represent that content. Online search engines often use these index terms to locate web-based resources. A typical online information retrieval system matches query terms against the index terms to identify relevant information. Unfortunately, when search terms occur in inappropriate contexts, queries to these systems often retrieve irrelevant material. Most users need to find answers to very specific questions, but this type of conventional search lacks precision. Information retrieval systems that use statistical methods and natural language processing also suffer from inherent query ambiguity: these systems cannot precisely identify the context of the query terms.
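The context problem with conventional term matching can be illustrated with a minimal sketch. It is not taken from any system discussed here; the function names (`build_index`, `search`), the document identifiers, and the index terms are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each index term to the set of documents marked up with it.

    `docs` is a hypothetical mapping: document id -> list of index terms.
    """
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term.lower()].add(doc_id)
    return index

def search(index, query):
    """Return documents whose index terms include every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's documents and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents: d1 concerns the common cold,
# d2 concerns water treatment for cold-storage facilities.
docs = {
    "d1": ["cold", "virus", "treatment"],
    "d2": ["cold", "storage", "water", "treatment"],
}
index = build_index(docs)
hits = search(index, "cold treatment")
```

Under these assumptions, the query "cold treatment" matches both documents, although only d1 answers a medical question: the matcher sees the terms but not the context in which they occur.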
This need for precise and fast retrieval of information is even more pronounced in certain professions. For example, to provide optimal patient care, health-care professionals in clinical environments need to retrieve, in a timely fashion, accurate and up-to-date health-care and medical-related information from a variety of online resources such as electronic textbooks, scientific journals, news, research papers, edited reviews, and medical databases. In recent years, researchers have developed a variety of systems to improve the indexing and searching of these electronic resources. The primary goal is to increase search precision/specificity (i.e., the fraction of all hits that are relevant) without severely reducing recall/sensitivity (i.e., the fraction of all relevant items in the database that are retrieved).
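The two measures defined above can be computed as in the following minimal sketch. The function name `precision_recall` and the sample document identifiers are hypothetical; the sketch assumes that the set of retrieved hits and the set of all relevant items in the database are both known.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: set of item ids returned by the search (the hits).
    relevant:  set of all item ids in the database that are relevant.
    Both arguments are illustrative assumptions, not part of any
    system described in the text.
    """
    relevant_hits = retrieved & relevant
    # Precision: fraction of all hits that are relevant.
    precision = len(relevant_hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of all relevant items that were retrieved.
    recall = len(relevant_hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query result: four hits, two of them relevant,
# out of three relevant items in the whole database.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
precision, recall = precision_recall(retrieved, relevant)
```

In this example two of the four hits are relevant, giving a precision of 0.5, and two of the three relevant items were found, giving a recall of 2/3. Returning fewer, better-targeted hits raises precision, but it risks lowering recall if relevant items are dropped.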
For example, a content-based indexing system is disclosed in “MYCIN II: Design and Implementation of a Therapy Reference with Complex Content-Based Indexing,” Proc AMIA Symp 1998: 175-179, by Kim et al. Co-developed by co-inventors Fagan and Berrios, MYCIN II is a prototype information retrieval (IR) system capable of searching content-based markup in an electronic textbook on infectious disease. Users select a query from a pre-determined set of query templates in the query model. The selected query is then passed to a search engine for processing.
A markup tool for the MYCIN II search engine was developed by co-inventors Berrios and Fagan and disclosed in “Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease,” Proc AMIA Symp 1998:975. Domain experts use the markup tool to provide the Hypertext Markup Language (HTML) indexing required for the MYCIN II search engine. As is known in the art, indexing medical textbooks and references relies heavily on the expertise and knowledge of a domain expert, e.g., an infectious disease expert, to properly generate concepts, mark up medical text based on these concepts, and generate query templates. Accordingly, it is no surprise that, in this system, the domain experts had to perform a significant amount of manual work to generate the ontology of concepts in the concept model and the set of questions in the query model.
In addition, because tools such as the markup tool and the search engine in the MYCIN II system were developed independently, there was minimal integration among them. As a result, the domain experts had to repeat several common tasks when using these tools.
In the above-referenced U.S. patent application, we addressed the aforementioned deficiencies by providing a highly integrated system and method that significantly increase search precision while reducing the amount of manual work, repeated common tasks, time, and cost necessary to mark up/index a file of electronic text for searching.
There is a continuing need, however, to correspondingly and appropriately update the indices whenever a previously indexed document and/or the query model changes.