1. Field of the Invention
The invention generally relates to a customized, specialty-oriented database and index of a subject matter area, and methods for constructing and using such a database and index. In particular, the invention provides a customized, specialty-oriented database in which articles are selected and indexed by experts in the topic with which the database is concerned, in a manner that allows the database to respond to queries as an expert system, providing facile, rapid retrieval of highly relevant articles with few or no false positives.
2. Background of the Invention
The ability to search and retrieve information electronically is a cornerstone of the “Information Age”. This ability allows large segments of the population to rapidly access vast amounts of information, with clear benefits for many diverse endeavors.
In particular, for professions that rely heavily on published material as a source of current information (e.g. health and science professionals), the ability to electronically search a large database for current, up-to-date information on topics of interest has been a great boon. For example, health professionals have access to databases such as MEDLINE, which was established by the National Library of Medicine in order to facilitate the dissemination of information to the health care and scientific communities. MEDLINE is currently considered to be the “gold standard” for a periodical bibliographic database and virtually all services link to MEDLINE.
However, because the present knowledge-based society is continually producing published documents at an extremely rapid rate, databases have become repositories of stunningly vast collections of published information. For example, as of this writing, MEDLINE contains in excess of 15 million articles that are accessible via search engines such as PubMed, and is growing daily. While access to this vast repository of information is in some ways advantageous, serious problems arise when an individual attempts to electronically search this database for materials truly relevant to a specific topic of interest. It is not unusual for a MEDLINE search, no matter how well crafted, to return several tens of thousands of “hits” that correspond to the designated parameters, many of which are not relevant. The potential for this problem was recognized as early as 1948, when Claude Shannon of Bell Labs stated: “classic informatics theory shows that as information increases, the amount of irrelevant and inaccurate information (often referred to as ‘noise’) increases.” Even for a highly skilled individual, scrolling through a list of several thousand articles in order to identify those that are truly apposite is a daunting, extremely frustrating and impractical task. Finessing search parameters is time-consuming, and frequently still results in a very large number (many hundreds or several thousands) of returned or retrievable documents. Yet adding search terms to reduce the number of hits may inadvertently cause exclusion of desired information.
The retrieval of such a large number of documents per search when accessing a large database is in large part due to the classification procedure used by such services, which fails to distinguish between the most relevant articles and those which include only ancillary references to the topic of interest (e.g. in a book or product review, or a minor section of a lengthy paper or review article, etc.). This is due in part to the large initial pool of journal (and other) sources of the indexed articles, the very large number of articles that are indexed (relatively indiscriminately) from those sources, the large number of “MeSH” (Medical Subject Headings) terms that are used to index the individual articles, and the large number of codes that are assigned to each article. For example, MEDLINE currently has in excess of 15 million indexed articles drawn from approximately 4,600 journals. About 30 codes are typically assigned to each article, and about 22,000 MeSH terms are utilized in the database. Given that MEDLINE was designed for the academic and research community, the need for completeness and sensitivity, rather than efficiency and relevance, was paramount in its design.
This is not the case for medical clinicians, for whom specificity, relevance, and efficiency are of greater concern. The return of a large number of documents from a search can thus be a particular problem in the practice of medicine today, which requires speed in obtaining up-to-date information on specific topics. In particular, physicians are typically in need of literature concerned with the diagnosis and treatment of specific diseases. Physicians are paid by HMOs based on the number of patients they see, not the time spent with each patient, and information-gathering activity (or access to the clinical literature) is not reimbursed by third party carriers. Thus, minimization of the total retrieval time for relevant, up-to-date information (which includes realizing the need for information, accessing a search engine, picking search terms, doing the search, and retrieving and assimilating the search results) is a vital issue, as this activity is likely to be confined to short breaks between seeing patients during the course of the day. Therefore, time is of the essence in retrieving and applying information from relevant articles.
Physicians are also concerned with the potential for malpractice claims. Rapid access to current, relevant information may help to avoid or to mount a defense against such claims. In addition, Board Certified Specialists must earn a minimum number of Continuing Medical Education (CME) credits annually and must pass Board Certification exams every 5-7 years, depending on the specialty. Failure to qualify may result in removal from HMO listings and a significant loss of compensation. In addition, a minimum number of CME credits is required to maintain state medical licenses. Frequently, peer-reviewed journal articles directed to a specialized topic are the basis of these exams and are thus helpful during preparation for exams.
Another issue facing general practitioners is the requirement that they function beyond the level of a generalist. Physicians are encouraged or mandated by HMOs to handle more complex cases rather than refer them to a specialist. This enables the physician to increase his/her personal income and to reduce the HMO costs by avoiding referral to a specialized expert. Other current issues for physicians include their desire to keep patients satisfied. Patients have an ever-growing access to medical information on the Internet, but no way to evaluate its validity or relevance. Less relevant or possibly invalid information may cause inappropriate or even harmful action to be taken. Studies show that patients would prefer to receive this information directly from their personal physicians, whom they trust. However, no source for rapid retrieval of only highly relevant information currently exists for physicians or their patients.
For example, of the more than 100 million people who used the Internet in 2003, 75% sought reliable medical information. Ninety percent of physicians now access the Internet and 71% expect to increase their use in the future. However, neither physicians nor patients are satisfied with current on-line sources of medical information. Patients do not trust the validity of medical information on many Web sites, and doctors want higher quality access to more relevant journal articles for faster, more confident medical decision making.
The prior art has thus far failed to provide an electronically accessible database that is developed by seriously taking into account the relevance and appropriateness of an article for inclusion in the database, particularly in regard to multiple anticipated groups of users, and for its facile and reliable retrieval by the user. Such a database should permit rapid retrieval of a manageable number of highly relevant documents per inquiry, and eliminate articles that are only marginally relevant to the topic of interest, or to the doctor's specialty. However, no system meeting these criteria has previously been developed.