It is well known in many fields and professions, such as medicine, for practitioners to use reference books and other printed publications to assist them in carrying out their duties. For example, doctors often consult medical references when diagnosing patients. These medical reference materials typically provide a variety of information, such as the names of established diagnoses, radiological images and/or medical illustrations, imaging findings, differential diagnoses, typical pathologies, common clinical issues, and a host of other helpful content. For hundreds of years, hardbound reference books have been the dominant source of medical information. However, with the advent of electronic data storage and transfer techniques, electronic libraries are becoming widely used.
More particularly, the recent introduction of the Internet and the world wide web (“WWW”) to the world of communication and media has made it far easier to disseminate reference materials and related information. Extensible markup language (“XML”) was developed under the auspices of the World Wide Web Consortium (“W3C®”), an international consortium of companies involved with the Internet and the WWW. XML is a flexible and relatively simple text format that was originally designed specifically for electronic publishing. XML has become a widely used medium for the exchange of data on the WWW. Some examples of implementations of XML, particularly as they relate to the dissemination of medical reference materials, are outlined generally below.
Using the WWW as an example of an immense heterogeneous database, it stands to reason that the benefits of XML for describing data could be applied on a smaller scale to a closed system. In fact, the broad benefits of XML markup have been recognized for improving the efficacy of databases, and traditional database vendors (such as Oracle®, IBM®, and Microsoft®) have fast-tracked XML implementation modules for their traditional databases and have designed, or are designing, XML-native databases. Already, a cottage industry has arisen for XML-native databases that do not require the construction and deconstruction steps of traditional database programs.
XML allows designers to create their own customized tag elements, enabling the definition, transmission, validation, and interpretation of data between applications, and it has been a boon to the business community, particularly publishers. It has had a profound impact on a variety of applications, ranging from inter-bank transactions, to online catalog maintenance, to the updating and modification of customer service records. For the first time, XML has enabled efficient description of heterogeneous data sources, allowing for computer-to-computer exchange between often-discordant database environments.
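As a minimal sketch of what customized tag elements look like in practice, the following Python fragment parses a small XML record with the standard-library `xml.etree.ElementTree` module. The tag names (`diagnosis`, `imagingFinding`, `differential`), the attribute names, and all values are hypothetical placeholders chosen for illustration; they are not taken from any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: every tag name, attribute, and value below is an
# invented placeholder, not part of any real medical schema.
RECORD = """
<diagnosis id="dx-001">
  <name>Example Diagnosis A</name>
  <imagingFinding modality="radiograph">Example finding 1</imagingFinding>
  <differential>Example Diagnosis B</differential>
  <differential>Example Diagnosis C</differential>
</diagnosis>
"""


def parse_diagnosis(xml_text):
    """Turn one <diagnosis> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        # Each finding keeps its modality attribute alongside its text.
        "findings": [
            (f.get("modality"), f.text) for f in root.findall("imagingFinding")
        ],
        "differentials": [d.text for d in root.findall("differential")],
    }


record = parse_diagnosis(RECORD)
```

Because the tags name the data they enclose, a receiving application can interpret the record without knowing how the sending application stores it internally.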
In relation to the publishing of reference materials, such as medical texts or treatises, XML has been utilized as the supporting language to a variety of sources, including: UMLS® Metathesaurus®, SPECIALIST Lexicon, and UMLS® Semantic Network.
UMLS® Metathesaurus® (the “Metathesaurus®”) currently contains content from over 60 biomedical vocabularies and classifications. It preserves the names, meanings, hierarchical contexts, attributes, and inter-term relationships present in its source vocabularies, adds certain basic information to each concept, and establishes new relationships between terms from different source vocabularies. The Metathesaurus® supplies information that computer programs can use to interpret user inquiries, interact with users to refine their questions, identify which databases contain information relevant to particular inquiries, and convert the users' terms into the vocabulary used by relevant information sources. The Metathesaurus® is intended primarily for use by system developers, but can also be a useful reference tool for database builders, librarians, and other information professionals.
UMLS® SPECIALIST Lexicon (the “SPECIALIST”) is a general English lexicon intended for use by natural language processing systems. Each lexicon entry for each word or term records the syntactic, morphological, and orthographic information needed by the SPECIALIST natural language processing system. The lexical programs generate a range of variations for English lexical items, which should be useful for recognizing lexical variation in biomedical terminologies and texts, and consist of several different modules that may be combined in a variety of ways. Several lexical databases that may be useful for developers are available and include a file of known derivational variants, a file of closely related terms that mean the same thing but may have a different syntactic category, a file of spelling alternations, and a file of neoclassical combining forms with their meanings.
UMLS® Semantic Network (the “Semantic Network”) defines 134 semantic subtypes, connected by 54 links, that provide a consistent categorization of all concepts within the Metathesaurus®. While all information about specific concepts is found in the Metathesaurus®, the Semantic Network provides information about the basic semantic types that are assigned to these concepts, and it defines the relationships that hold between the semantic types. Thus, the Semantic Network serves as an authority for the semantic types that are assigned to concepts in the Metathesaurus®. It defines these types, both with textual descriptions and by means of the information inherent in its hierarchies.
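The division of labor between concepts, semantic types, and links between types can be illustrated with a small Python sketch. The data below is toy data written for illustration only; it is not an extract of the UMLS® files, and the rule that a link declared on a broader type also applies to its narrower types is an assumption of this sketch rather than a statement about the actual product.

```python
# Toy hierarchy of semantic types: child -> parent (None marks a root).
# Illustrative only; not copied from the UMLS distribution.
PARENT = {
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": None,
    "Pharmacologic Substance": "Chemical",
    "Chemical": None,
}

# Links are declared between semantic types, not between concepts.
TYPE_LINKS = {("Chemical", "treats", "Pathologic Function")}

# Each concept is assigned a semantic type.
CONCEPT_TYPE = {
    "Ascorbic Acid": "Pharmacologic Substance",
    "Scurvy": "Disease or Syndrome",
}


def ancestors(sem_type):
    """Return the type itself followed by its broader types, nearest first."""
    chain = []
    while sem_type is not None:
        chain.append(sem_type)
        sem_type = PARENT[sem_type]
    return chain


def link_allowed(subject, relation, obj):
    """Check whether the types of two concepts permit the given relation.

    Assumption of this sketch: a link declared on a broader type is
    inherited by all of its narrower types.
    """
    for s in ancestors(CONCEPT_TYPE[subject]):
        for o in ancestors(CONCEPT_TYPE[obj]):
            if (s, relation, o) in TYPE_LINKS:
                return True
    return False
```

Here `link_allowed("Ascorbic Acid", "treats", "Scurvy")` succeeds because the link is declared between the broader types of both concepts, while the reverse direction is rejected.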
System developers can use these UMLS® products free of charge after applying for a UMLS® license. Applications of UMLS® can be found in systems focused on patient data, digital libraries, Web and bibliographic retrieval, natural language processing, and decision support.
MeSH, yet another known lexical product, provides a simple layer: it consists of a thesaurus whose terms, or subject headings, are arranged in both an alphabetic and a hierarchical structure. It contains more than 19,000 main headings as well as 103,500 headings called Supplementary Concept Records within a separate chemical thesaurus. There are also thousands of cross-references that assist in finding the most appropriate MeSH heading (e.g., “Vitamin C see Ascorbic Acid”). MeSH is free to users, and an electronic form can easily be downloaded.
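A thesaurus of this kind, with headings, a hierarchy, and “see” cross-references, can be sketched in a few lines of Python. The miniature data set below is hypothetical: the tree codes (`X01`, `X01.100`) are invented placeholders and do not correspond to real MeSH tree numbers, and the function names are chosen only for this illustration.

```python
# Hypothetical miniature thesaurus. The dotted codes are invented
# placeholders; a code's parent is the code with its last segment removed.
HEADINGS = {
    "Ascorbic Acid": "X01.100",
    "Vitamins": "X01",
}

# "See" cross-references: entry term (lowercased) -> preferred heading.
SEE = {"vitamin c": "Ascorbic Acid"}


def resolve(term):
    """Map a user's term to a preferred heading, or None if unknown."""
    key = term.strip().lower()
    for heading in HEADINGS:
        if heading.lower() == key:
            return heading
    return SEE.get(key)


def alphabetic():
    """Alphabetic view of the headings."""
    return sorted(HEADINGS)


def broader(heading):
    """Hierarchical view: return the parent heading, or None at the top."""
    parent_code = HEADINGS[heading].rpartition(".")[0]
    for name, code in HEADINGS.items():
        if code == parent_code:
            return name
    return None
```

With this data, `resolve("Vitamin C")` follows the cross-reference to `"Ascorbic Acid"`, and `broader("Ascorbic Acid")` walks one level up the hierarchy to `"Vitamins"`.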
Most of the existing work on developing the “semantic web” has focused on finding ways to express relationships between existing resources (i.e., content). This has led to the development of the Resource Description Framework (“RDF”) and the RDF Schema (“RDFS”) as forms for expressing relationships and semantic metadata. RDF is a general framework used for describing metadata and provides interoperability between applications that exchange machine-understandable information. RDFS is a specification that describes how to use RDF to describe RDF vocabularies and defines a basic vocabulary for this purpose, as well as conventions that can be used by semantic applications to support a more sophisticated RDF vocabulary description. A further development has been the DARPA Agent Markup Language (“DAML”) and the Ontology Inference Layer (“OIL”) specifications, which are currently being combined to produce DAML+OIL. DAML+OIL is a semantic markup language for Web resources that builds upon the earlier W3C® standards of RDF and RDFS, extending these languages with richer modeling primitives allowing more complex objects and operations to be constructed.
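The idea of expressing relationships as machine-understandable statements can be sketched without any RDF library by modeling each statement as a (subject, predicate, object) triple. In the Python fragment below, the `ex:` resources and the title are hypothetical, and the inference routine implements only one RDFS rule, namely that a resource typed as a class is also an instance of that class's superclasses; it is an illustration under those assumptions, not a full RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Each statement is a (subject, predicate, object) triple.
# All "ex:" names are hypothetical and exist only for this example.
triples = {
    ("ex:Treatise", SUBCLASS, "ex:ReferenceWork"),
    ("ex:ReferenceWork", SUBCLASS, "ex:Publication"),
    ("ex:book42", RDF_TYPE, "ex:Treatise"),
    ("ex:book42", "ex:title", "Example Radiology Treatise"),
}


def infer_types(graph):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for c, q, parent in list(inferred):
                is_new = (s, RDF_TYPE, parent) not in inferred
                if q == SUBCLASS and c == o and is_new:
                    inferred.add((s, RDF_TYPE, parent))
                    changed = True
    return inferred


def types_of(graph, resource):
    """List every class the resource is stated or inferred to belong to."""
    return sorted(o for s, p, o in graph if s == resource and p == RDF_TYPE)


all_types = types_of(infer_types(triples), "ex:book42")
```

The schema triples (`rdfs:subClassOf`) play the role that RDFS plays relative to RDF: they describe the vocabulary itself, which lets an application derive statements that were never written down explicitly.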
Despite the reference products that are currently available, as generally outlined above, there still exist some unique and challenging problems with the current state of the art, as outlined below.