Using drugs safely is an international public health issue. Adverse drug reactions constitute one of the ten principal causes of mortality in the United States. In France, one hospitalization out of ten is the consequence of an adverse drug reaction.
According to the World Health Organization (WHO), pharmacovigilance is “the science and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem.” Pharmacovigilance is the medical domain concerned with the review, statistical analysis, and detection of adverse reactions related to drug administration. The purpose of pharmacovigilance is to identify a potential relationship between a drug and an adverse reaction, defined as a clinical manifestation unexplained by the natural evolution of the patient's clinical condition. The adverse reaction status is justified only by the logical exclusion of any other factors related to the patient, hence allowing the administered drug to be incriminated. These hypotheses must be justified by conducting research to establish a causal relationship between the drug and the adverse reaction.
Pharmacovigilance reports describe and code observations of adverse reactions after drug administration. Pharmacovigilance reports also may be referred to as drug safety reports. Pharmacovigilance reports describe the suspicion of a causal relation between the administered drugs and the observed reactions. These reports are centralized and stored in national and international databases, and are periodically reviewed manually by medical experts. Statistical scans can be run on these databases to attempt to detect or to confirm a problem related to a particular drug. These databases are very large. For example, there are 3.8 million reports in the World Health Organization international database. Moreover, the number of pharmacovigilance reports related to adverse drug reactions is increasing exponentially. Because adverse reactions are often rare events, detecting a medical problem in such databases can be very difficult.
The rapid and effective review of pharmacovigilance reports involves: 1) Accurate coding and documenting of pharmacovigilance reports based on the World Health Organization Adverse Reaction Terminology (WHO-ART) and Medical Dictionary for Regulatory Activities (MedDRA) regulatory coding terminologies, and 2) Efficient grouping of similar pharmacovigilance reports through the terminology structure (i.e., hierarchy). The coding terminology has an important role in indexing, sharing, analysis, and reporting of data during clinical trials and for post-marketing drug surveillance.
Pharmacovigilance case assessment, however, is a costly task within drug safety departments. In the last few years, the focus on drug safety terminology has increasingly shifted from coding to data retrieval and analysis for risk assessment and safety signal detection (e.g., data mining).
The terminologies currently being used for pharmacovigilance report coding are not optimal for data retrieval and case review. Indeed, these terminologies do not group pharmacovigilance reports to enable a user to efficiently review specific medical issues, particularly those medical issues not covered by the requirements of the regulatory authorities.
The computational representation of terminologies for pharmacovigilance is difficult. Most items in a pharmacovigilance report are described using medical terms. Physicians can naturally understand the meaning of terms and exchange this meaning with colleagues due to their knowledge of medicine. When information is transferred from the paper-based description of an adverse drug reaction to a pharmacovigilance database, the meaning of information may be lost in different computational operations. If data about the patient and knowledge about the adverse drug reaction can be represented, the computer system can accomplish many tasks that may enhance the ability to retrieve relevant pharmacovigilance cases and learn more about adverse drug reactions. One approach to such representation is knowledge representation, a collection of techniques drawn from computer science.
A concept is an abstract, universal mental entity that serves to designate a category or class of entities, events, or relations. A concept as a “unit of thought” may include two parts: 1) its extension, which includes all objects belonging to the concept, and 2) its intension, which includes all attributes belonging to the concept. In knowledge-based systems, these concepts are described using a formal language. A formal language is a language that can be processed by a computer to produce results whose meaning can be understood by the user.
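By way of illustration only, the distinction between intension and extension can be sketched in Python. The `Concept` class, the attribute names, and the three toy cases below are hypothetical and are not drawn from any terminology; the sketch assumes that an object belongs to a concept when it carries every attribute of the concept's intension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str
    intension: frozenset  # attributes that every member of the concept must have

    def extension(self, objects):
        """Return the names of all objects that carry every attribute of the intension."""
        return {name for name, attrs in objects.items() if self.intension <= attrs}


# Hypothetical toy objects, each described by a set of attributes.
objects = {
    "case_1": frozenset({"bleeding", "located_in_stomach", "ulcerated"}),
    "case_2": frozenset({"bleeding", "located_in_esophagus"}),
    "case_3": frozenset({"ulcerated", "located_in_stomach"}),
}

gastric_hemorrhage = Concept(
    "gastric hemorrhage", frozenset({"bleeding", "located_in_stomach"})
)
print(sorted(gastric_hemorrhage.extension(objects)))  # ['case_1']
```

In this sketch the intension is stored explicitly, while the extension is computed against whatever set of objects is at hand.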
Three generations of terminologies have been used to provide concept representation systems in medicine.
First generation terminologies are based on textual descriptions of concepts. These terminologies do not provide categorical structure, and concepts are designated by codes and strings. These traditional terminologies are paper-based, but can be electronically available to allow the storage, transmission, and retrieval of strings and codes attached to the concepts. These types of terminologies have a fixed and usually unique hierarchy devoted to a single application.
An example of a first generation terminology is the World Health Organization Adverse Reaction Terminology (WHO-ART), which may be used in pharmacovigilance for data coding and statistical analysis of data. WHO-ART was the first adverse event terminology used in pharmacovigilance and was created in 1968 by the founders of the international pharmacovigilance system. WHO-ART is maintained by the Uppsala Monitoring Centre, which is the World Health Organization's collaborating center for international drug monitoring. The main purpose of WHO-ART was to provide a standardized way to input data into early computer databases. WHO-ART has been developed for more than 30 years and serves as a basis for rational coding of adverse drug reaction terms. WHO-ART has a hierarchical structure with restricted multiple inheritance. WHO-ART has three levels in theory, but is primarily organized in only two levels, and medical terms with different levels of generalization may be siblings.
WHO-ART is organized on three hierarchical levels: (1) adverse drug reactions (ADRs) are coded using 1,857 preferred terms (PT); (2) some PTs are grouped into one of 180 high level term (HLT) classes; and (3) at the most general level, PTs are grouped according to 32 system organ classes (SOC). Most SOCs group terms according to an anatomical perspective, for example, in a “Gastrointestinal disorders” SOC, and some of the SOCs use a problem oriented approach to group terms, for example, in a “Neoplasm” SOC.
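For illustration, the three-level organization described above can be sketched as a small Python data structure in which every PT carries one or more SOCs and only some PTs carry an HLT. The class name, the PT and HLT names, and the coded reports are illustrative assumptions and are not taken from the actual WHO-ART content; only the two SOC names come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PreferredTerm:
    name: str
    socs: Tuple[str, ...]      # every PT is grouped under at least one SOC
    hlt: Optional[str] = None  # only some PTs are grouped under an HLT


# Illustrative entries only; not actual WHO-ART content.
terminology = {
    pt.name: pt
    for pt in [
        PreferredTerm("Gastric ulcer", ("Gastrointestinal disorders",), hlt="Peptic ulcer"),
        PreferredTerm("Nausea", ("Gastrointestinal disorders",)),
        PreferredTerm("Neoplasm malignant", ("Neoplasm",)),
    ]
}


def count_by_soc(coded_reports, terminology):
    """Count coded reports per SOC, the most general grouping level."""
    counts = Counter()
    for pt_name in coded_reports:
        for soc in terminology[pt_name].socs:
            counts[soc] += 1
    return counts


coded_reports = ["Nausea", "Gastric ulcer", "Nausea", "Neoplasm malignant"]
counts = count_by_soc(coded_reports, terminology)
print(dict(counts))
```

Because the HLT is optional in this sketch, a PT without an HLT sits directly under its SOC, which is one way terms with different levels of generalization can end up as siblings.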
MedDRA is another example of a first generation terminology. MedDRA defines a clinically validated international medical terminology used by regulatory authorities and the regulated biopharmaceutical industry throughout the entire regulatory process (from pre-marketing to post-marketing activities) for data entry, retrieval, evaluation, and presentation. In addition, MedDRA provides the adverse event classification dictionary endorsed by the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). MedDRA is used in the United States, European Union, and Japan, with its use currently mandated in Europe and Japan for safety reporting.
MedDRA coding terminology is hierarchical and multiaxial in nature. A data retrieval section of MedDRA is an associative grouping of terms. The different levels of the terminology in MedDRA from highest (broadest concept) to lowest (most specific) are the following: (1) System Organ Class (SOC); (2) High Level Group Term (HLGT); (3) High Level Term (HLT); (4) Preferred Term (PT); and (5) Lowest Level Term (LLT). MedDRA has an organization similar to that of WHO-ART, with one difference being that MedDRA includes an additional grouping level, the HLGT, thus allowing MedDRA to include a greater number of groupings than WHO-ART.
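As a minimal sketch, the five MedDRA levels can be represented as a chain of parent links that is walked from the most specific term up to the SOC. The `parent` table below is an assumption made for illustration: the LLT and HLGT names are placeholders, and each term is given a single parent, so the multiaxial links of the real terminology are not modeled.

```python
# Levels listed from most specific to broadest.
LEVELS = ["LLT", "PT", "HLT", "HLGT", "SOC"]

# child -> parent one level up (illustrative names, one parent per term).
parent = {
    "bleeding gastric ulcer": "gastric ulcer hemorrhage",                           # LLT -> PT
    "gastric ulcer hemorrhage": "gastric ulcer and perforations",                   # PT -> HLT
    "gastric ulcer and perforations": "gastrointestinal ulceration and perforation",  # HLT -> HLGT
    "gastrointestinal ulceration and perforation": "gastrointestinal disorders",    # HLGT -> SOC
}


def path_to_soc(term):
    """Follow parent links upward until a term without a parent (the SOC) is reached."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


path = path_to_soc("bleeding gastric ulcer")
# The labels line up with LEVELS only because the walk starts at an LLT.
for level, name in zip(LEVELS, path):
    print(f"{level:>4}: {name}")
```

Grouping reports at a given level then amounts to replacing each coded term by the entry of its path at that level.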
Second generation terminologies are compositional systems which are built using a categorical structure and a cross thesaurus. The categorical structure is composed of a set of meta-term descriptors to describe a concept in a domain of expertise. For example, <morphology>, <function>, <topography>, and <etiology> are useful descriptors for the categorical structure of adverse drug reactions. A morphology descriptor may contain terms used to describe structural changes in the body. An example of a morphology is “inflammation.” A function descriptor may contain terms used to describe both normal and abnormal functions of the body. An example of a function is “tachycardia.” A topography descriptor may contain detailed anatomic terms. An example of a topography is “upper limb.” An etiology descriptor may contain terms that deal with the causes or origin of disease, and the factors which produce or predispose toward a certain disease or disorder. The etiology descriptor may include, for example, living agents such as bacteria, viruses, or parasites.
A term from a first generation system (e.g., WHO-ART, MedDRA) may benefit from a description using a second generation terminology. For example, the MedDRA term “Gastric ulcer hemorrhage,” which is an example of a “molecular” terminology phrase, may be dissected into basic units. Terms that cannot be further dissected are “atomic” terms, e.g., ulcer, stomach, hemorrhage. The cross thesaurus is a multiaxial thesaurus that provides the atomic terms to enter descriptors from the categorical structure. For example, a Gastric ulcer hemorrhage could be described by the following dissection:
Gastric ulcer hemorrhage is an adverse drug reaction that:
    has_morphology: hemorrhage AND ulcer
    has_finding_site: stomach
Relationships such as “has_morphology” and “has_finding_site” are semantic links.
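The dissection above may be sketched, under simplifying assumptions, as a Python record whose fields are the semantic links and whose values are sets of atomic terms. The `Dissection` class, the `find` helper, and the second entry ("Esophageal hemorrhage") are hypothetical and serve only to show how a term becomes retrievable through any of its atomic terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dissection:
    term: str
    has_morphology: frozenset = frozenset()
    has_finding_site: frozenset = frozenset()


dissections = [
    # Dissection given in the text.
    Dissection(
        term="Gastric ulcer hemorrhage",
        has_morphology=frozenset({"hemorrhage", "ulcer"}),
        has_finding_site=frozenset({"stomach"}),
    ),
    # Hypothetical second entry, added only for contrast.
    Dissection(
        term="Esophageal hemorrhage",
        has_morphology=frozenset({"hemorrhage"}),
        has_finding_site=frozenset({"esophagus"}),
    ),
]


def find(dissections, link, atom):
    """Return the terms whose semantic link `link` contains the atomic term `atom`."""
    return [d.term for d in dissections if atom in getattr(d, link)]


print(find(dissections, "has_morphology", "hemorrhage"))
print(find(dissections, "has_morphology", "ulcer"))
print(find(dissections, "has_finding_site", "stomach"))
```

With this representation, "Gastric ulcer hemorrhage" is returned both by a query on the morphology "hemorrhage" and by a query on the morphology "ulcer", without having to be filed under a single grouping.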
In third generation terminologies, terms are described using a logical language and the position of terms in the hierarchy is found by computing subsumption relations. For example, “gastric ulcer hemorrhage” is a kind of “gastric ulcer.” In this setting, a concept is defined by a label and a formal definition. The formal definition is composed of a number of phrases expressed in a formal logical language, with the aim of assigning one and only one meaning to the label (i.e., disambiguating it) among the various meanings that a person could give it.
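The idea of computing a term's position from its definition can be illustrated with a deliberately simplified structural check. This is a sketch, not a description logic reasoner: it assumes that each concept is defined by sets of atomic fillers per semantic link, that a general concept subsumes a specific one when each of its fillers also appears in the specific definition, and that no two concepts share an identical definition. The four definitions below are hypothetical.

```python
# Hypothetical formal definitions: semantic link -> set of atomic fillers.
definitions = {
    "hemorrhage": {"has_morphology": {"hemorrhage"}},
    "gastric hemorrhage": {
        "has_morphology": {"hemorrhage"},
        "has_finding_site": {"stomach"},
    },
    "gastric ulcer": {
        "has_morphology": {"ulcer"},
        "has_finding_site": {"stomach"},
    },
    "gastric ulcer hemorrhage": {
        "has_morphology": {"hemorrhage", "ulcer"},
        "has_finding_site": {"stomach"},
    },
}


def subsumes(general, specific):
    """True if every filler required by `general` is also present in `specific`."""
    g, s = definitions[general], definitions[specific]
    return all(fillers <= s.get(link, set()) for link, fillers in g.items())


def direct_parents(concept):
    """Subsumers of `concept` that do not subsume another subsumer of `concept`."""
    ancestors = {c for c in definitions if c != concept and subsumes(c, concept)}
    return {
        a for a in ancestors
        if not any(b != a and subsumes(a, b) for b in ancestors)
    }


print(subsumes("gastric ulcer", "gastric ulcer hemorrhage"))  # True
print(sorted(direct_parents("gastric ulcer hemorrhage")))
```

In this toy hierarchy, "gastric ulcer hemorrhage" is placed under both "gastric ulcer" and "gastric hemorrhage" purely from the definitions, while "hemorrhage" is reached indirectly through "gastric hemorrhage".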
An example of a third generation terminology is Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT). SNOMED CT is a systematically organized computer processable collection of medical terminology covering most areas of clinical information, such as diseases, findings, procedures, microorganisms, pharmaceuticals, etc. SNOMED CT allows a consistent way to index, store, retrieve, and aggregate clinical data across specialties and sites of care. SNOMED CT also helps organize the content of medical records, reducing the variability in the way data is captured, encoded, and used for clinical care of patients and research. The design of SNOMED CT is based on “description logic.”
SNOMED CT is emerging as a standard terminology for data coding in clinical domains. SNOMED CT may be used to represent signs, symptoms, diseases, and laboratory examination results. SNOMED CT is an attempt to provide a formal ontology in the medical field. An ontology is a formal system whose purpose is to represent knowledge in a specific domain by means of basic elements called concepts, which are defined and organized in relation to one another.
Conventionally, pharmacovigilance reports stored in pharmacovigilance databases are accessed using a lexical search (i.e., search by character string, key words, and synonyms) or by navigation within the groupings of the terminologies (e.g., WHO-ART, MedDRA) used for the coding. This approach is problematic because of organizational problems in the WHO-ART and MedDRA terminologies. In particular, grouping of preferred terms (PT) by means of high level terms (HLT) is not always carried out in a systematic and consistent way.
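A lexical search of the kind mentioned above can be sketched as a case-insensitive substring match extended by a synonym table. The function name, the report identifiers, the coded terms, and the synonym table are all hypothetical; the sketch does not reproduce the search facility of any particular pharmacovigilance database.

```python
def lexical_search(query, reports, synonyms=None):
    """Return ids of reports whose coded term contains the query or one of its synonyms."""
    key = query.lower()
    needles = {key} | {s.lower() for s in (synonyms or {}).get(key, ())}
    return [rid for rid, term in reports if any(n in term.lower() for n in needles)]


# Hypothetical coded reports: (report id, coded term).
reports = [
    (1, "Gastric ulcer hemorrhage"),
    (2, "Hematemesis"),
    (3, "Gastric haemorrhage"),
    (4, "Nausea"),
]

print(lexical_search("hemorrhage", reports))  # [1]
print(lexical_search("hemorrhage", reports, {"hemorrhage": ["haemorrhage"]}))  # [1, 3]
```

In this sketch, report 2 is returned only if "hematemesis" is added to the synonym table by hand, since the match relies on shared character strings rather than on the meaning of the terms.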
For example, WHO-ART is a mix of diagnostic and descriptive terms. Pharmacovigilance reports often contain a confusing combination of diagnostic and descriptive terms, because a manifestation of a disease recorded with a descriptive term can also be covered by a diagnostic term. The hierarchical organization of WHO-ART also may hide characteristic details behind higher level terms and may split a general pattern across different low level terms (i.e., descriptive terms). This is aggravated by some inconsistencies in WHO-ART (e.g., myocardial infarction and thrombosis coronary were two different terms in different SOCs without any link).
A problem with the MedDRA terminology is that MedDRA terms can be linked to only one HLT inside the same SOC. For example, “gastric ulcer hemorrhage” belongs to the “gastric ulcer and perforations” HLT, but not to the “gastric and esophageal hemorrhage” HLT. Therefore, there is a missing relationship between the “gastric ulcer hemorrhage” PT and the “gastric and esophageal hemorrhage” HLT due to the structural design of MedDRA. In another example, the “gastric ulcer and perforations” HLT is linked to the “vascular hemorrhagic disorder” HLGT in the “vascular disorder” SOC. This may help the user to find additional gastric hemorrhages, but the “vascular hemorrhagic disorder” HLGT may contain other hemorrhages that are not located in the stomach.
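Both retrieval problems described above can be reproduced on a toy table. The groupings named in the text are used as given; the "esophageal hemorrhage" and "cerebral hemorrhage" PTs, the "central nervous system hemorrhages" HLT, and their links are hypothetical additions made only so that each query has something to return.

```python
# PT -> its single HLT.
pt_to_hlt = {
    "gastric ulcer hemorrhage": "gastric ulcer and perforations",
    "esophageal hemorrhage": "gastric and esophageal hemorrhage",  # hypothetical placement
    "cerebral hemorrhage": "central nervous system hemorrhages",   # hypothetical PT and HLT
}

# HLT -> HLGT within the "vascular disorder" SOC.
hlt_to_hlgt = {
    "gastric ulcer and perforations": "vascular hemorrhagic disorder",
    "central nervous system hemorrhages": "vascular hemorrhagic disorder",  # hypothetical link
}


def pts_under_hlt(hlt):
    """PTs reached by navigating down from an HLT."""
    return sorted(pt for pt, h in pt_to_hlt.items() if h == hlt)


def pts_under_hlgt(hlgt):
    """PTs reached by navigating down from an HLGT through its HLTs."""
    return sorted(pt for pt, h in pt_to_hlt.items() if hlt_to_hlgt.get(h) == hlgt)


# Too narrow: the gastric ulcer hemorrhage PT is not found under the hemorrhage HLT.
print(pts_under_hlt("gastric and esophageal hemorrhage"))
# Too broad: the HLGT also returns a hemorrhage that is not located in the stomach.
print(pts_under_hlgt("vascular hemorrhagic disorder"))
```

The first query misses a relevant term and the second returns an irrelevant one, which is the trade-off a reviewer faces when relying on navigation within the groupings alone.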
These and other problems exist with conventional systems.
These and other embodiments and advantages will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the various exemplary embodiments.