1. Field of the Invention
The present invention relates generally to systems and methods for processing drug information. More specifically, it relates to extracting data from drug information sources in a manner that supports use of the data with artificial intelligence tools.
2. Background
Over 9,500 prescription drug products have been approved by the U.S. Food and Drug Administration (FDA). Label data for each drug is prepared by the drug manufacturer and approved by the FDA. Navigating through label data to locate information relevant to a prescribing decision, e.g., appropriate selection, dosing, cross-drug effects, contraindications, and warnings, is a daunting task for physicians, pharmacists, pharmaceutical benefit managers, hospital formularies, insurance companies, and others.
Compilations of label data are available. The Physicians' Desk Reference® (PDR) compiles, in hard copy, full-length entries that reproduce the exact FDA-approved label of most drugs. Computer-searchable versions of this data are available from the publisher of the PDR®, and computer-searchable versions of similar data are available from vendors such as Multum Information Services, Inc., Denver, Colorado, and ePocrates, Inc., San Carlos, Calif.
Other drug information sources are available, such as articles from medical journals and formularies used by insurance carriers and health maintenance organizations (HMOs).
Each of these drug information sources may contain explicit and implicit information. For example, the drug label for RUBEX® doxorubicin hydrochloride for injection includes the following adverse event content in text form:
ADVERSE REACTIONS . . . Cutaneous: Reversible complete alopecia occurs in most cases. . . . Gastrointestinal: Acute nausea and vomiting occurs frequently and may be severe.
The adverse event content above contains implicit information regarding an adverse event, e.g., alopecia, and its frequency of occurrence when the drug is used, i.e., most.
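By way of illustration only, the following Python sketch suggests how the implicit information in the RUBEX text quoted above might be made explicit as structured records. The qualifier vocabulary, the AdverseEvent record, and the extract_events helper are assumptions introduced for this example; they are not drawn from any label or from a particular embodiment.

```python
import re
from dataclasses import dataclass

# Hypothetical qualifier vocabulary, assumed for illustration only.
FREQUENCY_PATTERNS = {
    "most": re.compile(r"\bin most cases\b", re.IGNORECASE),
    "frequent": re.compile(r"\bfrequently\b", re.IGNORECASE),
}


@dataclass
class AdverseEvent:
    body_system: str
    description: str
    frequency: str  # qualifier found in the text, or "unspecified"


def extract_events(label_text):
    """Split 'System: sentence.' fragments and tag each with a frequency qualifier."""
    events = []
    for system, sentence in re.findall(r"(\w+):\s*([^.]+)\.", label_text):
        frequency = "unspecified"
        for name, pattern in FREQUENCY_PATTERNS.items():
            if pattern.search(sentence):
                frequency = name
                break
        events.append(AdverseEvent(system, sentence.strip(), frequency))
    return events


text = (
    "Cutaneous: Reversible complete alopecia occurs in most cases. "
    "Gastrointestinal: Acute nausea and vomiting occurs frequently "
    "and may be severe."
)
events = extract_events(text)
```

A pattern match of this kind only captures qualifiers it has been told about; real label prose would likely need a richer vocabulary and handling of sentences that contain periods or multiple events.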
As a further example, the drug label for REMICADE™ infliximab includes the following adverse event data content in table form:
ADVERSE REACTIONS IN CROHN'S DISEASE TRIALS
                          Placebo       Infliximab
                          (n = 56)      (n = 199)
. . .
Pts with ≧1 AE            35 (62.5%)    168 (84.4%)
WHOART preferred term
Headache                  12 (21.4%)    45 (22.6%)
. . .
As another example, consider the drug label for PROZAC® fluoxetine hydrochloride. Adverse reaction information on the label is given both explicitly, in tables that contain percentages, and implicitly, by use of the words “frequent,” “infrequent,” and “rare.”
In addition to adverse event data content, drug information sources, such as labels, typically contain instances of drug rule content. Instances of drug rule content include prose containing one or more drug rules. As an example, consider the drug label for ENBREL® etanercept. Its label contains the following drug rule content:
CONTRAINDICATIONS
ENBREL should not be administered to patients with sepsis or with known hypersensitivity to ENBREL or any of its components. . . .
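For illustration, the drug rule embedded in the contraindication prose above might be expressed in a structured form that a program can evaluate. The following Python sketch is hand-encoded and simplified (it omits the clause concerning the product's constituents); the DrugRule record, its field names, and the condition strings are hypothetical choices made for this example, not a format defined by the label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugRule:
    drug: str
    action: str            # e.g. "do_not_administer"
    conditions: frozenset  # patient findings that trigger the rule


# Hand-encoded from the quoted CONTRAINDICATIONS text; field names and
# condition strings are hypothetical and chosen only for this example.
ENBREL_RULE = DrugRule(
    drug="ENBREL",
    action="do_not_administer",
    conditions=frozenset({"sepsis", "hypersensitivity to ENBREL"}),
)


def triggered_conditions(rule, patient_findings):
    """Return the rule conditions present in the patient's findings, sorted."""
    return sorted(rule.conditions & set(patient_findings))


hits = triggered_conditions(ENBREL_RULE, ["sepsis", "hypertension"])
```

Once a rule is held as data rather than prose, a reasoning tool can test it against a patient record directly, which verbatim label text does not permit.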
Typical existing approaches to managing drug information present the information in a simple manner, e.g., in a “warehouse” fashion, and do not focus on indirect or implicit information (especially adverse event data and drug rules). More specifically, existing approaches do not focus on capturing drug information in a manner amenable to use with artificial intelligence tools. Existing approaches typically focus on categorizing verbatim text without regard to the underlying logical content.
Differing terminology employed by data authors also makes conventional queries cumbersome and the results less reliable than desired. This problem is acute in the area of medical information related to substances such as drugs. Drugs and other therapeutic substances may be known by a variety of names. In addition to the chemical name, many drugs have several clinical names recognized by health care professionals in the field. It is not uncommon for a drug to have several different trade names depending on the manufacturer. This matter is further complicated by one or more functional names that may be associated with a drug or other substance. For example, an antidepressant may be identified as Prozac®, a fluoxetine, a serotonin reuptake inhibitor, or a serotonin receptor specific modulator. However, antidepressants include many other drugs, such as lithium and other catecholaminergic drugs, and there are serotonin reuptake inhibitors in addition to Prozac®. Even “standardized” terminology can differ between compilations. For example, references that can serve as sources of standard terminology include Medical Dictionary for Regulatory Activities (MedDRA™), World Health Organization Adverse Reaction Terminology (WHO-ART), and Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART) developed and maintained by the FDA's Center for Drug Evaluation and Research.
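As a simple illustration of the naming problem, the following Python sketch maps several names for the same substance to one preferred term so that a query on any of them reaches the same data. The synonym table and the normalize_drug_name helper are hypothetical and cover only the names mentioned above; treating the hydrochloride salt as the same entry is a simplifying assumption of this example.

```python
# Hypothetical synonym table: each known name maps to one preferred term.
PREFERRED_TERM = {
    "prozac": "fluoxetine",
    "fluoxetine": "fluoxetine",
    "fluoxetine hydrochloride": "fluoxetine",  # simplifying assumption
}


def normalize_drug_name(name):
    """Return the preferred term for a drug name, or None if it is unknown."""
    key = name.replace("®", "").strip().lower()
    return PREFERRED_TERM.get(key)


preferred = normalize_drug_name("Prozac®")
```

A flat table of this kind does not capture functional classes such as serotonin reuptake inhibitors, which group many drugs; those would call for a hierarchy rather than a one-to-one mapping.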
Typically, compilers of drug label data make minimal, if any, effort to improve the quality of the data. Data corruption can include extraneous non-alpha characters, noise words, misspellings, and dislocations (e.g., data that is valid for one category, erroneously entered into another, inappropriate field).
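The kinds of corruption listed above can be illustrated with a short Python sketch that strips non-alphabetic characters, drops noise words, and corrects near-miss spellings against a reference vocabulary using the standard-library difflib module. The vocabulary, the noise-word list, the similarity cutoff, and the clean_term helper are all assumptions made for this example; dislocated data is not addressed by it.

```python
import difflib
import re

# Hypothetical reference vocabulary and noise-word list for this example.
VOCABULARY = ["alopecia", "headache", "nausea", "vomiting"]
NOISE_WORDS = {"the", "of", "and", "nos"}


def clean_term(raw, cutoff=0.8):
    """Strip non-alphabetic characters and noise words, then fix near-miss spellings."""
    letters_only = re.sub(r"[^A-Za-z\s]", " ", raw).lower()
    words = [w for w in letters_only.split() if w not in NOISE_WORDS]
    cleaned = []
    for word in words:
        # Take the single closest vocabulary entry, if any is similar enough.
        match = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=cutoff)
        cleaned.append(match[0] if match else word)
    return " ".join(cleaned)


fixed = clean_term("**headche;")
```

Words with no sufficiently similar vocabulary entry are passed through unchanged, so the cutoff controls how aggressively spellings are rewritten.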
In addition, existing methods of compiling and organizing such data do not focus on the rules regarding drug safety contained within drug information sources.
Existing methods, alone or in combination, do not address improving the quality of the underlying verbatim drug information source data. Nor do existing methods address mapping this underlying data to accepted pharmaceutical community terms and hierarchies through which to direct queries. The problem of differing terminology among the disparate labels also remains unaddressed, as does the problem of data corruption in the form of misspellings and extraneous characters.
Typical existing methods of processing drug information are not focused on extracting rules or adverse event data from drug information sources. Nor do those methods address structuring these rules in a format amenable to use by inference engines, reasoning engines, or other similar sophisticated data processing techniques.
In view of the above-described deficiencies associated with data concerning drugs and other substances in medical databases, there is a need to solve these problems and to make such data more amenable to efficient use. These enhancements and benefits are described in detail hereinbelow with respect to several alternative embodiments of the present invention.