Medical documents contain a wealth of biomedical information, but unfortunately 85% of this information is in free text and not accessible for data mining or analysis without expensive effort to read and code these documents. Although natural language programs have achieved a limited ability to extract and code medical findings, the capability to semantically process all the free text in a medical document has never been achieved in a large scale medical domain.
Health professionals increasingly believe the adoption of electronic medical records (EMR) will improve medical care by fostering the sharing of patient information. The federal government has taken a leadership role in this area through the endorsement of standards for EMR interoperability. One component of EMR interoperability is a lexicon, which is a dictionary of standard terms, each assigned a unique identifier. The federal government has endorsed the following standard lexicons for EMR data exchange: (1) the College of American Pathologists Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for laboratory result contents, non-laboratory interventions and procedures, anatomy, diagnosis and problems, and nursing; (2) Health Level 7 (HL7) for demographic information, units of measure, immunizations, and clinical encounters; (3) Laboratory Logical Observation Identifier Name Codes (LOINC) for laboratory test orders and drug label section headers; and (4) the Health Insurance Portability and Accountability Act (HIPAA) transactions and code sets for electronic exchange of health-related information in billing and administrative functions. Other standard code sets have been devised or are being created, which will further facilitate the transfer of electronic health information.
While the adoption of standards is desirable and necessary for medical information exchange, it raises challenges that were much smaller problems in the world of “paper”-based records. Under the old paradigm there was a limited expectation of receiving codified information. The government and insurance companies received data codified against two standard code sets in order to pay claims: (1) Current Procedural Terminology (CPT), published by the American Medical Association, which describes services rendered by physicians and consists of 8,568 codes and descriptors, and (2) International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), published by the Centers for Medicare and Medicaid Services (a federal agency), which describes diagnoses and procedures and consists of approximately 17,000 codes.
Currently, health information coders, using narrative information from diagnoses and procedures provided by physicians and other recognized practitioners, assign codes to medical reports using these two standard code sets. Coding is necessary for reimbursement of patient services, and coding errors can lead to denial of payment. Because the government puts a great deal of emphasis on thorough and correct coding, and given that even these relatively small code sets are complicated to use, a large consulting and software industry supports health information coders.
The government's push to promote robust but more complex coding standards will require new technology to assist coders. Health and Human Services Secretary Thompson announced that SNOMED CT would be free to use by all U.S. health providers under a license agreement between the federal government and the College of American Pathologists. The National Library of Medicine paid for this nationwide license because it believed the SNOMED CT lexicon would serve as a key clinical language standard for the national health information infrastructure; however, there are few medical coders who can code an entire medical document against SNOMED CT.
The SNOMED CT system is several orders of magnitude more complex to use than CPT or ICD-9-CM. As of early 2006, there were 368,000 unique terms in SNOMED CT. Unlike CPT or ICD-9-CM, SNOMED codes can also define relationships between concepts. For example, the concept of fracture of shaft of tibia can be qualified by laterality (laterality=right) and by fracture type (fracture type=spiral). SNOMED calls this post-coordination. There are three types of post-coordination: refinement, qualification, and combination. One problem with post-coordination is the opportunity to designate multiple valid sequences of codes to describe the same clinical concept. In the above example, if at a future time SNOMED creates two new “pre-coordinated” concepts, “fracture of shaft of the right tibia” and “fracture of shaft of the left tibia”, a coder may use either the more specific code, “fracture of shaft of the right tibia”, or two codes, “fracture of shaft of tibia” qualified by “right”. This is a simple example, because the clinical concept is relatively straightforward. However, as the complexity of clinical concepts increases, the number of valid SNOMED code sequences increases. This is undesirable for interoperability, data mining, and decision support. Yet, there are no good automated tools that fully address this problem.
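The equivalence problem described above can be illustrated with a minimal sketch. It assumes a hypothetical table that records, for each pre-coordinated concept, the base concept and qualifiers it implies; the identifiers below are placeholders for illustration and are not real SNOMED CT codes. Under that assumption, a pre-coordinated code and the corresponding post-coordinated sequence reduce to the same normalized form and can be compared directly.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A normalized expression: a base concept plus an unordered set of
# (attribute, value) qualifiers.
Expression = Tuple[str, FrozenSet[Tuple[str, str]]]

# Hypothetical table (an assumption of this sketch, assumed to be acyclic):
# pre-coordinated concept -> the base concept and qualifiers it implies.
# These identifiers are placeholders, not real SNOMED CT codes.
PRECOORDINATED: Dict[str, Expression] = {
    "FRACTURE_SHAFT_RIGHT_TIBIA": (
        "FRACTURE_SHAFT_TIBIA",
        frozenset({("laterality", "right")}),
    ),
    "FRACTURE_SHAFT_LEFT_TIBIA": (
        "FRACTURE_SHAFT_TIBIA",
        frozenset({("laterality", "left")}),
    ),
}


def normalize(focus: str, qualifiers: Optional[Dict[str, str]] = None) -> Expression:
    """Expand pre-coordinated concepts so equivalent code sequences compare equal."""
    merged = dict(qualifiers or {})
    while focus in PRECOORDINATED:
        base, implied = PRECOORDINATED[focus]
        for attribute, value in implied:
            # Keep an explicit qualifier only if it agrees with the implied one.
            if merged.setdefault(attribute, value) != value:
                raise ValueError(f"conflicting values for {attribute!r}")
        focus = base
    return focus, frozenset(merged.items())


pre = normalize("FRACTURE_SHAFT_RIGHT_TIBIA", {"fracture type": "spiral"})
post = normalize(
    "FRACTURE_SHAFT_TIBIA", {"fracture type": "spiral", "laterality": "right"}
)
print(pre == post)
```

The sketch covers only the simple qualification case from the example. It does not attempt refinement or combination, and it depends on the expansion table being complete, which is itself part of the problem described above.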
Autocoders are software utilities that have been used to perform coding of medical records. Typical autocoders use a multi-step process consisting of word based tokenization, normalization, stemming, and token matching of medical expressions to concepts in a standard lexicon. Generally the best match is considered to be the one with the greatest number of shared tokens between the target phrase and the standard lexicon. Unfortunately, this approach is poorly suited to codifying the meaning of sentences that contain modifiers, qualifying clauses, or other implicit information. Simply put, the semantics of a sentence is more complex than the additive sum of its words.
Semantics is a complex field which considers at least two components of meaning, intensional and extensional. The physical objects to which an expression refers are the expression's extensional component, and the characteristic features used to identify those objects are its intensional component [CAMPBELL K E, OLIVER D, SPACKMAN, K A, SHORTLIFFE, E H. Representing Thoughts, Words, and Things in the UMLS. JAMIA. 1998; 5:421-431.]. Understanding the expression's intensional and extensional components is essential to semantic representation. Only when the entire context is fully considered can synonymy be decided. For example, in the phrase, “Semi-Upright Portable film of the chest”, an autocoder would match the token ‘Semi-upright’ to the SNOMED concept ‘Semi-erect body position’. However, if the autocoder made this same match for the phrase, “A 45 degree semi-upright venographic table”, it would be in error. The error is the result of failing to understand the intensional component of this phrase.
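The token-matching behavior of a typical autocoder, and its failure on the “semi-upright” example, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular autocoder: the toy lexicon, the synonym table, the stopword list, and the crude plural-stripping stemmer are all hypothetical, and only the concept name ‘Semi-erect body position’ is taken from the example above.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Toy lexicon (an assumption of this sketch): concept name -> descriptive phrase.
# Only "Semi-erect body position" comes from the example in the text.
LEXICON: Dict[str, str] = {
    "Semi-erect body position": "semi-erect body position",
    "Table (placeholder entry)": "table",
}
SYNONYMS: Dict[str, str] = {"upright": "erect"}  # hypothetical normalization table
STOPWORDS: Set[str] = {"a", "an", "of", "the", "in", "and"}


def stem(token: str) -> str:
    """Crude stand-in for a real stemmer: strip a simple plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def significant_tokens(text: str) -> Set[str]:
    """Word-based tokenization, then normalization, then stemming."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {stem(SYNONYMS.get(w, w)) for w in words if w not in STOPWORDS}


def best_match(phrase: str) -> Optional[Tuple[str, int, float]]:
    """Return (concept, shared token count, fraction of phrase tokens matched)."""
    target = significant_tokens(phrase)
    best: Optional[Tuple[str, int, float]] = None
    for concept, description in LEXICON.items():
        shared = len(target & significant_tokens(description))
        if shared and (best is None or shared > best[1]):
            best = (concept, shared, shared / len(target))
    return best


print(best_match("Semi-Upright Portable film of the chest"))
print(best_match("A 45 degree semi-upright venographic table"))
```

Both phrases share the tokens “semi” and “erect” with the same lexicon entry, so the sketch returns ‘Semi-erect body position’ in both cases. Nothing in the token counts distinguishes a patient's position from a property of a table.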
Accurate coding critically depends on synonymy. Names that have the same meaning should refer to the same concept. Unfortunately, rarely do two names have exactly the same meaning, because their semantics is often fuzzy. Names may closely overlap in meaning, but are not equivalent in all contexts. In some cases they may be practically synonymous, although they are not logically synonymous. For example, a physician may write, “There are diffuse pulmonary infiltrates.” SNOMED would represent this as a post-coordinated sequence of two concepts: (1) 409609008—Radiologic infiltrate of the lung (disorder) and (2) 19648000—Diffuse (qualifier). However, a pulmonary infiltrate is a pathologic process independent of the means used to detect it. Nevertheless, because a chest x-ray is a common diagnostic tool for detecting pulmonary infiltrates, this sequence of SNOMED codes is close enough to the semantic meaning of this sentence. A medical expert is in the best position to judge whether this code sequence is “close enough”. For high precision matching, human judgments are required to accurately determine the semantic equivalence between a sentence expression and concepts in a standard lexicon. Even experts may have trouble agreeing on the synonymy of clinical expressions [KIN WAH FUNG, K W, HOLE, W T, NELSON, S J, SRINIVASAN, S et. al. Integrating SNOMED CT into the UMLS: An Exploration of Different Views of Synonymy and Quality of Editing. J Am Med Inform Assoc. 2005; 12:486-494.] Therefore, even the best autocoders make mistakes, especially when they must return complex post-coordinated code sequences, because they lack domain knowledge. Current coding applications do not adequately address the problem of semantic equivalence.
An evaluation of two popular SNOMED autocoders was performed by the Veterans Administration Hospital and the Utah Department of Medical Informatics, Salt Lake City [Penz J F, Brown S H, Carter J S, Elkin P L, Nguyen V N, Sims S A, Lincoln M J. Evaluation of SNOMED coverage of Veterans Health Administration terms. Medinfo. 2004; 11 (Pt 1): 540-4]. The evaluators were interested only in how accurately the SNOMED autocoders coded the pathologic diagnosis, not every sentence in the report. Yet even for this limited task, the two autocoders completely agreed only 12% of the time and partially agreed 82% of the time. Common reasons for partial matches were spelling errors and abbreviations in the target phrase. Expert review of the autocoders' accuracy showed high precision (88%) only in those cases in which the two SNOMED autocoders completely agreed. In the case of partial agreement, precision fell to 50%. Additionally, neither SNOMED autocoder could assign a code to 6% of the diagnoses.
Consider the following sentence from a radiology report, “There is a right internal jugular line in place with the tip in the superior vena cava.” The best sequence of SNOMED codes consists of: 405425001—Catheterization of internal jugular vein (procedure), 24028007—Right (qualifier value), 1872000—In (attribute), and 48345004 Superior Vena Cava Structure (body structure). Note that the semantics of “catheterization of the internal jugular vein” is not logically equivalent to “internal jugular line in place”, but is closely related. Likewise the attribute, “in”, refers to the “catheter tip”, and not the entire catheter, yet, again there is a close relationship. Although an autocoder equipped with a very large synonym table might get some of these codes correct, the autocoder would lack the domain knowledge and judgment to determine the overall quality of this match. Autocoders do not have the ability to rate the quality of their semantic matches except through some arbitrary scoring algorithm. For example, an autocoder might assign a score of 0.8 if it could match 4 of the 5 significant words in the target phrase. This may have little relevance to the actual match quality as determined by a human reviewer, yet measuring code quality is vital to the coding industry.
Dart and Rawlins [U.S. Pat. No. 6,529,876] taught a method for generating Evaluation and Management (E&M) codes using electronic templates to gather the required information in a standardized fashion. However, their approach requires that data be entered in a standardized form. Similar systems require that data be input in predefined fields. These systems are unable to process non-standard input data, such as free text. They place a significant data entry burden on the healthcare provider.
Cousineau et al. [USPTO application 20060020493] discuss a method to “correct” non-standard, or free text, input data using a syntax processing block and a knowledge ontology to generate one or more healthcare billing codes. The details of using natural language processing to generate the “corrected” data file are not disclosed. The problem of semantic equivalence is not addressed.
Boone et al. [USPTO application 20040243545, 20040220895] disclosed a system for automated coding. Their system uses a classification engine which depends on statistical models developed from training data. The statistical models vary with document type. Rules are added to perform additional filtering. Golden et al. [USPTO application 20030018470] teach a method for coding free text data using Hidden Markov Models and the Viterbi algorithm. However, statistical approaches run into problems similar to those of autocoders, because there are no strong methods to guarantee or even measure semantic equivalence.
Lau et al. [USPTO application 20020198739] teach a system for mapping and matching laboratory results and tests. Their approach is dictionary based and does not perform semantic analysis at the sentence level.
A more sophisticated approach was disclosed by Heinze and Morsch [U.S. Pat. No. 6,915,254]. Their system employs a parser using syntactical and semantic rules that allow for more accurate coding than systems which employ only computerized look-up tools. Phrases, clauses, and sentences are matched individually and in combination against knowledge-based vectors stored in a database. They describe a component called a resolver, which applies high-level medical coding rules to produce diagnosis, procedure, and E&M level codes. Their resolver includes a knowledge base of severity and reimbursement values per code, code ordering rules, code mappings specific to particular payers, and which codes are not billed by particular providers or billed to particular payers. The heart of their natural language processing system is an engine that takes terms in free text and matches them to vectors which consist of lists of valid word sequences for a specific concept. Although their system can process the free text associated with a subset of billing codes, it does not try to semantically process all the free text in the medical record. They do not propose a systematic method for deriving all the relevant concepts or extracting a comprehensive knowledge representation scheme for all the semantic knowledge contained in medical free text documents. Without this knowledge one cannot completely code a medical document against a complex compositional lexicon such as SNOMED CT.
Another problem with prior art approaches is that some information is implicit in discourse, such as the connections between sentences and sentence constituents. One type of implicitness is anaphora, which occurs when an abbreviated linguistic form can only be understood by reference to additional context; the abbreviated form is called the ‘anaphor’, and the mention of the entity to which the anaphor refers is called the ‘antecedent’.
Consider the following radiology report. Source: RIGHT, TWO VIEWS. Description: There is a nondisplaced spiral fracture of the distal fibula. Ankle mortise radiographically stable. Impression: Reduction maintained since June. In this case the ‘reduction’ refers to the spiral fracture, so the last sentence could more clearly state, “Reduction of the spiral fracture of the distal fibula maintained since June.” Unfortunately, busy physicians rarely have the time to completely specify all their antecedents. While a human reader would have no trouble resolving the ambiguity of this sentence, it is far more challenging for a computer. Although there are many active investigators in the field of anaphora resolution, and several promising techniques, there is no general algorithm to solve this problem. Yet, without addressing this problem, high precision coding is impossible.
A high precision coding system requires a deep understanding of the knowledge domain. It must squarely address how to identify linguistic expressions that are semantically equivalent, a difficult problem, since computational linguists have not yet developed tools which can analyze more than 30% of English sentences and transform them into structured forms [Rebholz-Schuhmann D, Kirsch H. Couto F (2005) Facts from text—Is text mining ready to deliver? PLOS Biol 3(2): e65]. Without identifying most or all of the linguistic variations that represent the same statement semantically, the coding system will have suboptimal precision.
A major hurdle to providing this deeper level of knowledge is discovering all the relevant concepts in a circumscribed area of knowledge—a domain. Few tools and methods are available to systematically categorize domain knowledge, especially in medium to large scale domains. IBM researchers built a tool, BioTeKS, capable of highlighting some semantic categories and their relations using automated annotators [Mack R. et al. Text analytics for life science using the Unstructured Information Management Architecture. IBM Systems Journal. September, 2004], but could not extract the detailed semantic relationships found in medical documents without having domain experts construct and refine finite state grammar rules, which have been shown to be difficult to construct, and rarely complete except in very simple domains.
For all these reasons, the high precision coding system of the present invention does not exist in the current art. Significant features of the system include: (1) a deep understanding of the knowledge contained in the documents being encoded, (2) mapping semantically equivalent linguistic expressions to a logical structure called a proposition, so that standard codes which represent this knowledge are consistent (both now and in the future), (3) resolving anaphora, (4) using human judgments to make the best possible match between semantic propositions and codes in the standard lexicon, (5) judging the quality of coding matches, and (6) using software tools to make the process maximally efficient while at the same time very precise. Prior art systems do not meet these demanding requirements.