Assigning pre-specified codes to words and phrases found in a clinical note has been attempted for various coding schemes. The International Classification of Diseases, version 9 (ICD-9), a coding scheme recommended by the World Health Organization and adopted in the USA, is being replaced by a newer version, ICD-10. ICD-10 is much more detailed than ICD-9, which makes the coding procedure more complex. For healthcare providers, the new ICD-10-CM (Clinical Modification) contains about 69,823 diagnostic codes, roughly five times as many as its predecessor, ICD-9-CM. An even more complex matrix of about 71,924 new codes for hospital-based procedures awaits in ICD-10-PCS (Procedure Coding System), roughly 19 times as many as ICD-9-CM Volume 3. As the number of concepts has grown, so has the difficulty of automatically identifying the correct codes.
Another major difference between the ICD-9-CM procedure codes and ICD-10-PCS is structure. While the ICD-9-CM procedure classification is flat, ICD-10-PCS has a multi-axial, seven-character alphanumeric code structure (Avril, R F et al., 2011). To address this complexity of medical coding, Computer Assisted Coding (CAC) software systems automatically generate a set of medical codes, for review, validation, and/or use, from the clinical documentation provided by healthcare practitioners.
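To make the multi-axial structure concrete, the following minimal Python sketch splits a seven-character ICD-10-PCS code into its axes. It rests on stated assumptions: the axis names are those used for the Medical and Surgical section (other sections label some positions differently), the allowed characters are the digits 0-9 and the letters A-Z excluding I and O, and the function name `split_pcs_code` is hypothetical. It checks only the shape of a code, not whether the code exists in the ICD-10-PCS tables.

```python
import re

# Axis names for the Medical and Surgical section (assumption: other
# sections of ICD-10-PCS use different labels for some positions).
AXES = (
    "section",
    "body_system",
    "root_operation",
    "body_part",
    "approach",
    "device",
    "qualifier",
)

# Seven characters drawn from digits 0-9 and letters A-Z without I and O.
_PCS_SHAPE = re.compile(r"[0-9A-HJ-NP-Z]{7}")


def split_pcs_code(code):
    """Return a dict mapping each axis name to its character.

    Hypothetical helper: validates the shape of the code only, not
    whether the combination is a valid entry in the ICD-10-PCS tables.
    """
    normalized = code.strip().upper()
    if not _PCS_SHAPE.fullmatch(normalized):
        raise ValueError(f"not a seven-character ICD-10-PCS-shaped code: {code!r}")
    return dict(zip(AXES, normalized))


if __name__ == "__main__":
    # Illustrative code string; each position is one axis value.
    for axis, value in split_pcs_code("0DTJ4ZZ").items():
        print(f"{axis:15s} {value}")
```

Because every position is a separate axis, an automated coder has to get seven interdependent choices right for each procedure, rather than selecting one entry from a flat list.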
Natural Language Processing (NLP) and machine learning have been the mainstay of earlier CAC methods, such as those developed for ICD-9. The new ICD-10-PCS coding scheme presents additional challenges: the wording of clinical text differs markedly from the wording of the code descriptions, and the coding structure is much more fine-grained and multi-level. Rule-based NLP systems rely on base dictionaries, which generally do not capture the semantic and syntactic variety of entities. Over the years, studies have reported that dictionary-lookup based methods yield an F-score of no better than 71.5% (Savova, Guergana K., et al. “Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.” Journal of the American Medical Informatics Association 17.5 (2010): 507-513). Rule-based methods also depend heavily on the accuracy of syntactic parsers, which is likewise insufficient in the clinical domain.
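The limitation of dictionary lookup can be illustrated with a small Python sketch. This is a toy example under stated assumptions, not the method of any cited system: the dictionary entries, the placeholder concept identifiers, and the function name `dictionary_lookup` are all hypothetical, and matching is done on exact whole phrases after lowercasing and stripping punctuation.

```python
import re

# Hypothetical toy dictionary: surface phrase -> placeholder concept ID.
# The identifiers are illustrative labels, not real ICD codes.
TOY_DICTIONARY = {
    "appendectomy": "CONCEPT_APPENDECTOMY",
    "knee replacement": "CONCEPT_KNEE_REPLACEMENT",
}


def dictionary_lookup(text, dictionary=TOY_DICTIONARY):
    """Return the sorted concept IDs whose phrase occurs verbatim in text."""
    # Lowercase and collapse punctuation/whitespace to single spaces.
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    # Pad with spaces so phrases match only on whole-word boundaries.
    padded = f" {normalized} "
    found = {
        concept
        for phrase, concept in dictionary.items()
        if f" {phrase} " in padded
    }
    return sorted(found)


if __name__ == "__main__":
    # Exact surface form: found.
    print(dictionary_lookup("Patient underwent laparoscopic appendectomy."))
    # Same procedure, different wording: missed.
    print(dictionary_lookup("The appendix was removed laparoscopically."))
```

The second sentence describes the same procedure as the first but shares no dictionary phrase with it, so the lookup returns nothing. Covering such paraphrases would require enumerating every variant in the dictionary, which is the gap in semantic and syntactic variety described above.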