Electronic patient information may be represented in structured, semi-structured, or unstructured clinical documentation. In the Chinese healthcare system, such documents are generated in large volumes during routine and emergency clinical processes. However, retrieving information from unstructured clinical documents written in Chinese can be very challenging, especially for clinical decision support and real-time knowledge discovery. Extracting information from unstructured clinical notes could therefore benefit many clinical applications.
Although many studies have addressed extracting information from electronic health record (EHR) clinical documents written in English, few have explored using natural language processing techniques to extract information from Chinese clinical notes. Some research efforts have extracted structured clinical concepts from free-text clinical notes in Chinese. For example, machine learning (ML) based approaches, such as logistic regression models, have been used. ML-based approaches require manually annotated datasets for training; however, generating sufficient expert-annotated data to train and test ML models can be very time-intensive and expensive. Furthermore, the performance of these ML algorithms depends heavily on how well the expert-annotated data represents the knowledge domain of interest. For instance, an ML model trained on radiology notes may perform poorly when applied to cardiology notes. Applications using ML algorithms trained in a particular domain or sub-domain may therefore not extend to other distinct knowledge areas.