This invention relates generally to text mining and, more specifically, to text mining in large medical text datasets.
This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application and is not admitted to be prior art by inclusion in this section.
The following abbreviations that may be found in the specification and/or the drawing figures are defined as follows:
AE Adverse Event
AEFI Adverse Events Following Immunizations
BOW Bag Of Words
EHR Electronic Health Record
MedDRA Medical Dictionary for Regulatory Activities
ML Machine Learning
MO Medical Officer
MR Medical Record
NLP Natural Language Processing
PT Preferred Term
SRS Spontaneous Reporting System
TC Text Classification
TM Text Mining
VAERS Vaccine Adverse Event Reporting System
Biomedical research is often confronted with large datasets containing vast amounts of free text that have remained a largely untapped source of information. The analysis of these datasets poses unique challenges, particularly when the goal is knowledge discovery and real-time surveillance. See Sinha et al., “Large datasets in biomedicine: a discussion of salient analytic issues”, Journal of the American Medical Informatics Association, 16(6):759-67 (2009). Spontaneous Reporting Systems (SRSs), such as the U.S. Vaccine Adverse Event Reporting System (VAERS), encounter this issue. See Singleton et al., “An overview of the Vaccine Adverse Event Reporting System (VAERS) as a surveillance system”, Vaccine, 17(22):2908-17 (1999).
When extraordinary events occur, such as the H1N1 pandemic, routine methods of safety surveillance struggle to produce timely results because manual review is resource-intensive. For instance, Medical Officers (MOs) have to read through the reports submitted to these systems and determine whether adverse events have occurred, e.g., as a result of the H1N1 vaccine. Consequently, there is an urgent need to develop alternative approaches that facilitate efficient report review and the identification of safety issues resulting from the administration of vaccines. Text classification (TC) provides an alternative and more efficient process by identifying the most relevant information in adverse event (AE) reports.