1. Field of the Invention
The present invention relates to forecasting an outcome based on an n-gram found in a text string and, more specifically, to a system and method for generating a classifier to forecast an outcome and a system and method for using the generated classifier.
2. Introduction
The public and private health sectors have been investigating different approaches to disease outbreak detection using automated syndromic surveillance systems. In such systems, patient chief complaint data is collected by medical staff in an emergency department or outpatient clinic. The medical staff typically enter the patient chief complaint data as free-form text to be stored in electronic records. An automated syndromic surveillance system may perform natural language processing to analyze the free text of the patient chief complaint data. Further, various automated syndromic surveillance systems code and prioritize symptoms differently. Thus, a syndrome profile for a group of patients may vary depending on the definition used by the automated syndromic surveillance system in a given clinical setting.
Existing automated syndromic surveillance systems suffer from a number of problems. Changes to a system may be time-consuming and expensive. Because changes take so long to make, the size of a training set is limited in practice. New languages or dialects may require development of new programs for automated syndromic surveillance systems. Further, some automated syndromic surveillance systems may require preprocessing of chief complaint data.
The International Classification of Diseases (ICD) coding system is an international classification system which groups related disease entities and procedures for the purpose of reporting statistical information. ICD version 9 (ICD9) and ICD version 10 (ICD10) are widely used versions of the code. The purpose of the ICD code is to provide a uniform language and thereby serve as an effective means for reliable nationwide communication among physicians, patients, and third parties. Several days may pass from the time that a patient's chief complaint data is recorded, when the patient first enters an emergency or urgent care department, to the time that ICD diagnoses are given. One or more ICD9 or ICD10 diagnosis codes may be assigned by medical professionals based on their diagnoses of the patient's condition, using a combination of inputs including physician notes, patient vital signs, laboratory test results, and medical examination results. (Unfortunately, assignment of ICD9 or ICD10 diagnosis codes may also be influenced by treatment and payment options.) The patient's chief complaint may provide an early indication of these diagnoses.