1. Technical Field
The present invention relates to extracting and classifying medical data from a medical data storage source, and more particularly, to extracting and classifying medical data that is in unstructured form from such a source.
2. Discussion of the Related Art
In general, an electronic medical record (EMR) is a computerized legal medical record created in an organization that delivers care, such as a hospital or doctor's office. In an EMR, various data elements may be associated with a patient or a patient visit; for example, diagnosis codes, lab results, pharmacy and insurance information, doctor notes, radiological images, genotypic information, etc. EMRs tend to be part of a local stand-alone health information system that allows storage, retrieval and manipulation of records.
Data in an EMR is stored in structured or unstructured form. FIG. 1 shows an exemplary EMR 100 with structured and unstructured data. In FIG. 1, the patient's name “John Doe” in field 110 and the examination date “Jan. 1, 2007” in field 120 are examples of structured data. The medical report (e.g., doctor's note) “Patient presents . . . ” in field 130 is an example of unstructured data. Other examples of structured data may include date of birth (mm/dd/yyyy), zip code (a five-digit number), smoking status (yes/no), insurance type (Medicare/Medicaid/private), or medication list (medication A, medication B . . . ). Other examples of unstructured data may include images, lab reports, biological sequences and other forms of written reports.
The distinction between these two data types is that desired information can be easily extracted from structured data by using a standard database query language, such as Structured Query Language (SQL), because the format of structured data is generally fixed and known in advance. In contrast, desired information cannot be easily extracted from unstructured data, because the format of unstructured data is generally not fixed or is too generic.
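By way of illustration only, the ease of querying structured data can be sketched as follows. The table name, column names and rows are hypothetical and are not taken from FIG. 1 or from any particular EMR system; the sketch merely assumes that the structured fields have been loaded into a relational table.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "name TEXT, date_of_birth TEXT, zip_code TEXT, "
    "smoker INTEGER, insurance_type TEXT)"
)
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?, ?)",
    [
        ("John Doe", "1950-03-15", "19355", 1, "private"),
        ("Jane Roe", "1962-11-02", "08540", 0, "medicare"),
    ],
)


def smokers_with_insurance(connection, insurance_type):
    """Return the names of smokers having the given insurance type."""
    rows = connection.execute(
        "SELECT name FROM patient "
        "WHERE smoker = 1 AND insurance_type = ? "
        "ORDER BY name",
        (insurance_type,),
    ).fetchall()
    return [name for (name,) in rows]


private_smokers = smokers_with_insurance(conn, "private")
print(private_smokers)
```

Because every column has a fixed, known format, a single SQL statement suffices to select the desired records; no comparable statement can be written against a freeform note.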
For example, with reference to FIG. 1, it is straightforward for a computer to determine the patient's name from the name of patient field 110, or the date of the patient's examination from the date of examination field 120, in both cases assuming the computer knows the formatting of fields 110 and 120. However, due to the freeform entry of data into the medical report field 130, it is not straightforward for a computer to determine what the patient's prescription is from field 130.
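The contrast described above may be sketched, under stated assumptions, as follows. The dictionary keys, the regular expression and the note texts are hypothetical; in particular, the notes are invented examples and do not reproduce the content of field 130. The date parsing assumes the default English month abbreviations.

```python
import re
from datetime import datetime

# Structured fields modeled on fields 110 and 120 of FIG. 1.
# The key names are hypothetical.
record = {
    "name_of_patient": "John Doe",
    "date_of_examination": "Jan. 1, 2007",
}


def read_structured(rec):
    """Read name and examination date, relying on the known field formats."""
    exam_date = datetime.strptime(
        rec["date_of_examination"], "%b. %d, %Y"
    ).date()
    return rec["name_of_patient"], exam_date


# A deliberately naive rule: take the word following "prescribed".
PRESCRIPTION_PATTERN = re.compile(r"prescribed\s+(\w+)", re.IGNORECASE)


def naive_prescription(note):
    """Return the word after 'prescribed', or None if the rule finds nothing."""
    match = PRESCRIPTION_PATTERN.search(note)
    return match.group(1) if match else None


# Invented freeform notes, for illustration only.
notes = [
    "Patient presents with cough. Prescribed amoxicillin 500 mg.",
    "Patient presents with cough. Started on amoxicillin 500 mg.",
    "Patient presents with cough. Not prescribed antibiotics at this time.",
]

name, exam_date = read_structured(record)
results = [naive_prescription(note) for note in notes]
print(name, exam_date.isoformat())
print(results)
```

The structured fields are read reliably because their formats are known. The naive rule recovers the medication only from the first note; it finds nothing in the second, which uses different wording, and it returns a misleading word for the third, in which the prescription is negated.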
Nevertheless, unstructured data is an essential source of patient information. In fact, it is widely accepted that key clinical information in an EMR is stored in unstructured form. However, for the reasons discussed above, it is difficult to automatically extract the useful information contained in unstructured data and make it available in a readily usable form. Such information is typically found through manual search.