The present invention relates generally to validating data from text extracted from a set of records. More specifically, the present invention relates to capturing and applying refinements made by a domain expert to the validity, relevance, and temporal significance of “facts” (extractions of discrete data elements, their location within the document, their normalizations, and their ontological classifications) automatically extracted from electronic text.
In the medical field, health care providers (e.g., physicians, medical technicians or administrators) typically dictate diagnoses, medications and other patient medical reports in a free form manner. These dictations are then transcribed into documents. The transcribed documents are typically then submitted to the provider for review and approval. The transcribed documents will likely contain data that is relevant to different users at different times. Additionally, many legacy databases contain documents that include data with varying degrees of relevancy.
Automatic extraction of specified data from electronic medical records has been known for some time. It is well known in the art that computational algorithms may be employed to process the text of an electronic document to extract specific data from the document. However, validating the validity, relevance, classification, and temporal significance of the extracted data has not been possible heretofore.
Presently, users are required to manually review extracted data in order to validate the data. The manual process requires review of the text document, a time-consuming process in which the user may edit and approve the text for ultimate storage in a database where the text may be reviewed at a later time. Manual operation may include data entry using drop-down menus, mouse clicks, typing, and time-consuming records review. It is therefore desirable to provide users with a validation process that utilizes relevant data items automatically extracted from free-form dictated and transcribed documents.
The significance of facts can change over time. A deficiency in current systems that perform extraction is that they do not account for the temporal significance of the fact. For example, a problem that is relevant today may be resolved tomorrow, and thus the fact that the problem exists is true only when the context of the time period (today) is provided.
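By way of illustration only, the temporal significance described above can be sketched by attaching a validity period to each fact. The class name `TemporalFact`, its fields, and the example dates are hypothetical and are not drawn from any existing system; the sketch assumes a fact holds from its onset date up to, but not including, its resolution date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    """Hypothetical extracted fact with the period over which it holds."""
    description: str
    onset: date                      # first day the fact is considered true
    resolved: Optional[date] = None  # first day it no longer holds; None = ongoing

    def is_active(self, on: date) -> bool:
        """Return True if the fact holds on the given day."""
        if on < self.onset:
            return False
        return self.resolved is None or on < self.resolved


# A problem that is relevant "today" and resolved "tomorrow" (dates are made up).
fact = TemporalFact("chest pain", onset=date(2024, 3, 1), resolved=date(2024, 3, 2))
print(fact.is_active(date(2024, 3, 1)))  # True
print(fact.is_active(date(2024, 3, 2)))  # False
```

In this sketch the same fact yields different answers depending on the date supplied, which is the time-period context that the extraction systems described above do not provide.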
An additional problem exists relating to nomenclature. There are several ways to describe many physical ailments. More particularly, users of such systems often use different phrases to describe a single type of event. For example, one physician may use ‘myocardial infarction’ while another physician may use ‘heart attack’ to describe a problem for a patient. In this example, there may be up to 25 phrases that describe the same or a similar ailment of the heart. As such, a searcher who wishes to find a group of records that involve a particular term of art would have to know and use all of the variants of that term in order to ensure a complete search. It would be desirable to group like and similar variants of key medical facts and medical concepts, and to present those groupings in a user interface along with extractions of the discrete data elements.
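By way of illustration only, the grouping of variants described above can be sketched as a lookup that maps each phrase to a single normalized concept. The table `CONCEPT_OF`, the functions `normalize` and `group_by_concept`, and the record identifiers are hypothetical; the table holds only the two phrases named above, whereas a practical system would presumably rely on a curated medical vocabulary.

```python
from collections import defaultdict

# Hypothetical variant-to-concept table, limited to the two phrases in the text.
CONCEPT_OF = {
    "myocardial infarction": "myocardial infarction",
    "heart attack": "myocardial infarction",
}


def normalize(phrase: str) -> str:
    """Map a phrase to its concept; unknown phrases map to themselves."""
    key = " ".join(phrase.lower().split())  # fold case and collapse whitespace
    return CONCEPT_OF.get(key, key)


def group_by_concept(records):
    """Group (record_id, phrase) pairs by the normalized concept of the phrase."""
    groups = defaultdict(list)
    for record_id, phrase in records:
        groups[normalize(phrase)].append(record_id)
    return dict(groups)


# Made-up records: two physicians describe the same event in different words.
records = [("r1", "Myocardial infarction"), ("r2", "heart attack"), ("r3", "asthma")]
print(group_by_concept(records))
```

Under these assumptions, a search on the single concept retrieves both `r1` and `r2` without the searcher having to know each variant.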
Health care providers are often responsible for maintaining lists of current problems, medications, allergies, and procedures for patients. Problems in this context can be anything that is relevant to the physician or affects the care and treatment of the patient. Facts on the current list are significant over a particular time period, after which the problem may no longer be relevant to the patient's treatment and care, or the patient's problem may have been resolved, or the medication discontinued, et cetera.
Manual processes for maintaining these lists typically take one of two forms. In the first, the provider writes new items on a paper form, dates it, and signs it. In the second, the provider dictates the insertions and removals, and clerical personnel make these changes when the dictated report is transcribed. Automated processes found in electronic medical record systems require data entry of the items on the current list.
The deficiencies inherent in manual processes are numerous. When a paper form is used, only one copy of it is available, whereas when this information is stored electronically, multiple viewers can access the information at the same time. It is difficult to locate information on paper forms or even in electronic documents as these storage mechanisms do not provide sorting and filtering features that might be available when the information is stored in a database. A further problem is that when the provider dictates changes to the list, there are time lags introduced by the transcription and editing process that create a delay between the dictation of these changes and the actual implementation of these changes on the storage media. This imposes a delay on the availability of changes to the provider and to the rest of the medical community providing patient care.
When current lists are maintained in electronic medical record systems, the user must manually enter the information in the list, rather than having the system suggest changes to the current list based on extracted facts.
Finally, when current lists are maintained on forms, through dictated changes, or even in electronic medical records, the context in which the problem, medication, allergy, or procedure is mentioned for the patient is not available. Therefore, the only information available to the medical community is the item on the current list, without the more detailed context that might provide for better medical care.
Thus, present systems do not have the ability to integrate information in real time into a current list report and cannot provide context for that information. It is desirable to provide a system that presents discrete data elements for approval in real time by a user, with the ability to determine the context of a report, namely, the creation point of the report, the creator, the time frame, and the relevance of the discrete element for extraction.