1. Field of the Invention
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for mapping semantically different (heterogeneous) data from one or more sources to an aggregated, conformed data set in a target enterprise. Still more particularly, the present invention relates to a computer implemented method, apparatus, and computer usable program product for defining semantic level concept mapping definitions that enable the use of a standard extract, transform, and load (ETL) process from data source to data target using metadata semantic concept mapping, particularly in a clinical research environment.
2. Description of the Related Art
Researchers and healthcare workers are often confronted with the problem of understanding the denominator aggregates of patients or subjects based on healthcare records with heterogeneous coding. These are frequently legacy records, prepared with differing standards, protocols, and formats, and for different purposes. Reconciling such records is a difficult and onerous task that often slows research and is often solved only at great cost. The task is compounded by legal and policy constraints on privacy. (Often a researcher may have to know the number of patients with a specific condition to get permission from the IRB to see the patient data.)
A continuing problem in information management is the need to convert information stored in one format into information stored in another format. Such a transfer may be desired in order to take advantage of new software, to incorporate older information created in individual past projects into newer forms, to compile information in a central repository, or for other reasons. Particularly in the area of clinical research, clinical researchers often encounter the problem of analyzing healthcare or life sciences data, where such data is located in a wide variety of disparate clinical studies, protocols, file systems and/or repositories located on a variety of disparate computing environments. Additionally, the various forms of data can lack semantic equivalency. Semantic equivalency means that the same terms refer to the same concepts in the same manner. Thus, for example, patient records could refer to “gender” as “M_F,” “0-1,” “Male/Female,” or any number of other terms that have the same meaning but not the same name as the term “gender.”
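The lack of semantic equivalency can be illustrated with a minimal sketch. The field names “M_F” and “Male/Female” come from the example above; the lookup tables, the target values “male” and “female,” the function name conform_record, and the sample records are hypothetical and are assumed here only for illustration. The sketch is not the mapping method of the invention.

```python
# Hypothetical lookup: source-specific field name -> shared concept name.
FIELD_TO_CONCEPT = {
    "M_F": "gender",
    "Male/Female": "gender",
    "gender": "gender",
}

# Hypothetical per-field value codings, normalized to one target vocabulary.
VALUE_TO_TARGET = {
    "M_F": {"M": "male", "F": "female"},
    "Male/Female": {"Male": "male", "Female": "female"},
    "gender": {"male": "male", "female": "female"},
}


def conform_record(record):
    """Return a copy of record keyed by concept names with normalized values.

    Fields and values with no known mapping are passed through unchanged.
    """
    conformed = {}
    for field, value in record.items():
        concept = FIELD_TO_CONCEPT.get(field, field)
        codes = VALUE_TO_TARGET.get(field, {})
        conformed[concept] = codes.get(value, value)
    return conformed


# Two hypothetical source records that name and code the same concept differently.
source_a = {"patient_id": 1, "M_F": "F"}
source_b = {"patient_id": 2, "Male/Female": "Female"}

conformed_a = conform_record(source_a)
conformed_b = conform_record(source_b)
```

After conforming, both records carry the same concept name and the same value coding, so they can be aggregated in a single target data set.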
Traditionally, integration of healthcare or life sciences data has been performed by information technology specialists who have the high degree of both domain knowledge and information technology knowledge required to map the various forms of data into a target data repository, such that the data in the target data repository has a desired format. However, these information technology specialists are usually not subject matter experts with regard to healthcare or life sciences research.
Thus, two significant roadblocks exist with regard to performing new analysis and hypothesis generation support in healthcare and life sciences research. The first roadblock is that few information technology specialists have the expertise required to perform the extract, transform, and load (ETL) process necessary to move one form of data into a target data repository. Thus, the limited availability of these experts can hamper or delay the desired transfer of data. The second roadblock is that the information technology specialists may not perform optimal mappings or may not perform the mappings of most interest to clinical researchers, because the information technology specialists are not aware of issues that relate to the desired clinical research.
In addition to these two roadblocks, even after information technology specialists have created an extract, transform, and load program or plan, such a program or plan is handcrafted to the precise project at hand. Thus, each individual data transfer project is source specific, possibly target specific, and has little capability for reuse by other research projects. As a result, other research projects are forced to “reinvent the wheel” every time an extract, transform, and load process is to be performed from one or more sources of data to a target data repository.
Moreover, in analyses involving clinical outcomes and drug efficacies, individual patient data must frequently be collected, extracted, and subsequently aggregated. This raises Health Insurance Portability and Accountability Act (“HIPAA”) issues, which can limit the ability to perform both retrospective, patient-based research and prospective follow-up research. Strict compliance with HIPAA has frequently been associated with diminished follow-up surveys and diminished recruitment for new studies.