Many research endeavors leverage large data repositories that store raw experimental or product usage data. These repositories often employ unique formats, so the data from each repository must be manipulated to bring it into a standardized form. Often this manipulation is performed on a per-repository basis in an ad hoc manner. In regulated fields, the regulatory agency may require information regarding the various data sources that were used to derive results.
In the field of pharmacovigilance (PV), in which adverse reactions to drugs, typically called adverse events (AEs), are tracked, data is recorded in many formats and under many data coding standards by many data collection agencies. It is therefore difficult for drug companies to leverage the vast amount of AE data that is available. Drug companies must often choose between ignoring AE data sources that are incompatible with their analysis systems and performing expensive custom rationalization of the reference data before importing the AE data into their analysis systems.