In many circumstances, data may be shared between different groups. The data may be stored as a collection of values for variables, each variable having a variable name. The variables may be named according to a standard that provides guidelines or naming conventions which should be followed across an industry or other group. For instance, the Food and Drug Administration (FDA) reviews data from pharmaceutical companies for regulatory purposes. To ensure that the FDA can understand the data, the FDA mandates that submitted data be formatted according to standards established by the Clinical Data Interchange Standards Consortium (CDISC).
Such standards typically need to be flexible to account for new or unforeseen types of data. However, this flexibility may leave naming conventions open to individual interpretation, which can result in the same types of data being given slightly different names across data sets. For instance, one pharmaceutical company may identify clinical subjects using the variable “SUBJID” (for “Subject Identifier”), while a second uses the variable “USUBJID” (for “Unique Subject Identifier”).
Consequently, variable names may need to be harmonized before or after the data is reported. The process of harmonizing the variable names is often tedious, error-prone, and time-consuming.
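As a minimal sketch of what manual harmonization can look like, the following Python example renames variables using a hand-maintained lookup table. The table `CANONICAL_NAMES`, the function `harmonize`, and the sample records are hypothetical and serve only as illustration; treating "SUBJID" as a variant of "USUBJID" follows the example above and is an assumption about the data sets at hand, not a statement about any standard.

```python
# Hypothetical lookup table mapping variant variable names to one canonical name.
# Every entry is an illustrative assumption and must be maintained by hand.
CANONICAL_NAMES = {
    "SUBJID": "USUBJID",
    "SUBJECT_ID": "USUBJID",
}


def harmonize(record, mapping=CANONICAL_NAMES):
    """Return a copy of record whose variable names are replaced by canonical names."""
    harmonized = {}
    for name, value in record.items():
        key = name.upper()
        canonical = mapping.get(key, key)  # unknown names pass through unchanged
        if canonical in harmonized:
            # Two source variables collapsed onto the same canonical name.
            raise ValueError(f"conflicting values for variable {canonical!r}")
        harmonized[canonical] = value
    return harmonized


# Hypothetical records from two companies that name the same variable differently.
company_a = {"SUBJID": "001", "AGE": 34}
company_b = {"USUBJID": "002", "AGE": 41}

# After harmonization both records expose the same set of variable names.
assert set(harmonize(company_a)) == set(harmonize(company_b))
```

Each newly encountered variant must be added to the table by hand, and a variant that is missing or mapped incorrectly passes through silently, which illustrates why such a process tends to be tedious and error-prone.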