The present invention relates generally to the field of big data, and more particularly to providing a model-based approach for transforming data units.
Big data is a broad term for data sets that are so large or complex that the data sets are difficult to process using traditional data processing applications. Challenges include, without limitation: analysis; capture; curation; search; sharing; storage; transfer; visualization; and information privacy. Analysis of larger data sets can find new correlations to spot business trends, prevent diseases, combat crime, etc.
Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring, instead, massively parallel software running on tens, hundreds, or even thousands of servers. The level of difficulty varies depending on, for example, the capabilities of the organization managing the set and the capabilities of the applications that are used to process and analyze the data set in its domain.
Data is a set of values of qualitative or quantitative variables; more simply, pieces of data are individual pieces of information. Data is measured, collected, reported, and analyzed, after which the data can be visualized using graphs or images.
In metadata and data warehousing, a data transformation converts a set of data values from the data format of a source data system into the data format of a destination data system. Data transformation can be divided into two steps: (1) data mapping, which maps data elements from the source data system to the destination data system and captures any transformation that must occur; and (2) code generation, which creates the actual transformation program. Data element to data element mapping is frequently complicated by complex transformations that require one-to-many and many-to-one rules. When the data mapping is indirect via a mediating data model, the process is also called data mediation. The code generation step takes the data element mapping specification and creates an executable program that can be run on a computer system. Code generation can also create transformations in easy-to-maintain computer languages.
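By way of illustration only, the two steps described above can be sketched as follows. The sketch assumes a hypothetical mapping specification (`MAPPING_SPEC`) and hypothetical field names (`id`, `first`, `last`, `city_state`); none of these appear in any particular data system. The mapping specification contains a one-to-one rule, a many-to-one rule, and a one-to-many rule, and the code generation step emits Python source text for a transformation function, which is then compiled and run.

```python
from typing import Callable, Dict, List

# Step 1 (data mapping): a hypothetical mapping specification.
# Each rule names its source fields, its destination fields, and an
# expression over the source record `src` that yields the value(s).
MAPPING_SPEC: List[dict] = [
    # One-to-one: copy a field under a new name.
    {"sources": ["id"], "targets": ["customer_id"], "expr": "src['id']"},
    # Many-to-one: two source fields combine into one destination field.
    {"sources": ["first", "last"], "targets": ["full_name"],
     "expr": "src['first'] + ' ' + src['last']"},
    # One-to-many: one source field splits into two destination fields.
    {"sources": ["city_state"], "targets": ["city", "state"],
     "expr": "src['city_state'].split(', ', 1)"},
]


def generate_code(spec: List[dict]) -> str:
    """Step 2 (code generation): emit source text for a transform function."""
    lines = ["def transform(src):", "    dst = {}"]
    for rule in spec:
        # A single target is a plain assignment; several targets unpack
        # the sequence produced by the rule's expression.
        lhs = ", ".join(f"dst[{target!r}]" for target in rule["targets"])
        lines.append(f"    {lhs} = {rule['expr']}")
    lines.append("    return dst")
    return "\n".join(lines) + "\n"


def compile_transform(spec: List[dict]) -> Callable[[Dict], Dict]:
    """Turn the generated source text into a callable transformation program."""
    namespace: dict = {}
    # exec is acceptable here only because the specification is trusted.
    exec(generate_code(spec), namespace)
    return namespace["transform"]


transform = compile_transform(MAPPING_SPEC)
record = {"id": 7, "first": "Ada", "last": "Lovelace", "city_state": "London, UK"}
result = transform(record)
print(generate_code(MAPPING_SPEC))
print(result)
```

Keeping the mapping specification separate from the generated program means that the rules can be reviewed or revised without editing the transformation code directly; the program is simply regenerated from the revised specification.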
A master data recast is another form of data transformation, where the entire database of data values is transformed, or recast, without extracting the data from the database. All data in a well-designed database is directly or indirectly related to a limited set of master database tables by a network of foreign key constraints. Each foreign key constraint is dependent upon a unique database index from the parent database table. Therefore, when the proper master database table is recast with a different unique index, the directly or indirectly related data are also recast or restated. The directly or indirectly related data may also still be viewed in the original form since the original unique index still exists with the master data. Also, the database recast should be performed in such a way that the recast does not impact the application's architecture software.
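One possible reading of a master data recast can be sketched with an in-memory SQLite database. The sketch is an assumption-laden illustration rather than a definitive implementation: the table names (`product`, `sale`), the column names (`old_sku`, `new_sku`), and the helper `totals_by` are all hypothetical. The master table receives a second unique index, the related rows are restated through that index without being extracted or modified, and the original form remains available through the original unique index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# A master table with its original unique identifier, and a related
# table tied to the master table by a foreign key constraint.
conn.executescript("""
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    old_sku TEXT NOT NULL UNIQUE
);
CREATE TABLE sale (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    qty INTEGER NOT NULL
);
INSERT INTO product (id, old_sku) VALUES (1, 'A-100'), (2, 'B-200');
INSERT INTO sale (product_id, qty) VALUES (1, 3), (2, 5), (1, 4);
""")

# The recast (hypothetical): give the master table a different unique
# index. No row of `sale` is extracted, rewritten, or reloaded.
conn.executescript("""
ALTER TABLE product ADD COLUMN new_sku TEXT;
UPDATE product SET new_sku = 'SKU-' || id;
CREATE UNIQUE INDEX product_new_sku ON product (new_sku);
""")


def totals_by(column: str) -> list:
    """Total quantity sold, restated under the chosen master identifier."""
    if column not in ("old_sku", "new_sku"):
        raise ValueError("unknown identifier column")
    return conn.execute(
        f"SELECT p.{column}, SUM(s.qty) FROM sale AS s "
        f"JOIN product AS p ON p.id = s.product_id "
        f"GROUP BY p.{column} ORDER BY p.{column}"
    ).fetchall()


recast_view = totals_by("new_sku")    # related data restated by the recast
original_view = totals_by("old_sku")  # original form is still viewable
print(recast_view)
print(original_view)
```

Because the related table refers to the master table only through the foreign key, queries written against the original identifier continue to work unchanged after the recast.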