Data cleansing is a process for improving the quality of data stored in a database (DB) before the data is used, and is performed, for example, as part of a name aggregation process. In data cleansing, useless garbled data is deleted and the styles of data included in the same column are made consistent.
For example, in data cleansing, when data representing a full name such as “TOKKYO Taro” is included in a column “family name”, this data is deleted. Furthermore, in data cleansing, when data representing “090-xxx-xxxx” and data representing “090yyyyyyy”, which have different styles from each other, are stored in a column called “cellular phone number”, all the data are standardized so that the symbol “-” is not included in cellular phone numbers.
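The two cleansing rules in this example can be sketched as follows. This is a minimal illustration only, not an implementation from the related art: the function names `cleanse_family_names` and `normalize_phone` are hypothetical, and the rule that a value containing white space is a full name is an assumption made for the sketch.

```python
import re


def cleanse_family_names(values):
    """Delete entries that look like full names.

    Assumption for this sketch: a value containing white space
    (for example "TOKKYO Taro") is treated as a full name.
    """
    return [v for v in values if not re.search(r"\s", v.strip())]


def normalize_phone(value):
    """Remove the symbol "-" so that every number uses the same style."""
    return value.replace("-", "")


# Values taken from the example in the text (placeholders kept as-is).
family_names = cleanse_family_names(["TOKKYO", "TOKKYO Taro"])
phones = [normalize_phone(p) for p in ["090-xxx-xxxx", "090yyyyyyy"]]
print(family_names)  # ['TOKKYO']
print(phones)        # ['090xxxxxxx', '090yyyyyyy']
```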
Furthermore, a data integration apparatus that executes data cleansing and mapping when data is transmitted and received between different systems has been widely used. The data cleansing that the data integration apparatus performs includes deletion of blank space in data, deletion of linefeed codes in data, conversion of units, conversion of character strings, standardization of era names, and standardization of numbers of significant figures. Furthermore, the data integration apparatus generates a mapping definition in accordance with the data structure of the copy source and the data structure of the copy destination, and stores the copy source data that has been subjected to data cleansing in the copy destination in accordance with the mapping definition.
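The flow of such a data integration apparatus can be sketched roughly as follows. All names here (`cleanse`, `build_mapping`, `copy_record`) are hypothetical, only two of the listed cleansing operations are shown, and pairing columns by case-insensitive name is an assumption made for the sketch. An actual apparatus may derive the mapping definition differently.

```python
def cleanse(value):
    """Delete linefeed codes and surrounding blank space from a value."""
    return value.replace("\r", "").replace("\n", "").strip()


def build_mapping(source_columns, destination_columns):
    """Generate a mapping definition from the two data structures.

    Assumption for this sketch: columns are paired when their names
    match case-insensitively.
    """
    dest_by_key = {c.lower(): c for c in destination_columns}
    return {
        s: dest_by_key[s.lower()]
        for s in source_columns
        if s.lower() in dest_by_key
    }


def copy_record(record, mapping):
    """Cleanse each value and store it under the mapped destination column."""
    return {
        mapping[col]: cleanse(val)
        for col, val in record.items()
        if col in mapping
    }


mapping = build_mapping(["family_name", "phone"], ["FAMILY_NAME", "PHONE"])
copied = copy_record({"family_name": " TOKKYO\n", "phone": "090yyyyyyy"}, mapping)
print(copied)  # {'FAMILY_NAME': 'TOKKYO', 'PHONE': '090yyyyyyy'}
```

Note that the mapping definition in this sketch depends only on the column names, not on the values, so a value is copied to whichever destination column corresponds to the source column it happens to be stored in.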
In the related art, however, data cleansing is executed on the assumption that data is stored in accordance with the schema for the corresponding column, that is, on the assumption that each column contains the specific kind of data defined for it. Therefore, in the related art, if data that belongs in a certain column is stored in the wrong column, that data is deleted as noise.
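This behavior can be illustrated with a minimal sketch. The per-column rules in `SCHEMA_RULES`, the function name `cleanse_by_schema`, and the sample values are all hypothetical and chosen only for illustration. Each value is checked against the rule for the column in which it is stored, and a value that fails the check is deleted.

```python
import re

# Hypothetical schema rules: one pattern per column.
SCHEMA_RULES = {
    "family name": re.compile(r"[A-Za-z]+"),
    "cellular phone number": re.compile(r"\d[\d-]*"),
}


def cleanse_by_schema(record):
    """Keep a value only if it matches the rule of the column it is stored in."""
    cleaned = {}
    for column, value in record.items():
        rule = SCHEMA_RULES[column]
        # A value that does not fit its column is treated as noise and deleted.
        cleaned[column] = value if rule.fullmatch(value) else None
    return cleaned


# The two values are valid data, but they are stored in each other's column.
swapped = {"family name": "0900000000", "cellular phone number": "TOKKYO"}
print(cleanse_by_schema(swapped))
# {'family name': None, 'cellular phone number': None}
```

In this sketch both values are lost, even though each would have passed the check for the other column.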
Furthermore, in mapping in the related art, because a mapping definition is generated in accordance with the copy source's schema and the copy destination's schema, data stored in an incorrect column is stored in the copy destination without correction.