A process of integrating extracted data is called physical integration (Extract/Transform/Load (ETL)). In physical integration, data extracted from an information source by an extracting function (Extract) is subjected to integration processing (Transform), and the results of the integration are registered on the user-side by a registering function (Load). Physical integration is suited to collective processing executed as batch processing. In physical integration, the recency of information is ensured only as of the time the information is extracted. In addition, because the integrated results are held separately from the information sources, overlapping management of the information sources and the integrated results is apt to occur.
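The extract, transform, and register steps described above can be sketched as follows. This is a minimal illustration only, not the method of any particular ETL product; the in-memory sources (`crm`, `billing`), the `warehouse` dictionary, and all function names are hypothetical.

```python
def extract(source):
    """Extract: copy the rows out of an information source at batch time."""
    return [dict(row) for row in source]


def transform(rows):
    """Transform: integrate the rows into one common format."""
    return [
        {"id": int(r["id"]), "name": r["name"].strip().title()}
        for r in rows
    ]


def load(rows, target):
    """Load: register the integrated results on the user-side."""
    for r in rows:
        target[r["id"]] = r


def run_batch(sources, target):
    """Run one batch: each source is extracted, transformed, and loaded."""
    for source in sources:
        load(transform(extract(source)), target)


# Hypothetical information sources and user-side store.
crm = [{"id": "1", "name": " alice "}]
billing = [{"id": "2", "name": "BOB"}]
warehouse = {}
run_batch([crm, billing], warehouse)

# A later change in a source is not visible until the next batch run,
# and the record is now held both in the source and in the warehouse.
crm[0]["name"] = "alicia"
```

In this sketch the warehouse keeps the value that was current at extraction time, which illustrates both the recency limitation and the overlapping management noted above.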
In one method of achieving physical integration, the functions to be integrated are configured as integration components, with the integration logic written in advance in an integration processing logic description language called Transformation Description Language (TDL), and the components are registered in a repository. At the time of execution, the integration components are processed sequentially according to the TDL integration processing logic (see, e.g., U.S. Pat. Nos. 6,014,670 and 6,339,775).
A process of collecting and integrating, in real-time, data present in multiple information sources in response to a request from a user-side is called virtual integration (Enterprise Information Integration (EII)). In virtual integration, necessary information is retrieved and collected from information sources in response to a request from the user-side, and the collected data is integrated and returned as integrated data to the user-side to realize virtual information integration. Virtual integration allows the user-side to acquire real-time information from an information source at the point that the user-side needs the information, thus enabling the use of fresh information. Information that has been used is discarded, which makes overlapping information management unnecessary.
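Virtual integration as described above can be sketched as a function that queries every information source at request time and returns the merged result without storing it. The sources (`crm_data`, `billing_data`) and the function name `query_virtual` are hypothetical and chosen only for illustration.

```python
def query_virtual(customer_id, sources):
    """Collect matching records from every source at request time and merge them."""
    merged = {}
    for fetch in sources:
        record = fetch(customer_id)
        if record:
            merged.update(record)
    # The integrated data is returned to the caller; nothing is persisted.
    return merged


# Hypothetical information sources, each exposed through a lookup function.
crm_data = {1: {"name": "Alice"}}
billing_data = {1: {"balance": 120}}
sources = [crm_data.get, billing_data.get]

first = query_virtual(1, sources)

# A change in a source is reflected by the very next request.
billing_data[1] = {"balance": 80}
second = query_virtual(1, sources)
```

Because each request reads the sources directly, the second result reflects the updated balance, and no integrated copy remains to be managed separately.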
For information integration such as the physical integration (ETL) and the virtual integration (EII), a function of converting a format of an original value (From value) into a format of an object value (To value) is essential and is generally referred to as a data type converting function or a cleansing function.
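As a simple illustration of such functions, the sketch below converts a From value into a To value in two ways: a format conversion of a date string and a cleansing of irregular whitespace. The function names and the chosen formats are assumptions made for the example and are not taken from any particular system.

```python
from datetime import datetime


def convert_date(from_value, from_format="%d/%m/%Y", to_format="%Y-%m-%d"):
    """Data type conversion: re-express a date string in the target format."""
    return datetime.strptime(from_value, from_format).strftime(to_format)


def cleanse_text(from_value):
    """Cleansing: trim the value and collapse runs of whitespace to one space."""
    return " ".join(from_value.split())


to_date = convert_date("31/12/2008")
to_text = cleanse_text("  Tokyo   Office ")
```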
However, the conventional cleansing function is premised on determining in advance the combinations of a data type, a type attribute that specifically indicates a property of the data type, and the type converting function and cleansing function that convert the type and the attribute, and therefore has a problem in terms of expandability. Specifically, no unit exists for expanding the data types and type attributes that a system possesses in advance, and the combination of a cleansing function and the type attributes that can be specified for it must be determined in advance. An example is character code system conversion, in which a type attribute (char_code) identifies the character code system.
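The expandability problem can be illustrated with a hypothetical conversion table in which every supported pair of char_code values is fixed in advance. The table `CONVERTERS` and the function `convert_char_code` are assumptions made for this sketch; only the Python standard codecs `shift_jis`, `euc_jp`, and `utf-8` are real.

```python
# Every supported (from char_code, to char_code) pair is decided in advance.
CONVERTERS = {
    ("shift_jis", "utf-8"): lambda b: b.decode("shift_jis").encode("utf-8"),
    ("euc_jp", "utf-8"): lambda b: b.decode("euc_jp").encode("utf-8"),
}


def convert_char_code(value, from_code, to_code):
    """Convert bytes between character code systems using a fixed table."""
    try:
        converter = CONVERTERS[(from_code, to_code)]
    except KeyError:
        # A pair that was not anticipated cannot be handled without
        # changing the table itself.
        raise ValueError(f"no conversion registered for {from_code} -> {to_code}")
    return converter(value)


text = "\u65e5\u672c"  # two Japanese characters, written as escapes
converted = convert_char_code(text.encode("shift_jis"), "shift_jis", "utf-8")
```

A request for a pair outside the table, such as `iso2022_jp` to `utf-8`, fails in this sketch even though the underlying conversion is possible, because the combination was not determined beforehand.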
Since no unit exists for ensuring the consistency of type attributes and processes when multiple data types, type attributes, and cleansing functions are combined and used, consistency is liable to be impaired at the time of expansion. For example, when a data type similar to an existing data type is defined, no unit exists for ensuring consistency between the existing data type and the cleansing function, which increases the burden on the developer.
If the numbers of data types, type attributes, type converting functions, and cleansing functions increase, the number of combinations thereof also increases, which increases the burden on the developer and complicates management. Moreover, since no unit has been provided for efficiently selecting and using the necessary cleansing functions from among the many combinations that exist, performance deteriorates.
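The growth in combinations can be made concrete with a small count. The data types and char_code values below are hypothetical, and the sketch assumes, for illustration only, that a separate conversion is needed for every ordered pair of (data type, char_code) combinations.

```python
from itertools import permutations, product

# Hypothetical data types and char_code attribute values.
data_types = ["char", "varchar", "clob"]
char_codes = ["shift_jis", "euc_jp", "utf-8"]

# Each (data type, char_code) pair is one typed form of a value.
typed = list(product(data_types, char_codes))
# One conversion per ordered pair of distinct typed forms.
conversions = list(permutations(typed, 2))

# Adding a single char_code value enlarges the set considerably.
typed_more = list(product(data_types, char_codes + ["utf-16"]))
conversions_more = list(permutations(typed_more, 2))
```

Under this assumption, three data types and three char_code values give 9 typed forms and 72 conversions, and adding one char_code value raises these to 12 and 132.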