1. Field of the Invention
The invention relates to a technique, specifically a method, apparatus, and article of manufacture that implements the method, to determine a target data type in a heterogeneous environment.
2. Description of the Related Art
Computer software systems typically process data. For example, a computer software system may be an application program or a “system” program. Examples of application programs include, and are not limited to, an information integration system, a database management system, and a spreadsheet program. Examples of system programs include, and are not limited to, an operating system and a file system. Typically, an application program relies on at least a portion of a system program to perform its function. Some computer software systems may be coupled to a repository to store data in persistent storage.
In a computer software system, data is typically associated with a data type that defines the data. Some exemplary data types include, and are not limited to, a numeric type, a string type, a date type, a time type, and a binary large object type. Some exemplary numeric data types include, and are not limited to, integer, short integer, long integer, and floating point.
In FIG. 1, an exemplary database table 10 of a database management system has rows 12 and columns 14 to store data. A row 12 is also referred to as a record. A data type is associated with each column to define the type of data that is contained in that column. For example, the data type for column one 16 is integer and the data type for column two 18 is string.
In a heterogeneous environment, data may be stored in various repositories. The repositories include, and are not limited to, the tables of database management systems, spreadsheet files, flat files, text files such as email, extensible markup language (XML) documents, web pages, image files, and audio or video data files. A repository may be a source of data for a query, and a target when a data value is assigned in an update or an insert. A single query may be used to retrieve data located on any one, or a combination, of the data sources. The repositories may represent the same or related data differently. In other words, related data from different repositories may have different data types.
Typically, in a heterogeneous environment, the data types supported by the various software systems, and the semantics related to those data types, are highly diverse. Different software systems may associate different data types with the same or related data. When data is transferred between software systems, a software system typically transforms the data type of the input data and outputs the data with a data type that differs from the input data type. In addition, the software systems may be interconnected by software interfaces that may transform the data type of the data as it passes through the interface. Therefore, the software systems may not provide a consistent view of the data or consistent behavior related to the data.
The updating or inserting of data into a repository is referred to as an assignment. When assigning data in a heterogeneous environment, the semantics for the assignment of the data as it passes through various software systems and interfaces are unclear, and the result of the assignment is inconsistent and unpredictable. The source data may pass through multiple levels of software systems and interfaces before reaching a target repository at the lowest level. In the process, the data type associated with the data may be altered multiple times. The uncertainty of when and how the data type is altered may produce inconsistent and unpredictable results for the assignment.
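The multi-level problem described above can be illustrated with a minimal sketch. The three conversion layers and the `assign` helper below are hypothetical and stand in for whatever software systems and interfaces lie between a source and a target; they are not taken from any particular product.

```python
from decimal import Decimal


def to_float(value):
    # Hypothetical interface that represents every number as floating point.
    return float(value)


def to_int(value):
    # Hypothetical system that keeps whole numbers only (truncates toward zero).
    return int(value)


def to_fixed_text(value, width=6):
    # Hypothetical system that stores text in a fixed-width field,
    # silently dropping characters that do not fit.
    return str(value)[:width]


def assign(value, layers):
    # Pass the value through each layer in order; return what reaches the target.
    for layer in layers:
        value = layer(value)
    return value


source = Decimal("1234.56")

path_a = assign(source, [to_float, to_int])         # float first, then integer
path_b = assign(source, [to_int, to_float])         # integer first, then float
path_c = assign(source, [to_fixed_text, to_float])  # truncated text, then float

for name, result in (("a", path_a), ("b", path_b), ("c", path_c)):
    print(name, type(result).__name__, result)
```

In this sketch the same source value reaches the target as the integer 1234, the floating point value 1234.0, or the floating point value 1234.5, depending only on which layers it passes through and in what order.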
In addition, when an assignment updates or inserts data into multiple repositories, the semantics to determine the target type in the multiple repositories are unclear and may produce inconsistent results. For example, when integrating data from multiple repositories, data having different data types in different software systems is presented as a “union all” view to provide a single uniform view of the data. When data is assigned across the underlying target software systems of the union, the data type may vary across the underlying target software systems inconsistently and unpredictably.
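A small sketch may make the multi-target case concrete. It assumes two hypothetical repositories behind a single union view, one storing the column as an integer and the other as six-character text; the names `UNION_TARGETS` and `assign_through_union` are illustrative only.

```python
def store_as_integer(value):
    # Hypothetical target whose column is an integer type.
    return int(value)


def store_as_text(value, length=6):
    # Hypothetical target whose column is a fixed-length string type.
    return str(value)[:length]


# Each member of the union view applies its own conversion on assignment.
UNION_TARGETS = {
    "repository_a.amount": store_as_integer,
    "repository_b.amount": store_as_text,
}


def assign_through_union(value):
    # Assign one value to every underlying target and report what each stored.
    return {name: store(value) for name, store in UNION_TARGETS.items()}


stored = assign_through_union(1234.75)
print(stored)
```

Here a single assignment of 1234.75 leaves the integer 1234 in one repository and the string "1234.7" in the other, so the union view no longer presents one uniform value.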
Some software systems use a data type mapping mechanism to map data types between different software systems. For example, when mapping tables of different database management systems, the data types are individually mapped column-by-column. Using this data type mapping mechanism, data in one software system can be viewed from another software system; however, the assignment semantics are unclear and the results of an assignment are inconsistent and unpredictable.
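For illustration, a column-by-column mapping of this kind might look like the following sketch. The type names in `TYPE_MAP` and the `map_columns` helper are assumptions chosen for the example and do not reproduce the mapping table of any actual database management system.

```python
# Hypothetical mapping from a remote system's type names to local type names.
TYPE_MAP = {
    "NUMBER": "DECIMAL",
    "TEXT": "VARCHAR",
    "DATETIME": "TIMESTAMP",
}


def map_columns(remote_columns):
    # Map each (column name, remote type) pair to its local type, one column at a time.
    mapped = []
    for name, remote_type in remote_columns:
        local_type = TYPE_MAP.get(remote_type)
        if local_type is None:
            raise ValueError(
                f"no mapping for column {name!r} of type {remote_type!r}"
            )
        mapped.append((name, local_type))
    return mapped


remote_table = [("id", "NUMBER"), ("name", "TEXT"), ("created", "DATETIME")]
local_view = map_columns(remote_table)
print(local_view)
```

Such a mapping states how each remote column is viewed locally, but it says nothing about which data type governs when a value is assigned back to the remote column, which is the gap described above.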
Therefore, there is a need for a method, apparatus and article of manufacture implementing the method, to provide consistent and predictable results when assigning data in a heterogeneous environment. The technique should also provide consistent and predictable results when assigning data to a target computer system in a multi-level environment. In addition, the technique should provide consistent and predictable results when assigning data in a multi-target environment.