A database is a central repository for storing information. Information in a database may be stored in a number of different formats (e.g., datetime, numeric, text, binary). A character set defines the correspondence between binary codes and the textual characters of a particular language. Some character sets are known as code pages. In many situations, multiple types of systems may access many different types of databases. The situation is complicated further because each system and each database may use different formats for storing and retrieving data. For example, a single database may be accessed by systems in the U.S. and in Russia, yet its data may be stored in a single character set that is not readable by all accessing systems. Because English and Russian are associated with different character sets, even on the same system, information stored in one character set must be converted before it can be used by a system that accesses the database with a different character set. As such, databases store information in different character sets, which may not be readable by all systems. Additionally, some databases treat multiple character sets differently. Therefore, information may need to be converted into a character set that is readable by the system attempting to retrieve that information.
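As a minimal illustration of why the character set matters, the following Python sketch decodes the same three bytes under two code pages. The choice of code page 437 (a U.S. DOS code page) and code page 866 (a Cyrillic DOS code page) is an assumption made for illustration only; the text above does not name specific code pages for the U.S. and Russian systems.

```python
# The same stored bytes map to different characters under different code pages.
raw = bytes([0x80, 0x81, 0x82])

# Assumed for illustration: cp437 for a U.S. system, cp866 for a Russian system.
as_cp437 = raw.decode("cp437")
as_cp866 = raw.decode("cp866")

print(as_cp437)  # Latin letters with diacritics
print(as_cp866)  # Cyrillic capital letters

# Text decoded as Cyrillic cannot be written back in cp437 at all,
# because cp437 has no codes for those characters.
try:
    as_cp866.encode("cp437")
    cyrillic_fits_cp437 = True
except UnicodeEncodeError:
    cyrillic_fits_cp437 = False
```

A system that reads these bytes with the wrong code page displays different text from what was stored, which is why a conversion step is needed.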
As a result, existing systems convert all retrieved data to an international character set, such as Unicode, and then into the character set requested by a data manipulator, typically called an "Activity." These conversions significantly increase the time spent processing a request. As shown in FIG. 1, existing systems request and retrieve data from a database in step 110. In step 120, the system converts the data to a universal character set, for example, Unicode. In step 130, the Unicode data is converted into the target character set. In step 140, the Activity performs some operation on the data.
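The four steps of FIG. 1 can be sketched roughly as follows. This is a simplified model, not an actual implementation: the function names are hypothetical, the database is simulated by a list of byte strings, and the operation performed by the Activity (counting bytes) is a placeholder.

```python
def retrieve(source_rows):
    """Step 110: retrieve raw bytes from the database (simulated by a list)."""
    return list(source_rows)


def to_unicode(raw_rows, source_charset):
    """Step 120: convert every retrieved value to Unicode."""
    return [row.decode(source_charset) for row in raw_rows]


def to_target(text_rows, target_charset):
    """Step 130: convert the Unicode text into the target character set."""
    return [text.encode(target_charset) for text in text_rows]


def run_activity(rows):
    """Step 140: the Activity operates on the data (placeholder: count bytes)."""
    return sum(len(row) for row in rows)


def process_request(source_rows, source_charset, target_charset):
    """Run steps 110 to 140 in order; both conversions always happen."""
    raw = retrieve(source_rows)
    text = to_unicode(raw, source_charset)
    converted = to_target(text, target_charset)
    return converted, run_activity(converted)
```

Note that steps 120 and 130 run for every value regardless of the source and target character sets, which is the source of the delay described above.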
For example, an Activity may request data from an Oracle database to be stored in a DB2 database. The Activity uses a data provider/consumer, typically called a Link, to retrieve the data from the Oracle database. The Link converts the data from the Oracle character set into Unicode, and then converts it into the target character set. A system typically processes an Activity in the following manner: The Activity requests data from an Oracle database, for example, a stream of text containing the letters "ABC," in the character set "code page 437." The data is converted into an international character set, such as Unicode. Because DB2 does not support this character set, the Unicode must be converted to a character set supported by the database. In this example, the Unicode is converted into code page 857, yielding the representation of "ABC" in code page 857.
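The "ABC" example can be traced byte by byte with Python's built-in codecs, used here only as a stand-in for whatever conversion routines the Link actually uses. The variable names are hypothetical.

```python
# Bytes as retrieved from the source database in code page 437.
source_bytes = "ABC".encode("cp437")

# First conversion: code page 437 to Unicode.
unicode_text = source_bytes.decode("cp437")

# Second conversion: Unicode to code page 857 for the target database.
target_bytes = unicode_text.encode("cp857")

print(source_bytes.hex(), target_bytes.hex())
```

For these three letters the bytes are the same before and after, since both code pages share the ASCII range, yet two conversions were still performed.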
Therefore, information moved from one database to another database supporting the same character set would still undergo multiple conversions. For example, if two databases support code page 932 and an Activity requests that data be moved from one database to the other (or within the same database), the data is retrieved as code page 932. The data is already in the target character set, but because the conversions are performed automatically, the data could be converted to Unicode and then converted back to code page 932. This results in two conversions when none is needed. As data may be transferred to several databases, numerous (and possibly unnecessary) conversions may occur. Depending on the amount of data retrieved, the time spent converting the information could be significant.
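The redundant round trip can be seen in a short Python sketch. The sample text and the conversion counter are assumptions added for illustration; Python's cp932 codec stands in for the converters of an actual system.

```python
# Sample data already stored in code page 932 (sample text is illustrative).
original = "データベース".encode("cp932")

conversions = 0

# Automatic conversion to Unicode.
as_unicode = original.decode("cp932")
conversions += 1

# Automatic conversion back to the target character set, also cp932.
round_tripped = as_unicode.encode("cp932")
conversions += 1

# The output is byte-for-byte identical to the input.
unchanged = round_tripped == original
print(conversions, unchanged)
```

Both conversions produce no change in the stored bytes, so their cost grows with the amount of data while contributing nothing to the result.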
These and other problems exist in existing systems.