Migrating, switching, or porting a system from an old one to a new one can be complex, labor-intensive, and error-prone. For example, data such as databases and files may be difficult to port from a SOLARIS® or HP-UX® operating system (OS) to an AIX® OS. Another common example is porting data from an ORACLE® database to a DB2® database.
Data migration often requires updating encoded content from the original system to the new one. Data incompatibility issues may arise because the new system may use a different encoding standard, or a different version of the same encoding standard. Many encoding conversion tools exist, such as, but not limited to, the Unix command and API “iconv,” for converting data from one encoding to another, or from one version of an encoding to another. However, such tools usually require knowledge of the original encoding version, which is often difficult or even impossible to determine because databases and file systems may not store encoding version information with the data. For example, a database table created in 1995 to store UTF-8 data may, during the last 15 years, have accumulated records stored under Unicode versions 3 through 6, with nothing recording which record was stored under which version.
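Python's standard `unicodedata` module illustrates the version problem on a small scale. Alongside the Unicode database of the running interpreter, it bundles a frozen Unicode 3.2 snapshot as `unicodedata.ucd_3_2_0`. The minimal sketch below flags characters that are assigned in the current database but were unassigned in Unicode 3.2. The function name `newer_than_unicode_3_2` and the sample string are illustrative assumptions; the sketch does not use iconv, and because no other historical snapshots ship with Python, it can only say "newer than 3.2," not which of versions 3 through 6 a record was written under.

```python
import unicodedata

# Frozen Unicode 3.2 database bundled with CPython's unicodedata module.
OLD_UCD = unicodedata.ucd_3_2_0


def newer_than_unicode_3_2(text):
    """Return (index, code point) pairs for characters that are assigned in the
    running interpreter's Unicode version but were unassigned ("Cn") in 3.2."""
    hits = []
    for i, ch in enumerate(text):
        if OLD_UCD.category(ch) == "Cn" and unicodedata.category(ch) != "Cn":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    # EURO SIGN (assigned before 3.2) and INDIAN RUPEE SIGN (added in Unicode 6.0).
    sample = "Price: \u20ac100 / \u20b9500"
    print("current Unicode version:", unicodedata.unidata_version)
    print(newer_than_unicode_3_2(sample))
```

On an interpreter whose Unicode database is 6.0 or later, only the rupee sign at index 14 is reported, since the euro sign was already assigned in 3.2. A record containing it could not be read correctly by a system that only understands the older version.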
Moreover, code points for certain characters may be vendor-defined in the Private Use Area (PUA) of the Unicode code space. Different vendors may assign different code points to the same character, or the same code point to different characters. Later, a vendor-defined PUA character may be deprecated and promoted to an official Unicode character with a new code point. A document or database containing such characters may therefore be incompatible across systems, and may not be directly convertible from one system to another.
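A first step toward handling such characters is simply locating them. The minimal sketch below, an illustration rather than a migration tool, scans records for code points in the three Private Use Area ranges defined by the Unicode Standard (U+E000 to U+F8FF, U+F0000 to U+FFFFD, and U+100000 to U+10FFFD). The record IDs, sample strings, and helper names `is_pua` and `find_pua` are hypothetical. Note that the scan can only report *where* private-use characters occur; it cannot tell which vendor's meaning a given PUA code point was meant to carry.

```python
import unicodedata

# Private Use Area ranges defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_pua(ch):
    """True if the character's code point lies in a Private Use Area range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def find_pua(record_id, text):
    """Yield (record_id, index, code point) for every private-use character."""
    for i, ch in enumerate(text):
        if is_pua(ch):
            yield record_id, i, f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # Hypothetical records; r2 and r3 contain vendor-defined PUA characters.
    records = {"r1": "plain text", "r2": "name\ue000x", "r3": "\U000f0001"}
    for rid, text in records.items():
        for hit in find_pua(rid, text):
            print(hit)
    # unicodedata reports general category "Co" (Other, private use) for these.
    print(unicodedata.category("\ue000"))
```

Explicit ranges are used instead of relying only on the `"Co"` category so the check does not depend on the interpreter's Unicode version; both approaches agree on the range endpoints.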
Such conflicts often exist in long-accumulated record collections held in databases, storage systems, or file systems, such as health record management systems, digitized libraries, and banking databases. After the character mapping changes between the original system and the migrated (switched or ported) system, characters encoded with their old values may no longer be accessible. Such records are called “ghost records.” Moreover, ghost records may be lost or corrupted during data or system migration because current migration tools lack a mechanism for deriving character mappings. Further, users may not realize that their migrated systems contain ghost records or corrupted data, because little or no warning is produced during system migration that would allow ghost records to be reported, traced, and corrected.