1. Field:
The disclosure relates generally to an improved data processing system and more specifically to user identifier management. Even more specifically, the disclosure relates to a method, computer program product, and apparatus for managing user identifiers.
2. Description of the Related Art
Users of data processing systems are commonly identified using a user identifier. A user identifier is a name that uniquely identifies the user in the data processing system. The user identifier is used for many tasks in the operation of the data processing system. For example, the user identifier may be used to generate log entries associated with the user, to record which user created or modified a file, or for other suitable purposes. One example of a user identifier is “JohnSmith.”
User identifiers may also be used by international users that communicate in languages other than English. The user identifier for the international user may contain characters not present in the English language. For example, a user identifier for a Chinese user may contain Chinese characters. Characters in English and other languages entered into a data processing system are mapped into code points before the characters are stored. Mapping, as used herein, means performing a translation. For example, a data processing system may map a character into a code point in a standardized character code system by translating the character into the code point that corresponds to the character in the standardized character code system. The code point uniquely identifies the character from all the possible characters known to the data processing system. A code point is a collection of bits that may be represented by letters, numbers, symbols, or a combination of letters, numbers, and/or symbols.
The code points are standardized among data processing systems so characters appear the same on different data processing systems presenting the same data. In other words, multiple data processing systems use the same code points to identify the same characters. One example of a standardized code system for characters is Unicode. In Unicode, the letter “a” is mapped into the code point U+0061. Characters in other languages are mapped into code points as well. For example, the letter “ö” is mapped into the code point U+00F6. In these examples, the code points are represented with four or more hexadecimal digits.
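The mapping between characters and code points described above may be illustrated with a short sketch. The sketch assumes Python, whose built-in `ord` and `chr` functions operate on Unicode code points; the helper names `to_code_point` and `from_code_point` are hypothetical and are used only for illustration.

```python
def to_code_point(ch: str) -> str:
    """Return the Unicode code point of one character in U+XXXX notation."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # ord() yields the integer code point; format it as at least four hex digits.
    return f"U+{ord(ch):04X}"


def from_code_point(label: str) -> str:
    """Return the character identified by a code point written as U+XXXX."""
    if not label.startswith("U+"):
        raise ValueError("expected a label of the form U+XXXX")
    # Parse the hexadecimal digits after the "U+" prefix and map back to a character.
    return chr(int(label[2:], 16))


print(to_code_point("a"))       # U+0061
print(to_code_point("\u00f6"))  # U+00F6 (the letter o with diaeresis)
```

Because the mapping is standardized, the same two calls produce the same code points on any data processing system that implements Unicode.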
In a standardized code system, some characters in the system may have one or more character variants. As used herein, a character variant is a character that appears visually similar to another character, but has a different code point in the standardized code system. For example, the character “北” has the code point U+5317, while the character “北” has the code point U+F963. The characters may appear to a human to be visually similar, but a data processing system stores the characters as different, unrelated code points.
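The following sketch illustrates how the two code points in the example above are treated as different characters by a direct comparison. It also shows one possible way to relate them, assuming Python and the NFC normalization form provided by its standard `unicodedata` module. Normalization is offered here only as an illustrative assumption and is not presented as the technique of this disclosure; the helper name `same_after_normalization` is hypothetical.

```python
import unicodedata


def same_after_normalization(a: str, b: str) -> bool:
    """Compare two strings after applying Unicode NFC normalization to each."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


base = "\u5317"     # the character with code point U+5317
variant = "\uf963"  # the visually similar character with code point U+F963

# A direct comparison sees two different, unrelated code points.
print(base == variant)                          # False

# NFC maps U+F963 onto U+5317, so the normalized forms match.
print(same_after_normalization(base, variant))  # True
```

Normalization relates only those variants that the standard itself defines as equivalent. Other visually similar characters, such as the Latin letter “a” (U+0061) and the Cyrillic letter “а” (U+0430), remain distinct after normalization.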