The invention has particular, though not exclusive, application to the compilation of C source code. A C compiler must normally handle the Standard C Character Set, which consists of at least 91 characters (upper- and lower-case alphabetic characters, digits, and symbols representing various operators). The characters are encoded in the source code as numeric values, or “code points.” Sets of code points representing the complete set of valid C characters are generally referred to as “coded character set identifiers” (CCSIDs). During the initial phases of compilation, a lexical scanner module usually converts raw source code into corresponding tokenized target code. To interpret the source code and identify individual characters, the lexical scanner relies on a character identification table that embodies one particular CCSID.
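The role of the character identification table can be illustrated with a minimal sketch in C. It is not taken from any particular compiler: the names `char_table`, `char_class`, `init_ascii`, `init_ebcdic`, and `classify` are hypothetical, only letters, digits, and the space character are covered, and the code points shown are the usual ASCII and EBCDIC values for those characters. Operator symbols are omitted because their code points differ between EBCDIC code pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Character classes a scanner might distinguish (hypothetical names). */
enum char_class { CC_INVALID = 0, CC_LETTER, CC_DIGIT, CC_SPACE };

/* One entry per possible 8-bit code point. */
struct char_table {
    unsigned char cls[256];
};

/* Assign class c to every code point in the inclusive range lo..hi. */
static void mark_range(struct char_table *t, unsigned lo, unsigned hi,
                       enum char_class c)
{
    assert(t != NULL && lo <= hi && hi < 256);
    for (unsigned cp = lo; cp <= hi; ++cp)
        t->cls[cp] = (unsigned char)c;
}

/* Table for an ASCII-based character set. */
void init_ascii(struct char_table *t)
{
    memset(t->cls, CC_INVALID, sizeof t->cls);
    mark_range(t, 0x41, 0x5A, CC_LETTER);   /* A-Z */
    mark_range(t, 0x61, 0x7A, CC_LETTER);   /* a-z */
    mark_range(t, 0x30, 0x39, CC_DIGIT);    /* 0-9 */
    mark_range(t, 0x20, 0x20, CC_SPACE);    /* space */
}

/* Table for an EBCDIC-based character set; letters are not contiguous. */
void init_ebcdic(struct char_table *t)
{
    memset(t->cls, CC_INVALID, sizeof t->cls);
    mark_range(t, 0xC1, 0xC9, CC_LETTER);   /* A-I */
    mark_range(t, 0xD1, 0xD9, CC_LETTER);   /* J-R */
    mark_range(t, 0xE2, 0xE9, CC_LETTER);   /* S-Z */
    mark_range(t, 0x81, 0x89, CC_LETTER);   /* a-i */
    mark_range(t, 0x91, 0x99, CC_LETTER);   /* j-r */
    mark_range(t, 0xA2, 0xA9, CC_LETTER);   /* s-z */
    mark_range(t, 0xF0, 0xF9, CC_DIGIT);    /* 0-9 */
    mark_range(t, 0x40, 0x40, CC_SPACE);    /* space */
}

/* Look up the class of a single code point. */
enum char_class classify(const struct char_table *t, unsigned char code_point)
{
    return (enum char_class)t->cls[code_point];
}
```

The same code point is classified differently depending on which table is in use: 0xC1 is a letter under the EBCDIC table and invalid under the ASCII table. A scanner built around only one such table therefore misreads source code encoded in the other character set.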
The lexical scanner associated with a C compiler expects to receive source code encoded in one particular character set. However, various CCSIDs are used to encode the Standard C Character Set, and source code encoded in one character set must be converted before it can be compiled by a compiler that expects a different one. Utilities are commonly available to convert C source code from one standard character set to another.
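The kind of conversion such a utility performs can be sketched as a table-driven translation. The following C fragment is an illustrative assumption rather than the code of any actual utility: the names `xlate_table`, `init_ebcdic_to_ascii`, and `convert_buffer` are hypothetical, and only letters, digits, and the space character are mapped. Every other code point is replaced with the ASCII question mark.

```c
#include <assert.h>
#include <stddef.h>

#define UNMAPPED 0x3F   /* ASCII '?', used for code points not in the table */

/* One output code point per possible 8-bit input code point. */
struct xlate_table {
    unsigned char map[256];
};

/* Map `count` consecutive source code points onto consecutive targets. */
static void map_range(struct xlate_table *t, unsigned src_lo, unsigned dst_lo,
                      unsigned count)
{
    assert(t != NULL && src_lo + count <= 256 && dst_lo + count <= 256);
    for (unsigned i = 0; i < count; ++i)
        t->map[src_lo + i] = (unsigned char)(dst_lo + i);
}

/* Build an EBCDIC-to-ASCII table for letters, digits, and space only. */
void init_ebcdic_to_ascii(struct xlate_table *t)
{
    for (unsigned i = 0; i < 256; ++i)
        t->map[i] = UNMAPPED;
    map_range(t, 0xC1, 0x41, 9);    /* A-I */
    map_range(t, 0xD1, 0x4A, 9);    /* J-R */
    map_range(t, 0xE2, 0x53, 8);    /* S-Z */
    map_range(t, 0x81, 0x61, 9);    /* a-i */
    map_range(t, 0x91, 0x6A, 9);    /* j-r */
    map_range(t, 0xA2, 0x73, 8);    /* s-z */
    map_range(t, 0xF0, 0x30, 10);   /* 0-9 */
    map_range(t, 0x40, 0x20, 1);    /* space */
}

/* Translate len bytes from src into dst, one code point at a time. */
void convert_buffer(const struct xlate_table *t, const unsigned char *src,
                    unsigned char *dst, size_t len)
{
    assert(t != NULL && src != NULL && dst != NULL);
    for (size_t i = 0; i < len; ++i)
        dst[i] = t->map[src[i]];
}
```

A conversion of this kind is a separate pass over the whole source file that must be completed before compilation can begin.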
It would be desirable to provide a lexical scanner that can adapt itself dynamically to scan source code in different standard character sets and to produce corresponding target code in different standard character sets.