In the context of computer compilers and translators, a "label" is generally defined as a string of characters. A label translator is a software tool (computer program) employed by compilers and translators for translating valid labels from a "source" language into valid labels in a "target" language.
One example of where label translation is necessary is when a computer-aided engineering (CAE) product, e.g., a software circuit board description, must be integrated with an altogether different software system having a compiler that employs different labels from those used in the circuit board description. A problem arises when a label that is valid in one system (e.g., the source system) is not valid in another (e.g., the target system), for example, when the label contains a character that is illegal in the target language. When this occurs, the offending character(s) must be either mapped (in a one-to-one manner) into valid characters in the target system, thereby forming a valid label in the target system, or replaced with an "escape sequence" consisting of an "escape character" followed by a sequence of other characters. The escape sequence is also known as an "expansion sequence". Typically, the escape and expansion characters are specified by the system user or application developer.
Methods of performing label translation are well known in the art. See, e.g., Aho, Alfred et al., "COMPILERS: Principles, Techniques and Tools," pp. 92-105 and 113-158, Addison-Wesley Publishing Company, March 1988 (ISBN 0-201-10088-6); Jones, D. S., "Elementary Information Theory," chp. 2, Oxford University Press, 1979 (ISBN 0-19-859637-5); Tanenbaum, Andrew S., "Computer Networks," chp. 4, Prentice Hall, Inc., 1981 (ISBN 0-13-165183-8); and McNamara, John S., "Technical Aspects of Data Communication," chp. 17, 18, Digital Press, 1977 (ISBN 0-932376-01-0), all of which are incorporated herein by reference. Until recently, label translators had been manually constructed. More recently, however, tools have been developed for automatically constructing label translators based upon descriptions of valid target language labels provided by the user. In one automatic label translator generator, the user-provided description is contained in a file called a Translation Configuration (TC). As discussed more fully below, the TC contains, among other things, a description of valid labels recognized by the target language. This description is in regular expression (RE) format. The TC is read by the translator generator, which then parses the TC and produces a state machine (SM) representation of the REs. Such a method is described in the aforementioned Aho et al. reference entitled "COMPILERS: Principles, Techniques and Tools."
FIG. 1 illustrates the overall process involved in automatically generating a label translator as known in the prior art. A user-defined TC, 1, is read by the translator-generator program, as shown at 3. The translator-generator program converts the RE description in the TC into an SM. The SM is output to an SM file as shown at 5. At this point, the work of the translator-generator 3 is completed. The SM file 5 is read by a general translation implementation program, as shown at 7, which performs the actual translation of source language labels into target language labels, as shown at 8 and 9.
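By way of illustration only, the role of the SM in this process can be sketched in Python as a transition table that is consulted one character at a time. The regular expression [A-Z][A-Z0-9]*, the state names, and the function names below are hypothetical and are not taken from any particular translator-generator; the sketch shows only the general idea of recognizing labels with a table-driven state machine.

```python
# Hypothetical state machine for the regular expression [A-Z][A-Z0-9]*.
# State and class names are illustrative assumptions only.

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if "A" <= ch <= "Z":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"

# (current state, input class) -> next state; missing entries mean "reject".
TRANSITIONS = {
    ("start", "letter"): "in_label",
    ("in_label", "letter"): "in_label",
    ("in_label", "digit"): "in_label",
}
ACCEPTING = {"in_label"}

def accepts(label):
    """Return True if the state machine ends in an accepting state."""
    state = "start"
    for ch in label:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts("A1"))  # True
print(accepts("1A"))  # False
```

In such an arrangement the table could be written to a file by one program and read by another, which is consistent with the separation between the translator-generator 3 and the translation implementation program 7 described above.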
A problem with these label translator generator programs is that, unbeknownst to the user, the user-specified escape and/or expansion characters are sometimes inappropriate, or even invalid, in certain situations. The user of such a program, who is usually not an expert in the art of label translation, does not become aware of the problem until compilation or translation is attempted and one or more error messages are produced. The user must then revise the escape and/or expansion characters in the TC and rerun the source program through the compiler. This process can be iterative, time-consuming, and inefficient.
It is therefore desirable to provide a method for use in connection with an automatic label translator that will minimize the errors and time required in generating a valid set of escape and/or expansion characters. The present invention achieves this goal.
Before proceeding to a description of the present invention, it is helpful to define some relevant terms employed in connection with label translation. It should be understood that these definitions are provided solely for the purpose of providing a complete understanding of the invention, and should not be construed as limiting the scope of the invention in any respect, except as may be recited by the appended claims.
Escape Sequence Encoding. Translators which employ escape sequence encoding convert illegal characters into an escape sequence containing the escape character followed by, for example, a group of "digit" characters. This sequence of digit characters represents the numeric value of the illegal source label character. The escape and digit characters are specific to the target language. Therefore, the escape character will often not be the ASCII value 27 (i.e., the ASCII escape character), since that character is not a valid character in most languages. In addition, the digit characters are not necessarily limited to "0" through "9".
As an example of how escape sequence encoding works, assume that the escape character is defined as "X" and the digit characters are "0" through "9". If the target language alphabet only contains the characters "A" through "Z" and "0" through "9", the label AB"CD would be translated to ABX034CD, where 034 is the ASCII value (in decimal) for a quotation mark (").
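The foregoing example can be reproduced with a short Python sketch. The function name, parameter names, and the fixed sequence width of three digit characters are assumptions made for illustration; the text above specifies only the escape character, the digit characters, and the resulting label.

```python
def escape_encode(label, valid_chars, escape_char="X",
                  digits="0123456789", width=3):
    """Replace each character not in valid_chars with escape_char followed
    by its numeric value written in the given digit characters.

    The fixed width (3) is an assumption chosen to match the example.
    """
    base = len(digits)
    out = []
    for ch in label:
        if ch in valid_chars:
            out.append(ch)
            continue
        value = ord(ch)
        encoded = []
        for _ in range(width):
            value, remainder = divmod(value, base)
            encoded.append(digits[remainder])
        if value:
            raise ValueError(f"{ch!r} does not fit in {width} digit characters")
        out.append(escape_char + "".join(reversed(encoded)))
    return "".join(out)

# Target alphabet from the example: "A" through "Z" and "0" through "9".
TARGET_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

print(escape_encode('AB"CD', TARGET_ALPHABET))  # ABX034CD
```

Note that this sketch leaves an "X" occurring in the source label unchanged, so the output cannot always be decoded unambiguously; how a given translator treats the escape character itself is not addressed here.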
Valid Label Specification. Production of legal labels from a translation requires definition of the legal labels in the target language. For example, most languages allow the characters "0" through "9" to be used in labels, but not as the first characters. To allow as much latitude as possible, regular expression (RE) notation is generally used to describe legal labels.
Since label length is limited in many languages, the maximum label length may be specified in the TC. In addition, particular reserved words, or "keywords", may also be specified to ensure that these labels are not generated by the translator. Together, all of this data forms the valid label specification.
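As a minimal sketch of how the three parts of a valid label specification might be checked together, the following Python function tests a candidate label against a regular expression, an optional maximum length, and a keyword list. The function name and the sample specification (pattern, length of 8, and two keywords) are hypothetical and serve only to illustrate the idea.

```python
import re

def is_valid_label(label, pattern, max_length=None, keywords=()):
    """Check a label against an RE, a maximum length, and reserved words.

    max_length=None stands for an unlimited label length.
    """
    if re.fullmatch(pattern, label) is None:
        return False
    if max_length is not None and len(label) > max_length:
        return False
    return label not in keywords

# Hypothetical specification: digits allowed, but not as the first character.
PATTERN = r"[A-Z][A-Z0-9]*"
KEYWORDS = {"BEGIN", "END"}

print(is_valid_label("A1", PATTERN, 8, KEYWORDS))         # True
print(is_valid_label("1A", PATTERN, 8, KEYWORDS))         # False
print(is_valid_label("END", PATTERN, 8, KEYWORDS))        # False
print(is_valid_label("ABCDEFGHI", PATTERN, 8, KEYWORDS))  # False
```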
Translation Configuration (TC) File. The TC file has been previously explained. An exemplary TC file is presented below. The target language for this example is VHDL (a hardware description language). Note that comments in the TC file are preceded by the characters "//". In this example, the statements "target", "label", "length", and "keywords" are required to appear in the TC and the "escape" and "digits" statements are optional.
The "target" statement specifies the name of the target language. This string is used to produce appropriate names for files in the generated translator source code. The "label" statement defines the syntax of valid labels in the target language. The form of this statement is a RE. The "length" statement specifies the maximum label length allowed in the target language. The length can be specified as either "UNLIMITED" or a decimal number.
The "escape" statement specifies the character to be used for indicating the start of an escape sequence. The "digits" statement specifies the numeral characters to be used to represent the numeric values of illegal characters in escape sequences. The "keywords" statement (under "reserved words") specifies the labels which must not be generated by the translator.
__________________________________________________________________________
EXAMPLE
__________________________________________________________________________
//Exemplary Configuration File for Translation to VHDL
//This configuration file for VHDL was derived from the
//"IEEE Standard VHDL Language Reference Manual" (IEEE Std
//1076-1987) published March 31, 1987. References to sections in
//this configuration file indicate the relevant sections of that
//manual.

target = "VHDL";

// Section 13.3 -- Definition of "identifier"
// Note that although upper and lower case are valid, they are
// considered equivalent. Therefore, only one case should be
// used (upper).
label = [A-Z] ([_]? [A-Z0-9])*;

// Section 13.3 -- "All characters of an identifier are
// significant, . . ."
length = UNLIMITED;

// The selected escape character (always valid but not common).
// Note that "_" is not valid (can't be first character).
escape = Z;

// The selected expansion number system (hex is easy to
// understand)
digits = "0123456789ABCDEF";

// Section 13.9 -- "Reserved Words"
keywords = "ABS", "ACCESS", "AFTER", "ALIAS", "ALL", "AND",
  "ARCHITECTURE", "ARRAY", "ASSERT", "ATTRIBUTE", "BEGIN", "BLOCK",
  "BODY", "BUFFER", "BUS", "CASE", "COMPONENT", "CONFIGURATION",
  "CONSTANT", "DISCONNECT", "DOWNTO", "ELSE", "ELSIF", "END",
  "ENTITY", "EXIT", "FILE", "FOR", "FUNCTION", "GENERATE", "GENERIC",
  "GUARDED", "IF", "IN", "INOUT", "IS", "LABEL", "LIBRARY", "LINKAGE",
  "LOOP", "MAP", "MOD", "NAND", "NEW", "NEXT", "NOR", "NOT", "NULL",
  "OF", "ON", "OPEN", "OR", "OTHERS", "OUT", "PACKAGE", "PORT",
  "PROCEDURE", "PROCESS", "RANGE", "RECORD", "REGISTER", "REM",
  "REPORT", "RETURN", "SELECT", "SEVERITY", "SIGNAL", "SUBTYPE",
  "THEN", "TO", "TRANSPORT", "TYPE", "UNITS", "UNTIL", "USE",
  "VARIABLE", "WAIT", "WHEN", "WHILE", "WITH", "XOR";
__________________________________________________________________________
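For illustration, a TC file of the kind shown above could be read with a few lines of Python. The sketch below assumes, based only on the example, that each statement has the form "name = value;", that "//" begins a comment running to the end of the line, and that no value contains "//" or ";". The function name parse_tc and the abbreviated sample text are hypothetical; this is not the parser of any actual translator generator.

```python
import re

def parse_tc(text):
    """Parse 'name = value;' statements, ignoring // comments."""
    # Drop everything from "//" to the end of each line.
    body = "\n".join(line.split("//", 1)[0] for line in text.splitlines())
    config = {}
    for statement in body.split(";"):
        if "=" not in statement:
            continue
        name, value = statement.split("=", 1)
        name = name.strip()
        value = " ".join(value.split())  # collapse line breaks and spaces
        if name == "keywords":
            # A comma-separated list of quoted words.
            config[name] = re.findall(r'"\s*([^"]*?)\s*"', value)
        else:
            config[name] = value.strip('"')
    return config

# Abbreviated sample in the style of the example above.
SAMPLE_TC = '''
// Abbreviated configuration for illustration
target = "VHDL";
label = [A-Z] ([_]? [A-Z0-9])*;
length = UNLIMITED;
escape = Z;
digits = "0123456789ABCDEF";
keywords = "ABS", "ACCESS",
           "AFTER";
'''

config = parse_tc(SAMPLE_TC)
print(config["target"])    # VHDL
print(config["keywords"])  # ['ABS', 'ACCESS', 'AFTER']
```

The "escape" and "digits" entries would simply be absent from the resulting dictionary if those optional statements were omitted from the TC.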