COBOL is believed to remain one of the most widely used programming languages. It is a high-level programming language designed for business data processing. When COBOL was developed, one of the design goals was to make it as English-like as possible. As a result, COBOL uses structural concepts normally associated with English prose, such as section, paragraph, and sentence. However, as computer programs have become more complex, many of today's programming languages have become “object-oriented” to reduce the complexity of the programming process. For example, C++, an object-oriented extension of the C language, has achieved widespread acceptance.
There is a need to gain the advantages of object-oriented programming without losing the benefits of the large base of legacy COBOL code that has already been written. One method to accomplish this is a source code language translator. However, due to the inherent differences between the data structures of COBOL and those of object-oriented languages such as C++, the code generated by current source code translators is disorganized, hard to read, and difficult to maintain. A number of patents and published applications relate to source code translators, including U.S. Pat. Nos. 6,002,874, 6,269,474, 6,453,464, 6,467,079, 6,523,171, and 6,526,569, all of which are incorporated herein by reference in their entirety to the extent they are not inconsistent with the explicit teachings of this specification.
COBOL's English-like structure can be seen in its organized divisions. Specifically, a typical COBOL program consists of four divisions: the Identification Division, the Environment Division, the Data Division, and the Procedure Division. The Identification Division is used mostly for documentation. It contains paragraphs specifying the program name and, optionally, the author, date, and other information. The Environment Division contains information linking the logical file and device names used in the program with their physical counterparts. The Data Division contains the definitions (data-names) of all file, record, group, and single data items and, optionally, initial values for them. All data-names referenced in the program must be defined in this division. All control and logical instructions for the program are contained in the Procedure Division. It normally consists of a series of paragraphs/procedures, each of which performs a logical function. The two divisions of main concern here are (1) the Data Division, for data, and (2) the Procedure Division, for instructions that operate on the data. In object-oriented programs, by contrast, data and the code operating on that data are combined into structures called “objects”. Manipulations of the internal data of an object are carried out by executing member functions of the object.
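The contrast drawn above, between COBOL's separate data and instructions and an object that holds both, can be illustrated with a minimal C++ sketch. The class name `RecordCounter` and its members are hypothetical and chosen only for illustration; they do not come from any particular translator.

```cpp
#include <cassert>

// Hypothetical object: the data (count_) and the code that operates on it
// (increment, reset, value) live together in one class.
class RecordCounter {
public:
    // Member functions are the only way to change the internal data.
    void increment() { ++count_; }
    void reset() { count_ = 0; }

    // Read-only access to the current value.
    int value() const { return count_; }

private:
    int count_ = 0;  // internal data, hidden from outside code
};
```

In a COBOL program the counter would be declared in the Data Division and updated by statements in the Procedure Division; here a caller can only write `counter.increment()` and cannot touch `count_` directly.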
Specifically, the Procedure Division in a COBOL program contains the statements (instructions) that convert input data into output. Statements are either action statements (those that perform a specific processing action) or control statements (those that determine the sequence in which statements are executed). Statements are organized into named paragraphs/procedures, each of which is designed to perform a specific task. In a COBOL program, execution begins with the first statement in the first paragraph of the Procedure Division. The first paragraph is generally referred to as the “main module”. The main module provides the top-level flow of control for the program as a whole. Program execution should both begin and end in the main module. Other paragraphs are invoked as needed, and may themselves invoke other paragraphs, in a hierarchical fashion.
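The hierarchical flow of control described above can be sketched in C++ by modelling each paragraph as a function. This is an illustrative assumption only: the paragraph names (MAIN-MODULE, PROCESS-FILE, READ-RECORD, WRITE-REPORT) and the `Trace` type are hypothetical, and the trace exists solely to make the order of execution visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the names of paragraphs in the order they execute.
using Trace = std::vector<std::string>;

// Lowest-level paragraphs: each performs one specific task.
void read_record(Trace& t)  { t.push_back("READ-RECORD"); }
void write_report(Trace& t) { t.push_back("WRITE-REPORT"); }

// A mid-level paragraph that itself invokes other paragraphs.
void process_file(Trace& t) {
    t.push_back("PROCESS-FILE");
    read_record(t);
    write_report(t);
}

// The "main module": execution begins here and, after the invoked
// paragraphs return, also ends here.
Trace main_module() {
    Trace t;
    t.push_back("MAIN-MODULE");
    process_file(t);
    return t;
}
```

Calling `main_module()` yields the paragraph names in the order MAIN-MODULE, PROCESS-FILE, READ-RECORD, WRITE-REPORT, mirroring a top-down invocation hierarchy.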
The Data Division defines all input and output data structures of a COBOL program, as well as those used internally for data manipulation and program control. Two sections of the Data Division are common to most COBOL programs: the File Section, which defines file I/O (input/output) buffer areas, and the Working-Storage Section, where internal data structures are defined. An optional third section, the Linkage Section, has the same form as the Working-Storage Section but specifies the data structures that may be used to pass data from one COBOL program to another.
A COBOL data item refers to a memory location holding a value that may change during program execution. Every data item must be assigned a name, type, and length in the Data Division. The Working-Storage Section is used to define data-names not associated with an input/output buffer area (a file description). These data-names are commonly those required for temporary data storage, calculations, and switches for program control. Both group and single items can be defined, each with an associated type and length. In addition, an initial value can optionally be assigned to each single item with a COBOL ‘VALUE’ clause.
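The attributes just listed (name, type, length, optional initial value, and grouping) can be modelled as a small C++ data structure. The sketch below is a hypothetical descriptor, not part of any translator described here; the item names WS-FLAGS, WS-EOF-SWITCH, and WS-RECORD-COUNT are invented, and it assumes each character position occupies one byte so that a group's length is the sum of its members' lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ItemType { Alphanumeric, Numeric };

// Hypothetical descriptor for one Working-Storage data item.
struct DataItem {
    std::string name;                // data-name
    ItemType type;                   // type of the item
    std::size_t length;              // length of a single item (unused for groups)
    std::string initial;             // VALUE clause text; empty if none
    std::vector<DataItem> children;  // non-empty only for group items
};

// Length of a single item, or the summed length of a group's members.
std::size_t total_length(const DataItem& item) {
    if (item.children.empty()) return item.length;
    std::size_t sum = 0;
    for (const DataItem& child : item.children) sum += total_length(child);
    return sum;
}

// Models an assumed declaration such as:
//   01 WS-FLAGS.
//      05 WS-EOF-SWITCH    PIC X    VALUE 'N'.
//      05 WS-RECORD-COUNT  PIC 9(4) VALUE 0.
DataItem make_ws_flags() {
    return DataItem{"WS-FLAGS", ItemType::Alphanumeric, 0, "", {
        DataItem{"WS-EOF-SWITCH", ItemType::Alphanumeric, 1, "N", {}},
        DataItem{"WS-RECORD-COUNT", ItemType::Numeric, 4, "0000", {}},
    }};
}
```

Under these assumptions, `total_length(make_ws_flags())` is 5: one byte for the switch plus four for the counter.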
One difficulty in source code translation of COBOL programs arises with respect to references to variables. COBOL provides a consistent way of referencing a variable regardless of whether it is a group variable or a single variable: either can be accessed with a single data-name. Most modern languages likewise use a single data-name (or variable name) to access a standalone variable, but require multiple data-names to access a variable that exists inside a group variable (or data structure). Each level of grouping requires another data-name to be specified when accessing the single internal variable. Many source code translators translate variables from COBOL to C on a ‘one-for-one’ basis and therefore have to compensate for these syntax differences, thus losing COBOL's simplicity.
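The extra qualification that each level of grouping adds can be seen in a short C++ sketch. The record layout and the names CUSTOMER-REC, CUST-ADDRESS, and CUST-ZIP are hypothetical, and `std::string` is used only to keep the example short; a one-for-one translator would more likely emit fixed-length character arrays.

```cpp
#include <cassert>
#include <string>

// Assumed COBOL source:
//   01 CUSTOMER-REC.
//      05 CUST-ADDRESS.
//         10 CUST-ZIP PIC X(5).
//   ...
//   MOVE "12345" TO CUST-ZIP.        (one data-name is enough)

struct CustAddress {
    std::string cust_zip;
};

struct CustomerRec {
    CustAddress cust_address;
};

// The C++ equivalent must name every enclosing level of the group.
void set_zip(CustomerRec& rec, const std::string& zip) {
    rec.cust_address.cust_zip = zip;  // three names for one variable
}
```

The single COBOL data-name `CUST-ZIP` becomes the chain `rec.cust_address.cust_zip`, and the chain grows by one name for every additional level of nesting.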
For example, in one-to-one translation, COBOL data structures become C language ‘struct’s and COBOL ‘REDEFINES’ clauses become C language ‘union’s. In addition, when a COBOL data structure of strings is translated into a C language ‘struct’, C functions such as ‘memcpy( )’ and ‘memset( )’ must be called to access the data, because ordinary C strings are null-terminated and COBOL strings are not. The resulting code is so different from the original that reading and understanding the intent of the original COBOL program becomes unduly difficult, and that intent may be lost.
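To make the readability cost concrete, the following sketch shows the kind of code a one-for-one translation of a string move tends to require. It is an illustration under stated assumptions, not the output of any actual translator: the record CUSTOMER-REC, its fields, and the helpers `move_to_field` and `field_to_string` are all hypothetical. The helper follows the usual COBOL alphanumeric move behaviour of left-justifying, padding with spaces, and truncating on the right.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Assumed COBOL source:
//   01 CUSTOMER-REC.
//      05 CUST-NAME PIC X(10).
//      05 CUST-CITY PIC X(8).
struct CustomerRec {
    char cust_name[10];  // fixed length, no terminating '\0'
    char cust_city[8];
};

// Stands in for: MOVE "..." TO <field>.
// Pads the field with spaces, then copies at most field_len characters.
void move_to_field(char* field, std::size_t field_len, const char* src) {
    assert(field != nullptr && src != nullptr);
    std::size_t src_len = std::strlen(src);
    std::size_t n = src_len < field_len ? src_len : field_len;
    std::memset(field, ' ', field_len);
    std::memcpy(field, src, n);
}

// Copies the raw field into a std::string so it can be printed or compared.
std::string field_to_string(const char* field, std::size_t field_len) {
    return std::string(field, field_len);
}
```

A one-line COBOL statement such as `MOVE "SMITH" TO CUST-NAME` thus turns into `move_to_field(rec.cust_name, sizeof rec.cust_name, "SMITH")`, with the field length and the padding rule spelled out by hand.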
In addition, many of the existing ‘COBOL to C++’ translators are really ‘COBOL to C’ translators. While the output they produce can be compiled with a C++ compiler, it does not utilize any of the extensions provided by C++. These translators are producing C code rather than C++ code.
It is a well-known feature of object-oriented languages that classes can be used to create new data types. While some source code translators create or use object classes that emulate individual variable types, they do not use an object class that emulates the entire COBOL Working-Storage Section, and they therefore produce code that is much more difficult to read and understand than the original code.
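As an illustration of a class that emulates an individual variable type, the sketch below defines a hypothetical `AlphaField<N>` template that behaves like a fixed-length alphanumeric item. The class name, its interface, and the padding and truncation rules are assumptions made for this example; it shows the general idea of using a class to create a new data type and is not the class of any particular translator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical class emulating a fixed-length alphanumeric item of N characters.
template <std::size_t N>
class AlphaField {
public:
    // A new field starts out filled with spaces.
    AlphaField() { data_.fill(' '); }

    // Assignment plays the role of a MOVE: pad with spaces, truncate on the right.
    AlphaField& operator=(const char* src) {
        assert(src != nullptr);
        data_.fill(' ');
        std::size_t n = std::strlen(src);
        std::memcpy(data_.data(), src, n < N ? n : N);
        return *this;
    }

    // The full field contents, including trailing spaces.
    std::string str() const { return std::string(data_.data(), N); }

    static constexpr std::size_t length() { return N; }

private:
    std::array<char, N> data_;  // exactly N characters, no terminator
};
```

With such a class, an assignment like `name = "AB";` on an `AlphaField<6>` hides the padding and length handling inside the type, so the calling code stays close to the original one-line move.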
Accordingly, there is a need in the art for a method of translating COBOL source code to source code of an object-oriented language such as C++ that results in code having the advantages of object orientation while maintaining the readability, structure, and functionality of the original COBOL program.
The present invention is designed to address these needs.