To develop software applications in a procedural language such as COBOL, it is useful to create design models that include a representation of the data manipulated by the programs of the application. A program manipulates three categories of data: (1) The resources of the program, which can be files, tables, message queues, etc. (2) The external data, which come from external sources. These are the parameters that the program can accept or exchange, and they are passed in memory when the program is called by another program. The representation of external data provides, for instance, the definition of a linkage section in the COBOL language. (3) The local data, whose definition strongly reflects the detailed implementation of the program.
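As an illustration only, the three categories above can be sketched as a small data model from which a linkage section skeleton is derived. The names `DataCategory`, `DataItem` and `linkage_section`, and the sample COBOL item names, are hypothetical and are not taken from any cited method.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataCategory(Enum):
    RESOURCE = "resource"  # files, tables, message queues
    EXTERNAL = "external"  # parameters passed in memory on a call
    LOCAL = "local"        # implementation-level working data


@dataclass
class DataItem:
    name: str
    category: DataCategory
    # Hypothetical layout: elementary field name -> PICTURE string
    fields: dict[str, str] = field(default_factory=dict)


def linkage_section(items: list[DataItem]) -> str:
    """Emit a LINKAGE SECTION skeleton from the external data only."""
    lines = ["LINKAGE SECTION."]
    for item in items:
        if item.category is not DataCategory.EXTERNAL:
            continue  # resources and local data are declared elsewhere
        lines.append(f"01 {item.name}.")
        for field_name, picture in item.fields.items():
            lines.append(f"    05 {field_name} PIC {picture}.")
    return "\n".join(lines)


if __name__ == "__main__":
    program_data = [
        DataItem("CUSTOMER-TABLE", DataCategory.RESOURCE),
        DataItem("CUSTOMER-DATA", DataCategory.EXTERNAL, {"CUSTOMER-ID": "9(6)"}),
        DataItem("WS-COUNTER", DataCategory.LOCAL, {"WS-COUNT": "9(4)"}),
    ]
    print(linkage_section(program_data))
```

Only the item classified as external data contributes to the generated skeleton; the resource and the local data are ignored by this function.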
Traditional methodologies dedicated to procedural programming, such as Merise [1], describe the definition of the data at both the logical and the physical layer. They also describe nested models for the logical processing. With Merise, the resources and external data are satisfactorily modeled in the physical layer. The behavior of the processing includes the use of the data through the concept of a task. A task can be mapped to an abstraction of a program.
Software applications can also be modeled using a standardized specification language such as the Unified Modeling Language (UML), a visual specification language. The models created with UML, such as class diagrams, are then used to derive the actual programming code. UML is often used for object-oriented development, but the industry has adopted it more broadly for high-level software design in general. These design elements are further transformed into UML logical artifacts that target artifacts of a programming language, commonly an object-oriented language. The benefits of using UML during the design phases and of describing UML logical artifacts include the ability to conduct design reviews in a commonly accepted, high-level design language. UML makes it possible to capture the critical information that describes the application without detailing it down to the implementation level, and to apply Model Driven Architecture techniques that provide code generation.
Procedural languages are still widely used in the mainframe world. One example is COBOL. Such languages differ from object-oriented languages, to which UML modeling is particularly well suited. An object-oriented program consists of a set of classes, which can be modeled directly in UML. By contrast, a procedural program consists of operations on resources and data, which cannot be directly represented as a class diagram. The traditional representation of a procedural program is a flow chart.
Being able to extend UML logical artifacts to a procedural language such as COBOL would unify asset representation in an enterprise. Today there is no direct, obvious way to use UML graphical modeling to represent the logical artifacts of a COBOL program.
Use of profiles in UML 2.0 allows, for instance, defining a database schema using class diagrams. However, the use of the tables in the database by a procedural program cannot be directly and obviously represented by UML 2.0.
Existing literature describes some techniques that use UML diagrams to visually exhibit the resources and data manipulated by existing COBOL programs. This visualization can be used as design documentation of the COBOL program. However, because this design documentation is close to the COBOL code, it cannot be used in UML to specify a data structure independently from its physical implementation. These techniques do not provide a method to define the resources and data manipulated by a program in a way that is consistent with the UML design models, which capture the logical definitions of both program and data.
One illustration of the technique of visually exhibiting existing COBOL program code is provided in the Web page at the following address:
ibm.com/developerworks/rational/library/content/RationalEdge/sep02/AutomatedSep02.pdf [2]
This article focuses on how best to represent the elements of the COBOL language in UML. The representation of the COBOL language elements allows documenting, enhancing or refactoring the COBOL code. The design documentation is close to the code and does not provide a design level that is independent from the language. FIG. 4 on page 11 defines the “participating tables” of a program. UML 2.0 allows the definition of stereotypes, a new type of model element used to characterize a specialized usage and to attach specific properties. A stereotype can be defined for various UML elements such as a class, an operation, an attribute, a relationship, etc. In this prior art, a program is represented as a UML 2.0 stereotyped class and the participating tables as stereotyped classes, with relationships established between the program class and the table classes. Such a method works well in a reverse engineering scenario, where the goal is to capture all the details of existing code and expose them in UML.
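The representation described for prior art [2] can be sketched as follows, purely for illustration. The stereotype labels "program" and "table", the class names, and the helper `participating_tables` are assumptions made for this sketch; the cited article may name its stereotypes differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StereotypedClass:
    name: str
    stereotype: str  # hypothetical labels: "program" or "table"


@dataclass(frozen=True)
class Relationship:
    source: StereotypedClass
    target: StereotypedClass


def participating_tables(program: StereotypedClass,
                         relationships: list[Relationship]) -> list[str]:
    """Names of the table classes related to the given program class."""
    return [
        r.target.name
        for r in relationships
        if r.source == program and r.target.stereotype == "table"
    ]


if __name__ == "__main__":
    billing = StereotypedClass("BILLING", "program")
    customer = StereotypedClass("CUSTOMER", "table")
    invoice = StereotypedClass("INVOICE", "table")
    links = [Relationship(billing, customer), Relationship(billing, invoice)]
    print(participating_tables(billing, links))
```

Because both the program and the tables are classes in this representation, the link between them can be an ordinary class-to-class relationship.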
To create a UML design of a COBOL program, we proceed from architectural design to COBOL-specific design (what is commonly called the top-down approach: from design to code). Not all the details of the COBOL language need to be captured at this stage of the development. In a top-down scenario it is critical to have representations that are consistent with the logical definitions captured at the design stages, and to extend these representations rather than use new ones that follow a different logic.
There is thus a need to capture at a high level of abstraction the definition of a program and the resources it manipulates, their types and their logical definition, that integrates nicely with the UML representations used during design phases. Capturing the meaningful set of information regarding the program definition enables architecture reviews and code generation in a way that is consistent with the existing high level diagrams.
A second prior art document, available at the following address:
ego.developpez.com/uml/tutoriel/cobol/uml-cobol_v1.0.pdf [3]
proposes a method to use UML that specifically targets COBOL. This method explains how to represent COBOL-oriented data in UML and how to define programs. Programs are described as operations of a class, the two concepts being close. Representing a program as an operation of a class is key to the goal of extending a general architectural scheme with COBOL-specific extensions. However, since a relationship in a UML class diagram can only be drawn from one class to another class, this representation cannot use relationships to describe the “participating tables” of a program, as is done in prior art [2]. In UML, the signature of an operation and its parameters are the concepts that define the data manipulated by that operation. In this prior art, the parameters point to class definitions that are used to define the external data of the program. The external data provide the definition of the linkage section of a COBOL program. The linkage section is a construct dedicated to describing the parameters that a program can accept or exchange when called from another program.
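A minimal sketch of the representation described for prior art [3] is given below, under stated assumptions. The types `UmlClass`, `Parameter` and `Operation` and the helpers `linkage_items` and `relate` are hypothetical names introduced here; they only mimic the rule that class-diagram relationships connect classes, not operations.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Parameter:
    name: str
    type: UmlClass  # the class that defines the external data


@dataclass
class Operation:
    """A program modeled as an operation of a class."""
    name: str
    parameters: list[Parameter] = field(default_factory=list)


def linkage_items(operation: Operation) -> list[str]:
    """Classes reachable from the signature: the program's external data."""
    return [p.type.name for p in operation.parameters]


def relate(source: object, target: object) -> tuple[str, str]:
    """Draw a relationship; only class-to-class links are accepted."""
    if not isinstance(source, UmlClass) or not isinstance(target, UmlClass):
        raise TypeError("class-diagram relationships connect classes only")
    return (source.name, target.name)


if __name__ == "__main__":
    customer = UmlClass("Customer", ["id", "name"])
    update = Operation("updateCustomer", [Parameter("current", customer)])
    print(linkage_items(update))
    try:
        relate(update, customer)  # an operation cannot be a relationship end
    except TypeError as exc:
        print(exc)
```

The signature yields the external data, but an attempt to link the operation to a table class is rejected, which mirrors the limitation described above.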
This method is a good starting point to describe COBOL programs in a way that is consistent with the high level UML design techniques. However, it does not allow the capturing of some important information about a program: it does not provide a method to capture all the resources that a program manipulates, such as files or relational tables.
This is a limitation of the method, since a significant number of the classes defined in an architectural class diagram are designed to describe data definitions and will be used as persisted data, such as files, tables of a database (hierarchical or relational), message queues, etc. Capturing only the data that programs share with other programs is not enough. This method therefore cannot fully describe the data that the program manipulates, even though such information is likely to be already captured in the class diagram.
Applying the techniques of prior art [2] to prior art [3] would lead to applying a dedicated stereotype to selected Data Definition classes. This is not satisfactory for several reasons:
The logical definition of business data needs to be preserved and kept separate from its physical description. Stereotyping the logical data definition classes as physical Resources mixes the two layers.
It is very common for a program to receive from another program a data structure of a given type and to manipulate a Resource based on the same type (for example, to receive the current customer data and to have access to the customer table). This creates duplication, because the same logical definition must then be expressed as several physical representations.
There is thus a need for a designer tool that extends the common UML representations and describes, consistently with the prior high-level UML design, the data that a program manipulates, whether that data is a resource or external data passed in memory from another program. This representation should preserve the definition of logical data that comes from the first step of UML modeling.