1. Field of the Invention
The present invention generally relates to a system, method, and computer program for reverse engineering a program application to obtain a markup language template that was used in generating the program application. More particularly, the present invention relates to reverse engineering a program application in a legacy programming language, where the program application was specifically designed using a markup language template to process documents in the markup language, in order to obtain a markup language template substantially identical to the one originally used to create that program application.
2. Related Art
Markup languages are commonly used in programming, particularly with newer programming formats and web-based programming. There are several types of markup languages, including, but not limited to, HTML (Hypertext Markup Language), XML (Extensible Markup Language), and XHTML (Extensible Hypertext Markup Language). Generally, markup languages are languages in which the content of a document is marked with tags that provide information indicative of formatting, structure, font, content type, etc. More specifically, the markups (tags) are applied to the content of a document to indicate the relevance or purpose of that content, or of portions thereof. When the document is read by a computing system designed to handle (i.e., process) the markup language, a program known as a parser can identify and extract the relevant content for which the type and/or purpose has been indicated by the tags. Markup languages thus provide a simple and convenient way to represent data to be read and processed by a computer.
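By way of illustration only, the following minimal sketch shows a parser identifying tagged content and extracting it. It is written in Python with the standard xml.etree.ElementTree module rather than in a legacy language, and the sample document, its tag names, and the function name extract_fields are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are illustrative only.
DOCUMENT = """<order>
  <customer>Acme Corp</customer>
  <amount currency="USD">125.50</amount>
</order>"""


def extract_fields(xml_text):
    """Return a dict mapping each leaf tag name to its text content."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        # Keep only elements without child elements (the leaves),
        # since those carry the actual data values.
        if len(element) == 0:
            fields[element.tag] = (element.text or "").strip()
    return fields


fields = extract_fields(DOCUMENT)
print(fields)
```

Here the tags, not the position of the text, tell the parser which value is the customer and which is the amount.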
The examples provided herein will be discussed with respect to Extensible Markup Language (XML). One of ordinary skill in the relevant art will understand that the description of the invention also pertains to other markup languages.
While markup languages are gaining in popularity, many legacy programming languages (e.g., COBOL, FORTRAN, and BASIC) still used by programmers and institutions are not designed with built-in functionality for understanding markup language documents. These older programming systems could be replaced with object-oriented program applications and/or web-based applications to solve the incompatibility between older legacy programs and modern markup language documents. However, older legacy programs still form a vital part of many programming systems, and replacing them could be expensive and complicated. Consequently, these legacy program systems are likely to last into the foreseeable future.
These facts have led to the generation of program applications that run in legacy program environments and are specifically tailored to read markup language documents and convert the data contained therein to a form suitable for the legacy program language. Similar programs are available for generating markup language documents in legacy programming environments.
Programs for providing such functionality to legacy systems are described in, for example, U.S. patent application Ser. Nos. 10/906,020 and 10/906,018, both of which were filed on Jan. 31, 2005. Both of these applications are incorporated by reference herein.
Those applications describe systems for generating program applications in a legacy program environment, such as COBOL, in order to process a markup language document, such as an XML document. The methods for generating such program applications start with a markup language template which preferably includes all or most of the relevant markup language indicators (i.e., the tags which are used to identify content in a markup language document). Typically, the markup language template provides a description of all of the tags that the subsequently generated program application running on the legacy system can expect to encounter in processing documents in the markup language. The template operates as an example document which provides the necessary information to build a program application for processing future documents.
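The role of the template as an example document can be illustrated with a short sketch that lists every tag path a template declares. This is an assumption-laden illustration in Python, not the generation tool of the referenced applications; the template contents and the function name tag_paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: an example document containing every tag
# the generated program application should expect to encounter.
TEMPLATE = """<order>
  <customer><name/><id/></customer>
  <item><sku/><quantity/></item>
</order>"""


def tag_paths(xml_text):
    """Return the distinct tag paths of a template in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        if path not in paths:
            paths.append(path)
        for child in element:
            walk(child, path)

    walk(root, "")
    return paths


paths = tag_paths(TEMPLATE)
for path in paths:
    print(path)
```

An inventory of this kind is the sort of information a generation tool would need in order to build data structures for documents that arrive later.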
Preferably, using the template, a generation tool is used to create a copy book in the legacy language. A copy book is a file structure outside of the program which is copied into the program. Such copy books are understood by one of ordinary skill in the relevant art. The copy book is used to create an intermediate application programming interface (API). This interface is the program application which acts as a bridge between the markup language and the legacy program and, at run time, converts the tags of the markup events into a format which the legacy environment understands.
Ultimately, the program application written in the legacy language includes fragments of the original markup language template, which have been parsed out in order to create the necessary data structure of the program application. In essence, the parsing involves breaking down the hierarchy of the tags of the template into simple events, with the events being used to write the program application in the legacy environment.
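As a rough sketch of what breaking a tag hierarchy into simple events can look like, the following Python example flattens a small template into a linear list of start, text, and end events using the standard ET.iterparse function. The event names, the template contents, and the function name flatten_to_events are assumptions made for illustration; the referenced applications may represent events differently.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical template fragment; the tag names are illustrative only.
TEMPLATE = b"<order><customer>name</customer><amount>0</amount></order>"


def flatten_to_events(xml_bytes):
    """Flatten a nested tag hierarchy into a linear list of events."""
    events = []
    stream = io.BytesIO(xml_bytes)
    for kind, element in ET.iterparse(stream, events=("start", "end")):
        if kind == "start":
            events.append(("START", element.tag))
        else:
            # The text of an element is complete once its end tag is seen.
            text = (element.text or "").strip()
            if text and len(element) == 0:
                events.append(("TEXT", text))
            events.append(("END", element.tag))
    return events


events = flatten_to_events(TEMPLATE)
for event in events:
    print(event)
```

Because the nesting is implied by the order of the start and end events, a flat list of this kind still carries enough information to recover the original hierarchy.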
With the necessary program application developed, the application can be implemented in the legacy system in order to allow the legacy system to read, process, and/or generate documents in the defined markup language. This gives the legacy system the ability to converse with more modern data formats and process documents it would otherwise not understand.
Once implemented, the program application can continue to provide the newfound functionality to the legacy system. There are, however, instances in which upgrades are necessary. In upgrading a system, it may be necessary to provide additional or alternative markup tags, which the original template did not address, or otherwise alter the data structure in the program application pertaining to the markup language. Such upgrades can be handled easily if the original template is available for modification. Problems arise, however, because the original template is manually managed, and it is up to the developer to ensure that it is properly retained for later use. If the original template is not properly retained, upgrading the system can become complicated and time-consuming.
Thus, what is needed is a system and method for reverse engineering a program application in the legacy environment to parse out and reconstruct the original template in the markup language, when the original template is not otherwise available.