This invention relates in general to the field of computer systems, and more particularly to a method and system for reporting XML data from a computer system.
The Internet and e-commerce are rapidly reshaping the way that the world does business. In addition to direct purchases made through the Internet, consumers increasingly depend upon information available through the Internet to make purchasing decisions. Businesses have responded by allowing greater access to information through the Internet, both directly to consumers and to other businesses such as suppliers. One result of the increased access to electronic information through the Internet is a decreased dependence on, and desire for, printed "hard copy" information.
Extensible Mark-Up Language ("XML") provides an excellent tool for business-to-business electronic commerce and publication of data via the Internet. XML specifies a format that is easily adapted for data transmission over the Internet, direct transfer as an object between different applications, or the direct display and manipulation of data via browser technology. Currently, complex transformations are performed on data output in legacy computer system formats in order to put the data in XML format.
One example of the transformation from written reports typically output by legacy computer systems to electronic reports is the telephone bill. Historically, telephone companies have relied on mainframe or legacy computer systems running COBOL code to track and report telephone call billing information. Typically, these legacy computer system reports are printed, copied and distributed to those who need the information. However, conventional legacy computer system report formats are difficult to transmit or manipulate electronically. Yet the electronic distribution of bills, such as through e-mail, a biller's web site or a bill consolidator chosen by the consumer, enhances flexibility and control of bill payment, especially with complex business invoices.
Generally, in order to make conventional legacy reports available in different formats, a complex transformation of the data is performed based on a report print system. One transformation technique is to write a "wrapper" around the legacy computer system. The wrapper includes parsers and generators that transform legacy computer system reports into XML formatted output. Parsers apply a variety of rules to identify and tag data output in a legacy report. For example, a parser might determine that a data field of a telephone bill represents a dollar amount based on the presence of a dollar sign or the location of a decimal point in the data field, or that a data field represents a customer name due to absence of numbers. Once the parser deciphers the legacy report, a generator transforms the legacy computer system data into appropriately tagged XML format.
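The kind of rule-based parsing described above can be illustrated with a minimal Python sketch. The function names, the tag names and the regular expression are assumptions made for illustration only; they are not taken from any particular wrapper product.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule: optional dollar sign, digits with optional commas,
# and exactly two digits after the decimal point.
AMOUNT_RE = re.compile(r"^\$?\d[\d,]*\.\d{2}$")


def classify_field(text: str) -> str:
    """Guess a tag name for one field taken from a legacy report line."""
    value = text.strip()
    if AMOUNT_RE.match(value):
        return "amount"
    # "Absence of numbers" heuristic: any non-empty field without digits
    # is assumed to be a customer name.
    if value and not any(ch.isdigit() for ch in value):
        return "customer_name"
    return "unknown"


def tag_field(text: str) -> str:
    """Generator step: wrap the field in the tag the parser guessed."""
    tag = classify_field(text)
    return f"<{tag}>{escape(text.strip())}</{tag}>"
```

Under these rules a literal heading such as "TOTAL DUE" would also be tagged as a customer name, which hints at why heuristics of this kind are hard to make reliable.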
Although the end result of the parsing and transforming process is data in an XML format, the process itself is difficult and expensive to implement and cumbersome to maintain. Without careful study of underlying program logic, it is generally not possible to reliably determine all potential outputs from the legacy computer system. In particular, even a fairly large output sample is almost certain to be incomplete in that some program logic is only rarely exercised. Another difficulty with the parsing and transforming process is that, as changes are made to the underlying program applications of the legacy computer system, the parsing and transforming systems generally require updates that mirror the underlying changes. These downstream changes increase the time and expense associated with maintaining the legacy computer system, and also increase the likelihood of errors being introduced into the XML formatted output.
Another difficulty associated with the use of XML is that, although XML dramatically improves the utility of output data, the generation of XML output depends upon underlying programs that adhere to an exacting data structure. For instance, the generation of syntactically correct XML requires adherence to a rigid labeled tree structure so that output data is identified by "tags" and "end tags" associated with the XML data structure as defined by an XML schema. When writing a deeply embedded element of an XML tree, such as a subschema within a defined XML schema, tags corresponding to all of that element's ancestor elements must also be written. When writing another element that is not part of the current XML subschema, the current subschema must be closed off to an appropriate level with balancing end tags for the ancestor elements. XML schemas also specify type and cardinality constraints on their elements. Thus, substantial and exacting bookkeeping of programs that output XML is necessary with respect to the XML schema in order to minimize errors on the part of programmers.
Therefore, a need has arisen for a method and system which rapidly and automatically modifies legacy computer systems to produce output in an XML format.
A further need exists for a method and system which modifies legacy computer systems to produce output in XML format without altering the underlying legacy computer system program logic or business rules.
A further need exists for a method and system which determines the write operations of a legacy computer system to allow modification of those write operations so that the legacy computer system outputs data in XML format.
A further need exists for a method and system which generates syntactically correct XML output with automated bookkeeping to minimize programming errors.
In accordance with the present invention, a method and system is provided that substantially eliminates or reduces disadvantages and problems associated with previously developed methods and systems that transform the output from legacy computer systems into an XML format. The present invention provides XML output by modifying the underlying legacy computer system program applications to report data in XML format instead of transforming the output from the legacy computer system after the data is reported in the format of the legacy computer system.
More specifically, a code generation engine automatically modifies legacy computer system program applications to create modified legacy program applications. The modified legacy program applications are run on the legacy computer system so that the data output from the legacy computer system is in XML format. The modified legacy program applications are written in the computer language of the legacy computer system so that the legacy computer system directly produces an XML version of its output without the need to alter the logic or business rules embodied in the unmodified program applications of the legacy computer system.
The code generation engine creates the modified program applications in accordance with a modification specification created by a mapping engine. The mapping engine generates the modification specification and a context table by mapping a model of write operations of the legacy computer system to an XML schema. The mapping engine provides the modification specification to the code generation engine. The code generation engine creates modified legacy computer system program applications for use on the legacy computer system. A writer engine is an application program loaded on the legacy computer system and written in the language of the legacy computer system. The writer engine is called by the modified program applications to write XML output in the format of the XML schema encoded by the context table.
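As a rough illustration of how a modification specification might drive code generation, the following Python sketch rewrites simple COBOL WRITE statements into calls to a writer routine. Everything specific here is an assumption: the specification is reduced to a mapping from source data items to schema element names, the routine name XMLWRITE is hypothetical, and the generated CALL statement is illustrative rather than a tested COBOL idiom. A real code generation engine would work from the full model of write operations rather than from line-by-line text matching.

```python
import re
from typing import Dict

# Matches only the simple one-line form: WRITE record FROM item.
WRITE_STMT = re.compile(
    r"^(\s*)WRITE\s+([A-Z0-9-]+)\s+FROM\s+([A-Z0-9-]+)\s*\.\s*$"
)


def modify_program(source: str, spec: Dict[str, str],
                   writer: str = "XMLWRITE") -> str:
    """Replace mapped WRITE statements with calls to the writer routine.

    spec maps a source data item name to a schema element name
    (a deliberately simplified, hypothetical modification specification).
    """
    out = []
    for line in source.splitlines():
        m = WRITE_STMT.match(line)
        if m and m.group(3) in spec:
            indent, _record, item = m.groups()
            out.append(f"{indent}CALL '{writer}' USING '{spec[item]}' {item}.")
        else:
            # Unmapped statements and all program logic stay untouched.
            out.append(line)
    return "\n".join(out)
```

Only the reporting statements change in this sketch; every other line passes through unmodified, which mirrors the aim of leaving program logic and business rules unaltered.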
The model used by the mapping engine is generated by a modeling engine which analyzes the legacy computer system to identify and model the write operations, such as with a report data model. The modeling engine determines a list of legacy computer system program applications that report data. The program applications that report data are further analyzed to determine the incidents within each program application at which a write operation exists. A report data model is then compiled with a value and/or type for the data fields of each incident. The report data model is augmented by a formal grammar that simplifies the process of relating write operations to execution paths of legacy computer system program applications.
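The first two modeling steps, listing the program applications that report data and locating each write incident within them, could be approximated as follows. This is a minimal Python sketch under stated assumptions: the source is fixed-format COBOL, only WRITE statements count as write operations, and the `WriteIncident` record and function names are hypothetical. It does not attempt the value and type analysis or the formal grammar mentioned above.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

WRITE_RE = re.compile(
    r"\bWRITE\s+([A-Z0-9-]+)(?:\s+FROM\s+([A-Z0-9-]+))?", re.IGNORECASE
)


@dataclass(frozen=True)
class WriteIncident:
    program: str
    line_no: int
    record: str
    source_item: Optional[str]   # item named in a FROM clause, if any


def find_write_incidents(program: str, source: str) -> List[WriteIncident]:
    """Return every WRITE statement found in one program's source text."""
    incidents = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Skip fixed-format comment lines (asterisk in column 7).
        if len(line) > 6 and line[6] == "*":
            continue
        m = WRITE_RE.search(line)
        if m:
            item = m.group(2).upper() if m.group(2) else None
            incidents.append(
                WriteIncident(program, line_no, m.group(1).upper(), item)
            )
    return incidents


def reporting_programs(sources: Dict[str, str]) -> List[str]:
    """List the programs that contain at least one write incident."""
    return sorted(
        name for name, text in sources.items()
        if find_write_incidents(name, text)
    )
```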
Once the modified program application is loaded on the legacy computer system, the legacy computer system continues to perform its functional operations without change to the underlying business or program logic. When a legacy computer system program application commands the reporting of data, modified instructions provided in the modified program application call the writer engine to output syntactically correct XML data. The writer engine determines the current context of XML output and opens appropriate schema element data structures in conjunction with the context table. The writer engine then analyzes the current schema element data structure and the called schema element to determine the relationship of the called schema element with the current schema element. If the called schema element is a descendant of the current schema element, the writer engine opens the schema element ID tags down through the called schema element and outputs the data from the schema element in syntactically correct XML format. If the called schema element is not a descendant of the current schema element, the writer engine finds a mutual ancestor having consistent cardinality, closes the schema element ID tags up to the ancestor schema element and proceeds to open the schema element ID tags down through the called schema element to output data in syntactically correct XML format. In addition, the writer engine supports delayed printing of tags and attributes until such time as a complete syntactic unit is available.
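The open-down and close-up behavior of the writer engine can be sketched in Python as a stack of open elements consulted against a context table. The class and field names below are assumptions, the context table is reduced to a parent link and a maximum occurrence count per element, and data-bearing elements are assumed to be leaves that are closed as soon as they are written. Delayed printing is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class SchemaElement:
    parent: Optional[str]            # None marks the root element
    max_occurs: Optional[int] = 1    # None means unbounded


class XmlWriter:
    """Hypothetical writer engine keeping a stack of open schema elements."""

    def __init__(self, context_table: Dict[str, SchemaElement]) -> None:
        self.table = context_table
        self.stack: List[str] = []              # open elements, root first
        self.counts: List[Dict[str, int]] = []  # children written per open element
        self.out: List[str] = []

    def _path(self, name: str) -> List[str]:
        """Root-to-element path derived from the context table."""
        path: List[str] = []
        cur: Optional[str] = name
        while cur is not None:
            path.append(cur)
            cur = self.table[cur].parent
        return path[::-1]

    def _open(self, name: str) -> None:
        if self.counts:
            self.counts[-1][name] = self.counts[-1].get(name, 0) + 1
        self.out.append(f"<{name}>")
        self.stack.append(name)
        self.counts.append({})

    def _close_to(self, depth: int) -> None:
        while len(self.stack) > depth:
            self.out.append(f"</{self.stack.pop()}>")
            self.counts.pop()

    def write(self, name: str, text: str) -> None:
        path = self._path(name)
        # Deepest ancestor of the called element that is already open.
        depth = 0
        while (depth < min(len(self.stack), len(path) - 1)
               and self.stack[depth] == path[depth]):
            depth += 1
        # Back up while opening the next element would exceed its cardinality.
        while depth > 0:
            child = path[depth]
            limit = self.table[child].max_occurs
            if limit is None or self.counts[depth - 1].get(child, 0) < limit:
                break
            depth -= 1
        if depth == 0 and self.out:
            raise ValueError("write would require a second root element")
        self._close_to(depth)                 # close tags up to the ancestor
        for ancestor in path[depth:-1]:       # open tags down to the parent
            self._open(ancestor)
        self._open(name)
        self.out.append(escape(text))
        self._close_to(len(self.stack) - 1)   # leaf is closed immediately

    def close(self) -> str:
        self._close_to(0)
        return "".join(self.out)
```

With a table in which a customer may occur any number of times but may hold only one name, writing a second name closes the open call and customer elements and opens a fresh customer, which is one way to read the requirement of a mutual ancestor having consistent cardinality.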
The present invention provides a number of important technical advantages. One important technical advantage is the ability to rapidly and automatically modify legacy computer system program applications to enable them to directly produce an XML version of their data output. By modifying the underlying legacy computer system program applications, XML output is made available directly from the legacy computer system without a transformation of the data itself from a legacy computer system format. Further, the underlying program logic and business rules remain unaltered so that the substantive functions of the legacy computer system need not change. Thus, a business enterprise using a legacy computer system gains the greater accessibility to data that XML formatted output provides, without affecting computed values.
Another important technical advantage of the present invention is that modification of the underlying legacy computer program applications is operationally less expensive, complex and time-consuming than transformation of legacy computer system output to an XML format. For instance, once modified program applications are running on the legacy computer system, XML formatted output is available without further action on the data. By comparison, transformation of output to an XML format after the data is reported by the legacy computer system requires action with each data report. Moreover, with the transformation approach, if any changes are made to the underlying legacy program applications, changes that mirror them must generally also be made to the transformation applications. This further complicates the maintenance of the legacy computer system.
Another important technical advantage of the present invention is that, whether or not used with a legacy computer system, the writer engine and context table aid in the generation of syntactically correct XML output. For instance, the writer engine ensures that a command to write an embedded XML element will include tags corresponding to all of the embedded element's ancestor elements. Also, when an XML element is written that is not part of the current XML subschema, the writer engine will close off the current XML subschema to an appropriate level of an ancestor schema element. Automation of the bookkeeping involved with the XML schema eliminates the risk of syntactic errors associated with XML reports. The delayed printing feature provides a mechanism whereby a program can generate correct XML data even when the sequence of print commands in the original legacy system application program does not map directly onto the order of XML elements prescribed by the XML schema.
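One way to picture delayed printing is a buffer that collects attributes and child values in whatever order the program supplies them and emits the element only once the unit is complete. The Python sketch below is an assumption-laden illustration: the class name, its methods and the idea of passing the schema-prescribed child order as a list are all hypothetical.

```python
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr


class DelayedElement:
    """Buffers one element until it can be printed as a complete unit."""

    def __init__(self, name: str, child_order: List[str]) -> None:
        self.name = name
        self.child_order = child_order      # order prescribed by the schema
        self.attrs: Dict[str, str] = {}
        self.children: Dict[str, str] = {}

    def set_attribute(self, key: str, value: str) -> None:
        self.attrs[key] = value

    def set_child(self, name: str, text: str) -> None:
        if name not in self.child_order:
            raise KeyError(f"{name} is not a child of {self.name}")
        self.children[name] = text

    def flush(self) -> str:
        """Emit the start tag, children in schema order, and the end tag."""
        attrs = "".join(
            f" {key}={quoteattr(value)}"
            for key, value in sorted(self.attrs.items())
        )
        body = "".join(
            f"<{child}>{escape(self.children[child])}</{child}>"
            for child in self.child_order
            if child in self.children
        )
        return f"<{self.name}{attrs}>{body}</{self.name}>"
```

Because the start tag is not printed until `flush` is called, an attribute supplied after the first child value still ends up inside the start tag, and children supplied out of order are emitted in the order the schema expects.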
Another important advantage of the present invention is that tool support manages the complexity of modeling the underlying program logic, resulting in substantially reduced time and expense for modification of a legacy computer system to output XML formatted data. Tools aid in: the determination of the control flow graph of legacy applications; the abstraction out of this graph of a subgraph specifically related to the writing of report lines; the identification of constants and data items that flow into print lines so that the elements that need to be written as tagged XML can be readily identified; and the identification of domain specific information such as locations of headers and footers. Automation through tool support greatly enhances management of program complexity.
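The abstraction of a write-related subgraph out of a control flow graph can be sketched as follows. This Python example assumes the control flow graph is given as an adjacency mapping and that the set of write nodes is already known; the function name and node labels are hypothetical. It keeps only the entry node and the write nodes, and links two of them when one can reach the other without passing through a third write node.

```python
from typing import Dict, List, Set


def write_subgraph(cfg: Dict[str, List[str]], writes: Set[str],
                   entry: str) -> Dict[str, Set[str]]:
    """Collapse a control flow graph onto its entry node and write nodes."""
    kept = set(writes) | {entry}
    reduced: Dict[str, Set[str]] = {node: set() for node in kept}
    for start in kept:
        stack = list(cfg.get(start, []))
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in writes:
                # Record the edge and stop: do not search past a write node.
                reduced[start].add(node)
            else:
                stack.extend(cfg.get(node, []))
    return reduced
```

For a report program that writes a header, loops over detail lines and then writes a footer, the reduced graph shows directly that a detail line may be followed by another detail line or by the footer.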