The Internet and e-commerce are rapidly reshaping the way the world does business. In addition to making direct purchases over the Internet, consumers increasingly depend on information available through the Internet to make purchasing decisions. Businesses have responded by providing greater access to information through the Internet, both directly to consumers and to other businesses such as suppliers. One result of this increased access to electronic information is a decreased dependence on, and desire for, printed “hard copy” information.
Extensible Markup Language (“XML”) provides an excellent tool for business-to-business electronic commerce and for publishing data via the Internet. XML specifies a format that is easily adapted for data transmission over the Internet, for direct transfer as an object between different applications, or for direct display and manipulation of data via browser technology. Currently, however, data output in legacy computer system formats must undergo complex transformations before it is in XML format.
The telephone bill is one example of the shift from printed reports, typically output by legacy computer systems, to electronic reports. Historically, telephone companies have relied on mainframe or legacy computer systems running COBOL code to track and report telephone call billing information. Typically, these legacy reports are printed, copied, and distributed to those who need the information. However, conventional legacy report formats are difficult to transmit or manipulate electronically. This is a drawback because electronic distribution of bills, whether by e-mail, through a biller's web site, or through a bill consolidator chosen by the consumer, enhances flexibility and control over bill payment, especially for complex business invoices.
Generally, to make conventional legacy reports available in other formats, a complex transformation of the data is performed based on the report print stream. One transformation technique is to write a “wrapper” around the legacy computer system. The wrapper includes parsers and generators that transform legacy computer system reports into XML-formatted output. Parsers apply a variety of rules to identify and tag the data in a legacy report. For example, a parser might determine that a data field of a telephone bill represents a dollar amount based on the presence of a dollar sign or the position of a decimal point in the field, or that a data field represents a customer name because it contains no digits. Once the parser has deciphered the legacy report, a generator transforms the legacy data into appropriately tagged XML.
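A minimal sketch of the kind of rule-based parser and generator described above follows. It is illustrative only: the report layout (fields separated by two or more spaces), the tag names `bill_line`, `amount`, `customer_name`, and `unknown`, and the functions `classify_field` and `line_to_xml` are all hypothetical, chosen simply to mirror the two example rules (dollar sign or decimal point means an amount; no digits means a name).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: a field is a dollar amount if it starts with "$"
# or looks like a number with exactly two digits after a decimal point.
AMOUNT_PATTERN = re.compile(r"-?\d+\.\d{2}")


def classify_field(field: str) -> str:
    """Guess an XML tag for one field of a legacy report line (hypothetical heuristics)."""
    text = field.strip()
    if text.startswith("$") or AMOUNT_PATTERN.fullmatch(text):
        return "amount"
    # Hypothetical rule: a non-empty field with no digits is a customer name.
    if text and not any(ch.isdigit() for ch in text):
        return "customer_name"
    return "unknown"


def line_to_xml(line: str) -> ET.Element:
    """Parse one report line and generate a tagged XML element for it."""
    record = ET.Element("bill_line")
    # Assumed layout: fields are separated by runs of two or more spaces.
    for field in re.split(r"\s{2,}", line.strip()):
        child = ET.SubElement(record, classify_field(field))
        child.text = field.strip()
    return record


if __name__ == "__main__":
    element = line_to_xml("JANE DOE      555-0100      $12.34")
    print(ET.tostring(element, encoding="unicode"))
    # prints: <bill_line><customer_name>JANE DOE</customer_name><unknown>555-0100</unknown><amount>$12.34</amount></bill_line>
```

The rules are deliberately narrow, which shows the fragility of this approach: a name field containing a digit (for example, "UNIT 4 LLC") would be tagged `unknown`, exactly the kind of rarely exercised case that a sample of legacy output may never reveal.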
Although the end result of the parsing and transforming process is data in XML format, the process itself is difficult and expensive to implement and cumbersome to maintain. Without careful study of the underlying program logic, it is generally not possible to reliably determine all potential outputs of the legacy computer system. In particular, even a fairly large output sample is almost certain to be incomplete, because some program logic is exercised only rarely. Another difficulty is that, as changes are made to the underlying application programs of the legacy computer system, the parsing and transforming systems generally require updates that mirror those changes. These downstream changes increase the time and expense of maintaining the legacy computer system and also increase the likelihood of errors being introduced into the XML-formatted output.
Another difficulty associated with XML is that, although XML dramatically improves the utility of output data, generating XML output depends upon underlying programs that adhere to an exacting data structure. For instance, generating correct XML requires adherence to a rigid labeled tree structure, in which output data is identified by the “tags” and “end tags” of the XML data structure as defined by an XML schema. When writing a deeply embedded element of an XML tree, such as a subschema within a defined XML schema, tags corresponding to all of that element's ancestor elements must also be written. When writing another element that is not part of the current XML subschema, the current subschema must be closed to the appropriate level with balancing end tags for the ancestor elements. XML schemas also specify type and cardinality constraints on their elements. Thus, programs that output XML must perform substantial and exacting bookkeeping against the XML schema in order to minimize programmer errors.
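The ancestor bookkeeping can be illustrated with a small stack-based writer. This is a minimal sketch under stated assumptions: `PathWriter`, `write`, and `close` are hypothetical names, not part of any library. Each leaf value is addressed by the list of element names from the root; the writer opens any ancestor tags not yet open and closes the current subtree back to the nearest shared ancestor before writing elsewhere. It does not check a schema's type or cardinality constraints.

```python
from xml.sax.saxutils import escape


class PathWriter:
    """Hypothetical writer that keeps start-tags and end-tags balanced."""

    def __init__(self) -> None:
        self.open_path: list[str] = []  # stack of currently open elements
        self.parts: list[str] = []      # accumulated XML text fragments

    def write(self, path: list[str], value: str) -> None:
        ancestors, leaf = path[:-1], path[-1]
        # Find how many open elements are shared with the new element's ancestors.
        common = 0
        while (common < len(self.open_path) and common < len(ancestors)
               and self.open_path[common] == ancestors[common]):
            common += 1
        # Close elements that are not ancestors of the new element, innermost first.
        while len(self.open_path) > common:
            self.parts.append(f"</{self.open_path.pop()}>")
        # Open the missing ancestor elements, outermost first.
        for name in ancestors[common:]:
            self.parts.append(f"<{name}>")
            self.open_path.append(name)
        # Write the leaf element with its text content escaped.
        self.parts.append(f"<{leaf}>{escape(value)}</{leaf}>")

    def close(self) -> str:
        # Balance every element that is still open and return the document.
        while self.open_path:
            self.parts.append(f"</{self.open_path.pop()}>")
        return "".join(self.parts)


if __name__ == "__main__":
    w = PathWriter()
    w.write(["bill", "customer", "name"], "Jane Doe")
    w.write(["bill", "customer", "phone"], "555-0100")
    w.write(["bill", "total"], "12.34")
    print(w.close())
    # prints: <bill><customer><name>Jane Doe</name><phone>555-0100</phone></customer><total>12.34</total></bill>
```

Because this writer keys only on element names, two consecutive repeated elements (for example, two `call` records under one parent) would be merged into a single element unless the caller signals a record boundary; handling that correctly is part of the cardinality bookkeeping a schema places on the programmer.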