The illustrative embodiments described in the present application are useful in systems for providing language neutral data exchange and more particularly are useful in systems including those for customizable electronic bill presentment and payment (EBPP) systems.
Several markup languages are known for the representation of information. For example, the Standard Generalized Markup Language (SGML) is a general-purpose markup language that has been standardized by the International Organization for Standardization (ISO 8879). The Extensible Markup Language (XML) is another markup language, derived from SGML. The World Wide Web Consortium has published reports, standards and recommendations in these and other areas.
XML is a metalanguage that is a subset of SGML and that can be used to define the syntax of documents containing structured data. XML provides a language neutral data exchange format with nested tags that can be used to represent complex data structures in a text file. XML documents obey syntax rules. XML is extensible and can be used to create new markup languages. Hyper-Text Markup Language (HTML) is a non-extensible markup language used with the World Wide Web (WWW) that includes syntax and presentation information. HTML uses loose structures that make it difficult to process HTML documents effectively. XML documents, by contrast, are well structured. Each XML document has a single root element, and all other elements must be nested within that root element.
XML and HTML are both markup languages, in which tags are used to annotate data. In HTML, both the syntax and the semantics of the document are predefined, so HTML alone can be used to create a visible presentation to the user. XML, by contrast, allows the author to define the document syntax.
XML documents include elements that provide the logical structure of the document and entities that provide the physical structure of the document. The document will include markup tags having delimiters to separate the markup from the character text. The term XML text sometimes refers to both the character data and the markup information, not the character text alone. An XML document may be characterized as a valid document or as a well-formed document. A Document Type Definition (DTD) or XML Schema is used to define a valid XML document. The XML syntax allows the definition of elements that have attributes and links. The DTD defines structural constraints and defines element types, attributes, entities and notations. The DTD defines the order as well as the occurrence of elements.
While HTML has presentation information embedded, XML uses stylesheets such as Extensible Stylesheet Language (XSL) files to define the presentation of the data. For example, one XML document may have structured data that can be presented differently depending on the stylesheet used. Transformations may be performed using XSL Transformations (XSLT). Accordingly, XML can be transformed into other formats such as a different XML format or HTML. While HTML supports hyperlinking, XML uses the XLink standard, which provides notation for how XML links may be implemented.
A well-formed XML document does not have to adhere to a DTD. However, a well-formed XML document must have one root element that contains all other elements. Additionally, each element must have an open tag and a close tag. XML is used to define only the syntax and content of a document. XSL is used to define the semantics, style, or presentation of a document.
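By way of illustration only, the well-formedness rules described above can be exercised with a short Python sketch that uses the standard library parser. The function name `is_well_formed` and the sample documents are hypothetical and are not taken from any system described herein. The sketch checks well-formedness only and does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for unclosed tags, mismatched tags, multiple roots, etc.
        return False


# Hypothetical sample documents.
GOOD = "<bill><item>Service</item><amount>42.00</amount></bill>"
UNCLOSED = "<bill><item>Service</bill>"      # <item> has no close tag
TWO_ROOTS = "<bill></bill><bill></bill>"     # more than one root element

if __name__ == "__main__":
    for name, doc in [("GOOD", GOOD), ("UNCLOSED", UNCLOSED), ("TWO_ROOTS", TWO_ROOTS)]:
        print(name, is_well_formed(doc))
```

In this sketch, only the first sample satisfies both rules: a single root element and a matching close tag for every open tag.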
Many organizations are using Electronic Bill Presentment and Payment (EBPP) and Electronic Statement Presentment applications. To implement such applications, traditional paper documents may be converted to electronic form to be processed electronically and exchanged over the Internet, or otherwise, with customers, suppliers, or others. The paper documents will typically be re-formatted to be presented electronically using Hypertext Markup Language (HTML) Web pages, e-mail messages, Extensible Markup Language (XML) messages, or other electronic formats suitable for electronic exchange, processing, display and/or printing.
XML manipulator programs and parsers have been developed. There are two parsing systems in wide use. First, a Document Object Model (DOM) XML parser API is available. DOM is a tree based API that is used to build an in-memory tree representation of the XML document. As the entire XML document is loaded in memory as a document object, XML manipulating programs that use this API may be useful for reordering, adding or deleting elements or attributes of the XML file. There is a second parsing API named the Simple API for XML (SAX). The SAX API is an event based API that uses callbacks to report parsing events to the manipulating application, much in the way that a graphical user interface (GUI) reports events. The SAX API is useful for searching as it traverses the document without loading it into a memory object. The DOM parser requires more memory, but provides random access to the in-memory XML document object. It is more useful when using attributes rather than pure text elements. The SAX parser uses fewer memory resources, but does not provide random access. The SAX parser may be useful in processing streams of data.
The traditional DOM parser will construct the whole document in memory regardless of whether the user application needs to access all of it. However, such DOM memory objects may not be feasible for XML files that can be 1 Gigabyte or larger.
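The contrast between the two parsing APIs can be sketched, for illustration only, using the DOM and SAX implementations in the Python standard library (`xml.dom.minidom` and `xml.sax`). The sample document, the element names `statement`, `line` and `amount`, and the helper names are assumptions made for this example. The DOM function builds the whole tree in memory and reorders elements, while the SAX handler receives callbacks and retains only the values it is searching for.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample statement.
SAMPLE = (
    "<statement>"
    "<line id='1'><desc>Service</desc><amount>10.00</amount></line>"
    "<line id='2'><desc>Tax</desc><amount>0.80</amount></line>"
    "</statement>"
)


def reorder_with_dom(text: str) -> str:
    """DOM: load the entire tree, then move the last <line> to the front."""
    doc = xml.dom.minidom.parseString(text)
    root = doc.documentElement
    last = root.getElementsByTagName("line")[-1]
    # insertBefore detaches a node that already has a parent, so this is a move.
    root.insertBefore(last, root.firstChild)
    return root.toxml()


class AmountCollector(xml.sax.ContentHandler):
    """SAX: collect the text of each <amount> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.in_amount = False
        self.buffer = []
        self.amounts = []

    def startElement(self, name, attrs):
        if name == "amount":
            self.in_amount = True
            self.buffer = []

    def characters(self, content):
        # Character data may be delivered in several chunks.
        if self.in_amount:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "amount":
            self.amounts.append("".join(self.buffer))
            self.in_amount = False


def amounts_with_sax(text: str) -> list:
    """Traverse the document once without building a tree."""
    handler = AmountCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.amounts


if __name__ == "__main__":
    print(reorder_with_dom(SAMPLE))
    print(amounts_with_sax(SAMPLE))
```

The DOM function can revisit any node after parsing. The SAX handler sees each element exactly once and must itself keep whatever state it needs.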
The traditional SAX parser traverses the document only once and does not keep the document in memory. The traditional XLink technology for XML is designed to link external resources and show how they are related, but it does not solve the memory issue for large XML documents and it does not maintain the parent-child relationship between the entities.
As discussed, XML has become a universal format for structured documents and data on the World Wide Web. It has been used widely in business software and enterprise applications. When an XML document is extremely large, it may be impossible to hold the entire document in memory in a DOM object. While the SAX parser API could be used to parse the document, the application would not have random access to the document. Moreover, it is inefficient to load an entire XML document in memory when certain portions of the document are infrequently accessed.
The D3 (Digital Document Delivery) system, version 2.0, is an enterprise solution for presenting bills, statements and invoices on the Internet. D3 version 2.0 is available from Pitney Bowes, Inc. of Stamford, Conn. In D3 version 2.0, an XML document could be broken down into small components that were stored in an archive file. A file-offset location would be used to locate the child components in the parent XML document.
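The general idea of locating components in an archive by file offset can be sketched as follows. This is a minimal illustration under stated assumptions and is not the D3 archive format, which is not described here. The function names, the in-memory index of (offset, length) pairs, and the sample components are all hypothetical.

```python
import io


def write_archive(components: dict) -> tuple:
    """Concatenate component byte strings into one archive.

    Returns (archive_bytes, index), where index maps each component name
    to its (offset, length) within the archive. Hypothetical layout.
    """
    buf = io.BytesIO()
    index = {}
    for name, data in components.items():
        index[name] = (buf.tell(), len(data))  # record where this component starts
        buf.write(data)
    return buf.getvalue(), index


def read_component(archive, index: dict, name: str) -> bytes:
    """Seek directly to one component without reading the rest of the archive."""
    offset, length = index[name]
    archive.seek(offset)
    return archive.read(length)


# Hypothetical components split out of a larger parent document.
COMPONENTS = {
    "summary": b"<summary><total>10.80</total></summary>",
    "detail": b"<detail><line>Service</line><line>Tax</line></detail>",
}

if __name__ == "__main__":
    data, index = write_archive(COMPONENTS)
    print(index)
    print(read_component(io.BytesIO(data), index, "detail"))
```

In this sketch, only the requested component is read, so the cost of a lookup depends on the size of that component rather than on the size of the whole archive.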