Data exchange between independently developed applications running on different families of processors is often accomplished with a published (open) markup language. For example, the HyperText Markup Language (HTML) is used for exchanging data over the Internet between resources of the World Wide Web. HTML is one example of a markup language derived from the Standard Generalized Markup Language (SGML). Another language derived from SGML, more powerful and flexible than HTML, has been defined and has gained popularity for providing information across networks. This standard, developed and promoted by the World Wide Web Consortium (W3C), is called the Extensible Markup Language (XML). XML provides a common syntax for expressing structure in data. Structured data refers to data that is tagged for its content, meaning, or use. XML expands on the tagging done in HTML: whereas HTML focuses on format or presentation, XML focuses on content or use.
The elements of an XML document are defined in an XML grammar, which is specified in a document type definition (DTD) document or an XML schema. An XML grammar is a set of syntax rules for elements in SGML and XML documents. The grammar defines which elements can be used in a document, the order in which they must appear (if any), which elements can appear inside other elements, which elements have attributes, and what those attributes are. A grammar can be part of an XML document, or it can be a separate document or series of documents in one or more files. By using namespaces, XML allows a document to contain elements from several distinct DTD documents or schemas.
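To make the role of a grammar concrete, the following minimal Java sketch validates small documents against a DTD declared in the document's internal subset, using the standard JAXP `DocumentBuilderFactory` with validation turned on. The `order`/`item` grammar and the class name `DtdValidationSketch` are hypothetical illustrations, not taken from any product discussed here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Hypothetical grammar (internal DTD subset): an <order> must contain one
    // or more <item> elements, and every <item> must carry a "sku" attribute.
    static final String DOCTYPE =
        "<!DOCTYPE order [\n"
        + "  <!ELEMENT order (item+)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "  <!ATTLIST item sku CDATA #REQUIRED>\n"
        + "]>\n";

    // Returns true if the body, prefixed with the DOCTYPE above, is
    // well-formed and valid against that grammar.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Without a handler, validity errors are only reported; rethrow them so the parse fails.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (Exception e) { // parser configuration, I/O, or validation failure
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><item sku=\"A1\">salt</item></order>")); // true
        System.out.println(isValid("<order><item>pepper</item></order>"));          // false: sku missing
        System.out.println(isValid("<order></order>"));                             // false: no item
    }
}
```

The grammar, not the parsing code, decides which documents are acceptable: changing `item+` to `item*` in the DTD would make the empty order valid without touching any Java.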
Data exchange also occurs between a database server and independently developed applications running on different families of processors. Such data exchange is often accomplished with a published (open) database query language. For example, the Structured Query Language (SQL) is used for exchanging data with the database servers of several database systems. As used herein, a data exchange format includes a markup language, a database query language, or both.
Software applications execute on computing devices to perform a variety of functions related to processing and presenting information to users. Software applications often read, process and write information in the form of data objects. An object is a data structure that includes values for one or more attributes and may have one or more methods that provide behavior associated with those attributes. For example, one method of a particular data object may, when invoked, return the current value of one of its attributes. A data structure that lists the attributes and methods of one or more similar objects is called the type or class of the objects. Each individual object, with specific values for the attributes, is an instance of the type or class. One object may have another object as an attribute. A programming language variable is a simple class of data objects that supports only one instance at a time; at different times the variable can contain different values, each considered a different instance of that variable class.
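The object vocabulary above can be illustrated with two small JAVA classes. `Address` and `Customer` are hypothetical names invented for this sketch: `Customer` is a class, each `new Customer(...)` is an instance with its own attribute values, `getName()` is a method that returns the current value of an attribute, and the `address` attribute is itself another object.

```java
class ObjectSketch {
    public static void main(String[] args) {
        // Two instances of the same class, each holding its own attribute values.
        Customer first = new Customer("Ada", new Address("Paris"));
        Customer second = new Customer("Linus", new Address("Oslo"));
        System.out.println(first.getName() + ", " + first.getAddress().getCity());
        System.out.println(second.getName() + ", " + second.getAddress().getCity());

        // Changing an attribute of one instance does not affect the other.
        first.setName("Grace");
        System.out.println(first.getName() + " / " + second.getName());
    }
}

// Hypothetical class whose single attribute is a simple value.
class Address {
    private final String city;

    Address(String city) { this.city = city; }

    String getCity() { return city; }
}

// Hypothetical class: one attribute is a simple value, the other is another object.
class Customer {
    private String name;
    private final Address address;

    Customer(String name, Address address) {
        this.name = name;
        this.address = address;
    }

    // Method that returns the current value of the "name" attribute.
    String getName() { return name; }

    // Method that changes the value of the "name" attribute.
    void setName(String name) { this.name = name; }

    // The "address" attribute is itself an object of another class.
    Address getAddress() { return address; }
}
```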
An application is typically developed as one or more source code modules, each written in a high-level programming language designed to make the steps in each module easy for humans to understand. Example high-level programming languages include FORTRAN, Visual Basic, C, C++, and JAVA®. Each module is written by combining statements in such a language. One or more source code modules are stored together in a source code file. A particular process, such as a compiler, converts the statements into instructions for a processor on a computing device; these are sometimes called machine instructions. A compiler generates machine instructions based on several statements or an entire module. The machine instructions are specific to particular families of processors.
Currently, when a developer writes source code for an application, data structures such as data objects are defined and manipulated internally. If some data is to be imported into one or more of these data structures from other independently developed applications, or exported from one or more of these data structures to other independently developed applications, or both, then it is common to convert data between the data structures used internally and an exchange format, such as a markup language like XML, used for exchanging data. Converting data from an internal data structure to an exchange format is called marshaling the data to the exchange format. Converting data from an exchange format to the internal data structure is called de-marshaling the data from the exchange format.
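As an illustration of marshaling and de-marshaling written by hand, the following sketch converts a hypothetical `Item` class to and from a small XML document using only the JDK's DOM parser. The class, element, and attribute names are assumptions chosen for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MarshalSketch {
    public static void main(String[] args) {
        Item original = new Item("salt", 3);
        String xml = original.toXml();   // marshal
        System.out.println(xml);         // <item quantity="3"><name>salt</name></item>
        Item copy = Item.fromXml(xml);   // de-marshal
        System.out.println(copy.name + " x" + copy.quantity);
    }
}

// Hypothetical internal data structure defined by the developer.
class Item {
    final String name;
    final int quantity;

    Item(String name, int quantity) {
        this.name = name;
        this.quantity = quantity;
    }

    // Marshal: internal object -> XML text. Tag and attribute names are hard-coded.
    String toXml() {
        return "<item quantity=\"" + quantity + "\"><name>" + escape(name) + "</name></item>";
    }

    // De-marshal: XML text -> internal object, using the JDK's DOM parser.
    static Item fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            String name = root.getElementsByTagName("name").item(0).getTextContent();
            int quantity = Integer.parseInt(root.getAttribute("quantity"));
            return new Item(name, quantity);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid <item> document", e);
        }
    }

    // Escapes markup characters so arbitrary text can be embedded in the XML.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The data model appears twice here: once as the fields of `Item` and once as the tag names hard-coded in `toXml` and `fromXml`, so changing a field means editing both places by hand.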
In past approaches, a developer is required to write and maintain data models redundantly. The developer implements the data model in program source code using data structures defined by the programming language, such as JAVA classes, and also separately implements the data model in some exchange format, such as an XML DTD document or schema. In addition, the developer is forced to write extra source code modules to marshal and de-marshal data with the exchange format.
The popularity of JAVA as a platform-independent programming language and the proliferation of XML for data representation have created a strong need for a convenient way of developing JAVA data objects that can easily be marshaled to XML and de-marshaled from XML. The marshaling and de-marshaling of JAVA data objects to and from XML is termed “XML/JAVA Data Binding” in this document.
One conventional approach to XML/JAVA Data Binding examines an XML DTD document, generates JAVA code defining object classes based on the XML DTD, and also generates methods for the classes that marshal and de-marshal data between an XML document and data objects of the classes. Example products that employ this approach include JAVA Architecture for XML Binding (JAXB) from Sun Microsystems, Inc., XML Studio from Breeze Factor LLC, and “Zeus” from Enhydra.org of Lutris Technologies, Inc. The product from Sun Microsystems is based not only on the XML DTD but also on a second XML document that describes an association between XML elements/attributes and JAVA classes/attributes.
Another conventional approach to XML/JAVA Data Binding examines an XML grammar, as defined in a schema or DTD, and generates JAVA code defining object classes based on the grammar and also generates methods for the classes that marshal and de-marshal data between an XML document and data objects of the classes. An XML Schema document is an XML document with a limited number of elements for defining the elements of other XML documents. An XML Schema is a more easily understood way to define the components of an XML document than is a DTD document.
Another conventional approach to XML/JAVA Data Binding examines, at execution time, JAVA classes that support introspection, automatically develops an XML Schema consistent with the data model of the JAVA classes, and automatically marshals and de-marshals data between objects of those classes and XML documents using the developed XML Schema. JAVA classes that support introspection, such as JavaBeans, include methods that provide a list of attributes and attribute types for the class. Example products that employ this approach include “Castor” from Exolab.org of Intalio, Inc. Castor provides for schema-less binding: a developer merely invokes the tool at runtime on a set of JAVA objects, without providing any mapping to XML, and the marshaling and de-marshaling are done automatically using a generic mapping. A similar approach is taken by the “Long Term JavaBeans™ Persistence” framework described in the document “beans.html” in the online Web folder java.sun.com/xml/.
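The following sketch illustrates the general idea of introspection-based, schema-less marshaling with the standard `java.beans.Introspector` API: the root element is named after the class, and every readable JavaBean property becomes a sub-element named after the property. It is a simplified illustration of a generic mapping, not the actual API or output format of Castor or of the JavaBeans persistence framework; the `Shaker` bean is hypothetical.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

class IntrospectionSketch {
    // Hypothetical JavaBean: public no-arg constructor plus getter/setter pairs.
    public static class Shaker {
        private String flavor;
        private int holes;

        public String getFlavor() { return flavor; }
        public void setFlavor(String flavor) { this.flavor = flavor; }
        public int getHoles() { return holes; }
        public void setHoles(int holes) { this.holes = holes; }
    }

    // Generic, rule-based mapping discovered at run time from the class itself.
    // Values are written with toString() and are not escaped in this sketch.
    static String marshal(Object bean) {
        try {
            Class<?> type = bean.getClass();
            // Stop at Object so the inherited "class" property is not included.
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            PropertyDescriptor[] props = info.getPropertyDescriptors();
            Arrays.sort(props, Comparator.comparing(PropertyDescriptor::getName)); // deterministic order
            StringBuilder xml = new StringBuilder("<" + type.getSimpleName() + ">");
            for (PropertyDescriptor p : props) {
                Method getter = p.getReadMethod();
                if (getter == null) continue; // write-only property: nothing to marshal
                xml.append('<').append(p.getName()).append('>')
                   .append(getter.invoke(bean))
                   .append("</").append(p.getName()).append('>');
            }
            return xml.append("</").append(type.getSimpleName()).append('>').toString();
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot marshal " + bean, e);
        }
    }

    public static void main(String[] args) {
        Shaker shaker = new Shaker();
        shaker.setFlavor("salt");
        shaker.setHoles(5);
        System.out.println(marshal(shaker)); // <Shaker><flavor>salt</flavor><holes>5</holes></Shaker>
    }
}
```

Because the mapping is fixed by rule, the developer supplies nothing but the objects, and in exchange cannot choose element names, place a property in an attribute, or leave a property out.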
While suitable for many purposes, the conventional approaches suffer some disadvantages. The approaches that base JAVA code on an XML DTD force a developer to accept data objects defined by the data exchange format, instead of deriving the exchange format from the data objects. To determine the object classes, the developer must first compose the XML DTD, then run the tool to generate the JAVA classes, and then return to complete the programming in JAVA. This sequence interrupts program development and is inconvenient. Further, the class definitions and class hierarchy are defined before the binding tool is invoked and independently of the binding mechanism, yet binding tools that generate classes do not take the existing class structure into account. Tools such as Castor and JavaBeans long-term persistence, which generate the XML from the class structure, leave the developer too little control over the XML schema or DTD. There is a need for a tool that binds a given schema to a given class structure. It would be preferable for the developer to develop the application in the programming language and deduce the exchange format from the classes defined for the application. Also, these approaches are specific to JAVA and XML and do not work if the developer is using a different programming language or exchange format.
The approach that bases an XML DTD on the JAVA code is rule-based and automatically organizes and names the XML elements and attributes without control by the developer. The rules may not be consistent with the developer's intentions for data exchange. For example, this approach does not allow a developer to specify whether one data object should be an XML attribute or an XML sub-element of another XML element. Also, the developer may wish to associate two data items by giving them related names, like “salt” and “pepper”; the conventional approach names the data items automatically and denies the developer the opportunity to assign more meaningful names. The developer may also wish to restrict the number of sub-elements contained in an XML element, but the conventional approach does not permit the developer to impose such restrictions between XML elements. In addition, the developer may wish to exchange data in only a subset of the classes defined in the source code; however, the conventional tool automatically generates XML DTD statements for all the classes.
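The developer choices listed above (XML names, attribute versus sub-element placement, and which data items are exchanged at all) can be pictured as an explicit mapping. In this minimal sketch, `Placement`, `Rule`, `marshal`, and `example` are hypothetical names, and an object's fields are represented as a `Map` for brevity; it is only an illustration of the options that a purely rule-based generator does not expose, not the approach of any particular tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MappingChoiceSketch {
    // Hypothetical developer choice: where a data item appears in the XML.
    enum Placement { ATTRIBUTE, ELEMENT }

    // Hypothetical mapping rule: the XML name chosen by the developer and its placement.
    static final class Rule {
        final String xmlName;
        final Placement placement;

        Rule(String xmlName, Placement placement) {
            this.xmlName = xmlName;
            this.placement = placement;
        }
    }

    // Emits only the fields that have a rule, under the developer's names and placements.
    // Values are written with toString() and are not escaped in this sketch.
    static String marshal(String elementName, Map<String, Object> fields, Map<String, Rule> rules) {
        StringBuilder attributes = new StringBuilder();
        StringBuilder children = new StringBuilder();
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            Rule rule = rules.get(field.getKey());
            if (rule == null) continue; // no rule: the field is not exchanged
            if (rule.placement == Placement.ATTRIBUTE) {
                attributes.append(' ').append(rule.xmlName)
                          .append("=\"").append(field.getValue()).append('"');
            } else {
                children.append('<').append(rule.xmlName).append('>')
                        .append(field.getValue())
                        .append("</").append(rule.xmlName).append('>');
            }
        }
        return "<" + elementName + attributes + ">" + children + "</" + elementName + ">";
    }

    // Two related data items named "salt" and "pepper", one placed as a
    // sub-element and one as an attribute, plus an internal field left out.
    static String example() {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("saltGrams", 5);
        fields.put("pepperGrams", 2);
        fields.put("internalId", 42);

        Map<String, Rule> rules = new LinkedHashMap<>();
        rules.put("saltGrams", new Rule("salt", Placement.ELEMENT));
        rules.put("pepperGrams", new Rule("pepper", Placement.ATTRIBUTE));

        return marshal("shaker", fields, rules);
    }

    public static void main(String[] args) {
        System.out.println(example()); // <shaker pepper="2"><salt>5</salt></shaker>
    }
}
```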
Furthermore, this approach executes more slowly. The XML schema is deduced at runtime, which takes extra processing time, and the extra logic that determines how to marshal and de-marshal data, given the deduced XML schema, takes additional time to execute. The slower execution of this approach can hinder performance, especially as the source code becomes larger and more complex.
Based on the foregoing, there is a clear need for techniques that automatically employ an open exchange format that is configured for data structures internal to the source code and that is responsive to developer choices for options in employing the open exchange format.
Furthermore, there is a need for techniques that can also automatically generate instructions to marshal and de-marshal data between the open exchange format employed and the internal data structures.
In particular, there is a need for techniques that automatically produce an XML DTD document that is based on JAVA data objects defined by a developer and that is responsive to developer choices for options in the XML DTD.