This invention relates to application-specific object-oriented processing of markup, including but not limited to the Extensible Markup Language (XML).
Object-oriented programming has been embraced by many programmers seeking to enhance their productivity. A useful introduction to object-oriented programming may be found in the book “Object-Oriented Analysis and Design with Applications, 2nd Edition,” by Grady Booch, Benjamin/Cummings, 1994, ISBN 0-8053-5340-2. Another useful object-oriented programming text is “Design Patterns: Elements of Reusable Object-Oriented Software,” by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Addison-Wesley, 1995, ISBN 0-201-63361-2. For the programming language C++, which will be utilized in this disclosure, a useful reference is “The C++ Programming Language, 3rd Edition,” by Bjarne Stroustrup, Addison-Wesley, 1997, ISBN 0-201-88954-4. Each of these references is incorporated herein by reference.
In object-oriented programming, an object encapsulates both data and operations. An object is an instance of one or more classes. A class defines data and operations which are available for objects which are instances of the class. An element of data defined in a class is denoted a member. Each distinct object from a class has its own distinct instance of each member, except where members are explicitly denoted as class members, in which case all class instances share the same data instance. An operation defined for an object from a class is denoted a member function. A member function may be invoked for an object which is an instance of the class in which it is defined; the member function may make use of the member data which is specific to the particular object for which the member function was invoked.
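By way of illustration only, the terminology above may be sketched in C++ as follows. The class name Customer, its members name_ and count_, and the function exampleCustomers are hypothetical and are chosen solely for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class, used only to illustrate the terminology.
class Customer {
public:
    explicit Customer(std::string name) : name_(std::move(name)) { ++count_; }
    Customer(const Customer&) = delete;             // keeps the count simple
    Customer& operator=(const Customer&) = delete;
    ~Customer() { --count_; }

    // Member function: uses the data of the object for which it is invoked.
    const std::string& name() const { return name_; }

    // Reads the class member, which all instances share.
    static int count() { return count_; }

private:
    std::string name_;   // member: each object has its own instance
    static int count_;   // class member: one instance shared by all objects
};

int Customer::count_ = 0;

inline void exampleCustomers() {
    Customer a("Ada");
    Customer b("Bob");
    assert(a.name() == "Ada");       // each object holds its own name_
    assert(b.name() == "Bob");
    assert(Customer::count() == 2);  // both objects share count_
}
```

In this sketch the static keyword is what marks count_ as a class member; every other data member is distinct for each object.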
An object-oriented programming language includes facilities for class definition as well as facilities for creating objects, invoking operations on objects, and destroying objects. Some popular object-oriented programming languages are C++, Java, and C#.
An application is a computer program that carries out some useful task on behalf of a user. Applications are oriented towards such fields as business, engineering, entertainment, and media production. An object-oriented application uses application-specific classes to represent entities that are meaningful in the context of the application. Thus a business application might use application-specific classes to represent customers, purchase orders, inventory items, and shipments. In an object-oriented application, class facilities are used to instantiate application-specific objects, which are then utilized to carry out operations which are meaningful in the context of the application.
Among the important characteristics of application-specific objects are associations which represent relationships between application-specific objects. For example, in a business application, an application-specific object representing a purchase order might be associated with an application-specific object representing a customer. Similarly, a customer object could be associated with a plurality of purchase-order objects. Such associations may be realized by use of pointer or reference members. For example, a purchase-order object could contain a pointer member which indicates the associated customer object. A collection of application-specific objects configured to represent application-specific relationships is denoted an application-specific object-oriented data structure. An application-specific object-oriented data structure is a useful component in an application. Indeed, an application is often most easily understood as a process in which an application-specific object-oriented data structure is created from a stored representation, operations reflecting meaningful activities in the application context are performed on the data structure, and a stored representation is written. The stored representations which are read and written by an application are often in the form of markup.
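The associations described above may be sketched, under simplifying assumptions, as pointer members in C++. The names Customer, PurchaseOrder, associate, and exampleAssociation are hypothetical; the pointers are non-owning, and object lifetime is left to the caller.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct PurchaseOrder;  // forward declaration so Customer can refer to it

struct Customer {
    std::string number;
    std::vector<PurchaseOrder*> orders;  // one customer, a plurality of orders
};

struct PurchaseOrder {
    std::string id;
    Customer* customer;  // pointer member indicating the associated customer
};

// Records the association in both directions.
inline void associate(Customer& c, PurchaseOrder& po) {
    po.customer = &c;
    c.orders.push_back(&po);
}

inline void exampleAssociation() {
    Customer c{"C-100", {}};
    PurchaseOrder a{"PO-1", nullptr};
    PurchaseOrder b{"PO-2", nullptr};
    associate(c, a);
    associate(c, b);
    assert(a.customer == &c);           // order to customer
    assert(c.orders.size() == 2);       // customer to orders
    assert(c.orders[1]->id == "PO-2");
}
```

The three objects, together with the pointers connecting them, form a small application-specific object-oriented data structure in the sense used here.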
An application-specific data structure typically includes numerous application-specific objects, organized according to one or more schemes that reflect the requirements of the application. It is often convenient to encapsulate the application-specific object-oriented data structure in a single model object. The model object's class or classes may define one or more members which reference application-specific objects singly or in collections, and one or more member functions, which facilitate random access to particular application-specific objects. For example, a model class in a business application might define a member function in which a customer object is provided in response to a textual customer number. Such a model class might further provide the entire collection of customers, or a collection of purchase orders which have been received but not yet shipped. The particular members and member functions of a model class are designed to facilitate the performance of the tasks that are the purpose of the application.
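A model class of the kind described above might be sketched as follows. This is one possible arrangement, not a required one; BusinessModel and its member functions addCustomer, addOrder, findCustomer, customers, and unshippedOrders are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Customer { std::string number; std::string name; };
struct PurchaseOrder { std::string id; Customer* customer; bool shipped; };

// Hypothetical model class encapsulating the whole data structure.
class BusinessModel {
public:
    // Adds a customer; if the number already exists, the existing one is kept.
    Customer& addCustomer(const std::string& number, const std::string& name) {
        std::unique_ptr<Customer>& slot = customers_[number];
        if (!slot) slot.reset(new Customer{number, name});
        return *slot;
    }
    PurchaseOrder& addOrder(const std::string& id, Customer& c) {
        orders_.push_back(std::unique_ptr<PurchaseOrder>(
            new PurchaseOrder{id, &c, false}));
        return *orders_.back();
    }
    // Random access by textual customer number; nullptr if not found.
    Customer* findCustomer(const std::string& number) const {
        auto it = customers_.find(number);
        return it == customers_.end() ? nullptr : it->second.get();
    }
    // The entire collection of customers.
    std::vector<Customer*> customers() const {
        std::vector<Customer*> out;
        for (const auto& kv : customers_) out.push_back(kv.second.get());
        return out;
    }
    // Orders which have been received but not yet shipped.
    std::vector<PurchaseOrder*> unshippedOrders() const {
        std::vector<PurchaseOrder*> out;
        for (const auto& po : orders_)
            if (!po->shipped) out.push_back(po.get());
        return out;
    }
private:
    std::map<std::string, std::unique_ptr<Customer>> customers_;
    std::vector<std::unique_ptr<PurchaseOrder>> orders_;
};

inline void exampleModel() {
    BusinessModel model;
    Customer& c = model.addCustomer("C-100", "Ada");
    model.addOrder("PO-1", c).shipped = true;
    model.addOrder("PO-2", c);
    assert(model.findCustomer("C-100") == &c);
    assert(model.findCustomer("C-999") == nullptr);
    assert(model.unshippedOrders().size() == 1);
}
```

Here the model owns the objects through std::unique_ptr, so the addresses handed out by findCustomer remain valid while the model exists.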
The stored representations which are read and written by applications are often in the form of markup. Of particular importance for markup is XML, which is in wide use. A useful reference for XML is the book “XML in a Nutshell, 3rd Edition,” by Elliotte Rusty Harold and W. Scott Means, published by O'Reilly, 2004, ISBN 0-596-00764-7, incorporated herein by reference. Many applications are required to read or write XML or other markup languages.
Markup consists of hierarchically organized tagged elements. A tagged element typically consists of a start tag, an optional body, and an end tag. Where the body is absent, the start tag and end tag may be combined into a single tag. The start tag includes a textual tag name and optional attributes. The tag name describes the tagged element. Each attribute includes a textual key and a textual value. Attributes may provide additional descriptive information about the tagged element. The body of the tagged element may contain both instances of textual content and nested tagged elements. The end tag concludes the tagged element.
The hierarchical organization of markup is reflected in the nesting of tagged elements. The body of a tagged element may contain nested tagged elements as well as textual content. The containing tagged element is denoted the parent. The nested tagged element is denoted the child. A tagged element which lacks a parent is denoted a root. In XML, a well-formed document is required to contain exactly one root tagged element.
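The structure of a tagged element may be illustrated by a small C++ sketch which writes an element as markup. The names Element, toMarkup, and exampleMarkup are hypothetical. As simplifying assumptions, all textual content is placed before the nested elements, and special characters are not escaped.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical generic representation of one tagged element.
struct Element {
    std::string name;                               // textual tag name
    std::map<std::string, std::string> attributes;  // textual key and value
    std::string text;                               // textual content of the body
    std::vector<Element> children;                  // nested tagged elements
};

// Writes start tag, body, and end tag; no escaping is performed.
inline std::string toMarkup(const Element& e) {
    std::string out = "<" + e.name;
    for (const auto& kv : e.attributes)
        out += " " + kv.first + "=\"" + kv.second + "\"";
    if (e.text.empty() && e.children.empty())
        return out + "/>";  // body absent: start and end tag combined
    out += ">" + e.text;
    for (const Element& child : e.children) out += toMarkup(child);
    return out + "</" + e.name + ">";
}

inline void exampleMarkup() {
    Element order;                    // the root: it has no parent
    order.name = "order";
    order.attributes["id"] = "PO-1";
    Element item;                     // child with no body
    item.name = "item";
    item.attributes["sku"] = "A7";
    Element note;                     // child with textual content
    note.name = "note";
    note.text = "rush";
    order.children.push_back(item);
    order.children.push_back(note);
    assert(toMarkup(order) ==
           "<order id=\"PO-1\"><item sku=\"A7\"/><note>rush</note></order>");
}
```

The order element is the parent of the item and note elements, and the item element shows the combined form of tag used where the body is absent.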
An application programming interface (API) specifies an interface to computational services. An API permits decomposition of a programming task between the provider of the API and the consumer of the API. As long as clients and providers adhere to the API, diverse clients may make use of a single provider. Likewise, providers may be freely interchanged without affecting clients. APIs are available which facilitate markup processing. Although markup may be processed by any programming language, object-oriented languages including Java, and scripting languages including Perl and Python, have been most widely used. However, C and especially C++ are also well-suited to markup processing. The aforementioned XML book covers Java programming interfaces. A reference for C++ programming interfaces may be found in the book “C++ XML”, by Fabio Arciniegas, published by New Riders, 2002, ISBN 0-7357-1052-X, incorporated herein by reference.
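The interchangeability of providers may be illustrated with an abstract C++ class standing in for an API. Every name in this sketch (TagCounter, ScanningCounter, FindingCounter, hasNestedElements, exampleProviders) is hypothetical and does not belong to any real library. The counting itself is deliberately naive and does not account for comments, processing instructions, or other constructs which begin with a left angle bracket.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The API: what a client may rely on, independent of any provider.
class TagCounter {
public:
    virtual ~TagCounter() {}
    // Number of start tags, counting combined tags, in the markup.
    virtual int countStartTags(const std::string& markup) const = 0;
};

// Provider 1: examines every character.
class ScanningCounter : public TagCounter {
public:
    int countStartTags(const std::string& m) const override {
        int n = 0;
        for (std::size_t i = 0; i + 1 < m.size(); ++i)
            if (m[i] == '<' && m[i + 1] != '/') ++n;
        return n;
    }
};

// Provider 2: jumps between occurrences of '<' using std::string::find.
class FindingCounter : public TagCounter {
public:
    int countStartTags(const std::string& m) const override {
        int n = 0;
        std::size_t pos = m.find('<');
        while (pos != std::string::npos) {
            if (pos + 1 < m.size() && m[pos + 1] != '/') ++n;
            pos = m.find('<', pos + 1);
        }
        return n;
    }
};

// Client: written against the API only, so either provider may be supplied.
inline bool hasNestedElements(const TagCounter& api, const std::string& markup) {
    return api.countStartTags(markup) > 1;
}

inline void exampleProviders() {
    ScanningCounter scanning;
    FindingCounter finding;
    assert(hasNestedElements(scanning, "<a><b/></a>"));
    assert(hasNestedElements(finding, "<a><b/></a>"));
    assert(!hasNestedElements(scanning, "<a/>"));
    assert(!hasNestedElements(finding, "<a/>"));
}
```

The client function hasNestedElements is unchanged whichever provider is supplied, which is the decomposition that an API permits.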
The oldest XML API is the Document Object Model (DOM). DOM processes markup into a tree-like data structure. DOM is a W3C standard which is documented online at <http://www.w3.org/DOM/>. DOM presents markup as an object-oriented data structure; however, the objects of the presentation faithfully reflect the structure and properties of the markup rather than the entities of any particular application. DOM does not provide application-specific objects for business, engineering, entertainment, or artistic applications.
An alternative API to DOM is SAX, the Simple API for XML. SAX processes markup into a series of event notifications, where the event notifications correspond to particular subelements of the processed markup. Expat is an open-source implementation of the SAX API, originally written by James J. Clark, with contributions by David Megginson and David Brownell. Expat is in wide use. Expat materials may be found online at the official Expat website, http://sax.sourceforge.net/. Expat does not provide object-oriented facilities for markup processing other than the SAX processor itself, the operation of which is controlled using an object-oriented interface.
SAX and DOM are of limited benefit to a programmer who desires an application-specific object-oriented data structure, consisting of application-specific objects interconnected to reflect properties and associations that are natural to the application. To build an application-specific object-oriented data structure from SAX notifications, a programmer must maintain complex context to interpret the events in terms of the ongoing construction of the desired structure. In DOM, the programmer must systematically traverse a complex tree structure, generating a parallel structure consisting of application-specific objects. In both SAX and DOM, the programmer must invest significant additional effort to construct the desired application-specific object-oriented data structure.
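The context which a programmer must maintain when working from event notifications may be sketched as follows. The callback interface EventHandler is a hypothetical stand-in for SAX-style notifications and is not the interface of Expat or of any other real parser; the events are issued by hand rather than by a parser. The element names and the names CustomerBuilder, oneAttribute, and exampleEvents are likewise assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Attributes;

struct PurchaseOrder { std::string id; };
struct Customer { std::string number; std::vector<PurchaseOrder> orders; };

// Hypothetical SAX-style notifications (not a real parser's interface).
class EventHandler {
public:
    virtual ~EventHandler() {}
    virtual void startElement(const std::string& name, const Attributes& a) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Builds application-specific objects; must track where it is in the markup.
class CustomerBuilder : public EventHandler {
public:
    std::vector<Customer> customers;

    void startElement(const std::string& name, const Attributes& a) override {
        const std::string parent = path_.empty() ? std::string() : path_.back();
        if (name == "customer") {
            customers.push_back(Customer{value(a, "number"), {}});
        } else if (name == "order" && parent == "customer") {
            // Only the saved context reveals which customer owns this order.
            customers.back().orders.push_back(PurchaseOrder{value(a, "id")});
        }
        path_.push_back(name);
    }
    void endElement(const std::string&) override {
        if (!path_.empty()) path_.pop_back();
    }
private:
    static std::string value(const Attributes& a, const std::string& key) {
        Attributes::const_iterator it = a.find(key);
        return it == a.end() ? std::string() : it->second;
    }
    std::vector<std::string> path_;  // context: names of the open elements
};

inline Attributes oneAttribute(const std::string& k, const std::string& v) {
    Attributes a;
    a[k] = v;
    return a;
}

inline void exampleEvents() {
    CustomerBuilder b;
    b.startElement("customers", Attributes());
    b.startElement("customer", oneAttribute("number", "C-100"));
    b.startElement("order", oneAttribute("id", "PO-1"));
    b.endElement("order");
    b.endElement("customer");
    b.startElement("order", oneAttribute("id", "stray"));  // parent is not a customer
    b.endElement("order");
    b.endElement("customers");
    assert(b.customers.size() == 1);
    assert(b.customers[0].orders.size() == 1);
    assert(b.customers[0].orders[0].id == "PO-1");
}
```

Even for two element types, the handler needs a stack of open elements to decide what each notification means; the bookkeeping grows with every additional element type and association.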
Thus, it would be advantageous to reduce the effort required for the construction of an application-specific object-oriented data structure from markup. It would also be advantageous to maximize flexibility in the structure and function of the application-specific objects which are constructed corresponding to tagged elements.