1. Field of the Invention
The invention relates to the field of data representation and, more particularly, to a system and methods for generating data representations in a standard markup language using matrix definitions and programmatic mapping.
2. Description of the Related Art
The design and use of structured documents has become an important aspect of developing mechanisms for distributing data and information rapidly and reliably. Structured documents are commonly used to store and transmit information over the Internet and the World Wide Web (WWW). Most documents on the Web use a form of generalized markup language that is universally recognized and well suited to numerous data formats, including text, hypertext, multimedia, and the like.
Recently, the design specifications for markup languages have come to include numerous sophisticated features that make it possible to define custom document formats representing complex information structures, such as those used in managing large information repositories. The Extensible Markup Language (XML) specification is one such markup language and is commonly used to form structured documents for both simple and complex data representations. Originally designed to accommodate the needs of web development, this language specification has become widely used in numerous other areas as well. Among the many reasons that XML has become so widely accepted are its mechanisms for controlling the structure and content of documents and for standardizing document linking and display functions.
XML is derived from Standard Generalized Markup Language (SGML) and permits the definition of custom data representations, similar to database representations, within each document developed using the language. These document representations or structures are called Document Type Definitions (DTDs). DTDs are commonly associated with one or more structured documents known as stylesheets, which define visual representations of the DTD and are used in organizing and presenting the information contained in the DTD. Stylesheets may be adapted to display information using numerous approaches, including web browsers, printers, handheld computers, or other electronic devices.
Unlike less sophisticated markup languages such as Hypertext Markup Language (HTML), in which it is possible to create documents with many embedded errors, XML data structures and documents are desirably validated to ensure consistency. Type-validation of the contents of an XML document and the associated DTD can be a complex and time-consuming task. A DTD defines the legal building blocks of an XML document and the document structure using a list of legal elements. Type-validation ensures that the structured document conforms to the open standards set by the World Wide Web Consortium (W3C). This means that all data definitions conform to a specific syntax outlined by the W3C standard.
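By way of illustration only, the idea of checking a document against a list of legal elements can be sketched as follows. This is a simplified, hand-written check and not a conforming DTD validator: it ignores child order, cardinality, and attributes. The content model `LEGAL_CHILDREN`, the function `find_violations`, and the sample documents are hypothetical names introduced for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> set of legal child element names.
# Loosely mirrors DTD declarations such as <!ELEMENT note (to, body)>.
LEGAL_CHILDREN = {
    "note": {"to", "body"},
    "to": set(),
    "body": set(),
}


def find_violations(xml_text):
    """Return messages for elements that LEGAL_CHILDREN does not permit."""
    root = ET.fromstring(xml_text)
    if root.tag not in LEGAL_CHILDREN:
        return [f"undeclared root element <{root.tag}>"]

    violations = []
    stack = [root]
    while stack:
        parent = stack.pop()
        allowed = LEGAL_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                violations.append(
                    f"<{child.tag}> not allowed inside <{parent.tag}>"
                )
            else:
                # Only descend into children that are themselves legal.
                stack.append(child)
    return violations


valid_doc = "<note><to>Ann</to><body>Hello</body></note>"
invalid_doc = "<note><to>Ann</to><subject>Hi</subject></note>"
```

Calling `find_violations(valid_doc)` yields an empty list, while `find_violations(invalid_doc)` reports the undeclared `<subject>` element. A full validator would additionally enforce the ordering and repetition rules expressed in the DTD.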
Conventional approaches to type validation map DTDs and the associated XML information into standard hierarchical data structures (or tree structures). These approaches create a problem in that the data schema is limited by the constraints of the hierarchical representation of the data. As a result, hierarchical data representation limits flexibility in the definition of DTDs and inhibits the efficient formation of DTDs of significant complexity. One particular problem with conventional parsing and mapping techniques that use hierarchical data structures is that they are not flexible enough to permit the incorporation of recursive and repetitive data structures within the data schema of the DTD. Conventional DTD definition is therefore limited with respect to these characteristics, which further limits the ability to generate structured documents.
Conventional methods used to construct relational data structures for elements of a DTD typically use numerous tables containing fields to store information (attributes) about each element in a data set. Relationships between elements are defined by key references (primary and foreign), which are in turn stored in fields within the tables for each element. A problem with this method of data organization is that it leads to highly complex data structures containing many tables and many references between tables. As the size of the DTD to be represented in the relational structure increases, it becomes difficult to maintain a coherent data schema. Furthermore, as DTD complexity increases, a problem arises in validating the data schema and ensuring that all of the relationships defined in the data schema are appropriately defined in each table for all required elements. Invalid or missing relationships within the data schema can lead to improper DTD representation and subsequent corruption of the data stored in the data structure representing the DTD. Moreover, certain relationships such as recursion and replication are not efficiently supported by conventional data representations, which lack the ability to define these relationships easily without invalidating the data schema or adding undue complexity to the data representation.
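For illustration, the key-reference style of organization described above might look like the following minimal sketch, which uses an in-memory SQLite database. The table names `element` and `child_of`, the sample elements, and the helper functions are assumptions made for this example and are not taken from any particular conventional system. The sketch shows a recursive relationship stored as a self-referencing row and shows that a relationship pointing to an undefined element must be caught by a foreign key constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE element (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE child_of (
    parent_id INTEGER NOT NULL REFERENCES element(id),
    child_id  INTEGER NOT NULL REFERENCES element(id),
    PRIMARY KEY (parent_id, child_id)
);
""")

# Hypothetical DTD elements: a book contains chapters, a chapter contains
# sections, and a section may contain further sections (recursion).
conn.executemany(
    "INSERT INTO element (id, name) VALUES (?, ?)",
    [(1, "book"), (2, "chapter"), (3, "section")],
)
conn.executemany(
    "INSERT INTO child_of (parent_id, child_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 3)],  # (3, 3) is the recursive relationship
)


def children_of(name):
    """Return the names of elements permitted as children of `name`."""
    rows = conn.execute(
        """SELECT c.name FROM child_of
           JOIN element p ON p.id = child_of.parent_id
           JOIN element c ON c.id = child_of.child_id
           WHERE p.name = ? ORDER BY c.name""",
        (name,),
    )
    return [row[0] for row in rows]


def dangling_reference_rejected():
    """Try to relate 'book' to an element id that does not exist."""
    try:
        conn.execute("INSERT INTO child_of (parent_id, child_id) VALUES (1, 99)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Even with only three elements, two tables and a join across both are needed to answer which children an element may have. Each additional kind of relationship or attribute typically adds further tables and key references, which is the growth in complexity noted above.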
Another limitation of conventional approaches is their focus on allowing only a hierarchical structure for XML and mapping this structure directly into a relational database. This hierarchical approach to mapping is insufficient to achieve complex DTD representations in XML of the type needed to provide functionality in many business settings. As a result, mapping a DTD structure into a relational database using a hierarchical table structure limits the ability to create the DTD using W3C standards, which do not themselves impose hierarchical limitations.
Accordingly, it is desirable to develop XML DTD representations that have complex relationships between elements of the DTD without the limitations imposed by conventional approaches. Furthermore, it is desirable to have a system and method for generating structured documents that permits the use of repeating and recursive data structures within the DTD representation. The use of repeating and recursive data structures is important because it permits the formation of data representations that are not otherwise possible using hierarchical structures with standard markup language elements, and because it allows these elements to be transformed into standard relational database tables.