The present invention relates generally to data processing and more particularly to the transformation of large data structures, which may facilitate easier display, analysis or communication of the data structure.
The modern world generates a vast amount of data every day, including sales information, stock quotes, news, consumer demographics, and much more. Consumers, scientists, business management, politicians, and the computers and other electronic devices with which they work are some of the entities that request this data. The requests include actions such as displaying, communicating, and analyzing the data. It can be difficult for the data to be communicated, displayed, and accessed easily by each entity, particularly when the amount of data is large.
The data is often structured or organized hierarchically, such as in a tree or node structure or in a parent-child relationship. There are many different computer languages and formats that allow a user to define, classify, organize, and/or store the structure of the data, such as SGML, XML, HTML, C++, JavaScript, PHP, ASP, and many more. In object-oriented programming, the data may be organized by one class of data containing another class of data. Similarly, in markup languages or syntax, such as XML, data is typically defined with tags or semantics within a document.
FIG. 1A illustrates the hierarchy of the data in an example XML document 100 as expressed in the hierarchical tag structure, and FIG. 1B graphically illustrates the hierarchy as a tree of nodes. Each box in FIG. 1B represents a node. The root (or document) node, which may have one or more element nodes, is <bookstore> 110. Element node <book> 120 is a data type that can be found within a “bookstore.” Each node other than the root has a single parent, and <bookstore> is the parent of <book>. Nodes may have zero, one, or more children. The nodes <title> 130, <author> 140, <year> 150, and <price> 160 are the children of the <book> node 120. There can be additional relationships as well. For example, siblings are nodes that have the same parent. The nodes <title> 130, <author> 140, <year> 150, and <price> 160 are siblings of each other. Ancestors of a node are the node's parent, parent's parent, etc. Descendants of a node are the node's children, children's children, etc. Each node, including the root node, is essentially an instance of a data variable type. The variable name of a node is sometimes called a tag.
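The node relationships described above can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree module. The element values below are hypothetical placeholders, since the content of FIG. 1A is not reproduced here; only the tag names come from the description. ElementTree does not record parent links, so the sketch builds a parent map (the name parent_of is an assumption made for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document modeled on the bookstore example.
# The text values are placeholders, not taken from FIG. 1A.
XML_TEXT = """<bookstore>
  <book>
    <title>Example Title</title>
    <author>Example Author</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML_TEXT)  # the <bookstore> node

# ElementTree stores only child links, so derive child -> parent links once.
parent_of = {child: parent for parent in root.iter() for child in parent}

book = root.find("book")
title = book.find("title")

# Children: the nodes directly below <book>.
children = [node.tag for node in book]

# Siblings: the other nodes that share the same parent as <title>.
siblings = [node.tag for node in parent_of[title] if node is not title]


def ancestors(node):
    """Return the tags of the node's parent, parent's parent, and so on."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return chain


# Descendants: every node below the root, in document order.
descendants = [node.tag for node in root.iter() if node is not root]

print("children of <book>:", children)
print("siblings of <title>:", siblings)
print("ancestors of <title>:", ancestors(title))
print("descendants of <bookstore>:", descendants)
```

Note that the root node is the only node absent from the parent map, which matches the description of the root as the top of the hierarchy.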
One problem with hierarchical data structures is that data is often stored in a structure that is not useful to an end user, such as a server, a client, a program, or an individual. The existing data structure may need to be transformed into a new data structure so that an end user can view, analyze, and/or store the data in a different format or data structure. Additionally, the existing data structure often must be presented on different electronic devices, such as a computer monitor, a television, a PDA, or a cell phone. Each of these electronic devices may require a different format, such as HTML, PDF, etc. XML provides for simplified data interchange, and XSL is one tool that can handle this data manipulation for XML documents.
FIG. 2 illustrates the use of XSL to transform an XML document 230 into any number of other documents, such as another XML document 240, an HTML document 250, or any other such structured document 260. The XSL processor 210 takes as input the XML document 230 and an XSL style sheet (file) 220 that defines how to do the transformation. The XSL file 220 specifies how each node of the source tree for a first XML document 230 should appear in the result tree of a second XML document 240. XSL uses a sublanguage called XPath to refer to nodes in the input tree. The structure of the result tree can be completely different from the structure of the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and an arbitrary structure can be added. Thus, the XSL processor 210 typically holds three trees: one tree for the source XML 230; one tree for the destination structure 240; and one tree for the XSL file 220.
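The idea of building a result tree whose structure differs from the source tree can be sketched as follows. This is not XSL: Python's standard library does not include an XSL processor, so the sketch substitutes a hand-written function (the hypothetical transform) for the style sheet, and uses the limited path syntax of xml.etree.ElementTree in place of full XPath. The sample data is likewise hypothetical. The sketch only shows filtering, reordering, and added structure, with both the source tree and the result tree held in memory at once.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; values are placeholders for illustration.
SOURCE_XML = """<bookstore>
  <book><title>B Title</title><author>Author One</author><year>2005</year><price>30.00</price></book>
  <book><title>A Title</title><author>Author Two</author><year>1999</year><price>12.50</price></book>
</bookstore>"""


def transform(source_root):
    """Build an HTML result tree from a bookstore source tree.

    Stands in for the rules an XSL style sheet would supply.
    """
    # Added structure: none of these elements exist in the source tree.
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "Books"
    table = ET.SubElement(body, "table")

    # Select nodes with a path expression, then reorder them by title.
    books = sorted(source_root.findall("./book"),
                   key=lambda book: book.findtext("title"))

    for book in books:
        row = ET.SubElement(table, "tr")
        # Filter: keep only <title> and <price>; drop <author> and <year>.
        for tag in ("title", "price"):
            ET.SubElement(row, "td").text = book.findtext(tag)
    return html


source_tree = ET.fromstring(SOURCE_XML)   # tree for the source document
result_tree = transform(source_tree)      # separate tree for the result
result_text = ET.tostring(result_tree, encoding="unicode")
print(result_text)
```

Because the source tree and the result tree both exist in memory until the function returns, even this small stand-in shows why memory use grows with the size of the input document.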
Because the XSL processor 210 holds three trees, the transformation of one data structure into another data structure may take up a large amount of computing resources, such as memory and processor time. The amount of memory used is often more than a user has available, which may cause the computer system to crash or otherwise become unstable. Also, the time to transform one data structure into another data structure may be so long that the transformation is impracticable.
It is therefore desirable to have an efficient method of transforming large data structures.