1. Field
Embodiments of the invention relate to pipeline optimization based on polymorphic schema knowledge.
2. Description of the Related Art
An Extensible Markup Language (XML) Pipeline is formed when XML processes, sometimes called XML transformations or processing components, are connected together. For instance, given two transformations T1 and T2, the two transformations may be connected together so that an input XML document is transformed by T1 and the output of T1 is then fed as the input document to T2 (see wikipedia.org on the World Wide Web). Each of the XML transformations works on some portion of an XML document.
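By way of illustration only, the T1-then-T2 arrangement described above may be sketched in Python using the standard xml.etree.ElementTree module. The transformation names, the element names (orders, order, name, note), and the run_pipeline helper are hypothetical and are not taken from any particular pipeline product.

```python
import xml.etree.ElementTree as ET


def t1_uppercase_names(doc: str) -> str:
    """Hypothetical T1: upper-cases the text of every <name> element."""
    root = ET.fromstring(doc)
    for name in root.iter("name"):
        name.text = (name.text or "").upper()
    return ET.tostring(root, encoding="unicode")


def t2_add_status(doc: str) -> str:
    """Hypothetical T2: sets status="checked" on every <order> element."""
    root = ET.fromstring(doc)
    for order in root.iter("order"):
        order.set("status", "checked")
    return ET.tostring(root, encoding="unicode")


def run_pipeline(doc: str, stages) -> str:
    """Feeds the output document of each stage to the next stage as input."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Sample input document (an assumption made for this sketch).
SAMPLE = "<orders><order><name>ann</name><note>untouched</note></order></orders>"
RESULT = run_pipeline(SAMPLE, [t1_uppercase_names, t2_add_status])
```

In this sketch, each stage parses and re-serializes the entire document, including the note element that neither stage modifies.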
XML processing is conveniently expressed by use of XML Pipelines, where each transformation is simple and focuses on a small part of the overall document being transformed. However, the overall transformation carried out by the whole pipeline may be quite complex.
When XML data is processed in such pipelines and the individual transformations are simple, naïve implementations may spend far more time carrying around (e.g., transferring between transformations), parsing, and serializing the parts of the XML documents that they do not transform than performing the transformations on the parts of the XML documents that they do transform.
Polymorphic schemas may be described as schemas with wildcard schema nodes. Polymorphic schema interfaces are used in programming languages (Luca Cardelli and Peter Wegner, "On Understanding Types, Data Abstraction, and Polymorphism," Computing Surveys, December 1985). The IBM® WebSphere® DataStage® engine uses polymorphic transformation operators on relational data, where wildcard schema nodes are used to match a list of columns that are simply passed through to the output (IBM, WebSphere, and DataStage are trademarks of International Business Machines Corporation in the United States, other countries, or both).
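By way of illustration only, the use of a wildcard schema node to pass unreferenced columns through to the output may be sketched as follows. This sketch is not the DataStage interface; the make_polymorphic_operator helper, the column names, and the row representation as a Python dictionary are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def make_polymorphic_operator(required: List[str], transform: Callable[..., Row]):
    """Builds an operator whose input schema names only the required columns.

    Every other column is bound to a wildcard and copied to the output
    unchanged, so the operator works on any row containing the required columns.
    """
    def operator(rows: Iterable[Row]) -> List[Row]:
        out = []
        for row in rows:
            # Columns the operator actually reads.
            named = {col: row[col] for col in required}
            # Columns matched by the wildcard: passed through untouched.
            wildcard = {col: val for col, val in row.items() if col not in required}
            out.append({**transform(**named), **wildcard})
        return out
    return operator


# Hypothetical operator that reads price and qty and adds a total column.
add_total = make_polymorphic_operator(
    ["price", "qty"],
    lambda price, qty: {"price": price, "qty": qty, "total": price * qty},
)

ROWS = [{"id": 7, "price": 2.5, "qty": 4, "region": "EU"}]
OUTPUT = add_total(ROWS)
```

Here the id and region columns are never named by the operator, yet they appear in the output because the wildcard portion of the schema carries them through.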
Extensible Stylesheet Language Transformations (XSLT) version 2.0 allows schemas to be associated with a stylesheet (w3.org/TR/xslt20 on the World Wide Web).
However, there is still a need in the art for pipeline optimization based on polymorphic schema knowledge.