1. Field
Embodiments of the invention relate to an optimization model for processing hierarchical data in stream systems.
2. Description of the Related Art
A continuous process may be described as a process that reads from data sources and generates result data corresponding to input data as the input data becomes available. A system that runs as a continuous process is a “stream system”. A stream system may be represented by a data flow diagram. The data sources may be continuous (i.e., data sources that provide data continually) or non-continuous (i.e., data sources that do not provide data continually).
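As an illustration only, a continuous process can be sketched in Python with generators. The sketch assumes that a data source is any iterable and that a hypothetical helper, `continuous_process`, applies a per-item transformation. Because a generator yields each result as soon as its input item has been read, the same code serves both a non-continuous source (a finite list) and a continuous source (an unbounded counter).

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def continuous_process(source: Iterable[In], transform: Callable[[In], Out]) -> Iterator[Out]:
    """Hypothetical helper: emit one result per input item as soon as the item is read."""
    for item in source:
        yield transform(item)


# Non-continuous source: a finite list that eventually ends.
finite_results = list(continuous_process([1, 2, 3], lambda x: x * 10))

# Continuous source: an unbounded counter; only the first three results are consumed.
stream = continuous_process(count(1), lambda x: x * 10)
first_three = list(islice(stream, 3))
```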
Most data flow systems (also called flow-based programming systems (MORRISON, J.P., “Flow-Based Programming: A New Approach to Application Development”, Van Nostrand Reinhold, New York, 1994)) use the relational model (also called the relational data model) when processing information. The relational model defines a relation as a set of data items, and each data item is composed of a set of scalar named attributes. Relational query processors define relational operators. A relational query processor also regards the data flow as a directed graph of operators in which one operator consumes the output relations of other operators and produces new relations. The relational model has many advantages that commercial databases, among others, exploit. For example, the relational model is leveraged to optimize query processing by rewriting the graphs, introducing parallelism, and eliminating unnecessary computations.
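The following minimal Python sketch, offered only as an illustration, models a relation as a list of dictionaries whose values are scalar named attributes (a list stands in for the set), and models relational operators as functions from relations to relations. The names `select`, `project`, and `natural_join` and the sample data are hypothetical. Because the join consumes the outputs of two other operators, the operators form a small directed graph. The sketch also shows one well-known graph rewrite, predicate pushdown, which a relational optimizer may apply because both plans produce the same relation.

```python
from typing import Callable, Dict, List, Union

Scalar = Union[int, float, str]
Row = Dict[str, Scalar]   # a data item: a set of scalar named attributes
Relation = List[Row]      # a relation: a collection of data items (a list stands in for a set)


def select(pred: Callable[[Row], bool]) -> Callable[[Relation], Relation]:
    """Selection operator: keep only the rows that satisfy the predicate."""
    return lambda rel: [row for row in rel if pred(row)]


def project(*attrs: str) -> Callable[[Relation], Relation]:
    """Projection operator: keep only the named attributes of each row."""
    return lambda rel: [{a: row[a] for a in attrs} for row in rel]


def natural_join(left: Relation, right: Relation, attr: str) -> Relation:
    """Join operator: combine rows of two input relations that agree on `attr`."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[attr] == rrow[attr]]


customers: Relation = [{"cust": 1, "name": "Ada"}, {"cust": 2, "name": "Bob"}]
orders: Relation = [
    {"order": 10, "cust": 1, "amount": 250},
    {"order": 11, "cust": 2, "amount": 40},
    {"order": 12, "cust": 1, "amount": 120},
]

is_large = select(lambda row: row["amount"] > 100)
names = project("cust", "name")(customers)

# Original graph: join first, then filter the joined relation.
plan_naive = is_large(natural_join(orders, names, "cust"))

# Rewritten graph (predicate pushdown): filter orders before the join,
# so the join compares fewer rows but yields the same relation.
plan_pushed = natural_join(is_large(orders), names, "cust")

result = project("order", "name")(plan_pushed)
```

In this sketch the rewritten plan makes the join examine 2 × 2 row pairs instead of 3 × 2, which is the kind of unnecessary computation that graph rewriting can eliminate.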
However, the relational model is ill-equipped to deal with hierarchical data, in which items in a relation can contain non-scalar attributes, such as an attribute that contains a list of attributes or an attribute that is itself a relation.
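To make the contrast concrete, the Python sketch below (illustrative only) represents a hierarchical data item whose `lines` attribute is itself a relation. The hypothetical helpers `is_flat` and `unnest` check whether every attribute is scalar and flatten the nested attribute into relational rows. Flattening works, but it repeats the parent attributes in every row and discards the grouping, which must be reconstructed if the hierarchy is needed again.

```python
from typing import Any, Dict, List

SCALAR_TYPES = (int, float, str, bool)

# A hierarchical data item: the "lines" attribute is non-scalar and is itself a relation.
order: Dict[str, Any] = {
    "order": 10,
    "customer": "Ada",
    "lines": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def is_flat(item: Dict[str, Any]) -> bool:
    """True if every attribute is scalar, i.e. the item fits the relational model."""
    return all(isinstance(value, SCALAR_TYPES) for value in item.values())


def unnest(item: Dict[str, Any], attr: str) -> List[Dict[str, Any]]:
    """Flatten one nested relation-valued attribute into relational rows.

    The parent's remaining attributes are copied into every resulting row.
    """
    parent = {key: value for key, value in item.items() if key != attr}
    return [{**parent, **child} for child in item[attr]]


rows = unnest(order, "lines")
```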
Several languages have been designed to address hierarchical data. XQuery is a language that was built on the foundation of Structured Query Language (SQL) to support the processing of eXtensible Markup Language (XML) hierarchical documents. XML documents are a type of hierarchical data. There are many implementations that support the processing of XQuery, but only a few of them (KOCH, C., S. SCHERZINGER, N. SCHWEIKARDT, and B. STEGMAIER, “FluXQuery: An Optimizing XQuery Processor for Streaming XML Data”, Proceedings of the 30th VLDB Conference, 2004; FLORESCU, D., C. HILLERY, D. KOSSMANN, P. LUCAS, F. RICCARDI, T. WESTMANN, M. J. CAREY, A. SUNDARAJAN, and G. AGRAWAL, “The BEA/XQRL Streaming XQuery Processor”, Proceedings of the 29th VLDB Conference, 2003; PAL, S., I. CSERI, O. SEELIGER, M. RYS, G. SCHALLER, W. YU, D. TOMIC, A. BARAS, B. BERG, D. CHURIN, and E. KOGAN, “XQuery Implementation in a Relational Database System”, Proceedings of the 31st VLDB Conference, 2005) extend the relational model in order to leverage the optimization knowledge of the relational model. Extensible Stylesheet Language Transformations (XSLT) is a transformation language that is able to transform hierarchical XML documents. XSLT is not built on top of the relational model and therefore does not benefit from the optimizations that the relational model supports.
However, both XSLT and XQuery lack the definition of a component model that allows new user-supplied operators to be added to process designs, and of an assembly model that allows complex data transformations to be created. In data flow graphs, the operators are the components, and the flow graphs are the “assembly.” The component model may be described as the principles and software by which one can create new operators (new components) that are then assembled in the same way as any other operators (components). When applied to complex data transformations, languages like XSLT or XQuery tend to become very difficult to understand and hence error-prone.
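A component model and an assembly model of the kind described above might be sketched in Python as follows. This is a hypothetical illustration, not an interface defined in this document. The abstract class `Operator` is an assumed component contract, `Filter` and `AddTax` are example operators (the latter standing in for a user-supplied one), and `Flow` is an assumed assembly that chains operators into a linear flow graph.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List

Item = Dict[str, Any]


class Operator(ABC):
    """Assumed component contract: an operator consumes a stream of items and produces one."""

    @abstractmethod
    def process(self, items: Iterable[Item]) -> Iterator[Item]:
        ...


class Filter(Operator):
    """Example operator: pass through items whose attribute meets a minimum value."""

    def __init__(self, attr: str, minimum: int) -> None:
        self.attr = attr
        self.minimum = minimum

    def process(self, items: Iterable[Item]) -> Iterator[Item]:
        return (item for item in items if item[self.attr] >= self.minimum)


class AddTax(Operator):
    """Stands in for a user-supplied operator: adds a "total" attribute (integer percent rate)."""

    def __init__(self, percent: int) -> None:
        self.percent = percent

    def process(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            yield {**item, "total": item["amount"] * (100 + self.percent) // 100}


class Flow:
    """Assumed assembly model: chains operators so each consumes the previous one's output."""

    def __init__(self, *operators: Operator) -> None:
        self.operators = operators

    def run(self, items: Iterable[Item]) -> List[Item]:
        stream: Iterator[Item] = iter(items)
        for op in self.operators:
            stream = iter(op.process(stream))
        return list(stream)


flow = Flow(Filter("amount", 100), AddTax(10))
result = flow.run([{"id": 1, "amount": 250}, {"id": 2, "amount": 40}])
```

Because the user-supplied operator implements the same `process` contract as the other operator, `Flow` assembles it without special handling, which is the property a component model is meant to provide.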
Thus, there is a need in the art for optimization when processing hierarchical data.