1. Technical Field
The present invention relates to composing stream processing applications, and more particularly, to a method and system for composing stream processing applications according to a semantic description of a processing goal.
2. Discussion of the Related Art
Stream processing systems are information processing systems that operate on data at high input data rates. When under heavy load, these systems generally do not have the resources to store all arriving data, and thus must perform some amount of processing before storing a smaller subset of the incoming data and/or presenting result data to end users. Generally, stream processing systems execute stream processing applications, each of which is an assembled collection of data processing components (e.g., data sources and processing elements) interconnected by communication channels (e.g., streams). At run time, the assembly of data processing components, which together constitute a stream processing graph, is deployed to one or more computers connected by a network. The data leaving one or more data sources is then sent to one or more components, and the data produced by the components is sent to other components, according to the configuration of the processing graph.
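As an illustration only, the arrangement described above can be sketched in Python. All names here (sensor_source, annotate, keep_high, run_graph) and the sample readings are hypothetical and are not taken from any particular stream processing system. The sketch also assumes the simplest possible graph, a single linear chain, and models each stream as a Python generator so that no component needs to store the full input.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Item = Dict[str, object]
Element = Callable[[Iterable[Item]], Iterator[Item]]


def sensor_source() -> Iterator[Item]:
    """Hypothetical data source: emits readings one at a time."""
    for seq, value in enumerate([12.0, 48.5, 51.2, 9.9, 60.1]):
        yield {"seq": seq, "value": value}


def annotate(stream: Iterable[Item]) -> Iterator[Item]:
    """Processing element: adds a 'level' annotation to each item."""
    for item in stream:
        level = "high" if item["value"] > 50 else "normal"
        yield {**item, "level": level}


def keep_high(stream: Iterable[Item]) -> Iterator[Item]:
    """Processing element: forwards only items annotated as 'high'."""
    for item in stream:
        if item["level"] == "high":
            yield item


def run_graph(source: Callable[[], Iterator[Item]],
              elements: List[Element]) -> List[Item]:
    """Connect the source to each element in order and collect the output."""
    stream: Iterable[Item] = source()
    for element in elements:
        stream = element(stream)  # the output stream feeds the next element
    return list(stream)


results = run_graph(sensor_source, [annotate, keep_high])
print([item["seq"] for item in results])
```

In this sketch only the items that survive filtering are kept, which mirrors the point that processing happens before a smaller subset of the data is stored. A real processing graph may branch and merge, and its components may run on different computers.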
The stream processing system can produce a variety of different results depending on how components of the application are interconnected, which components and which data sources are included in the processing graph, and how the components of the processing graph are configured. Generally, end users working with the system can easily describe their requirements on the outputs produced by the application, but the same users do not have the expertise required to select the components and connect them such that the resulting stream processing application would produce the required results.
Recent advances in Semantic Web technologies have provided formal methods and standards for representing knowledge. Resource Description Framework (RDF), W3C Recommendation 10 Feb. 2004, and more recently, Web Ontology Language (OWL), are standards that are used for describing ontologies. OWL is an extension of RDF that, in addition to basic RDF, includes inferencing capabilities provided by reasoners, for example, a Description Logic (DL) reasoner.
The knowledge represented in RDF or OWL can be queried using SPARQL Query Language for RDF, W3C Candidate Rec., which is a language for expressing queries against semantically described data (e.g., data described using RDF graphs). SPARQL queries are stated by designating result variables and by describing, using semantic graph patterns, the characteristics of things (e.g., RDF resources or OWL individuals) that could be suitable values for the results. The descriptions are expressed as a graph composed of RDF triples, depicting the relationships connecting these variables with other variables or with other resources. If any subgraphs of the RDF graph are found to match the desired relationships, the corresponding assignment of variables is included in the result set of the query, with each assignment constituting a row in the result set.
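The matching behavior described above can be sketched with a small pattern matcher in plain Python. This is a simplified illustration, not an implementation of SPARQL: it handles only a conjunction of triple patterns, and the ex: resources, the sample graph, and the function names are hypothetical. Terms beginning with "?" are treated as variables.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical RDF-like graph, written as (subject, predicate, object) triples.
GRAPH: List[Triple] = [
    ("ex:cam1", "rdf:type", "ex:VideoSource"),
    ("ex:cam1", "ex:locatedIn", "ex:Lobby"),
    ("ex:mic1", "rdf:type", "ex:AudioSource"),
    ("ex:mic1", "ex:locatedIn", "ex:Lobby"),
    ("ex:cam2", "rdf:type", "ex:VideoSource"),
    ("ex:cam2", "ex:locatedIn", "ex:Garage"),
]


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend 'binding' so that 'pattern' matches 'triple', or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must keep any value it was already assigned.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def query(graph: List[Triple], patterns: List[Triple],
          select: List[str]) -> List[Tuple[str, ...]]:
    """Return one row per variable assignment that satisfies every pattern."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[var] for var in select) for b in bindings]


# Roughly corresponds to the SPARQL query:
#   SELECT ?src ?place
#   WHERE { ?src rdf:type ex:VideoSource . ?src ex:locatedIn ?place . }
rows = query(
    GRAPH,
    [("?src", "rdf:type", "ex:VideoSource"), ("?src", "ex:locatedIn", "?place")],
    ["?src", "?place"],
)
print(rows)
```

Each tuple in rows is one assignment of the result variables, corresponding to one row of the result set. The audio source is excluded because no assignment of ?src to ex:mic1 satisfies the first pattern.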
Various stream processing architectures and systems exist or are being developed that provide a means of querying ephemeral streaming data. However, most of these systems assume that the input streams contain structured data. In addition, most of these systems focus on conventional relational operators and sliding-window operators. Relational and time-windowed analyses are necessary in a streaming environment. However, stream processing applications may need to perform other kinds of operations in order to process the likely unstructured, streaming data (e.g., raw sensor data, video or audio signals, etc.) into a meaningful response. Such operations include annotation, classification, transformation, aggregation, and filtering of specific kinds of data in the streams. While some of these operations are expressible in relational algebra, expressing all of the needed stream processing functions would require a user with deep knowledge of both the problem and solution domains, and could result in extremely detailed, possibly over-constrained queries/procedures that combine problem and solution descriptions.
Another challenge for stream processing systems lies in the construction of processing graphs that can satisfy user queries. With large numbers of disparate data sources and processing elements to choose from, we cannot expect the end-user to craft these graphs manually. The set of processing elements and data sources can also change dynamically as new sources are discovered or new processing elements are developed. Different end-users express widely varying queries, requiring a large number of different graphs to be constructed. Since there is an exponential number of possible graphs for a given number of data sources and processing elements, it is not feasible to manually pre-construct all the graphs needed to satisfy the wide variety of end-user queries.
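The growth in the number of candidate graphs can be made concrete with a deliberately restricted count. The sketch below is an assumption-laden illustration, not a formula from the source: it counts only linear pipelines, in which one data source feeds an ordered chain of distinct processing elements, and it ignores branching, merging, and component configuration. Because those are excluded, the result is only a lower bound on the number of possible graphs. The function name is hypothetical.

```python
from math import factorial


def count_linear_pipelines(num_sources: int, num_elements: int) -> int:
    """Count chains of one source followed by 1..n distinct elements in order."""
    ordered_chains = sum(
        # Ordered selections of k distinct elements out of n: n! / (n - k)!
        factorial(num_elements) // factorial(num_elements - k)
        for k in range(1, num_elements + 1)
    )
    return num_sources * ordered_chains


for n in (3, 5, 10):
    print(n, count_linear_pipelines(1, n))
```

Even under these restrictions the count rises steeply as elements are added, which is consistent with the observation that pre-constructing every graph by hand is impractical.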