1. Technical Field
The present invention relates to the assembly of parametric information processing applications.
2. Discussion of the Related Art
Configurable applications for automating the processing of syndication feeds (i.e., Atom and RSS) are attracting increasing interest on the Web. Over 30,000 customized feed processing flows (referred to as “pipes”) have been published on Yahoo Pipes, the most popular service of this kind. Yahoo Pipes offers hosted feed processing and provides a rich set of user-configurable processing modules that extends beyond typical syndication tools to include advanced text analytics such as language translation and keyword extraction. The Yahoo Pipes service also includes a visual editor for flows of services and feeds. In the example flow of feeds and services shown in FIG. 1, the feeds are Yahoo Answers and Yahoo News, which can be parameterized, and the services are truncate, union and sort. Similar frameworks are provided as hosted services (e.g., IBM DAMIA) or as downloadable server-side software (e.g., /n software's RSSBus, IBM's Mashup Starter Kit and IBM's Project Zero).
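A flow like the one in FIG. 1 can be pictured as ordinary function composition over feed items. The following is a minimal sketch only: feeds are assumed to be lists of dictionaries with hypothetical `title` and `published` fields, the helper names `union`, `sort_items` and `truncate` are illustrative, and the order of the services (union, then sort, then truncate) is an assumption, since the figure itself is not reproduced here.

```python
from typing import Dict, List

# Hypothetical feed item: a dict with "title" and an ISO-8601 "published" date.
Item = Dict[str, str]

def union(*feeds: List[Item]) -> List[Item]:
    """Merge several feeds into one list of items (illustrative union service)."""
    return [item for feed in feeds for item in feed]

def sort_items(items: List[Item], key: str = "published") -> List[Item]:
    """Sort items newest first; ISO dates sort correctly as strings."""
    return sorted(items, key=lambda it: it[key], reverse=True)

def truncate(items: List[Item], n: int) -> List[Item]:
    """Keep only the first n items (illustrative truncate service)."""
    return items[:n]

# Hypothetical sample data standing in for the Yahoo Answers and Yahoo News feeds.
answers = [{"title": "A1", "published": "2008-01-03"},
           {"title": "A2", "published": "2008-01-01"}]
news = [{"title": "N1", "published": "2008-01-02"}]

result = truncate(sort_items(union(answers, news)), 2)
print([it["title"] for it in result])  # prints ['A1', 'N1']
```

Each service consumes and produces the same item-list type, which is what lets a visual editor connect modules freely.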
Automatic service discovery and composition is one of the promises of Service Oriented Architecture (SOA) that is hard to achieve in practice. Currently, composition is done with graphical tools by manually selecting services and establishing their interactions. Business Process Execution Language (BPEL)-WS has been developed to describe composite services. However, this manual process is tedious and requires extensive knowledge of the services being composed. Automatic composition methods aim to address this problem.
Work on automatic composition has focused on composition using simple compatibility constraints, as well as on semantic descriptions of services, such as OWL-S, which is based on the Web Ontology Language (OWL). A drawback of these approaches is that they do not provide an easy way for the system to interact with the user who is composing the service. For example, even though a goal-oriented user does not need knowledge of the individual services, the user must be familiar with the ontology that was used to describe them. Furthermore, novice users find it difficult to create goal specifications, since doing so requires studying the ontology to learn the terms the system uses. Also, the ontology does not by itself provide a method for verifying requests. Hence, users receive no guidance from the system that could help them specify requests, which turns service composition into a tedious trial-and-error process.
Just as programs are composed of operators and functions, composite services describe service invocations and other low-level constructs. Composite services are processing graphs composed of smaller service components. A service component can be an invocation of an existing service, an external data input (e.g., a user-specified parameter or data source), a data processing operator (e.g., an arithmetic operator), or another (smaller) composite service specified as a processing graph of service components.
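The four kinds of service components, including the recursive case of a nested composite service, can be modeled as a small algebraic data type. This is a hedged sketch: the class names (`ExternalInput`, `Operator`, `ServiceInvocation`, `CompositeService`) and the `evaluate` function are hypothetical, a service invocation is stood in for by a local callable, and the graph is walked as a tree, so a node shared by several consumers is simply evaluated more than once.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Callable, List, Union

@dataclass
class ExternalInput:
    """External data input, e.g. a user-specified parameter or data source."""
    value: Any

@dataclass
class Operator:
    """Data processing operator (e.g. arithmetic) applied to its inputs."""
    fn: Callable[..., Any]
    inputs: List[Component] = field(default_factory=list)

@dataclass
class ServiceInvocation:
    """Invocation of an existing service; modeled as a local callable (assumption)."""
    name: str
    call: Callable[..., Any]
    inputs: List[Component] = field(default_factory=list)

@dataclass
class CompositeService:
    """A smaller composite service: a processing graph with a designated output node."""
    name: str
    output: Component

Component = Union[ExternalInput, Operator, ServiceInvocation, CompositeService]

def evaluate(c: Component) -> Any:
    """Recursively evaluate a component by first evaluating everything it consumes."""
    if isinstance(c, ExternalInput):
        return c.value
    if isinstance(c, CompositeService):
        return evaluate(c.output)  # a nested graph is evaluated like any other node
    args = [evaluate(i) for i in c.inputs]
    if isinstance(c, Operator):
        return c.fn(*args)
    return c.call(*args)

# A composite service nested inside a larger graph.
inner = CompositeService("double", Operator(lambda x: 2 * x, [ExternalInput(21)]))
outer = ServiceInvocation("format", lambda v: f"result={v}", [inner])
print(evaluate(outer))  # prints result=42
```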
While many execution environments include tools that assist users in defining composite services, these tools typically require a detailed definition of the processing flow, including all service components and communication between the components. One example of this type of tool is IBM WebSphere Studio. An example of an execution environment is a stream processing environment, such as Stream Processing Core (SPC), described in N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo and C. Venkatramani, “Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core”, Proceedings of ACM SIGMOD 2006.
In contrast, automatic composition methods such as planning can compose new composite services from a high-level input provided by the user. These methods require less knowledge about the service components and in general only require the user to specify the composition goal in application domain terms.
For purposes of automatic composition, in many scenarios the service components can be described in terms of their data effects and preconditions. In particular, we assume that a description (such as Web Services Description Language (WSDL) or Java object code with optional metadata annotations) of each service component specifies the input requirements of the service component (such as data type, semantics, access control labels, etc.). We refer to these input requirements as preconditions of service invocation, or simply preconditions. The description also specifies the effects of the service, describing the outputs of the service, including information such as data type, semantics, etc. In general, a component description may describe outputs as a function of inputs, so that the description of the output can only be fully determined once the specific inputs of the component have been determined. Note that in practical implementations the invocations can be synchronous, such as subroutine calls or Remote Procedure Calls (RPCs), or asynchronous, such as asynchronous procedure calls, message exchanges or message flows. In stream processing applications, communication between components consists of sending data streams from one component to another in the deployed processing graph.
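To make preconditions and effects concrete, the sketch below assumes that a data description is simply a set of terms (for example a type, a language, a source) and that a precondition is satisfied when the input description contains all of the required terms. The `ComponentSpec` class, the term vocabulary and the `translate` component are hypothetical; the point illustrated is that the output description is computed as a function of the actual input descriptions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# A data description, modeled as a set of terms (assumption for illustration).
Description = FrozenSet[str]

@dataclass
class ComponentSpec:
    name: str
    preconditions: List[Description]                     # one required description per input
    effect: Callable[[List[Description]], Description]   # output described as a function of inputs

    def accepts(self, inputs: List[Description]) -> bool:
        """True if every input contains all terms its precondition requires."""
        return len(inputs) == len(self.preconditions) and all(
            pre <= actual for pre, actual in zip(self.preconditions, inputs))

    def output(self, inputs: List[Description]) -> Description:
        """Compute the output description once the concrete inputs are known."""
        if not self.accepts(inputs):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        return self.effect(inputs)

# Hypothetical translation component: needs English text, yields French text
# and passes through any other terms of its input.
translate = ComponentSpec(
    name="Translate",
    preconditions=[frozenset({"text", "lang:en"})],
    effect=lambda ins: (ins[0] - {"lang:en"}) | {"lang:fr"},
)

feed = frozenset({"text", "lang:en", "source:news"})
print(sorted(translate.output([feed])))  # prints ['lang:fr', 'source:news', 'text']
```

Because `source:news` is carried through, the output description depends on which input was actually connected, as the paragraph above describes.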
Under these assumptions, an automated planner can then be used to automatically assemble processing graphs based on a user-provided description of the desired output of the application. The descriptions of the components are provided to the planner in the form of a domain description. The planner can also take into account the specification of available primal inputs to the workflow, if not all inputs are available for a particular planning request.
The planner composes a workflow by connecting components, starting from the primal inputs. It evaluates possible combinations of components by computing descriptions of component outputs and comparing them to the preconditions of the components connected to those outputs. More than one component input can be connected to one component output or one primal input. Logically, this amounts to sending multiple copies of the data produced by the component output, with one copy sent to each of the inputs. In practical implementations these need not be copies; data can be passed by reference instead of by value. The process terminates when an output of a component (or a set of outputs taken together) satisfies the conditions specified in the user goal requirement. Note that all conditions are evaluated at plan time, before any applications are deployed or executed.
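The composition loop described above can be illustrated with a naive breadth-first forward-chaining search. This is not the planner of the cited references; it is a hypothetical sketch that reuses the set-of-terms descriptions assumed earlier. Starting from the primal inputs, each round tries every component on every combination of available outputs whose descriptions satisfy its preconditions, allows one output to feed several inputs, and stops when some description contains every term of the goal. All checks happen on descriptions only, i.e. at plan time.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Optional

Desc = FrozenSet[str]  # description of a data stream as a set of terms (assumption)

class Spec(NamedTuple):
    name: str
    pre: List[Desc]                        # precondition per input
    effect: Callable[[List[Desc]], Desc]   # output description from input descriptions

def plan(primal: Dict[str, Desc], specs: List[Spec], goal: Desc,
         max_rounds: int = 5) -> Optional[str]:
    """Return an expression for a workflow whose output satisfies goal, or None."""
    # Map each reachable description to the expression that produces it.
    streams: Dict[Desc, str] = {d: name for name, d in primal.items()}
    for _ in range(max_rounds):
        for d, expr in streams.items():
            if goal <= d:
                return expr
        new: Dict[Desc, str] = {}
        for s in specs:
            # Candidate outputs per input; the same stream may feed several inputs.
            candidates = [[d for d in streams if p <= d] for p in s.pre]
            for combo in product(*candidates):
                out = s.effect(list(combo))
                if out not in streams and out not in new:
                    new[out] = f"{s.name}({', '.join(streams[d] for d in combo)})"
        if not new:
            return None  # no new descriptions can be produced
        streams.update(new)
    return next((e for d, e in streams.items() if goal <= d), None)

# Hypothetical components loosely following FIG. 1.
union = Spec("union", [frozenset({"feed"})] * 2,
             lambda ins: (ins[0] | ins[1]) - {"sorted"})  # a merged stream is unsorted
sort = Spec("sort", [frozenset({"feed"})], lambda ins: ins[0] | {"sorted"})

primal = {"news": frozenset({"feed", "news"}), "answers": frozenset({"feed", "answers"})}
print(plan(primal, [union, sort], frozenset({"news", "answers", "sorted"})))
# prints sort(union(news, answers))
```

A breadth-first search keeps the sketch short but grows combinatorially; practical planners prune the search and, as noted next, rank alternative plans.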
If multiple alternative compositional applications can be constructed and shown to satisfy the same request, the planner may use heuristics and utility functions to rank the alternatives and select the highest ranked plans.
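Ranking alternatives with a utility function can be as simple as scoring each candidate plan and keeping the best k. The sketch below is purely illustrative: the `quality` and `cost` attributes, the linear utility `quality - cost_weight * cost` and the default weight are all assumptions, not metrics prescribed by the planners cited here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    plan: str
    quality: float  # hypothetical estimate of output quality
    cost: float     # hypothetical resource cost, e.g. number of components

def utility(c: Candidate, cost_weight: float = 0.5) -> float:
    """Illustrative linear utility trading quality against cost."""
    return c.quality - cost_weight * c.cost

def top_plans(cands: List[Candidate], k: int = 1,
              cost_weight: float = 0.5) -> List[Candidate]:
    """Return the k highest-utility candidates, best first."""
    return sorted(cands, key=lambda c: utility(c, cost_weight), reverse=True)[:k]

cands = [Candidate("A", 10.0, 4.0),  # utility 8.0
         Candidate("B", 9.0, 1.0),   # utility 8.5
         Candidate("C", 6.0, 0.0)]   # utility 6.0
print([c.plan for c in top_plans(cands, k=2)])  # prints ['B', 'A']
```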
The application, once composed, is deployed in an execution environment and can be executed one or more times.
Examples of a planner and an execution environment are described in Zhen Liu, Anand Ranganathan and Anton Riabov, “A Planning Approach for Message-Oriented Semantic Web Service Composition”, in AAAI-2007, and in commonly assigned U.S. application Ser. Nos. 11/872,385 and 11/970,262.
Similar work has been done in the contexts of stream processing, web services and grid computing.
A difficulty in planner-based composition is providing assistance to users when they specify requests. The system must present information about its own capabilities to the user, indicating which requests can be processed and which changes are allowed to the most recently submitted request.
The user can specify these changes by choosing one modification of the previous request from a set of possible modifications that the system proposes after analyzing the request, for example as described in commonly assigned U.S. application Ser. Nos. 11/872,385 and 11/970,262.
This approach, however, limits the requests proposed by the system to those that can be specified by choosing from a finite set of discrete options, and it requires a hierarchy of options that structures the choices so that the user can understand the set of options presented at each step. It makes it difficult for the user to specify request parameters that are continuous in nature, even if those values are internally represented by discrete values, such as the ‘float’ or ‘real’ data types. It also makes it difficult to specify parameters that are chosen from very large non-hierarchical lists of discrete options, for example choosing a state from a list of 50 states.