Systems that continuously process messages arriving in large volumes and at varying velocities may use a stream-processing paradigm, in which each process is defined as a distributed data flow topology whose tasks are interconnected by streams. These streams move data, for example messages, as data objects representing tuples (information packets that have predefined schemas). A schema can also define the set of operations allowed on those data objects or tuples. Such systems typically offer an execution environment that requires developers to build each distributed process as a custom topology written on top of the system's run-time frameworks and APIs. In doing so, developers must write custom code for every task in the distributed process and work out the custom tuple schemas of the streams that interconnect the tasks. Because users may want to develop new processes quickly for consuming and analyzing continuous streams of data, a more general mechanism is needed: one that removes the need for developers to write a new process for each use case and gives enterprises the ability to create complex processes quickly without developing new code.
A system can instead execute distributed processes that are created automatically from provided process specifications. These specifications use reusable tasks with generalized task configurations, which allow schema-resolution methods to determine the schemas of the interconnecting streams automatically. Users can therefore orchestrate their processes by building process specifications in web-based, high-level visual editors, and have the system create, deploy and execute the corresponding distributed data flow topologies. To make this possible, however, users must be assisted in two ways: 1) in supplying the information that each selected task needs, and 2) by being relieved of the need to resolve schemas across interconnected streams themselves.
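The idea of resolving stream schemas automatically from reusable task configurations can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual method: it assumes that each reusable task declares, in a hypothetical `TaskConfig`, the fields it requires, the fields it adds and the fields it drops, and that a process specification reduces to a set of named tasks plus directed stream connections. The hypothetical `resolve_schemas` function walks the topology in topological order and derives the schema carried by each stream, so the user never writes one by hand.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical schema representation: field name -> type name.
Schema = dict[str, str]


@dataclass
class TaskConfig:
    """Hypothetical generalized configuration for a reusable task."""
    name: str
    requires: Schema = field(default_factory=dict)  # fields the task reads
    adds: Schema = field(default_factory=dict)      # fields the task appends
    drops: tuple[str, ...] = ()                     # fields the task removes


def resolve_schemas(tasks: dict[str, TaskConfig],
                    edges: list[tuple[str, str]]) -> dict[tuple[str, str], Schema]:
    """Derive the tuple schema carried by every stream in the topology.

    tasks: mapping of task name -> TaskConfig (assumed representation)
    edges: (upstream, downstream) stream connections
    """
    # Build the predecessor map expected by TopologicalSorter.
    preds: dict[str, set[str]] = {name: set() for name in tasks}
    for up, down in edges:
        preds[down].add(up)

    out_schema: dict[str, Schema] = {}
    for name in TopologicalSorter(preds).static_order():
        task = tasks[name]
        # The input schema is the merge of all upstream output schemas.
        in_schema: Schema = {}
        for up in preds[name]:
            for fname, ftype in out_schema[up].items():
                if in_schema.get(fname, ftype) != ftype:
                    raise ValueError(f"{name}: conflicting types for {fname!r}")
                in_schema[fname] = ftype
        # Every required field must arrive with the declared type.
        for fname, ftype in task.requires.items():
            if in_schema.get(fname) != ftype:
                raise ValueError(f"{name}: missing or mistyped field {fname!r}")
        # Output = input minus dropped fields, plus fields the task adds.
        schema = {f: t for f, t in in_schema.items() if f not in task.drops}
        schema.update(task.adds)
        out_schema[name] = schema

    # Each stream carries the output schema of its upstream task.
    return {(up, down): out_schema[up] for up, down in edges}


if __name__ == "__main__":
    tasks = {
        "source": TaskConfig("source", adds={"user_id": "string", "amount": "double"}),
        "enrich": TaskConfig("enrich", requires={"user_id": "string"},
                             adds={"region": "string"}),
        "filter": TaskConfig("filter", requires={"amount": "double"},
                             drops=("user_id",)),
    }
    edges = [("source", "enrich"), ("enrich", "filter")]
    for edge, schema in resolve_schemas(tasks, edges).items():
        print(edge, schema)
```

Because each task declares only its own field requirements and effects, the same task can be reused in any topology; in this sketch, a missing or mistyped field surfaces as a `ValueError` when the specification is resolved, before any topology is deployed.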