For complex systems, such as those designed for multimedia intelligence and knowledge management applications, the current ‘control flow’ based design methods are totally unsuitable. Once a system is broadened to include acquisition of unstructured, non-tagged, time-variant, multimedia information (much of which is designed specifically to prevent easy capture and normalization by non-recipient systems), a totally different approach is required. In this arena, many entrenched notions of information science and database methodology must be discarded to permit the problem to be addressed. We shall call systems that attempt to address this level of problem ‘Unconstrained Systems’ (UCS). An unconstrained system is one in which the source(s) of data have no explicit or implicit knowledge of, or interest in, facilitating the capture and subsequent processing of that data by the system. The most significant challenges that must be resolved in a UCS stem from the following realities:
a) Change is the norm. The incoming data formats and content will change. The needs and requirements of the users of the data will also change. This will be reflected not only in their demands of the UI to the system, but also in the data model and field set that is to be captured and stored by the system.
b) An unconstrained system usually only samples from the flow going through the information pipe. The UCS is neither the source nor the destination for that flow, but simply a monitoring station attached to the pipe, capable of selectively extracting data from the pipe as it passes by.
c) In a truly unconstrained system, the information can only be monitored, and the system may react to it; it cannot be controlled.
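The ‘monitoring station’ role described in (b) and (c) can be illustrated with a minimal Python sketch. The function name `tap`, the predicate `wanted`, and the sample flow are hypothetical, introduced only for illustration; the point is that the station copies selected items while the flow itself passes through unaltered and uncontrolled.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def tap(flow: Iterable[T], wanted: Callable[[T], bool], captured: List[T]) -> Iterator[T]:
    """Hypothetical monitoring station attached to an information pipe.

    Every item continues downstream unchanged; items matching `wanted`
    are additionally copied into `captured`. The tap never drops,
    reorders, or modifies the flow.
    """
    for item in flow:
        if wanted(item):
            captured.append(item)  # selective extraction as data passes by
        yield item                 # the flow itself is left untouched


# Illustrative flow: the source knows nothing about the monitoring station.
flow = ["weather report", "price: 42", "noise", "price: 43"]
captured: List[str] = []
delivered = list(tap(flow, lambda s: s.startswith("price"), captured))
```

Here `delivered` holds the complete original flow, while `captured` holds only the sampled items. Because `wanted` is passed in rather than fixed, it can be replaced when formats or user needs change.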
This loss of control over data is one of the most difficult challenges for the prior art. The prior art clearly suggests that software consists of a ‘controlling’ program that takes in inputs, performs certain predefined computations, and produces outputs. Nearly every installed system in the prior art complies with this approach. Yet it is obvious from the discussion above that this model can only hold true on a very localized level in a UCS. The flow of data through the system is really in control. It is illustrative to note that the only example of a truly massive software environment is the Internet itself. This success was achieved by defining a rigid set of protocols (IP, HTML, etc.) and then allowing the unplanned, Darwinian-like development of autonomous but compliant systems on top of that substrate. A similar approach is required in the design of unconstrained systems.
In the traditional programming world, a programmer would begin by defining certain key algorithms and then identify all of the key inputs into the system. As such, the person or entity supplying the data is often asked to comply with very specific data input requirements governing the format, length, field definitions, etc. The problem with this approach, however, is that predicting the algorithms or approaches appropriate to solving the problem of ‘understanding the world’ is simply too complex a task. Once again, the conventional approach of defining processing and interface requirements, and then breaking down the problem into successively smaller and smaller sub-problems, becomes unworkable. The most basic change that must be made, then, is to create an environment that operates according to data-flow rules, not those of a classic control-flow based system.
In spite of the prevalence of control-flow based programming frameworks, various data-flow based software design and documentation techniques have been in use for many years. In these techniques, the system design is broken into a number of distinct processes and the data that flows between them. This breakdown closely matches the perceptions of the actual system users/customers and thus is effective in communicating the architecture and requirements. Unfortunately, due to the lack of any suitable data-flow based substrate, even software designs created in this manner are invariably translated back into control-flow methods, or at best into message passing schemes, at implementation time. This translation is the start of a slippery slope that results in such software being of limited scope and largely inflexible to changes in the nature of the flow. This problem is at the root of why software systems are so expensive to create and maintain.
At the most fundamental operating system scheduling level, we need an environment where the presence of suitable data initiates program execution, not the other way round. More specifically, what is needed is a substrate through which data can flow and within which localized areas of control flow can be triggered by the presence of certain data. Additionally, such a system would ideally facilitate easy incorporation of new plug-in control flow based functions or routines and their interface to data flowing through the data-flow based substrate so that it will be possible for the system to ‘evolve’. In essence, the users, knowingly or otherwise, must teach the system how they do what they do as a side effect of expressing their needs to it. No two analysts will agree completely on the meaning of a set of data, nor will they concur on the correct approach to extracting meaning from data in the first place. Because all such perspectives and techniques may have merit, the system must allow all to co-exist side by side, and to contribute, through a formalized substrate and protocol, to the meta-analysis that is the eventual system output.
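The idea that the presence of suitable data initiates execution, with plug-in functions registered against the kinds of data they consume, can be sketched in Python as follows. This is only an illustrative toy under stated assumptions: the `Substrate` class, its `plug_in`, `emit`, and `run` methods, and the two analyst functions are all hypothetical names, and a real substrate would operate at the scheduling level rather than in a single in-process loop.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Tuple

# A plug-in is a small island of control flow: it receives one datum
# and may emit new data back into the substrate.
Handler = Callable[[Any, "Substrate"], None]


class Substrate:
    """Hypothetical data-flow substrate: arriving data triggers code."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, Any]] = deque()
        self.outputs: List[Tuple[str, Any]] = []

    def plug_in(self, kind: str, handler: Handler) -> None:
        """Register a new function for a kind of data; callable at any time."""
        self._handlers[kind].append(handler)

    def emit(self, kind: str, payload: Any) -> None:
        """Place data into the flow."""
        self._queue.append((kind, payload))

    def run(self) -> None:
        """Drain the flow; each datum triggers every plug-in registered for it."""
        while self._queue:
            kind, payload = self._queue.popleft()
            handlers = self._handlers.get(kind)
            if not handlers:
                # No plug-in consumes this kind of data: it is system output.
                self.outputs.append((kind, payload))
                continue
            for handler in handlers:
                handler(payload, self)


# Two analysts with different approaches co-exist on the same data.
def analyst_a(text: str, substrate: Substrate) -> None:
    substrate.emit("finding", ("analyst_a", len(text.split())))


def analyst_b(text: str, substrate: Substrate) -> None:
    substrate.emit("finding", ("analyst_b", text.upper()))


substrate = Substrate()
substrate.plug_in("text", analyst_a)
substrate.plug_in("text", analyst_b)
substrate.emit("text", "change is the norm")
substrate.run()
```

In this sketch no main program decides what runs next; the queue of data does. Adding a third perspective requires only another `plug_in` call, and a plug-in registered for `"finding"` could combine the analysts' results into a meta-analysis without either analyst being modified.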