Current data processing systems can be classified by the level at which they employ parallelism. The degree of parallelism achieved in a system is, to a great extent, more a product of how the system is programmed than of its inherent characteristics. In single-instruction single-data stream (SISD) systems, only one instruction is executed at a time, and only one data stream is used. The amount of parallelism in the execution of instructions in such systems is minimal. Single-instruction multiple-data stream (SIMD) systems also execute only a single instruction at a time; however, they have multiple data streams and thus can act on multiple operands in parallel. As a result, they achieve a higher level of parallelism in execution than SISD systems. An even higher level of parallelism is achieved in multiple-instruction multiple-data stream (MIMD) systems, because both data and instructions are processed in parallel.
MIMD systems can be further classified by the level at which they extract parallelism; specifically, they can be classified by the size of the units of computation that they perform in parallel. Coarse-grained systems are those in which relatively large units of computation are performed in parallel. Although the units of computation are performed in parallel, execution still proceeds sequentially within each unit. Fine-grained systems, in contrast, perform small units of computation in parallel. Since the units of computation that may be performed in parallel in fine-grained systems are much smaller than in coarse-grained systems, fewer operations are performed sequentially and the inherent potential for parallelism is much greater.
Given this classification scheme, the greatest inherent potential for parallel activity rests with MIMD systems. Moreover, among MIMD systems, the greatest inherent potential for parallel activity is found in fine-grained systems. Such systems have small units of computation, each of which is simple to compute. Furthermore, processing elements that operate on fine grains of computation are readily composed into multiprocessor systems. The problem in efficiently implementing such fine-grained systems, however, is effectively controlling their extraordinary amount of parallel activity, particularly since the increase in parallelism results in a dramatic rise in inter-task communication. The increase in parallelism also presents complex computation scheduling and synchronization problems.
Efforts to date to build MIMD machines have produced only mildly encouraging results. Generally, such machines have used software as the primary tool for introducing parallelism. In writing such software, programmers have decided where to introduce parallelism and have had to account for the difficult interactions of the programs with the machine. As a result, the programmer has borne the brunt of the burden of deciding how to bring about parallel execution. Given the complex and confusing nature of these decisions, MIMD machines have proven unappealing to most users. Moreover, the resulting software has tended to be difficult to debug, unreliable, and not portable. To make matters worse, these machines have not performed as well as expected, despite the level of effort required to operate them.
Data flow machines are a special variety of fine-grained machines that attempt to execute models of data flow through a data processing system. These models are known as data flow diagrams. Data flow diagrams are composed of nodes and edges. The nodes represent instructions, and the edges represent data dependencies. To be more precise, the nodes represent operators and the edges represent operands. Data flow machines operate by processing the operands.
The data flow diagrams impose only a partial order of execution. An instruction is executed in a data flow machine whenever the operands it requires are available. Data flow machines are, thus, not constrained by the rigid total order of execution found in sequential machines. This flexibility allows data flow machines to schedule execution of instructions asynchronously. In contrast, sequential machines based on the von Neumann model execute an instruction only when the instruction pointer points to it. The primary benefit of asynchronous scheduling is the greater exposure of latent parallelism.
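The data-driven execution rule described above can be illustrated with a minimal Python sketch. It is not taken from any particular machine: the graph layout, the `run` function, and the example expression (a + b) * (a - b) are all assumptions made for illustration. Each node fires as soon as every operand it needs has arrived, regardless of the order in which the operands are delivered.

```python
import operator
from collections import deque

# Hypothetical graph for (a + b) * (a - b). Each node names an operator and
# the edges (destination node, operand port) that carry its result onward.
GRAPH = {
    "add": {"op": operator.add, "arity": 2, "dests": [("mul", 0)]},
    "sub": {"op": operator.sub, "arity": 2, "dests": [("mul", 1)]},
    "mul": {"op": operator.mul, "arity": 2, "dests": []},
}


def run(graph, initial_tokens):
    """Fire each node as soon as all of its operands have arrived."""
    waiting = {name: {} for name in graph}  # operands received so far, by port
    pending = deque(initial_tokens)         # operands in flight: (node, port, value)
    results = {}                            # outputs of nodes with no successors
    fired = []                              # order in which nodes actually fired
    while pending:
        node, port, value = pending.popleft()
        waiting[node][port] = value
        spec = graph[node]
        if len(waiting[node]) < spec["arity"]:
            continue                        # node not yet enabled
        operands = [waiting[node][p] for p in range(spec["arity"])]
        waiting[node] = {}
        out = spec["op"](*operands)
        fired.append(node)
        if spec["dests"]:
            pending.extend((dest, p, out) for dest, p in spec["dests"])
        else:
            results[node] = out
    return results, fired


a, b = 7, 3
# Operands are delivered in an arbitrary order; no instruction pointer is used.
tokens = [("sub", 0, a), ("add", 0, a), ("sub", 1, b), ("add", 1, b)]
results, fired = run(GRAPH, tokens)
print(results["mul"], fired)
```

In this sketch "sub" happens to fire before "add" only because its second operand arrives first. The graph itself requires only that "mul" fire after both, which is the partial order the paragraph above refers to.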
Data flow machines can be further classified into two categories: dynamic and static. Dynamic data flow machines are machines that can dynamically allocate an instance of a function. In other words, the memory required for an instance of a function need not be preplanned; rather, it can be allocated when the function is executed. Static data flow machines, in contrast, can only statically allocate an instance of a function. As such, they must preplan the storage required for each instance of the function prior to execution.
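The difference between the two allocation policies can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a description of any actual machine: the class names `StaticMachine` and `DynamicMachine`, the notion of a fixed-size "frame" of operand slots per function instance, and the `activate` method are all hypothetical.

```python
class StaticMachine:
    """Storage for every permitted function instance is reserved before execution."""

    def __init__(self, frame_size, max_activations):
        # All frames exist up front; the number of instances is preplanned.
        self.frames = [[None] * frame_size for _ in range(max_activations)]
        self.free = list(range(max_activations))

    def activate(self):
        if not self.free:
            raise RuntimeError("no preplanned storage left for another instance")
        return self.free.pop()


class DynamicMachine:
    """Storage for a function instance is allocated when the function is invoked."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.frames = {}
        self.next_id = 0

    def activate(self):
        context = self.next_id
        self.next_id += 1
        self.frames[context] = [None] * self.frame_size  # allocated on demand
        return context


static = StaticMachine(frame_size=2, max_activations=2)
static.activate()
static.activate()
try:
    static.activate()          # a third instance was never planned for
    static_overflowed = False
except RuntimeError:
    static_overflowed = True

dynamic = DynamicMachine(frame_size=2)
contexts = [dynamic.activate() for _ in range(3)]
print(static_overflowed, contexts)
```

The static sketch fails on the third instance because only two were planned, whereas the dynamic sketch simply allocates another frame at the time of the call.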
The Tagged-Token Data Flow Architecture developed at the Massachusetts Institute of Technology is a leading example of a dynamic data flow machine. It is described in Arvind, S. A. Brobst, and G. K. Maa, "Evaluation of the MIT Tagged-Token Data Flow Project", Technical Report CSG Memo, MIT Laboratory for Computer Science, 1988. The tagged-token data flow machine allows simultaneous applications of a function by tagging each operand with a context identifier that specifies the activation of the function to which it belongs.
In a tagged-token data flow architecture, the combination of the tag and the operand constitutes a token. The tags of two tokens must match if they are destined for the same instruction. Hence, the tagged-token architecture must have a means for correctly matching the tags. An associative memory has been relied upon as the matching mechanism.
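Tag matching can be sketched in Python with a dictionary standing in for the associative memory. The sketch below rests on several assumptions that do not come from the cited report: the `Token` fields, the `MatchingStore` class, and the choice of (context, instruction) as the tag are hypothetical, and only two-operand instructions are handled.

```python
from collections import namedtuple

# A token pairs an operand value with a tag. In this sketch the tag is
# (context, instruction); "port" says which operand slot the value fills.
Token = namedtuple("Token", ["context", "instruction", "port", "value"])


class MatchingStore:
    """Dictionary standing in for the associative memory, keyed by tag."""

    def __init__(self):
        self._waiting = {}

    def match(self, token):
        """Return (left, right) operand values if the partner is present, else None."""
        tag = (token.context, token.instruction)
        partner = self._waiting.pop(tag, None)
        if partner is None:
            self._waiting[tag] = token      # first arrival waits for its partner
            return None
        left, right = sorted((partner, token), key=lambda t: t.port)
        return left.value, right.value


store = MatchingStore()
# Two activations (contexts 1 and 2) of the same subtract instruction, interleaved.
arrivals = [
    Token(1, "sub", 0, 10),
    Token(2, "sub", 1, 4),
    Token(1, "sub", 1, 3),
    Token(2, "sub", 0, 9),
]
fired = {}
for tok in arrivals:
    pair = store.match(tok)
    if pair is not None:
        fired[tok.context] = pair[0] - pair[1]
print(fired)
```

Because the context identifier is part of the key, operands belonging to different activations of the same instruction are never paired with one another, even when they arrive interleaved.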