This invention relates generally to pipeline data processing systems, and more particularly, to asynchronous pipeline systems. Simple asynchronous pipelines are known, see, for example, U.S. Pat. No. 4,837,740, "Asynchronous First-In-First-Out Register Structure"; U.S. Pat. No. 4,679,213, "Asynchronous Queue System"; U.S. Pat. No. 5,187,800, "Asynchronous Pipelined Data Processing System"; and R. F. Sproull, I. E. Sutherland, and C. E. Molnar, Counterflow Pipeline Processor Architecture, Sun Microsystems Laboratories Publication No. SMLI TR-94-25, April 1994.
Complex pipelines are also possible. Such pipelines may branch and rejoin in many ways, or even be arranged in multi-dimensional structures. Data flowing through them may meet and interact with data items that precede or follow in sequence, or with data items flowing in a separate pipeline. For an example of a multi-issue pipelined processor, see U.S. patent application Ser. No. 08/853,970, filed May 9, 1997, and entitled "Multi-Issue/Plural Counterflow Pipeline Processor."
It has proven difficult to design complex asynchronous pipeline systems. The difficulty comes not only from their complex arrangements of circuits, but also from their complex behavior. One might deal with the circuit complexity alone; in other fields designers deal with circuits at least as complex. In an asynchronous system, however, any signal may occur at any time, constrained only by the explicit limitations placed on it by particular circuits. There is no arbitrary "timekeeper" or "clock" by which to measure circuit performance. Rather, the designer must account for all the possible sequences of behavior that may occur, assuring that no such sequence can cause a fault. Of course, this can be difficult.
The present invention provides techniques for the design of such asynchronous systems. The design is embodied, in part, as a set of modules which are rich enough to encompass a large range of systems, but simple enough to enable relatively easy use in design. The modules described herein are generic in the sense that they provide for a variety of practical implementations, including combinatorial logic components, data pathways of desired width, and many different interfaces. Selection of "working sets" of modules is straightforward using known methods. Furthermore, each module is asynchronous. Each module starts the task for which it has been designed when instructed to do so by an adjacent module, and each module gives completion signals to adjacent modules to coordinate their actions. The modules fit together to form pipeline systems which provide particular utility in signal processors and general purpose microprocessors.
While one could assemble many different pipeline systems with known macromodules, such as those described in W. A. Clark and C. E. Molnar, "Macromodular Computer Systems," Computers in Biomedical Research Vol. IV, Chap. 3, Academic Press, New York (1974), systems designed with prior art modules are intrinsically slower. In addition, the large number of prior art macromodules provided many more opportunities for implementation error. The present invention provides a set of modules adapted to assembling the most useful forms of a pipeline system. Compared to the macromodules, the present invention provides simplicity of design and ease of understanding, yet does not unduly limit the range of systems that can be assembled.
One project which employed modules for the design of processors is the TANGRAM design system. This system was developed in the Netherlands in the late 1980s and early 1990s. See, for example, Kaes van Berkel, TANGRAM; Asynchronous Architecture for VLSI Programming, Cambridge University Press (1993). TANGRAM modules directly implement the syntactic primitives appearing in statements written in the TANGRAM programming language for describing asynchronous systems. As with other modular structures, systems designed using the TANGRAM modules are considerably slower than desired.
The routing of data from a source pathway to selectable alternative output pathways according to data values found in the source pathway has been employed in prior art systems. One system which used this self-routing of data appears in the processor-to-memory switch of the BBN Monarch Multi-Computer. See, for example, Randall D. Rettberg, et al., "The Monarch Parallel Processor Hardware Design," Computer (April 1990), pp. 18-30. In the BBN system, address bits within packets control the routing of the entire packet containing those bits. Successive address bits control the routing at successive routing stages.
Another alternative pathway routing scheme was developed for the Mosaic system, see Charles L. Seitz, et al., "The Design of the CalTech Mosaic C Multicomputer," Research on Integrated Systems; Proceedings 1993 Symposium, MIT Press (1993), pp. 1-22. The Mosaic system differs from the BBN system in that although the routing information is contained within the packets themselves, it is encoded relative to the location of the switch node, rather than as an absolute destination address. In the Mosaic system each node increments the encoded information as it passes through. Only when the encoded value has achieved a certain net value is the entire packet switched to the alternate pathway. Neither the BBN Monarch system, nor the Mosaic system, used the principle of controlling data routing in one pipeline by control bits carried in another pipeline.
There are several aspects to the present invention. A first aspect deals with control of the flow of data in one pipeline system on the basis of control information flowing in another pipeline system. It is often important to modulate the flow of data items in a pipeline. For example, one may wish to eliminate certain data items from the stream flowing through a pipeline according to their values. Alternatively, one may wish to steer certain data items into one branch of a pipeline system and other data items into another branch, again according to their values. For example, one may wish to process positive numbers in one branch and negative numbers in another branch. Prior pipeline systems have been able to eliminate or steer values in a pipeline according to information traveling within the pipeline itself, as in the Monarch and Mosaic systems. The present invention provides an additional capability to enable control of the flow of data items in one pipeline according to the values of control elements in another pipeline. As will be described, in embodiments of the invention, both the pipeline being controlled as well as the pipeline providing the control are asynchronous pipelines in the sense that events and operations occur in the pipelines whenever they are ready, not in accordance with externally supplied clock signals.
In one embodiment according to our invention, a system includes a first composition of places and paths to form a first pipeline having information flowing therethrough, and a second composition of places and paths to form a second pipeline also having information flowing therethrough. The terms "Places" and "Paths" have special meanings, as will be described below. The second pipeline has at least one place with a special connection to at least one place in the first pipeline. In such a system the information flowing through the first pipeline is used to control the disposition of information flowing through the second pipeline.
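By way of illustration only, the behavior of one pipeline controlling the disposition of items in another can be sketched in software. The sketch below is a behavioral model, not a circuit; the names `steer_step` and `run` and the command encodings `"A"`, `"B"`, and `"drop"` are hypothetical and do not appear in the modules described herein. The junction acts only when both pipelines have an item ready, which loosely models the absence of a clock.

```python
from collections import deque

# Hypothetical command encodings, chosen only for this illustration.
PASS_A, PASS_B, DROP = "A", "B", "drop"


def steer_step(data_q, control_q, out_a, out_b):
    """Fire the junction once if both pipelines have an item ready.

    Returns True if the junction fired, False if it had to wait.
    """
    if not data_q or not control_q:
        return False  # one side is not ready, so the junction waits
    item = data_q.popleft()
    command = control_q.popleft()
    if command == PASS_A:
        out_a.append(item)   # steer the data item into branch A
    elif command == PASS_B:
        out_b.append(item)   # steer the data item into branch B
    elif command != DROP:    # DROP eliminates the item from the stream
        raise ValueError(f"unknown command: {command!r}")
    return True


def run(data_items, commands):
    """Drive the junction until it can no longer fire."""
    data_q, control_q = deque(data_items), deque(commands)
    out_a, out_b = [], []
    while steer_step(data_q, control_q, out_a, out_b):
        pass
    # Items still waiting in the data pipeline are returned as well.
    return out_a, out_b, list(data_q)
```

For example, `run([5, -3, 7, -1], ["A", "B", "drop", "B"])` steers 5 into branch A, steers -3 and -1 into branch B, and eliminates 7. If the controlling pipeline runs out of commands, the remaining data items simply wait at the junction.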
A second aspect of the present invention involves the control of data latches in the primary data paths of an asynchronous pipeline. It has been common practice to include the latch control circuits inside the asynchronous control loop of each stage of the pipeline. See, e.g., I. E. Sutherland, "Micropipelines," Communications of the ACM (June 1989). A system with latch control circuits inside the asynchronous control loop follows a known "bundled data convention." According to the bundled data convention a "bundle" consisting of data signals and a validating event signal, often called "request," is designed to have controlled delay such that the data signals always reach their final logic levels prior to arrival of the request. Thus, arrival of the request guarantees validity of the data signals. For a further discussion of bundled data conventions, see Introduction to VLSI Systems, C. Mead and L. Conway, Addison-Wesley Publishing Co. (1980), pp. 252-254.
To achieve greater speed, the present invention places such latch control circuitry outside the asynchronous control loop, thereby increasing throughput. Placing latch control circuitry outside the loop increases throughput, not only by reducing the amount of circuitry inside the loop, but also by permitting the latch control logic to operate concurrently with the asynchronous control loop. Thus, the present invention modifies the bundled data convention to guarantee only that the data signals will be valid a known interval after arrival of the request. The request signal thus becomes the herald of data to come, rather than a certification of data already present.
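The difference between the conventional bundled data convention and the modified convention can be stated as two timing checks. The following is a minimal sketch under assumed names: `t_data_valid`, `t_request`, and `interval` are hypothetical symbols for the time at which the data signals reach their final levels, the arrival time of the request, and the known interval, all in the same arbitrary time unit.

```python
def conventional_bundle_ok(t_data_valid, t_request):
    """Conventional convention: data must be valid when the request arrives."""
    return t_data_valid <= t_request


def modified_bundle_ok(t_data_valid, t_request, interval):
    """Modified convention: data must be valid within a known interval
    after the request arrives (the request heralds data to come)."""
    return t_data_valid <= t_request + interval
```

Under these assumptions, a bundle whose data settles at time 12 with a request arriving at time 10 violates the conventional check but satisfies the modified check for any known interval of 2 or more.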
In this case an embodiment according to our invention includes a system in which there is a control path wherein request signals and acknowledge signals flow in a control loop, the request signals flowing in one direction, and the acknowledge signals flowing in an opposite direction, and a data path in which data flows. A control element is provided in the data path to regulate the flow of data therethrough, and a control circuit external to the control loop is connected to provide signals to the control element and connected to receive the request signals and the acknowledge signals from the control path as those signals flow through the control path, and in response thereto control the control element.
To illustrate these aspects of the invention, a complete set of modules, symbols for representing them, and rules for connecting the modules to each other are described herein. As will be described, the modules feature high-speed operation. This high speed results, in part, from removing from the asynchronous control loop much of the logic required to control the latches in their primary data paths. The asynchronous control circuits in these modules can act slightly in advance of the data transfer operations, enabling the data transfer to occur more rapidly than would be possible for systems with latch control logic inside the asynchronous control loop. Were the latch control logic inside the loop, further actions in the loop would have to await the actions of the latch control.
The set of modules described may be flexibly configured. Their flexibility comes from inclusion of modules specifically intended to control the flow of data in one pipeline according to "command bits" carried as data in another pipeline. Because control of flow in a pipeline is an explicit task centralized in a specific module, systems that would otherwise be difficult to analyze and design become easily defined arrangements of interconnected modules. This makes a large range of modular systems possible, adding to the range of designs for which the modules are suitable. Interlocks included within the modules provide correct relative timing of operations to control which events occur, and to ensure that operations occur in proper sequence.
The family of modules described below can be broadly classified into three main module types: Places, Paths and Ports. For this reason we term the design system and notation we have developed for representing the modules P**3, pronounced "P cubed." The particular module set described herein is intended to be exemplary. Those of ordinary skill will be able to design other sets of modules encompassing the concepts disclosed herein.
Another important feature of the modules described herein is the one-to-one correspondence between the symbolic representation of a system and its circuit diagram. Each of the symbols representing a module describes a specific circuit. The symbolic representation has the advantage that where symbols connect in the symbolic representation, circuits connect in the physical implementation. Thus, the translation from symbolic representation to circuit topology can be more reliably achieved. This enables automated design techniques to be used in creating systems employing the modules.
"Transition logic" appears in the control circuits of the family of modules described. Each event is represented as a change in a logic level, also known as a "transition," independent of the actual logic level involved. Rising transitions from LO to HI carry the same meaning as falling transitions from HI to LO. Although transition logic is well known, it is not the only possible representation of events. Other representations of events may also be employed. Corresponding to each such representation, a designer skilled in the art might equally well practice the present invention.
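As a simple illustration of transition logic, the sketch below counts events on a single wire from a sequence of sampled logic levels. The function name `count_events` and the use of 0 and 1 for LO and HI are assumptions made for this example only. Every change of level counts as one event, regardless of direction.

```python
def count_events(levels):
    """Count transitions in a sequence of sampled logic levels.

    A rising transition (0 to 1) and a falling transition (1 to 0)
    each count as exactly one event.
    """
    return sum(1 for prev, cur in zip(levels, levels[1:]) if prev != cur)
```

For instance, the sampled sequence `[0, 1, 1, 0, 0, 1]` contains one rising, one falling, and one further rising transition, for three events in all.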