1. Field of the Invention
The present invention relates to data processing systems, and in particular, to a scheduler and method for a digital processor.
2. Background of the Related Art
A processor such as a microprocessor, microcontroller or digital signal processor (DSP) includes a plurality of functional units, each with a specific task, coupled with a set of binary encoded instructions that define operations on the functional units within the processor architecture. The binary encoded instructions can be combined to form a program that performs a given task. Such programs can be executed on the processor architecture or stored in memory for subsequent execution.
To execute a given program, the functional units within the processor architecture must be synchronized to ensure correct execution of instructions (e.g., correct timing and order). "Synchronous" systems apply a fixed time step signal (i.e., a clock signal) to the functional units to ensure synchronized execution. Thus, in related art synchronous systems, all the functional units require a clock signal. However, not all functional units need to be in operation for a given instruction type. Because the functional units can be activated even when they are unnecessary for executing a given instruction, synchronous systems can be inefficient.
The use of a fixed time clock signal (i.e., a clock cycle) in synchronous systems also restricts the design of the functional units. Each functional unit must be designed to complete its worst case operation within the clock cycle, even though the worst case operation may be rare. Designing for the worst case reduces the performance of synchronous systems, especially where the typical case operation executes much faster than the worst case operation. Accordingly, synchronous systems attempt to reduce the clock cycle to minimize the performance penalties caused by worst case design criteria. Reducing the clock cycle below the worst case delay requires increasingly complex control systems or increasingly complex functional units. These more complex synchronous systems sacrifice efficiency in terms of area and power consumption to meet a given performance criterion, such as a reduced clock cycle.
Related art self-timed systems, also known as asynchronous systems, remove many problems associated with the clock signal of synchronous systems. In asynchronous systems, performance penalties occur only when an actual (rare) worst case operation is executed. Accordingly, asynchronous systems can be tailored for typical case performance, which can reduce the complexity of processor implementations that meet given performance requirements. Further, because asynchronous systems activate functional units only when required for the given instruction type, efficiency is increased. Thus, asynchronous systems can provide increased efficiency in terms of integration and power consumption.
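The worst case versus typical case trade-off described above can be illustrated with a minimal Python sketch. The delay figures and the function names synchronous_time and asynchronous_time are hypothetical and are not taken from any particular processor; the model also assumes that handshake overhead between operations is negligible.

```python
# Hypothetical timing model: compares a clocked unit sized for the worst case
# against a self-timed unit that finishes each operation in its actual delay.


def synchronous_time(delays, worst_case_delay):
    """Total time when every operation occupies one worst-case clock period."""
    # The clock period must cover the slowest operation the unit can perform,
    # so every operation costs worst_case_delay regardless of its actual delay.
    return worst_case_delay * len(delays)


def asynchronous_time(delays):
    """Total time when each operation signals completion as soon as it is done."""
    # Assumption: handshake overhead between operations is ignored.
    return sum(delays)


if __name__ == "__main__":
    # Hypothetical trace: nine typical 2 ns operations and one rare 10 ns one.
    trace = [2.0] * 9 + [10.0]
    print("synchronous :", synchronous_time(trace, worst_case_delay=10.0), "ns")
    print("asynchronous:", asynchronous_time(trace), "ns")
```

With these hypothetical figures the self-timed version takes 28 ns against 100 ns for the clocked version, because the rare 10 ns operation is paid for only once rather than on every cycle.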
By coupling such functional units together to form larger blocks, increasingly complex functions can be realized. FIG. 1 shows two such functional units coupled via data lines and control lines. A first functional unit 100 is a sender, which transmits data. A second functional unit 102 is a receiver, which receives the data.
Communication between the functional units 100, 102 is achieved over bundled data wires 104. Self-timed or asynchronous methodology uses functional units with an asynchronous interface protocol for passing data and control status. A request control wire REQ is controlled by the sender 100 and is activated when the sender 100 has placed valid data on the data wires 104. An acknowledge control wire ACK is controlled by the receiver 102 and is activated when the receiver 102 has consumed the data placed on the data wires 104. This asynchronous interface protocol is known as a "handshake" because the sender 100 and the receiver 102 communicate with each other to pass the bundled data.
The asynchronous interface protocol shown in FIG. 1 can use various timing protocols for data communication. One related art protocol is based on a 4-phase control communication scheme. FIG. 2 shows a timing diagram for the 4-phase control communication scheme.
As shown in FIG. 2, the sender 100 indicates that the data on the data wires 104 is valid by driving the request control wire REQ high (active). The receiver 102 can now use the data as required. When the receiver 102 no longer requires the data, it signals the sender 100 by driving the acknowledge control wire ACK high (active). The sender 100 can now remove the data from the communication bus (i.e., the data wires 104) and prepare the next communication.
In the 4-phase protocol, the control wires must then be returned to their initial state. Accordingly, the sender 100 deactivates its output request by returning the request control wire REQ low. Upon deactivation of the request control wire REQ, the receiver 102 returns the acknowledge control wire ACK low to indicate to the sender 100 that the receiver 102 is ready for more data. The sender 100 and the receiver 102 must follow this strict ordering of events to communicate in the 4-phase control communication scheme. Beneficially, however, there is no upper bound on the delays between consecutive events.
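The strict event ordering of the 4-phase scheme, and its tolerance of arbitrary delays, can be illustrated with a minimal Python sketch. Here the sender 100 and the receiver 102 are modelled as cooperative processes that wait on the REQ and ACK wires, and a scheduler advances a randomly chosen ready process at each step to stand in for unbounded, unpredictable delays. The names Wires, sender, receiver and run are hypothetical and do not correspond to any circuit described herein.

```python
import random


class Wires:
    """Bundled-data channel: data wires plus REQ (sender) and ACK (receiver)."""

    def __init__(self):
        self.req = False
        self.ack = False
        self.data = None
        self.trace = []  # order in which control transitions occurred


def sender(w, values):
    for v in values:
        w.data = v               # place valid data before raising REQ
        w.req = True
        w.trace.append("REQ+")
        yield lambda: w.ack      # wait: receiver has consumed the data
        w.data = None            # data may now be removed from the wires
        w.req = False
        w.trace.append("REQ-")
        yield lambda: not w.ack  # wait: receiver is ready for more data


def receiver(w, out, count):
    for _ in range(count):
        yield lambda: w.req      # wait: data on the wires is valid
        out.append(w.data)       # consume the data
        w.ack = True
        w.trace.append("ACK+")
        yield lambda: not w.req  # wait: sender has released the request
        w.ack = False
        w.trace.append("ACK-")


def run(processes, rng):
    """Advance randomly chosen ready processes, modelling arbitrary delays."""
    waiting = {p: (lambda: True) for p in processes}
    while waiting:
        ready = [p for p, cond in waiting.items() if cond()]
        if not ready:
            raise RuntimeError("deadlock")
        p = rng.choice(ready)
        try:
            waiting[p] = next(p)  # run until the process waits on a wire again
        except StopIteration:
            del waiting[p]


if __name__ == "__main__":
    w, received = Wires(), []
    run([sender(w, [10, 20, 30]), receiver(w, received, 3)], random.Random(1))
    print(received)  # [10, 20, 30]
    print(w.trace)   # REQ+, ACK+, REQ-, ACK- repeated three times
```

Whatever random seed is used, the recorded trace is always REQ+, ACK+, REQ-, ACK- for each transfer, because each side acts only after observing the other side's previous transition; the delays between events affect only how long the transfer takes.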
A first-in first-out (FIFO) register or pipeline provides an example of a self-timed system that couples together a number of functional units. FIG. 3 shows such a self-timed FIFO structure. The functional units can be registers 300a-300c, each with both an input interface protocol and an output interface protocol. When empty, each of the registers 300a-300c can receive data via an input interface 302 for storage. Once data is stored in a register, its input interface cannot accept more data. Taking the register 300a as an example, in this condition the input of the register 300a has "stalled", and it remains stalled until the register 300a is again empty. Once the register 300a contains data, the register 300a can pass the data to the next stage (i.e., register) of the self-timed FIFO structure via an output interface 304. The register 300a generates an output request when the data to be output is valid. Once the data has been consumed and is no longer required, the register 300a returns to the empty state. Accordingly, the register 300a can again receive data using the input interface protocol.
Chaining the registers 300a-300c together by coupling each output interface 304 to the input interface 302 of the next register forms the multiple stage FIFO or pipeline. Thus, the output interface request and acknowledge signals, Rout and Aout, of each register 300a-300c (stage) are respectively coupled to the input interface request and acknowledge signals, Rin and Ain, of the following stage. As shown in FIG. 3, data passed into a FIFO input 306 is passed from register 300a through register 300b to register 300c, and eventually emerges at a FIFO output 308. Thus, data ordering is preserved as the data is sequentially passed along the FIFO. The FIFO structure shown in FIG. 3 can use the 4-phase control communication scheme shown in FIG. 2 as the input and output interface protocol.
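The stall and ordering behaviour of the FIFO of FIG. 3 can be sketched at the level of whole handshakes. In the minimal Python model below, each register is either empty or full, and a transfer between adjacent stages fires only when the upstream stage is full and the downstream stage is empty, which is how a full register stalls its predecessor. The function name simulate_fifo and the random firing order are assumptions made for illustration; a circuit implementation would use the 4-phase signals Rin, Ain, Rout and Aout rather than this abstraction.

```python
import random


def simulate_fifo(inputs, n_stages, rng):
    """Pass inputs through a chain of n_stages self-timed registers.

    Each register is either empty (None) or holds one item. A transfer
    between neighbours can fire only when the upstream register is full and
    the downstream register is empty, which abstracts one complete 4-phase
    handshake. Items must not be None.
    """
    stages = [None] * n_stages
    pending = list(inputs)
    output = []
    while pending or any(s is not None for s in stages):
        # Collect every transfer whose handshake could complete now.
        moves = []
        if pending and stages[0] is None:
            moves.append("in")       # FIFO input -> first register
        for i in range(n_stages - 1):
            if stages[i] is not None and stages[i + 1] is None:
                moves.append(i)      # register i -> register i + 1
        if stages[-1] is not None:
            moves.append("out")      # last register -> FIFO output
        # Fire one enabled transfer at random to model arbitrary stage delays.
        move = rng.choice(moves)
        if move == "in":
            stages[0] = pending.pop(0)
        elif move == "out":
            output.append(stages[-1])
            stages[-1] = None
        else:
            stages[move + 1], stages[move] = stages[move], None
    return output


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    print(simulate_fifo(data, n_stages=3, rng=random.Random(7)))
    # ['a', 'b', 'c', 'd', 'e']
```

Regardless of the order in which enabled transfers fire, items emerge in the order they entered: no item can move into an occupied stage, so none can overtake another.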
To implement an asynchronous processor, a more complex array of functional units is required. Further, to process an instruction, the instruction must be decoded to activate the functional units required to perform the corresponding instruction task. However, the functional units involved in executing an instruction may have dependencies, such as data dependencies, so the functional units cannot simply all operate concurrently (e.g., within a single clock cycle as in synchronous systems). Such dependencies impose a sequential ordering on the functional unit activity required to execute each instruction correctly.
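The effect of dependencies on functional unit activity can be illustrated by grouping the units needed for one instruction into steps, where each step contains only units whose inputs are already available. The sketch below uses Python's standard graphlib.TopologicalSorter for this grouping. The instruction, the unit names and the dependency sets are hypothetical examples chosen for illustration, not a description of any particular processor.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def concurrency_steps(deps):
    """Group functional units into steps that respect data dependencies.

    deps maps each unit to the set of units whose results it needs. Units in
    the same step have no dependency on one another and could in principle
    run concurrently; each step must wait for the previous one.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    steps = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        steps.append(ready)
        sorter.done(*ready)
    return steps


if __name__ == "__main__":
    # Hypothetical multiply-accumulate instruction and its unit dependencies.
    mac = {
        "read_x": set(),
        "read_y": set(),
        "addr_update": set(),
        "multiply": {"read_x", "read_y"},
        "accumulate": {"multiply"},
        "write_back": {"accumulate"},
    }
    for n, step in enumerate(concurrency_steps(mac), start=1):
        print(n, step)
```

In this hypothetical case the six units complete in four dependency-ordered steps, with the two operand reads and the address update sharing the first step, whereas activating the six units strictly one after another would take six steps.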
An asynchronous processor is disclosed in "A Fully Asynchronous Digital Signal Processor Using Self-Timed Circuits" by Jacobs et al., IEEE Journal of Solid-State Circuits, Volume 25, Number 6, 1990 (hereafter Jacobs). However, the asynchronous processor in Jacobs merely initiates a preset activation order of all functional units regardless of the instruction. Accordingly, the asynchronous processor in Jacobs is inefficient because functional units that are unnecessary for a given instruction are still activated. Further inefficiency results because Jacobs cannot exploit potential concurrent operation of functional units that have no data dependencies on one another. In addition, Jacobs cannot individually control the order and execution of the functional unit activity for each instruction to increase concurrency and efficiency.
The above references are incorporated by reference herein where appropriate for appropriate teachings of additional or alternative details, features and/or technical background.