1. Field of the Invention
This invention is directed to a power reduction apparatus, and in particular, to a self-timed power reduction apparatus that reduces power consumption.
2. Background of the Related Art
A processor such as a microprocessor, microcontroller or digital signal processor (DSP) includes a plurality of functional units, each with a specific task, coupled with a set of binary encoded instructions that define operations on the functional units within the processor architecture. The binary encoded instructions can then be combined to form a program that performs some given task. Such programs can be executed on the processor architecture or stored in memory for subsequent execution.
To operate a given program, the functional units within the processor architecture must be synchronized to ensure correct (e.g., time, order, etc.) execution of instructions. "Synchronous" systems apply a fixed time step signal (i.e., a clock signal) to the functional units to ensure synchronized execution. Thus, in related art synchronous systems, all the functional units require a clock signal. However, not all functional units need be in operation for a given instruction type. Since the functional units can be activated even when unnecessary for a given instruction execution, synchronous systems can be inefficient.
The use of a fixed time clock signal (i.e., a clock cycle) in synchronous systems also restricts the design of the functional units. Each functional unit must be designed to perform its worst case operation within the clock cycle even though the worst case operation may be rare. Worst case operational design reduces performance of synchronous systems, especially where the typical case operation executes much faster than the worst case operation. Accordingly, synchronous systems attempt to reduce the clock cycle to minimize the performance penalties caused by worst case operation criteria. Reducing the clock cycle below worst case criteria requires increasingly complex control systems or increasingly complex functional units. These more complex synchronous systems reduce efficiency in terms of area and power consumption to meet a given performance criterion such as a reduced clock cycle.
Related art self-timed systems, also known as asynchronous systems, remove many problems associated with the clock signal of synchronous systems. In asynchronous systems, performance penalties occur only in an actual (rare) worst case operation. Accordingly, asynchronous systems can be tailored for typical case performance, which can result in decreased complexity for processor implementations that achieve the performance requirements. Further, because asynchronous systems only activate functional units when required for the given instruction type, efficiency is increased. Thus, asynchronous systems can provide increased efficiency in terms of integration and power consumption.
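The difference between worst case and typical case timing can be illustrated with a short calculation. The following Python sketch is illustrative only: the function names and the delay values are assumptions made for this example, and handshake overhead in the self-timed case is ignored.

```python
def synchronous_time(op_delays, clock_period):
    """Total time when every operation occupies one full clock cycle."""
    # The clock period must cover the slowest (worst case) operation.
    if clock_period < max(op_delays):
        raise ValueError("clock period is shorter than the worst case delay")
    return len(op_delays) * clock_period


def self_timed_time(op_delays):
    """Total time when each operation takes only its own delay.

    Simplifying assumption: handshake overhead is ignored.
    """
    return sum(op_delays)


# Hypothetical workload: nine typical operations and one rare worst case.
delays_ns = [2] * 9 + [10]

sync_total = synchronous_time(delays_ns, clock_period=10)
async_total = self_timed_time(delays_ns)
```

Under these assumed values, the synchronous total is 100 ns, while the self-timed total is 28 ns, because only the single worst case operation incurs the 10 ns penalty.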
Related art asynchronous systems use functional units having an asynchronous interface protocol to pass data and control information. By coupling such asynchronous functional units together to form larger blocks, increasingly complex functions can be realized. FIG. 1 shows two such functional units coupled via data lines and control lines. A first functional unit 100 is a sender, which passes data. A second functional unit 102 is a receiver, which receives the data.
Communication between the functional units 100, 102 is achieved by bundling data wires 104 with control wires. A request control wire REQ is controlled by the sender 100 and is activated when the sender 100 has placed valid data on the data wires 104. An acknowledge control wire ACK is controlled by the receiver 102 and is activated when the receiver 102 has consumed the data that was placed on the data wires 104. This asynchronous interface protocol is known as a "handshake" because the sender 100 and the receiver 102 both communicate with each other to pass the bundled data.
The asynchronous interface protocol shown in FIG. 1 can use various timing protocols for data communication. One related art protocol is based on a 4-phase control communication scheme. FIG. 2 shows a timing diagram for the 4-phase control communication scheme.
As shown in FIG. 2, the sender 100 indicates that the data on the data wires 104 is valid by driving the request control wire REQ active high. The receiver 102 can now use the data as required. When the receiver 102 no longer requires the data, it signals back to the sender 100 by driving the acknowledge control wire ACK active high. The sender 100 can now remove the data from the communication bus (e.g., the data wires 104) and prepare the next communication.
In the 4-phase protocol, the control lines must be returned to the initial state. Accordingly, the sender 100 deactivates the output request by returning the request control wire REQ low. On the deactivation of the request control wire REQ, the receiver 102 can return the acknowledge control wire ACK low to indicate to the sender 100 that the receiver 102 is ready for more data. The sender 100 and the receiver 102 must follow this strict ordering of events to communicate in the 4-phase control communication scheme. Beneficially, however, there is no upper bound on the delays between consecutive events.
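The ordering of events in the 4-phase control communication scheme can be sketched in software as follows. This Python model is illustrative only and is not a circuit description: the names `Channel` and `four_phase_transfer` are assumptions made for this example, and the four events are executed sequentially with no modeling of the unbounded delays between them.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Bundled-data channel: data wires plus REQ and ACK control wires."""
    req: bool = False
    ack: bool = False
    data: object = None
    trace: list = field(default_factory=list)  # ordered record of control events


def four_phase_transfer(channel, value):
    """Run one 4-phase handshake and return the value the receiver consumed."""
    # Both control wires must start in the initial (low) state.
    assert not channel.req and not channel.ack

    # Phase 1: the sender places valid data, then raises REQ.
    channel.data = value
    channel.req = True
    channel.trace.append("REQ high")

    # Phase 2: the receiver consumes the data, then raises ACK.
    received = channel.data
    channel.ack = True
    channel.trace.append("ACK high")

    # Phase 3: the sender removes the data and returns REQ low.
    channel.data = None
    channel.req = False
    channel.trace.append("REQ low")

    # Phase 4: the receiver returns ACK low, indicating it is ready for more data.
    channel.ack = False
    channel.trace.append("ACK low")

    return received


def demo():
    """Send two values over one channel and return what was received plus the event trace."""
    channel = Channel()
    received = [four_phase_transfer(channel, v) for v in (0x12, 0x34)]
    return received, channel.trace
```

Each transfer leaves both control wires low, so a second transfer can begin from the same initial state as the first.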
A first-in first-out (FIFO) register or pipeline provides an example of self-timed systems that couple together a number of functional units. FIG. 3 shows such a self-timed FIFO structure. The functional units can be registers 300a-300c with both an input interface protocol and an output interface protocol. When empty, each of the registers 300a-300c can receive data via an input interface 302 for storage. Once data is stored in the register, the input interface cannot accept more data. In this condition, the input of the register 300a has "stalled". The register 300a remains stalled until the register 300a is again empty. However, once the register 300a contains data, the register 300a can pass the data to the next stage (i.e., register) of the self-timed FIFO structure via an output interface 304. The register 300a generates an output request when the data to be output is valid. Once the data has been consumed and the data is no longer required, the register 300a is then in the empty state. Accordingly, the register 300a can again receive data using the input interface protocol.
Chaining the registers 300a-300c together by coupling the output interface 304 to the input interface 302 forms the multiple stage FIFO or pipeline. Thus, the output interface request and acknowledge signals, Rout and Aout, are respectively coupled to the input interface request and acknowledge signals, Rin and Ain, of the following register 300a-300c (stage). As shown in FIG. 3, data passed into a FIFO input 306 will be passed from register 300a to register 300c to eventually emerge at a FIFO output 308. Thus, data ordering is preserved as the data is sequentially passed along the FIFO. The FIFO structure shown in FIG. 3 can use the 4-phase control communication scheme shown in FIG. 2 as the input and output interface protocol.
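The stall and pass-along behavior of the chained registers can be sketched in software as follows. This Python model is illustrative only: the class name `SelfTimedFifo` and its methods are assumptions made for this example, each handshake between adjacent stages is abstracted to a single move of data into an empty register, and `None` is used to mark an empty register, so stored values are assumed not to be `None`.

```python
class SelfTimedFifo:
    """Chain of single-entry registers, each either empty (None) or full."""

    def __init__(self, stages=3):
        self.regs = [None] * stages

    def push(self, value):
        """Offer data at the FIFO input; return False if the first register has stalled."""
        if self.regs[0] is not None:
            return False
        self.regs[0] = value
        return True

    def pop(self):
        """Consume the data at the FIFO output, leaving the last register empty."""
        value = self.regs[-1]
        self.regs[-1] = None
        return value

    def settle(self):
        """Let every full register pass its data on whenever the next stage is empty."""
        moved = True
        while moved:
            moved = False
            # Scan from the output end so that freed stages can be refilled.
            for i in range(len(self.regs) - 2, -1, -1):
                if self.regs[i] is not None and self.regs[i + 1] is None:
                    self.regs[i + 1] = self.regs[i]
                    self.regs[i] = None
                    moved = True


def demo():
    """Fill a three-stage FIFO, observe the stall, then drain it in order."""
    fifo = SelfTimedFifo(3)
    accepted = []
    for value in ["a", "b", "c", "d"]:
        accepted.append(fifo.push(value))
        fifo.settle()

    drained = []
    while any(r is not None for r in fifo.regs):
        fifo.settle()
        drained.append(fifo.pop())
    return accepted, drained
```

In this sketch the fourth value is refused because all three registers are full, and the three accepted values emerge in the order in which they were entered.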
The FIFO register of FIG. 3 can include logic processing. In this case, data passes through processing logic between stages of the FIFO register. As shown in FIG. 4, data passes through processing logic 402a-402b between registers 300a-300c. Since the processing logic 402a-402b takes time to determine an output value, control signals (e.g., the output interface request signal Rout) are correspondingly delayed to match the logic delay. The coordinated control signal delay and processing logic delay ensure that the 4-phase communication protocol is satisfied. In other words, the data arrives and then the request Rout signals its validity.
As shown in FIG. 4, the delay in the request path lengthens the time taken for the handshake to complete, which allows the data computation in the processing logic to complete. The control signal delay can be any value that is appropriate to match the logic data delay. Further, the delay 404a-404b can be variously implemented. For example, a simple matched path, a variable delay or a function of the data presented can be used as the delay 404a-404b. However, an increase in the delay reduces the throughput and performance of the self-timed system because a delay in the handshake request/acknowledge loop decreases the data transfer rate.
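The trade-off between a matched request delay and throughput can be illustrated with a simplified calculation. The following Python sketch is illustrative only: the function names, the delay values and the model itself are assumptions made for this example. In particular, the model assumes that the request delay is incurred once per handshake loop and that all remaining loop time is a fixed overhead.

```python
def request_is_safe(logic_delay_ns, request_delay_ns):
    """True when the request cannot arrive before the processed data is valid."""
    return request_delay_ns >= logic_delay_ns


def transfers_per_microsecond(handshake_overhead_ns, request_delay_ns):
    """Data transfer rate under the simplified one-delay-per-loop assumption."""
    loop_time_ns = handshake_overhead_ns + request_delay_ns
    return 1000.0 / loop_time_ns


# Hypothetical values: 4 ns of fixed handshake overhead per transfer.
rate_short_delay = transfers_per_microsecond(4.0, 6.0)
rate_long_delay = transfers_per_microsecond(4.0, 16.0)
```

With these assumed values, lengthening the request delay from 6 ns to 16 ns halves the transfer rate from 100 to 50 transfers per microsecond, which is why the delay is matched to the logic delay rather than made arbitrarily large.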
An object of the present invention is to substantially obviate the above described problems and disadvantages of the prior art.
Another object of the present invention is to reduce the power consumption of a semiconductor device.
A further object of the present invention is to reduce power consumption of an asynchronous system by determining an operational speed based on load requirements.
In order to achieve at least the above objects in whole or in part, there is provided an asynchronous system according to the present invention that includes a plurality of functional units intercoupled to perform at least one task and a power control circuit coupled to a selected one of the plurality of functional units to determine at least one of a first and a second operating speed of the selected functional unit.
To further achieve the above objects in whole or in part, there is provided a data processing apparatus according to the present invention that includes a plurality of functional units, an asynchronous controller that decodes a current instruction to perform a corresponding instruction task using a group of the plurality of functional units, a power determination device, wherein the data processing apparatus operates at one of a plurality of power levels selected by the power determination device, and a communication device coupling the functional units, the power determination device and the controller.
To further achieve the above objects in whole or in part, there is provided a method for operating an asynchronous system having a plurality of intercoupled functional units according to the present invention that includes determining operating criteria of the asynchronous system and determining one of a plurality of power consumption levels based on the operating criteria of the asynchronous system.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and advantages of the invention may be realized and attained as particularly pointed out in the appended claims.