The present invention relates to information processing, and more specifically to architecture and operation of asynchronous circuits and processors.
Many information processing devices operate based on a control clock signal to synchronize operations of different processing components. Different processing components usually operate at different speeds due to various factors including the nature of different functions and different characteristics of the components or properties of the signals processed by the components. Synchronization of these different processing components requires the clock speed of the control clock signal to accommodate the slowest processing speed of the processing components in these "synchronous" processing devices. Thus, some processing components may complete respective operations earlier than other slow components and have to wait until all processing components complete their operations. This is not an efficient way of utilizing available resources.
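The cost of clocking every component at the pace of the slowest one can be illustrated with a small calculation. The following is a minimal sketch only: the component names and delay values are hypothetical, chosen for illustration, and do not come from any measured device.

```python
def synchronous_idle_times(delays_ns):
    """Return the clock period and per-component idle time in a clocked design.

    delays_ns maps a component name to the time (in ns) it needs to finish
    one operation. All values used with this function here are hypothetical.
    """
    # The clock period must accommodate the slowest component.
    period = max(delays_ns.values())
    # Every faster component sits idle for the remainder of each cycle.
    idle = {name: period - delay for name, delay in delays_ns.items()}
    return period, idle


# Hypothetical delays for three components of a synchronous device.
period, idle = synchronous_idle_times({"alu": 2, "memory": 5, "decoder": 1})
print(period)  # clock period set by the slowest component
print(idle)    # time each component spends waiting per cycle
```

With these assumed numbers, the memory component sets the clock period, and the other two components wait for most of every cycle.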
Most commercial digital processors are such synchronous processing devices, including various microprocessors used in personal computers and other devices. Speed of a synchronous processor is usually increased by increasing the clock speed. This forces the instructions to be executed faster, since an instruction is executed based on each clock cycle. The maximum clock speed can be limited by various factors such as the processing speed of a slow processing component, the way that a clock signal is generated, or various effects caused by miniaturization of the integrated circuits within a processor.
An alternative approach, pioneered by Alain Martin of California Institute of Technology, eliminates synchronization of different processing components according to a clock signal. Different processing components simply operate as fast as permitted by their structures and operating environments. There is no relationship between a clock speed and the operation speed. This obviates many technical obstacles in a synchronous processor and can be used to construct an "asynchronous" processor with a much simplified architecture and a fast processing speed that are difficult to achieve with synchronous processors.
U.S. Pat. No. 5,752,070 to Martin and Burns, which is incorporated herein by reference in its entirety, discloses such an asynchronous processor. This asynchronous processor operates without a clock and goes against the conventional wisdom of using a clock to synchronize the various parts and operations of the processor. The instructions can be executed as fast as the processing circuits allow, and the processing speed is essentially limited only by gate and interconnection delays.
Such an asynchronous processor can be optimized for fast processing by special pipelining techniques based on unique properties of the asynchronous architecture. Asynchronous pipelining allows multiple instructions to be executed at the same time. This has the effect of executing instructions in a different order than originally intended. An asynchronous processor compensates for this out-of-order execution by maintaining the integrity of the output data.
The present disclosure describes improved devices and processing methods for asynchronous processing. The disclosed architecture, circuit configurations and processing methods can be advantageously used to construct high-speed asynchronous digital processors.
One embodiment of the asynchronous system for information processing, which is independent of a clock signal, comprises:
a plurality of execution units including a program counter unit, a memory unit, and at least one arithmetic logic unit, said execution units connected relative to one another in parallel;
a register unit having registers, connected to said execution units;
a fetch unit, connected to said program counter unit to receive a program counter signal and configured to retrieve instructions from an instruction memory unit according to said program counter signal;
a decoder connected to receive said instructions from said fetch unit and configured to decode said instructions to generate decoded instructions, wherein said decoder is connected to communicate with each of said execution units and said register unit;
a writeback unit communicating with said execution units and register unit to filter and route information from one member of said execution units and register unit to another member;
a first queue disposed between said decoder and said writeback unit to store and transfer ordering information to said writeback unit to indicate an order in which said decoder dispatches said decoded instructions to said execution units; and
a second queue disposed between said program counter unit and said writeback unit to store and transfer said program counter signal to said writeback unit,
wherein said program counter unit, said fetch unit, said instruction memory unit, and said decoder form a pipelined fetching loop operable to simultaneously transfer at least two instructions unsynchronized with respect to each other.
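The role of the first queue recited above can be illustrated in software. The following is a minimal sketch, not the disclosed circuit: it assumes that the ordering information is simply the name of the execution unit to which each instruction was dispatched, and that each execution unit delivers its own results in the order it received its instructions. The function name, unit names, and result labels are hypothetical.

```python
from collections import deque


def writeback_in_dispatch_order(dispatch_order, completions):
    """Commit results in the order the decoder dispatched the instructions.

    dispatch_order: unit names in dispatch order (contents of the first queue).
    completions: (unit, result) pairs in the order the units actually finish,
                 which may differ from the dispatch order.
    """
    order_queue = deque(dispatch_order)  # models the decoder-to-writeback queue
    pending = {}                         # unit name -> results not yet committed
    committed = []
    for unit, result in completions:
        pending.setdefault(unit, deque()).append(result)
        # Commit while the unit at the head of the queue has a result ready.
        while order_queue and pending.get(order_queue[0]):
            head = order_queue.popleft()
            committed.append((head, pending[head].popleft()))
    return committed


# The second ALU instruction finishes before the memory instruction,
# but is committed after it because it was dispatched later.
print(writeback_in_dispatch_order(
    ["alu", "mem", "alu"],
    [("alu", "a1"), ("alu", "a2"), ("mem", "m1")],
))
```

In this sketch the execution units never wait on one another; only the writeback step consults the queue, which is one way the integrity of the output data could be preserved despite out-of-order completion.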
One aspect of the invention relating to exception handling is the writeback unit, with which each execution unit that can cause an exception communicates.
Another aspect of the invention is pipelining of the completion mechanism to improve the throughput of the system.
These and other aspects and advantages will become more apparent in light of the accompanying drawings, the detailed description, and the appended claims.