Various techniques have been employed to increase computing performance and efficiency of a processor. For example, superscalar processors have been developed to achieve execution of multiple instructions at one time. Other techniques, called multi-tasking, have also been employed to attain parallelism in operation and thereby increase processor performance. Multiple processors have been combined into a single system, each processor being capable of executing a particular sequence of instructions in a program or program segment. A program sequence of instructions is called a thread, and a system in which multiple processors execute a plurality of threads is said to employ horizontal multi-threading.
An alternative multi-tasking technique is vertical threading, a promising way to enhance multi-tasking database performance of a processor. Vertical threading is a technique in which a single processing pathway is used by more than one program thread. A capacity for vertical threading exists because a program thread is not always actively executing. A program may be in a wait state, awaiting either data or an event such as a trap or interrupt. For example, some applications have frequent cache misses that result in heavy clock penalties. Ideally, a second process utilizes the processor while a first process is waiting for the arrival of data.
For example, in data processing applications with frequent cache misses, data is accessed from a secondary storage structure, resulting in heavy clock penalties. During data access delays when a first process is idle, a beneficial usage of the idle pipeline would be to allow a second process to execute. The second process can take over the idle pipeline by saving all useful states of the first process in some location and assigning new states to the second process. When the second process becomes idle and the first process returns to processing, the saved states are returned to the pipeline and the first process proceeds as usual. Vertical threading in this manner requires that states for the first process be saved in some location before beginning execution of the second process, and that states for the second process be saved in some location before returning to execution of the first process. The saving of states and switching between states is typically termed context switching.
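The save-and-restore sequence described above can be illustrated with a minimal sketch. The names `Pipeline`, `context_switch`, and `saved` are hypothetical and chosen only for illustration; the model treats pipeline state as a simple list of per-stage values and is not intended to represent any particular hardware or operating system.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    """Toy pipeline holding a single set of per-stage state values (assumed model)."""
    stages: list


def context_switch(pipeline, saved, outgoing, incoming):
    """Save the outgoing thread's stage state, then load the incoming thread's state."""
    saved[outgoing] = list(pipeline.stages)   # copy live state out to a save area
    pipeline.stages = list(saved[incoming])   # copy the other thread's state back in


# Thread "A" is running; thread "B" has previously saved state.
pipeline = Pipeline(stages=["A0", "A1", "A2"])
saved = {"B": ["B0", "B1", "B2"]}

context_switch(pipeline, saved, "A", "B")     # A stalls, B takes over the pipeline
after_first = list(pipeline.stages)

context_switch(pipeline, saved, "B", "A")     # B goes idle, A resumes
after_second = list(pipeline.stages)
```

In this sketch every switch copies the entire pipeline state in both directions, which illustrates why the cost of context switching grows with the amount of state to be saved.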
Context switching using software techniques is time-consuming. The cycles spent saving and restoring states can offset the cycles recovered during the stall, so that performance is not enhanced.
In one technique for vertical threading, multiple processes share a processor pipeline. To share the pipeline, all useful states of an inactive process must be saved during inactivity, and new states must be assigned upon process switching. Saving of the inactive process states requires duplication of storage structures. Assignment of new states requires switching logic. Unfortunately, duplication of resources is costly in terms of integrated circuit space consumption and performance.
What is needed is a vertical multi-threading technique and structure that reduce the amount of additional resources for storing inactive states and switching between states.
A vertical multi-threading processor includes one or more execution pipelines that are formed from a plurality of multiple-bit pipeline register flip-flops. The multiple-bit pipeline register flip-flops supply multiple storage bits. The individual bits of a multiple-bit pipeline register flip-flop store data for one of respective multiple threads or processes. When an executing (first) process stalls due to a stall condition, for example a cache miss, an active bit of the multiple-bit register flip-flop is stalled, removed from activity on the pipeline, and a previously inactive bit becomes active for executing a previously inactive (second) process. All states of the stalled first process are preserved in a temporarily inactive bit of the individual multiple-bit register flip-flop in each pipeline stage.
Vertical threading, a capability of switching among a plurality of separate processes when one of the processes is stalled, is attained by inserting a multiple-bit register flip-flop at sequential stages of a pipeline.
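The following behavioral sketch illustrates the idea of keeping one storage bit per thread in every pipeline stage, so that switching threads changes only which bit is active. The class names `MultiBitFlipFlop` and `VerticalPipeline`, and the use of string labels in place of single bits, are assumptions made for readability; the sketch models behavior only and does not describe the circuit.

```python
class MultiBitFlipFlop:
    """One pipeline-stage register with one storage slot per thread (assumed model)."""

    def __init__(self, num_threads):
        self.slots = [None] * num_threads

    def write(self, active, value):
        self.slots[active] = value      # only the active slot is driven

    def read(self, active):
        return self.slots[active]


class VerticalPipeline:
    """Pipeline built from multi-slot stage registers; one thread is active at a time."""

    def __init__(self, num_stages, num_threads):
        self.stages = [MultiBitFlipFlop(num_threads) for _ in range(num_stages)]
        self.active = 0

    def clock(self, new_value):
        """Advance the active thread by one stage; return the value leaving the last stage."""
        out = self.stages[-1].read(self.active)
        for i in range(len(self.stages) - 1, 0, -1):
            self.stages[i].write(self.active, self.stages[i - 1].read(self.active))
        self.stages[0].write(self.active, new_value)
        return out

    def switch_thread(self, thread_id):
        """Select another thread; inactive slots keep their contents, nothing is copied."""
        self.active = thread_id


pipe = VerticalPipeline(num_stages=3, num_threads=2)
pipe.clock("a0")
pipe.clock("a1")
pipe.switch_thread(1)                   # thread 0 stalls, for example on a cache miss
pipe.clock("b0")                        # thread 1 uses the same pipeline stages
pipe.switch_thread(0)                   # thread 0 resumes
held_thread0 = [stage.read(0) for stage in pipe.stages]
held_thread1 = [stage.read(1) for stage in pipe.stages]
```

Unlike a save-and-restore context switch, `switch_thread` in this sketch moves no data; the state of the stalled thread simply remains in its own slot at each stage.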
A multiple-bit register flip-flop is an integrated circuit device for synchronization of data in a data path. The device includes a driver and a plurality of storage elements coupled to the driver, the driver driving data to the plurality of storage elements. The plurality of storage elements are coupled to the data path without delaying the data path.
According to one aspect of the processor, a pipeline register for synchronizing data in a data path includes a driver and a plurality of switched storage elements coupled to the driver. The driver is capable of driving a storage element selected by the switch while data in one or more storage elements that are not selected by the switch is held in the storage element. The plurality of storage elements are coupled to the data path without delaying the data path.
The pipeline register employs a method of operation including passing a time pulse, sampling data during the time pulse, and passing the data along a data path. The method further includes selecting a storage element from a plurality of storage elements, and storing the sampled data in a storage element connected to but outside the data path. The plurality of storage elements are capable of storing data for a respective plurality of execution threads.
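The method of operation can be sketched behaviorally as follows. The class `PipelineRegister` and its `pulse` and `select` methods are hypothetical names, and restoring the held value to the output upon selection is an assumption of this sketch rather than a statement of how the circuit operates.

```python
class PipelineRegister:
    """Behavioral model of a pipeline register with switched storage elements (assumed)."""

    def __init__(self, num_elements):
        self.elements = [0] * num_elements   # storage elements beside the data path
        self.selected = 0                    # storage element chosen by the switch
        self.output = 0                      # value presented on the data path

    def select(self, index):
        """Select a storage element; unselected elements hold their data."""
        self.selected = index
        self.output = self.elements[index]   # assumption: held value returns to the path

    def pulse(self, data_in):
        """Sample data during the time pulse, pass it along, and store a side copy."""
        sampled = data_in
        self.output = sampled                    # data continues along the data path
        self.elements[self.selected] = sampled   # copy kept in the selected element
        return self.output


reg = PipelineRegister(num_elements=2)
reg.pulse(1)                      # first thread writes 1
reg.select(1)                     # switch to the second thread's element
out_after_switch = reg.output
reg.pulse(1)
reg.pulse(0)                      # second thread ends with 0
reg.select(0)                     # return to the first thread
restored = reg.output
```

In this sketch the sampled value reaches `output` directly, and the write to `elements` is a side copy, which mirrors the statement that the storage elements are connected to but outside the data path.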
In accordance with an aspect of usage of the integrated circuit device, a processor includes a control logic for executing computational and logic operations and a memory coupled to the control logic. The control logic and the memory include a plurality of flip-flops for synchronization of data in a data path. The flip-flops include a driver and a plurality of switched storage elements coupled to the driver. The driver is capable of driving a storage element selected by the switch, while data in one or more storage elements that are not selected by the switch is held in the storage element. The plurality of storage elements are coupled to, but located outside, the data path.