1. Technical Field
The invention relates to the field of multiprocessor systems, and more specifically to a method, apparatus, and product for providing an efficient virtualized time base in a scalable multiprocessor computer system.
2. Description of Related Art
A symmetric multiprocessing (SMP) data processing system has multiple processor cores that are symmetric such that each processor core has the same processing speed and latency. An SMP system has one operating system that divides the work into tasks that are distributed evenly among the various cores by dispatching one software thread of work to each processor core at a time. Thus, a processor core in an SMP system executes only one thread at a time.
A simultaneous multi-threading (SMT) data processing system includes multiple processor cores, each of which can execute more than one thread concurrently. An SMT system can favor one thread over another when both threads are running on the same processor core.
Known systems can include one or more shared processor cores where the shared processor cores are shared among the various processes that are being executed by the system. The processor core may be an SMT processor core. A shared processor core may be part of a logically partitioned system and shared among the various partitions in the system. The number of virtual partitions and the amount of system capacity allocated to each partition may be defined or modified at boot time by the system operator.
These systems typically include firmware, also called a hypervisor, that manages and enforces the partitioning and/or sharing of all the processor cores in the system. For example, a hypervisor may dispatch a virtual partition to one or more physical processor cores. The virtual partition includes a definition of the work to be done by each physical processor core as well as various settings and state information that are required to be set within each physical processor core in order for the physical processor core to execute the work. Thus, each virtual partition can be a “virtual” SMP system.
In known shared processor systems, the hypervisor supervises and manages the sharing of each physical processor core among all of the logical partitions. The hypervisor assigns a dispatch time slice to each logical partition. The hypervisor will service all of the logical partitions by dispatching logical partitions to the physical processor cores. The hypervisor services the logical partitions by granting time to each logical partition during which the logical partition will be executed on one or more of the physical processor cores. The hypervisor may dispatch more than one logical partition at the same time to different groups of physical processor cores.
Each logical partition is defined by particular configuration data that a physical processor core needs in order to process that logical partition. The configuration data includes particular data, register values, states, settings, and information. All of the configuration data is stored by the hypervisor in the hypervisor's memory. When a particular logical partition is to be dispatched to a physical processor core, the hypervisor will retrieve the configuration data for that partition, restore all the settings to the registers and state in the processor core, and resume processing from the point at which the partition was last suspended. Once the time slice that was granted to that logical partition has expired, the hypervisor will save the current values for all of the configuration data back into its memory to be retrieved at a later time for further processing of the logical partition.
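The save-and-restore cycle described above can be sketched as follows. This is a minimal illustration only, not the firmware's actual implementation; the `Core` and `Hypervisor` classes, the register names `pc` and `r1`, and the dictionary layout are all hypothetical.

```python
class Core:
    """A physical processor core, reduced to a set of register values (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.registers = {}


class Hypervisor:
    """Keeps each partition's configuration data in hypervisor memory."""

    def __init__(self):
        self.saved_config = {}  # partition id -> saved register values

    def define_partition(self, partition_id, initial_registers):
        self.saved_config[partition_id] = dict(initial_registers)

    def dispatch(self, partition_id, core):
        # Restore the saved settings into the core so that the partition
        # resumes from the point at which it was last suspended.
        core.registers = dict(self.saved_config[partition_id])

    def suspend(self, partition_id, core):
        # Time slice expired: save the current values back to memory.
        self.saved_config[partition_id] = dict(core.registers)
        core.registers = {}


hv = Hypervisor()
core_a, core_b = Core("A"), Core("B")
hv.define_partition("lpar1", {"pc": 0x100, "r1": 0})

hv.dispatch("lpar1", core_a)
core_a.registers["pc"] = 0x140   # the partition makes progress on core A
hv.suspend("lpar1", core_a)

hv.dispatch("lpar1", core_b)     # later resumed on a different core
resumed_pc = core_b.registers["pc"]
```

In this sketch the partition continues on core B from the state it reached on core A, because everything it needs travels through the hypervisor's saved copy.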
Each processor core includes its own Time Base (TB) register. The TB register is a free-running 64-bit register that increments at a constant rate so that its value represents relative time. The TB registers are initially synchronized across all processors in an SMP system so that all processors in the system have the same relative time. The time indicated by the TB register is the time that has elapsed since the machine was restarted.
The TB register is a shared resource across all threads in a multi-threaded processor, and the constant rate at which it increments is known to software executing on each thread. The hypervisor also maintains a Real-Time Offset (RTO) for each logical partition. The RTO is the wall-clock time at the time when the TB started incrementing from zero. The RTO remains static until updated by the hypervisor.
The hypervisor can convert the TB value to a current wall-clock time by multiplying the period of the TB increment by the TB value, and adding the RTO. The hypervisor maintains the RTO as part of the configuration data for each partition. When operating system software wants to change its wall-clock time, it informs the hypervisor, and the hypervisor simply modifies the RTO for the partition; it does not modify the hardware TB. Often software tasks only require relative time, which can be determined simply by reading the current value stored in the TB.
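The conversion just described amounts to wall-clock time = RTO + (TB value × TB period). A minimal sketch follows, assuming the RTO is held in seconds and assuming a tick rate of 512 MHz purely for illustration; the function names and the constant `TB_FREQUENCY_HZ` are hypothetical and do not come from any particular hypervisor.

```python
from fractions import Fraction

TB_MASK = (1 << 64) - 1          # the TB is a 64-bit register
TB_FREQUENCY_HZ = 512_000_000    # assumed tick rate, for illustration only


def wall_clock_seconds(tb_value, rto_seconds, frequency_hz=TB_FREQUENCY_HZ):
    """wall-clock time = RTO + TB value * TB period (period = 1 / frequency)."""
    return rto_seconds + Fraction(tb_value & TB_MASK, frequency_hz)


def rto_for_new_wall_clock(new_wall_seconds, tb_value, frequency_hz=TB_FREQUENCY_HZ):
    """RTO the hypervisor would store so the partition sees new_wall_seconds now.

    Only the per-partition RTO changes; the hardware TB is left untouched.
    """
    return new_wall_seconds - Fraction(tb_value & TB_MASK, frequency_hz)


tb_now = 3 * TB_FREQUENCY_HZ                 # TB after three seconds of uptime
before = wall_clock_seconds(tb_now, rto_seconds=1_000)
new_rto = rto_for_new_wall_clock(5_000, tb_now)
after = wall_clock_seconds(tb_now, new_rto)
```

Exact rational arithmetic (`Fraction`) is used here only to keep the example free of floating-point rounding; a real implementation would more likely use fixed-point integer math.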
A logical partition may be dispatched simultaneously to one or more physical processor cores. These processor cores may be located within the same chip, or within different chips in the same machine. In addition, a logical partition may be dispatched to a particular processor core at one time and then dispatched to a completely different processor core at a later time. The two processor cores may be located within the same chip, or within different chips in the same machine.
A partition obtains the current relative time by reading the value that is currently stored in the TB register of the processor core that is currently executing the partition. If a partition is suspended and then resumed on a second processor core, the partition will obtain the current relative time using the value that is stored in the TB register included within the second processor core.
Software running in different logical partitions is allowed to have different wall-clock times, but all threads in all partitions must always observe time to be advancing, both wall-clock time and relative time. Small forward jumps in time are allowable, but backward jumps are not.
The TB registers must be synchronized across all physical processors which are executing the same logical partition, since the operating system or dispatched processing threads could read the TB from the different processors running the partition, in any order, and must always observe time to be advancing. Since all processors running the same partition must have their TB registers synchronized, and the hypervisor can dispatch any logical partition to any physical processors in the machine, the TB registers must be synchronized across all processor cores in the entire machine.
Since the TB registers are synchronized across all processors within the same machine, a suspended logical partition which is resumed on a different processor in the same machine would see a forward jump in time by the amount of time that the partition was suspended. The time slices allocated for dispatching partitions will keep this forward jump acceptably small.
If the TB registers were not synchronized, then resuming a suspended partition on a processor with a different TB value could appear as a large forward or backward jump in time, which is not allowed by the architecture. Because the TB registers in processor cores in different machines generally have different values, a limitation of the prior art is that logical partitions can run only on processors within the same machine. It is desirable to be able to suspend and resume logical partitions on the same or a different machine.
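The difference between the synchronized and unsynchronized cases can be illustrated with a small model. This is a sketch under stated assumptions: `global_tick` stands for an idealized reference clock that no real machine exposes, and the `CoreTimeBase` class and the tick values are hypothetical.

```python
class CoreTimeBase:
    """TB of one core: ticks elapsed since the TB started counting (hypothetical model)."""

    def __init__(self, started_at_tick):
        self.started_at_tick = started_at_tick

    def read(self, global_tick):
        return global_tick - self.started_at_tick


def observed_jump(suspend_core, suspend_tick, resume_core, resume_tick):
    """Change in TB value that the partition sees across a suspend and resume."""
    return resume_core.read(resume_tick) - suspend_core.read(suspend_tick)


# Synchronized machine: both TBs started counting at the same instant.
core_a = CoreTimeBase(started_at_tick=0)
core_b = CoreTimeBase(started_at_tick=0)
sync_jump = observed_jump(core_a, 1_000, core_b, 1_500)

# Unsynchronized core (for example, in another machine) whose TB started later.
core_c = CoreTimeBase(started_at_tick=800)
unsync_jump = observed_jump(core_a, 1_000, core_c, 1_500)
```

With synchronized TBs the partition sees a forward jump equal to the 500 ticks it spent suspended. With the later-started TB it sees a jump of -300 ticks, that is, time appearing to run backward, which the architecture forbids.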
With the prior art, permitting each logical partition to change the value of the TB in all of its processor cores every time it is dispatched is not acceptable. If the TB value in one core of a partition is changed, the TBs in that partition are no longer synchronized to each other. They would need to be resynchronized by suspending all partitions that are executing in the machine, stopping all TBs from advancing, updating the required TB values in the other cores running that partition, restarting all time bases, and then resuming all partitions. This approach is unacceptable for two reasons: it degrades the performance of the machine because processing of the logical partitions is suspended, and it results in an apparent drift in time between this machine and other machines and clocks because the TB registers are frequently paused.
One possible solution would be to have the hypervisor firmware intercept all software accesses to the TB via an interrupt, and apply an additional offset which is maintained by the hypervisor for each partition. However, this would negatively impact performance due to the overhead of handling an interrupt for every access to the TB.
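The interception approach can be sketched as below to show where its cost arises. This is an illustrative model only: the `InterceptingHypervisor` class, its method names, and the way the offset is supplied are hypothetical, and the source does not specify how such an offset would be computed.

```python
TB_MASK = (1 << 64) - 1  # the TB is a 64-bit register


class InterceptingHypervisor:
    """Applies a per-partition offset to every TB read, one trap per read."""

    def __init__(self):
        self.tb_offset = {}   # partition id -> offset in TB ticks
        self.trap_count = 0   # number of interrupts taken so far

    def set_offset(self, partition_id, offset):
        self.tb_offset[partition_id] = offset

    def handle_tb_read(self, partition_id, hardware_tb):
        # Every software access to the TB lands here via an interrupt.
        self.trap_count += 1
        offset = self.tb_offset.get(partition_id, 0)
        return (hardware_tb + offset) & TB_MASK


hv = InterceptingHypervisor()
hv.set_offset("lpar1", 300)
readings = [hv.handle_tb_read("lpar1", tb) for tb in (700, 701, 702)]
```

Three reads of the TB cost three traps into the hypervisor, which is the per-access overhead the paragraph above identifies as the drawback of this approach.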
Therefore, a need exists for a method, apparatus, and product for a virtualized time base in a scalable multiprocessor system which provides precise monotonically non-decreasing time synchronization across all software threads in a logical partition when suspended and resumed on the same or a different machine, while maintaining real time correlation to other machines or clocks.