One or more aspects relate, in general, to processing within a computing environment, and in particular, to extending data range addressing within the computing environment.
Different computer system architectures offer data addressing with different displacements providing varying sizes of directly addressable data ranges. For instance, the Power Architecture, offered by International Business Machines Corporation, Armonk, N.Y., provides data addressing with a 16-bit displacement providing a 64 KB (kilobyte) directly addressable data range, while the x86 architecture, offered by Intel Corporation, provides data addressing with a 32-bit displacement providing a 4 GB (gigabyte) directly addressable data range.
The size of the directly addressable data range impacts and limits software applications. For instance, the size of global data areas, such as a Global Offset Table or Table of Contents used by software applications to locate global variables, is limited by the size of the directly addressable data range. As an example, if 16 bits is the maximum size of an immediate offset from a base register, then the size of the directly usable global data area is limited to 64 KB.
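As an illustration of the arithmetic above, the following minimal sketch computes the size of the directly addressable data range from the displacement width. The function names are hypothetical, and the treatment of the displacement as a signed (sign-extended) value is an assumption made for this sketch rather than something stated above.

```python
from typing import Tuple

KB = 1024
GB = 1024 ** 3


def directly_addressable_bytes(displacement_bits: int) -> int:
    """Number of distinct byte offsets a displacement field of this width can encode."""
    return 1 << displacement_bits


def signed_displacement_bounds(displacement_bits: int) -> Tuple[int, int]:
    """Lowest and highest offset, ASSUMING the displacement is sign-extended."""
    half = 1 << (displacement_bits - 1)
    return -half, half - 1


if __name__ == "__main__":
    print(directly_addressable_bytes(16) // KB)  # 64 (KB, 16-bit displacement)
    print(directly_addressable_bytes(32) // GB)  # 4 (GB, 32-bit displacement)
    print(signed_displacement_bounds(16))        # (-32768, 32767)
```

Under the signed assumption, a 16-bit displacement still covers 64 KB in total, split on either side of the address held in the base register.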
Previously, attempts have been made to overcome this limitation. In one approach, a 16-bit offset is used and an overflow of the global data area (e.g., GOT) is handled as a linker correction step. For instance, when an instruction that accesses a GOT entry overflows the 16 bits, the load that accesses the GOT is replaced by a branch to a subroutine. Each load has its own subroutine, with a hardcoded return to the place where the subroutine was invoked, to improve flexibility and performance. However, even with software optimizations, the cost of this approach can be prohibitive and lead to penalties in excess of 10% of the overall runtime.
In a further approach, a compiler generates a two instruction sequence for all accesses to handle the overflow. As an example, the load (ld) instruction of the following code fragment (where insn 0 and insn 1 represent arbitrary instructions preceding and following the ld instruction) may be replaced by a two instruction sequence.
    insn 0
    ld r4, offset_of_a(r2)
    insn 1
=>
In one example, the ld instruction is replaced by a two instruction sequence of addis and ld, where ld is a load instruction and addis is an add immediate shifted instruction.
    insn 0
    addis r4, r2, offset_of_a@ha   ;; high bits
    ld r4, offset_of_a(r4)         ;; low bits
    insn 1
A two instruction sequence is generated by the compiler for all accesses, since the compiler does not know which GOT slots will be assigned by the linker outside of the 64 KB.
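The split of an offset into high bits (for addis) and low bits (for ld) can be sketched as follows. This is a minimal sketch, assuming the conventional Power definitions of the @ha and @l operators, in which the low 16 bits are sign-extended by the load and the high part is adjusted to compensate; the function names are hypothetical.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed 16-bit quantity."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def split_offset(offset: int):
    """Return (ha, lo): the addis immediate and the ld displacement.

    ASSUMPTION: @ha adds 0x8000 before shifting so that it compensates
    for the sign extension of the low part performed by the load.
    """
    ha = ((offset + 0x8000) >> 16) & 0xFFFF
    lo = sign_extend_16(offset)
    return ha, lo


def effective_offset(ha: int, lo: int) -> int:
    """Offset from the base register reached by addis followed by ld."""
    return (sign_extend_16(ha) << 16) + lo


if __name__ == "__main__":
    ha, lo = split_offset(0x18000)
    print(ha, lo)                           # 2 -32768
    print(hex(effective_offset(ha, lo)))    # 0x18000
```

Because the compiler emits both instructions for every access, any offset within the combined 32-bit range is reachable, whether or not the linker ends up placing the slot within the first 64 KB.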
In yet a further approach, fusion is used, which enables the above sequence to be executed as three internal operations (iops), an example of which is:
    insn 0                                                       iop: insn 0
    addis r4, r2, offset_of_a@ha   ;; high bits \
    ld r4, offset_of_a@l(r4)       ;; low bits  /  -- fuse to    iop: ld r4, offset_of_a(r2)
    insn 1                                                       iop: insn 1
However, this may not be used for various types of instructions, including float, vector and/or store instructions, as examples, where the result of addis in r4 is not overwritten by the second operation. Consequently, the first instruction, which computes an intermediate result, may not be optimized away.
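The restriction described above can be illustrated with a small eligibility check. This is only a sketch under stated assumptions: the Insn record, the FUSIBLE_LOADS set and the can_fuse function are hypothetical names introduced for illustration, and the rule modeled is simply that fusion is possible when the second instruction overwrites the register written by addis.

```python
from dataclasses import dataclass

# Hypothetical set of opcodes eligible as the second instruction of the pair.
FUSIBLE_LOADS = {"ld"}


@dataclass(frozen=True)
class Insn:
    opcode: str  # e.g. "addis", "ld", "lfd", "std"
    target: str  # register written (for a store: the register being stored)
    base: str    # base register used to form the address


def can_fuse(first: Insn, second: Insn) -> bool:
    """True if the addis result is consumed as the base and then overwritten."""
    return (
        first.opcode == "addis"
        and second.opcode in FUSIBLE_LOADS
        and second.base == first.target    # the load uses the addis result
        and second.target == first.target  # and overwrites it
    )


if __name__ == "__main__":
    addis = Insn("addis", target="r4", base="r2")
    print(can_fuse(addis, Insn("ld", target="r4", base="r4")))   # True
    print(can_fuse(addis, Insn("lfd", target="f1", base="r4")))  # False
    print(can_fuse(addis, Insn("std", target="r5", base="r4")))  # False
```

In the float load and store cases, r4 still holds the intermediate address after the second instruction, so the addis has to be executed as its own operation.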