Compilers are software programs that transform computer code in a source language into a different computer language. Many compilers are designed to take source computer code written in a high-level programming language, such as C or C++, and generate corresponding code in a lower-level language, such as assembly language. Assembly language code provides a close representation of the elemental machine language instructions that are executed by a processor.
In most 64-bit computing systems, to add a signed 8-, 16-, or 32-bit integer to a 64-bit integer, the smaller-sized integer must first be converted to 64 bits. This occurs, for example, during the execution of the instructions that correspond to the C code shown in Code Listing One 100 in FIG. 1.
In Code Listing One 100, the comments in lines one through four explain how the variables shown in lines six through eight are declared. Integer i is a signed 32-bit integer. Because 64-bit addressing is used in the example of Code Listing One 100, the address of integer array a is stored as a 64-bit value. To use integer i to index into array a, the value for i must be added to the address of array a during each execution of the while loop. Because a 32-bit value must be converted into a 64-bit value before it can be added to a 64-bit value, integer i must be converted to a 64-bit value during each execution of the while loop. This is reflected in further detail in Code Listing Two 200, shown in FIG. 2.
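The per-iteration conversion described above can be made visible in C by writing the widening step explicitly. The following is a minimal sketch only and is not a reproduction of Code Listing One 100: the function name advance_until_not_less, the loop shape, and the loop condition are assumptions chosen to match the variables a, i, stride, and ref named in this description. The sketch also assumes that i + stride does not overflow.

```c
#include <stdint.h>

/* Hypothetical helper (name and loop shape are assumptions, not FIG. 1).
 * Steps a signed 32-bit index through a 32-bit integer array and returns
 * the first index reached whose element is not less than ref.
 * Assumes i + stride never overflows and every visited index is in bounds. */
int32_t advance_until_not_less(const int32_t *a, int32_t i,
                               int32_t stride, int32_t ref)
{
    for (;;) {
        i += stride;                          /* 32-bit addition on the index */
        int64_t wide_i = (int64_t)i;          /* 32-to-64-bit conversion, repeated on every pass */
        const int32_t *element = a + wide_i;  /* address = base + wide_i * sizeof(int32_t) */
        if (!(*element < ref)) {
            return i;                         /* stop once a[i] >= ref */
        }
    }
}
```

In ordinary C source the conversion is implicit in the expression a[i]; it is written out here only to show where the 64-bit widening occurs in each pass.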
Code Listing Two 200 shows AMD64 assembly code that corresponds to the while loop shown in Code Listing One 100. Lines one through three of Code Listing Two 200 include comments that explain which registers are used to store values that correspond to the variables shown in Code Listing One 100. In line five, a 32-bit addition operation is performed to add the value for signed integer stride (stored in 32-bit register r9d) to the signed integer i (stored in 32-bit register edx). In line six, the movsxd instruction moves the value for signed integer i (stored in register edx) to 64-bit register rdx. The movsxd instruction also sign-extends the most significant bit of the integer i value as stored in edx into the upper 32 bits of rdx. In line seven, the signed 64-bit value of integer i (stored in rdx) is multiplied by four. Four is a scale factor used in this instance because each element of array a is a four-byte (32-bit) integer. The product of this multiplication is then added to the 64-bit base address for array a (stored in rcx). The sum of this addition operation is used as the input address to the memory reference portion (“dword ptr”) of the cmp instruction. The cmp instruction compares the referenced element of array a to the value for integer ref (stored in 32-bit register r8d). In line eight, the jl instruction either terminates the loop or returns the instruction flow to the top of the loop (line four), based on the outcome of the cmp instruction.
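The effect of the sign extension and the scaled address computation described above can be modeled in portable C. The following is a minimal sketch under stated assumptions: the function names sign_extend_32_to_64 and scaled_address are hypothetical, registers are modeled as unsigned bit patterns, and the scale factor of four is taken from the description of line seven.

```c
#include <stdint.h>

/* Hypothetical model of a 32-to-64-bit sign extension: returns the 64-bit
 * bit pattern that results from sign-extending a 32-bit value. */
uint64_t sign_extend_32_to_64(uint32_t low32)
{
    uint64_t wide = low32;                      /* lower 32 bits are copied unchanged */
    if (low32 & UINT32_C(0x80000000)) {         /* bit 31 is the sign bit */
        wide |= UINT64_C(0xffffffff00000000);   /* replicate it into bits 32 through 63 */
    }
    return wide;
}

/* Hypothetical model of the address computation: base + index * 4,
 * evaluated modulo 2^64 as 64-bit address arithmetic is. */
uint64_t scaled_address(uint64_t base, uint64_t index64)
{
    return base + index64 * 4u;
}
```

For example, a 32-bit index holding −2 (bit pattern 0xfffffffe) is extended to 0xfffffffffffffffe, and with a base address of 1000 the scaled address is 992, which is eight bytes below the base.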
Integer wraparound semantics (also referred to as “overflow” semantics) define how an integer behaves when its value is set such that it would exceed its maximum possible value or go below its minimum possible value. According to common integer semantics, adding 1 to an integer when its value is at its maximum possible value will result in the integer being set to its minimum possible value. Similarly, subtracting 1 from an integer when its value is at its minimum possible value will result in the integer being set to its maximum possible value. Most computer language specifications define integer wraparound semantics. Compilers are designed to generate target code that reflects the integer wraparound semantics specified in the source code.
In most computing systems, signed integers are stored using two's complement representation. Using binary positional notation, each bit in a sequence of bits that is allocated to store a number has a weight which is a power of two. The weights increase from right to left. For example, in the binary sequence 0001, the 1 has a weight of 1; in the sequence 0010, the 1 has a weight of 2; in 0100, the 1 has a weight of 4; and so on. In two's complement representation, the most significant bit (the bit furthest to the left in the sequence) has a negative weight. For example, in two's complement representation the sequence 0111 has a decimal value of 7, and the binary sequence 1111 has a decimal value of −1. Binary sequence 1111 has a decimal value of −1 because, according to two's complement representation, the most significant bit in the sequence has a value of −8. When −8 is added to the value of +7 represented by the three least significant bits in the sequence, the resulting value for the sequence is −1.
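The weighting rule described above can be expressed as a short C routine. This is an illustrative sketch; the function name twos_complement_value and its parameters are assumptions introduced here, and the routine supports widths from 1 to 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: interprets the low n bits of `bits` (1 <= n <= 32)
 * as a two's complement number by summing the weight of every set bit. */
int64_t twos_complement_value(uint32_t bits, unsigned n)
{
    int64_t value = 0;
    for (unsigned k = 0; k < n; ++k) {
        if ((bits >> k) & 1u) {
            int64_t weight = (int64_t)1 << k;          /* positional weight 2^k */
            value += (k == n - 1) ? -weight : weight;  /* the most significant bit counts negatively */
        }
    }
    return value;
}
```

With n equal to 4, the sequence 0111 evaluates to 4 + 2 + 1 = 7, and the sequence 1111 evaluates to −8 + 7 = −1, matching the example above.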
For a signed N-bit integer in two's complement representation, the maximum value that the integer can represent is $2^{N-1}-1$. The minimum value that a signed N-bit integer can represent is $-2^{N-1}$. An N-bit two's-complement numeral system can therefore represent every integer in the range $-2^{N-1}$ to $+2^{N-1}-1$.
For unsigned integers, no bits are required to represent a sign. Therefore, every bit used to represent the value of an unsigned integer can be used to represent magnitude. The minimum value unsigned integers can represent is 0, and the maximum value that an unsigned N-bit integer can represent is $2^{N}-1$.
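The ranges given above for signed and unsigned N-bit integers can be computed directly. The following sketch uses hypothetical function names (signed_max, signed_min, unsigned_max) and assumes a width n between 1 and 32 so that every result fits in a 64-bit signed integer.

```c
#include <stdint.h>

/* Hypothetical helpers, valid for 1 <= n <= 32. */

/* Largest value of a signed n-bit two's complement integer: 2^(n-1) - 1. */
int64_t signed_max(unsigned n)   { return ((int64_t)1 << (n - 1)) - 1; }

/* Smallest value of a signed n-bit two's complement integer: -2^(n-1). */
int64_t signed_min(unsigned n)   { return -((int64_t)1 << (n - 1)); }

/* Largest value of an unsigned n-bit integer: 2^n - 1. */
int64_t unsigned_max(unsigned n) { return ((int64_t)1 << n) - 1; }
```

For n equal to 4 these functions yield 7, −8, and 15, respectively; for n equal to 32 they yield the same values as the INT32_MAX, INT32_MIN, and UINT32_MAX constants of the C standard header stdint.h.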
As signed and unsigned integers of the same bit size have different maximum and minimum values, signed and unsigned integers of the same bit size have different wraparound semantics. To illustrate this, FIG. 3 shows integer wraparound behavior in signed 4-bit and unsigned 4-bit integers. Integer 300 is a 4-bit signed integer made up of bits 310, 312, 314, 316. Bit 316 is the most significant bit of integer 300 and bit 310 is the least significant bit of integer 300. The other integers in FIG. 3 (302, 304, 350, 352, and 354, discussed in further detail below) are arranged with the same layout as integer 300, with their most significant bit to the left and their least significant bit to the right.
Integer 300 stores numbers according to two's complement representation. As shown in FIG. 3, the binary value for integer 300 is 0111 (which is 7 in decimal). This is the maximum value that a 4-bit signed integer can represent using two's complement representation.
Integer 302 is also a signed 4-bit integer that uses two's complement representation. The least-significant bit of integer 302 is set to 1 while all of the other bits of integer 302 are set to 0, giving integer 302 a value of 0001 as represented in binary and 1 in decimal. Performing a binary addition of integer 300 and integer 302 results in the binary value of 1000, which is shown in bits 320, 322, 324, 326 of integer 304. Integer 304 is also a signed 4-bit integer that uses two's complement representation. According to two's complement representation, the binary value 1000 indicates decimal value −8. This shows that adding 1 to the maximum possible value stored in a signed 4-bit integer (+7) results in a wrap around to −8, which is the minimum possible value that a signed 4-bit integer is capable of representing using two's complement representation.
Integer 350 is an unsigned 4-bit integer and shows different wraparound/overflow behavior from signed integers 300, 302, 304. Each of the bits 360, 362, 364, 366 of integer 350 is set to 1, giving integer 350 a decimal value of 15. This is the maximum value that a 4-bit unsigned integer can represent.
Integer 352 is an unsigned 4-bit integer. The least-significant bit of integer 352 is set to 1 while all of the other bits of integer 352 are set to 0, giving integer 352 a value of 1.
Performing a binary addition of integer 350 and integer 352 results in a value of 0. This result is shown as a 0 in each of bits 370, 372, 374, and 376 of integer 354. Integer 354 is an unsigned 4-bit integer. Zero is the lowest value that an unsigned 4-bit integer is capable of representing. This shows that adding 1 to the maximum possible value stored in an unsigned 4-bit integer (+15) results in a wrap around to 0, which is the minimum possible value that an unsigned 4-bit integer is capable of representing.
As shown in FIG. 3, signed 4-bit integers wrap around from $2^{3}-1$ to $-2^{3}$, while unsigned 4-bit integers wrap around from $2^{4}-1$ to 0. According to the same principles, signed 32-bit integers wrap around from $2^{31}-1$ to $-2^{31}$, and unsigned 32-bit integers wrap around from $2^{32}-1$ to 0. Expressed in hexadecimal form, signed 32-bit integers wrap around from 0x7fffffff to 0x80000000, and unsigned 32-bit integers wrap around from 0xffffffff to 0x0.
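The wraparound behavior summarized above can be reproduced for any width up to 32 bits. The following is a sketch under stated assumptions: the function names add_unsigned_nbit and add_signed_nbit are hypothetical, both operands are assumed to already lie within the range of the n-bit type, and the arithmetic is carried out in 64-bit integers so that the sketch does not itself depend on how the C implementation handles signed overflow.

```c
#include <stdint.h>

/* Hypothetical helper: a + b in an unsigned n-bit integer (1 <= n <= 32).
 * The result is reduced modulo 2^n. */
uint32_t add_unsigned_nbit(uint32_t a, uint32_t b, unsigned n)
{
    uint64_t modulus = (uint64_t)1 << n;            /* 2^n */
    return (uint32_t)(((uint64_t)a + b) % modulus);
}

/* Hypothetical helper: a + b in a signed n-bit two's complement integer
 * (1 <= n <= 32). Both operands must already be within the n-bit range. */
int64_t add_signed_nbit(int64_t a, int64_t b, unsigned n)
{
    int64_t modulus = (int64_t)1 << n;              /* 2^n */
    int64_t max = modulus / 2 - 1;                  /* 2^(n-1) - 1 */
    int64_t min = -(modulus / 2);                   /* -2^(n-1) */
    int64_t sum = a + b;                            /* exact sum in 64 bits */
    if (sum > max) sum -= modulus;                  /* wrap past the maximum */
    if (sum < min) sum += modulus;                  /* wrap past the minimum */
    return sum;
}
```

With n equal to 4, adding 1 to 7 in the signed case yields −8, and adding 1 to 15 in the unsigned case yields 0, as in FIG. 3. With n equal to 32, the same functions wrap from 0x7fffffff to the value represented by 0x80000000 and from 0xffffffff to 0x0.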
The above-described integer wraparound semantics are preserved when, for example, a compiler generates the assembly code of Code Listing Two 200 from the C code of Code Listing One 100. In Code Listing Two 200, the movsxd instruction of line six is necessary to convert the 32-bit integer (integer i) to a 64-bit integer so as to perform the 64-bit addition inside the loop. To make execution of the loop more efficient, it would be desirable to change the code such that the conversion to 64 bits is not performed during every loop execution. However, no such solutions are available in the current technology. Therefore, a new approach is required for improving the efficiency of loops that include addition of a 64-bit integer and a smaller-length integer, while correctly preserving integer wraparound semantics.
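One reason the conversion cannot simply be removed from the loop is that a counter widened to 64 bits no longer wraps at the 32-bit boundary. The following sketch illustrates this with two hypothetical functions (index_extended_each_pass and index_widened_once, both names introduced here for illustration); it models a wrapping 32-bit addition and is offered as an illustration of the problem, not as a description of any particular compiler.

```c
#include <stdint.h>

/* Hypothetical model of the existing approach: add in 32 bits with
 * wraparound, then sign-extend the 32-bit result to 64 bits. */
int64_t index_extended_each_pass(int32_t i, int32_t stride)
{
    uint32_t wrapped = (uint32_t)i + (uint32_t)stride;  /* sum modulo 2^32 */
    int64_t wide = (int64_t)wrapped;                    /* zero-extended value */
    if (wrapped & UINT32_C(0x80000000)) {
        wide -= (int64_t)1 << 32;                       /* apply the negative weight of bit 31 */
    }
    return wide;
}

/* Hypothetical naive alternative: widen both operands once and add in
 * 64 bits, so that no 32-bit wraparound takes place. */
int64_t index_widened_once(int32_t i, int32_t stride)
{
    return (int64_t)i + (int64_t)stride;
}
```

For small values the two functions agree; for example, both return 8 for i equal to 5 and stride equal to 3. When i is 0x7fffffff and stride is 1, the first function returns −2147483648 while the second returns +2147483648, so the two approaches would reference different array elements.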