The present application relates generally to data processing, and more specifically, to processor architecture. Binary data is organized in memory as 8-bit units called “bytes,” while the registers implemented by a processor may be larger than a single byte. The term “endian” refers to how bytes of a multi-byte element are ordered within memory as data is moved between registers and memory.
Individual bytes of a multi-byte element are generally stored in consecutive memory addresses (e.g., 4 consecutive addresses for a 32-bit element). A big-endian processor stores the most significant byte of the multi-byte element in the lowest address of the consecutive range, and stores the least significant byte in the highest address. In contrast, a little-endian processor stores the least significant byte in the lowest address and the most significant byte in the highest address. Put another way, a little-endian processor stores bytes of increasing numeric significance at increasing memory addresses, while a big-endian processor stores bytes of decreasing numeric significance at increasing memory addresses.
Consider, as an example, the 4-byte element “0A 0B 0C 0D” and a memory range with offsets 0-3. A big-endian processor places the first byte (“0A”) in offset 0, the second byte (“0B”) in offset 1, the third byte (“0C”) in offset 2, and the last byte (“0D”) in the last offset, 3. A little-endian processor uses the reversed order, placing the first byte (“0A”) in offset 3, the second byte (“0B”) in offset 2, the third byte (“0C”) in offset 1, and the last byte (“0D”) in the first offset, 0.
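The two layouts in this example can be reproduced with a short Python sketch. It is an illustration only: the names `value`, `layout`, `big_layout`, and `little_layout` are hypothetical, and the code models memory as a mapping from offset to byte rather than any particular processor.

```python
# The 4-byte element "0A 0B 0C 0D" as a single 32-bit value.
value = 0x0A0B0C0D

def layout(data):
    """Map each memory offset to the byte stored there, as two hex digits."""
    return {offset: f"{byte:02X}" for offset, byte in enumerate(data)}

# Big-endian: most significant byte at the lowest offset.
big_layout = layout(value.to_bytes(4, byteorder="big"))

# Little-endian: least significant byte at the lowest offset.
little_layout = layout(value.to_bytes(4, byteorder="little"))

print("big-endian:   ", big_layout)
print("little-endian:", little_layout)
```

The big-endian mapping places “0A” at offset 0 and “0D” at offset 3, while the little-endian mapping places “0D” at offset 0 and “0A” at offset 3.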
A further complication arises in processing vectors of multi-byte elements. A 128-bit vector could contain a set of eight 2-byte halfwords, or a set of four 4-byte words, or a set of two 8-byte doublewords, or even any combination of these elements that adds up to a total of one quadword (128 bits) in length. A vector of eight halfwords, four words, or two doublewords can be loaded using the same load vector instruction, which loads a quadword. A big-endian processor would most efficiently load the vector as a monolithic quadword in big-endian byte-ordering, having the effect that vector element 0 would be placed into the leftmost element of the target vector register. On the other hand, a little-endian processor would most efficiently load the vector as a monolithic quadword in little-endian byte-ordering, having the effect that vector element 0 would be placed into the rightmost element of the target vector register.
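The effect of a monolithic quadword load on element placement can be sketched in Python as follows. This is a simplified model under stated assumptions, not a description of any specific instruction: the 128-bit register is represented as an integer whose most significant bits are the "left" side, the vector holds four 4-byte words, and the names `load_quadword` and `words_left_to_right` are hypothetical.

```python
import struct

# Four 4-byte vector elements; element 0 sits at the lowest memory address.
elements = [0xA0A1A2A3, 0xB0B1B2B3, 0xC0C1C2C3, 0xD0D1D2D3]

def load_quadword(elements, byteorder):
    """Store the elements to memory in the given byte order, then load all
    16 bytes back as one monolithic 128-bit value in that same byte order."""
    fmt = (">" if byteorder == "big" else "<") + "4I"
    memory = struct.pack(fmt, *elements)
    return int.from_bytes(memory, byteorder)

def words_left_to_right(register):
    """Split the 128-bit register value into 32-bit words, leftmost first."""
    return [(register >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]

be_register = words_left_to_right(load_quadword(elements, "big"))
le_register = words_left_to_right(load_quadword(elements, "little"))

print([hex(w) for w in be_register])  # element 0 is leftmost
print([hex(w) for w in le_register])  # element 0 is rightmost
```

In this model each word keeps its correct numeric value under both byte orderings, but the order of the words within the register is reversed between the two loads.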
As such, big-endian processors will define vector instructions that process vector data assuming vector elements are mapped in the vector register in left-to-right order. Likewise, little-endian processors will define vector instructions that process vector data assuming vector elements are mapped in the vector register in right-to-left order.
Even if a processor supporting both endian modes handles the byte-ordering differences between big-endian and little-endian data, a different problem arises with a class of vector instructions that process vector data and are sensitive to the ordering of the vector elements in the vector registers. Such element-ordering-sensitive vector instructions include (but are not limited to) element permute operations, element extract operations, element insert operations, pack operations, unpack operations, multiply even/odd operations, some cryptographic operations, string operations, encoding operations, decoding operations, and scalar operations. When pairs of vector registers are concatenated to form a double-wide source operand, these operations can be sensitive to the order of these vector registers (i.e., which is concatenated on the left and which on the right).
Such element-ordering-sensitive instructions as implemented on a big-endian processor will not be capable of correctly processing vectors that are loaded using little-endian byte-ordering. Likewise, element-ordering-sensitive instructions as implemented on a little-endian processor will not be capable of correctly processing vectors loaded using big-endian byte-ordering.
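The mismatch can be illustrated with a minimal Python sketch of an element extract operation. The register is modeled as a list of elements written in left-to-right order, and the functions `extract_left_to_right` and `extract_right_to_left` are hypothetical stand-ins for the big-endian and little-endian definitions of such an instruction; they do not correspond to any actual instruction set.

```python
# Logical vector: element 0 is "e0", element 1 is "e1", and so on.
vector = ["e0", "e1", "e2", "e3"]

# Register contents in left-to-right order after each kind of load.
be_loaded = list(vector)            # element 0 in the leftmost position
le_loaded = list(reversed(vector))  # element 0 in the rightmost position

def extract_left_to_right(register, index):
    """Extract as a big-endian processor would define it:
    element numbering starts at the left of the register."""
    return register[index]

def extract_right_to_left(register, index):
    """Extract as a little-endian processor would define it:
    element numbering starts at the right of the register."""
    return register[len(register) - 1 - index]

# Matching load and instruction conventions give the expected element.
matched_be = extract_left_to_right(be_loaded, 0)
matched_le = extract_right_to_left(le_loaded, 0)

# Mismatched conventions select the wrong element.
mismatched_be = extract_left_to_right(le_loaded, 0)
mismatched_le = extract_right_to_left(be_loaded, 0)

print(matched_be, matched_le, mismatched_be, mismatched_le)
```

Requesting element 0 returns element 0 only when the load and the extract use the same element ordering; with mismatched orderings, the same request returns element 3.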