Modern CPUs have memories that are physically organized word-wise. A word consists of multiple bytes, and the number of bytes is usually a power of two. The programming model, however, organizes memory byte-wise. Subword accesses to memory must therefore extract subwords from, and insert them into, the correct position of a memory word.
Words are organized using one of two byte orders: little-endian (least significant byte first, that is, at the lowest address) and big-endian (most significant byte first).
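The two byte orders can be illustrated with a minimal C sketch. It assumes a 32-bit word, and the function names `store_le32` and `store_be32` are hypothetical helpers introduced only for this illustration.

```c
#include <stdint.h>

/* Assumption: a word is 32 bits (4 bytes); mem[0] is the lowest address. */

/* Little-endian: least significant byte goes to the lowest address. */
void store_le32(uint8_t mem[4], uint32_t w) {
    for (unsigned i = 0; i < 4; i++)
        mem[i] = (uint8_t)(w >> (8u * i));
}

/* Big-endian: most significant byte goes to the lowest address. */
void store_be32(uint8_t mem[4], uint32_t w) {
    for (unsigned i = 0; i < 4; i++)
        mem[i] = (uint8_t)(w >> (8u * (3u - i)));
}
```

For the word 0x11223344, the little-endian layout is 44 33 22 11 and the big-endian layout is 11 22 33 44, read from the lowest to the highest address.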
The size of a subword is typically also a power of two. Access to words and subwords is simpler if the word is partitioned into subwords of equal size, so that every access happens at an aligned address.
Positions in the word (subword address) and addresses of the word in memory (memory address or word address) should be distinguished from each other.
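As a rough sketch of this distinction, a byte address can be split into a word address and a subword address with a shift and a mask. The 4-byte word size and the names `WORD_BYTES`, `word_address`, `subword_address` and `is_aligned` are assumptions made for this example only.

```c
#include <stdint.h>

#define WORD_BYTES 4u /* assumed word size in bytes; must be a power of two */

/* Word address: the upper address bits, selecting one memory word. */
uint32_t word_address(uint32_t byte_addr) {
    return byte_addr / WORD_BYTES;
}

/* Subword address: the lower address bits, selecting a position in the word. */
uint32_t subword_address(uint32_t byte_addr) {
    return byte_addr & (WORD_BYTES - 1u);
}

/* An access of `size` bytes (a power of two) is aligned when the
 * byte address is a multiple of `size`. */
int is_aligned(uint32_t byte_addr, uint32_t size) {
    return (byte_addr & (size - 1u)) == 0;
}
```

With these assumptions, byte address 0x1006 lies in word 0x401 at subword address 2. It is aligned for a 2-byte access but not for a 4-byte access.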
Conventionally, subwords are accessed using multiplexers on the read side and demultiplexers on the write side. Bi-endian operation is usually supported by bitwise inverting the subword part of the address. This changes the order of the bytes stored in memory, while whole words are stored identically in both byte orders.
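The conventional scheme can be modeled in software as follows. This is only a behavioral sketch, not a hardware description. It assumes a 32-bit word whose byte lane 0 holds the least significant byte, aligned accesses of 1, 2 or 4 bytes, and hypothetical names (`lane_of`, `read_subword`, `write_subword`). The big-endian mode inverts exactly those subword address bits that select the subword.

```c
#include <stdint.h>

#define WORD_BYTES 4u /* assumed word size in bytes */

/* Lowest byte lane of the selected subword. In big-endian mode the
 * address bits that select the subword are inverted. */
static uint32_t lane_of(uint32_t sub_addr, uint32_t size, int big_endian) {
    uint32_t select_bits = WORD_BYTES - size; /* 3, 2 or 0 for size 1, 2, 4 */
    uint32_t a = big_endian ? (sub_addr ^ select_bits) : sub_addr;
    return a & select_bits;
}

/* Bit mask covering `size` bytes, avoiding an undefined 32-bit shift. */
static uint32_t subword_mask(uint32_t size) {
    return size == WORD_BYTES ? 0xFFFFFFFFu : ((1u << (8u * size)) - 1u);
}

/* Read side: models the multiplexer that extracts the subword. */
uint32_t read_subword(uint32_t word, uint32_t sub_addr, uint32_t size,
                      int big_endian) {
    uint32_t shift = 8u * lane_of(sub_addr, size, big_endian);
    return (word >> shift) & subword_mask(size);
}

/* Write side: models the demultiplexer that inserts the subword and
 * leaves the other byte lanes unchanged. */
uint32_t write_subword(uint32_t word, uint32_t sub_addr, uint32_t size,
                       int big_endian, uint32_t value) {
    uint32_t shift = 8u * lane_of(sub_addr, size, big_endian);
    uint32_t m = subword_mask(size) << shift;
    return (word & ~m) | ((value << shift) & m);
}
```

For a full word (`size` equal to 4) no address bits are inverted, so the word is read and written identically in both modes. Only the mapping of byte and halfword addresses to lanes differs.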
The disadvantage of this approach is that programs using different byte orders to represent data cannot share their (mostly byte-wise organized) data. Therefore, some CPUs provide byte swap operations to load and store data in the other endianness, or even include byte swapping in the load/store path. However, this lengthens the time needed for reads and writes and can thus lengthen the critical path. Furthermore, more gates are required.
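A byte swap of the kind mentioned above can be sketched in portable C as shown below. The names `swap_bytes32` and `swap_bytes16` are hypothetical. Real CPUs and compilers typically offer dedicated instructions or intrinsics for this, which are not shown here.

```c
#include <stdint.h>

/* Reverse the order of the four bytes in a 32-bit word. */
uint32_t swap_bytes32(uint32_t w) {
    return (w >> 24)
         | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u)
         | (w << 24);
}

/* Reverse the order of the two bytes in a 16-bit halfword. */
uint16_t swap_bytes16(uint16_t h) {
    return (uint16_t)((h >> 8) | (h << 8));
}
```

Swapping twice restores the original value, so the same operation converts data in either direction between the two byte orders.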