A typical processing architecture includes a processing element (e.g., a processor core) adapted to execute software instructions that may result in the transfer of data between the processor core and a data register of a peripheral module. In some cases, an architecture may include a higher-bit-width processor core and a peripheral module with a lower-bit-width data register. For example, a particular processing architecture may be designed to include a 32-bit processor core that can execute software for writing 32-bit data values to an 8-bit-wide write register of a peripheral module. In such a case, the software may decompose a 32-bit data value into four 8-bit bytes, and may perform four consecutive write data transfers of one byte each to the peripheral module address. Similarly, in order to read a 32-bit data value from a peripheral module with an 8-bit-wide read register, the software may perform four consecutive read data transfers of one byte each from the peripheral module address, and may concatenate the four bytes to produce the 32-bit data value.
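The byte-wise transfers described above can be sketched in C as follows. This is a minimal illustration, not an implementation taken from any particular architecture: the type periph_t, the functions periph_write8, periph_read8, periph_write32 and periph_read32, and the small FIFO that stands in for the hardware behind the 8-bit data register are all hypothetical. The least-significant-byte-first ordering is likewise an assumption, since the byte order is not specified above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of a peripheral module with a single
 * 8-bit-wide data register. A small FIFO stands in for the hardware
 * behind that register; a real peripheral would be memory-mapped. */
#define PERIPH_FIFO_DEPTH 8u

typedef struct {
    uint8_t  fifo[PERIPH_FIFO_DEPTH];
    unsigned head;   /* index of the next byte to be read */
    unsigned count;  /* number of bytes currently stored  */
} periph_t;

/* One 8-bit write data transfer to the peripheral's data register. */
static void periph_write8(periph_t *p, uint8_t byte)
{
    assert(p->count < PERIPH_FIFO_DEPTH);
    p->fifo[(p->head + p->count) % PERIPH_FIFO_DEPTH] = byte;
    p->count++;
}

/* One 8-bit read data transfer from the peripheral's data register. */
static uint8_t periph_read8(periph_t *p)
{
    assert(p->count > 0u);
    uint8_t byte = p->fifo[p->head];
    p->head = (p->head + 1u) % PERIPH_FIFO_DEPTH;
    p->count--;
    return byte;
}

/* Decompose a 32-bit value into four bytes and perform four consecutive
 * 8-bit writes. Assumption: least significant byte is sent first. */
static void periph_write32(periph_t *p, uint32_t value)
{
    for (unsigned i = 0; i < 4u; i++) {
        periph_write8(p, (uint8_t)(value >> (8u * i)));
    }
}

/* Perform four consecutive 8-bit reads and concatenate the bytes into a
 * 32-bit value, using the same assumed byte order as periph_write32. */
static uint32_t periph_read32(periph_t *p)
{
    uint32_t value = 0u;
    for (unsigned i = 0; i < 4u; i++) {
        value |= (uint32_t)periph_read8(p) << (8u * i);
    }
    return value;
}
```

In this sketch, each call to periph_write32 or periph_read32 costs four separate register accesses plus the shift and mask operations around them, which is the per-transfer overhead at issue when a 32-bit core addresses an 8-bit data register.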
Each access instruction consumes processor core cycles, and therefore performing multiple accesses in order to transfer data between a higher-bit-width processor core and a peripheral module with a lower-bit-width data register is inherently inefficient. However, certain design parameters (e.g., backward compatibility for software executed on new and older processor cores, and the relatively small physical size of peripheral modules with lower-bit-width data registers) continue to compel architecture developers to incorporate slave peripheral modules with lower-bit-width data registers into processing architectures with higher-bit-width processor cores. To increase processing architecture performance, what are needed are methods and apparatus adapted to enable more efficient data transfers between higher-bit-width processor cores and lower-bit-width data registers of peripheral modules, while providing for backward compatibility for software that may be executed on new and older processor cores.