In complex machines that support both Quad Word (QW) and Double Word (DW) stores, the apparent way to improve data-transfer performance has been to use a single QW store to cache in place of two DW stores, thereby doubling the data stored per cycle. In the past, this was done by widening the data-path within the execution unit (E-unit) and also the data-path from the E-unit to the BCE (Buffer Control Element, which provides a level 1 cache). However, widening the data-path is expensive in chip area and complexity, and results in a much larger and more complex CPU. We thought it would be desirable to achieve a QW store with a data-path only a single DW wide, and determined how this could be achieved.