1. Field of the Invention
The present invention relates to the technical field of data processing and, more particularly, to a processor device capable of cross-boundary alignment of plural register data and the method thereof.
2. Description of Related Art
When a processor performs data processing, data alignment may affect the performance of many key operations, such as string and array operations. As shown in FIG. 1, data to be processed, such as ‘ABCDEFGHIJKL’, often crosses the storage boundary. Consequently, before the processor can perform any string or array operation on the data, it must first execute a number of additional operations to restore the data to the aligned format.
To address this problem, a typical scheme loads the data to the processor and then applies various processor instructions to obtain the required data. As shown in FIG. 2, partial data ‘ZABC’ at address 100h is loaded to register R16 and shifted left by eight bits to remove the letter ‘Z’; partial data ‘DEFG’ at address 104h is then loaded to register R17 and shifted right by 24 bits to remove the letters ‘EFG’; finally, an OR operation is applied to registers R16 and R17 and the result is stored in register R16. At this point, register R16 holds the required data ‘ABCD’. By repeating these steps, the data ‘EFGH’ and ‘IJKL’ are loaded to registers R17 and R18.
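The load, shift, load, shift and OR sequence of FIG. 2 can be sketched in C as follows. This is only an illustrative model: the function names `load_word_be` and `load_unaligned_shift_or`, the byte array standing in for memory, and the big-endian byte order (byte 0 is the most significant byte, as the ‘ZABC’ example suggests) are assumptions made for the sketch and do not appear in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load one aligned 32-bit word, assuming big-endian
 * byte order (the byte at the lowest address is the most significant). */
static uint32_t load_word_be(const uint8_t *mem, size_t aligned_addr)
{
    assert(aligned_addr % 4 == 0);
    return ((uint32_t)mem[aligned_addr]     << 24) |
           ((uint32_t)mem[aligned_addr + 1] << 16) |
           ((uint32_t)mem[aligned_addr + 2] <<  8) |
            (uint32_t)mem[aligned_addr + 3];
}

/* Hypothetical model of the typical scheme: two aligned loads, two shifts
 * and one OR yield the 32-bit word that starts at byte address addr. */
static uint32_t load_unaligned_shift_or(const uint8_t *mem, size_t addr)
{
    size_t offset = addr % 4;          /* bytes to discard from the first word */
    size_t base   = addr - offset;     /* aligned address of the first word    */

    uint32_t first = load_word_be(mem, base);          /* e.g. 'ZABC' */
    if (offset == 0) {
        return first;                  /* already aligned; also avoids a 32-bit shift */
    }
    uint32_t second = load_word_be(mem, base + 4);     /* e.g. 'DEFG' */

    first  <<= 8 * offset;             /* drop leading bytes:  'ABC' + 0  */
    second >>= 32 - 8 * offset;        /* keep leading bytes:  0 + 'D'    */
    return first | second;             /* merge:               'ABCD'     */
}
```

With a memory image holding ‘ZABCDEFGHIJKL…’ from the first aligned address, calling `load_unaligned_shift_or` at byte offset 1 yields the word ‘ABCD’, mirroring the contents of register R16 in the description above.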
As described above, if the required length of unaligned data to be loaded is n words (each having 32 bits), the typical scheme requires 5n instructions to describe the load operation (two loads, two shifts and one OR per word) and at least 5n instruction cycles to complete it. This requires a large memory space for storing the program code and also increases the processor load, resulting in poor performance.
To address this problem, U.S. Pat. No. 4,814,976 granted to Hansen, et al. for a “RISC computer with unaligned reference handling and method for the same” performs the alignment while loading the unaligned data, and reads data that crosses the boundary in two accesses. As shown in FIG. 3, data ‘ABC’ at addresses 101h to 103h is loaded to bytes 0, 1 and 2 of register R16. In this case, byte 3 of register R16 is X (don't care). Next, data ‘D’ at address 104h is loaded to byte 3 of register R16. At this point, the data ‘ABCD’ to be processed is in register R16. By repeating these steps, data ‘EFGH’ and ‘IJKL’ are loaded to registers R17 and R18.
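The two-access load of FIG. 3 can be modeled in C as a pair of partial loads that each overwrite only some bytes of the destination register. This is a minimal sketch based solely on the description above, not on the instruction definitions of the cited patent: the names `load_left`, `load_right` and `load_unaligned_two_step` are hypothetical, and the sketch assumes that register byte 0 is the most significant byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical first access: copy the bytes from addr up to the end of its
 * aligned word into the high-order bytes of reg (bytes 0, 1, ...).
 * The remaining register bytes are left unchanged ("don't care"). */
static uint32_t load_left(uint32_t reg, const uint8_t *mem, size_t addr)
{
    size_t count = 4 - addr % 4;       /* bytes before the word boundary */
    for (size_t i = 0; i < count; i++) {
        unsigned shift = 24 - 8 * (unsigned)i;
        reg = (reg & ~((uint32_t)0xFF << shift)) |
              ((uint32_t)mem[addr + i] << shift);
    }
    return reg;
}

/* Hypothetical second access: last_addr is the address of the final byte
 * of the wanted word. Copy the bytes from the start of its aligned word
 * through last_addr into the low-order bytes of reg (..., byte 3). */
static uint32_t load_right(uint32_t reg, const uint8_t *mem, size_t last_addr)
{
    size_t count = last_addr % 4 + 1;  /* bytes after the word boundary */
    for (size_t i = 0; i < count; i++) {
        unsigned shift = 8 * (unsigned)i;
        reg = (reg & ~((uint32_t)0xFF << shift)) |
              ((uint32_t)mem[last_addr - i] << shift);
    }
    return reg;
}

/* Two partial loads into the same register produce the unaligned word. */
static uint32_t load_unaligned_two_step(const uint8_t *mem, size_t addr)
{
    uint32_t reg = 0;                  /* initial register content is irrelevant */
    reg = load_left(reg, mem, addr);           /* e.g. 'ABC' into bytes 0 to 2 */
    reg = load_right(reg, mem, addr + 3);      /* e.g. 'D' into byte 3         */
    return reg;
}
```

Note that both partial loads target the same register, so the second access depends on the result of the first; the sketch makes that dependency visible in the way `reg` is threaded through the two calls.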
As described above, if the required length of unaligned data to be loaded is n words, this scheme needs 2n instructions to describe the load operation and at least 2n instruction cycles to complete it. Because reads and writes are repeated at the same memory location and register, processor pipeline stalls may increase and bus bandwidth is wasted. The delay can be especially noticeable in systems without a cache.
Therefore, it is desirable to provide an improved processor device and method to mitigate and/or obviate the aforementioned problems.