The present invention relates generally to microprocessor or microcontroller architecture, and particularly to an architecture structured to handle unaligned memory references.
In computer architecture over the past decade, RISC (Reduced Instruction Set Computer) devices, in which each instruction is ideally performed in a single operational cycle, have become popular. The RISC architecture has an advantage over computers having standard architectures and instruction sets in that RISC devices are capable of much higher data processing speeds, owing to their ability to perform frequent operations in shorter periods of time. The RISC devices began with 16-bit instruction sets and grew to 32-bit instruction set architectures having graphics capabilities. With such 32-bit instruction set architectures and more complex applications came a requirement for larger memory word sizes, e.g., words two, four, or eight bytes in length (i.e., words of 16, 32, or 64 bits each). However, certain peripheral devices and applications generate or accept data of only one or two bytes. When such short data items are packed together in memory, a longer word assembled from them may begin at an address that is not a multiple of its size, producing an unaligned word reference. Other examples include some compressed data streams, which may pack data in ways that require access to unaligned data.
To understand what an unaligned word reference is, it is first necessary to describe an aligned word reference. If a data object of size N bytes is located at address A, then the object is aligned if A mod N = 0. Table 1 shows examples of aligned and unaligned accesses of data, where the byte offsets are specified for the low-order three bits of the address (Computer Architecture: A Quantitative Approach, John Hennessy and David Patterson, Morgan Kaufmann Publishers, Inc., Copyright 1990, page 96, herein referred to as “Hennessy”).
TABLE 1
Object addressed       Aligned at byte offsets     Unaligned at byte offsets
byte (8 bits)          0, 1, 2, 3, 4, 5, 6, 7      (never)
word (16 bits)         0, 2, 4, 6                  1, 3, 5, 7
long word (32 bits)    0, 4                        1, 2, 3, 5, 6, 7
quad-word (64 bits)    0                           1, 2, 3, 4, 5, 6, 7
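The alignment rule above, A mod N = 0, reduces to a check on the low-order address bits when N is a power of two. The C sketch below is illustrative only. The function names are hypothetical, and it assumes object sizes of 1, 2, 4, or 8 bytes as in Table 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if an object of `size` bytes at byte address
 * `addr` is aligned, i.e. addr mod size == 0 (the rule used for Table 1). */
static bool is_aligned(uint64_t addr, uint64_t size)
{
    return (addr % size) == 0;
}

/* Equivalent test for power-of-two sizes: the object is aligned when the
 * low-order log2(size) address bits are all zero. */
static bool is_aligned_mask(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}
```

For example, is_aligned(6, 2) is true and is_aligned(6, 4) is false, matching the word and long-word rows of Table 1.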
Hence, for a machine capable of handling 4-byte long words, if incoming data is loaded sequentially as 2 bytes followed by 2 more bytes, the resulting 4 bytes of data may straddle a word boundary within memory (e.g., when they begin at byte offset 2), in which case they cannot be retrieved or stored in a single cycle. Thus, some prior art RISC devices either do not accept data in this form, in which case special procedures must be used to ensure that all data is aligned at word boundaries, or they require additional programming that uses at least two consecutive instruction cycles. One way to ensure that all data is aligned on word boundaries, for example, is to pad data of shorter length with extra bits, a technique usually known as bit stuffing. Whether bit stuffing is used or the programming is altered, unaligned references degrade the performance of these prior art RISC devices.
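Whether an access crosses a word boundary can be computed from the address, the access size, and the memory word size. The following C sketch uses hypothetical helper names and assumes that the word size is a power of two and that the access is no larger than one word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if an access of `size` bytes starting at byte
 * address `addr` spans two `word`-byte memory words, so it cannot be done
 * as a single aligned word transfer. Assumes size <= word. */
static bool straddles_word(uint64_t addr, uint64_t size, uint64_t word)
{
    return (addr % word) + size > word;
}

/* Hypothetical helper: number of `word`-byte memory words the access
 * touches (1 for an access contained in one word, 2 if it straddles). */
static unsigned words_touched(uint64_t addr, uint64_t size, uint64_t word)
{
    uint64_t first = addr / word;              /* word holding the first byte */
    uint64_t last = (addr + size - 1) / word;  /* word holding the last byte  */
    return (unsigned)(last - first + 1);
}
```

A 4-byte access at address 2 in a memory of 4-byte words touches two words (words_touched(2, 4, 4) == 2), whereas the same access at address 4 touches only one.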
To handle the loading and storing of unaligned data words in a system, i.e., a data word which straddles a word boundary in memory (Table 1), prior art machines have also used either an alignment network to load or store bytes in a word or a shifter, which shifts the data only in those cases where alignment is required (Hennessy, ibid., pages 95-97).
FIG. 1 illustrates a prior art alignment network 114. In FIG. 1, memory 100 shows eight consecutive bytes (a byte being 8 bits): Y3, Y2, Y1, D4, D3, D2, D1, and X4. Each byte in memory 100 has an address ranging from 0 to 7; for example, address 2 in memory 100 holds Y1. The desired data bytes used in this and the following examples are D4 at address 3, D3 at address 4, D2 at address 5, and D1 at address 6. Each of these desired data bytes is to be loaded into and stored from register R 110, which has four byte positions: P4, P3, P2, and P1. Memory slice 112 of memory 100 shows the desired data byte D4 at address 3. The alignment network 114 can route D4 from memory slice 112 into any of positions P4, P3, P2, or P1 of register R 115; in this case, D4 at address 3 is loaded into position P4 of register R 115. Similarly, desired data bytes D3, D2, and D1 at addresses 4, 5, and 6 of memory 100 are loaded through a similar alignment network into positions P3, P2, and P1 of register R 115, giving register R 110. This type of hardware alignment network 114 can be seen in Intel's 8086 and 8088, which were introduced in the late 1970s. The Intel 8088 was word and byte addressable and used a crossbar switch to swap bytes (Structured Computer Organization, 3rd Edition, Andrew Tanenbaum, Copyright 1990, pages 215-217 and 230-237). Note that the Intel 8088 instruction set had separate instructions for shifting and rotating, since these were considered different operations. For example, shifting left by one bit discards the leftmost bit, while rotating left by one bit cycles the leftmost bit around to the rightmost bit.
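The behaviour of an alignment network such as network 114 can be modelled in software as one byte-lane selector per register position. The C sketch below is an illustration under stated assumptions, not the circuit of FIG. 1: it assumes an 8-byte memory row, big-endian placement (the lowest-addressed byte goes to P4), and an access that stays within the row. It also contrasts an 8-bit shift with an 8-bit rotate. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a byte alignment network: register position
 * P(4-k) is fed by a selector that picks byte lane addr + k of an
 * 8-byte memory row. P4 is the most significant byte of the result.
 * Assumes addr <= 4 so the 4-byte access stays inside the row. */
static uint32_t align_load32(const uint8_t row[8], unsigned addr)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < 4; k++) {
        /* In hardware these four selections happen in parallel. */
        r |= (uint32_t)row[addr + k] << (8 * (3 - k));
    }
    return r;
}

/* Shift left by one bit: the leftmost bit is discarded. */
static uint8_t shl8(uint8_t v)
{
    return (uint8_t)(v << 1);
}

/* Rotate left by one bit: the leftmost bit wraps around to the rightmost. */
static uint8_t rol8(uint8_t v)
{
    return (uint8_t)((v << 1) | (v >> 7));
}
```

With the FIG. 1 layout stored in a row as Y3, Y2, Y1, D4, D3, D2, D1, X4, align_load32(row, 3) places D4, D3, D2, and D1 in P4 through P1. For the byte 0x81, shl8 yields 0x02 while rol8 yields 0x03.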
FIG. 2 illustrates a prior art example of aligning a misaligned data word using shift operations. An example can be seen in U.S. Pat. No. 4,814,976, RISC Computer With Unaligned Reference Handling And Method For The Same, Hansen, et al., issued Mar. 21, 1989 (herein referred to as “Hansen”). The contents of memory 100 at addresses 0 to 3 are loaded into register A 120, positions PA4 to PA1. The contents of memory 100 at addresses 4 to 7 are loaded into register B 130, positions PB4 to PB1. Register A 120 is then shifted left three byte positions, so that D4 is in position PA4. Register B 130 is shifted right one byte position, so that D3 is in position PB3, D2 is in PB2, and D1 is in PB1. The shifted register A 122 is merged 144 with the shifted register B 132 to place the desired data in the proper positions of register R 110. The merge 144 is done either by overwriting positions PA3 to PA1 in register A 122 with positions PB3 to PB1 of register B 132, or by overwriting the appropriate positions in register B 132 with the appropriate positions in register A 122. Alternatively, the merge 144 may copy the contents of PA4 in register A 122 to position P4 in register R 110 and copy the contents of PB3, PB2, and PB1 of register B 132 into positions P3, P2, and P1 of register R 110.
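The shift-left, shift-right, and merge sequence of FIG. 2 can be sketched in C on 32-bit values. This is a minimal illustration under stated assumptions, not Hansen's implementation: byte numbering is big-endian as in FIG. 2 (address 0 lands in PA4), `a` and `b` are the two aligned words surrounding the datum, and `shift_merge32` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical sketch of FIG. 2: `a` is the aligned word holding the first
 * byte of the datum (register A), `b` is the next aligned word (register B),
 * and `offset` = address mod 4 is the datum's byte offset within `a`. */
static uint32_t shift_merge32(uint32_t a, uint32_t b, unsigned offset)
{
    if (offset == 0)   /* already aligned: no shift or merge is needed */
        return a;

    /* Shift A left so its wanted bytes move up to PA4 and below. */
    uint32_t a_shifted = a << (8 * offset);
    /* Shift B right so its wanted bytes move down to PB1 and above. */
    uint32_t b_shifted = b >> (8 * (4 - offset));

    /* Merge: the top (4 - offset) bytes come from A, the rest from B. */
    uint32_t mask = 0xFFFFFFFFu << (8 * offset);
    return (a_shifted & mask) | (b_shifted & ~mask);
}
```

For the FIG. 2 example the offset is 3: register A is shifted left three bytes, register B is shifted right one byte, and the merge takes one byte from A and three from B. The mask mirrors the merge 144 choosing which register supplies each byte position; with logical shifts the vacated positions are already zero, so a plain OR would give the same result here.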
Thus, unaligned words in memory were loaded and aligned in the microprocessor, and aligned words in the microprocessor were unaligned and stored in memory, using either the alignment network 114 of FIG. 1 or the shift-left, shift-right, and merge 144 of FIG. 2. These techniques were used, for example, for 32-bit words loaded and stored in a 32-bit computer architecture. New problems arise in a 64-bit architecture that loads and stores 32-, 16-, and 8-bit data. A 64-bit memory system requires twice as many alignment paths for bytes and half-words as a 32-bit memory system, as well as two 32-bit alignment paths for word accesses. Thus, the alignment network of the prior art becomes a complicated and expensive solution. Also, in FIG. 2, the merge 144 becomes more complicated because it must handle many more don't-care bytes 116 that are shifted into the registers. In addition, prior art such as Hansen, et al. does not disclose how sign extension is performed in going from 32-bit to 64-bit words. Finally, the approach of FIG. 2 requires either two M-bit shifters (or, equivalently, a separate shift left and shift right) or a more complicated M-bit bidirectional shifter. Thus, as computer architectures move from 32 bits to 64 and possibly 128 bits, a better method of handling unaligned data, including proper sign extension, is needed.
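To make the sign-extension requirement concrete, the C sketch below widens a loaded 8-, 16-, or 32-bit value to 64 bits, either with sign extension or with zero extension. It only illustrates what proper sign extension produces; it is not the method of this invention, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low `bits` bits of `v` to 64 bits.
 * Valid for 1 <= bits <= 63 (e.g. 8, 16, or 32 for byte, half-word, or
 * word loads). The xor/subtract form avoids implementation-defined
 * conversions of out-of-range unsigned values to signed types. */
static int64_t sign_extend(uint64_t v, unsigned bits)
{
    uint64_t field = v & ((UINT64_C(1) << bits) - 1); /* keep low `bits` bits */
    uint64_t sign = UINT64_C(1) << (bits - 1);        /* sign-bit position    */
    return (int64_t)(field ^ sign) - (int64_t)sign;
}

/* Hypothetical helper: zero-extend, i.e. clear everything above `bits`. */
static uint64_t zero_extend(uint64_t v, unsigned bits)
{
    return v & ((UINT64_C(1) << bits) - 1);
}
```

Loading the 32-bit pattern 0xFFFFFFFF as a signed word should yield -1 in a 64-bit register (sign_extend(0xFFFFFFFF, 32)), whereas treating it as unsigned yields 4294967295 (zero_extend(0xFFFFFFFF, 32)).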