1. Field of the Invention
The present invention relates to the field of processing devices and, more particularly, to an adder for manipulating unaligned data.
2. Background of the Related Art
The use of semaphore operations in a multiple processor environment is generally known. Where multiple processing devices attempt to acquire a shared device, such as memory, semaphores are used to control access. Without this control, a second processor may acquire the shared device while a first processor is still performing operations with it. Data corruption can result from such conflicts.
To prevent such corruption, a semaphore associated with the shared device is checked by a processor attempting to gain access (Read). If the value of the semaphore indicates that access is permitted (Conditional Modify), the processor updates the semaphore value to indicate to other processors that the shared device is in use (Write). The processor must perform this Read-Modify-Write sequence atomically to guarantee that multiple processors cannot see the device as available at the same time.
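The Read, Conditional Modify, Write sequence above can be sketched in C11 as a single compare-and-swap. This is a minimal illustration of the general technique, not the mechanism of the invention; the type and function names (device_semaphore, device_try_acquire, device_release) are hypothetical, and the encoding 0 = free, 1 = in use is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical semaphore word guarding one shared device.
 * Assumed encoding: 0 means the device is free, 1 means it is in use. */
typedef struct {
    atomic_int value;
} device_semaphore;

static void device_semaphore_init(device_semaphore *s) {
    atomic_init(&s->value, 0);
}

/* Read the semaphore and, only if it says "free", write "in use".
 * Compare-and-swap performs read, test, and write as one atomic step,
 * so two processors can never both see the device as available. */
static bool device_try_acquire(device_semaphore *s) {
    int expected = 0;
    return atomic_compare_exchange_strong(&s->value, &expected, 1);
}

/* Mark the device free again once the owner has finished with it. */
static void device_release(device_semaphore *s) {
    atomic_store(&s->value, 0);
}
```

A processor whose device_try_acquire call returns false would typically retry or do other work; the sketch leaves that policy open.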
Another method of controlling access to a device uses a semaphore as an access-order number. A processor reads the semaphore (obtaining its number), adds a number such as 1 to the value, and writes the result back (creating the number for the next processor). This manipulation of the semaphore must be atomic to guarantee that no two processors obtain the same access number. The processor then checks a separate memory location that contains the number of the processor currently using the desired device. When its number comes up, it may access the device. Once it has completed its accesses, it updates that memory location to point to the next processor.
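This access-order scheme is commonly known as a ticket lock. The minimal C11 sketch below assumes one counter acts as the semaphore that hands out numbers and a second counter is the separate location holding the number currently being served; all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket scheme: next_ticket is the semaphore that hands out
 * access numbers; now_serving names the processor currently using the device. */
typedef struct {
    atomic_uint next_ticket;
    atomic_uint now_serving;
} ticket_lock;

static void ticket_lock_init(ticket_lock *t) {
    atomic_init(&t->next_ticket, 0u);
    atomic_init(&t->now_serving, 0u);
}

/* Read the semaphore and add 1 in one atomic fetch-and-add, so no two
 * processors can draw the same number. Returns the caller's number. */
static unsigned take_ticket(ticket_lock *t) {
    return atomic_fetch_add(&t->next_ticket, 1u);
}

/* True once the caller's number has come up (a processor would spin here). */
static bool my_turn(ticket_lock *t, unsigned ticket) {
    return atomic_load(&t->now_serving) == ticket;
}

/* Called by the current owner when done: point to the next processor. */
static void finish_turn(ticket_lock *t) {
    atomic_fetch_add(&t->now_serving, 1u);
}
```

Here atomic_fetch_add is C11's fetch-and-add: it returns the old value and stores the incremented value in one indivisible step.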
A commonly used semaphore instruction is the fetch-and-add instruction. A fetch-and-add instruction fetches a semaphore value, places a copy of the fetched value in a CPU (central processing unit) register, modifies the semaphore value by adding a number to it, and writes the resulting sum back to the semaphore location, all as a single atomic Read-Modify-Write operation.
FIG. 1 illustrates a typical prior art procedure for performing a fetch-and-add operation when the semaphore resides in a memory device. The semaphore value stored in memory may not fall on processor (CPU) data boundaries; that is, it may be memory aligned but not CPU aligned. When the value is not CPU aligned, the retrieved data must be rotated or shifted until it is CPU aligned (block 10) and then stored in a register 11. The data to be added to the semaphore value is already CPU aligned, since it is typically defined by an immediate operand of the fetch-and-add instruction. An adder 12 adds the two CPU-aligned operands, producing a sum that is also CPU aligned.
The sum, representing the modified semaphore value, must then be returned to the semaphore location. Because the sum is CPU aligned, it may first have to be moved out of CPU alignment before it can be written back to the original memory location. Accordingly, the modified data is rotated or shifted (block 13) to restore the original memory alignment for write back to the memory location.
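The serial prior art path of FIG. 1 (rotate, add, rotate back) can be modeled in software as follows. This is a sketch under stated assumptions: a 64-bit doubleword is taken as the processor data boundary, bytes are numbered big-endian (byte 0 most significant), and the semaphore is an n-byte field at byte offset off with off + n <= 8. The function names are hypothetical, and the model ignores atomicity.

```c
#include <stdint.h>

/* Rotate a 64-bit value right / left by r bits (0 <= r < 64). */
static uint64_t rotr64(uint64_t x, unsigned r) {
    r &= 63;
    return r ? (x >> r) | (x << (64 - r)) : x;
}

static uint64_t rotl64(uint64_t x, unsigned r) {
    r &= 63;
    return r ? (x << r) | (x >> (64 - r)) : x;
}

/* Mask with the low n bytes set (1 <= n <= 8). */
static uint64_t low_bytes_mask(unsigned n) {
    return n >= 8 ? UINT64_MAX : ((uint64_t)1 << (8 * n)) - 1;
}

/* Serial fetch-and-add on the n-byte field at byte offset off of *dword.
 * Returns the fetched (old) value; *dword receives the updated doubleword. */
static uint64_t serial_fetch_and_add(uint64_t *dword, unsigned off,
                                     unsigned n, uint64_t addend) {
    unsigned shift = 8 * (8 - off - n); /* bits to the right of the field */
    uint64_t mask = low_bytes_mask(n);

    /* Block 10: rotate so the field is right-justified (CPU aligned). */
    uint64_t reg = rotr64(*dword, shift) & mask;
    uint64_t old = reg;

    /* Adder 12: add the CPU-aligned immediate, truncated to the field width. */
    uint64_t sum = (reg + addend) & mask;

    /* Block 13: rotate the sum back to memory alignment and merge it in,
     * leaving the neighbouring bytes of the doubleword untouched. */
    uint64_t field_mask = rotl64(mask, shift);
    *dword = (*dword & ~field_mask) | rotl64(sum, shift);
    return old;
}
```

For example, with the doubleword 0x1122334455667788, adding 1 to the 4-byte field at offset 2 returns 0x33445566 and leaves 0x1122334455677788. Each step depends on the previous one, which is the long serial path discussed next.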
As FIG. 1 shows, a typical prior art implementation optimized for minimum area reuses existing CPU resources at the expense of performance, because executing the semaphore requires a long serial path: the data is fetched, rotated, operated upon, rotated again, and finally written back to memory. This serial execution lowers system performance in two ways. Bandwidth through the CPU resources is lost while the semaphore occupies or reserves them, and other processors cannot access the semaphore until the first processor has completed its Read-Modify-Write cycle.
An alternative prior art implementation preserves performance at the cost of increased area. Here, dedicated logic is added to the CPU in place of existing CPU resources. The semaphore is still executed by the same fetch, rotate, operate, rotate, write-back sequence, but the bandwidth lost to occupied CPU resources is eliminated. However, the performance lost because other processors cannot access the semaphore until the first processor completes its Read-Modify-Write cycle remains.
The present invention describes an adder that receives first data from a storage location in which the first data is stored in byte format but is not fully aligned within processor data boundaries for data retrieval. The adder also receives second data whose byte alignment has been adjusted to match the byte alignment of the first data as received by the adder, and it adds corresponding bytes of the first data and the second data. A carry control circuit coupled to the adder determines which bytes are selected to pass a carry from one byte to the next, so that the sum of the two data is calculated correctly.
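A byte-sliced software model may help show why the carry control is needed. In this sketch, which is an assumption rather than the invention's actual circuitry, the second data is shifted into the same byte lanes as the semaphore field, eight 8-bit additions are performed across a 64-bit doubleword (big-endian byte numbering, byte 0 most significant), and a per-byte carry-enable flag lets carries propagate only between bytes inside the field. Bytes outside the field receive zero addend bytes and no carry-in, so they pass through unchanged, and the sum emerges already in memory alignment with no rotate-back step. All names are hypothetical, and the model does not show how the fetched value reaches a CPU register.

```c
#include <stdbool.h>
#include <stdint.h>

#define DW_BYTES 8 /* assumed processor data boundary: one 8-byte doubleword */

/* Place the low n bytes of a CPU-aligned addend into byte lanes
 * off .. off+n-1 (byte 0 is most significant); other lanes get 0. */
static void align_addend(uint64_t addend, unsigned off, unsigned n,
                         uint8_t b[DW_BYTES]) {
    for (unsigned i = 0; i < DW_BYTES; i++)
        b[i] = 0;
    for (unsigned k = 0; k < n; k++)
        b[off + n - 1 - k] = (uint8_t)(addend >> (8 * k));
}

/* Carry control: byte i accepts the carry out of byte i+1 only when both
 * bytes lie inside the field. The field's least significant byte gets no
 * carry-in, and the carry out of its most significant byte is dropped. */
static void carry_control(unsigned off, unsigned n, bool en[DW_BYTES]) {
    for (unsigned i = 0; i < DW_BYTES; i++)
        en[i] = (i >= off) && (i + 1 < off + n);
}

/* Eight 8-bit adders, evaluated from the least to the most significant byte. */
static void bytewise_add(const uint8_t a[DW_BYTES], const uint8_t b[DW_BYTES],
                         const bool en[DW_BYTES], uint8_t out[DW_BYTES]) {
    unsigned carry = 0;
    for (int i = DW_BYTES - 1; i >= 0; i--) {
        unsigned s = a[i] + b[i] + (en[i] ? carry : 0u);
        out[i] = (uint8_t)s;
        carry = s >> 8;
    }
}

/* Add addend to the n-byte field at offset off without rotating the memory
 * data; the result is written back in place, still memory aligned. */
static void unaligned_add(uint8_t dw[DW_BYTES], unsigned off, unsigned n,
                          uint64_t addend) {
    uint8_t b[DW_BYTES];
    bool en[DW_BYTES];
    align_addend(addend, off, n, b);
    carry_control(off, n, en);
    bytewise_add(dw, b, en, dw);
}
```

For example, adding 1 to the 2-byte field 0xFFFF at offset 4 of {0x00, 0x00, 0x00, 0x12, 0xFF, 0xFF, 0xAB, 0xCD} wraps the field to 0x0000 while byte 3 stays 0x12, because carry control blocks the carry out of the field.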