1. Field of the Invention
The present invention relates generally to programmable digital processors and, more particularly, to data storing instructions used in processor systems.
2. Description of the Background Art
Data transfers at arbitrary alignments, or of arbitrary size, are used in the performance of certain software functions. In the area of data communications, such transfers may be encountered, for example, in dealing with sequences of fixed-size cells such as 53-byte ATM cells, with blocks of data forming Reed-Solomon code words that can be of any size between 3 and 255 bytes, or with streams of variable-sized packets, where individual packets may range in size from a few bytes up to a thousand bytes or more.
In a typical programmable digital processor system that includes a memory, the smallest individually accessible unit of data storage in the memory is of a first size (e.g. a byte holding 8 bits), while the primary access mechanism for the memory is able to transfer data into or out of that memory in a single access unit of larger second size (e.g. as a word of 32 bits, or a long-word of 64 bits, etc.). In many such systems, storing data units larger than a byte into memory may be possible at all, or fully efficient, only at certain alignments that depend on the processor's data addressing scheme.
For example, consider a typical system that includes both a processor and a memory to which it is interfaced. For illustrative purposes (though the principles apply independently of the specific details), the processor chosen is a 64-bit machine. That is, the basic data value that it manipulates, and holds within an individual register of the processor, is a 64-bit unit, equivalent to eight 8-bit bytes, or one full storage unit of the memory. The memory is constructed as an array of 64-bit wide (long-word) storage units, where each 64-bit unit can be written to in a single store operation. The memory is also accessible to store an individual byte, or a 16-bit half-word, or a 32-bit word, in a single store operation.
In order to allow most common access patterns to be used without undue complication in the logic, the interface between processor and memory in a system of this type typically provides access only using “natural alignment.” That is, a byte can be freely stored at any of the eight different byte positions in a given 8-byte long-word storage unit, at offsets {0, 1, 2, . . . 7}, while a half-word can be stored only at one of the four half-word-aligned (even) offsets {0, 2, 4, 6}, and a word can be stored only at either of the two word-aligned offsets {0, 4}.
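The natural-alignment rule described above can be sketched as a simple check that the offset is a multiple of the access size. The following is an illustrative sketch only; the function names `is_naturally_aligned` and `aligned_offsets`, and the modelling of the storage unit as 8 byte offsets, are assumptions introduced here and do not describe any particular processor.

```python
UNIT_BYTES = 8  # assumed width of one long-word storage unit, in bytes


def is_naturally_aligned(offset, size, unit=UNIT_BYTES):
    """Return True if a store of `size` bytes at byte `offset` within one
    storage unit is naturally aligned, i.e. the offset is a multiple of
    the access size and the access stays inside the unit."""
    if size not in (1, 2, 4, 8):
        return False  # only byte, half-word, word and long-word are modelled
    if offset < 0 or offset + size > unit:
        return False
    return offset % size == 0


def aligned_offsets(size, unit=UNIT_BYTES):
    """List every naturally aligned offset for the given access size."""
    return [off for off in range(unit) if is_naturally_aligned(off, size, unit)]


if __name__ == "__main__":
    for size in (1, 2, 4, 8):
        print(size, aligned_offsets(size))
```

Under these assumptions, a half-word yields the offsets {0, 2, 4, 6} and a word yields {0, 4}, matching the sets given in the text, while a 4-byte word at offset 3 is rejected.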
As an additional consideration, it is typical for a store instruction that stores a data unit smaller than the full size of the processor's general registers (say, one that stores a half-word of 16 bits, or two 8-bit bytes, from a 64-bit register) to support the storing of only the least-significant such sub-unit in the source register. This restriction is often imposed to reduce complexity in the processor's memory interface circuitry, to prevent the number of distinct instructions from becoming overly large, or both. Considering the latter aspect for the example system, a total of 15 distinct store instructions would be needed to allow all naturally aligned sub-units within a register to be directly stored for all four sizes (byte, half-word, word and long-word, at 8, 4, 2 and 1 possible naturally aligned locations in the source register, respectively). Even then, non-naturally-aligned units within the source register could not be stored directly in the half-word and word-sized cases. In contrast, the restricted case needs only four distinct instructions to support all four sizes of store operation. A programmer can implement any of the other cases by combining instructions (e.g. by using a right-shift instruction followed by a store instruction).
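The count of 15 instructions, and the right-shift workaround, can be illustrated as follows. This is a sketch under stated assumptions: the names `count_store_variants` and `extract_subunit` are hypothetical, a register is modelled as a Python integer, and byte offset 0 is assumed to be the least-significant byte of the register.

```python
UNIT_BYTES = 8  # assumed register width in bytes (a 64-bit register)


def count_store_variants(unit=UNIT_BYTES):
    """For each access size, count the naturally aligned positions of a
    sub-unit of that size within the source register."""
    return {size: unit // size for size in (1, 2, 4, 8)}


def extract_subunit(reg, offset, size):
    """Model 'right shift, then store the least-significant sub-unit':
    shift the sub-unit at byte `offset` down to bit 0, then keep only the
    low `size` bytes, as a restricted store instruction would."""
    shifted = reg >> (8 * offset)            # the right-shift instruction
    return shifted & ((1 << (8 * size)) - 1)  # bytes the store would write


if __name__ == "__main__":
    variants = count_store_variants()
    print(variants, sum(variants.values()))
    print(hex(extract_subunit(0x1122334455667788, 2, 2)))
```

With these assumptions the per-size counts are 8, 4, 2 and 1, which sum to 15, and the half-word at byte offset 2 of `0x1122334455667788` is recovered as `0x5566`.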
In this type of system, as previously indicated, it is commonly required to be able to store a single byte at any byte location within the memory. Such a store (implemented, for example, by a “store byte” instruction) implicitly refers to whichever full-sized (64-bit) storage location in the memory includes the particular byte location, since the basic access unit is 64 bits in width. However, the store operation must be implemented in such a way as to avoid storing any data to other byte locations in the same long-word storage unit. For that reason, many memory systems are implemented using “byte-enable” signals. A “byte-enable” signal is defined for each byte lane over the full width of the memory (e.g., eight signals for an 8-byte (long-word) wide memory). These signals select for each byte lane whether or not the byte at that part of the selected long-word memory storage location will be overwritten with new data supplied via an access path in the corresponding lane when a store operation is performed. Typically, byte-enable signals are generated within memory access logic of the processor in accordance with the specific details of each store operation performed. One byte-enable signal is transmitted over the access path for each data byte, during the operation of a store instruction of the processor as it writes data into the memory.
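The effect of byte-enable signals can be modelled in software as a per-lane mask applied during a full-width write. The sketch below is illustrative only: the memory is modelled as a `bytearray`, the enables as an integer with one bit per byte lane (bit i for lane i), and the names `byte_enables` and `store_with_enables` are assumptions introduced here rather than features of any described hardware.

```python
UNIT_BYTES = 8  # assumed width of the memory, in byte lanes


def byte_enables(offset, size, unit=UNIT_BYTES):
    """Build the enable mask for a store of `size` bytes starting at byte
    lane `offset`; bit i set means lane i will be overwritten."""
    if offset < 0 or size < 1 or offset + size > unit:
        raise ValueError("store does not fit within one storage unit")
    return ((1 << size) - 1) << offset


def store_with_enables(memory, unit_index, data, enables, unit=UNIT_BYTES):
    """Write `data` (one byte per lane) into the selected storage unit,
    overwriting only the lanes whose enable bit is set."""
    base = unit_index * unit
    for lane in range(unit):
        if (enables >> lane) & 1:
            memory[base + lane] = data[lane]


if __name__ == "__main__":
    memory = bytearray(range(16))        # two 8-byte storage units
    data = bytearray(UNIT_BYTES)
    data[3] = 0xAA                       # new byte presented on lane 3
    store_with_enables(memory, 1, data, byte_enables(3, 1))
    print(memory.hex())
```

In this model a "store byte" to address 11 selects storage unit 1 and asserts only the enable for lane 3, so the other seven bytes of that unit are left unchanged.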
In order to accomplish data storage at an arbitrary alignment using such a system (e.g. to store a 4-byte word starting at offset 3), or to store an arbitrarily sized unit of up to the basic storage unit size (e.g. a 5-byte section of data to be stored at offsets 2 through 6), a programmer must develop an algorithm using the available memory store instructions of the processor. One way to do this involves reading data currently stored at a target memory location and merging it with source data to fill in any gaps caused by the arbitrary alignment. The merged data, of the full storage unit size, would then be written to the target memory location. However, such an algorithm may be relatively slow because there may be a delay reading the data from the target location. The algorithm may also be complex and involve several instructions, especially if the size and relative alignment of the storage operation are not fixed in advance. Such an algorithm may also imply certain constraints on the usage of the memory locations in the vicinity of the target memory location to be stored to.
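The read-and-merge approach described above can be sketched as three steps: read the full storage unit, merge the new bytes into it, and write the full unit back. The following is a minimal sketch, assuming the memory is modelled as a `bytearray` and that the data fits within a single 8-byte storage unit; the name `store_unaligned_rmw` is hypothetical.

```python
UNIT_BYTES = 8  # assumed width of one storage unit, in bytes


def store_unaligned_rmw(memory, address, payload, unit=UNIT_BYTES):
    """Store `payload` at byte `address` using read, merge and a single
    full-width write. Assumes the payload lies within one storage unit."""
    offset = address % unit
    base = address - offset
    if offset + len(payload) > unit:
        raise ValueError("payload crosses a storage unit boundary")
    current = bytes(memory[base:base + unit])              # 1. read the unit
    merged = (current[:offset] + bytes(payload)
              + current[offset + len(payload):])           # 2. merge new data
    memory[base:base + unit] = merged                      # 3. full-width write


if __name__ == "__main__":
    memory = bytearray(b"\xff" * 16)
    # A 5-byte section stored at offsets 2 through 6 of storage unit 1.
    store_unaligned_rmw(memory, 10, b"\x01\x02\x03\x04\x05")
    print(memory.hex())
```

Note that step 3 rewrites the neighbouring bytes with the values read in step 1. If anything else were to modify those bytes between the read and the write, that modification would be lost, which is one example of the kind of usage constraint mentioned above.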
In an alternative approach, the data to be stored at the arbitrary alignment could be broken up into multiple smaller parts, each individually sized and re-aligned to meet the constraints of the interface, and stored separately. However, this type of algorithm is also likely to be relatively complex and slow when compared to the case of a naturally aligned storage operation. A software implementation of such an algorithm is therefore likely to be less convenient, and may be undesirable to use as a general mechanism, because of its higher cost.
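The alternative of breaking the data into smaller, naturally aligned parts can be sketched as a greedy decomposition. This is one possible strategy among several, shown for illustration only; the name `split_into_aligned_stores` is hypothetical, and the sketch computes only the (address, size) pairs of the individual stores rather than performing them.

```python
def split_into_aligned_stores(address, length):
    """Decompose a store of `length` bytes at byte `address` into a list
    of (address, size) pairs, each naturally aligned, by greedily taking
    the largest permitted size (8, 4, 2 or 1 bytes) at each step."""
    pieces = []
    while length > 0:
        for size in (8, 4, 2, 1):
            # A piece is usable if it is naturally aligned and does not
            # extend past the end of the data; size 1 always qualifies.
            if address % size == 0 and size <= length:
                break
        pieces.append((address, size))
        address += size
        length -= size
    return pieces


if __name__ == "__main__":
    print(split_into_aligned_stores(3, 4))  # 4-byte word at offset 3
    print(split_into_aligned_stores(2, 5))  # 5 bytes at offsets 2 through 6
```

Under this strategy the 4-byte word at offset 3 becomes three stores (a byte, a half-word and a byte), and the 5-byte section at offsets 2 through 6 also becomes three stores (two half-words and a byte), compared with a single store in the naturally aligned case. Each piece would additionally need to be shifted into the least-significant position of a register before being stored, which adds further instructions.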
Therefore, what is desired is a system and method that significantly reduces the cost and complexity of performing data storage operations at arbitrary alignments, or of arbitrary sizes.