As background, we review several patents and IBM Technical Disclosure Bulletins so that the reader can more quickly follow our discussion and understand how our permutation switch differs. Initially, it will be understood that our invention provides a multifunction permutation switch which utilizes an existing ESA/370 CISC processor with a known predefined instruction set and as such is applicable to the Systems 370-390 in widespread use today by customers of International Business Machines Corporation. Other manufacturers' mainframes also use the IBM known predefined instruction set. In the past, separate hardware was required to provide the functions that our multifunction permutation switch provides; our switch truly supports the "rotate/gather" and "spread" functions required by ICM, CLM and STCM. In addition, in our device, we not only incorporate the "gather" function into the RMU, but partition the RMU into a byte permutation switch and a bit shifter so that data can be aligned and gathered and sent to the data cache in the same cycle.
Turning now to the patents in this art, we note that U.S. Pat. No. 4,569,016 to Hao et al illustrates a rotation/merge unit for a RISC processor in which the developers were free to architect instructions specifically for the rotate/merge unit. This is unlike our multifunction permutation switch which utilizes an existing ESA/370 CISC processor with a known predefined instruction set. Hao et al's RMU possessed no concept of partitioning the RMU into a bytewise permutation switch followed by a bitwise shift. As a result, their store alignment requires rotation by the RMU, latching of the result into a staging register, followed by sending the data to the data cache. It would be desirable to send data to the data cache in the same cycle that alignment is performed, and this is not suggested by Hao et al. By dividing the RMU into a bytewise permutation switch followed by a bitwise shifter, we are able to send data to the data cache in the same cycle that we perform the alignment. Hao et al's RMU possesses a merge function controlled by a mask produced in parallel with the rotation. Data to be merged is contiguous and aligned by the rotator to the position into which it is to be merged. There is no concept of the "gather" or "spread" functions that are supported in our permutation switch. In addition, we not only "gather" data, but rotate the "gathered" data in accordance with aligning the data in a doubleword of storage. Hao et al can rotate data to be inserted into a data word, but cannot "rotate and gather" the data. As a result, they require only a rotator; whereas, we are disclosing a true permutation switch that supports the "rotate/gather" and "spread" functions required by ICM, CLM, and STCM. Hao et al produce a mask for controlling the insertion of rotated data into data by an insert unit. 
This is accomplished by decoding two indices to produce two masks that are merged into the ultimate mask for controlling an insertion unit for zeroing data for shift operations and controlling insertion of rotated data into selected data. We on the other hand use a mask to specify how data is to be "gathered" or "spread" before being inserted into a data word. Our mask is decoded, instead of generated, to produce permutation switch controls. In U.S. Pat. No. 4,569,016 Hao et al provide the shift amount either via an immediate field of the instruction or by selected bits of a GPR. There is no logic used to determine the rotation amount that is required to support alignment of storage data within a doubleword of storage. We, on the other hand, dynamically determine in hardware the rotation amount for three different positions of the storage data within the input registers provided to the permutation switch.
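The dynamic determination of a rotation amount for store alignment can be illustrated with a minimal Python sketch. It is illustrative only and is not the disclosed hardware. The function names, the src_offset parameter (standing in for the position of the store data within the input registers), and the assumption that the operand does not cross a doubleword boundary are all hypothetical choices made for this example.

```python
DW = 8  # bytes in a doubleword of storage

def store_rotate_amount(address, src_offset=0):
    """Byte rotate-right amount that moves data beginning at byte
    src_offset of the 8-byte input to byte (address mod 8) of the
    doubleword. src_offset is a hypothetical parameter."""
    return (address % DW - src_offset) % DW

def rotate_right_bytes(data, n):
    """Rotate a byte string right by n byte positions."""
    n %= len(data)
    return data[-n:] + data[:-n] if n else data

def byte_enables(address, length):
    """Which bytes of the doubleword the store writes (assumes the
    operand does not cross the doubleword boundary)."""
    start = address % DW
    return [start <= i < start + length for i in range(DW)]

# Three bytes, left-justified in the input, stored at an address whose
# low three bits are 5.
data = bytes([1, 2, 3, 0, 0, 0, 0, 0])
amount = store_rotate_amount(0x1005)
aligned = rotate_right_bytes(data, amount)
enables = byte_enables(0x1005, 3)
```

In this sketch only the low-order three address bits and the position of the data in the input select the rotation; the operand length affects only which bytes are written.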
A Japanese abstract, JP-55-72267, appears to differ from our permutation switch. JP-55-72267 considers a device for speeding the stores for STCM. A device to handle ICM and CLM is not part of the device. We disclose a device suitable for supporting mask operations for all three of these ESA/370 mask instructions. The device of JP-55-72267 includes a shifter and a mark generator. The shifter aligns the data for storage while the mask is used to create a write mark to be sent to storage to indicate the bytes within the data being sent to memory that are to be written. The data and the mark are sent in pairs. Thus, the device requires not only the data, but also this mark, to be sent to memory. The shifter does not execute the "rotate/gather" function for the STCM instruction that our permutation switch executes. Instead, it executes only the rotate. The memory is left to perform the "gather". No partitioning of the shifter into a byte portion and a bit portion is presented. In any case, it is clear that the "gather" function of the STCM instruction is not incorporated into the shifter in any fashion. We not only incorporate the "gather" function into the RMU, but partition the RMU into a byte permutation switch and a bit shifter so that data can be aligned and gathered (if required) and sent to the data cache in the same cycle. "Spreading" of data as required by ICM and CLM is not addressed by the device of JP-55-72267.
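For readers less familiar with the mask instructions, the "gather" and "spread" functions can be sketched in Python as follows. This is a behavioral illustration of the architected results of STCM and ICM on a four-byte register with a four-bit mask whose leftmost bit corresponds to the leftmost register byte. It is not a description of the permutation switch hardware, and the function names gather and spread are hypothetical.

```python
def gather(reg_bytes, mask):
    """STCM-style gather: the register bytes selected by the 4-bit mask
    are packed into contiguous bytes, in left-to-right order."""
    return bytes(b for i, b in enumerate(reg_bytes) if mask & (0b1000 >> i))

def spread(reg_bytes, mask, storage):
    """ICM-style spread: contiguous storage bytes are placed into the
    register byte positions selected by the mask; unselected register
    bytes are left unchanged."""
    out = bytearray(reg_bytes)
    src = iter(storage)
    for i in range(4):
        if mask & (0b1000 >> i):
            out[i] = next(src)
    return bytes(out)

reg = bytes([0xAA, 0xBB, 0xCC, 0xDD])
stored = gather(reg, 0b1010)                        # bytes 0 and 2 of the register
loaded = spread(reg, 0b0101, bytes([0x11, 0x22]))   # into bytes 1 and 3
```

A pure rotator cannot produce either result when the mask bits are not consecutive, because the selected bytes must move by different distances. This is why a permutation switch is needed.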
U.S. Pat. No. 4,189,772 to Liptay relates to a device for bypassing, around a cache, multiple sublines from a cache block during a storage access. In the device, there is no concept of executing shift instructions, "rotate/gather" operations, "spread" operations or dynamically determining the rotation amount for a storage alignment within a doubleword that supports multiple positioning of operands fed to the device as are supported by our permutation switch. In fact, the functions required by the bypass unit allow a barrel shifter to be employed instead of a permutation switch. In addition, partitioning of a shifter into byte and bit units, with store aligned data taken from the byte unit and fed to the cache so that alignment and storing can be achieved in one cycle, is not included in U.S. Pat. No. 4,189,772.
Pogue et al in U.S. Pat. No. 4,920,483 describe a memory system for accessing (fetch or store) n contiguous bits whether or not they are aligned at an n bit boundary. As a result, they employ a barrel shifter (rotator) for aligning the bits. Their invention, however, does not support the "rotate/gather", "spread" and merge operations that require our invention to use a permutation switch in the byte unit. In addition, the concept of partitioning the RMU into a byte unit and a bit unit with aligned store data taken from the byte unit to allow one cycle alignment and storing is not included. Finally, their barrel shifter does not support arithmetic or logical shifts as ours does.
U.S. Pat. No. 4,135,242 to Ward et al describes a method and processor scheme to rapidly and cheaply interpret multiple virtual instruction sets, i.e. sets having varying formats that target varying width operands, data paths, and functional units whose widths do not match those implemented in hardware. As part of the architecture scheme, they disclose a bit addressable scratch pad memory followed by an aligner to align the addressed data in the scratch pad to the inputs of the functional units. This aligner uses the scratch pad address and operand length to determine the rotation amount to produce controls for the rotator. The aligner is a rotator, not a permutation switch, since it is not required to execute the "gather/rotate", "spread" or merge operations that we perform. In addition, it does not support logical and arithmetic shifts. Finally, it does not possess the concept of partitioning into byte and bit units discussed earlier to speed up store alignment and storing.
Yamaoka et al in U.S. Pat. No. 4,916,606 describe a speedup mechanism for executing sequential SS instructions in which an operand of the second instruction is modified by the first instruction. In this mechanism, an aligner was used to align an operand from storage with an ALU (for executing decimal arithmetic) and an aligner was used on the output from the ALU to align the result within a storage line. The aligners do not support the "rotate/gather", "spread" or merge operations that our invention executes, so they can use a barrel shifter rather than the permutation switch which we disclose. In addition, the aligners do not implement logical or arithmetic shifts as our invention does. Therefore, there is no concept of partitioning a shifter into a byte unit and a bit unit as has already been discussed.
Peng et al in U.S. Pat. No. 4,864,527 disclose a shifter to be used in floating point execution to normalize final results and to scale numbers with differing exponents to a common exponent before executing floating point operations. This subject is totally different from our invention. There is no concept of the "gather/rotate", "spread" or merge operations that we implement; therefore, the shifter would be implemented as a shifter instead of a bytewise permutation switch. Dynamic generation of a rotation amount for aligning store data within a doubleword is not considered.
U.S. Pat. No. 4,785,393 to Chu et al describes a processor that included a mask/shifter generator and a 64 bit shifter concatenated with an ALU. This invention allowed ALU execution on selected contiguous bytes within a word and merging of the results with either the source or destination operand. No concept of "gather/rotate" or "spread" operations was included; therefore, a permutation switch was not required as is used by our invention. Furthermore, there is no concept of aligning data within a double word of storage or of dynamically generating a rotation amount for this alignment based upon the least significant bits of the effective address and store operand length. Finally, the structure being presented is dubious for use in a high performance processor since it consists of a 64 bit shifter concatenated with a 32 bit ALU.
Boothroyd et al in U.S. Pat. No. 4,598,365 describe a decimal/character functional unit for use in a processor. In their invention, two "aligners" are used to pack operand formats into data upon which an ALU can work. As a result, this "aligner" is in actuality a pack unit and is functionally quite different from our invention.
International Business Machines Corporation publishes a technical disclosure bulletin of inventions known as the "TDB". There Goldberg et al in TDB 05-88 published a mechanism for using Booth encoding of the shift amount to accomplish a left or right logical shift. The masking described in the TDB is to disable at the output those positions into which a zero should be shifted in. In our invention, we also use a masking technique to zero these positions. However, the TDB considers only using the shifter to execute shift operations. There is no concept of supplying and decoding a mask from which "rotate/gather", "spread" and merge operations are executed. There is also no concept of using the shifter to align storage operands within a doubleword of storage; therefore, there is no concept of using the address and storage operand length to control the shifter (let alone supporting multiple positioning of the storage operand data being supplied to the shifter). Lastly, there is no concept of partitioning the shifter into a byte permutation switch and a bit shifter as we do to allow store alignment and storing in one cycle. In fact, the TDB article does not require a permutation switch.
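The general idea of zeroing, by a mask, those positions into which a zero should be shifted can be sketched in Python as a rotate followed by a mask. This is a generic illustration only. It does not reproduce the Booth encoding of the TDB or the controls of our RMU, and the 32 bit width and the function names are assumptions made for the example.

```python
WIDTH = 32
FULL = (1 << WIDTH) - 1  # all-ones word

def rotate_left(x, n):
    """Rotate a WIDTH-bit value left by n bit positions."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & FULL

def shift_left_logical(x, n):
    """Left logical shift: rotate left, then zero the n low-order
    positions, which hold bits that wrapped around."""
    keep = (FULL << n) & FULL
    return rotate_left(x, n) & keep

def shift_right_logical(x, n):
    """Right logical shift: rotate left by WIDTH - n, then zero the
    n high-order positions."""
    keep = FULL >> n
    return rotate_left(x, WIDTH - n) & keep

left = shift_left_logical(0x80000001, 4)
right = shift_right_logical(0x80000001, 4)
```

The sketch assumes a shift amount from 0 to 31. An arithmetic right shift would fill the masked positions with copies of the sign bit instead of zeros.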
Also in the IBM TDBs can be found the publication of Finney et al in TDB 07-86 which considers using a single barrel shifter for normalization, operand alignment (presumably for differing exponents) and for packing and unpacking floating point operands. They have no concept of "rotate/gather", "spread" or merging operations executing in their invention. In addition, they do not propose using the shifter for storage operand alignment with the generation of a dynamic rotate amount from the address and operand length to control their shifter. Because of the above, they do not present a concept of partitioning the shifter into a byte permutation switch and bit shifter as we do and as has already been discussed.
In the IBM TDBs, Brown et al TDB 08-88 also considered a shifter for support of FPU execution. It has the identical considerations and differences as just discussed.
Also in the IBM TDBs are other publications. Funk et al TDB 01-89 published a fixed point shifter modified to accelerate floating point instructions as well. We do not do this. Our FPU (floating point unit) is on a separate chip and has its own dedicated shifter. In IBM TDB 02-78 Liptay et al published a specialized shifter to support the execution of ICM and CLM when mask bits are consecutive. As such, they do not include the "rotate/gather" operation in support of the STCM instruction. They also do not execute the "spread" operation of ICM and CLM, though they do allow for some merging. They do not utilize their hardware for other executions such as shifts and alignment of storage data within a doubleword. As such, they do not partition their special shifter into a byte permutation switch and bit shifter as we do.
Farrell et al in IBM TDB 06-84 published an algorithm for determining the length of transfers that are required to reach an ultimate address boundary. As such, they are determining how to group storage access into smaller packet fetch commands. We use an address and storage length to determine a rotation amount to align store data within a doubleword boundary or multiple word boundary, and need not make a grouping determination.
Angiulli et al in IBM TDB 09-81 published a shifter for supporting shift and add functions to implement decimal multiplies. It also supports zeroing out unrequired bit positions. However, it does none of the "rotate/gather", "spread" or merge operations that we do. Neither is the shifter partitioned into byte and bit sections as ours is.
Schaughency in IBM TDB 02-81 published a partial parity prediction scheme for a rotate/merge unit. The TDB discusses how to determine which bits are lost or picked up to determine parity. The RMU is not described; only its requirements are presented. Therefore, since it publishes a partial parity predict scheme, it has nothing to do with our invention.
Holtz et al in IBM TDB 02-78 published a mechanism for marking which bytes are to be stored within a doubleword. In their TDB article, an aligner within a double word is assumed. It, however, does not support "rotate/gather", "spread", merge or arithmetic or logical shift operations that we support in one common RMU.