The invention relates generally to the field of digital computers and more specifically to functional units for processing predetermined types of instructions. The invention particularly provides a circuit or functional unit for use in connection with execution of an instruction for rearranging bits of a data word in accordance with a mask.
Computers process data in accordance with instructions. One type of instruction which has been proposed is a so-called “sheep and goats” instruction which accepts as operands a data word and a mask word and rearranges the bits of the data word in accordance with the mask word. In the rearranged data word, the bits of the data word in bit positions which correspond to bits of the mask which are clear, or have the value “zero,” are shifted to the “left” end of the rearranged data word with their order being preserved, and the bits of the data word in bit positions which correspond to bits of the mask which are set, or have the value “one,” are shifted to the right end of the data word with their order being preserved. For example, if an eight bit data word has the value “abcdefgh” (where the letters represent binary integers having the value “one” or “zero”), and the mask word corresponds to “10011011,” in the rearranged data word generated when the “sheep and goats” instruction is executed with these as operands, the bits “b,” “c,” and “f,” all of which are in bit positions for which the mask bits are clear would be shifted to the left, preserving their order “bcf,” and the bits “a,” “d,” “e,” “g,” and “h,” all of which are in bit positions for which the mask bits are set would be shifted to the right, preserving their order “adegh,” with the result being the rearranged data word “bcfadegh.” Essentially, the “sheep and goats” instruction results in a rearrangement of bits of a data word into two groups as defined by bits of a mask word, one group (the “sheep”) corresponding to those bits for which the bits of the mask word are clear, and the other (the “goats”) corresponding to those bits for which the bits of the mask word are set, and in addition preserves order in each group.
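By way of illustration only, the rearrangement described above can be modeled in software as follows. This is a minimal behavioral sketch and not the circuit of the invention; the function name `sheep_and_goats`, the `width` parameter, and the convention that the leftmost bit is the most significant bit are assumptions made for the example.

```python
def sheep_and_goats(data: int, mask: int, width: int = 8) -> int:
    """Behavioral model (assumed names and bit numbering, for illustration).

    Bits of `data` whose mask bit is clear (the "sheep") are gathered at the
    left (most significant) end of the result; bits whose mask bit is set
    (the "goats") are gathered at the right end. Order is preserved within
    each group.
    """
    sheep = []
    goats = []
    # Walk the word from the leftmost bit position to the rightmost.
    for pos in range(width - 1, -1, -1):
        bit = (data >> pos) & 1
        if (mask >> pos) & 1:
            goats.append(bit)
        else:
            sheep.append(bit)
    # Reassemble: sheep first (left end), then goats (right end).
    result = 0
    for bit in sheep + goats:
        result = (result << 1) | bit
    return result


# Data word 10110010 stands in for "abcdefgh" (a=1, b=0, c=1, d=1, e=0, f=0, g=1, h=0).
rearranged = sheep_and_goats(0b10110010, 0b10011011)
print(format(rearranged, "08b"))  # 01011010, i.e. "bcf" followed by "adegh"
```

With the mask “10011011,” the sheep “bcf” take the values 0, 1, 0 and the goats “adegh” take the values 1, 1, 0, 1, 0, which gives the rearranged data word 01011010.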
In a variant of the “sheep and goats” instruction, the bits of the rearranged data word in bit positions for which the bits of the mask are either set or clear (but preferably not both) will be set to a predetermined value. Generally, it has been proposed, for example, that the bits of the rearranged data word in bit positions for which the bits of the mask are clear will be set to zero, but the variant may be used with either the “sheep” or the “goats,” and the predetermined value may be either “one” or “zero.”
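The variant can be sketched in the same way. The model below assumes one reading of the passage above, namely that the group of bits selected by clear mask bits (the sheep) is replaced by a predetermined fill value while the goats are gathered, in order, at the right end. The function name `sheep_and_goats_fill` and its `fill` and `width` parameters are hypothetical and are introduced only for the example.

```python
def sheep_and_goats_fill(data: int, mask: int, width: int = 8, fill: int = 0) -> int:
    """Assumed reading of the variant: the sheep are replaced by `fill`.

    Bits of `data` whose mask bit is set (the goats) are gathered, order
    preserved, at the right end of the result. Every remaining position to
    their left is set to `fill` (0 or 1).
    """
    # Collect the goats from the leftmost bit position to the rightmost.
    goats = [(data >> pos) & 1
             for pos in range(width - 1, -1, -1)
             if (mask >> pos) & 1]
    sheep_count = width - len(goats)
    result = 0
    for bit in [fill] * sheep_count + goats:
        result = (result << 1) | bit
    return result


print(format(sheep_and_goats_fill(0b10110010, 0b10011011), "08b"))          # 00011010
print(format(sheep_and_goats_fill(0b10110010, 0b10011011, fill=1), "08b"))  # 11111010
```

Filling the other group, or filling with the opposite value, would follow the same pattern with the roles of the two groups exchanged.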
A “sheep and goats” instruction can find utility in connection with performing various bit permutations; for example, using a mask consisting of alternating set and clear bits will result in a so-called “unshuffle” permutation of a data word. In addition, the variant can be useful in connection with using a set of originally discontiguous bits to perform a multi-way dispatch, or jump, by making the bits contiguous and using the result to form an index into a jump table.
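Both uses can be illustrated with a short sketch. The symbolic helper `sheep_and_goats_symbolic` operates on strings so that the “unshuffle” permutation is visible directly, and `gather_index` models the zero-filling variant to build a jump-table index. Both names, the choice of mask 01000010, and the `handlers` table are hypothetical and serve only as an example.

```python
def sheep_and_goats_symbolic(data: str, mask: str) -> str:
    """Symbolic model: characters under mask '0' go left, under '1' go right."""
    sheep = "".join(d for d, m in zip(data, mask) if m == "0")
    goats = "".join(d for d, m in zip(data, mask) if m == "1")
    return sheep + goats


def gather_index(data: int, mask: int, width: int = 8) -> int:
    """Gather the bits of `data` selected by set mask bits into a contiguous index."""
    index = 0
    for pos in range(width - 1, -1, -1):
        if (mask >> pos) & 1:
            index = (index << 1) | ((data >> pos) & 1)
    return index


# An alternating mask separates the bits in even and odd positions ("unshuffle").
print(sheep_and_goats_symbolic("abcdefgh", "01010101"))  # acegbdfh

# Multi-way dispatch: two discontiguous bits (positions 6 and 1) select one of four handlers.
handlers = [lambda: "case 0", lambda: "case 1", lambda: "case 2", lambda: "case 3"]
selected = gather_index(0b10110010, 0b01000010)
print(selected, handlers[selected]())  # 1 case 1
```

In the dispatch example, bit 6 of the data word is clear and bit 1 is set, so the gathered index is binary 01 and the second table entry is taken.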
The invention provides a new and improved circuit or functional unit for use in connection with execution of an instruction for rearranging bits of a data word in accordance with a mask.
In brief summary, the invention provides a system for rearranging data units of a data word in accordance with a mask word, the mask word having a plurality of mask bits each associated with a data unit, each mask bit having one of a set condition and a clear condition. The system includes an array of interconnected swap modules organized in a series of swap stages, each swap module having two inputs and two outputs. Each swap module is configured to receive at each input a data unit and associated mask bits and couple the data units to the respective outputs in relation to the associated mask bit's condition.
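The general idea of two-input, two-output swap modules arranged in stages can be illustrated by the following software sketch. The interconnection pattern used here (alternating even and odd neighbor pairings, as in an odd-even transposition network) and the swap rule are assumptions chosen for simplicity; the summary above does not specify the topology, and this sketch should not be taken as the arrangement of the invention. The names `swap_module` and `rearrange` are likewise hypothetical.

```python
def swap_module(left, right):
    """Two-input, two-output module; each input is a (data_unit, mask_bit) pair.

    Assumed rule: exchange the inputs only when the left input is a goat
    (mask bit set) and the right input is a sheep (mask bit clear). Inputs
    with equal mask bits are never exchanged, so order within each group
    is preserved.
    """
    if left[1] == 1 and right[1] == 0:
        return right, left
    return left, right


def rearrange(data_units, mask_bits):
    """Pass (data_unit, mask_bit) pairs through a series of swap stages."""
    lanes = list(zip(data_units, mask_bits))
    n = len(lanes)
    # n stages of neighbor exchanges are sufficient to separate the two groups.
    for stage in range(n):
        start = stage % 2  # even stages pair (0,1),(2,3),...; odd stages pair (1,2),(3,4),...
        for i in range(start, n - 1, 2):
            lanes[i], lanes[i + 1] = swap_module(lanes[i], lanes[i + 1])
    return [unit for unit, _ in lanes]


print("".join(rearrange("abcdefgh", [1, 0, 0, 1, 1, 0, 1, 1])))  # bcfadegh
```

Because each data unit travels together with its mask bit, every module can decide locally whether to exchange its inputs, and because modules never exchange two units of the same group, the order within the sheep and within the goats is maintained.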