A bit matrix multiplication (BMM) unit can reorganize data in a single instruction cycle. Many kinds of reorganization are possible, down to rearranging the individual bits of the processed data. Applications of BMM units are described in [Yedidya Hilewitz et al., "Bit Matrix Multiplication in Commodity Processors", IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2008].
In practice, a BMM operation is used with one of its operands set to a constant chosen to define a particular operation on the contents of the other operand. A constant first operand may define a permutation of the rows of the matrix held in the second operand, i.e. a permutation of the words represented by those rows. A constant second operand may define a permutation of the columns of the matrix held in the first operand, i.e. a permutation of bits that follows the same pattern in every row of the matrix.
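The two uses of a constant operand can be sketched with a small software model. This is a minimal sketch, assuming 8x8 bit matrices multiplied over GF(2) (AND as product, XOR as sum), with each row stored as an 8-bit integer whose bit k holds column k; the helper `perm_matrix` and the sample data are hypothetical and not taken from the cited article. With a permutation matrix as the first operand, row i of the result is row p[i] of the second operand; as the second operand, it moves column k of the first operand to column p[k] in every row.

```python
# Hypothetical software model of an 8x8 BMM over GF(2).
# Each matrix is a list of 8 rows; each row is an 8-bit int, bit k = column k.
N = 8

def bmm(a, b):
    """Return a*b over GF(2): row i is the XOR of the rows of b
    selected by the set bits of row i of a."""
    result = []
    for row in a:
        acc = 0
        for k in range(N):
            if (row >> k) & 1:
                acc ^= b[k]
        result.append(acc)
    return result

def perm_matrix(p):
    """Hypothetical helper: permutation matrix whose row i has a single 1 in column p[i]."""
    return [1 << p[i] for i in range(N)]

data = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF]
reverse = [7, 6, 5, 4, 3, 2, 1, 0]

# Constant first operand: permutes the rows (words) of the second operand.
rows_reversed = bmm(perm_matrix(reverse), data)

# Constant second operand: moves column k of the first operand to column p[k],
# applying the same bit permutation to every row.
bits_reversed = bmm(data, perm_matrix(reverse))
```

Here `rows_reversed` holds the rows of `data` in reverse order, while `bits_reversed` holds each row of `data` with its bits reversed. A hardware unit may number bits in the opposite order, which would change the constants but not the principle.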
However, a BMM unit becomes less efficient when a reorganization mixes data taken from several matrices.
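One way to illustrate this limitation, under the same hypothetical 8x8 GF(2) software model (an illustration only, not a method from the cited article): each BMM operation takes its data from a single operand matrix, so interleaving rows of two data matrices needs one BMM per source matrix plus an extra operation to merge the partial results. The helper `select_matrix` and the sample matrices are assumptions made for this sketch.

```python
# Hypothetical 8x8 BMM over GF(2); each row is an 8-bit int, bit k = column k.
N = 8

def bmm(a, b):
    """Return a*b over GF(2): row i is the XOR of the rows of b
    selected by the set bits of row i of a."""
    result = []
    for row in a:
        acc = 0
        for k in range(N):
            if (row >> k) & 1:
                acc ^= b[k]
        result.append(acc)
    return result

def select_matrix(src):
    """Hypothetical helper: row i has a single 1 in column src[i],
    or is all zeros when src[i] is None (produces an empty row)."""
    return [0 if s is None else 1 << s for s in src]

a = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
b = [0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27]

# Goal: interleave the first four rows of a and b as a0, b0, a1, b1, ...
s_a = select_matrix([0, None, 1, None, 2, None, 3, None])
s_b = select_matrix([None, 0, None, 1, None, 2, None, 3])

# Two BMM operations (one per source matrix), then an XOR to merge them;
# the empty rows of each partial result leave room for the other one.
mixed = [x ^ y for x, y in zip(bmm(s_a, a), bmm(s_b, b))]
```

In this model, `mixed` is `[0x10, 0x20, 0x11, 0x21, 0x12, 0x22, 0x13, 0x23]`, obtained with three operations instead of one.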