Single instruction, multiple data (SIMD) is a technique in which one controller controls multiple processors so that the same operation is performed on each element of a group of data (also referred to as "vector data"), thereby implementing spatial parallelism. SIMD units that support this technique are widely integrated in existing high-performance vector processors. In multimedia, graphics, and digital signal processing applications, an SIMD unit needs to be able to permute vector data in order to maximize the efficiency of parallel data processing.
In the prior art, an SIMD unit generally permutes vector data by using a crossbar. For example, FIG. 1 is a schematic structural diagram of an 8×8 all-route crossbar. In FIG. 1, the element on each output line may come from any input line. Therefore, the input elements can be permuted in any order by using the all-route crossbar together with the control logic of a controller. Because the implementation logic of a crossbar is extremely complex, and implementing a single crossbar generally requires a large number of wires and a large area, the bit width of vector data supported by the crossbar in the permutation unit of an existing vector processor (that is, the total width of the elements that the crossbar can permute in parallel) is only 32 × 8 bits = 256 bits. In addition, to reduce the wiring and the area occupied by the crossbar, the crossbar is generally implemented as a customized transistor-level circuit so as to obtain a relatively regular topology.
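The routing behavior of an all-route crossbar can be modeled in a few lines of software. The sketch below is a behavioral model only, not a circuit description. It assumes that the controller's control logic is represented by a hypothetical `select` vector, in which `select[i]` is the index of the input line routed to output line `i`. Because each output line chooses independently, this model also allows one input to be broadcast to several outputs, which is a superset of pure permutation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def crossbar_permute(inputs: Sequence[T], select: Sequence[int]) -> List[T]:
    """Behavioral model of an N x N all-route crossbar (illustrative only).

    select[i] is the index of the input line routed to output line i; it
    stands in for the control logic of the controller (an assumption).
    """
    n = len(inputs)
    if len(select) != n:
        raise ValueError("need exactly one select value per output line")
    for s in select:
        if not 0 <= s < n:
            raise ValueError(f"select value {s} is outside 0..{n - 1}")
    # Every output line may take its element from any input line.
    return [inputs[s] for s in select]


# Example: an 8x8 crossbar (as in FIG. 1) reversing eight input elements.
print(crossbar_permute(list("abcdefgh"), [7, 6, 5, 4, 3, 2, 1, 0]))
```

With 8-bit elements, a 256-bit vector corresponds to 32 input lines and 32 output lines in this model.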
However, as the bit width of the vector data to be permuted continues to increase, implementing the crossbar becomes more complex; that is, more wires and a larger area are needed. Consequently, the crossbar becomes difficult to implement even with a customized transistor-level circuit. Therefore, the crossbar is poorly suited to permuting vector data that has a relatively large bit width.
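To illustrate why the cost grows quickly with bit width, the sketch below uses a simplified first-order model. This model is an assumption and is not taken from this document: an all-route crossbar with N lanes of w-bit elements has N × N crosspoints, each of which switches one w-bit lane, giving N² × w bit-level switch points. Real wiring and area also depend on layout and circuit style.

```python
def crossbar_switch_points(total_bits: int, element_bits: int = 8) -> int:
    """First-order cost estimate (assumed model): lanes^2 * element_bits."""
    if total_bits % element_bits:
        raise ValueError("total width must be a multiple of the element width")
    lanes = total_bits // element_bits
    # Each of the lanes x lanes crosspoints switches one element_bits-wide lane.
    return lanes * lanes * element_bits


for width in (256, 512, 1024):
    print(width, crossbar_switch_points(width))
# 256 8192
# 512 32768
# 1024 131072
```

Under this model, doubling the vector width at a fixed 8-bit element size quadruples the number of switch points. This is consistent with the observation that a crossbar becomes impractical for wide vectors.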