At present, many communication standards coexist, such as Global System for Mobile communication (GSM), Universal Mobile Telecommunications System (UMTS), Wireless Local Area Network (WLAN), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA) and Long Term Evolution (LTE). Multimode-compatible mobile terminal chips for these standards have conventionally been designed as Application-Specific Integrated Circuits (ASICs), an approach that suffers from defects such as large chip area and lack of flexibility. Software-Defined Radio (SDR) is a promising technology for solving these design problems of multimode-compatible mobile terminal chips. A programmable vector processor is a core architecture of the SDR technology. To support multimode baseband processing, a vector processor must be able to perform several giga-operations per second and, because it is part of a mobile terminal, must do so within a power dissipation of several hundred mW.
The operation unit is the core computational part of the vector processor. Its performance determines the performance of the whole processor, and its power dissipation accounts for nearly half of the power dissipation of the processor. The design and implementation of this part are therefore critical.
There are many structures for the vector arithmetic logic unit (ALU). Depending on its structure, a vector ALU may implement general multiplication, addition and multiply-add operations, and may also implement complex multiplication, complex addition, complex multiply-add and the butterfly operations of a specific Fast Fourier Transform (FFT). However, the existing general schemes can only perform the butterfly operations of a radix-2 FFT and cannot directly perform the butterfly operations of a radix-3 FFT. Alternatively, the butterfly operations of a radix-3 FFT may be completed by a combination of sets of complex addition and complex accumulation instructions, but this increases the number of instructions, reduces computational efficiency, and makes programming more difficult, thereby lowering programming efficiency. Analysis of the above existing technologies shows that the main reason is that, when a traditional vector ALU performs complex butterfly operations, it cannot flexibly negate the results of multiplication. Consequently, only a fixed multiply-add and multiply-subtract can be performed on each butterfly branch, and as a result only the butterfly operations of a radix-2 FFT can be completed.
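The difference between the two butterflies can be illustrated with a short numerical sketch. It is not the circuit or instruction set of any particular vector ALU; the function names (`radix2_butterfly`, `radix3_butterfly`, `dft`) are hypothetical and chosen only for illustration. The radix-2 butterfly needs one product that is added on one branch and subtracted on the other. The radix-3 butterfly, when expanded into real arithmetic, needs the same products with a sign that differs per branch and per real or imaginary part.

```python
import cmath
import math


def radix2_butterfly(a, b, w):
    """Radix-2 butterfly: one product, added on one branch, subtracted on the other."""
    p = w * b
    return a + p, a - p


def radix3_butterfly(x0, x1, x2):
    """Radix-3 butterfly (3-point DFT) expanded into real multiplications.

    Uses W3 = exp(-2*pi*j/3) = c - j*s with c = -1/2 and s = sqrt(3)/2,
    and the fact that W3**2 is the complex conjugate of W3.
    """
    c, s = -0.5, math.sqrt(3) / 2

    # Terms shared by the two non-trivial branches.
    base_r = x0.real + c * (x1.real + x2.real)
    base_i = x0.imag + c * (x1.imag + x2.imag)

    # Products whose sign must change from branch to branch.
    p1r, p1i = s * x1.imag, s * x1.real
    p2r, p2i = s * x2.imag, s * x2.real

    y0 = x0 + x1 + x2
    # Branch 1: +p1r, -p2r in the real part; -p1i, +p2i in the imaginary part.
    y1 = complex(base_r + p1r - p2r, base_i - p1i + p2i)
    # Branch 2: every one of those signs is flipped.
    y2 = complex(base_r - p1r + p2r, base_i + p1i - p2i)
    return y0, y1, y2


def dft(xs):
    """Direct DFT, used here only as a reference to check the butterflies."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(xs))
        for k in range(n)
    ]
```

In this sketch the four `s`-scaled products appear in both non-trivial branches of the radix-3 butterfly, but with opposite signs, which is why a unit limited to a fixed add-or-subtract pattern per branch would need extra instructions to reproduce it.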