In the mobile communication market, 2nd-Generation (2G), 3rd-Generation (3G) and 4th-Generation (4G) systems coexist, and communication protocols continue to evolve. Given so many communication standards and such rapidly updated protocol versions, adopting Software Defined Radio (SDR) technology to implement baseband signal processing is a promising development direction. SDR technology adopts a Digital Signal Processor (DSP) soft baseband solution, which offers greater flexibility and a shorter time to market than a conventional Application Specific Integrated Circuit (ASIC) implementation. 4G Long Term Evolution (LTE) and the subsequent Long Term Evolution-Advanced (LTE-A) technologies both take Orthogonal Frequency Division Multiplexing (OFDM) and Multiple Input Multiple Output (MIMO) as their main technical characteristics, and as a result baseband signal processing involves a large number of matrix operations. It is therefore appropriate to adopt a vector DSP, i.e., a DSP with a vector operation function, to implement LTE and LTE-A baseband signal processing. On this basis, improving the performance of the vector processor becomes key to the performance of a soft baseband chip.
In the past, processor performance was mainly improved by increasing the main frequency of the processor. However, as processor frequencies have risen, this approach has become difficult to sustain, because a further frequency increase brings extremely high power consumption and heat dissipation costs without necessarily achieving an obvious improvement in processor performance. At present, processors are developing in a multi-core direction: multiple processor cores are integrated in one processor and work in parallel, which remarkably improves the performance of the processor without increasing its frequency. The widespread use of multi-core desktop processors from Intel and multi-core mobile processors from ARM shows that multi-core technology is an effective method for improving processor performance. The most common form of parallelism for a multi-core processor is task-level parallelism. As illustrated in FIG. 1, a single-core processor can only execute tasks one after another, while a multi-core processor can allocate tasks that have no dependency on one another to different cores and thereby noticeably improve performance. This form of parallelism is not applicable to a task that depends on its preceding task, that is, a task whose input is the output of the preceding task. For tasks with such a dependency, pipeline parallelism may be adopted instead: different tasks are allocated to different cores for separate processing, and the tasks are executed in a pipelined manner.
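The difference between independent and dependent tasks can be illustrated with a small timing model. The following is only a sketch under stated assumptions, not a description of any particular processor: the function names, the greedy assignment of each task to the earliest-free core, and the abstract time units are all hypothetical and chosen for illustration.

```python
import heapq


def serial_time(durations):
    """Single core: tasks run one after another."""
    return sum(durations)


def task_parallel_time(durations, num_cores):
    """Independent tasks on several cores (assumed greedy scheduling).

    Each task is handed to whichever core becomes free first.
    """
    core_free_at = [0] * num_cores
    heapq.heapify(core_free_at)
    for d in durations:
        start = heapq.heappop(core_free_at)   # earliest-free core
        heapq.heappush(core_free_at, start + d)
    return max(core_free_at)


def dependent_chain_time(durations):
    """Each task consumes the output of the previous one.

    No two tasks can overlap, so extra cores do not shorten the chain.
    """
    finish = 0
    for d in durations:
        finish += d
    return finish


if __name__ == "__main__":
    tasks = [3, 3, 3, 3]  # hypothetical task durations
    print(serial_time(tasks))            # single core
    print(task_parallel_time(tasks, 2))  # two cores, independent tasks
    print(dependent_chain_time(tasks))   # dependent tasks, any core count
```

With four tasks of three time units each, the model gives 12 units on a single core, 6 units on two cores when the tasks are independent, and 12 units again when each task depends on the previous one, which is why task-level parallelism alone does not help dependent tasks.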
Turning specifically to a vector processor, a given task does not consist solely of vector operations, because certain parameters must be calculated before the vector operations can be performed. These parameter calculations are scalar operations, so a task may be divided into two parts: scalar operations and vector operations. If pipeline parallelism can be implemented between the scalar operations and the vector operations, the performance of the vector processor may be remarkably improved. At present, multi-core processors mainly use shared memory to implement inter-core communication. If an existing multi-core technology is used to run the scalar operations and the vector operations in parallel, the parameters are stored in the shared memory, and, owing to the memory access speed and the time overhead of multi-core synchronization, task switching takes a certain amount of time, which offsets part of the benefit created by pipeline parallelism.
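The effect of the hand-off overhead on a scalar/vector pipeline can be sketched with a simple two-stage timing model. This is an illustrative sketch only: the function names and the parameter `handoff_t` are hypothetical, and the model assumes that the scalar core never stalls (parameters can always be buffered) and that the shared-memory access and synchronization cost is paid on the vector core once per task.

```python
def single_core_time(num_tasks, scalar_t, vector_t):
    """One core performs the scalar part, then the vector part, of every task."""
    return num_tasks * (scalar_t + vector_t)


def pipelined_time(num_tasks, scalar_t, vector_t, handoff_t=0):
    """Two-stage pipeline: a scalar core feeds parameters to a vector core.

    handoff_t is an assumed per-task cost for fetching the parameters
    from shared memory and synchronizing the two cores.
    """
    scalar_done = 0   # time at which the scalar core finishes the current task
    vector_free = 0   # time at which the vector core becomes idle
    for _ in range(num_tasks):
        scalar_done += scalar_t
        # The vector stage needs both the parameters and an idle vector core.
        start = max(scalar_done, vector_free)
        vector_free = start + handoff_t + vector_t
    return vector_free


if __name__ == "__main__":
    n, s, v = 4, 2, 3  # hypothetical task count and stage times
    print(single_core_time(n, s, v))   # no pipelining
    print(pipelined_time(n, s, v))     # ideal pipeline, no hand-off cost
    print(pipelined_time(n, s, v, 1))  # pipeline with hand-off cost
```

For four tasks with a scalar time of 2 and a vector time of 3, the model gives 20 time units without pipelining, 14 units for an ideal pipeline, and 18 units once a hand-off cost of 1 unit per task is included, so most of the pipelining gain is consumed by the inter-core overhead under these assumed numbers.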