Recent semiconductor fabrication technology has made it possible to produce a semiconductor chip with complex functions, in which the various circuits constituting an electronic system are included on a single chip to considerably reduce the system size. Namely, the various independent devices constituting the control circuits of a computer system have conventionally been mounted on the main board, but recently developed technology makes it possible to obtain the various control devices as a single chip, so that the size of the overall circuitry, and thus of the electronic system, may be reduced. This also reduces both the production cost and the operating power consumption. Moreover, such development of semiconductor fabrication technology has affected the microcontroller unit (MCU) and the digital signal processor (DSP).
It is generally known in this art that a DSP is suitable for signal processing programs that execute repeated computations, and an MCU for control programs. However, a single information-handling task is frequently composed of both control parts and computational parts (especially repeated computations), and using both a DSP and an MCU complicates the handling of such a task. One problem is that two different instruction sets must be employed, which complicates the mutual interface, such as data exchange. Another problem is that the two instruction streams complicate the development environment, making program debugging difficult. A third problem is that synchronization between the DSP and the MCU is not easy. A fourth problem is an increase in circuit size. To cope with these problems, an orthogonal instruction set of the RISC (Reduced Instruction Set Computer) type may be used to achieve good compilation.
DSPs and MCUs have different structural characteristics. A DSP is designed to suit algorithms that process voice signals, audio signals, etc. at high speed, and therefore has a very irregular hardware structure and a non-orthogonal instruction set. This makes it very difficult to develop a high performance compiler for DSP programs. In most cases, application programs for a DSP are developed using an assembler.
Compared to MCU applications, DSP applications suffer from poor locality of data owing to the continuous input and output of data, so that it is difficult for a DSP to have a memory hierarchy consisting of registers, cache, virtual memory, etc. The architecture of a DSP is generally based on memory rather than registers, employing the so-called Harvard architecture, in which the program memory and the data memory are separated from each other and have respective program and data buses in order to facilitate data access.
A DSP designed for implementing a filter frequently performs multiplications using two operands, and therefore employs a modified Harvard architecture that either uses the program memory bus as a data memory bus or has two data memory buses. Such a DSP employs fewer general purpose registers than an MCU, but employs special purpose registers to facilitate special data processing.
If an overflow occurs in an MCU, a trap is usually generated. A DSP, however, is provided with a guard bit to prevent the overflow, or saturates the result without delay when an overflow occurs.
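The saturating behavior described above can be illustrated with a minimal C sketch. The function names `sat_add16` and `acc40_to_sat32` are hypothetical, and the 40 bit accumulator (32 result bits plus 8 guard bits, modelled here in an `int64_t`) is an assumption chosen for illustration rather than the layout of any particular DSP.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating 16 bit addition: the result is clamped to the 16 bit range
 * instead of wrapping around or raising a trap. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* widen so the sum cannot overflow */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical 40 bit accumulator (assumed: 32 result bits + 8 guard bits).
 * The guard bits absorb intermediate overflow; the value is saturated only
 * when it is finally reduced to 32 bits. */
int32_t acc40_to_sat32(int64_t acc)
{
    /* the modelled accumulator must fit in 40 signed bits */
    assert(acc >= -(1LL << 39) && acc < (1LL << 39));
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}
```

In this sketch, a call such as `sat_add16(30000, 10000)` yields the largest 16 bit value rather than a wrapped negative number, which is the behavior a trap-based MCU would have to emulate in an exception handler.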
A DSP either has no cache or has a cache structured differently from that of an MCU, because the execution time would otherwise vary with the cache hit rate. For the same reason, a DSP hardly allows a memory abort caused by a page miss in virtual memory. Further, because a DSP is used for digital signal processing applications, it has many special instructions suited to them, while an MCU is versatile.
A 16 bit DSP has instructions of various lengths, such as the basic 16 bit instruction, the 32 bit instruction, and the 36 to 40 bit instruction containing the guard bit. In particular, it has ALU (Arithmetic Logic Unit) instructions for high speed operation that simultaneously execute an ALU operation and a multiplication, as well as instructions for barrel shifting.
A DSP is structured to fetch two operands in a single cycle, simultaneously executing an ALU operation and loading data from or storing data into memory. Its hardware has a loop function to support repeated operations, together with a modulo addressing function. Thus, a single DSP instruction may perform multiple operations in a single cycle, achieving high speed digital signal processing.
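As an illustration of modulo addressing and the repeated loop, the following C sketch emulates in software what a DSP would do in hardware. The names `delay_line`, `delay_push` and `fir_step`, and the filter length `TAPS`, are hypothetical and chosen only for this example; the explicit `%` operations stand in for the address wrap-around that a DSP address generator would perform without extra instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 4  /* hypothetical filter length */

/* Circular delay line holding the most recent TAPS input samples. */
typedef struct {
    int16_t samples[TAPS];
    size_t  head;           /* index of the most recent sample */
} delay_line;

/* Store a new sample, wrapping the write index (software modulo addressing). */
void delay_push(delay_line *d, int16_t x)
{
    assert(d != NULL);
    d->head = (d->head + 1) % TAPS;
    d->samples[d->head] = x;
}

/* One filter output: the loop body fetches two operands (a coefficient and
 * a sample), multiplies them and accumulates the product. */
int32_t fir_step(const delay_line *d, const int16_t coeff[TAPS])
{
    assert(d != NULL && coeff != NULL);
    int32_t acc = 0;
    size_t idx = d->head;                 /* start at the newest sample */
    for (size_t k = 0; k < TAPS; ++k) {
        acc += (int32_t)coeff[k] * d->samples[idx];
        idx = (idx + TAPS - 1) % TAPS;    /* step back one sample, wrapping */
    }
    return acc;
}
```

On an MCU, each index update and wrap costs additional instructions per iteration, whereas the hardware loop and modulo addressing functions described above remove that overhead from the inner loop.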
On the other hand, a 32 bit MCU is based on 32 bit data and performs data access in units of bytes. It employs an orthogonal instruction set using many general purpose registers to support the compiler. For example, it supports branch instructions, relative addressing, index addressing, bit manipulation, etc. Further, it strongly supports exceptional events such as interrupts and traps.
As described above, DSPs and MCUs each have inherent characteristics suited to their respective applications. They are especially applied in the form of a single chip embedding a core, memory and peripheral equipment. Electronic systems such as cellular phones, video cameras, multimedia systems, etc. are equipped with both processors. A DSP is used to process digital signals, such as voice signals in a cellular phone and audio and video signals in a video camera. Thus, each processor serves its own function. Recently, 16 bit fixed point DSPs and 32 bit MCUs have become widely used in various electronic systems.
The complicated multifunctional electronic systems developed recently must handle large amounts of data, especially in real time, so that the general purpose MCU used for control needs to perform many repeated computations, and the DSP used for signal processing also needs control functions. Namely, the need to process greatly increased amounts of data in a short time while performing the corresponding control functions requires that a DSP have the control functions of an MCU, and that an MCU have the high speed data operation functions of a DSP.
As a DSP application program grows in size to include a control program together with the signal processing program, it becomes difficult to develop with an assembler. Moreover, it is important that the application program be made available promptly for rapidly developing related technology. In this respect, a high level programming language may be a proper means of facilitating the development of application programs for a DSP, and the architecture of the DSP must be changed so as to reduce the size of the code generated by compiling the application program.
Meanwhile, the fact that the MCU needs instructions for performing data processing functions like those of a DSP, and the DSP needs instructions for control functions like those of an MCU, blurs the boundary between a DSP and an MCU. Recently, studies have been made to integrate the two processors in a single chip, resulting in unified processors integrating both an MCU and a DSP. Such unified processors are generally classified as follows:
One approach is to provide a processor with both MCU instructions and DSP instructions. This may in turn be achieved in several ways. The first is to add a coprocessor to an MCU so that the coprocessor performs the DSP instructions. The second is to design the MCU instruction set to include the DSP instructions. The third is to design the DSP instruction set to be orthogonal enough to also serve as MCU instructions. Although these ways provide a single chip processor having both DSP and MCU functions, the problem is that the processor is achieved by using two separate instruction sets. When the MCU and DSP instructions are not properly unified through the coprocessor, it is hard to determine whether an MCU or a DSP instruction should be used when writing code that belongs to the intermediate zone between MCU and DSP. In addition, the existence of both MCU and DSP instructions increases the number of instructions, so that it becomes difficult for the compiler to compile all of the instructions effectively. In particular, it is important to use instructions with a small bit-width to limit the size of the program stored in the embedded memory of the processor. However, it is hard to minimize the code size with small bit-width instructions because of the many kinds of instructions contained in the DSP and MCU instruction sets. Moreover, the separate DSP and MCU instructions make it difficult to use the resources of the processor effectively. Namely, the resources for the DSP and those for the MCU are not shared, and are therefore wasted.
A second approach is to make a processor using the superscalar method, which executes multiple instructions in a single cycle, or the VLIW (Very Long Instruction Word) method. With ordinary MCU instructions, performance is in many cases improved by reducing the program code size. In the case of a DSP, however, it is more effective to optimize the instructions contained in repeated loops than to reduce the code size, because a DSP program usually contains many repeated loops, which make up only part of the overall program code but account for a considerable part of the execution time. In a DSP, a considerable part of the instruction set is allocated to instructions frequently used in such loops, such as MAC (multiplication and accumulation). The MAC instruction is designed to execute an addition, a multiplication and two data loads. In many cases, such performance-improving DSP instructions correspond to a combination of several simple MCU instructions. For an MCU, the instructions constituting a large part of the code need to be designed more effectively simply to reduce the code size, whereas a DSP may be improved in performance by adding instructions that execute several operations in a single cycle. Considering both cases, the instruction set may be designed with simple instructions like those of a RISC, several of which are executed in a single cycle, which both reduces the program code size of an MCU and improves the performance of a DSP. This leads to another kind of unified processor, achieved by applying the superscalar or VLIW method.
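The correspondence between one MAC instruction and several simple instructions can be sketched in C as follows. The function name `dot_product` is hypothetical, and the loop body is deliberately split into the four elementary operations named above; on a DSP these would be issued as a single MAC per iteration, while a simple RISC style instruction set would need roughly one instruction for each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product written so that each step of a MAC is visible:
 * two data loads, one multiplication and one addition per iteration. */
int32_t dot_product(const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        int16_t a = x[i];             /* data load 1 */
        int16_t b = y[i];             /* data load 2 */
        int32_t p = (int32_t)a * b;   /* multiplication */
        acc += p;                     /* addition (accumulation) */
    }
    return acc;
}
```

Because such a loop is a small part of the code but a large part of the execution time, executing its four operations in one cycle improves performance far more than shrinking the loop's code size would.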
In the superscalar method, the hardware (processor) schedules which instructions are executed simultaneously and in what order. A processor according to this method is programmed as an MCU, so that at least four instructions must be executed at once in order to achieve the performance of a DSP with MCU instructions. Performing this scheduling imposes a very large overhead on the hardware, so that such a processor can hardly be achieved with low cost and low power consumption.
The VLIW method does not have the drawback of the superscalar method because the scheduling is done by the compiler, so that small hardware may execute several instructions at once. However, a VLIW instruction has unscheduled portions that contain NOP (No-OPeration) instructions, which increases the code size and thus the bit-width of the program memory. Hence, where the program memory is not included in the processor chip, an external memory bus must be constructed with a large bit-width, increasing the production cost.
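The code size cost of NOP padding can be made concrete with a small C sketch. The issue width `SLOTS_PER_BUNDLE` and the function name `count_nop_slots` are assumptions made only for this illustration; the sketch simply counts how many slots of each fixed-width instruction word the compiler could not fill with a useful operation.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_BUNDLE 4  /* hypothetical number of operation slots per VLIW word */

/* Given the number of useful operations the compiler scheduled into each
 * VLIW word, return the total number of slots that must be filled with NOPs. */
size_t count_nop_slots(const unsigned ops_per_bundle[], size_t bundles)
{
    size_t nops = 0;
    for (size_t i = 0; i < bundles; ++i) {
        /* a word cannot hold more operations than it has slots */
        assert(ops_per_bundle[i] <= SLOTS_PER_BUNDLE);
        nops += SLOTS_PER_BUNDLE - ops_per_bundle[i];
    }
    return nops;
}
```

For instance, four words holding 4, 1, 2 and 3 useful operations occupy 16 slots of program memory, of which 6 carry only NOPs; this unused space is what drives up the code size and the required program memory width.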
Besides all the problems accompanying the design of a unified processor by conventional methods, a further problem arises from the structural difference between the data buses of a DSP and an MCU. Generally, an MCU is suitable for applications requiring many real-time computations and controls because it has a large memory region and performs 32 bit integer operations. For a DSP, however, 16 bit fixed point operations are sufficient. For this reason, a DSP with a 16 bit bus width is unified with an MCU with a 32 bit bus width to achieve the unified processor. In this case, the final bus is constructed to accommodate 32 bits, so that the DSP uses only 16 bits of the 32 bit bus, wasting the remaining resources.