(a) Field of the Invention
The present invention relates to a method for designing a system LSI (large-scale integrated circuit) and, more particularly, to a method for designing a system LSI that allows greater freedom in choosing the band of the communication interface through which data are transferred between a basic-instruction processor and a dedicated-instruction processor. The present invention also relates to a recording medium storing software for designing such a system LSI.
(b) Description of the Related Art
In recent years, system LSIs, or system-on-chip LSIs, which realize all the circuit functions of a system on a single chip, have come into increasing use. In addition, as semiconductor fabrication processes have become finer, the number of logic gates integrated on a system LSI has increased dramatically, and the processing performance of the system LSI has risen accordingly.
The system LSIs are used for a variety of processing tasks such as image processing, encryption, filtering, and decoding, wherein the input/output signals have a variety of formats, and a variety of algorithms are used for processing these signals. In addition, the system LSIs have a variety of throughputs depending on the performance required for the processing. The algorithms used in system LSIs have recently become increasingly complicated, and the processing throughput has improved significantly.
For the reasons described above, recent system LSIs are designed for processing dedicated to the signals used therein.
FIG. 5 shows a flowchart for designing a system LSI by using a software-hardware-collaborated (SHC) design system. In general, in the SHC design system using behavior synthesis, an algorithmic description D1, written in a general-purpose language such as the C language or in another higher-level language such as a dedicated language used for operational-level description, is translated into a lower-level-language description, such as an RTL (register transfer level) description D5 for logic synthesis. The RTL description can be converted into hardware by using hardware resources including a memory such as a register, and a processor such as an adder.
The algorithmic description D1 describes all the functions of the system LSI. If most of the functions are implemented by hardware, the system LSI has a larger circuit scale and is thus expensive, although it has a higher processing throughput. On the other hand, if most of the functions are implemented by software, the system LSI has a lower throughput, although it has a smaller circuit scale. Accordingly, in the initial stage of the design, as shown in FIG. 5, the functions described in the algorithmic description D1 are divided into two groups in consideration of the constraints (or settings) for the system LSI, including circuit scale, processing throughput, cost, etc. (step S201). The two groups include a first group implemented by hardware resources, and a second group implemented by software resources.
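The trade-off considered in step S201 may be illustrated by the following minimal sketch. It is not part of the conventional design system described here; the function names, gate counts and cycle counts are hypothetical figures chosen only for illustration, and the exhaustive search is an assumed stand-in for the designer's judgement. Each function is assigned either to hardware (costing gates, but running in fewer cycles) or to software (costing no gates, but running in more cycles), and the assignment with the smallest circuit scale that still meets both constraints is selected.

```python
from itertools import product

# Hypothetical per-function estimates (illustrative numbers only):
# name -> (gates if implemented in hardware,
#          cycles if implemented in hardware,
#          cycles if implemented in software)
FUNCTIONS = {
    "filter": (40_000, 10, 400),
    "decode": (60_000, 20, 900),
    "control": (15_000, 5, 50),
}

def partition(functions, max_gates, max_cycles):
    """Return (hardware_group, gates, cycles) for the smallest-gate
    division meeting both constraints, or None if no division fits."""
    names = sorted(functions)
    best = None
    # Enumerate every division of the functions into hardware/software groups.
    for choice in product((True, False), repeat=len(names)):
        hardware = {n for n, in_hw in zip(names, choice) if in_hw}
        gates = sum(functions[n][0] for n in hardware)
        cycles = sum(
            functions[n][1] if n in hardware else functions[n][2]
            for n in names
        )
        if gates <= max_gates and cycles <= max_cycles:
            if best is None or gates < best[1]:
                best = (hardware, gates, cycles)
    return best

print(partition(FUNCTIONS, max_gates=100_000, max_cycles=500))
```

With these assumed figures, moving only the "decode" function into hardware is the cheapest division that satisfies both the gate limit and the cycle limit. Tightening the gate limit far enough leaves no feasible division, which corresponds to the case where the constraints themselves must be revisited.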
If all the hardware resources were designed from scratch in every detail, the development of the hardware resources would incur a higher cost and take a longer time. For this reason, hardware macros designed in the past and stored as hardware intellectual property (IP) are reused, so that the stored resources are utilized as much as possible. The hardware IP is generally designed in consideration of reusability and versatility, and thus is installed with ease in the structure of the system LSI.
The hardware is designed as a combination of a basic-instruction processor 11, such as a microprocessor for processing versatile calculations, and a dedicated-instruction processor 12 dedicated to specific processing such as input/output processing. A previously designed hardware IP is generally used as the basic-instruction processor 11. The basic-instruction processor 11 is designed by a division of the semiconductor maker other than the division which develops the system LSI, the other division being dedicated to designing processors by using a register transfer design technique. The description for the basic-instruction processor 11 is presented together with a simulation description D6a.
After the basic-instruction processor 11 to be used in the system LSI is determined, the process advances to design for the dedicated-instruction processor 12 (step S202). Since the dedicated-instruction processor 12 and the basic-instruction processor 11 communicate data therebetween via buses, the design for the bus interface of the dedicated-instruction processor 12 is performed consistent with the bus specification of the basic-instruction processor 11. The design for the dedicated-instruction processor 12 is expressed in a high-level language.
From the description of the dedicated-instruction processor 12, an RTL description D5b of the dedicated-instruction processor 12 and a simulation description D6b are obtained (step S203). It is judged in step S204 whether or not the RTL description D5b of the dedicated-instruction processor 12 can realize a circuit scale within a setting previously established as a constraint for the system LSI. If the circuit scale is above the setting, the process returns to step S201, wherein the design for the system LSI is iterated from the start. If it is judged in step S204 that the circuit scale is within the setting, the RTL description D5a of the basic-instruction processor 11 and the RTL description D5b of the dedicated-instruction processor 12 are coupled via the RTL description D5c of the buses (step S205).
Subsequently, in the software design, application programs and a device driver for operating thereon the dedicated-instruction processor 12 are defined in a high-level language (step S206). The application programs and the device driver are compiled by using a compiler (compiler/assembler/linker) and translated into the machine-language instructions which the basic-instruction processor 11 or the dedicated-instruction processor 12 can directly understand (step S207). The machine-language instructions thus obtained (step S208), the simulation descriptions D6a and D6b and the bus simulation description D6c are combined to form an overall simulation description, which is input to an instruction set simulator, wherein a simulation is performed in an environment similar to the environment of the system LSI (step S209).
By simulating the hardware and software in the instruction set simulator, it is verified that there is no error in the design of the hardware and software. In step S209, the time-domain throughput performance and power consumption of the system LSI are also measured under the actual service conditions, and it is judged whether or not the throughput thus measured satisfies the performance required of the system LSI (step S210). If the result of the judgement is negative, the process returns to step S201, wherein the division of functions into software and hardware groups is corrected. If the result of the judgement in step S210 is affirmative, the design for the system LSI is completed, and logic synthesis of the RTL description D5a of the basic-instruction processor 11, the RTL description D5b of the dedicated-instruction processor 12, and the bus RTL description D5c is then performed to determine the actual gate circuit of the system LSI.
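The iterative nature of the flow of steps S201 to S210 may be summarized by the following minimal sketch. The step numbers in the comments follow the description above; everything else (the function `design_flow`, the candidate names, and the gate and throughput figures) is a hypothetical stand-in introduced only for illustration, with dictionary lookups assumed in place of the actual behavior synthesis and instruction set simulation.

```python
def design_flow(candidates, estimate_gates, simulate_throughput,
                max_gates, min_throughput):
    """Try candidate hardware/software divisions in order.

    Returns (accepted division, number of divisions tried), or
    (None, number tried) when no candidate passes both judgements.
    """
    tried = 0
    for division in candidates:                      # step S201
        tried += 1
        gates = estimate_gates(division)             # steps S202 to S203
        if gates > max_gates:                        # step S204: too large
            continue                                 # return to step S201
        throughput = simulate_throughput(division)   # steps S205 to S209
        if throughput >= min_throughput:             # step S210
            return division, tried
    return None, tried

# Hypothetical estimates standing in for synthesis and simulation results.
GATES = {"mostly_hw": 120_000, "mostly_sw": 30_000, "balanced": 70_000}
THROUGHPUT = {"mostly_hw": 90, "mostly_sw": 20, "balanced": 60}

result = design_flow(["mostly_hw", "mostly_sw", "balanced"],
                     GATES.get, THROUGHPUT.get,
                     max_gates=100_000, min_throughput=50)
print(result)  # ('balanced', 3)
```

In this assumed example the first division is rejected in step S204 for exceeding the circuit scale, the second is rejected in step S210 for insufficient throughput, and only the third passes both judgements, so the whole flow is traversed three times.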
FIG. 6 shows the configuration of the system LSI obtained by the conventional design as described heretofore. The system LSI includes the basic-instruction processor 11 and the dedicated-instruction processor 12, which are interconnected via an instruction control bus 71 and a data communication bus 72. The basic-instruction processor 11 includes: an instruction control block including an instruction decoder 21, a data memory controller 22, and an execution controller 23; a data memory block 24; an execution block 25; an instruction control interface 41a; a data memory interface 42a; an execution interface 43a; and an instruction fetch register 20. The dedicated-instruction processor 12 includes: an instruction control block including an instruction decoder 31, a data memory controller 32, and an execution controller 33; a data memory block 34; an execution block 35; an instruction control interface 41b; a data memory interface 42b; and an execution interface 43b.
An instruction data bus 51a and an instruction control bus 52a are connected to the instruction control interface 41a. A memory selection bus 53a and a memory access bus 54a are connected to the data memory interface 42a. An execution selection bus 55a and an execution access bus 56a are connected to the execution interface 43a. 
The instruction control interface 41a delivers a control signal for controlling the data memory interface 42a via the memory control bus 61a, and also delivers a control signal for controlling the execution interface 43a via the execution control bus 62a. The data memory interface 42a selects a data memory via the memory selection bus 53a based on the control signal received from the instruction control interface 41a, thereby controlling the timing of read/write (R/W) of the data delivered via the memory access bus 54a. 
The execution interface 43a selects an execution unit 26 via the execution selection bus 55a based on the control signal received from the instruction control interface 41a, thereby controlling input/output (I/O) of the data via the execution access bus 56a and reading/writing data from/to the data memory block 24 via the read access bus 63a and the write access bus 64a. 
An instruction data bus 51b and an instruction control bus 52b are connected to the instruction control interface 41b. A memory selection bus 53b and a memory access bus 54b are connected to the data memory interface 42b. An execution selection bus 55b and an execution access bus 56b are connected to the execution interface 43b. 
The instruction control interface 41b delivers a control signal for controlling the data memory interface 42b via the memory control bus 61b, and also delivers a control signal for controlling the execution interface 43b via the execution control bus 62b. The data memory interface 42b selects a data memory via the memory selection bus 53b based on the control signal received from the instruction control interface 41b, thereby controlling the timing of R/W of data via the memory access bus 54b. The execution interface 43b selects an execution unit 36 via the execution selection bus 55b based on the control signal received from the instruction control interface 41b, thereby controlling I/O of data via the execution access bus 56b and reading/writing the data of the data memory block 34 via the read access bus 63b and the write access bus 64b. 
The instruction control interfaces 41a and 41b are connected together via an instruction control bus 71. The data memory interfaces 42a and 42b are connected together via a data communication bus 72. The dedicated-instruction processor 12 accesses the instruction fetch register 20 via the instruction control bus 71, thereby reading/writing data from/to the data memory block 24 via the data communication bus 72. The number of the data memories in the data memory block 24 which the dedicated-instruction processor 12 can access simultaneously is defined by the bus-band of the data communication bus 72.
In the conventional design as described above, the interfaces and the buses disposed between the basic-instruction processor 11 and the dedicated-instruction processor 12 are determined and fixed based on the specification of the basic-instruction processor 11, wherein the resources of the processors 11 and 12 are shared therebetween under the fixed conditions. For example, if the bus-band of the data communication bus 72 is smaller than the number of data memories that the dedicated-instruction processor 12 needs to access simultaneously in the data memory block 24, the access cannot be performed in a single clock cycle and takes a number of clock cycles due to the smaller bus-band.
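The penalty of a narrow bus-band may be estimated by the following minimal sketch. It rests on assumptions not stated in the description above: the bus-band is expressed as the number of data memories that can be accessed per clock cycle, each transfer occupies exactly one clock cycle, and arbitration or wait states are ignored. The function name `access_cycles` is hypothetical.

```python
def access_cycles(memories_requested, bus_band):
    """Clock cycles needed to access `memories_requested` data memories
    over a bus that can reach `bus_band` memories per clock cycle.

    Assumes one transfer per cycle and no arbitration overhead.
    """
    if bus_band <= 0:
        raise ValueError("bus_band must be a positive integer")
    if memories_requested < 0:
        raise ValueError("memories_requested must not be negative")
    # Integer ceiling division: a partially filled last cycle still counts.
    return (memories_requested + bus_band - 1) // bus_band

print(access_cycles(4, 4))  # 1: the bus-band matches the request
print(access_cycles(4, 2))  # 2: half the bus-band doubles the access time
print(access_cycles(5, 2))  # 3: the remainder costs one further cycle
```

Under these assumptions, an access that would complete in a single clock cycle with a sufficient bus-band takes two or three cycles once the bus-band fixed by the basic-instruction processor 11 falls below the number of data memories requested.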
In addition, although the basic-instruction processor 11 includes therein versatile calculation resources in the execution block 25, the execution block 25 cannot be accessed from the execution block 35, because there is no bus between the execution block 25 and the execution block 35. For this reason, even if the basic-instruction processor 11 and the dedicated-instruction processor 12 perform the same arithmetic operation, the calculation resources are provided separately in each of the processors 11 and 12.
The simulation descriptions input to the instruction set simulator for the verification of operation of the system LSI are created separately during the respective designs for the basic-instruction processor 11, dedicated-instruction processor 12 and the bus block. Since the instruction set simulator combines the separate simulation descriptions for verification of the operation of the system LSI, the instruction set simulator must verify the communication between the basic-instruction processor 11 and the dedicated-instruction processor 12 as well as the device driver of the dedicated-instruction processor 12 in a low-level language, after the design for the bus block is completed. This verification is complicated and consumes a long time.
An “Xtensa” processor was recently proposed which can solve the above problems in the conventional technique, by changing the resources such as the memory size based on the settings (or constraints), such as number of gates, throughput and power dissipation, in a user-specific LSI to restructure the basic-instruction processor architecture (refer to Design Wave Magazine 1999, December). In the proposed processor, an additional interface is defined for the dedicated instructions separately from the interface of the basic-instruction processor to be restructured.
Another processor, called the VUPU processor, is also known, which has dedicated instructions obtained by behavior synthesis which translates the operational-level description of the user-specific dedicated-instruction processor architecture into the RTL description. The VUPU processor can be operated by instructions from the basic-instruction processor (refer to Design Wave Magazine 1999, December).
In the proposed VUPU processor, the basic-instruction processor architecture and the interface for adding dedicated instructions thereto are defined, wherein the basic-instruction processor architecture and the dedicated-instruction processor architecture are separated.