This application is related to Korean Application No. 99-10693, filed Mar. 27, 1999, the disclosure of which is hereby incorporated herein by reference.
The present invention relates to digital data processing systems and, more particularly, to microcomputer systems for processing both compressed and uncompressed instructions.
The computer industry continues to be revolutionized by advances in integrated-circuit technology. In particular, single integrated circuit (IC) computers, or microcomputers, are advancing rapidly. The heart of a microcomputer system is a microprocessor, which is usually a general-purpose processor built into a single IC chip. A microprocessor is an example of a very large scale integration (VLSI) device. The use of VLSI microprocessors (or microcomputers) has brought the benefits of smaller size, lighter weight, lower cost, reduced power requirements, and higher reliability.
Typical microprocessors, such as the Intel x86 family and the Motorola M68000 series, are known for their abundant instruction sets, multiple addressing modes, and multiple instruction formats and sizes. Their control is micro-programmed, and different instructions execute in different numbers of cycles. The control units of such microprocessors are naturally complex, since they have to distinguish between a large number of op-codes, addressing modes, and formats. This type of system belongs to the category called Complex Instruction Set Computer (CISC). A CISC processor system with a large menu of features typically requires a larger and more complicated decoding subsystem preceding the complex control logic. Logic signals usually have to propagate through a considerable number of gates, which lengthens delays and slows down the system. In such a micro-programmed environment, increased complexity directly results in longer micro-routines, and therefore longer execution times, to produce all the micro-operations and corresponding control signals needed to execute an instruction.
One way to increase the speed of execution on any computer is to implement pipelining. An n-stage pipelined system deals with n successive instructions simultaneously, so pipelining can significantly improve system performance. Pipelining has been implemented in numerous CISC systems for many years. Many CISC systems, however, execute their instructions in more than one system clock cycle because of their complexity and because they access memory for operands during execution. To handle a pipeline efficiently, it may be necessary to enable a computer system to fetch an instruction in a single system clock cycle and then execute it in a single cycle. This requirement can easily be met in the Reduced Instruction Set Computer (RISC) design approach.
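The benefit of an n-stage pipeline can be illustrated with a simple cycle count. The sketch below is an idealized model, not taken from this disclosure: it assumes every stage takes exactly one clock cycle and that no stalls occur, and the function names are hypothetical.

```python
def unpipelined_cycles(stages: int, instructions: int) -> int:
    """Cycles needed when each instruction passes through all stages alone."""
    return stages * instructions


def pipelined_cycles(stages: int, instructions: int) -> int:
    """Cycles needed by an ideal pipeline (one cycle per stage, no stalls).

    The first instruction needs `stages` cycles to fill the pipeline;
    every later instruction completes one cycle after its predecessor.
    """
    if instructions == 0:
        return 0
    return stages + instructions - 1
```

Under these assumptions, a 5-stage pipeline completes 100 instructions in 104 cycles rather than 500. Multi-cycle instructions, as found in many CISC systems, break the one-cycle-per-stage assumption and reduce this gain.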
In recent RISC-type processors, such as SPARC, MIPS, Alpha from DEC, PA from HP, PowerPC from IBM, and i860 from Intel, practically everything (e.g., the number of instructions, addressing modes, and formats) is reduced. In such RISC processors, all instructions may have the same size and almost all of them may be executed within a single clock cycle.
With the uniform instruction size in RISC systems, every instruction can be fetched in a single clock cycle. Because of RISC simplicity and restricted memory access, most RISC instructions are executed in a single cycle. Therefore, RISC systems can handle pipelining more efficiently than CISC systems. In general, it is very important to determine the optimal instruction size when designing microcomputer systems, since the size directly affects system performance and power consumption.
When a system has an instruction size larger than a normal instruction size (e.g., the system data bus width), system performance can be degraded because each instruction must be fetched in more than one access over the system data bus; that is, instruction fetch becomes a bottleneck. An instruction size smaller than the normal instruction size can also degrade system performance because, for example, the number of registers that can be accessed at a time and the size of the immediate data available to jump and arithmetic instructions are limited.
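The fetch bottleneck can be quantified as the number of bus accesses needed per instruction. The following is a minimal sketch, assuming that each access transfers exactly one full bus width and that instructions are aligned to the bus; the function name is hypothetical.

```python
def bus_accesses_per_instruction(instruction_bits: int, bus_bits: int) -> int:
    """Number of data-bus accesses needed to fetch one instruction.

    Assumes one access moves `bus_bits` bits, so the result is the
    ceiling of instruction_bits / bus_bits.
    """
    if instruction_bits <= 0 or bus_bits <= 0:
        raise ValueError("sizes must be positive")
    return -(-instruction_bits // bus_bits)  # integer ceiling division
```

For example, under these assumptions a 32-bit instruction on a 16-bit data bus needs two accesses, while a 16-bit instruction on a 32-bit bus needs only one.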
Considering system performance, it would be desirable to make the data bus width equal to or larger than the instruction size; on the other hand, a wider bus increases system power consumption. Some applications that pursue performance alone require instructions and data buses that are as wide as possible, regardless of power consumption. Architectures of this kind are usually called Very Long Instruction Word (VLIW) architectures. On the whole, they consume large amounts of power since they need larger data buses and cache memories.
Some other applications will require instructions and data buses as narrow as possible to allow efficient operation with minimal instructions. These are usually called Minimal Instruction Set Computer (MISC) architectures. These architectures consume small amounts of power on the whole since they need smaller size data buses and cache memories. A technique of the MISC architectures is set forth in "MuP21 - High Performance MISC Processor", by Charles Moore and C. H. Ting, January 1995 issue of Forth Dimensions.
Accordingly, the instruction size of a microcomputer should be determined by considering several parameters, such as performance, power consumption, and intended applications. Examples of RISC processors developed with such design targets include the SuperH (SH 7000) series from Hitachi and the ARM7TDMI from Advanced RISC Machines Ltd. (ARM). The SH series microcomputer is a RISC processor that processes 32-bit data with 16-bit instructions. The processor has an efficient architecture built around an optimized instruction set. However, because the instructions of the processor are short, it is difficult to compose instructions that improve the performance of the processor. In addition, the advantages obtainable from a 32-bit microcomputer architecture are restricted.
The ARM7TDMI processor employs an architectural strategy known as Thumb. The ARM7TDMI processor has two instruction sets: the standard 32-bit ARM set and the 16-bit Thumb set. A 16-bit instruction is processed in a so-called Thumb mode, in which the 16-bit instruction is first converted into a 32-bit instruction and the 32-bit instruction is then actually executed. Immediate data in a 32-bit instruction is allotted 12 bits (an address space of 4K bytes), whereas immediate data in a 16-bit instruction is restricted to 8 bits (an address space of 256 bytes). In the Thumb mode, if the branch space of a 16-bit instruction exceeds 256 bytes, the branch has to be performed in more than one step, which degrades processor performance. The architecture is described in more detail in "ARM7DMI Data Sheet", which is available at www.arm.com.
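The address spaces quoted above follow directly from the widths of the immediate fields. The sketch below treats the immediate as an unsigned byte offset, as the figures in the text do; that treatment and the function names are assumptions made for illustration and are not a description of the actual ARM encodings.

```python
def addressable_bytes(immediate_bits: int) -> int:
    """Size of the address space spanned by an unsigned immediate field."""
    return 1 << immediate_bits


def fits_in_immediate(offset: int, immediate_bits: int) -> bool:
    """True if a non-negative byte offset can be encoded in the field."""
    return 0 <= offset < addressable_bytes(immediate_bits)
```

With these definitions a 12-bit field spans 4096 bytes (4K) and an 8-bit field spans 256 bytes, so an offset of 300 bytes fits in the 12-bit field but not in the 8-bit field.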
As described above, the VLIW architecture, despite its high performance, may consume too much power, whereas the MISC architecture may be restricted to a limited range of applications. In addition, the above 32-bit processors having only 16-bit instructions, such as the SH series from Hitachi, may have a structural weakness on account of their instruction length. Further, the above 32-bit processors supporting both 16-bit and 32-bit instructions, such as the ARM7TDMI from ARM, may suffer from pipeline stalls due to their different instruction lengths.
Embodiments of the present invention comprise a microcomputer having a preferred instruction processor therein that can process both normal length instructions and compressed instructions. According to preferred aspects of these embodiments, the normal length instructions and the compressed instructions are provided from memory to an instruction register and then passed through decoding circuitry to a processor core. The decoding circuitry preferably comprises a demultiplexer having a data input that receives a first multi-bit instruction from the instruction register and a select input that receives a first select signal (SEL1). A compressed instruction decoder is also provided. The compressed instruction decoder has a data input electrically coupled to a first data output of the demultiplexer and a select input that receives a second select signal (SEL2). A multiplexer is also provided. The multiplexer has a first data input electrically coupled to an output of the compressed instruction decoder, a second data input electrically coupled to a second data output of the demultiplexer and a select input that receives the first select signal (SEL1). The output of the multiplexer is electrically coupled to the processor core.
Based on this configuration of the decoding circuitry, a first select signal having a first logic value (e.g., logic 1) can be used to enable a normal length instruction to be passed directly from the second output of the demultiplexer to the second data input of the multiplexer and then to the processor core. Alternatively, a first select signal having a second logic value (e.g., logic 0) can be used to enable a compressed instruction to be passed from the first output of the demultiplexer to the compressed instruction decoder. The output of the compressed instruction decoder is then passed to the first input of the multiplexer and then to the processor core. The processor core may generate the first and second select signals.
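The two paths through the decoding circuitry can be modeled behaviorally. The following is only a software sketch of the routing described above, not a description of the actual circuit: the function names are hypothetical, `toy_decoder` is a placeholder for the compressed instruction decoder, and the way SEL2 affects decoding is left opaque because it is not specified here.

```python
from typing import Any, Callable, Optional, Tuple

Decoder = Callable[[int, int], Any]


def demultiplex(word: int, sel1: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (first_output, second_output); the unselected output is None."""
    if sel1 == 1:
        return None, word      # normal length instruction: bypass path
    return word, None          # compressed instruction: decoder path


def multiplex(first_input: Any, second_input: Any, sel1: int) -> Any:
    """Select the bypass path when SEL1 is 1, else the decoder output."""
    return second_input if sel1 == 1 else first_input


def decoding_circuitry(word: int, sel1: int, sel2: int, decoder: Decoder) -> Any:
    """Route one instruction register word to the processor core."""
    first_out, second_out = demultiplex(word, sel1)
    decoded = decoder(first_out, sel2) if first_out is not None else None
    return multiplex(decoded, second_out, sel1)


def toy_decoder(word: int, sel2: int) -> Tuple[str, int, int]:
    """Hypothetical stand-in that only tags its input as 'expanded'."""
    return ("expanded", word, sel2)
```

With SEL1 at logic 1 the word reaches the core unchanged; with SEL1 at logic 0 the core receives whatever the decoder produces. Both the demultiplexer and the multiplexer are driven by the same SEL1 value, so the two ends of each path are always selected together.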
Embodiments of the present invention may also include methods of operating compressed instruction decoders. These methods preferably include the steps of decoding a first compressed instruction (within a multi-instruction word) as a first multi-bit operand and decoding a second compressed instruction (within the multi-instruction word) as a first address of a memory array containing a plurality of instructions (e.g., a plurality of first type instructions having operands and a plurality of second type instructions which do not have operands). These methods may also include simultaneously providing a first instruction, which is stored at the first address within the memory array, and the first multi-bit operand in parallel to an output of the compressed instruction decoder. Each of the compressed instructions that are provided to the compressed instruction decoders may include information as to how the respective compressed instruction is to be decoded. For example, for a compressed instruction having a length of 8 bits, the two most significant bits may be used as select inputs to a 1-to-4 demultiplexer which selectively passes the compressed instruction as an operand or as an address of a memory array containing instructions.
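The decoding method can be sketched in software as follows. Several details are assumptions made only for illustration and are not stated in this summary: a 32-bit multi-instruction word holding four 8-bit compressed instructions, the tag value 0b00 meaning "address" and 0b01 meaning "operand" (other tags are ignored), a 6-bit payload below the two select bits, and the contents of `INSTRUCTION_TABLE`. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

FIELD_BITS = 8          # length of one compressed instruction (from the text)
TAG_ADDRESS = 0b00      # assumed tag: payload is a memory array address
TAG_OPERAND = 0b01      # assumed tag: payload is an operand

# Hypothetical memory array of normal instructions, shown as mnemonics.
INSTRUCTION_TABLE = ["NOP", "ADD", "SUB", "LOAD"]


def split_word(word: int, word_bits: int = 32) -> List[int]:
    """Split a multi-instruction word into 8-bit fields, most significant first."""
    mask = (1 << FIELD_BITS) - 1
    return [(word >> shift) & mask
            for shift in range(word_bits - FIELD_BITS, -1, -FIELD_BITS)]


def route(compressed: int) -> Tuple[int, int]:
    """Use the two most significant bits as the demultiplexer select value."""
    tag = (compressed >> (FIELD_BITS - 2)) & 0b11
    payload = compressed & ((1 << (FIELD_BITS - 2)) - 1)
    return tag, payload


def decode_word(word: int, table: Sequence[str]) -> Tuple[str, Optional[int]]:
    """Return the looked-up instruction and the operand as one parallel output."""
    address: Optional[int] = None
    operand: Optional[int] = None
    for compressed in split_word(word):
        tag, payload = route(compressed)
        if tag == TAG_ADDRESS and address is None:
            address = payload
        elif tag == TAG_OPERAND and operand is None:
            operand = payload
    if address is None:
        raise ValueError("word contains no address field")
    return table[address], operand
```

Under these assumed encodings, the word 0x0345FFFF splits into the fields 0x03, 0x45, 0xFF, and 0xFF. The first field selects table entry 3 and the second supplies the operand 5, so `decode_word(0x0345FFFF, INSTRUCTION_TABLE)` yields the pair `("LOAD", 5)`.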
Embodiments of the invention may also include methods of operating instruction processors. These methods may comprise the steps of transferring a first M-bit instruction within a first memory device to a processing device that performs operations defined by the first M-bit instruction and transferring a second M-bit instruction within the first memory device to a compressed instruction decoder. Steps are also performed to decode first and second compressed instructions (within the second M-bit instruction) as a first address within a second memory device and as a first operand, respectively. A third M-bit instruction located at the first address within the second memory device may then be transferred to the processing device. Thus, the first compressed instruction may be decoded and used to retrieve an M-bit (e.g., normal length) instruction. This M-bit instruction may then be passed to the processing device. These methods may also include steps to transfer a fourth M-bit instruction within the first memory device to the compressed instruction decoder and then decode third and fourth compressed instructions (within the fourth M-bit instruction) as a second address within the second memory device and as a second operand, respectively. Next, an N-bit instruction (where N is less than M) located at the second address within the second memory device and the second operand may be transferred to an M-bit register. The N-bit instruction and the second operand may then be transferred in parallel from the M-bit register to the processing device.
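The final step, in which an N-bit instruction and an operand share one M-bit register, can be sketched as a bit-packing operation. The values M = 32 and N = 24, the placement of the instruction in the upper bits, and the function names are all assumptions made for illustration; this summary does not fix any of them.

```python
M = 32  # assumed register and normal instruction width in bits
N = 24  # assumed width of the shorter instruction (N < M)
OPERAND_BITS = M - N


def pack_register(n_bit_instruction: int, operand: int) -> int:
    """Place the N-bit instruction in the upper bits and the operand below it."""
    if not 0 <= n_bit_instruction < (1 << N):
        raise ValueError("instruction does not fit in N bits")
    if not 0 <= operand < (1 << OPERAND_BITS):
        raise ValueError("operand does not fit in M - N bits")
    return (n_bit_instruction << OPERAND_BITS) | operand


def unpack_register(register: int) -> tuple:
    """Recover (instruction, operand) as the processing device would see them."""
    return register >> OPERAND_BITS, register & ((1 << OPERAND_BITS) - 1)
```

With these assumed widths, packing the instruction 0xABCDEF with the operand 0x12 gives the register value 0xABCDEF12, and unpacking that value returns both parts unchanged, which mirrors the parallel transfer of the instruction and the operand to the processing device.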