The present invention relates to microcontrollers and, more specifically, to opcode instructions that are gathered into an instruction set and used to control the behavior of the microcontroller.
Microcontroller units (MCU) have been used in the manufacturing and electrical industries for many years. FIG. 1 shows a typical core memory bus arrangement for mid-range MCU devices. In many cases, microcontrollers utilize reduced instruction set computing (RISC) microprocessors. The high performance of some of these devices can be attributed to a number of architectural features commonly found in RISC microprocessors. These features include:
Harvard architecture
Long Word Instructions
Single Word Instructions
Single Cycle Instructions
Instruction Pipelining
Reduced Instruction Set
Register File Architecture
Orthogonal (Symmetric) Instructions
Harvard Architecture:
As shown in FIG. 2, the Harvard architecture keeps the program memory 26 and data memory 22 as separate memories, which are accessed by the CPU 24 over separate buses. This improves bandwidth over the traditional von Neumann architecture (shown in FIG. 3), in which program and data are fetched by the CPU 34 from the same memory 36 using the same bus. To execute an instruction, a von Neumann machine must make one or more (generally more) accesses across the 8-bit bus to fetch the instruction. Then data may need to be fetched, operated on, and possibly written. As can be seen from this description, that bus can become extremely congested.
In contrast to the von Neumann machine, under the Harvard architecture, all 14 bits of the instruction are fetched in a single instruction cycle. Thus, under the Harvard architecture, while the program memory is being accessed, the data memory is on an independent bus and can be read and written. These separated buses allow one instruction to execute while the next instruction is being fetched.
Long Word Instructions:
Long word instructions use an instruction bus that is wider (more bits) than the 8-bit Data Memory Bus. This is possible because the two buses are separate. It also allows instructions to be sized differently from the 8-bit wide data word, which permits more efficient use of the program memory, since the program memory width is optimized to the architectural requirements.
Single Word Instructions:
Single word instruction opcodes are 14 bits wide, making it possible to have all single word instructions. A 14-bit wide program memory access bus fetches a 14-bit instruction in a single cycle. With single word instructions, the number of words of program memory equals the number of instructions the device can hold. This means that every location is a valid instruction. In the von Neumann architecture (shown in FIG. 3), by contrast, most instructions are typically multi-byte. In general, a device with 4 KBytes of program memory would allow approximately 2K instructions. This 2:1 ratio is a generalization and depends on the application code. Since each instruction may take multiple bytes, there is no assurance that each location is a valid instruction.
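The capacity comparison above can be illustrated with a short calculation. This is only a sketch: the function names are hypothetical, and the default of 2 bytes per instruction is an assumption taken from the generalized 2:1 ratio mentioned in the text, not a property of any particular device.

```python
def single_word_capacity(program_words: int) -> int:
    """With single word instructions, every program memory word holds one instruction."""
    return program_words


def multi_byte_capacity(program_bytes: int, avg_bytes_per_instruction: float = 2.0) -> int:
    """Approximate instruction count for a byte-wide program memory.

    avg_bytes_per_instruction is an assumed, application-dependent average.
    """
    return int(program_bytes / avg_bytes_per_instruction)


print(single_word_capacity(2048))     # 2K words of 14 bits hold 2048 instructions
print(multi_byte_capacity(4 * 1024))  # 4 KBytes hold roughly 2048 instructions
```

Changing the assumed average changes the estimate, which is why the multi-byte figure is only approximate while the single word figure is exact.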
Instruction Pipeline:
The instruction pipeline is a two-stage pipeline which overlaps the fetch and execution of instructions. The fetch of the instruction takes one machine cycle (TCY), while the execution takes another TCY. However, due to the overlap of the fetch of current instruction and execution of previous instruction, an instruction is fetched and another instruction is executed every single TCY.
Single Cycle Instructions:
With the Program Memory bus being 14 bits wide, the entire instruction is fetched in a single TCY. The instruction contains all the information required and is executed in a single cycle. There may be a one-cycle delay in execution if the result of the instruction modifies the contents of the Program Counter. This requires that the pipeline be flushed and a new instruction fetched.
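The timing rule above amounts to a simple count: one TCY per instruction, plus one extra TCY for each instruction that modifies the Program Counter. The sketch below is illustrative only; the function name and the placeholder instruction names `op1` and `op2` are assumptions.

```python
def execution_cycles(instructions):
    """Count the TCY spent executing a sequence of instructions.

    Each item is a (name, modifies_pc) pair. An instruction that modifies
    the Program Counter costs one extra cycle for the pipeline flush.
    """
    return sum(2 if modifies_pc else 1 for _name, modifies_pc in instructions)


sequence = [("op1", False), ("GOTO", True), ("op2", False)]
print(execution_cycles(sequence))  # 1 + 2 + 1 = 4
```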
Reduced Instruction Set:
When an instruction set is well designed and highly orthogonal (symmetric), fewer instructions are required to perform all needed tasks. With fewer instructions, the whole set can be more rapidly learned.
Register File Architecture:
The register files/data memory can be directly or indirectly addressed. All special function registers, including the program counter, are mapped in the data memory.
Orthogonal (Symmetric) Instructions:
Orthogonal instructions make it possible to carry out any operation on any register using any addressing mode. This symmetrical nature and lack of “special instructions” make programming simple yet efficient. In addition, the learning curve is reduced significantly. The mid-range instruction set uses only two non-register oriented instructions, which are used for two of the core's features. One is the SLEEP instruction, which places the device into the lowest power use mode. The other is the CLRWDT instruction, which verifies the chip is operating properly by preventing the on-chip Watchdog Timer (WDT) from overflowing and resetting the device.
Clocking Scheme/Instruction Cycle:
The clock input (from OSC1) is internally divided by four to generate four non-overlapping quadrature clocks, namely Q1, Q2, Q3, and Q4. Internally, the program counter (PC) is incremented every Q1, and the instruction is fetched from the program memory and latched into the instruction register in Q4. The instruction is decoded and executed during the following Q1 through Q4. The clocks and instruction execution flow are illustrated in FIGS. 4 and 5.
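The divide-by-four relationship can be sketched numerically. The names below are hypothetical, the 20 MHz oscillator frequency is an assumed example value, and the sketch assumes each Q phase lasts one oscillator period, which is consistent with the division by four described above.

```python
Q_PHASES = ("Q1", "Q2", "Q3", "Q4")


def instruction_cycle_seconds(f_osc_hz: float) -> float:
    """One instruction cycle (TCY) spans four oscillator periods."""
    return 4.0 / f_osc_hz


def q_phase(osc_period_index: int) -> str:
    """Map an oscillator period index (0, 1, 2, ...) to its Q phase."""
    return Q_PHASES[osc_period_index % 4]


print(instruction_cycle_seconds(20e6))  # TCY in seconds for an assumed 20 MHz clock
print([q_phase(i) for i in range(5)])   # phases repeat after Q4
```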
Instruction Flow/Pipelining:
An “Instruction Cycle” consists of four Q cycles (Q1, Q2, Q3, and Q4), as shown in FIG. 4, that together comprise the TCY shown in FIGS. 4 and 5. Note that in FIG. 5, all instructions are performed in a single cycle, except for any program branches. Program branches take two cycles because the fetched instruction is “flushed” from the pipeline while the new instruction is being fetched and then executed.
Fetch takes one instruction cycle, while decode and execute take another instruction cycle. However, due to pipelining, each instruction effectively executes in one cycle. If an instruction causes the program counter to change (e.g., GOTO), then an extra cycle is required to complete the instruction (FIG. 5). The instruction fetch begins with the program counter incrementing in Q1. In the execution cycle, the fetched instruction is latched into the “Instruction Register (IR)” in cycle Q1. This instruction is then decoded and executed during the Q2, Q3, and Q4 cycles. Data memory is read during Q2 (operand read) and written during Q4 (destination write). FIG. 5 shows the operation of the two-stage pipeline for the instruction sequence shown. At time TCY0, the first instruction is fetched from program memory. During TCY1, the first instruction executes while the second instruction is fetched. During TCY2, the second instruction executes while the third instruction is fetched. During TCY3, the fourth instruction is fetched while the third instruction (CALL SUB_1) is executed. When the third instruction completes execution, the CPU forces the address of instruction four onto the Stack and then changes the Program Counter (PC) to the address of SUB_1. This means that the instruction that was fetched during TCY3 needs to be “flushed” from the pipeline. During TCY4, instruction four is flushed (executed as a NOP) and the instruction at address SUB_1 is fetched. Finally, during TCY5, instruction five is executed and the instruction at address SUB_1 + 1 is fetched.
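The TCY0 through TCY5 sequence described above can be reproduced with a small simulation of a two-stage fetch/execute pipeline. This is a minimal sketch, not the device's implementation: the function name, the numeric addresses, and the placeholder names `instr1`, `instr2`, and `instr4` are assumptions, and only CALL is modeled as a PC-modifying instruction.

```python
def simulate_pipeline(program, call_targets, cycles, start=0):
    """Simulate a two-stage pipeline for a number of TCY.

    program:      dict mapping address -> instruction name
    call_targets: dict mapping the address of a CALL -> its target address
    Returns (trace, stack), where trace holds (tcy, fetched, executed) tuples.
    """
    pc = start
    prev = None          # address fetched during the previous TCY
    flush_prev = False   # True if that fetched instruction must be discarded
    stack = []
    trace = []
    for tcy in range(cycles):
        fetch_addr = pc
        next_pc = pc + 1
        next_flush = False
        if prev is None:
            executed = "-"                      # nothing fetched yet
        elif flush_prev:
            executed = "NOP (flushed " + program[prev] + ")"
        else:
            executed = program[prev]
            if prev in call_targets:
                stack.append(prev + 1)          # push the return address
                next_pc = call_targets[prev]    # redirect the next fetch
                next_flush = True               # the current fetch is discarded
        trace.append((tcy, program[fetch_addr], executed))
        prev, flush_prev, pc = fetch_addr, next_flush, next_pc
    return trace, stack


# Assumed layout: four instructions at addresses 0..3, SUB_1 at address 10.
program = {0: "instr1", 1: "instr2", 2: "CALL SUB_1", 3: "instr4",
           10: "SUB_1", 11: "SUB_1+1"}
trace, stack = simulate_pipeline(program, {2: 10}, cycles=6)
for tcy, fetched, executed in trace:
    print(f"TCY{tcy}: fetch {fetched:<12} execute {executed}")
```

In this model the CALL occupies TCY3 and TCY4, and the stack ends up holding the address of instruction four, matching the description.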
While the prior art microcontrollers were useful, the various modules could not be emulated. Moreover, the type of microcontroller as described in FIG. 1 could not linearize the address space. Finally, the prior art microcontrollers are susceptible to compiler-error problems. What is needed is an apparatus, method, and system for a microcontroller that is capable of linearizing the address space in order to enable modular emulation. There is also a need in the art for reducing compiler errors.
The present invention overcomes the above-identified problems as well as other shortcomings and deficiencies of existing technologies by providing a microcontroller instruction set that eliminates many of the compiler errors experienced in the prior art. Moreover, an apparatus and system are provided that enable a linearized address space, which makes modular emulation possible.
The present invention can directly or indirectly address its register files or data memory. All special function registers, including the Program Counter (PC) and Working Register (W), are mapped in the data memory. The present invention has an orthogonal (symmetrical) instruction set that makes it possible to carry out any operation on any register using any addressing mode. This symmetrical nature and lack of ‘special optimal situations’ make programming with the present invention simple yet efficient. In addition, the learning curve for writing software applications is reduced significantly. One of the present invention's enhancements over the prior art allows two file registers to be used in some two-operand instructions. This allows data to be moved directly between two registers without going through the W register, thus increasing performance and decreasing program memory usage.
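The difference between moving data through the W register and moving it directly between two file registers can be sketched with a toy register file. The class and method names are hypothetical, and the model counts instructions only; it makes no claim about the word size or cycle count of the actual two-operand instruction.

```python
class RegisterFile:
    """Toy model of a file register array with a working register W."""

    def __init__(self, size=16):
        self.regs = [0] * size
        self.w = 0
        self.instructions = 0  # number of instructions issued so far

    def move_via_w(self, src, dst):
        """Two instructions: file register -> W, then W -> file register."""
        self.w = self.regs[src]
        self.instructions += 1
        self.regs[dst] = self.w
        self.instructions += 1

    def move_direct(self, src, dst):
        """One two-operand instruction; W is left untouched."""
        self.regs[dst] = self.regs[src]
        self.instructions += 1


rf = RegisterFile()
rf.regs[1] = 0x55
rf.w = 0xAA
rf.move_direct(1, 2)
print(rf.regs[2] == 0x55, rf.w == 0xAA, rf.instructions)
```

Besides the lower instruction count, the direct move preserves whatever value W held, so surrounding code does not need to save and restore it.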
The preferred embodiment of the present invention includes an ALU/W register, a PLA, an 8-bit multiplier, a program counter (PC) with stack, a table latch/table pointer, a ROM latch/IR latch, FSRs, interrupt vectoring circuitry, and most common status registers. Unlike the prior art, the design of the present invention obviates the need for a timer in a separate module, all reset generation circuitry (WDT, POR, BOR, etc.), interrupt flags, enable flags, INTCON registers, RCON registers, configuration bits, device ID word, ID locations, and clock drivers.
Additional embodiments will be clear to those skilled in the art upon reference to the detailed description and accompanying drawings.