1. Field of the Invention
The present invention relates to methods and apparatuses for performing high speed instruction decode within computer processors. In particular, this invention relates to microcode-based instruction decoders. Even more particularly, the present invention relates to a method and apparatus for using pipelining and parallel processing in the decode of instructions within a microprocessor.
2. Description of the Related Art
Modern microprocessors employ pipelining techniques which allow multiple, consecutive instructions to be prefetched, decoded, and executed in separate stages simultaneously. Accordingly, in any given clock cycle, a first instruction may be executed while the next (second) instruction is simultaneously being decoded, and the instruction after that one (a third instruction) is simultaneously being fetched. Since less processing is performed on each instruction per cycle, cycle time can be made shorter. Thus, while it requires several clock cycles for a single instruction to be prefetched, decoded, and executed, it is possible to have a processor completing instructions as fast as one instruction per cycle with a very short cycle period, because multiple consecutive instructions are in various stages simultaneously.
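The overlap of stages described above can be illustrated with a small cycle-by-cycle model. The following is a minimal sketch only: the three-stage split, the function name `simulate_pipeline`, and the instruction labels are assumptions made for illustration, and the model assumes an ideal pipeline with no stalls.

```python
def simulate_pipeline(instructions, stages=("prefetch", "decode", "execute")):
    """Return one snapshot per cycle, mapping stage name -> instruction.

    Assumes an ideal pipeline: one instruction enters per cycle, no stalls.
    """
    depth = len(stages)
    total_cycles = len(instructions) + depth - 1
    timeline = []
    for cycle in range(total_cycles):
        snapshot = {}
        for position, stage in enumerate(stages):
            # The instruction in this stage entered the pipeline
            # `position` cycles ago.
            idx = cycle - position
            if 0 <= idx < len(instructions):
                snapshot[stage] = instructions[idx]
        timeline.append(snapshot)
    return timeline


if __name__ == "__main__":
    for cycle, snapshot in enumerate(simulate_pipeline(["I1", "I2", "I3"])):
        print(cycle, snapshot)
```

In this model each instruction still spends three cycles in the pipeline, but once the pipeline is full one instruction leaves the execute stage every cycle.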
Typically, buffers for temporarily holding data are used to define the boundary between consecutive stages of a microprocessor pipeline. The data calculated in a particular stage is written into these buffers before the end of the cycle. When the pipeline advances upon the start of a new cycle, the data is written out of the boundary buffers into the next stage where the data can be further processed during that next cycle.
Most pipelined microprocessor architectures have at least four stages including, in order of flow, 1) a prefetch stage, 2) a decode stage, 3) an execute stage, and 4) a write-back stage. In the prefetch stage, instructions are read out of memory (e.g., an instruction cache) and stored in a buffer. Depending on the particular microprocessor, in any given cycle, the prefetch buffer may receive one to several instructions.
In the decode stage, the processor reads an instruction out of the prefetch buffer and converts it into an internal instruction format which can be used by the microprocessor to perform one or more operations, such as arithmetic or logical operations. In the execute stage, the actual operations are performed. Finally, in the write-back stage, the results of the operations are written to the designated registers and/or other memory locations.
In more complex microprocessors, one or more of the four basic stages can be further broken down into smaller stages to simplify each individual stage and even further improve instruction completion speed.
The hardware in an instruction prefetch stage typically comprises a prefetch buffer or prefetch queue which can temporarily hold instructions. Each cycle, the decode stage can take in the bytes of an instruction held in the prefetch stage for decoding during that cycle.
The hardware in a decode stage typically comprises at least a program counter and hardware for converting instructions into control lines for controlling the hardware in the execute stage. Alternately, the decode stage can include a microcode-ROM. The incoming instruction defines an entry point (i.e., an address) into the microcode-ROM at which the stored data defines the appropriate conditions for the execute stage control lines. The execute stage control data for the particular instruction may exist entirely at a single addressable storage location on the microcode-ROM or may occupy several sequentially addressable storage locations. The number of addressable storage locations in the microcode-ROM which must be accessed for a given instruction may be encoded in the instruction itself. Alternately, one or more data bits in the storage locations in the microcode-ROM may indicate whether or not another storage location should be accessed.
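The second sequencing scheme described above, in which a bit in each storage location indicates whether another location should be accessed, can be sketched as follows. The ROM contents, the addresses, and the names `MICROCODE_ROM` and `fetch_control_words` are hypothetical and chosen only for illustration.

```python
# Hypothetical ROM contents: address -> (control_bits, more_flag).
# more_flag is True when the next sequential location also belongs
# to the same instruction.
MICROCODE_ROM = {
    0x10: (0b0001, False),  # instruction fully described by one location
    0x20: (0b0010, True),   # first of three sequential locations
    0x21: (0b0100, True),
    0x22: (0b1000, False),  # last location: more_flag cleared
}


def fetch_control_words(entry_point, rom=MICROCODE_ROM):
    """Read sequential ROM locations from entry_point until more_flag is False."""
    words = []
    address = entry_point
    while True:
        control, more = rom[address]
        words.append(control)
        if not more:
            return words
        address += 1
```

Here the instruction supplies only the entry point; the length of the sequence is discovered from the ROM contents rather than encoded in the instruction.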
The control data output from the microcode-ROM is written into buffer registers for forwarding to the execute stage on the next cycle transition. The decode stage also includes hardware for extracting the operands, if any, from the instruction or from registers or memory locations and presenting the operands to the appropriate hardware in the execution stage.
Some microprocessor architectures employ what are known as variable width instruction sets. In such architectures, the instructions are not all the same width. For instance, in the instruction set for the 16/32 bit class x86 family of microprocessors developed by Intel Corporation of Santa Clara, Calif., an instruction can be anywhere from 1 to 16 bytes wide.
Some microprocessor architectures utilize a segmented address space in which the total memory space is broken down into a plurality of independent, protected address spaces. Each segment is defined by a base address and a segment limit. The base address, for instance, may be the lowest numerical address in the segment space. The segment limit defines the size of the segment. Accordingly, the end boundary of the segment is defined by the sum of the base address and the segment limit. Alternately, the base address may be the highest address and, as such, the end boundary of the segment would be the difference between the base address and the segment limit.
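The two ways of locating the end boundary of a segment can be written out as a short sketch. The function names and the choice to treat both boundaries as inclusive are assumptions made for illustration; the text does not state whether the end boundary itself is addressable.

```python
def segment_end(base, limit, base_is_highest=False):
    """End boundary of a segment: base + limit, or base - limit when the
    base address is the highest address in the segment."""
    return base - limit if base_is_highest else base + limit


def in_segment(address, base, limit, base_is_highest=False):
    """True if address lies between the base and the end boundary.

    Both boundaries are treated as inclusive (an assumption).
    """
    end = segment_end(base, limit, base_is_highest)
    low, high = (end, base) if base_is_highest else (base, end)
    return low <= address <= high
```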
To generate a linear address according to the x86 architecture, at the very least two quantities are added. Particularly, the base address of the particular segment, as indicated by the segment descriptor, and an offset, indicating the distance of the desired data (i.e., instruction) from the base of the segment, must be added together. The offset itself may be composed of up to three more parts: a base, an index and a displacement. If so, those quantities must be added to generate the offset before the offset can be added to the segment base. A more detailed discussion of segmented addressing in the x86 architecture follows, but a complete discussion can be found in the Intel486™ Microprocessor Family Programmer's Reference Manual, 1995, by Intel Corporation, which is incorporated herein by reference.
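The two-step addition described above can be sketched as follows. The function names and the 32-bit wraparound are assumptions made for illustration; scaling of the index is left out here because this paragraph does not mention it.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit address arithmetic


def offset_from_parts(base=0, index=0, displacement=0):
    """First step: add the (up to three) parts that make up the offset."""
    return (base + index + displacement) & MASK32


def linear_address(segment_base, offset):
    """Second step: add the finished offset to the segment base."""
    return (segment_base + offset) & MASK32


if __name__ == "__main__":
    offset = offset_from_parts(base=0x200, index=0x10, displacement=4)
    print(hex(linear_address(0x10000, offset)))
```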
Prior art instruction decode units in microprocessors receive instructions from the instruction prefetch unit and translate them in a two stage process into low level control signals and microcode entry points. Many but not all instructions can be decoded at a rate of one per clock cycle. Stage 1 of the decode initiates a memory access. This allows execution of a two instruction sequence which loads and operates on data in just two clock cycles. The decode unit simultaneously processes instruction prefix bytes, opcodes, ModR/M bytes, and displacements. The outputs include hardwired micro instructions to the segmentation, integer, and floating point units. The decode unit is flushed whenever the instruction prefetch unit is flushed.
FIG. 1 is a block diagram generally illustrating the various pipeline stages of a conventional microprocessor. As shown, the microprocessor is pipelined into five stages, namely: a prefetch stage, a decode stage, an execute stage, a write back stage, and a second write back stage. As shown, the prefetch stage includes two prefetch buffers 112 and 114. Prefetch buffer 112 is the line buffer from which the decode stage pulls instruction bytes. It is the only data interface between the prefetch and decode stages. The prefetch stage also includes a one kilobyte instruction cache 116 and a cache tag memory 118 for storing tag data related to the data in the instruction cache 116. The instruction cache is directly mapped with a line size 8 bytes wide. Both prefetch buffers are also 8 bytes wide, containing byte positions 0 (least significant byte) through byte position 7 (most significant byte). The prefetch stage also includes prefetch logic 120 for performing various functions relating to the control of the loading of the prefetch buffers with instructions.
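For a direct-mapped cache of the stated size, an address is conventionally split into a tag, a line index, and a byte offset within the line. The sketch below derives that split from the 1 kilobyte capacity and 8-byte line size given above; the field layout itself and the name `split_address` are assumptions, since the text does not describe how the cache tag memory 118 is organized.

```python
CACHE_BYTES = 1024                      # one kilobyte instruction cache
LINE_BYTES = 8                          # line size stated in the text
NUM_LINES = CACHE_BYTES // LINE_BYTES   # 128 lines


def split_address(addr):
    """Split an address into (tag, line index, byte offset).

    Assumes the usual direct-mapped arrangement: low bits select the byte
    within a line, the next bits select the line, the rest form the tag.
    """
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // CACHE_BYTES
    return tag, index, offset
```

With these sizes the byte offset takes 3 bits and the line index 7 bits, so two addresses that differ by a multiple of 1024 map to the same line and are told apart only by the tag.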
Referring to FIGS. 2 and 3, the structure of instructions is briefly described. All instructions in the x86 instruction set are considered herein to be comprised of up to three subfields, each subfield having several possible byte widths. The three possible subfields are the prefix subfield 200, the opcode subfield 210, and the constant subfield 220. Every instruction comprises at least an opcode subfield 210. The opcode subfield 210 defines the function which the execute stage is to perform with respect to that instruction (e.g., add, subtract, multiply, XOR, data movement, etc.). The opcode subfield 210 can be one, two or three bytes in length. The opcode subfield 210 will always include an opcode byte 230 defining the function to be performed. It may also include a ModR/M byte 240. The ModR/M byte 240 is an addressing mode specifier. It specifies whether an operand is in a register or memory location, and if in memory, it specifies whether a displacement, a base register, an index register and/or scaling are to be used. When the ModR/M byte 240 indicates that an index register will be used to calculate the address of an operand, the instruction may comprise a third byte, termed the scale-index-base (SIB) byte 250. The SIB byte 250 is included in the instruction to encode the base register, the index register and a scaling factor.
Certain instructions include a third subfield, called the constant data subfield 220, which can specify one or two operands used by the instruction. Specifically, the constant data subfield may comprise a displacement data operand 260, an immediate data operand 270, a displacement data operand 260 and an immediate data operand 270, or two immediate operands 270. When the addressing mode is one in which a displacement will be used to compute the address of an operand, the instruction includes a displacement data operand 260 as part of the constant subfield 220. A displacement data operand 260 can be one, two or four bytes in length. An immediate operand 270 directly provides the value of an operand. An immediate operand 270 may be one, two or four bytes in length.
Accordingly, the constant subfield 220, if any, can be one, two, three, four, five, six or eight bytes wide. Certain parameters, such as the segment register to be used by instructions, the address size and the operand size, are set to default conditions in the execute and/or decode stages. These parameters, however, can be overridden by prefix bytes in the prefix subfield 200 of an instruction. There are four basic types of prefix bytes, namely, an address prefix 280 for selecting between 16- or 32-bit addressing, an operand size prefix byte 285 for selecting between 16- or 32-bit data size, a segment override byte 290 which specifies the segment register an instruction should use, and an instruction prefix 275 which can toggle between two states which determine the table from which the opcode byte 230 is decoded.
Accordingly, the use of the instruction prefix 275 essentially doubles the possible number of instructions. Because a particular prefix byte type, e.g., address prefix, can appear more than once in a single instruction, there can be anywhere from 0 to 14 prefix bytes in an instruction. Decoders can pull out a maximum of seven instruction bytes 200 per cycle from the line buffer 112.
Specifically, a conventional decode stage decodes, per cycle, (1) one prefix byte 200 or (2) an opcode subfield 210 (of up to three bytes) and the first operand in the constant data subfield 220 (of up to four bytes), if any, or (3) the second operand in the constant data subfield 220 (of up to four bytes). Accordingly, using a conventional decoder, an instruction which has two operands in its constant data subfield 220 requires at least two cycles to be decoded and possibly more if the instruction has any prefix bytes. What is needed is a decoder that can decode in fewer cycles.
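The per-cycle rules above imply a simple count of decode cycles for a given instruction. The following sketch tallies that count; the function name and argument names are hypothetical, and the model ignores stalls caused by bytes that are not yet valid in the line buffer.

```python
def conventional_decode_cycles(num_prefix_bytes, num_constant_operands):
    """Decode cycles for one instruction in the conventional decoder.

    One cycle per prefix byte, one cycle for the opcode subfield together
    with the first constant operand (if any), and one more cycle for a
    second constant operand.
    """
    if num_prefix_bytes < 0:
        raise ValueError("prefix byte count cannot be negative")
    if num_constant_operands not in (0, 1, 2):
        raise ValueError("the constant subfield holds at most two operands")
    cycles = num_prefix_bytes          # prefixes are decoded one per cycle
    cycles += 1                        # opcode subfield + first operand
    if num_constant_operands == 2:
        cycles += 1                    # second operand needs its own cycle
    return cycles
```

For example, an instruction with three prefix bytes, a displacement, and an immediate would take five cycles under these rules, which is the cost a faster decoder would aim to reduce.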
The information encoded in an instruction includes a specification of the operation to be performed, the type of the operands to be manipulated, and the location of these operands. If an operand is located in memory, the instruction also must select, explicitly or implicitly, the segment which contains the operand.
As indicated above, an instruction may have various parts with different functions. The following section describes the function of the different parts of an instruction in more detail than the previous section. However, a complete description of the precise function of the parts of instructions is given in Appendix A of the Intel486™ Processor Family Programmer's Reference Manual. Of these parts, only the opcode is always present as indicated above. The other parts may or may not be present, depending on the operation involved and the location and type of the operands. The functions of the different parts of an instruction are listed below:
Prefixes 200: one or more bytes preceding an instruction which modify the operation of the instruction. The following prefixes can be used by application programs:
1. Segment override 290--explicitly specifies which segment register an instruction should use, instead of the default segment register.
2. Address size 280--switches between 16- and 32-bit addressing. Either size can be the default; this prefix selects the non-default size.
3. Operand size 285--switches between 16- and 32-bit data size. Either size can be the default; this prefix selects the non-default size.
4. Repeat or instruction prefix 275--used with a string instruction to cause the instruction to be repeated for each element of the string or used to increase the number of possible instructions.
Opcode 230: specifies the operation performed by the instruction. Some operations have several different opcodes, each specifying a different form of the operation.
Register specifier 230, 240: an instruction may specify one or two register operands. Register specifiers occur either in the same byte as the opcode or in the same byte as the addressing-mode specifier.
Addressing-mode specifier 240: when present, specifies whether an operand is a register or memory location; if in memory, specifies whether a displacement, a base register, an index register, and scaling are to be used.
SIB 250 (scale, index, base) byte: when the addressing-mode specifier indicates an index register will be used to calculate the address of an operand, a SIB byte is included in the instruction to encode the base register, the index register, and a scaling factor.
Displacement 260: when the addressing-mode specifier indicates a displacement will be used to compute the address of an operand, the displacement is encoded in the instruction. A displacement is a signed integer of 32, 16, or 8 bits. The 8-bit form is used in the common case when the displacement is sufficiently small. The processor extends an 8-bit displacement to 16 or 32 bits, taking into account the sign.
Immediate operand 270: when present, directly provides the value of an operand. Immediate operands may be bytes, words, or doublewords. In cases where an 8-bit immediate operand is used with a 16- or 32-bit operand, the processor extends the eight-bit operand to an integer of the same sign and magnitude in the larger size. In the same way, a 16-bit operand is extended to 32-bits.
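The extension of a short displacement or immediate to a wider value of the same sign and magnitude is ordinary two's-complement sign extension. A minimal sketch follows; the function name `sign_extend` is hypothetical, and the code models only the arithmetic, not any particular hardware.

```python
def sign_extend(value, from_bits, to_bits):
    """Extend a from_bits-wide two's-complement value to to_bits wide.

    The result is returned as an unsigned bit pattern of width to_bits.
    """
    if not 0 < from_bits <= to_bits:
        raise ValueError("need 0 < from_bits <= to_bits")
    value &= (1 << from_bits) - 1          # keep only the source bits
    sign_bit = 1 << (from_bits - 1)
    signed = (value ^ sign_bit) - sign_bit  # interpret as a signed integer
    return signed & ((1 << to_bits) - 1)    # re-encode in the wider size


if __name__ == "__main__":
    print(hex(sign_extend(0xFE, 8, 32)))    # 8-bit -2 widened to 32 bits
```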
An instruction acts on zero or more operands. An example of a zero-operand instruction is the NOP instruction (no operation). An operand can be held in any of these places: in the instruction itself (an immediate operand); in a register (in the case of 32-bit operands, EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP; in the case of 16-bit operands, AX, BX, CX, DX, SI, DI, SP, or BP; in the case of 8-bit operands, AH, AL, BH, BL, CH, CL, DH, or DL; the segment registers; or the EFLAGS register for flag operations); in memory; or at an I/O port. When 32 bits is the default operand size, use of 16-bit register operands requires the operand-size prefix (a byte with the value 66H preceding the instruction).
Access to operands is very fast. Register and immediate operands are available on-chip--the latter because they are prefetched as part of interpreting the instruction. Memory operands residing in the on-chip cache can be accessed just as fast.
Of the instructions which have operands, some specify operands implicitly; others specify operands explicitly; still others use a combination of both. For example:
Implicit operand: AAM
By definition, AAM (ASCII adjust for multiplication) operates on the contents of the AX register.
Explicit operand: XCHG EAX, EBX
The operands to be exchanged are encoded in the instruction with the opcode.
Implicit and explicit operands: PUSH COUNTER
The memory variable COUNTER (the explicit operand) is copied to the top of the stack (the implicit operand).
Note that most instructions have implicit operands. All arithmetic instructions, for example, update the EFLAGS register.
An instruction can explicitly reference one or two operands. Two-operand instructions, such as MOV, ADD, and XOR, generally overwrite one of the two participating operands with the result. This is the difference between the source operand (the one unaffected by the operation) and the destination operand (the one overwritten by the result).
For most instructions, one of the two explicitly specified operands--either the source or the destination--can be either in a register or in memory. The other operand must be in a register or it must be an immediate source operand. This puts the explicit two-operand instructions into the following groups: Register to register, Register to memory, Memory to register, Immediate to register, and Immediate to memory.
Certain string instructions and stack manipulation instructions, however, transfer data from memory to memory. Both operands of some string instructions are in memory and are specified implicitly. Push and pop stack operations allow transfer between memory operands and the memory-based stack.
Several three-operand instructions are provided, such as the IMUL, SHRD, and SHLD instructions. Two of the three operands are specified explicitly, as for the two-operand instructions, while a third is taken from the ECX register or supplied as an immediate. Other three-operand instructions, such as the string instructions when used with a repeat prefix, take all their operands from registers.
Certain instructions use data from the instruction itself as one (and sometimes two) of the operands. Such an operand is called an immediate operand. It may be a byte, word, or doubleword. For example:
SHR PATTERN, 2
One byte of the instruction holds the value 2, the number of bits by which to shift the variable PATTERN.
TEST PATTERN, 0FFFF00FFH
A doubleword of the instruction holds the mask which is used to test the variable PATTERN.
IMUL CX, MEMWORD, 3
A word in memory is multiplied by an immediate 3 and stored into the CX register.
All arithmetic instructions (except divide) allow the source operand to be an immediate value. When the destination is the EAX or AL register, the instruction encoding is one byte shorter than with the other general registers.
Operands may be located in one of the 32-bit general registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP), in one of the 16-bit general registers (AX, BX, CX, DX, SI, DI, SP, or BP), or in one of the 8-bit general registers (AH, BH, CH, DH, AL, BL, CL, or DL).
The processor has instructions for referencing the segment registers (CS, DS, ES, SS, FS, and GS). These instructions are used by application programs only if system designers have chosen a segmented memory model.
The processor also has instructions for changing the state of individual flags in the EFLAGS register. Instructions have been provided for setting and clearing flags which often need to be accessed. The other flags, which are not accessed so often, can be changed by pushing the contents of the EFLAGS register on the stack, making changes to it while it's on the stack, and popping it back into the register.
Instructions with explicit operands in memory must reference the segment containing the operand and the offset from the beginning of the segment to the operand. Segments are specified using a segment-override prefix, which is a byte placed at the beginning of an instruction. If no segment is specified, simple rules assign the segment by default. The offset is specified in one of the following ways:
Most instructions which access memory contain a byte for specifying the addressing method of the operand. The byte, called the ModR/M byte, comes after the opcode and specifies whether the operand is in a register or in memory. If the operand is in memory, the address is calculated from a segment register and any of the following values: a base register, an index register, a scaling factor, and a displacement. When an index register is used, the ModR/M byte also is followed by another byte to specify the index register and scaling factor. This form of addressing is the most flexible.
A few instructions use implied address modes:
A MOV instruction with the AL or EAX register as either source or destination can address memory with a doubleword encoded in the instruction. This special form of the MOV instruction allows no base register, index register, or scaling factor to be used. This form is one byte shorter than the general-purpose form.
String operations address memory in the DS segment using the ESI register (the MOVS, CMPS, OUTS, and LODS instructions) or in the ES segment using the EDI register (the MOVS, CMPS, INS, SCAS, and STOS instructions).
Stack operations address memory in the SS segment using the ESP register (the PUSH, POP, PUSHA, PUSHAD, POPA, POPAD, PUSHF, PUSHFD, POPF, POPFD, CALL, LEAVE, RET, IRET, and IRETD instructions, exceptions, and interrupts).
The ModR/M byte provides the most flexible form of addressing. Instructions which have a ModR/M byte after the opcode are the most common in the instruction set. For memory operands specified by a ModR/M byte, the offset within the selected segment is the sum of three components: A displacement+A base register+An index register (the index register may be multiplied by a factor of 2, 4, or 8). The offset which results from adding these components is called an effective address. Each of these components may have either a positive or negative value.
The displacement component, because it is encoded in the instruction, is useful for relative addressing by fixed amounts, such as: location of simple scalar operands, beginning of a statically allocated array, and offset to a field within a record.
The base and index components have similar functions. Both use the same set of general registers. Both can be used for addressing which changes during program execution, such as: location of procedure parameters and local variables on the stack, the beginning of one record among several occurrences of the same record type or in an array of records, the beginning of one dimension of multiple dimension array, or the beginning of a dynamically allocated array.
The uses of general registers as base or index components differ in the following respects: the ESP register cannot be used as an index register. When the ESP or EBP register is used as the base, the SS segment is the default selection. In all other cases, the DS segment is the default selection.
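The default segment rule stated above, together with the segment-override prefix mentioned earlier, amounts to a small selection function. The sketch below is illustrative only; the function name `select_segment` and the use of register names as strings are assumptions.

```python
def select_segment(base_register=None, override=None):
    """Choose the segment register for a ModR/M memory operand.

    An explicit segment-override prefix wins; otherwise ESP or EBP as the
    base selects SS, and every other case selects DS.
    """
    if override is not None:
        return override
    if base_register in ("ESP", "EBP"):
        return "SS"
    return "DS"
```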
The scaling factor permits efficient indexing into an array when the array elements are 2, 4, or 8 bytes. The scaling of the index register is done in hardware at the time the address is evaluated. This eliminates an extra shift or multiply instruction.
The base, index, and displacement components may be used in any combination; any of these components may be null. A scale factor can be used only when an index also is used. Each possible combination is useful for data structures commonly used by programmers in high level languages and assembly language. Suggested uses for some combinations of address components are described below.
DISPLACEMENT. The displacement alone indicates the offset of the operand. This form of addressing is used to access a statically allocated scalar operand. A byte, word, or doubleword displacement can be used.
BASE. The offset to the operand is specified indirectly in one of the general registers, as for "based" variables.
BASE+DISPLACEMENT. A register and a displacement can be used together for two distinct purposes. First, to index into a static array when the element size is not 2, 4, or 8 bytes. The displacement component encodes the offset of the beginning of the array. The register holds the results of a calculation to determine the offset to a specific element within the array. Second, to access a field of a record. The base register holds the address of the beginning of the record, while the displacement is an offset to the field.
An important special case of this combination is access to parameters in a procedure activation record. A procedure activation record is the stack frame created when a subroutine is entered. In this case, the EBP register is the best choice for the base register, because it automatically selects the stack segment. This is a compact encoding for this common function.
(INDEX*SCALE)+DISPLACEMENT. This combination is an efficient way to index into a static array when the element size is 2, 4, or 8 bytes. The displacement addresses the beginning of the array, the index register holds the subscript of the desired array element, and the processor automatically converts the subscript into an index by applying the scaling factor.
BASE+INDEX+DISPLACEMENT. Two registers used together support either a two-dimensional array (the displacement holds the address of the beginning of the array) or one of several instances of an array of records (the displacement is an offset to a field within the record).
BASE+(INDEX*SCALE)+DISPLACEMENT. This combination provides efficient indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.
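The combinations listed above are all instances of one sum in which any component may be absent. The following sketch computes that effective address; the function name, the use of `None` for a null component, and the 32-bit wraparound are assumptions made for illustration, and the numeric values are arbitrary.

```python
def effective_address(base=None, index=None, scale=1, displacement=0):
    """Sum displacement + base + index*scale; null components count as zero."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    if index is None and scale != 1:
        # A scale factor can be used only when an index also is used.
        raise ValueError("a scale factor requires an index register")
    total = displacement
    if base is not None:
        total += base
    if index is not None:
        total += index * scale
    return total & 0xFFFFFFFF  # assumed 32-bit wraparound


# (INDEX*SCALE)+DISPLACEMENT: element 5 of a static array of 4-byte
# elements that begins at offset 0x1000.
static_element = effective_address(index=5, scale=4, displacement=0x1000)

# BASE+(INDEX*SCALE)+DISPLACEMENT: base register holds 0x2000, index 3
# selects an 8-byte element, displacement 0x10 is added on top.
scaled_element = effective_address(base=0x2000, index=3, scale=8,
                                   displacement=0x10)
```

Because the scaling is folded into the address calculation, the subscript in the index register can be used directly, without a separate shift or multiply.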
Once again, all instruction encodings are subsets of the general instruction format shown in FIG. 2. Instructions consist of optional instruction prefixes, one or two primary opcode bytes, possibly an address specifier consisting of the ModR/M byte and the SIB (Scale Index Base) byte, a displacement, if required, and an immediate data field, if required.
Smaller encoding fields can be defined within the primary opcode or opcodes. These fields define the direction of the operation, the size of the displacements, the register encoding, or sign extension; encoding fields vary depending on the class of operation.
Most instructions that can refer to an operand in memory have an addressing form byte following the primary opcode byte(s). This byte, called the ModR/M byte, specifies the address form to be used as indicated above. Certain encodings of the ModR/M byte indicate a second addressing byte, the SIB (Scale Index Base) byte, which follows the ModR/M byte and is required to fully specify the addressing form.
Addressing forms can include a displacement immediately following either the ModR/M or SIB byte. If a displacement is present, it can be 8-, 16- or 32-bits.
If the instruction specifies an immediate operand, the immediate operand always follows any displacement bytes. The immediate operand, if specified, is always the last field of the instruction.
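The following are the allowable instruction prefix codes:
F3H REP prefix (used only with string instructions)
F3H REPE/REPZ prefix (used only with string instructions)
F2H REPNE/REPNZ prefix (used only with string instructions)
F0H LOCK prefix
The following are the segment override prefixes:
2EH CS segment override prefix
36H SS segment override prefix
3EH DS segment override prefix
26H ES segment override prefix
64H FS segment override prefix
65H GS segment override prefix
66H Operand-size override
67H Address-size override
The ModR/M and SIB bytes follow the opcode byte(s) in many of the processor instructions. They contain the following information:
The indexing type or register number to be used in the instruction
The register to be used, or more information to select the instruction
The base, index, and scale information
FIG. 3 shows the formats of the ModR/M and SIB bytes. The values and the corresponding addressing forms of the ModR/M and SIB bytes are shown in Tables 27-2, 27-3, and 27-4 of the Intel486™ Processor Family Programmer's Reference Manual. The 16-bit addressing forms specified by the ModR/M byte are in Table 27-2 of the same manual. The 32-bit addressing forms specified by the ModR/M byte are in Table 27-3 of the same manual. Table 27-4 of the same manual shows the 32-bit addressing forms specified by the SIB byte.
The ModR/M byte 240A contains three fields of information:
The mod field 242, which occupies the two most significant bits of the byte, combines with the R/M field to form 32 possible values: eight registers and 24 indexing modes.
The reg field 244, which occupies the next three bits following the mod field, specifies either a register number or three more bits of opcode information. The meaning of the reg field is determined by the first (opcode) byte of the instruction.
The R/M field 246, which occupies the three least significant bits of the byte, can specify a register as the location of an operand, or can form part of the addressing-mode encoding in combination with the mod field as described above.
The based indexed and scaled indexed forms of 32-bit addressing require the SIB byte 250A. The presence of the SIB byte is indicated by certain encodings of the ModR/M byte. The SIB byte then includes the following fields:
The ss field 252, which occupies the two most significant bits of the byte, specifies the scale factor.
The index field 254, which occupies the next three bits following the ss field, specifies the register number of the index register.
The base field 256, which occupies the three least significant bits of the byte, specifies the register number of the base register.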
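Both the ModR/M byte and the SIB byte use the same 2-3-3 bit layout: a two-bit field in the most significant bits, followed by two three-bit fields. The sketch below extracts those fields with shifts and masks. The function names are hypothetical, and the mapping of the ss field to a scale factor of 1, 2, 4, or 8 (that is, 2 raised to the ss value) reflects the standard x86 encoding rather than anything stated in this passage.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its mod, reg, and R/M fields."""
    return {
        "mod": (byte >> 6) & 0b11,   # two most significant bits
        "reg": (byte >> 3) & 0b111,  # next three bits
        "rm": byte & 0b111,          # three least significant bits
    }


def split_sib(byte):
    """Split a SIB byte into scale factor, index, and base fields."""
    ss = (byte >> 6) & 0b11
    return {
        "scale": 1 << ss,              # ss = 0..3 selects factor 1, 2, 4, 8
        "index": (byte >> 3) & 0b111,  # register number of the index register
        "base": byte & 0b111,          # register number of the base register
    }


if __name__ == "__main__":
    print(split_modrm(0x44))
    print(split_sib(0x98))
```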
FIG. 4 is a high-level block diagram of the decode stage of a conventional microprocessor. The line buffer 112 is shown at the top of the diagram. The decode stage pulls instructions for decoding only out of the line buffer 112. Accordingly, line buffer 112 is the data interface between the prefetch stage and the decode stage. The prefetch buffer is a 16 byte circular buffer. The decode stage pulls instruction bytes out of byte positions in the line buffer 112 as dictated by a series of instruction pointers generated by instruction pointer generating circuits 1100.
The instruction pointers include a demand-instruction pointer (DIP), a temporary instruction pointer (TIP), an opcode length pointer (TIPOPLEN), and a shift pointer (TIPSHIFT). The DIP is generated each clock cycle to point to the linear address of the first byte of the instruction currently being operated on by the decode stage. The three least significant bits of the DIP identify the particular byte position in the line buffer at which that first instruction byte exists (because the buffer width is 8, or 2^3, bytes).
The TIP is generated each cycle to point to the first byte of the instruction which has not yet been consumed by the decode stage. The three LSB's of the TIP point to the byte position in the line buffer 112 of the first byte of the instruction which has not yet been consumed. The TIP will be the same as the DIP at the beginning of an instruction. The TIPOPLEN pointer is set to the sum of the TIP pointer and the opcode length so that it points to the first byte of constant data. The three LSB's of the TIPOPLEN point to the byte position in the line buffer 112 of the first byte of constant data, if any, of the instruction.
The TIPSHIFT pointer points to the address to which the TIP will be updated when the decode stage consumes bytes. The three LSB's of the TIPSHIFT point to a byte position in the line buffer of the byte to which the TIP will be updated. TIPSHIFT is controlled by one of the following depending on the portion of an instruction that is currently being operated on by the decode stage. Particularly, TIPSHIFT will be (1) the TIP pointer plus 1, when a prefix byte is being consumed, (2) the TIP pointer plus the length of the opcode subfield and the first operand, if any, of the instruction currently in the decode stage, or (3) the TIP pointer plus the length of the second operand of the instruction currently in the decode stage.
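The relationship between a pointer and its byte position in the eight-byte line buffer, and the definition of TIPOPLEN as the TIP plus the opcode length, can be sketched as follows. The function names and the example address are hypothetical; only the arithmetic stated in the text is modeled.

```python
LINE_BUFFER_BYTES = 8  # 2**3 byte positions, so three pointer bits suffice


def byte_position(pointer):
    """Byte position in the line buffer: the three LSBs of the pointer."""
    return pointer & (LINE_BUFFER_BYTES - 1)


def tip_oplen(tip, opcode_length):
    """TIPOPLEN: TIP plus the opcode length, i.e. the address of the
    first byte of constant data."""
    return tip + opcode_length


if __name__ == "__main__":
    tip = 0x1235                       # arbitrary example address
    oplen = tip_oplen(tip, 3)          # three-byte opcode subfield
    print(byte_position(tip), byte_position(oplen))
```

In this example the opcode starts at byte position 5 and the constant data at position 0, which shows the pointer wrapping around the end of the line buffer.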
In the terminology of this specification, bytes are "consumed" by the decode stage when they are passed through the data instruction circuits 1102 and 1104 to the opcode assembly circuit 1106 and/or constant data ports 1108, 1110 and 1112, where they are either decoded or stored in shadow registers as described more fully below. If an instruction has any prefix bytes, they are consumed one per cycle. All opcode bytes as well as all bytes of the first operand in the constant subfield, if any, are consumed simultaneously in one cycle following the decoding of the last prefix byte. If there is a second operand in the constant subfield, all of its bytes are consumed simultaneously in one subsequent cycle.
The conventional decode stage comprises an opcode data instruction circuit 1102. It is an eight-byte to three-byte instruction circuit. It takes in all eight bytes from the line buffer 112 and selects the byte position in the line buffer pointed to by the TIP pointer and the following bytes in a circular queue fashion. The conventional opcode instruction circuit 1102 includes circuitry for determining if the first byte pointed to by the TIP pointer is an opcode byte or a prefix byte. If it is a prefix byte and it is tagged valid, it is forwarded to opcode assembly circuit 1106, where it is directed into prefix decoding logic circuit 1116. The prefix decoding logic circuit 1116 sets a flag in prefix flag register 1114 corresponding to the information conveyed by the prefix byte. If the byte position in the line buffer 112 pointed to by the TIP is not tagged valid, the decode stage simply stalls until a subsequent cycle in which it is tagged valid.
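The eight-byte to three-byte selection with circular wraparound, and the stall on an invalid byte, can be modeled in a few lines. This is a behavioral sketch only: the function name, the representation of valid tags as a list of booleans, and the use of `None` to signal a stall are assumptions made for illustration.

```python
LINE_BUFFER_BYTES = 8
WINDOW_BYTES = 3


def select_opcode_window(line_buffer, valid, tip):
    """Select the byte at the TIP position and the following bytes,
    wrapping around the end of the line buffer.

    Returns None (a stall) when the byte at the TIP position is not
    tagged valid.
    """
    if len(line_buffer) != LINE_BUFFER_BYTES or len(valid) != LINE_BUFFER_BYTES:
        raise ValueError("line buffer and valid tags must hold eight entries")
    start = tip & (LINE_BUFFER_BYTES - 1)   # three LSBs of the TIP
    if not valid[start]:
        return None                          # decode stage stalls this cycle
    return [line_buffer[(start + i) % LINE_BUFFER_BYTES]
            for i in range(WINDOW_BYTES)]
```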
The above background describes a conventional decode circuit for microprocessors. In the quest for a greater processor speed, it is an object of the present invention to further increase the number of instructions decoded per cycle.
It is a further object of the present invention to increase the performance of the decode stage in such a modular manner that the improvement is flexible and scalable, i.e., by adding width or depth to the number of instructions that can be handled simultaneously (via paralleling or pipelining the decode stage). In other words, it is a further object of the present invention to allow scaling of the system in that multiple decoders can be added to increase performance through parallel processing.
It is a further object of the present invention to fine tune the complete behavior of every instruction in the instruction set for maximum performance, faster decode and more efficient decode of the instruction.
It is a further object of the present invention to increase the decoder's performance without significantly increasing the memory requirements of the decode stage.
It is a further object of the present invention to increase the decoder's performance in a flexible and scalable manner without significantly increasing the complexity of the decode stage and without adding extra cost to the system.
It is a further object of the present invention to maximize performance of the decode of memory operand instructions that do not behave in a manner similar to the register operand instructions.