The present invention relates generally to computers, and more particularly to techniques for advanced processor design and operation.
The explosion of the Personal Computer industry has until recently been fueled primarily by the 68000 family incorporated into most Apple Macintosh personal computers and the x86 family incorporated into most IBM-PC compatible products (PCs). The IBM-PC quickly achieved a dominant position due to its open architecture that allowed a host of vendors to make compatible peripherals and system units. The PC is now a mass-market consumer item.
The initial IBM PC was designed around the 8088, a microprocessor manufactured by Intel with a 16-bit internal architecture and an 8-bit external bus. Subsequent advances in microprocessor fabrication and design were incorporated into later microprocessors developed by NexGen, AMD, Cyrix, Intel, and others. Each of these microprocessors became the engine for more advanced versions of the IBM-compatible PC.
Early on it was recognized that the basic x86 architecture had a number of technical limitations. A number of these limitations are related to the x86's use of a CISC (Complex Instruction Set Computer) architecture. The CISC architecture requires that the processor hardware be built to execute a large number of complex instructions. Many of these instructions are infrequently used but have design consequences that slow down all instructions, due to, for example, complexities that must be introduced in the decoder and datapath timing. The hardware needed to implement these infrequently used instructions results in a poor use of silicon resources, resources which could be better used for such things as aggressive instruction prefetch. Advanced microprocessor techniques such as pipelining and superscalar decoding are difficult to implement on a CISC-type architecture. Furthermore, it is difficult to design optimizing compilers that make effective use of more than just a subset of the CISC instructions. As a result, CISC processors are often not able to enjoy the benefits of advanced static instruction scheduling.
In addition, the x86 architecture includes a number of limitations above and beyond those inherent in CISC. Among these additional limitations are a limited number of on-chip registers, variable length instructions included in the instruction set, non-consistent field encodings, and a requirement that interrupts be precise (be generated and handled a determined number of instructions from the instruction that caused the interrupt). Additionally, the x86's segmented memory architecture, protection mode features, and compatible paging make it especially difficult to apply advanced microprocessor techniques.
Some aspects related to the x86 architecture are discussed in two U.S. patents assigned to Intel: U.S. Pat. Nos. 4,972,338 and 5,321,836, which are incorporated herein by reference to the extent necessary to understand those parts of this disclosure related to the x86 architecture.
With respect to the microprocessors disclosed in '338 and '836, pipelining is not really done in the contemporary sense. However, instruction fetch and instruction execution are loosely coupled, permitting parallel instruction fetch and execution. Furthermore, microprocessors designed during the associated period typically relied heavily on microcode implementations that required multiple cycles to execute each instruction, and had a single conventional register file in the execution unit. Additional limitations are that address generation is implemented with sequential generation of effective and intermediate addresses, and that the bus interface has a flat memory hierarchy with no explicit cache subsystem.
The Intel patents also discuss an architecture that is lacking in many advanced features. The architecture is scalar, meaning that there is just one integer add unit, for example, and that it must be accessed over a number of cycles to execute one instruction so that all of the adds associated with address computation, operand fetching, and the operation on the operand can be performed. It is believed that there is no pipeline within any of the functional blocks of the microprocessors discussed in the patents and no queues other than for instruction fetch. Execution of instructions is in order and the use of hardware in the processor is sequential.
Despite these numerous limitations, the x86 has remained the industry standard primarily due to market factors. There currently exists in the market a massive installed base of both hardware and software compatible with the x86. Virtually all PC software is distributed in binary form only, so that end users cannot recompile their software to target different architectures. The typical user has a large investment in all software types: operating systems (OS), application programs, device drivers, and utilities. Further major investments are likely to exist in multiple system units and compatible peripherals. As a result, the cost of switching to a new architecture can be quite daunting.
What is needed is a collection of microprocessor structures and techniques that permit an advanced microprocessor to maintain compatibility with the large installed base of x86 hardware and software while overcoming the limitations of the x86 architecture, so as to provide performance competitive with that of contemporary processors implementing other architectures.
An integrated circuit having a normal mode for operating under normal operating conditions and a debug mode for operating to test and debug the integrated circuit is provided. In one implementation, the integrated circuit includes a plurality of output pins that carry a first plurality of signals in the normal mode and carry a second plurality of signals in the debug mode. The integrated circuit may embody functionality of a microprocessor. The microprocessor may include logic circuitry for enabling the second plurality of signals to be output from a multiplexer to the output pins in response to a predetermined event, such as a hit in an associated memory unit.
In one embodiment, the invention contemplates a microprocessor having a normal mode for operation under normal operating conditions and a debug mode for operations to test and debug the microprocessor. The microprocessor comprises a cache memory unit, a plurality of execution units, and a plurality of output pins. The plurality of execution units is interconnected through a plurality of internal busses. The microprocessor further comprises multiplexer circuitry having a first input coupled to the plurality of internal busses and a second input coupled to the cache memory unit, and test control logic for setting the multiplexer in either the normal mode or the debug mode. During the normal mode, external data requests resulting from misses in the cache memory unit are conveyed through the second input of the multiplexer circuitry to the output pins. During the debug mode, at least some signals of the plurality of internal busses are conveyed through the first input of the multiplexer circuitry to the output pins.
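By way of illustration only, the pin-sharing arrangement described above may be modeled behaviorally in software. The following Python sketch is not part of the disclosed circuitry; the names Mode, OutputMux, set_mode, and drive_pins, and the representation of signals as integers, are assumptions introduced solely for this example. It shows only that one set of output pins carries cache-miss requests in the normal mode and internal bus signals in the debug mode.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Operating modes selected by the test control logic (names are illustrative)."""
    NORMAL = 0
    DEBUG = 1


class OutputMux:
    """Behavioral model of multiplexer circuitry feeding a shared set of output pins.

    First input:  signals taken from the internal busses.
    Second input: external data requests from the cache memory unit.
    """

    def __init__(self) -> None:
        # Assumption: the device starts in the normal mode.
        self.mode = Mode.NORMAL

    def set_mode(self, mode: Mode) -> None:
        """Stand-in for the test control logic that sets the multiplexer."""
        self.mode = mode

    def drive_pins(self, internal_bus: int,
                   cache_miss_request: Optional[int]) -> Optional[int]:
        """Return the value presented on the output pins for this cycle."""
        if self.mode is Mode.DEBUG:
            # Debug mode: internal bus signals pass through the first input.
            return internal_bus
        # Normal mode: only cache-miss requests pass through the second input.
        # None models a cycle with no external request pending.
        return cache_miss_request


mux = OutputMux()
normal_pins = mux.drive_pins(internal_bus=0xBEEF, cache_miss_request=0x1000)
mux.set_mode(Mode.DEBUG)
debug_pins = mux.drive_pins(internal_bus=0xBEEF, cache_miss_request=0x1000)
```

In this model the same call produces different pin values depending only on the mode, which mirrors the way a single set of pins serves both normal operation and debug visibility without adding dedicated debug pins.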