Computers are ubiquitous in today's society. They come in many varieties and can be found in automobiles, grocery stores, banks, personal digital assistants, cell phones, and many businesses. As will be appreciated by almost anyone owning a computer, the software used on successive generations of computers continues to become more sophisticated. This increasing sophistication is due, in part, to the availability of computers with complex instruction sets, sometimes referred to as complex instruction set computers (CISC). One example of a complex instruction set is the x86 instruction set architecture (ISA), which is widely available today.
Complex instruction sets, such as the x86 ISA, may include complex instructions that do not execute efficiently in modern microprocessors because they take multiple processor cycles to execute. To process computer programs as quickly as possible, microprocessors are often designed with the goal of executing each instruction in a single clock cycle. Thus, many modern microprocessors decode complex instructions into a series of more manageable microcode operations (μ-ops) that execute in a single cycle. These μ-ops may be stored in a read-only memory (ROM) that is integrated within the processor and referred to as a μcode ROM. A hardware sequencer, which also may be integrated within the microprocessor, may track the μ-ops fetched from the μcode ROM and provide the address of the next μ-op to be executed.
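The relationship between a complex instruction, the μcode ROM, and the sequencer can be illustrated with a minimal software sketch. Everything named here is hypothetical and chosen only for illustration: the `MicroOp` record, the contents of `UCODE_ROM`, the `ENTRY_POINTS` table, and the instruction names `ADD_MEM` and `MOV_MEM` do not come from any real processor. The sketch also assumes the simplest possible sequencing rule, in which the next address is the current address plus one until a μ-op marked as last is reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    name: str
    last: bool = False  # True on the final μ-op of a flow (assumed convention)


# Hypothetical μcode ROM: a flat array of μ-ops indexed by address.
UCODE_ROM = [
    MicroOp("load_tmp"),               # address 0
    MicroOp("add_tmp"),                # address 1
    MicroOp("store_tmp", last=True),   # address 2
    MicroOp("load_src"),               # address 3
    MicroOp("store_dst", last=True),   # address 4
]

# Hypothetical mapping from a complex instruction to its first μ-op address.
ENTRY_POINTS = {"ADD_MEM": 0, "MOV_MEM": 3}


class Sequencer:
    """Tracks the fetched μ-op and supplies the address of the next one."""

    def __init__(self, rom):
        self.rom = rom
        self.next_addr = None

    def start(self, entry_addr):
        self.next_addr = entry_addr

    def step(self):
        """Fetch one μ-op and compute the next address (None when done)."""
        if self.next_addr is None:
            return None
        uop = self.rom[self.next_addr]
        self.next_addr = None if uop.last else self.next_addr + 1
        return uop


def expand(instruction):
    """Return the names of the μ-ops that a complex instruction decodes into."""
    seq = Sequencer(UCODE_ROM)
    seq.start(ENTRY_POINTS[instruction])
    names = []
    while (uop := seq.step()) is not None:
        names.append(uop.name)
    return names
```

Under these assumptions, `expand("ADD_MEM")` walks ROM addresses 0 through 2 and `expand("MOV_MEM")` walks addresses 3 and 4. A hardware sequencer may also handle branches and other non-sequential flows, which this sketch omits.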
Many microprocessors also may implement hardware threading to reduce the amount of time that the microprocessor is idle. Since the μcode ROM and the hardware sequencer may be integrated within the microprocessor, where space is at a premium, there may be only a single μcode ROM and/or sequencer to be shared among multiple hardware threads. However, sharing a single μcode ROM among multiple threads may create problems. For example, the microprocessor may need to arbitrate between multiple threads to determine which thread can access the μcode ROM. Also, the microprocessor may need to coordinate address calculation by the sequencer among the many threads. These problems are exacerbated as processor operating frequencies increase. That is, at high clock frequencies, the operations involved in arbitrating and selecting threads (e.g., thread arbitration, address generation, and reading the instructions from the μcode ROM) may be difficult to complete in a single clock cycle.
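The arbitration problem described above can be made concrete with a small sketch. The text does not specify an arbitration policy, so the round-robin scheme below is an assumption made purely for illustration, as are the names `RoundRobinArbiter` and `simulate`. The sketch models a single shared μcode ROM that can serve one thread per cycle.

```python
class RoundRobinArbiter:
    """Grants the shared μcode ROM to one requesting thread per cycle.

    Round-robin is an assumed policy, not one taken from the text.
    """

    def __init__(self, num_threads):
        self.num_threads = num_threads
        # Start so that thread 0 has the highest priority in the first cycle.
        self.last_grant = num_threads - 1

    def grant(self, requests):
        """Return the id of the granted thread, or None if nobody requests."""
        for offset in range(1, self.num_threads + 1):
            tid = (self.last_grant + offset) % self.num_threads
            if tid in requests:
                self.last_grant = tid
                return tid
        return None


def simulate(pending, num_threads):
    """Return the per-cycle grant order.

    pending maps a thread id to the number of μ-ops that thread still needs
    to fetch from the shared ROM (hypothetical workload).
    """
    arbiter = RoundRobinArbiter(num_threads)
    remaining = dict(pending)
    order = []
    while any(count > 0 for count in remaining.values()):
        requests = {tid for tid, count in remaining.items() if count > 0}
        tid = arbiter.grant(requests)
        remaining[tid] -= 1
        order.append(tid)
    return order
```

For example, `simulate({0: 2, 1: 1, 2: 2}, 3)` yields the grant order `[0, 1, 2, 0, 2]`: five μ-op fetches occupy five cycles because only one thread can read the ROM at a time. In hardware, the grant decision, the address calculation, and the ROM read would all have to fit in the same clock period, which is the timing pressure the paragraph describes.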
Furthermore, as the microprocessor begins to have difficulty executing complex multithreaded instructions within the time allotted by a single clock cycle, the physical placement of the μcode ROM and/or sequencer within the microprocessor also may begin to affect performance. For example, the μcode ROM and/or sequencer may be physically placed far from the blocks of the microprocessor to which they deliver μ-op code. In general, the farther apart two blocks are within the microprocessor, the more time it may take to deliver data between them. Consequently, the physical placement of blocks may result in the operations of arbitrating and selecting threads taking more than a single clock cycle. Thus, methods and apparatuses are needed that address one or more of these problems.