This invention is in the field of information and communications, and is more specifically directed to improved processes, circuits, devices, and systems for information and communication processing, and processes of operating and making them. Without limitation, the background is further described in connection with wireless communications processing.
Wireless communications of many types have gained increasing popularity in recent years. The mobile wireless (or “cellular”) telephone has become ubiquitous around the world. Mobile telephony has recently begun to communicate video and digital data in addition to voice. Wireless devices for communicating computer data over a wide area network using mobile wireless telephone channels and techniques are also available.
The market for portable devices such as cell phones and PDAs (personal digital assistants) is expanding, with many more features and applications. More features and applications call for microprocessors with high performance but low power consumption. Thus, given a set of performance requirements, keeping the power consumption of the microprocessor and related cores and chips to a minimum is very important.
Wireless data communication in wireless local area networks (WLAN), such as those operating according to the well-known IEEE 802.11 standard, has become especially popular in a wide range of installations, from home networks to commercial establishments. Short-range wireless data communication according to the “Bluetooth” technology permits computer peripherals to communicate with a personal computer or workstation within the same room.
Security is important in both wireline and wireless communications, for retail and other commercial transactions in electronic commerce and wherever personal and/or commercial privacy is desirable. Added features and security add further processing tasks to the communications system. These tasks potentially require added software and hardware in systems where cost and power dissipation are already important concerns.
Improved processors, such as RISC (Reduced Instruction Set Computing) processors and digital signal processing (DSP) chips, and/or other integrated circuit devices are essential to these systems and applications. Reducing the cost of manufacture, increasing the number of instructions executed per cycle, and addressing power dissipation without compromising performance are important goals in RISC processors, DSPs, integrated circuits generally, and system-on-a-chip (SOC) designs. These goals become even more important in handheld and mobile applications, where small size is paramount and cost and power consumption must be tightly controlled.
In high performance microprocessors, instructions often are fetched, decoded, and executed in assembly-line fashion, called a pipeline. The pipeline of a microprocessor has pipeline stages that perform processing on microprocessor instructions; these stages are analogous to the stations on a factory assembly line where processing work is performed on workpieces. In a microprocessor, instructions are often fetched in a predetermined order, and if an instruction conditionally or unconditionally specifies that the next instruction should be out of the usual order, that event is called a branch.
Processors execute a set of instructions in assembly-line order by using a series of circuit stages, collectively called a pipeline, through which each instruction passes in sequence as the operations it represents are performed. The operation of each stage is arranged to take relatively little time, so the instructions can be processed rapidly at a high clock rate or processor speed.
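The assembly-line behavior described above can be illustrated with a minimal sketch of an ideal pipeline, assuming a hypothetical five-stage layout (the stage names in `STAGES` are illustrative and not taken from this document) in which one instruction enters per clock cycle and nothing stalls. The function `pipeline_schedule` is likewise hypothetical; it reports which instruction occupies each stage on each cycle.

```python
# Hypothetical five-stage pipeline; the stage names are illustrative only.
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Assumes an ideal pipeline: one instruction enters per cycle and there
    are no stalls or flushes. None marks an empty stage (a bubble).
    """
    depth = len(stages)
    total_cycles = num_instructions + depth - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for position, name in enumerate(stages):
            instr = cycle - position  # instruction that entered `position` cycles ago
            row[name] = instr if 0 <= instr < num_instructions else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(4)):
        print(cycle, row)
```

With N instructions and a pipeline depth of D stages, the last instruction finishes after N + D - 1 cycles, so throughput approaches one instruction per cycle even though each instruction spends D cycles in flight.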
Computer software has a list of instructions that represent operations that the processor is to perform or execute, often in list-wise order. However, some of the instructions, called branch instructions, direct the processor to go somewhere else in the list to execute a succeeding instruction instead of the next instruction in list-wise order. Some of these branches are unconditional. Other branches depend on the existence or detection of some condition or event at or near the time when the branch is to be executed.
Branches present a challenge to pipeline processing of instructions. Instructions are processed most efficiently when every stage of the pipeline is operating on the instruction stream. A branch is generally executed in a later, or downstream, portion of the pipeline, and the branch determines which instruction should come after it. The instructions already being processed in earlier pipeline stages may or may not be the ones the branch determines should follow. If they are the wrong ones, the operations performed on them in the earlier pipestages are irrelevant and need to be invalidated, or flushed. These irrelevant operations waste time and power, and the flush operation itself also consumes time and power. The correct subsequent instruction then needs to be issued to the pipeline. The time and power spent on the wasted operations are not recovered.
For high performance purposes, a microprocessor may put instructions that follow a branch instruction into the pipeline to be fetched, decoded, and executed even before the branch instruction has been executed. This process relies on branch prediction, which is a not-fully-certain prediction of whether a given branch instruction will be taken or not taken. However, if a branch prediction is wrong, the instructions in the pipeline and any results improvidently computed from them must be “flushed” and replaced with a different sequence of instructions based on the actual branch outcome determined when the branch instruction is executed. A pipeline flush wastes a substantial amount of time and degrades the performance that is so important in a high-performance microprocessor.
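To make the notion of a not-fully-certain taken/not-taken prediction concrete, the sketch below uses a table of two-bit saturating counters. This is a common textbook scheme, named here plainly as a swapped-in illustration; it is not the circuitry of this document. The class `TwoBitPredictor`, the helper `count_mispredictions`, and the table size set by `index_bits` are hypothetical choices made for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low branch-address bits.

    Counter values 0 and 1 predict not-taken; 2 and 3 predict taken.
    A textbook scheme used only to illustrate branch prediction.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # every entry starts weakly not-taken

    def predict(self, pc):
        """Return True if the branch at address `pc` is predicted taken."""
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = pc & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


def count_mispredictions(predictor, pc, outcomes):
    """Predict, compare with each actual outcome, train, and count the misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1  # in hardware, each miss would trigger a pipeline flush
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    p = TwoBitPredictor()
    loop_branch = [True] * 9 + [False]  # a loop-closing branch: taken 9 times, then exits
    print(count_mispredictions(p, 0x1234, loop_branch))  # first pass: 2 misses
    print(count_mispredictions(p, 0x1234, loop_branch))  # later passes: 1 miss
```

The two-bit hysteresis means a single loop exit does not flip the prediction. After warm-up, each pass through the loop costs only one misprediction, at the exit.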
As microprocessor clock frequency has increased, execution pipelines have lengthened (deepened). Also, multiple instructions are “speculatively” issued to one or more pipelines, meaning that the instructions are issued on the uncertain assumption that the branch predictions are correct. In consequence, the importance of accurate branch prediction is increasing because ever more pipeline stages are in danger of being subject to wasted operations (“bubbles”) if any branch predictions are incorrect.
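The effect of pipeline depth on misprediction cost can be estimated with a common first-order model. This model is an assumption, not a formula given in this document. It treats every misprediction as costing a fixed number of bubble cycles that grows with the number of stages between fetch and branch resolution: effective CPI = base CPI + (branch fraction) × (misprediction rate) × (penalty cycles). The function name and the sample figures below are hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """First-order estimate of cycles per instruction including branch bubbles.

    Assumes each mispredicted branch adds a fixed `penalty_cycles` of wasted
    pipeline slots, roughly the number of stages a branch travels before it
    resolves. All inputs are illustrative assumptions.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Same predictor accuracy, shallow versus deep pipeline (hypothetical numbers).
    print(effective_cpi(1.0, 0.20, 0.05, 5))   # about 1.05
    print(effective_cpi(1.0, 0.20, 0.05, 20))  # about 1.20
```

Holding prediction accuracy fixed, quadrupling the penalty quadruples the CPI overhead. This is why deeper pipelines make accurate branch prediction more important.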
The term “branch prediction” as used herein refers to predicting either the state of a branch as taken or not-taken (and any additional states of the branch) or, depending on the context, predicting the succeeding address of an instruction which should succeed a given branch instruction. The succeeding address is called a “next” address herein if the succeeding address is obtained by automatic sequencing of an address counter such as by incrementing or decrementing by one. The succeeding address is called a “target address” or “target” herein when such address is out of program order and is established by what is called a “taken” branch instead of a not-taken branch. A “not-taken” branch goes to the next address established by automatic sequencing of a counter such as by incrementing (or decrementing) it. A branch prediction can point to the address of an instruction or to the address of a cache line for a cache memory or both. The term “cache line” is used herein to refer to information bits or a storing circuit for them that thereupon holds the information bits read from a line in a cache memory. A “storing circuit” means a flop, a register, a register file, a random access memory (RAM), or other suitable circuit for storing information.
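The distinction between a “next” address and a “target” address defined above can be expressed as a small function. This is a minimal sketch: the `Branch` record and `successor_address` function are hypothetical. The `step` parameter is also an assumption. It defaults to incrementing by one, as in the definition above. A byte-addressed processor would instead pass its instruction size, and a decrementing counter would pass a negative step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical record of a branch instruction and its taken-path address."""
    address: int  # address of the branch instruction itself
    target: int   # "target" address, used only if the branch is taken


def successor_address(branch, taken, step=1):
    """Return the address of the instruction that should succeed `branch`.

    Not taken: the "next" address from automatic sequencing (address + step).
    Taken: the out-of-order "target" address.
    """
    return branch.target if taken else branch.address + step


if __name__ == "__main__":
    b = Branch(address=0x100, target=0x180)
    print(hex(successor_address(b, taken=False)))          # 0x101, next address
    print(hex(successor_address(b, taken=True)))           # 0x180, target address
    print(hex(successor_address(b, taken=False, step=4)))  # 0x104, 4-byte instructions
```

A predictor that outputs an address, rather than a taken/not-taken bit, is in effect guessing the result of `successor_address` before the branch executes.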
Branch prediction circuitry of various types has hitherto been provided to predict the behavior of branch instructions in software, with the goal of delivering instructions for execution in the pipeline that reflect the actual order of branching that will occur. However, prior approaches still fall short of the goal of perfect branch prediction, impose power dissipation problems, introduce complexity into the pipestages, and limit processor speed.
Among other things, it would be highly desirable to find ways to perform branch prediction more efficiently and economically. These problems need to be solved with respect to CPI (cycles per instruction) efficiency, operating frequency, and low power dissipation in superscalar, deeply pipelined microprocessors and other microprocessors.