This invention relates generally to computers and more particularly to memory processing within a processor of a computer.
Computers are known to include a central processing unit (CPU), system memory, video graphics processing circuitry, audio processing circuitry, and input/output (I/O) ports. The I/O ports enable the computer to interface with peripheral devices such as monitors, keyboards, mice, printers, the Internet, local area networks, etc. The computer components work in concert to provide a user with a very powerful tool. In general, the system memory stores applications (e.g., word processing, spreadsheets, drawing packages, web browsers) that are executed by the central processing unit and supported by the co-processing elements (e.g., the video graphics and audio processing circuits).
As one would appreciate, the efficiency of the central processing unit, while processing applications and system level operations (e.g., power management, screen saver, system level interrupts, etc.), affects the overall efficiency of the computer. Accordingly, the architectural design of the central processing unit is critical and is continually being improved. Currently, the architecture of the central processing unit includes an instruction cache, a fetch module, an instruction decoder, an instruction issuance module, an arithmetic logic unit (ALU), a load store module, and a data cache. The instruction cache and data cache are used to temporarily store instructions and data, respectively. Once an instruction is cached, the fetch module retrieves it and provides it to the decoder. Alternatively, the fetch module may retrieve an instruction directly from main memory and provide it to the decoder, while simultaneously storing the instruction in the instruction cache.
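The two fetch paths described above can be illustrated with a minimal sketch. It is a software model only, not the circuitry itself, and the names FetchModule, main_memory, instruction_cache, hits, and misses are hypothetical labels introduced for illustration. The sketch assumes that main memory and the instruction cache can each be modeled as a simple mapping from address to instruction.

```python
from dataclasses import dataclass, field


@dataclass
class FetchModule:
    """Hypothetical software model of a fetch module with an instruction cache."""

    main_memory: dict                      # assumed model: address -> instruction
    instruction_cache: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def fetch(self, address):
        """Return the instruction at `address` for delivery to the decoder."""
        if address in self.instruction_cache:
            # First path: the instruction is already cached.
            self.hits += 1
            return self.instruction_cache[address]
        # Second path: retrieve directly from main memory and store a copy
        # in the instruction cache as part of the same fetch.
        self.misses += 1
        instruction = self.main_memory[address]
        self.instruction_cache[address] = instruction
        return instruction


fetcher = FetchModule(main_memory={0: "add", 4: "load"})
first = fetcher.fetch(0)    # served from main memory, then cached
second = fetcher.fetch(0)   # served from the instruction cache
```

In this model the second request for the same address is served from the cache, so only the first request incurs the cost of a main memory access.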
The decoder decodes the instruction into microcode and, via the instruction issuance module, provides it to the ALU. The ALU includes an address calculation module, a plurality of integer operation modules, a plurality of floating point modules, and a plurality of multi-media operation modules to perform a plurality of operations. The integer modules may include two arithmetic/logic modules, shift modules, one multiply module, and one divide module. The floating point modules include a floating point adder and a floating point multiplier. The multi-media modules include two multi-media arithmetic and logic modules, one multi-media shift module, and one multi-media multiplier.
The elements of the CPU are operably coupled together via a CPU internal pipeline, which enhances the CPU's performance of transactions (e.g., read and/or write data operational codes, instruction operational codes, etc.). As such, multiple transactions are on the bus at any given time and proceed without issue unless a transaction encounters a miss (i.e., the data or instruction being sought is not in the local data or instruction cache). When this occurs, the pipeline process stops for all transactions until the needed data or instruction is locally stored. Such stoppage, or stalling, of the pipeline process causes delays, which negatively impact the CPU's efficiency.
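The cost of such stalling can be illustrated with a rough cycle-count sketch. The model below is an assumption made for illustration and is not taken from any particular CPU: it supposes an in-order pipeline of pipeline_depth stages that completes one transaction per cycle once full, and it supposes that every miss halts all transactions for a fixed miss_penalty. The function name total_cycles and both parameters are hypothetical.

```python
def total_cycles(transactions, pipeline_depth=5, miss_penalty=10):
    """Estimate cycles to complete a sequence of pipelined transactions.

    transactions: iterable of bools, True where the transaction misses
    the local cache. pipeline_depth and miss_penalty are assumed values.
    """
    transactions = list(transactions)
    if not transactions:
        return 0
    # Ideal case: the first transaction needs pipeline_depth cycles to
    # traverse the pipeline; each later one completes one cycle after.
    cycles = pipeline_depth + len(transactions) - 1
    # Each miss stops the whole pipeline until the data or instruction
    # is locally stored, adding miss_penalty cycles for all transactions.
    cycles += miss_penalty * sum(transactions)
    return cycles


no_miss = total_cycles([False, False, False, False])
one_miss = total_cycles([False, True, False, False])
```

Under these assumed values, a single miss among four transactions raises the estimate from 8 cycles to 18, which shows how one stalled transaction delays every other transaction in the pipeline.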
Therefore, a need exists for a CPU pipelining process that minimizes stalling.