Advanced CPUs and embedded processors continue to achieve higher performance over time. To sustain that performance, however, memory subsystems must provide lower latency and more bandwidth. Dynamic random access memory (DRAM), for example, is getting faster in clock speed, wider in bus size, and larger in capacity. The CPU and bus master I/O devices compete for access to the memory subsystem, in terms of both latency and bandwidth, to perform read and write operations.
A CPU is the computing and control hardware element of a computer-based system. In a personal computer, for example, the CPU is usually an integrated part of a single, extremely powerful microprocessor. An operating system is the software responsible for allocating system resources including memory, processor time, disk space, and peripheral devices such as printers, modems, and monitors. All applications use the operating system to gain access to the resources as necessary. The operating system is the first program loaded into the computer as it boots up, and it remains in memory throughout the computing session.
Typical PC systems use either 64-bit or 128-bit DRAM subsystems. In the latter case, the memory subsystem is usually organized as two independent sections, each controlled by its own 64-bit memory controller (MC). A typical 64-bit MC may support between two and four SDRAM dual in-line memory modules (DIMMs) that make up the memory subsystem. Each DIMM has up to two memory rows (each side of a double-sided DIMM is called a memory row), and each memory row may have multiple internal memory banks. Each bank comprises multiple memory pages, one page from each DRAM chip of the memory row.
An operating system keeps track of the percentage of time that the CPU is idle and writes the idle percentage value to a register. For example, the CPU may have been idle for about 40% of the most recent predefined time period. Different operating systems use different windows of time to compute the idle percentage value. Older operating systems have longer idle loops. Newer operating systems have shorter idle loops in order to accommodate as many simultaneously running tasks as possible.
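The idle percentage described above is a simple ratio over a sampling window. The following is a minimal sketch only; the function name `idle_percentage` and the tick-count parameters are hypothetical and do not correspond to any particular operating system's interface.

```python
def idle_percentage(idle_ticks: int, total_ticks: int) -> float:
    """Return the share of a sampling window that the CPU spent idle.

    Both arguments are hypothetical tick counts taken over the same
    window; the window length itself is chosen by the operating system.
    """
    if total_ticks <= 0:
        raise ValueError("total_ticks must be positive")
    if not 0 <= idle_ticks <= total_ticks:
        raise ValueError("idle_ticks must lie between 0 and total_ticks")
    return 100.0 * idle_ticks / total_ticks


# Example matching the text: idle for 40 of the last 100 ticks.
print(idle_percentage(40, 100))
```

A shorter window makes the value track recent activity more closely, while a longer window smooths out brief bursts.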
In most systems, the performance of the processor may be altered through a defined “throttling” process and through transitions into multiple CPU performance states. Throttling is a type of forced power management. The CPU may be put to sleep for short periods of time even when the system is highly active. Throttling helps manage power consumption of the CPU.
Certain known CPU power management schemes use statistical methods to monitor activity on the CPU host interface (sometimes known as the front-side bus, or FSB) to determine average CPU percent utilization and set CPU throttling accordingly. However, advanced CPUs incorporate large cache memories that hide more than 90% of CPU activity within the CPU core. A cache is a section of very fast memory (often static RAM) reserved for the temporary storage of the data or instructions likely to be needed next by the processor. Therefore, FSB percent utilization has little correlation with actual core CPU percent utilization. As a result, prior implementations cannot correctly predict idle states of CPUs with super-pipelined architectures and integrated caches.
High-performance I/O devices often employ bus-mastering mechanisms to minimize CPU overhead. A bus master is a device within a CPU-based and memory-based system that may access the memory without using the CPU. If it cannot be determined effectively when the CPU may be powered down, then it is also not known when the CPU may issue additional read/write accesses to memory. As a result, other bus master I/O devices may not get access to the memory subsystem as promptly as they otherwise could.
FIG. 1 shows a typical, simple round robin (RR) arbiter. In such an arbiter, the next memory access passes to the next device in the arbitration chain (e.g., CPU to AGP graphics device to southbridge (SBR) device to CPU).
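The round robin scheme can be illustrated with a short sketch. The class name `RoundRobinArbiter` and its `grant` method are hypothetical, and the choice to skip devices that are not currently requesting is an assumption made for illustration; FIG. 1 itself is not reproduced here.

```python
from typing import List, Optional, Sequence, Set


class RoundRobinArbiter:
    """Simple round robin arbiter over a fixed arbitration chain."""

    def __init__(self, devices: Sequence[str]) -> None:
        if not devices:
            raise ValueError("at least one device is required")
        self.devices: List[str] = list(devices)
        self._next = 0  # index of the device considered first on the next grant

    def grant(self, requesting: Set[str]) -> Optional[str]:
        """Grant the next access to the first requesting device in chain order."""
        n = len(self.devices)
        for offset in range(n):
            idx = (self._next + offset) % n
            device = self.devices[idx]
            if device in requesting:
                # The device after the winner gets first consideration next time.
                self._next = (idx + 1) % n
                return device
        return None  # nobody is requesting (assumed behavior)


arbiter = RoundRobinArbiter(["CPU", "AGP", "SBR"])
everyone = {"CPU", "AGP", "SBR"}
print([arbiter.grant(everyone) for _ in range(4)])
```

With all three devices requesting, each receives one access per pass through the chain, regardless of how latency-sensitive it is.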
In practice, CPUs tend to be latency-sensitive while I/O devices tend to be bandwidth-sensitive. As a result, typical arbitration algorithms have been designed to grant CPU accesses to memory with the shortest possible latency while ensuring sufficient bandwidth for I/O devices. Depending on the operating system and application environments, a weighted round robin (WRR) arbiter is often used (see FIG. 2). The weight (priority) of the CPU and, for example, an AGP graphics device can be programmed through a register setting to balance the memory latency and bandwidth between, for example, the CPU, the AGP graphics device, and a southbridge (SBR) device. A next access is based, in part, on the weighting (priority) given to the CPU and the AGP graphics device.
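One common way to realize weighted round robin is to let each device take up to its weight in consecutive accesses before the turn passes on. The sketch below assumes that credit-based variant; the class name, the `weights` dictionary (a stand-in for the programmable register setting), and the example weight values are all hypothetical and are not taken from FIG. 2.

```python
from typing import Dict, List, Optional, Set


class WeightedRoundRobinArbiter:
    """Credit-based weighted round robin arbiter (illustrative variant)."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w < 1 for w in weights.values()):
            raise ValueError("every device needs a weight of at least 1")
        self.order: List[str] = list(weights)  # arbitration chain order
        self.weights = dict(weights)
        self._current = 0  # index of the device whose turn it is
        self._credits = self.weights[self.order[0]]

    def grant(self, requesting: Set[str]) -> Optional[str]:
        """Grant one access, honoring the remaining credits of the current device."""
        if not any(device in requesting for device in self.order):
            return None  # leave the arbitration state untouched when idle
        n = len(self.order)
        # n + 1 steps cover the case where only the current, out-of-credit
        # device is requesting and the turn must wrap all the way around.
        for _ in range(n + 1):
            device = self.order[self._current]
            if device in requesting and self._credits > 0:
                self._credits -= 1
                return device
            # Turn passes to the next device with a fresh set of credits.
            self._current = (self._current + 1) % n
            self._credits = self.weights[self.order[self._current]]
        return None


arbiter = WeightedRoundRobinArbiter({"CPU": 2, "AGP": 1, "SBR": 1})
everyone = {"CPU", "AGP", "SBR"}
print([arbiter.grant(everyone) for _ in range(8)])
```

Raising the CPU weight lets the CPU take more back-to-back accesses per pass, while the remaining devices still receive their share of accesses on every pass.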
It is desirable to improve bus master performance of memory accesses without degrading CPU performance.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with embodiments of the present invention as set forth in the remainder of the present application with reference to the drawings.