1. Field of the Invention
Embodiments of the present invention relate generally to the field of computing devices and more specifically to a technique for reducing power consumed during frame updates through compression and local storage of display and cursor data.
2. Description of the Related Art
High performance mobile computing devices typically include high performance microprocessors and graphics adapters as well as large main memories. Since each of these components consumes considerable power, the battery life of a high performance mobile computing device is usually quite short. For many users, battery life is an important consideration when deciding which mobile computing device to purchase. Thus, longer battery life is something that sellers of high performance mobile computing devices desire.
As mentioned, the graphics adapters found in most high performance mobile computing devices consume considerable power, even when performing tasks like refreshing the screen for display. For example, a typical graphics adapter may refresh the screen twenty to sixty times per second. For each screen refresh, the graphics adapter usually reads several blocks of display data store in main memory, creates a frame from this display data, and then transmits the frame for display. Transmitting the read requests from the graphics adapter to the main memory consumes power, reading the blocks of display data from main memory consumes power, and creating the frame as well as transmitting the frame for display consumes power. Further, this sequence of events usually involves several intermediate logic blocks, such as a bus controller and a memory controller, each of which also consumes power.
FIG. 1 illustrates a prior art mobile computing device 100 that uses display data stored in main memory to refresh the screen. As shown, the computing device 100 includes a graphics processing unit (“GPU”) 102, a Hyper Transport™ (“HT”) bus 108, a microprocessor 104 and a main memory 106. The GPU 102 is coupled to the HT bus 108 through a bus interface 130, the microprocessor 104 is coupled to the HT bus 108 through a bus interface 132, and the main memory 106 is coupled to the microprocessor 104 through a memory interface 136. Additionally, the GPU 102 includes a Frame Buffer Unified Memory Architecture (“FB UMA”) 110, a Fast PCI™ Bus Interface (“FPCI”) 112 and display logic 114, where the FB UMA 110 includes arbitration logic 116, unrolling logic 118, tiling logic 120 and control logic 115. Control logic 115 may include firmware or software, and is coupled to arbitration logic 116, unrolling logic 118, tiling logic 120, the FPCI 112 and display logic 114 through interfaces that are not shown in FIG. 1. The FPCI 112 is coupled to tiling logic 120 and display logic 114 through interfaces 126 and 128, respectively. Unrolling logic 118 is coupled to tiling logic 120 and arbitration logic 116 through interfaces 124 and 122, respectively. Display logic 114 is coupled to arbitration logic 116 through interface 127. The microprocessor 104 includes a memory controller 134. Finally, a software driver 140 as well as display and cursor data 138 are stored in the main memory 106.
Refreshing the screen begins with display logic 114 requesting arbitration logic 116 to read some or all screen addresses, defined by line and pixel coordinates, from the display data 138 in the main memory 106. This request causes arbitration logic 116 to schedule a read operation. Arbitration logic 116 prioritizes all outstanding read and write requests within the FB UMA 110 and transmits requests to unrolling logic 118 in order of priority. For example, since display logic 114 uses the current display data 138 to refresh the screen within a fixed time period (e.g., one-twentieth to one-sixtieth of a second), read operations contributing to screen refresh are assigned a high priority by arbitration logic 116 based on that fixed time constraint. Alternatively, other read or write operations that are not under timing constraints are assigned a lower priority by arbitration logic 116.
Once arbitration logic 116 prioritizes and transmits the high priority read operation through the interface 122 to unrolling logic 118, control logic 115 directs unrolling logic 118 to unroll the read operation into a series of smaller (e.g., 64B) read operations that are small enough for the HT bus 108 to perform in a single bus transaction. In a subsequent step of the overall read operation, the result of these smaller read operations are combined into the single, contiguous and ordered data block originally requested by display logic 114. For example, if display logic 114 requests control logic 115 to perform a high priority read operation of pixels from the cursor and display data 138, and arbitration logic 116 transmits that operation to unrolling logic 118, unrolling logic 118 will unroll the pixel read operation into a series of smaller read operations.
After unrolling logic 118 unrolls the read operation into smaller read operations, control logic 115 directs unrolling logic 118 to transmit those smaller read operations through the interface 124 to tiling logic 120. Control logic 115 then directs tiling logic 120 to determine the physical memory address for each smaller read operation based on the screen address associated with the smaller read operation initially requested by display logic 114. Control logic 115 also directs tiling logic 120 to transmit each smaller read operation with its corresponding physical address through the interface 126 to the FPCI 112.
For each smaller read operation received by the FPCI 112, the FPCI 112 transmits a read request to the memory controller 134 within the microprocessor 104 through the interface 130, the HT bus 108 and the interface 132. However, if the HT bus 108 is in power savings mode before the FPCI 112 transmits the read request to the memory controller 134, the FPCI 112 brings the HT bus 108 out of power savings mode before transmitting the request. Once one or more read requests are transmitted to the memory controller 134, the memory controller 134 reads the requested data from the main memory 106 through memory interface 136 and transmits the data to the FPCI 112. As is well-known, the memory controller 134 frequently transmits the data back to the FPCI 112 out-of-order relative to the order of read requests transmitted by the FPCI 112 to the memory controller 134. Since display logic 114 expects contiguous and ordered display data to create the frame properly, the FPCI 112 reorders and combines the smaller blocks of data received from the memory controller 134 into a single, contiguous and ordered data block that is transmitted through the interface 128 to display logic 114, which then creates the frame accordingly.
As previously described, one drawback of the foregoing process is that read operations between the GPU 102 and the main memory 106 may consume substantial power, which can reduce the battery life for mobile computing devices. More specifically, each read operation consumes power due to transmitting a read request from the FPCI 112 to the memory controller 134 through the HT bus 108 and transmitting a read response from the memory controller 134 to the FPCI 112 through the HT bus 108. Additionally, if either the HT bus 108 or memory controller 134 is in power saving mode before transmitting a request or response, bringing the HT bus 108 or the memory controller 134 out of power saving mode consumes additional power. Further, as is commonly known, reading display data from the system memory 106 consumes substantial power both in the main memory 106 and in the memory controller 134. Thus, over the course of many screen refreshes, substantial battery power is consumed.
As the foregoing illustrates, what is needed in the art is a way to reduce the amount of battery power consumed by a mobile computing device when refreshing the screen.