1. Field of the Invention
The present invention is generally directed to computing devices (e.g., computers, embedded devices, hand-held devices, and the like). More particularly, the present invention is directed to memory used by processing units of such computing devices.
2. Background Art
A computing device typically includes one or more processing units, such as a central-processing unit (CPU) and a graphics-processing unit (GPU). The CPU coordinates the activities of the computing device by following a precise set of instructions. The GPU assists the CPU by performing data-parallel computing tasks, such as graphics-processing tasks and/or physics simulations, which may be required by an end-user application (e.g., a video-game application). The GPU and CPU may be part of separate devices and/or packages or may be included in the same device and/or package. Further, each processing unit may be included in a larger device. For example, GPUs are frequently integrated into routing or bridge devices, such as Northbridge devices.
There are several layers of software between the end-user application and the GPU. The end-user application communicates with an application-programming interface (API). An API allows the end-user application to output graphics data and commands in a standardized format, rather than in a format that is dependent on the GPU. Several types of APIs are commercially available, including DirectX® developed by Microsoft Corporation of Redmond, Wash., and OpenGL® and OpenCL, both maintained by the Khronos Group. The API communicates with a driver. The driver translates standard code received from the API into a native format of instructions understood by the GPU. The driver is typically written by the manufacturer of the GPU. The GPU then executes the instructions from the driver.
In a conventional system, the CPU and GPU are each typically coupled to an external memory. The external memory may include instructions to be executed and/or data to be used by the CPU and/or GPU. The external memory may be, for example, a dynamic random-access memory (DRAM). The external memory can be configured to be quite large, thereby providing ample storage capacity to each processing unit to which it is coupled. Unfortunately, accessing the external memory may take several hundred clock cycles. Accordingly, an external memory may not provide sufficient memory bandwidth or fast enough memory access for high-end GPUs.
One potential solution for providing sufficient memory bandwidth to a GPU is to provide the GPU with an internal memory. The internal memory may be, for example, an embedded or stacked DRAM. Compared to external memory, an internal memory provides higher bandwidth and faster memory access, and it consumes less power. However, the capacity of the internal memory cannot easily be scaled to meet the storage demands of high-end GPUs. For example, a high-end GPU may require more memory than can be included in an internal memory of the GPU.
Given the foregoing, what is needed is a memory, and applications thereof, that provides both sufficient memory capacity (like external memory) and high bandwidth (like internal memory).