1. Field of the Invention
The present invention relates to computer system operations, in particular a cache memory controller and associated software driver modules to accelerate data transfers between an on-chip cache and a memory.
2. Description of Related Art
In some computer systems, including multicore processor systems, various application accelerator processors or application accelerator processor cores are specialized for particular computationally intensive applications. Examples include video codec processors to perform video compression and decompression operations in real time, image sensor processors to perform compression or processing of image sensor data, and display processors to format image data for display on a display device. The computations for these applications may require reads and writes of large amounts of data between on-chip cache memory and off-chip DRAM (dynamic random access memory). The data transfer bandwidth may present a major bottleneck, particularly for real-time applications such as video.
Memory is generally organized hierarchically. The memory hierarchy can include a relatively small first level (L1) cache memory and a larger second level (L2) cache memory on the same integrated circuit as the processor core circuitry, along with off-chip, large-scale memory, often implemented using DRAM. In some configurations, a third level (L3) cache can be included on-chip. Other memory can be used for sharing data among processor cores, such as shared cache memory and message-passing memory. Additional memory in the hierarchy can include persistent stores, such as flash memory, magnetic disk drive memory, network-attached storage and so on. Given the variety of memory technologies, the organization of memory systems is very diverse.
There are many varieties of computer system architectures, each of which can include different memory system configurations. The co-owned and co-pending U.S. patent application Ser. No. 12/891,312, entitled “Enhanced Multi-Processor Waveform Data Exchange Using Compression and Decompression,” filed 27 Sep. 2010 (US 2011/0078222), which is incorporated by reference as if fully set forth herein, describes several computer system architectures, and demonstrates the variety of architectures and memory configurations. The commonly owned non-provisional patent application Ser. No. 13/534,330 (the '330 application), filed Jun. 27, 2012, entitled “Computationally Efficient Compression of Floating-Point Data,” incorporated herein by reference, describes several embodiments for compressing floating-point data by processing the exponent values and the mantissa values of the floating-point format. The commonly owned non-provisional patent application Ser. No. 13/617,061 (the '061 application), filed Sep. 14, 2012, entitled “Conversion and Compression of Floating-Point and Integer Data,” by Wegener, incorporated herein by reference, describes algorithms for converting floating-point data to integer data and compression of the integer data. The commonly owned non-provisional patent application Ser. No. 12/605,245 (the '245 application), entitled “Block Floating Point Compression of Signal Data,” incorporated herein by reference, publication number 2011-0099295, published Apr. 28, 2011, describes efficient bit packing for integer samples. The commonly owned non-provisional patent application Ser. No. 13/358,511 (the '511 application), filed Jan. 25, 2012, entitled “Raw Format Image Data Processing,” incorporated herein by reference, describes compression of raw format image data at least as fast as the image data rate. The commonly owned patent application Ser. No. 13/617,205 (the '205 application), filed Sep. 14, 2012, entitled “Data Compression for Direct Memory Access Transfers,” by Wegener, incorporated herein by reference, describes providing compression for direct memory access (DMA) transfers of data and parameters for compression via a DMA descriptor.
As processor performance has improved, processors are executing programs over larger and larger data sets. Also, one processor or group of processors may concurrently execute many programs, each of which requires access to different sizes and types of data sets. For example, a broad variety of application programs acquire, collect, process, and display numerical data. Numerical data includes a variety of data types, such as integers, floating-point numbers, image data, video data, and graphics objects. Numerical data can be accumulated in large files, or acquired at high speeds, and movement of such data among elements of processor system memory hierarchies can cause bottlenecks in system performance.
Thus, the amount of memory available, in terms of the number of bytes, at each element of a memory system for a given computer system, and the bandwidth of the data channels among the elements of the memory system, can limit the efficiency and speed with which a given program can be executed. Given the variety of computer system architectures and memory system configurations, the control of data flow among the memory elements is often implemented in a platform-specific manner. This platform-specific memory management interferes with users' ability to individually manage data flow to improve the efficiency of accessing memory resources in a given computer system.
It is desirable to provide technologies that increase the effective bandwidth for data transfers between on-chip and off-chip components of the memory system in computer systems in a manner that is transparent to the application program.