The present invention is directed to the compression of waveform data for transfers among computing cores and for writes to memory in a multiple core processing architecture, and to the later decompression of that data upon reception at a computing core or upon reads from memory. It is directed especially to the compression of both integer and floating-point numerical data types. The present invention supports the selection of lossless, fixed-rate, or fixed-quality compression modes on all data types.
In waveform data processing applications, the central processing unit (CPU) of a microprocessor or other signal processing fabric performs arithmetic and logical operations on waveform data values under the control of a stored program in order to transform waveform data values in an application-specific way. Input, intermediate, and output waveform data values are retrieved from storage, memory or input devices, processed, and provided to storage, memory or output devices. The waveform data may be represented by integer and floating-point numerical data types. Examples of such waveform data processing applications include but are not limited to:
receiving and transmitting mobile telephone signals in a cellular telephone,
recording and playing audio in a portable audio player,
retrieving compressed video from a DVD, decompressing the compressed video, and transmitting the decompressed video to a display device,
recording and playing back digitized speech in a voice recorder, and
simulating chemical, molecular, electrical, or biological processes.
The waveform data processing industry is composed of a large number of manufacturers who offer a broad range of waveform data processing engines and waveform data storage devices. Waveform data processing engines are most often implemented using a digital signal processor (DSP)-enabled CPU that supports multiply-accumulate (MAC) operations using dedicated assembly language instructions such as MPY and MAC. Companies offering CPUs that have MPY and MAC instructions for waveform processing applications include Intel Corporation (the x86 instruction set family of processors, including the Pentium, Nehalem, Itanium, Larrabee, and other processors), Nvidia (graphics processing units, or GPUs), Advanced Micro Devices (AMD) (the family of x86-compatible CPUs and the AMD/ATI GPUs), Texas Instruments (the TMS320 DSP family), Analog Devices (the Blackfin, TigerSHARC, SHARC, and ADSP-21xx families), Motorola (the PowerPC and 56xxx families), ARM (the Cortex, ARM7, ARM9, ARM10, and ARM11 families), MIPS Technologies (the R2000 through R16000, MIPS16, MIPS32, MIPS64, and MIPS DSP families), Microchip (the dsPIC family), IBM (the PowerPC family), and many others. Waveform data processing applications can also be implemented using a programmable fabric of logic, arithmetic, and storage elements in a field-programmable gate array (FPGA). Companies offering FPGAs that are used for waveform data processing applications include Altera (the Cyclone, Arria, and Stratix families), Xilinx (the Spartan and Virtex families), Actel (the Axcelerator and ProASIC families), Lattice (the XP, ECP, and SC families), and many others. Waveform data processing applications can also be included in application-specific integrated circuits (ASICs) that are designed to perform specific waveform data processing operations. ASIC vendors include TSMC, UMC, IBM, LSI Logic, and many others.
The DSP, FPGA, ASIC, and memory market segments are all sub-segments of the semiconductor industry. The terms “memory” and “storage” are used interchangeably in the following description for devices and subsystems that temporarily or permanently store integer or floating-point sampled data values used in waveform data processing applications. Waveform data memories may include the following semiconductor categories: static random access memories (SRAM), dynamic random access memories (DRAM), double and quadruple data rate random access memories (DDR and QDR), flash memories, solid state drives (SSD), flash drives, disk drives, ferroelectric random access memories (FRAM), cache memories, and any other future semiconductor memories used to store waveform data. Companies making semiconductor memory or storage devices include the following: SRAM manufacturers include Cypress, Dallas Semiconductor, Honeywell, Hynix, IDT, Micron, Mitsubishi, NEC, Renesas, Sharp, Sony, Toshiba, UTMC/Aeroflex, White Electronic Design, and others; DRAM manufacturers include Samsung, Hynix, Micron, Elpida, Nanya, Qimonda, ProMOS, Powerchip, and others; flash memory manufacturers include Samsung, Toshiba, Intel, ST Microelectronics, Renesas, Hynix, and others; and FRAM manufacturers include Fujitsu, Ramtron, and Samsung.
In this description, “waveform data processing applications” include applications that perform mathematical and/or logical operations on sampled data waveforms. Sampled data waveforms are often (but not exclusively) obtained by digitizing real-world analog signals such as speech, audio, images, video, or other sensor output signals using an analog-to-digital converter (ADC). Sampled data signals can also be simulated and can either be fed directly, or after additional waveform data processing operations, to a digital-to-analog converter (DAC) in order to generate analog speech, audio, images, or video signals. In this description, the term “sampled data waveforms” also includes such intermediate and/or final sampled data waveforms generated from mathematical and/or logical operations performed upon input or intermediate sampled data waveforms.
Waveform data are preferentially stored in two primary numerical formats: integer formats and floating-point formats. Integer formats represent waveform data using signed, unsigned, or sign-and-magnitude representations, where the width of the sampled data value is typically fixed. Common integer formats suitable for waveform data processing are 8-bit and 16-bit signed integers in the ranges {−128, +127} and {−32768, +32767}, respectively, and 8-bit and 16-bit unsigned integers in the ranges {0, 255} and {0, 65535}, respectively. Alternatively, waveform data may be represented in 32-bit, 64-bit, and 128-bit floating-point formats. The most common floating-point formats conform to the IEEE-754 standard for floating-point values. The IEEE-754 standard was originally issued in 1985 and was subsequently updated in 2008. The IEEE-754 standard represents 32-bit floating-point values (also called “floats” or “single-precision floats”) using one sign bit, 8 exponent bits, and 23 mantissa bits. The IEEE-754 standard represents 64-bit floating-point values (also called “doubles” or “double-precision floats”) using one sign bit, 11 exponent bits, and 52 mantissa bits. Other floating-point representations exist, such as 16-bit “half floating point,” but it is operations on floats and doubles that are usually supported in a CPU or DSP processor with dedicated floating-point circuitry. Such circuitry is often called a floating-point unit or FPU. In many applications floating-point calculations are much faster, and consume much less power, when the floating-point data are represented in single-precision format rather than double-precision format.
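The single-precision and double-precision field widths described above can be illustrated with a short sketch. The function names float32_fields and float64_fields are hypothetical and are introduced here only for illustration; the sketch uses Python's standard struct module to obtain the raw bit pattern of a value and then separates the sign, exponent, and mantissa fields by shifting and masking.

```python
import struct


def float32_fields(x):
    """Return (sign, exponent, mantissa) of x stored as a 32-bit IEEE-754 float.

    Hypothetical helper for illustration: 1 sign bit, 8 exponent bits,
    23 mantissa bits. The exponent is returned in its biased form.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


def float64_fields(x):
    """Return (sign, exponent, mantissa) of x stored as a 64-bit IEEE-754 double.

    Hypothetical helper for illustration: 1 sign bit, 11 exponent bits,
    52 mantissa bits. The exponent is returned in its biased form.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa


if __name__ == "__main__":
    print(float32_fields(1.0))
    print(float64_fields(-2.5))
```

The exponent values returned are the stored (biased) values; for a normalized number the true exponent is the stored value minus 127 for floats and minus 1023 for doubles.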
Storage devices used in waveform data processing applications exhibit varying access times. The fastest storage elements, with access times below 10 nsec, are usually SRAMs that can be fabricated on the same semiconductor die or integrated circuit (IC) with the processor cores. Such SRAM storage is called cache memory, on-chip memory, or register files. The slowest semiconductor storage elements are typically flash memories, with access times to individual sampled data elements in the 100 nsec to 1 microsec range. Flash memory writes are slower than flash memory reads. Memory technologies are commonly arranged in a hierarchy, with the fastest storage elements nearest the CPU or DSP processing fabric and slower storage elements layered around the faster storage elements. The terms “on-chip” and “off-chip” are adjectives used to characterize the proximity of storage to the CPU or processing fabric. On-chip storage is on the same semiconductor substrate, or packaged in the same multi-chip module (MCM), as the CPU or processing fabric. Off-chip storage is located on a separate integrated circuit (IC) from the CPU or processing fabric. Other slow storage elements include disk drives and tape drives, whose access times are tens of msec and whose data rates are typically 100 MB/sec or lower.
Given the layered hierarchy of memory used in waveform data processing applications, it is a continuing goal of applications that process waveform data to improve the CPU or signal processing fabric's access time to sampled data stored in memory. A secondary goal is to reduce the latency between CPU or signal processing fabric requests for waveform data and the appearance of that data in memory (typically cache or register file) that is directly accessible to the CPU or signal processing fabric. A third goal is to reduce the complexity of the fabric that connects waveform data processor cores to their memory hierarchy.
Techniques exist for compressing and decompressing both instructions and data in waveform processing applications. Many compression or encoding techniques can accept data in only one waveform data format, for example integer data or floating-point data, but not both. Similarly, many compression or encoding techniques offer only one compression mode, such as lossless mode or lossy mode, but not both. Many compression or encoding techniques are only applicable to a certain class of waveform data such as speech, audio, images, or video, but do not provide sufficient compression on other classes of waveform data. Many compression or encoding techniques operate on (address, data) pairs, which are typically found in memory controllers for SRAM, DRAM, or flash.
In a multi-core waveform processing system, many types of waveform data may be represented using different data formats. The programs for the particular application typically define the data format. The purpose of multi-core processing architectures is to perform computationally intensive operations, generally on high volumes of data. There is a need for compression of the waveform data for transmission among the computing cores and between the cores and memory to enable rapid transfer of high volumes of data in compute-intensive applications.
This description uses the terms integrated circuit (IC) and chip interchangeably to refer to a single package with electronic or optical connections (pins, leads, ports, etc.) containing one or more electronic dies. The electronic die, or semiconductor die, is a semiconductor substrate that includes integrated circuits and semiconductor devices. The die may have a single core or a plurality of cores. The core may be a processing unit for any type of data processor. For example, a processor core may be a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), a microcontroller unit (MCU), a communications processor or any type of processing unit. The individual cores on a single die may be the same type of processing unit or a combination of different types of processing units appropriate for the application. Such processing units may include (but are not limited to) a memory controller, a direct memory access (DMA) controller, a network controller, a cache controller, and a floating-point unit (FPU). Such processing units may be integrated on the same die with one or more processor cores or may be on a separate die from the processor cores.
In this description, “real time” applied to compression means that a digital signal is compressed at a rate that is at least as fast as the sample rate of that digital signal. The attribute “real time” can also describe rates for processing, transfer and storage of the digital signal, as compared to the original signal acquisition rate or sample rate. The sample rate is the rate at which an ADC or DAC forms samples during conversion between digital and analog signals. The bit rate of an uncompressed sampled, or digital, signal is the number of bits per sample multiplied by the sample rate. The compression ratio is the ratio of the bit rate of the original signal samples to the bit rate of the compressed samples. In a waveform data processing application that simulates the function of a real-time system, the sequence of operations performed on the sequence of waveform data values may be identical to a real-time processing sequence, but the rate at which the processing is performed may be slower than “real time.” This description refers to such applications as simulated waveform data processing applications.
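The definitions of bit rate, compression ratio, and “real time” given above can be expressed as a short worked sketch. The function names and the example figures (16-bit samples at 44,100 samples per second, compressed to half the original bit rate) are assumptions chosen only for illustration and do not come from this description.

```python
def bit_rate(bits_per_sample, sample_rate_hz):
    """Bit rate of an uncompressed sampled signal, in bits per second."""
    return bits_per_sample * sample_rate_hz


def compression_ratio(original_bit_rate, compressed_bit_rate):
    """Ratio of the original bit rate to the compressed bit rate."""
    return original_bit_rate / compressed_bit_rate


def is_real_time(compressor_samples_per_sec, sample_rate_hz):
    """True when compression keeps up with the signal's sample rate."""
    return compressor_samples_per_sec >= sample_rate_hz


if __name__ == "__main__":
    # Assumed example values, for illustration only.
    original = bit_rate(16, 44_100)        # 705,600 bits per second
    compressed = 352_800                   # assumed compressed bit rate
    print(original, compression_ratio(original, compressed))
    print(is_real_time(50_000, 44_100))
```

Under these assumed figures the compression ratio is 2, and a compressor that handles 50,000 samples per second meets the “real time” condition for a 44,100 samples-per-second signal.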
This description refers to various communications fabrics. A communications fabric is any connection that allows two or more processing cores to communicate with each other. Examples of communications fabrics include a bus, a network, the traces on a printed circuit board, a wireless link including a transmitter and a receiver, a switch, a network interface card (NIC), a router, a network-on-chip, or any other wired or wireless connection between processor cores.
This description refers to lossless and lossy compression. In lossless compression, the decompressed samples have identical values to the original samples. In some applications, lossy compression may be necessary to provide sufficient bit rate reduction. In lossy compression, the decompressed samples are similar, but not identical, to the original samples. Lossy compression creates a tradeoff between the bit rate of the compressed samples and the distortion in the decompressed samples.
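The distinction between lossless and lossy compression can be illustrated with a minimal sketch on integer samples. The two schemes used here, first-order differences for the lossless case and discarding low-order bits for the lossy case, are generic textbook techniques chosen only as assumptions for illustration; they are not presented as the compression method of this description, and all function names are hypothetical.

```python
def delta_encode(samples):
    """Lossless step: replace each sample with its difference from the previous one."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Invert delta_encode exactly by accumulating the differences."""
    total = 0
    samples = []
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


def lossy_encode(samples, dropped_bits):
    """Lossy step: discard the lowest dropped_bits bits of each sample."""
    return [sample >> dropped_bits for sample in samples]


def lossy_decode(codes, dropped_bits):
    """Restore the original scale, adding half a step to center the result."""
    half_step = (1 << dropped_bits) >> 1
    return [(code << dropped_bits) + half_step for code in codes]


if __name__ == "__main__":
    samples = [100, 103, 101, -7, 32767]
    print(delta_decode(delta_encode(samples)) == samples)   # identical values
    approx = lossy_decode(lossy_encode(samples, 2), 2)
    print(approx, max(abs(a - s) for a, s in zip(approx, samples)))
```

In this sketch, increasing dropped_bits lowers the number of bits needed per sample while raising the maximum reconstruction error, which is bounded by half the quantization step; this is a simple instance of the tradeoff between bit rate and distortion.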