1. Field of the Invention
The present invention relates to a computing system and, more particularly, to a computing system that uses computing processors residing in data storage devices to process data in a highly parallel fashion.
2. Description of the Related Art
A computing system generally includes a Central Processing Unit (CPU), a cache, a main memory, a chip set, and a peripheral. The computing system normally receives data input from the peripheral and supplies the data to the CPU, where the data is processed. The processed data can then be stored back to the peripheral. The CPU can, for example, include an Arithmetic Logic Unit (ALU), a floating-point processor, a Single-Instruction-Multiple-Data (SIMD) execution unit, or a special functional unit. The peripheral can be a memory peripheral, such as a hard disk drive or any nonvolatile mass data storage device, or an I/O peripheral device, such as a printer or graphics subsystem, to provide I/O capabilities. The main memory provides less data storage than the hard drive peripheral but at a faster access time. The cache provides even less data storage than the main memory, but at a much faster access time. The chip set contains supporting chips for the computing system and, in effect, expands the small number of I/O pins on the CPU so that the CPU can communicate with many peripherals.
FIG. 1 illustrates a conventional system architecture of a general computing system. In FIG. 1, block 10 is a CPU. Block 11 is a cache that has a dedicated high-speed bus connecting it to the CPU for high performance. Block 12 is a chip set that connects the CPU with main memory 13 and a fast peripheral 14, such as a graphics subsystem. Block 15 is another chip set that expands the bus, for example to an RS-232 or parallel port, for slower peripherals. Note that the components discussed above are very general building blocks of a computing system. Those skilled in the art understand that a computing system may have different configurations and building blocks beyond these general building blocks.
An execution model indicates how a computing system works. FIG. 2 illustrates an execution model of a typical scalar computing system. Between a CPU 10 and a hard disk 17, there are many different levels of data storage devices, such as main memory 13, a cache 11, and register 16. The farther a memory device is positioned from the CPU 10, the greater its capacity and the slower its speed. The CPU 10 fetches data from the hard disk 17, processes the data to obtain resulting data, and stores the resulting data into the various intermediate data storage devices, such as the main memory 13, the cache 11, or the register 16, depending on how often and for how long the data will be used. Each level of storage holds a superset of the data held in the smaller and faster devices nearer to the CPU 10. The efficiency of this buffering scheme depends on temporal and spatial locality. Temporal locality means that data accessed now are very likely to be accessed again later. Spatial locality means that data in the neighborhood of the data accessed now are very likely to be accessed later. In today's technology, the CPU 10, the register 16, and two levels of cache 11 are integrated into a monolithic integrated circuit.
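The dependence of the buffering scheme on locality can be illustrated with a minimal sketch. The cache model below (a small least-recently-used cache with four lines of eight addresses each) and the three access patterns are assumptions chosen only for illustration; they do not describe any particular cache 11.

```python
from collections import OrderedDict

def hit_rate(addresses, num_lines=4, line_size=8):
    """Fraction of accesses served by a small LRU cache (hypothetical model)."""
    cache = OrderedDict()  # line tag -> None, ordered from least to most recently used
    hits = 0
    for addr in addresses:
        tag = addr // line_size          # neighbouring addresses share one cache line
        if tag in cache:
            hits += 1
            cache.move_to_end(tag)       # mark the line as most recently used
        else:
            cache[tag] = None            # miss: bring the line into the cache
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)

# Spatial locality: consecutive addresses fall in the same line.
sequential = list(range(64))
# Temporal locality: the same four lines are reused again and again.
repeated = [0, 8, 16, 24] * 16
# Neither: every access touches a new line that is never reused.
scattered = [i * 64 for i in range(64)]

for name, pattern in [("sequential", sequential),
                      ("repeated", repeated),
                      ("scattered", scattered)]:
    print(name, hit_rate(pattern))
```

Under these assumptions, the sequential and repeated patterns are served mostly from the cache, while the scattered pattern never hits, which is the case in which the intermediate storage levels provide no benefit.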
FIG. 3 shows an execution model of a vector computer. A vector computer has an array of vector CPUs 210, an array of vector registers 216, a main memory 13, and a hard drive 17. The size of the vector array is usually a power of 2, such as 16 or 32. The vector CPUs 210 fetch the data from the hard drive 17 through the main memory 13 to the vector registers 216 and then process an array of the data at the same time. Hence, the processing speed of the vector computer can be improved by a factor equal to the size of the array. Note that a vector computer can also have a scalar unit, such as the computer system described in FIG. 2, as well as many vector units, such as those described in FIG. 3. Some vector computers also make use of caches.
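The speed-up factor stated above can be sketched by counting processing steps. The function below is a hypothetical model, not a description of the vector CPUs 210: it assumes that one step processes up to `width` element pairs at once and it ignores the time needed to move data into the vector registers.

```python
def vector_add(a, b, width):
    """Add two equal-length lists in chunks of `width`, counting one step per chunk."""
    result = []
    steps = 0
    for start in range(0, len(a), width):
        chunk_a = a[start:start + width]
        chunk_b = b[start:start + width]
        # All elements of a chunk are assumed to be processed at the same time.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return result, steps

a = list(range(64))
b = list(range(64))

_, scalar_steps = vector_add(a, b, width=1)    # scalar model: one element per step
total, vector_steps = vector_add(a, b, width=16)

print(scalar_steps, vector_steps, scalar_steps // vector_steps)
```

With 64 elements and an array size of 16, the model takes 4 steps instead of 64, a factor of 16. This is an idealized upper bound, since it leaves out the data bandwidth needed to keep the array supplied.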
A vector computer is able to exploit data parallelism to speed up those special applications that can be vectorized. However, vector computers replicate many expensive hardware components such as vector CPUs and vector register files to achieve high performance. Moreover, vector computers require very high data bandwidth in order to support the vector CPUs. The end result is a very expensive, bulky and power hungry computing system.
In recent years, logic has been embedded into memories to provide a special purpose computing system to perform specific processing. Memories that include processing capabilities are sometimes referred to as "smart memory" or intelligent RAM. Research on embedding logic into memories has led to some technical publications, namely: (1) Duncan G. Elliott, "Computational RAM: A Memory-SIMD Hybrid and its Application to DSP," Custom Integrated Circuit Conference, Session 30.6, 1992, which describes simply a memory chip integrating bit-serial processors without any system architecture considerations; (2) Andreas Schilling et al., "Texram: A Smart Memory for Texturing," Proceedings of the Sixth International Symposium on High Performance Computer Architecture, IEEE, 1996, which describes a special purpose smart memory for texture mapping used in a graphics subsystem; (3) Stylianos Perissakis et al., "Scalable Processors to 1 Billion Transistors and Beyond: IRAM," IEEE Computer, September 1997, pp. 75-78, which is simply a highly integrated version of a vector computer without any enhancement at the architecture level; (4) Mark Horowitz et al., "Smart Memories: A Modular Configurable Architecture," International Symposium of Computer Architecture, June 2000, which describes a project to try to integrate general purpose multi-processors and multi-threads on the same integrated circuit chip; and (5) Lewis Tucker, "Architecture and Applications of the Connection Machines," IEEE Computer, 1988, pp. 26-28, which used massively distributed array processors connected by many processors, memories, and routers among them. The granularity of the memory size, the bit-serial processors, and the I/O capability is so fine that these processors end up spending more time communicating than processing data.
Accordingly, there is a need for computing systems with improved efficiency and reduced costs as compared to conventional vector computers.
The invention pertains to a smart memory computing system that uses smart memory for massive data storage as well as for massive parallel execution. The data stored in the smart memory can be accessed just like the conventional main memory, but the smart memory also has many execution units to process data in situ. The smart memory computing system offers improved performance and reduced costs for those programs having massive data-level parallelism. This invention is able to take advantage of data-level parallelism to improve execution speed by, for example, use of inventive aspects such as algorithm mapping, compiler techniques, architecture features, and specialized instruction sets.
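The benefit of processing data in situ can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration only: the memory is modeled as eight banks of 256 words, each bank is given a hypothetical execution unit that can sum its own contents, and cost is measured as the number of words that cross the bus to the central processing unit. It is not the architecture of the smart memory described here.

```python
def conventional_sum(banks):
    """Sum all words at the CPU; every word crosses the bus (toy model)."""
    total = 0
    words_moved = 0
    for bank in banks:
        for word in bank:
            total += word
            words_moved += 1
    return total, words_moved

def smart_memory_sum(banks):
    """Each bank's hypothetical execution unit sums its own data in place.

    Only one partial result per bank crosses the bus. The per-bank sums are
    computed sequentially here but are conceptually independent and parallel.
    """
    partials = [sum(bank) for bank in banks]
    return sum(partials), len(partials)

# Eight banks of 256 words each (sizes are arbitrary assumptions).
banks = [list(range(i * 256, (i + 1) * 256)) for i in range(8)]

print(conventional_sum(banks))
print(smart_memory_sum(banks))
```

Both functions return the same total, but in this model the conventional path moves 2048 words to the CPU while the in-place path moves 8 partial results.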
The invention can be implemented in numerous ways, including as a method, a system, a device, and a computer readable medium. Several embodiments of the invention are discussed below.
As a smart memory computing system to process data in parallel, one embodiment of the invention includes at least: a central processing unit; a main memory unit that provides data storage for the central processing unit; a smart memory unit to not only store data for the central processing unit but also to process data therein; and a massive data storage that provides storage for a superset of data stored in the main memory unit and in the smart memory unit.
As a smart memory computing system to process data in parallel, another embodiment of the invention includes at least: a central processing unit; a main memory unit that provides data storage for the central processing unit; a smart memory unit to not only store data for the central processing unit but also to process data therein; a massive data storage that provides storage for a superset of data stored in the main memory unit and in the smart memory unit; and means for the central processing unit to interact with the smart memory unit.
Other aspects and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings which illustrate, by way of example, the principles of the invention.