This invention relates generally to semiconductor memories, and more particularly to memories including logic circuits for functions other than data accessing.
Memory performance continues to be a problem in computer systems. Over the last decade, microprocessor performance improvement has been at the rate of 60% per year, while memory access time has improved at less than 10% per year. In part, this is due to the fact that the design criteria for microprocessors and dynamic random access memories (DRAM) are entirely different. For logic circuits, speed is the most important feature, while density and low leakage are the most important features for DRAM. In other words, DRAM designers have focused on improving storage capacity, not speed.
As a consequence, two different processes tend to be used for fabricating semiconductor devices. One process works well for logic circuits, for example, microprocessors, but not for DRAM. The other process works well for DRAM, but not for logic. A DRAM fabricated using a process tuned for logic circuits would contain relatively few memory cells per unit area. It would also consume much more electrical power than a comparable DRAM fabricated using the memory process. The additional power consumed would be dissipated as heat, and the memory chip would potentially require additional cooling. For these reasons, using a logic process to fabricate a DRAM chip is not cost effective. The logic process is best used in cases where speed is more critical than power dissipation or area efficiency.
Most semiconductor memories provide only basic memory functionality, that of storing data and allowing its retrieval. Recently, there have been attempts to develop a smart DRAM. A smart DRAM has logic and memory functions fabricated and packaged as a single chip. One such effort is the Berkeley IRAM project described by Patterson et al. in “Intelligent RAM (IRAM): the Industrial Setting, Applications, and Architecture”, ICCD '97, International Conference on Computer Design, October 1997, see also http://iram.cs.berkeley.edu. The primary goal of IRAM is to put a powerful processor core onto a memory chip fabricated using a DRAM process. Because IRAM uses a novel (non-standard) vector architecture, it is not compatible with the existing base of software. Software compatibility is essential for running existing programs efficiently. Also, the external interface of the IRAM chip differs from existing DRAM designs to such an extent that the IRAM cannot be used as a standard DRAM.
The M32R/D microcontroller that is available from Mitsubishi Electric Corporation includes 2 MBytes of DRAM. For details, see the “M32000D4AFP User's Manual,” available through the Mitsubishi Semiconductor Web site at “www.mitsubishichips.com”. The M32R/D is fabricated using a process that is a hybrid between the logic process and the memory process. Like the IRAM, the M32R/D microcontroller has an external interface that is different from that of a standard DRAM chip. By contemporary standards, the M32R/D contains only a small amount of memory, and probably can be better characterized as a microprocessor with embedded memory than as a smart DRAM.
The 3DRAM Frame Buffer Memory, also from Mitsubishi Electric Corporation, is a high-performance memory used in 3-D graphics frame buffers. See the “M5M410092 Specification, Rev 3.11,” published by the Mitsubishi Electric Electronic Device Group. The 3DRAM executes read-modify-write operations on the memory chip itself rather than in the host processor, to accelerate graphics depth-buffer computations. The 3DRAM is an application-specific memory that is tailored for Sun Microsystems graphics workstations, and is not a cost-effective main memory to be used in PCs.
Another design of an enhanced memory is described by Oskin et al. in “Active Pages: A Computation Model for Intelligent Memory,” Proceedings of the 25th Annual International Symposium on Computer Architecture, pp. 192-203, June 1998. The Active Page architecture places configurable logic and/or a set of processing elements onto a DRAM memory chip. The logic or processing elements can perform computations in the memory concurrently with programs executing on the host processor. To compensate for implementing logic using the slower DRAM process, the architecture allows for a high degree of parallelism. The Active Page DRAM chip functions much like a scalable parallel processor, and it is most efficient when it is executing software that is already set up to run in parallel. While it is noteworthy to improve execution performance for parallel computations, the Active Pages approach provides only limited benefit for improving the performance of the serial programs that are executed by typical users of home computers.
What is desired is an architecture for a multi-functional memory that provides benefit for the largest possible number of computer users. In particular, it is desirable to use a memory architecture that allows the memory in the semiconductor to test itself continuously, while simultaneously allowing the values in the memory to be accessed normally by the host processor without affecting the access time.
Provided is a multi-functional general purpose random access memory that is fabricated on a single semiconductor substrate. The substrate includes a memory array including a plurality of pages, one or more processing elements, and internal-external address mapping means. The pages, processing elements, and mapping means are connected to each other by clock, control, data, and address signal lines. The signal lines form three access paths: an external access path connects the processing elements and the internal-external mapping means to a host processor, a multi-function access path connects the processing elements to the memory pages, and an internal access path connects the internal-external mapping means to the memory pages.
The memory also includes a plurality of spare pages with one processing element for each of the plurality of spare pages. The processing elements can be configured for continuous memory testing, memory filling, block transfer, string matching, and data compression while an external host processor concurrently accesses the memory for data reads and writes.
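The interplay of the address mapping means, the spare pages, and a processing element configured for memory testing can be illustrated with a small software model. The following Python sketch is offered only as one possible reading of the summary above, not as the claimed implementation. The class name MultiFunctionMemory, the constant PAGE_WORDS, the 0x55/0xAA test patterns, and the copy-remap-test sequence are all assumptions introduced for illustration, and the model runs sequentially rather than concurrently with host accesses.

```python
PAGE_WORDS = 8  # hypothetical page size in words (assumption, not from the text)


class MultiFunctionMemory:
    """Toy model: memory pages, spare pages, and an external-to-internal page map."""

    def __init__(self, num_pages: int, num_spares: int) -> None:
        total = num_pages + num_spares
        self.pages = [[0] * PAGE_WORDS for _ in range(total)]
        # Mapping means: external page number -> internal page index.
        self.page_map = list(range(num_pages))
        # Internal indices of pages currently unused by the host.
        self.spares = list(range(num_pages, total))
        self.faulty = []

    # External access path: the host only ever sees external addresses.
    def read(self, addr: int) -> int:
        page, offset = divmod(addr, PAGE_WORDS)
        return self.pages[self.page_map[page]][offset]

    def write(self, addr: int, value: int) -> None:
        page, offset = divmod(addr, PAGE_WORDS)
        self.pages[self.page_map[page]][offset] = value

    # Multi-function access path: work done by a processing element.
    def test_page(self, ext_page: int) -> bool:
        """Copy a page to a spare, remap it, then test the freed page (assumed scheme)."""
        spare = self.spares.pop(0)
        old = self.page_map[ext_page]
        self.pages[spare] = list(self.pages[old])  # block transfer to the spare
        self.page_map[ext_page] = spare            # host accesses now hit the spare
        ok = self._pattern_test(old)
        if ok:
            self.spares.append(old)                # tested page becomes the new spare
        else:
            self.faulty.append(old)                # retire the failing page
        return ok

    def _pattern_test(self, internal: int) -> bool:
        """Write and read back two complementary patterns (destructive test)."""
        page = self.pages[internal]
        for pattern in (0x55, 0xAA):
            for i in range(PAGE_WORDS):
                page[i] = pattern
            if any(word != pattern for word in page):
                return False
        return True


if __name__ == "__main__":
    mem = MultiFunctionMemory(num_pages=4, num_spares=1)
    for a in range(4 * PAGE_WORDS):
        mem.write(a, a + 100)
    for p in range(4):
        mem.test_page(p)
    # Host-visible contents are unchanged although every page has moved.
    print(all(mem.read(a) == a + 100 for a in range(4 * PAGE_WORDS)))
```

In this model the host keeps using the same external addresses throughout, because only the page map changes when a page is taken out of service for testing. A hardware realization would perform the copy and the test in parallel with host accesses, which the sequential sketch does not attempt to show.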