The present invention relates in general to electronic memories and in particular to a dynamic random access memory (DRAM) with integral static random access memory (SRAM), and systems and methods using the same.
Currently available dynamic random access memories (DRAMs) are generally based upon architectures which share the following characteristics. First, the typical general purpose DRAM has a single data port for writing and reading data to and from addressed storage locations ("dual ported" DRAMs are available which provide two data ports, typically one random and one serial port, however, these devices are normally limited to special memory applications). Second, data writes and reads are only made on a location by location basis, with each location typically being one bit, one byte or one word wide. Specifically, in a "random access mode", an access (read or write) is made to a single location per row address strobe (/RAS) active cycle and in a "page mode" an access is made to a single location per column address strobe (/CAS) or master clock cycle of the row addressed during the given /RAS cycle. Alternatively, in synchronous DRAM, a memory access cycle is initiated by asserting an active command in the DRAM, during which row addresses are latched on the rising edge of a master clock. A read/write command causes column addresses to be latched on the rising edge of the master clock following which, after a latency period expires, data is clocked out with each rising edge on the master clock. Third, no method has generally been established to handle contention problems which arise when simultaneous requests for access are made to the same DRAM unit. Current techniques for handling contention problems depend on the DRAM and/or system architecture selected by the designer and range, for example, from "uniform memory-noncontention" methods to "non-uniform memory access" (NUMA) methods.
Similarly, the system architectures of personal computers (PCs) generally share a number of common features. For example, the vast majority of today's PCs are built around a single central processing unit (CPU), which is the system "master." All other subsystems, such as the display controller, disk drive controller, and audio controller then operate as slaves to the CPU. This master/slave organization is normally used no matter whether the CPU is a complex instruction set computer (CISC), reduced instruction set computer (RISC), Silicon Graphics MIPS device or Digital Equipment ALPHA device.
Present memory and PC architectures, such as those discussed above, are rapidly becoming inadequate for constructing the fast machines with substantial storage capacity required to run increasingly sophisticated application software. The problem has already been addressed, at least in part, in the mainframe and server environments by the use of multiprocessor (multiprocessing) architectures. Multiprocessing architectures however are not yet cost effective for application in the PC environment. Furthermore, memory contention and bus contention are still significant concerns in any multiprocessing system, and in particular in a multiprocessing PC environment.
A CPU typically exchanges data with memory in terms of "cache lines." A cache line is a unit of data by which operands and results can be stored in or retrieved from memory and operated on by the CPU in a coherent fashion. Cache line accesses are made both to cache and to system memory.
In systems operating with CPUs having a 32-bit data I/O port, a cache line is normally eight (8) 32-bit words or 256 bits. In the foreseeable future, data I/O ports will be 64 bits wide, and cache lines may comprise 16 64-bit data words, for a length of 1024 bits. Typically, the CPU may read a cache line from a corresponding location in memory, perform an arithmetic or logic operation on that data and then write the result back to the same location in system or cache memory. A given location for a cache line can be in one or more physical rows in memory and therefore an access to a cache line location may require multiple /RAS cycles. In any event, the CPU, depending on the operating system running, can generally access any location in memory for storing and retrieving operands and results.
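The cache line arithmetic above can be checked with a minimal sketch. The function names, the 2048-bit row width, and the rule of one /RAS cycle per physical row touched are illustrative assumptions, not details taken from this disclosure.

```python
def cache_line_bits(words: int, word_bits: int) -> int:
    """Length of a cache line made of `words` data words of `word_bits` each."""
    return words * word_bits


def rows_spanned(start_bit: int, line_bits: int, row_bits: int) -> int:
    """Number of physical rows touched by a cache line starting at `start_bit`.

    Assumption: memory is modeled as a flat bit array cut into rows of
    `row_bits`, and each row touched costs one /RAS cycle.
    """
    first_row = start_bit // row_bits
    last_row = (start_bit + line_bits - 1) // row_bits
    return last_row - first_row + 1


# Cache line sizes stated in the text.
line_32 = cache_line_bits(8, 32)    # eight 32-bit words
line_64 = cache_line_bits(16, 64)   # sixteen 64-bit words

# Hypothetical 2048-bit row: an aligned line fits in one row, while a line
# that starts near the end of a row straddles two rows.
aligned = rows_spanned(0, line_32, 2048)
straddling = rows_spanned(1920, line_32, 2048)
```

In this model, a cache line that straddles a row boundary is the case in which an access would need more than one /RAS cycle.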
Situations often arise in which the results from a given operation exceed the length of the cache line and therefore data can no longer be processed as coherent cache line units. For example, if the CPU performs an n-bit by n-bit integer multiplication, the result could be a maximum of 2n bits. In other words, while each operand can be retrieved from memory as a cache line, the result exceeds the length of a single cache line and coherency is lost. Similarly, operations on operands containing decimal points or fractions can produce results that exceed the length of a cache line. In the case of fractions, long strings of bits, which exceed cache line length, may be required to minimize rounding errors and therefore increase the precision of the calculations.
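The 2n-bit bound on an n-bit by n-bit product can be illustrated with a short sketch. It assumes unsigned operands and, for the example values, operands that each fill one 256-bit cache line; the function names are hypothetical.

```python
def max_product_bits(n: int) -> int:
    """Bit length of the largest product of two unsigned n-bit integers."""
    largest_operand = (1 << n) - 1          # all n bits set
    return (largest_operand * largest_operand).bit_length()


def cache_lines_needed(result_bits: int, line_bits: int) -> int:
    """Whole cache lines required to hold a result of `result_bits` bits."""
    return -(-result_bits // line_bits)     # ceiling division


# Assumption for illustration: each operand occupies one 256-bit cache line.
LINE_BITS = 256
product_bits = max_product_bits(LINE_BITS)
lines_for_result = cache_lines_needed(product_bits, LINE_BITS)
```

Under these assumptions each operand fits in one cache line while the worst-case product needs two, which is the overflow condition described above.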
In any computing system, and in particular multiprocessing systems, the ability to operate on data as cache lines substantially improves operating efficiency. Thus, when a cache line is exceeded during an operation, system performance is reduced. Specifically, when a cache line is exceeded, the CPU must either access that data as two cache lines or as a cache line and additional discrete words or doublewords of data. As a result, extra memory cycles are required to execute an operation and the transfer of data within the system is more difficult because the necessary data is no longer in proper cache line data structures. Moreover, performance in multiprocessor systems is impaired when one processor is waiting for a second processor to complete its read or write to memory before being able to read or write its data.
Thus, the need has arisen for new memory and system architectures in which operations can be performed on coherent units of data, even if cache line lengths are exceeded. In multiprocessor systems in particular, there is a need for system and memory architectures in which multiple processors can operate on data simultaneously.
Among the many advantages, the principles of the present invention allow for the efficient accessing of blocks of data as required by the multiple CPU data processing system. For example, in a four bank embodiment, with two registers per bank, a contiguous block of eight rows of data and associated addresses can be stored in registers for fast access. Typically, the CPU accesses data within such spatially or temporally contiguous blocks. Thus, when the CPU requires data from memory and that data is already stored in a register, other data with spatial or temporal locality to it is most likely also already in a register. In this fashion, the number of "hits" to pre-stored data is substantially increased. The principles of the present invention also allow for high speed accesses directly from the registers, in addition to traditional accesses to the DRAM cell array. The advantages are particularly evident in a single chip implementation according to the principles of the present invention.
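The four bank, two register example can be modeled with a minimal sketch of the hit test. The interleaving of consecutive rows across banks, the class and function names, and the row numbers are illustrative assumptions; this summary does not specify how rows are assigned to registers.

```python
from dataclasses import dataclass, field

NUM_BANKS = 4
REGISTERS_PER_BANK = 2


@dataclass
class Bank:
    # Each register records the address of the row it holds (None if empty).
    registers: list = field(
        default_factory=lambda: [None] * REGISTERS_PER_BANK
    )


def preload_block(first_row: int) -> list:
    """Load a contiguous block of rows into the registers of all banks.

    Assumption: consecutive rows are interleaved across banks, so row
    `first_row + k` goes to bank `k % NUM_BANKS`, register `k // NUM_BANKS`.
    """
    banks = [Bank() for _ in range(NUM_BANKS)]
    for offset in range(NUM_BANKS * REGISTERS_PER_BANK):
        bank_index = offset % NUM_BANKS
        slot = offset // NUM_BANKS
        banks[bank_index].registers[slot] = first_row + offset
    return banks


def is_hit(banks: list, row: int) -> bool:
    """True when the requested row is already held in some register."""
    return any(row in bank.registers for bank in banks)


# Hypothetical block starting at row 100: rows 100..107 are register hits.
banks = preload_block(100)
hits = [row for row in range(96, 112) if is_hit(banks, row)]
```

With eight registers in total, any access falling inside the pre-stored eight-row block is served from a register, and accesses outside it fall back to the DRAM cell array.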
A data port associated with each bank provides for independent access to each bank by the multiple processors. In an embodiment having an address port in each bank, the multiple processors may independently access incongruent memory locations in each bank. That is, memory cells having different relative locations within each bank are accessible, in this embodiment.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.