1. Technical Field
This invention generally relates to a memory architecture. More particularly, the present invention relates to a wide memory architecture for highly parallel vector processing applications.
2. Discussion
High speed memory architectures using register files which are directly accessible by the functional units of the processor are generally known within the electronics art. Such high speed memory or register files are increasingly used in conjunction with Fast Fourier Transform (FFT) processors and vector processors, as well as other types of highly parallel processors. The large number of numerical calculations performed by these processors requires very fast memory architectures for storing the intermediate results of the calculations in order to achieve high processing throughput.
As the demand for more complex signal processors increases, so does the demand for wider memory architectures. For example, FFT processors and vector processors are typically based upon highly parallel computer architectures. This means that the conventional 32-bit and 64-bit data buses associated with current high performance microprocessors will be replaced with a data bus including from 500 to 4,000 data channels. Accordingly, the memory employed to support these highly parallel processors will also require a very wide data pathway. However, this increase in the number of data channels, or strip transmission lines formed on the silicon chip, consumes valuable area on the chip, reduces the number of transistors that can be formed on the silicon, and thus reduces the density of the memory that can be realized on a single chip. Additionally, the increase in the number of data lines also increases the capacitive load placed on the data bus and memory circuit.
To overcome the problems created by the increase in the number of data lines, memory designers used conventional single-ported memory to minimize the impact of the additional data lines. However, as the throughput requirements of these highly parallel processors increased, the single data bus of single-ported memory architectures created a bottleneck to and from the processor. Thus, memory designers developed multi-ported memory architectures, which essentially provided more than one data bus or data pathway between the memory and the processor. These are also referred to as multi-ported register files.
The conventional approach to functionally implementing multi-ported memory was to allow simultaneous multiple access (read and write) from and to the processor. More particularly, two or more separate execution units of a particular vector processor were able to simultaneously read from or write to two or more different memory locations via separate parallel data busses. Due to the increased number of data and address control lines, these multi-ported memories evolved into very complex circuit architectures to support the simultaneous multiple access capabilities. Additionally, the increase in the number of data channels to support two or more separate data busses served to lower the memory density and increase the capacitive loads placed upon the circuit, as described above. Thus, while multi-ported memory architectures provided the necessary throughput for the processor, this performance was achieved at a higher design and manufacturing expense and resulted in lower density memory.
The conventional approach to designing a multi-port register file is based upon modifying the basic single-port register cell into a multi-ported cell by adding read and write ports (transistors). However, the multi-port memory architectures currently known within the art present several design efficiency problems. For example, the 1-bit multi-ported register cell layout cannot be effectively optimized because the ratio between the I/O connectors and the number of devices in the cell is too large. Further, the interface between the multi-port register file and the functional units of the processor wastes significant chip area because of the large number of data channels and electrical crossovers. The increased number of data channels makes it difficult to utilize the silicon under the data bus routing channels. Additionally, the silicon area of a multi-ported register is roughly proportional to the number of ports it supports. The conventional approach to achieving a multi-port register file is not efficient when the number of ports exceeds a certain value, typically six or seven ports. Multi-ported register files become wire routing bound as the number of ports increases. Additionally, the extra wiring capacitance of routing over the memory cells slows the memory as the number of ports increases. The effects of these inefficiencies are magnified as the number of read and write ports provided to the multi-port memory increases.
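The scaling relationships described in the paragraph above can be illustrated with a rough numerical sketch. The model below is hypothetical and is not taken from any actual layout data: it assumes one separate data bus of the full word width per port, and it takes the statement that cell area is roughly proportional to the number of ports literally, using arbitrary area units. The names RegisterFileModel, word_bits, and base_cell_area are introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterFileModel:
    """Hypothetical first-order model of a multi-ported register file."""

    word_bits: int                # data channels carried by one port's bus
    base_cell_area: float = 1.0   # single-port cell area, arbitrary units (assumption)

    def data_channels(self, ports: int) -> int:
        # Assumption: every port has its own full-width data bus.
        return self.word_bits * ports

    def relative_cell_area(self, ports: int) -> float:
        # Assumption: area grows linearly with the port count,
        # following the "roughly proportional" statement in the text.
        return self.base_cell_area * ports

    def relative_density(self, ports: int) -> float:
        # Storage cells per unit area, relative to a single-port cell.
        return self.base_cell_area / self.relative_cell_area(ports)


if __name__ == "__main__":
    model = RegisterFileModel(word_bits=64)
    for ports in (1, 2, 4, 6):
        print(
            f"ports={ports} "
            f"channels={model.data_channels(ports)} "
            f"area={model.relative_cell_area(ports):.1f} "
            f"density={model.relative_density(ports):.2f}"
        )
```

Under these assumptions, a six-port file of 64-bit words needs 384 data channels and holds about one sixth as many cells per unit area as a single-ported file of the same word width. The sketch ignores the routing and capacitance effects described above, which would make the practical penalty larger.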
As such, it would be desirable to provide a high throughput wide memory architecture or register file based upon a single-ported memory architecture, which reduces the number of data channels and allows a significantly higher chip element density, similar to that of standard SRAM. Additionally, it is desirable to provide a wide memory architecture that allows for an increased number of transistors, and thus storage cells, without increasing the power required to drive the circuit. Finally, it is desirable to provide a wide memory architecture with the performance characteristics of multi-ported memory which avoids the wire routing problems typical of multi-ported memory. Such a wide memory architecture would significantly reduce the cost of manufacturing highly parallel vector processors and the memory required by these processors.