1. Field of the Invention
The present invention is related to a field programmable gate array (FPGA) having embedded static random access memory (SRAM). More particularly, the present invention is related to a bus architecture for the embedded SRAM, and the connection of the SRAM bus architecture to the general interconnect architecture of the FPGA.
2. The Prior Art
As integrated circuit technology advances, geometries shrink, performance improves, and densities increase. This trend makes the design of systems of ever increasing complexity at ever decreasing cost feasible. This is especially true in logic products such as Application Specific Integrated Circuits (ASICs), Complex Programmable Logic Devices (CPLDs), and Field Programmable Gate Arrays (FPGAs).
The need to integrate fast, flexible, inexpensive memory into these logic products, for purposes such as register files, FIFOs, scratch pads, and look-up tables, has become more apparent, because significant cost and performance savings can be obtained by integrating this functionality directly into, for example, an FPGA. However, providing this memory by means other than explicitly dedicated SRAM blocks included in the FPGA has not proved satisfactory. Typically, memory has been implemented without dedicated SRAM blocks either by providing SRAM external to the FPGA or by using the logic modules, flip-flops and interconnect of the FPGA. Both of these solutions are less than satisfactory.
Using external SRAMs with FPGA designs is undesirable for several reasons. Separate memory chips are expensive, require additional printed circuit board space, and consume I/O pins on the FPGA itself. Also, a separate memory chip is required to implement each memory function, thereby further increasing the cost.
When SRAM is implemented with the logic modules in the FPGA, it requires a substantial amount of the routing and logic resources of the FPGA, because the available logic blocks are implemented as gates and latches and the programmable interconnect is employed to connect them. This substantially degrades both the performance and flexibility of the FPGA by consuming a considerable amount of logic array resources, and imposes critical paths that are quite long for even a small memory block.
Xilinx offers the capability of using the configurable logic blocks on their 4000 Series of parts as 16×1 SRAM blocks, but requires the use of normal interconnect to combine the blocks into larger memory configurations. While this distributed SRAM approach is an improvement in density and is flexible for building larger memories, it is still slow and consumes logic array resources. The necessary overhead circuitry was sufficiently large that Xilinx actually removed it when they developed their low cost 4000-D parts. On their 4000-E Series parts, they offer the ability to configure two configurable logic blocks to emulate a dual ported 16×1 SRAM block; however, this design still carries with it performance and flexibility degradation.
Altera has also attempted to improve on the connection of the SRAM blocks in their embedded array blocks for their 10K FLEX parts. Their larger parts include one or more columns of embedded array blocks which are size matched to their logic array blocks. The embedded array blocks contain 2K bits of single ported SRAM configurable as 256×8, 512×4, 1024×2, or 2048×1. This approach builds the flexibility of different widths and depths into the SRAM block, but at a significant performance cost, since the access time of an embedded array block is very slow for a memory of its size and the technology in which it is built. Further, array routing resources are required for memory configurations other than those indicated.
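As a quick arithmetic check on the configurations listed above, the following sketch confirms that each depth by width option holds the same 2K bits and shows the number of address and data lines each option implies. Python is used purely for illustration, and the names `CONFIGS` and `address_bits` are hypothetical, not taken from any vendor documentation.

```python
# Illustrative only: the depth x width options listed for a 2K-bit block.
CONFIGS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]


def address_bits(depth: int) -> int:
    """Address lines needed to select one of `depth` words (depth a power of two)."""
    return depth.bit_length() - 1


for depth, width in CONFIGS:
    total_bits = depth * width
    print(f"{depth}x{width}: {total_bits} bits, "
          f"{address_bits(depth)} address lines, {width} data lines")
```

Each halving of the word width adds one address line, so the total capacity stays fixed while the number of signals that must reach the block changes.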
Another approach to SRAM memory in FPGA applications is found in “Architecture of Centralized Field-Configurable Memory”, Steven J. E. Wilton, et al., from the minutes of the 1995 FPGA Symposium, p. 97. This approach involves a large centralized memory which can be incorporated into an FPGA. The centralized memory comprises several SRAM arrays with programmable local routing interconnect that is used exclusively by the centralized memory block. The local routing interconnect is used to configure the SRAMs within the centralized memory block efficiently. However, the local interconnect structure disclosed in Wilton suffers performance problems due to excessive flexibility in the interconnect architecture.
Actel's 3200 DX family of parts attempted an intermediate approach by including columns of dual ported SRAM blocks with 256 bits which are configurable as either 32×8 or 64×4. These blocks are distributed over several rows of logic modules to match the density of I/O signals to the SRAM block to that of the surrounding FPGA array. Polarity control circuits were added to the block enable signals to facilitate use as higher address bits. This architecture was designed to provide high performance and reasonable flexibility, with density approaching the inherent SRAM density of the semiconductor process, and routing density comparable to the rest of the logic array. Unfortunately, this approach required array routing resources to interconnect SRAM blocks into deeper and wider configurations.
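The role of polarity control on block enable signals can be illustrated with a small behavioral sketch. It is not a description of the actual Actel circuit: the number of enable inputs per block (two), the number of blocks (four), and all names such as `POLARITIES` and `selected_blocks` are assumptions made only for illustration. The idea shown is that, if each enable input can be programmed as active-high or active-low, the higher address bits can drive the enables directly and each block decodes its own address range.

```python
def block_selected(enable_inputs, active_high):
    """A block is enabled when every enable input matches its programmed polarity."""
    return all(sig == pol for sig, pol in zip(enable_inputs, active_high))


# Hypothetical programming: four blocks, each with two enable inputs wired to
# the two higher address bits (A1, A0). True means active-high, False active-low.
POLARITIES = {
    0: (False, False),  # enabled when A1=0, A0=0
    1: (False, True),   # enabled when A1=0, A0=1
    2: (True, False),   # enabled when A1=1, A0=0
    3: (True, True),    # enabled when A1=1, A0=1
}


def selected_blocks(high_addr):
    """Return the blocks enabled for a given 2-bit higher address value."""
    a1 = bool(high_addr & 2)
    a0 = bool(high_addr & 1)
    return [blk for blk, pol in POLARITIES.items()
            if block_selected((a1, a0), pol)]
```

Under these assumptions exactly one block responds to each value of the higher address bits, without a separate decoder built from logic modules.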
One of the desirable attributes of user-assignable SRAM blocks in an FPGA architecture is the ability to connect the SRAM blocks to one another to form memories that are either wider (i.e. longer word length) or are deeper (i.e. more words). In connecting SRAM blocks into deeper and wider configurations it must be appreciated that the addresses have to go to each of the SRAM blocks, the write data has to go to each of the SRAM blocks, and the data must be able to be read from all of the SRAM blocks. In addition, the control signals used by the SRAM blocks to read and write data must also be routed to each of the SRAM blocks.
Since routing resources must be used to connect the dedicated SRAM blocks to one another to create either wider or deeper memories, and given that routing resources are limited, forming deeper and wider memories efficiently, without degrading the performance of the FPGA, is an important concern. To that end, connecting user SRAM blocks into deeper and wider memory configurations should neither substantially impact the place and route algorithms of the FPGA nor prevent their use for connecting the logic in the FPGA. Several approaches are known in the art for configuring dedicated SRAM blocks to provide deeper and wider memories.
The difficulty in creating deeper and wider SRAM block configurations in the prior art has been that array routing resources have been required to interconnect the SRAM blocks into these configurations. Part of the problem has been that the array routing resources have not been used very efficiently. In certain instances, this was due to the fact that the devices to which the SRAM blocks have been added were not originally designed with embedded SRAM blocks; rather, the SRAM blocks have been inserted as an add-on piece.
These problems are better illustrated with reference to FIGS. 1 and 2. In FIG. 1, four 256×8 SRAM blocks are connected into a deeper configuration, essentially a 1024×8 memory. In this configuration it can be seen that the lower order write address bits must be supplied to each of the 256×8 SRAM blocks along with the write data. Additionally, logic must be implemented to provide a 2-to-4 decode of the two higher order address bits used to select the one of the four 256×8 SRAM blocks to which the data will actually be written. To read data from the SRAM blocks in this deeper configuration, the lower order read address bits must be supplied to each of the 256×8 SRAM blocks, and then additional logic must be implemented to provide a 4-to-1 multiplexer so that the correct data may be selected from the one of the four SRAM blocks from which data is being output.
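The deeper configuration described above can be sketched as a behavioral model. This is a minimal illustration in Python, not a description of any particular circuit; the class and method names (`SramBlock`, `DeepMemory`, `decode_2_to_4`) are hypothetical. It makes explicit which signals fan out to all four blocks and where the extra decode and multiplexer logic sits.

```python
class SramBlock:
    """Behavioral model of one 256x8 SRAM block with a write enable."""
    DEPTH = 256

    def __init__(self):
        self.cells = [0] * self.DEPTH

    def write(self, addr, data, enable):
        if enable:
            self.cells[addr] = data & 0xFF

    def read(self, addr):
        return self.cells[addr]


class DeepMemory:
    """Four 256x8 blocks arranged as a 1024x8 memory."""

    def __init__(self):
        self.blocks = [SramBlock() for _ in range(4)]

    @staticmethod
    def decode_2_to_4(high_bits):
        # 2-to-4 decode: exactly one of four enables is asserted.
        return [high_bits == i for i in range(4)]

    def write(self, addr, data):
        low = addr & 0xFF          # lower 8 address bits go to every block
        high = (addr >> 8) & 0x3   # two higher order bits select the block
        for block, enable in zip(self.blocks, self.decode_2_to_4(high)):
            block.write(low, data, enable)  # write data also goes to every block

    def read(self, addr):
        low = addr & 0xFF
        high = (addr >> 8) & 0x3
        outputs = [block.read(low) for block in self.blocks]
        return outputs[high]       # 4-to-1 multiplexer on the block outputs


mem = DeepMemory()
mem.write(0x2A5, 0x5C)             # high bits = 2, low bits = 0xA5
```

In this model the lower address bits and the write data reach all four blocks on every access, while the decoder and the multiplexer stand for the additional logic that would otherwise be built from array resources.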
In FIG. 2, four 256×8 SRAM blocks are configured in a wider configuration to provide a 256×32 SRAM block. In this configuration, the write address must be supplied to each of the 256×8 blocks to perform a write operation, and to perform a read operation the read address must also be supplied to each of the 256×8 blocks. The write data must be routed so that the first 8 bits of the data at a particular address location are supplied to a first 256×8 SRAM block, the next 8 bits of the data at that particular address location are supplied to a second 256×8 SRAM block, and the third and fourth 8 bits of data at the same address are supplied to a third and fourth SRAM block, respectively. For a data read, the outputs of the SRAM blocks must be connected so that the correct 8 bits are taken from each of the four 256×8 SRAM blocks to form a single 32 bit word.
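The wider configuration can be sketched in the same behavioral style. Again this is only an illustration with hypothetical names (`WideMemory`, `lanes`). The text does not say which 8 bits count as the "first", so the sketch assumes the first block holds the least significant byte.

```python
class WideMemory:
    """Four 256x8 blocks arranged side by side as a 256x32 memory."""

    def __init__(self):
        # One 256-entry byte array per block. Assumption: lane 0 holds bits 7..0.
        self.lanes = [[0] * 256 for _ in range(4)]

    def write(self, addr, word):
        # The same address goes to every block; each block gets its own 8 bits.
        for i, lane in enumerate(self.lanes):
            lane[addr] = (word >> (8 * i)) & 0xFF

    def read(self, addr):
        # The same address goes to every block; the four bytes are concatenated.
        word = 0
        for i, lane in enumerate(self.lanes):
            word |= lane[addr] << (8 * i)
        return word


wide = WideMemory()
wide.write(0x10, 0xDEADBEEF)
```

Unlike the deeper case, no decoder or multiplexer is needed here, but the address must fan out to all four blocks and 32 data lines must be routed in each direction.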
Clearly there is a need for a flexible SRAM bus architecture in an FPGA for embedding user-assignable SRAM blocks into the FPGA. The SRAM bus architecture of the present invention provides routing resources for efficiently using individual SRAM blocks or for connecting multiple blocks of user SRAM to make wider and/or deeper memory configurations. The SRAM bus architecture of the present invention is implemented with minimal use of the routing resources of the array, and with a minimal degradation in the performance of the FPGA.