1. Field of the Invention
The present invention relates to Programmable Logic Device (PLD) integrated circuits. In particular, the present invention relates to random access memory circuits for use in FPGA arrays.
2. The Prior Art
Programmable Logic Devices (PLDs) are known in the art. A PLD is an integrated circuit having a programmable logic core comprising uncommitted logic modules and routing interconnects that is able to implement an arbitrary end-user logic design up to the logic capacity of the device. PLDs come in a number of types with Field Programmable Gate Arrays (FPGAs) being the variety with the largest logic capacity and highest performance in commercially available devices, which typically makes them the flagship product lines of PLD manufacturers. Since high capacity and high performance typically result in them being used for the most challenging applications, the present invention is preferably applied to FPGAs, though the inventive principles herein apply to all classes of PLD.
An FPGA comprises circuitry to implement any number of initially uncommitted logic modules arranged in a programmable array along with an appropriate amount of initially uncommitted routing interconnects. Logic modules are circuits which can be configured to perform a variety of logic functions, for example, AND-gates, OR-gates, NAND-gates, NOR-gates, XOR-gates, XNOR-gates, inverters, multiplexers, adders, latches, and flip-flops. Routing interconnects can include a mix of components, for example, wires, switches, multiplexers, and buffers. Logic modules, routing interconnects, and other features, for example, user I/O buffers, PLLs, DLLs, and random access memory circuit blocks, are the programmable elements of the FPGA.
The programmable elements have associated control elements (sometimes known as programming bits or configuration bits) that determine their functionality. The control elements may be thought of as binary bits having values such as on/off, conductive/non-conductive, true/false, or logic-1/logic-0 depending on the context. Depending on the technology employed different numbers and types of circuit elements are used to create a control element. For example, to connect two circuit nodes an antifuse, a floating gate transistor, or an SRAM bit controlling a pass transistor may be used as one type of control element in their respective technologies. Or to create a programmable logic-0/logic-1 generator to control a logic circuit, programming one of two antifuses (one coupled to logic-0 and one coupled to logic-1), programming one of two floating gate transistors (one coupled to logic-0 and one coupled to logic-1), or a single SRAM bit, may be used as a second type of control element in their respective technologies. Other types of control elements are possible and the above examples are not limiting in any way.
The characteristics of the control elements vary according to the technology employed and their mode of data storage may be either volatile or non-volatile. Volatile control elements, for example, SRAM bits, lose their programming data when the FPGA power supply is disconnected, disabled or turned off. Non-volatile control elements, for example, antifuses and floating gate transistors, do not lose their programming data when the FPGA power supply is removed. Some control elements, such as antifuses, can be programmed only one time and cannot be erased. Other control elements, such as SRAM bits and floating gate transistors, can have their programming data erased and may be reprogrammed many times. The detailed circuit implementation of the logic modules and routing interconnects can vary greatly and is appropriate for the type of control element used.
The logic design programmed into an FPGA by the end user is typically implemented by use of a computer program product (also known as software or, more specifically, design software) produced by the PLD manufacturer and distributed by means of a computer-readable medium, for example, providing a CD-ROM to the end user or making the design software downloadable over the internet. Typically the manufacturer supplies a library of design elements as part of the computer program product. The library design elements include virtual programmable elements that provide a layer of insulation between the end user and the circuit details of the physical programmable elements of the FPGA. This makes the design software easier to use for the end user and simplifies the manufacturer's task of processing the end user's design by the various tools in the design software.
Typically, a user creates a logic design using the manufacturer-supplied design software by means of a schematic entry tool, a hardware description language such as Verilog or VHDL, importing it in some computer readable format, or some combination of the above. The design software then takes the completed design and converts it into the appropriate mix of logic-type virtual programmable elements, maps them into corresponding physical programmable elements inside the FPGA, virtually configures the routing interconnect-type programmable elements to route the signals from one logic-type programmable element to another, and generates the data structure necessary to assign values to the various physical control elements inside the FPGA. If a programming fixture is physically present on the design system, the data structure may be directly applied to program an FPGA. Alternatively, the data structure may be ported in a computer-readable medium to a dedicated programming system or into the end user's system for programming the FPGA at a later time.
Random Access Memory (RAM) blocks have been included in FPGA arrays by most PLD manufacturers since the mid-1990s. A variety of inconsistent terminology has arisen around them due to the inherent vagueness and inconsistent use of some engineering terms. Thus some precise definitions are needed for use in this specification.
A “port” is a set of memory block signal terminals that are programmably coupleable to the FPGA array routing interconnects and the associated memory block internal circuitry for performing operations. A port comprises in part a set of address input terminals (or address bus) for specifying particular storage locations in the memory block. A port may be readable, writeable, or both. A read-only port may read data from the addressed location but may not write data into that location. Thus it is readable but not writeable. A write-only port may write data into the addressed location but may not read data from that location. Thus it is writeable but not readable. A read-write port may both read data from the addressed location and write data into the addressed location. Thus it is both readable and writeable.
In addition to having a set of address input terminals, a port will also typically have a set of control input terminals. These will often include a variety of signals, for example, a clock signal, one or more enable signals, operation select signals, mode select signals, etc., that can vary considerably from one embodiment to another as a matter of design choice. Typically in an FPGA, some of these signals will be routed to the memory block through routing interconnects while others will be set by programmable logic-0/logic-1 generators which may be programmably coupled to the control input terminals.
A port will also include a set of data signal terminals. A read-only port will have a set of data output terminals (or read data signals or read data bus), a write-only port will have a set of data input terminals (or write data signals or write data bus), and a read-write port will typically have both a set of write data input terminals and another set of read data output terminals. In theory, a read-write port could utilize a single set of bidirectional input/output terminals, but while this technique is used in some types of discrete memory chips to minimize pin count, it is not typically employed in an FPGA memory block.
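By way of illustration only, the port definitions above can be captured in a small behavioral sketch. The class name `Port`, its fields, and its method names are assumptions made for this example and do not appear in any cited reference; the sketch merely maps the readable and writeable properties onto the port types and terminal groups described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Port:
    """Illustrative model of a user port (class and field names are assumptions)."""
    readable: bool
    writeable: bool

    def kind(self) -> str:
        """Return the port type as the terms are defined in this specification."""
        if self.readable and self.writeable:
            return "read-write"
        if self.readable:
            return "read-only"
        if self.writeable:
            return "write-only"
        raise ValueError("a port must be readable, writeable, or both")

    def terminals(self) -> List[str]:
        """List the terminal groups: address and control, plus the data buses."""
        groups = ["address inputs", "control inputs"]
        if self.writeable:
            groups.append("write data inputs")
        if self.readable:
            groups.append("read data outputs")
        return groups
```

For example, `Port(readable=True, writeable=True).terminals()` lists separate write data inputs and read data outputs, reflecting the typical FPGA practice of not using a bidirectional data bus.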
The ports that have been discussed so far are user ports, meaning that they are used in an FPGA logic design in the same manner in which any memory block would be used by someone of ordinary skill in the art, namely by means of a logic design utilizing the FPGA routing interconnects to couple to the memory block. In FPGAs, alternate methods of accessing the contents of a RAM block are often present for initialization, programming, test, and potentially other purposes. These alternative methods of access are not considered ports in the context of the present invention.
One common example of such an alternate access method would be the configuration memory of an SRAM-based FPGA of the sort disclosed in U.S. Pat. No. 6,049,487 to Plants et al, in FIG. 4, FIG. 14 and FIG. 15. In FIG. 4 a memory block is shown having a “READ PORT”, a “WRITE PORT” and a “LOAD PORT (READ/WRITE)”. As described in conjunction with FIG. 14 and FIG. 15, each location in the memory block is also part of a larger configuration memory having many more rows and columns than the relatively small memory block and may be accessed as part of this memory by the mechanism described as a “LOAD PORT.” This is not a user port (or simply “port”) in the sense used in this specification because the address, data, and control signals of the “LOAD PORT” are not programmably coupled to the FPGA routing interconnects. The memory block of FIG. 4 is a two port SRAM with a read-only port and a write-only port as these terms are defined in this specification.
Ports may also be synchronous or asynchronous. A synchronous port responds to the arrival of the active edge of a clock input signal on its clock input control terminal according to the logic levels present on its other input terminals, while an asynchronous port responds only to the logic levels on its input terminals. Typically writeable ports are synchronous because of the complex timing that writing data into a RAM block entails. It would be difficult for an FPGA end user to coordinate a series of pulses and strobes of the sort shown in FIG. 9 of Plants. By moving the timing internal to the RAM writeable port, the user only needs to have the address, data and control signals make setup and hold time relative to a single clock edge, which in principle is no more complicated than making setup and hold time relative to a clock edge for a flip-flop.
Readable ports can be either synchronous or asynchronous. Typically large FPGA memory blocks are implemented synchronously because they employ sense amplifiers and thus also have fairly complicated internal timing. It is often easier to attain high memory block performance and generally more reliable to use a clock edge to start off the internal timing than to use techniques such as address transition detection (ATD) for large memory blocks. Smaller memory blocks often operate asynchronously because they often do not have sense amplifiers and the associated control and timing circuits.
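The difference between a synchronous and an asynchronous readable port can be sketched behaviorally as follows. This is a simplified model under stated assumptions: the class `ReadPort` and its methods are hypothetical names, propagation delays, sense amplifiers and ATD circuits are not modeled, and the synchronous output is assumed to hold no value (`None`) before the first clock edge.

```python
class ReadPort:
    """Behavioral sketch of a readable port (all names are illustrative)."""

    def __init__(self, contents, synchronous):
        self.contents = list(contents)
        self.synchronous = synchronous
        self.address = 0
        self._registered = None  # value captured at the last active clock edge

    def set_address(self, address):
        """Drive new logic levels onto the address input terminals."""
        self.address = address

    def clock_edge(self):
        """Active clock edge; only a synchronous port responds to it."""
        if self.synchronous:
            self._registered = self.contents[self.address]

    @property
    def read_data(self):
        """Value presented on the read data output terminals."""
        if self.synchronous:
            return self._registered
        # An asynchronous port follows the address inputs directly.
        return self.contents[self.address]
```

In this sketch an asynchronous port returns the addressed word as soon as the address changes, while a synchronous port keeps presenting the previously captured word until the next call to `clock_edge()`.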
FIG. 1A shows a “single port” prior-art FPGA memory block, generally indicated by reference number 100. In FIG. 1A, single port RAM block 102 is shown coupled to CONTROL bus 104, WRITE_DATA bus 106, ADDRESS bus 108, and a READ_DATA bus 110. Busses 104, 106, 108 and 110 together, along with the reading and writing circuitry internal to single port RAM block 102, comprise the single port. By necessity, this is a read-write port since a RAM block with just a write-only port is not particularly useful (unless there is some alternative way to read it) and a RAM block with just a read-only port behaves more like a read-only memory (ROM) than a RAM (assuming there is some alternative way to write it).
FIG. 1B shows a “two port” FPGA memory block of the prior art, generally indicated by reference number 120. In the figure, two port RAM block 122 is shown having a write-only port 130 and a read-only port 140. Coupled to write-only port 130 is WRITE_CONTROL bus 132, WRITE_DATA bus 134, and WRITE_ADDRESS bus 136. Coupled to read-only port 140 is READ_CONTROL bus 142, READ_ADDRESS bus 144, and READ_DATA bus 146. Busses 132, 134 and 136 together, along with the writing circuitry internal to two port RAM block 122, comprise the write-only port 130. Busses 142, 144 and 146 together, along with the reading circuitry internal to two port RAM block 122, comprise the read-only port 140.
FIG. 1C shows a “dual port” FPGA memory block of the prior art, generally indicated by reference number 150. In the figure, dual port RAM block 152 is shown having a read-write port “A” 160 and a read-write port “B” 170. Coupled to read-write port A 160 is CONTROL_A bus 162, WRITE_DATA_A bus 164, ADDRESS_A bus 166, and READ_DATA_A bus 168. Coupled to read-write port B 170 is CONTROL_B bus 172, WRITE_DATA_B bus 174, ADDRESS_B bus 176, and READ_DATA_B bus 178. Busses 162, 164, 166 and 168 together, along with their associated reading and writing circuitry internal to dual port RAM block 152, comprise read-write port A 160. Busses 172, 174, 176 and 178 together, along with their associated reading and writing circuitry internal to dual port RAM block 152, comprise read-write port B 170.
For purposes of this specification, a dual port memory has two read-write ports while a two port memory has two ports in some other combination of port types. The distinction needs to be made because in the early days of FPGA memory blocks, two port RAM blocks were common but were typically marketed as dual port RAM blocks. Later, when memories with two read-write ports became common, they were typically marketed as “true dual port” RAM blocks in order to distinguish them from the earlier (and arguably mislabeled) two port memory blocks.
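The terminology defined above can be expressed as a short classification routine. This is a sketch only: the function name `classify_block` and the tuple constants are assumptions introduced for illustration, each port is represented as a (readable, writeable) pair, and only the one-port and two-port cases named in the foregoing text are handled.

```python
from typing import Sequence, Tuple


def classify_block(ports: Sequence[Tuple[bool, bool]]) -> str:
    """Classify a memory block from its user ports.

    Each port is a (readable, writeable) pair. Only the one-port and two-port
    cases named in the text are handled; anything else raises ValueError.
    """
    for readable, writeable in ports:
        if not (readable or writeable):
            raise ValueError("a port must be readable, writeable, or both")
    if len(ports) == 1:
        return "single port"
    if len(ports) == 2:
        both_read_write = all(r and w for r, w in ports)
        return "dual port" if both_read_write else "two port"
    raise ValueError("only one-port and two-port blocks are classified here")


# Illustrative constants for the three port types.
READ_ONLY = (True, False)
WRITE_ONLY = (False, True)
READ_WRITE = (True, True)
```

Under these definitions the block of FIG. 1B, `[WRITE_ONLY, READ_ONLY]`, classifies as a two port memory, while the block of FIG. 1C, `[READ_WRITE, READ_WRITE]`, classifies as a dual port memory.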
Xilinx, Inc., of San Jose, Calif. introduced distributed SRAM blocks in some of their 4000 series FPGA product families. This allowed the standard 4-input lookup table logic modules to be used as 16-bit memory blocks. A single logic module could be used as a single ported 16×1 SRAM or combined with a neighboring logic module to produce a 16×2 or 32×1 single ported SRAM. Two logic modules could also be combined to produce a 16×1 two ported SRAM with one read-write port and one read-only port. The single port SRAM options could be synchronous or asynchronous while the two port SRAM options were synchronous.
Altera Corp., of San Jose, Calif. introduced Embedded Array Blocks (EAB) in their FLEX 10K embedded programmable logic family devices. The EAB was a 2,048-bit (or 2 Kb or simply 2K) single ported SRAM block which could be configured as 256×8, 512×4, 1K×2 and 2K×1. It was capable of both synchronous and asynchronous operation.
Actel Corp., of Mountain View, Calif. introduced the 3200 DX family of FPGAs which included a 256-bit two port SRAM block which could be configured as 32×8 or 64×4. It had a synchronous write-only port and a read-only port which could be programmed to either be synchronous or asynchronous.
After the early attempts, most PLD manufacturers eventually settled on synchronous dual port SRAM blocks in their FPGA families. A typical example is the BlockSelectRAM+ memory blocks in the first Virtex FPGA family by Xilinx. These were 4,096-bit dual port SRAM blocks in which each port was synchronous and independently configurable as 256×16, 512×8, 1K×4, 2K×2 or 4K×1.
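The depth and width configurations listed for these memory blocks follow from simple arithmetic: for a fixed number of bits, each halving of the data width doubles the number of addressable words. The sketch below illustrates this under the assumption that widths are powers of two; the function names `aspect_ratios` and `address_bits` are hypothetical. It enumerates what is arithmetically possible and does not indicate which configurations a given device actually supports.

```python
from typing import List, Tuple


def aspect_ratios(total_bits: int, max_width: int) -> List[Tuple[int, int]]:
    """Return (depth, width) pairs for power-of-two widths up to max_width.

    Every pair satisfies depth * width == total_bits. max_width is assumed
    to be a power of two.
    """
    configs = []
    width = max_width
    while width >= 1:
        if total_bits % width == 0:
            configs.append((total_bits // width, width))
        width //= 2
    return configs


def address_bits(depth: int) -> int:
    """Number of address input terminals needed to select one of depth words."""
    return (depth - 1).bit_length()
```

For a 2,048-bit block with a maximum width of 8, `aspect_ratios(2048, 8)` yields 256×8, 512×4, 1K×2 and 2K×1, and for a 4,096-bit block with a maximum width of 16, `aspect_ratios(4096, 16)` yields 256×16 through 4K×1, the latter requiring `address_bits(4096)`, that is, 12 address terminals.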
Different approaches to timing synchronous ports were also tried. In U.S. Pat. No. 6,049,487, a 2,048-bit two port SRAM with a synchronous write-only port and programmably synchronous or asynchronous read-only port was disclosed. In the text associated with FIG. 5, FIG. 11 and FIG. 12, the internal workings of the memory block were described as being asynchronous and using an ATD circuit to time the sense amplifiers. When combined with appropriately timing the clock signal to the latches 74 in series with the read address input terminals 72 in FIG. 5, it created the effect of a pseudo D-type flip-flop with variable timing. This allowed an end user to make the SRAM block behave like a flip-flop with the ability to swap setup time in one clock cycle for clock-to-data-out time in the next by varying the relative timing of the read address signals relative to the read clock signal.
In the Axcelerator family of FPGAs, Actel introduced the output pipeline register. The Axcelerator family had a 4,608-bit two port memory block with a synchronous write-only port and a synchronous read-only port, each port independently configurable as 128×36, 256×18, 512×9, 1K×4, 2K×2 and 4K×1. The AX SRAM block included a register on each output terminal of the read data bus. The register could be programmably placed in series with the read data or it could be bypassed with a multiplexer. The effect of the register was to give the end user the option of having a read port with a two clock cycle latency or the typical one clock cycle latency of other synchronous readable ports. This allowed the end user to place the entire memory function in a single pipeline stage to increase performance if desired.
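The effect of the optional output pipeline register on read latency can be illustrated with the following behavioral sketch. It is not a description of the actual Axcelerator circuitry: the class name `PipelinedReadPort` and its members are assumptions, and both registers are assumed to power up holding no value (`None`).

```python
class PipelinedReadPort:
    """Synchronous read port with an optional, bypassable output register."""

    def __init__(self, contents, pipeline=False):
        self.contents = list(contents)
        self.pipeline = pipeline   # True: register in series; False: bypassed
        self._core_out = None      # word captured by the memory core
        self._pipe_out = None      # word held in the output pipeline register

    def clock(self, address):
        """One active clock edge with the given read address applied."""
        # Both stages sample on the same edge, so the pipeline register
        # captures the word the core produced on the previous edge.
        self._pipe_out = self._core_out
        self._core_out = self.contents[address]

    @property
    def read_data(self):
        """Multiplexer selecting the registered or the bypassed read data."""
        return self._pipe_out if self.pipeline else self._core_out
```

With `pipeline=False` the addressed word appears after one call to `clock()`; with `pipeline=True` it appears after the second call, corresponding to the one and two clock cycle latencies described above.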
In subsequent generations of FPGAs, Altera has gone to multiple sizes of memory blocks with their TriMatrix memory scheme. For example, the original Stratix FPGA family and the later Stratix IV FPGA family each have two different sizes of dual ported memory blocks in their FPGA arrays, with the third memory (the “Tri” in “TriMatrix”) being the use of a LAB (Altera parlance for a cluster of SRAM-based lookup table logic modules) as a memory block. This approach is described in detail in U.S. Pat. No. 7,236,008 to Cliff, et al.
In recent years, soft processors have become increasingly important FPGA applications. A soft processor is a CPU or microcontroller implemented using FPGA array logic modules and routing interconnects. Typically, processors perform operations on the contents of temporary storage registers internal to the processor. These registers are typically part of a data structure known as a register file. Each register has a unique address inside the register file which the processor uses to access its contents.
In many common processor operations, the contents of two different registers are accessed as operands, a logic or arithmetic function is performed on the two operands, and the results of the operation are then stored back in the register file, either in one of the two registers containing the original operands or in a third register. Typically both operands are read at the same time that a result from a previous operation is written. Thus it is very common for a register file to perform two reads and one write simultaneously.
It is difficult to construct register files for soft processors in FPGAs of the prior art. Building them out of logic modules can be very costly in terms of FPGA resources. For example, a 32×32 register file (32 words each having 32 data bits) will require 1,024 individual flip-flops plus additional logic to construct. Thus a memory block is typically used. Unfortunately, conventional FPGA memory blocks are poorly suited to use as register files for several reasons. First, they are usually larger than necessary. It is inefficient to build a 32×32=1 Kb register file using a 4 Kb, 8 Kb, or 16 Kb memory block. Second, they are usually synchronous, which limits flexibility in optimizing critical paths into and out of the register file since there is no control over the location of the pipeline registers before or after it. Third, they do not support three ports, which results in complex logic being required to compensate. Alternatively, two dual port or two port memory blocks are used. This involves simultaneously controlling a writeable port on each block and using the other readable port on each as one of the two readable ports for the register file. This is also an inefficient use of FPGA resources.
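The two-block workaround described above can be sketched behaviorally as follows. This is an illustration under stated assumptions rather than any manufacturer's implementation: the class names `TwoPortBlock` and `MirroredRegisterFile` are hypothetical, timing is reduced to one method call per clock cycle, and a read of an address being written in the same cycle is assumed to return the old contents.

```python
class TwoPortBlock:
    """Stand-in for a memory block with one write port and one read port."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def write(self, address, data):
        self.mem[address] = data

    def read(self, address):
        return self.mem[address]


class MirroredRegisterFile:
    """Two-read, one-write register file built from two mirrored blocks."""

    def __init__(self, depth=32, width=32):
        self.depth = depth
        self.width = width
        self.mask = (1 << width) - 1
        self.block_a = TwoPortBlock(depth)  # serves read port A
        self.block_b = TwoPortBlock(depth)  # serves read port B

    def cycle(self, read_addr_a, read_addr_b,
              write_enable=False, write_addr=0, write_data=0):
        """One clock cycle: two reads and an optional write."""
        # Assumption: reads see the contents from before this cycle's write.
        operand_a = self.block_a.read(read_addr_a)
        operand_b = self.block_b.read(read_addr_b)
        if write_enable:
            # The same word goes to both blocks so the copies stay identical.
            self.block_a.write(write_addr, write_data & self.mask)
            self.block_b.write(write_addr, write_data & self.mask)
        return operand_a, operand_b

    def storage_bits(self):
        """Physical bits consumed: twice the logical register file size."""
        return 2 * self.depth * self.width
```

For a 32×32 register file, `storage_bits()` returns 2,048, twice the 1,024 bits logically required and before accounting for the unused capacity of each block, which illustrates the inefficiency noted above.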