The present invention generally relates to memory system design, and more particularly to a memory system that can be configured by users to optimize the size and performance of the memory system.
Programmable integrated circuits (ICs) are a well-known type of integrated circuit that may be programmed by a user to perform specified logic functions. One type of programmable IC, the field programmable gate array (FPGA), is very popular because of a superior combination of capacity, flexibility and cost. An FPGA typically includes an array of configurable logic blocks (CLBs) surrounded by a ring of programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a programmable interconnect structure. The CLBs, IOBs, and interconnect structure are typically programmed by loading a stream of configuration data (bitstream) into internal configuration memory cells that define how the CLBs, IOBs, and interconnect structure are configured. The configuration bitstream may be read from an external memory (e.g., an external PROM). The collective states of the individual memory cells then determine the function of the FPGA.
As processing technology improves, more and more CLBs, IOBs and interconnect structures can be fabricated inside an FPGA. Recently, it has become possible to build an entire data processing system (containing a central processing unit, memory, and various controllers) inside an FPGA. In some cases, not all the CLBs, IOBs and interconnect structures in the FPGA are used for building the data processing system, and some of them can be used for other applications.
One of the most important resources in a data processing system is memory. Many FPGAs provide blocks of random access memory (RAM), each having thousands of memory cells (called “block RAMs”). These blocks can be organized into different configurations. As an example, a block RAM may have a capacity of 16 Kilobits. This block RAM may be arranged by a user to have an address depth of 16K, 8K, 4K, 2K, 1K or 0.5K, with the corresponding number of bits per address being 1, 2, 4, 8, 16 or 32, respectively. A user can also combine a number of blocks to increase the total size of a memory system. More information about block RAMs can be found in U.S. Pat. No. 5,933,023 entitled “FPGA Architecture Having RAM Blocks with Programmable Word Length and Width and Dedicated Address and Data Lines,” assigned to Xilinx, Inc. This patent is incorporated herein by reference.
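The depth and width arithmetic in the 16 Kilobit example can be illustrated with a short Python sketch. The function names (`aspect_ratios`, `blocks_needed`) are hypothetical and introduced only for illustration, and the block count is a capacity-only estimate that ignores any routing or width constraints of a real device.

```python
BLOCK_RAM_BITS = 16 * 1024  # 16 Kilobits, as in the example above


def aspect_ratios(total_bits=BLOCK_RAM_BITS, widths=(1, 2, 4, 8, 16, 32)):
    """Return (address depth, bits per address) pairs for one block RAM.

    Each pair multiplies out to the full capacity of the block.
    """
    return [(total_bits // width, width) for width in widths]


def blocks_needed(depth, width, total_bits=BLOCK_RAM_BITS):
    """Capacity-only estimate of how many blocks a depth x width memory uses.

    Assumption: blocks can be combined freely, so only total bits matter.
    """
    return -(-(depth * width) // total_bits)  # ceiling division


if __name__ == "__main__":
    for depth, width in aspect_ratios():
        print(f"{depth:>6} addresses x {width:>2} bits")
    print("blocks for a 4K x 32 memory:", blocks_needed(4096, 32))
```

Under these assumptions, a memory that is 4K addresses deep and 32 bits wide would occupy eight 16 Kilobit blocks.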
In general, it is desirable to allow a data processing system to have access to as much memory as possible. One of the reasons is that some software modules require a minimum amount of memory to run. Another reason is that it is sometimes possible to speed up computation by allocating more memory to a task. On the other hand, a large amount of memory requires a large number of block RAMs. With the addition of each block RAM, the data access time of the memory is lengthened. One way to solve this problem is to introduce delays between a request for memory access and the granting of the access. In other words, “wait states” need to be inserted. As a result, the performance of the data processing system at the memory interface is reduced.
Another problem with reserving a large amount of memory for the data processing system is that the total number of block RAMs in an FPGA is limited. In addition to the data processing system, other logic modules in the FPGA may also need memory. If all or most of the block RAMs are allocated to the data processing system, the design of other logic modules may be compromised.
The optimal amount of memory and number of wait states vary with different designs. For example, real-time applications tend to require fast execution because the data processing system has to complete computations within a short period of time. Thus, it is desirable to eliminate wait states. On the other hand, it may be advantageous to enable a general purpose design to run many software applications. Thus, it would be advantageous to include more memory in the data processing system. In order to give users the most design flexibility, it is desirable to allow the users to configure the memory system to achieve an optimal performance.
The present invention provides an on-chip data processing system comprising a user configurable on-chip memory system and an on-chip processor core. The memory system comprises at least a memory controller, block RAMs, and storage of design values related to the memory system. The number of block RAMs and the number of address lines (i.e., address depth) associated with the block RAMs can be selected and configured by users. One advantage of this invention is that only the number of block RAMs needed by the processor core is allocated to the data processing system. All the block RAMs that are not allocated can be used by other on-chip applications. As a result, the invention optimizes the use of a valuable resource: block RAMs.
One embodiment of the memory controller contains an address manager that can deactivate some of the address lines originating from the processor core. The number of deactivated address lines is user configurable. The deactivation may be accomplished by a combination of demultiplexers, multiplexers and memory cells that store user-supplied information.
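The effect of deactivating address lines can be modeled in software as a simple mask. The following Python sketch is behavioral only and is not the demultiplexer and multiplexer circuit itself. It assumes that the deactivated lines are the most significant ones and that they are forced to zero; the text above does not specify either point. The name `make_address_manager` is hypothetical.

```python
def make_address_manager(core_address_bits, deactivated_lines):
    """Build a function that models an address manager.

    core_address_bits: number of address lines driven by the processor core.
    deactivated_lines: user-supplied count of lines to deactivate.
    Assumption: the most significant lines are the ones deactivated,
    and a deactivated line always reads as 0.
    """
    if not 0 <= deactivated_lines <= core_address_bits:
        raise ValueError("deactivated_lines must be between 0 and core_address_bits")
    active_bits = core_address_bits - deactivated_lines
    mask = (1 << active_bits) - 1  # ones on the active (low-order) lines only

    def manage(address):
        """Return the address seen by the block RAMs."""
        return address & mask

    return manage


if __name__ == "__main__":
    # A 32-bit core address with 18 lines deactivated leaves 14 active lines,
    # i.e. an address depth of 16K.
    manage = make_address_manager(32, 18)
    print(hex(manage(0xFFFFFFFF)))
```

In this model, changing the user-supplied count changes the address depth presented to the block RAMs without changing the processor core.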
Users can apply the memory controller of the present invention to set up the number of wait states of the memory system. In order to make sure that the memory system functions properly, the number of wait states needs to be chosen so that block RAMs have time to respond to a request. The present invention also involves an algorithm that allows users to select the optimal combination of wait states and associated address depth.
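The relationship between wait states and address depth can be sketched as follows. This Python example is not the algorithm of the invention, which is not detailed in this summary. It assumes that an access completing within one clock period needs zero wait states and that each additional clock period needed adds one wait state. The access times in the usage portion are made-up placeholder values, and the function names are hypothetical.

```python
def wait_states_needed(access_time_ps, clock_period_ps):
    """Smallest number of wait states that gives the block RAMs time to respond.

    Assumption: an access that fits in one clock period needs no wait state.
    Times are integers in picoseconds to avoid floating-point rounding.
    """
    cycles = -(-access_time_ps // clock_period_ps)  # ceiling division
    return max(0, cycles - 1)


def largest_depth_within(max_wait_states, access_time_by_depth_ps, clock_period_ps):
    """Pick the largest address depth whose wait states fit the given budget.

    access_time_by_depth_ps maps an address depth to its memory access time.
    Returns None when no depth satisfies the budget.
    """
    feasible = [
        depth
        for depth, access_time in access_time_by_depth_ps.items()
        if wait_states_needed(access_time, clock_period_ps) <= max_wait_states
    ]
    return max(feasible) if feasible else None


if __name__ == "__main__":
    # Placeholder timings: deeper memories use more block RAMs and respond later.
    timings = {8192: 4000, 16384: 6000, 32768: 9000}
    clock_period = 5000  # a 200 MHz clock, in picoseconds
    for budget in (0, 1):
        print(budget, "wait state(s):", largest_depth_within(budget, timings, clock_period))
```

A real-time design would call this with a budget of zero wait states and accept a smaller memory, while a general purpose design could allow more wait states in exchange for a larger address depth.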
The number of wait states and/or the number of address lines may be set prior to configuration of an FPGA. In another embodiment, one or both of these two parameters may be set by programming instructions of the processor core.
The memory system of the present invention may also be applied to a data processing system having separate instruction and data sides. In this system, an instruction memory controller is associated with block RAMs used for storing instructions and a data memory controller is associated with block RAMs used for storing data. In one embodiment, the instruction and data block RAMs can be physically the same. In this case, it may be desirable to use a memory management unit (MMU) scheme for memory protection.
The data processing system may have two types of block RAMs, local and global. Local block RAMs have a direct connection to the processor core while the global block RAMs are connected to the processor core through the interconnect structure of the programmable logic device. As a result, the delay in accessing the local block RAMs is much less than that of the global block RAMs. Thus, the number of wait states of the local block RAMs is smaller than that of the global block RAMs.
The above summary of the present invention is not intended to describe each disclosed embodiment of the present invention. The figures and detailed description that follow provide additional example embodiments and aspects of the present invention.