The present invention is generally in the field of digital computing and, more specifically, is directed to methods and apparatus for interfacing between a processor or bus and an execution subsystem or “engine” that employs shared, reconfigurable memory in a highly flexible, microprogrammable, memory-centric architecture.
A. Introduction
The prior application, Ser. No. 08/821,326, entitled “Shared, Reconfigurable Memory Architectures for Digital Signal Processing” described the need to improve digital signal processing performance while containing or reducing cost. That application describes improved computer architectures that utilize available memory resources more efficiently by providing for shared and reconfigurable memory so as to reduce I/O processor requirements for computation intensive tasks such as digital signal processing. The memory systems described in the prior case are shared in the sense that a given block of memory can first be configured for access by the CPU, for example to load data, and then “swapped” so that the same block of physical memory can be directly accessed by an execution unit, for example a DSP execution unit, to carry out various calculations on that data. After the calculations are completed, the same block of memory can be “swapped” once again, so that the CPU has immediate access to the results.
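By way of illustration only, the load, swap, compute, swap sequence described above can be modeled in a few lines of Python. This is a minimal sketch, not the circuitry of the prior application: the class `SharedBlock`, the owner labels "cpu" and "exec", and the doubling computation are all hypothetical names and choices introduced here for the example.

```python
class SharedBlock:
    """Hypothetical model of one physical memory block with a single owner at a time."""

    def __init__(self, size):
        self.cells = [0] * size
        self.owner = "cpu"  # assumption: the block starts out configured for CPU access

    def swap(self):
        # Hand the same physical cells to the other party; no data is copied.
        self.owner = "exec" if self.owner == "cpu" else "cpu"

    def write(self, who, addr, value):
        if who != self.owner:
            raise PermissionError(f"{who} does not currently own this block")
        self.cells[addr] = value

    def read(self, who, addr):
        if who != self.owner:
            raise PermissionError(f"{who} does not currently own this block")
        return self.cells[addr]


def run_example():
    block = SharedBlock(4)
    # 1. The CPU loads the data.
    for addr, value in enumerate([1, 2, 3, 4]):
        block.write("cpu", addr, value)
    # 2. Swap: the execution unit now accesses the same cells directly.
    block.swap()
    for addr in range(4):
        block.write("exec", addr, 2 * block.read("exec", addr))
    # 3. Swap back: the CPU reads the results in place.
    block.swap()
    return [block.read("cpu", addr) for addr in range(4)]
```

In this model the swap changes only which party may address the block, which is the property that gives the CPU immediate access to the results.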
The memory is reconfigurable in a variety of ways, as described below, so as to allocate memory resources as between the CPU and the execution unit (or multiple execution units) in the most efficient manner possible. Reconfiguring the memory can include forming memory blocks of various sizes; selecting write (input) sources; selecting read (destination) targets; selecting word size, and so forth. Various particulars and alternative embodiments are set forth below, so as to enable one skilled in the art to implement shared, reconfigurable memory architectures. The parent case described the invention with reference to digital signal processing. However, DSP is just one example of computation-intensive calculation. The concepts of the prior case as well as the present invention are applicable to a wide variety of execution tasks, including but not limited to DSP and related tasks such as motion picture encoding and decoding, encryption, and decryption.
B. Memory Centric Controller
Another aspect of the prior application is a memory-centric DSP controller (“MDSPC”). The MDSPC was described as providing memory address generation and a variety of other control functions, including reconfiguring the memory as summarized above to support a particular computation in the execution unit. The name “MDSPC” was appropriate for that controller in the context of the parent case, in which the preferred embodiment was described for digital signal processing. However, the principles of the parent cases and the present invention are not so limited. Accordingly, the related application entitled MEMORY CENTRIC CONTROLLER uses its title (or “MCC”) to describe a controller which is functionally similar to the “MDSPC” introduced in the earlier parent case. The MEMORY CENTRIC CONTROLLER application describes the structure and operation of the MCC in greater detail, and includes a detailed description of interfacing between the reconfigurable memory referred to above and one or more execution units. More specifically, the MCC application describes “bandwidth matching” methods and apparatus to ensure that the execution unit can be operated at maximum throughput even where the memory, e.g. DRAM, is relatively slow.
It is no less important to system performance, however, to ensure sufficient bandwidth at the interface between the memory-centric engine and the host processor bus. Further, bandwidth alone is not enough; compatibility with existing standard interfaces or bus specifications is highly advantageous for reasons explained below. The present application thus is directed to methods and apparatus for interfacing between a processor or bus and a memory-centric execution subsystem or “engine” of the type described in the two related applications identified above.
C. DMA and Custom CPU Interface
The memory-centric architecture as described in the prior applications interfaces with a microprocessor core using essentially two types of interface, each of which is known in prior art for memory access. The first is DMA (Direct Memory Access). A DMA transfer allows moving a block of data into or from a memory without requiring cycles of the CPU or processor core. Instead, a DMA controller, generally hardware, handles the transfer and notifies the CPU when the transfer is done. The DMA controller is given a start location or address and a length count, and provides addressing to the memory. In the prior applications, we noted that the MDSPC (or MCC) includes DMA hardware for handling such tasks. We also refer to an I/O channel as one that allows DMA transfers. The DMA controller can be on board (or within) a processor or part of another component of the system. Thus one method for interfacing is to treat the MCC as a coprocessor to the core and communicate with it (i.e. transfer data) by employing I/O or DMA methodologies analogous to those known in prior art. One problem with DMA transfer, however, is that the DRAM in the memory-centric engine may not provide adequate transfer rates to support the host bus. Alternatively, one can interface with the MCC by utilizing an existing co-processor interface provided on the particular processor core implementation being deployed.
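For illustration, a DMA transfer of the kind just described (a start address, a length count, addressing generated by the controller, and a completion notice to the CPU) might be modeled as follows. This is a simplified sketch under stated assumptions: the function `dma_transfer` and the `on_done` callback are hypothetical names, and a real DMA controller is hardware that operates concurrently with the CPU rather than a blocking function call.

```python
def dma_transfer(source, memory, start, length, on_done):
    """Copy `length` words from `source` into `memory` beginning at `start`.

    Hypothetical software model: the loop stands in for the address
    generation performed by the DMA controller, and `on_done` stands in
    for the notification sent to the CPU when the transfer completes.
    """
    if length < 0 or length > len(source):
        raise ValueError("length count exceeds the source data")
    if start < 0 or start + length > len(memory):
        raise ValueError("transfer would run past the end of memory")
    for offset in range(length):
        # The controller, not the CPU, supplies each successive address.
        memory[start + offset] = source[offset]
    on_done(length)


memory = [0] * 8
completions = []  # records each completion notice (words transferred)
dma_transfer([7, 8, 9], memory, start=2, length=3, on_done=completions.append)
```

After the call, `memory` holds the three words at addresses 2 through 4 and `completions` holds one entry, standing in for the notification the CPU would receive.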
A more direct interface is to modify the processor core architecture to accommodate the memory-centric engine directly. This solution is preferred in terms of optimizing performance and minimizing transistor count. It provides maximum computing engine performance together with optimum core processor performance. The problem with that approach is that custom modification of an existing core architecture or hardware implementation is a difficult and time-consuming, and therefore expensive, task. Core designs are complex, and modifying them requires substantial engineering resources and lengthens time-to-market. Therefore, custom modification of each core processor design as a prerequisite to implementation of a memory-centric engine can be expected to impede implementation of the MCC engine.
In view of the foregoing background, the need remains for an architecture solution that will provide a standard interface, i.e., one that takes advantage of known and existing interfacing methods and apparatus. In other words, the need remains for a way to interface to a memory-centric computing engine that provides enhanced performance and high bandwidth without requiring custom modification of the host processor core, yet still provides performance improvements over standard interfaces such as DMA and standard I/O channels or interfacing via a co-processor bus.
A memory-centric computing engine provides any one of several standard interfaces. For example, a standard memory interface can be provided. In this case, the interface includes address lines, data lines, and control signals such as RAS/ (row address strobe), CAS/ (column address strobe), write enable, output enable, etc. In other words, the MC engine presents itself to the processor like a memory. However, techniques are shown below that provide SRAM speed at the interface combined with DRAM density in the engine so as to accommodate complex computations requiring substantial amounts of data.
This invention provides simplified yet high performance interaction between the engine and the host. For example, the processor can load a calculation “problem” (data) into the apparent “memory” and, after execution, simply read out the results using the same methods as a standard DRAM memory read operation. In the meantime, the MC engine has performed the necessary operations.
The interface is configurable under microcode control. It can accommodate a memory interface, CPU bus interface, or indeed virtually any given standard or specification. Examples include the PCI Local Bus, VMEbus, RAMBUS, etc. Other presently known bus interface standards are the Sun SBUS, PCMCIA, Multibus, and the ISA and EISA buses (commonly called the IBM AT bus). Detailed technical specifications of these standards are widely published and therefore need not be detailed here.
According to one aspect of the invention, a method of interfacing a processor bus to a computation engine having a microprogrammable memory-centric controller and an array of memory is defined. The claimed method includes the steps of providing a predetermined series of microcode instructions for execution by the MCC; selecting a start address within the series of microcode instructions for carrying out a corresponding operation; and executing the series of microcode instructions in the MCC beginning at the selected start address so as to carry out the corresponding operation in the engine. The series of microcode instructions can be stored in a non-volatile memory accessible to the MCC; or in a non-volatile external memory accessible to the MCC. Alternatively, using the disclosed architecture, microcode can be downloaded under processor control to a separate microcode storage memory accessible to the MCC, or into the array of memory (DRAM) in the computation engine.
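The three steps of the method above (providing a series of microcode instructions, selecting a start address, and executing from that address) can be illustrated with a toy interpreter. This is a sketch only: the three-opcode instruction set (`ADD`, `MUL`, `HALT`), the routine names, and the table `ENTRY_POINTS` are hypothetical and are not the MCC microcode format.

```python
# Step 1: a predetermined series of microcode instructions (hypothetical format).
MICROCODE = [
    ("ADD", 1),   # address 0: start of the hypothetical "increment" routine
    ("HALT",),    # address 1
    ("MUL", 2),   # address 2: start of the hypothetical "double_plus_three" routine
    ("ADD", 3),   # address 3
    ("HALT",),    # address 4
]

# Step 2: each operation corresponds to a start address within the series.
ENTRY_POINTS = {"increment": 0, "double_plus_three": 2}


def execute(start_address, value):
    """Step 3: execute instructions beginning at start_address until HALT."""
    pc = start_address
    while True:
        instruction = MICROCODE[pc]
        opcode = instruction[0]
        if opcode == "HALT":
            return value
        if opcode == "ADD":
            value += instruction[1]
        elif opcode == "MUL":
            value *= instruction[1]
        else:
            raise ValueError(f"unknown opcode {opcode!r} at address {pc}")
        pc += 1
```

For example, `execute(ENTRY_POINTS["double_plus_three"], 5)` runs the instructions at addresses 2 and 3 and returns 13. Selecting a different start address selects a different operation from the same stored series.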
Another aspect of the invention, also directed to interfacing with a bus, is a method of downloading the microcode instructions by first asserting a predetermined address; decoding the predetermined address in the MCC; and in response to the decoding step, configuring the engine for storing the microcode instructions under processor control into the array of memory. The decoding can be done by “hard-wired” logic or it can be microcode programmable, or a combination of the two.
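As a rough illustration of the download sequence above, the following sketch models an engine that watches bus writes for one reserved address and, on decoding it, reconfigures itself to store the words that follow as microcode in its memory array. Every specific here is an assumption made for the example: the reserved address `MICROCODE_LOAD_ADDRESS`, the use of the accompanying data word as the base address for the download, and the class `Engine` itself.

```python
MICROCODE_LOAD_ADDRESS = 0xFFF0  # hypothetical predetermined address


class Engine:
    """Hypothetical model of address decoding that triggers a microcode download."""

    def __init__(self, size):
        self.array = [0] * size  # stands in for the DRAM array
        self.mode = "data"
        self.load_pointer = 0

    def bus_write(self, address, value):
        if address == MICROCODE_LOAD_ADDRESS:
            # Decoding step: configure the engine for microcode storage.
            # Assumption: the data word supplies the base address in the array.
            self.mode = "microcode_load"
            self.load_pointer = value
        elif self.mode == "microcode_load":
            # Words written by the processor are stored sequentially as microcode.
            self.array[self.load_pointer] = value
            self.load_pointer += 1
        else:
            # Ordinary memory write.
            self.array[address] = value


engine = Engine(8)
engine.bus_write(1, 42)                      # normal data write
engine.bus_write(MICROCODE_LOAD_ADDRESS, 4)  # assert the predetermined address
engine.bus_write(0, 0xA1)                    # first microcode word, stored at 4
engine.bus_write(0, 0xB2)                    # second microcode word, stored at 5
```

The comparison against a fixed constant corresponds to the “hard-wired” decoding option; a microcode-programmable decoder would make the reserved address itself configurable.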
According to another aspect of the invention, the computing engine includes an SRAM buffer memory and the memory array comprises an array of DRAM memory. In operation, a write operation, for example, includes storing data from the external bus into the SRAM buffer memory and then transferring the stored data from the buffer memory into the DRAM array.
According to a further aspect of the invention, moving the stored data to the DRAM array includes writing a series of data words into a latch and then writing the contents of the latch into the DRAM array in a single write operation, so as to better match the access time of the SRAM buffer memory to the access time of the DRAM array.
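The buffered write path described in the two preceding paragraphs can be sketched as follows: words from the bus are captured in a fast buffer, gathered several at a time into a latch, and each full latch is committed to the DRAM array with one wide write. This is an illustrative model only; the latch width `LATCH_WORDS`, the class `BufferedDram`, and the zero padding of a partial final latch are assumptions, not details taken from the specification.

```python
LATCH_WORDS = 4  # hypothetical latch width, in data words


class BufferedDram:
    """Hypothetical model of an SRAM buffer feeding a DRAM array through a wide latch."""

    def __init__(self, rows):
        self.sram = []                                     # fast buffer facing the bus
        self.dram = [[0] * LATCH_WORDS for _ in range(rows)]
        self.dram_writes = 0                               # counts slow DRAM write operations

    def bus_write(self, word):
        # Each bus write completes at buffer speed; the DRAM is not touched.
        self.sram.append(word)

    def flush(self, first_row):
        """Move buffered words into the DRAM array, one latch per DRAM write."""
        row = first_row
        while self.sram:
            latch = self.sram[:LATCH_WORDS]
            del self.sram[:LATCH_WORDS]
            # Assumption: a partial final latch is padded with zeros.
            latch += [0] * (LATCH_WORDS - len(latch))
            self.dram[row] = latch  # a single wide write commits the whole latch
            self.dram_writes += 1
            row += 1


mem = BufferedDram(rows=2)
for word in range(1, 9):
    mem.bus_write(word)
mem.flush(first_row=0)
```

Here eight bus writes result in only two DRAM write operations, which is the sense in which a wide latch lets a slower array keep pace with a faster buffer.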
Thus the present invention provides methods and apparatus to reconfigure hardware, move data, control execution, conduct testing, and provide a high-speed interface, while maintaining compatibility with known standard bus interface specifications. Moreover, because the interface is easily reconfigured under software control, it can readily be adapted and changed to a different interface as may be required. For example, the data format, word size, error correction bits, addressing format, etc. can all be changed within the bounds of the available number of “wires” or signal lines. Thus, the memory-centric engine can interface with, say, the PCI bus in one application, while the same or an identical chip can comply with RAMBUS standards in another application. Bandwidth of course is key, and we show below the methods and apparatus for interfacing at the appropriate bus speed, using SRAM buffer cells and memory block swapping techniques.
The foregoing and other objects, features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment of the invention which proceeds with reference to the accompanying drawings.