The technical field of this invention is data processing systems and particularly data processing systems with combined external memory access and direct memory access.
Data processing systems typically employ data caches or instruction caches to improve performance. A small amount of high speed memory is used as the cache. This cache memory is filled from main memory on an as needed basis. When the data processor requires data or an instruction, this is first sought from the cache memory. If the data or instruction sought is already stored in the cache memory, it is recalled faster than it could have been recalled from main memory. If the data or instruction sought is not stored in the cache memory, it is recalled from main memory for use and also stored in the corresponding cache. A performance improvement is achieved using cache memory based upon the principle of locality of reference. It is likely that the data or the instruction just sought by the data processor will be needed again in the near future. Use of cache memories speeds the accesses needed to service these future needs. A typical high performance data processor will include an instruction cache, a data cache or both on the same integrated circuit as the data processor core.
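The cache behavior described above can be illustrated by a minimal behavioral sketch. It is a software model only and does not represent any particular hardware implementation. The direct-mapped organization, the class name DirectMappedCache, the line count and the hit and miss counters are all assumptions made for illustration.

```python
# Illustrative model only: a tiny direct-mapped cache in front of "main memory".
# The organization, names and sizes here are assumptions, not taken from the text.

class DirectMappedCache:
    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory    # backing store, indexed by address
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # address currently held in each line
        self.data = [None] * num_lines    # value currently held in each line
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines
        if self.tags[index] == address:
            # Hit: the value is recalled from the cache, not from main memory.
            self.hits += 1
            return self.data[index]
        # Miss: recall from main memory for use and also store it in the cache.
        self.misses += 1
        value = self.main_memory[address]
        self.tags[index] = address
        self.data[index] = value
        return value


memory = [10 * a for a in range(16)]
cache = DirectMappedCache(memory)
first = cache.read(5)     # miss: filled from main memory
second = cache.read(5)    # hit: served from the cache
```

The second read of the same address is served from the cache, which is the locality of reference benefit noted above.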
Conventional data processor cores typically include direct memory access (DMA). Direct memory access is a method of memory access not requiring data processor core activity, conventionally accomplished by a DMA functional block. This DMA functional block includes an I/O device and a controller function. This functional feature allows interface of external devices with the data processor core, internal memory, external memory, and other portions of the integrated circuit. The DMA interface is the communication link which relieves the data processor core from servicing these external devices on its own, preventing the loss of many data processor core cycles which would be consumed in a direct data processor core to external device interface.
The conventional direct memory access unit consists of a simple set of address generators which can perform reads and writes of some, or perhaps all, addresses within a data processing system. The address generation logic is normally implemented as a simple counter mechanism, with a reload capability from a set of data processor core memory-mapped registers. A typical use of a direct memory access unit is for the data processor core to load the counters with a starting address and a count, representing the amount of data to transfer. The data processor core must supply both the source and destination addresses for the transfer. Once this information has been loaded into the counters, the data processor can start the direct memory access via a memory mapped register write. The direct memory access unit then begins performing read and write accesses to move the requested data without further intervention from the data processor core. The data processor core is free to begin performing other tasks.
As the direct memory access unit performs reads and writes to the source and destination locations, the addresses are incremented in each counter while the count is decremented. Once the count reaches zero, the transfer is complete and the direct memory access terminates. Most direct memory access units include a mechanism of signaling this complete state back to the data processor core via a status bit or interrupt. In general the interrupt method is preferred because it does not require a polling loop on the data processor core to determine the completion status.
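The counter-based transfer and completion signaling described above can be sketched as a simple software model. The class name DmaChannel, its method names and the use of a callback to stand in for the interrupt are hypothetical. The element-by-element copy is an assumption about transfer granularity.

```python
# Illustrative model only: one DMA channel with source, destination and count
# counters. All names are hypothetical; the callback stands in for an interrupt.

class DmaChannel:
    def __init__(self, memory, on_complete=None):
        self.memory = memory
        self.source = 0
        self.destination = 0
        self.count = 0
        self.done = False               # status bit readable by the core
        self.on_complete = on_complete  # models the completion interrupt

    def program(self, source, destination, count):
        # Models the core loading the counters via memory-mapped registers.
        self.source = source
        self.destination = destination
        self.count = count
        self.done = False

    def step(self):
        # Move one element; return True while the transfer is still active.
        if self.count == 0:
            return False
        self.memory[self.destination] = self.memory[self.source]
        self.source += 1        # addresses are incremented
        self.destination += 1
        self.count -= 1         # count is decremented
        if self.count == 0:
            self.done = True
            if self.on_complete is not None:
                self.on_complete(self)
        return self.count > 0

    def run(self):
        while self.step():
            pass


memory = list(range(8)) + [0] * 8
interrupts = []
channel = DmaChannel(memory, on_complete=lambda ch: interrupts.append("dma_done"))
channel.program(source=0, destination=8, count=4)
channel.run()
```

In this sketch the core could either poll the done flag or rely on the callback. The callback corresponds to the interrupt method, which avoids the polling loop.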
There are several features which are becoming increasingly common to direct memory access units which have attempted to address the issue of providing higher performance. The first is the inclusion of more DMA channels. A single DMA channel basically consists of all the hardware required to process a single direct memory access. This will generally include at least a source and destination address register/counter, a byte count register/counter, and the associated control logic to allow it to perform basic read and write operations. In a multi-channel direct memory access unit, the logic for a single channel is generally just replicated multiple times to provide increased channel capability. In addition to the multiple instantiations of the channels, a multi-channel direct memory access unit must also include some arbitration logic to provide time division access by all the channels to the memory/peripherals which the channels can address. Conventional direct memory access units may include from 2 to 16 channels. One advantage of additional channels is that each channel can contain parameters for a specific type of transfer. The data processor core sets up each channel in advance, and does not have to reload the direct memory access unit registers each time a new transfer has to be done, the way it would have to if only a single channel existed. Alternatively, a single direct memory access unit with plural channels may service plural data transfer requestors.
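The arbitration among plural channels described above can be sketched as follows. The text states only that the arbitration logic provides time division access. The round-robin policy used here is one possible scheme chosen for illustration. The function names and the dictionary representation of a channel are hypothetical.

```python
# Illustrative model only: round-robin time division among DMA channels.
# Round-robin is an assumed policy; the text does not specify one.

def make_channel(source, destination, count):
    # Each channel holds its own source, destination and count parameters.
    return {"source": source, "destination": destination, "count": count}


def run_round_robin(memory, channels):
    """Grant one transfer slot to each active channel in rotating order."""
    grant_order = []
    while any(ch["count"] > 0 for ch in channels):
        for index, ch in enumerate(channels):
            if ch["count"] == 0:
                continue                      # idle channels are skipped
            memory[ch["destination"]] = memory[ch["source"]]
            ch["source"] += 1
            ch["destination"] += 1
            ch["count"] -= 1
            grant_order.append(index)
    return grant_order


memory = list(range(1, 7)) + [0] * 6
channels = [make_channel(0, 6, 3), make_channel(3, 9, 2)]
grants = run_round_robin(memory, channels)
```

Each channel keeps its own parameters, so the data processor core in this model sets up both transfers once and does not reload a shared register set between them.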
It is typical to provide a separate access port to the external memory interface for both the data processor core and for the direct memory access unit. The data processor core may directly request read or write data or instruction access from external memory via the external memory interface. Additionally, the data processor core makes cache service requests via the external memory interface. Cache service may also be reads or writes. The direct memory access unit may also make memory accesses, either reads or writes, under control of the data processor core via the external memory interface. This known technique requires the external memory interface to service two masters, the data processor core and the direct memory access unit.
This known technique encounters problems when extended to integrated circuits including plural data processor cores. According to this known technique, each of the plural data processor cores requires a port to the combined chip level external memory interface. In addition the external memory interface must be responsive to a combined chip level direct memory access unit. This configuration requires a lot of interconnect on the chip between the data processor cores and the external memory interface. In addition, this configuration is not easily scalable to accommodate additional data processor cores on the same integrated circuit. This is because each new data processor core requires another master port connection to the external memory interface.
This invention relates to a data processing apparatus. The data processing apparatus includes a data processor core having integral cache memory and local memory, an external memory interface and a direct memory access unit. The data processor core has a single data interchange port. The external memory interface has an internal data interchange port and an external data interchange port adapted for connection to devices external to the data processing apparatus. The direct memory access unit is connected to the single data interchange port of the data processor core and to the internal data interchange port of the external memory interface. The direct memory access unit transports data according to commands received from the data processor core to or from devices external to the data processing apparatus via the external memory interface. As an extension of this invention, a single direct memory access unit may serve a multiprocessing environment including plural data processor cores.
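The connection arrangement described above can be sketched as a behavioral model in which every data processor core reaches external memory only through the direct memory access unit. All class and method names are hypothetical. The masters_seen set is an illustrative device for showing how many masters the external memory interface must service, and is not part of the described apparatus.

```python
# Illustrative model only: cores -> DMA unit -> external memory interface.
# All names are hypothetical; masters_seen exists purely for demonstration.

class ExternalMemoryInterface:
    def __init__(self, external_memory):
        self.external_memory = external_memory
        self.masters_seen = set()     # who has used the internal port

    def read(self, master, address):
        self.masters_seen.add(master)
        return self.external_memory[address]

    def write(self, master, address, value):
        self.masters_seen.add(master)
        self.external_memory[address] = value


class DirectMemoryAccessUnit:
    def __init__(self, emif):
        self.emif = emif              # link to the internal data interchange port

    def read(self, core_id, address):
        # The request originates at a core, but the DMA unit is the only master.
        return self.emif.read("dma", address)

    def write(self, core_id, address, value):
        self.emif.write("dma", address, value)


class DataProcessorCore:
    def __init__(self, core_id, dma):
        self.core_id = core_id
        self.dma = dma                # the single data interchange port

    def load(self, address):
        return self.dma.read(self.core_id, address)

    def store(self, address, value):
        self.dma.write(self.core_id, address, value)


emif = ExternalMemoryInterface([0] * 8)
dma = DirectMemoryAccessUnit(emif)
cores = [DataProcessorCore(i, dma) for i in range(3)]
cores[0].store(2, 99)
value = cores[2].load(2)
```

In this model, adding a further data processor core adds a connection to the direct memory access unit only. The external memory interface continues to see a single master.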
The data processor core, external memory interface and direct memory access unit are preferably embodied in a single integrated circuit. This single integrated circuit has the external interchange port of the external memory interface as its sole data port to external devices.
The data processor core preferably includes an instruction cache for temporarily storing program instructions and a data cache for temporarily storing data. The data processor core requests direct memory access data transfers for cache service such as: instruction cache fill upon a read access miss to the instruction cache; data cache fill upon a read access miss to the data cache; data writeback to system memory upon a write miss to the data cache; write data allocation to the data cache upon a write miss to the data cache; and data writeback to system memory upon eviction of dirty data from the data cache.
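The cache service requests listed above can be sketched as a mapping from a cache miss to the direct memory access transfers it triggers. The enumeration and function names are hypothetical. The write_allocate flag, which selects between writeback and write data allocation upon a write miss, is an assumption about how the two write miss cases might be distinguished. The text lists both cases without stating a selection policy.

```python
# Illustrative model only: which cache service requests a miss might generate.
# The write_allocate flag and the request ordering are assumptions.

from enum import Enum


class CacheService(Enum):
    INSTRUCTION_FILL = "instruction cache fill"
    DATA_FILL = "data cache fill"
    WRITE_MISS_WRITEBACK = "data writeback upon write miss"
    WRITE_ALLOCATE = "write data allocation"
    DIRTY_EVICTION_WRITEBACK = "data writeback upon dirty eviction"


def services_for_miss(cache, is_write, write_allocate=False, evicts_dirty=False):
    """Return the DMA cache service requests for one miss, in issue order."""
    if cache == "instruction":
        if is_write:
            raise ValueError("the instruction cache is modeled as read-only")
        return [CacheService.INSTRUCTION_FILL]
    if cache != "data":
        raise ValueError("cache must be 'instruction' or 'data'")
    if is_write and not write_allocate:
        # No line is allocated, so the write goes straight to system memory.
        return [CacheService.WRITE_MISS_WRITEBACK]
    requests = []
    if evicts_dirty:
        # The victim line holds dirty data and is written back first.
        requests.append(CacheService.DIRTY_EVICTION_WRITEBACK)
    requests.append(CacheService.WRITE_ALLOCATE if is_write else CacheService.DATA_FILL)
    return requests


instruction_miss = services_for_miss("instruction", is_write=False)
write_miss = services_for_miss("data", is_write=True)
read_miss_dirty = services_for_miss("data", is_write=False, evicts_dirty=True)
```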