1. Field of the Invention
The present invention relates to memory system management and usage in a data processing system, and particularly to reducing latencies associated with load and store memory operations in a data processing system.
2. State of the Art
The memory system of a single-chip integrated computer or processing system generally includes an on-chip memory portion and a larger off-chip external memory portion. In this case, the memory system is managed by storing the bulk of the digital information off-chip, loading portions of the off-chip information into the on-chip memory portion, processing the data in the on-chip memory portion, and then either storing the data back to the off-chip memory or outputting it to another destination.
One of the drawbacks of this technique occurs when the block of data being transferred from external memory to the chip is larger than the area into which it is to be transferred. When this occurs, it is necessary to perform multiple load, process, and store transactions to process the oversized block of data.
For instance, FIG. 1A shows a typical prior art processing system 10 including a memory system 11 having an external memory portion 12 and an on-chip memory portion 13, a CPU 15, a data processing unit 17, and an input/output (I/O) port 18, all interconnected with a system bus. Also included within the memory system 11 is a memory controller 14 for managing memory transactions on the system bus and a DMA controller 16 for performing direct memory access transactions.
FIG. 1B shows a timing diagram of a memory transaction in which a block of data 12A stored in external memory portion 12 is transferred to a smaller memory area 13A in the on-chip memory portion 13 so as to allow the data processing unit 17 to process the transferred data and then either store the processed data back to external memory or transfer it to another destination. In cycle 0, a first portion of the block of data 12A is loaded into the smaller memory area 13A. In cycle 1, data processing unit 17 processes the data, and in cycle 2, the processed data in memory area 13A is stored out either to the external memory portion 12 or to another destination such as I/O port 18. In cycle 3, a second portion of the block of data 12A is loaded into the on-chip memory area 13A; this portion is processed in cycle 4 and stored in cycle 5. In cycle 6, more data is loaded into memory area 13A. These cycles (i.e., load, process, store) continue until all of the block of data 12A is processed and stored. The problem with this technique is that during the storing and loading cycles (cycles 2/3, cycles 5/6, etc.), which occur when transferring data into and out of memory area 13A, the processing system is idle, thereby causing a reduction in overall system efficiency.
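The idle time described for FIG. 1B can be illustrated with a small model. The following is only a sketch: the function names and the example sizes are hypothetical, and it assumes, as the timing diagram suggests, that each load, process, and store operation occupies exactly one cycle.

```python
from math import ceil


def sequential_schedule(block_size, area_size):
    """List (cycle, operation, portion) tuples for the single-area scheme.

    Assumption: every load, process, and store takes one cycle, and the
    operations never overlap because only one on-chip area is available.
    """
    portions = ceil(block_size / area_size)
    schedule = []
    cycle = 0
    for portion in range(portions):
        for operation in ("load", "process", "store"):
            schedule.append((cycle, operation, portion))
            cycle += 1
    return schedule


def processing_utilization(schedule):
    """Fraction of cycles in which data is actually being processed."""
    if not schedule:
        return 0.0
    busy = sum(1 for _, operation, _ in schedule if operation == "process")
    return busy / len(schedule)


# Hypothetical sizes: a 4096-unit block moved through a 1024-unit area.
schedule = sequential_schedule(block_size=4096, area_size=1024)
utilization = processing_utilization(schedule)
```

Under these assumptions the block needs four portions and twelve cycles, and processing takes place in only one cycle out of every three.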
The present invention is a system and method of performing memory transfers between a larger memory area and a smaller memory area which does not exhibit the memory related latencies as described above.
The present invention is a system and method of performing memory transactions between a first memory area having a first predefined buffer size and a second memory area having a second predefined buffer size that is smaller than the first buffer size. Instead of performing multiple consecutive transactions between the first memory area and only one buffer area within the smaller memory area, the method of the present invention performs consecutive transactions alternately between the first memory area and at least two predefined memory banks having a combined size that is greater than or equal to the second buffer size.
In one embodiment, a memory bank is defined to be half the size of the second predefined buffer size such that first and second memory banks comprise a single second memory area. Alternatively, a memory bank is defined to be the same size as the second predefined buffer size such that first and second memory banks each comprise one of the second memory areas.
Memory transactions between a buffer in a first memory area and two predefined memory banks A and B in the second memory area occur in the following manner:
1) during a first iteration:
a first portion of data from the first memory area buffer is loaded into memory bank A;
2) during the next iteration:
data that was loaded into memory bank A during the first iteration is processed and then stored out to a new destination;
a next portion of data from the first memory area buffer is loaded into memory bank B;
the data processing and storing operations are synchronized to ensure operations in this iteration are complete before going to step 3);
3) during the next iteration:
data that was loaded into memory bank B during the previous iteration is processed and then stored out to a new destination;
a next portion of data from the first memory area buffer is loaded into memory bank A;
the data processing and storing operations are synchronized to ensure operations in this iteration are complete before going to step 4);
4) during subsequent iterations:
steps 2) and 3) are repeated until all of the data in the first memory area buffer has been transferred to either memory bank A or memory bank B and has been processed, such that data is streamed into the two banks without any loading or storing delays when switching from bank to bank.
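The alternating use of banks A and B in steps 1) through 4) can be sketched in software as follows. This is an illustrative model only, not the claimed hardware: the function name, the log format, and the example data are hypothetical, and the load and the process/store operations that would overlap in time within one iteration are simply run one after the other here.

```python
def stream_through_banks(source, bank_size, process):
    """Model the two-bank transfer; return (destination, log).

    Each log entry is (iteration, bank_loaded, bank_processed_and_stored).
    """
    portions = [source[i:i + bank_size]
                for i in range(0, len(source), bank_size)]
    banks = {"A": [], "B": []}
    names = ("A", "B")
    destination, log = [], []

    # One iteration beyond the last load drains the final bank.
    for iteration in range(len(portions) + 1):
        load_bank = names[iteration % 2]        # A, B, A, B, ...
        work_bank = names[(iteration + 1) % 2]  # the other bank

        worked = None
        if iteration > 0:
            # Process and store what was loaded in the previous iteration.
            destination.extend(process(item) for item in banks[work_bank])
            worked = work_bank

        loaded = None
        if iteration < len(portions):
            # Load the next portion into the bank that is not being worked on.
            banks[load_bank] = portions[iteration]
            loaded = load_bank

        # Synchronization point: both operations of this iteration are
        # complete before the banks swap roles in the next iteration.
        log.append((iteration, loaded, worked))
    return destination, log


# Hypothetical data: ten items, banks of four items, processing doubles each.
destination, log = stream_through_banks(list(range(10)), 4, lambda x: 2 * x)
```

In this example the log shows bank A loaded alone in the first iteration, then each bank being processed and stored while the other is loaded, with a final iteration that only drains the last bank.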
In accordance with the system and method, the memory transactions are DMA transactions performed using a DMA (direct memory access) controller, DMA set-up registers, and a DMA stream controller. The DMA controller controls the DMA transactions according to the DMA set-up registers, and the DMA stream controller ensures that loading, storing, and processing operations are synchronized. In one embodiment, the DMA set-up registers include a start address in the first memory area, a transfer size, a bank size, a start address for each bank within the second memory area, a read/write mode of the second memory area, and status/control information as well as stream control information.
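The set-up information listed above can be pictured as a single record. The sketch below is an assumption made for illustration: the class and field names, the use of byte units, the integer encoding of the status and stream control fields, and the example addresses are all hypothetical and are not taken from the register layout of any embodiment.

```python
from dataclasses import dataclass
from enum import Enum


class BankMode(Enum):
    """Read/write mode of the second memory area (hypothetical encoding)."""
    READ = 0
    WRITE = 1


@dataclass
class DmaSetup:
    source_start: int        # start address in the first memory area
    transfer_size: int       # total amount to transfer (bytes assumed)
    bank_size: int           # size of each bank (bytes assumed)
    bank_starts: tuple       # start address of each bank in the second area
    bank_mode: BankMode      # read/write mode of the second memory area
    status_control: int = 0  # status/control information (opaque here)
    stream_control: int = 0  # stream control information (opaque here)

    def __post_init__(self):
        if self.bank_size <= 0 or not self.bank_starts:
            raise ValueError("bank_size must be positive and banks non-empty")

    def transfers(self):
        """Yield (source_address, bank_address, length) per transaction,
        alternating through the banks in order."""
        offset = 0
        index = 0
        while offset < self.transfer_size:
            length = min(self.bank_size, self.transfer_size - offset)
            bank = self.bank_starts[index % len(self.bank_starts)]
            yield (self.source_start + offset, bank, length)
            offset += length
            index += 1


# Hypothetical values: 2560 bytes streamed through two 1024-byte banks.
setup = DmaSetup(source_start=0x80000000, transfer_size=2560, bank_size=1024,
                 bank_starts=(0x0000, 0x0400), bank_mode=BankMode.WRITE)
plan = list(setup.transfers())
```

With these values the plan consists of three transactions that target the first bank, the second bank, and the first bank again, the last one carrying the remaining 512 bytes.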