1. Field of the Invention
The invention pertains generally to computers. In particular, it pertains to computer memory.
2. Description of the Related Art
Because processor and memory technologies have developed at different rates, computer processor speeds have increased faster than computer memory speeds, resulting in a disparity between the operational speed of the processor and the operational speed of the main memory that supplies that processor with instructions and data. This can cause the processor to remain idle while waiting for a requested instruction or data word to be returned from main memory. This problem has been addressed by using cache memory. A cache memory is a memory that is faster, but more expensive and therefore smaller, than the main memory. During operation, the cache memory can be loaded with the most recently used instructions and data from main memory, and subsequent accesses to those same locations can be served from the fast cache memory rather than the slower main memory. Although loading the instructions/data into cache creates its own overhead burdens, this approach is effective because computer software typically executes the same code repetitively. Thus, once the particular instructions have been loaded into cache, they can be repeatedly accessed from cache and executed more quickly than if they had to be retrieved from main memory every time.
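The caching behavior described above can be illustrated with a minimal sketch. The class name, the four-word capacity, the cycle costs, and the least-recently-used replacement policy are all illustrative assumptions; none of them is taken from this description, and the overhead of loading the cache is ignored for simplicity.

```python
from collections import OrderedDict

# Hypothetical cycle costs, chosen only for illustration.
MAIN_MEMORY_CYCLES = 6
CACHE_CYCLES = 1


class SmallCache:
    """Toy cache holding the most recently used words (LRU eviction assumed)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> word

    def read(self, address, main_memory):
        """Return (word, cycles spent) for one read."""
        if address in self.lines:
            # Hit: the word is served from the fast cache.
            self.lines.move_to_end(address)
            return self.lines[address], CACHE_CYCLES
        # Miss: fetch from slow main memory and load the word into the cache.
        word = main_memory[address]
        self.lines[address] = word
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used word
        return word, MAIN_MEMORY_CYCLES


def run_loop(addresses, repetitions, main_memory):
    """Total cycles spent executing the same address sequence repeatedly."""
    cache = SmallCache()
    total = 0
    for _ in range(repetitions):
        for address in addresses:
            _, cycles = cache.read(address, main_memory)
            total += cycles
    return total


memory = {0: "instr0", 1: "instr1", 2: "instr2"}
cycles_with_cache = run_loop([0, 1, 2], repetitions=10, main_memory=memory)
cycles_without_cache = 3 * 10 * MAIN_MEMORY_CYCLES
```

Under these assumed costs, only the first pass through the loop pays the main-memory penalty; every later pass is served from the cache, which is why repetitive code benefits most.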
Conventional computer systems place at least some of the cache memory in the processor chip. This further speeds up cache access by eliminating the capacitive effects of driving signals between chips. If the cache is too big to fit on the processor chip, some of it can be located on a cache chip that is located close to the processor chip to reduce those inter-chip capacitive effects. The main memory is typically located farther away from the processor chip. Since main memory is comparatively slow, the additional capacitive effects caused by this greater distance may not make any difference in the effective access speed of main memory.
Although cache memories are feasible in personal computers and larger computer systems, many applications require a small, embedded processor to perform a few dedicated functions, and the additional cost of even a small standard cache memory would make the final product economically infeasible. These systems typically do not use a cache memory, and must accept the slow access speeds of their memory, even though the processor may be capable of much higher speeds. Many of these systems use flash memory, or some other form of electrically erasable programmable read-only memory (EEPROM), as main memory because the devices require a non-volatile memory to preserve the data and instructions when the device is powered off.
FIG. 1 shows a conventional embedded system 1, with a processor (CPU) 11 accessing instructions and data from a flash memory array 13 in a flash memory 12. The flash memory of the example can transfer multiple data words in a burst, and is therefore referred to as a burst flash. CPU 11 and flash memory 12 communicate with each other over a bus 14. The bus of the example has multiple address lines to send a memory address to flash memory 12, multiple data lines to transfer data to/from the addressed memory location in flash memory array 13, various control lines to control these transfers, and a WAIT# line. When CPU 11 makes a read request to flash memory 12, flash memory 12 uses the WAIT# line to signal CPU 11 to wait until flash memory 12 has the requested data available. Even in flash memory, which can have read access speeds that are comparable to those of static random access memory (SRAM), this wait may last for several clock cycles due to the need to turn on various bit lines, word lines, and source lines before the selected memory cells can be accessed, and the need to compare analog voltages after the cells are accessed. This delay, controlled by the WAIT# line, is the mechanism used to integrate the fast CPU with the much slower memory. Interface 15 is used to connect the various bus signals to flash memory 12, and to control the flow of signals between flash memory array 13 and bus 14.
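The WAIT# handshake described above can be modeled with a short simulation. This is only a sketch under stated assumptions: the function name, the default of four wait cycles, and the one-word-per-clock burst rate are hypothetical, and WAIT# is treated as active low (0 while asserted, 1 once released).

```python
def read_with_wait(flash_array, address, wait_cycles=4, burst_length=1):
    """Simulate one read request and return (words, wait_trace).

    wait_trace records the WAIT# level on each clock cycle:
    0 = asserted (CPU must wait), 1 = released (data is valid).
    wait_cycles is an assumed value, not taken from the description.
    """
    wait_trace = []

    # The memory holds WAIT# low while it prepares the requested data.
    for _ in range(wait_cycles):
        wait_trace.append(0)

    # WAIT# is released; one word is transferred on each following clock.
    words = []
    for offset in range(burst_length):
        wait_trace.append(1)
        words.append(flash_array[address + offset])

    return words, wait_trace


flash_array = [0x10, 0x11, 0x12, 0x13]  # hypothetical memory contents
words, wait_trace = read_with_wait(flash_array, 0, wait_cycles=4, burst_length=2)
stall_cycles = wait_trace.count(0)  # cycles the CPU spent idle
```

In this model the CPU does no useful work during the cycles recorded as 0 in the trace, which corresponds to the idle time imposed by the WAIT# line.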
FIG. 2 shows a timing diagram of a typical transfer over bus 14. The clock signal CLK provides overall timing synchronization for the other bus signals. Multiple address lines ADR provide the memory addresses A1, A2, A3 or A4 of the requested data words D1, D2, D3 or D4 to be read from memory, while address-valid signal ADV# indicates when the address lines are valid. Chip select CE# and output enable OE# provide other control signals that are known in the art. When memory 12 sees a valid address, it asserts the wait signal WAIT# until it has the requested data available. When WAIT# is released (goes from low to high), the subsequent CLK cycles are used to time the reading of the now-available data from the memory over the DATA lines. In the example of FIG. 2, the WAIT process causes each request to take about six clock cycles to complete, even if the same data is being re-requested, as is the case for words D1 and D2 in FIG. 2. Most of those cycles represent idle time for the CPU, thus wasting much of its capability.
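The cost of re-requesting data without a cache can be worked through with simple arithmetic. The sketch below takes the figure of about six clock cycles per request from the text; the order of the requests is a hypothetical sequence chosen for illustration and is not read from FIG. 2.

```python
CYCLES_PER_REQUEST = 6  # "about six clock cycles" per request, per the text


def total_cycles(requests, cycles_per_request=CYCLES_PER_REQUEST):
    """Every request pays the full wait, whether or not it is a repeat."""
    return len(requests) * cycles_per_request


def repeated_request_cycles(requests, cycles_per_request=CYCLES_PER_REQUEST):
    """Cycles spent on addresses that had already been read earlier."""
    seen = set()
    repeated = 0
    for address in requests:
        if address in seen:
            repeated += cycles_per_request
        seen.add(address)
    return repeated


# Hypothetical request order in which A1 and A2 are each requested twice.
requests = ["A1", "A2", "A1", "A2", "A3", "A4"]
total = total_cycles(requests)
repeated = repeated_request_cycles(requests)
```

With this assumed sequence, 12 of the 36 cycles are spent fetching words that had already been read once.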