1. Field of the Invention
The present invention relates generally to a controller module for Flash memory. More specifically, it relates to an apparatus and methods for optimization of memory latency and throughput from a plurality of Flash memory units by providing sequential read access optimization through interleaving the access among the Flash memory units, and is applicable to CPU access on embedded Flash in a System-On-Chip (SOC).
2. Prior Art
Flash memory is widely used as non-volatile memory in many applications, including personal computers, solid state memory drives, and devices such as MP3 and media players, which require storage when not connected to a power source. NAND Flash provides a non-volatile memory solution that balances density and cost, with memory sizes of several gigabytes.
Processor speeds have been increasing much faster than memory speeds, leading to the well-known “processor-memory gap.” CPU speeds may be in the gigahertz range, with central bus speeds of over 500 MHz. These high speeds create a need for high-throughput, low-latency access to memory components. Typical memory access speeds, however, are considerably lower than processor speeds, leading to lower throughput and long latency for both memory reads and memory writes. Flash controllers typically must insert several wait states on the bus whenever the processor accesses Flash memory. A solution based on a memory cache is possible, but it is more expensive and consumes more power. Such cache-based solutions are therefore typically not appropriate for embedded applications, such as those deployed on a System-On-Chip (SOC).
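By way of illustration only, the wait-state cost described above can be estimated with simple arithmetic. The following Python sketch assumes a simplified model in which a read occupies a whole number of bus clock cycles and every cycle beyond the first counts as one wait state. The function name `wait_states` and the sample timing values are hypothetical and are not taken from any particular Flash device.

```python
def wait_states(access_time_ns: int, clock_period_ns: int) -> int:
    """Estimate wait states for one Flash read (assumed, simplified model).

    The read is assumed to take ceil(access_time / clock_period) bus
    cycles; all cycles after the first are counted as wait states.
    """
    if access_time_ns <= 0 or clock_period_ns <= 0:
        raise ValueError("timing values must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    cycles = -(-access_time_ns // clock_period_ns)
    return max(cycles - 1, 0)


if __name__ == "__main__":
    # Hypothetical 40 ns Flash access time at two assumed bus clock periods.
    for period_ns in (10, 2):
        print(period_ns, "ns clock ->", wait_states(40, period_ns), "wait states")
```

Under these assumptions, the same Flash device requires more wait states as the bus clock period shrinks, which is the gap the paragraph above describes.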
Solutions for improving Flash memory throughput often employ Flash controller devices and modules, which interface between the CPU and the Flash memory. The Flash controller can use techniques such as interleaving multiple Flash memories to improve throughput. Interleaved memory is a technique well known in the prior art for increasing memory throughput using multiple memory units. In this technique, Flash controllers use one or more address bits to select a Flash memory unit from a set of Flash memory units, and access the Flash memory units on multiple buses. Successive memory accesses, such as successive Flash read operations, can be executed by accessing the multiple Flash units in parallel, thus shortening the overall access time and improving throughput. By accessing the Flash memory units over separate buses, the Flash controllers are also able to reduce memory latency. For example, interleaving two 8-bit Flash devices, each operating at 33 MHz, would give an effective data rate equivalent to that of a single 8-bit device operating at 66 MHz. In the prior art, an exemplary solution is described wherein a Flash controller is used to interleave data between two 8-bit Flash units, thereby allowing 16 bits of data to be fetched by accessing the two Flash units in parallel. However, interleaving alone is not sufficient to provide sequential-address access to the Flash units with no wait states as the ratio between the Flash access time and the system clock period increases.
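The address-bit selection and the limitation noted above can be sketched as follows. This is a minimal model under stated assumptions, not the implementation of any particular controller: the number of Flash units is assumed to be a power of two, the low-order address bits are assumed to select the unit, each unit is assumed to sit on its own bus, and reads are assumed to be strictly sequential and fully overlapped. The names `select_bank` and `cycles_per_sequential_word` are hypothetical.

```python
from typing import Tuple


def select_bank(address: int, num_banks: int) -> Tuple[int, int]:
    """Split a word address into (bank index, address within that bank).

    Assumes num_banks is a power of two and that the low-order address
    bits choose the bank, so consecutive addresses rotate through banks.
    """
    if num_banks < 1 or num_banks & (num_banks - 1):
        raise ValueError("num_banks must be a power of two")
    bank_bits = num_banks.bit_length() - 1
    return address & (num_banks - 1), address >> bank_bits


def cycles_per_sequential_word(access_cycles: int, num_banks: int) -> float:
    """Average bus cycles per word for long sequential reads (assumed model).

    Each bank delivers one word every access_cycles cycles, so num_banks
    banks deliver num_banks words in that time; the bus itself is assumed
    to carry at most one word per cycle.
    """
    if access_cycles < 1 or num_banks < 1:
        raise ValueError("arguments must be positive")
    return max(1.0, access_cycles / num_banks)


if __name__ == "__main__":
    # Consecutive addresses alternate between two banks.
    print([select_bank(a, 2) for a in range(4)])
    # Two banks hide a 2-cycle access, but not a 4-cycle access.
    print(cycles_per_sequential_word(2, 2), cycles_per_sequential_word(4, 2))
```

In this model, sequential reads proceed at one word per cycle only while the Flash access time, measured in clock cycles, does not exceed the number of interleaved units. Once the ratio grows beyond that, the average rises above one cycle per word, which corresponds to wait states remaining on the bus.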
Thus, there is a need for a general solution for high-throughput, low-latency Flash memory read access that is usable with any type of Flash memory. Such a solution should be able to meet the needs of the high clock speeds used by current CPUs, and should be applicable to embedded Flash access in a System-On-Chip.