1. Field of the Invention
The present invention relates to an embeddable Flash memory system for non-volatile storage of code, data, and bit-streams for embedded FPGA configurations. More specifically, the invention relates to a memory system integrated into a single chip together with a microprocessor and including a modular array structure comprising a plurality of memory blocks.
2. Description of the Related Art
As is well known in this technical field, the continuing reduction in size and price of hand-held digital equipment, together with the demanding computing performance and low-power constraints of consumer applications, is increasing the need for a technology that combines high-performance digital CMOS transistors and non-volatile flash memory.
For instance, an efficient power block for a memory device is disclosed in the article by R. Pelliconi, D. Iezzi, A. Baroni, M. Pasotti, P. L. Rolandi, “Power efficient charge pump in deep sub micron standard CMOS technology,” Proceedings of 27th ESSCIRC, pp. 100-103, September 2001.
At the same time, the rising costs of mask sets and the shorter time-to-market available for new products are leading to the introduction of systems with a higher degree of programmability and configurability, such as systems-on-chip with configurable processors, embedded FPGAs, and embedded flash memory.
In this respect, the availability of an advanced embedded flash technology, based on NOR architecture, together with innovative IPs, like embedded flash macrocells with special features, is a key factor.
For a better understanding of the present invention, reference is made to Field Programmable Gate Array (FPGA) technology that combines standard processors with embedded FPGA devices.
These solutions enable configuration of the FPGA at deployment time with exactly the required peripherals, exploiting temporal re-use by dynamically reconfiguring the instruction-set at run time based on the currently executed algorithm.
The existing models for designing FPGA/processor interaction can be grouped in two main categories:
the FPGA is a co-processor communicating with the main processor through a system bus or a specific I/O channel;
the FPGA is described as a functional unit of the processor pipeline.
The first group includes the GARP processor, known from the article by T. Callahan, J. Hauser, and J. Wawrzynek entitled: “The Garp architecture and C compiler” IEEE Computer, 33(4):62-69, April 2000. A similar architecture is provided by the A-EPIC processor that is disclosed in the article by S. Palem and S. Talla entitled: “Adaptive explicit parallel instruction computing”, Proceedings of the fourth Australasian Computer Architecture Conference (ACOAC), January 2001.
In both cases the FPGA is addressed via dedicated instructions, moving data explicitly to and from the processor. Control hardware is kept to a minimum, since no interlocks are needed to avoid hazards, but a significant overhead in clock cycles is required to implement communication.
Only when the number of cycles per execution of the FPGA is relatively high may the communication overhead be considered negligible.
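The amortization argument above can be illustrated with a minimal sketch. The function name and all cycle counts below are hypothetical and are not taken from the cited architectures; the sketch assumes only that the total cost of one FPGA invocation is the sum of the data-transfer cycles and the compute cycles.

```python
def communication_overhead_fraction(compute_cycles: int, transfer_cycles: int) -> float:
    """Return the fraction of total cycles spent moving data to and from the FPGA.

    Hypothetical model: total cycles = compute cycles + transfer cycles.
    """
    total = compute_cycles + transfer_cycles
    if total <= 0:
        raise ValueError("total cycle count must be positive")
    return transfer_cycles / total


if __name__ == "__main__":
    # Assumed fixed transfer cost of 20 cycles per FPGA invocation.
    for compute in (10, 100, 10_000):
        frac = communication_overhead_fraction(compute, transfer_cycles=20)
        print(f"{compute:>6} compute cycles: {frac:.1%} of cycles spent on communication")
```

Under this simple model, the communication share shrinks only as the compute cycles per invocation grow large relative to the fixed transfer cost.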
In the commercial world, FPGA suppliers such as Altera Corporation offer digital architectures based on U.S. Pat. No. 5,968,161, issued to T. J. Southgate, entitled: “FPGA based configurable CPU additionally including second programmable section for implementation of custom hardware support”.
Other suppliers (Xilinx, Triscend) offer chips containing a processor embedded on the same silicon IC with embedded FPGA logic. See for instance U.S. Pat. No. 6,467,009, issued to S. P. Winegarden et al., entitled: “Configurable Processor System Unit”, and assigned to Triscend Corporation.
However, those chips are generally loosely coupled by a high speed dedicated bus, performing as two separate execution units rather than being merged in a single architectural entity. In this manner the FPGA does not have direct access to the processor memory subsystem, which is one of the strengths of the academic approaches outlined above.
In the second category (FPGA as a functional unit) we find the architectures known as “PRISC”, “Chimaera”, and “ConCISe”.
In all these models, data are read and written directly on the processor register file, minimizing overhead due to communication. In most cases, to minimize control logic and hazard handling and to fit in the processor pipeline stages, the FPGA is limited to combinatorial logic only, thus severely limiting the performance boost that can be achieved.
These solutions represent a significant step toward a low-overhead interface between the two entities. Nevertheless, due to the granularity of FPGA operations and the hardware-oriented structure of the FPGA, their approach is still very coarse-grained, which reduces the achievable parallelism in resource usage and again introduces hardware issues that are neither familiar nor friendly to software compilation tools and algorithm developers.
Thus, a relevant drawback of this approach is the memory data access bottleneck, which often forces long stalls on the FPGA device while enough data are fetched into the shared registers to justify its activation.