The present invention pertains to the field of computer system architecture. More particularly, the present invention relates to computer systems that use a large-block erasable non-volatile semiconductor memory as main memory.
As computer programs have become more sophisticated, personal computer systems have also had to become more sophisticated to accommodate them. Computer programs now contain more code instructions than they once did and, on average, read and write larger data files during execution.
Typically, the heart of a personal computer system is a central processing unit (CPU) that resides on a microprocessor chip. New microprocessor chips that operate at increasingly high operating speeds are constantly being developed in order to permit personal computers to execute the larger programs in a timely manner. Usually, these microprocessor chips are developed using CMOS (complementary metal-oxide semiconductor) technology. The greatest amount of power consumption for CMOS chips occurs on the leading and trailing edges of clock pulses (i.e. when a clock signal transitions from a low voltage state to a higher voltage state and vice versa).
When the operating speed of the microprocessor is increased, the number of clock pulses in a given time period increases, thereby increasing the power consumption of the microprocessor during that period. Furthermore, the microprocessor generates more heat, which must be dissipated to prevent damage to components within the computer system.
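The link between clock rate and power can be made concrete with the standard first-order model of CMOS switching power. This model is general textbook background rather than part of this disclosure, and it ignores leakage and short-circuit current. Here α is the fraction of nodes that switch per clock, C_L is the switched capacitance, V_DD is the supply voltage, and f is the clock frequency. Holding the other terms fixed, raising the clock from 16 MHz to 33 MHz roughly doubles switching power, all of which ends up as heat.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_{L} \, V_{DD}^{2} \, f,
\qquad
\frac{P_{\mathrm{dyn}}(33\,\mathrm{MHz})}{P_{\mathrm{dyn}}(16\,\mathrm{MHz})}
  = \frac{33}{16} \approx 2.1
```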
Both power consumption and heat dissipation pose serious problems when designing a personal computer system. This is especially true of mobile computers, which are typically powered by batteries. The more power the computer consumes, the less time it can operate from a battery of a given size. Therefore, as the operating speed of the computer is increased, a designer is faced with several unattractive alternatives.
If the same sized batteries are used, then the effective operating time for the computer system must decrease when the operating speed is increased. On the other hand, if the effective operating time is to remain constant then it is necessary to either add additional batteries, thereby increasing the bulk and weight of the computer, or to use an exotic and therefore expensive battery technology (or both).
The trend in mobile computers is toward smaller, faster, less expensive and lighter units. Thus, the need to add more batteries, or more expensive batteries, is a significant disadvantage. This disadvantage is exacerbated by the need to add cooling fans, or to implement other cooling techniques, to dissipate the additional heat generated by high speed microprocessors.
Additionally, because the microprocessors operate at a higher speed, they can execute more instructions in a given amount of time and can therefore process a greater amount of data during that period. A bottleneck has developed in computer systems having fast microprocessors that can prevent the higher speed of the microprocessor from being utilized effectively. This bottleneck is the bus (or buses) that provide the microprocessor with the instructions to execute and the data that the microprocessor uses when executing those instructions.
If the next instruction to be executed is not available when the microprocessor needs it, then the microprocessor must wait idly (i.e. insert wait cycles) while the required instruction is retrieved and provided to the microprocessor. Furthermore, if the next instruction to be executed requires data that is not immediately available to the microprocessor, the microprocessor must also idle until the data has been retrieved. During this idle time, the microprocessor clock continues to toggle thereby needlessly consuming power and generating heat that must be dissipated.
In order to decrease the frequency with which the microprocessor encounters these wait cycles, many modern high performance microprocessors have a small internal cache, called a primary cache. Instructions that are likely to be executed and data that is likely to be needed by the executing instructions are stored in the internal cache so that they may be accessed immediately by the CPU of the microprocessor.
The sequential nature of computer programs is such that when a particular instruction within the program is executed, it is highly probable that the next instruction to be executed will be the instruction that follows the currently executing instruction. Therefore, when an instruction is to be executed, the cache is checked to determine whether a copy of the required instruction is immediately available within the cache. If a copy of the required instruction is stored within the cache (called a cache hit), then the copy can be supplied to the CPU immediately from the cache, and there is no need for the CPU to wait while the instruction is retrieved to the microprocessor chip from wherever it is stored in the computer system.
On the other hand, if a copy of the required instruction is not stored within the cache (called a cache miss), then the CPU must wait while the instruction is retrieved to the microprocessor chip from wherever it is stored within the computer system. In practice, rather than retrieving only the next instruction to be executed, a cache line is formed by retrieving the next instruction to be executed together with a certain number of the instructions that follow it. That way, if the subsequent instructions are in fact executed, they will be immediately available to the CPU from within the cache line. Because of the sequential nature of programs, the benefits of caching also apply to data used by the programs.
Because the internal cache is filled a cache line at a time, many microprocessors can accept data in a burst mode. In a typical burst read, the microprocessor specifies the first address of the data or instructions to be read into a cache line. Then, the data or instructions that are stored at the addresses of the cache line are sent sequentially from where they are stored within the computer system to the microprocessor.
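The hit, miss, and line-fill behavior described above can be sketched as a toy direct-mapped cache. This is a minimal sketch under stated assumptions, not a model of any particular microprocessor: the line size `LINE_WORDS`, the line count `NUM_LINES`, the direct-mapped organization, aligned cache lines, and the class name `DirectMappedCache` are all hypothetical choices for illustration. On a miss, the whole line is fetched as one burst starting at the line's first address.

```python
from dataclasses import dataclass, field

LINE_WORDS = 4  # hypothetical: words per cache line
NUM_LINES = 8   # hypothetical: number of lines in the cache


@dataclass
class DirectMappedCache:
    """Hypothetical toy direct-mapped cache that fills a whole line on every miss."""
    memory: list  # backing store indexed by word address
    tags: list = field(default_factory=lambda: [None] * NUM_LINES)
    lines: list = field(
        default_factory=lambda: [[None] * LINE_WORDS for _ in range(NUM_LINES)])
    hits: int = 0
    misses: int = 0

    def read(self, addr: int):
        line_addr = addr // LINE_WORDS  # aligned memory line containing addr
        index = line_addr % NUM_LINES   # cache slot that this line maps to
        tag = line_addr // NUM_LINES    # identifies which line occupies the slot
        if self.tags[index] == tag:
            self.hits += 1              # cache hit: no bus access needed
        else:
            self.misses += 1            # cache miss: burst-fill the entire line
            base = line_addr * LINE_WORDS
            self.lines[index] = self.memory[base:base + LINE_WORDS]
            self.tags[index] = tag
        return self.lines[index][addr % LINE_WORDS]


# Sequential execution of 16 instructions: one miss per line, then hits.
memory = [f"insn{i}" for i in range(64)]
cache = DirectMappedCache(memory)
for pc in range(16):
    cache.read(pc)
print(cache.hits, cache.misses)  # 12 4
```

With four-word lines, straight-line code misses once per line and hits on the remaining three words, which is the payoff of filling a line rather than a single word.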
Frequently, the internal cache of the microprocessor is formed using static random access memory (SRAM). Because each SRAM cell is formed from six to eight transistors, there is room on a microprocessor chip for only a relatively small SRAM cache. Furthermore, SRAM is volatile, meaning that it retains stored information only as long as there is enough power to run the device. If power is removed, the contents of the SRAM cache are lost.
Some microprocessors are dynamic, meaning that if power is removed from them, when power is restored they cannot return directly to the state they were in when the power was removed. When power is restored the microprocessor must be reinitialized, and at least some of the processing progress previously made will probably be lost.
Other microprocessors are static, meaning that they can be placed in an energy saving deep powerdown mode, and then be returned relatively quickly to the state they were in immediately before they entered the deep powerdown mode.
As mentioned earlier, data and instructions are stored within the computer system and provided to the microprocessor over one (or more) bus systems. Because most types of relatively fast random access memory are both volatile and relatively expensive, a typical computer system stores code and data on relatively inexpensive nonvolatile storage such as a floppy disk or hard disk.
The typical computer system also has a main memory made of volatile memory, because the nonvolatile storage has a relatively slow access speed. When a program is to be executed, the computer system uses a technique known as shadowing to copy the code and data required to execute the program from the slow nonvolatile memory to the faster volatile memory. The shadow copy in the main memory is then used to execute the program. If any changes are made to the shadow copy during program execution, the shadow copy can be copied back to the slower nonvolatile memory when the program finishes execution. Furthermore, because an unexpected power failure will cause the contents of the volatile main memory to be lost, it is common to save intermediate results to nonvolatile memory during the course of program execution.
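Shadowing, write-back, and the reason for intermediate saves can be sketched as follows. This is a minimal illustration under stated assumptions: a Python dict stands in for the disk, a `bytearray` stands in for volatile main memory, and the class `ShadowCopy` and its methods are hypothetical names, not part of the described system.

```python
class ShadowCopy:
    """Hypothetical model of shadowing a file from slow nonvolatile storage into RAM."""

    def __init__(self, disk: dict, name: str):
        self.disk = disk                  # stands in for the floppy or hard disk
        self.name = name
        self.ram = bytearray(disk[name])  # shadow copy in volatile main memory
        self.dirty = False

    def write(self, offset: int, value: int):
        self.ram[offset] = value          # the program works on the fast copy
        self.dirty = True

    def save(self):
        """Copy changes back; used at the end of a run or as an intermediate save."""
        if self.dirty:
            self.disk[self.name] = bytes(self.ram)
            self.dirty = False

    def power_fail(self):
        self.ram = None                   # volatile contents are lost
        self.dirty = False


disk = {"data.bin": b"\x00\x01\x02"}
shadow = ShadowCopy(disk, "data.bin")
shadow.write(0, 9)
shadow.save()                # intermediate save reaches the nonvolatile store
shadow.write(1, 7)
shadow.power_fail()          # the unsaved write to offset 1 is lost
print(disk["data.bin"] == b"\x09\x01\x02")  # True
```

Only work saved before the power failure survives, which is why intermediate results are periodically written back.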
The most common form of main memory is dynamic random access memory (DRAM). DRAM is more commonly used than SRAM, even though it is slower, because a DRAM can hold approximately four times as much data as a SRAM of the same complexity.
DRAMs store information in integrated circuits that contain capacitors. Because capacitors lose their charge over time, DRAMs must be controlled by logic that causes the DRAM chips to "refresh" (recharge) periodically. When a DRAM is being refreshed, it cannot be read from, or written to, by the microprocessor. Thus, if the microprocessor must access the DRAM while it is being refreshed, one or more wait states occur.
In some computer systems, SRAM is used as main memory in place of DRAM. One advantage of using SRAM as main memory is that SRAM is faster to access than DRAM. Furthermore, because SRAM does not need to be refreshed, it is always available for access by the microprocessor, which eliminates the wait states that occur with DRAM when an access is attempted during a refresh. Moreover, the lack of a refresh requirement simplifies the design of a computer system having SRAM based main memory, because there are no refresh cycles to control. In fact, a simple battery back-up can be supplied to preserve the contents of the SRAM in the event of a power failure. Of course, if the battery back-up fails, the contents of the SRAM main memory will be lost.
Rather than building a main memory completely from SRAM, it is more common to implement the main memory using DRAM, and then to supplement the DRAM based main memory with a SRAM based external cache memory (i.e. a cache memory that is external to the microprocessor chip). Because the external cache is not contained on the microprocessor chip, it can typically be made to store more data and instructions than can be stored by the internal cache. Because the external cache is not located on the microprocessor chip, however, it must supply the data and instructions to the microprocessor using one of the buses that often form bottlenecks for data and instructions entering and leaving the microprocessor chip.
A high speed microprocessor chip typically interfaces with the rest of the computer system using one or two high speed buses. The first of these buses is a relatively high speed asynchronous bus called a main memory bus. The second of these buses is a relatively high speed synchronous bus called a local bus. The typical operating speed of main memory and local buses is in the range of 16 to 33 MHz, and the trend is toward ever faster buses.
Although most microprocessors can interface directly with a main memory bus, some microprocessors do not provide an external interface to a local bus. These microprocessors typically interface with a relatively slow synchronous bus called an expansion bus. The typical operating speed of an expansion bus is in the range of 8 to 12 MHz.
The main memory (or DRAM) bus is used by the microprocessor chip to access main memory. Usually, rather than interfacing directly to the DRAM chips, the microprocessor is coupled to a DRAM controller chip that, in turn, is coupled to the DRAM chip or chips. The DRAM controller controls accesses to the DRAM chips initiated by the microprocessor. The DRAM controller also controls overhead maintenance such as the refresh cycles for periodically refreshing the DRAM contents. Some microprocessors have the DRAM controller built directly into them. Frequently, the DRAM or SRAM chips are contained in surface-mount packages and several DRAMs or SRAMs are attached to a small circuit board to form what is called a Single In-line Memory Module (SIMM). One can then relatively easily modify the total amount (or the access speed) of main memory in a computer system by simply swapping one type of SIMM for another. A SRAM based external cache may also be coupled to the microprocessor through the DRAM bus.
If a computer system has a local bus, then the microprocessor can access devices coupled to the local bus at a relatively fast speed. Thus, high bandwidth devices such as graphics adapter cards and fast input/output devices are typically coupled directly to the local bus. Sometimes the external cache is coupled to the local bus rather than to the DRAM bus. It is also possible to supplement (or replace) the main memory on the main memory bus by coupling DRAM to the local bus using a DRAM controller designed to interface with the local bus.
Each device coupled to the local bus has an associated capacitive load. As the load on the local bus is increased, the maximum operating speed for the local bus decreases and the power required to drive the bus increases. Therefore, one device coupled to the local bus can be a peripheral bus bridge from the local bus to another bus called a high speed peripheral bus (e.g. a peripheral component interconnect (PCI) bus). The bus bridge isolates the load of the devices coupled to the high speed peripheral bus from the high speed local bus.
Another device coupled to the local bus is typically an expansion bus bridge that couples the high performance local bus to a lower performance expansion bus. The low bandwidth components of the computer system are then coupled to the lower performance expansion bus. One type of device that is typically coupled to the expansion bus uses flash memory. Flash memory typically is a high-density, nonvolatile, read-write memory. Examples of flash memory based devices include BIOS ROM and hard disk substitutes.
Flash memories differ from conventional EEPROMs (electrically erasable programmable read only memories) with respect to erasure. Conventional EEPROMs use a select transistor for individual byte erase control. Flash memories, on the other hand, achieve much higher density with single transistor cells. For a typical flash memory array, a logical "one" means that few if any electrons are stored on a floating gate associated with a bit cell. A logical "zero" means that many electrons are stored on the floating gate associated with the bit cell. No bit of the flash memory array can be changed from a logical zero to a logical one without a prior erasure. During a flash erase operation, a high voltage is supplied to the sources of every memory cell in a block or in the entire chip simultaneously. This results in a full array or a full block erasure.
After a flash memory array has been erased, a logical one is stored in each bit cell of the flash memory array. Each single bit cell of the flash memory array can then be programmed (overwritten) from a logical one to a logical zero, given that this entails simply adding electrons to a floating gate that contains the intrinsic number of electrons associated with the erased state. Program operations for flash memories are also referred to as write operations.
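The erase and program rules above (erase sets every bit of a block to one; programming can only turn ones into zeros) can be captured in a small model. This is a minimal sketch, assuming a hypothetical byte-wide block and a hypothetical class `FlashBlock`; real devices differ in word width, block size, and command interface, none of which the text specifies.

```python
class FlashBlock:
    """Hypothetical toy model of one erase block with byte-wide cells."""

    def __init__(self, size: int):
        self.cells = [0xFF] * size  # starts in the erased (all ones) state

    def erase(self):
        # Block erase: every cell returns to all ones at the same time.
        self.cells = [0xFF] * len(self.cells)

    def program(self, offset: int, value: int):
        # Programming only adds electrons, i.e. it can clear bits but never set them.
        current = self.cells[offset]
        if value & ~current & 0xFF:
            raise ValueError("requires a 0 -> 1 transition; erase the block first")
        self.cells[offset] = current & value


blk = FlashBlock(4)
blk.program(0, 0xA5)        # 1111_1111 -> 1010_0101: only clears bits, allowed
blk.program(0, 0x21)        # 1010_0101 -> 0010_0001: only clears bits, allowed
after_program = blk.cells[0]
try:
    blk.program(0, 0xFF)    # would need to set bits back to one
    rejected = False
except ValueError:
    rejected = True
blk.erase()                 # the only way back to ones is a block erase
```

The asymmetry is why rewriting a single word in place generally means erasing, and then reprogramming, its whole block.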
The read operation associated with a typical flash memory array closely resembles the read operation associated with other read-only memory devices. A read operation for a typical high speed flash memory array takes on the order of 80 nanoseconds (nS). Write and erase operations for a flash memory array are, however, significantly slower. Typically, an erase operation takes on the order of one second. A write operation for a single word of a flash memory array takes on the order of 10 microseconds.
British patent document no. GB 2 251 324 A, published Jul. 1, 1992, describes a computer system that uses flash memory. The patent document discloses various architectures to incorporate a flash memory into a computer system. One architecture referred to therein is a variable file structure. For the variable file structure, computer code is stored contiguously in flash memory, allowing a CPU to execute computer code directly from the flash memory array without the need for RAM. A direct mapped variable file structure is described that allows direct code execution from all of the flash memory array. A page mapped variable file structure is also described that allows direct code execution from a portion of the flash memory array. Thus, flash memory can serve as the main memory within portable computers, providing user functions similar to those of disk-based systems.
A ROM-executable DOS is available commercially and provides several benefits to both system manufacturers and ultimately end users. First, because most of the operating system is composed of fixed code, the amount of system RAM required to execute DOS is reduced from 50K to 15K, thereby conserving system space and power. Secondly, DOS can now be permanently stored in, and executed from, a single ROM-type of device such as flash memory. This enables systems to be provided that are ready to run right out of the box. Lastly, users enjoy "instant on" performance because the traditional disk-to-DRAM boot function and software downloading steps are eliminated.
For example, by storing application software and operating system code in a Resident Flash Array (RFA), users enjoy virtually instant-on performance and in-place code execution. An RFA also protects against software obsolescence because, unlike ROM, it is in-system updatable. Resident software, stored in flash rather than disk, extends battery life and increases system reliability.
Because erasing and writing data to flash memory is a distinctly different operation than rewriting information to a disk, new software techniques have been developed to allow flash to emulate disk functionality. File management software such as Microsoft's FLASH FILE SYSTEM (FFS) allows flash memory components and flash cards to emulate the file storage capabilities of disk. Microsoft's FFS transparently handles data swaps between flash blocks similar to the way MS-DOS (MS-DOS is a trademark of Microsoft) handles swaps between disk sectors. Under FFS, the user can input an MS-DOS or Windows command without regard for whether a flash memory or magnetic disk is installed in the system. Flash filing systems make the management of flash memory devices completely transparent to the user. Flash filing systems similar to the Microsoft FFS are available or are being developed for other operating systems besides DOS and WINDOWS (WINDOWS is a trademark of Microsoft).
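The text does not describe how FFS works internally. One common generic way for a software layer to make flash look like a rewritable disk is to avoid overwriting a sector in place: each write goes to an erased location, and a table remaps the logical sector to it. The sketch shows only that idea, under hypothetical names (`RemappedFlash`, `write_sector`, `read_sector`); it is not Microsoft's FFS algorithm, and it omits reclaiming blocks that hold stale copies.

```python
class RemappedFlash:
    """Hypothetical disk-emulation layer: logical sectors map to physical flash slots."""
    ERASED = None  # marker for a slot that has not been written since erase

    def __init__(self, num_slots: int):
        self.slots = [self.ERASED] * num_slots  # physical storage
        self.valid = [False] * num_slots        # does the slot hold a live copy?
        self.map = {}                           # logical sector -> physical slot

    def _free_slot(self) -> int:
        for i, data in enumerate(self.slots):
            if data is self.ERASED:
                return i
        raise RuntimeError("no erased slots; a real layer would reclaim a block here")

    def write_sector(self, sector: int, data: bytes):
        slot = self._free_slot()      # never overwrite programmed flash in place
        self.slots[slot] = data
        self.valid[slot] = True
        old = self.map.get(sector)
        if old is not None:
            self.valid[old] = False   # stale copy waits for its block to be erased
        self.map[sector] = slot

    def read_sector(self, sector: int) -> bytes:
        return self.slots[self.map[sector]]


disk = RemappedFlash(num_slots=4)
disk.write_sector(7, b"v1")
disk.write_sector(7, b"v2")   # the rewrite lands in a new slot
print(disk.read_sector(7))    # b'v2'
print(disk.valid)             # [False, True, False, False]
```

To the caller, sector 7 simply appears to have been rewritten, which is the disk-like behavior the flash filing layer presents.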
Flash Memory is exceptionally well-suited to serve as a solid-state disk or a cost-effective and highly reliable replacement for DRAMs and battery-backed static RAMs. Its inherent advantages over these technologies make it particularly useful in portable systems that require the utmost in low power, compact size, and ruggedness while maintaining high performance and full functionality.
Flash memory, however, typically has an asynchronous interface wherein an address to be read is specified and then, a set time later, the contents stored at the specified address are output from the flash chip. Only after the data has been output from the flash chip can the next address to be read be sent to the flash chip. A high speed bus like the local bus can run at 33 MHz, wherein every cycle of the bus takes about 30 nS. A typical high performance flash chip, on the other hand, has a read access time of about 80 nS. Hence, if flash is to be used as main memory, every memory access to flash involves wait states, and zero-wait-state back-to-back burst read cycles from flash cannot be supported. The same is true for other devices having a read latency similar to that of flash memory. Thus, using prior art technology, it is not practical to use these memories as main memory for a high speed microprocessor.
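The wait-state arithmetic behind this paragraph can be made explicit. This is a simplified sketch that assumes each access starts on a bus-cycle boundary, ignores address setup and other protocol overhead, and treats an ideal zero-wait-state burst as one word per bus cycle; the function names are hypothetical.

```python
import math


def wait_states(access_ns: float, bus_mhz: float) -> int:
    """Hypothetical helper: wait states per access when a read outlasts one bus cycle."""
    cycle_ns = 1000.0 / bus_mhz           # 33 MHz -> about 30.3 ns per cycle
    cycles = math.ceil(access_ns / cycle_ns)
    return max(cycles - 1, 0)             # the first cycle carries no wait state


def burst_cycles(words: int, access_ns: float, bus_mhz: float) -> int:
    """Bus cycles for a burst when each read must finish before the next address."""
    return words * (wait_states(access_ns, bus_mhz) + 1)


print(wait_states(80, 33))       # 2
print(burst_cycles(4, 80, 33))   # 12, versus 4 for an ideal zero-wait-state burst
```

An 80 nS access spans three 30 nS bus cycles, so a four-word cache line fill from an asynchronous flash part takes roughly three times as long as the bus could otherwise deliver it.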
Therefore, one object of the present invention is to provide an efficient memory hierarchy based on nonvolatile rather than volatile memory, wherein both data and applications are stored in random access nonvolatile memory and applications are executed directly from that random access nonvolatile memory.
It is also an object of this invention to enable flash memory to operate in an optimal synchronous fashion with any synchronous bus.
It is also an object of this invention to enable flash memory to operate in an optimal synchronous fashion with any synchronous bus to provide a low cost, low power alternative to volatile main memory, and to eliminate the time required to transfer code and data from the hard disk to the main memory.
It is also an object of this invention to enable flash memory to operate in an optimal synchronous fashion with any synchronous bus so that the CPU can execute programs directly out of the flash memory without any degradation in performance when compared to volatile memory based main memory.
It is also an object of this invention to enable flash memory to operate in an optimal synchronous fashion with any synchronous bus and to thereby eliminate the need to incorporate costly memory subsystem designs such as interleaving into the system.
It is also an object of this invention to enable flash memory to operate in an optimal synchronous fashion with any synchronous bus and to thereby support back to back burst cycles and thus ensure that cache line fills are performed in a quick and optimal fashion.
It is also an object of this invention to enable flash memory to operate in an optimal asynchronous fashion with any asynchronous main memory bus.
It is also an object of this invention to enable flash memory to operate in an optimal asynchronous fashion with any asynchronous main memory bus to provide a low cost, low power alternative to volatile memory based main memory and to also eliminate the time required to transfer code and data from the hard disk to the main memory.
It is also an object of this invention to enable flash memory to operate in an optimal asynchronous fashion with any asynchronous main memory bus such that the CPU can execute programs directly out of the flash memory without any degradation in performance when compared to volatile memory.
It is also an object of this invention to enable flash memory to operate in an optimal asynchronous fashion with any asynchronous main memory bus and to eliminate the need to have custom controllers.
It is also an object of this invention to enable flash memory to operate in an optimal asynchronous fashion with any asynchronous main memory bus to provide a glueless interface to the existing main memory controller and thus reduce cost and loading on the local bus.
A flash memory chip that can be switched into four different read modes is described. Computer systems and hierarchies that exploit these modes are also described. In the first read mode, asynchronous flash mode, the flash memory is read as a standard flash memory. In this mode, the reading of the contents of a first address must be completed before a second address to be read can be specified.
In the second read mode, synchronous flash mode, a clock signal is provided to the flash chip and a series of addresses belonging to a data burst are specified, one address per clock tick. Then, the contents stored at the addresses specified for the burst are output sequentially during subsequent clock ticks in the order in which the addresses were provided. Alternately, if a single address is provided to the flash chip when it is in the synchronous mode, the subsequent addresses for the burst will be generated within the flash chip and the data burst will then be provided as output from the flash chip.
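The timing of the synchronous flash mode can be sketched as a simple pipeline: one address accepted per clock, and each word appearing a fixed number of clocks later in the order the addresses arrived. The latency value, the linear on-chip address sequence, and the function names are hypothetical assumptions for illustration; the text does not specify them, and some processors expect a non-linear burst order.

```python
def internal_burst_addresses(start: int, length: int) -> list:
    """Hypothetical on-chip address generator: a simple linear sequence."""
    return [start + i for i in range(length)]


def sync_burst_schedule(addresses: list, latency_clocks: int) -> list:
    """(clock, address) pairs for when each word appears at the outputs.

    Assumes one address is accepted per clock and that every word emerges
    latency_clocks later, in the same order the addresses were supplied.
    """
    return [(i + latency_clocks, addr) for i, addr in enumerate(addresses)]


schedule = sync_burst_schedule(internal_burst_addresses(0x100, 4), latency_clocks=3)
for clock, addr in schedule:
    print(clock, hex(addr))
# 3 0x100, 4 0x101, 5 0x102, 6 0x103: after the initial latency, one word per clock
```

Because addresses overlap with outstanding reads, only the first word pays the full access latency, unlike the asynchronous mode, where every read must complete before the next address is given.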
In the third read mode, asynchronous DRAM (dynamic random access memory) mode, the flash memory emulates DRAM. Thus, row and column addresses are strobed into the flash memory using row and column address strobe signals. The flash memory then converts the row and column addresses internally into a single address and provides as output the data stored at that single address. Furthermore, although the flash memory does not need an extended precharge period or to be refreshed, when in the asynchronous DRAM mode, the flash memory responds to precharge periods and refresh cycles as would a DRAM. Therefore, when in the asynchronous DRAM mode, the flash memory can be controlled by a standard DRAM controller.
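The row/column merge and the ignored refresh cycle can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical 10-bit column address (`COL_BITS`), 16-bit words, erased words reading back as all ones, and a hypothetical class `DramModeFlash`; the text gives none of these widths, and precharge timing is not modeled.

```python
COL_BITS = 10  # hypothetical column-address width


class DramModeFlash:
    """Hypothetical toy flash array that answers DRAM-style RAS/CAS strobes."""

    def __init__(self, contents: dict):
        self.contents = contents  # flat flash address -> stored 16-bit word
        self.row = None

    def ras(self, row: int):
        self.row = row  # row address latched on the RAS strobe

    def cas(self, col: int) -> int:
        if self.row is None:
            raise RuntimeError("CAS strobed before any RAS")
        if not 0 <= col < (1 << COL_BITS):
            raise ValueError("column address out of range")
        # Merge the latched row and the column into one flat flash address.
        addr = (self.row << COL_BITS) | col
        return self.contents.get(addr, 0xFFFF)  # unprogrammed words read as all ones

    def refresh(self):
        # Flash needs no refresh; accept the controller's cycle and do nothing.
        pass


flash = DramModeFlash({(3 << COL_BITS) | 5: 0x1234})
flash.ras(3)
flash.refresh()
print(hex(flash.cas(5)))  # 0x1234
```

Since the refresh cycle is simply accepted, an unmodified DRAM controller can drive the part without knowing it is not a DRAM.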
In the fourth read mode, synchronous DRAM mode, the features of the second and third modes are combined to yield a flash memory that emulates a synchronous DRAM. Thus, addresses to be read as a data burst are specified by strobing row and column addresses into the flash memory using RAS and CAS signals. The data of the data burst is then provided sequentially as output from the flash memory on subsequent clock ticks.
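A synchronous-DRAM-mode burst can be sketched by merging the strobed row and column into a starting address and then emitting one word per clock. The timing is a hypothetical assumption (RAS at clock 0, CAS at clock 1, first word `cas_latency` clocks after CAS), as are the 10-bit column width, the linear non-wrapping burst order, and the function name `sdram_mode_read`; the text does not fix any of these.

```python
COL_BITS = 10  # hypothetical column-address width


def sdram_mode_read(contents: dict, row: int, col: int,
                    burst_len: int, cas_latency: int) -> list:
    """Hypothetical (clock, word) schedule for a burst read in synchronous DRAM mode."""
    base = (row << COL_BITS) | col        # single internal flash address
    first_clock = 1 + cas_latency         # CAS assumed on clock 1, after RAS on clock 0
    return [(first_clock + i, contents.get(base + i, 0xFFFF))
            for i in range(burst_len)]


base = 2 << COL_BITS
contents = {base + i: 0xA + i for i in range(4)}
print(sdram_mode_read(contents, row=2, col=0, burst_len=4, cas_latency=2))
# [(3, 10), (4, 11), (5, 12), (6, 13)]
```

The data words arrive on consecutive clocks once the initial latency has elapsed, which is what allows back-to-back burst cycles from flash in this mode.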
Other objects, features, and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description which follows below.