Not applicable.
Not applicable.
1. Field of the Invention
The present invention generally relates to data storage in computer systems. More particularly, the present invention relates to the validation of data that has been stored in random access memory during periods when the data is susceptible to corruption, or in mission critical computer systems where tolerance for error is low.
2. Background of the Invention
Almost all computer systems include a processor and a system memory. The system memory functions as the working memory of the computer system, where data is stored that has been or will be used by the processor and other system components. The system memory typically includes banks of dynamic random access memory (DRAM) circuits. According to normal convention, a memory controller interfaces the processor to a memory bus that connects electrically to the DRAM circuits. The system memory provides storage for a large number of instructions and/or a large amount of data for use by the processor, providing faster access to the instructions and/or data than would otherwise be achieved if the processor were forced to retrieve data from a disk or drive.
Because system memory typically is constructed of dynamic random access memory circuits, the contents of the memory are volatile. To preserve the integrity of the data stored in system memory, a periodic refresh signal must be sent to each memory cell to refresh the voltage level of the cell, where the data is stored. Failure to timely refresh the memory cells of system memory causes the data to be lost. Thus, when power is turned off to a computer, the contents of system memory are lost. Data that is to be stored long-term on a computer system thus is stored in other non-volatile memory devices. Most computer systems include a hard drive which is capable of permanently storing data on magnetic disks. Other removable drives, such as zip drives, CD-ROMs, DVD-ROMs, and the like, may also be used for long-term storage of data. In these types of media, the data is preserved, even when power is removed from the computer system.
Almost all portable computers, and some desktop computers, may be placed in a low power state to preserve power. Preservation of power is especially important in portable computers, where operating power may be provided from batteries. To extend the life of batteries in portable computers, and thus extend the amount of time that a user can operate a portable computer without recharging the batteries or finding an electrical source, most portable computers are capable of going into a sleep mode where minimal power is consumed. The sleep mode permits the computer system to be placed in standby, so that operation can resume when the user is ready, without requiring the system to boot.
As power management of portable computer systems has evolved, two different low power modes have been developed and used commercially. The first low power mode is known as “hibernation” or “hibernation to disk.” In this mode, which is the lowest power mode of the computer system other than power-off, the computer system consumes minimal energy. The hibernation mode can be analogized to a no-power bookmark of the existing state of the computer system. When the hibernation mode is entered, the system hardware state is copied to the hard drive. Because the hard drive is non-volatile memory, all power can then be removed from the system. Upon resume, the entire system state is copied from the hard drive image and restored to system memory and to the devices whose state was copied. Hibernation to disk typically is referred to as the “S4” state by the ACPI nomenclature.
In hibernation mode, the system memory (or RAM) is not powered. Hibernation to Disk has been referred to as “Zero volt suspend” because no power is required to sustain the system contents. Thus, the data in system memory is no longer available once the system enters the hibernation mode because the memory cells are not refreshed. When resuming from hibernation, a delay period is encountered as the working data is reloaded from the hard drive back to the system memory. The time required to access data from the hard drive is significantly longer than accessing data from system memory. Thus, there is a perceptible delay that occurs when data is loaded from the hard drive to the system memory after the hibernation mode is exited. In many instances the resume process from hibernation mode can take between 30 seconds and 1 minute, as the system memory and system devices are completely restored from the relatively slow hard drive memory.
Conventional Hibernation to Disk is implemented by powering down the system in response to a system event. The system event can be the manual selection of an icon or menu entry, the selection of one or more keys, or system inactivity. Because the hibernation mode results in the removal of power, the context of all system peripherals is read and then stored to the hard drive. Next, the contents of the system memory are copied to the hard drive. A hard drive file that is equal to the size of the memory to be stored is created, which holds a mirror image of the system memory. After the contents of system memory are backed up, a flag is set in non-volatile memory indicating that the system context has been completely saved. Once the flag is set, the power is removed causing the contents of volatile memory (such as DRAM and the context of peripheral devices) to be lost. When the system resumes operation, the system BIOS or operating system polls the nonvolatile flag bit that indicates that the hard drive contains valid system context. If the flag bit is set, the BIOS or operating system restores the system context from the hard drive before resuming system operation.
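The save and restore sequence described above can be sketched as follows. This is only an illustrative model, not an actual BIOS or operating system implementation: the `Machine` class, its field names, and the choice to clear the flag after a successful restore are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical model: ram and devices are volatile; disk and nv_flag are not."""
    ram: bytearray
    devices: dict
    disk: dict = field(default_factory=dict)
    nv_flag: bool = False  # stands in for the non-volatile "context saved" flag


def hibernate(m: Machine) -> None:
    """Save context to disk, set the flag, then model removal of power."""
    m.disk["devices"] = dict(m.devices)   # 1. store peripheral context
    m.disk["ram_image"] = bytes(m.ram)    # 2. mirror image of system memory
    m.nv_flag = True                      # 3. flag is set only after the save completes
    m.ram = bytearray(len(m.ram))         # 4. power removed: volatile contents are lost
    m.devices = {}


def resume_from_disk(m: Machine) -> bool:
    """Restore context if the flag says the disk image is valid."""
    if not m.nv_flag:
        return False                      # no valid image: an ordinary boot would follow
    m.ram = bytearray(m.disk["ram_image"])
    m.devices = dict(m.disk["devices"])
    m.nv_flag = False                     # assumption: the image is consumed once restored
    return True
```

Setting the flag last matters in this sketch: if power were lost partway through the save, the flag would still be clear and the incomplete image would not be restored.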
The second low power state is referred to as the “suspend” mode or “Suspend to RAM” mode. In the suspend mode, the system memory remains powered while the system is taken to a non-operational state. The advantage of keeping the system memory powered is that when operation is resumed, the system is ready within a very short period for operation, in the state last used by the operator. Thus, resuming from a suspend mode only takes a few seconds, because very little system context is moved. Suspend to RAM generally is preferred as a bookmark feature because of its “instant on” low latency resume time. Suspend to RAM is also called the S1, S2, or S3 power state by the ACPI nomenclature.
Conventional Suspend to RAM works by stopping the clocks to the system, while leaving the entire system power on. Because the power used by the system depends on the system clock speeds, removing the clock signals significantly lowers the system power. Suspend to RAM often is referred to as “Power on Suspend.” When the system resumes operation from Suspend to RAM, the clocks may simply be started to restore system operation. Another form of Suspend to RAM stores the context of certain system devices to system memory. Examples of the device contexts that may be saved include peripherals such as audio controllers, the state of the processor, the contents of the processor cache, and the like. Once the context of these devices is stored to system memory, the clocks to those devices are stopped and power is removed. The system memory, however, remains powered to maintain its contents. To resume operation, the system BIOS or operating system restores the context of the peripherals from system memory, and then system operation is resumed.
The hibernation mode has been preferred because little or no power is consumed while the system is in this state. Recent improvements in the circuitry used for Suspend to RAM, however, have minimized the power drain that occurs in suspend mode. Even so, Hibernation to Disk still has a key integrity advantage over Suspend to RAM, because Suspend to RAM relies on the use of volatile DRAM memory. If power is lost to the DRAM during suspend mode, the system context is lost, and the user may lose work or data. Also, DRAM is inherently subject to data corruption because the DRAM cells must be periodically refreshed to maintain a charge on the very small capacitors that represent each data bit. A leaky cell, high temperature, or electromagnetic interference can invalidate the contents of the DRAM. These or other conditions may cause the DRAM contents to become corrupted while the system is in suspend mode.
Traditionally, the use of Suspend to RAM and Hibernation to Disk has been mutually exclusive, so that only one of these techniques is implemented as the low power state in a computer system. Recently, the IBM 600 portable computer advanced an idea marketed as “Redisafe,” in which Suspend to RAM was used, but the system contents also were stored redundantly to the hard drive. In the event that the system loses power while in suspend mode, the system BIOS restores the system contents from the hard drive. If power is not lost, the system resumes operation from system memory. Thus, the Redisafe system provided a redundant backup copy of the system memory, thereby protecting the user from a power loss, while still preserving the lower latency of the Suspend to RAM mode if power was not lost.
While this approach has some advantages over the previous low power modes, it still does not protect the user from hardware problems that may arise during a Suspend to RAM. The IBM system relies solely on detecting a loss of power during suspend mode, and does not gauge the integrity of the DRAM contents after the resume is completed. Thus, while the IBM system takes measures to ensure the integrity of system memory in the event of a power failure, it does not consider the validity of the data itself.
It would be desirable if a system could be developed that would minimize latency to the extent possible for a low power mode of a computer system. It would also be advantageous if a computer system provided a low power mode which could be resumed quickly in the event that the contents of system memory were valid, but which used a copy of data that had been saved to non-volatile memory in the event that the data in system memory was not valid. Despite the apparent advantages such a system would offer, to date no such system has been developed.
The present invention solves the deficiencies of the prior art by implementing a low power mode in a computer system that stores a copy of the data in system memory to the hard drive prior to entering the suspend mode. The system supports a quick resume from suspend if the data in system memory is valid. If the data in system memory is not valid, then the system causes the data to be restored from the hard drive. Thus, the system supports a quick resume, while also ensuring data integrity in the suspend mode. To minimize the amount of data that must be reloaded in the event the data is corrupted, the system memory may be partitioned into smaller blocks or pages that can be validated independently.
According to one exemplary embodiment of the present invention, error checking and correction memory is used as the system memory. Prior to entering a Suspend to RAM state, the system stores a backup copy of the system memory and other context information to the hard drive. When the system resumes from the suspended state, the CPU reads system memory. If error checking and correction memory is implemented, appropriate ECC logic will examine the data read from memory, and if errors are detected, the ECC logic will generate a non-maskable interrupt (NMI). An algorithm executing on the CPU acknowledges the NMI, and identifies the memory address being read which caused generation of the NMI. The CPU then reads the backup copy of that address range from the hard drive, and restores that memory range to the system memory, as a substitute for the invalid data in system memory. This operation is repeated until all data in system memory is examined.
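The ECC-based resume loop can be illustrated with a small simulation. The sketch below is an assumption-laden model rather than the actual firmware algorithm: an exception (`EccError`) stands in for the NMI raised by the ECC logic, `EccMemory` is a hypothetical stand-in for ECC-protected DRAM, and the restore granularity `range_size` is chosen arbitrarily.

```python
PAGE_SIZE = 4096  # assumed restore granularity; the text only says "address range"


class EccError(Exception):
    """Stands in for the NMI; carries the address whose read failed the ECC check."""
    def __init__(self, address):
        super().__init__(f"uncorrectable ECC error at {address:#x}")
        self.address = address


class EccMemory:
    """Hypothetical ECC-protected memory; 'bad' marks addresses that fail the check."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.bad = set()

    def read(self, address):
        if address in self.bad:
            raise EccError(address)
        return self.data[address]

    def write_range(self, start, payload):
        self.data[start:start + len(payload)] = payload
        self.bad -= set(range(start, start + len(payload)))  # rewriting regenerates ECC


def resume_with_ecc(mem, backup, range_size=PAGE_SIZE):
    """Read all of memory; on each simulated NMI, restore that range from the backup."""
    restored = []
    address = 0
    while address < len(mem.data):
        try:
            mem.read(address)
            address += 1
        except EccError as nmi:
            start = (nmi.address // range_size) * range_size
            end = min(start + range_size, len(mem.data))
            mem.write_range(start, backup[start:end])  # substitute the disk copy
            restored.append(start)
            address = end  # the whole range now comes from the backup
    return restored
```

In this model only the ranges that actually raise an error are reloaded from the backup, so a resume with no errors touches the backup copy not at all.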
As an alternative embodiment, the present invention may be used in systems that do not implement ECC memory, by having the CPU or some other programmable device perform the error checking of system memory. In this embodiment, the CPU detects initiation of a low power state, and reads each page of memory. For each page of memory, the CPU calculates a signature for that page. The signature may represent a checksum value, a cyclic redundancy check (CRC) value, or any other appropriate signature that can be used to later verify the validity of the data upon exiting from a low power mode. After the signature is calculated, that page of memory is saved to the hard drive. The signature value also is saved to either non-volatile memory or to volatile memory. Thus, the signature may be saved to static RAM, the hard drive, or to system memory. This process is repeated until a signature is calculated for each memory page, and the memory page and signature have been saved. When the system resumes from suspend mode, the CPU reads a page of system memory and calculates the signature. The calculated signature is then compared with the saved signature value. If the signatures match, the data for that page is assumed valid. If the signatures do not match, the data in that page is assumed to be invalid, and the CPU then restores the backup copy for that page from the hard drive. This process is repeated until all pages are validated or replaced.
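A minimal sketch of the signature-based embodiment follows, using CRC-32 from Python's standard `zlib` module as the signature. The function names, the 4096-byte page size, and the use of in-memory dictionaries to stand in for the hard drive and the signature store are assumptions made for illustration.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size


def suspend_with_signatures(ram, page_size=PAGE_SIZE):
    """Before suspend: compute a signature per page and save a backup of each page."""
    backup = {}      # stands in for the hard drive copy, keyed by page start address
    signatures = {}  # stands in for the saved signature values
    for start in range(0, len(ram), page_size):
        page = bytes(ram[start:start + page_size])
        signatures[start] = zlib.crc32(page)
        backup[start] = page
    return backup, signatures


def resume_with_signatures(ram, backup, signatures, page_size=PAGE_SIZE):
    """After resume: recompute each signature and restore only mismatching pages."""
    restored = []
    for start in range(0, len(ram), page_size):
        page = bytes(ram[start:start + page_size])
        if zlib.crc32(page) != signatures[start]:
            ram[start:start + len(backup[start])] = backup[start]
            restored.append(start)
    return restored
```

For example, if a single bit in the second page is flipped while suspended, `resume_with_signatures` restores only that page and leaves the others untouched, which is the benefit of validating pages independently.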
The present invention also may be configured to run in the background after operation is resumed from a low power mode. In that event, the page translation tables are programmed to respond with a Page Fault interrupt if an access is directed to a section of memory that has not yet been validated. In response to the Page Fault Interrupt, an algorithm executing on the CPU determines if the Page Fault interrupt was generated because data had not been validated, or because the application software had not yet utilized the memory. If the algorithm determines that this memory address has not been validated, then the algorithm proceeds to validate that page of memory, and preferably all other pages in that Page Directory.
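The background mode can be modeled as validation on demand. The following sketch is a simplified simulation, not real page-table code: the `LazyValidatedMemory` class, the `pages_per_group` parameter (standing in for the set of pages covered by one Page Directory), and the use of `LookupError` to represent a fault that belongs to the ordinary handler are all hypothetical.

```python
import zlib


def snapshot(ram, page_size):
    """Per-page backups and CRC-32 signatures, keyed by page number."""
    count = len(ram) // page_size
    backup = {p: bytes(ram[p * page_size:(p + 1) * page_size]) for p in range(count)}
    signatures = {p: zlib.crc32(backup[p]) for p in range(count)}
    return backup, signatures


class LazyValidatedMemory:
    """Hypothetical model: pages stay 'pending' until first access after resume."""
    def __init__(self, ram, backup, signatures, page_size, pages_per_group):
        self.ram = ram
        self.backup = backup
        self.signatures = signatures
        self.page_size = page_size
        self.pages_per_group = pages_per_group
        self.pending = set(signatures)  # pages marked to fault on first access

    def _validate(self, page):
        start = page * self.page_size
        end = start + self.page_size
        if zlib.crc32(bytes(self.ram[start:end])) != self.signatures[page]:
            self.ram[start:end] = self.backup[page]  # restore the disk copy
        self.pending.discard(page)

    def _page_fault(self, page):
        if page not in self.pending:
            # Not a validation fault: leave it to the ordinary fault handling.
            raise LookupError(f"page {page} is not mapped")
        first = (page // self.pages_per_group) * self.pages_per_group
        for p in range(first, first + self.pages_per_group):
            if p in self.pending:
                self._validate(p)  # validate the faulting page and its neighbors

    def read(self, address):
        page = address // self.page_size
        if page in self.pending or page not in self.signatures:
            self._page_fault(page)
        return self.ram[address]
```

In this model the first access to any page validates its whole group, so later accesses within that group proceed without a fault, while groups that are never touched are never read back from the backup.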
These and other aspects of the present invention will become apparent upon analyzing the drawings, detailed description and claims, which follow.