1. Field
This application relates to controlling the execution of a program.
Highly functional, complex devices in which a controller executes a program to control each unit of the device have become widely available. The controller of such a device reads a program stored in a memory and executes the program to control each unit of the device. Further, as the program executes, the controller reads various data from the memory or writes data into the memory as necessary.
For example, a magnetic disk device, which records information onto or reads information from a magnetic disk in response to a request from a host computer, includes a built-in controller formed of a processor. The controller executes a program stored in the memory to control a spindle motor of the magnetic disk or a motor of a carriage arm having a head, in accordance with a request command sent from the host computer, so as to store data on the magnetic disk or read stored data.
Examples of storage devices that store a program executed by a processor, together with its data, include volatile semiconductor memories such as a static random access memory (SRAM) and a dynamic random access memory (DRAM). Volatile semiconductor memories provide high-speed reading and writing; however, their contents are lost when power is turned off. Thus, in general, a program and data are stored in a slower non-volatile storage device, such as a flash memory or a disk medium, and are loaded into a volatile semiconductor memory at power-on and then executed.
With advances in semiconductor microfabrication technologies, the degree of integration of memory cells has increased; however, bit flips caused by soft errors have become more likely to occur. A soft error is a change in the state of data held in a memory cell (“1” or “0”) caused by radiation, such as cosmic rays or alpha rays, striking a semiconductor memory. Cosmic rays reach the Earth's surface from outer space through the atmosphere. For example, a DRAM holds binary bit information as an amount of electric charge accumulated in a memory cell. Thus, if the energy imparted by such an impact changes the amount of electric charge, the information is lost. As the degree of integration increases, the memory cells shrink, and the number of bits affected by a single impact also increases.
If a bit flip alters a program stored in an SRAM or DRAM, or data that the program requires for its operation, the device may fail to deliver the desired performance and may malfunction or be damaged. For example, when the controller of a magnetic disk device runs out of control, desired information may not be recorded on the magnetic disk. Additionally, correctly recorded data may be corrupted, or the magnetic disk may be damaged to the point that it can no longer be read.
One countermeasure against bit flips in volatile semiconductor memories is to use error correction code (ECC) memories, which have an error correction function. However, ECC memories can correct errors in only a limited number of bits. ECC memories are also costly because they incorporate a complex error correction circuit, and their reading and writing speeds are low because of the access delay introduced by that circuit.
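The limit on the number of correctable bits can be illustrated with a minimal sketch. The Hamming(7,4) code below is chosen purely for illustration; the text does not say which code ECC memories use, and the function names `hamming74_encode` and `hamming74_decode` are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (illustrative code only)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = d     # data bits
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Correct at most one flipped bit, then return the 4 data bits as an integer."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions holding a 1
    if syndrome:
        code[syndrome] ^= 1                    # a non-zero syndrome names the bad position
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)


if __name__ == "__main__":
    word = hamming74_encode(0b1011)
    word[2] ^= 1                               # one bit flip: corrected
    print(hamming74_decode(word) == 0b1011)
    word[5] ^= 1                               # a second flip: silently miscorrected
    print(hamming74_decode(word) == 0b1011)
```

With this code a single flipped bit per codeword is repaired, whereas two flipped bits in the same codeword are decoded to the wrong value without any indication, which is the limitation described above.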
Another countermeasure against bit flips is to duplicate the contents of a parity-enabled memory space. That is, a parity-enabled memory is used as a volatile memory, and a program and data are written into two locations of one parity-enabled memory or into two parity-enabled memories. If a parity error is detected when the program is read from one of the memories, the program can be read from the other memory. However, duplicating the memory contents reduces the memory use efficiency, resulting in an increase in cost.
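The duplicated-memory approach can be sketched as follows. This is a minimal model under stated assumptions: one even-parity bit per byte, and the hypothetical names `ParityMemory` and `read_with_fallback`, none of which come from the text.

```python
def parity_bit(byte):
    """Return 1 if the byte contains an odd number of 1 bits, else 0."""
    return bin(byte & 0xFF).count("1") & 1


class ParityMemory:
    """Hypothetical model of a parity-enabled memory: one parity bit per byte."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.parity = bytearray(size)

    def write(self, addr, value):
        self.data[addr] = value & 0xFF
        self.parity[addr] = parity_bit(value)

    def read(self, addr):
        value = self.data[addr]
        if parity_bit(value) != self.parity[addr]:
            raise ValueError("parity error at address %d" % addr)
        return value


def read_with_fallback(primary, secondary, addr):
    """Read from the primary copy; on a parity error, read the secondary copy."""
    try:
        return primary.read(addr)
    except ValueError:
        return secondary.read(addr)


if __name__ == "__main__":
    a, b = ParityMemory(4), ParityMemory(4)
    for mem in (a, b):
        mem.write(0, 0x5A)                 # the same contents go to both copies
    a.data[0] ^= 0x04                      # simulate a bit flip in the first copy
    print(hex(read_with_fallback(a, b, 0)))
```

The sketch also makes the cost visible: every byte is stored twice, so only half of the installed memory holds distinct contents.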
Accordingly, techniques have been proposed that only detect a bit flip, without duplicating memory contents and without performing error correction. For example, a technique has been proposed in which the parity or checksum of a program or data is checked for each predetermined number of blocks. A block in which an abnormal state is detected is overwritten with the original stored in a read-only memory (ROM) or flash memory (see, for example, Japanese Unexamined Patent Application Publication Nos. 2000-132461, 2005-208958, and 2006-72461).
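A check-and-overwrite technique of this kind might look like the sketch below. The 16-byte block size, the 8-bit additive checksum, and the names `load` and `scrub` are assumptions made for illustration and are not taken from the cited publications.

```python
BLOCK_SIZE = 16  # assumed block size in bytes


def checksum(block):
    """Simple 8-bit additive checksum (an assumed, illustrative choice)."""
    return sum(block) & 0xFF


def load(rom):
    """Copy the ROM image into RAM and record one checksum per block."""
    ram = bytearray(rom)
    sums = [checksum(rom[i:i + BLOCK_SIZE]) for i in range(0, len(rom), BLOCK_SIZE)]
    return ram, sums


def scrub(ram, sums, rom):
    """Check every block; overwrite any block whose checksum no longer matches."""
    repaired = []
    for index, expected in enumerate(sums):
        start = index * BLOCK_SIZE
        end = start + BLOCK_SIZE
        if checksum(ram[start:end]) != expected:
            ram[start:end] = rom[start:end]   # restore the original from ROM
            repaired.append(index)
    return repaired


if __name__ == "__main__":
    rom = bytes(range(48))
    ram, sums = load(rom)
    ram[20] ^= 0x01                           # simulate a bit flip in block 1
    print(scrub(ram, sums, rom))              # indices of the restored blocks
```

The sketch only repairs the stored copy. It does nothing about a processor that has already fetched the corrupted block before the check ran.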
With this technique, however, if a program or data has already been used by the processor before a parity or checksum error is detected, merely overwriting the program is not sufficient to recover the processor from the abnormal operation state, and that state may lead to another abnormal operation state. On the other hand, if the device itself is reset every time an error in the stored contents is detected, the overall operation speed is reduced and the device may fail to respond within a required period of time.