1. Field of the Invention
The invention relates to data validation. Specifically, the invention relates to apparatus, systems, and methods for managing data for optimized data error recovery for errors in prefetched data.
2. Description of the Related Art
Computer data is frequently stored in primary memory to provide high speed access to the data for other computer devices such as secondary storage controllers, Central Processing Units (CPUs), and the like. Such data is often critical to the proper operation of various computer applications. Data that includes any errors may be unacceptable. Mission critical operations such as financial or business transactions may be aborted if a computer system cannot recover from such errors. Consequently, the computer system must provide adequate assurances that the data retrieved from primary memory is exactly the same as the data that was originally stored in the primary memory.
Referring now to FIG. 1, illustrated therein is a representative system 100 that prefetches data. Specifically, the system 100 prefetches data from primary memory. The system 100 includes a CPU 102 that communicates with a plurality of peripheral devices or modules by way of a host bridge 104 and controllers 106a-c coupled to a communication bus 108. The host bridge 104 allows the CPU 102 to operate with a different clock speed than the communication bus 108, and still operably communicate with peripheral devices.
The controllers 106a-c provide a common interface between the CPU 102 and the peripheral devices. The controllers 106a-c include one or more communication modules configured to communicate over the communication bus 108 using well known protocols such as the Peripheral Component Interconnect (PCI) protocol. Alternatively, the communication bus 108 may implement other communication protocols and associated interfaces including Video Electronics Standards Association (VESA), Industry Standard Architecture (ISA), or the like.
The CPU 102 may communicate with a peripheral device through a controller 106a-c. Alternatively, one peripheral device may communicate with another peripheral device through the CPU 102, or two peripheral devices may communicate directly with each other. For example, in Direct Memory Access (DMA), a peripheral device such as a hard drive coupled to a disk controller 106b may exchange data with a memory array 110 through a memory controller 106a without using the CPU 102.
The memory array 110 generally comprises one or more separate chips in communication with the memory controller 106a. Primary memory technology provides Random Access Memory (RAM) for high speed access to data stored at any specific address within the memory array 110. Data is stored in the memory array 110 in row-column format. A variety of memory technologies may be used to implement the memory array 110 and memory controller 106a. Well known memory types include Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Fast Page Mode Dynamic Random Access Memory (FPM DRAM), Extended Data-Out Dynamic Random Access Memory (EDO DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Rambus Dynamic Random Access Memory (RDRAM), and the like.
Typically, an individual memory cell stores a data bit using a capacitor. The capacitors are periodically refreshed to compensate for leaking electrical charge. Unfortunately, this technology may occasionally result in a memory cell changing the value originally written to the memory cell (one to zero or zero to one). Other factors may also cause memory cell errors. Modern memory chips are susceptible to errors, also known as memory faults. Conventionally, memory errors are infrequent; however, for mission critical data, even an infrequent error may prove to be very costly.
To compensate for errors introduced into data stored in primary memory, memory controllers 106a include one or more error validation modules and error correction modules. For example, a memory controller 106a typically includes an Error Correction Code (ECC) module. The ECC module detects and corrects single bit errors found in data stored at a specific address in the memory array 110. If two or more bits are in error, the ECC module can detect such a condition; however, the ECC module is unable to correct the error because of the ambiguity the multiple bit error produces.
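By way of illustration only, the single-error-correcting, double-error-detecting behavior described above may be sketched with an extended Hamming code over four data bits. The code width, the function names `encode` and `decode`, and the status strings are assumptions made for this sketch; the description does not specify which code the ECC module uses, and practical ECC modules protect much wider words.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword (list of bits).

    Index 0 holds an overall parity bit; indices 1..7 form a Hamming(7,4)
    codeword with parity bits at positions 1, 2, and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of ones even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    overall = sum(code) % 2  # 0 when the overall parity still holds

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # An odd number of flips is treated as a single-bit error. The syndrome
        # names the flipped position (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with consistent overall parity: two bits flipped.
        # The error is detected, but the flipped positions are ambiguous.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

In this sketch a two-bit error yields a nonzero syndrome while the overall parity remains consistent, which is the ambiguity that prevents correction.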
Consequently, if a conventional ECC module or other data validation and correction module identifies a multi-bit error in the data, an interrupt signal 112 is immediately generated. The interrupt signal 112 notifies a device or module that requested the data (the “requester”) that an uncorrectable error has occurred. Alternatively, as depicted in FIG. 1, the interrupt signal 112 may be sent to a CPU 102 or other controller for the system.
As used herein, “requester” means a device or module that makes a request for data from another device or module of the system 100 such as another controller 106b-c or a CPU 102. As used herein, an “uncorrectable error” means an error in a packet of data stored in the memory array 110 wherein more than one bit differs from the value originally stored for that bit in the memory array 110. Further, an uncorrectable error may be an error that cannot be corrected by the memory controller 106a.
Once the memory controller 106a signals an uncorrectable error, the requester, typically the CPU 102, initiates an error recovery process. The error recovery process may include a variety of techniques that attempt to correct the data in the data packet with the uncorrectable error. Typically, the recovery techniques progressively increase in the amount of delay caused by the disruption. Each recovery technique is attempted in turn until the data is recovered. Examples of some recovery techniques may include a subsequent attempt to retrieve the data from the memory array 110, a comparison of the data packet with the uncorrectable error to other accurate data, or the like. Eventually, a data recovery process may involve retrieving a backup copy of the data having an uncorrectable error from a secondary storage location such as a disk drive, tape drive, or the like.
Data recovery interrupts the flow and timing of an operation or transaction. The overhead (processing time, media mounts, and data transfers) required to perform data recovery may cause significant delays. In certain cases, data recovery may not be possible and may require that a particular request for data be aborted.
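The escalating recovery process described above may be sketched as follows, for illustration only. The function `recover`, the three stand-in techniques, and their return values are hypothetical names introduced for this sketch and are not taken from any particular system; each stand-in returns `None` to indicate that the technique failed.

```python
def recover(address, techniques):
    """Try each technique in order of increasing cost.

    Returns (data, attempted) on success. Raises RuntimeError when every
    technique fails, which models aborting the request for data.
    """
    attempted = []
    for name, technique in techniques:
        attempted.append(name)
        data = technique(address)
        if data is not None:
            return data, attempted
    raise RuntimeError(f"request for address {address:#x} aborted: data unrecoverable")


# Hypothetical stand-ins for the techniques named in the description.
def reread_memory(address):
    return None  # the second read still contains the error


def compare_with_accurate_data(address):
    return None  # no accurate copy is available for comparison


def restore_from_secondary_storage(address):
    return b"backup copy"  # slowest option, but it succeeds here


TECHNIQUES = [
    ("reread memory", reread_memory),
    ("compare with accurate data", compare_with_accurate_data),
    ("restore from secondary storage", restore_from_secondary_storage),
]

data, attempted = recover(0x1F40, TECHNIQUES)
print(data, attempted)
```

Ordering the list from least to most disruptive reflects the progressive increase in delay; the cost of each technique is not modeled in this sketch.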
Generally, the communication protocol for the communication bus 108 allows a requester such as the peripheral device or CPU 102 to retrieve data from the memory array 110 using an open-ended request. In an open-ended request, the peripheral device or CPU 102 provides a starting address within the memory array 110 and the memory controller 106a continuously sends sequential data packets beginning at the starting address until the data requester signals the memory controller 106a to stop. In this manner, data may be rapidly transferred, because time is saved by not communicating an ending address.
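An open-ended request may be modeled, purely as a sketch, with a generator that keeps producing sequential packets until the consumer stops iterating. The names `open_ended_read` and `requester`, and the representation of the memory array as a Python list, are assumptions made for this illustration and do not correspond to any bus protocol signal.

```python
def open_ended_read(memory, start_address):
    """Yield (address, packet) pairs sequentially from start_address.

    No ending address is supplied; the stream ends only when the consumer
    stops iterating or the modeled memory array is exhausted.
    """
    address = start_address
    while address < len(memory):
        yield address, memory[address]
        address += 1


def requester(memory, start_address, wanted):
    """Consume packets until enough have arrived, then signal stop."""
    received = []
    for _address, packet in open_ended_read(memory, start_address):
        received.append(packet)
        if len(received) == wanted:
            break  # models the requester signaling the controller to stop
    return received


memory = list(range(100, 116))  # sixteen packets with placeholder contents
print(requester(memory, 4, 3))
```

Only the starting address crosses from requester to controller in this model; the amount of data needed is known to the requester alone.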
Typically, the memory controller 106a retrieves data from the memory array 110 faster than the data can be transmitted over the communication bus 108 to the requester. Consequently, the memory controller 106a prefetches data from the memory array 110 into an internal First-In-First-Out (FIFO) buffer, discussed in greater detail below. As data is stored in the FIFO buffer, the memory controller 106a validates and corrects any correctable errors in the data. If an uncorrectable error is discovered, the memory controller 106a sends an interrupt signal 112 to the CPU 102.
Typically, not all of the prefetched data in the FIFO buffer is transferred to the requester. The memory controller 106a prefetches more data than the requester desires because the ending address in the memory array 110 is undefined. In addition, the memory controller 106a prefetches data faster than the requester can receive the data. When the requester signals the memory controller 106a to stop sending data, the FIFO buffer may be about fifty percent or more filled with prefetched data that will not be used and is discarded. The prefetched data allows for higher data transfer rates between the memory controller 106a and the requester.
Conventional memory controllers 106a validate and correct prefetched data as the data is stored in the FIFO buffer. If data that includes an uncorrectable error is stored in the FIFO buffer, a conventional memory controller 106a immediately initiates a data recovery process. In this manner, conventional memory controllers 106a ensure that accurate data is stored in the FIFO buffer. Unfortunately, the prefetched data in the FIFO buffer is typically not all transmitted to a requester. Consequently, certain prefetched data that is never transferred to the requester may include uncorrectable errors that needlessly initiate error recovery. The additional processing overhead required to conduct error recovery for unused prefetched data slows the system 100.
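The drawback of validating prefetched data as it enters the FIFO buffer may be seen in the following simplified simulation, offered for illustration only. The function `conventional_transfer`, its parameters, and the chosen FIFO depth are assumptions made for this sketch; packets are represented by their addresses, and timing is reduced to one refill of the buffer per delivered packet.

```python
from collections import deque


def conventional_transfer(memory_size, start, wanted, fifo_depth, bad_addresses):
    """Simulate a controller that validates each packet when it is prefetched.

    Returns (delivered, recoveries, discarded), each a list of addresses.
    """
    fifo = deque()
    delivered = []
    recoveries = []
    next_address = start

    while len(delivered) < wanted:
        # Prefetch ahead of the requester until the FIFO buffer is full.
        while len(fifo) < fifo_depth and next_address < memory_size:
            if next_address in bad_addresses:
                # Conventional behavior: recovery starts immediately, before
                # it is known whether the packet will ever be transferred.
                recoveries.append(next_address)
            fifo.append(next_address)
            next_address += 1
        if not fifo:
            break  # modeled memory array exhausted
        delivered.append(fifo.popleft())

    discarded = list(fifo)  # prefetched, never sent to the requester
    return delivered, recoveries, discarded


delivered, recoveries, discarded = conventional_transfer(
    memory_size=32, start=0, wanted=4, fifo_depth=8, bad_addresses={6}
)
needless = [a for a in recoveries if a not in delivered]
print(delivered, recoveries, discarded, needless)
```

In this run the requester stops after four packets, yet the packet at address 6 has already triggered error recovery even though it is among the discarded data.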
Accordingly, what is needed is an improved apparatus, system, and method that overcome the problems and disadvantages of conventional systems 100. In particular, the improved apparatus, system, and method should identify prefetched data that contains an uncorrectable error and initiate error recovery only for prefetched data that is actually used by a requesting device, module, or application. Furthermore, the apparatus, system, and method should identify the address in a memory array for data that contains an uncorrectable error to facilitate data recovery. Such an apparatus, system, and method are provided herein.