The present invention relates to data storage in data processing devices. More particularly, the present invention relates to methods and apparatus useful in data storage to detect and isolate data corruption due to buffer overwrites.
There is a large body of previous work regarding buffer and memory management and techniques useful for ensuring memory integrity and detecting errors.
Cyclical Redundancy Checking (CRC), checksums, storage of “magic numbers” in a data set, etc., are all known methods for attempting to detect memory corruption.
One situation in which memory corruption can occur is out-of-bounds memory overwrite. In many programming or operating system environments, it can be difficult to ensure that all machine-executed memory writes only write to memory allocated to the particular process or subroutine doing the writing. Consider a case where a physical memory is divided up into n separate buffers, B1 through Bn. Oftentimes, some or all of these buffers may be stored essentially contiguously (though not necessarily in order), one after the other, in the memory space.
In some logic system environments, it is difficult to ensure that every memory write to a buffer neither exceeds the buffer size nor falls outside a particular address range. In these systems, a write to buffer B5 may cross the buffer boundary and overwrite data in an adjacent buffer (such as B3). Oftentimes, this erroneous overwrite may not be detected until a subsequent attempt to access data in the overwritten buffer. At that point, it is difficult to determine which process caused the erroneous overwrite, and even attempting to do so may require a complex, and possibly unsuccessful, re-creation of the error condition in a lab, followed by debugging.
The present invention may be understood in the context of a buffer or memory segment system in a memory storage (such as a RAM) or recording device (such as a disk drive). While the invention is discussed in terms of buffers, the invention has applications to similar memory structures, however named.
According to specific embodiments, the invention involves one or more buffers or segments having at least three parts: an initial area (which usually contains traditional link and system information, and which may be referred to as a first header), a middle area that contains the data part of the buffer and possibly other buffer structures, and a final area that contains essentially the same information as the initial area (which may be referred to as a second or redundant header).
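The three-part layout described above can be sketched as follows. This is only an illustrative model, not the implementation of any embodiment: a Python `bytearray` stands in for physical memory, and the header fields (`link`, `owner`, `data_size`), their widths, and the helper name `init_buffer` are assumptions introduced for the example.

```python
import struct

# Hypothetical header layout (an assumption, not taken from the source):
# next-buffer link, owner id, data length, each a little-endian uint32.
HEADER = struct.Struct("<III")

def init_buffer(memory, offset, data_size, link, owner):
    """Lay out one buffer: first header, data area, second (redundant) header."""
    header = HEADER.pack(link, owner, data_size)
    data_start = offset + HEADER.size
    tail_start = data_start + data_size
    memory[offset:data_start] = header                     # first header
    memory[data_start:tail_start] = bytes(data_size)       # zeroed data area
    memory[tail_start:tail_start + HEADER.size] = header   # redundant header
    return tail_start + HEADER.size  # offset at which the next buffer may start

memory = bytearray(64)  # stand-in for a region of RAM
next_offset = init_buffer(memory, 0, data_size=16, link=40, owner=7)
```

In this sketch the two headers are byte-for-byte copies, so any later difference between them signals that something wrote past the end of the data area.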
According to the invention, after the data part of the buffer is written to, the buffer is checked by comparing relevant parts of the final portion to the first portion. If this compare indicates that the final portion is different from the first portion, an overwrite error may have occurred. If the overwrite data amount was longer than the second header, one or more buffers subsequent in the address space may have been erroneously overwritten. The overwrite error is thus both detected and isolated during the operation that caused the error.
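The compare step can be illustrated with a small simulation. The sketch below assumes an 8-byte header and uses hypothetical helper names (`make_buffer`, `unchecked_write`, `headers_match`); the unchecked write plays the role of a legacy routine that does not verify its length.

```python
HEADER_SIZE = 8  # assumed header length in bytes

def make_buffer(memory, offset, data_size, header):
    """Write the first header and the redundant header around the data area."""
    data_start = offset + HEADER_SIZE
    tail_start = data_start + data_size
    memory[offset:data_start] = header
    memory[tail_start:tail_start + HEADER_SIZE] = header
    return data_start, tail_start

def unchecked_write(memory, data_start, payload):
    """Simulate a legacy write that does not bounds-check its length."""
    memory[data_start:data_start + len(payload)] = payload

def headers_match(memory, offset, tail_start):
    """Compare the final portion of the buffer against the first portion."""
    first = memory[offset:offset + HEADER_SIZE]
    second = memory[tail_start:tail_start + HEADER_SIZE]
    return first == second

memory = bytearray(64)
data_start, tail_start = make_buffer(memory, 0, 16, b"LINK0001")

unchecked_write(memory, data_start, b"A" * 16)  # exactly fills the data area
fits_ok = headers_match(memory, 0, tail_start)

unchecked_write(memory, data_start, b"B" * 20)  # 4 bytes too long
overrun_ok = headers_match(memory, 0, tail_start)
```

Because the check runs immediately after the write, a mismatch points at the write that just completed rather than at whichever later operation happens to read the damaged buffer.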
This compare according to the invention is especially important for legacy algorithms that use arithmetic pointer calculations in a loop of some kind. In various embodiments, how frequently a compare is performed can be varied in different system designs.
In a further embodiment of the invention, a buffer can be repaired. This can particularly be done when a buffer is requested from a queue manager. In such a case, a validation (such as CRC, a checksum, or any other type of data integrity validation) is performed on the first and/or last parts of the buffer, and the section validated is then copied into the corrupted section. While this repair will not correct all overwrite errors, many simpler errors can be corrected and thereby the mean time between failures (MTBF) desirably will increase.
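One way the repair could work is sketched below, under the assumption that each header carries a CRC-32 of its own fields so that either copy can be validated independently. The field layout and the names `pack_header`, `header_valid`, and `repair` are hypothetical, and CRC-32 is just one of the validation methods the text permits.

```python
import struct
import zlib

FIELDS = struct.Struct("<II")   # assumed fields: link, data length
HEADER_SIZE = FIELDS.size + 4   # fields followed by a CRC-32 of those fields

def pack_header(link, length):
    """Build a header whose last 4 bytes are a CRC-32 of its fields."""
    fields = FIELDS.pack(link, length)
    return fields + struct.pack("<I", zlib.crc32(fields))

def header_valid(raw):
    """Return True if the stored CRC-32 matches the header fields."""
    fields, stored = raw[:FIELDS.size], raw[FIELDS.size:]
    return struct.pack("<I", zlib.crc32(fields)) == bytes(stored)

def repair(memory, first_off, second_off):
    """Copy whichever header validates over the one that does not.

    Returns 'ok', 'repaired', or 'unrecoverable'.
    """
    first = memory[first_off:first_off + HEADER_SIZE]
    second = memory[second_off:second_off + HEADER_SIZE]
    if first == second and header_valid(first):
        return "ok"
    if header_valid(first):
        memory[second_off:second_off + HEADER_SIZE] = first
        return "repaired"
    if header_valid(second):
        memory[first_off:first_off + HEADER_SIZE] = second
        return "repaired"
    return "unrecoverable"
```

A queue manager could call such a routine when a buffer is requested, handing out the buffer only when the result is not 'unrecoverable'. The sketch restores the headers only; data lost in the data area itself is not recovered.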
In a specific embodiment, the first portions and final portions of buffers according to the invention normally will be identical unless an overwrite error has occurred. This will lead to a simpler design. However, it will be understood by practitioners in the art from the teachings herein that what is important is that a compare of the first and last portions should indicate if an overwrite has occurred. Thus data may be encoded or formatted somewhat differently in each portion, as long as the essential data is present or can be derived to perform a compare and/or a repair.
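As an illustration of a non-identical encoding, the sketch below stores the second header as the bitwise complement of the first. This particular encoding is an assumption chosen for the example, not one named in the text; any encoding from which the first header can be derived would serve the same purpose.

```python
def encode_second(first):
    """Hypothetical encoding: the second header is the bitwise complement."""
    return bytes(b ^ 0xFF for b in first)

def decode_second(second):
    """Recover the first-header form from the encoded second header."""
    return bytes(b ^ 0xFF for b in second)

def overwrite_detected(first, second):
    """Report a mismatch between the first header and the decoded second header."""
    return decode_second(second) != bytes(first)

first = b"LINK0001"
second = encode_second(first)
clean = overwrite_detected(first, second)              # headers agree
damaged = overwrite_detected(first, b"XXXX" + second[4:])  # tail partly overwritten
```

Since the first header can be derived from the second with `decode_second`, the same encoding supports both the compare and a repair.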
While the invention has particular applications in the field of legacy real-time embedded software systems, it will be understood by those of skill in the art, using the teachings provided herein, that the methods and/or apparatus according to the present invention can be advantageously used in other data storage situations. It also will be understood from the teachings herein that the invention can be adapted to many different memory systems and buffer structures, including structures where segments are, for example, not absolutely contiguous.
The invention will be better understood with reference to the following drawings and detailed descriptions. In different figures, similarly numbered items are intended to represent similar functions within the scope of the teachings provided herein. In some of the drawings and detailed descriptions below, the present invention is described in terms of the important independent embodiment of a multimedia message system. This should not necessarily be taken to limit the invention, which, using the teachings provided herein, can be applied to other data accessing situations.