This invention relates to the detection of invalid data elements stored in an array. More specifically, this invention pertains to a self-auditing protection method for a sorted array in which corrupt and/or duplicate data elements are detected and deleted during traversal of the sorted array.
In order to maximize efficiency and minimize delays associated with information retrieval, data elements within an array are typically "sorted" or arranged in numerical or alphabetical order. Many computer applications and operating systems include a built-in sorting program such as Quicksort, published in 1962 by C.A.R. Hoare, or computer sorting routines such as: (1) bubble sort; (2) insertion sort; (3) merge sort; (4) selection sort; and (5) shell sort. However, if the sorting process is prematurely terminated, before each data element has been moved to its appropriate array position, data elements may be duplicated or corrupted.
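The failure mode described above can be illustrated with a minimal sketch. It assumes an insertion sort that shifts elements to the right while holding the element being inserted in a temporary variable. The function name `insertion_sort_interrupted` and the `max_writes` cut-off are hypothetical and serve only to simulate premature termination.

```python
def insertion_sort_interrupted(data, max_writes):
    """Insertion sort that stops after `max_writes` array writes.

    `max_writes` is a hypothetical stand-in for a premature termination
    (for example, a process restart in the middle of the sort).
    """
    a = list(data)
    writes = 0
    for i in range(1, len(a)):
        key = a[i]          # element being inserted lives only in `key`
        j = i - 1
        while j >= 0 and a[j] > key:
            if writes == max_writes:
                return a    # terminated mid-shift: `key` is lost
            a[j + 1] = a[j] # shift right; a[j] now appears twice
            writes += 1
            j -= 1
        if writes == max_writes:
            return a        # terminated before `key` was written back
        a[j + 1] = key
        writes += 1
    return a


print(insertion_sort_interrupted([3, 5, 1], 100))  # [1, 3, 5]
print(insertion_sort_interrupted([3, 5, 1], 2))    # [3, 5, 5]
```

In the interrupted run, the value 5 is duplicated and the value 1 is lost, because the element being moved had not yet been written to its final position.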
There are two existing techniques in the prior art which attempt to solve the above-identified problem. The first technique uses cyclic redundancy (check) code ("CRC") in conjunction with data elements in an array. In short, CRC auditing involves the calculation of a CRC value based upon a formula which uses the data contained in each array element as variables. The result of that calculation, the CRC value, is then appended onto the end of each data element. Various calculation methods can be used to generate the CRC value; however, the CRC calculation method is generally chosen to optimize error detection capability. In order to check for an error in the data element, the calculation is performed again and the result is compared with the stored CRC value. If the two values are different, an error is detected.
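The CRC auditing steps described above can be sketched as follows. This is an illustration only, assuming CRC-32 as provided by Python's standard `zlib` module and a 4-byte big-endian trailer; the prior-art technique does not prescribe a particular CRC algorithm or layout. The helper names `append_crc` and `audit_element` are hypothetical.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Return the payload with its CRC-32 value appended (4 bytes, big-endian)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def audit_element(element: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored value."""
    if len(element) < 4:
        return False  # too short to contain a CRC trailer
    payload, stored = element[:-4], element[-4:]
    return zlib.crc32(payload) == int.from_bytes(stored, "big")


# Build a small array of protected data elements.
array = [append_crc(b"alpha"), append_crc(b"bravo")]

# Simulate corruption by flipping one bit in the first element's payload.
corrupted = bytes([array[0][0] ^ 0x01]) + array[0][1:]

print(audit_element(array[0]))   # True
print(audit_element(corrupted))  # False
```

Each audit recomputes the CRC over the entire payload, so the cost of the check grows with the size of the data element.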
For large data elements, generating the CRC value is time-consuming, and for small data elements, the CRC algorithm is susceptible to erroneous detection of duplicates. Further, either of these problems can be aggravated depending on the selected CRC algorithm.
Another technique involves the use of doubly-linked lists. Doubly-linked lists provide a way to manage data that is not stored sequentially in a computer. In essence, each item in a doubly-linked list contains three parts: the data itself, a first number (i.e., a pointer) which identifies the location of the previous item on the list, and a second number (i.e., a pointer) which identifies the location of the next item on the list. However, the use of linked lists requires a relatively complex resource allocation scheme and a complex auditing mechanism. In addition, the recovery of broken linkages (i.e., when a pointer does not properly identify the location of the next or previous item on the list) is time-consuming and requires complex data analysis to recover all possible breakages and resource losses. Furthermore, the process which is traversing the doubly-linked list cannot proceed if errors are encountered.
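The three-part item and the broken-linkage condition described above can be sketched as follows. The sketch assumes that items live in a pool indexed by integers, with `None` marking the ends of the list. The names `Node` and `audit_links` are hypothetical, and the audit only detects a broken linkage; it does not attempt the recovery that the text describes as complex.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    data: str
    prev: Optional[int]  # pool index of the previous item, None at the head
    next: Optional[int]  # pool index of the next item, None at the tail


def audit_links(pool: List[Node], head: int) -> bool:
    """Walk forward from `head`; return False at the first broken linkage."""
    seen = set()
    prev_index: Optional[int] = None
    current: Optional[int] = head
    while current is not None:
        if not (0 <= current < len(pool)) or current in seen:
            return False  # pointer leaves the pool or loops back
        node = pool[current]
        if node.prev != prev_index:
            return False  # back pointer disagrees with the path taken
        seen.add(current)
        prev_index, current = current, node.next
    return True


# Logical order a -> b -> c, stored non-sequentially in the pool.
pool = [Node("b", 2, 1), Node("c", 0, None), Node("a", None, 0)]
print(audit_links(pool, head=2))  # True

pool[0].next = 5                  # simulate a corrupted forward pointer
print(audit_links(pool, head=2))  # False
```

Once the forward pointer of "b" is corrupted, the traversal has no way to reach "c" from the head, which illustrates why a process walking the list cannot proceed past an error.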
In many data-intensive applications, sorting techniques requiring the use of CRCs or doubly-linked lists are unacceptable. A new system for detecting corrupt data elements stored in a computer as an array of data elements is necessary.