Several known predictors of hard disk electromechanical failure include reallocated sectors, reallocated event counts, current pending sector counts, and medium errors.
A medium error occurs when a hard disk encounters a physical problem while reading data and repeated read attempts fail. Medium errors can be classified as either a “real medium error” or a “head failure”. A real medium error indicates a marginal disk platter or a loss of proper magnetic properties, whereas a head failure occurs when the read/write head has deteriorated. Conditions that may cause such errors include external contaminants (e.g., dust) physically damaging the disk head, imprecision in the physical write location, and improper alignment. A sudden power failure may also cause a medium error, but such an error is usually confined to a single sector. Most medium errors result from either a head failure or a defect in the magnetic medium.
Although medium errors have been studied as predictors of disk failure, prior approaches have used only a single medium error count as a predictor, compared against a single threshold rather than drawing on the aggregate history of the data. Traditionally, this count has been either the initial non-zero medium error count (NMEC) or an NMEC reaching a particular threshold. Relying on one count, such as the NMEC, and a single threshold is therefore an incomplete way to predict disk failures and offers limited predictive accuracy. Accordingly, what is needed to overcome these shortcomings is a method that uses a conditional Markov chain to model the evolution of medium errors up to the death of the disk, so that disk failures can be monitored and predicted more accurately.
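To make the contrast between a single-threshold rule and a Markov-chain model of medium-error evolution concrete, a minimal Python sketch follows. It is not the conditional Markov chain described here, and it does not reproduce that method's conditioning. It is a plain first-order Markov chain over hypothetical states obtained by binning cumulative medium-error counts, plus an absorbing failed state, with transition probabilities estimated by counting observed transitions. The bin edges, state labels, function names (`count_to_state`, `single_threshold_alarm`, `estimate_transitions`, `failure_probability`), and the toy histories are all illustrative assumptions.

```python
from collections import defaultdict

FAILED = "FAILED"            # absorbing state: the disk has died
BIN_EDGES = (0, 1, 10, 100)  # hypothetical lower bounds of count bins


def count_to_state(count):
    """Map a cumulative medium-error count (>= 0) to a coarse state label."""
    label = f">={BIN_EDGES[0]}"
    for lower in BIN_EDGES:
        if count >= lower:
            label = f">={lower}"
    return label


def single_threshold_alarm(nmec, threshold=1):
    """Traditional rule: flag a disk once its NMEC reaches one fixed threshold."""
    return nmec >= threshold


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities by counting transitions.

    Each sequence lists one disk's states at successive sampling intervals and
    ends in FAILED if the disk died while under observation.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    probs[FAILED] = {FAILED: 1.0}  # once failed, always failed
    return probs


def failure_probability(probs, start, steps):
    """Probability that a disk starting in `start` has failed after `steps` intervals."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for state, p in dist.items():
            # Assumption: a state never observed as a source stays where it is.
            for dst, q in probs.get(state, {state: 1.0}).items():
                nxt[dst] += p * q
        dist = dict(nxt)
    return dist.get(FAILED, 0.0)


if __name__ == "__main__":
    # Made-up per-disk histories of cumulative medium-error counts, for
    # illustration only; True means the disk failed after its last sample.
    histories = [([0, 0, 3, 12], True), ([0, 0, 0, 0], False), ([0, 2, 2, 2], False)]
    sequences = []
    for counts, failed in histories:
        seq = [count_to_state(c) for c in counts]
        if failed:
            seq.append(FAILED)
        sequences.append(seq)
    probs = estimate_transitions(sequences)
    for nmec in (0, 3, 12):
        state = count_to_state(nmec)
        print(f"NMEC={nmec}: threshold alarm={single_threshold_alarm(nmec)}, "
              f"P(fail within 3 intervals)={failure_probability(probs, state, 3):.3f}")
```

On this toy data, the single-threshold rule raises the same alarm for an NMEC of 3 and of 12, whereas the chain assigns the two states different failure probabilities over the same horizon, because it uses how counts have historically evolved toward failure rather than one cutoff.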