Accurately predicting the reliability and remaining life of complex components is essential to designing, building, and fielding highly reliable computer systems. Sockets, interposers, Vertical Cavity Surface-Emitting Laser (VCSEL) arrays, and other complex components each consist of many similar or substantially physically identical sub-components, all or most of which must work consistently for the system in which they are installed to work. Although substantially identical physically, the sub-components may in some instances perform different functions and be subjected to different operating conditions (e.g., AC voltage for signal, DC voltage for power). Because the sub-components are exposed to differing operating conditions, their reliability may also be expected to differ.
The traditional approach to the reliability analysis of complex components assumes that the sub-components have independent and identically distributed failures. After accelerated tests are performed, the results are fitted to a statistical distribution, from which distribution parameters, confidence intervals, and trends are extrapolated. Only the time to first sub-component failure is considered. As a result, current methods for assessing the reliability and remaining life of complex components have been inadequate. Specifically, prior methods analyze the complex component as a whole, estimating its reliability from accelerated tests that record the time to first failure and using component-level reliability analysis tools such as Weibull and Lognormal distribution plotting. In prior methods, a failure is defined as an instance in which a critical sub-component is outside a predetermined range for acceptable operation. Such an approach requires larger sample sizes and longer test times than would otherwise be necessary. By ignoring information ascertainable from a system view of the complex component, it can overlook useful information about actual field reliability and may lead to incorrect assessments of reliability and reliability trends. The ignored information may include intermittent and recovery behavior, multiple sub-component failures, and trends, which in combination may provide data about field reliability and remaining life.
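The independence and identical-distribution assumption described above can be made concrete with a short worked relation. The following is an illustrative sketch only: the symbols N (number of sub-components), R_sub (sub-component survival function), beta (Weibull shape), and eta (Weibull scale) are introduced here for illustration and do not appear in the source.

```latex
\begin{align}
T_{(1)} &= \min(T_1, \dots, T_N), \\
R_{(1)}(t) &= \Pr\!\left[T_{(1)} > t\right]
            = \prod_{i=1}^{N} \Pr\!\left[T_i > t\right]
            = \left[R_{\text{sub}}(t)\right]^{N}, \\
\intertext{and, if each sub-component is assumed Weibull with
$R_{\text{sub}}(t) = \exp\!\left[-(t/\eta)^{\beta}\right]$,}
R_{(1)}(t) &= \exp\!\left[-N\,(t/\eta)^{\beta}\right]
            = \exp\!\left[-\left(\frac{t}{\eta\,N^{-1/\beta}}\right)^{\beta}\right].
\end{align}
```

Under these assumptions the time to first failure is again Weibull, with the same shape and a scale reduced by the factor N^(-1/beta). If the sub-components instead see different operating conditions, the product does not collapse to a single power of one survival function, which is the kind of variation a first-failure analysis cannot represent.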
For example, in a prior approach to estimating the reliability of a complex component, accelerated tests were performed, and the time to first sub-component failure was recorded for each complex component tested. The resulting data were plotted using Weibull or Lognormal distributions, and the distribution parameters were estimated by least squares, Maximum Likelihood Estimation (MLE), or other methods. Any subsequent failures, changes, intermittent behavior, and/or degradation of the first failed sub-component and of the other sub-components were ignored. Because this prior approach ignores key information, and because sub-components can exhibit time-dependent and time-varying behavior, the potential advantages of competing technologies are not accurately determined.
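The prior approach described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: it assumes a two-parameter Weibull model, complete (uncensored) first-failure times, and Benard's median-rank approximation for the plotting positions. The function name `fit_weibull_least_squares` and the sample times are hypothetical.

```python
import math


def fit_weibull_least_squares(first_failure_times):
    """Estimate Weibull shape (beta) and scale (eta) by least squares.

    Uses the linearized Weibull CDF:
        ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    Only one time per tested component (its first sub-component failure)
    is used; all later behavior is ignored, as in the prior approach.
    """
    times = sorted(first_failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failure times to fit a line")

    # Benard's approximation to the median rank of the i-th ordered failure
    # (an assumed choice of plotting position).
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

    x = [math.log(t) for t in times]
    y = [math.log(-math.log(1.0 - f)) for f in ranks]

    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    beta = sxy / sxx                    # slope of the fitted line = shape
    intercept = y_mean - beta * x_mean  # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)   # scale (characteristic life)
    return beta, eta


if __name__ == "__main__":
    # Hypothetical first-failure times in hours, one per tested component.
    hours = [410.0, 620.0, 790.0, 980.0, 1250.0]
    shape, scale = fit_weibull_least_squares(hours)
    print(f"shape={shape:.3f}, scale={scale:.1f} h")
```

Note that the input carries a single number per tested component, so intermittent behavior, recovery, and later sub-component failures cannot influence the fit.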
There is therefore a need for a method of accurately establishing a reliability profile for complex components that does not ignore the variations among the individual sub-components and in sub-component function and performance.