This invention relates generally to methods for establishing criteria to automatically recognize hardware performance problems. Various configurations of the present invention are useful for monitoring mechanical devices, including, but not limited to, aircraft engines.
Most complex mechanical devices include sensors at discrete points within the device that provide data to allow confirmation of normal performance and/or recognition of aberrant behavior signaling a need for maintenance. For example, a car may include an odometer, temperature gage, battery gage, and other sensors that can be used to recognize impending failures. (A battery gage, for example, can indicate a failure of an alternator.) One known practice is to generate trends of sensed measurements and to use these trends to recognize the beginning stage of a problem. For example, one can compute the gas mileage obtained for a car at each fill-up and watch the trend of this mileage over time. Generally speaking, a problem is suggested whenever a variable (for example, gas usage per mile) exceeds an identified limit, whenever the same variable begins to trend towards a predefined limit, or whenever the same variable shows a sudden, sustained shift in value.
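As a minimal sketch of the gas mileage example above, the trended value at each fill-up can be computed from successive odometer readings and the fuel added. The function name `mileage_per_fillup`, the units (miles and gallons), and the assumption that every fill-up tops the tank off completely are illustrative choices, not details taken from the text.

```python
from typing import List


def mileage_per_fillup(odometer: List[float], gallons: List[float]) -> List[float]:
    """Return miles per gallon for each fill-up after the first.

    odometer[i] is the odometer reading at fill-up i.
    gallons[i] is the fuel added at fill-up i (assumed to be a full fill,
    so it equals the fuel burned since fill-up i - 1).
    """
    if len(odometer) != len(gallons):
        raise ValueError("odometer and gallons must have the same length")
    trend = []
    for i in range(1, len(odometer)):
        distance = odometer[i] - odometer[i - 1]
        trend.append(distance / gallons[i])
    return trend


# Hypothetical readings: the first fill-up only establishes a baseline.
trend = mileage_per_fillup([1000.0, 1300.0, 1580.0], [0.0, 10.0, 10.0])
```

If a fill-up is only partial, `gallons[i]` no longer equals the fuel burned over the interval, so the computed point is distorted even though the underlying performance is unchanged.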
Reviewing these trends for a single device, such as a car, is not particularly labor-intensive and thus can be done manually. However, monitoring hundreds or thousands of machines (such as a fleet of aircraft or all networked computers in a medium to large company) can be very labor-intensive and error-prone. It is difficult for an analyst to review hundreds of device trends, day after day, and the analyst's concentration may be lost at various times during a review. It has been reported, for example, that such problems sometimes occur in hospitals, where employees spend long hours reviewing x-rays to detect medical problems such as breast cancer.
Another difficulty that sometimes arises is measurement noise that results from inaccurate sensors, varying environmental conditions, sensing at irregular intervals, etc. Such problems are well-known to anyone who has monitored the gas mileage of his or her own car, especially when data from partial tank fill-ups is used.
In some known methods for analyzing trends, a computer is used to monitor a large amount of equipment. The computer is used to perform calculations, generate plots of parameter trends, and determine whether any parameters have exceeded predetermined or otherwise specified limits. Smoothing, such as exponential smoothing or the use of running averages, is sometimes used to reduce variations in data that result from sensor noise. In some cases, computer programs are provided and used to identify trends that should be reviewed by a human analyst. Tests used by these programs may include comparing raw trended values to upper and lower limits, comparing a smoothed value to (possibly different) limits established for the smoothed traces, and comparing a difference between a current raw trended value and a prior smoothed value to a “shift limit” to thereby recognize sudden changes in performance. The programs may require that a trend shift be repeated by two or more readings before being annunciated, to reduce the probability of false alarms resulting from sensor noise. An annunciation of a potential problem is often referred to as an “alert.” Alerts are used to highlight trends that should be reviewed by a human analyst to check for equipment malfunctions. The use of alerts can reduce the need for routine manual review of many trends that do not exhibit anomalous behavior.
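The three tests described above can be sketched as follows. This is an illustrative sketch only, not the method of any particular monitoring program: the names `AlertLimits` and `check_trend`, the default smoothing weight, and the choice to keep updating the smoothed value while a shift is pending are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlertLimits:
    # All limit values are supplied by the user; none come from the text.
    raw_low: float
    raw_high: float
    smooth_low: float
    smooth_high: float
    shift_limit: float
    alpha: float = 0.3    # exponential smoothing weight (assumed value)
    persistence: int = 2  # consecutive shifted readings needed for an alert


def check_trend(readings: List[float], limits: AlertLimits) -> List[Tuple[int, str]]:
    """Return (reading index, alert kind) pairs for one trended parameter."""
    alerts: List[Tuple[int, str]] = []
    smoothed: Optional[float] = None
    shift_run = 0
    for i, raw in enumerate(readings):
        # Test 1: raw trended value against its upper and lower limits.
        if not (limits.raw_low <= raw <= limits.raw_high):
            alerts.append((i, "raw_limit"))
        if smoothed is None:
            smoothed = raw  # first reading seeds the smoothed trace
        else:
            # Test 3: current raw value against the PRIOR smoothed value.
            if abs(raw - smoothed) > limits.shift_limit:
                shift_run += 1
                if shift_run == limits.persistence:
                    alerts.append((i, "shift"))
            else:
                shift_run = 0
            # Exponential smoothing update.
            smoothed = limits.alpha * raw + (1 - limits.alpha) * smoothed
        # Test 2: smoothed value against its own (possibly different) limits.
        if not (limits.smooth_low <= smoothed <= limits.smooth_high):
            alerts.append((i, "smoothed_limit"))
    return alerts


limits = AlertLimits(raw_low=0, raw_high=20, smooth_low=0, smooth_high=20, shift_limit=3)
sustained = check_trend([10, 10, 10, 10, 15, 15, 15], limits)  # shift repeated
spike = check_trend([10, 10, 18, 10, 10], limits)              # single noisy reading
```

With these hypothetical limits, the sustained shift raises one alert on its second shifted reading, while the isolated spike raises none, which is the purpose of requiring the shift to repeat.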
Alert limits must be carefully selected in some applications. For example, if alert limits are set too loosely, faults that might otherwise have been detected by trend analysis may go undetected. On the other hand, alert limits that are set too tightly may generate frequent false alerts, resulting in a loss of productivity and increasing the probability that problems may go undetected due to loss of concentration on the part of a human analyst.
Some alerts are associated with so-called “hard limits,” i.e., bounds that must not be breached. For example, a device being monitored may have an absolute operating temperature limit. In this case, if a sensor value exceeds the temperature limit, it is obvious that an alert must be triggered. However, beyond this application of common sense, trial and error has been used for the selection of limits used for annunciating alerts.