Disclosed is a system and method for the analysis of event data that enables analysts to create user-specified datasets in a dynamic fashion. Performance, equipment and system safety, reliability, and significant event analysis utilize failure or performance data that are composed in part of time-based records. These data identify the temporal occurrence of performance changes that may necessitate unscheduled intervention, such as maintenance events, or other actions to mitigate or compensate for the observed changes. The criteria used to prompt a failure or performance record can range from complete loss of function to subtle changes in performance parameters that are known to be precursors of more severe events. These criteria are specific to each application, and this invention is relevant to this type of data taxonomy.
The concept of failure or performance is not universally defined. In the case of machinery, by way of example only, it depends on the equipment or system itself, its mission or role, the applied monitoring technology, and the risk appetite of the system owner. Consequently, reliability data or failure or performance data is more accurately labeled as “event” data that can relate to safety incidents, insurance, situations requiring maintenance or operational changes, the occurrence of precursor conditions from condition monitoring, external effects, security (stock, bond, mutual fund, etc.) fluctuations and other significant events that influence operations or other decisions. Reliability data is defined as temporal-based information collected at a sub-component, component, sub-system, system and other possible categorical levels that documents performance levels achieving pre-defined values or ranges. The methods, systems and related data analyzed according to the present invention can relate to, e.g., facilities such as processing or manufacturing plants, industries (e.g., medical, airline, social media, telecommunications, oil and gas, chemicals, hydrocarbon processing, pharmaceutical, biotechnology), securities markets, weather, housing and commercial real estate markets, and any other application where data may be collected for analysis of failure or performance.
Systems and methods disclosed herein may also be applied to non-physical or generalized systems where the concept of maintenance, intervention, control or analogous concepts may or may not apply. Examples of such non-physical systems are stock value fluctuations, economic indicators, and the occurrence of weather events. This type of data is called significant event data and defined as temporal-based data records reflecting either changes in or the achievement of specific pre-set values.
In addition to event identification, another fundamental aspect of reliability refers to the description detail used to record the event characteristics. A classification structure or taxonomy is an extremely valuable aspect of reliability measurement in that it records the reason or reasons why the failure or performance occurred. Standard failure or performance classification systems exist, but their use is dependent on regulatory and commercial management directions. Maintaining accurate failure or performance data requires discipline, but it can have considerable benefits. The term failure or performance data as used herein not only means data associated with the failure or performance and/or reliability of equipment and machinery, but one skilled in the art will also understand that failure or performance data means data associated with any measure or classification that fails to meet performance, reliability thresholds or other criteria of interest. In other words, when data “fails to meet” a threshold, it can be because it was lower or higher than that threshold. Moreover, robust failure or performance descriptions detailing the system, subsystems, equipment, financial instrument, insurance product, purchasing criteria, safety criteria, internet search criteria, security criteria, component, and failure or performance and/or reliability mode, for example, can help analysts identify systemic failure or performances and create data-driven programs such as reliability improvement. However, failure or performance and significant event taxonomies and their use vary by company and sometimes by location inside the same company.
In addition to data taxonomy considerations and the temporal recording of reliability and significant events, another data attribute using these characteristics is the value of data elements at the time of data recordation. This type of data is referred to as condition monitoring data and could be, for example, vibration or pressure readings of a pump recorded at specified times, wheel brake pad thicknesses recorded during inspections, reactor vessel thicknesses recorded during unit overhauls, daily stock values, and any other value of interest. The recording times may or may not be at fixed intervals.
The most accurate analysis of equipment and system reliability requires data and expert insights on how to identify systematic patterns in failure or performance data. It is the identification and subsequent analysis of these relatively minor failure or performances that can prevent the large catastrophic events, e.g., resulting in loss, injury, and/or devaluation in equipment, money, value, personnel, systems or other interests. This statement is supported by the root cause analysis of large failure or performance events. The post-accident analysis shows that many accidents are the end results of a sequence of less severe, often seemingly innocuous events that together enabled or allowed the large failure or performance to occur. This is also seen in the technical analysis of stocks or other financial instruments when key support levels are violated or when companies announce hiring freezes or layoffs causing a ripple effect. It is a common conclusion in these reports that the major failure or performance would not have occurred if any one of the precursor events had been prevented or otherwise had not occurred or occurred at the levels that caused or otherwise resulted in the effect. It will be understood by one skilled in the art that the opposite is also true, e.g., when stocks reach new highs then support levels tend to increase.
In this context, analysis of failure or performance/event data, in any taxonomy, represents only a subset of the possible ways failure or performances can be identified. Given any failure or performance classification method and operational system, the failure or performance analysts need a dynamic system and method to look at reliability data from as many perspectives as possible to scan for possible systematic failure or performance sequences that, if continued or allowed to continue, may eventually precipitate a large failure or performance event or an event that suggests or otherwise requires a decision to be made, the latter which will at least be understood in relation to economic or financial performance related data.
Analysts need tools that enable them to look at failure or performance event relationships and reliability changes by the failure or performance mode, component, equipment, subsystem, system and other perspectives in a dynamic fashion. Analysis from these perspectives, based on the given process, equipment, and failure or performance classifications represents a best practice in reliability analysis and measurement.
Analyst tools for using historical events to identify patterns in failure or performance and significant event data rely on a combination of deterministic methods, statistical tools, and reliability models. For example, simple plots of the time between failure or performances (or events) as a function of failure or performance number can give analysts unique visual insights, showing systematic patterns in failure or performance events and identifying failure or performance mechanisms not anticipated by the classification taxonomy. For example, if this plot shows a sinusoidal-like pattern in failure or performance data, further analysis may indicate that the failure or performances mainly occurred within one hour of shift changes. The fix may be either the adoption of new shift transfer procedures, additional staff training on transfer responsibilities, or both. The time between failure or performance plot is the insight mechanism that elucidates the operational/organizational inefficiencies, and continued analysis using this plot can show whether the resolution measures taken were effective.
Another set of tools that is effective in systemic failure or performance identification is in the field of statistical trend analysis. These tools use the time between failure or performance data and the analysis interval to compute the statistically derived probability that the time between failure or performances (or events) is getting smaller (deterioration trend) or larger (improvement trend). Both types of trends are easily identified given large data sets, but both types of trend can also be statistically identified with a smaller number of failure or performances. For example, consider a situation where there are 5 failure or performances in the early part of the analysis interval and no failure or performances for the remainder of the time. This situation is emblematic of a case where the problem was identified early and fixed, with no additional failure or performances. Statistical trend analysis could recognize the lack of failure or performances over the relatively long remainder of the analysis period and compute a high probability of an improvement trend. Conversely, if the same sequence of 5 failure or performances occurred at the end of the analysis period, a deterioration trend might be shown. The timing of the failure or performances, not just between successive events but also the position of these failure or performances within the analysis period, is a valuable information component for identifying event trends.
The trend analysis of failure or performances addressed by this invention is a valuable tool to assess the validity of the data sets within the user-defined time interval and within the user-defined groups to be applied to standard reliability methods such as Weibull Analysis. The primary assumptions that are often applied to industrial data are that the failure or performance or event data are “independent” and “identically distributed.” These assumptions are represented in the reliability literature as IID. Similar assumptions can, however, be made for non-industrial data, e.g., financial and other similar data for which trend analysis of failure or performance criteria may be desired.
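The five-event example above can be illustrated with a short sketch. The description does not name the specific trend tests used, so the Laplace trend test is assumed here purely as one generally accepted method; the function name `laplace_trend`, the 365-day analysis interval, and the event times are hypothetical. The returned probabilities are one-sided normal-approximation confidence levels, which are only rough for as few as five events.

```python
from math import sqrt
from statistics import NormalDist


def laplace_trend(event_times, interval_length):
    """Laplace trend statistic for events observed over [0, interval_length].

    Returns (u, p_deterioration, p_improvement). A negative u means the
    events cluster early in the interval (improvement trend); a positive u
    means they cluster late (deterioration trend).
    """
    n = len(event_times)
    if n == 0 or interval_length <= 0:
        raise ValueError("need at least one event and a positive interval")
    mean_time = sum(event_times) / n
    # Under the no-trend hypothesis the event times are uniform on the
    # interval, so their mean has expectation T/2 and variance T^2 / (12 n).
    u = (mean_time - interval_length / 2) / (interval_length * sqrt(1 / (12 * n)))
    p_deterioration = NormalDist().cdf(u)
    return u, p_deterioration, 1 - p_deterioration


# Hypothetical data: five events early versus late in a 365-day interval.
early_u, early_p_det, early_p_imp = laplace_trend([5, 10, 15, 20, 25], 365)
late_u, late_p_det, late_p_imp = laplace_trend([340, 345, 350, 355, 360], 365)

print(f"early cluster: u = {early_u:.2f}, P(improvement) = {early_p_imp:.4f}")
print(f"late cluster:  u = {late_u:.2f}, P(deterioration) = {late_p_det:.4f}")
```

Because the statistic depends on where the events fall within the analysis interval, the same five events yield opposite conclusions when moved from the start of the interval to the end.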
Data are independent if there is no association between the data values. In practice however, this assumption can be false. For example, consider this case study: A pump initially failed due to excessive leaking of a seal and was repaired immediately. The next week another seal failed. Seal failure or performances continued to plague the unit. About a month later the motor bearings needed to be replaced. When the bearings were replaced, the alignment of the motor, shaft and coupling were checked and found to be beyond specifications. The unit was realigned, placed back into service and the frequency of seal failure or performances dropped nearly to zero. The apparent cause of the seal and bearing failure or performances was poor alignment. The mis-alignment wore out the bearings and caused excessive vibrations that caused the series of seal failure or performances.
Identically distributed data means the probability distribution from which the “time between failure or performances” are derived is not changing. For failure or performance data where time or some other related variable, such as cycles is used, this means the same probability distribution form can be used to model the failure or performance frequency for the time period under consideration. This assumption implies that the chronological order of the data does not contain any information. In practice the chronological order can contain very important information regarding the future reliability or status of the system.
Consider for example the two systems' failure or performance data in the following table:
Time Between Failures

Failure Number                System #1    System #2
1                             10           50
2                             20           40
3                             30           30
4                             40           20
5                             50           10
Mean Time Between Failure     30           30
Standard Deviation            15.8         15.8
System #1 shows that the time between failure or performances is increasing with failure or performance number, showing a clear improvement trend. System #2 shows a systematic decrease in the time between failure or performances with failure or performance number, exhibiting a deterioration trend. This information is obtained from observing the chronological order in which the failure or performances or events occurred. Yet the mean time between failure or performances and the standard deviation of the two very different systems are the same. This example illustrates the importance of examining the chronological order of the failure or performance or events, which is an important part of this invention.
There are several situations that in reality would cause failure or performances or events to be related or not identically distributed. There can be complex inter-system relationships caused by internal and external factors that are not always identified, understood, or modeled by the analyst. It is this simple fact that makes the testing of the data for trends a prudent initial phase in the analysis of reliability or event data.
The statistical trend analysis components of this invention are developed to test the data as defined by the analyst for the existence of trends or patterns. If no trend is identified for a specific group, then the data is validated, as best as possible within the user-defined time interval and groups, to be IID. The subsequent optimal interval and maintenance decision support analyses are then technically justified. In the practical analysis of failure or performance and event data, these analysis sections are nearly always relevant since there are safety, environmental and financial costs both for doing and for not doing inspections. In the practical application of this invention, there is often insufficient data to statistically justify the IID assumptions, which makes the statistical analysis of trends the means by which the inspection interval optimization and decision support analyses are technically justified. This invention provides analysts with practical tools to address these issues.
This invention provides analysts with a dynamic system and method for the trend analysis of value-based data, e.g., condition monitoring data, and event-based data, e.g., failure or performance data. The analyst can enter data in simply formatted data files that can be created in a spreadsheet and/or exported to this system from other database programs.
The data values are entered using the taxonomy of the system under study and no data definition conversions are required. The condition monitoring data is compiled and only data values that are within a user-specified time interval are entered into the analysis. The analyst can then combine component trend data elements to observe trends for a combination of components.
The analyst enters two threshold values where the time at which the combined data group achieves these values is important. The system automatically computes the forecasted times when the group will achieve these values in terms of actual dates. The forecasting methods applied to the user-specified groups are linear, quadratic, and cubic polynomial fits to the group data. Other forecasting techniques could be applied, and the methods used in this and other embodiments are representative of the forecasting methodologies that may be applied to the dynamic, user-specified data groupings of value-based data.
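The comparison in the table can be reproduced with a few lines of code. This is a minimal sketch using the table values only; the helper names `summarize` and `successive_differences` are illustrative and are not part of the disclosed system.

```python
from statistics import mean, stdev

# Time between failures, in chronological order, taken from the table.
system_1 = [10, 20, 30, 40, 50]
system_2 = [50, 40, 30, 20, 10]


def summarize(times_between_failures):
    """Return (mean, sample standard deviation rounded to one decimal)."""
    return mean(times_between_failures), round(stdev(times_between_failures), 1)


def successive_differences(times_between_failures):
    """Change in time between failures from one failure to the next."""
    return [b - a for a, b in zip(times_between_failures, times_between_failures[1:])]


summary_1 = summarize(system_1)
summary_2 = summarize(system_2)
diffs_1 = successive_differences(system_1)
diffs_2 = successive_differences(system_2)

print("System #1:", summary_1, diffs_1)
print("System #2:", summary_2, diffs_2)
```

The order-independent summary statistics are identical for both systems, while the successive differences, which depend on chronological order, are all positive for System #1 and all negative for System #2.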
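The threshold forecasting step can be sketched for the linear case as follows. The readings, the two threshold values, and the function names `fit_line` and `forecast_date` are hypothetical, and the fit is an ordinary least-squares line; the quadratic and cubic fits mentioned above would replace the line with a higher-order polynomial and solve for its crossing point instead.

```python
from datetime import date, timedelta

# Hypothetical condition monitoring readings for one user-specified group,
# e.g. brake pad thickness in millimetres recorded at inspections.
readings = [
    (date(2024, 1, 1), 12.0),
    (date(2024, 1, 31), 11.0),
    (date(2024, 3, 1), 10.0),
]


def fit_line(points):
    """Ordinary least-squares line through (x, y) points: (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_date(readings, threshold):
    """Forecast the calendar date on which the fitted line reaches threshold.

    Returns None when the fitted line is flat and never reaches the threshold.
    """
    start = readings[0][0]
    points = [((day - start).days, value) for day, value in readings]
    slope, intercept = fit_line(points)
    if slope == 0:
        return None
    days_until = (threshold - intercept) / slope
    # Round to the nearest whole day before converting to a calendar date.
    return start + timedelta(days=round(days_until))


first_threshold_date = forecast_date(readings, 8.0)
second_threshold_date = forecast_date(readings, 6.0)
print(first_threshold_date, second_threshold_date)
```

With these readings the fitted line loses one millimetre every 30 days, so the two thresholds are forecast 120 and 180 days after the first reading.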
For event-based data such as reliability or failure or performance data, the same novel dynamic grouping functionality of component IDs into user-specified groups is applied. Statistical trend analysis techniques are applied to the data groups to compute in the most preferred embodiment up to four estimates of the probability of the existence of a trend. For failure or performance or event-based data, a trend is noted as either improvement where the time between failure or performances (events) is statistically increasing or deterioration where the time between failure or performances (events) is statistically decreasing. The user can visually see the group plots of the time between failure or performances (events) superimposed on three other trend identification methods to assist in the decision process.
Four statistical probability tests are also provided to aid the analyst in identifying the existence or non-existence of a trend. These tests represent examples of generally accepted methods for trend identification, but other trend identification methods and embodiments may be used alone, or to supplement or replace those disclosed herein, without departing from the breadth and scope of the invention disclosed herein.
For data groups for which it has been determined that no trend exists, the invention enables the analyst to compute, e.g., optimal inspection, analysis, or decision intervals and compare the risk associated between strategies, e.g., two maintenance intervals. The inspection and maintenance models used in the preferred embodiment are intended to be representative and other models are within the scope of the invention disclosed herein.
The inspection model produces results for four standard models used in reliability and event analysis: Exponential, Normal, Weibull, and Lognormal probability distributions. Optimal inspection (analysis) results are computed using each of these models to provide the analyst with a range of outcomes. This approach is used since the dynamic application of data groupings by the analyst plus the lack of sufficient data may preclude the determination of the technically best model that fits the data. In practice, reliability results expressed in terms of a range are acceptable in many situations.
Graphical plots of optimal inspection curves as a function of inspection interval also provide the analysts with a visual understanding of the sensitivity of the results to test interval changes. The plots can often supply the interval information at the level of detail practically required in most situations.
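The shape of such a curve can be sketched for the exponential case. The description does not state the inspection model's equations, so the following assumes a common textbook formulation: failures are revealed only at inspection, the failure rate is constant, and each inspection takes the item out of service for a fixed downtime. The function names, the 30-day mean time between failures, and the 0.1-day test downtime are hypothetical.

```python
from math import exp, sqrt


def mean_unavailability(interval, failure_rate, test_downtime):
    """Average unavailability over one inspection interval.

    First term: average probability that an exponentially distributed failure
    has occurred but not yet been found. Second term: fraction of the
    interval spent performing the inspection itself.
    """
    undetected = 1 - (1 - exp(-failure_rate * interval)) / (failure_rate * interval)
    return undetected + test_downtime / interval


def optimal_interval(failure_rate, test_downtime, candidates):
    """Candidate interval with the lowest mean unavailability (grid search)."""
    return min(
        candidates,
        key=lambda t: mean_unavailability(t, failure_rate, test_downtime),
    )


failure_rate = 1 / 30    # assumed: one failure per 30 days on average
test_downtime = 0.1      # assumed: each inspection takes 0.1 day
candidates = [i / 100 for i in range(10, 3001)]  # 0.10 to 30.00 days

best_interval = optimal_interval(failure_rate, test_downtime, candidates)
# Well-known small-rate approximation, shown for comparison only.
approx_interval = sqrt(2 * test_downtime / failure_rate)

print(f"grid optimum: {best_interval:.2f} days")
print(f"approximation: {approx_interval:.2f} days")
```

Evaluating `mean_unavailability` over the candidate intervals gives the curve an analyst would plot: too-frequent inspection is dominated by test downtime, too-infrequent inspection by undetected failures. The Normal, Weibull, and Lognormal cases would change only the undetected-failure term.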
In a preferred embodiment, the optimal maintenance support decision includes relative cost factors for testing, repair, loss of productivity due to failure or performance, and fixed cost. These four numbers sum to unity. While these factors are used in the preferred embodiment with respect to a manufacturing environment, one skilled in the art will readily understand that different models may be incorporated into the system and method, and other factors may be accordingly employed without departing from the scope and breadth of the invention disclosed and claimed herein.
The general functional structure of this invention is shown in FIG. 1. A data file is accessed and, based on its format [100], the software is directed either to the condition-based or to the failure or performance/event data modules. Discussing the condition-based operations first, the user specifies a time interval over which all analysis will be undertaken in [200]. The next module [300] presents the analyst with a listing of all detailed component IDs that have condition-based data within the prescribed time interval. At this point the user selects the desired grouping of the basic component data into larger groups that will be analyzed going forward as a single, combined dataset. In [400] the user performs data visualization, trend and predictive analyses. In [500] the analyst can combine component IDs, if desired, to be displayed on the same plot as separate variables and can output this information. At any time during the analyses done in [400] and [500] the analyst may return to [300] to re-group the components or to [200] to analyze data over a different time interval.
The failure or performance data is filtered based on the time interval entered in [600]. All data values within the prescribed interval are entered into memory and the user is presented with a summary listing of all component IDs that are available for analysis. The user then combines component ID data that is to be aggregated into larger analysis groups in [700]. This is a simple but powerful function to combine failure or performance/event data to study the reliability or event frequency of failure or performance modes, subsystems or systems comprised of many components. At this point the user can select the trend analysis [800], optimal preventive maintenance interval [900], and maintenance decision support [1000] modules. The trend analysis module enables the analyst to print the graphical and quantitative results directly. However, the graphics module [1100] is used to show details, e.g., the unavailability, cost (price), and risk curves as a function of inspection interval. At any time the analyst may either return to enter a new time interval [600] or apply new component ID groupings in [700]. The dynamic nature of this invention refers to this seamless ability: the re-selection of new component ID groupings.
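The interval filtering of [600] and the grouping of [700] can be sketched as follows. This is an illustration only: the record layout, the component IDs, the group names, and the functions `filter_by_interval` and `group_times_between_events` are hypothetical and do not describe the actual file format or software modules.

```python
from datetime import date

# Hypothetical event records: (component ID, event date).
records = [
    ("PUMP-1-SEAL", date(2023, 12, 1)),
    ("PUMP-1-SEAL", date(2024, 1, 5)),
    ("PUMP-1-SEAL", date(2024, 1, 12)),
    ("PUMP-1-BEARING", date(2024, 2, 10)),
    ("PUMP-2-SEAL", date(2024, 3, 1)),
]


def filter_by_interval(records, start, end):
    """Keep only records whose date lies within [start, end]."""
    return [(cid, day) for cid, day in records if start <= day <= end]


def group_times_between_events(records, groups):
    """For each named group of component IDs, merge the members' events in
    chronological order and return the days between successive events."""
    result = {}
    for name, component_ids in groups.items():
        days = sorted(day for cid, day in records if cid in component_ids)
        result[name] = [(b - a).days for a, b in zip(days, days[1:])]
    return result


in_interval = filter_by_interval(records, date(2024, 1, 1), date(2024, 12, 31))
available_ids = sorted({cid for cid, _ in in_interval})

# Two alternative user-specified groupings of the same component IDs.
by_equipment = group_times_between_events(
    in_interval, {"Pump 1": {"PUMP-1-SEAL", "PUMP-1-BEARING"}}
)
by_failure_mode = group_times_between_events(
    in_interval, {"All seals": {"PUMP-1-SEAL", "PUMP-2-SEAL"}}
)

print(available_ids)
print(by_equipment, by_failure_mode)
```

Re-grouping requires only a new mapping from group names to component IDs; the filtered records are reused unchanged, which is the sense in which the grouping can be re-selected at any time.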