The present embodiments relate to automatic prediction of failures in a distributed system comprising a plurality of complex machines.
Complex technical apparatuses and systems (e.g., imaging systems in radiology or other medical domains) are monitored for possible failures, deficiencies, breakdowns, or malfunctions. These medical systems are usually monitored in the context of an event management process, which generally aims to detect failures as early as possible in order to avoid future failures. Normally, well-defined events (e.g., failure notifications) are sent to a service operator, who analyzes them.
However, operators are often overwhelmed by the task of recovering from system failures under time pressure. It is therefore necessary to detect failures as early and as definitively as possible and to avoid incorrect or incomplete failure notifications.
In state-of-the-art systems, failure patterns are detected for a specific, individual device. Known service management systems such as, for example, “HP OpenView” and “ECS Designer” include functionality for detecting complex failure states. These systems typically use various statistical approaches, such as regression and/or classification procedures, or specific data mining procedures (e.g., expectation maximization or survival analysis for failure prediction). However, these are mainly offline analysis systems, or systems that are operated separately and are not integrated into an existing IT architecture for the machines to be monitored.