The present invention relates generally to the field of alarm management and more particularly to determining the root cause of one or more alarms within a system.
It is important for many modern systems, such as data centers or telecommunications systems, to be able to detect and report problems that occur during the course of operation. Due to the size and complexity of large telecommunications systems, many different types of problems can occur. For example, hardware issues such as power failures, overheating, or device failures can result in operational errors such as decreased performance, dropped data packets, or a component on a network becoming unreachable. Generally, an operator is tasked with determining which problems are the result of other problems, and which problems are root causes that must be addressed.
Modern network equipment and smart infrastructure devices are increasingly designed to generate large numbers of alarms and events that represent problems within the system. As a result, operators can be presented with thousands of alarms per hour, which they need to sort into maintenance tickets that can be dealt with by maintenance teams.