The present invention relates to a method, system and computer program product for monitoring data events using multiple calendars.
In many computing systems, large numbers of data events are continually monitored and analysed, often to detect errors or failures within the system being monitored. A telecommunications network such as a 3G or 4G mobile phone network, for example, has a very large number of components that are continually monitored by analytical systems: there may be millions of subscribers to the network and many thousands of mobile phone masts, each defining a local cell through which a subscriber's mobile device connects. Information such as data latency, the number of subscribers per cell and dropout rates is continually gathered and analysed in order to detect any failure or performance degradation within the overall system. A data event may be a numerical measurement that is monitored continually, or a discrete event such as the failure of a component.
Data events often have normal working parameters, either set in advance or generated over time, that can be used to determine whether an unusual data event has occurred. For example, in a mobile phone network the connection rate of each cell may be monitored continually, with a numerical range such as 90% to 98% set as the normal working parameter for that cell. When a data event is received that falls outside this range, an alert is generated indicating an error state that may need further investigation. So if a data event for a specific cell indicates that the connection rate has fallen to 85%, this can be taken to indicate that the cell is performing sub-optimally and that one or more components within the system are not functioning as they should.
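The range check described above can be illustrated with a minimal sketch. It assumes the normal working parameter is a fixed range whose bounds count as normal (inclusive), and that each data event carries a cell identifier and a connection rate in percent. The names `NormalRange`, `Alert` and `check_event` are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NormalRange:
    """Hypothetical normal working parameter: an inclusive numerical range."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        # Assumption: values equal to either bound are treated as normal.
        return self.low <= value <= self.high


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert raised for a data event outside its normal range."""
    cell_id: str
    value: float
    expected: NormalRange


def check_event(cell_id: str, connection_rate: float,
                normal: NormalRange) -> Optional[Alert]:
    """Return an Alert if the data event falls outside the normal range, else None."""
    if normal.contains(connection_rate):
        return None
    return Alert(cell_id, connection_rate, normal)


if __name__ == "__main__":
    normal = NormalRange(low=90.0, high=98.0)  # the 90% to 98% example range
    for cell, rate in [("cell-A", 95.0), ("cell-B", 85.0)]:
        alert = check_event(cell, rate, normal)
        if alert is None:
            print(f"{cell}: {rate}% within normal range")
        else:
            print(f"{cell}: ALERT, {rate}% outside "
                  f"{alert.expected.low}%-{alert.expected.high}%")
```

A fixed threshold like this has no notion of the context in which a data event occurs, so a change in operating conditions that is otherwise normal will still trip it.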
The problem with such systems is that, because the analytical systems monitor a very large number of components, false positives are a serious drain on resources and can lead to unnecessary responses to what are in fact normal working conditions of the monitored components. A change in the operating conditions of one or more components may not be due to any failure within the system being monitored; but if the resulting data events appear to fall outside the defined normal working conditions, they will trigger one or more alarms, and the actions taken in response may be unnecessary and will consume resources.