Exemplary embodiments relate generally to the field of telecommunications networks, and more specifically, to identifying silent failures in telecommunications networks and diagnosing troubles that caused the silent failures.
A telecommunications network generally includes multiple network elements, such as switches and routers, functionally coupled via a suitable communications network. The network elements are typically manufactured with alarms to indicate that a portion of the network element has failed. For example, routers commonly include alarms for detecting port failures and card failures. These alarms enable maintenance personnel and/or automated maintenance systems to easily determine the source of a failure and to efficiently resolve the failure.
Alarms are generally limited to identifying those failures that the manufacturer chooses. In many cases, alarms are only included for fatal errors that result in the complete failure of a network element. Any failures at the network elements that do not result in an alarm are commonly referred to as “silent failures.” Silent failures can result in a number of problems that adversely affect customer traffic, such as packet loss or a reduction of two-way traffic into one-way traffic. Since silent failures by definition do not generate alarms, silent failures are conventionally detected by customers who manually monitor their own network performance. This is especially problematic during off-hours when the customer may not be actively monitoring network performance. For example, a silent failure may occur at a business late on a Friday afternoon and not be discovered by the customer until Monday morning, thereby allowing the network problems to endure through the entire weekend to the business's detriment.
When a customer detects a decrease in network performance (e.g., a reduction in data transmission rates), the customer typically contacts its corresponding service provider. The service provider may then manually deploy personnel to perform a variety of diagnostic tests in order to discover the cause of the decrease in network performance. In many cases, until these tests are completed, the service provider is unaware whether the decrease in network performance is caused by a silent failure (i.e., a failure at the service provider's network elements) or by actions on the customer's side. Performing these tests is generally time-consuming and can lead to significant downtime for the customer.