The Evolved Packet Core (EPC) mobile network, as defined by 3GPP, consists of multiple functional elements such as the S-GW (Serving Gateway) 11, the P-GW (PDN-GW, Packet Data Network Gateway) 12, the MME (Mobility Management Entity) 13 and the HSS (Home Subscriber Server) 14, as shown in FIG. 1. The EPC is connected via the S-GW 11 to eNBs (eNodeBs) 15, which in turn are connected to mobile terminals 16. Further, the EPC is connected via the P-GW 12 to external networks 17, such as the internet (IP network), service networks or the like.
Currently, each element collects its own troubleshooting and diagnostics data as part of normal runtime operation. When a fault occurs, the operator is instructed to collect so-called standard symptom data and send that data to the manufacturer/vendor of the elements for more detailed analysis. There are also tools, developed in-house by the manufacturer/vendor, which abstract the real interfaces and commands used for the data collection per network element and expose a more unified interface towards the operator. The actual data contained in the collected package is often very detailed and specific to the software and hardware of each network element. This is possible because the application has access to all the necessary information about the runtime environment, down to the level of hardware registers and embedded software (for information such as versions, packet counters, etc.).
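The abstraction described above can be illustrated with a minimal sketch: each network element hides its element-specific commands behind one unified collection interface exposed to the operator. All class and method names here are illustrative assumptions, not the real vendor tool's API.

```python
# Hypothetical sketch of a unified symptom-data collection interface;
# each element implements collect() using its own internal commands.
from abc import ABC, abstractmethod


class SymptomCollector(ABC):
    """Unified interface exposed towards the operator."""

    @abstractmethod
    def collect(self) -> dict:
        """Return element-specific symptom data as one package."""


class SgwCollector(SymptomCollector):
    def collect(self) -> dict:
        # A real tool would run S-GW specific commands here and read
        # hardware registers, software versions and packet counters.
        return {"element": "S-GW", "packet_counters": {"rx": 0, "tx": 0}}


class MmeCollector(SymptomCollector):
    def collect(self) -> dict:
        return {"element": "MME", "sw_version": "example-1.0"}


def collect_standard_symptom_data(collectors):
    """Gather one package per element through the unified interface."""
    return [c.collect() for c in collectors]


package = collect_standard_symptom_data([SgwCollector(), MmeCollector()])
```

The operator thus triggers one command, while the element-specific detail stays encapsulated in each collector implementation.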
For the post-mortem troubleshooting to be effective, the data must be collected as close to the actual fault event as possible, in order to make sure that no data has been overwritten or lost since the fault.
In legacy network deployments, the network elements have been fairly standalone boxes, exposing management (northbound) interfaces towards the NMS (Network Management System) via de-facto standard interfaces.
In cloud deployments and in the emerging ETSI NFV (European Telecommunications Standards Institute Network Function Virtualization) framework, which is commonly used as the reference, the application's access to information about the runtime infrastructure is limited, while the management domain offers more possibilities for harmonized fault management.
FIG. 2 is a block diagram illustrating the ETSI NFV architectural framework as defined in document [2].
Network Functions Virtualization envisages the implementation of NFs as software-only entities that run over the NFV Infrastructure (NFVI). FIG. 2 illustrates the high-level NFV framework. As such, three main working domains are identified in NFV:
- Virtualized Network Function 21: the software implementation of a network function which is capable of running over the NFVI.
- NFV Infrastructure (NFVI) 22: the diversity of physical resources and how these can be virtualized. The NFVI supports the execution of the VNFs.
- NFV Management and Orchestration 23: covers the orchestration and lifecycle management of the physical and/or software resources that support the infrastructure virtualization, and the lifecycle management of VNFs. NFV Management and Orchestration focuses on all virtualization-specific management tasks necessary in the NFV framework.
In this NFV context, it can be seen that new elements, the VNF Manager 24 and the NFV Orchestrator 25, have emerged alongside existing EMS (Element Management System) 26 and OSS/BSS (Operating Support System/Business Support System) 27 solutions. These also offer possibilities for improved post-mortem analysis capabilities. Currently, the VNFM 24 (Virtual Network Function Manager) (e.g. Nokia CAM (Cloud Application Manager)) manages only the same vendor's VNFs, but this can be seen as something that will need to change: already now there are real use cases where the VNFM 24 would need to manage at least partially 3rd party components integrated with, for example, a Nokia solution.
High availability, comprising resiliency (that is, the ability of the NFV framework to limit disruption and return to a normal or at least minimally acceptable service delivery level in the face of a fault, failure, or other event that disrupts normal operation) as well as failure tolerance and recovery, is mandatory in the telecom environment. Moreover, with the emergence of Voice over LTE (VoLTE), the requirements on recovery times are very strict (below 1 s, the shorter the better). The resiliency needed in the network elements usually consists of multiple layers: the VNFM 24 and the VIM (Virtualized Infrastructure Manager) 28 monitor the physical resources and the virtual machines running on top of the infrastructure, while the VNF itself monitors the application status with a specialized built-in service. The latter part is essential for stateful 2N or N+M active-standby replication solutions, which are needed in order to reach the required recovery times without losing sessions. On the other hand, having multiple layers acting on their own for specific sets of faults makes it more complicated to automate symptom data collection.
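The application-level part of this layered supervision can be sketched as a simple 2N active-standby heartbeat watchdog that switches over within a sub-second budget. This is an illustrative sketch only; the class name, the 0.5 s threshold and the unit names are assumptions made for the example, not values from the specification.

```python
# Illustrative 2N active-standby supervision loop: the active unit is
# expected to heartbeat; a missed window triggers a switchover, which
# is also the natural point to trigger symptom data collection.
import time


class ActiveStandbyPair:
    def __init__(self, heartbeat_timeout_s=0.5):
        self.active = "unit-A"
        self.standby = "unit-B"
        self.timeout = heartbeat_timeout_s
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        """Called periodically by the active unit's built-in service."""
        self.last_heartbeat = time.monotonic()

    def supervise(self, now=None):
        """Switch over if the active unit missed its heartbeat window."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.timeout:
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = now
            return True  # failover happened; trigger symptom collection
        return False


pair = ActiveStandbyPair()
pair.heartbeat()
# Simulate a missed heartbeat window of 1.0 s (> 0.5 s timeout):
failed_over = pair.supervise(now=pair.last_heartbeat + 1.0)
```

With stateful replication keeping the standby's session state warm, such a detection loop is what makes sub-second recovery without session loss feasible, while the VIM/VNFM layers handle slower infrastructure-level faults independently.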
The main problem in cloud deployments, compared to legacy vendor-specific deployments, is that it can no longer be assumed that the application is able to collect all the information regarding the runtime infrastructure for the post-mortem analysis. Information about the host layer (hardware and software, including the hypervisor, the host operating system and virtual networking) is vital for understanding many faults which are visible to the operator at the actual application layer (VNF).
Especially in elements such as gateways, which handle user plane traffic and are therefore very sensitive to throughput and latency/jitter, the host layer potentially has a huge effect on the performance of the application. This data must be available for later analysis, and it must be possible to correlate this information with the application-specific data.
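The correlation requirement above can be illustrated with a minimal timestamp-window join between host-layer records and application (VNF) records, so that a later post-mortem can line up, for example, a hypervisor event with a user-plane latency alarm. The field names, events and the one-second window are assumptions for the sketch, not part of the specification.

```python
# Hedged sketch: pair each VNF-level event with host-layer events that
# occurred within +/- window_s seconds of it.
def correlate(host_events, vnf_events, window_s=1.0):
    """Return one record per VNF event with its nearby host events."""
    pairs = []
    for v in vnf_events:
        near = [h for h in host_events
                if abs(h["ts"] - v["ts"]) <= window_s]
        pairs.append({"vnf": v, "host": near})
    return pairs


host = [{"ts": 100.2, "event": "vCPU steal spike"}]
vnf = [{"ts": 100.5, "event": "latency alarm"},
       {"ts": 205.0, "event": "restart"}]
correlated = correlate(host, vnf)
```

This presupposes that host-layer and application-layer clocks are synchronized well enough (e.g. via NTP/PTP) for timestamp-based correlation to be meaningful.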
The ETSI NFV Management and Orchestration (MANO) framework does not currently cover troubleshooting, as troubleshooting appears to have been regarded as a matter of implementation rather than a matter of specification.
In this invention, it is nevertheless proposed that the centralized and automatic data collection mechanism should work also in a multi-vendor environment, even if the data content remains network element specific.