With the advent of network virtualization and software-defined networking (SDN), new issues are surfacing in the field of network troubleshooting. The number of physical and virtual nodes and entities in a software-defined datacenter (SDDC) network has increased by leaps and bounds. Virtual switches, virtual routers, edge gateways, distributed firewalls, load balancers, etc., are just some of the new nodes that add to the complexity of network troubleshooting. As a result, network troubleshooting has become even more painful and cumbersome. Two of the major issues in SDDC network troubleshooting are the lack of proper network topology diagrams and the lack of a proactive network health monitoring system.
Traditional networking provides a basic topology diagram that gives an idea of which virtual machine (VM) port or kernel port is located on which host. The basic topology may also indicate the virtual switches or virtual distributed switches and the physical network interface card (PNIC) that these ports are connected to. However, in an SDN there are many more entities involved, and understanding how these entities are tied together is key to network troubleshooting.
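As an illustration only, the relationships described above (a VM port attached to a virtual switch, which uplinks through a PNIC on a host) can be modeled as an undirected graph and traced with a breadth-first search. The following Python sketch is not taken from any SDDC product; the class name `Topology`, its methods, and the entity names are all hypothetical.

```python
from collections import defaultdict, deque


class Topology:
    """Hypothetical in-memory model: entities are nodes, attachments are edges."""

    def __init__(self):
        self._adj = defaultdict(set)

    def connect(self, a, b):
        # Record an undirected attachment between two entities.
        self._adj[a].add(b)
        self._adj[b].add(a)

    def path(self, src, dst):
        # Breadth-first search; returns the shortest chain of entities
        # from src to dst, or None if they are not connected.
        prev = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                chain = []
                while node is not None:
                    chain.append(node)
                    node = prev[node]
                return chain[::-1]
            for neighbor in sorted(self._adj[node]):
                if neighbor not in prev:
                    prev[neighbor] = node
                    queue.append(neighbor)
        return None


# Illustrative entity names (assumptions, not real identifiers).
topo = Topology()
topo.connect("vm1-port", "vswitch-A")
topo.connect("vswitch-A", "pnic0")
topo.connect("pnic0", "host-1")
```

With this model, `topo.path("vm1-port", "host-1")` walks the chain from the VM port to its host. A real SDN topology would add further entity types, such as logical switches, distributed logical routers, and edge gateways, as additional nodes.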
In a majority of SDDC network issues escalated by customers, there is a need for a proper topology diagram for efficient troubleshooting. Currently, there are no tools for generating the topology for SDDC networks. Topology information is collected from customers entirely by hand, which creates many opportunities for introducing errors.
There is a lot of back and forth with customers to obtain their network topology information or diagrams. Even after so much time and effort, some topology information is always missing. Sometimes, the information shared by the customer is incorrect. Even after collecting all the information from the customer, the SDDC engineers may interpret it incorrectly, as there is no standard way to represent the topology. Some customers do not even have a topology diagram to share with the SDDC support personnel. In such cases, the support personnel have to inspect the customer's environment and manually reconstruct the topology in their own environment. The user may also misconfigure the topology. Users might sometimes connect VMs, logical switches, or distributed logical routers incorrectly, since they have no way to visualize the topology.
The errors introduced by manual efforts lead to missing pieces of critical topology data. A lot of engineering time and resources are wasted in attempts to get proper topology information from customers. The errors introduced by manual efforts also lead to misdiagnoses of network problems reported by the customers.
Another major issue in SDDC network troubleshooting is the lack of a proactive network health monitoring system. The support personnel realize that a network issue has occurred only after the damage is done. There are no tools to proactively monitor the health of the network overlay links. For instance, parameters such as the reachability of packets between any two VMs running on a network platform, the latency variation of logical links between any two VMs, and the maximum transmission unit (MTU) variation of logical links between any two VMs cannot currently be monitored.
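To make the three parameters named above concrete, the following Python sketch shows one way a monitor might evaluate a series of probe results for a single logical link between two VMs. It is a minimal illustration under stated assumptions: the `ProbeSample` record, the `assess_link` function, the issue labels, and the 5 ms jitter threshold are all hypothetical, and the sketch does not send any probes itself.

```python
from statistics import pstdev
from typing import List, NamedTuple


class ProbeSample(NamedTuple):
    """Hypothetical result of one probe between two VMs."""
    reachable: bool
    latency_ms: float
    path_mtu: int


def assess_link(samples: List[ProbeSample], max_jitter_ms: float = 5.0) -> List[str]:
    """Return a list of issue labels for one logical link (empty if healthy)."""
    if not samples:
        return ["no_samples"]

    issues = []
    delivered = [s for s in samples if s.reachable]

    # Reachability: any lost probe is flagged.
    if len(delivered) < len(samples):
        issues.append("unreachable")

    # Latency variation: population standard deviation of delivered probes.
    # The threshold is an arbitrary illustrative value.
    latencies = [s.latency_ms for s in delivered]
    if len(latencies) >= 2 and pstdev(latencies) > max_jitter_ms:
        issues.append("latency_variation")

    # MTU variation: more than one distinct path MTU observed.
    if len({s.path_mtu for s in delivered}) > 1:
        issues.append("mtu_variation")

    return issues
```

For example, a link whose probes all succeed with similar latency and a single path MTU yields an empty list, while a link that reports path MTUs of both 1600 and 1500 would be flagged with `"mtu_variation"`.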