Administering a software-defined data center (SDDC) requires increased visibility into the operations of individual virtual machines. Log analysis has become a critical tool for monitoring datacenter resources such as servers and applications. System logs are an important source of such information, but they often do not contain enough identifying information. For instance, clones of a virtual machine running on the same host output system log information containing identical machine name identifiers, making it impossible to determine which virtual machine a given log came from. Other information that is important in the context of an SDDC is also missing from system log messages. Additionally, managing and configuring the logging preferences of individual virtual machines in a large SDDC can be a difficult task.
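The clone ambiguity described above can be illustrated with a minimal sketch. The VM names, hostname, and messages below are hypothetical and chosen only for illustration. The first field of each record stands for knowledge the operator would like to have but which the log message itself does not carry.

```python
from collections import defaultdict

# Hypothetical records: (vm_instance, hostname_in_log, message).
# vm_instance is NOT part of the log line; it is shown here only to make
# the lost information visible.
emitted = [
    ("vm-clone-a", "appserver", "disk usage at 91%"),
    ("vm-clone-b", "appserver", "disk usage at 40%"),
]


def group_by_log_hostname(records):
    """Group messages as a log server sees them: by the hostname field only."""
    groups = defaultdict(list)
    for _vm, hostname, message in records:
        groups[hostname].append(message)
    return dict(groups)


received = group_by_log_hostname(emitted)
# Both clones collapse into one "appserver" bucket, so the server cannot
# tell which clone reported which disk usage.
print(received)
```

Because both clones report the same hostname, the two messages land under a single key and their origin cannot be recovered from the log content alone.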
Typically, logs generated by servers and applications are sent to a central logging server that has been pre-configured to analyze the logs received from these servers and applications. The context encoded in the logs, such as facility, level, server application name and identification, and instance, plays a significant part in analyzing and classifying the logs, and is typically attached by the application or server that generates the log. The logs are then sent to the pre-configured log server over a transport protocol such as UDP, which is lossy, or TCP, which is lossless but more resource-consuming. As a datacenter operation deploys multiple applications and servers, these applications and servers may have varying configurations and logging context metadata, and may be programmed with different log server destinations.
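As a rough illustration of how such context is attached at the source, the sketch below encodes facility and level into the syslog priority value (facility code multiplied by 8, plus severity code, as defined in the syslog RFCs) and builds a simplified BSD-style line. The timestamp is omitted for brevity, and the hostname, application name, and server address are hypothetical. The `send_udp` helper is an assumption about how a sender might transmit the line, not a description of any particular product.

```python
import socket

# A subset of the facility and severity codes numbered in the syslog RFCs.
FACILITY = {"kern": 0, "user": 1, "daemon": 3, "local0": 16}
SEVERITY = {"emerg": 0, "err": 3, "warning": 4, "info": 6, "debug": 7}


def encode_pri(facility: str, severity: str) -> int:
    """Combine facility and severity into the syslog PRI value."""
    return FACILITY[facility] * 8 + SEVERITY[severity]


def decode_pri(pri: int) -> tuple:
    """Recover (facility_code, severity_code), as a log server would."""
    return divmod(pri, 8)


def format_message(facility, severity, hostname, app, text) -> str:
    """Build a simplified BSD-style syslog line (timestamp omitted)."""
    return f"<{encode_pri(facility, severity)}>{hostname} {app}: {text}"


def send_udp(line: str, server: str, port: int = 514) -> None:
    """Send one line over UDP; delivery is not guaranteed (lossy transport)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (server, port))


# Hypothetical sender-side context; nothing here identifies the VM instance.
line = format_message("daemon", "err", "web01", "nginx", "upstream timed out")
print(line)
```

Everything the log server later uses for classification is fixed by the sender in this one line, so a sender configured with the wrong facility, hostname, or destination yields logs that are misclassified or never received.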
From a datacenter operations point of view, these logs carry only the context that the application or server has encoded. No other information is attached to the logs apart from the network address on the log packets, which the datacenter operator can use for correlation. In the event of a misconfiguration on the application or the server, all the logs from that application or server could be lost.
Existing log services correlate data only after the log servers have received the logs in the raw format sent by the application or server that generated them. There is currently no mechanism to enrich the log data at the source itself, in an agentless form, so as to provide more context to the log server. Existing log services focus on processing large amounts of data using large data infrastructure and on applying load balancers to consume the received data, rather than on making the data itself richer in context so as to avoid much of the post-processing needed for data correlation.