Virtual computing instances (VCIs), such as virtual machines (VMs), virtual workloads, data compute nodes, clusters, and containers, among others, have been introduced to lower data center capital investment in facilities and operational expenses and reduce energy consumption. A VCI is a software implementation of a computer that executes application software analogously to a physical computer. VCIs have the advantage of not being bound to physical resources, which allows VCIs to be moved around and scaled to meet changing demands of an enterprise without affecting the use of the enterprise's applications. VCIs can be deployed on a hypervisor provisioned with a pool of computing resources (e.g., processing resources, memory resources, etc.). There are currently a number of different configuration profiles for hypervisors on which VCIs may be deployed.
However, large-scale virtualized infrastructure may have many (e.g., thousands of) VMs running on many physical machines. High availability requirements provide system administrators with little time to diagnose problems or bring down parts of the infrastructure for maintenance. Fault-tolerant features ensure the virtualized computing infrastructure continues to operate when problems arise, but may generate many intermediate states that have to be reconciled and addressed. As such, identifying, debugging, and resolving failures and performance issues for virtualized computing environments can be challenging.
Many software and hardware components generate log messages (log data) via interfaces (e.g., front-end applications, storage applications, networking applications, and the like) to facilitate technical support and troubleshooting. These interfaces may be interlinked in complex ways, either peer-to-peer or client/server, through hundreds of synchronous and asynchronous processes, services, and daemons that interact with each other in serial, forked, and/or parallel ways. However, over an entire virtualized computing infrastructure, massive amounts of unstructured log data (gigabytes of unstructured log data per day) may be generated continuously by every endpoint (e.g., end device, end component, and so on) of the virtualized computing infrastructure. As such, storing, searching, and analyzing the log data to find the information that identifies problems of the virtualized computing infrastructure can be difficult, due to the overwhelming volume of unstructured log data to be analyzed.